pydigree.io package¶
Submodules¶
pydigree.io.base module¶
-
class pydigree.io.base.PEDRecord(line, delimiter=' ')¶ Bases: object
-
create_individual(population=None)¶ Creates an Individual object from a Pedigree Record.
The individual will have the id tuple of (fam_id, ind_id)
Parameters: population (Population) – Population for the individual to belong to Return type: Individual
-
-
pydigree.io.base.connect_individuals(pop)¶ Makes the connections in the genealogy from parents to children and vice versa.
Parameters: pop (IndividualContainer) – Set of individuals to connect Return type: void
-
pydigree.io.base.genotypes_from_sequential_alleles(chromosomes, data, missing_code='0')¶ Takes a series of alleles and turns them into genotypes.
For example, the series ‘1 2 1 2 1 2’ becomes chrom1 = [1, 1, 1] and chrom2 = [2, 2, 2].
These are returned in a list of the form:
[(chroma, chromb), (chroma, chromb)...]
Parameters: - chromosomes (list of ChromosomeTemplate) – genotype data
- data – The alleles to be turned into genotypes
- missing_code (string) – value representing a missing allele
Returns: A list of 2-tuples of Alleles objects
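The de-interleaving described above can be sketched in plain Python. This is an illustration only, not pydigree's implementation: split_sequential_alleles is a hypothetical helper, it works on a single whitespace-delimited string for one chromosome pair, and it represents missing alleles as None instead of building Alleles objects.

```python
def split_sequential_alleles(series, missing_code='0'):
    """Split 'a1 b1 a2 b2 ...' into two per-chromosome allele lists."""
    tokens = series.split()
    if len(tokens) % 2 != 0:
        raise ValueError('expected an even number of alleles')
    # Mapping the missing code to None is an assumption of this sketch
    tokens = [None if t == missing_code else t for t in tokens]
    chrom_a = tokens[0::2]  # first allele of every genotype
    chrom_b = tokens[1::2]  # second allele of every genotype
    return chrom_a, chrom_b


chrom_a, chrom_b = split_sequential_alleles('1 2 1 2 1 2')
```

Slicing with a step of two keeps the marker order intact on both chromosomes, which is what the (chroma, chromb) pairing relies on.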
-
pydigree.io.base.read_ped(filename, population=None, delimiter=None, affected_labels=None, population_handler=None, data_handler=None, connect_inds=True, onlyinds=None)¶ Reads a plink format pedigree file, i.e.:
familyid indid father mother sex whatever whatever whatever
into a pydigree pedigree object, optionally assigning the pedigree members to a population. If you don’t provide a population you can’t simulate genotypes!
Parameters: - filename (string) – The file to be read
- population (Population) – The population to assign individuals to
- delimiter (string) – a string defining the field separator, default: any whitespace
- affected_labels (dict (str -> value)) – The labels that determine affection status.
- population_handler – a function to set up the population
- data_handler (callable) – a function to turn the data into useful individual information
- connect_inds (bool) – build references between individuals. Requires all individuals be present in the file
- onlyinds (iterable) – only include data for specified individuals
Returns: individuals contained in the pedigree file
Return type:
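To make the column layout above concrete, here is a minimal sketch of splitting one pedigree line into its five leading fields and the remaining data payload. It does not use pydigree: PedFields and parse_ped_line are hypothetical names, and handing the rest of the line over as one string is an assumption about what a data handler might receive.

```python
from collections import namedtuple

# Hypothetical container for the columns named in the format description
PedFields = namedtuple('PedFields', 'fam_id ind_id father mother sex data')


def parse_ped_line(line, delimiter=None):
    """Split a pedigree line into five leading fields plus the rest."""
    # delimiter=None makes str.split break on any run of whitespace
    parts = line.strip().split(delimiter, 5)
    if len(parts) < 5:
        raise ValueError('a pedigree line needs at least five fields')
    fam_id, ind_id, father, mother, sex = parts[:5]
    data = parts[5] if len(parts) > 5 else ''
    return PedFields(fam_id, ind_id, father, mother, sex, data)


rec = parse_ped_line('fam1 3 1 2 1 2 A A G T')
label = (rec.fam_id, rec.ind_id)  # mirrors the (fam_id, ind_id) id tuple
```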
-
pydigree.io.base.read_phenotypes(pedigrees, csvfile, delimiter=', ', missingcode='X')¶ Reads a CSV file with the header famid, ind, phen, phen, phen, phen, etc.
Parameters: - pedigrees (PedigreeCollection) – data to update
- csvfile (string) – the filename of the file containing phenotypes
- delimiter – the field delimiter for the file
- missingcode (string) – the code for missing values
Return type: void
-
pydigree.io.base.sort_pedigrees(inds, population_handler)¶ Takes a set of individuals and sorts them into pedigrees.
Individuals must have labels that are (pedid, indid) tuples
Parameters: - inds (iterable) – Individuals to be sorted
- population_handler (Callable) – a function to set up the population
Returns: Collection of pedigrees from the individuals
Return type: PedigreeCollection
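The grouping step can be pictured with plain tuples. The sketch below is an assumption-laden stand-in: group_by_pedigree is a hypothetical helper that works on (pedid, indid) labels only, and it returns an ordinary dict where the real function builds pedigree objects.

```python
from collections import defaultdict


def group_by_pedigree(labels):
    """Group (pedid, indid) labels by their pedigree id."""
    pedigrees = defaultdict(list)
    for ped_id, ind_id in labels:
        pedigrees[ped_id].append(ind_id)
    return dict(pedigrees)


labels = [('fam1', '1'), ('fam2', '1'), ('fam1', '2')]
grouped = group_by_pedigree(labels)
```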
-
pydigree.io.base.write_pedigree(pedigrees, filename, delim=' ')¶ Writes pedigree to a LINKAGE formatted pedigree file
Parameters: - pedigrees – Data to write
- filename – filename to write to
- delim – output field separator
Return type: void
-
pydigree.io.base.write_phenotypes(pedigrees, filename, predicate=None, missingcode='X', delim=', ')¶ Writes phenotypes to a CSV (or other field delimited) file
Parameters: - pedigrees – Data to write
- filename – filename to write to
- missingcode (string) – code to use for missing values
- delim (string) – output field separator
pydigree.io.beagle module¶
Utilities for reading BEAGLE formatted genotype data
-
class pydigree.io.beagle.BeagleGenotypeRecord(line)¶ Bases: object
A class corresponding to a record in a BEAGLE genotype file
-
data¶
-
identifier¶
-
is_phenotype_record()¶ Returns true if record corresponds to a phenotype field
-
label¶
-
-
class pydigree.io.beagle.BeagleMarkerRecord(line)¶ Bases: object
A class corresponding to a marker in a BEAGLE marker file
-
alleles¶
-
alternates¶ The alternate alleles for the marker
Returns: Each alternate allele Return type: iterable
-
label¶
-
pos¶
-
reference¶ The reference allele for the marker
Returns: reference allele Return type: string
-
-
pydigree.io.beagle.read_beagle(genofile, markerfile)¶ Reads BEAGLE formatted genotype data
Parameters: - genofile (string) – Filename containing genotype information for individuals
- markerfile (string) – Filename containing marker location and allele information corresponding to genofile
Return type:
-
pydigree.io.beagle.read_beagle_genotypefile(filename, pop, missingcode='0')¶ Reads BEAGLE formatted genotype files
Parameters: - filename – Filename of BEAGLE genotype file
- pop – the population to add these individuals to
- missingcode (string) – The value that indicates a missing genotype
Return type: void
-
pydigree.io.beagle.read_beagle_markerfile(filename, label=None)¶ Reads marker locations from a BEAGLE formatted file
Parameters: - filename (string) – The file to be read
- label – An optional label to give the chromosome, since the BEAGLE format does not require it
Return type:
pydigree.io.genomesimla module¶
Read GenomeSIMLA formatted chromosome templates
-
pydigree.io.genomesimla.read_gs_chromosome_template(templatef)¶ Reads a genomeSIMLA format chromosome template file
Parameters: templatef (string) – The filename of the template file Return type: A ChromosomeTemplate object corresponding to the file
pydigree.io.kinship module¶
Reads kinships from KinInbCoef
-
pydigree.io.kinship.read_kinship(filename)¶ Reads a KinInbCoef formatted file of kinship and inbreeding coefficients
Parameters: filename (string) – the filename to be read Returns: a dictionary in the format {frozenset({(fam, ind_a), (fam, ind_b)}): kinship/inbreeding coefficient}
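Because the keys are frozensets, a lookup does not depend on the order of the two individuals. The following sketch builds a small dictionary by hand in the documented shape; the coefficient values and the lookup helper are made up for illustration, and treating an individual paired with itself as a one-element frozenset is an inference from how sets behave, not a documented guarantee.

```python
# Hand-built example in the documented shape; the values are invented
kinship = {
    frozenset({('fam1', '1'), ('fam1', '2')}): 0.25,
    frozenset({('fam1', '1')}): 0.0,  # an individual paired with itself
}


def lookup(kinship, ind_a, ind_b):
    """Order-independent lookup of a coefficient for two individuals."""
    return kinship[frozenset({ind_a, ind_b})]


forward = lookup(kinship, ('fam1', '1'), ('fam1', '2'))
reverse = lookup(kinship, ('fam1', '2'), ('fam1', '1'))
self_pair = lookup(kinship, ('fam1', '1'), ('fam1', '1'))
```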
pydigree.io.plink module¶
Functions for reading PLINK formatted genotype files
-
pydigree.io.plink.create_pop_handler_func(mapfile)¶ Creates a closure to provide as the population handler for pydigree.io.base.read_ped.
Parameters: mapfile (string) – Filename of PLINK .map file Return type: callable
-
pydigree.io.plink.plink_data_handler(ind, data)¶ A function to handle the data payload from a plink line.
Parameters: - ind (Individual) – Individual for the record
- data (string) – the data for the record
Return type: void
-
pydigree.io.plink.read_map(mapfile)¶ Reads a PLINK map file into a list of ChromosomeTemplate objects
Parameters: mapfile (string) – Path of the file to be read Return type: a list of ChromosomeTemplate objects
-
pydigree.io.plink.read_plink(pedfile=None, mapfile=None, prefix=None, **kwargs)¶ Read a plink file by specifying pedfile and mapfile directly, or by using a prefix. Pass additional arguments to pydigree.io.base.read_ped with kwargs
Parameters: - pedfile – a plink PED file to be read
- mapfile – a plink MAP file to be read
- prefix – sets mapfile to ‘prefix.map’ and pedfile to ‘prefix.ped’
- kwargs – additional arguments passed to read_ped
Returns: A PedigreeCollection object
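The prefix rule above amounts to a small filename resolution step, sketched here without pydigree. resolve_plink_paths is a hypothetical helper; letting the prefix override explicit filenames and raising ValueError when a name is missing are assumptions of this sketch.

```python
def resolve_plink_paths(pedfile=None, mapfile=None, prefix=None):
    """Work out the PED and MAP filenames from a prefix or explicit names."""
    if prefix:
        pedfile = prefix + '.ped'
        mapfile = prefix + '.map'
    if not (pedfile and mapfile):
        # Assumed behavior: refuse to continue without both filenames
        raise ValueError('give pedfile and mapfile, or a prefix')
    return pedfile, mapfile


from_prefix = resolve_plink_paths(prefix='study')
explicit = resolve_plink_paths(pedfile='a.ped', mapfile='b.map')
```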
-
pydigree.io.plink.write_map(pedigrees, mapfile, output_chromosomes=None)¶ Writes the genotype location data to a PLINK MAP file
Parameters: - pedigrees – the population containing the data to be written
- mapfile – the name of the file to be output to
- output_chromosomes – which chromosomes to write
Returns: Nothing
-
pydigree.io.plink.write_ped(pedigrees, pedfile, delim=' ', predicate=None, output_chromosomes=None)¶ write_ped writes data in a plink-format PED file, and optionally a plink-format map file.
Parameters: - pedigrees – An object of class PedigreeCollection containing what you want to output
- pedfile – a string giving the name of the file to output to.
- mapfile – the name of a mapfile to output, if you want to output one. an object that evaluates as False or None will skip the mapfile
- genotypes – Should genotypes be output True/False
- delim – Field separator
- predicate – Which inputs to include in the output file. If not specified all are output. If the string is ‘affected’, only affected individuals are output. If the string is ‘phenotyped’, all individuals with phenotype information are output. Any other value of predicate must be a function to perform on the individual that evaluates to True/False for whether the individual should be output.
Returns: Nothing
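The predicate rules above can be read as a dispatch from the argument to a filter function. The sketch below is one possible reading: resolve_predicate is hypothetical, individuals are modeled as plain dicts with made-up 'affected' and 'phenotypes' keys, and rejecting other values with ValueError is an assumption.

```python
def resolve_predicate(predicate=None):
    """Turn the predicate argument into a function of one individual."""
    if predicate is None:
        return lambda ind: True  # no predicate: everyone is output
    if predicate == 'affected':
        return lambda ind: ind.get('affected') is True
    if predicate == 'phenotyped':
        return lambda ind: bool(ind.get('phenotypes'))
    if callable(predicate):
        return predicate
    raise ValueError('predicate must be a known string or a callable')


# Individuals are plain dicts here purely for illustration
inds = [
    {'id': 1, 'affected': True, 'phenotypes': {'height': 170}},
    {'id': 2, 'affected': False, 'phenotypes': {}},
]
keep = resolve_predicate('affected')
affected_ids = [ind['id'] for ind in inds if keep(ind)]
```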
-
pydigree.io.plink.write_plink(pedigrees, filename_prefix, predicate=None, mapfile=False, compression=None, output_chromosomes=None)¶ Write individual genotypes to a file in plink PED data format. Optionally outputs the genotype locations to the mapfile.
Parameters: - pedigrees (IndividualContainer) – the pedigrees to be written
- filename_prefix (string) – The output ped file (‘.ped’ will be appended)
- predicate (callable) – a callable that evaluates True when the individual should be written to the file
- mapfile (bool) – True if the plink MAP file should be written
- compression ('gzip' or 'bz2') – Compress the data? Options are bz2 and gzip, otherwise data will be written uncompressed
Return type: void
pydigree.io.sgs module¶
Utilities for file I/O with SGS data
-
class pydigree.io.sgs.GermlineRecord(line)¶ Bases: object
A class for working with records in GERMLINE formatted files
-
bp_locations¶ Are the segments provided in physical or genetic positions?
Returns: True if physical, False if centimorgans Return type: bool
-
location¶ Returns the location of the segment
-
pair¶ Returns the individuals for the segment
-
-
pydigree.io.sgs.read_germline(filename)¶ Reads a GERMLINE formatted SGS filename into an SGSAnalysis object
GERMLINE files are text files with the format:
- Family ID 1
- Individual ID 1
- Family ID 2
- Individual ID 2
- Chromosome
- Segment start (bp/cM)
- Segment end (bp/cM)
- Segment start (SNP)
- Segment end (SNP)
- Total SNPs in segment
- Genetic length of segment
- Units for genetic length (cM or MB)
- Mismatching SNPs in segment
- 1 if Individual 1 is homozygous in match; 0 otherwise
- 1 if Individual 2 is homozygous in match; 0 otherwise
This function only uses the first seven fields (indices 0-6, counting from zero): Family ID 1 through Segment end (bp/cM).
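As a rough illustration of those leading fields, the sketch below splits one line on whitespace and keeps the individual pair, the chromosome, and the segment boundaries. parse_germline_line is a hypothetical helper, the sample line is invented, and converting the boundaries to float (so that both bp and cM values parse) is an assumption.

```python
def parse_germline_line(line):
    """Extract the pair, chromosome, and segment location from one line."""
    fields = line.strip().split()
    # Only the first seven whitespace-separated fields are used
    fam1, ind1, fam2, ind2, chrom, start, stop = fields[:7]
    pair = ((fam1, ind1), (fam2, ind2))
    location = (float(start), float(stop))
    return pair, chrom, location


line = 'f1 1 f1 2 1 1000 5000 rs1 rs9 120 4.0 cM 0 0 0'
pair, chrom, location = parse_germline_line(line)
```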
-
pydigree.io.sgs.write_sgs(data, filename)¶ GERMLINE files are text files with the format:
- Family ID 1
- Individual ID 1
- Family ID 2
- Individual ID 2
- Chromosome
- Segment start (bp/cM)
- Segment end (bp/cM)
- Segment start (SNP)
- Segment end (SNP)
- Total SNPs in segment
- Genetic length of segment
- Units for genetic length (cM or MB)
- Mismatching SNPs in segment
- 1 if Individual 1 is homozygous in match; 0 otherwise
- 1 if Individual 2 is homozygous in match; 0 otherwise
pydigree.io.smartopen module¶
-
pydigree.io.smartopen.smartopen(filename, mode='r')¶ Seamlessly open compressed files. Use in place of regular open.
Note
Python’s compression modules iterate over compressed files as bytes, not strings. Unless ‘b’ (binary) is specified in the mode, ‘t’ (text) is added to the mode to force iteration as strings, so compressed and uncompressed files behave consistently.
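The mode handling in the note can be sketched with the standard library alone. open_maybe_compressed is a hypothetical stand-in for smartopen; choosing the opener from the filename extension is an assumption of this sketch, not something the documentation states.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(filename, mode='r'):
    """Open plain, gzip, or bz2 files, defaulting to text mode."""
    # Add 't' unless the caller asked for binary, as the note describes
    if 'b' not in mode and 't' not in mode:
        mode += 't'
    # Dispatching on the extension is an assumption of this sketch
    if filename.endswith('.gz'):
        return gzip.open(filename, mode)
    if filename.endswith('.bz2'):
        return bz2.open(filename, mode)
    return open(filename, mode)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt.gz')
    with open_maybe_compressed(path, 'w') as handle:
        handle.write('a\nb\n')
    with open_maybe_compressed(path) as handle:
        lines = [line.rstrip('\n') for line in handle]
```

Without the added 't', gzip.open and bz2.open would default to binary mode and yield bytes, which is the inconsistency the note is about.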
pydigree.io.vcf module¶
-
class pydigree.io.vcf.VCFRecord(line)¶ Bases: object
A class for parsing lines in VCF files
-
genotypes()¶ Extracts the genotypes from a VCF record
-
getitems(item)¶ Gets the value in each data field for the key specified in the format field.
Parameters: item – key to find Returns: values for each individual Return type: list
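The lookup that getitems describes can be illustrated on a raw VCF data line, where the FORMAT column is the ninth field and the sample columns follow it. get_format_items is a hypothetical helper, not the VCFRecord method, and the sample line is invented.

```python
def get_format_items(line, item):
    """Return the value of one FORMAT key for every sample on the line."""
    fields = line.rstrip('\n').split('\t')
    keys = fields[8].split(':')  # FORMAT column, e.g. 'GT:DP'
    idx = keys.index(item)       # raises ValueError if the key is absent
    # Assumes every sample lists all FORMAT fields; VCF allows trailing
    # fields to be dropped, which this sketch does not handle
    return [sample.split(':')[idx] for sample in fields[9:]]


line = '\t'.join(['1', '100', 'rs1', 'A', 'G', '50', 'PASS',
                  'AF=0.5', 'GT:DP', '0/1:12', '1/1:7'])
depths = get_format_items(line, 'DP')
genotypes = get_format_items(line, 'GT')
```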
-
info¶ A dictionary representing the INFO field in a VCF record
Return type: dict
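For orientation, a VCF INFO column is a semicolon-separated list of key=value entries and bare flags. The sketch below turns such a string into a dictionary; parse_info is a hypothetical helper, and keeping values as strings and storing flags as True are assumptions that may differ from what the info property returns.

```python
def parse_info(info_field):
    """Parse a VCF INFO string into a dictionary."""
    info = {}
    if info_field == '.':  # '.' marks an empty INFO column
        return info
    for entry in info_field.split(';'):
        if '=' in entry:
            key, value = entry.split('=', 1)
            info[key] = value
        else:
            info[entry] = True  # flag entries carry no value
    return info


parsed = parse_info('AF=0.5;DB;DP=14')
```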
-
-
pydigree.io.vcf.read_vcf(filename, require_pass=False, freq_info=None)¶ Reads a VCF file and returns a Population object with the individuals represented in the file
Genotypes generated by this function will be sparse
Parameters: - require_pass (bool) – only allow variants with PASS under FILTER
- freq_info (string) – INFO field to get allele frequency from
Returns: Individuals in the VCF
Return type: