pydigree.io package

Submodules

pydigree.io.base module

class pydigree.io.base.PEDRecord(line, delimiter=' ')

Bases: object

create_individual(population=None)

Creates an Individual object from a pedigree record.

The individual will have the ID tuple (fam_id, ind_id).

Parameters:	population (Population) – Population for the individual to belong to
Return type:	Individual
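Turning a pedigree record into an individual starts by splitting the line into its leading fields. The sketch below assumes the PLINK/LINKAGE column order (family ID, individual ID, father, mother, sex), '0' for an unknown parent, and parents in the same family as the child. `parse_ped_line` and `PedFields` are hypothetical names, not part of pydigree.

```python
from typing import List, NamedTuple, Optional, Tuple

Label = Tuple[str, str]

class PedFields(NamedTuple):
    # Hypothetical container for the leading PED columns (not pydigree's PEDRecord)
    label: Label               # (fam_id, ind_id)
    father: Optional[Label]    # None when the parent is unknown ('0')
    mother: Optional[Label]
    sex: str
    data: List[str]            # every field after the sex column

def parse_ped_line(line: str, delimiter: Optional[str] = None) -> PedFields:
    # Hypothetical helper; delimiter=None splits on any run of whitespace
    fam, ind, fa, mo, sex, *rest = line.strip().split(delimiter)

    def parent(pid: str) -> Optional[Label]:
        # Assumption: parents are labeled within the child's family
        return None if pid == '0' else (fam, pid)

    return PedFields((fam, ind), parent(fa), parent(mo), sex, rest)
```

For example, `parse_ped_line("F1 3 1 2 1 2 A G")` yields the label ('F1', '3') with parents ('F1', '1') and ('F1', '2').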

pydigree.io.base.connect_individuals(pop)

Makes the connections in the genealogy from parents to children and vice versa.

Parameters:	pop (IndividualContainer) – Set of individuals to connect
Return type:	void
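The connection step can be pictured as follows: once every individual is loaded, parent labels are resolved to objects and each parent gains the child in its list of children. This is a minimal sketch; the `Person` class and `connect` function are hypothetical stand-ins, not pydigree's Individual, and every referenced parent is assumed to be present (as read_ped's connect_inds option also requires).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, str]

@dataclass(eq=False)  # identity comparison avoids recursing through relatives
class Person:
    # Hypothetical stand-in for an Individual
    label: Label
    father_label: Optional[Label] = None
    mother_label: Optional[Label] = None
    # repr=False keeps repr() from looping parent -> child -> parent
    father: Optional["Person"] = field(default=None, repr=False)
    mother: Optional["Person"] = field(default=None, repr=False)
    children: List["Person"] = field(default_factory=list, repr=False)

def connect(people: Dict[Label, Person]) -> None:
    # Hypothetical helper: resolve parent labels, then link parent -> child
    for person in people.values():
        if person.father_label is not None:
            person.father = people[person.father_label]
            person.father.children.append(person)
        if person.mother_label is not None:
            person.mother = people[person.mother_label]
            person.mother.children.append(person)
```

A missing parent raises KeyError here, which mirrors the requirement that all individuals be present in the file.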

pydigree.io.base.genotypes_from_sequential_alleles(chromosomes, data, missing_code='0')

Takes a series of alleles and turns them into genotypes.

For example, the series ‘1 2 1 2 1 2’ becomes chrom1 = [1, 1, 1] and chrom2 = [2, 2, 2].

These are returned as a list in the form:

[(chroma, chromb), (chroma, chromb), ...]

Parameters:

chromosomes (list of ChromosomeTemplate) – templates describing the genotype data
data – The alleles to be turned into genotypes
missing_code (string) – value representing a missing allele

Returns:	A list of 2-tuples of Alleles objects
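The interleaving described above can be sketched in plain Python: alleles alternate between the two chromosomes of a pair, and the flat series is cut into one pair per chromosome according to each template's marker count. `split_sequential_alleles` is hypothetical; it takes plain marker counts instead of ChromosomeTemplate objects and uses None for missing alleles instead of pydigree's Alleles objects.

```python
from typing import List, Optional, Sequence, Tuple

Chrom = List[Optional[str]]

def split_sequential_alleles(
    alleles: Sequence[str],
    markers_per_chrom: Sequence[int],
    missing_code: str = '0',
) -> List[Tuple[Chrom, Chrom]]:
    # Hypothetical helper: each marker contributes two consecutive alleles
    if len(alleles) != 2 * sum(markers_per_chrom):
        raise ValueError("allele count does not match the templates")
    # Replace the missing-allele code with None
    values = [None if a == missing_code else a for a in alleles]
    result = []
    start = 0
    for n in markers_per_chrom:
        block = values[start:start + 2 * n]
        # Even positions go to the first chromosome, odd to the second
        result.append((block[0::2], block[1::2]))
        start += 2 * n
    return result
```

With the series from the example, `split_sequential_alleles('1 2 1 2 1 2'.split(), [3])` gives one pair, (['1', '1', '1'], ['2', '2', '2']).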

pydigree.io.base.read_ped(filename, population=None, delimiter=None, affected_labels=None, population_handler=None, data_handler=None, connect_inds=True, onlyinds=None)

Reads a PLINK-format pedigree file, i.e.:

familyid indid father mother sex (remaining fields...)

into a pydigree pedigree object, optionally assigning pedigree members to a population. If you don’t provide a population, you can’t simulate genotypes.

Parameters:

filename (string) – The file to be read
population (Population) – The population to assign individuals to
delimiter (string) – A string defining the field separator (default: any whitespace)
affected_labels (dict (str -> value)) – The labels that determine affection status
population_handler – A function to set up the population
data_handler (callable) – A function to turn the data into useful individual information
connect_inds (bool) – Build references between individuals. Requires all individuals to be present in the file
onlyinds (iterable) – Only include data for the specified individuals

Returns:	Individuals contained in the pedigree file
Return type:	PedigreeCollection

pydigree.io.base.read_phenotypes(pedigrees, csvfile, delimiter=', ', missingcode='X')

Reads a CSV file whose header is famid, ind, followed by one column per phenotype.

Parameters:

pedigrees (PedigreeCollection) – Data to update
csvfile (string) – The filename of the file containing phenotypes
delimiter – The field delimiter for the file
missingcode (string) – The code for missing values

Return type:	void
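A sketch of the parsing step, assuming the first two columns hold the family and individual IDs and every remaining column is a phenotype named in the header. Because the default delimiter ', ' is two characters long, the sketch splits lines with str.split rather than the csv module, which only accepts a one-character delimiter. `parse_phenotype_lines` is hypothetical and returns a plain dictionary instead of updating a PedigreeCollection.

```python
from typing import Dict, Iterable, Optional, Tuple

def parse_phenotype_lines(
    lines: Iterable[str], delimiter: str = ', ', missingcode: str = 'X'
) -> Dict[Tuple[str, str], Dict[str, Optional[str]]]:
    # Hypothetical helper, not part of pydigree
    rows = iter(lines)
    # Header: famid, ind, then one name per phenotype column
    header = next(rows).rstrip('\n').split(delimiter)
    phen_names = header[2:]
    phenotypes = {}
    for line in rows:
        if not line.strip():
            continue  # skip blank lines
        fam, ind, *values = line.rstrip('\n').split(delimiter)
        # Missing values become None; everything else stays a string
        phenotypes[(fam, ind)] = {
            name: (None if v == missingcode else v)
            for name, v in zip(phen_names, values)
        }
    return phenotypes
```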

pydigree.io.base.sort_pedigrees(inds, population_handler)

Takes a set of individuals and sorts them into pedigrees.

Individuals must have labels that are (pedid, indid) tuples.

Parameters:

inds (iterable) – Individuals to be sorted
population_handler (Callable) – A function to set up the population

Returns:	Collection of pedigrees from the individuals
Return type:	PedigreeCollection
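Because every label is a (pedid, indid) tuple, sorting amounts to grouping on the first element. A minimal sketch with bare labels, using a dictionary of lists in place of a PedigreeCollection and leaving out population_handler; `group_by_pedigree` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Label = Tuple[Hashable, Hashable]

def group_by_pedigree(labels: Iterable[Label]) -> Dict[Hashable, List[Label]]:
    # Hypothetical helper: bucket labels by their pedigree ID
    pedigrees = defaultdict(list)
    for label in labels:
        pedid, _indid = label
        pedigrees[pedid].append(label)  # input order is preserved per pedigree
    return dict(pedigrees)
```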

pydigree.io.base.write_pedigree(pedigrees, filename, delim=' ')

Writes pedigrees to a LINKAGE-formatted pedigree file.

Parameters:

pedigrees – Data to write
filename – Filename to write to
delim – Output field separator

Return type:	void

pydigree.io.base.write_phenotypes(pedigrees, filename, predicate=None, missingcode='X', delim=', ')

Writes phenotypes to a CSV (or other field-delimited) file.

Parameters:

pedigrees – Data to write
filename – Filename to write to
missingcode (string) – Code to use for missing values
delim (string) – Output field separator

pydigree.io.beagle module

Utilities for reading BEAGLE-formatted genotype data.

class pydigree.io.beagle.BeagleGenotypeRecord(line)

Bases: object

A class corresponding to a record in a BEAGLE genotype file.

data

identifier

is_phenotype_record()

Returns True if the record corresponds to a phenotype field.

label

class pydigree.io.beagle.BeagleMarkerRecord(line)

Bases: object

A class corresponding to a marker in a BEAGLE marker file.

alleles

alternates

The alternate alleles for the marker

Returns:	Each alternate allele
Return type:	iterable

label

pos

reference

The reference allele for the marker

Returns:	reference allele
Return type:	string
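A BEAGLE marker file lists one marker per line: an identifier, a base-pair position, and the marker's alleles. A minimal parsing sketch, assuming whitespace-separated fields and that the first listed allele is what `reference` returns (an assumption, not confirmed by this page); `MarkerLine` and `parse_marker_line` are hypothetical.

```python
from typing import List, NamedTuple

class MarkerLine(NamedTuple):
    # Hypothetical stand-in for BeagleMarkerRecord
    label: str
    pos: int
    alleles: List[str]

    @property
    def reference(self) -> str:
        # Assumption: the first listed allele is the reference
        return self.alleles[0]

    @property
    def alternates(self) -> List[str]:
        # Every allele after the assumed reference
        return self.alleles[1:]

def parse_marker_line(line: str) -> MarkerLine:
    # Hypothetical helper: identifier, position, then one or more alleles
    label, pos, *alleles = line.split()
    return MarkerLine(label, int(pos), alleles)
```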

pydigree.io.beagle.read_beagle(genofile, markerfile)

Reads BEAGLE-formatted genotype data.

Parameters:

genofile (string) – Filename containing genotype information for individuals
markerfile (string) – Filename containing marker location and allele information corresponding to genofile

Return type:	Population

pydigree.io.beagle.read_beagle_genotypefile(filename, pop, missingcode='0')

Reads BEAGLE-formatted genotype files.

Parameters:

filename – Filename of the BEAGLE genotype file
pop – The population to add these individuals to
missingcode (string) – The value that indicates a missing genotype

Return type:	void

pydigree.io.beagle.read_beagle_markerfile(filename, label=None)

Reads marker locations from a BEAGLE-formatted file.

Parameters:

filename (string) – The file to be read
label – An optional label to give the chromosome, since the BEAGLE format does not require one

Return type:	ChromosomeTemplate

pydigree.io.genomesimla module

Reads GenomeSIMLA-formatted chromosome templates.

pydigree.io.genomesimla.read_gs_chromosome_template(templatef)

Reads a GenomeSIMLA-format chromosome template file.

Parameters:	templatef (string) – The filename of the template file
Returns:	A ChromosomeTemplate object corresponding to the file
Return type:	ChromosomeTemplate

pydigree.io.kinship module

Reads kinship and inbreeding coefficients from KinInbCoef output.

pydigree.io.kinship.read_kinship(filename)

Reads a KinInbCoef-formatted file of kinship and inbreeding coefficients.

Parameters:	filename (string) – The filename to be read

Returns: A dictionary in the format {frozenset({(fam, ind_a), (fam, ind_b)}): kinship/inbreeding coefficient}
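Keying on a frozenset makes lookups independent of the order in which the two individuals are given. A small illustration with a made-up value; `kinship_key` is a hypothetical helper, and the layout of the KinInbCoef file itself is not shown here.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Label = Tuple[Hashable, Hashable]

def kinship_key(fam: Hashable, ind_a: Hashable, ind_b: Hashable) -> FrozenSet[Label]:
    # Hypothetical helper: unordered pair of (family, individual) labels
    return frozenset({(fam, ind_a), (fam, ind_b)})

# Illustrative value only, not read from a real file
kinships: Dict[FrozenSet[Label], float] = {
    kinship_key('F1', '1', '3'): 0.25,
}
```

Looking up `kinships[kinship_key('F1', '3', '1')]` finds the same entry, because both argument orders build an equal frozenset.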

pydigree.io.plink module

Functions for reading and writing PLINK-formatted genotype files.

pydigree.io.plink.create_pop_handler_func(mapfile)

Creates a closure to provide as the population handler for pydigree.io.base.read_ped.

Parameters:	mapfile (string) – Filename of PLINK .map file
Return type:	callable

pydigree.io.plink.plink_data_handler(ind, data)

A function to handle the data payload from a PLINK line.

Parameters:

ind (Individual) – Individual for the record
data (string) – The data for the record

Return type:	void

pydigree.io.plink.read_map(mapfile)

Reads a PLINK MAP file into a list of ChromosomeTemplate objects.

Parameters:	mapfile (string) – Path of the file to be read
Return type:	list of ChromosomeTemplate
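A PLINK MAP file has four whitespace-separated columns: chromosome, variant identifier, genetic position (morgans or centimorgans), and base-pair position. The sketch below groups markers by chromosome in file order. It returns plain tuples instead of ChromosomeTemplate objects, and `group_map_lines` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Marker = Tuple[str, float, int]  # (variant id, genetic position, bp position)

def group_map_lines(lines: Iterable[str]) -> Dict[str, List[Marker]]:
    # Hypothetical helper, not part of pydigree
    chromosomes: Dict[str, List[Marker]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp, genetic, bp = fields[:4]
        chromosomes.setdefault(chrom, []).append((snp, float(genetic), int(bp)))
    return chromosomes  # dicts keep chromosomes in first-seen order
```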

pydigree.io.plink.read_plink(pedfile=None, mapfile=None, prefix=None, **kwargs)

Reads a PLINK dataset, either by specifying pedfile and mapfile directly or by giving a common prefix. Additional keyword arguments are passed to pydigree.io.base.read_ped.

Parameters:

pedfile – A PLINK PED file to be read
mapfile – A PLINK MAP file to be read
prefix – Sets mapfile to ‘prefix.map’ and pedfile to ‘prefix.ped’
kwargs – Additional arguments passed to read_ped

Returns: A PedigreeCollection object
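The prefix rule amounts to a simple filename resolution, so read_plink(prefix='data/study') reads data/study.ped and data/study.map. `plink_paths` is a hypothetical helper mirroring that description; letting the prefix win when explicit filenames are also given is an assumption.

```python
from typing import Optional, Tuple

def plink_paths(
    pedfile: Optional[str] = None,
    mapfile: Optional[str] = None,
    prefix: Optional[str] = None,
) -> Tuple[str, str]:
    # Hypothetical helper, not part of pydigree
    # Assumption: a prefix overrides any explicitly given filenames
    if prefix is not None:
        return prefix + '.ped', prefix + '.map'
    if pedfile is None or mapfile is None:
        raise ValueError("give both pedfile and mapfile, or a prefix")
    return pedfile, mapfile
```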

pydigree.io.plink.write_map(pedigrees, mapfile, output_chromosomes=None)

Writes the genotype location data to a PLINK MAP file.

Parameters:

pedigrees – The population containing the data to be written
mapfile – The name of the file to write to
output_chromosomes – Which chromosomes to write

Returns: Nothing

pydigree.io.plink.write_ped(pedigrees, pedfile, delim=' ', predicate=None, output_chromosomes=None)

Writes data to a PLINK-format PED file.

Parameters:

pedigrees – An object of class PedigreeCollection containing what you want to output
pedfile – A string giving the name of the file to write to
delim – Field separator
predicate – Which individuals to include in the output file. If not specified, all are output. If the string is ‘affected’, only affected individuals are output. If the string is ‘phenotyped’, all individuals with phenotype information are output. Any other value of predicate must be a function that takes an individual and returns True/False for whether the individual should be output.

Returns: Nothing
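The predicate rules can be expressed as a small resolver that turns the string shortcuts into callables. This is a sketch under assumptions: the `affected` attribute and `phenotypes` dictionary on each individual are hypothetical stand-ins for however pydigree records affection status and phenotype data, and `resolve_predicate` is not a pydigree function.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional, Union

Predicate = Callable[[Any], bool]

def resolve_predicate(predicate: Union[None, str, Predicate]) -> Predicate:
    # Hypothetical helper, not part of pydigree
    if predicate is None:
        return lambda ind: True  # no predicate: output everyone
    if predicate == 'affected':
        # Hypothetical attribute: True when the individual is affected
        return lambda ind: bool(getattr(ind, 'affected', False))
    if predicate == 'phenotyped':
        # Hypothetical attribute: dict of phenotype name -> value
        return lambda ind: bool(getattr(ind, 'phenotypes', {}))
    if callable(predicate):
        return predicate
    raise ValueError(f"unrecognized predicate: {predicate!r}")

# Example individuals with the hypothetical attributes
a = SimpleNamespace(affected=True, phenotypes={'bmi': 22.5})
b = SimpleNamespace(affected=False, phenotypes={})
```

Resolving once up front keeps the per-individual output loop free of string comparisons.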

pydigree.io.plink.write_plink(pedigrees, filename_prefix, predicate=None, mapfile=False, compression=None, output_chromosomes=None)

Writes individual genotypes to a file in PLINK PED format. Optionally also writes the genotype locations to a MAP file.

Parameters:

pedigrees (IndividualContainer) – The pedigrees to be written
filename_prefix (string) – The output PED file (‘.ped’ will be appended)
predicate (callable) – A callable that evaluates True when the individual should be written to the file
mapfile (bool) – True if the PLINK MAP file should be written
compression ('gzip' or 'bz2') – Compression to apply, either 'gzip' or 'bz2'; otherwise the data are written uncompressed

Return type:	void

pydigree.io.sgs module

Utilities for file I/O with SGS data.

class pydigree.io.sgs.GermlineRecord(line)

Bases: object

A class for working with records in GERMLINE-formatted files.

bp_locations

Are the segments provided in physical or genetic positions?

Returns:	True if physical, False if centimorgans
Return type:	bool

location

Returns the location of the segment.

pair

Returns the individuals for the segment.

pydigree.io.sgs.read_germline(filename)

Reads a GERMLINE-formatted SGS file into an SGSAnalysis object.

GERMLINE files are text files with the following columns (numbered from 0):

0. Family ID 1
1. Individual ID 1
2. Family ID 2
3. Individual ID 2
4. Chromosome
5. Segment start (bp/cM)
6. Segment end (bp/cM)
7. Segment start (SNP)
8. Segment end (SNP)
9. Total SNPs in segment
10. Length of segment
11. Units for genetic length (cM or MB)
12. Mismatching SNPs in segment
13. 1 if Individual 1 is homozygous in match; 0 otherwise
14. 1 if Individual 2 is homozygous in match; 0 otherwise

This function only uses columns 0 to 6.
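Extracting columns 0 to 6 from one GERMLINE line can be sketched as follows. The sketch assumes whitespace-separated fields and parses both positions as floats, since they may be in bp or cM. `parse_germline_fields` and its returned dictionary are hypothetical, not the GermlineRecord API.

```python
from typing import Dict

def parse_germline_fields(line: str) -> Dict[str, object]:
    # Hypothetical helper, not part of pydigree
    f = line.split()
    # Columns 5 and 6 may be bp or cM, so parse them as floats
    start, end = float(f[5]), float(f[6])
    return {
        'pair': ((f[0], f[1]), (f[2], f[3])),  # (family, individual) for each member
        'chromosome': f[4],
        'location': (start, end),
    }
```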

pydigree.io.sgs.write_sgs(data, filename)

Writes SGS data to a GERMLINE-formatted file. GERMLINE files are text files with the following columns (numbered from 0):

0. Family ID 1
1. Individual ID 1
2. Family ID 2
3. Individual ID 2
4. Chromosome
5. Segment start (bp/cM)
6. Segment end (bp/cM)
7. Segment start (SNP)
8. Segment end (SNP)
9. Total SNPs in segment
10. Genetic length of segment
11. Units for genetic length (cM or MB)
12. Mismatching SNPs in segment
13. 1 if Individual 1 is homozygous in match; 0 otherwise
14. 1 if Individual 2 is homozygous in match; 0 otherwise

pydigree.io.smartopen module

pydigree.io.smartopen.smartopen(filename, mode='r')

Seamlessly opens compressed files. Use in place of the built-in open().

Note

Python’s compression modules iterate over compressed files as bytes, not strings. Unless ‘b’ (binary) is specified in the mode, ‘t’ (text) is added to the mode to force iteration as strings, so compressed and uncompressed files behave consistently.
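The idea can be sketched with the standard library alone. This sketch assumes compression is detected from the file extension ('.gz' or '.bz2'); `open_maybe_compressed` is hypothetical, and pydigree's own detection logic may differ.

```python
import bz2
import gzip

def open_maybe_compressed(filename: str, mode: str = 'r'):
    # Hypothetical helper, not pydigree's smartopen
    # gzip.open and bz2.open default to binary; add 't' so lines are str
    if 'b' not in mode and 't' not in mode:
        mode += 't'
    if filename.endswith('.gz'):
        return gzip.open(filename, mode)
    if filename.endswith('.bz2'):
        return bz2.open(filename, mode)
    # The built-in open() accepts 't' too and is text mode by default
    return open(filename, mode)
```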

pydigree.io.vcf module

class pydigree.io.vcf.VCFRecord(line)

Bases: object

A class for parsing lines in VCF files.

genotypes()

Extracts the genotypes from a VCF record.

getitems(item)

Gets, from each sample's data field, the value for the key specified in the FORMAT field.

Parameters:	item – Key to find
Returns:	Values for each individual
Return type:	list

info

A dictionary representing the INFO field in a VCF record.

Return type:	dict
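How getitems and info relate to a raw VCF data line can be sketched as follows: the ninth column (FORMAT) names the colon-separated keys in each sample field, and the eighth column (INFO) is a semicolon-separated list of KEY=VALUE entries or bare flags. `format_values` and `parse_info` are hypothetical helpers, and storing flags as True is an assumption about the shape of pydigree's info dictionary.

```python
from typing import Dict, List, Optional, Union

def format_values(line: str, key: str) -> List[Optional[str]]:
    # Hypothetical helper, not part of pydigree
    fields = line.rstrip('\n').split('\t')
    keys = fields[8].split(':')          # FORMAT column, e.g. "GT:DP"
    idx = keys.index(key)                # ValueError if the key is absent
    values = []
    for sample in fields[9:]:
        parts = sample.split(':')
        # VCF lets samples drop trailing FORMAT fields, so guard the index
        values.append(parts[idx] if idx < len(parts) else None)
    return values

def parse_info(line: str) -> Dict[str, Union[str, bool]]:
    # Hypothetical helper, not part of pydigree
    info = line.rstrip('\n').split('\t')[7]  # INFO column
    result: Dict[str, Union[str, bool]] = {}
    if info == '.':
        return result                    # '.' means no INFO entries
    for entry in info.split(';'):
        key, sep, value = entry.partition('=')
        result[key] = value if sep else True  # bare keys are flags
    return result
```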

pydigree.io.vcf.read_vcf(filename, require_pass=False, freq_info=None)

Reads a VCF file and returns a Population object with the individuals represented in the file.

Genotypes generated by this function will be sparse.

Parameters:

require_pass (bool) – Only allow variants with PASS in the FILTER column
freq_info (string) – INFO field to get allele frequencies from

Returns:	Individuals in the VCF
Return type:	Population
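The require_pass and freq_info options can be pictured as a per-line filter. A sketch, assuming freq_info names an INFO key such as AF whose value is a single number (multi-allelic sites would carry a comma-separated list, which this sketch does not handle); `pass_and_frequency` is hypothetical.

```python
from typing import Optional, Tuple

def pass_and_frequency(
    line: str, require_pass: bool = False, freq_info: Optional[str] = None
) -> Tuple[bool, Optional[float]]:
    # Hypothetical helper, not part of pydigree
    fields = line.rstrip('\n').split('\t')
    # FILTER is the seventh column
    keep = (not require_pass) or fields[6] == 'PASS'
    freq = None
    if freq_info is not None:
        # INFO is the eighth column: KEY=VALUE entries separated by ';'
        for entry in fields[7].split(';'):
            key, _, value = entry.partition('=')
            if key == freq_info and value:
                freq = float(value)
                break
    return keep, freq
```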
