pydigree.io package

Submodules

pydigree.io.base module

class pydigree.io.base.PEDRecord(line, delimiter=' ')

Bases: object

create_individual(population=None)

Creates an Individual object from a Pedigree Record.

The individual will have the id tuple of (fam_id, ind_id)

Parameters:population (Population) – Population for the individual to belong to
Return type:Individual
pydigree.io.base.connect_individuals(pop)

Makes the connections in the genealogy from parents to children and vice versa.

Parameters:pop (IndividualContainer) – Set of individuals to connect
Return type:void
pydigree.io.base.genotypes_from_sequential_alleles(chromosomes, data, missing_code='0')

Takes a series of alleles and turns them into genotypes.

For example, the series ‘1 2 1 2 1 2’ becomes chrom1 = [1, 1, 1] and chrom2 = [2, 2, 2].

These are returned in a list in the form:

[(chroma, chromb), (chroma, chromb)...]
Parameters:
  • chromosomes (list of ChromosomeTemplate) – genotype data
  • data – The alleles to be turned into genotypes
  • missing_code (string) – value representing a missing allele
Returns:

A list of 2-tuples of Alleles objects
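
The de-interleaving step described above can be sketched in plain Python. This is only an illustration: split_sequential_alleles is a hypothetical helper, not the pydigree implementation, and replacing missing alleles with None is an assumption made for the example.

```python
def split_sequential_alleles(alleles, missing_code="0"):
    """Split an interleaved allele series into two per-chromosome lists.

    Hypothetical helper for illustration only (not part of pydigree).
    Missing alleles are mapped to None, which is an assumption.
    """
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of alleles")
    cleaned = [None if a == missing_code else a for a in alleles]
    # Even positions belong to the first chromosome, odd positions to the second.
    return cleaned[0::2], cleaned[1::2]


chrom_a, chrom_b = split_sequential_alleles("1 2 1 2 1 2".split())
print(chrom_a, chrom_b)
```

Slicing with a step of two keeps the marker order intact on each chromosome, which is the property the example series in the description relies on.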

pydigree.io.base.read_ped(filename, population=None, delimiter=None, affected_labels=None, population_handler=None, data_handler=None, connect_inds=True, onlyinds=None)

Reads a plink format pedigree file, i.e.:

familyid indid father mother sex whatever whatever whatever

into a pydigree pedigree object, with optional population to assign to pedigree members. If you don’t provide a population you can’t simulate genotypes!

Parameters:
  • filename (string) – The file to be read
  • population (Population) – The population to assign individuals to
  • delimiter (string) – a string defining the field separator, default: any whitespace
  • affected_labels (dict (str -> value)) – The labels that determine affection status.
  • population_handler – a function to set up the population
  • data_handler (callable) – a function to turn the data into useful individual information
  • connect_inds (bool) – build references between individuals. Requires all individuals be present in the file
  • onlyinds (iterable) – only include data for specified individuals
Returns:

individuals contained in the pedigree file

Return type:

PedigreeCollection
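
As a rough sketch of the line layout described above, the following splits one pedigree line into its leading fields. parse_ped_line and the dictionary keys are hypothetical names chosen for the example; the real parsing is done by PEDRecord and read_ped, whose internals are not shown here.

```python
def parse_ped_line(line, delimiter=None):
    """Parse one pedigree line into a dict (illustrative only)."""
    # delimiter=None splits on any run of whitespace, matching the documented default.
    fields = line.strip().split(delimiter)
    if len(fields) < 5:
        raise ValueError("expected at least 5 fields: famid indid father mother sex")
    fam_id, ind_id, father, mother, sex = fields[:5]
    return {
        "label": (fam_id, ind_id),  # id tuple of (fam_id, ind_id)
        "father": father,
        "mother": mother,
        "sex": sex,
        "data": fields[5:],  # everything after sex is left for a data handler
    }


record = parse_ped_line("fam1 3 1 2 1 2 A A")
print(record["label"], record["data"])
```

Keeping the trailing columns as an uninterpreted list mirrors the role the data_handler parameter appears to play: the caller decides what those columns mean.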

pydigree.io.base.read_phenotypes(pedigrees, csvfile, delimiter=', ', missingcode='X')

Reads a csv with header famid, ind, phen, phen, phen, phen etc etc

Parameters:
  • pedigrees (PedigreeCollection) – data to update
  • csvfile (string) – the filename of the file containing phenotypes
  • delimiter – the field delimiter for the file
  • missingcode (string) – the code for missing values

Return type:void
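
A minimal sketch of reading a phenotype table with the header layout described above. parse_phenotypes is a hypothetical stand-in, not the library function: it returns a plain dictionary instead of updating pedigrees, and mapping missing values to None is an assumption. Plain str.split is used because the default delimiter ', ' is two characters long, which the standard csv module does not accept.

```python
import io


def parse_phenotypes(handle, delimiter=", ", missingcode="X"):
    """Read 'famid, ind, phen, phen, ...' rows into {(famid, ind): {phen: value}}."""
    header = handle.readline().rstrip("\n").split(delimiter)
    phenotype_names = header[2:]
    phenotypes = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        label = (fields[0], fields[1])
        # Assumption: missing values are stored as None.
        values = [None if v == missingcode else v for v in fields[2:]]
        phenotypes[label] = dict(zip(phenotype_names, values))
    return phenotypes


example = io.StringIO("famid, ind, height, weight\nfam1, 1, 170, X\nfam1, 2, X, 60\n")
table = parse_phenotypes(example)
print(table[("fam1", "1")])
```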
pydigree.io.base.sort_pedigrees(inds, population_handler)

Takes a set of individuals and sorts them into pedigrees.

Individuals must have labels that are (pedid, indid) tuples

Parameters:
  • inds (iterable) – Individuals to be sorted
  • population_handler (Callable) – a function to set up the population
Returns:

Collection of pedigrees from the individuals

Return type:

PedigreeCollection
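
The grouping idea can be sketched with a dictionary keyed on the pedigree id. group_by_pedigree is a hypothetical helper that works on bare (pedid, indid) labels; the real function works on Individual objects and builds pedigree objects, which this sketch does not attempt.

```python
from collections import defaultdict


def group_by_pedigree(labels):
    """Group (pedid, indid) labels by pedigree id (illustrative only)."""
    pedigrees = defaultdict(list)
    for ped_id, ind_id in labels:
        pedigrees[ped_id].append(ind_id)
    return dict(pedigrees)


groups = group_by_pedigree([("f1", "1"), ("f2", "1"), ("f1", "2")])
print(groups)
```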

pydigree.io.base.write_pedigree(pedigrees, filename, delim=' ')

Writes pedigree to a LINKAGE formatted pedigree file

Parameters:
  • pedigrees – Data to write
  • filename – filename to write to
  • delim – output field separator
Return type:

void

pydigree.io.base.write_phenotypes(pedigrees, filename, predicate=None, missingcode='X', delim=', ')

Writes phenotypes to a CSV (or other field delimited) file

Parameters:
  • pedigrees – Data to write
  • filename – filename to write to
  • missingcode (string) – code to use for missing values
  • delim (string) – output field separator

pydigree.io.beagle module

Utilities for reading BEAGLE formatted genotype data

class pydigree.io.beagle.BeagleGenotypeRecord(line)

Bases: object

A class corresponding to a record in a BEAGLE genotype file

data
identifier
is_phenotype_record()

Returns true if record corresponds to a phenotype field

label
class pydigree.io.beagle.BeagleMarkerRecord(line)

Bases: object

A class corresponding to a marker in a BEAGLE marker file

alleles
alternates

The alternate alleles for the marker

Returns:Each alternate allele
Return type:iterable
label
pos
reference

The reference allele for the marker

Returns:reference allele
Return type:string
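
The attributes above can be illustrated with a small parser for one marker line. This sketch assumes a whitespace-delimited line of the form "identifier position allele allele ..." and assumes the first listed allele is the reference. parse_marker_line is a hypothetical name and is not how BeagleMarkerRecord is necessarily implemented.

```python
def parse_marker_line(line):
    """Split a marker line into label, position and alleles (illustrative only)."""
    fields = line.split()
    label, pos, alleles = fields[0], int(fields[1]), fields[2:]
    return {
        "label": label,
        "pos": pos,
        "alleles": alleles,
        "reference": alleles[0],    # assumption: first allele is the reference
        "alternates": alleles[1:],  # every remaining allele
    }


marker = parse_marker_line("rs1 1000 A G T")
print(marker["reference"], marker["alternates"])
```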
pydigree.io.beagle.read_beagle(genofile, markerfile)

Reads BEAGLE formatted genotype data

Parameters:
  • genofile (string) – Filename containing genotype information for individuals
  • markerfile (string) – Filename containing marker location and allele information corresponding to genofile
Return type:

Population

pydigree.io.beagle.read_beagle_genotypefile(filename, pop, missingcode='0')

Reads BEAGLE formatted genotype files

Parameters:
  • filename – Filename of BEAGLE genotype file
  • pop – the population to add these individuals to
  • missingcode (string) – The value that indicates a missing genotype
Return type:

void

pydigree.io.beagle.read_beagle_markerfile(filename, label=None)

Reads marker locations from a BEAGLE formatted file

Parameters:
  • filename (string) – The file to be read
  • label – An optional label to give the chromosome, since the BEAGLE format does not require it
Return type:

ChromosomeTemplate

pydigree.io.genomesimla module

Read GenomeSIMLA formatted chromosome templates

pydigree.io.genomesimla.read_gs_chromosome_template(templatef)

Reads a genomeSIMLA format chromosome template file

Parameters:templatef (string) – The filename of the template file
Return type:A ChromosomeTemplate object corresponding to the file

pydigree.io.kinship module

Reads kinships from KinInbCoef

pydigree.io.kinship.read_kinship(filename)

Reads a KinInbCoef formatted file of kinship and inbreeding coefficients

Parameters:filename (string) – the filename to be read

Returns: a dictionary in the format {frozenset({(fam, ind_a), (fam, ind_b)}): kinship/inbreeding}
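
Because the keys are frozensets, a lookup does not depend on the order of the two individuals. The sketch below builds such a dictionary by hand; the values are made up for illustration and pair_coefficient is a hypothetical helper.

```python
# Hand-built dictionary in the documented shape; the coefficients are invented.
kinship = {
    frozenset({("fam1", "1"), ("fam1", "2")}): 0.25,
    frozenset({("fam1", "1"), ("fam1", "3")}): 0.125,
}


def pair_coefficient(kinship, ind_a, ind_b):
    """Look up the coefficient for a pair of (fam, ind) labels, in either order."""
    return kinship[frozenset({ind_a, ind_b})]


print(pair_coefficient(kinship, ("fam1", "2"), ("fam1", "1")))
```

Note that a frozenset built from two identical labels collapses to a single element, so an entry for an individual paired with itself has a one-element key.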

pydigree.io.sgs module

Utilities for file I/O with SGS data

class pydigree.io.sgs.GermlineRecord(line)

Bases: object

A class for working with records in GERMLINE formatted files

bp_locations

Are the segments provided in physical or genetic positions?

Returns:True if physical, False if centimorgans
Return type:bool
location

Returns the location of the segment

pair

Returns the individuals for the segment

pydigree.io.sgs.read_germline(filename)

Reads a GERMLINE formatted SGS file into an SGSAnalysis object

GERMLINE files are text files with the format:

  1. Family ID 1
  2. Individual ID 1
  3. Family ID 2
  4. Individual ID 2
  5. Chromosome
  6. Segment start (bp/cM)
  7. Segment end (bp/cM)
  8. Segment start (SNP)
  9. Segment end (SNP)
  10. Total SNPs in segment
  11. Length of segment
  12. Units for genetic length (cM or MB)
  13. Mismatching SNPs in segment
  14. 1 if Individual 1 is homozygous in match; 0 otherwise
  15. 1 if Individual 2 is homozygous in match; 0 otherwise

This function only uses the first seven fields (zero-based indices 0-6).
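
A sketch of pulling the first seven fields out of one GERMLINE line, following the column order listed above. parse_germline_line and its return shape are hypothetical; GermlineRecord may represent these values differently.

```python
def parse_germline_line(line):
    """Extract the pair, chromosome and segment location from one line."""
    fields = line.split()  # splits on tabs or spaces
    fam1, ind1, fam2, ind2, chrom, start, stop = fields[:7]
    return {
        "pair": ((fam1, ind1), (fam2, ind2)),
        "chromosome": chrom,
        "location": (float(start), float(stop)),  # bp or cM, depending on the file
    }


line = "fam1\t1\tfam1\t2\t1\t1000\t5000\trs1\trs9\t120\t4.0\tMB\t0\t0\t0"
segment = parse_germline_line(line)
print(segment["pair"], segment["location"])
```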

pydigree.io.sgs.write_sgs(data, filename)

GERMLINE files are text files with the format:

  1. Family ID 1
  2. Individual ID 1
  3. Family ID 2
  4. Individual ID 2
  5. Chromosome
  6. Segment start (bp/cM)
  7. Segment end (bp/cM)
  8. Segment start (SNP)
  9. Segment end (SNP)
  10. Total SNPs in segment
  11. Genetic length of segment
  12. Units for genetic length (cM or MB)
  13. Mismatching SNPs in segment
  14. 1 if Individual 1 is homozygous in match; 0 otherwise
  15. 1 if Individual 2 is homozygous in match; 0 otherwise

pydigree.io.smartopen module

pydigree.io.smartopen.smartopen(filename, mode='r')

Seamlessly open compressed files. Use in place of regular open.

Note

Python’s compression modules iterate over compressed files as bytes, not strings. Unless ‘b’ (binary) is specified in the mode, ‘t’ (text) is added to the mode to force iteration as strings, so that compressed and uncompressed files behave consistently.
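
The behaviour described in the note can be sketched with the standard library. open_maybe_compressed is a hypothetical stand-in for smartopen; choosing the opener from the file extension is an assumption, since the source does not say how the compression type is detected.

```python
import bz2
import gzip


def open_maybe_compressed(filename, mode="r"):
    """Open plain, gzip or bzip2 files, defaulting to text mode."""
    # Add 't' unless the caller explicitly asked for binary or text.
    if "b" not in mode and "t" not in mode:
        mode += "t"
    # Assumption: the compression type is inferred from the extension.
    if filename.endswith(".gz"):
        return gzip.open(filename, mode)
    if filename.endswith(".bz2"):
        return bz2.open(filename, mode)
    return open(filename, mode)
```

With the 't' flag in place, iterating over a .gz file yields str lines, the same as iterating over an uncompressed file.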

pydigree.io.vcf module

class pydigree.io.vcf.VCFRecord(line)

Bases: object

A class for parsing lines in VCF files

genotypes()

Extracts the genotypes from a VCF record

getitems(item)

Gets the value in each data field for the key specified in the format field.

Parameters:item – key to find
Returns:values for each individual
Return type:list
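
To illustrate what looking up a FORMAT key involves, the sketch below works on a raw tab-separated VCF data line, where the ninth column holds the FORMAT keys and later columns hold per-sample values. format_values is a hypothetical helper, not the getitems method itself, and returning None for a sample with truncated fields is an assumption.

```python
def format_values(line, item):
    """Return the value of one FORMAT key for every sample on a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP
    idx = keys.index(item)       # raises ValueError if the key is absent
    values = []
    for sample in fields[9:]:
        parts = sample.split(":")
        # Assumption: a sample with dropped trailing fields yields None.
        values.append(parts[idx] if idx < len(parts) else None)
    return values


vcf_line = "1\t100\trs1\tA\tG\t50\tPASS\tAF=0.5;DB\tGT:DP\t0/1:12\t1/1:7"
print(format_values(vcf_line, "GT"))
```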
info

A dictionary representing the INFO field in a VCF record

Return type:dict
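
A VCF INFO field is a semicolon-separated list of KEY=VALUE entries and bare flags. The sketch below turns such a string into a dictionary. parse_info is a hypothetical helper; storing flags as True and leaving values as strings are assumptions, and the info attribute may do either differently.

```python
def parse_info(info_field):
    """Turn an INFO string such as 'AF=0.5;DB' into a dictionary."""
    info = {}
    if info_field == ".":  # '.' marks an empty INFO field
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        # Assumption: flags without a value are stored as True.
        info[key] = value if sep else True
    return info


print(parse_info("AF=0.5;DB"))
```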
pydigree.io.vcf.read_vcf(filename, require_pass=False, freq_info=None)

Reads a VCF file and returns a Population object with the individuals represented in the file

Genotypes generated by this function will be sparse

Parameters:
  • require_pass (bool) – only allow variants with PASS under FILTER
  • freq_info (string) – INFO field to get allele frequency from
Returns:

Individuals in the VCF

Return type:

Population

Module contents