pydigree package

Subpackages

Submodules

pydigree.common module

pydigree.common.count(val, iterable)

Counts how many times a value (val) occurs in an iterable, excluding None

Parameters:
  • val – The value to be counted
  • iterable (iterable) – values to be counted over
Returns:

the count of values

Return type:

int

pydigree.common.cumsum(iter)

Cumulative sum:

>>> pydigree.cumsum([0,1,2,3,4])
[0, 1, 3, 6, 10]

Parameters:iter – the iterable to be cumsum’ed
Returns:cumulative sums
Return type:integer
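
A running total like the one in the doctest can be sketched with the standard library. This is only an illustration of the behaviour, not pydigree's implementation; the name cumsum_sketch is hypothetical.

```python
from itertools import accumulate


def cumsum_sketch(values):
    """Return the running totals of an iterable as a list."""
    # accumulate() yields partial sums lazily; list() materialises them.
    return list(accumulate(values))


print(cumsum_sketch([0, 1, 2, 3, 4]))  # [0, 1, 3, 6, 10]
```
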

pydigree.common.flatten(x)

Recursively flattens lists.

pydigree.common.grouper(iterable, n, fillvalue=None)

Collect data into fixed-length chunks or blocks

Parameters:
  • iterable – values to evaluate
  • n – size of the groups
  • fillvalue – Value to pad the last group with if len(iterable) % n != 0

Return type:a generator
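
The chunking described here matches the well-known "grouper" recipe from the itertools documentation. The sketch below assumes that behaviour; grouper_sketch is a hypothetical name, and it returns a lazy iterator rather than a true generator.

```python
from itertools import zip_longest


def grouper_sketch(iterable, n, fillvalue=None):
    """Yield tuples of length n, padding the last one with fillvalue."""
    # n references to the SAME iterator: each output tuple consumes n items.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


print(list(grouper_sketch("ABCDEFG", 3, fillvalue="x")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```
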
pydigree.common.invert_dict(d)

Makes the keys the values and the values the keys

Warning

No guarantee of dictionary structure if mappings are not unique

Parameters:d – dictionary to be inverted
Return type:dict
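
The warning about non-unique mappings is easiest to see with a small example. The following is a sketch under the assumption that inversion is a plain key/value swap; invert_dict_sketch is a hypothetical stand-in, not the library function.

```python
def invert_dict_sketch(d):
    """Swap keys and values of a dictionary."""
    return {value: key for key, value in d.items()}


print(invert_dict_sketch({"a": 1, "b": 2}))  # {1: 'a', 2: 'b'}

# Two keys share the value 1, so only one of them survives the inversion.
collapsed = invert_dict_sketch({"a": 1, "b": 1})
print(len(collapsed))  # 1
```
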
pydigree.common.log_base_change(value, old, new)

Changes the base of the logarithm used on value from old to new

Parameters:
  • value – the value to be converted
  • old – old base (numeric)
  • new – new base (numeric)

Returns:log of value in new base
Return type:float
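
A change of base follows from the identity log_new(x) = log_old(x) * ln(old) / ln(new). The sketch below assumes that value is already a logarithm taken in base old; log_base_change_sketch is a hypothetical name.

```python
import math


def log_base_change_sketch(value, old, new):
    """Convert a logarithm from base `old` to base `new`."""
    return value * math.log(old) / math.log(new)


log10_of_1000 = math.log10(1000)  # 3.0 in base 10
as_log2 = log_base_change_sketch(log10_of_1000, 10, 2)
print(math.isclose(as_log2, math.log2(1000)))  # True
```
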
pydigree.common.merge_dicts(*args)

Merges two dictionaries into one bigger one

pydigree.common.mode(seq)

Returns the most common value in a sequence

Parameters:seq – sequence of values to evaluate
pydigree.common.product(iter)

Reduces an iterable by multiplication. Analogous to sum, but with multiplication instead of addition.

Returns:overall product
Return type:numeric
pydigree.common.random_choice(iterable)

Randomly chooses an item from an iterable

pydigree.common.table(seq)

For each unique value in seq, runs count() on it. Returns a dictionary in the form of {value1: count, value2: count}.

Parameters:seq – Values to make a table of
Return type:dict
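
The relationship between count() and table() can be sketched as follows. The function names are hypothetical, and the handling of None (skipped while counting) is an assumption based on the description of count() above.

```python
def count_sketch(val, iterable):
    """Count occurrences of val, ignoring None entries (assumed behaviour)."""
    return sum(1 for item in iterable if item is not None and item == val)


def table_sketch(seq):
    """Map each unique value in seq to its count."""
    seq = list(seq)  # allow several passes even if seq is an iterator
    return {value: count_sketch(value, seq) for value in set(seq)}


print(table_sketch(["A", "B", "A"]) == {"A": 2, "B": 1})  # True
```
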

pydigree.exceptions module

Exceptions raised by pydigree routines

exception pydigree.exceptions.FileFormatError

Bases: Exception

Error raised when a problem is encountered parsing a file

exception pydigree.exceptions.IterationError

Bases: Exception

The procedure has run out of iterations without finding a result

exception pydigree.exceptions.NotMeaningfulError

Bases: Exception

The error raised for something that does not have a meaningful result

exception pydigree.exceptions.SimulationError

Bases: Exception

Error raised when an error has occurred during a simulation

pydigree.ibs module

pydigree.ibs.chromwide_ibs(a, b, c, d, missingval=64)

Efficiently evaluates IBS across a diploid set of chromosomes, sets IBS where one genotype is missing to missingval.

Parameters:
Returns:

IBS states, with missing values coded as missingval

Return type:

numpy array of type uint8

pydigree.ibs.get_ibs_states(ind1, ind2, chromosome_index, missingval=64)

Efficiently returns IBS states across an entire chromosome.

Arguments: Two individuals, and the index of the chromosome to scan.

Returns: A numpy array (dtype: np.uint8) of IBS states, with IBS between missing values coded as missingval
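
To make the coding of IBS states concrete, here is a pure-Python sketch for unordered diploid genotypes. It is not the vectorised numpy routine documented above; the function name and the use of 0 as the missing-allele code are assumptions made for illustration.

```python
MISSINGVAL = 64  # mirrors the default missingval in the signatures above


def ibs_sketch(genotype1, genotype2, missing_allele=0, missingval=MISSINGVAL):
    """Return the IBS state (0, 1 or 2) of two diploid genotypes."""
    # Assumption: a missing allele is coded as 0.
    if missing_allele in genotype1 or missing_allele in genotype2:
        return missingval
    a, b = genotype1
    c, d = genotype2
    if (a == c and b == d) or (a == d and b == c):
        return 2  # same pair of alleles, in either order
    if a in (c, d) or b in (c, d):
        return 1  # exactly one allele shared
    return 0


first = [(1, 2), (1, 1), (1, 1), (0, 1)]
second = [(2, 1), (1, 2), (2, 2), (1, 1)]
print([ibs_sketch(g, h) for g, h in zip(first, second)])  # [2, 1, 0, 64]
```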

pydigree.individual module

A class representing individuals

class pydigree.individual.Individual(population, label, father=None, mother=None, sex=None)

Bases: object

An object for working with the phenotypes and genotypes of an individual in a genetic study or simulation

ancestors()

Recursively searches for ancestors.

Returns:A collection of all the ancestors of the individual
Return type:set of Individuals
chromosomes

Returns a list of the individual’s ChromosomeTemplate objects

clear_genotypes()

Removes genotypes

constrained_gamete(constraints, attempts=1000)
delabel_genotypes()

When an individual has label genotypes, replaces the labels with the ancestral allele corresponding to the label

delete_phenotype(trait)

Removes a phenotype from the phenotype dictionary

depth

Returns the depth of an individual in a genealogy, a rough measure of which generation of the pedigree the individual belongs to. Defined as: depth = 0 if the individual is a founder, else one more than the maximum of the depths of its parents

Returns:Individual depth
Return type:integer
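
The recursion behind depth can be sketched on a toy genealogy. The sketch assumes that a non-founder sits one generation below its deepest parent; the Person class and depth_sketch are hypothetical and stand in for Individual.

```python
class Person:
    """Minimal stand-in for an individual with two optional parents."""

    def __init__(self, label, father=None, mother=None):
        self.label = label
        self.father = father
        self.mother = mother


def depth_sketch(person):
    """0 for founders, otherwise 1 + the depth of the deeper parent."""
    # In this toy model a person has either both parents or neither.
    if person.father is None and person.mother is None:
        return 0
    return 1 + max(depth_sketch(person.father), depth_sketch(person.mother))


grandpa, grandma = Person("gf"), Person("gm")
dad = Person("dad", father=grandpa, mother=grandma)
mom = Person("mom")  # a marry-in founder
kid = Person("kid", father=dad, mother=mom)
print(depth_sketch(mom), depth_sketch(dad), depth_sketch(kid))  # 0 1 2
```
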
descendants()

Recursively searches for descendants.

Returns:A collection of all the descendants of the individual
Return type:set of Individuals
static fertilize(father, mother)

Combines a set of half-genotypes (from method gamete) to a full set of genotypes

full_label
Returns:the population label and individual label
Return type:2-tuple
gamete()

Provides a set of half-genotypes to use with method fertilize

Returns:a collection of AlleleContainers
Return type:list
genotype_as_phenotype(locus, minor_allele, label)

Creates a phenotype record representing the additive effect of a minor allele at the specified locus. That is, the phenotype created is the number of copies of the minor allele at the locus.

Parameters:
  • locus – the site to count minor alleles
  • minor_allele – the allele to count
  • label – The name of the phenotype to be added
Returns:

void
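
The additive coding described above is simply a count of minor alleles in a genotype. A minimal sketch, with a hypothetical helper name and a hypothetical phenotype label:

```python
def minor_allele_count(genotype, minor_allele):
    """Number of copies (0, 1 or 2) of the minor allele in a diploid genotype."""
    return sum(1 for allele in genotype if allele == minor_allele)


# A plain dict plays the role of the individual's phenotype record here.
phenotypes = {}
phenotypes["demo_locus_dosage"] = minor_allele_count(("A", "G"), "G")
print(phenotypes)  # {'demo_locus_dosage': 1}
```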

get_constrained_genotypes(constraints, linkeq=True)

Gets genotypes from parents (or population if the individual is a founder) subject to constraints. Used by the simulation objects in pydigree.simulation

get_genotype(loc, checkhasgeno=True)

Returns alleles at a position.

Returns:Genotype tuple
Return type:tuple
get_genotypes(linkeq=False)

Retrieve genotypes from a chromosome pool if present, or else a chromosome generated under linkage equilibrium

Parameters:linkeq (bool) – should the retrieved genotypes be generated under linkage equilibrium?
Returns:Nothing
has_allele(location, allele)

Returns True if individual has the specified allele at location

has_genotypes()

Tests if individual has genotypes

Returns:genotype available
Return type:bool
inbreeding()

Returns the inbreeding coefficient (F) for the individual.

is_founder()

Returns true if individual is a founder

is_marryin_founder()

Returns true if an individual is a marry-in founder, i.e. the individual is a founder (depth: 0) and has a child with depth > 1

label_genotypes()

Gives the individual label genotypes.

When label genotypes are transmitted on to the next generation, you can see where each allele in the next generation came from. This is useful for gene dropping simulations and reducing the memory footprint of forward-time simulation.

matriline()

Returns a label by recursively searching for the individual’s mother’s mother’s mother’s etc. until it reaches a founder mother, in which case it returns that ancestor’s id.

Useful reference: Courtenay et al. ‘Mitochondrial haplogroup X is associated with successful aging in the Amish.’ Human Genetics (2012). 131(2):201-8. doi: 10.1007/s00439-011-1060-3. Epub 2011 Jul 13.

Returns:Label of the matriline founder
parents()

Returns the individual’s father and mother in a 2-tuple

patriline()

Returns a label by recursively searching for the individual’s father’s father’s father’s etc. until it reaches a founder father, in which case it returns that ancestor’s id.

Analogous to Individual.matriline.

Returns:Label of patriline founder
predict_phenotype(trait)

Predicts phenotypes from a given trait architecture and sets it

register_child(child)

Add a child for this individual

Parameters:child (Individual) – child object
register_with_parents()

Inform the parent Individuals that this is their child

remove_ancestry()

Removes ancestry: makes a person a founder. Cannot be used on an individual in a pedigree, because the pedigree structure is already set.

set_genotype(location, genotype)

Manually set a genotype

siblings(include_halfsibs=False)

Returns this individual’s siblings.

Parameters:include_halfsibs (bool) – Include half-siblings
Returns:A collection of the individual’s siblings
Return type:set of Individuals
update(other)

Takes another individual object, merges/updates phenotypes with the other individual object and replaces self’s genotypes with other’s

Warning

This does not merge genotypes, it replaces them! Additionally, phenotypes in the other data overwrite the existing ones

Parameters:other (Individual) – the data to update with
pydigree.individual.is_missing_genotype(g)

pydigree.individualcontainer module

class pydigree.individualcontainer.IndividualContainer

Bases: object

allele_frequency(location, allele, constraint=None)

Returns the frequency (as a percentage) of an allele in this population

Parameters:
  • location – the locus to be evaluated
  • allele – the allele to be counted
  • constraint (callable) – Function that acts on an individual. If constraint(individual) can evaluate as True that person is included
Return type:

float

allele_list(location, constraint=None)

The list of available alleles at a location in this population

The argument constraint is a function that acts on an individual. If constraint(individual) can evaluate as True that person is included

alleles(location, constraint=None)

Returns the set of available alleles at a locus in the population.

The argument constraint is a function that acts on an individual. If constraint(individual) can evaluate as True that person is included

Returns: a set of the available genotypes at a location

apply(func)

Calls a function on each individual object in the collection and yields its value

Parameters:func (callable) – A function that takes an Individual as a parameter
Return type:generator
apply_inplace(func)

Calls a function on each individual object in the collection

Parameters:func (callable) – A function that takes an Individual as a parameter
Return type:void
clear_genotypes()

Removes all the genotypes for individuals

delete_phenotype(trait)
females()

Returns list of females in population

founders()

Returns a list of founders in population

genotype_as_phenotype(locus, minor_allele, label)

Dispatches a genotype_as_phenotype to each individual in the pedigree.

See docstring for Individual.genotype_as_phenotype for more details

genotype_missingness(location)

Returns the percentage of individuals in the population missing a genotype at location.

Returns: A float

get_founder_genotypes()

Have founder individuals request genotypes

get_genotypes()

Have individuals request genotypes

major_allele(location, constraint=None)

Returns the major (most common) allele in this population at a locus.

Parameters:
  • location – the position to find the major allele at
  • constraint – a constraint (see population.alleles)
Returns:

the major allele at a locus

males()

Returns list of males in population

nonfounders()

Returns a list of nonfounders in population

phenotype_dataframe(onlyphenotyped=True)

Returns a pandas dataframe of the phenotypes available for the individuals present in the collection.

Arguments: onlyphenotyped: Only include individuals who have phenotypes

Returns: A pandas dataframe

phenotypes()

Returns the available phenotypes for analysis

predict_phenotype(trait)

Shortcut function to call predict_phenotype(trait) for every individual in the population.

Returns: nothing

sex_ratio()

Returns the sex ratio in the population, defined as n_male / n_female

Returns:sex ratio
Return type:float

pydigree.paths module

Functions for finding paths through pedigrees and genealogies

pydigree.paths.common_ancestors(ind1, ind2)

Common ancestors of ind1 and ind2.

Recursively searches ancestors for both individuals, and then performs a set intersection on the two sets of ancestors

Parameters:
Returns:

Common ancestors

Return type:

set

pydigree.paths.fraternity(ind1, ind2)

The coefficient of fraternity is the probability that both alleles in a pair of individuals are IBD. This is equal to: k(x_m,y_m) * k(x_f,y_f) + k(x_m,y_f) * k(x_f,y_m), where x_m represents the mother of x, etc. and k(x,y) represents the Malecot kinship between x and y.

Parameters:
Returns:

coefficient of fraternity

Return type:

float
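
The fraternity formula can be written out directly once kinship coefficients are available. The sketch below takes the kinship function as an argument and uses hypothetical, hand-entered coefficients for two full siblings with unrelated, non-inbred parents; it does not call pydigree.

```python
def fraternity_sketch(x, y, parents, kinship):
    """k(x_m,y_m) * k(x_f,y_f) + k(x_m,y_f) * k(x_f,y_m)."""
    x_f, x_m = parents[x]
    y_f, y_m = parents[y]
    return (kinship(x_m, y_m) * kinship(x_f, y_f)
            + kinship(x_m, y_f) * kinship(x_f, y_m))


# label -> (father, mother)
PARENTS = {"sib1": ("dad", "mom"), "sib2": ("dad", "mom")}
# Kinship of a non-inbred person with themselves is 1/2; the parents are unrelated.
KINSHIP = {("dad", "dad"): 0.5, ("mom", "mom"): 0.5,
           ("dad", "mom"): 0.0, ("mom", "dad"): 0.0}


def lookup(a, b):
    return KINSHIP[(a, b)]


print(fraternity_sketch("sib1", "sib2", PARENTS, lookup))  # 0.25
```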

pydigree.paths.kinship(ind1, ind2)

Returns the Malecot kinship coefficient for ind1 and ind2, calculated by path counting. The kinship coefficient is the probability that a randomly selected pair of alleles (one from each individual) is IBD.

This quantity is calculated by finding the common ancestors of ind1 and ind2. For each ancestor, the paths between ind1 and ind2 are found.

The kinship coefficient is calculated as the sum for every ancestor of the sum of ((1/2) ** N) * (1 + F) for each path, where N is the number of individuals in the path and F is the inbreeding coefficient for that ancestor.

Parameters:
Returns:

Malecot’s coefficient of coancestry

Return type:

float
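
The path-counting sum can be illustrated with the paths supplied by hand. This sketch assumes the paths have already been enumerated (each one a list of labels including both endpoints and the common ancestor); the function name and the toy pedigrees are hypothetical.

```python
def kinship_from_paths(paths_by_ancestor, inbreeding=None):
    """Sum (1/2)**N * (1 + F) over every path through every common ancestor."""
    inbreeding = inbreeding or {}
    total = 0.0
    for ancestor, paths in paths_by_ancestor.items():
        f = inbreeding.get(ancestor, 0.0)  # F of the common ancestor
        for path in paths:
            total += 0.5 ** len(path) * (1.0 + f)
    return total


# Full siblings a and b share a father and a mother.
full_sibs = {"dad": [["a", "dad", "b"]], "mom": [["a", "mom", "b"]]}
# First cousins a and b share two grandparents.
cousins = {"gf": [["a", "p1", "gf", "p2", "b"]],
           "gm": [["a", "p1", "gm", "p2", "b"]]}
print(kinship_from_paths(full_sibs), kinship_from_paths(cousins))  # 0.25 0.0625
```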

pydigree.paths.path_downward(start, end, path=None)

Returns a list of paths (if they exist) from an ancestor (start) to a descendant (end).

Parameters:
  • start – The individual at the start of the path
  • end – The individual to be found
  • path – a path to append to

Returns:

A list of individuals, in path order

pydigree.paths.paths(ind1, ind2)

Returns a list of all valid paths through the pedigree connecting individual 1 and individual 2. A valid path can only go through an individual once, or it’s not a valid path.

This function consists of repeated calls to paths_through_ancestor for each common ancestor of ind1 and ind2

For pedigrees, where typically many kinship coefficients must be calculated simultaneously, kinships are calculated by this function. See notes on pedigree.kinship for more information.

Parameters:
Returns:

identified paths

Return type:

list of lists of Individuals

pydigree.paths.paths_through_ancestor(ind1, ind2, ancestor)

Finds all paths through a genealogy between ind1 and ind2 that pass through a specific ancestor.

Parameters:
Returns:

identified paths

Return type:

list of lists of Individuals

pydigree.pedigree module

A collection of individuals with fixed relationships

class pydigree.pedigree.Pedigree(label=None)

Bases: pydigree.population.Population

A collection of individuals with fixed relationships

additive_relationship_matrix(ids=None)

Calculates an additive relationship matrix (the A matrix) for quantitative genetics.

A_ij = 2 * kinship(i,j) if i != j (see the notes on function ‘kinship’)
A_ij = 1 + inbreeding(i) if i == j (inbreeding(i) is equivalent to kinship(i.father, i.mother))

Parameters:ids – IDs of pedigree members to include in the matrix

Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.

Returns:additive relationship matrix
Return type:matrix
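
The two rules for A_ij translate into a small matrix builder. The sketch below uses nested lists instead of a numpy matrix and takes the kinship and inbreeding functions as arguments; all names and coefficients are hypothetical.

```python
def additive_matrix_sketch(labels, kinship, inbreeding):
    """Build the A matrix as a list of rows, ordered like `labels`."""
    return [
        [1.0 + inbreeding(i) if i == j else 2.0 * kinship(i, j) for j in labels]
        for i in labels
    ]


# Hand-entered coefficients for two non-inbred full siblings.
KINSHIP = {frozenset(["sib1", "sib2"]): 0.25}
a_matrix = additive_matrix_sketch(
    sorted(["sib2", "sib1"]),  # rows/columns sorted by label
    kinship=lambda i, j: KINSHIP[frozenset([i, j])],
    inbreeding=lambda i: 0.0,
)
print(a_matrix)  # [[1.0, 0.5], [0.5, 1.0]]
```
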
bit_size()

Returns the bit size of the pedigree. The bitsize is defined as 2*n-f where n is the number of nonfounders and f is the number of founders. This represents the number of bits it takes to represent the inheritance vector in the Lander-Green algorithm.

Returns:bit size
Return type:integer
dominance_relationship_matrix(ids=None)

Calculates the dominance genetic relationship matrix (the D matrix) for quantitative genetics.

D_ij = fraternity(i,j) if i != j
D_ij = 1 if i == j

Parameters:ids – IDs of pedigree members to include in the matrix

Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.

Returns:dominance relationship matrix
Return type:matrix
fraternity(id1, id2)

Like Pedigree.kinship, this is a convenience function for getting fraternity coefficients for two pedigree members by their ID label.

This is a wrapper for paths.fraternity

Parameters:
  • id1 – the label of an individual to be evaluated
  • id2 – the label of an individual to be evaluated
Returns:

coefficient of fraternity

Return type:

float

inbreeding(indlab)

Like Pedigree.kinship, this is a convenience function for getting inbreeding coefficients for individuals in pedigrees by their id label. As inbreeding coefficients are the kinship coefficient of the parents, this function calls Pedigree.kinship to check for stored values.

Parameters:indlab – the label of the individual to be evaluated
Returns:inbreeding coefficient
Return type:a double
kinship(id1, id2)

Get the Malecot coefficient of coancestry for two individuals in the pedigree. These are calculated recursively. For pedigree objects, results are stored to reduce the calculation time for kinship matrices.

Parameters:
  • id1 – the label of an individual to be evaluated
  • id2 – the label of an individual to be evaluated
Returns:

Malecot’s coefficient of coancestry

Return type:

float

Reference: Lange. Mathematical and Statistical Methods for Genetic Analysis. 1997. Springer.
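
A recursive kinship calculation with stored results can be sketched as below. This is the textbook recursion on a hypothetical toy pedigree, with functools.lru_cache standing in for the stored values; it is not pydigree's code. The inbreeding coefficient falls out as the kinship of the parents.

```python
from functools import lru_cache

# Toy pedigree: label -> (father, mother); founders map to (None, None).
# Insertion order lists parents before their children.
PARENTS = {
    "gf": (None, None),
    "gm": (None, None),
    "s1": ("gf", "gm"),
    "s2": ("gf", "gm"),
    "c": ("s1", "s2"),  # child of two full siblings
}
RANK = {label: i for i, label in enumerate(PARENTS)}


@lru_cache(maxsize=None)  # memoise results, as a kinship matrix reuses many pairs
def kinship_sketch(a, b):
    if a is None or b is None:
        return 0.0  # unknown parents of founders contribute nothing
    if a == b:
        father, mother = PARENTS[a]
        return 0.5 * (1.0 + kinship_sketch(father, mother))
    if RANK[a] < RANK[b]:
        a, b = b, a  # always recurse through the younger individual
    father, mother = PARENTS[a]
    return 0.5 * (kinship_sketch(father, b) + kinship_sketch(mother, b))


def inbreeding_sketch(label):
    father, mother = PARENTS[label]
    return kinship_sketch(father, mother)


print(kinship_sketch("s1", "s2"), inbreeding_sketch("c"))  # 0.25 0.25
```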

mitochondrial_relationship_matrix(ids=None)

Calculates the mitochondrial relationship matrix. M_ij = 1 if matriline(i) == matriline(j)

Parameters:ids – IDs of pedigree members to include in the matrix

Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.

Returns: A numpy matrix

Reference: Liu et al. “Association Testing of the Mitochondrial Genome Using Pedigree Data”. Genetic Epidemiology. (2013). 37,3:239-247
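
The matrix can be sketched by tracing each individual's maternal line and comparing the resulting founders. The sketch assumes entries are 0 when matrilines differ (the text only states the M_ij = 1 case) and uses nested lists rather than a numpy matrix; all names are hypothetical.

```python
def matriline_sketch(label, mothers):
    """Follow mother links until a founder mother is reached."""
    while mothers.get(label) is not None:
        label = mothers[label]
    return label


def mitochondrial_matrix_sketch(labels, mothers):
    founders = [matriline_sketch(label, mothers) for label in labels]
    # Assumption: 0 whenever two individuals have different matrilines.
    return [[1 if fi == fj else 0 for fj in founders] for fi in founders]


# label -> mother's label (None for founders)
MOTHERS = {"gm": None, "gf": None, "mom": "gm", "dad": None, "kid": "mom"}
labels = sorted(MOTHERS)  # ['dad', 'gf', 'gm', 'kid', 'mom']
matrix = mitochondrial_matrix_sketch(labels, MOTHERS)
print(matrix[labels.index("kid")])  # [0, 0, 1, 1, 1]
```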

simulate_ibd_states(inds=None)

Simulate IBD patterns by gene dropping: Everyone’s genotypes reflect the founder chromosome that they received the genotype from. You can then use the ibs function to determine IBD state. This is effectively an infinite-alleles simulation.

Returns: Nothing

pydigree.pedigreecollection module

class pydigree.pedigreecollection.PedigreeCollection(peds=None)

Bases: pydigree.individualcontainer.IndividualContainer

add_chromosome(chrom)
add_pedigree(ped)
additive_relationship_matrix(ids=None)

Returns a block diagonal matrix of additive relationships for each pedigree.

See notes on Pedigree.additive_relationship_matrix

chromosomes
dominance_relationship_matrix(ids=None)

Returns a block diagonal matrix of dominance relationships for each pedigree.

See notes on Pedigree.dominance_relationship_matrix

individuals

Returns a list of the individuals represented by all pedigrees, sorted by pedigree label, id label

keys()
mitochondrial_relationship_matrix(ids=None)

Returns a block diagonal matrix of mitochondrial relationships for each pedigree.

See notes on Pedigree.mitochondrial_relationship_matrix

pedigrees

Returns a list of the pedigree objects contained in the collection

update(pop)

pydigree.phenotypes module

A phenotype holder

class pydigree.phenotypes.Phenotypes(data=None)

Bases: object

A container for the set of phenotypes and exposures associated with an Individual in a population

clear()

Clears all phenotypes from the object

delete_phenotype(key)

Clears the phenotype from the object

get(key, default)

Gets a phenotype or default value if not present

has_phenotype(key)

Does the individual have a certain phenotype?

Parameters:key (string) – the phenotype to look for
Returns:True if phenotype present and not None
Return type:bool
items()

Iterate over phenotype name, value pairs

keys()

Iterate over the available phenotype names

to_series()

Returns phenotypes as a pandas Series

update(other)

Updates the current Phenotype object with data from the other

values()

Iterate over the available phenotype values

pydigree.population module

class pydigree.population.Population(intial_pop_size=0, name=None)

Bases: pydigree.individualcontainer.IndividualContainer

add_chromosome(chrom)

Adds a chromosome to the population

add_founders(n)

Adds a number of founder individuals to the population

Parameters:n (int) – number of individuals to add
Return type:void
advance_generation(gensize, mating=None)

Simulates a generation of random mating.

Parameters:
  • gensize (numeric) – The size of the new generation
  • mating (MatingScheme) – MatingScheme for the generation
chromosome_count()

Returns the number of chromosomes in this population

founder_individual(register=True, sex=None)

Creates a new founder individual and adds to the population

get_founder_genotypes()

Gives genotypes to each founder in the population with chromosomes from the chromosome pool. If there is no pool, genotypes are generated under linkage equilibrium

get_genotypes()

Causes each Individual object in the pedigree to request genotypes from its parents

get_linkage_equilibrium_genotypes()

Returns a set of genotypes for an individual in linkage equilibrium

individuals

Returns a list of individuals in the population

mate(ind1, ind2, indlab, sex=None)

Creates an individual as the child of two specified individual objects, randomly choosing a sex if none is given.

Parameters:
  • ind1 (Individual) – The first parent
  • ind2 (Individual) – The second parent
  • indlab – ID label for the child
  • sex ({0,1}) – Sex of child, randomly chosen if not specified
Returns:

An individual with ind1 and ind2 as parents

Return type:

Individual

register_individual(ind)

Adds an individual to the population

remove_ancestry()

Makes every individual in the population a founder

remove_individual(ind)

Removes an individual from the population

size()

Returns the number of individuals in the population.

update(other)

Merges two datasets (i.e. performs Individual.update for each individual in the pedigree)

Assumes unique individual IDs

Parameters:other (Population) – New data to merge in
Returns:void
pydigree.population.exponential_growth(p, r, t)

Models exponential growth over discrete generations.

Parameters:
  • p (numeric) – initial population
  • r (numeric) – growth rate
  • t (numeric) – number of generations
Returns:

population size at time t

Return type:

numeric
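
The documentation does not state which growth formula is used. One common parameterisation, assumed here purely for illustration, is N(t) = p * exp(r * t); the library may use a different form, and exponential_growth_sketch is a hypothetical name.

```python
import math


def exponential_growth_sketch(p, r, t):
    """Population size after t generations, assuming N(t) = p * exp(r * t)."""
    return p * math.exp(r * t)


# With r = ln(2) the population doubles every generation.
sizes = [exponential_growth_sketch(100, math.log(2), t) for t in range(4)]
print([round(n) for n in sizes])  # [100, 200, 400, 800]
```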

pydigree.population.is_missing_genotype(g)
pydigree.population.logistic_growth(p, r, k, t)

Models logistic growth over discrete generations.

Parameters:
  • p – initial population
  • r – growth rate
  • k – final population
  • t – number of generations
Returns:

population size at time t

Return type:

numeric
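
As with exponential growth, the exact formula is not given. The sketch below assumes the standard logistic curve N(t) = k / (1 + ((k - p) / p) * exp(-r * t)), with k acting as the final (carrying-capacity) population; logistic_growth_sketch is a hypothetical name.

```python
import math


def logistic_growth_sketch(p, r, k, t):
    """Population size at time t under the standard logistic curve (assumed)."""
    return k / (1.0 + ((k - p) / p) * math.exp(-r * t))


start = logistic_growth_sketch(100, 0.5, 1000, 0)
late = logistic_growth_sketch(100, 0.5, 1000, 50)
print(round(start), round(late))  # 100 1000
```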

pydigree.rand module

Convenience functions for randomness

pydigree.rand.choice(seq)

Randomly chooses an item from a sequence. Probabilities are uniform.

Parameters:seq – choices
Returns:randomly chosen item
pydigree.rand.sample_with_replacement(seq, n)

Choose a sample of n items with replacement from a sequence

Parameters:
  • seq – sequence to choose from
  • n – sample size
Returns:

randomly chosen items

Return type:

list
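
Sampling with replacement amounts to n independent uniform draws. A minimal sketch using the standard random module, with a hypothetical function name; seeding first makes the draws reproducible, which is the purpose of a seed-setting helper.

```python
import random


def sample_with_replacement_sketch(seq, n, rng=random):
    """Draw n items uniformly from seq; an item may be drawn more than once."""
    return [rng.choice(seq) for _ in range(n)]


random.seed(42)
first = sample_with_replacement_sketch(["A", "C", "G", "T"], 6)
random.seed(42)
second = sample_with_replacement_sketch(["A", "C", "G", "T"], 6)
print(first == second)  # True: same seed, same draws
```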

pydigree.rand.set_seed(seed)

Set the random seed.

Parameters:seed – random seed
Return type:void

pydigree.recombination module

Functions for recombining haploid chromosomes

pydigree.recombination.recombine(chr1, chr2, genetic_map)

Takes two chromatids and returns a simulated one by an exponential process

Parameters:
Returns:

Recombined chromosome

Return type:

AlleleContainer
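
One way to read "an exponential process" is that distances between crossovers are drawn from an exponential distribution (no interference). The sketch below makes that assumption, with a mean of 100 cM between crossovers, plain lists instead of AlleleContainers, and a genetic map given as cumulative marker positions in centimorgans. All of these are illustrative choices, not the library's data model.

```python
import random

MEAN_CM_BETWEEN_CROSSOVERS = 100.0  # assumption: one crossover per Morgan on average


def recombine_sketch(chr1, chr2, genetic_map, rng=random):
    """Copy alleles marker by marker, switching strands at each crossover."""
    if not (len(chr1) == len(chr2) == len(genetic_map)):
        raise ValueError("chromatids and map must have the same length")
    strands = (chr1, chr2)
    current = rng.randrange(2)  # start on a randomly chosen chromatid
    rate = 1.0 / MEAN_CM_BETWEEN_CROSSOVERS
    next_crossover = None
    result = []
    for index, position in enumerate(genetic_map):
        if next_crossover is None:
            next_crossover = position + rng.expovariate(rate)
        # Several crossovers can fall between two adjacent markers.
        while next_crossover <= position:
            current = 1 - current
            next_crossover += rng.expovariate(rate)
        result.append(strands[current][index])
    return result


random.seed(1)
child = recombine_sketch(["A"] * 5, ["B"] * 5, [0.0, 50.0, 100.0, 150.0, 200.0])
print(len(child), set(child) <= {"A", "B"})  # 5 True
```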

Module contents