pydigree package¶
Subpackages¶
- pydigree.cydigree package
- pydigree.datastructures package
- pydigree.genotypes package
- pydigree.io package
- pydigree.sgs package
- pydigree.simulation package
- pydigree.stats package
Submodules¶
pydigree.common module¶
-
pydigree.common.count(val, iterable)¶ Counts how many times a value (val) occurs in an iterable, excluding None
Parameters: - val – The value to be counted
- iterable (iterable) – values to be counted over
Returns: the count of values
Return type: int
-
pydigree.common.cumsum(iter)¶ Cumulative sum:
>>> pydigree.cumsum([0,1,2,3,4])
[0, 1, 3, 6, 10]
Parameters: iter – the iterable to be cumsum’ed Returns: cumulative sums Return type: integer
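The running total shown in the example can be reproduced with the standard library. This is a minimal sketch of the behavior, not pydigree's own implementation; the name cumsum_sketch is hypothetical.

```python
from itertools import accumulate


def cumsum_sketch(values):
    """Return the running totals of values as a list."""
    return list(accumulate(values))


print(cumsum_sketch([0, 1, 2, 3, 4]))  # [0, 1, 3, 6, 10]
```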
-
pydigree.common.flatten(x)¶ Recursively flattens lists.
-
pydigree.common.grouper(iterable, n, fillvalue=None)¶ Collect data into fixed-length chunks or blocks
Parameters: - iterable – values to evaluate
- n – size of the groups
- fillvalue – value to pad the last group with if len(iterable) % n != 0
Return type: a generator
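Chunking with padding matches the well-known itertools "grouper" recipe. The sketch below assumes that behavior; grouper_sketch is a hypothetical name and may differ from pydigree's implementation in detail.

```python
from itertools import zip_longest


def grouper_sketch(iterable, n, fillvalue=None):
    """Yield tuples of length n, padding the last one with fillvalue."""
    # n references to the same iterator, so zip_longest pulls n items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


print(list(grouper_sketch("ABCDEFG", 3, fillvalue="x")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```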
-
pydigree.common.invert_dict(d)¶ Makes the keys the values and the values the keys
Warning
No guarantee of dictionary structure if mappings are not unique
Parameters: d – dictionary to be inverted Return type: dict
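The warning about non-unique mappings is easiest to see with a small example. This is a sketch using a plain dict comprehension, which is an assumption about how the inversion might be done; invert_dict_sketch is a hypothetical name.

```python
def invert_dict_sketch(d):
    """Swap keys and values; later duplicates overwrite earlier ones."""
    return {value: key for key, value in d.items()}


unique = invert_dict_sketch({"a": 1, "b": 2})
clashing = invert_dict_sketch({"a": 1, "b": 1})

print(unique)         # both entries survive
print(len(clashing))  # only one entry survives because the values collide
```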
-
pydigree.common.log_base_change(value, old, new)¶ Changes the base of the logarithm used on value from old to new
Arguments: value: the value to be converted old: old base (numeric) new: new base (numeric)
Returns: log of value in new base Return type: float
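The conversion follows from the change-of-base identity: if value = log_old(x), then log_new(x) = value * ln(old) / ln(new). A minimal sketch under that reading of the description (log_base_change_sketch is a hypothetical name):

```python
import math


def log_base_change_sketch(value, old, new):
    """Convert value = log_old(x) into log_new(x)."""
    return value * math.log(old) / math.log(new)


# log2(8) = 3, so converting 3 from base 2 to base 10 should give log10(8)
converted = log_base_change_sketch(3.0, 2, 10)
print(math.isclose(converted, math.log10(8)))
```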
-
pydigree.common.merge_dicts(*args)¶ Merges two dictionaries into one bigger one
-
pydigree.common.mode(seq)¶ Returns the most common value in a sequence
Parameters: seq – sequence of values to evaluate
-
pydigree.common.product(iter)¶ Reduces an iterable by multiplication. Analogous to sum, but with multiplication instead of addition.
Returns: overall product Return type: numeric
-
pydigree.common.random_choice(iterable)¶ Randomly chooses an item from an iterable
-
pydigree.common.table(seq)¶ For each unique value in seq, runs count() on it. Returns a dictionary in the form of {value1: count, value2: count}.
Parameters: seq – Values to make a table of Return type: dict
pydigree.exceptions module¶
Exceptions raised by pydigree routines
-
exception
pydigree.exceptions.FileFormatError¶ Bases:
Exception
Error raised when a problem is encountered parsing a file
-
exception
pydigree.exceptions.IterationError¶ Bases:
Exception
The procedure has run out of iterations without finding a result
-
exception
pydigree.exceptions.NotMeaningfulError¶ Bases:
Exception
The error raised for an operation that does not have a meaningful result
-
exception
pydigree.exceptions.SimulationError¶ Bases:
Exception
Error raised when an error has occurred during a simulation
pydigree.ibs module¶
-
pydigree.ibs.chromwide_ibs(a, b, c, d, missingval=64)¶ Efficiently evaluates IBS across a diploid set of chromosomes, sets IBS where one genotype is missing to missingval.
Parameters: - a (AlleleContainer) – haploid genotypes
- b (AlleleContainer) – haploid genotypes
- c (AlleleContainer) – haploid genotypes
- d (AlleleContainer) – haploid genotypes
Returns: IBS states, with missing values coded as missingval
Return type: numpy array of type uint8
-
pydigree.ibs.get_ibs_states(ind1, ind2, chromosome_index, missingval=64)¶ Efficiently returns IBS states across an entire chromosome.
Arguments: Two individuals, and the index of the chromosome to scan.
Returns: A numpy array (dtype: np.uint8) of IBS states, with IBS between missing values coded as missingval
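To illustrate what an IBS state is, the sketch below scores pairs of unphased diploid genotypes as sharing 0, 1, or 2 alleles and substitutes missingval when either genotype is missing. It is plain Python rather than the efficient numpy routine documented here, and the missing-allele code of 0 and all function names are assumptions made for illustration.

```python
MISSING_ALLELE = 0  # hypothetical code for a missing allele


def ibs_state(genotype1, genotype2, missingval=64):
    """Number of alleles shared identical-by-state between two genotypes."""
    a, b = genotype1
    c, d = genotype2
    if MISSING_ALLELE in (a, b, c, d):
        return missingval
    # Genotypes are unordered, so try both pairings and keep the better one
    straight = (a == c) + (b == d)
    crossed = (a == d) + (b == c)
    return max(straight, crossed)


def ibs_states(genotypes1, genotypes2, missingval=64):
    """IBS state at each locus for two equally long genotype lists."""
    return [ibs_state(g1, g2, missingval)
            for g1, g2 in zip(genotypes1, genotypes2)]


states = ibs_states([(1, 1), (1, 2), (1, 1), (0, 1)],
                    [(1, 2), (2, 1), (2, 2), (1, 1)])
print(states)  # [1, 2, 0, 64]
```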
pydigree.individual module¶
A class representing individuals
-
class
pydigree.individual.Individual(population, label, father=None, mother=None, sex=None)¶ Bases:
object
An object for working with the phenotypes and genotypes of an individual in a genetic study or simulation
-
ancestors()¶ Recursively searches for ancestors.
Returns: A collection of all the ancestors of the individual Return type: set of Individuals
-
chromosomes¶ Returns a list of the individual’s ChromosomeTemplate objects
-
clear_genotypes()¶ Removes genotypes
-
constrained_gamete(constraints, attempts=1000)¶
-
delabel_genotypes()¶ When an individual has label genotypes, replaces the labels with the ancestral allele corresponding to the label
-
delete_phenotype(trait)¶ Removes a phenotype from the phenotype dictionary
-
depth¶ Returns the depth of an individual in a genealogy, a rough measure of what generation in the pedigree the individual is. Defined as: depth = 0 if the individual is a founder, else one more than the maximum of the depths of the parents
Returns: Individual depth Return type: integer
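A recursive sketch of the depth calculation, assuming each generation adds one to the larger parental depth. The pedigree is modeled as a hypothetical dict mapping a label to a (father, mother) tuple, which is not pydigree's data model.

```python
def depth_sketch(label, parents):
    """Depth is 0 for founders, else 1 + the deeper parent's depth."""
    father, mother = parents[label]
    if father is None and mother is None:
        return 0
    return 1 + max(depth_sketch(father, parents),
                   depth_sketch(mother, parents))


# Toy three-generation pedigree: label -> (father, mother)
toy_parents = {
    "gpa": (None, None),
    "gma": (None, None),
    "dad": ("gpa", "gma"),
    "mom": (None, None),
    "kid": ("dad", "mom"),
}
print(depth_sketch("kid", toy_parents))  # 2
```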
-
descendants()¶ Recursively searches for descendants.
Returns: A collection of all the descendants of the individual Return type: set of Individuals
-
static
fertilize(father, mother)¶ Combines a set of half-genotypes (from method gamete) to a full set of genotypes
-
full_label¶ Returns: the population label and individual label Return type: 2-tuple
-
gamete()¶ Provides a set of half-genotypes to use with method fertilize
Returns: a collection of AlleleContainers Return type: list
-
genotype_as_phenotype(locus, minor_allele, label)¶ Creates a phenotype record representing the additive effect of a minor allele at the specified locus. That is, the phenotype created is the number of copies of the minor allele at the locus.
Parameters: - locus – the site to count minor alleles
- minor_allele – the allele to count
- label – The name of the phenotype to be added
Returns: void
-
get_constrained_genotypes(constraints, linkeq=True)¶ Gets genotypes from parents (or population if the individual is a founder) subject to constraints. Used by the simulation objects in pydigree.simulation
-
get_genotype(loc, checkhasgeno=True)¶ Returns alleles at a position.
Returns: Genotype tuple Return type: tuple
-
get_genotypes(linkeq=False)¶ Retrieve genotypes from a chromosome pool if present, or else a chromosome generated under linkage equilibrium
Parameters: linkeq (bool) – should the retrieved genotypes be generated under linkage equilibrium? Returns: Nothing
-
has_allele(location, allele)¶ Returns True if individual has the specified allele at location
-
has_genotypes()¶ Tests if individual has genotypes
Returns: genotype available Return type: bool
-
inbreeding()¶ Returns the inbreeding coefficient (F) for the individual.
-
is_founder()¶ Returns true if individual is a founder
-
is_marryin_founder()¶ Returns true if an individual is a marry-in founder i.e.: the individual is a founder (depth: 0) and has a child with depth > 1
-
label_genotypes()¶ Gives the individual label genotypes.
When label genotypes are transmitted on to the next generation, you can see where each allele in the next generation came from. This is useful for gene dropping simulations and reducing the memory footprint of forward-time simulation.
-
matriline()¶ Returns a label by recursively searching for the individual’s mother’s mother’s mother’s etc. until it reaches a founder mother, in which case it returns that ancestor’s id.
Useful reference: Courtenay et al. ‘Mitochondrial haplogroup X is associated with successful aging in the Amish.’ Human Genetics (2012). 131(2):201-8. doi: 10.1007/s00439-011-1060-3. Epub 2011 Jul 13.
Returns: Label of the matriline founder
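The maternal-line search can be sketched as follows. This uses a loop rather than recursion and a hypothetical dict of mother labels; it illustrates the idea only and is not the Individual method itself.

```python
def matriline_sketch(label, mothers):
    """Follow mother links until an individual with no recorded mother."""
    while mothers.get(label) is not None:
        label = mothers[label]
    return label


# label -> mother's label (None for founders)
toy_mothers = {"kid": "mom", "mom": "gma", "gma": None}
print(matriline_sketch("kid", toy_mothers))  # gma
```

The same walk over father links would give the patriline.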
-
parents()¶ Returns the individual’s father and mother in a 2-tuple
-
patriline()¶ Returns a label by recursively searching for the individual’s father’s father’s father’s etc. until it reaches a founder father, in which case it returns that ancestor’s id.
Analogous to individual.matriline.
Returns: Label of patriline founder
-
predict_phenotype(trait)¶ Predicts phenotypes from a given trait architecture and sets it
-
register_child(child)¶ Add a child for this individual
Parameters: child (Individual) – child object
-
register_with_parents()¶ Inform the parent Individuals that this is their child
-
remove_ancestry()¶ Removes ancestry: makes a person a founder. Cannot be used on an individual in a pedigree, because the pedigree structure is already set.
-
set_genotype(location, genotype)¶ Manually set a genotype
-
siblings(include_halfsibs=False)¶ Returns this individual’s siblings.
Parameters: include_halfsibs (bool) – Include half-siblings Returns: A collection of the individual’s siblings Return type: set of Individuals
-
update(other)¶ Takes another individual object, merges/updates phenotypes with the other individual object and replaces self’s genotypes with other’s
Warning
This does not merge genotypes, it replaces them! Additionally, phenotypes in the other data overwrite the existing ones
Parameters: other (Individual) – the data to update with
-
-
pydigree.individual.is_missing_genotype(g)¶
pydigree.individualcontainer module¶
-
class
pydigree.individualcontainer.IndividualContainer¶ Bases:
object
-
allele_frequency(location, allele, constraint=None)¶ Returns the frequency (as a percentage) of an allele in this population
Parameters: - location – the locus to be evaluated
- allele – the allele to be counted
- constraint (callable) – Function that acts on an individual. If constraint(individual) can evaluate as True that person is included
Return type: float
-
allele_list(location, constraint=None)¶ The list of available alleles at a location in this population
The argument constraint is a function that acts on an individual. If constraint(individual) can evaluate as True that person is included
-
alleles(location, constraint=None)¶ Returns the set of available alleles at a locus in the population.
The argument constraint is a function that acts on an individual. If constraint(individual) can evaluate as True that person is included
Returns: a set of the available genotypes at a location
-
apply(func)¶ Calls a function on each individual object in the collection and yields its value
Parameters: func (callable) – A function that takes an Individual as a parameter Return type: generator
-
apply_inplace(func)¶ Calls a function on each individual object in the collection
Parameters: func (callable) – A function that takes an Individual as a parameter Return type: void
-
clear_genotypes()¶ Removes all the genotypes for individuals
-
delete_phenotype(trait)¶
-
females()¶ Returns list of females in population
-
founders()¶ Returns a list of founders in population
-
genotype_as_phenotype(locus, minor_allele, label)¶ Dispatches a genotype_as_phenotype to each individual in the pedigree.
See docstring for Individual.genotype_as_phenotype for more details
-
genotype_missingness(location)¶ Returns the percentage of individuals in the population missing a genotype at location.
Returns: A float
-
get_founder_genotypes()¶ Have founder individuals request genotypes
-
get_genotypes()¶ Have individuals request genotypes
-
major_allele(location, constraint=None)¶ Returns the major (most common) allele in this population at a locus.
Parameters: - location – the position to find the major allele at
- constraint – a constraint (see population.alleles)
Returns: the major allele at a locus
-
males()¶ Returns list of males in population
-
nonfounders()¶ Returns a list of nonfounders in population
-
phenotype_dataframe(onlyphenotyped=True)¶ Returns a pandas dataframe of the phenotypes available for the individuals present in the collection.
Arguments: onlyphenotyped: Only include individuals who have phenotypes
Returns: A pandas dataframe
-
phenotypes()¶ Returns the available phenotypes for analysis
-
predict_phenotype(trait)¶ Shortcut function to call predict_phenotype(trait) for every individual in the population.
Returns: nothing
-
sex_ratio()¶ Returns the sex ratio in the population, defined as n_male / n_female
Returns: sex ratio Return type: float
-
pydigree.paths module¶
Functions for finding paths through pedigrees and genealogies
-
pydigree.paths.common_ancestors(ind1, ind2)¶ Common ancestors of ind1 and ind2.
Recursively searches ancestors for both individuals, and then performs a set intersection on each set of ancestors
Parameters: - ind1 (Individual) – the first individual
- ind2 (Individual) – the second individual
Returns: Common ancestors
Return type: set
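The "collect ancestors, then intersect" approach can be sketched on a toy pedigree. The dict-based pedigree and the function names are hypothetical stand-ins for Individual objects.

```python
def ancestors_sketch(label, parents):
    """Recursively collect every known ancestor of label."""
    found = set()
    for parent in parents.get(label, ()):
        if parent is not None:
            found.add(parent)
            found |= ancestors_sketch(parent, parents)
    return found


def common_ancestors_sketch(label1, label2, parents):
    """Ancestors shared by both individuals (set intersection)."""
    return ancestors_sketch(label1, parents) & ancestors_sketch(label2, parents)


# label -> (father, mother); founders are simply absent from the dict
cousins_pedigree = {
    "dad": ("gpa", "gma"),
    "uncle": ("gpa", "gma"),
    "child": ("dad", "mom"),
    "cousin": ("uncle", "aunt"),
}
print(sorted(common_ancestors_sketch("child", "cousin", cousins_pedigree)))
# ['gma', 'gpa']
```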
-
pydigree.paths.fraternity(ind1, ind2)¶ The coefficient of fraternity is the probability that both alleles in a pair of individuals are IBD. This is equal to: k(x_m,y_m) * k(x_f,y_f) + k(x_m,y_f) * k(x_f,y_m), where x_m represents the mother of x, etc. and k(x,y) represents the Malecot kinship between x and y.
Parameters: - ind1 (Individual) – the first individual
- ind2 (Individual) – the second individual
Returns: coefficient of fraternity
Return type: float
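The fraternity formula can be checked on full siblings. The sketch below takes the kinship function as an argument, and the toy kinship used here (0.5 for an individual with itself, 0 between unrelated, non-inbred parents) is an assumption made for the example.

```python
def fraternity_sketch(x, y, parents, kinship):
    """k(x_m,y_m) * k(x_f,y_f) + k(x_m,y_f) * k(x_f,y_m)."""
    x_f, x_m = parents[x]
    y_f, y_m = parents[y]
    return (kinship(x_m, y_m) * kinship(x_f, y_f)
            + kinship(x_m, y_f) * kinship(x_f, y_m))


def toy_kinship(a, b):
    """Unrelated, non-inbred parents: self-kinship 1/2, otherwise 0."""
    return 0.5 if a == b else 0.0


sib_parents = {"sib1": ("dad", "mom"), "sib2": ("dad", "mom")}
print(fraternity_sketch("sib1", "sib2", sib_parents, toy_kinship))  # 0.25
```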
-
pydigree.paths.kinship(ind1, ind2)¶ Returns the Malecot kinship coefficient for ind1 and ind2, calculated by path counting. The kinship coefficient is the probability that a randomly selected pair of alleles (one from each individual) is IBD.
This quantity is calculated by finding the common ancestors of ind1 and ind2. For each ancestor, the paths between ind1 and ind2 are found.
The kinship coefficient is calculated as the sum for every ancestor of the sum of ((1/2) ** N) * (1+ F) for each path, where N is the number of individuals in the path and F is the inbreeding coefficient for that ancestor.
Parameters: - ind1 (Individual) – the first individual
- ind2 (Individual) – the second individual
Returns: Malecot’s coefficient of coancestry
Return type: float
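The path-counting sum described above can be sketched once the paths are known. Here the paths are supplied by hand as lists of labels grouped by common ancestor; the input layout and the function name are hypothetical.

```python
def kinship_from_paths(paths_by_ancestor, inbreeding):
    """Sum (1/2)**N * (1 + F) over every path through every common ancestor."""
    total = 0.0
    for ancestor, ancestor_paths in paths_by_ancestor.items():
        f = inbreeding.get(ancestor, 0.0)  # F of the common ancestor
        for path in ancestor_paths:
            total += 0.5 ** len(path) * (1 + f)  # N = individuals in the path
    return total


# Full siblings: one three-person path through each non-inbred parent
sib_paths = {
    "dad": [["sib1", "dad", "sib2"]],
    "mom": [["sib1", "mom", "sib2"]],
}
print(kinship_from_paths(sib_paths, inbreeding={}))  # 0.25
```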
-
pydigree.paths.path_downward(start, end, path=None)¶ Returns a list of paths (if they exist) from an ancestor (start) to a descendant (end).
Parameters: - start – The individual at the start of the path
- end – The individual to be found
- path – a path to append to
Returns: A list of individuals, in path order
-
pydigree.paths.paths(ind1, ind2)¶ Returns a list of all valid paths through the pedigree connecting individual 1 and individual 2. A valid path can only go through an individual once, or it’s not a valid path.
This function consists of repeated calls to paths_through_ancestor for each common ancestor of ind1 and ind2
For pedigrees, where typically many kinship coefficients must be calculated simultaneously, kinships are calculated by this function. See notes on pedigree.kinship for more information.
Parameters: - ind1 (Individual) – the first individual
- ind2 (Individual) – the second individual
Returns: identified paths
Return type: list of lists of Individuals
-
pydigree.paths.paths_through_ancestor(ind1, ind2, ancestor)¶ Finds all paths through a genealogy between ind1 and ind2 that pass through a specific ancestor.
Parameters: - ind1 (Individual) – the first endpoint
- ind2 (Individual) – the other endpoint
- ancestor (Individual) – the ancestor to pass through
Returns: identified paths
Return type: list of lists of Individuals
pydigree.pedigree module¶
A collection of individuals with fixed relationships
-
class
pydigree.pedigree.Pedigree(label=None)¶ Bases:
pydigree.population.Population
A collection of individuals with fixed relationships
-
additive_relationship_matrix(ids=None)¶ Calculates an additive relationship matrix (the A matrix) for quantitative genetics.
A_ij = 2 * kinship(i,j) if i != j (see the notes on function ‘kinship’)
A_ij = 1 + inbreeding(i) if i == j (inbreeding(i) is equivalent to kinship(i.father,i.mother))
Parameters: ids – IDs of pedigree members to include in the matrix Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.
Returns: additive relationship matrix Return type: matrix
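The two rules for A can be sketched with plain nested lists. The kinship and inbreeding callables are passed in, and the toy values for a nuclear family are assumptions for the example; pydigree itself returns a matrix object.

```python
def additive_matrix_sketch(ids, kinship, inbreeding):
    """A_ij = 2 * kinship(i, j) off the diagonal, 1 + inbreeding(i) on it."""
    return [[1 + inbreeding(i) if i == j else 2 * kinship(i, j)
             for j in ids]
            for i in ids]


# Toy kinships for unrelated parents and their two children
toy_kin = {
    frozenset(("dad", "sib1")): 0.25, frozenset(("dad", "sib2")): 0.25,
    frozenset(("mom", "sib1")): 0.25, frozenset(("mom", "sib2")): 0.25,
    frozenset(("sib1", "sib2")): 0.25,
}
family_ids = sorted(["sib1", "dad", "sib2", "mom"])
a_matrix = additive_matrix_sketch(
    family_ids,
    kinship=lambda i, j: toy_kin.get(frozenset((i, j)), 0.0),
    inbreeding=lambda i: 0.0,
)
for row in a_matrix:
    print(row)
```

Rows and columns follow the sorted labels, mirroring the ordering note in the parameter description.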
-
bit_size()¶ Returns the bit size of the pedigree. The bitsize is defined as 2*n-f where n is the number of nonfounders and f is the number of founders. This represents the number of bits it takes to represent the inheritance vector in the Lander-Green algorithm.
Returns: bit size Return type: int
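As a quick worked example of the 2*n-f definition (bit_size_sketch is a hypothetical helper, not the Pedigree method):

```python
def bit_size_sketch(n_nonfounders, n_founders):
    """Bits in the Lander-Green inheritance vector: 2*n - f."""
    return 2 * n_nonfounders - n_founders


# Nuclear family: two founder parents and three children
print(bit_size_sketch(n_nonfounders=3, n_founders=2))  # 4
```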
-
dominance_relationship_matrix(ids=None)¶ Calculates the dominance genetic relationship matrix (the D matrix) for quantitative genetics.
D_ij = fraternity(i,j) if i != j
D_ij = 1 if i == j
Parameters: ids – IDs of pedigree members to include in the matrix Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.
Returns: dominance relationship matrix Return type: matrix
-
fraternity(id1, id2)¶ Like Pedigree.kinship, this is a convenience function for getting fraternity coefficients for two pedigree members by their ID label.
This is a wrapper for paths.fraternity
Parameters: - id1 – the label of an individual to be evaluated
- id2 – the label of an individual to be evaluated
Returns: coefficient of fraternity
Return type: float
-
inbreeding(indlab)¶ Like Pedigree.kinship, this is a convenience function for getting inbreeding coefficients for individuals in pedigrees by their id label. As inbreeding coefficients are the kinship coefficient of the parents, this function calls Pedigree.kinship to check for stored values.
Parameters: indlab – the label of the individual to be evaluated Returns: inbreeding coefficient Return type: float
-
kinship(id1, id2)¶ Get the Malecot coefficient of coancestry for two individuals in the pedigree. These are calculated recursively. For pedigree objects, results are stored to reduce the calculation time for kinship matrices.
Parameters: - id1 – the label of an individual to be evaluated
- id2 – the label of an individual to be evaluated
Returns: Malecot’s coefficient of coancestry
Return type: float
Reference: Lange. Mathematical and Statistical Methods for Genetic Analysis. 1997. Springer.
-
mitochondrial_relationship_matrix(ids=None)¶ Calculates the mitochondrial relationship matrix. M_ij = 1 if matriline(i) == matriline(j)
Parameters: ids – IDs of pedigree members to include in the matrix Important: if not given, the rows/columns are all individuals in the pedigree, sorted by id. If you’re not sure about this, try sorted(x.label for x in ped) to see the ordering.
Returns: A numpy matrix
Reference: Liu et al. “Association Testing of the Mitochondrial Genome Using Pedigree Data”. Genetic Epidemiology. (2013). 37,3:239-247
-
simulate_ibd_states(inds=None)¶ Simulate IBD patterns by gene dropping: Everyone’s genotypes reflect the founder chromosome that they received the genotype from. You can then use the ibs function to determine IBD state. This is effectively an infinite-alleles simulation.
Returns: Nothing
-
pydigree.pedigreecollection module¶
-
class
pydigree.pedigreecollection.PedigreeCollection(peds=None)¶ Bases:
pydigree.individualcontainer.IndividualContainer
-
add_chromosome(chrom)¶
-
add_pedigree(ped)¶
-
additive_relationship_matrix(ids=None)¶ Returns a block diagonal matrix of additive relationships for each pedigree.
See notes on Pedigree.additive_relationship_matrix
-
chromosomes¶
-
dominance_relationship_matrix(ids=None)¶ Returns a block diagonal matrix of dominance relationships for each pedigree.
See notes on Pedigree.dominance_relationship_matrix
-
individuals¶ Returns a list of the individuals represented by all pedigrees, sorted by pedigree label, id label
-
keys()¶
-
mitochondrial_relationship_matrix(ids=None)¶ Returns a block diagonal matrix of mitochondrial relationships for each pedigree.
See notes on Pedigree.mitochondrial_relationship_matrix
-
pedigrees¶ Returns a list of the pedigree objects contained in the collection
-
update(pop)¶
-
pydigree.phenotypes module¶
A phenotype holder
-
class
pydigree.phenotypes.Phenotypes(data=None)¶ Bases:
object
A container for the set of phenotypes and exposures associated with an Individual in a population
-
clear()¶ Clears all phenotypes from the object
-
delete_phenotype(key)¶ Clears the phenotype from the object
-
get(key, default)¶ Gets a phenotype or default value if not present
-
has_phenotype(key)¶ Does the individual have a certain phenotype?
Parameters: key (string) – what phenotype are we looking for Returns: True if phenotype present and not None Return type: bool
-
items()¶ Iterate over phenotype name, value pairs
-
keys()¶ Iterate over the available phenotype names
-
to_series()¶ Returns phenotypes as a pandas Series
-
update(other)¶ Updates the current Phenotype object with data from the other
-
values()¶ Iterate over the available phenotype values
-
pydigree.population module¶
-
class
pydigree.population.Population(intial_pop_size=0, name=None)¶ Bases:
pydigree.individualcontainer.IndividualContainer
-
add_chromosome(chrom)¶ Adds a chromosome to the population
-
add_founders(n)¶ Adds a number of founder individuals to the population
Parameters: n (int) – number of individuals to add Return type: void
-
advance_generation(gensize, mating=None)¶ Simulates a generation of random mating.
Parameters: - gensize (numeric) – The size of the new generation
- mating (MatingScheme) – MatingScheme for the generation
-
chromosome_count()¶ Returns the number of chromosomes in this population
-
founder_individual(register=True, sex=None)¶ Creates a new founder individual and adds to the population
-
get_founder_genotypes()¶ Gives genotypes to each founder in the population with chromosomes from the chromosome pool. If there is no pool, genotypes are generated under linkage equilibrium
-
get_genotypes()¶ Causes each Individual object in the pedigree to request genotypes from its parents
-
get_linkage_equilibrium_genotypes()¶ Returns a set of genotypes for an individual in linkage equilibrium
-
individuals¶ Returns a list of individuals in the population
-
mate(ind1, ind2, indlab, sex=None)¶ Creates an individual as the child of two specified individual objects, randomly choosing a sex if none is given.
Parameters: - ind1 (Individual) – The first parent
- ind2 (Individual) – The second parent
- indlab – ID label for the child
- sex ({0,1}) – Sex of child, randomly chosen if not specified
Returns: An individual with ind1 and ind2 as parents
Return type: Individual
-
register_individual(ind)¶ Adds an individual to the population
-
remove_ancestry()¶ Makes every individual in the population a founder
-
remove_individual(ind)¶ Removes an individual from the population
-
size()¶ Returns the number of individuals in the population.
-
update(other)¶ Merges two datasets (i.e. performs Individual.update for each individual in the pedigree)
Assumes unique individual IDs
Parameters: other (Population) – New data to merge in Returns: void
-
-
pydigree.population.exponential_growth(p, r, t)¶ Models exponential growth over discrete generations.
Parameters: - p (numeric) – initial population
- r (numeric) – growth rate
- t (numeric) – number of generations
Returns: population size at time t
Return type: numeric
-
pydigree.population.is_missing_genotype(g)¶
-
pydigree.population.logistic_growth(p, r, k, t)¶ Models logistic growth over discrete generations.
Parameters: - p – initial population
- r – growth rate
- k – final population
- t – number of generations
Returns: population size at time t
Return type: numeric
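The exact formulas used by the two growth functions in this module are not stated here. The sketch below assumes the standard closed forms, p * exp(r * t) for exponential growth and k * p / (p + (k - p) * exp(-r * t)) for logistic growth; pydigree may use a different parameterization, and the function names are hypothetical.

```python
import math


def exponential_growth_sketch(p, r, t):
    """Population size after t generations of exponential growth."""
    return p * math.exp(r * t)


def logistic_growth_sketch(p, r, k, t):
    """Population size after t generations, leveling off at k."""
    return (k * p) / (p + (k - p) * math.exp(-r * t))


for generation in (0, 5, 10, 50):
    print(generation,
          round(exponential_growth_sketch(10, 0.5, generation)),
          round(logistic_growth_sketch(10, 0.5, 1000, generation)))
```

Under these assumed forms both curves start at p, and the logistic curve approaches k as t grows.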
pydigree.rand module¶
Convenience functions for randomness
-
pydigree.rand.choice(seq)¶ Randomly chooses an item from a sequence. Probabilities are uniform.
Parameters: seq – choices Returns: randomly chosen item
-
pydigree.rand.sample_with_replacement(seq, n)¶ Choose a sample of n items with replacement from a sequence
Parameters: - seq – sequence to choose from
- n – sample size
Returns: randomly chosen items
Return type: list
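Sampling with replacement amounts to n independent uniform draws. A minimal sketch using the standard library's random module (the function name and the rng parameter are hypothetical):

```python
import random


def sample_with_replacement_sketch(seq, n, rng=random):
    """Draw n items uniformly from seq; the same item may repeat."""
    return [rng.choice(seq) for _ in range(n)]


random.seed(0)  # seeding makes the draws reproducible
drawn = sample_with_replacement_sketch(["A", "C", "G", "T"], 10)
print(drawn)
```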
-
pydigree.rand.set_seed(seed)¶ Set the random seed.
Parameters: seed – random seed Return type: void
pydigree.recombination module¶
Functions for recombining haploid chromosomes
-
pydigree.recombination.recombine(chr1, chr2, genetic_map)¶ Takes two chromatids and returns a simulated one by an exponential process
Parameters: - chr1 (AlleleContainer) – first chrom
- chr2 (AlleleContainer) – second chrom
- genetic_map (sequence of floats) – map positions (in centiMorgans)
Returns: Recombined chromosome
Return type:
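One way to read "an exponential process" is that the distances between crossovers are exponentially distributed along the genetic map. The sketch below assumes a mean spacing of 100 cM (one Morgan) with no interference, and uses plain lists in place of AlleleContainer objects; these are illustrative assumptions, not a description of pydigree's implementation.

```python
import random


def recombine_sketch(chr1, chr2, genetic_map, rng=random):
    """Copy alleles from one chromatid, switching at random crossovers."""
    if not genetic_map:
        return []
    # Start copying from a randomly chosen chromatid
    current, other = (chr1, chr2) if rng.random() < 0.5 else (chr2, chr1)
    # Assumed: crossover spacing is exponential with mean 100 cM
    next_crossover = genetic_map[0] + rng.expovariate(1 / 100.0)
    result = []
    for index, position in enumerate(genetic_map):
        # Switch source chromatid once for every crossover passed
        while position >= next_crossover:
            current, other = other, current
            next_crossover += rng.expovariate(1 / 100.0)
        result.append(current[index])
    return result


random.seed(1)
positions_cm = [0.0, 50.0, 100.0, 150.0, 200.0]
gamete = recombine_sketch([0] * 5, [1] * 5, positions_cm)
print(gamete)
```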