pydigree.genotypes package

Submodules

pydigree.genotypes.alleles module

class pydigree.genotypes.alleles.Alleles

Bases: numpy.ndarray, pydigree.genotypes.genoabc.AlleleContainer

A class for holding genotypes

copy_span(template, copy_start, copy_stop)

Copies a span of another AlleleContainer to this one

Parameters:
  • template (AlleleContainer) – Container to copy from
  • copy_start (int) – start point for copy (inclusive)
  • copy_stop (int) – end point for copy (exclusive)
Return type:

void
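The inclusive start and exclusive stop follow the same half-open convention as Python slicing. The following is a minimal sketch on plain numpy arrays, not pydigree's implementation; `copy_span_sketch` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def copy_span_sketch(dest, template, copy_start, copy_stop):
    """Hypothetical helper: copy template[copy_start:copy_stop] into dest in place."""
    # Half-open interval: copy_start is included, copy_stop is not
    dest[copy_start:copy_stop] = template[copy_start:copy_stop]

dest = np.zeros(6, dtype=np.uint8)
template = np.arange(1, 7, dtype=np.uint8)  # [1 2 3 4 5 6]
copy_span_sketch(dest, template, 2, 4)
print(dest)  # [0 0 3 4 0 0]
```

Only indices 2 and 3 are overwritten; index 4 keeps its original value because the stop is exclusive.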

empty_like()

Returns an empty Alleles object like this one

missing

Returns a numpy array indicating which markers have missing data

Returns:missingness array
Return type:np.array
missingcode
nmark()

Return the number of markers represented by the Alleles object

Returns:number of markers
Return type:int

pydigree.genotypes.chromosometemplate module

Classes for storing population level genotype information

class pydigree.genotypes.chromosometemplate.ChromosomeSet

Bases: object

An object representing the full complement of variants in a population

add_chromosome(template)

Add a chromosome to the set.

Parameters:template (ChromosomeTemplate) – the chromosome to add
finalize()

Finalize each template in the set

frequency(chrom, variant)

Get the frequency of a variant

Parameters:
  • chrom (int) – index of chromosome
  • variant (int) – index of marker on the chromosome
marker_label(chrom, variant)

Get the label of a variant

Parameters:
  • chrom (int) – index of chromosome
  • variant (int) – index of marker on the chromosome
nchrom()

Returns the number of chromosomes in the set

nloci()

Returns the total number of variants in the set

physical_map(chrom, variant)

Get the physical position of a variant

Parameters:
  • chrom (int) – index of chromosome
  • variant (int) – index of marker on the chromosome
select_random_loci(nloc)

Chooses nloc random sites throughout the set of chromosomes

Parameters:nloc (int) – number of loci to select
Return type:generator of locations
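One way to picture choosing sites across several chromosomes is to sample from a flat index over all markers and map each draw back to a (chromosome, marker) pair. This is a sketch under stated assumptions, not the library's code: the function name, the `marker_counts` input, and sampling without replacement are all assumptions.

```python
import random

def select_random_loci_sketch(marker_counts, nloc, rng=None):
    """Hypothetical helper: yield nloc distinct (chromosome index, marker index) pairs.

    marker_counts is an assumed input: the number of markers on each chromosome.
    """
    rng = rng or random.Random()
    total = sum(marker_counts)
    # Sample flat positions without replacement (an assumption), in sorted order
    for flat in sorted(rng.sample(range(total), nloc)):
        # Walk the chromosomes until the flat position falls inside one
        for chrom, count in enumerate(marker_counts):
            if flat < count:
                yield chrom, flat
                break
            flat -= count

# Three chromosomes with 3, 5, and 2 markers; pick 4 loci
for chrom, marker in select_random_loci_sketch([3, 5, 2], 4, random.Random(0)):
    print(chrom, marker)
```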
class pydigree.genotypes.chromosometemplate.ChromosomeTemplate(label=None)

Bases: object

ChromosomeTemplate is a class that keeps track of marker frequencies and distances. It is not an actual chromosome with genotypes, which you would find under Individual.

Markers are currently diallelic, and frequencies are given for minor alleles. The allele frequencies at each marker must sum to 1, so the major allele frequency is f_major = 1 - f_minor.

linkageequilibrium_chromosome generates chromosomes by simulating all markers with complete independence (linkage equilibrium). This is not typically what you want: you won’t find any LD for association etc. linkageequilibrium_chromosome is used for ‘seed’ chromosomes when initializing a population pool or when simulating purely family-based studies for linkage analysis.
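To make "complete independence" concrete, the sketch below draws each marker on its own from its minor allele frequency, so no correlation (LD) between neighbouring markers is introduced. It uses plain numpy rather than pydigree; the function name and the allele coding (1 = major, 2 = minor) are assumptions made for illustration.

```python
import numpy as np

def linkage_equilibrium_sketch(minor_freqs, rng):
    """Hypothetical helper: draw one haploid chromosome, each marker independently.

    Allele coding is an assumption: 1 = major allele, 2 = minor allele.
    """
    minor_freqs = np.asarray(minor_freqs, dtype=float)
    # One uniform draw per marker; a marker carries the minor allele
    # with probability equal to its minor allele frequency
    is_minor = rng.random(minor_freqs.shape[0]) < minor_freqs
    return np.where(is_minor, 2, 1).astype(np.uint8)

rng = np.random.default_rng(42)
freqs = [0.1, 0.5, 0.25, 0.0]
chrom = linkage_equilibrium_sketch(freqs, rng)
print(chrom)
```

A marker with minor allele frequency 0.0 always receives the major allele, since a uniform draw from [0, 1) is never below 0.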

add_genotype(frequency=None, map_position=None, label=None, bp=None, reference=None, alternates=None)

Adds a variant to the chromosome

closest_marker(position, map_type='physical')

Returns the index of the closest marker to a position

Parameters:
  • position – desired location
  • map_type ('physical' or 'genetic') – distance metric to use
Returns:

closest index

Return type:

int
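A nearest-marker lookup can be sketched as an argmin over absolute distances on whichever map is selected. This is an illustration with plain numpy, not the library's implementation; the function name and the two position arrays are hypothetical.

```python
import numpy as np

def closest_marker_sketch(physical, genetic, position, map_type='physical'):
    """Hypothetical helper: index of the marker nearest to position on the chosen map."""
    if map_type == 'physical':
        positions = np.asarray(physical)
    elif map_type == 'genetic':
        positions = np.asarray(genetic)
    else:
        raise ValueError("map_type must be 'physical' or 'genetic'")
    # Smallest absolute distance wins; ties resolve to the lowest index
    return int(np.argmin(np.abs(positions - position)))

bp = [1000, 5000, 12000, 20000]   # physical positions (base pairs)
cm = [0.0, 0.4, 1.1, 2.3]         # genetic positions (centimorgans)
print(closest_marker_sketch(bp, cm, 11000))                   # 2
print(closest_marker_sketch(bp, cm, 0.5, map_type='genetic')) # 1
```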

empty_chromosome(dtype=<class 'numpy.uint8'>, sparse=False, refcode=None)

Produces a completely empty chromosome associated with this template.

Parameters:
  • sparse (bool) – Should a SparseAlleles object be returned
  • refcode (int8_t) – if sparse, what should the refcode be?
Returns:

empty alleles container

finalize()

Once no more major modifications (i.e. adding or removing sites) are expected, the ChromosomeTemplate can be reorganized into something more efficient: numpy arrays instead of lists, for example.

Return type:void
static from_genomesimla(filename)

Reads positions and frequencies from a genomeSIMLA template file

Parameters:filename (string) – path to the template
Returns:Template containing the data from the file
Return type:ChromosomeTemplate
iterinfo()

Iterator over genotype labels, cM position, bp position, MAF

linkageequilibrium_chromosome(sparse=False)

Returns a randomly generated chromosome in linkage equilibrium

Parameters:sparse (bool) – Should the output be sparse
Returns:random chromosome
Return type:Alleles or SparseAlleles
linkageequilibrium_chromosomes(nchrom)

Returns a numpy array of many randomly generated chromosomes

nmark()

Returns the number of markers on the chromosome

Returns:marker count
Return type:int
outputlabel

The label used when the template is written to disk

set_frequency(position, frequency)

Manually change an allele’s frequency

Parameters:
  • position (int) – Index to change
  • frequency (float) – new minor allele frequency
size()

Returns the size of the chromosome in centimorgans

Return type:float

pydigree.genotypes.genoabc module

class pydigree.genotypes.genoabc.AlleleContainer

Bases: object

A base class defining the interface that allele container objects must implement

copy_span(template, start, stop)
dtype()
empty_like()

pydigree.genotypes.labelledalleles module

class pydigree.genotypes.labelledalleles.AncestralAllele(anc, hap)

Bases: object

ancestor
haplotype
class pydigree.genotypes.labelledalleles.InheritanceSpan(ancestor, chromosomeidx, haplotype, start, stop)

Bases: object

ancestor
ancestral_allele
ancestral_chromosome
chromosomeidx
contains(index)

Returns true if the index specified falls within this span

haplotype
interval
start
stop
to_tuple()
class pydigree.genotypes.labelledalleles.LabelledAlleles(spans=None, chromobj=None, nmark=None)

Bases: pydigree.genotypes.genoabc.AlleleContainer

add_span(new_span)
copy_span(template, copy_start, copy_stop)
delabel()
dtype
empty_like()
static founder_chromosome(ind, chromidx, hap, chromobj=None, nmark=None)

pydigree.genotypes.sparsealleles module

class pydigree.genotypes.sparsealleles.SparseAlleles(data=None, refcode=0, size=None, template=None)

Bases: pydigree.genotypes.genoabc.AlleleContainer

An object representing a set of haploid genotypes efficiently by storing allele differences from a reference. Useful for manipulating genotypes from sequence data (e.g. VCF files)

In the interest of conserving memory for sequencing data, all alleles must be represented by a signed 8-bit integer (i.e. between -128 and 127). Negative values are interpreted as missing.
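The idea of storing only differences from a reference can be sketched with a dictionary keyed by marker index. The class below is a toy illustration, not SparseAlleles itself: the class name, its `set` method, and its internal layout are assumptions. It follows the two rules stated above: alleles fit in a signed 8-bit integer, and negative values mean missing.

```python
import numpy as np

class SparseSketch:
    """Toy sparse haploid genotype store (hypothetical, not pydigree's class)."""

    def __init__(self, size, refcode=0):
        self.size = size
        self.refcode = refcode
        self.non_ref = {}  # marker index -> allele code differing from refcode

    def set(self, index, allele):
        if not -128 <= allele <= 127:
            raise ValueError("allele must fit in a signed 8-bit integer")
        if allele == self.refcode:
            self.non_ref.pop(index, None)  # reference alleles are not stored
        else:
            self.non_ref[index] = allele

    def todense(self):
        # Start from all-reference, then fill in the stored differences
        dense = np.full(self.size, self.refcode, dtype=np.int8)
        for index, allele in self.non_ref.items():
            dense[index] = allele
        return dense

    def missing(self):
        # Negative codes are interpreted as missing
        return self.todense() < 0

s = SparseSketch(5)
s.set(1, 2)    # a non-reference allele
s.set(3, -1)   # a missing genotype
print(s.todense())  # [ 0  2  0 -1  0]
print(s.missing())  # [False False False  True False]
```

Memory use scales with the number of non-reference sites rather than the total number of markers, which is the point of the sparse representation for sequence data.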

copy()

Creates a copy of the current data

Returns:cloned allele set
Return type:SparseAlleles
copy_span(template, copy_start, copy_stop)

Copies one segment of a chromosome over to the other

Parameters:
  • template (AlleleContainer) – the data to be copied from
  • copy_start (int) – where to start copying (inclusive)
  • copy_stop (int) – where to stop copying (exclusive)
Return type:void
dtype
static empty(template)

Creates an empty SparseAlleles (everybody is wild-type)

Parameters:template (ChromosomeTemplate) – The chromosome info associated with this set of alleles
Returns:Empty container
Return type:SparseAlleles
empty_like()

Creates a blank SparseAlleles with same parameters

Returns:empty SparseAlleles
keys()
missing

Returns a numpy array indicating which markers have missing data

missingcode

Returns the code used for missing values

nmark()

Return the number of markers (both reference and non-reference) represented by the SparseAlleles object

Returns:marker count
Return type:int
refcode

Returns the reference code (the sparse value) of the container

Return type:int8_t
todense()

Converts to a dense representation of the same genotypes (Alleles).

Returns:dense version
Return type:Alleles
values()

Module contents