pydigree.cydigree package

Submodules

pydigree.cydigree.cyfuncs module

class pydigree.cydigree.cyfuncs.Segment

Bases: object

marker_labels
nmark
physical_position
physical_start
physical_stop
start
stop
pydigree.cydigree.cyfuncs.all_same_type()

Quickly checks if all items in iterable are the same type

Parameters:
  • iter – sequence to be checked
  • t – type desired
Returns:

items are all same type

Return type:

bool
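
The check described above can be sketched in pure Python. This is only an illustration of the documented behaviour, not the compiled implementation; the name all_same_type_sketch is hypothetical, and comparing exact types (rather than using isinstance) is an assumption.

```python
def all_same_type_sketch(items, t):
    """Return True when every element of `items` has type `t`."""
    # Assumption: exact type match. isinstance() would also accept subclasses.
    return all(type(x) is t for x in items)


print(all_same_type_sketch([1, 2, 3], int))    # True
print(all_same_type_sketch([1, 2.0, 3], int))  # False
```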

pydigree.cydigree.cyfuncs.fastfirstitem()

Rapidly gets the first item from each element in an iterable of iterables
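
A rough pure-Python equivalent is shown below. The name first_items_sketch is hypothetical, and returning a list and requiring each element to support indexing are assumptions, since the signature and return type are not documented here.

```python
def first_items_sketch(rows):
    """Collect element 0 of every indexable item in `rows`."""
    # Assumption: each row supports row[0] and the result is a list.
    return [row[0] for row in rows]


pairs = [(1, "a"), (2, "b"), (3, "c")]
print(first_items_sketch(pairs))  # [1, 2, 3]
```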

pydigree.cydigree.cyfuncs.ibs()

Returns how many alleles (0, 1, or 2) are identical-by-state between two diploid genotypes

Parameters:
  • g1 (tuple) – genotype 1
  • g2 (tuple) – genotype 2
  • missingval – value if either g1 or g2 is missing
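
The counting rule can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the library's code: the name ibs_sketch is hypothetical, and a missing allele is assumed to be encoded as None, which may differ from the encoding pydigree actually uses.

```python
def ibs_sketch(g1, g2, missingval=None):
    """Count alleles shared identical-by-state between two diploid genotypes."""
    a, b = g1
    c, d = g2
    # Assumption: None marks a missing allele.
    if None in (a, b, c, d):
        return missingval
    # Both alleles match, in either phase order.
    if (a == c and b == d) or (a == d and b == c):
        return 2
    # At least one allele is shared.
    if a in (c, d) or b in (c, d):
        return 1
    return 0


print(ibs_sketch((1, 2), (2, 1)))  # 2
print(ibs_sketch((1, 1), (1, 2)))  # 1
print(ibs_sketch((1, 1), (2, 2)))  # 0
```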
pydigree.cydigree.cyfuncs.interleave()

Takes two lists and interleaves them. For example, interleave("AAA", "BBB") gives ["A", "B", "A", "B", "A", "B"]

Returns:interleaved iterables
Return type:list
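
A minimal pure-Python sketch of the same behaviour follows. The name interleave_sketch is hypothetical, and the handling of unequal lengths (extra items are dropped) is an assumption, since the documentation above only shows equal-length inputs.

```python
def interleave_sketch(a, b):
    """Alternate items from `a` and `b`, starting with `a`."""
    out = []
    # Assumption: inputs have equal length; zip() silently drops any excess.
    for x, y in zip(a, b):
        out.append(x)
        out.append(y)
    return out


print(interleave_sketch("AAA", "BBB"))  # ['A', 'B', 'A', 'B', 'A', 'B']
```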
pydigree.cydigree.cyfuncs.is_sorted()

Check if the sequence is sorted

Returns:sorted?
Return type:bool
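
One way to express this check in pure Python is shown below. The name is_sorted_sketch is hypothetical, and treating "sorted" as non-decreasing order (ties allowed) is an assumption.

```python
def is_sorted_sketch(seq):
    """Return True when every item is <= the item that follows it."""
    # Assumption: non-decreasing order counts as sorted, so ties are allowed.
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))


print(is_sorted_sketch([1, 2, 2, 5]))  # True
print(is_sorted_sketch([1, 3, 2]))     # False
```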
pydigree.cydigree.cyfuncs.runs()

Identifies runs of values in a sequence for which predicate(value) evaluates True and yields 2-tuples of the start and end (inclusive) indices

Parameters:
  • sequence (iterable) – Sequence to run through
  • predicate (callable) – function to call
  • minlength (int) – shortest allowable run
Returns:

Runs

Return type:

list of tuples
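
The run-finding logic can be sketched in pure Python as below. This is not the compiled implementation: the name runs_sketch and the default minlength of 2 are assumptions, and the sketch returns a list rather than yielding.

```python
def runs_sketch(sequence, predicate, minlength=2):
    """Return (start, end) index pairs, end inclusive, of runs where predicate holds."""
    found = []
    start = None   # index where the current run began, or None if not in a run
    last = -1      # index of the most recent item seen
    for i, value in enumerate(sequence):
        last = i
        if predicate(value):
            if start is None:
                start = i
        elif start is not None:
            # The run ended at i - 1; keep it only if it is long enough.
            if i - start >= minlength:
                found.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and last - start + 1 >= minlength:
        found.append((start, last))
    return found


print(runs_sketch([0, 1, 1, 1, 0, 1, 0, 1, 1], bool, minlength=2))
# [(1, 3), (7, 8)]
```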

pydigree.cydigree.cyfuncs.runs_gte()

Identifies runs of values in an iterable where each value is greater than or equal to a value minval, and returns a list of 2-tuples with the start and end (inclusive) indices of the runs

Parameters:
  • sequence
  • minval – minimum value to occur in run
  • minlength – minimum allowable runlength
Returns:

runs

Return type:

list of tuples
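
The same idea can be sketched with itertools.groupby from the standard library. This is a swapped-in technique for illustration, not the compiled implementation; the name runs_gte_sketch and the default minlength of 1 are assumptions.

```python
from itertools import groupby


def runs_gte_sketch(sequence, minval, minlength=1):
    """Return (start, end) pairs, end inclusive, of runs with values >= minval."""
    found = []
    index = 0  # position of the first item in the current group
    for is_high, group in groupby(sequence, key=lambda v: v >= minval):
        length = sum(1 for _ in group)
        if is_high and length >= minlength:
            found.append((index, index + length - 1))
        index += length
    return found


print(runs_gte_sketch([0, 2, 2, 1, 2, 0, 2, 2, 2], minval=2, minlength=2))
# [(1, 2), (6, 8)]
```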

pydigree.cydigree.cyfuncs.runs_gte_uint8()
pydigree.cydigree.cyfuncs.set_intervals_to_value()

Creates a numpy integer array and sets intervals to a single value

Parameters:
  • intervals (iterable of 2-tuples) – Intervals tuples in format (start_idx, stop_idx_inclusive)
  • size (unsigned int) – outgoing array size
  • value (np.int) – value to set intervals to
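
The interval convention (both endpoints inclusive) can be illustrated with a plain Python list standing in for the numpy array. The name set_intervals_sketch is hypothetical, and the background value of 0 for positions outside every interval is an assumption that the documentation above does not confirm.

```python
def set_intervals_sketch(intervals, size, value):
    """Build a list of length `size` with `value` written over each interval."""
    # Assumption: positions outside every interval hold 0.
    out = [0] * size
    for start, stop in intervals:
        # stop is inclusive, so the range runs to stop + 1.
        for i in range(start, stop + 1):
            out[i] = value
    return out


print(set_intervals_sketch([(1, 2), (4, 4)], size=6, value=7))
# [0, 7, 7, 0, 7, 0]
```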

pydigree.cydigree.datastructures module

class pydigree.cydigree.datastructures.IntTree

Bases: object

clear()

Removes all nodes from tree

delete()
delrange()

Deletes keys where start <= key < stop

empty()
find()
static from_keys()
static from_pairs()
get()
getrange()
insert()
intersection()
keys()
size()
to_stack()
traverse()
union()
values()
verify()
class pydigree.cydigree.datastructures.NodeStack

Bases: object

class pydigree.cydigree.datastructures.SparseArray

Bases: object

all()
any()
container
copy()
static from_dense()
static from_items()
static from_numpy()
indices
items()
keys()
logical_not()
refcode
set_item()
size
sparsity()

Returns the proportion of sparse sites in the array

tolist()
values()
pydigree.cydigree.datastructures.print_sizes()

pydigree.cydigree.sparsearray module

class pydigree.cydigree.sparsearray.SparseArray

Bases: object

A data structure for working with sparse sets of small ints. Can support an array of size \(2^{32}-1\).

Dense values are stored in a self-balancing tree, so lookups, setting a dense value, or changing a dense value to sparse will have slower algorithmic performance (O(log n) instead of O(1)). The bookkeeping of the tree will also incur some penalties in memory use. For each non-sparse value, a uint32_t is used for the key (4 bytes) and an int8_t (1 byte) for the value.

Variables:
  • size – the size of the array
  • ref – the sparse value
  • data – the non-sparse positions values
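
The relationship between size, ref and data can be sketched with a small pure-Python class. This is an illustration only: it uses a dict where the real class uses a self-balancing tree, so it does not reproduce the O(log n) behaviour, and the class name SparseArraySketch and its method bodies are assumptions.

```python
class SparseArraySketch:
    """Toy sparse array: only values that differ from `ref` are stored."""

    def __init__(self, size, ref=0):
        self.size = size  # logical length of the array
        self.ref = ref    # the sparse (background) value
        self.data = {}    # index -> non-sparse value (a dict, not a tree)

    def get_item(self, k):
        return self.data.get(k, self.ref)

    def set_item(self, k, v):
        if not 0 <= k < self.size:
            raise IndexError(k)
        if v == self.ref:
            # Writing the sparse value frees the slot instead of storing it.
            self.data.pop(k, None)
        else:
            self.data[k] = v

    def ndense(self):
        return len(self.data)

    def tolist(self):
        return [self.get_item(i) for i in range(self.size)]


arr = SparseArraySketch(10, ref=0)
arr.set_item(3, 1)
arr.set_item(7, 2)
print(arr.tolist())  # [0, 0, 0, 1, 0, 0, 0, 2, 0, 0]
print(arr.ndense())  # 2
```

Only two entries are stored for the ten-element array, which is the point of the structure: memory scales with the number of non-sparse sites rather than with size.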
all()
Returns:whether all values are nondense
Return type:bool
any()

Are there any non-sparse values?

clear()

Removes a non-sparse value

Parameters:k (uint32_t) – the key to remove
clear_range()

Removes all non-sparse values in a region

Parameters:
  • start (uint32_t) – start of the region (inclusive)
  • stop (uint32_t) – the end of the region (exclusive)
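
The half-open convention (start included, stop excluded) is illustrated below on a plain dict of non-sparse entries. The function name clear_range_sketch and the dict representation are assumptions made for the example.

```python
def clear_range_sketch(data, start, stop):
    """Delete every key k in `data` with start <= k < stop, in place."""
    # Collect the keys first so the dict is not mutated while iterating.
    for k in [k for k in data if start <= k < stop]:
        del data[k]


nonsparse = {2: 1, 3: 1, 5: 2, 8: 1}
clear_range_sketch(nonsparse, 3, 8)
print(nonsparse)  # {2: 1, 8: 1}
```

Index 3 is removed because start is inclusive, while index 8 survives because stop is exclusive.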
cmp_single()
copy()

Creates a copy of the array.

Returns:the copy
Return type:SparseArray
data
dense_cmp()
density()

Proportion of non-sparse sites

Returns:Proportion non-sparse
Return type:float
eq_single()
static from_dense()

Creates a SparseArray from a dense sequence

Returns:resulting array
Return type:SparseArray
static from_items()

Creates a SparseArray from pairs of items

Parameters:
  • seq – A sequence of pairs of type (uint32_t, int8_t)
  • size (uint32_t) – the size of the array
  • refcode (int8_t) – the sparse value of the array
Returns:

the resulting array

Return type:

SparseArray
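
How the three parameters fit together can be shown by expanding them into the dense list they describe. This sketch does not build a SparseArray; the name dense_from_items_sketch is hypothetical and the default refcode of 0 is an assumption.

```python
def dense_from_items_sketch(seq, size, refcode=0):
    """Expand (index, value) pairs into a dense list of length `size`."""
    # Every position starts at the sparse value.
    dense = [refcode] * size
    for index, value in seq:
        dense[index] = value
    return dense


print(dense_from_items_sketch([(1, 5), (4, -1)], size=6, refcode=0))
# [0, 5, 0, 0, -1, 0]
```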

get_item()
get_slice()
items()

Gets the non-sparse indices and their values

Returns:non-sparse locations and values
Return type:list of (uint32_t, int8_t) tuples
keys()

Gets the non-sparse locations

Returns:locations of the non-sparse values
Return type:list
logical_not()

Performs a logical not on the entire array

Returns:the not-ed array
ndense()

The number of non-sparse sites in the array

Returns:number of non-sparse items
Return type:int
ref
set_item()
size
sparse_cmp()
sparse_eq()
sparsity()

Proportion of array that is sparse

Returns:Proportion sparse
Return type:float
tolist()

Returns the SparseArray in a dense format

Return type:list in Python, std::vector<uint8_t> in C++

values()

Gets the non-sparse values

Returns:non-sparse values, in order
Return type:list

pydigree.cydigree.varianttree module

pydigree.cydigree.vcfparse module

class pydigree.cydigree.vcfparse.VariantStack

Bases: object

tolist()
pydigree.cydigree.vcfparse.assign_genorow()
pydigree.cydigree.vcfparse.vcf_allele_parser()

Module contents