Basic classes for working with genomic data (genometools.basic)

GeneSet – A gene set.
GeneSetCollection – A collection of gene sets.
class genometools.basic.GeneSet(id, name, genes, source=None, collection=None, description=None)[source]

A gene set.

A gene set is just what the name implies: a set of genes. Gene sets are usually used to group genes that share a certain property (e.g., genes that perform related functions, or genes that are frequently co-expressed). The genes in a gene set are not ordered.

GeneSet instances are hashable and should therefore be considered to be immutable.

Parameters:
  • id (str) – See id attribute.
  • name (str) – See name attribute.
  • genes (set, list or tuple of str) – See genes attribute.
  • source (str, optional) – See source attribute. (None)
  • collection (str, optional) – See collection attribute. (None)
  • description (str, optional) – See description attribute. (None)
id_

str – The (unique) ID of the gene set.

name

str – The name of the gene set.

genes

set of str – The set of genes in the gene set.

source

None or str – The source / origin of the gene set (e.g., “MSigDB”).

collection

None or str – The collection that the gene set belongs to (e.g., “c4” for gene sets from MSigDB).

description

None or str – The description of the gene set.

classmethod from_list(l)[source]

Generate a GeneSet object from a list of strings.

Note: See also to_list().

Parameters:l (list or tuple of str) – A list of strings representing the gene set ID, name, genes, source, collection, and description, in that order. The genes must be given as a single comma-separated string.
Returns:The gene set.
Return type:genometools.basic.GeneSet
hash

MD5 hash value for the gene set.

size

The size of the gene set (i.e., the number of genes in it).

to_list()[source]

Converts the GeneSet object to a flat list of strings.

Note: See also from_list().

Returns:The data from the GeneSet object as a flat list.
Return type:list of str
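
The flat-list layout described for from_list() and to_list() can be illustrated with a small stand-in. This is only a sketch: SketchGeneSet, to_flat_list, and from_flat_list are hypothetical names, not part of genometools, and two details are assumptions rather than documented behavior (genes are joined in sorted order, and None is stored as an empty string).

```python
from collections import namedtuple

# Hypothetical stand-in for GeneSet; NOT the genometools class.
SketchGeneSet = namedtuple(
    "SketchGeneSet",
    ["id", "name", "genes", "source", "collection", "description"],
)


def to_flat_list(gs):
    """Flatten a gene set into six strings (assumed layout)."""
    return [
        gs.id,
        gs.name,
        ",".join(sorted(gs.genes)),  # assumption: sorted for a stable order
        gs.source or "",             # assumption: None becomes ""
        gs.collection or "",
        gs.description or "",
    ]


def from_flat_list(fields):
    """Rebuild a gene set from the six-string layout used above."""
    id_, name, genes, source, collection, description = fields
    return SketchGeneSet(
        id_,
        name,
        frozenset(g for g in genes.split(",") if g),  # drop empty entries
        source or None,
        collection or None,
        description or None,
    )


gs = SketchGeneSet(
    "GS1", "Example set", frozenset({"TP53", "BRCA1"}), "MSigDB", "c4", None
)
flat = to_flat_list(gs)
restored = from_flat_list(flat)
```

Under these assumptions the two functions are inverses of each other, which is the property that makes a flat list of strings a convenient intermediate form for tab-delimited files.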
class genometools.basic.GeneSetCollection(gene_sets)[source]

A collection of gene sets.

This class holds an ordered list of gene sets and supports several ways of accessing individual gene sets. Because the gene sets are ordered, each gene set has a unique position (index) in the database.

Parameters:gene_sets (list or tuple of GeneSet) – See gene_sets attribute.
gene_sets

tuple of GeneSet – The gene sets in the database, in order. Note that this is a read-only property.

get_by_id(id_)[source]

Look up a gene set by its ID.

Parameters:id_ (str) – The ID of the gene set.
Returns:The gene set.
Return type:GeneSet
Raises:ValueError – If the given ID is not in the database.
get_by_index(i)[source]

Look up a gene set by its index.

Parameters:i (int) – The index of the gene set.
Returns:The gene set.
Return type:GeneSet
Raises:ValueError – If the given index is out of bounds.
index(id_)[source]

Get the index corresponding to a gene set, identified by its ID.

Parameters:id_ (str) – The ID of the gene set.
Returns:The index of the gene set.
Return type:int
Raises:ValueError – If the given ID is not in the database.
n

The number of gene sets in the database.
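
The lookup behavior documented for get_by_id(), get_by_index(), index(), and n can be sketched with a minimal stand-in. SketchCollection is a hypothetical class, not the genometools implementation; gene sets are represented here as plain (id, genes) tuples, and rejecting negative indices is an assumption.

```python
class SketchCollection:
    """Hypothetical stand-in for an ordered gene set collection."""

    def __init__(self, gene_sets):
        # Store the gene sets as a tuple so their order is fixed.
        self._gene_sets = tuple(gene_sets)
        # Map each ID to its position so ID lookups avoid a linear scan.
        self._positions = {
            gs_id: i for i, (gs_id, _genes) in enumerate(self._gene_sets)
        }

    @property
    def n(self):
        """The number of gene sets."""
        return len(self._gene_sets)

    def index(self, id_):
        """Return the position of the gene set with the given ID."""
        try:
            return self._positions[id_]
        except KeyError:
            raise ValueError("No gene set with ID %r." % (id_,)) from None

    def get_by_index(self, i):
        """Return the gene set at position i."""
        # Assumption: negative indices count as out of bounds.
        if not 0 <= i < self.n:
            raise ValueError("Index %d is out of bounds." % i)
        return self._gene_sets[i]

    def get_by_id(self, id_):
        """Return the gene set with the given ID."""
        return self._gene_sets[self.index(id_)]


coll = SketchCollection([("GS1", {"TP53", "BRCA1"}), ("GS2", {"EGFR"})])
second = coll.get_by_index(coll.index("GS2"))
```

Keeping a dictionary from ID to position alongside the ordered tuple is one simple way to support both access patterns; the real class may organize its data differently.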

classmethod read_msigdb_xml(path, entrez2gene, species=None)[source]

Read the complete MSigDB database from an XML file.

The XML file can be downloaded from here: http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/msigdb_v5.0.xml

Parameters:
  • path (str) – The path name of the XML file.
  • entrez2gene (dict or OrderedDict (str: str)) – A dictionary mapping Entrez Gene IDs to gene symbols (names).
  • species (str, optional) – A species name (e.g., “Homo_sapiens”). Only gene sets for that species will be retained. (None)
Returns:The gene set database containing the MSigDB gene sets.
Return type:GeneSetCollection
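
As an illustration of the entrez2gene parameter, the sketch below builds a small mapping from Entrez Gene IDs to gene symbols and applies it to a list of IDs. The helper map_members is hypothetical, and dropping IDs that are missing from the mapping is an assumption; how read_msigdb_xml() actually treats unmapped IDs is not stated here.

```python
# Entrez Gene IDs (as strings) mapped to gene symbols.
entrez2gene = {
    "7157": "TP53",
    "672": "BRCA1",
    "1956": "EGFR",
}


def map_members(entrez_ids, entrez2gene):
    """Translate Entrez Gene IDs to gene symbols (hypothetical helper).

    Assumption: IDs without an entry in the mapping are silently dropped.
    """
    return {entrez2gene[e] for e in entrez_ids if e in entrez2gene}


# "0000" is a made-up ID that has no entry in the mapping.
genes = map_members(["7157", "672", "0000"], entrez2gene)
```

Note that the keys are strings, matching the documented (str: str) type of the dictionary.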

classmethod read_tsv(path, encoding=u'utf-8')[source]

Read a gene set database from a tab-delimited text file.

Parameters:
  • path (str) – The path name of the file.
  • encoding (str) – The encoding of the text file.
Returns:
Return type:None

write_tsv(path)[source]

Write the database to a tab-delimited text file.

Parameters:path (str) – The path name of the file.
Returns:
Return type:None
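
The tab-delimited round trip can be sketched with the standard csv module alone. The file layout used here (one gene set per row, six columns in the order given for to_list(), empty string for a missing value) is an assumption, and the rows are illustrative data, not real gene sets.

```python
import csv
import os
import tempfile

# Illustrative rows: ID, name, comma-separated genes, source,
# collection, description (assumed column order).
rows = [
    ["GS1", "Example set", "BRCA1,TP53", "MSigDB", "c4", ""],
    ["GS2", "Another set", "EGFR", "", "", "Illustrative only"],
]

path = os.path.join(tempfile.mkdtemp(), "gene_sets.tsv")

# Write one gene set per line, with fields separated by tabs.
with open(path, "w", newline="", encoding="utf-8") as fh:
    csv.writer(fh, dialect="excel-tab").writerows(rows)

# Read the file back into a list of rows.
with open(path, newline="", encoding="utf-8") as fh:
    loaded = list(csv.reader(fh, dialect="excel-tab"))
```

Because the genes are stored as a single comma-separated field, the tab character remains the only column delimiter in the file.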