Basic classes for working with genomic data (genometools.basic)
GeneSet | A gene set.
GeneSetCollection | A collection of gene sets.
class genometools.basic.GeneSet(id, name, genes, source=None, collection=None, description=None)
A gene set.
A gene set is just what the name implies: A set of genes. Usually, gene sets are used to group genes that share a certain property (e.g., genes that perform related functions, or genes that are frequently co-expressed). The genes in the gene set are not ordered.
GeneSet instances are hashable and should therefore be considered to be immutable.
Parameters:
- id_ (str) – The (unique) ID of the gene set.
- name (str) – The name of the gene set.
- genes (set of str) – The genes in the gene set.
- source (None or str) – The source or origin of the gene set (e.g., “MSigDB”).
- collection (None or str) – The collection that the gene set belongs to (e.g., “c4” for gene sets from MSigDB).
- description (None or str) – The description of the gene set.
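The immutability and hashability described above can be illustrated with a small stand-in class. This is a minimal sketch, not genometools' implementation: `SketchGeneSet` is a hypothetical name, and it assumes that equality and hashing are based on all six attributes, with the genes stored as a frozenset so that their order does not matter.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SketchGeneSet:
    """Hypothetical stand-in for genometools.basic.GeneSet (not its real code)."""
    id_: str
    name: str
    genes: FrozenSet[str]  # any iterable is accepted and converted below
    source: Optional[str] = None
    collection: Optional[str] = None
    description: Optional[str] = None

    def __post_init__(self):
        # Normalize the genes into a frozenset: unordered and hashable.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "genes", frozenset(self.genes))

    @property
    def size(self) -> int:
        # Number of genes in the gene set.
        return len(self.genes)


gs1 = SketchGeneSet("GS1", "Example set", ["TP53", "MDM2", "CDKN1A"], source="MSigDB")
gs2 = SketchGeneSet("GS1", "Example set", ["CDKN1A", "TP53", "MDM2"], source="MSigDB")
print(gs1.size)         # 3
print(gs1 == gs2)       # True: gene order does not matter
print(len({gs1, gs2}))  # 1: equal instances hash identically

try:
    gs1.name = "Renamed"
except FrozenInstanceError:
    print("gene set objects cannot be modified")
```

Making the class frozen is what allows it to be hashable safely: if an instance could change after being placed in a set or used as a dictionary key, its hash would no longer match.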
classmethod from_list(l)
Generate a GeneSet object from a list of strings.
Note: See also to_list().
Parameters: l (list or tuple of str) – A list of strings representing the gene set ID, name, genes, source, collection, and description. The genes must be comma-separated.
Returns: The gene set.
Return type: genometools.basic.GeneSet
hash – MD5 hash value for the gene set.
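One way such an MD5 value could be computed is sketched below. The helper `gene_set_md5` is hypothetical; which attributes genometools includes in the hash, and how it serializes them, are assumptions here. Sorting the genes first makes the digest independent of input order, matching the fact that gene sets are unordered.

```python
import hashlib
from typing import Iterable, Optional


def gene_set_md5(id_: str, name: str, genes: Iterable[str],
                 source: Optional[str] = None,
                 collection: Optional[str] = None,
                 description: Optional[str] = None) -> str:
    """Hypothetical helper: MD5 hex digest over a canonical text form of a gene set.

    The field order, tab separator, and UTF-8 encoding are assumptions,
    not genometools' actual scheme.
    """
    # Sort the genes so the digest does not depend on their input order.
    # None is encoded as an empty string (also an assumption).
    fields = [id_, name, ",".join(sorted(genes)),
              source or "", collection or "", description or ""]
    data = "\t".join(fields).encode("utf-8")
    return hashlib.md5(data).hexdigest()


h1 = gene_set_md5("GS1", "Example set", ["TP53", "MDM2"])
h2 = gene_set_md5("GS1", "Example set", ["MDM2", "TP53"])
print(h1 == h2)  # True
print(len(h1))   # 32 (hexadecimal characters)
```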
size – The size of the gene set (i.e., the number of genes in it).
to_list()
Converts the GeneSet object to a flat list of strings.
Note: See also from_list().
Returns: The data from the GeneSet object as a flat list.
Return type: list of str
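The flat-list format used by from_list() and to_list() (ID, name, comma-separated genes, source, collection, description) can be round-tripped as follows. This is a sketch under stated assumptions: `GeneSetRecord` and the two module-level functions are hypothetical, missing optional fields are assumed to be written as empty strings, and genes are assumed to be written in sorted order.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Sequence


class GeneSetRecord(NamedTuple):
    """Hypothetical minimal record; not genometools' GeneSet class."""
    id_: str
    name: str
    genes: FrozenSet[str]
    source: Optional[str] = None
    collection: Optional[str] = None
    description: Optional[str] = None


def to_list(gs: GeneSetRecord) -> List[str]:
    # Flatten to six strings: genes joined by commas (sorted for a
    # deterministic order), and None written as "" (an assumption).
    return [gs.id_, gs.name, ",".join(sorted(gs.genes)),
            gs.source or "", gs.collection or "", gs.description or ""]


def from_list(l: Sequence[str]) -> GeneSetRecord:
    # Inverse of to_list: split the comma-separated genes and map "" back to None.
    id_, name, genes, source, collection, description = l
    return GeneSetRecord(id_, name, frozenset(g for g in genes.split(",") if g),
                         source or None, collection or None, description or None)


row = ["GS1", "Example set", "TP53,MDM2", "MSigDB", "", ""]
gs = from_list(row)
print(gs.genes == {"TP53", "MDM2"})  # True
print(to_list(gs))  # ['GS1', 'Example set', 'MDM2,TP53', 'MSigDB', '', '']
```

Because the genes are unordered, to_list() needs some fixed ordering to produce stable output; sorting is one simple choice.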
class genometools.basic.GeneSetCollection(gene_sets)
A collection of gene sets.
This class contains a list of gene sets and supports several ways of accessing individual gene sets. The gene sets are ordered, so each gene set has a unique position (index) in the database.
Parameters: gene_sets (list or tuple of GeneSet) – See the gene_sets attribute.
gene_sets (tuple of GeneSet) – The gene sets in the database. This is a read-only property.
get_by_id(id_)
Look up a gene set by its ID.
Parameters: id_ (str) – The ID of the gene set.
Returns: The gene set.
Return type: GeneSet
Raises: ValueError – If the given ID is not in the database.
get_by_index(i)
Look up a gene set by its index.
Parameters: i (int) – The index of the gene set.
Returns: The gene set.
Return type: GeneSet
Raises: ValueError – If the given index is out of bounds.
index(id_)
Get the index of a gene set, identified by its ID.
Parameters: id_ (str) – The ID of the gene set.
Returns: The index of the gene set.
Return type: int
Raises: ValueError – If the given ID is not in the database.
n – The number of gene sets in the database.
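The lookup methods above (get_by_id(), get_by_index(), index(), and n) map naturally onto an ordered tuple plus an ID-to-index dictionary. The sketch below assumes that design; `SketchGeneSetCollection` is a hypothetical name, gene sets are simplified to (ID, genes) pairs, and rejecting duplicate IDs and negative indices are assumptions rather than documented behavior.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical simplification: a gene set is an (ID, genes) pair.
GeneSetPair = Tuple[str, FrozenSet[str]]


class SketchGeneSetCollection:
    """Hypothetical stand-in for GeneSetCollection (not genometools' code)."""

    def __init__(self, gene_sets: Iterable[GeneSetPair]):
        # Keep the gene sets in their given order, in an immutable tuple.
        self._gene_sets = tuple(gene_sets)
        # Map each ID to its position for constant-time lookups by ID.
        self._id_to_index: Dict[str, int] = {}
        for i, (id_, _genes) in enumerate(self._gene_sets):
            if id_ in self._id_to_index:
                # Assumption: duplicate IDs are rejected.
                raise ValueError(f"Duplicate gene set ID: {id_!r}")
            self._id_to_index[id_] = i

    @property
    def gene_sets(self) -> Tuple[GeneSetPair, ...]:
        # Read-only: no setter is defined, and the tuple cannot be modified.
        return self._gene_sets

    @property
    def n(self) -> int:
        # Number of gene sets in the collection.
        return len(self._gene_sets)

    def index(self, id_: str) -> int:
        try:
            return self._id_to_index[id_]
        except KeyError:
            raise ValueError(f"No gene set with ID {id_!r}") from None

    def get_by_id(self, id_: str) -> GeneSetPair:
        return self._gene_sets[self.index(id_)]

    def get_by_index(self, i: int) -> GeneSetPair:
        # Assumption: negative indices count as out of bounds.
        if not 0 <= i < self.n:
            raise ValueError(f"Index {i} is out of bounds for {self.n} gene sets")
        return self._gene_sets[i]


coll = SketchGeneSetCollection([
    ("GS1", frozenset({"TP53", "MDM2"})),
    ("GS2", frozenset({"MYC", "MAX"})),
])
print(coll.n)                   # 2
print(coll.index("GS2"))        # 1
print(coll.get_by_index(0)[0])  # GS1
try:
    coll.get_by_id("GS3")
except ValueError as err:
    print(err)                  # No gene set with ID 'GS3'
```

Keeping the dictionary alongside the tuple trades a little memory for O(1) lookups by ID, while the tuple preserves each gene set's position.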
classmethod read_msigdb_xml(path, entrez2gene, species=None)
Read the complete MSigDB database from an XML file.
The XML file can be downloaded from here: http://software.broadinstitute.org/gsea/msigdb/download_file.jsp?filePath=/resources/msigdb/5.0/msigdb_v5.0.xml
Parameters:
Returns: The gene set database containing the MSigDB gene sets.
Return type:
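A reader of this kind could be sketched with the standard library's xml.etree.ElementTree. Everything here is an assumption rather than genometools' code: the function name `read_msigdb_xml_sketch`, the MSigDB element and attribute names, the choice of SYSTEMATIC_NAME as ID and STANDARD_NAME as name, the string-keyed `entrez2gene` mapping, and the decision to drop Entrez IDs that have no mapping. To stay self-contained, it parses an inline XML string instead of a file path.

```python
import xml.etree.ElementTree as ET
from typing import Dict, FrozenSet, List, Optional, Tuple

# Tiny inline document standing in for the MSigDB XML file. The element and
# attribute names used here are assumptions about the file format.
EXAMPLE_XML = """<MSIGDB>
  <GENESET STANDARD_NAME="EXAMPLE_SET" SYSTEMATIC_NAME="M0001"
           CATEGORY_CODE="C2" ORGANISM="Homo sapiens"
           DESCRIPTION_BRIEF="An example gene set."
           MEMBERS_EZID="7157,4193,999999"/>
  <GENESET STANDARD_NAME="MOUSE_SET" SYSTEMATIC_NAME="M0002"
           CATEGORY_CODE="C2" ORGANISM="Mus musculus"
           DESCRIPTION_BRIEF="Another example."
           MEMBERS_EZID="22059"/>
</MSIGDB>"""

Record = Tuple[str, str, FrozenSet[str], str, str, str]


def read_msigdb_xml_sketch(xml_text: str, entrez2gene: Dict[str, str],
                           species: Optional[str] = None) -> List[Record]:
    """Hypothetical reader returning (id, name, genes, source, collection, description)."""
    root = ET.fromstring(xml_text)
    result = []
    for elem in root.iter("GENESET"):
        # Optionally keep only gene sets for one organism.
        if species is not None and elem.get("ORGANISM") != species:
            continue
        # Translate Entrez IDs to gene names, dropping IDs without a mapping.
        entrez_ids = [e for e in elem.get("MEMBERS_EZID", "").split(",") if e]
        genes = frozenset(entrez2gene[e] for e in entrez_ids if e in entrez2gene)
        result.append((elem.get("SYSTEMATIC_NAME"), elem.get("STANDARD_NAME"),
                       genes, "MSigDB", elem.get("CATEGORY_CODE"),
                       elem.get("DESCRIPTION_BRIEF")))
    return result


entrez2gene = {"7157": "TP53", "4193": "MDM2", "22059": "Trp53"}
sets = read_msigdb_xml_sketch(EXAMPLE_XML, entrez2gene, species="Homo sapiens")
print(len(sets))           # 1
print(sorted(sets[0][2]))  # ['MDM2', 'TP53']
```

Whether unmapped Entrez IDs should be dropped silently, logged, or treated as an error is a design choice; this sketch simply drops them.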