Classes for working with expression data (genometools.expression)
ExpGene – A gene in a gene expression analysis.
ExpGenome – A complete set of genes in a gene expression analysis.
ExpMatrix – A gene expression matrix.
ExpProfile – A gene expression profile.
-
class genometools.expression.ExpGene(name, chromosome=None, position=None, length=None, ensembl_id=None, source=None, type_=None)
A gene in a gene expression analysis.
Instances are to be treated as immutable, so that ExpGene objects can be stored in sets and other hash-based containers.
Parameters:
- name (str) – See name attribute.
- chromosome (str or None, optional) – See chromosome attribute. [None]
- position (int or None, optional) – See position attribute. [None]
- length (int or None, optional) – See length attribute. [None]
- ensembl_id (str or None, optional) – See ensembl_id attribute. [None]
-
name
str – The gene name (use the official gene symbol, if available).
-
chromosome
str or None – The chromosome that the gene is located on.
-
position
int or None – The chromosomal location (base-pair index) of the gene. The sign of this attribute indicates whether the gene is on the plus or minus strand. Base-pair indices are 0-based.
-
ensembl_id
list of str – The Ensembl ID of the gene.
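The immutability requirement and the signed position convention described above can be illustrated with a small stand-in class. This is only a sketch: GeneSketch, its strand property, and the example values are hypothetical and are not part of genometools. It assumes that a negative position denotes the minus strand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneSketch:
    """Hypothetical stand-in for ExpGene; not the library's implementation."""
    name: str
    chromosome: Optional[str] = None
    position: Optional[int] = None  # sign encodes the strand

    @property
    def strand(self) -> Optional[str]:
        # Assumed convention: a negative position means the minus strand.
        if self.position is None:
            return None
        return '-' if self.position < 0 else '+'


# Made-up example values, used only to show the behavior.
a = GeneSketch('GENE_A', chromosome='1', position=-1500)
b = GeneSketch('GENE_A', chromosome='1', position=-1500)

# frozen=True makes instances hashable by field values, so two equal
# genes collapse into a single set member.
genes = {a, b}
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which is what makes it safe to use such objects as set members or dictionary keys.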
-
class genometools.expression.ExpGenome(genes)
A complete set of genes in a gene expression analysis.
The class represents a “genome” in the form of an ordered set of genes. This means that each gene has an index value, i.e., an integer indicating its 0-based position in the genome.
Parameters: genes (Iterable of ExpGene objects) – See genes attribute.
Notes
The implementation is very similar to the genometools.basic.GeneSetCollection class. It uses ordered dictionaries to support efficient access by gene name or index, as well as looking up the index of a specific gene.
-
classmethod from_gene_names(names)
Generate a genome from a list of gene names.
Parameters: names (Iterable of str) – The list of gene names.
Returns: The genome.
Return type: ExpGenome
-
gene_names
Returns a list of all gene names.
-
gene_set
Returns a set of all genes.
-
genes
Returns a list with all genes.
-
hash
Returns an MD5 hash value for the genome.
-
index(gene_or_name)
Returns the index of a given gene.
The index is 0-based, so the first gene in the genome has the index 0, and the last one has index len(genome) - 1.
Parameters: gene_or_name (str) – The gene or its name (symbol).
Returns: The gene index.
Return type: int
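The idea of an ordered gene collection with 0-based index lookup can be sketched with a plain dictionary. GenomeSketch below is a hypothetical illustration, not the ExpGenome source; rejecting duplicate names and hashing the comma-joined names are assumptions made for the example.

```python
import hashlib


class GenomeSketch:
    """Hypothetical ordered gene collection; not the ExpGenome source."""

    def __init__(self, names):
        # Dicts preserve insertion order (Python 3.7+), so one mapping
        # provides both the ordering and constant-time name -> index lookup.
        self._index = {}
        for name in names:
            if name in self._index:
                # Assumption: an ordered *set* should not contain duplicates.
                raise ValueError('Duplicate gene name: %s' % name)
            self._index[name] = len(self._index)
        self._names = list(self._index)

    def __len__(self):
        return len(self._names)

    def __getitem__(self, i):
        # Access by 0-based index.
        return self._names[i]

    def index(self, name):
        # Look up the 0-based index of a gene name.
        return self._index[name]

    @property
    def hash(self):
        # Assumption: hash the comma-joined names; the library may differ.
        joined = ','.join(self._names).encode('utf-8')
        return hashlib.md5(joined).hexdigest()


genome = GenomeSketch(['A', 'B', 'C'])
first = genome.index('A')  # index of the first gene
last = genome.index('C')   # index of the last gene, len(genome) - 1
```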
-
class genometools.expression.ExpProfile(*args, **kwargs)
A gene expression profile.
This class inherits from pandas.Series.
Parameters:
- x (1-dimensional numpy.ndarray) – See x attribute.
Additional parameters:
- genes (list or tuple of str) – See genes attribute.
- name (str) – See name attribute.
All pandas.Series parameters are also accepted.
-
x
1-dimensional numpy.ndarray – The vector with expression values.
-
genes
pandas.Index – Alias for pandas.Series.index. Contains the names of the genes in the profile.
-
label
str – Alias for pandas.Series.name. The sample label.
-
filter_against_genome(genome)
Filter the expression matrix against a genome (set of genes).
Parameters: genome (genometools.expression.ExpGenome) – The genome to filter the genes against.
Returns: The filtered expression matrix.
Return type: ExpMatrix
-
genes
Alias for Series.index.
-
label
Alias for Series.name.
-
p
The number of genes.
-
classmethod read_tsv(path, genome=None, encoding=u'UTF-8')
Read expression profile from a tab-delimited text file.
Parameters:
Returns: The expression profile.
Return type:
-
sort_genes(inplace=False)
Sort the rows of the profile alphabetically by gene name.
Parameters: inplace (bool, optional) – If set to True, perform the sorting in-place.
Returns:
Return type: None
Notes
pandas 0.18.0’s Series.sort_index method does not support the kind keyword, which is needed to select a stable sort algorithm.
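Why a stable sort matters here can be shown without pandas. The sketch below uses Python's built-in sorted(), which is guaranteed to be stable, on made-up (gene name, value) pairs; it illustrates the property only and is not how sort_genes is implemented.

```python
# (gene name, expression value) pairs; 'B' appears twice on purpose.
profile = [('B', 1.0), ('A', 2.0), ('B', 3.0), ('C', 4.0)]

# sorted() is stable: entries with equal keys keep their original
# relative order, so ('B', 1.0) stays ahead of ('B', 3.0).
sorted_profile = sorted(profile, key=lambda item: item[0])
```

With an unstable algorithm, the two 'B' entries could swap places, so repeated runs on the same data would not be guaranteed to give the same row order.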
-
write_tsv(path, encoding=u'UTF-8')
Write expression profile to a tab-delimited text file.
Parameters: Returns: Return type:
-
x
Alias for Series.values.
-
class genometools.expression.ExpMatrix(*args, **kwargs)
A gene expression matrix.
This class inherits from pandas.DataFrame.
Parameters:
-
genes
tuple of str – The names of the genes (rows) in the matrix.
-
samples
tuple of str – The names of the samples (columns) in the matrix.
-
X
2-dimensional numpy.ndarray – The matrix of expression values.
-
X
Alias for DataFrame.values.
-
filter_against_genome(genome, inplace=False)
Filter the expression matrix against a genome (set of genes).
Parameters:
- genome (genometools.expression.ExpGenome) – The genome to filter the genes against.
- inplace (bool, optional) – Whether to perform the operation in-place.
Returns: The filtered expression matrix.
Return type:
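Filtering against a genome amounts to keeping only the rows whose gene belongs to the genome. The sketch below shows this with plain dictionaries; filter_against_genome_sketch and the example data are hypothetical, and it assumes that the surviving rows keep the matrix's original order (the library may order them differently).

```python
def filter_against_genome_sketch(rows, genome_names):
    """Keep only rows whose gene name is in the genome.

    rows: dict mapping gene name -> list of expression values.
    genome_names: iterable of gene names making up the genome.
    """
    keep = set(genome_names)  # set membership tests are O(1)
    # Returns a new dict, i.e. the non-in-place variant.
    return {gene: values for gene, values in rows.items() if gene in keep}


# Made-up matrix with two samples per gene; 'X' is not in the genome.
matrix = {'A': [1.0, 2.0], 'X': [0.5, 0.1], 'C': [3.0, 4.0]}
filtered = filter_against_genome_sketch(matrix, ['A', 'B', 'C'])
```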
-
genes
Alias for DataFrame.index.
-
get_figure(heatmap_kw=None, **kwargs)
Generate a plotly figure showing the matrix as a heatmap.
This is a shortcut for ExpMatrix.get_heatmap(...).get_figure(...).
See ExpHeatmap.get_figure() for keyword arguments.
Parameters: heatmap_kw (dict or None) – If not None, dictionary containing keyword arguments to be passed to the ExpHeatmap constructor.
Returns: The plotly figure.
Return type: plotly.graph_objs.Figure
-
get_heatmap(highlight_genes=None, highlight_samples=None, highlight_color=None, **kwargs)
Generate a heatmap (ExpHeatmap) of the matrix.
See ExpHeatmap constructor for keyword arguments.
Parameters:
- highlight_genes (list of str) – List of genes to highlight.
- highlight_color (str) – Color to use for highlighting.
Returns: The heatmap.
Return type:
-
n
The number of samples.
-
p
The number of genes.
-
classmethod read_tsv(path, genome=None, encoding=u'UTF-8')
Read expression matrix from a tab-delimited text file.
Parameters:
Returns: The expression matrix.
Return type:
-
sample_correlations
Returns an ExpMatrix containing all pairwise sample correlations.
Returns: The sample correlation matrix.
Return type: ExpMatrix
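A pairwise sample correlation matrix can be sketched in plain Python as follows. The documentation above does not say which correlation coefficient is used; Pearson correlation is assumed here, and pearson, sample_correlations_sketch, and the example data are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length value lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def sample_correlations_sketch(columns):
    """columns: dict mapping sample name -> list of per-gene values."""
    names = list(columns)
    # Square, symmetric table: one row and one column per sample.
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up data: s2 rises with s1, s3 falls as s1 rises.
samples = {
    's1': [1.0, 2.0, 3.0],
    's2': [2.0, 4.0, 6.0],
    's3': [3.0, 2.0, 1.0],
}
corr = sample_correlations_sketch(samples)
```

The result has one row and one column per sample, which matches the shape implied by a sample-by-sample ExpMatrix.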
-
samples
Alias for DataFrame.columns.
-
sort_genes(stable=True, inplace=False, ascending=True)
Sort the rows of the matrix alphabetically by gene name.
Parameters:
Returns: The sorted matrix.
Return type:
-
-
genometools.expression.quantile_normalize(matrix, inplace=False, target=None)
Quantile normalization, allowing for missing values (NaN).
In case of NaN values, this implementation will calculate evenly distributed quantiles and fill in the missing data with those values. Quantile normalization is then performed on the filled-in matrix, and the NaN values are restored afterwards.
Parameters:
- matrix (ExpMatrix) – The expression matrix (rows = genes, columns = samples).
- inplace (bool) – Whether or not to perform the operation in-place. [False]
- target (numpy.ndarray) – Target distribution to use. Needs to be a vector whose first dimension matches that of the expression matrix. If None, the target distribution is calculated based on the matrix itself. [None]
Returns: The normalized matrix.
Return type: numpy.ndarray (ndim = 2)
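The core of quantile normalization can be sketched for a complete matrix as follows. This is a simplified illustration, not the library's implementation: quantile_normalize_sketch is a hypothetical name, it works on nested lists rather than an ExpMatrix, and it handles neither NaN values nor tied values in any special way.

```python
def quantile_normalize_sketch(matrix, target=None):
    """Quantile-normalize a complete matrix (no NaN handling).

    matrix: list of rows (genes), each a list of values (samples).
    target: optional vector with one value per gene.
    """
    p = len(matrix)      # number of genes (rows)
    n = len(matrix[0])   # number of samples (columns)
    columns = [[row[j] for row in matrix] for j in range(n)]

    if target is None:
        # Target = mean of the k-th smallest value across all samples.
        sorted_cols = [sorted(col) for col in columns]
        target = [sum(col[k] for col in sorted_cols) / n for k in range(p)]
    else:
        target = sorted(target)

    result = [[None] * n for _ in range(p)]
    for j, col in enumerate(columns):
        # order[k] is the row index of the k-th smallest value in sample j.
        order = sorted(range(p), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = target[rank]
    return result


# Made-up 3 genes x 2 samples matrix.
matrix = [[1.0, 6.0],
          [3.0, 2.0],
          [2.0, 4.0]]
normalized = quantile_normalize_sketch(matrix)
# normalized == [[1.5, 4.5], [4.5, 1.5], [3.0, 3.0]]
```

After normalization, every sample contains the same set of values (here 1.5, 3.0, and 4.5), and only their assignment to genes differs, following each sample's original ranking.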
-
genometools.expression.filter_variance(matrix, top)
Filter genes in an expression matrix by variance.
Parameters:
Returns: The filtered expression matrix.
Return type:
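Variance filtering can be sketched as ranking genes by the variance of their expression values and keeping the highest-ranked ones. The parameter descriptions are not available above, so the sketch makes several assumptions: top is taken to be the number of genes to keep, the sample variance is used, and the surviving rows keep their original order. filter_variance_sketch and the example data are hypothetical.

```python
import statistics


def filter_variance_sketch(rows, top):
    """Keep the `top` genes with the largest variance.

    rows: dict mapping gene name -> list of expression values.
    top: number of genes to keep (assumed meaning of the parameter).
    """
    # Rank gene names by sample variance, largest first.
    ranked = sorted(rows, key=lambda gene: statistics.variance(rows[gene]),
                    reverse=True)
    keep = set(ranked[:top])
    # Preserve the original row order among the genes that are kept.
    return {gene: values for gene, values in rows.items() if gene in keep}


# Made-up data: variances are 0 (A), 25 (B), and 1 (C).
matrix = {
    'A': [1.0, 1.0, 1.0],
    'B': [0.0, 5.0, 10.0],
    'C': [2.0, 3.0, 4.0],
}
filtered = filter_variance_sketch(matrix, top=2)
```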