Classes for working with expression data (genometools.expression)

ExpGene – A gene in a gene expression analysis.
ExpGenome – A complete set of genes in a gene expression analysis.
ExpMatrix – A gene expression matrix.
ExpProfile – A gene expression profile.
class genometools.expression.ExpGene(name, chromosome=None, position=None, length=None, ensembl_id=None, source=None, type_=None)

A gene in a gene expression analysis.

Instances are to be treated as immutable, so that ExpGene objects can be used in sets, etc.

Parameters:
name

str – The gene name (use the official gene symbol, if available).

chromosome

str or None – The chromosome that the gene is located on.

position

int or None – The chromosomal location (base-pair index) of the gene. The sign of this attribute indicates whether the gene is on the plus or minus strand. Base-pair indices are 0-based.

ensembl_id

list of str – The Ensembl ID of the gene.
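Treating instances as immutable is what makes it safe to hash them. The sketch below illustrates the idea with a frozen dataclass; Gene is a hypothetical stand-in with a subset of the fields above, not the way genometools implements ExpGene.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical immutable stand-in for ExpGene (a subset of its fields)."""
    name: str
    chromosome: Optional[str] = None
    position: Optional[int] = None


# frozen=True (with the default eq=True) generates __eq__ and __hash__ from
# the fields, so equal genes collapse to a single set entry.
genes = {Gene("TP53", "17"), Gene("TP53", "17"), Gene("MYC", "8")}
print(len(genes))  # 2

# Assigning to a field raises instead of silently changing the hash of an
# object that may already be stored in a set.
try:
    Gene("MYC").name = "MYCN"
except FrozenInstanceError:
    print("Gene objects are immutable")
```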

classmethod from_dict(data)

Generate an ExpGene object from a dictionary.

Parameters: data (dict) – A dictionary with keys corresponding to attribute names. Attributes with missing keys will be assigned None. See also to_list().
Returns: The gene.
Return type: ExpGene
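A minimal sketch of the documented behaviour (missing keys become None), written as a hypothetical helper that only builds the keyword arguments. The attribute names are taken from the ExpGene signature; whether from_dict expects exactly the key type_, and how it treats unknown keys, are assumptions.

```python
from typing import Any, Dict, Optional

# Attribute names taken from the ExpGene signature (assumed to be the dict keys).
ATTRIBUTES = ("name", "chromosome", "position", "length",
              "ensembl_id", "source", "type_")


def gene_kwargs_from_dict(data: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Hypothetical helper: map a dictionary onto ExpGene keyword arguments.

    Attributes whose keys are missing are set to None; keys that are not
    attribute names are ignored (an assumption).
    """
    return {attr: data.get(attr) for attr in ATTRIBUTES}


kwargs = gene_kwargs_from_dict({"name": "TP53", "chromosome": "17"})
print(kwargs["name"], kwargs["position"])  # TP53 None
```

With genometools installed, the documented entry point is ExpGene.from_dict(data).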
class genometools.expression.ExpGenome(genes)

A complete set of genes in a gene expression analysis.

The class represents a “genome” in the form of an ordered set of genes. This means that each gene has an index value, i.e. an integer indicating its 0-based position in the genome.

Parameters: genes (Iterable of ExpGene objects) – See genes attribute.
genes

list of ExpGene – The genes in the genome.

Notes

The implementation is very similar to the genometools.basic.GeneSetCollection class. It uses ordered dictionaries to support efficient access by gene name or index, as well as looking up the index of a specific gene.
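The ordered-dictionary design described in the Notes can be sketched as follows. MiniGenome is a hypothetical class that stores gene names only; rejecting duplicate names is an assumption made so that every name maps to exactly one index.

```python
from collections import OrderedDict
from typing import Iterable


class MiniGenome:
    """Hypothetical ordered gene set (a sketch, not the genometools class)."""

    def __init__(self, names: Iterable[str]) -> None:
        # Maps each gene name to its 0-based position; insertion order is kept.
        self._index = OrderedDict()
        for name in names:
            if name in self._index:
                raise ValueError("duplicate gene name: %s" % name)
            self._index[name] = len(self._index)
        # Position -> name lookup.
        self._names = list(self._index)

    def __len__(self) -> int:
        return len(self._names)

    def __getitem__(self, i: int) -> str:
        return self._names[i]      # access by index

    def index(self, name: str) -> int:
        return self._index[name]   # index of a named gene


genome = MiniGenome(["TP53", "MYC", "EGFR"])
print(genome.index("EGFR"), genome[0], len(genome))  # 2 TP53 3
```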

classmethod from_gene_names(names)

Generate a genome from a list of gene names.

Parameters: names (Iterable of str) – The list of gene names.
Returns: The genome.
Return type: ExpGenome
gene_names

Returns a list of all gene names.

gene_set

Returns a set of all genes.

genes

Returns a list with all genes.

hash

Returns an MD5 hash value for the genome.

index(gene_or_name)

Returns the index of a given gene.

The index is 0-based, so the first gene in the genome has index 0, and the last one has index len(genome) - 1.

Parameters: gene_or_name (ExpGene or str) – The gene or its name (symbol).
Returns: The gene index.
Return type: int
classmethod read_tsv(path, encoding=u'UTF-8')

Read genes from a tab-delimited text file.

Parameters:
  • path (str) – The path of the text file.
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Return type: None

write_tsv(path, encoding=u'UTF-8', overwrite=False)

Write genes to a tab-delimited text file in alphabetical order.

Parameters:
  • path (str) – The path of the output file.
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Return type: None
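A sketch of the tab-delimited round trip using Python's csv module. The column layout (gene name first, further attributes as strings) is an assumption, and write_gene_rows and read_gene_rows are hypothetical helpers, not the genometools methods.

```python
import csv
import io
from typing import List, Sequence, TextIO


def write_gene_rows(fh: TextIO, rows: Sequence[Sequence[str]]) -> None:
    """Hypothetical helper: write gene records as tab-separated lines,
    sorted alphabetically by gene name (assumed to be the first column)."""
    writer = csv.writer(fh, dialect="excel-tab", lineterminator="\n")
    for row in sorted(rows, key=lambda r: r[0]):
        writer.writerow(row)


def read_gene_rows(fh: TextIO) -> List[List[str]]:
    """Hypothetical helper: read tab-separated gene records as lists of strings."""
    return list(csv.reader(fh, dialect="excel-tab"))


buf = io.StringIO()
write_gene_rows(buf, [["TP53", "17"], ["EGFR", "7"], ["MYC", "8"]])
buf.seek(0)
print(read_gene_rows(buf))  # [['EGFR', '7'], ['MYC', '8'], ['TP53', '17']]
```

For real files, open them with open(path, "w", newline="", encoding=encoding) so that the csv module controls line endings.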

class genometools.expression.ExpProfile(*args, **kwargs)

A gene expression profile.

This class inherits from pandas.Series.

Parameters:
  • x (1-dimensional numpy.ndarray) – See x attribute.
Additional parameters:
  • genes (list or tuple of str) – See genes attribute.
  • name (str) – See name attribute.
All pandas.Series parameters are also accepted.
x

1-dimensional numpy.ndarray – The vector with expression values.

genes

pandas.Index – Alias for pandas.Series.index. Contains the names of the genes in the profile.

label

str – Alias for pandas.Series.name. The sample label.
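The aliases listed above can be reproduced in a few lines when subclassing pandas.Series. MiniProfile is a hypothetical class; defining _constructor is the standard pandas mechanism for making operations such as slicing return the subclass, though the exact ExpProfile implementation may differ.

```python
import numpy as np
import pandas as pd


class MiniProfile(pd.Series):
    """Hypothetical Series subclass mirroring the ExpProfile aliases."""

    @property
    def _constructor(self):
        # Makes pandas operations (slicing, filtering) return MiniProfile.
        return MiniProfile

    @property
    def genes(self) -> pd.Index:
        return self.index    # alias for Series.index

    @property
    def label(self):
        return self.name     # alias for Series.name

    @property
    def x(self) -> np.ndarray:
        return self.values   # alias for Series.values

    @property
    def p(self) -> int:
        return self.size     # number of genes


prof = MiniProfile([1.5, 0.2, 3.1], index=["TP53", "MYC", "EGFR"], name="sample1")
print(prof.label, prof.p, list(prof.genes))  # sample1 3 ['TP53', 'MYC', 'EGFR']
```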

filter_against_genome(genome)

Filter the expression profile against a genome (set of genes).

Parameters: genome (genometools.expression.ExpGenome) – The genome to filter the genes against.
Returns: The filtered expression profile.
Return type: ExpProfile
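In plain pandas, filtering a profile against a set of gene names reduces to a boolean mask over the index. This sketch assumes genes are matched by name (for an ExpGenome, the gene_names attribute would supply them); genome_names below is a hypothetical set.

```python
import pandas as pd

profile = pd.Series([1.5, 0.2, 3.1], index=["TP53", "MYC", "EGFR"], name="sample1")
genome_names = {"TP53", "EGFR", "KRAS"}  # hypothetical set of valid gene names

# Boolean mask over the index: keep genes present in the genome, in their
# original order; genome genes absent from the profile (KRAS) are ignored.
filtered = profile[profile.index.isin(genome_names)]
print(filtered.index.tolist())  # ['TP53', 'EGFR']
```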
genes

Alias for Series.index.

genome

Get an ExpGenome representation of the genes in the profile.

label

Alias for Series.name.

p

The number of genes.

classmethod read_tsv(path, genome=None, encoding=u'UTF-8')

Read expression profile from a tab-delimited text file.

Parameters:
  • path (str) – The path of the text file.
  • genome (ExpGenome object, optional) – The set of valid genes. If given, the genes in the text file will be filtered against this set of genes. (None)
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Returns:

The expression profile.

Return type:

ExpProfile

sort_genes(inplace=False)

Sort the rows of the profile alphabetically by gene name.

Parameters: inplace (bool, optional) – If set to True, perform the sorting in place.
Return type: None

Notes

pandas 0.18.0’s Series.sort_index method does not support the kind keyword, which is needed to select a stable sort algorithm.
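One way to obtain a stable sort without the kind keyword is to compute the row order with Python's sorted(), which is guaranteed to be stable, and then reorder by position. This is a sketch of the workaround the note implies, not necessarily what sort_genes does internally; more recent pandas versions also accept kind="mergesort" in Series.sort_index.

```python
import pandas as pd

profile = pd.Series([1.0, 2.0, 3.0, 4.0], index=["MYC", "EGFR", "MYC", "AKT1"])

# sorted() is stable, so the two MYC rows keep their original relative order.
order = sorted(range(len(profile)), key=lambda i: profile.index[i])
sorted_profile = profile.iloc[order]
print(sorted_profile.tolist())  # [4.0, 2.0, 1.0, 3.0]
```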

write_tsv(path, encoding=u'UTF-8')

Write the expression profile to a tab-delimited text file.

Parameters:
  • path (str) – The path of the output file.
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Return type: None

x

Alias for Series.values.

class genometools.expression.ExpMatrix(*args, **kwargs)

A gene expression matrix.

This class inherits from pandas.DataFrame.

Parameters:
  • X (2-dimensional numpy.ndarray) – See X attribute.
Additional parameters:
  • genes (list or tuple of str) – See genes attribute.
  • samples (list or tuple of str) – See samples attribute.
All pandas.DataFrame parameters are also accepted.
genes

tuple of str – The names of the genes (rows) in the matrix.

samples

tuple of str – The names of the samples (columns) in the matrix.

X

2-dimensional numpy.ndarray – The matrix of expression values.

X

Alias for DataFrame.values.

center_genes(use_median=False, inplace=False)

Center the expression of each gene (row).
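A minimal pandas sketch, assuming that centering means subtracting each row's mean, or its median when use_median=True; center_rows is a hypothetical helper and the inplace option is not reproduced.

```python
import pandas as pd

matrix = pd.DataFrame(
    [[1.0, 2.0, 6.0], [4.0, 4.0, 4.0]],
    index=["TP53", "MYC"],
    columns=["s1", "s2", "s3"],
)


def center_rows(df: pd.DataFrame, use_median: bool = False) -> pd.DataFrame:
    """Hypothetical helper: subtract each row's mean (or median) from that row."""
    center = df.median(axis=1) if use_median else df.mean(axis=1)
    return df.sub(center, axis=0)  # align the per-row centers on the index


print(center_rows(matrix).loc["TP53"].tolist())                   # [-2.0, -1.0, 3.0]
print(center_rows(matrix, use_median=True).loc["TP53"].tolist())  # [-1.0, 0.0, 4.0]
```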

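filter_against_genome(genome, inplace=False)

Filter the expression matrix against a genome (set of genes).

Parameters:
  • genome (genometools.expression.ExpGenome) – The genome to filter the genes against.
  • inplace (bool, optional) – Whether to perform the operation in place. (False)
Returns:

The filtered expression matrix.

Return type:

ExpMatrix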

genes

Alias for DataFrame.index.

genome

Get an ExpGenome representation of the genes in the matrix.

get_figure(heatmap_kw=None, **kwargs)

Generate a plotly figure showing the matrix as a heatmap.

This is a shortcut for ExpMatrix.get_heatmap(...).get_figure(...).

See ExpHeatmap.get_figure() for keyword arguments.

Parameters: heatmap_kw (dict or None) – If not None, a dictionary of keyword arguments to pass to the ExpHeatmap constructor.
Returns: The plotly figure.
Return type: plotly.graph_objs.Figure
get_heatmap(highlight_genes=None, highlight_samples=None, highlight_color=None, **kwargs)

Generate a heatmap (ExpHeatmap) of the matrix.

See the ExpHeatmap constructor for keyword arguments.

Parameters:
  • highlight_genes (list of str) – List of genes to highlight.
  • highlight_samples (list of str) – List of samples to highlight.
  • highlight_color (str) – Color to use for highlighting.
Returns:

The heatmap.

Return type:

ExpHeatmap

n

The number of samples.

p

The number of genes.

classmethod read_tsv(path, genome=None, encoding=u'UTF-8')

Read expression matrix from a tab-delimited text file.

Parameters:
  • path (str) – The path of the text file.
  • genome (ExpGenome object, optional) – The set of valid genes. If given, the genes in the text file will be filtered against this set of genes. (None)
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Returns:

The expression matrix.

Return type:

ExpMatrix

sample_correlations

Returns an ExpMatrix containing all pairwise sample correlations.

Returns: The sample correlation matrix.
Return type: ExpMatrix
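A plain-pandas sketch, assuming Pearson correlation between sample columns (the pandas default for DataFrame.corr); the measure used by sample_correlations is not stated above.

```python
import pandas as pd

matrix = pd.DataFrame(
    {"s1": [1.0, 2.0, 3.0], "s2": [2.0, 4.0, 6.0], "s3": [3.0, 2.0, 1.0]},
    index=["TP53", "MYC", "EGFR"],
)

# DataFrame.corr() correlates every pair of columns (Pearson by default),
# producing a symmetric samples x samples matrix.
corr = matrix.corr()
print(corr.round(6).values.tolist())
# [[1.0, 1.0, -1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
```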
samples

Alias for DataFrame.columns.

sort_genes(stable=True, inplace=False, ascending=True)

Sort the rows of the matrix alphabetically by gene name.

Parameters:
  • stable (bool, optional) – Whether to use a stable sorting algorithm. (True)
  • inplace (bool, optional) – Whether to perform the operation in place. (False)
  • ascending (bool, optional) – Whether to sort in ascending order. (True)
Returns:

The sorted matrix.

Return type:

ExpMatrix

sort_samples(stable=True, inplace=False, ascending=True)

Sort the columns of the matrix alphabetically by sample name.

Parameters:
  • stable (bool, optional) – Whether to use a stable sorting algorithm. (True)
  • inplace (bool, optional) – Whether to perform the operation in place. (False)
  • ascending (bool, optional) – Whether to sort in ascending order. (True)
Returns:

The sorted matrix.

Return type:

ExpMatrix
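Both sorting methods map naturally onto DataFrame.sort_index. The sketch assumes that stable=True corresponds to a stable algorithm such as kind="mergesort"; it is not the genometools source.

```python
import pandas as pd

matrix = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=["MYC", "AKT1", "MYC"],
    columns=["s1", "s2"],
)

# Genes (rows): a stable algorithm keeps rows with the same gene name
# in their original relative order.
by_gene = matrix.sort_index(axis=0, kind="mergesort", ascending=True)

# Samples (columns), here in descending order.
by_sample = matrix.sort_index(axis=1, kind="mergesort", ascending=False)

print(by_gene.index.tolist(), by_gene["s1"].tolist())  # ['AKT1', 'MYC', 'MYC'] [3, 1, 5]
print(by_sample.columns.tolist())                      # ['s2', 's1']
```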

standardize_genes(inplace=False)

Standardize the expression of each gene (row).
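A minimal sketch of per-gene standardization (z-scores), assuming the row mean is subtracted and the result divided by the row's sample standard deviation (ddof=1, the pandas default); whether genometools uses ddof=0 or ddof=1 is not stated. standardize_rows is a hypothetical helper, and rows with constant values would produce NaN.

```python
import pandas as pd

matrix = pd.DataFrame(
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    index=["TP53", "MYC"],
    columns=["s1", "s2", "s3"],
)


def standardize_rows(df: pd.DataFrame, ddof: int = 1) -> pd.DataFrame:
    """Hypothetical helper: z-score each row (subtract mean, divide by std)."""
    centered = df.sub(df.mean(axis=1), axis=0)
    return centered.div(df.std(axis=1, ddof=ddof), axis=0)


z = standardize_rows(matrix)
print(z.loc["TP53"].tolist(), z.loc["MYC"].tolist())  # [-1.0, 0.0, 1.0] [-1.0, 0.0, 1.0]
```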

write_tsv(path, encoding=u'UTF-8')

Write the expression matrix to a tab-delimited text file.

Parameters:
  • path (str) – The path of the output file.
  • encoding (str, optional) – The file encoding. (“UTF-8”)
Return type: None

genometools.expression.quantile_normalize(matrix, inplace=False, target=None)

Quantile normalization, allowing for missing values (NaN).

If the matrix contains NaN values, this implementation calculates evenly distributed quantiles and fills in the missing data with those values. Quantile normalization is then performed on the filled-in matrix, and the NaN values are restored afterwards.

Parameters:
  • matrix (ExpMatrix) – The expression matrix (rows = genes, columns = samples).
  • inplace (bool) – Whether to perform the operation in place. (False)
  • target (numpy.ndarray) – The target distribution to use. It must be a vector whose length matches the first dimension of the expression matrix (the number of genes). If None, the target distribution is calculated from the matrix itself. (None)
Returns:

The normalized matrix.

Return type:

numpy.ndarray (ndim = 2)
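The core algorithm, without the NaN handling described above, can be sketched with NumPy. quantile_normalize_sketch is hypothetical: it rejects NaN instead of filling in quantiles, sorts a user-supplied target before use (assuming the target is given as an unsorted distribution), and breaks ties by row position.

```python
import numpy as np


def quantile_normalize_sketch(X, target=None):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Simplified: NaN values are rejected rather than filled in, and tied
    values receive different target values (ranks come from a stable argsort).
    """
    X = np.asarray(X, dtype=float)
    if np.isnan(X).any():
        raise ValueError("this sketch does not handle NaN values")
    if target is None:
        # Reference distribution: row-wise mean of the sorted columns.
        target = np.sort(X, axis=0).mean(axis=1)
    else:
        target = np.sort(np.asarray(target, dtype=float))
        if target.shape[0] != X.shape[0]:
            raise ValueError("target length must equal the number of genes")
    # order[:, j] lists the row indices of column j from smallest to largest.
    order = np.argsort(X, axis=0, kind="stable")
    Y = np.empty_like(X)
    for j in range(X.shape[1]):
        # The k-th smallest value in column j is replaced by target[k].
        Y[order[:, j], j] = target
    return Y


Y = quantile_normalize_sketch([[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]])
```

After normalization every column contains the same set of values, namely the target distribution, while each sample keeps its own ranking of genes.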

genometools.expression.filter_variance(matrix, top)

Filter genes in an expression matrix by variance.

Parameters:
  • matrix (ExpMatrix) – The expression matrix.
  • top (int) – The number of genes to retain.
Returns:

The filtered expression matrix.

Return type:

ExpMatrix
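A pandas sketch of variance filtering. It assumes sample variance (ddof=1) and keeps the retained genes in their original order; both choices, and the helper name top_variance_genes, are assumptions rather than documented behaviour. Selecting rows by position keeps it correct even if gene names repeat.

```python
import pandas as pd


def top_variance_genes(df: pd.DataFrame, top: int) -> pd.DataFrame:
    """Hypothetical helper: keep the `top` genes (rows) with the highest variance."""
    var = pd.Series(df.var(axis=1).to_numpy())  # positional index 0..p-1
    # Positions of the largest variances, restored to the original row order.
    keep = sorted(var.nlargest(top).index.tolist())
    return df.iloc[keep]


matrix = pd.DataFrame(
    [[1.0, 1.0, 1.0], [0.0, 5.0, 10.0], [2.0, 3.0, 4.0]],
    index=["TP53", "MYC", "EGFR"],
    columns=["s1", "s2", "s3"],
)
print(top_variance_genes(matrix, 2).index.tolist())  # ['MYC', 'EGFR']
```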
