Miscellaneous convenience functions (genometools.misc)

exception genometools.misc.ArithmeticError

Base class for arithmetic errors.

exception genometools.misc.AssertionError

Assertion failed.

exception genometools.misc.AttributeError

Attribute not found.

exception genometools.misc.BaseException

Common base class for all exceptions.

exception genometools.misc.BufferError

Buffer error.

exception genometools.misc.BytesWarning

Base class for warnings about bytes and buffer related problems, mostly related to conversion from str or comparing to str.

exception genometools.misc.DeprecationWarning

Base class for warnings about deprecated features.

exception genometools.misc.EOFError

Read beyond end of file.

exception genometools.misc.EnvironmentError

Base class for I/O related errors.

errno

exception errno

filename

exception filename

strerror

exception strerror

exception genometools.misc.Exception

Common base class for all non-exit exceptions.

exception genometools.misc.FloatingPointError

Floating point operation failed.

exception genometools.misc.FutureWarning

Base class for warnings about constructs that will change semantically in the future.

exception genometools.misc.GeneratorExit

Request that a generator exit.

exception genometools.misc.IOError

I/O operation failed.

exception genometools.misc.ImportError

Import can’t find module, or can’t find name in module.

exception genometools.misc.ImportWarning

Base class for warnings about probable mistakes in module imports.

exception genometools.misc.IndentationError

Improper indentation.

exception genometools.misc.IndexError

Sequence index out of range.

exception genometools.misc.KeyError

Mapping key not found.

exception genometools.misc.KeyboardInterrupt

Program interrupted by user.

exception genometools.misc.LookupError

Base class for lookup errors.

exception genometools.misc.MemoryError

Out of memory.

exception genometools.misc.NameError

Name not found globally.

exception genometools.misc.NotImplementedError

Method or function hasn’t been implemented yet.

exception genometools.misc.OSError

OS system call failed.

exception genometools.misc.OverflowError

Result too large to be represented.

exception genometools.misc.PendingDeprecationWarning

Base class for warnings about features which will be deprecated in the future.

exception genometools.misc.ReferenceError

Weak ref proxy used after referent went away.

exception genometools.misc.RuntimeError

Unspecified run-time error.

exception genometools.misc.RuntimeWarning

Base class for warnings about dubious runtime behavior.

exception genometools.misc.StandardError

Base class for all standard Python exceptions that do not represent interpreter exiting.

exception genometools.misc.StopIteration

Signal the end from iterator.next().

exception genometools.misc.SyntaxError

Invalid syntax.

filename

exception filename

lineno

exception lineno

msg

exception msg

offset

exception offset

print_file_and_line

exception print_file_and_line

text

exception text

exception genometools.misc.SyntaxWarning

Base class for warnings about dubious syntax.

exception genometools.misc.SystemError

Internal error in the Python interpreter.

Please report this to the Python maintainer, along with the traceback, the Python version, and the hardware/OS platform and version.

exception genometools.misc.SystemExit

Request to exit from the interpreter.

code

exception code

exception genometools.misc.TabError

Improper mixture of spaces and tabs.

exception genometools.misc.TypeError

Inappropriate argument type.

exception genometools.misc.UnboundLocalError

Local name referenced but not bound to a value.

exception genometools.misc.UnicodeDecodeError

Unicode decoding error.

encoding

exception encoding

end

exception end

object

exception object

reason

exception reason

start

exception start

exception genometools.misc.UnicodeEncodeError

Unicode encoding error.

encoding

exception encoding

end

exception end

object

exception object

reason

exception reason

start

exception start

exception genometools.misc.UnicodeError

Unicode related error.

exception genometools.misc.UnicodeTranslateError

Unicode translation error.

encoding

exception encoding

end

exception end

object

exception object

reason

exception reason

start

exception start

exception genometools.misc.UnicodeWarning

Base class for warnings about Unicode related problems, mostly related to conversion problems.

exception genometools.misc.UserWarning

Base class for warnings generated by user code.

exception genometools.misc.ValueError

Inappropriate argument value (of correct type).

exception genometools.misc.Warning

Base class for warning categories.

exception genometools.misc.ZeroDivisionError

Second argument to a division or modulo operation was zero.

genometools.misc.abs(number) → number

Return the absolute value of the argument.

genometools.misc.all(iterable) → bool

Return True if bool(x) is True for all values x in the iterable. If the iterable is empty, return True.

genometools.misc.any(iterable) → bool

Return True if bool(x) is True for any x in the iterable. If the iterable is empty, return False.

genometools.misc.apply(object[, args[, kwargs]]) → value

Call a callable object with positional arguments taken from the tuple args, and keyword arguments taken from the optional dictionary kwargs. Note that classes are callable, as are instances with a __call__() method.

Deprecated since release 2.3. Instead, use the extended call syntax:
function(*args, **keywords).
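
As a brief illustration of the extended call syntax, the sketch below unpacks a tuple and a dictionary into a call. The function describe is hypothetical and not part of genometools; it exists only to show the call form.

```python
def describe(name, length, strand="+"):
    # Hypothetical helper used only for this illustration.
    return "%s:%d(%s)" % (name, length, strand)


args = ("chr1", 1000)
kwargs = {"strand": "-"}

# Equivalent to the deprecated apply(describe, args, kwargs).
result = describe(*args, **kwargs)
```
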
genometools.misc.argmax(seq)[source]

Obtains the index of the largest element in a list.

Parameters:seq (List) – The list.
Returns:The index of the largest element.
Return type:int
genometools.misc.argmin(seq)[source]

Obtains the index of the smallest element in a list.

Parameters:seq (List) – The list.
Returns:The index of the smallest element.
Return type:int
genometools.misc.argsort(seq)[source]

Returns a list of indices that would sort a list.

Parameters:seq (List) – The list.
Returns:The list of indices that would sort the given list seq.
Return type:List[int]

Notes

If the returned list of indices can be a NumPy array, use numpy.lexsort instead. If the given list seq is a NumPy array, use numpy.argsort instead.
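
The three index helpers above can be approximated with built-ins alone. This is a minimal sketch, not the library's implementation: the *_sketch names are hypothetical, and the tie-breaking shown (first occurrence wins, stable sort) is an assumption.

```python
def argmax_sketch(seq):
    # Index of the largest element; on ties, max() keeps the first index.
    return max(range(len(seq)), key=seq.__getitem__)


def argmin_sketch(seq):
    # Index of the smallest element; on ties, min() keeps the first index.
    return min(range(len(seq)), key=seq.__getitem__)


def argsort_sketch(seq):
    # Indices that would sort seq in ascending order (sorted() is stable).
    return sorted(range(len(seq)), key=seq.__getitem__)


values = [3, 1, 4, 1, 5]
order = argsort_sketch(values)
sorted_values = [values[i] for i in order]
```

Applying the returned indices to the original list, as in the last line, yields the list in sorted order.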

genometools.misc.ascii(object) → string

Return the same as repr(). In Python 3.x, the repr() result will contain printable characters unescaped, while the ascii() result will have such characters backslash-escaped.

class genometools.misc.basestring

Type basestring cannot be instantiated; it is the base for str and unicode.

genometools.misc.bin(number) → string

Return the binary representation of an integer or long integer.

genometools.misc.bisect_index(a, x)[source]

Find the leftmost index of an element in a list using binary search.

Parameters:
  • a (list) – A sorted list.
  • x (arbitrary) – The element.
Returns:

The index.

Return type:

int
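
A leftmost-index lookup of this kind is commonly built on bisect.bisect_left from the standard library. The sketch below uses the hypothetical name bisect_index_sketch, and assumes that a missing element is reported with ValueError; the behavior of the library function in that case is not documented here.

```python
import bisect


def bisect_index_sketch(a, x):
    # bisect_left returns the leftmost insertion point for x in sorted list a.
    i = bisect.bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    # Assumption: a missing element is signalled with ValueError.
    raise ValueError("element not found: %r" % (x,))


index = bisect_index_sketch([1, 2, 2, 3], 2)
```

Because the search is binary, the list must already be sorted; an unsorted list gives unreliable results rather than an error.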

class genometools.misc.bool(x) → bool

Returns True when the argument x is true, False otherwise. The builtins True and False are the only two instances of the class bool. The class bool is a subclass of the class int, and cannot be subclassed.

class genometools.misc.buffer(object[, offset[, size]])

Create a new buffer object which references the given object. The buffer will reference a slice of the target object from the start of the object (or at the specified offset). The slice will extend to the end of the target object (or with the specified size).

class genometools.misc.bytearray(iterable_of_ints) → bytearray.

bytearray(string, encoding[, errors]) -> bytearray. bytearray(bytes_or_bytearray) -> mutable copy of bytes_or_bytearray. bytearray(memory_view) -> bytearray.

Construct a mutable bytearray object from:
  • an iterable yielding integers in range(256)
  • a text string encoded using the specified encoding
  • a bytes or a bytearray object
  • any object implementing the buffer API.

bytearray(int) -> bytearray.

Construct a zero-initialized bytearray of the given length.

append(int) → None

Append a single item to the end of B.

capitalize() → copy of B

Return a copy of B with only its first character capitalized (ASCII) and the rest lower-cased.

center(width[, fillchar]) → copy of B

Return B centered in a string of length width. Padding is done using the specified fill character (default is a space).

count(sub[, start[, end]]) → int

Return the number of non-overlapping occurrences of subsection sub in bytes B[start:end]. Optional arguments start and end are interpreted as in slice notation.

decode([encoding[, errors]]) → unicode object.

Decodes B using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is ‘strict’ meaning that encoding errors raise a UnicodeDecodeError. Other possible values are ‘ignore’ and ‘replace’ as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.

endswith(suffix[, start[, end]]) → bool

Return True if B ends with the specified suffix, False otherwise. With optional start, test B beginning at that position. With optional end, stop comparing B at that position. suffix can also be a tuple of strings to try.

expandtabs([tabsize]) → copy of B

Return a copy of B where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.

extend(iterable int) → None

Append all the elements from the iterator or sequence to the end of B.

find(sub[, start[, end]]) → int

Return the lowest index in B where subsection sub is found, such that sub is contained within B[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

fromhex(string) → bytearray

Create a bytearray object from a string of hexadecimal numbers. Spaces between two numbers are accepted. Example: bytearray.fromhex('B9 01EF') -> bytearray(b'\xb9\x01\xef').

index(sub[, start[, end]]) → int

Like B.find() but raise ValueError when the subsection is not found.

insert(index, int) → None

Insert a single item into the bytearray before the given index.

isalnum() → bool

Return True if all characters in B are alphanumeric and there is at least one character in B, False otherwise.

isalpha() → bool

Return True if all characters in B are alphabetic and there is at least one character in B, False otherwise.

isdigit() → bool

Return True if all characters in B are digits and there is at least one character in B, False otherwise.

islower() → bool

Return True if all cased characters in B are lowercase and there is at least one cased character in B, False otherwise.

isspace() → bool

Return True if all characters in B are whitespace and there is at least one character in B, False otherwise.

istitle() → bool

Return True if B is a titlecased string and there is at least one character in B, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.

isupper() → bool

Return True if all cased characters in B are uppercase and there is at least one cased character in B, False otherwise.

join(iterable_of_bytes) → bytes

Concatenates any number of bytearray objects, with B in between each pair.

ljust(width[, fillchar]) → copy of B

Return B left justified in a string of length width. Padding is done using the specified fill character (default is a space).

lower() → copy of B

Return a copy of B with all ASCII characters converted to lowercase.

lstrip([bytes]) → bytearray

Strip leading bytes contained in the argument. If the argument is omitted, strip leading ASCII whitespace.

partition(sep) -> (head, sep, tail)

Searches for the separator sep in B, and returns the part before it, the separator itself, and the part after it. If the separator is not found, returns B and two empty bytearray objects.

pop([index]) → int

Remove and return a single item from B. If no index argument is given, will pop the last value.

remove(int) → None

Remove the first occurrence of a value in B.

replace(old, new[, count]) → bytes

Return a copy of B with all occurrences of subsection old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

reverse() → None

Reverse the order of the values in B in place.

rfind(sub[, start[, end]]) → int

Return the highest index in B where subsection sub is found, such that sub is contained within B[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[, start[, end]]) → int

Like B.rfind() but raise ValueError when the subsection is not found.

rjust(width[, fillchar]) → copy of B

Return B right justified in a string of length width. Padding is done using the specified fill character (default is a space).

rpartition(sep) -> (head, sep, tail)

Searches for the separator sep in B, starting at the end of B, and returns the part before it, the separator itself, and the part after it. If the separator is not found, returns two empty bytearray objects and B.

rsplit(sep[, maxsplit]) → list of bytearray

Return a list of the sections in B, using sep as the delimiter, starting at the end of B and working to the front. If sep is not given, B is split on ASCII whitespace characters (space, tab, return, newline, formfeed, vertical tab). If maxsplit is given, at most maxsplit splits are done.

rstrip([bytes]) → bytearray

Strip trailing bytes contained in the argument. If the argument is omitted, strip trailing ASCII whitespace.

split([sep[, maxsplit]]) → list of bytearray

Return a list of the sections in B, using sep as the delimiter. If sep is not given, B is split on ASCII whitespace characters (space, tab, return, newline, formfeed, vertical tab). If maxsplit is given, at most maxsplit splits are done.

splitlines(keepends=False) → list of lines

Return a list of the lines in B, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[, start[, end]]) → bool

Return True if B starts with the specified prefix, False otherwise. With optional start, test B beginning at that position. With optional end, stop comparing B at that position. prefix can also be a tuple of strings to try.

strip([bytes]) → bytearray

Strip leading and trailing bytes contained in the argument. If the argument is omitted, strip ASCII whitespace.

swapcase() → copy of B

Return a copy of B with uppercase ASCII characters converted to lowercase ASCII and vice versa.

title() → copy of B

Return a titlecased version of B, i.e. ASCII words start with uppercase characters, all remaining cased characters have lowercase.

translate(table[, deletechars]) → bytearray

Return a copy of B, where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table, which must be a bytes object of length 256.

upper() → copy of B

Return a copy of B with all ASCII characters converted to uppercase.

zfill(width) → copy of B

Pad a numeric string B with zeros on the left, to fill a field of the specified width. B is never truncated.

genometools.misc.bytes

alias of newbytes

genometools.misc.callable(object) → bool

Return whether the object is callable (i.e., some kind of function). Note that classes are callable, as are instances with a __call__() method.

genometools.misc.chr()

unichr(i) -> Unicode character

Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

class genometools.misc.classmethod(function) → method

Convert a function to be a class method.

A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom:

    class C:
        @classmethod
        def f(cls, arg1, arg2, ...):
            ...

It can be called either on the class (e.g. C.f()) or on an instance (e.g. C().f()). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument.

Class methods are different than C++ or Java static methods. If you want those, see the staticmethod builtin.

genometools.misc.cmp(x, y) → integer

Return negative if x<y, zero if x==y, positive if x>y.

genometools.misc.coerce(x, y) -> (x1, y1)

Return a tuple consisting of the two numeric arguments converted to a common type, using the same rules as used by arithmetic operations. If coercion is not possible, raise TypeError.

genometools.misc.compile(source, filename, mode[, flags[, dont_inherit]]) → code object

Compile the source string (a Python module, statement or expression) into a code object that can be executed by the exec statement or eval(). The filename will be used for run-time error messages. The mode must be ‘exec’ to compile a module, ‘single’ to compile a single (interactive) statement, or ‘eval’ to compile an expression. The flags argument, if present, controls which future statements influence the compilation of the code. The dont_inherit argument, if non-zero, stops the compilation inheriting the effects of any future statements in effect in the code calling compile; if absent or zero these statements do influence the compilation, in addition to any features explicitly specified.

class genometools.misc.complex(real[, imag]) → complex number

Create a complex number from a real part and an optional imaginary part. This is equivalent to (real + imag*1j) where imag defaults to 0.

conjugate() → complex

Return the complex conjugate of its argument. (3-4j).conjugate() == 3+4j.

imag

the imaginary part of a complex number

real

the real part of a complex number

genometools.misc.configure_logger(name, log_stream=<open file '<stdout>', mode 'w'>, log_file=None, log_level=20, keep_old_handlers=False, propagate=False)[source]

Configures and returns a logger.

This function serves to simplify the configuration of a logger that writes to a file and/or to a stream (e.g., stdout).

Parameters:
  • name (str) – The name of the logger. Typically set to __name__.
  • log_stream (a stream object, optional) – The stream to write log messages to. If None, do not write to any stream. The default value is sys.stdout.
  • log_file (str, optional) – The path of a file to write log messages to. If None, do not write to any file. The default value is None.
  • log_level (int, optional) – A logging level as defined in Python’s logging module. The default value is logging.INFO.
  • keep_old_handlers (bool, optional) – If set to True, keep any pre-existing handlers that are attached to the logger. The default value is False.
  • propagate (bool, optional) – If set to True, propagate the logger’s messages to the parent logger. The default value is False.
Returns:

The logger.

Return type:

logging.Logger

Notes

Note that if log_stream and log_file are both None, no handlers will be created.
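
The parameters above map onto the standard logging module roughly as follows. This is a minimal sketch under stated assumptions, not the library's code: the name configure_logger_sketch and the message format string are hypothetical.

```python
import io
import logging
import sys


def configure_logger_sketch(name, log_stream=sys.stdout, log_file=None,
                            log_level=logging.INFO, keep_old_handlers=False,
                            propagate=False):
    logger = logging.getLogger(name)
    logger.setLevel(log_level)
    logger.propagate = propagate
    if not keep_old_handlers:
        # Detach handlers left over from an earlier configuration.
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
    # Assumed message format; the library's actual format may differ.
    formatter = logging.Formatter("[%(asctime)s] %(levelname)s: %(message)s")
    if log_stream is not None:
        stream_handler = logging.StreamHandler(log_stream)
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


# Usage: log to an in-memory stream instead of stdout.
buffer = io.StringIO()
logger = configure_logger_sketch("example", log_stream=buffer)
logger.info("started")
logger.debug("not shown at the INFO level")
```

With both log_stream and log_file set to None, the sketch attaches no handlers, matching the note above.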

genometools.misc.delattr(object, name)

Delete a named attribute on an object; delattr(x, ‘y’) is equivalent to “del x.y”.

genometools.misc.dict

alias of newdict

genometools.misc.dir([object]) → list of strings

If called without an argument, return the names in the current scope. Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it. If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns:

  • for a module object: the module’s attributes.
  • for a class object: its attributes, and recursively the attributes of its bases.
  • for any other object: its attributes, its class’s attributes, and recursively the attributes of its class’s base classes.
genometools.misc.divmod(x, y) -> (quotient, remainder)

Return the tuple (x//y, x%y). Invariant: div*y + mod == x.
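
The invariant is easiest to see with a negative operand, where floor division differs from truncation. A short check:

```python
x, y = -7, 2
quotient, remainder = divmod(x, y)

# Floor division rounds toward negative infinity, so the remainder
# takes the sign of the divisor and the invariant still holds.
assert quotient * y + remainder == x
```
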

class genometools.misc.enumerate(iterable[, start]) → iterator for index, value of iterable

Return an enumerate object. iterable must be another object that supports iteration. The enumerate object yields pairs containing a count (from start, which defaults to zero) and a value yielded by the iterable argument. enumerate is useful for obtaining an indexed list:

(0, seq[0]), (1, seq[1]), (2, seq[2]), ...
next
genometools.misc.eval(source[, globals[, locals]]) → value

Evaluate the source in the context of globals and locals. The source may be a string representing a Python expression or a code object as returned by compile(). The globals must be a dictionary and locals can be any mapping, defaulting to the current globals and locals. If only globals is given, locals defaults to it.

genometools.misc.execfile(filename[, globals[, locals]])

Read and execute a Python script from a file. The globals and locals are dictionaries, defaulting to the current globals and locals. If only globals is given, locals defaults to it.

class genometools.misc.file(name[, mode[, buffering]]) → file object

Open a file. The mode can be ‘r’, ‘w’ or ‘a’ for reading (default), writing or appending. The file will be created if it doesn’t exist when opened for writing or appending; it will be truncated when opened for writing. Add a ‘b’ to the mode for binary files. Add a ‘+’ to the mode to allow simultaneous reading and writing. If the buffering argument is given, 0 means unbuffered, 1 means line buffered, and larger numbers specify the buffer size. The preferred way to open a file is with the builtin open() function. Add a ‘U’ to mode to open the file for input with universal newline support. Any line ending in the input file will be seen as a ‘\n’ in Python. Also, a file so opened gains the attribute ‘newlines’; the value for this attribute is one of None (no newline read yet), ‘\r’, ‘\n’, ‘\r\n’ or a tuple containing all the newline types seen.

‘U’ cannot be combined with ‘w’ or ‘+’ mode.

close() → None or (perhaps) an integer. Close the file.

Sets data attribute .closed to True. A closed file cannot be used for further I/O operations. close() may be called more than once without error. Some kinds of file objects (for example, opened by popen()) may return an exit status upon closing.

closed

True if the file is closed

encoding

file encoding

errors

Unicode error handler

fileno() → integer "file descriptor".

This is needed for lower-level file interfaces, such as os.read().

flush() → None. Flush the internal I/O buffer.
isatty() → true or false. True if the file is connected to a tty device.
mode

file mode (‘r’, ‘U’, ‘w’, ‘a’, possibly with ‘b’ or ‘+’ added)

name

file name

newlines

end-of-line convention used in this file

next
read([size]) → read at most size bytes, returned as a string.

If the size argument is negative or omitted, read until EOF is reached. Notice that when in non-blocking mode, less data than what was requested may be returned, even if no size parameter was given.

readinto() → Undocumented. Don't use this; it may go away.
readline([size]) → next line from the file, as a string.

Retain newline. A non-negative size argument limits the maximum number of bytes to return (an incomplete line may be returned then). Return an empty string at EOF.

readlines([size]) → list of strings, each a line from the file.

Call readline() repeatedly and return a list of the lines so read. The optional size argument, if given, is an approximate bound on the total number of bytes in the lines returned.

seek(offset[, whence]) → None. Move to new file position.

Argument offset is a byte count. Optional argument whence defaults to 0 (offset from start of file, offset should be >= 0); other values are 1 (move relative to current position, positive or negative), and 2 (move relative to end of file, usually negative, although many platforms allow seeking beyond the end of a file). If the file is opened in text mode, only offsets returned by tell() are legal. Use of other offsets causes undefined behavior. Note that not all file objects are seekable.

softspace

flag indicating that a space needs to be printed; used by print

tell() → current file position, an integer (may be a long integer).
truncate([size]) → None. Truncate the file to at most size bytes.

Size defaults to the current file position, as returned by tell().

write(str) → None. Write string str to file.

Note that due to buffering, flush() or close() may be needed before the file on disk reflects the data written.

writelines(sequence_of_strings) → None. Write the strings to the file.

Note that newlines are not added. The sequence can be any iterable object producing strings. This is equivalent to calling write() for each string.

xreadlines() → returns self.

For backward compatibility. File objects now include the performance optimizations previously implemented in the xreadlines module.

genometools.misc.filter

alias of ifilter

genometools.misc.flatten(l)[source]

Flattens a list of lists.

Parameters:l (list) – The list of lists.
Returns:The flattened list.
Return type:list
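
One common way to flatten a list of lists is itertools.chain.from_iterable. The sketch below uses the hypothetical name flatten_sketch and assumes that only one level of nesting is removed; it is not necessarily how the library implements flatten.

```python
from itertools import chain


def flatten_sketch(l):
    # Concatenate the inner lists; deeper nesting is left untouched.
    return list(chain.from_iterable(l))


flat = flatten_sketch([[1, 2], [3], [], [4, 5]])
```
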
class genometools.misc.float(x) → floating point number

Convert a string or number to a floating point number, if possible.

as_integer_ratio() -> (int, int)

Return a pair of integers, whose ratio is exactly equal to the original float and with a positive denominator. Raise OverflowError on infinities and a ValueError on NaNs.

>>> (10.0).as_integer_ratio()
(10, 1)
>>> (0.0).as_integer_ratio()
(0, 1)
>>> (-.25).as_integer_ratio()
(-1, 4)
conjugate()

Return self, the complex conjugate of any float.

fromhex(string) → float

Create a floating-point number from a hexadecimal string.

>>> float.fromhex('0x1.ffffp10')
2047.984375
>>> float.fromhex('-0x1p-1074')
-4.9406564584124654e-324

hex() → string

Return a hexadecimal representation of a floating-point number.

>>> (-0.1).hex()
'-0x1.999999999999ap-4'
>>> 3.14159.hex()
'0x1.921f9f01b866ep+1'

imag

the imaginary part of a complex number

is_integer()

Return True if the float is an integer.

real

the real part of a complex number

genometools.misc.format(value[, format_spec]) → string

Returns value.__format__(format_spec). format_spec defaults to “”.

class genometools.misc.frozenset → empty frozenset object

frozenset(iterable) -> frozenset object

Build an immutable unordered collection of unique elements.

copy()

Return a shallow copy of a set.

difference()

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

intersection()

Return the intersection of two or more sets as a new set.

(i.e. elements that are common to all of the sets.)

isdisjoint()

Return True if two sets have a null intersection.

issubset()

Report whether another set contains this set.

issuperset()

Report whether this set contains another set.

symmetric_difference()

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

union()

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

genometools.misc.ftp_download(url, download_file, if_exists=u'error', user_name=u'anonymous', password=u'', blocksize=4194304)[source]

Downloads a file from an FTP server.

Parameters:
  • url (str) – The URL of the file to download.
  • download_file (str) – The path of the local file to download to.
  • if_exists (str, optional) –
    Desired behavior when the download file already exists. One of:
    ‘error’ - Raise an OSError.
    ‘skip’ - Do nothing, only report a warning.
    ‘overwrite’ - Overwrite the file, reporting a warning.

    Default: ‘error’.

  • user_name (str, optional) – The user name to use for logging into the FTP server. [‘anonymous’]
  • password (str, optional) – The password to use for logging into the FTP server. [‘’]
  • blocksize (int, optional) – The blocksize (in bytes) to use for downloading. [4194304]
Returns:

Return type:

None
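
The three if_exists policies can be pictured as a check that runs before any data is transferred. The helper should_download below is hypothetical and only illustrates the documented behavior; the FTP transfer itself is omitted, and rejecting unknown policy values with ValueError is an assumption.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def should_download(download_file, if_exists="error"):
    """Hypothetical helper: decide whether a download should proceed."""
    if if_exists not in ("error", "skip", "overwrite"):
        # Assumption: unknown policies are rejected up front.
        raise ValueError("invalid if_exists value: %r" % (if_exists,))
    if not os.path.isfile(download_file):
        return True
    if if_exists == "error":
        raise OSError("file already exists: %s" % download_file)
    if if_exists == "skip":
        logger.warning("File exists, skipping download: %s", download_file)
        return False
    logger.warning("File exists, overwriting: %s", download_file)
    return True


# Usage: an existing file is skipped or overwritten depending on the policy.
with tempfile.NamedTemporaryFile() as tmp:
    skipped = not should_download(tmp.name, if_exists="skip")
    overwritten = should_download(tmp.name, if_exists="overwrite")
```
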

genometools.misc.get_file_checksum(path)[source]

Get the checksum of a file (using sum, Unix-only).

This function is only available on certain platforms.

Parameters:path (str) – The path of the file.
Returns:The checksum.
Return type:int
Raises:IOError – If the file does not exist.
genometools.misc.get_file_md5sum(path)[source]

Calculate the MD5 hash for a file.
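
An MD5 hash of a file is usually computed block by block with hashlib, so that large files need not fit in memory. The sketch below is illustrative only: the name file_md5sum_sketch, the blocksize parameter, and the hexadecimal string return value are assumptions, since the return type is not documented here.

```python
import hashlib
import os
import tempfile


def file_md5sum_sketch(path, blocksize=65536):
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: fh.read(blocksize), b""):
            md5.update(block)
    # Assumption: the digest is returned as a hexadecimal string.
    return md5.hexdigest()


# Usage: hash a temporary file, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT" * 1000)
digest = file_md5sum_sketch(tmp.name)
os.remove(tmp.name)
```
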

genometools.misc.get_file_size(path)[source]

Get the size of a file in bytes.

Parameters:

path (str) – The path of the file.

Returns:

The size of the file in bytes.

Return type:

int

Raises:
  • IOError – If the file does not exist.
  • OSError – If a file system error occurs.
genometools.misc.get_fize_size(path)[source]

Get the size of a file.

Parameters:path (str) – The file path.
Returns:The size of the file in bytes.
Return type:int
genometools.misc.get_logger(name=u'', log_stream=None, log_file=None, quiet=False, verbose=False)[source]

Convenience function for getting a logger.

genometools.misc.get_url_file_name(url)[source]

Get the file name from a URL.

Parameters:url (str) – The URL.
Returns:The file name.
Return type:str
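
Extracting a file name from a URL typically means taking the last component of the URL path. The sketch below uses the Python 3 urllib.parse module and the hypothetical name url_file_name_sketch; ignoring the query string and fragment is an assumption about the intended behavior.

```python
import posixpath
from urllib.parse import urlsplit


def url_file_name_sketch(url):
    # Take the last path component, ignoring any query string or fragment.
    return posixpath.basename(urlsplit(url).path)


name = url_file_name_sketch("ftp://ftp.example.org/pub/data/genes.gtf.gz")
```
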
genometools.misc.get_url_size(url)[source]

Get the size of a URL.

Note: Uses requests, so it does not work for FTP URLs.

Source: StackOverflow user “Burhan Khalid”. (http://stackoverflow.com/a/24585314/5651021)

Parameters:url (str) – The URL.
Returns:The size of the URL in bytes.
Return type:int
genometools.misc.getattr(object, name[, default]) → value

Get a named attribute from an object; getattr(x, ‘y’) is equivalent to x.y. When a default argument is given, it is returned when the attribute doesn’t exist; without it, an exception is raised in that case.

genometools.misc.globals() → dictionary

Return the dictionary containing the current scope’s global variables.

genometools.misc.gzip_open_text(path, encoding=None)[source]

Opens a plain-text file that may be gzip’ed.

Parameters:
  • path (str) – The file.
  • encoding (str, optional) – The encoding to use.
Returns:

A file-like object.

Return type:

file-like

Notes

Generally, reading gzip’ed files with gzip.open is very slow, and it is preferable to pipe the file into the python script using gunzip -c. The script then reads the file from stdin.
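
A reader that accepts both plain and gzip’ed text can be sketched as follows. The name open_text_maybe_gzip is hypothetical, and detecting gzip files by their two-byte magic number is an assumption; the library may instead rely on the file extension or another test. The sketch targets Python 3, where gzip.open supports text mode.

```python
import gzip
import io
import os
import tempfile


def open_text_maybe_gzip(path, encoding=None):
    # Assumption: gzip files are recognized by their magic number 1f 8b.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding)
    return io.open(path, "r", encoding=encoding)


# Usage: write a small gzip'ed file, then read it back as text.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "example.txt.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
    fh.write("chr1\t100\n")
with open_text_maybe_gzip(gz_path, encoding="utf-8") as fh:
    first_line = fh.readline()
```
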

genometools.misc.hasattr(object, name) → bool

Return whether the object has an attribute with the given name. (This is done by calling getattr(object, name) and catching exceptions.)

genometools.misc.hash(object) → integer

Return a hash value for the object. Two objects with the same value have the same hash value. The reverse is not necessarily true, but likely.

genometools.misc.hex(number) → string

Return the hexadecimal representation of an integer or long integer.

genometools.misc.http_download(url, download_file, overwrite=False, raise_http_exception=True)[source]

Download a file over HTTP(S).

See: http://stackoverflow.com/a/13137873/5651021

Parameters:
  • url (str) – The URL.
  • download_file (str) – The path of the local file to write to.
  • overwrite (bool, optional) – Whether to overwrite an existing file (if present). [False]
  • raise_http_exception (bool, optional) – Whether to raise an exception if there is an HTTP error. [True]
Raises:
  • OSError – If the file already exists and overwrite is set to False.
  • requests.HTTPError – If an HTTP error occurred and raise_http_exception was set to True.
genometools.misc.id(object) → integer

Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (Hint: it’s the object’s memory address.)

genometools.misc.input()

raw_input([prompt]) -> string

Read a string from standard input. The trailing newline is stripped. If the user hits EOF (Unix: Ctl-D, Windows: Ctl-Z+Return), raise EOFError. On Unix, GNU readline is used if enabled. The prompt string, if given, is printed without a trailing newline before reading.

genometools.misc.int

alias of newint

genometools.misc.intern(string) → string

“Intern” the given string. This enters the string in the (global) table of interned strings whose purpose is to speed up dictionary lookups. Return the string itself or the previously interned string object with the same value.

genometools.misc.is_writable(path)[source]

Tests if a file is writable.

genometools.misc.isinstance(object, class-or-type-or-tuple) → bool

Return whether an object is an instance of a class or of a subclass thereof. With a type as second argument, return whether that is the object’s type. The form using a tuple, isinstance(x, (A, B, ...)), is a shortcut for isinstance(x, A) or isinstance(x, B) or ... (etc.).

genometools.misc.issubclass(C, B) → bool

Return whether class C is a subclass (i.e., a derived class) of class B. With a tuple as the second argument, issubclass(X, (A, B, ...)) is a shortcut for issubclass(X, A) or issubclass(X, B) or ... (etc.).

genometools.misc.iter(collection) → iterator

iter(callable, sentinel) -> iterator

Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence. In the second form, the callable is called until it returns the sentinel.
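The two forms can be illustrated with a short sketch. The in-memory stream and the chunk size of 4 are arbitrary choices for the example.

```python
import io

stream = io.BytesIO(b"ACGTACGTAC")

# First form: iter(collection) walks the items of a sequence.
letters = list(iter("ACGT"))

# Second form: iter(callable, sentinel) calls stream.read(4) repeatedly
# until it returns the sentinel b"" (which read() returns at end of stream).
chunks = list(iter(lambda: stream.read(4), b""))
```

The second form is a common way to read a file in fixed-size chunks without writing an explicit while loop.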

genometools.misc.len(object) → integer

Return the number of items of a sequence or collection.

genometools.misc.list

alias of newlist

genometools.misc.locals() → dictionary

Update and return a dictionary containing the current scope’s local variables.

class genometools.misc.long(x=0) → long

long(x, base=10) -> long

Convert a number or string to a long integer, or return 0L if no arguments are given. If x is floating point, the conversion truncates towards zero.

If x is not a number or if base is given, then x must be a string or Unicode object representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.

>>> int('0b100', base=0)
4L

bit_length() → int or long

Number of bits necessary to represent self in binary.

>>> bin(37L)
'0b100101'
>>> (37L).bit_length()
6

conjugate()

Returns self, the complex conjugate of any long.

denominator

the denominator of a rational number in lowest terms

imag

the imaginary part of a complex number

numerator

the numerator of a rational number in lowest terms

real

the real part of a complex number

genometools.misc.make_sure_dir_exists(dir_, create_subfolders=False)[source]

Ensures that a directory exists.

Adapted from StackOverflow users “Bengt” and “Heikki Toivonen” (http://stackoverflow.com/a/5032238).

Parameters:
  • dir_ (str) – The directory path.
  • create_subfolders (bool, optional) – Whether to create any missing parent directories. [False]
Returns:

Return type:

None

Raises:

OSError – If a file system error occurs.
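The documented behavior can be approximated with the standard library. The sketch below is an assumption about how such a helper might work, not the genometools implementation; the name ensure_dir is hypothetical.

```python
import errno
import os


def ensure_dir(dir_, create_subfolders=False):
    """Create dir_ if it is missing; do nothing if it already exists."""
    try:
        if create_subfolders:
            os.makedirs(dir_)  # also creates missing parent directories
        else:
            os.mkdir(dir_)     # fails if the parent directory is missing
    except OSError as exc:
        # An already existing directory is fine; anything else is re-raised.
        if exc.errno != errno.EEXIST or not os.path.isdir(dir_):
            raise
```

Catching the error instead of checking os.path.isdir first avoids a race in which another process creates the directory between the check and the call.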

genometools.misc.map

alias of imap

genometools.misc.max(iterable[, key=func]) → value

max(a, b, c, ...[, key=func]) -> value

With a single iterable argument, return its largest item. With two or more arguments, return the largest argument.

class genometools.misc.memoryview(object)

Create a new memoryview object which references the given object.

genometools.misc.min(iterable[, key=func]) → value

min(a, b, c, ...[, key=func]) -> value

With a single iterable argument, return its smallest item. With two or more arguments, return the smallest argument.

genometools.misc.next(iterator[, default])

Return the next item from the iterator. If default is given and the iterator is exhausted, it is returned instead of raising StopIteration.

genometools.misc.object

alias of newobject

genometools.misc.oct(number) → string

Return the octal representation of an integer or long integer.

genometools.misc.open()

Open file and return a stream. Raise IOError upon failure.

file is either a text or byte string giving the name (and the path if the file isn’t in the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless closefd is set to False.)

mode is an optional string that specifies the mode in which the file is opened. It defaults to ‘r’ which means open for reading in text mode. Other common values are ‘w’ for writing (truncating the file if it already exists), and ‘a’ for appending (which on some Unix systems, means that all writes append to the end of the file regardless of the current seek position). In text mode, if encoding is not specified the encoding used is platform dependent. (For reading and writing raw bytes use binary mode and leave encoding unspecified.) The available modes are:

Character  Meaning
‘r’        open for reading (default)
‘w’        open for writing, truncating the file first
‘a’        open for writing, appending to the end of the file if it exists
‘b’        binary mode
‘t’        text mode (default)
‘+’        open a disk file for updating (reading and writing)
‘U’        universal newline mode (for backwards compatibility; unneeded for new code)

The default mode is ‘rt’ (open for reading text). For binary random access, the mode ‘w+b’ opens and truncates the file to 0 bytes, while ‘r+b’ opens the file without truncation.

Python distinguishes between files opened in binary and text modes, even when the underlying operating system doesn’t. Files opened in binary mode (appending ‘b’ to the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when ‘t’ is appended to the mode argument), the contents of the file are returned as strings, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.

buffering is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size of a fixed-size chunk buffer. When no buffering argument is given, the default buffering policy works as follows:

  • Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.
  • “Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent, but any encoding supported by Python can be passed. See the codecs module for the list of supported encodings.

errors is an optional string that specifies how encoding errors are to be handled—this argument should not be used in binary mode. Pass ‘strict’ to raise a ValueError exception if there is an encoding error (the default of None has the same effect), or pass ‘ignore’ to ignore errors. (Note that ignoring encoding errors can lead to data loss.) See the documentation for codecs.register for a list of the permitted encoding error strings.

newline controls how universal newlines works (it only applies to text mode). It can be None, ‘’, ‘\n’, ‘\r’, and ‘\r\n’. It works as follows:

  • On input, if newline is None, universal newlines mode is enabled. Lines in the input can end in ‘\n’, ‘\r’, or ‘\r\n’, and these are translated into ‘\n’ before being returned to the caller. If it is ‘’, universal newline mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
  • On output, if newline is None, any ‘\n’ characters written are translated to the system default line separator, os.linesep. If newline is ‘’, no translation takes place. If newline is any of the other legal values, any ‘\n’ characters written are translated to the given string.
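The input side of the newline rules can be checked with a small sketch. It uses io.open (the same function under Python 3's built-in open) and a temporary file whose contents are chosen only for illustration.

```python
import io
import os
import tempfile

# Write raw bytes so the file holds one Windows and one Unix line ending.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"a\r\nb\n")

# newline=None (the default): line endings are translated to "\n".
with io.open(path, "r", encoding="ascii", newline=None) as handle:
    translated = handle.read()

# newline="": lines are still split on any ending, but returned untranslated.
with io.open(path, "r", encoding="ascii", newline="") as handle:
    untranslated = handle.readlines()

os.remove(path)
```

Passing newline="" is what the csv module expects, since it does its own line-ending handling.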

If closefd is False, the underlying file descriptor will be kept open when the file is closed. This does not work when a file name is given and must be True in that case.

open() returns a file object whose type depends on the mode, and through which the standard file operations such as reading and writing are performed. When open() is used to open a file in a text mode (‘w’, ‘r’, ‘wt’, ‘rt’, etc.), it returns a TextIOWrapper. When used to open a file in a binary mode, the returned class varies: in read binary mode, it returns a BufferedReader; in write binary and append binary modes, it returns a BufferedWriter, and in read/write mode, it returns a BufferedRandom.
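The return types described above can be confirmed with a short sketch, using io.open and a throwaway temporary file.

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with io.open(path, "w", encoding="utf-8") as handle:
    text_type = type(handle)      # text mode
with io.open(path, "rb") as handle:
    read_type = type(handle)      # read binary mode
with io.open(path, "ab") as handle:
    append_type = type(handle)    # append binary mode
with io.open(path, "r+b") as handle:
    update_type = type(handle)    # read/write binary mode

os.remove(path)
```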

It is also possible to use a string or bytearray as a file for both reading and writing. For strings StringIO can be used like a file opened in a text mode, and for bytes a BytesIO can be used like a file opened in a binary mode.

genometools.misc.ord(c) → integer

Return the integer ordinal of a one-character string.

genometools.misc.pow(x, y[, z]) → number[source]

With two arguments, equivalent to x**y. With three arguments, equivalent to (x**y) % z, but may be more efficient (e.g. for ints).
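A brief sketch of the two- and three-argument forms; the numbers are arbitrary.

```python
base, exponent, modulus = 7, 128, 13

# Two-argument form, followed by an explicit modulo: builds the full
# (very large) power before reducing it.
slow = (base ** exponent) % modulus

# Three-argument form: same result, but intermediate values stay small.
fast = pow(base, exponent, modulus)
```

For large exponents the three-argument form avoids building the full power in memory.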

genometools.misc.print(value, ..., sep=' ', end='\n', file=sys.stdout)

Prints the values to a stream, or to sys.stdout by default. Optional keyword arguments: file: a file-like object (stream); defaults to the current sys.stdout. sep: string inserted between values, default a space. end: string appended after the last value, default a newline.

class genometools.misc.property(fget=None, fset=None, fdel=None, doc=None) → property attribute

fget is a function to be used for getting an attribute value, and likewise fset is a function for setting, and fdel a function for del’ing, an attribute. Typical use is to define a managed attribute x:

class C(object):
    def getx(self): return self._x
    def setx(self, value): self._x = value
    def delx(self): del self._x
    x = property(getx, setx, delx, "I'm the 'x' property.")

Decorators make defining new properties or modifying existing ones easy:

class C(object):
    @property
    def x(self):
        "I am the 'x' property."
        return self._x
    @x.setter
    def x(self, value):
        self._x = value
    @x.deleter
    def x(self):
        del self._x

deleter()

Descriptor to change the deleter on a property.

getter()

Descriptor to change the getter on a property.

setter()

Descriptor to change the setter on a property.

genometools.misc.range

alias of newrange

genometools.misc.raw_input([prompt]) → string

Read a string from standard input. The trailing newline is stripped. If the user hits EOF (Unix: Ctl-D, Windows: Ctl-Z+Return), raise EOFError. On Unix, GNU readline is used if enabled. The prompt string, if given, is printed without a trailing newline before reading.

genometools.misc.read_all(path, encoding=u'UTF-8')[source]

Reads a tab-delimited text file.

The file can either be uncompressed or gzip’ed.

Parameters:
  • path (str) – The path of the file.
  • encoding (str, optional) – The file encoding. [‘UTF-8’]
Returns:

A list in which each element contains the contents of a row (as a tuple).

Return type:

List of (tuple of str)
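A reader with this documented behavior could look roughly like the sketch below. It is an assumption, not the genometools source: the name read_tsv is hypothetical, gzip files are detected by their magic number, and Python 3 is assumed for the text-mode gzip.open call.

```python
import csv
import gzip
import io


def read_tsv(path, encoding="UTF-8"):
    """Read a plain or gzip'ed tab-delimited file into a list of tuples."""
    # Assumption: a gzip file is recognized by its two-byte magic number.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == b"\x1f\x8b"
    if is_gzip:
        handle = gzip.open(path, "rt", encoding=encoding, newline="")
    else:
        handle = io.open(path, "r", encoding=encoding, newline="")
    with handle:
        return [tuple(row) for row in csv.reader(handle, dialect="excel-tab")]
```

Keeping only the first field of each row, for example [row[0] for row in read_tsv(path)], gives the kind of result documented for read_single.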

genometools.misc.read_single(path, encoding=u'UTF-8')[source]

Reads the first column of a tab-delimited text file.

The file can either be uncompressed or gzip’ed.

Parameters:
  • path (str) – The path of the file.
  • encoding (str, optional) – The file encoding. [‘UTF-8’]
Returns:

A list containing the elements in the first column.

Return type:

List of str

genometools.misc.reduce(function, sequence[, initial]) → value

Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
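The example from the description, plus the effect of initial, as a runnable sketch. functools.reduce is used so that the code also runs on Python 3, where reduce is no longer a builtin.

```python
from functools import reduce


def add(x, y):
    return x + y


total = reduce(add, [1, 2, 3, 4, 5])     # ((((1+2)+3)+4)+5)
with_initial = reduce(add, [1, 2, 3], 10)  # (((10+1)+2)+3)
empty_default = reduce(add, [], 0)         # initial is returned as-is
```

Without initial, reducing an empty sequence raises TypeError.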

genometools.misc.reload(module) → module

Reload the module. The module must have been successfully imported before.

genometools.misc.repr(object) → string

Return the canonical string representation of the object. For most object types, eval(repr(object)) == object.

class genometools.misc.reversed(sequence) → reverse iterator over values of the sequence

Return a reverse iterator

next
genometools.misc.round(number, ndigits=None)

See Python 3 documentation: uses Banker’s Rounding.

Delegates to the __round__ method if for some reason this exists.

If not, rounds a number to a given precision in decimal digits (default 0 digits). This returns an int when called with one argument, otherwise the same type as the number. ndigits may be negative.

See the test_round method in future/tests/test_builtins.py for examples.
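Banker's Rounding sends exact halves to the nearest even value. The sketch below uses the Python 3 builtin round, whose behavior this function is documented to follow; under Python 2's native round the results differ.

```python
# Exact halves go to the nearest even integer rather than always up.
halves = [round(x) for x in (0.5, 1.5, 2.5, 3.5)]

# One argument returns an int; with ndigits the input type is kept.
one_arg = round(2.5)
two_args = round(2.5, 0)

# Negative ndigits rounds to tens, hundreds, and so on.
hundreds = [round(n, -2) for n in (1250, 1350)]
```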

class genometools.misc.set → new empty set object

set(iterable) -> new set object

Build an unordered collection of unique elements.

add()

Add an element to a set.

This has no effect if the element is already present.

clear()

Remove all elements from this set.

copy()

Return a shallow copy of a set.

difference()

Return the difference of two or more sets as a new set.

(i.e. all elements that are in this set but not the others.)

difference_update()

Remove all elements of another set from this set.

discard()

Remove an element from a set if it is a member.

If the element is not a member, do nothing.

intersection()

Return the intersection of two or more sets as a new set.

(i.e. elements that are common to all of the sets.)

intersection_update()

Update a set with the intersection of itself and another.

isdisjoint()

Return True if two sets have a null intersection.

issubset()

Report whether another set contains this set.

issuperset()

Report whether this set contains another set.

pop()

Remove and return an arbitrary set element. Raises KeyError if the set is empty.

remove()

Remove an element from a set; it must be a member.

If the element is not a member, raise a KeyError.

symmetric_difference()

Return the symmetric difference of two sets as a new set.

(i.e. all elements that are in exactly one of the sets.)

symmetric_difference_update()

Update a set with the symmetric difference of itself and another.

union()

Return the union of sets as a new set.

(i.e. all elements that are in either set.)

update()

Update a set with the union of itself and others.
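A short sketch of the most common set operations listed above; the gene names are arbitrary sample data.

```python
a = {"TP53", "BRCA1", "EGFR"}
b = {"EGFR", "KRAS"}

common = a.intersection(b)              # in both sets
only_a = a.difference(b)                # in a but not in b
either = a.union(b)                     # in a, in b, or in both
exactly_one = a.symmetric_difference(b)  # in exactly one of the sets

# discard() silently ignores a missing element; remove() raises KeyError.
a.discard("MYC")
try:
    a.remove("MYC")
    removed = True
except KeyError:
    removed = False
```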

genometools.misc.setattr(object, name, value)

Set a named attribute on an object; setattr(x, ‘y’, v) is equivalent to “x.y = v”.

class genometools.misc.slice(stop)

slice(start, stop[, step])

Create a slice object. This is used for extended slicing (e.g. a[0:10:2]).

indices(len) -> (start, stop, stride)

Assuming a sequence of length len, calculate the start and stop indices, and the stride length of the extended slice described by S. Out of bounds indices are clipped in a manner consistent with the handling of normal slices.
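The clipping performed by indices() can be seen in a small sketch; the string is arbitrary sample data.

```python
data = "ACGTA"

# The slice asks for positions 0..9 in steps of 2, but data has length 5.
s = slice(0, 10, 2)
bounds = s.indices(len(data))  # (start, stop, stride) clipped to the length

# The clipped bounds can be fed to range() to enumerate the positions.
picked = [data[i] for i in range(*bounds)]
```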

genometools.misc.smart_open_read(*args, **kwds)[source]

Open a file for reading or return stdin.

Adapted from StackOverflow user “Wolph” (http://stackoverflow.com/a/17603000).

genometools.misc.smart_open_write(*args, **kwds)[source]

Open a file for writing or return stdout.

Adapted from StackOverflow user “Wolph” (http://stackoverflow.com/a/17603000).
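The pattern behind these two helpers can be sketched as a context manager. The signature is not documented above, so the details here are assumptions: the name open_write_or_stdout is hypothetical, and a path of None or "-" is taken to select standard output.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def open_write_or_stdout(path=None, encoding="UTF-8"):
    """Yield a text handle for path, or sys.stdout if path is None or '-'."""
    if path is None or path == "-":
        # Standard output belongs to the interpreter, so it is not closed.
        yield sys.stdout
    else:
        handle = io.open(path, "w", encoding=encoding)
        try:
            yield handle
        finally:
            handle.close()
```

A caller can then write "with open_write_or_stdout(path) as out:" without caring whether the output goes to a file or to the terminal. The reading counterpart would yield sys.stdin in the same way.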

genometools.misc.sorted()

sorted(iterable, cmp=None, key=None, reverse=False) -> new sorted list

class genometools.misc.staticmethod(function) → method

Convert a function to be a static method.

A static method does not receive an implicit first argument. To declare a static method, use this idiom:

class C:
    @staticmethod
    def f(arg1, arg2, ...):
        ...

It can be called either on the class (e.g. C.f()) or on an instance (e.g. C().f()). The instance is ignored except for its class.

Static methods in Python are similar to those found in Java or C++. For a more advanced concept, see the classmethod builtin.

genometools.misc.str

alias of newstr

genometools.misc.sum(sequence[, start]) → value

Return the sum of a sequence of numbers (NOT strings) plus the value of parameter ‘start’ (which defaults to 0). When the sequence is empty, return start.

genometools.misc.super(typ=<object object>, type_or_obj=<object object>, framedepth=1)

Like builtin super(), but capable of magic.

This acts just like the builtin super() function, but if called without any arguments it attempts to infer them at runtime.

genometools.misc.test_dir_writable(path)[source]

Test if we can write to a directory.

Parameters: path (str) – The directory path.
Returns: Whether the directory is writable or not.
Return type: bool

genometools.misc.test_file_checksum(path, checksum)[source]

Test if a file has a given checksum (using sum, Unix-only).

Parameters:
  • path (str) – The path of the file.
  • checksum (int) – The checksum to compare.
Returns:

Whether or not the file has the given checksum.

Return type:

bool

Raises:

IOError – If the file does not exist.

genometools.misc.test_file_writable(path)[source]

Test if we can write to a file.

Parameters: path (str) – The file path.
Returns: Whether the file is writable or not.
Return type: bool

class genometools.misc.tuple → empty tuple

tuple(iterable) -> tuple initialized from iterable’s items

If the argument is a tuple, the return value is the same object.

count(value) → integer -- return number of occurrences of value
index(value[, start[, stop]]) → integer -- return first index of value.

Raises ValueError if the value is not present.

genometools.misc.unichr(i) → Unicode character

Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

class genometools.misc.unicode(object='') → unicode object

unicode(string[, encoding[, errors]]) -> unicode object

Create a new Unicode object from the given encoded string. encoding defaults to the current default string encoding. errors can be ‘strict’, ‘replace’ or ‘ignore’ and defaults to ‘strict’.

capitalize() → unicode

Return a capitalized version of S, i.e. make the first character have upper case and the rest lower case.

center(width[, fillchar]) → unicode

Return S centered in a Unicode string of length width. Padding is done using the specified fill character (default is a space)

count(sub[, start[, end]]) → int

Return the number of non-overlapping occurrences of substring sub in Unicode string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

decode([encoding[, errors]]) → string or unicode

Decodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is ‘strict’ meaning that encoding errors raise a UnicodeDecodeError. Other possible values are ‘ignore’ and ‘replace’ as well as any other name registered with codecs.register_error that is able to handle UnicodeDecodeErrors.

encode([encoding[, errors]]) → string or unicode

Encodes S using the codec registered for encoding. encoding defaults to the default encoding. errors may be given to set a different error handling scheme. Default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[, start[, end]]) → bool

Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs([tabsize]) → unicode

Return a copy of S where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[, start[, end]]) → int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args, **kwargs) → unicode

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{‘ and ‘}’).

index(sub[, start[, end]]) → int

Like S.find() but raise ValueError when the substring is not found.

isalnum() → bool

Return True if all characters in S are alphanumeric and there is at least one character in S, False otherwise.

isalpha() → bool

Return True if all characters in S are alphabetic and there is at least one character in S, False otherwise.

isdecimal() → bool

Return True if there are only decimal characters in S, False otherwise.

isdigit() → bool

Return True if all characters in S are digits and there is at least one character in S, False otherwise.

islower() → bool

Return True if all cased characters in S are lowercase and there is at least one cased character in S, False otherwise.

isnumeric() → bool

Return True if there are only numeric characters in S, False otherwise.

isspace() → bool

Return True if all characters in S are whitespace and there is at least one character in S, False otherwise.

istitle() → bool

Return True if S is a titlecased string and there is at least one character in S, i.e. upper- and titlecase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise.

isupper() → bool

Return True if all cased characters in S are uppercase and there is at least one cased character in S, False otherwise.

join(iterable) → unicode

Return a string which is the concatenation of the strings in the iterable. The separator between elements is S.

ljust(width[, fillchar]) → unicode

Return S left-justified in a Unicode string of length width. Padding is done using the specified fill character (default is a space).

lower() → unicode

Return a copy of the string S converted to lowercase.

lstrip([chars]) → unicode

Return a copy of the string S with leading whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is a str, it will be converted to unicode before stripping.

partition(sep) -> (head, sep, tail)

Search for the separator sep in S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return S and two empty strings.

replace(old, new[, count]) → unicode

Return a copy of S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[, start[, end]]) → int

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[, start[, end]]) → int

Like S.rfind() but raise ValueError when the substring is not found.

rjust(width[, fillchar]) → unicode

Return S right-justified in a Unicode string of length width. Padding is done using the specified fill character (default is a space).

rpartition(sep) -> (head, sep, tail)

Search for the separator sep in S, starting at the end of S, and return the part before it, the separator itself, and the part after it. If the separator is not found, return two empty strings and S.

rsplit([sep[, maxsplit]]) → list of strings

Return a list of the words in S, using sep as the delimiter string, starting at the end of the string and working to the front. If maxsplit is given, at most maxsplit splits are done. If sep is not specified, any whitespace string is a separator.

rstrip([chars]) → unicode

Return a copy of the string S with trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is a str, it will be converted to unicode before stripping.

split([sep[, maxsplit]]) → list of strings

Return a list of the words in S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.
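The difference between splitting on whitespace and splitting on an explicit separator, and between split and rsplit, can be sketched as follows. The strings are arbitrary sample data.

```python
# No separator: runs of whitespace act as one delimiter and empty
# strings at the ends are dropped.
fields_ws = u"  chr1   100\t200  ".split()

# Explicit separator: adjacent delimiters produce empty strings.
fields_tab = u"chr1\t\t200".split(u"\t")

# maxsplit limits the number of splits; rsplit counts from the right.
from_left = u"a,b,c,d".split(u",", 1)
from_right = u"a,b,c,d".rsplit(u",", 1)
```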

splitlines(keepends=False) → list of strings

Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[, start[, end]]) → bool

Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip([chars]) → unicode

Return a copy of the string S with leading and trailing whitespace removed. If chars is given and not None, remove characters in chars instead. If chars is a str, it will be converted to unicode before stripping.

swapcase() → unicode

Return a copy of S with uppercase characters converted to lowercase and vice versa.

title() → unicode

Return a titlecased version of S, i.e. words start with title case characters, all remaining cased characters have lower case.

translate(table) → unicode

Return a copy of the string S, where all characters have been mapped through the given translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
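A translation table is an ordinary dict keyed by Unicode ordinals. The sketch below builds one that complements DNA bases and deletes N; the table contents are chosen only for illustration.

```python
# Keys are ordinals; values are replacement strings, or None to delete.
table = {
    ord(u"A"): u"T",
    ord(u"T"): u"A",
    ord(u"C"): u"G",
    ord(u"G"): u"C",
    ord(u"N"): None,
}

# "x" has no entry in the table and is left untouched.
complement = u"ACGTNx".translate(table)
```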

upper() → unicode

Return a copy of S converted to uppercase.

zfill(width) → unicode

Pad a numeric string S with zeros on the left, to fill a field of the specified width. The string S is never truncated.

genometools.misc.vars([object]) → dictionary

Without arguments, equivalent to locals(). With an argument, equivalent to object.__dict__.

class genometools.misc.xrange(stop) → xrange object

xrange(start, stop[, step]) -> xrange object

Like range(), but instead of returning a list, returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient.

genometools.misc.zip

alias of izip