Table of Contents

Class: SeqMat (Bio/SubsMat/__init__.py)

A generic sequence matrix class. The key is a 2-tuple containing the letter indices of the matrix. The letters should be sorted within the tuple as (low, high), because each matrix is dealt with as a half-matrix.
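The (low, high) key convention can be illustrated with a minimal Python 3 sketch. The helpers `half_key` and `lookup` are hypothetical names used only for illustration, not part of Bio.SubsMat, and a plain dict stands in for the SeqMat's user dictionary.

```python
def half_key(a, b):
    """Return the (low, high) sorted key used by a half-matrix."""
    return (a, b) if a <= b else (b, a)


def lookup(half_mat, a, b):
    """Look up the entry for letters a and b regardless of argument order."""
    return half_mat[half_key(a, b)]


# A tiny half-matrix over the alphabet "ACG": only (low, high) keys are stored.
half_mat = {
    ("A", "A"): 5, ("A", "C"): 22, ("A", "G"): 3,
    ("C", "C"): 7, ("C", "G"): 1,
    ("G", "G"): 9,
}

print(lookup(half_mat, "C", "A"))  # 22, the same entry as ("A", "C")
```

Sorting the key once at lookup time means a half-matrix needs only N(N+1)/2 stored entries while still answering queries in either order.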

5/2001 added the following:
* Methods for subtraction, addition and multiplication of matrices
* Generation of an expected frequency table from an observed frequency matrix
* Calculation of the linear correlation coefficient between two matrices (needs Bio.Tools.statfns)
* Calculation of relative entropy, now done by the make_relative_entropy method and stored in the member self.relative_entropy
* Calculation of entropy, now done by the make_entropy method and stored in the member self.entropy
* Jensen-Shannon distance between the distributions from which the matrices are derived. This is a distance function based on the distributions' entropies.
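The linear correlation item can be illustrated with a short, self-contained Python 3 sketch of the Pearson coefficient computed over the entries two matrices share. It does not use Bio.Tools.statfns; `matrix_correlation` is a hypothetical name, and restricting the comparison to shared keys is an assumption of this sketch.

```python
import math


def matrix_correlation(mat1, mat2):
    """Pearson linear correlation coefficient between two matrices.

    Both matrices are dicts keyed by (low, high) letter tuples; only keys
    present in both are compared (an assumption of this sketch).
    """
    keys = sorted(set(mat1) & set(mat2))
    n = len(keys)
    if n < 2:
        raise ValueError("need at least two shared entries")
    xs = [mat1[k] for k in keys]
    ys = [mat2[k] for k in keys]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant matrix has zero variance, so the coefficient is undefined.
    return cov / math.sqrt(var_x * var_y)


m1 = {("A", "A"): 4, ("A", "C"): -1, ("C", "C"): 6}
m2 = {key: 2 * value + 1 for key, value in m1.items()}  # a linear rescaling
print(matrix_correlation(m1, m2))  # close to 1.0
```

Because Pearson's coefficient ignores linear rescaling, two log-odds matrices built with different `factor` values still correlate perfectly, which makes it a reasonable way to compare matrices built from different data.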

Substitution matrix routines
Iddo Friedberg idoerg@cc.huji.ac.il
Biopython license applies (http://biopython.org)

General:
-------
You should have Python 2.0 or above (http://www.python.org).
You should have Biopython (http://biopython.org) installed.

This module provides a class and a few routines for generating substitution matrices, similar to BLOSUM or PAM matrices, but based on user-provided data. The class used for these matrices is SeqMat.

Matrices are implemented as a user dictionary (a subclass of UserDict.UserDict). Each key is a 2-tuple holding the two residue/nucleotide types involved in a replacement. The value differs according to the matrix's purpose: e.g. in a log-odds frequency matrix, the value would be log(Pij/(Pi*Pj)), where:
Pij: frequency of substitution of letter (residue/nucleotide) i by j
Pi, Pj: expected frequencies of i and j, respectively.
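As a worked example of the log-odds formula, here is a minimal Python 3 sketch. The frequencies are made-up illustrative numbers, `log_odds` is a hypothetical helper, and base 10 is chosen to match the logbase default listed for _build_log_odds_mat.

```python
import math


def log_odds(p_ij, p_i, p_j, base=10):
    """Return log(Pij / (Pi * Pj)) in the given logarithm base."""
    return math.log(p_ij / (p_i * p_j), base)


# Illustrative frequencies: i and j each appear with frequency 0.1, and the
# i -> j substitution is observed with frequency 0.02.
score = log_odds(0.02, 0.1, 0.1)
print(round(score, 4))  # 0.301, i.e. log10(2)
```

A positive score means the substitution was observed more often than chance (here twice as often); a score of 0 means it occurred exactly as often as expected, and a negative score means less often.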

Usage:
-----
The following section is laid out in the order in which most people generate a log-odds matrix. Interim matrices can, of course, be generated and investigated, but most people just want a log-odds matrix.

Generating an Accepted Replacement Matrix:
-----------------------------------------
First, generate an accepted replacement matrix (ARM) from your data. The values in the ARM are the numbers of replacements counted in your data, which could be a set of sequence pairs or multiple alignments. For instance, if Alanine was replaced by Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding ARM entries would be:
('A','C'): 10, ('C','A'): 12
Since order doesn't matter, the user can instead provide a single entry:
('A','C'): 22
A SeqMat instance may be initialized with either a full matrix (the first way of counting: 10 and 12) or a half matrix (the latter, 22). A full protein alphabet matrix has 20x20 = 400 entries. A half matrix of that alphabet has 20x20/2 + 20/2 = 210 entries: the off-diagonal entries are halved, while the same-letter entries (the matrix diagonal) have no mirror image and are all kept.
Given an alphabet size of N:
Full matrix size: N*N
Half matrix size: N(N+1)/2
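Counting an ARM from a multiple alignment can be sketched in Python 3 as below. This is an illustrative sketch, not a Bio.SubsMat routine: `count_replacements` and `half_matrix_size` are hypothetical names, every pair of aligned sequences is counted with equal weight (no BLOSUM-style clustering), gap columns marked '-' are skipped, and identical letters are counted on the diagonal. These are all assumptions.

```python
from collections import Counter
from itertools import combinations


def count_replacements(aligned_seqs):
    """Build a half-matrix ARM from equal-length aligned sequences.

    Every pair of sequences is compared column by column; gaps ('-') are
    skipped. Keys are (low, high) sorted, so A->C and C->A share one entry.
    """
    arm = Counter()
    for s1, s2 in combinations(aligned_seqs, 2):
        if len(s1) != len(s2):
            raise ValueError("sequences must be aligned to equal length")
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue
            arm[tuple(sorted((a, b)))] += 1
    return dict(arm)


def half_matrix_size(n):
    """Number of entries in a half-matrix over an alphabet of n letters."""
    return n * (n + 1) // 2


arm = count_replacements(["ACGA", "ACCA", "TCG-"])
print(arm)
print(half_matrix_size(20))  # 210, versus 20 * 20 = 400 for a full matrix
```

Writing the counts straight into sorted keys produces a half matrix directly, so no later full-to-half folding is needed.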

If you provide a full matrix, the constructor will create a half-matrix automatically. If you provide a half-matrix, make sure the keys are in (low, high) sorted order: there should only be an ('A','C') key, not a ('C','A') key.
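The full-to-half folding can be sketched in a few lines of Python 3 using the 10 + 12 = 22 example. `full_to_half` here is a hypothetical standalone function written for illustration, not the SeqMat._full_to_half method itself.

```python
def full_to_half(full_mat):
    """Fold a full matrix into a half-matrix with (low, high) keys.

    Mirrored entries such as ('A', 'C') and ('C', 'A') are summed;
    diagonal entries such as ('A', 'A') are copied unchanged.
    """
    half = {}
    for (a, b), value in full_mat.items():
        key = (a, b) if a <= b else (b, a)
        half[key] = half.get(key, 0) + value
    return half


full = {("A", "A"): 5, ("A", "C"): 10, ("C", "A"): 12, ("C", "C"): 7}
print(full_to_half(full))  # {('A', 'A'): 5, ('A', 'C'): 22, ('C', 'C'): 7}
```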
Internal functions:

Generating the observed frequency matrix (OFM):
----------------------------------------------
Use: OFM = _build_obs_freq_mat(ARM)
The OFM is generated from the ARM, only instead of replacement counts, it contains replacement frequencies.

Generating an expected frequency matrix (EFM):
---------------------------------------------
Use: EFM = _build_exp_freq_mat(OFM,exp_freq_table)
exp_freq_table: should be a freqTableC instance. See freqTable.py for detailed information. Briefly, the expected frequency table holds the frequency of appearance of each member of the alphabet.

Generating a substitution frequency matrix (SFM):
------------------------------------------------
Use: SFM = _build_subs_mat(OFM,EFM)
Accepts an OFM and an EFM. Returns the quotient of the corresponding values (each OFM entry divided by the matching EFM entry).

Generating a log-odds matrix (LOM):
----------------------------------
Use: LOM=_build_log_odds_mat(SFM[,logbase=10,factor=10.0,roundit=1])
Accepts an SFM.
logbase: base of the logarithm used to generate the log-odds values.
factor: factor used to multiply the log-odds values.
roundit: default - true. Whether to round the values.
Each entry is generated as log(SFM[key])*factor, with the logarithm taken in base logbase, and rounded if required.
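The four internal steps can be sketched in Python 3 with plain dicts standing in for SeqMat and for the freqTableC expected frequency table. The function names mirror the ones above but drop the leading underscore, to make clear these are illustrative reimplementations, not the Bio.SubsMat functions. The off-diagonal expectation 2*Pi*Pj is an assumption borrowed from the usual BLOSUM convention (a half-matrix entry (i, j) covers both i->j and j->i); this page does not state it.

```python
import math


def build_obs_freq_mat(arm):
    """OFM: replacement counts divided by the total number of replacements."""
    total = float(sum(arm.values()))
    return {key: count / total for key, count in arm.items()}


def build_exp_freq_mat(ofm, exp_freq_table):
    """EFM: expected pair frequencies from single-letter frequencies.

    Assumes the BLOSUM convention: Pi*Pi on the diagonal and 2*Pi*Pj off it.
    """
    efm = {}
    for i, j in ofm:
        p_i, p_j = exp_freq_table[i], exp_freq_table[j]
        efm[(i, j)] = p_i * p_i if i == j else 2.0 * p_i * p_j
    return efm


def build_subs_mat(ofm, efm):
    """SFM: observed frequency divided by expected frequency, entry by entry."""
    return {key: ofm[key] / efm[key] for key in ofm}


def build_log_odds_mat(sfm, logbase=10, factor=10.0, roundit=True):
    """LOM: base-logbase log of each SFM entry, multiplied by factor.

    A zero SFM entry (a pair never observed) would make math.log raise
    ValueError; this sketch does not handle that case.
    """
    lom = {}
    for key, ratio in sfm.items():
        value = math.log(ratio, logbase) * factor
        lom[key] = int(round(value)) if roundit else value
    return lom


arm = {("A", "A"): 40, ("A", "C"): 20, ("C", "C"): 40}
ofm = build_obs_freq_mat(arm)
efm = build_exp_freq_mat(ofm, {"A": 0.5, "C": 0.5})
sfm = build_subs_mat(ofm, efm)
lom = build_log_odds_mat(sfm)
print(lom)  # {('A', 'A'): 2, ('A', 'C'): -4, ('C', 'C'): 2}
```

Keeping each stage as a separate function makes the interim OFM, EFM and SFM available for inspection, which is the point of exposing them separately.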

External:
---------
In most cases, users will want to generate a log-odds matrix only, without explicitly calling the OFM --> EFM --> SFM stages. The function build_log_odds_matrix does that: the user provides an ARM and an expected frequency table, and the function returns the log-odds matrix.
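When no independent expected frequency table is available, a common choice (the one BLOSUM uses) is to derive it from the same counts, as in the "expected frequency table from an observed frequency matrix" item of the 5/2001 notes. The Python 3 sketch below does that and then folds all stages into one call. `exp_freq_table_from_arm` and `log_odds_from_arm` are hypothetical names; this is not the build_log_odds_matrix signature, and the even split of off-diagonal counts is an assumption.

```python
import math


def exp_freq_table_from_arm(arm):
    """Single-letter frequencies from a half-matrix ARM.

    Assumed BLOSUM convention: a diagonal count (i, i) contributes fully to
    letter i; an off-diagonal count (i, j) is split evenly between i and j.
    """
    total = float(sum(arm.values()))
    freqs = {}
    for (i, j), count in arm.items():
        if i == j:
            freqs[i] = freqs.get(i, 0.0) + count / total
        else:
            freqs[i] = freqs.get(i, 0.0) + count / (2 * total)
            freqs[j] = freqs.get(j, 0.0) + count / (2 * total)
    return freqs


def log_odds_from_arm(arm, exp_freq_table, logbase=10, factor=10.0, roundit=True):
    """One-call ARM -> log-odds matrix, folding the OFM, EFM and SFM steps."""
    total = float(sum(arm.values()))
    lom = {}
    for (i, j), count in arm.items():
        observed = count / total
        p_i, p_j = exp_freq_table[i], exp_freq_table[j]
        expected = p_i * p_i if i == j else 2.0 * p_i * p_j
        score = math.log(observed / expected, logbase) * factor
        lom[(i, j)] = int(round(score)) if roundit else score
    return lom


arm = {("A", "A"): 40, ("A", "C"): 20, ("C", "C"): 40}
table = exp_freq_table_from_arm(arm)  # A and C each end up with 0.5
print(log_odds_from_arm(arm, table))
```

Deriving the table from the same data guarantees that the single-letter frequencies sum to 1 and are consistent with the observed pair frequencies.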

Base Classes   
UserDict.UserDict
Methods   
__init__
__mul__
__sub__
__sum__
_alphabet_from_matrix
_correct_matrix
_full_to_half
_init_zero
all_letters_sum
letter_sum
make_entropy
make_relative_entropy
print_full_mat
print_mat
  __init__ 
__init__ (
        self,
        data=None,
        alphabet=None,
        mat_type=NOTYPE,
        mat_name='',
        build_later=0,
        )

  __mul__ 
__mul__ ( self,  other )

Returns a matrix in which each entry is the product of the corresponding entries of the two matrices.

  __sub__ 
__sub__ ( self,  other )

Returns a single number (not a matrix): the subtraction product of the two matrices.

  __sum__ 
__sum__ ( self,  other )
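The semantics of these three operators can be sketched on plain dicts in Python 3. This is an illustrative sketch, not the SeqMat code: the helper names are hypothetical, and reading the "subtraction product" as the sum of entry-by-entry differences is an assumption. Note also that Python's `+` operator dispatches to `__add__`, so a method named `__sum__` is not invoked by `m1 + m2` and would have to be called explicitly.

```python
def mat_mul(m1, m2):
    """Entry-by-entry product of two half-matrices with the same keys."""
    return {key: m1[key] * m2[key] for key in m1}


def mat_add(m1, m2):
    """Entry-by-entry sum of two half-matrices with the same keys."""
    return {key: m1[key] + m2[key] for key in m1}


def mat_sub_total(m1, m2):
    """A single number: the sum of entry-by-entry differences (assumed)."""
    return sum(m1[key] - m2[key] for key in m1)


m1 = {("A", "A"): 2, ("A", "C"): 3}
m2 = {("A", "A"): 1, ("A", "C"): 1}
print(mat_mul(m1, m2))        # {('A', 'A'): 2, ('A', 'C'): 3}
print(mat_add(m1, m2))        # {('A', 'A'): 3, ('A', 'C'): 4}
print(mat_sub_total(m1, m2))  # 3
```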

  _alphabet_from_matrix 
_alphabet_from_matrix ( self )

  _correct_matrix 
_correct_matrix ( self )

  _full_to_half 
_full_to_half ( self )

Convert a full-matrix to a half-matrix

  _init_zero 
_init_zero ( self )

  all_letters_sum 
all_letters_sum ( self )

  letter_sum 
letter_sum ( self,  letter )

  make_entropy 
make_entropy ( self )

  make_relative_entropy 
make_relative_entropy ( self,  obs_freq_mat )

If this matrix is a substitution or log-odds matrix, compute its relative entropy (stored in self.relative_entropy). The observed frequency matrix is needed for this.

Exceptions   
TypeError, "entropy: substitution or log-odds matrices only"
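For orientation, the relative entropy conventionally reported for BLOSUM-style matrices is H = sum over pairs of q_ij * log2(q_ij / e_ij), in bits, where q_ij is the observed and e_ij the expected pair frequency. The Python 3 sketch below computes it from an OFM and an SFM (whose entries are the ratios q_ij / e_ij). `relative_entropy` is a hypothetical name, and whether make_relative_entropy uses this exact base or scaling is not stated on this page.

```python
import math


def relative_entropy(obs_freq_mat, subs_mat, base=2):
    """Relative entropy H = sum over pairs of q_ij * log(q_ij / e_ij).

    subs_mat holds the ratios q_ij / e_ij (an SFM); obs_freq_mat holds q_ij.
    Pairs never observed (q_ij == 0) contribute nothing, by the usual
    convention 0 * log 0 = 0.
    """
    h = 0.0
    for key, q in obs_freq_mat.items():
        if q > 0:
            h += q * math.log(subs_mat[key], base)
    return h


ofm = {("A", "A"): 0.4, ("A", "C"): 0.2, ("C", "C"): 0.4}
sfm = {("A", "A"): 1.6, ("A", "C"): 0.4, ("C", "C"): 1.6}
print(relative_entropy(ofm, sfm))  # about 0.278 bits
```

When the expected frequencies form a proper distribution, this quantity is a Kullback-Leibler divergence and is never negative; larger values mean the matrix is more specialized for closely related sequences.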
  print_full_mat 
print_full_mat (
        self,
        f=sys.stdout,
        format="%4d",
        topformat="%4s",
        alphabet=None,
        factor=1,
        )

  print_mat 
print_mat (
        self,
        f=sys.stdout,
        format="%4d",
        bottomformat="%4s",
        alphabet=None,
        factor=1,
        )

Print a nicely formatted half-matrix. Pass f=sys.stdout (the default) to print to the screen. The user may pass their own alphabet, which should contain all letters in the alphabet of the matrix, possibly in a different order. That order will be the order of the letters on the axes.
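A minimal Python 3 sketch of such a half-matrix printer, showing how a user-supplied alphabet order drives the axes. `print_half_matrix` is a hypothetical function written for illustration. The real print_mat layout may differ, and the `%-2s` row-label width is an arbitrary choice of this sketch.

```python
import io
import sys


def print_half_matrix(mat, alphabet, f=sys.stdout, fmt="%4d", bottomformat="%4s"):
    """Print the lower triangle of a (low, high)-keyed half-matrix.

    Row labels appear on the left and column labels along the bottom,
    both in the order given by `alphabet`.
    """
    for row, a in enumerate(alphabet):
        cells = []
        for b in alphabet[: row + 1]:
            key = (a, b) if a <= b else (b, a)
            cells.append(fmt % mat[key])
        f.write(("%-2s" % a) + "".join(cells) + "\n")
    f.write("  " + "".join(bottomformat % b for b in alphabet) + "\n")


# Write into a buffer so the layout can be inspected; "CA" reverses the
# default alphabetical order of the axes.
buf = io.StringIO()
print_half_matrix({("A", "A"): 5, ("A", "C"): 22, ("C", "C"): 7}, "CA", f=buf)
print(buf.getvalue())
```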



This document was automatically generated on Sat Jul 7 09:49:56 2001 by HappyDoc version r1_5