[BiO BB] About clustering genes to gene family

Marcos Oliveira de Carvalho operon at www.bioinformatics.org
Thu Aug 7 15:44:07 EDT 2003


Hi Carol,
I use the TribeMCL software for this, with good results.

Here is the URL -> http://www.ebi.ac.uk/research/cgg/tribe/

And here is the abstract of the paper about TribeMCL:

TribeMCL is a method for clustering proteins into related groups, which 
are termed 'protein families'. This clustering is achieved by analysing 
similarity patterns between proteins in a given dataset, and using these 
patterns to assign proteins into related groups. In many cases, proteins 
in the same protein family will have similar functional properties. 
TribeMCL uses a novel clustering method (Markov Clustering or MCL) which 
solves problems which normally hinder protein sequence clustering. These 
problems include: multi-domain proteins, peptide fragments and proteins 
which possess domains which are very widespread (promiscuous domains). The 
efficiency of the method makes it applicable to the clustering of very 
large datasets. We routinely use the algorithm to cluster datasets as 
large as 500,000 peptides. 
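
To make this concrete for your A/B/C case, here is a minimal sketch of the
expansion/inflation loop behind Markov clustering. It is not the TribeMCL
code: the function names, the 0/1 edge weights (a pair is an edge if its
alignment score passed your threshold), the inflation value of 2.0 and the
fixed iteration count are all assumptions chosen for illustration. As far as
I know, TribeMCL itself weights the edges using BLAST similarity scores.

```python
def normalise_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(names, edges, inflation=2.0, iterations=50):
    """Toy Markov clustering on an unweighted similarity graph.

    names: list of gene identifiers (hypothetical labels).
    edges: pairs of genes whose alignment score passed the threshold.
    """
    index = {name: i for i, name in enumerate(names)}
    n = len(names)

    # Adjacency matrix with self-loops, so every column has a non-zero sum.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        i, j = index[a], index[b]
        m[i][j] = m[j][i] = 1.0
    m = normalise_columns(m)

    for _ in range(iterations):
        # Expansion: square the matrix (two steps of a random walk).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalise, which
        # strengthens strong connections and weakens weak ones.
        m = normalise_columns([[x ** inflation for x in row] for row in m])

    # Each row that keeps non-zero entries is an attractor; the columns
    # with non-zero entries in that row are the members of its cluster.
    clusters = []
    for i in range(n):
        members = sorted(names[j] for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# A-B and A-C pass the threshold, B-C does not; D-E is an unrelated pair.
families = mcl(["A", "B", "C", "D", "E"],
               [("A", "B"), ("A", "C"), ("D", "E")])
print(families)
```

In this toy graph A, B and C should end up in one family even though B and
C have no direct hit, because both are pulled towards A, while D and E form
a second family. A higher inflation value generally gives smaller, tighter
clusters.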

Cheers
Marcos

On Thu, 7 Aug 2003, Zheng Fu wrote:

> Hi everyone,
> 
> Does anyone know how to cluster genes into gene families based on
> sequence alignments?
> For two genes, we can define a threshold to separate homologs from
> non-homologs. But for three or more genes, how do we define the homologs?
> (For example, if Gene A and Gene B have a high alignment score, and A and C
> also have a high score, but B and C do not, can we say A, B and C are homologs?)
> 
> Thank you.
> 
> Carol
> 
> 

-- 
Marcos Oliveira de Carvalho
operon at bioinformatics.org



