[BiO BB] About clustering genes to gene family
Marcos Oliveira de Carvalho
operon at www.bioinformatics.org
Thu Aug 7 15:44:07 EDT 2003
Hi Carol,
I use TribeMCL software with good results.
Here is the URL -> http://www.ebi.ac.uk/research/cgg/tribe/
And here is the abstract of the paper about TribeMCL:
TribeMCL is a method for clustering proteins into related groups, which
are termed 'protein families'. This clustering is achieved by analysing
similarity patterns between proteins in a given dataset, and using these
patterns to assign proteins into related groups. In many cases, proteins
in the same protein family will have similar functional properties.
TribeMCL uses a novel clustering method (Markov Clustering or MCL) which
solves problems which normally hinder protein sequence clustering. These
problems include: multi-domain proteins, peptide fragments and proteins
which possess domains which are very widespread (promiscuous domains). The
efficiency of the method makes it applicable to the clustering of very
large datasets. We routinely use the algorithm to cluster datasets as
large as 500,000 peptides.
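
To make this concrete for your A/B/C question, here is a minimal pure-Python sketch of the Markov Clustering idea (alternating "expansion" and "inflation" steps on a column-stochastic similarity matrix). It is not TribeMCL itself: the function name mcl, the similarity scores, the self-loop weight of 1.0 and the inflation value of 2.0 are all assumptions chosen for illustration. A real run would start from all-against-all BLAST results.

```python
def mcl(scores, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov Clustering on a dict {(gene_a, gene_b): similarity}.

    Illustrative sketch only: pairs missing from `scores` count as 0,
    and every gene gets a self-loop of weight 1.0 (an assumption).
    """
    nodes = sorted({g for pair in scores for g in pair})
    idx = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)

    # Symmetric similarity matrix with self-loops on the diagonal.
    m = [[0.0] * n for _ in range(n)]
    for (a, b), s in scores.items():
        m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = s
    for i in range(n):
        m[i][i] = 1.0

    def normalize(mat):
        # Rescale each column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(max_iter):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power, then renormalize columns.
        inflated = normalize([[v ** inflation for v in row]
                              for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break

    # Rows with a non-zero diagonal are "attractors"; the non-zero
    # columns of such a row are the members of its cluster.
    clusters = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(nodes[j] for j in range(n)
                                   if m[i][j] > 1e-6))
    return sorted(sorted(c) for c in clusters)


# Hypothetical scores: A-B and A-C are similar, B-C is below threshold.
example = {("A", "B"): 0.9, ("A", "C"): 0.8, ("D", "E"): 0.9}
print(mcl(example))
```

In this toy input B and C have no direct hit, yet both flow towards A and end up in the same family as A, while D and E form a separate one. A higher inflation value generally gives tighter, smaller clusters.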
Cheers
Marcos
On Thu, 7 Aug 2003, Zheng Fu wrote:
> Hi everyone,
>
> Does anyone know how to cluster genes into a gene family based on
> sequence alignments?
> For two genes, we can define a threshold to separate homologs from
> non-homologs. But for three or more genes, how do we define the homologs?
> (For example, if Gene A and Gene B have a high alignment score, and A and C
> also have a high score, but B and C don't, can we say A, B and C are homologs?)
>
> Thank you.
>
> Carol
>
>
--
Marcos Oliveira de Carvalho
operon at bioinformatics.org