[BiO BB] how to compare biological distance between four proteins

Pedro Fernandes pfern at igc.gulbenkian.pt
Thu Jun 5 02:00:43 EDT 2008

Hi Xue-Li,

There is no universal measurement of biological distance. There is even
controversy over the use of the term itself.

Furthermore, when you "measure" something, you want your "measurement" to be
comparable to others, so you need to know exactly what you are talking about.
For example if you do such a "measurement" on your four proteins today, you may
want to compare your present "measurement" with "measurements" that you take
tomorrow with another four proteins. Or another 500...

A generalized concept of "distance" may be applied to evaluate similarity. In
doing so you might be interested in the conservation of motifs in the proteins,
so you would first want to find out what is conserved. You might also want to
count as similar those positions where non-silent mutations have occurred, that
is, where the residue has changed but may still be considered similar.

You may want to use a widely used multiple sequence alignment program. The
simplest way is to use CLUSTALW to get a multiple sequence alignment. This
program has a number of parameters that can be left at their default values,
but the careful user would look for ways of tuning these parameters to the
concrete biological situation. If you use CLUSTALW you should state which
parameters you used each time, and these will depend on the concrete problem
that you want to address. For example, the substitution matrix, which encodes
mutation rates, should reflect the problem. It may make a difference whether
you are studying proteins from mammals, prokaryotes or plants! It may also be
worth adjusting further parameters to tell the program whether you are
aligning very dissimilar or very similar proteins.
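If you run CLUSTALW from a script, one way to make sure the parameters are
always stated is to build the command line explicitly and save it next to the
output. This sketch assumes an executable called `clustalw2` (some
installations call it `clustalw`) and uses ClustalW's `-INFILE`, `-ALIGN`,
`-TYPE`, `-MATRIX`, `-GAPOPEN` and `-GAPEXT` options. The numeric values and the
file name `proteins.fasta` are illustrative placeholders, not recommendations
and not necessarily the program defaults; check `clustalw2 -help` for your
version. ClustalW also has separate options, such as `-PWMATRIX`, for the
initial pairwise stage.

```python
import shlex

def clustalw_command(infile, matrix, gapopen, gapext, executable="clustalw2"):
    """Return a ClustalW argument list with every alignment parameter explicit.

    'clustalw2' is an assumed executable name; adjust it to your installation.
    """
    return [
        executable,
        f"-INFILE={infile}",
        "-ALIGN",
        "-TYPE=PROTEIN",
        f"-MATRIX={matrix}",    # matrix series for the multiple alignment stage
        f"-GAPOPEN={gapopen}",  # gap opening penalty
        f"-GAPEXT={gapext}",    # gap extension penalty
    ]

# Illustrative values only; "proteins.fasta" is a hypothetical input file.
cmd = clustalw_command("proteins.fasta", matrix="BLOSUM", gapopen=10.0, gapext=0.2)
run_note = shlex.join(cmd)  # a copy-pasteable record of the exact settings
print(run_note)
```

To actually run it you could pass `cmd` to `subprocess.run(cmd, check=True)`
and store `run_note` with the alignment, so anyone reading your "distances"
knows how they were produced.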

But there is no harm in using CLUSTALW with all the defaults to see what
happens, provided you know that you should not use the output without
thinking carefully before stating things about "distances".

CLUSTALW outputs distances in the form of a tree file. Several formats are
commonly available for this. Check the documentation for how the distances are
calculated and how they may be used further.
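Whatever format you choose, it helps to know how a number is read back out of
a tree. This is a minimal sketch, assuming the tree is in Newick format (the
format ClustalW's `.dnd` guide trees use) with unquoted labels and no comments.
It sums branch lengths along the path between two leaves (the patristic
distance). The names `parse_newick`, `leaf_paths` and `patristic_distance` and
the example tree are hypothetical; for real work an established tree library is
the safer choice.

```python
import itertools

def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples.

    Assumes unquoted labels without spaces and no [comments]; all whitespace
    (including line breaks inside .dnd files) is discarded.
    """
    text = "".join(text.split())
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return parse_node()

def leaf_paths(tree):
    """Map each leaf name to its root-to-leaf list of (node_id, depth) pairs."""
    paths = {}
    ids = itertools.count()

    def walk(node, path, depth):
        name, length, children = node
        depth += length
        path = path + [(next(ids), depth)]
        if not children:
            paths[name] = path
        for child in children:
            walk(child, path, depth)

    walk(tree, [], 0.0)
    return paths

def patristic_distance(paths, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    pa, pb = paths[a], paths[b]
    lca_depth = 0.0
    for (id_a, depth_a), (id_b, _) in zip(pa, pb):
        if id_a != id_b:
            break
        lca_depth = depth_a  # deepest shared ancestor so far
    return pa[-1][1] + pb[-1][1] - 2 * lca_depth

guide_tree = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.15);"  # hypothetical tree
paths = leaf_paths(parse_newick(guide_tree))
for a, b in itertools.combinations(sorted(paths), 2):
    print(a, b, round(patristic_distance(paths, a, b), 3))
```

Distances read off a tree are only an approximation of the pairwise distances
the tree was built from, which is one more reason to check how they were
calculated.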

As you may have understood by now, getting these numbers is only the beginning,
the tip of the iceberg.

And, mind you, if you just want to judge similarity you may choose not to use
the term "distance" in the first place!
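If similarity is what you are after, percent identity over an alignment is one
such number. This is a minimal sketch, assuming equal-length aligned strings
with '-' as the gap character; it counts identical residues over the columns
where neither sequence has a gap. That denominator is only one of several
conventions (others divide by the alignment length or by the length of the
shorter sequence), and the choice changes the number, which is exactly why it
should be stated. The function names and sequences are hypothetical.

```python
from itertools import combinations

def percent_identity(seq_a, seq_b):
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)

def identity_table(named_alignment):
    """Percent identity for every unordered pair of named aligned sequences."""
    return {(a, b): percent_identity(named_alignment[a], named_alignment[b])
            for a, b in combinations(sorted(named_alignment), 2)}

# Hypothetical aligned sequences (toy data).
aligned = {"p1": "MKV-LA", "p2": "MKVQLA", "p3": "MRV-LS", "p4": "MKVELA"}
for pair, pid in identity_table(aligned).items():
    print(pair, f"{pid:.1f}%")
```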

Good luck

Pedro Fernandes
Instituto Gulbenkian de Ciência
Apartado 14

