[BiO BB] Re: All-against-all protein sequence comparison

Hongyu Zhang e_hongyu at yahoo.com
Sun Dec 19 18:15:49 EST 2004

> The problem I see with e-values is that the e-value depends on the
> search database size. The e-value gives you the number of false
> positives expected by chance, given the database you are searching.
> If your database is the queried genome(s) only, you may receive
> skewed values, because a hit that would have a high e-value (low
> significance, more false positives expected by chance) when searched
> against nr would have a low e-value (high significance) when searched
> against the genome(s). Similarities may be mistaken for significant
> simply because the predicted number of false positives will always be
> small due to the small database size.

That's correct. But as long as you can "normalize"
your search using a fixed database size (like NR), I
don't see why you need to sacrifice the computer
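
To make the "normalize" idea concrete, here is a minimal Python sketch. It assumes the standard Karlin-Altschul relation, in which the e-value is proportional to the search space (query length times database length), so an e-value from a small database can be rescaled linearly to a fixed reference size. The function names and the example lengths are hypothetical, and the linear rescaling ignores the edge-effect corrections that real search tools apply to effective lengths, so treat the result as an approximation.

```python
import math


def evalue_from_bitscore(bit_score, query_len, db_len):
    """Approximate e-value from a bit score: E = m * n * 2^(-S').

    query_len (m) and db_len (n) are lengths in residues. This ignores
    effective-length corrections, so it is only an approximation.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * math.pow(2.0, -bit_score)


def rescale_evalue(evalue, actual_db_len, reference_db_len):
    """Rescale an e-value as if the search had used a reference database size.

    Assumes the e-value is linear in database length, which follows from
    E = K * m * n * exp(-lambda * S) when the score and query are fixed.
    """
    if actual_db_len <= 0 or reference_db_len <= 0:
        raise ValueError("database lengths must be positive")
    return evalue * (reference_db_len / actual_db_len)


# Hypothetical numbers: a hit with e-value 1e-5 against a 1e6-residue
# genome database, rescaled to a 1e9-residue reference database.
normalized = rescale_evalue(1e-5, 1e6, 1e9)
```

With these made-up sizes the reference database is 1000 times larger, so the rescaled e-value is 1000 times larger as well. That is the effect described in the quoted text: the same alignment looks far less significant against a big database.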

