[BiO BB] Re: All-again-all protein sequence comparison
Hongyu Zhang
e_hongyu at yahoo.com
Sun Dec 19 18:15:49 EST 2004
> The problem I see with e-values is that an e-value depends on the
> size of the search database. The e-value gives the number of false
> positives expected by chance, given the database you are searching.
> If your database is only the queried genome(s), you may get skewed
> values, because a hit that would have a high e-value (low
> significance, more false positives expected by chance) when searched
> against nr would have a low e-value (high significance) when searched
> against the genome(s). Similarities may be mistaken for significant
> simply because the predicted number of false positives will always be
> small when the database is small.
That's correct. But as long as you can "normalize" your search to a
fixed database size (like NR), I don't see why you need to spend the
computer time searching the larger database.
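
To make the normalization concrete, here is a rough sketch in Python. It
assumes the usual bit-score form of the Karlin-Altschul estimate,
E = m * n * 2^(-S'), where m is the query length, n is the database
length in residues and S' is the bit score, and it ignores the
effective-length (edge) corrections that BLAST applies. The function
names and the sizes used are made up for illustration only.

```python
def evalue_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S').

    Simplified estimate; no effective-length correction is applied.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


def rescale_evalue(evalue, searched_db_len, reference_db_len):
    """Rescale an e-value from the searched database to a reference size.

    Under the simplified formula, E is linear in the database length,
    so the correction is a plain ratio of the two lengths.
    """
    if searched_db_len <= 0 or reference_db_len <= 0:
        raise ValueError("lengths must be positive")
    return evalue * reference_db_len / searched_db_len


# Hypothetical sizes: a 2 Mb genome-only database and a 1 Gb reference.
GENOME_DB_LEN = 2_000_000
REFERENCE_DB_LEN = 1_000_000_000

e_genome = evalue_from_bit_score(40.0, 300, GENOME_DB_LEN)
e_reference = rescale_evalue(e_genome, GENOME_DB_LEN, REFERENCE_DB_LEN)
print("e-value against genome only:", e_genome)
print("e-value normalized to reference size:", e_reference)
```

The same hit gets a larger e-value once it is scaled to the reference
size, which is the effect described in the quoted message. In practice
you would probably let BLAST do this at search time by fixing the
effective database size (the -z option in blastall, -dbsize in BLAST+)
rather than rescaling afterwards.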