[Biodevelopers] Re: splice advice...
Joseph Landman
landman at scalableinformatics.com
Wed Aug 27 18:53:36 EDT 2003
On Wed, 2003-08-27 at 17:30, Dan Bolser wrote:
> Joseph Landman said:
> > Hi Dan:
> >
> > I am assuming that your arrays contain not HSP's, but some sort of
> > object representing the HSP ala BioPerl. Is this correct?
> >
>
> Nope, just a simple hash, key => value,
> I.E.
>
> {
> hsp_hit-from => 5,
> hsp_hit-to => 10,
> score => 20,
> }
>
Ok.
Comments interspersed:
> >> __SKIP__
> >>
> >> preamble
> >>
> >> @hsps = array of HSP hashes, for a particular protein
> >> each HSP can be from several SCOP sequences.
> >>
> >> __RESUME__
> >>
> >> my @result; # Final HSP's
> >>
> >> TOP:while (@hsps){ # NB: Ordered by Hsp_query-from
> >> # (for optimzation).
> >>
> >> my $p = 0; # Current HSP pointer.
> >>
> >> MID:for (my $j=$p+1; $j<@hsps; $j++){ # Overlap slider.
These two loops give you something comparable to an O(N^2) algorithm.
That is, I can rewrite the while and the for as
foreach my $hsp (@hsps) # Outer O(N): 1 iteration per entry
foreach my $index (0 .. $#hsps) # Inner O(N): 1 iteration per
entry
> >>
> >> # Family overlap only!
> >>
> >> next MID if
> >> $hsps[$p]->{SCCS} != $hsps[$j]->{SCCS};
Short circuits are good.
> >>
> >> # Optimization.
> >>
> >> if ( $THRESH >
> >> $hsps[$p]->{P_END} - $hsps[$j]->{P_START} ){
> >>
> >> shift @hsps;
> >> next TOP;
> >> }
A shift or a similar operation (splice, push, etc) will likely trigger
the garbage collector in Perl, as well as the memory allocator
> >>
> >> # Pick best of pair (removing the other from the list).
> >>
> >> if ( $hsps[$p]->{E_VALUE} > $hsps[$j]->{E_VALUE} ){
> >> splice (@hsps, $p, 1);
> >> $j--;
> >> $p = $j;
> >> }
> >> else {
> >> splice (@hsps, $j, 1);
> >> $j--;
> >> }
> >> }
> >> push @result, splice(@hsps, $p, 1);
> >> }
> >> print "OK\n\n";
I might suggest that instead of moving data around (splice, push, shift)
and causing allocation/GC events, that you might maintain several lists,
indicating which class the HSP's belong to.
--
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman at scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
More information about the Biodevelopers
mailing list