Hi, I performed All-vs-All blast for 214777 proteins and wanted to cluster these proteins. For clustering, I read that its better if we use average pairwise evalues for bi-directional hits. I wrote a perl script to read blast results, to store hit evalues for each query in a hash reference, to find bi-directionabl blast hits and calculate average pairwise evalues for them. I used hash references since these can be bit fast. I tested the following script on a machine with 2G of ram and on another with 94G of ram. It ran out of memory in both cases. Is it the case that it is not possible to hold this much data into a hash? Any suggestions to resolve the problem? Many Thanks, Intikhab Alam, I use the script as follows: To Fill the hash: while(<FILE>){ my ($q, $h, $iden, $alilen, $mism, $gapo, $qst, $qend, $sst, $send, $evalue, $bit)= split /\s+|\t+/, $_; $query_results->{$q}{$h}{evalue}=$evalue; } close FILE; To get average pairwise evalues: my ($c, $hits)=(0, 0); open F, ">pairwiseValues"; foreach my $query (%{$query_results}){ if (exists ($query_results->{$query})){ $c++; print "Query:$c:$query\n"; foreach my $hit (%{$query_results->{$query}}){ if (exists ($query_results->{$query}{$hit}{evalue})){ my $k=$query_results->{$query}{$hit}{evalue}; if ((exists ($query_results->{$query}{$hit}{evalue}) && ($query_results->{$hit}{$query}{evalue}))){ my $i= $query_results->{$query}{$hit}{evalue}; my $j= $query_results->{$hit}{$query}{evalue}; $k=(($i+$j)/2); $query_results->{$query}{$hit}{evalue}=$k; $query_results->{$hit}{$query}{evalue}=$k; } $hits++; print F "$query\t$hit\t$k\n"; } } } } close F; print "$hits hits read for $c queries \n"; ----------------------------------------------