[BiO BB] Matching and Filtering -- try grep - thanks

Pooja Jain pooja at igc.gulbenkian.pt
Mon Nov 17 13:30:25 EST 2003

Hi Dmitri I Gouliaev,
Thank you for your suggestion. I followed the grep man page, used
grep -f, and it worked:

grep -f file1.txt file2.txt > file3.txt

Here file1.txt holds the list of accession numbers whose details I want
to filter out of file2.txt, and the command writes the matching lines of
file2.txt to file3.txt.
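grep -f treats every line of file1.txt as a regular expression that may match anywhere in a line, so an accession number can also pick up rows for a longer number that merely contains it, and a blank line in file1.txt matches every row. A minimal sketch of an exact match instead, assuming the accession number is the first tab-separated column of file2.txt (the function name filter_by_accession is hypothetical):

```shell
#!/bin/sh
# filter_by_accession LIST DETAILS  (hypothetical helper name)
# Prints the rows of the tab-delimited DETAILS file whose first column is
# exactly one of the accession numbers listed, one per line, in LIST.
# Assumption: the accession number is the first tab-separated field of DETAILS.
filter_by_accession() {
    awk '
        BEGIN { FS = "\t" }
        NR == FNR {                           # first file (LIST)
            gsub(/^[ \t]+|[ \t\r]+$/, "")     # trim blanks and a DOS CR
            if ($0 != "") want[$0] = 1        # remember each accession
            next
        }
        $1 in want                            # second file: print matching rows
    ' "$1" "$2"
}
```

Usage: filter_by_accession file1.txt file2.txt > file3.txt. Holding the list in an awk array means file2.txt is read only once, and matching rows come out in file2.txt order.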

Thanks again.


> Hi, Pooja Jain!
>  On Mon, Nov 17, 2003 at 11:15:10AM -0000, Pooja Jain wrote:
>> I have a txt file with a list of accession numbers for a few of the
>> sequences from the entire Arabidopsis thaliana genome. I have another
>> tab-delimited txt file with the accession numbers and other details
>> of every sequence present in the genome (one sequence per row). From
>> this latter file I want to filter the details of only those sequences
>> whose accession numbers appear in the former file.
>> Could someone please suggest a simple way to do this matching and
>> filtering? I tried simple shell commands like cmp and diff, but
>> neither of them worked. Is there any other command I can use in the
>> shell? Any way to do this with Perl is also welcome.
> From the man page:
>     grep, egrep, fgrep - print lines matching a pattern
> You should use grep.
> If
>     file-with-a-list is a txt file with a list of accession numbers
> and
>     file-with-all-the-details is the other file,
> then this shell one-liner
>     user@host$ while read -r AN ; do \
>                    grep "^$AN" file-with-all-the-details ; \
>                done < file-with-a-list >> file-with-the-details-for-the-listed-accnum
> should work for you (if the accession numbers are at the beginning of the
> lines in the "other" file). If they are not, but there is some
> white-space at the beginning of each line, then change "^$AN"
> to "^[[:space:]]*$AN" (with the quotation marks).
> Hope this helps,
> --
> DIG (Dmitri I GOULIAEV)        http://www.bioinformatics.org/~dig/
> 1024D/63A6C649: 26A0 E4D5 AB3F C2D4 0112  66CD 4343 C0AF 63A6 C649
> _______________________________________________
> BiO_Bulletin_Board maillist  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
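The quoted loop runs one grep per accession number, and "^$AN" also matches longer numbers that merely start with $AN. A minimal sketch of the same loop with leading blanks allowed and the match required to end at whitespace or end of line, assuming accession numbers contain no regular-expression metacharacters (grep_each_accession is a hypothetical name):

```shell
#!/bin/sh
# grep_each_accession LIST DETAILS  (hypothetical helper name)
# Loop-based variant of the quoted one-liner: one grep per accession number.
# Assumptions: in DETAILS the accession number may be preceded by spaces or
# tabs and is followed by whitespace or end of line; accession numbers hold
# no regex metacharacters.
grep_each_accession() {
    # "|| [ -n ... ]" also handles a last line with no trailing newline
    while read -r an || [ -n "$an" ]; do
        [ -n "$an" ] || continue   # skip blank lines in LIST
        # anchor at line start (after optional blanks) and stop at a
        # whitespace or end of line, so ACC1 does not also match ACC10
        grep -E "^[[:space:]]*$an([[:space:]]|\$)" "$2"
    done < "$1"
}
```

Usage: grep_each_accession file-with-a-list file-with-all-the-details > file-with-the-details-for-the-listed-accnum. Output follows the order of the list file, but the details file is re-read once per accession number, so a single-pass approach (such as an awk lookup table) scales better for long lists.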
