[BiO BB] problem with LWP::Simple

DMUTANTZ at aol.com DMUTANTZ at aol.com
Sun Jun 26 13:27:35 EDT 2005

I would be grateful for any help with this.
I want to pull an ID number (a UniProt protein accession number) from a file
using a regex. This works OK.
I then want to use the number as part of a URL to pull the relevant page
back, so that I can parse some information about the protein from the page.
The code is very basic.
My Perl script:

# A script to pull an ID number out of a file using a regex.
# The ID number(s) are put into the array @accnumber.
# The file I read in is html_test2.txt (attached to this mail).
# The ID number is then used as part of a URL to get and store a web page.
# To simplify things, I just take the first element of @accnumber
# and use that in the URL.

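use strict;
use warnings;
use LWP::Simple;

my @accnumber;

# ask for the file name
print "please enter file name\n";
my $filename1 = <STDIN>;
chomp $filename1;    # remove the trailing newline from the input

# open and read the file
open(my $fileone, '<', $filename1)
    or die "Cannot open $filename1: $!";
while (my $line = <$fileone>) {
    # capture the six-character accession that follows the entry name
    if ($line =~ /UNIPROT:?\w+\s(\w{6})\s/) {
        push @accnumber, $1;
    }
}
close $fileone;

die "No accession number found in $filename1\n" unless @accnumber;
my $query_number = $accnumber[0];

# as a sanity check I print the number to STDOUT
print "$query_number\n";

# I call the subroutine to return the webpage
get_page($query_number);

sub get_page {
    my $address = $_[0];

    # append the accession number to the base URL
    my $url = 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId='
            . $address;
    my $html_file = 'page.html';
    my $status = getstore($url, $html_file);
    die "Could not fetch $url (status $status)\n" unless is_success($status);
}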
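
A minimal sketch of the URL-building step on its own, with no network request. It assumes the accession may carry a trailing newline (for example when "\n" has been appended to it or it was read straight from input), which would otherwise end up inside the URL. The helper name build_url is hypothetical and not part of LWP::Simple.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP::Simple): build the query URL
# from an accession number, stripping any whitespace first.
sub build_url {
    my ($accession) = @_;
    $accession =~ s/\s+//g;    # a stray "\n" would otherwise end up in the URL
    my $base = 'http://www.ebi.uniprot.org/uniprot-srv/xmlView.do?proteinId=';
    return $base . $accession;
}

print build_url("Q84PD9\n"), "\n";
```

The returned string could then be passed to getstore() in place of the bare base URL.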

And the text file I parse with my regex:
BLASTP 2.0MP-WashU [13-Dec-2004] [decunix5.0a-ev6-IP32LF64  
Copyright (C) 1996-2004 Washington University, Saint Louis, Missouri  USA.
All Rights Reserved.
Reference:  Gish, W. (1996-2004) http://blast.wustl.edu
Query=  24061  17154533 emb|CAC80823.1 (AJ251791) putative IAA1  protein 
sativa]  1e-130 235 236 99.5% top  hit
(237 letters; record 1)
Database:   uniprot
1,880,849 sequences; 604,459,357 total  letters.
Searching....10....20....30....40....50....60....70....80....90....100%  done
High  Probability
Sequences producing High-scoring Segment  Pairs:               Score  P(N)    
UNIPROT:Q75KX3_ORYSA Q84PD9 Putative auxin-responsive pro...   1203  1.2e-121 
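
For reference, a small sketch of how the pattern from the script behaves against the hit line above. The sample line is copied from the report; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Hit line copied from the BLAST report.
my $line = 'UNIPROT:Q75KX3_ORYSA Q84PD9 Putative auxin-responsive pro...   1203  1.2e-121';

my @accnumber;
# \w+ consumes the entry name (Q75KX3_ORYSA); the capture group takes the
# six-character accession that follows the first whitespace.
if ($line =~ /UNIPROT:?\w+\s(\w{6})\s/) {
    push @accnumber, $1;
}
print "$accnumber[0]\n";    # prints Q84PD9
```

Note that the captured value is the second field (Q84PD9), not the Q75KX3 prefix of the entry name.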

Thanks for any help.