[BiO BB] genbank2swissprot ?

Gaj Stan (BIGCAT) Stan.Gaj at BIGCAT.unimaas.nl
Wed Oct 10 07:58:55 EDT 2007


Hi Christoph,

There are two other possibilities:

a) Use BioMart at www.ensembl.org to retrieve Ensembl gene IDs for your list of RefSeq IDs (you mentioned NM_ IDs, so I assume you mean RefSeq IDs), and export the list together with its UniProt cross-references. One problem you will almost certainly run into is that some Ensembl genes have more than one UniProt ID associated with them. The exported list contains this information, but on separate lines, so you will need to filter or merge those lines.
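
For the filtering step in (a), a minimal Python sketch could look like the one below. It assumes the export is tab-separated with one header row and the columns in the order Ensembl gene ID, RefSeq ID, UniProt ID. That column order and the function names are my assumptions for illustration, not something BioMart guarantees, so adjust the indices to match your own export.

```python
import csv
from collections import defaultdict


def collapse_uniprot(rows, key_col=1, uniprot_col=2):
    """Merge rows that share a RefSeq ID into one entry listing all UniProt IDs.

    rows is any iterable of column lists. The default column positions are
    an assumption about the export layout (gene ID, RefSeq ID, UniProt ID).
    """
    grouped = defaultdict(list)
    for row in rows:
        if len(row) <= max(key_col, uniprot_col):
            continue  # skip short or malformed lines
        key = row[key_col].strip()
        uniprot = row[uniprot_col].strip()
        if not key:
            continue
        ids = grouped[key]  # creates the entry even if no UniProt ID is given
        if uniprot and uniprot not in ids:
            ids.append(uniprot)
    return dict(grouped)


def read_export(path):
    """Read a tab-separated export file (hypothetical path) and collapse it."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header row
        return collapse_uniprot(reader)
```

RefSeq IDs that map to several UniProt IDs end up with a list of length greater than one, which makes them easy to pick out and review by hand.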

b) The RefSeq group recently announced that they have added UniProt cross-references to their databases, in close collaboration with UniProt. I can't find the archive of their Gene-Announce list, but here is the announcement:

=======
Announcing the availability of RefSeq-UniProtKB cross-link data 
        In collaboration with UniProtKB (http://www.pir.uniprot.org/), the RefSeq group is now reporting explicit cross-references to Swiss-Prot and TrEMBL proteins that correspond to a RefSeq protein. These correspondences are being calculated by the UniProtKB group, and will be updated every three weeks to correspond to UniProt's release cycle. The data are being made available from several sites within NCBI:
         
        1.   The  full report from Entrez Gene, in the Reference Sequences section. 
           For an example, go to the Full Report page for the sevenless gene of  Drosophila melanogaster  (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gene&Cmd=DetailsSearch&Term=32039%5Buid%5D) and click on the Reference Sequences section in the table of contents on the right.  You will see

              mRNA and Protein(s) 
              NM_078559.2→NP_511114.2 sevenless CG18085-PA [Drosophila melanogaster] 
             UniProtKB/Swiss-Prot  P13368    <--- new data 
         2. Links in NCBI's  Protein database 
            Explicit links between corresponding RefSeq and Swiss-Prot proteins are now provided within the NCBI Protein database. These links are available in the ‘Links’ menu located at the upper right of the protein display page. The link names are:

              Protein (RefSeq):          provides a link from a Swiss-Prot record to the corresponding RefSeq record 
              Protein (UniProtKB):       provides a link to the equivalent Swiss-Prot record 
         3. Filter choices in NCBI's  Protein database 
                protein protein refseq2uniprot    find RefSeq protein records with a link to a UniProtKB protein in NCBI's protein database
                protein protein uniprot2refseq    find UniProtKB protein records with a link to a RefSeq protein in NCBI's protein database
        4. ftp sites 
                A new file was added to the gene and refseq ftp sites to report the relationship between NCBI Reference Sequence protein accessions and UniProtKB protein accessions.  The new gene_refseq_uniprotkb_collab.gz file specifies the corresponding pairs of NCBI and UniProtKB protein accessions.
                        ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_refseq_uniprotkb_collab.gz 
         or 
                        ftp://ftp.ncbi.nlm.nih.gov/refseq/uniprotkb/gene_refseq_uniprotkb_collab.gz 
The README file on the gene and refseq ftp sites has been updated to document this addition. See: 
                        ftp://ftp.ncbi.nlm.nih.gov/gene/README 
                        ftp://ftp.ncbi.nlm.nih.gov/refseq/README 
        5. the ASN.1 in Entrez Gene 
         
          New implementation of a gene-commentary: 
     Each cross-reference will be reported in a gene-commentary of type other. Note: more than one cross-reference per RefSeq protein record is possible.
                          type other,
                          source {
                            {
                              src {
                                db "UniProtKB/Swiss-Prot",
                                tag str "P23760"
                              },
                              anchor "P23760"
                            },
                            {
                              src {
                                db "UniProtKB/TrEMBL",
                                tag str "O23760"
                              },
                              anchor "O23760"
                            }
                          }
                  

====
I haven't tested this one out myself, but I think it might do the trick for you (:
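
If you go the ftp route, reading the pairs file could be sketched roughly as follows in Python. I am assuming the file is tab-delimited, with the RefSeq protein accession in the first column and the UniProtKB accession in the second, and that header or comment lines start with '#'. Please check the README for the actual layout. The function names are made up for illustration. Note that the file lists protein accessions, so for NM_ IDs you would first need the NM_ to NP_ correspondence, as shown in the Entrez Gene example above.

```python
import gzip


def load_refseq_to_uniprot(lines):
    """Build {RefSeq protein accession: [UniProtKB accessions]} from text lines.

    Assumed layout (not confirmed here): tab-delimited, RefSeq accession in
    column 1, UniProtKB accession in column 2, '#' starts a comment line.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed lines
        refseq, uniprot = fields[0], fields[1]
        accessions = mapping.setdefault(refseq, [])
        # More than one cross-reference per RefSeq protein is possible.
        if uniprot not in accessions:
            accessions.append(uniprot)
    return mapping


def load_from_gzip(path):
    """Read a local copy of gene_refseq_uniprotkb_collab.gz."""
    with gzip.open(path, "rt") as handle:
        return load_refseq_to_uniprot(handle)
```

After downloading the file, load_from_gzip("gene_refseq_uniprotkb_collab.gz") would return the lookup table, and a missing key simply means no cross-reference was reported for that protein.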

Best wishes,

  -- Stan


-----Original Message-----
From: bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org [mailto:bio_bulletin_board-bounces+stan.gaj=bigcat.unimaas.nl at bioinformatics.org] On Behalf Of Boris Steipe
Sent: 09 October 2007 20:18
To: General Forum at Bioinformatics.Org
Subject: Re: [BiO BB] genbank2swissprot ?

Does the UniProt ID mapping service fit your requirements?
   http://www.pir.uniprot.org/search/idmapping.shtml


Boris


On 9-Oct-07, at 12:09 PM, Dr. Christoph Gille wrote:

> Is there a mapping of identifiers from genbank nt sequences
> to identifiers of swissprot (protein) ?
> Using some tables in
> ftp://ftp.ncbi.nih.gov/refseq/
> ftp://ftp.ncbi.nih.gov/gene/DATA
> there seems to be a way indirectly over the proteinkb.
> But perhaps there is a more direct way?
> Many thanks
>
> Christoph
>
> _______________________________________________
> General Forum at Bioinformatics.Org -  
> BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
