Let's say you want to analyze co-evolution within or across one or more molecules. To do this properly, you will need to collect a large set of orthologous sequences of the molecule from many species. Sure, you can simply search NCBI and select and copy sequences one by one. Another approach is an iterative BLAST search (PSI-BLAST). But a faster approach is to access these databases programmatically. Below are some tips and tools for carrying out these steps. You might also consider an iterative, automated protein homology search to find orthologs; HMMER provides this through jackhmmer, which can search UniProt for sequences. In fact, this is the approach used by servers such as EVCoupling. Once again, some of the tools below will aid this process.
NCBI databases can be accessed programmatically via the E-utilities that NCBI makes available. With these, you can submit a search term and receive a file containing the IDs of the sequences you want to add to your collection. Then, using these IDs, you can fetch the sequences themselves.
* Esearch
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=REPLACE_with_SEARCH_TERMS&retmax=REPLACE_with_MAX_NUMBER_of_RESULTS
Replace the first placeholder with your search terms and the second with the maximum number of sequences to retrieve.
* Remove all
* Efetch
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=REPLACE_with_IDS&rettype=fasta
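The Esearch and Efetch steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names (build_esearch_url, parse_id_list, build_efetch_url, fetch_text, fetch_fasta) are hypothetical helpers introduced here for illustration, and the parser assumes the Esearch response is XML with the matching IDs listed as Id elements inside an IdList element.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL shared by the two E-utilities endpoints shown above.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20, db="protein"):
    """Build the Esearch URL for a search term and a maximum result count."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def parse_id_list(xml_text):
    """Extract the record IDs from an Esearch XML response.

    Assumes the IDs appear as <Id> elements inside <IdList>.
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall("./IdList/Id")]


def build_efetch_url(ids, db="protein", rettype="fasta"):
    """Build the Efetch URL for a list of IDs, joined with commas."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype}
    # safe="," keeps the commas between IDs unescaped in the URL.
    query = urllib.parse.urlencode(params, safe=",")
    return f"{EUTILS}/efetch.fcgi?{query}"


def fetch_text(url):
    """Download a URL and return the body as text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def fetch_fasta(term, retmax=20):
    """Run Esearch, then Efetch, and return the sequences as FASTA text."""
    ids = parse_id_list(fetch_text(build_esearch_url(term, retmax)))
    if not ids:
        return ""
    return fetch_text(build_efetch_url(ids))
```

Calling fetch_fasta with your own search terms and a small retmax should return FASTA text that you can write to a file. NCBI asks users without an API key to stay under three requests per second, so add a pause between calls if you loop over many search terms.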
If the above solution seems too laborious, you can try some of the programs below, including an online form that helps expedite the steps above. Find_Seqs automates the first step for any type of molecule, and NucSeqFetch is the proper follow-up for retrieving nucleotide sequences (important for looking at RNA co-evolution).