[BiO BB] Comparing sequences from GenBank and RefSeq...

Dan Bolser dan.bolser at gmail.com
Tue Apr 28 08:50:32 EDT 2009


2009/4/23 Ryan Raaum <ryan.raaum at gmail.com>:
> The RefSeq entry tells you which non-RefSeq entry/entries it was
> derived from. In this case it says DQ386163, which suggests there are
> at least 2 potato chloroplast sequences available - one by an Italian
> group and one by a Korean group.

Right, I see. Is there any way to judge the quality of the two?
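
For a sense of scale, here is a rough sketch in Python that boils the dnadiff summary quoted further down into a single difference rate. The four rows are copied from that report; the parse_rows helper and the "every SNP or indel counts as one mismatched position" shortcut are my own simplifications, not how nucmer itself computes AvgIdentity.

```python
import re

# A few summary rows copied from the dnadiff report quoted below
# (REF = DQ231562, QRY = NC_008096).
REPORT = """\
TotalBases                    155312               155298
AvgIdentity                    99.81                99.81
TotalSNPs                        260                  260
TotalIndels                       30                   30
"""


def parse_rows(text):
    """Map each row label to its (REF, QRY) values as floats.

    Hypothetical helper for this sketch; it only handles three-column rows.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            label, ref, qry = parts
            # Drop any trailing "(xx.xx%)" so "1(100.00%)" parses as 1.
            ref = re.sub(r"\(.*\)$", "", ref)
            qry = re.sub(r"\(.*\)$", "", qry)
            rows[label] = (float(ref), float(qry))
    return rows


rows = parse_rows(REPORT)
ref_len = rows["TotalBases"][0]
diffs = rows["TotalSNPs"][0] + rows["TotalIndels"][0]

# Rough identity: treat every SNP and indel as one mismatched position.
approx_identity = 100.0 * (1.0 - diffs / ref_len)

print(f"{int(diffs)} differences over {int(ref_len)} bp")
print(f"approx identity: {approx_identity:.2f}%")
print(f"about one difference every {ref_len / diffs:.0f} bp")
```

With these counts the rough figure comes out at 99.81%, in line with the AvgIdentity row, i.e. about one difference every 536 bp. That says how far apart the two submissions are, but not which of them is closer to the truth.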

In the RefSeq record I read "PROVISIONAL REFSEQ: This record has not
yet been subject to final NCBI review." Is there any way to kick them
about that?

i.e. Dear RefSeq, I have DQ231562 and DQ386163, should they be merged
into NC_008096?
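
In case it is useful when checking other records, below is a minimal Python sketch for pulling the review status and the source accession out of a RefSeq comment. The first sentence of COMMENT is the text quoted above; the second sentence is an assumed wording for how the record names DQ386163, and the function names and the accession pattern are made up for this example.

```python
import re

# First sentence is quoted from the RefSeq record; the second sentence is an
# ASSUMED wording for how the source accession is stated in the comment.
COMMENT = (
    "PROVISIONAL REFSEQ: This record has not yet been subject to final "
    "NCBI review. The reference sequence was derived from DQ386163."
)

# Two letters followed by six digits covers the accessions in this thread;
# it is not a complete grammar for INSDC accession numbers.
ACCESSION = re.compile(r"\b[A-Z]{2}\d{6}\b")


def source_accessions(comment):
    """Return the sorted, de-duplicated accessions mentioned in the comment."""
    return sorted(set(ACCESSION.findall(comment)))


def refseq_status(comment):
    """Return the leading status word (e.g. PROVISIONAL), or None if absent."""
    match = re.match(r"([A-Z]+) REFSEQ:", comment)
    return match.group(1) if match else None


print(refseq_status(COMMENT), source_accessions(COMMENT))
```

The pattern deliberately skips RefSeq-style identifiers such as NC_008096 (they contain an underscore), so only the underlying GenBank accessions are reported.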


Thanks for the info,
Dan.


> On Thu, Apr 23, 2009 at 11:42 AM, Dan Bolser <dan.bolser at gmail.com> wrote:
>> Hi,
>>
>> I found that the potato chloroplast sequence from GenBank (DQ231562.1)
>> has several differences (260 SNPs and 30 indels) relative to the same
>> sequence in RefSeq (NC_008096.1). As far as I am aware, this sequence
>> has only been obtained once, so why would the two differ? In general,
>> should I trust the RefSeq sequence?
>>
>>
>> For your reference here is the output of dnadiff over the two files:
>>
>> Reference/DQ231562.fasta Query/NC_008096.fasta
>> NUCMER
>>
>>                               [REF]                [QRY]
>> [Sequences]
>> TotalSeqs                          1                    1
>> AlignedSeqs               1(100.00%)           1(100.00%)
>> UnalignedSeqs               0(0.00%)             0(0.00%)
>>
>> [Bases]
>> TotalBases                    155312               155298
>> AlignedBases         155312(100.00%)      155298(100.00%)
>> UnalignedBases              0(0.00%)             0(0.00%)
>>
>> [Alignments]
>> 1-to-1                             1                    1
>> TotalLength                   155312               155298
>> AvgLength                  155312.00            155298.00
>> AvgIdentity                    99.81                99.81
>>
>> M-to-M                             1                    1
>> TotalLength                   155312               155298
>> AvgLength                  155312.00            155298.00
>> AvgIdentity                    99.81                99.81
>>
>> [Feature Estimates]
>> Breakpoints                        0                    0
>> Relocations                        0                    0
>> Translocations                     0                    0
>> Inversions                         0                    0
>>
>> Insertions                         0                    0
>> InsertionSum                       0                    0
>> InsertionAvg                    0.00                 0.00
>>
>> TandemIns                          0                    0
>> TandemInsSum                       0                    0
>> TandemInsAvg                    0.00                 0.00
>>
>> [SNPs]
>> TotalSNPs                        260                  260
>> AC                         23(8.85%)            14(5.38%)
>> AG                         24(9.23%)           30(11.54%)
>> AT                         15(5.77%)            14(5.38%)
>> CA                         14(5.38%)            23(8.85%)
>> CG                         24(9.23%)            18(6.92%)
>> CT                        32(12.31%)            19(7.31%)
>> GA                        30(11.54%)            24(9.23%)
>> GC                         18(6.92%)            24(9.23%)
>> GT                         13(5.00%)           34(13.08%)
>> TA                         14(5.38%)            15(5.77%)
>> TC                         19(7.31%)           32(12.31%)
>> TG                        34(13.08%)            13(5.00%)
>>
>> TotalGSNPs                       113                  113
>> AC                          9(7.96%)             8(7.08%)
>> AG                        17(15.04%)           17(15.04%)
>> AT                          5(4.42%)             3(2.65%)
>> CA                          8(7.08%)             9(7.96%)
>> CG                          6(5.31%)             7(6.19%)
>> CT                        15(13.27%)             8(7.08%)
>> GA                        17(15.04%)           17(15.04%)
>> GC                          7(6.19%)             6(5.31%)
>> GT                          6(5.31%)           12(10.62%)
>> TA                          3(2.65%)             5(4.42%)
>> TC                          8(7.08%)           15(13.27%)
>> TG                        12(10.62%)             6(5.31%)
>>
>> TotalIndels                       30                   30
>> A.                        14(46.67%)            4(13.33%)
>> C.                          1(3.33%)             0(0.00%)
>> G.                          0(0.00%)             0(0.00%)
>> T.                         7(23.33%)            4(13.33%)
>>
>> TotalGIndels                      24                   24
>> A.                        10(41.67%)            4(16.67%)
>> C.                          1(4.17%)             0(0.00%)
>> G.                          0(0.00%)             0(0.00%)
>> T.                         5(20.83%)            4(16.67%)
>>
>>
>> Thanks for any pointers,
>> Dan.
>>
>> _______________________________________________
>> BBB mailing list
>> BBB at bioinformatics.org
>> http://www.bioinformatics.org/mailman/listinfo/bbb
>>
>
>
>
> --
> Ryan Raaum
> Assistant Professor
> Department of Anthropology
> Lehman College, The City University of New York
> 250 Bedford Park Blvd. West
> Bronx, NY 10468
> e: ryan.raaum at lehman.cuny.edu
> w: http://www.raaum.org
> o: (718) 960-8845
> f: (718) 960-8406

