[BiO BB] strange error parsing a specific NCBI gff file

William Hsiao william.hsiao at gmail.com
Tue Jun 27 15:52:03 EDT 2006

Hi all,
   I've encountered a strange problem while parsing a GFF file from
NCBI using Perl.  I'm hoping that someone on the list may have a
solution even though this is not a BioPerl issue.  Maybe someone
familiar with GFF3 parsing can help :)  Essentially, I'm parsing a GFF
file into a nested hash structure using the following functions:

use Data::Dumper;    # needed for the Dumper call below

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while (<INFILE>) {
        chomp;                  # drop the newline so the last attribute is clean
        next if (/^\#/);        # skip comment and directive lines
        my ($seqid, $source, $type, $start, $end, $score, $strand, $phase,
            $attributes) = split /\t/;
        my $attri_ref = &process_attributes($attributes);
        my %record = ('seqid'     => $seqid,
                      'source'    => $source,
                      'type'      => $type,
                      'start'     => $start,
                      'end'       => $end,
                      'score'     => $score,
                      'strand'    => $strand,
                      'phase'     => $phase,
                      'attribute' => $attri_ref);
        # group the records by feature type
        push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper %hash_gff;
    return \%hash_gff;
}

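sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
        my ($key, $value) = split /=/;
        if ($value=~/\:/){
            # values such as GI:50083879 are stored as a nested hash
            my ($subkey, $subvalue) = split (/:/, $value);
            $attr{$key}{$subkey} = $subvalue;
        }
        else {
            $attr{$key} = $value;
        }
    }
    return \%attr;
}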

   It works for all the GFF files we downloaded from NCBI's microbial
genomes RefSeq FTP repository.  However, 3 lines from one particular
file, NC_005966.gff (of Acinetobacter_sp_ADP1), cannot be parsed
properly.  These lines are:

NC_005966.1	RefSeq	CDS	635836	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	start_codon	636487	636489	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

NC_005966.1	RefSeq	stop_codon	635833	635835	.	-	0	locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1
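
A quick way to inspect these lines is to count how often each key occurs in the attribute column. The sketch below is only an illustration: the attribute string is abbreviated from the CDS line above to a handful of its keys, and the %seen counter is a name made up for this example.

```perl
use strict;
use warnings;

# Attribute column abbreviated from the CDS line above (illustration only;
# the remaining keys are left out for brevity).
my $attr_string = join ';',
    'locus_tag=ACIAD0647',
    'function=adaptation%20to%20stress',
    'function=protection%20%28MultiFun:5.5%29',
    'db_xref=GI:50083879',
    'db_xref=GeneID:2878732';

# Count how many times each key occurs in the column.
my %seen;
for my $pair (split /;/, $attr_string) {
    my ($key, $value) = split /=/, $pair, 2;
    $seen{$key}++;
}

for my $key (sort keys %seen) {
    print "$key appears $seen{$key} time(s)\n";
}
```

In this excerpt both "function" and "db_xref" occur more than once, and the two "function" values differ in shape: the first has no colon and the second does.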

   They generate an error: Can't use string
("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
 The strange part is that all I have to do is replace the word
"function" in front of "=adaptation%20to%20stress;" with another word
or simply change it to functions or functio or Function, etc, then the
line parses properly.  If I retype the word "function", it doesn't
solve the problem.  For some strange reason, when the word "function"
is there, Perl tries to use "adaptation%20to%20stress" as a hash
reference and fails.  The word "function" is used in other lines as
well, so I don't think the problem is caused by the word alone.
    Any suggestion on what might be happening would be greatly
appreciated.  Thank you.
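
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if process_attributes stores a colon-free value as a plain string and a colon-containing value as a nested hash, then a key that appears twice on one line (first without a colon, then with one) would be stored as a string and afterwards dereferenced as a hash. The snippet below reproduces that situation in isolation and shows one tolerant alternative. The helper name process_attributes_multi is hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): keep every value of a
# key in an array, so repeated keys never overwrite or clash with each other.
sub process_attributes_multi {
    my ($attr_string) = @_;
    my %attr;
    for my $pair (split /;/, $attr_string) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

# Minimal reproduction: the key is first stored as a plain string and is
# then used as if it were a hash reference, which "strict refs" forbids.
my %attr;
$attr{function} = 'adaptation%20to%20stress';
my $ok = eval { $attr{function}{'protection%20%28MultiFun'} = '5.5%29'; 1 };
print "Reproduced: $@" unless $ok;

# The tolerant variant keeps both values of the repeated key.
my $attrs = process_attributes_multi(
    'function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29'
);
my $count = @{ $attrs->{function} };
print "$count function values kept\n";
```

If this is indeed the cause, it would also be consistent with renaming the first "function" making the line parse, since the two keys then no longer collide.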



William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583
