[BiO BB] gff to sequence

Mike Marchywka marchywka at hotmail.com
Sat Oct 3 19:06:05 EDT 2009

> This seems of such general use that it begs a small utility which will
> take a (possibly indexed) fasta file, a gff and output the sequences you
> want. What would people want from such a programme?
> Is GTF (http://mblab.wustl.edu/GTF2.html) more useful or GFF?
> Would different elements from the same group (gene/transcript) be joined
> together in order?

I wrote a small system like this based on ASCII hit files; this means most
of your temp files can be processed with standard tools, and usually they
don't limit the speed, although with Cygwin going through windoze this can
add up.
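
For illustration, here is a minimal sketch of the utility asked about above. It is written in Python and is not the C++ program mentioned in this thread. It assumes GTF-style input: nine tab-separated columns, 1-based inclusive start/end, and a group key pulled out of the attributes column by a user-supplied regular expression (by default the GTF transcript_id). Features of one group are joined in genomic order and reverse-complemented on the minus strand. The names read_fasta, extract_groups and to_fasta are hypothetical, and all pieces of a group are assumed to lie on one sequence and one strand.

```python
import re
from collections import defaultdict

# Complement table used to reverse-complement minus-strand groups.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {seqid: sequence}, keyed on the first word of each header."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_groups(fasta, gff_lines, features=("CDS",),
                   group_re=r'transcript_id "([^"]+)"'):
    """Join the selected features of each group (e.g. transcript) in order."""
    key = re.compile(group_re)  # hypothetical user-supplied group label regex
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] not in features:
            continue  # filter on the feature (type) column
        m = key.search(cols[8])
        if m is None:
            continue  # no group label in the attributes column
        # GFF/GTF coordinates are 1-based and inclusive.
        groups[m.group(1)].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    out = {}
    for gid, parts in groups.items():
        parts.sort(key=lambda p: p[1])  # genomic order by start coordinate
        seq = "".join(fasta[seqid][start - 1:end] for seqid, start, end, _ in parts)
        if parts[0][3] == "-":
            # Assumes every part shares one strand; report the reverse complement.
            seq = seq.translate(COMPLEMENT)[::-1]
        out[gid] = seq
    return out


def to_fasta(records, width=60):
    """Format {name: sequence} as FASTA text wrapped at `width` columns."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genome = read_fasta([">chr1 toy", "AAACCCGGGTTA"])
    gtf = [
        'chr1\tsrc\tCDS\t1\t3\t.\t+\t0\ttranscript_id "t1";',
        'chr1\tsrc\tCDS\t7\t9\t.\t+\t0\ttranscript_id "t1";',
        'chr1\tsrc\tCDS\t10\t12\t.\t-\t0\ttranscript_id "t2";',
        'chr1\tsrc\texon\t4\t6\t.\t+\t.\ttranscript_id "t3";',
    ]
    print(to_fasta(extract_groups(genome, gtf)), end="")
```

Run as a script, this prints two toy records, t1 (AAAGGG) and t2 (TAA). The exon line is dropped by the default CDS filter. Passing a different group_re, for example r'Parent=([^;]+)', would be one way to group GFF3 child features instead.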

> Would one want filtering on the "features" column so one could retrieve all
> splice sites or codon exons?
> What would be the output? Another fasta file? How would each "group" of
> sequences (e.g. transcript) be labelled? By a user supplied regular expression?
>
>
>> I guess it depends what you mean by quick. If you mean quick to write, you
>> could use awk, but then it depends what additional things you want to do
>> with the results. I ended up writing a C++ fasta utility program since Perl
>> can be slow sometimes, and I grabbed a couple of regex libraries to let me
>> grep names etc.
> I hoped you used boost::regex, which will be in the next C++ standard.

If you had to read my posts on their mailing list, you would change your
attitude and wish I had never heard of it :) Actually, as pointed out there,
it isn't clear how fast it is compared to GRETA (for all my complaints about
MSFT, that works well, but Maddock is at Boost in any case). In the end I
wrote my own limited compiler, but there seem to be Boost expression
compilers that may be useful too. For editing fasta files, though, I doubt
you care this much about regex speed.
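
As an illustration of the "grep names" use case, here is a minimal sketch that keeps only the FASTA records whose header line matches a pattern. It uses Python's standard re module as a stand-in for boost::regex or GRETA, and it is not the author's own compiler. The pattern runs once per header rather than once per residue, so matching cost grows with the number of records, not with the amount of sequence. grep_fasta is a hypothetical name.

```python
import re


def grep_fasta(lines, pattern):
    """Yield every line of each FASTA record whose header matches `pattern`."""
    regex = re.compile(pattern)  # compiled once, applied once per header line
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            keep = regex.search(line[1:]) is not None
        if keep:
            yield line


if __name__ == "__main__":
    fasta = [">chr1 human", "ACGT", ">chrM mito", "GGCC", ">chr2 human", "TTAA"]
    for line in grep_fasta(fasta, r"human"):
        print(line)
```

Because it is a generator over lines, the same function works on an open file object without loading the whole file into memory.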



Note: Hotmail is now unusable for TEXT; I am moving to marchywka at gmail.com, or also use
marchywka at yahoo.com. Thanks.

Mike Marchywka
586 Saint James Walk
Marietta GA 30067-7165
415-264-8477 (w)<- use this
404-788-1216 (C)<- leave message
989-348-4796 (P)<- emergency only
marchywka at hotmail.com
Note: If I am asking for free stuff, I normally use it for hobby/non-profit
information but may use it in investment forums, public and private.
Please indicate any concerns if applicable.