[BiO BB] Protein Datatypes for function prediction
Mike Marchywka
marchywka at hotmail.com
Sat Jul 28 11:08:04 EDT 2007
>rapidly, don't dismiss even simple text processing.
[...]
>( a few thousand, enough to cluster perhaps)
I actually tried this with osteoglycins. If you download them (there aren't
that many), pick out repeated "words", and cluster by the presence or absence
of the most popular words, it does a decent automated job of separating them
by species.
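Here is a minimal sketch of that procedure under loudly stated assumptions. The post doesn't say how the "words" were defined, so this treats every length-K substring of a protein sequence as a word. It counts how many sequences contain each word, keeps the N most widespread ones, and prints one presence/absence bit per kept word. The helper name kmer_vectors, the values of K and N, and the ">id bits" output layout are all hypothetical. The layout was chosen only because the loop further down takes an accession from field 1 and strips the ">".

```shell
#!/usr/bin/env bash
# kmer_vectors FASTA K N   (hypothetical helper, for illustration only)
# ASSUMPTION: a "word" is any length-K substring of a protein sequence.
# Keeps the N words found in the most sequences and prints ">id bits",
# one bit per kept word (1 = present, 0 = absent).
kmer_vectors() {
  awk -v k="$2" -v n="$3" '
    /^>/ { ids[++ns] = substr($1, 2); next }   # header line: record the id
    { seq[ns] = seq[ns] $0 }                   # sequence line: append
    END {
      # document frequency: in how many sequences does each word occur?
      for (s = 1; s <= ns; s++) {
        split("", seen)
        for (i = 1; i + k - 1 <= length(seq[s]); i++) {
          w = substr(seq[s], i, k)
          if (!(w in seen)) { seen[w] = 1; df[w]++; has[s, w] = 1 }
        }
      }
      # pick the n most widespread words; ties broken alphabetically
      for (j = 1; j <= n; j++) {
        best = 0; bw = ""
        for (w in df)
          if (!(w in picked) && (df[w] > best || (df[w] == best && w < bw))) {
            best = df[w]; bw = w
          }
        if (bw == "") break                    # fewer than n distinct words
        picked[bw] = 1; word[++nw] = bw
      }
      # one presence/absence bit per picked word, in pick order
      for (s = 1; s <= ns; s++) {
        bits = ""
        for (j = 1; j <= nw; j++)
          bits = bits (((s, word[j]) in has) ? "1" : "0")
        print ">" ids[s], bits
      }
    }' "$1"
}
```

Usage, with a hypothetical input file and a guessed K: `kmer_vectors osteoglycins.fasta 4 40 > osteo_vectors`. N=40 would give 40-bit vectors like the ones below, but K=4 is only a guess. Ties in popularity are broken alphabetically so the word list is reproducible. The repeated-maximum scan is simple and fine for a few thousand sequences, but it is not tuned for much more.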
Below are the vectors (presence/absence of each word), each followed by the
members having that vector (names could be ambiguous; they are for
illustration only). I was hoping it would separate by protein type, but that
is the problem with using the most common words to discriminate.
The zero vector amounts to a "miscellaneous" cluster.
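Grouping then amounts to collecting the sequences whose vectors are identical. Each distinct vector is a cluster, and the all-zero vector (no popular word present) is the miscellaneous one. Below is a sketch of that step, assuming one input line per sequence of the form ">id bits" (a hypothetical layout). The name group_vectors and the "misc" tag are hypothetical too. Count and vector come first, so field 2 is the bare vector, which is the field the loop below reads from osteo_groups. Whether the real osteo_groups looked like this is not known.

```shell
#!/usr/bin/env bash
# group_vectors VECTOR_FILE   (hypothetical helper, for illustration only)
# Input: one line per sequence, ">id bits".
# Output: one line per distinct bit vector, "count bits [misc] | id id ...",
# largest cluster first. The all-zero vector is tagged "misc".
group_vectors() {
  awk '{
         id = $1; sub(/^>/, "", id)            # drop the leading ">"
         members[$2] = members[$2] " " id      # collect ids per vector
         count[$2]++
       }
       END {
         for (v in count) {
           tag = (v ~ /^0+$/) ? " misc" : ""   # zero vector = no popular word
           printf "%d %s%s |%s\n", count[v], v, tag, members[v]
         }
       }' "$1" | LC_ALL=C sort -k1,1nr -k2,2   # biggest clusters first
}
```

Usage: `group_vectors osteo_vectors`. Sorting in the C locale keeps the order of equal-sized clusters deterministic.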
$ for f in $(awk '{print $2}' osteo_groups); do
    echo "$f"                  # the presence/absence vector
    g=$(grep "$f" osteo_vectors | awk '{print $1}' | sed -e 's/>//')
    echo $g                    # accessions that share this vector
    # strip version suffixes and join the accessions into a grep alternation
    h=$(echo $g | sed -e 's/\.[0-9]*//g' -e 's/  */\\|/g')
    grep -A 2 "$h" osteo_rdict | grep "DEFINITION" | sed -e 's/DEFINITION//'
  done | unix2dos >/dev/clipboard
1111111111111111111111111011111111110001
CAI16694 AAH95443 AAH37273 NP_148935 NP_054776 ABM85338 ABM82153 EAW62820
EAW62819 EAW62818 P20774 CAB53706
osteoglycin [Homo sapiens].
Osteoglycin [Homo sapiens].
Osteoglycin [Homo sapiens].
osteoglycin preproprotein isoform 2 [Homo sapiens].
osteoglycin preproprotein isoform 2 [Homo sapiens].
osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
osteoglycin (osteoinductive factor, mimecan) [synthetic construct].
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
osteoglycin (osteoinductive factor, mimecan), isoform CRA_a [Homo
Mimecan precursor (Osteoglycin) (Osteoinductive factor) (OIF).
hypothetical protein [Homo sapiens].
1111111011111000100001011101001000001110
NP_032786 EDL41086 AAH21939 BAA06721 Q62000 BAE35995 BAC35462
osteoglycin [Mus musculus].
osteoglycin [Mus musculus].
Osteoglycin [Mus musculus].
osteoglycin precursor [Mus musculus].
Mimecan precursor (Osteoglycin).
unnamed protein product [Mus musculus].
unnamed protein product [Mus musculus].
0000000000000000000000000000000000000000
CAK03681 NP_002336 O42235 NP_032464 NP_989507 NP_033885
novel protein similar to vertebrate osteoglycin (osteoinductive
lumican precursor [Homo sapiens].
Keratocan precursor (KTN) (Keratan sulfate proteoglycan keratocan).
keratocan [Mus musculus].
keratocan [Gallus gallus].
bone morphogenetic protein 1 [Mus musculus].
1101111111100000001001011100001000001110
EDL98110 XP_001054654 XP_001054599 XP_001054725 XP_214441
osteoglycin (predicted) [Rattus norvegicus].
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 2
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 1
PREDICTED: similar to Mimecan precursor (Osteoglycin) isoform 3
PREDICTED: similar to Mimecan precursor (Osteoglycin) [Rattus
0000001000100110000100000000000000000000
NP_989540 AAD21085 Q9W6H0 Q9DE65
osteoglycin [Gallus gallus].
osteoglycin [Gallus gallus].
Mimecan precursor (Osteoglycin).
Mimecan precursor (Osteoglycin).
1111111111111111111111111011011111110001
AAP97142 Q5RBL2 CAH90848
osteoglycin OG [Homo sapiens].
Mimecan precursor (Osteoglycin).
hypothetical protein [Pongo pygmaeus].
1110101111110110010001011001011111111110
NP_001075585 AAM46865 Q8MJF1
osteoglycin [Oryctolagus cuniculus].
osteoglycin [Oryctolagus cuniculus].
Mimecan precursor (Osteoglycin).
1110111111100011110110111111011001100010
ABQ13007 P19879
osteoglycin preproprotein [Bos taurus].
Mimecan precursor (Osteoglycin) [Contains: Corneal keratan sulfate
1110011111100011110110111111001001100010
NP_776371 AAB70264
osteoglycin [Bos taurus].
mimecan [Bos taurus].
1111111111111111111111011011011111110000
NP_077727
osteoglycin preproprotein isoform 1 [Homo sapiens].
1111011111101111111110111011001111100001
XP_001103337
PREDICTED: osteoglycin isoform 2 [Macaca mulatta].
1111011111101111111110011011001111100000
XP_001103195
PREDICTED: osteoglycin isoform 1 [Macaca mulatta].
1110111111110011010001111110011001010000
ABL96619
osteoglycin [Capra hircus].
1101011111110010010111011100000001110110
XP_853340
PREDICTED: similar to Mimecan precursor (Osteoglycin)
1100000111011110110111000011000101110000
CAB61417
hypothetical protein [Homo sapiens].
1011111000100111011000011000011110000000
CAI16695
osteoglycin [Homo sapiens].
1011111000100111001000111000011010000001
AAX25979
SJCHGC07866 protein [Schistosoma japonicum].
0000001000100000000000000000000000000000
NP_001080164
osteoglycin [Xenopus laevis].
0000000110000000000000000000000000000000
CAJ57655
osteoglycin [Sus scrofa].
0000000000100100000000000000000000000000
XP_001512743
PREDICTED: similar to osteoglycin preproprotein [Ornithorhynchus
0000000000000001000000100000000000000001
AAD40453
mimecan [Homo sapiens].
Mike Marchywka
>From: "Mike Marchywka" <marchywka at hotmail.com>
>Reply-To: "General Forum at Bioinformatics.Org"
><bio_bulletin_board at bioinformatics.org>
>To: bio_bulletin_board at bioinformatics.org
>Subject: Re: [BiO BB] Protein Datatypes for function prediction
>Date: Tue, 24 Jul 2007 07:50:50 -0400
>