[BiO BB] KEGG vs GO

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Thu Apr 6 18:07:07 EDT 2006


Michael Ashburner (Genetics) wrote:
> I think that there is some confusion in this thread.

:)

> 1. There is the Gene Ontology.  Its terms are used (primarily)
> for the annotation of gene products.  Both the Ontology and the
> annotations contributed by the members of the GO Consortium database
> are available from the GO site.
> 
> 2. There is the KEGG Orthology, available from the KEGG site.
> This is _both_ an ontology, seen, for example, by
> opening KO up to its 3rd level: http://www.genome.ad.jp/dbget-bin/get_htext?KO+-s+F+-f+F+C
> _and_ annotations of classes of gene product, seen if it is opened up
> to level 4:
> http://www.genome.ad.jp/dbget-bin/get_htext?KO+-s+F+-f+F+D

Thanks for the links, that looks useful.


> It would be easy for us to make a mapping between the Gene Ontology
> and KO (level 3), except that the KO includes domains outwith the GO
> (e.g.  01500 Human Diseases, and its child terms).

Here, "includes domains outwith the GO"? I am not sure what you mean.


> In fact we will
> do that and make it available as a ko2go mapping file on GO. We do not
> need the "SwissProt Relational Database" to do this. Indeed, KEGG already
> provide many of these mappings to the GO.

Yes, I see. However, when dealing with multiple classification trees, I 
find it very useful to start from the viewpoint of the individual genes 
or gene products (hence my bias towards SwissProt). In general this makes 
the mappings between classifications easier to understand, but of course 
you are right, it is not necessary. It would still be interesting to see 
the underlying data (or rationale) for the mapping between GO and KEGG 
level 3 at the KEGG site.

The ko2go file will be very interesting.
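
To make the gene-product-centred view concrete, here is a minimal Python 
sketch of what I mean. It assumes two tables of (protein, term) pairs, 
such as might be pulled from any SwissProt-style relational dump. The 
accessions P1..P5 and their assignments are made-up placeholders, not real 
annotations, and the function names are my own.

```python
from collections import defaultdict

# Toy, made-up annotations: (protein, classification term) pairs.
# P1..P5 are placeholder accessions, not real SwissProt entries.
protein_to_ko = [("P1", "K06051"), ("P2", "K06051"),
                 ("P3", "K06051"), ("P4", "K00845")]
protein_to_go = [("P1", "GO:0007219"), ("P2", "GO:0007219"),
                 ("P5", "GO:0007219"), ("P4", "GO:0004340")]


def invert(pairs):
    """Turn (protein, term) pairs into a term -> set-of-proteins index."""
    index = defaultdict(set)
    for protein, term in pairs:
        index[term].add(protein)
    return index


def overlap(ko_index, go_index):
    """Return (ko, go, shared proteins, Jaccard index) for term pairs
    that have at least one protein in common."""
    rows = []
    for ko, ko_members in ko_index.items():
        for go, go_members in go_index.items():
            shared = ko_members & go_members
            if shared:
                jaccard = len(shared) / len(ko_members | go_members)
                rows.append((ko, go, sorted(shared), jaccard))
    return rows


rows = overlap(invert(protein_to_ko), invert(protein_to_go))
for row in rows:
    print(row)
```

The mapping between the two classifications then falls out of the shared 
gene products, and the overlap score gives a rough idea of how well two 
terms correspond. This is only one possible way of scoring the 
correspondence.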


> Mapping to level 4 is more problematic.  The KO presents three levels:
> 
> Ontology terms ("Levels 1-3")
> 	e.g.: 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] [GO:0006096 0006094]
> Families of proteins ("Level 4")
> 	e.g.  K00845 E2.7.1.2, glk; glucokinase [EC:2.7.1.2] [COG:COG0837] [GO:0004340]
> Genes, whose products are members of this family
> 	e.g. Genes HSA: 2645(GCK)
> 
> While for those Level 4 terms that are enzymes a 'mapping' of KO to the GO
> would not be hard, 

Do you mean between EC and the GO Molecular function branch? The above 
mappings at level 3 are to the Biological process branch which makes sense.
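
For what it is worth, the cross-references in the KO lines quoted above 
are easy to pull apart. The following is a rough Python sketch that 
assumes the layout is exactly as quoted in this thread (an identifier, a 
description, then bracketed "[DB:value ...]" groups); it is not a parser 
for any official KEGG file format, and parse_ko_line is a name I made up.

```python
import re

# One bracketed cross-reference group, e.g. "[GO:0006096 0006094]".
XREF = re.compile(r"\[([A-Za-z]+):([^\]]+)\]")


def parse_ko_line(line):
    """Split a KO line into (identifier, description, cross-references)."""
    xrefs = {}
    for db, values in XREF.findall(line):
        # A group may hold several whitespace-separated values.
        xrefs.setdefault(db, []).extend(values.split())
    # Whatever is left after removing the bracketed groups is
    # the identifier followed by the free-text description.
    text = XREF.sub("", line).strip()
    identifier, _, description = text.partition(" ")
    return identifier, description.strip(), xrefs


# The two example lines from this thread (with the opening bracket
# before PATH written out in the first one).
glyco = parse_ko_line(
    "00010 Glycolysis / Gluconeogenesis [PATH:ko00010] [GO:0006096 0006094]")
glk = parse_ko_line(
    "K00845 E2.7.1.2, glk; glucokinase [EC:2.7.1.2] [COG:COG0837] [GO:0004340]")

print(glyco)
print(glk)
```

With the EC and GO references side by side for each level 4 entry, one 
can at least tabulate which branch of the GO each cross-reference falls 
into, given a separate lookup of GO identifiers to branches.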


> it gets more difficult further down.  Consider the term:
> K06051 DLL; delta
> This is a child of (among others)
> Notch signaling pathway [PATH:ko04330] (which would map to the GO)
> and has children:
> HSA: 10683(DLL3) 28514(DLL1) 54567(DLL4)
> MMU: 13388(Dll1) 13389(Dll3) 54485(Dll4)
> RNO: 114125(Dll3) 311332(Dll4_predicted) 84010(Dll1)
> XLA: 379238(MGC52561)
> DRE: 30120(dlc) 30131(dla) 30138(dld) 30141(dlb)
> DME: CG3619-PA(Dmel_CG3619)
> Which are clearly individual gene products.
> 
> Thus, I conclude, that KO's: K06051 DLL; delta  is a _genus_
> of gene products.  This is conceptually very different from the GO,
> despite what may seem to be superficial similarities.

I don't understand. As you say, "Notch signaling pathway" does map to GO 
([GO:0007219]), which has 183 assigned proteins (in my SwissProt 
Relational Database ;). Is this not a similar 'genus' of gene products? 
What do you mean by this term?

SwissProt annotates some (but not all) of the genes and gene products 
within the K06051 'genus' with the GO:0007219 term, and in addition 103 
further GO terms for those gene products (namely for Q9NYJ7, O00548, 
Q9NR61, Q61483, O88516, Q9JI71, O88671, P97677 and P10041).
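
The kind of check behind those numbers can be sketched in a few lines of 
Python. This is only an illustration: the members A, B and C and the 
terms GO:x and GO:y are placeholders rather than the real annotations of 
the proteins listed above, and coverage is a hypothetical helper name.

```python
def coverage(members, annotations, term):
    """For one family, report which members carry `term`, which do not,
    and which other terms are attached to any member."""
    with_term = sorted(m for m in members
                       if term in annotations.get(m, set()))
    without_term = sorted(m for m in members
                          if term not in annotations.get(m, set()))
    other_terms = set()
    for m in members:
        other_terms |= annotations.get(m, set()) - {term}
    return with_term, without_term, sorted(other_terms)


# Placeholder family and annotations (not real data).
family = ["A", "B", "C"]
annotations = {
    "A": {"GO:0007219", "GO:x"},
    "B": {"GO:0007219"},
    "C": {"GO:y"},
}

with_term, without_term, other_terms = coverage(
    family, annotations, "GO:0007219")
print(with_term, without_term, other_terms)
```

Members that lack the term are candidates for either a missing 
annotation or a genuine difference in function, which is where the 
comparison becomes interesting.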


> So, contra Lucy, the difference between the GO and KO has nothing to
> do with manual vs automatic annotation, or on the 'focus' of the KO,
> but rather they differ in their underlying structure.

But there may be a structure-function relationship ;)

IMHO, integrated databases, for example the 'Bio Warehouse',

http://www.biomedcentral.com/1471-2105/7/170

can provide a gold mine for investigating the relationships between 
different ontologies over genes and gene products, with the dual aims of 
consistency (error checking) and data mining. For this reason it is very 
interesting to compare differently derived, differently structured and 
differently focused ontologies at the fundamental level to look for 
higher level associations. However, this isn't a trivial task.

A really nice software project for working with ontologies and as many 
different 'data models' as you can think of is here,

http://www.prova.ws/

The nice thing about this project is that it makes your data and your 
model transparent, allowing them to be broken down or built up into 
other models. I think this area (data and model exchange with 
transparency) will become increasingly important in the field.


> Michael 
> 
> 
> =====
> Date: Wed, 05 Apr 2006 11:13:14 +0100
> From: Dan Bolser <dmb at mrc-dunn.cam.ac.uk>
> Subject: Re: [BiO BB] KEGG vs GO
> 
> lucifer at slimy.greenend.org.uk wrote:
> 
>>"Samantha Fox" <bioinfosm at gmail.com> writes:
>>
>>
>>>I was wondering how KEGG and GO differ from a broad perspective of 
>>>grouping functionally related genes.  So a KEGG pathway lists all 
>>>genes that kind of work together, and a similar GO term would also 
>>>contain such a gene list.
>>
>>
>>IIRC, KEGG is manually created from the literature whilst GO also 
>>contains automatic/electronic annotation based on sequence homology.  
>>KEGG also focuses more on metabolic pathways, whilst GO covers a more 
>>comprehensive set of cellular processes and molecular functions.
>>
>>Hope that helps,
> 
> 
> It should be possible to 'cross correlate' KEGG and GO in a number of 
> different ways using one of the SWISSPROT relational databases. However 
> you should know that generally 'ontology mapping' is an open problem :)
> 
> Good luck!
> 
> 
> 
>>Lucy
>>-- 
>>Lucy McWilliam
>>http://www.chiark.greenend.org.uk/~lucifer/
>>_______________________________________________
>>Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
>>https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
> 
> 



