BranchClust: A Phylogenetic Algorithm for Selecting Gene Families

BranchClust is an algorithm for the automated selection of orthologous genes: it identifies orthologs from different species in a phylogenetic tree for any number of taxa. It distinguishes complete families (containing all taxa) from incomplete families (missing some taxa) and recognizes in- and out-paralogs.
BMC Bioinformatics 2007, 8:120
Free access:
http://www.biomedcentral.com/1471-2105/8/120

BranchClust Tutorial - a step-by-step guide for assembling orthologous gene families
BranchClust is a clustering algorithm that parses trees in order to delineate families of orthologs within a superfamily containing several paralogous gene families. The underlying idea is that closely related genes are placed on one branch emerging from one node of the tree, so detecting families for n different taxa reduces to detecting branches that contain groups of genes from all, or almost all, species.
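
The branch-detection idea described above can be illustrated with a minimal Python sketch. This is not the BranchClust implementation (which is distributed as Perl scripts) and it does not reproduce the published selection rules. It only assumes a hypothetical tree encoded as nested tuples, hypothetical leaf labels of the form "taxon|gene", and a hypothetical threshold `min_taxa` standing in for "all, or almost all, species". It returns the smallest branches whose leaves cover at least that many taxa.

```python
# Toy illustration only: nested tuples stand in for a parsed tree,
# and leaf labels "taxon|gene" are an assumed convention.

def taxa_of(tree):
    """Return the set of taxa found below this node."""
    if isinstance(tree, str):
        return {tree.split("|")[0]}
    taxa = set()
    for child in tree:
        taxa |= taxa_of(child)
    return taxa

def leaves_of(tree):
    """Return all leaf labels below this node, left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree:
        leaves.extend(leaves_of(child))
    return leaves

def select_branches(tree, min_taxa):
    """Return leaf lists of the smallest branches covering >= min_taxa taxa."""
    if len(taxa_of(tree)) < min_taxa:
        return []
    if isinstance(tree, str):
        return [[tree]]  # reachable only when min_taxa <= 1
    found = []
    for child in tree:
        found.extend(select_branches(child, min_taxa))
    # If no child qualifies on its own, this node is the smallest qualifying
    # branch. Leaves outside any qualifying child are dropped in this sketch.
    return found if found else [leaves_of(tree)]

# Two paralogous families, each present in taxa A, B and C.
toy_tree = ((("A|a1", "B|b1"), "C|c1"), (("A|a2", "B|b2"), "C|c2"))
for branch in select_branches(toy_tree, min_taxa=3):
    print(branch)
```

With `min_taxa=3` the toy superfamily splits into its two families, because neither pair `("A|a1", "B|b1")` nor any single leaf covers three taxa, while each subtree one level up does.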
Superfamily of ATP-synthases for 30 taxa: 16 bacteria and 14 archaea.
ATP-A designates all catalytic subunits (subunit A), whether from bacteria or from archaea; ATP-B designates all non-catalytic subunits (subunit B); ATP-F designates the flagellum-specific ATP synthase.
BranchClust output:
------------ CLUSTER -----------
56421917 16080761 15606215 21673103 39998198 39933373 15600432 32473454 32141261 62390087 21225334 55981034 15806355 15644219 32475544 15606716
------------ FAMILY ------------
15606215 16080761 21673103 62390087 15806355 56421917 39998198 15600432 32473454 39933373 32141261 15644219 55981034
INCOMPLETE: 13
>>>>> IN-PARALOGS -----------
21225334
<<<<< OUT-OF_CLUSTER PARALOGS -----------
32475544 15606716

------------ CLUSTER -----------
15644360 32476315 21674843 39995222 39933255 17227501 37522474 56421895 16080736 55820565 62390098 21223731 15606090 15600749 20091265 32473399
------------ FAMILY ------------
15606090 16080736 21674843 62390098 56421895 39995222 37522474 20091265 17227501 15600749 32476315 39933255 55820565 21223731 15644360
INCOMPLETE: 15
>>>>> IN-PARALOGS -----------
32473399
<<<<< OUT-OF_CLUSTER PARALOGS -----------

------------ CLUSTER -----------
20091272 21673859 32473392 15607015 15644358 15600747 37522139 17232531 21675043 39995224 39933253 32476317 56421893 16080734 55820567 62390100 21223733
------------ FAMILY ------------
15607015 16080734 21673859 62390100 56421893 39995224 37522139 20091272 17232531 15600747 32473392 39933253 55820567 21223733 15644358
INCOMPLETE: 15
>>>>> IN-PARALOGS -----------
21675043 32476317
<<<<< OUT-OF_CLUSTER PARALOGS -----------

------------ CLUSTER -----------
55981241 15805728 16081191 14521959 57641538 20095109 15678972 45358608 15790972 55379722 11498767 20092952 14600684 15897485 18312435 41615057
------------ FAMILY ------------
14600684 11498767 15805728 55379722 15790972 45358608 20095109 20092952 15678972 41615057 18312435 14521959 15897485 57641538 16081191 55981241
INCOMPLETE: 16

------------ CLUSTER -----------
11498766 20092951 15790973 55379721 16081190 55981242 15805727 20094453 14521960 57641537 45358607 15678973 14600685 15897484 41614899 18312083
------------ FAMILY ------------
14600685 11498766 15805727 55379721 15790973 45358607 20094453 20092951 15678973 41614899 18312083 14521960 15897484 57641537 16081190 55981242
INCOMPLETE: 16

------------ CLUSTER -----------
56419757 16078687 39995521 39934703 15596301 32477553 15642991 15596894
------------ FAMILY ------------
16078687 56419757 39995521 15596301 32477553 39934703 15642991
INCOMPLETE: 7
>>>>> IN-PARALOGS -----------
<<<<< OUT-OF_CLUSTER PARALOGS -----------
15596894
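
For downstream processing, a report like the one above can be read into simple records. The following Python sketch is one possible way to do that and is not part of the BranchClust distribution. The function name `parse_clusters` and the dictionary keys are hypothetical. The parser relies only on the section markers visible in the output shown here, and accepting a `COMPLETE: n` line alongside `INCOMPLETE: n` is an assumption, since only the latter appears above. It ignores line breaks and dashes, so it works whether each section sits on its own line or the whole cluster is on one line.

```python
import re

# Section markers and gene identifiers, taken from the report shown above.
# The longer marker is listed first so that "CLUSTER" inside
# "OUT-OF_CLUSTER PARALOGS" is not matched on its own.
TOKEN = re.compile(
    r"OUT-OF_CLUSTER PARALOGS|IN-PARALOGS|CLUSTER|FAMILY"
    r"|(?:IN)?COMPLETE:\s*\d+|\d+"
)

SECTION_KEYS = {
    "FAMILY": "family",
    "IN-PARALOGS": "in_paralogs",
    "OUT-OF_CLUSTER PARALOGS": "out_paralogs",
}

def parse_clusters(text):
    """Split a cluster report into a list of dicts, one per CLUSTER block."""
    clusters, current, section = [], None, None
    for tok in TOKEN.findall(text):
        if tok == "CLUSTER":
            current = {"cluster": [], "family": [], "in_paralogs": [],
                       "out_paralogs": [], "status": None, "size": None}
            clusters.append(current)
            section = "cluster"
        elif current is None:
            continue  # skip anything before the first CLUSTER header
        elif tok in SECTION_KEYS:
            section = SECTION_KEYS[tok]
        elif "COMPLETE" in tok:
            status, size = tok.split(":")
            current["status"], current["size"] = status, int(size)
            section = None  # ids are only collected after a section header
        elif section is not None:
            current[section].append(tok)
    return clusters

# The last cluster of the report above, used as sample input.
sample = """
------------ CLUSTER -----------
56419757 16078687 39995521 39934703 15596301 32477553 15642991 15596894
------------ FAMILY ------------
16078687 56419757 39995521 15596301 32477553 39934703 15642991
INCOMPLETE: 7
>>>>> IN-PARALOGS -----------
<<<<< OUT-OF_CLUSTER PARALOGS -----------
15596894
"""

parsed = parse_clusters(sample)
for record in parsed:
    print(record["status"], record["size"], record["out_paralogs"])
```

Keeping the identifiers as strings avoids any assumption about their numeric range and preserves them exactly as printed in the report.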
Download Perl: www.perl.org
Download BioPerl: www.bioperl.org
Links
Gogarten Lab Home Page: http://gogarten.uconn.edu/
Email to: Maria.Poptsova@gmail.com
Page last updated: May 16, 2007