| 
       Perl Script Home  | 
    
| Perl-script only: | 
| 
     The latest version: version 1.01, November 2006: BranchClust_1-01.pl
     Previous versions: version 1.00, June 2006: BranchClust_1-00.pl  | 
  
| Perl-script and examples: | 
Contains the Perl script together with example trees and various taxa recognition files.
| BranchClust Tutorial: | 
| 
       A step-by-step guide on how to download complete genomes, assemble superfamilies, reconstruct trees, select orthologous families with BranchClust, and analyze and depict results with TreeDyn.  | 
    
Contains all the Perl scripts described in the Tutorial (updated January 19, 2008).
| Program Usage | 
Required:
| 1. | 
     The BioPerl module Bio::TreeIO for parsing trees. For instructions on how to install BioPerl, see the BioPerl installation instructions.  | 
| 2. | 
     The taxa recognition file gi_numbers.out must be present in the current directory. To learn how to create this file, see the Taxa recognition file section.  | 
  
At the command line, type: | 
     # perl BranchClust.pl <tree-file> <MANY>  | 
  
| 
     where <tree-file> is a superfamily tree in PHYLIP format, and <MANY> is a parameter designating the minimum number of different taxa on a branch that is sufficient for the branch to be a separate cluster (see Algorithm). It can be any number less than or equal to the number of different taxa considered. The BranchClust algorithm usually works well with a value equal to 80% of the total number of different taxa. See Examples below.  | 
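As a rough illustration of the 80% rule of thumb, the sketch below derives a starting value for <MANY> from the number of taxa and prints the matching command line. The function name `suggest_many` and the decision to round down are assumptions made for this example; they are not part of BranchClust.

```python
def suggest_many(n_taxa: int, percent: int = 80) -> int:
    """Return a starting value for <MANY>: `percent` of n_taxa, rounded down.

    Rounding down is an assumption of this sketch. The result is kept
    between 1 and n_taxa, because <MANY> cannot exceed the number of taxa.
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    many = n_taxa * percent // 100  # integer arithmetic avoids float rounding
    return max(1, min(many, n_taxa))


if __name__ == "__main__":
    n_taxa = 4  # e.g. a tree built from four taxa
    print(f"perl BranchClust.pl tree.tre {suggest_many(n_taxa)}")
```

Treat the result as a starting point only; the best value may differ from one superfamily to another.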
      
  
| Output | 
| 
     BranchClust creates three output files: clusters.out, families.list, and cluster.log. The file clusters.out reports clusters with the selected families and any IN-CLUSTER and OUT-OF-CLUSTER paralogs. The file families.list contains only the list of selected families. The log file cluster.log tracks the selection algorithm as it parses the tree node by node, and can be useful for analyzing cases of incorrect clustering.  | 
      
  
| Taxa recognition file | 
| 
     The structure of the taxa recognition file gi_numbers.out is as follows:
 species1 | match_pattern1.1 match_pattern1.2 ... match_pattern1.m
 species2 | match_pattern2.1 match_pattern2.2 ... match_pattern2.m
 species3 | match_pattern3.1 match_pattern3.2 ... match_pattern3.m
 ...
 speciesN | match_patternN.1 match_patternN.2 ... match_patternN.m
 where species1, species2, species3 are different taxa and the match patterns are ordinary Perl pattern-matching expressions that can serve as unique identifiers of taxa. Species names are for user convenience only: everything preceding the | sign is ignored. Match patterns can be either names or gi numbers.  | 
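To make the file layout concrete, here is a minimal Python sketch that reads lines of this form and looks up the taxon for a sequence name. It is not BranchClust's own code: the function names are hypothetical, the label before | is kept only for display, and testing each pattern from the start of the name (`re.match`) is an assumption about how the patterns are applied.

```python
import re
from typing import List, Optional, Pattern, Tuple

Taxon = Tuple[str, List[Pattern[str]]]


def parse_taxa_file(text: str) -> List[Taxon]:
    """Parse 'label | pattern pattern ...' lines into (label, compiled patterns)."""
    taxa: List[Taxon] = []
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        label, _, rest = line.partition("|")
        patterns = [re.compile(p) for p in rest.split()]
        if patterns:
            taxa.append((label.strip(), patterns))
    return taxa


def find_taxon(name: str, taxa: List[Taxon]) -> Optional[str]:
    """Return the label of the first taxon with a pattern matching `name`.

    Assumption: patterns are tested from the start of the name (re.match).
    """
    for label, patterns in taxa:
        if any(p.match(name) for p in patterns):
            return label
    return None


if __name__ == "__main__":
    example = "species1 | bacteria1.*\nspecies2 | bacteria2.*\n"
    taxa = parse_taxa_file(example)
    print(find_taxon("bacteria2_02", taxa))
```

Because the patterns are regular expressions, a loose pattern can match names from more than one taxon, so it is worth keeping each pattern as specific as the data allow.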
      
  
| 
     1. Example of how to use names as taxa identifiers:  | 
  
| 
      
    
     
 Suppose we have 4 different taxa, and we need to analyze a tree reconstructed from a mixture of paralogs and orthologs.
 The taxa recognition file could be as follows:
 species1| bacteria1.*
 species2| bacteria2.*
 species3| bacteria3.*
 species4| bacteria4.*
 Paralogs from the same taxon can be designated as bacteria1.1, bacteria1.2, bacteria2.1, bacteria2.2, etc.
 Here is an example of a tree containing paralogs and orthologs of 4 different taxa:
 (((bacteria1_01,bacteria2_01),(bacteria3_01,bacteria4_01)),(bacteria3_03, (bacteria3_02,(bacteria2_02,(bacteria1_02,bacteria1_03)))));
 To see how BranchClust parses this tree, see Examples below.  | 
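The following Python sketch shows how the name patterns above sort the leaves of this tree by taxon, which makes the paralogs visible as counts greater than one. It is an illustration only: BranchClust itself reads trees with Bio::TreeIO, whereas this example pulls leaf names out with a simple regular expression that assumes unquoted labels, and it applies each pattern from the start of the name.

```python
import re
from collections import Counter

NEWICK = (
    "(((bacteria1_01,bacteria2_01),(bacteria3_01,bacteria4_01)),"
    "(bacteria3_03, (bacteria3_02,(bacteria2_02,(bacteria1_02,bacteria1_03)))));"
)

PATTERNS = {
    "species1": "bacteria1.*",
    "species2": "bacteria2.*",
    "species3": "bacteria3.*",
    "species4": "bacteria4.*",
}


def leaf_names(newick: str) -> list:
    """Return leaf labels: unquoted names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def count_by_taxon(leaves, patterns) -> Counter:
    """Count leaves per taxon, using the first pattern that matches each leaf."""
    counts = Counter()
    for leaf in leaves:
        for label, pattern in patterns.items():
            if re.match(pattern, leaf):
                counts[label] += 1
                break
    return counts


if __name__ == "__main__":
    counts = count_by_taxon(leaf_names(NEWICK), PATTERNS)
    for label in sorted(counts):
        print(label, counts[label])
```

For this tree the counts are 3, 2, 3, and 1 for species1 through species4, so all four taxa are present and three of them contribute paralogs.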
      
  
| 
     2. Example of how to use gi-numbers as taxa identifiers:  | 
  
| 
     
 Suppose we have 4 species of bacteria and archaea. The species could be distinguished by their gi numbers:
    Bacillus subtilis subsp. subtilis str. 168 | 1607.... 5081.... 1608.... 1867....
    Escherichia coli K12 | 1612.... 4917.... 1613.... 3334....
    Methanosarcina mazei Go1 | 2122....
    Sulfolobus solfataricus P2 | 1589....
 Here is the tree for the superfamily of ATPases (see Algorithm and Tutorial to learn how to assemble superfamilies), with gi numbers identifying the genes:
 ((((16078687:0.38341,16129888:0.51097):0.17454,(16080761:0.29985, 16131639:0.32318):1.63425):0.29299,((16080736:0.26753,16131602:0.39301):0.70160, (15897485:0.22830,21226881:0.36977):0.81741):0.26223):0.20986,(16131600:0.17847, 16080734:0.27503):0.66717,21226882:1.19107);
 To see how BranchClust parses this tree, see Examples below.  | 
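Before running BranchClust on a tree like this one, it can help to confirm that every leaf is recognized by exactly one taxon. The Python sketch below performs that check for the gi-number patterns and tree shown above. The helper names are hypothetical, the leaf extraction assumes unquoted labels, and applying patterns from the start of the identifier is an assumption rather than documented BranchClust behavior.

```python
import re

NEWICK = (
    "((((16078687:0.38341,16129888:0.51097):0.17454,(16080761:0.29985, "
    "16131639:0.32318):1.63425):0.29299,((16080736:0.26753,16131602:0.39301):0.70160, "
    "(15897485:0.22830,21226881:0.36977):0.81741):0.26223):0.20986,(16131600:0.17847, "
    "16080734:0.27503):0.66717,21226882:1.19107);"
)

TAXA = {
    "Bacillus subtilis subsp. subtilis str. 168": ["1607....", "5081....", "1608....", "1867...."],
    "Escherichia coli K12": ["1612....", "4917....", "1613....", "3334...."],
    "Methanosarcina mazei Go1": ["2122...."],
    "Sulfolobus solfataricus P2": ["1589...."],
}


def leaf_names(newick: str) -> list:
    """Return leaf labels; branch lengths follow ':' and are never captured."""
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def check_coverage(leaves, taxa):
    """Return (unmatched, ambiguous): leaves matched by no taxon or by several."""
    unmatched, ambiguous = [], []
    for leaf in leaves:
        hits = [label for label, patterns in taxa.items()
                if any(re.match(p, leaf) for p in patterns)]
        if not hits:
            unmatched.append(leaf)
        elif len(hits) > 1:
            ambiguous.append(leaf)
    return unmatched, ambiguous


if __name__ == "__main__":
    unmatched, ambiguous = check_coverage(leaf_names(NEWICK), TAXA)
    print("unmatched:", unmatched)
    print("ambiguous:", ambiguous)
```

Both lists come back empty for this example, meaning each of the 11 leaves maps to exactly one of the four species.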
      
  
| 
     3. File with gi-number identifiers for 319 bacterial and archaeal species:  | 
  
| 
 gi_numbers.out (created from genomes downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria on April 6, 2006)  | 
      
  
| Examples | 
| 
       Sample superfamily for 4 bacteria 
 # perl branch_clust.pl tree.tre 3
 results: clusters.out, families.list  | 
    
| 
       Superfamily of ATP synthases for 13 gamma proteobacteria 
 # perl branch_clust.pl tree.tre 8
 results: clusters.out, families.list  | 
    
| 
       Superfamily of ATP synthases for 30 taxa (16 bacteria and 14 archaea) 
 # perl branch_clust.pl tree.tre 7
 results: clusters.out, families.list  | 
    
| 
       Superfamily of ATP synthases for 317 taxa (bacteria and archaea) 
 # perl branch_clust.pl tree.tre 150
 results: clusters.out, families.list  | 
    
| 
       Superfamily of penicillin-binding proteins for 13 gamma proteobacteria 
 # perl branch_clust.pl tree.tre 5
 results: clusters.out, families.list  | 
    
| How to do batch processing | 
| An example wrapper is available here: branch_clust_wrapper.pl | 
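For readers who prefer to script the batch step themselves, here is a hedged Python sketch of one possible approach. It is not the branch_clust_wrapper.pl script: the function names and the tree file name are hypothetical, and the sketch assumes that each run writes clusters.out, families.list, and cluster.log to the current directory, so the files are renamed after every run to avoid being overwritten.

```python
import shutil
import subprocess
from pathlib import Path

# Output names as listed in the Output section.
OUTPUTS = ("clusters.out", "families.list", "cluster.log")


def plan_run(tree_file: str, many: int, script: str = "branch_clust.pl"):
    """Return the command for one tree and per-tree names for its outputs."""
    stem = Path(tree_file).stem
    command = ["perl", script, tree_file, str(many)]
    renames = {name: f"{stem}.{name}" for name in OUTPUTS}
    return command, renames


def run_batch(tree_files, many: int, script: str = "branch_clust.pl") -> None:
    """Run the script on each tree, then rename its outputs before the next run."""
    for tree_file in tree_files:
        command, renames = plan_run(tree_file, many, script)
        subprocess.run(command, check=True)  # stop at the first failing run
        for src, dst in renames.items():
            if Path(src).exists():
                shutil.move(src, dst)


if __name__ == "__main__":
    # Show the plan for one hypothetical tree file without running Perl.
    command, renames = plan_run("fam_001.tre", 8)
    print(" ".join(command))
    print(renames["clusters.out"])
```

Separating the planning step from execution makes it easy to inspect the commands before anything is run, for example by printing them for every tree in a directory.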
| Coloring with TreeDyn | 
Go to the TreeDyn website to learn how to create annotation and script files.
Example for the superfamily of ATP synthases for 30 taxa (16 bacteria and 14 archaea):
| 
 Annotation file: labelfile.tlf
 Script file: script.tds
 Resulting tree: tree.png  | 
    
| 
     Links  | 
    
  
Gogarten Lab Home Page: http://gogarten.uconn.edu/
Email to: Maria.Poptsova@uconn.edu
Page last updated: January 19, 2008