The BioPHP page. Or, one of them, anyway.

Current Design Plans and Philosophy

Note: plans subject to change, obviously. Watch this space.

BioPHP is not intended to be a "re-implementation" of BioPerl, BioPython, BioJava, etc., though there will likely be a fair amount of "overlap".

Instead, BioPHP will focus primarily on getting maximum usefulness for bioinformatics purposes out of PHP's primary strengths:

  • Extremely simple to incorporate into web-server backends/web browser frontends, yet still powerful on the command line as well.
  • Powerful network communications capabilities (HTTP/FTP/email/raw sockets/etc.)
  • Strong string-handling capabilities
  • Database interfaces
  • On-the-fly graphics generation via GD
  • Easy to learn and use

BioPHP will be made up of a collection of mostly-independent but interrelated classes. The actual names of the classes may very well change, but currently planned are:

  • A "nucleotide_sequence" class - containing a single sequence and having methods for e.g. generating a complement sequence, conversion to a protein sequence, G+C count, removal of "gaps", output in various formats (GenBank, ASN.1, FASTA, etc.) and so on.
  • A "protein_sequence" class - analogous to the nucleotide sequence class
  • "sequence_list" classes - containing an array of nucleotide or protein sequences and methods for output of the sequences in various file formats (Clustal, PHYLIP, etc.), comparison of pre-aligned sequence lists, shelling out to readily-available programs for computationally challenging manipulation (e.g. ClustalW for multiple sequence alignment) and importing the results, etc.
  • "(format)_parser" classes (e.g. Genbank_parser, ASN_1_parser, etc.) - capable of taking input in various formats from files, FTP, websites, etc., and generating either plain text or instances of nucleotide_ or protein_sequence objects with the appropriate information.
  • A "phylogenetic_tree" class - for parsing "Newick" tree files, rendering them in various forms (ASCII text, HTML tables, rooted and unrooted tree graphics via GD) and simple manipulations and analyses (e.g. adjusting the label names, selecting an outgroup, re-exporting in Newick format and possibly an XML-based format, etc.).
  • Query interface classes - for NCBI's online databases, local BLAST, etc.
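
To make the list above a little more concrete, here is a rough sketch of what the "nucleotide_sequence" class might look like. Nothing here is settled: the method names (without_gaps, complement, reverse_complement, gc_count, to_fasta) and the constructor arguments are assumptions made up for illustration, and the sketch only handles the plain DNA alphabet (A, C, G, T), leaving any other characters untouched.

```php
<?php
// Hypothetical sketch of the planned "nucleotide_sequence" class.
// All method names are illustrative assumptions, not a published BioPHP API.
class nucleotide_sequence
{
    private $id;
    private $residues;

    public function __construct($id, $residues)
    {
        $this->id = $id;
        // Strip whitespace and upper-case so later methods can assume clean input.
        $this->residues = strtoupper(preg_replace('/\s+/', '', $residues));
    }

    public function residues()
    {
        return $this->residues;
    }

    // Return a copy with alignment gap characters ("-" and ".") removed.
    public function without_gaps()
    {
        $clean = str_replace(array('-', '.'), '', $this->residues);
        return new nucleotide_sequence($this->id, $clean);
    }

    // Complement each base in place (A<->T, C<->G); other characters pass through.
    public function complement()
    {
        return new nucleotide_sequence($this->id, strtr($this->residues, 'ACGT', 'TGCA'));
    }

    // Complement, then reverse, giving the opposite strand read 5' to 3'.
    public function reverse_complement()
    {
        return new nucleotide_sequence($this->id, strrev($this->complement()->residues()));
    }

    // Number of G and C bases in the sequence.
    public function gc_count()
    {
        return substr_count($this->residues, 'G') + substr_count($this->residues, 'C');
    }

    // FASTA output: a ">" header line, then residues wrapped at $width per line.
    public function to_fasta($width = 60)
    {
        return '>' . $this->id . "\n" . chunk_split($this->residues, $width, "\n");
    }
}

// Example use: clean up a gapped sequence and print it as FASTA.
$seq = new nucleotide_sequence('example', 'ATG-CCGTA');
echo $seq->without_gaps()->to_fasta();
```

Each method returns a new object rather than modifying the original, so calls can be chained; that is just one possible design choice, not a decision.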
All classes should be designed to be incorporated into EITHER command-line scripts (suitable for running as, e.g., regular cron jobs) OR into web-based interfaces. BioPHP will almost certainly be distributed under the GPL.
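
As a small illustration of the "EITHER command-line OR web" goal, a class or helper could pick its output style from PHP's built-in PHP_SAPI constant, which is 'cli' when a script is run from the command line. The function name render_report and its arguments are hypothetical, invented for this sketch only.

```php
<?php
// Hypothetical helper (name and signature are assumptions for illustration).
// $sapi defaults to the built-in PHP_SAPI constant, but can be overridden.
function render_report($title, array $rows, $sapi = PHP_SAPI)
{
    if ($sapi === 'cli') {
        // Plain text, suitable for terminals and cron job mail.
        $out = $title . "\n";
        foreach ($rows as $label => $value) {
            $out .= sprintf("%-12s %s\n", $label, $value);
        }
        return $out;
    }

    // Anything else is treated as a web request: emit escaped HTML.
    $out = '<h2>' . htmlspecialchars($title) . "</h2>\n<table>\n";
    foreach ($rows as $label => $value) {
        $out .= '<tr><td>' . htmlspecialchars($label) . '</td><td>'
              . htmlspecialchars($value) . "</td></tr>\n";
    }
    return $out . "</table>\n";
}

// The same call works from a cron job or from a web page.
echo render_report('G+C summary', array('seq1' => '52%', 'seq2' => '47%'));
```

Keeping the formatting decision in one place like this would let the sequence and tree classes themselves stay free of any HTML.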

Note that this is an initial, cursory overview, and so far nobody but myself has critiqued it. It is quite possible, even likely, that at some future time important features I've forgotten to mention will be added to the list, the planned classes will be restructured somewhat, and so on.

As always - you are encouraged to send comments, questions, and suggestions via the email form found at the link at the bottom of each page.