JaMBW Chapter 3.1.1

Window Composition


Aim

Given a sequence of nucleic acids or amino acids and a compositional pattern, this program computes the running average percentage composition of that pattern over a window of a chosen size. This analysis makes composition-specific patterns (e.g. A/T, C/G, etc.) visible, and can thus help in formulating hypotheses about the function of the macromolecule under study. For example, one can look for potentially active regions of chromatin by observing A/T richness, or for potentially hydrophobic regions by visualizing the compositional richness in the amino acids Leu, Ile, Val, Met, Phe, Tyr and Trp.

Mode of operation

The program accepts the following parameters:

  1. Sequence
    The nucleic acid or amino acid sequence to be analyzed.
  2. Pattern
    Either paste or type the symbol(s) whose windowed composition is requested. Note that the program combines the individual symbols with a logical "or": every position holding any one of the symbols counts as a match. Example:
    
    ATGCCCTTCGGAAGGTTCGCTAGCGA  input sequence
    AT                          pattern
    **    **   **  **   **   *  matches
    12345678901234567890123456  base position
             1         2
    
    
    The above example shows that the compositional pattern AT matches positions 1, 2, 7, 8, 12, 13, 16, 17, 21, 22 and 26, even though the dinucleotide AT occurs only once (at positions 1 and 2).

  3. Window
    The "window" parameter sets the size of the averaging window over which the percentage composition of the specified pattern is computed. Continuing the above example with a window of size 5, the values used for the visualization are the following, each assigned to the central position of its window:


    base position   3   4   5   6   7   8   9  10
    %              40  20  20  40  40  40  40  40

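    Each value is the percentage of matching positions within the window; for instance, the window covering positions 1-5 (ATGCC) contains 2 matches out of 5, giving 40% at position 3. Larger windows average over more positions and therefore smooth out local differences.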

  4. Step
    The step sets how many positions the window advances between successive computations. A step of 1 (used in the above examples) computes a value for every position along the sequence, while a larger value skips positions, introducing "jumps" along the sequence.

  5. Locking parameters
    The program also offers fine-grained control over which parameters are shared with other applets on the same page, by means of six lock buttons. Clicking one of these buttons allows modifications to be propagated to the other applets on the same page. This mode of operation is extremely useful, since it allows one to see how a certain pattern is distributed, for instance, in different sequences, and then to move along one sequence while observing how the other sequences compare. Another useful application of these "locks" is to load the same sequence into several copies of the program on the same page and compare how the graphs differ for different parameters, for example to assess how acidic, hydrophilic and hydrophobic regions correlate along the same sequence. A typical use of this program is as a "viewer" spawned from network-based applications (as done by SRS5 or by BIOCCELERATOR Services).
    The following table lists some commonly used parameter combinations for visualizing biologically relevant features of sequences on the basis of compositional richness.

    Aim                                             Pattern      Window size
    DNA, identify A/T-rich regions                  at           5
    DNA, identify C/G-rich regions                  cg           5
    PROTEINS, identify basic regions                krh          5
    PROTEINS, identify acidic regions               andcqegpsy   5
    PROTEINS, identify hydrophilic regions          qnedbzhkr    5
    PROTEINS, identify hydrophobic regions          livmfyw      5
    PROTEINS, identify aromatic regions             fyw          5
    PROTEINS, identify neutral regions              pagst        5
    PROTEINS, identify crosslink-forming regions    c            5

  6. Compute
    Once the parameters have been chosen, pressing the "COMPUTE" button searches the sequence for the pattern and displays the running average in a scrollable window.
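
The matching, windowing and stepping rules described above can be sketched in Python. This is a minimal sketch, not the applet's own Java code: the function names `match_mask` and `window_composition` are hypothetical, matching is assumed to be case-insensitive, only windows lying entirely inside the sequence are reported, and each value is assigned to the central position of its window (for even window sizes, the position just left of centre). Under these assumptions it reproduces the worked example given for the "Window" parameter.

```python
from typing import List, Tuple


def match_mask(sequence: str, pattern: str) -> List[bool]:
    """Mark each position whose symbol is any one of the pattern's symbols.

    The pattern is treated as a set of single symbols joined by a logical
    "or", so "AT" matches every A and every T, not the dinucleotide AT.
    Case-insensitive comparison is an assumption of this sketch.
    """
    symbols = set(pattern.upper())
    return [residue in symbols for residue in sequence.upper()]


def window_composition(
    sequence: str, pattern: str, window: int, step: int = 1
) -> List[Tuple[int, float]]:
    """Return (1-based centre position, percentage) pairs.

    Only windows lying entirely inside the sequence are reported, and the
    first window starts at the first residue (assumed edge handling).
    A step greater than 1 keeps only every step-th window.
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive integers")
    mask = match_mask(sequence, pattern)
    results: List[Tuple[int, float]] = []
    # Number of matches in the first window (booleans sum as 0/1).
    count = sum(mask[:window])
    for start in range(0, len(mask) - window + 1):
        if start > 0:
            # Slide the window by one: add the entering position,
            # drop the leaving one.
            count += mask[start + window - 1] - mask[start - 1]
        if start % step == 0:
            centre = start + (window + 1) // 2
            results.append((centre, 100.0 * count / window))
    return results
```

For example, `window_composition("ATGCCCTTCGGAAGGTTCGCTAGCGA", "AT", 5)` begins with (3, 40.0), (4, 20.0), (5, 20.0), (6, 40.0), the same values as in the table above. Keeping a running count means each step costs constant time, instead of recounting all positions of every window.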

[Figure: the two applet windows displayed at this point by a Java-enabled browser]



How to understand its output

The output is the running average percentage composition of the chosen pattern along the given sequence, presented as a line chart that can be scrolled horizontally and vertically with the scrollbars provided. Peaks along the sequence indicate regions where the pattern is abundant, while a line at the baseline (0%) indicates its absence.
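
Peaks in the chart can also be turned into a list of regions. The sketch below is a hypothetical helper, not part of the program: `rich_regions` takes (position, percentage) pairs, such as the points plotted by the chart, and groups consecutive points whose value reaches a threshold. The threshold is an assumption chosen by the user, since the program itself only displays the curve.

```python
from typing import List, Optional, Tuple


def rich_regions(
    profile: List[Tuple[int, float]], threshold: float
) -> List[Tuple[int, int]]:
    """Group consecutive profile points at or above `threshold`.

    `profile` holds (position, percentage) pairs in increasing position
    order. Returns (first, last) position pairs, one per run of points
    whose percentage reaches the threshold.
    """
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    last_pos: Optional[int] = None
    for pos, pct in profile:
        if pct >= threshold:
            if run_start is None:
                run_start = pos  # a new peak region begins here
            last_pos = pos
        elif run_start is not None:
            regions.append((run_start, last_pos))  # the region ended
            run_start = None
    if run_start is not None:
        regions.append((run_start, last_pos))  # region reaches the end
    return regions
```

With the worked-example values from the "Window" parameter (positions 3 to 10, percentages 40, 20, 20, 40, 40, 40, 40, 40) and a threshold of 30%, `rich_regions` returns [(3, 3), (6, 10)].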



Author: Luca I.G. TOLDO, Edition date: 28 February 1997