Using Disperse in pipeline mode

Required files

Apart from the data and configuration files that are part of the Disperse installation, you will need the following files to execute Disperse successfully:

  • Input file with gene names

    Text file with one gene name on each line.

  • Settings file

    A settings file adapted for your particular application. For details, see the settings file specification.

  • Reaction file

    A text or XML file specifying the restriction reactions that Disperse may choose from when finding the best reaction combination. See the reaction file documentation for details.

  • Vector sequence file

    A FASTA-formatted text file containing a sequence to include in all selector probes.

The locations of the input and settings files are specified on the command line, while the reaction and vector sequence files are specified within the settings file.
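The gene name input file described above is simple enough to generate from a script. The following is a minimal sketch, not part of Disperse: the function name write_gene_list is hypothetical, and dropping blank and duplicate names is an assumption about what makes a clean input file, not a documented requirement.

```python
from pathlib import Path


def write_gene_list(path, gene_names):
    """Write gene names to `path`, one name per line.

    Blank entries and repeated names are dropped (an assumption made
    for this sketch; Disperse itself may tolerate them).
    Returns the list of names actually written.
    """
    seen = set()
    cleaned = []
    for name in gene_names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    # One gene name on each line, with a trailing newline.
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned
```

The resulting file can then be passed to Disperse with the -i argument.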

Usage

Running Disperse in pipeline mode involves executing a single Perl script:

perl disperse.pl

with the following arguments:

-i <input> (Required)

where <input> is the path to an input file containing gene names, one name per line.

-s <settings> (Required)

where <settings> is the path to a settings file in which the details of the design are specified. See the settings file specification for details.

-c <config> (Optional)

where <config> is the path to a configuration file specifying the locations of data and services. See the config file specification for details. If not specified, the default config file (config/default-config.properties) is used.

-nosnp (Optional)

Turns off the processing stages that use the optional SNP data. This option must be used if the SNP data file is not installed.

-start # (Optional)

-stop # (Optional)

Start and/or stop the pipeline at the specified stages, where # is a stage number. If neither option is used, the full pipeline is executed.
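When Disperse is launched from another program, the arguments above can be assembled programmatically. The sketch below is an illustration only: the function names build_disperse_command and run_disperse are hypothetical, the script is assumed to be reachable as disperse.pl from the working directory, and the check that the start stage does not exceed the stop stage is an assumption rather than documented Disperse behaviour.

```python
import subprocess


def build_disperse_command(input_path, settings_path, config_path=None,
                           start=None, stop=None, use_snp=True,
                           script="disperse.pl"):
    """Return the argument list for a Disperse pipeline run."""
    if start is not None and stop is not None and start > stop:
        # Assumed sanity check; not taken from the Disperse documentation.
        raise ValueError("start stage must not be greater than stop stage")
    # -i and -s are required.
    cmd = ["perl", script, "-i", str(input_path), "-s", str(settings_path)]
    if config_path is not None:
        cmd += ["-c", str(config_path)]   # otherwise the default config is used
    if start is not None:
        cmd += ["-start", str(start)]
    if stop is not None:
        cmd += ["-stop", str(stop)]
    if not use_snp:
        cmd.append("-nosnp")              # needed if the SNP data file is absent
    return cmd


def run_disperse(*args, **kwargs):
    """Run Disperse and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_disperse_command(*args, **kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.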

Examples

perl disperse.pl -i myinput -s mysettings

executes the full pipeline using the default config file, with SNP data processing enabled.

perl disperse.pl -i myinput -s mysettings -c myconfig -start 2 -stop 6 -nosnp

executes the pipeline from stage 2 (ROI definition) to stage 6 (PieceMaker), using a custom config file and omitting the SNP data processing steps. This requires that the relevant files created in stage 1 are present, either from a previous execution of the program or created manually.

Results

A full execution of Disperse will generate a number of files in the same directory as the input file. These files are listed below. With the exception of the PiM_report file, all files are tab-delimited text files. Lines starting with # are comment lines. All sequence coordinates are 1-based.
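The shared conventions described above (tab-delimited columns, comment lines starting with #, 1-based coordinates) can be handled by one small reader. This is a minimal sketch with hypothetical function names; it assumes that end coordinates are inclusive, which the file descriptions do not state explicitly.

```python
def read_disperse_table(path):
    """Yield each data row of a Disperse result file as a list of strings.

    Comment lines (starting with '#') and empty lines are skipped.
    """
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield line.rstrip("\r\n").split("\t")


def to_slice(start, end):
    """Convert 1-based coordinates to a Python slice.

    The end coordinate is assumed to be inclusive.
    """
    return slice(start - 1, end)
```

For example, sequence[to_slice(2, 4)] returns the second through fourth characters of sequence.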

  • myinput.cds

    Contains coordinates for all coding sequence regions found for the specified genes. Coding regions that overlap have been merged. Columns are ID, IDs of source entries (comma-delimited), chromosome accession, start coordinate, end coordinate, and strand.

  • myinput.roi

    Contains all regions of interest defined for the design. Columns are ROI ID, IDs of CDSs within this ROI (comma-delimited), IDs of source entries (comma-delimited), chromosome accession, start coordinate, end coordinate, and strand.

  • myinput.refseq

    Contains reference sequence extracted for each ROI plus the specified number of flanking positions. Sequences are always on the plus strand. Columns are ID, chromosome accession, start coordinate, end coordinate, and sequence.

  • myinput.target

    This is a target file for PieceMaker. It contains one line per target sequence, where each target is one ROI plus its flanking sequence. Columns specify ROI ID, chromosome accession, start of target sequence on chromosome, ROI start on target, ROI end on target, and the target sequence.

  • myinput.snp

    This file contains all known variations found (from the master variation file) within the set of target sequences. Each line specifies one variation with ID, chromosome accession, last upstream unaffected position, first downstream unaffected position, and variants known at the position. Although only single nucleotide variants are handled by PieceMaker, this file contains other types of variation as well.

  • myinput.snp_target

    This is a target file for PieceMaker. It is identical to the .target file, except that nucleotide degeneracy codes (S, W, R, Y, K, M, etc.) have been inserted at positions affected by single nucleotide variants from the .snp file.

  • myinput.PiM_report

    This file contains a report of the results of the PieceMaker run.

  • myinput.fragments

    This file is the output of PieceMaker, containing all accepted restriction fragments generated by the selected combination of restriction reactions. Each line specifies a fragment by ID, chromosome accession, start position, end position, position for flap cleavage, and sequence. The fragment ID indicates ROI, restriction reaction, polarity (strand), and a number that distinguishes fragments from the same target. Note that long fragments may contain sequence from more than one ROI, but only one ROI ID is given.

  • myinput.selection

    This file has the same format as the .fragments file, but includes only the subset of fragments chosen during the fragment subset selection stage.

  • myinput.probe

    This file is the output of ProbeMaker. It contains the selector probe sequences designed for the selected fragments. Columns specify probe ID and sequence.

  • myinput.amplicon

    This file specifies the coordinates of each designed amplicon. The amplicon corresponds to the part of a fragment remaining after flap cleavage. Columns specify fragment ID, chromosome accession, start position, end position, and strand (+/-).

  • OUT.myinput.cds.txt

    This is similar to the .cds file but with chromosome name inserted as the fourth column.

  • OUT.myinput.snp.txt

    This is similar to the .snp file but with chromosome name inserted as the third column.

  • OUT.myinput.snp_target.txt

    This file specifies more information for each target sequence than the PieceMaker target file. Columns are ROI ID, gene name, ROI number, list of CDS IDs, list of CCDS source entries, chromosome accession, chromosome name, start of ROI on chromosome, end of ROI on chromosome, strand, start of target on chromosome, start of ROI on target, end of ROI on target, and sequence.

  • OUT.myinput.probe.txt

    This file specifies more complete information for each selected fragment and probe than the ProbeMaker target file. Columns are fragment ID, gene name, ROI number, list of CDS IDs, list of CCDS source entries, chromosome accession, chromosome name, strand of fragment, start of fragment on chromosome, end of fragment on chromosome, flap position, start of amplicon on fragment, end of amplicon on fragment, and probe sequence.
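The relationship between the .snp, .target, and .snp_target files described above can be illustrated with a short sketch. It is not the Disperse implementation. It assumes that the target start column gives the 1-based chromosome coordinate of the first base of the target sequence, that the target is on the plus strand, and that the known variants have already been parsed into a set of bases that includes the reference base (the exact format of the variants column is not specified here). All function names are hypothetical; the code table is the standard IUPAC nucleotide ambiguity code.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def affected_positions(last_upstream, first_downstream):
    """Return the 1-based positions strictly between the unaffected flanks."""
    return list(range(last_upstream + 1, first_downstream))


def apply_snv(target_seq, target_start, last_upstream, first_downstream, alleles):
    """Return target_seq with a degeneracy code at the position hit by one SNV.

    The sequence is returned unchanged if the variation does not affect
    exactly one position, falls outside the target, or has alleles that
    are not plain bases (for example a deletion).
    """
    positions = affected_positions(last_upstream, first_downstream)
    if len(positions) != 1:
        return target_seq
    index = positions[0] - target_start      # 0-based offset into target_seq
    if not 0 <= index < len(target_seq):
        return target_seq
    code = IUPAC_CODES.get(frozenset(a.upper() for a in alleles))
    if code is None:
        return target_seq
    return target_seq[:index] + code + target_seq[index + 1:]
```

Under these assumptions, a variation with flanking unaffected positions 102 and 104 affects only chromosome position 103, which is the third base of a target starting at position 101.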