gp_mkmtx

GP

2000

NAME

gp_mkmtx - calculate frequencies of nucleotides

SYNOPSIS

gp_mkmtx [-a] [-g value] [-l] [-q] [-v] [-d] [-h] [inputfile] [outputfile]

OPTIONS

-a

print only the absolute numbers of occurrences

-g value

divide each frequency by the expected frequency at a GC content of value %.

-l

do not apply logarithmic scaling (by default, gp_mkmtx calculates the logarithm of the frequencies).

-v

Prints the version information.

-d

Prints lots of debugging information.

-h

Shows usage information.

inputfile

file to process; if not given, will use standard input

outputfile

file to write the data to; if not given, will use standard output

DESCRIPTION

gp_mkmtx is a tool for easily creating matrices for the gp_matrix program. It takes a set of sequences and calculates the frequency of each nucleotide at each position, starting from the first nucleotide and ending with the last nucleotide of the shortest sequence. For each position, four values are printed in a row, for A, C, G and T/U respectively. Each value is the logarithm of the calculated frequency (the logarithmic scaling can be suppressed with the -l option). If the -g option is used, the values are divided by the expected frequency at the given GC content before the logarithmic scaling (for example, at GC=50%, the expected frequency is 0.25 for each nucleotide).
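
The calculation described above can be sketched in Python. This is only an illustration, not the actual gp_mkmtx source: the function names are hypothetical, and several details are assumptions that this page does not state, namely the natural logarithm as the log base, mapping a zero frequency to negative infinity, and a background model in which G and C share the GC fraction equally while A and T share the remainder (which is consistent with the 0.25 figure given for GC=50%).

```python
import math

NUCLEOTIDES = "ACGT"  # column order from the description: A, C, G, T/U


def expected_frequencies(gc_percent):
    """Assumed background model: G and C split the GC fraction, A and T split the rest."""
    if not 0 < gc_percent < 100:
        raise ValueError("gc_percent must be strictly between 0 and 100")
    gc = gc_percent / 100.0
    return {"A": (1 - gc) / 2, "C": gc / 2, "G": gc / 2, "T": (1 - gc) / 2}


def make_matrix(sequences, gc_percent=None, log_scale=True):
    """Return one row of four values (A, C, G, T/U) per position."""
    # Only positions up to the end of the shortest sequence are used.
    length = min(len(s) for s in sequences)
    background = None
    if gc_percent is not None:
        background = expected_frequencies(gc_percent)

    matrix = []
    for pos in range(length):
        # Treat U as T so RNA and DNA sequences share one column.
        column = [s[pos].upper().replace("U", "T") for s in sequences]
        row = []
        for nt in NUCLEOTIDES:
            value = column.count(nt) / len(sequences)
            if background is not None:
                # Division happens before the logarithmic scaling.
                value /= background[nt]
            if log_scale:
                # Assumptions: natural log; a zero frequency becomes -inf.
                value = math.log(value) if value > 0 else float("-inf")
            row.append(value)
        matrix.append(row)
    return matrix
```

For example, make_matrix(["ACGT", "ACG", "AAGT"], gc_percent=50, log_scale=False) covers three positions, because the shortest sequence has three nucleotides, and its first row is [4.0, 0.0, 0.0, 0.0]: A occurs in every sequence at position 1, and 1.0 divided by 0.25 is 4.0.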

EXAMPLES

gp_mkmtx -g 50 somesequence.fasta somesequence.mtx

will produce a matrix file somesequence.mtx which, after some editing, will be directly suitable for the gp_matrix program.

DIAGNOSTICS

All Genpak programs complain in situations where you would complain too, such as when they cannot find a sequence you gave them or when the sequence is not valid.

The Genpak programs do not write over existing files. I have found this feature very useful :-)

BUGS

I'm sure there are plenty left, so please mail me if you find them. I tried to clean up every bug I could find.

AUTHOR

January Weiner III <january@bioinformatics.org>