gp_mkmtx

GP

2000

NAME

gp_mkmtx - calculate frequencies of nucleotides

SYNOPSIS

gp_mkmtx [-a] [-g value] [-l] [-q] [-v] [-d] [-h] [inputfile] [outputfile]

OPTIONS

-a
print only the absolute numbers of occurrences

-g value
divide each frequency by the expected frequency at a GC content of value %.

-l
do not apply logarithmic scaling (by default, gp_mkmtx calculates the logarithm of the frequencies).

-v
Prints the version information.

-d
Prints lots of debugging information.

-h
Shows usage information.

inputfile
file to process; if not given, will use standard input

outputfile
file to write the data to; if not given, will use standard output

DESCRIPTION

gp_mkmtx is a tool for easily creating matrices for the gp_matrix program. It takes a set of sequences and calculates the frequency of each nucleotide at each position, starting from the first nucleotide and ending at the last nucleotide of the shortest sequence. For each position, four values are printed in a row, for A, C, G and T/U respectively. Each value is the logarithm of the calculated frequency (the logarithm can be suppressed with the -l option). If the -g option is used, the values are divided by the expected frequency at the given GC content before the logarithmic scaling is applied (for example, at GC=50% the expected frequency is 0.25 for each nucleotide).
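
The calculation described above can be illustrated with a short Python sketch. This is not the gp_mkmtx source, only an approximation under stated assumptions: the function names are hypothetical, the natural logarithm is assumed (the base used by gp_mkmtx is not documented here), a frequency of zero is mapped to negative infinity, and for GC contents other than 50% the expected frequency is assumed to be GC/2 for each of G and C and (1-GC)/2 for each of A and T.

```python
import math

NUCLEOTIDES = "ACGT"  # column order of the matrix: A, C, G, T/U


def expected_frequencies(gc_percent):
    """Expected per-nucleotide frequencies at a given GC content (assumed model)."""
    if not 0 < gc_percent < 100:
        raise ValueError("GC content must be strictly between 0 and 100")
    gc = gc_percent / 100.0
    at = 1.0 - gc
    return {"A": at / 2, "C": gc / 2, "G": gc / 2, "T": at / 2}


def position_matrix(sequences, gc_percent=None, log_scale=True):
    """Return one row [A, C, G, T] per position of the shortest sequence."""
    # Treat U as T so that RNA and DNA input share one column.
    seqs = [s.upper().replace("U", "T") for s in sequences]
    length = min(len(s) for s in seqs)
    expected = None
    if gc_percent is not None:
        expected = expected_frequencies(gc_percent)

    matrix = []
    for pos in range(length):
        column = [s[pos] for s in seqs]
        row = []
        for nt in NUCLEOTIDES:
            value = column.count(nt) / len(seqs)
            if expected is not None:
                # Corresponds to -g: divide by the expected frequency first.
                value /= expected[nt]
            if log_scale:
                # Assumption: natural log; zero frequency becomes -inf.
                value = math.log(value) if value > 0 else float("-inf")
            row.append(value)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in position_matrix(["ACGT", "ACG", "AAGT"], gc_percent=50):
        print(" ".join(f"{v:8.3f}" for v in row))
```

Calling position_matrix with log_scale=False corresponds roughly to the -l option, and leaving gc_percent unset corresponds to running without -g.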

EXAMPLES

gp_mkmtx -g 50 somesequence.fasta somesequence.mtx

will produce a matrix file somesequence.mtx which, after some editing, will be directly suitable for the gp_matrix program.

SEE ALSO

Genpak(1) gp_acc(1) gp_cusage(1) gp_digest(1) gp_dimer(1) gp_findorf(1) gp_gc(1) gp_getseq(1) gp_map(1) gp_matrix(1) gp_pattern(1) gp_primer(1) gp_qs(1) gp_randseq(1) gp_seq2prot(1) gp_slen(1) gp_tm(1) gp_trimer(1)

DIAGNOSTICS

All Genpak programs complain in the situations where you would complain too, such as when they cannot find a sequence you gave them or when the sequence is not valid.

The Genpak programs do not write over existing files. I have found this feature very useful :-)

BUGS

I'm sure there are plenty left, so please mail me if you find them. I tried to clean up every bug I could find.

AUTHOR

January Weiner III <january@bioinformatics.org>