===== Logit Model Binary Outcome =====
Under this model, case control status are simulated based on logit model of odds ratio and disease prevalence. Power and sample size calculations can be performed using a number of theoretical and empirical methods.
==== Command Interface ====
spower LOGIT -h
\\
usage: spower LOGIT [-h] [-a GAMMA] [-b GAMMA] [-A GAMMA] [-B GAMMA]
[-c GAMMA] [-d GAMMA] [-f P] [--def_rare P]
[--def_neutral VALUE VALUE] [--def_protective VALUE VALUE]
[-P P] [-Q P] [--sample_size N] [--p1 P]
[--def_valid_locus VALUE VALUE] [--rare_only]
[--missing_as_wt] [--missing_low_maf P]
[--missing_sites P] [--missing_sites_deleterious P]
[--missing_sites_protective P] [--missing_sites_neutral P]
[--missing_sites_synonymous P] [--missing_calls P]
[--missing_calls_deleterious P]
[--missing_calls_protective P] [--missing_calls_neutral P]
[--missing_calls_synonymous P] [--error_calls P]
[--error_calls_deleterious P] [--error_calls_protective P]
[--error_calls_neutral P] [--error_calls_synonymous P]
[--power P] [-r N] [--alpha ALPHA] [--moi {A,D,R,M}]
[--resampling] [-l N] [-o file] [-t NAME] [-v {0,1,2,3}]
[-s N] [-j N] [-m METHODS [METHODS ...]]
[--discard_samples [EXPR [EXPR ...]]]
[--discard_variants [EXPR [EXPR ...]]]
DATA
positional arguments:
DATA name of input data or prefix of input data bundle (see
the documentation for details)
optional arguments:
-h, --help show this help message and exit
model parameters:
-a GAMMA, --OR_rare_detrimental GAMMA
odds ratio for detrimental rare variants (default set
to 1.0)
-b GAMMA, --OR_rare_protective GAMMA
odds ratio for protective rare variants (default set
to 1.0)
-A GAMMA, --ORmax_rare_detrimental GAMMA
maximum odds ratio for detrimental rare variants,
applicable to variable effects model (default set to
None)
-B GAMMA, --ORmin_rare_protective GAMMA
minimum odds ratio for protective rare variants,
applicable to variable effects model (default set to
None)
-c GAMMA, --OR_common_detrimental GAMMA
odds ratio for detrimental common variants (default
set to 1.0)
-d GAMMA, --OR_common_protective GAMMA
odds ratio for protective common variants (default set
to 1.0)
-f P, --baseline_effect P
penetrance of wildtype genotypes (default set to 0.01)
--moi {A,D,R,M} mode of inheritance: "A", additive (default); "D",
dominant; "R", recessive; "M", multiplicative (does
not apply to quantitative traits model)
--resampling directly draw sample genotypes from given haplotype
pools (sample genotypes will be simulated on the fly
if haplotype pools are not avaliable)
variants functionality:
--def_rare P definition of rare variants: variant having "MAF <=
frequency" will be considered a "rare" variant; the
opposite set is considered "common" (default set to
0.01)
--def_neutral VALUE VALUE
annotation value cut-offs that defines a variant to be
"neutral" (e.g. synonymous, non-coding etc. that will
not contribute to any phenotype); any variant with
"function_score" X falling in this range will be
considered neutral (default set to None)
--def_protective VALUE VALUE
annotation value cut-offs that defines a variant to be
"protective" (i.e., decrease disease risk or decrease
quantitative traits value); any variant with
"function_score" X falling in this range will be
considered protective (default set to None)
-P P, --proportion_detrimental P
proportion of deleterious variants associated with the
trait of interest, i.e., the random set of the rest (1
- p) x 100% deleterious variants are non-causal: they
do not contribute to the phenotype in simulations yet
will present as noise in analysis (default set to
None)
-Q P, --proportion_protective P
proportion of protective variants associated with the
trait of interest, i.e., the random set of the rest (1
- p) x 100% protective variants are non-causal: they
do not contribute to the phenotype in simulations yet
will present as noise in analysis (default set to
None)
sample population:
--sample_size N total sample size
--p1 P proportion of affected individuals (default set to
0.5), or individuals with high extreme QT values
sampled from infinite population (default set to None,
meaning to sample from finite population speficied by
--sample_size option).
quality control:
--def_valid_locus VALUE VALUE
upper and lower bounds of variant counts that defines
if a locus is "valid", i.e., locus having number of
variants falling out of this range will be ignored
from power calculation (default set to None)
--rare_only remove from analysis common variant sites in the
population, i.e., those in the haplotype pool having
MAF > $def_rare
--missing_as_wt label missing genotype calls as wildtype genotypes
sequencing / genotyping artifact:
--missing_low_maf P variant sites having population MAF < P are set to
missing
--missing_sites P proportion of missing variant sites
--missing_sites_deleterious P
proportion of missing deleterious sites
--missing_sites_protective P
proportion of missing protective sites
--missing_sites_neutral P
proportion of missing neutral sites
--missing_sites_synonymous P
proportion of missing synonymous sites
--missing_calls P proportion of missing genotype calls
--missing_calls_deleterious P
proportion of missing genotype calls at deleterious
sites
--missing_calls_protective P
proportion of missing genotype calls at protective
sites
--missing_calls_neutral P
proportion of missing genotype calls at neutral sites
--missing_calls_synonymous P
proportion of missing genotype calls at synonymous
sites
--error_calls P proportion of error genotype calls
--error_calls_deleterious P
proportion of error genotype calls at deleterious
sites
--error_calls_protective P
proportion of error genotype calls at protective sites
--error_calls_neutral P
proportion of error genotype calls at neutral sites
--error_calls_synonymous P
proportion of error genotype calls at synonymous sites
power calculation:
--power P power for which total sample size is calculated (this
option is mutually exclusive with option '--
sample_size')
-r N, --replicates N number of replicates for power evaluation (default set
to 1)
--alpha ALPHA significance level at which power will be evaluated
(default set to 0.05)
input/output specifications:
-l N, --limit N if specified, will limit calculations to the first N
groups in data (default set to None)
-o file, --output file
output filename
runtime options:
-t NAME, --title NAME
unique identifier of a single command run (default to
output filename prefix)
-v {0,1,2,3}, --verbosity {0,1,2,3}
verbosity level: 0 for absolutely quiet, 1 for less
verbose, 2 for verbose, 3 for more debug information
(default set to 2)
-s N, --seed N seed for random number generator, 0 for random seed
(default set to 0)
-j N, --jobs N number of CPUs to use when multiple replicates are
required via "-r" option (default set to 2)
association tests:
-m METHODS [METHODS ...], --methods METHODS [METHODS ...]
Method of one or more association tests. Parameters
for each method should be specified together as a
quoted long argument (e.g. --methods "m --alternative
2" "m1 --permute 1000"), although the common method
parameters can be specified separately, as long as
they do not conflict with command arguments. (e.g.
--methods m1 m2 -p 1000 is equivalent to --methods "m1
-p 1000" "m2 -p 1000".). You can use command 'spower
show tests' for a list of association tests, and
'spower show test TST' for details about a test.
samples and genotypes filtering:
--discard_samples [EXPR [EXPR ...]]
Discard samples that match specified conditions within
each test group. Currently only expressions in the
form of "%(NA)>p" is provided to remove samples that
have more 100*p percent of missing values.
--discard_variants [EXPR [EXPR ...]]
Discard variant sites based on specified conditions
within each test group. Currently only expressions in
the form of '%(NA)>p' is provided to remove variant
sites that have more than 100*p percent of missing
genotypes. Note that this filter will be applied after
"--discard_samples" is applied, if the latter also is
specified.
\\
==== Command Options ====
Model specific options are documented in details below. You should find the rest of the options [[http://bioinformatics.org/spower/options|otherwise documented]].
=== -a/b/c/d/A/B ===
These options specify the odds ratios of variants in the region of interest. ''-a'' is odds ratio for detrimental rare variants. When used by itself, all detrimental rare variants will be assigned a fixed odds ratio as specified. With the ''-A'' option, they together model "variable effects" with ''-a'' being the minimum odds ratio and ''-A'' being the maximum odds ratio for detrimental rare variants. In variable effects model the maximum odds ratio will be assigned to the variant having smallest MAF, and the minimum odds ratio to the one having largest MAF. Other odds ratios in between are interpolated based on the max. and min. values. Similarly ''-b'' and ''-B'' are for fixed and variable effects of protective variants. For a protective variant the smaller the OR the larger the protective effect. As a result ''-B'' is the minimum OR for protective variants instead. ''-c'' and ''-d'' are odds ratio for common detrimental and protective variants respectively. No variable effects model for common variants is available for this model.
Odds ratios of 1.2 to 3.0 are reasonable choices of detrimental rare variants for complex traits. Protective variants may take an odds ratio if \(1/\gamma\) where \(\gamma\) is effect of detrimental variants. However due to the choice of baseline penetrance the protective effect thus generated is not symmetric to detrimental effect.
=== -f ===
Baseline penetrance. For rare variant analysis this can be approximated by disease prevalence.
Users should determine this parameter to reflect the particular phenotype of interest for which the power analysis is performed. The default value is 1% which is a valid assumption for common disease in general, but for certain disease e.g. type II diabetes the prevalence can be much higher.
=== --moi ===
Mode of inheritance. Most aggregated analysis method implicitly assumes an "additive" effect of rare variants, thus simulating and analyzing data under additive model represents a best case scenario for many methods, which is also the default value for both the simulation and the analysis. This option only controls on how the data is simulated. In some ''%%--%%method'' option there are sub-options to specify the mode of inheritance under which the data will be analyzed.
==== Simulation via Repeated Sampling ====
With the ''%%--%%resampling'' option, a different implementation of phenotypic simulation is activated. Without this option, input parameters such as odds ratio, prevalence, are used to calculate and update group specific MAF for cases and controls. Analytic power calculation can be performed using the MAFs thus obtained, or for empirical power analysis case control genotype data are simulated from the group specific MAF. With the ''%%--%%resampling'' option, a sample genotype will be generated first, either from the population MAF or drawn directly from given haplotype pools, depending on the input data being in SEQPower SFS format or SEQPower GDAT format. Then a penetrance value \(f\) for the genotype will be calculated from input parameters, and the sample will be labeled as a //case// with probability \(f\). This is a more computationally cumbersome implementation, since for example with \(f=1\%\) it will require generation of about 100 samples before a //case// can be collected. Also by implementation analytic power analysis cannot be performed with this switch on. The advantage of this model is the potential to sample directly from external haplotype pools given in GDAT file rather than to recover such pools from summary statistics under HWE and no LD assumption, which might be biased.
==== Examples ====
* [[http://bioinformatics.org/spower/analytic-tutorial|Analytic power analysis]] for case control studies
* [[http://bioinformatics.org/spower/empirical-tutorial|Empirical power analysis]] for case control studies