This page documents, with explanations, the command options shared by all simulation settings and association methods. For a complete table of program options please check here. For options specific to a model or an association test, please refer to their respective documentation pages.
Please refer to the documentation page of spower simulate.
Definition of rare variants. The input is a MAF cutoff. Variants having MAF smaller than this cutoff are defined as “rare” variants. All other variants are defined as “common”. In SEQPower, common variants and rare variants are modeled differently for effect size and direction of effect.
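The cutoff rule can be sketched in a few lines of Python. This is only an illustration of the definition above, not SEQPower's own code; the function name `classify_by_maf` and the example MAF values are hypothetical, and the cutoff of 0.01 is chosen purely for the example.

```python
def classify_by_maf(mafs, rare_cutoff):
    """Label a variant 'rare' if its MAF is strictly below the cutoff, else 'common'."""
    return ["rare" if maf < rare_cutoff else "common" for maf in mafs]


# Hypothetical MAF values for four variant sites.
mafs = [0.0005, 0.004, 0.01, 0.12]
labels = classify_by_maf(mafs, rare_cutoff=0.01)
print(labels)
```

Because the comparison is strict, a variant whose MAF equals the cutoff is labeled common in this sketch.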
Note
Definition of neutral variants. This option takes two values: the lower and upper bounds for some “function score”. Variants whose annotation score is greater than the lower bound and smaller than the upper bound are defined as “neutral”. Neutral variants are not taken into consideration when phenotypes are simulated. Neutral variants can also be removed from the actual power analysis, as is commonly done in real-world applications (annotate rare variants and focus only on the functional ones).
No default value is set for this option, meaning that by default all variants are considered functional.
Note
For simulated data from spower simulate we can define neutral variants as those having purifying selection coefficient $-10^{-4} < S < 10^{-4}$.
Definition of protective variants. As with --def_neutral, this option takes two values that define a “protective” variant. Such variants are assigned a “protective effect” (odds ratio less than 1, negative PAR or QT mean shift) in phenotype simulations. For simulated data from spower simulate we define protective variants as those having purifying selection coefficient $S < -10^{-4}$.
No default value is set for this option.
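The two range definitions can be illustrated together with a small Python sketch. It is not SEQPower's implementation: the function `classify_by_score`, the use of `None` to stand for "no default value", and the encoding of the protective range as (negative infinity, -1e-4) are assumptions made for illustration, reading the protective threshold as S < -10^-4. Variants falling in neither range are labeled deleterious here, which is also an assumption.

```python
def classify_by_score(score, neutral=None, protective=None):
    """Return 'neutral', 'protective' or 'deleterious' for one function score.

    neutral and protective are (lower, upper) tuples with exclusive bounds.
    None means the category is undefined, mirroring the absence of defaults.
    """
    if neutral is not None and neutral[0] < score < neutral[1]:
        return "neutral"
    if protective is not None and protective[0] < score < protective[1]:
        return "protective"
    return "deleterious"


# Hypothetical selection coefficients for four variants.
scores = [-5e-4, 0.0, 5e-5, 3e-4]
neutral_bounds = (-1e-4, 1e-4)
protective_bounds = (float("-inf"), -1e-4)  # assumed encoding of S < -1e-4

labels = [classify_by_score(s, neutral_bounds, protective_bounds) for s in scores]
print(labels)
```

With neither range supplied, every variant falls through to the last branch, which matches the statement that by default all variants are considered functional.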
This option defines the “non-causal” variants among all deleterious variants. Deleterious variants are functional, but we assume that only some of them contribute to the disease phenotype (detrimental). With this option on, we can define p×100% of all deleterious variants as directly affecting the phenotype. The remaining variants are not taken into consideration when the phenotype is simulated, but they are included in association analysis, since in practice there is no knowledge about these variants that could justify removing them. Such variants are noise in the data and result in decreased power.
No default value is set for this option.
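A minimal Python sketch of splitting deleterious variants into causal and non-causal sets is shown here. It assumes a fixed number of causal variants, round(p × n), drawn at random; whether SEQPower draws a fixed count or decides per variant is not stated on this page, so treat this as one possible reading. The function `pick_causal` and the variant identifiers are hypothetical.

```python
import random


def pick_causal(variant_ids, proportion, seed=None):
    """Randomly pick round(proportion * n) variants as causal; the rest are non-causal."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    rng = random.Random(seed)
    n_causal = round(proportion * len(variant_ids))
    causal = set(rng.sample(list(variant_ids), n_causal))
    # Non-causal variants stay in the data set for association analysis.
    non_causal = [v for v in variant_ids if v not in causal]
    return causal, non_causal


deleterious = [f"var{i}" for i in range(10)]  # hypothetical variant IDs
causal, non_causal = pick_causal(deleterious, proportion=0.8, seed=1)
print(len(causal), len(non_causal))
```

Only the causal set would feed the phenotype simulation, while both sets would remain in the genotype data passed to the association test.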
Note
This option defines the “non-causal” variants among all protective variants. No default value is set for this option.
The options below remove variant sites on the basis of locus attributes.
This option takes two integer values of variant counts in a locus: a lower bound and an upper bound. If a locus has fewer variants than the lower bound or more variants than the upper bound, the locus is discarded from power analysis. This is analogous to real-world analysis, where genes having too few variant sites are discarded from rare variant analysis.
No default value is set for this option.
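The bound check described above amounts to a simple inclusive range filter. The Python sketch below illustrates it with hypothetical locus names, counts and bounds; `keep_locus` is not a SEQPower function.

```python
def keep_locus(n_variants, lower, upper):
    """Keep a locus only if its variant count lies within [lower, upper]."""
    return lower <= n_variants <= upper


# Hypothetical loci with their variant counts.
loci = {"GENE_A": 2, "GENE_B": 15, "GENE_C": 400}
kept = {name: n for name, n in loci.items() if keep_locus(n, lower=5, upper=100)}
print(kept)
```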
Note
This option removes from analysis all common variant sites as defined by --def_rare. This assesses the performance of “oracle” tests, in which the MAF cutoff chosen for analysis is exactly the “underlying” MAF cutoff for rare variant related disease etiology. In such situations tests with a fixed threshold usually outperform tests with variable thresholds such as VT and RareCover. It is also possible to supply a user-specified MAF range based on observed data to determine which variants are excluded from analysis for having too small or too large a MAF (see the --method option).
Fill missing genotype calls as wildtype genotypes. This option might be required for some methods to work, but it can create bias in many scenarios. In SEQPower the association methods all have built-in mechanisms to handle missing data, so this option is generally not necessary unless one specifically wants to evaluate the bias created by such behavior.
The options below introduce missing variant calls on the basis of specified properties of the simulated data.
Variants whose underlying population MAF from the SFS data is smaller than the specified value are set to missing. This option is particularly designed for exome chip power analysis. No default value is set for this option.
Note
This option defines a proportion of randomly missing variant sites, creating a situation in which causal variants may be excluded. In practice causal variants can be missing because of the sequencing / genotyping procedures, or because of quality control measures applied before association analysis. No default value is set for this option.
This option defines a proportion of randomly missing genotype calls at each variant site. Missing calls do not eliminate the entire variant site, but the missing data they create will either be treated as wildtype or be imputed via mean dosage, resulting in decreased power. No default value is set for this option.
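The two ways of handling missing calls mentioned above can be compared in a short Python sketch. It assumes genotypes are coded as minor allele counts (0, 1, 2) with `None` for a missing call; the functions `mask_calls`, `fill_wildtype` and `fill_mean_dosage` are hypothetical names, not SEQPower's API.

```python
import random


def mask_calls(genotypes, proportion, seed=None):
    """Set each call at one variant site to missing (None) with the given probability."""
    rng = random.Random(seed)
    return [None if rng.random() < proportion else g for g in genotypes]


def fill_wildtype(calls):
    """Treat every missing call as a wildtype genotype (0 minor alleles)."""
    return [0 if g is None else g for g in calls]


def fill_mean_dosage(calls):
    """Replace every missing call with the mean of the observed calls at the site."""
    observed = [g for g in calls if g is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if g is None else g for g in calls]


# Hypothetical calls at one site; two of six samples are missing.
calls = [0, 1, None, 1, None, 0]
print(fill_wildtype(calls))
print(fill_mean_dosage(calls))
```

Filling with wildtype pulls the site's apparent allele frequency down, whereas mean dosage keeps it at the observed frequency, which is one way to see why the former can introduce bias.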
This option defines a random proportion of genotyping errors among all genotype calls. An error is created by replacing a wildtype genotype with a mutant heterozygous genotype, or by replacing a non-wildtype genotype with a wildtype genotype. No default value is set for this option.
Specify the required sample size or power. These options are mutually exclusive; for empirical power analysis, --sample_size must be provided to calculate the power estimate.
Note
Number of replicates required for empirical power calculation. Default value is 1.
Note
Significance level at which power will be evaluated. Default value is 0.05. For an exome-wide association scan of 20,000 genes, the significance level after Bonferroni correction is 2.5E-6.
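The Bonferroni figure quoted above follows from dividing the nominal level by the number of tests. The Python sketch below reproduces that arithmetic and then shows a conventional way to estimate empirical power as the fraction of replicates whose p-value falls below the significance level. The function `empirical_power` and the p-values are hypothetical, and the page does not confirm that SEQPower computes power in exactly this way.

```python
alpha = 0.05
n_genes = 20_000

# Bonferroni correction: divide the nominal level by the number of tests.
alpha_bonferroni = alpha / n_genes
print(f"{alpha_bonferroni:.1e}")


def empirical_power(p_values, significance_level):
    """Fraction of replicates whose p-value is below the significance level."""
    return sum(p < significance_level for p in p_values) / len(p_values)


# Hypothetical p-values from four replicates.
p_values = [1e-7, 3e-6, 0.02, 1e-6]
power = empirical_power(p_values, alpha_bonferroni)
print(power)
```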
Method of association tests for empirical power calculation or for saving simulated data. Please use spower show tests and spower show test TEST to read details on the methods.
These are gene group specific quality control filters applied on the fly as the analysis is carried out for each replicate. Samples or sites having too much missing data (as defined by these options) are removed from analysis.
A character specifying the delimiter of the input data. Defaults to white space.
Specify output csv file name. Default output file name uses the same prefix as the input data file name.
Verbosity level. The default level is 2.
An integer setting the seed for random number generator. Use 0 for random seed.
Number of CPUs to use for parallel analysis when multiple replicates are required via the -r/--replicates option. Default is 2.
This is the tag for the name of the association test. It is useful when multiple association methods are evaluated, or when the same association method is evaluated with different parameter settings. In such situations each test can be assigned a unique name, which is appended to the result of the analysis and can be viewed via spower show *.csv NAME.
The lower and upper limits of the MAF to be analyzed. For most tests the default values are 0 and 0.01, which are typical for analyzing rare variant data. For common variant analysis the lower and upper limits need to be adjusted.
Note
--def_rare
parameter value different than 0.01 is set.
Number of permutations.
Adaptive permutation using the Wilson 95% confidence interval for the binomial distribution. The program computes a p-value every 1000 permutations and compares the lower bound of the 95% CI of the p-value against C. If the lower bound is larger than C, the program stops permuting and reports the current p-value. It is recommended to specify a C that is slightly larger than the significance level for the study. To disable the adaptive procedure, set C=1. Default is C=0.1.
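The early-stopping rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than SEQPower's implementation: the p-value is estimated as the plain fraction of null statistics at least as large as the observed one, and `wilson_interval`, `adaptive_permutation` and the `draw_null_stat` callback are hypothetical names. The interval itself is the standard Wilson score interval.

```python
import math
import random


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def adaptive_permutation(observed_stat, draw_null_stat, max_perms=10000,
                         c=0.1, check_every=1000):
    """Permute until done, or until the CI lower bound of the p-value exceeds c.

    draw_null_stat is a callable returning one permuted test statistic.
    Returns (p_value_estimate, permutations_used).
    """
    exceed = 0
    for done in range(1, max_perms + 1):
        if draw_null_stat() >= observed_stat:
            exceed += 1
        if done % check_every == 0:
            lower, _ = wilson_interval(exceed, done)
            if lower > c:  # clearly non-significant: stop early
                return exceed / done, done
    return exceed / max_perms, max_perms


rng = random.Random(0)
# Hypothetical null distribution: standard normal statistics.
p_value, used = adaptive_permutation(0.0, lambda: rng.gauss(0.0, 1.0))
print(p_value, used)
```

Since the lower bound can never exceed 1, passing c=1 makes the stopping condition unreachable, which matches the documented way of disabling the adaptive procedure.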
Note
Besides the common options listed above, each association test may have specific options. Use spower show test TST (replace TST with the name of the test you want to read about) to view these options.