===== Population Attributable Risk Model =====
Under this model, case control status is simulated based on the difference in population attributable risk (PAR) between the case and control groups. Power and sample size calculations can be performed using a number of theoretical and empirical methods.

==== Command Interface ====
<code>
spower PAR -h
</code>
\\
<code>
usage: spower PAR [-h] [-a GAMMA] [-b GAMMA] [-c GAMMA] [-d GAMMA]
                  [--PAR_variable] [-f P] [--def_rare P]
                  [--def_neutral VALUE VALUE]
                  [--def_protective VALUE VALUE] [-P P] [-Q P]
                  [--sample_size N] [--p1 P]
                  [--def_valid_locus VALUE VALUE] [--rare_only]
                  [--missing_as_wt] [--missing_low_maf P]
                  [--missing_sites P] [--missing_sites_deleterious P]
                  [--missing_sites_protective P]
                  [--missing_sites_neutral P]
                  [--missing_sites_synonymous P] [--missing_calls P]
                  [--missing_calls_deleterious P]
                  [--missing_calls_protective P]
                  [--missing_calls_neutral P]
                  [--missing_calls_synonymous P] [--error_calls P]
                  [--error_calls_deleterious P]
                  [--error_calls_protective P] [--error_calls_neutral P]
                  [--error_calls_synonymous P] [--power P] [-r N]
                  [--alpha ALPHA] [--moi {A,D,R,M}] [--resampling] [-l N]
                  [-o file] [-t NAME] [-v {0,1,2,3}] [-s N] [-j N]
                  [-m METHODS [METHODS ...]]
                  [--discard_samples [EXPR [EXPR ...]]]
                  [--discard_variants [EXPR [EXPR ...]]]
                  DATA

positional arguments:
  DATA                  name of input data or prefix of input data bundle
                        (see the documentation for details)

optional arguments:
  -h, --help            show this help message and exit

model parameters:
  -a GAMMA, --PAR_rare_detrimental GAMMA
                        Population attributable risk for detrimental rare
                        variants (default set to 0.0)
  -b GAMMA, --PAR_rare_protective GAMMA
                        Population attributable risk for protective rare
                        variants (default set to 0.0)
  -c GAMMA, --PAR_common_detrimental GAMMA
                        Population attributable risk for detrimental common
                        variants (default set to 0.0)
  -d GAMMA, --PAR_common_protective GAMMA
                        Population attributable risk for protective common
                        variants (default set to 0.0)
  --PAR_variable        use variable population attributable risks: the
                        smaller the MAF the larger the PAR
  -f P, --baseline_effect P
                        penetrance of wildtype genotypes (default set to
                        0.01)
  --moi {A,D,R,M}       mode of inheritance: "A", additive (default); "D",
                        dominant; "R", recessive; "M", multiplicative (does
                        not apply to quantitative traits model)
  --resampling          directly draw sample genotypes from given haplotype
                        pools (sample genotypes will be simulated on the fly
                        if haplotype pools are not avaliable)

variants functionality:
  --def_rare P          definition of rare variants: variant having "MAF <=
                        frequency" will be considered a "rare" variant; the
                        opposite set is considered "common" (default set to
                        0.01)
  --def_neutral VALUE VALUE
                        annotation value cut-offs that defines a variant to
                        be "neutral" (e.g. synonymous, non-coding etc. that
                        will not contribute to any phenotype); any variant
                        with "function_score" X falling in this range will
                        be considered neutral (default set to None)
  --def_protective VALUE VALUE
                        annotation value cut-offs that defines a variant to
                        be "protective" (i.e., decrease disease risk or
                        decrease quantitative traits value); any variant
                        with "function_score" X falling in this range will
                        be considered protective (default set to None)
  -P P, --proportion_detrimental P
                        proportion of deleterious variants associated with
                        the trait of interest, i.e., the random set of the
                        rest (1 - p) x 100% deleterious variants are
                        non-causal: they do not contribute to the phenotype
                        in simulations yet will present as noise in analysis
                        (default set to None)
  -Q P, --proportion_protective P
                        proportion of protective variants associated with
                        the trait of interest, i.e., the random set of the
                        rest (1 - p) x 100% protective variants are
                        non-causal: they do not contribute to the phenotype
                        in simulations yet will present as noise in analysis
                        (default set to None)

sample population:
  --sample_size N       total sample size
  --p1 P                proportion of affected individuals (default set to
                        0.5), or individuals with high extreme QT values
                        sampled from infinite population (default set to
                        None, meaning to sample from finite population
                        speficied by --sample_size option).

quality control:
  --def_valid_locus VALUE VALUE
                        upper and lower bounds of variant counts that
                        defines if a locus is "valid", i.e., locus having
                        number of variants falling out of this range will be
                        ignored from power calculation (default set to None)
  --rare_only           remove from analysis common variant sites in the
                        population, i.e., those in the haplotype pool having
                        MAF > $def_rare
  --missing_as_wt       label missing genotype calls as wildtype genotypes

sequencing / genotyping artifact:
  --missing_low_maf P   variant sites having population MAF < P are set to
                        missing
  --missing_sites P     proportion of missing variant sites
  --missing_sites_deleterious P
                        proportion of missing deleterious sites
  --missing_sites_protective P
                        proportion of missing protective sites
  --missing_sites_neutral P
                        proportion of missing neutral sites
  --missing_sites_synonymous P
                        proportion of missing synonymous sites
  --missing_calls P     proportion of missing genotype calls
  --missing_calls_deleterious P
                        proportion of missing genotype calls at deleterious
                        sites
  --missing_calls_protective P
                        proportion of missing genotype calls at protective
                        sites
  --missing_calls_neutral P
                        proportion of missing genotype calls at neutral
                        sites
  --missing_calls_synonymous P
                        proportion of missing genotype calls at synonymous
                        sites
  --error_calls P       proportion of error genotype calls
  --error_calls_deleterious P
                        proportion of error genotype calls at deleterious
                        sites
  --error_calls_protective P
                        proportion of error genotype calls at protective
                        sites
  --error_calls_neutral P
                        proportion of error genotype calls at neutral sites
  --error_calls_synonymous P
                        proportion of error genotype calls at synonymous
                        sites

power calculation:
  --power P             power for which total sample size is calculated
                        (this option is mutually exclusive with option
                        '--sample_size')
  -r N, --replicates N  number of replicates for power evaluation (default
                        set to 1)
  --alpha ALPHA         significance level at which power will be evaluated
                        (default set to 0.05)

input/output specifications:
  -l N, --limit N       if specified, will limit calculations to the first N
                        groups in data (default set to None)
  -o file, --output file
                        output filename

runtime options:
  -t NAME, --title NAME
                        unique identifier of a single command run (default
                        to output filename prefix)
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        verbosity level: 0 for absolutely quiet, 1 for less
                        verbose, 2 for verbose, 3 for more debug information
                        (default set to 2)
  -s N, --seed N        seed for random number generator, 0 for random seed
                        (default set to 0)
  -j N, --jobs N        number of CPUs to use when multiple replicates are
                        required via "-r" option (default set to 2)

association tests:
  -m METHODS [METHODS ...], --methods METHODS [METHODS ...]
                        Method of one or more association tests. Parameters
                        for each method should be specified together as a
                        quoted long argument (e.g. --methods "m
                        --alternative 2" "m1 --permute 1000"), although the
                        common method parameters can be specified
                        separately, as long as they do not conflict with
                        command arguments. (e.g. --methods m1 m2 -p 1000 is
                        equivalent to --methods "m1 -p 1000" "m2 -p 1000".).
                        You can use command 'spower show tests' for a list
                        of association tests, and 'spower show test TST' for
                        details about a test.

samples and genotypes filtering:
  --discard_samples [EXPR [EXPR ...]]
                        Discard samples that match specified conditions
                        within each test group. Currently only expressions
                        in the form of "%(NA)>p" is provided to remove
                        samples that have more 100*p percent of missing
                        values.
  --discard_variants [EXPR [EXPR ...]]
                        Discard variant sites based on specified conditions
                        within each test group. Currently only expressions
                        in the form of '%(NA)>p' is provided to remove
                        variant sites that have more than 100*p percent of
                        missing genotypes. Note that this filter will be
                        applied after "--discard_samples" is applied, if the
                        latter also is specified.
</code>
\\

==== Details ====
Model-specific options are documented in detail below. The remaining options are [[http://bioinformatics.org/spower/options|documented elsewhere]].

=== -a/b/c/d ===
These options specify the **total PAR** for variants in the region of interest. ''-a'' is the PAR for detrimental rare variants. When used without the ''%%--%%PAR_variable'' option (see below), every detrimental rare variant is assigned a fixed PAR equal to \(\frac{\gamma}{M}\), where \(\gamma\) is the specified total PAR and \(M\) is the total number of detrimental rare variants. Similarly, ''-b'' specifies the PAR for protective rare variants. Unlike detrimental variants, protective variants decrease the PAR for samples carrying a mutation. ''-c'' and ''-d'' specify the PAR for common detrimental and common protective variants, respectively. A reasonable assumption for the PAR of rare variants associated with complex diseases is 0.5% to 5%.

=== --PAR_variable ===
With this switch on, the total PAR is distributed across variants according to the inverse of their MAF, i.e., the rarer the variant, the higher its PAR value.

=== --moi ===
Mode of inheritance. Most aggregated analysis methods implicitly assume an "additive" effect of rare variants, so simulating and analyzing data under the additive model represents a best-case scenario for many methods. Additive is also the default for both the simulation and the analysis. This option only controls how the data are simulated. Some methods given via the ''%%--%%methods'' option have sub-options that specify the mode of inheritance under which the data will be analyzed.

==== Simulation via Repeated Sampling ====
The motivation for the ''%%--%%resampling'' option was previously discussed for the [[http://bioinformatics.org/spower/mlogit|LOGIT model]].
In the PAR model it is implemented as follows:
  * A sample genotype is generated either from the population MAF or drawn directly from the given haplotype pools
  * A penetrance value \(f\) for the genotype is calculated from the input parameters
    * PAR -> Odds ratio -> penetrance
  * The sample is labeled as a //case// with probability \(f\), and as a control otherwise

==== Examples ====
  * [[http://bioinformatics.org/spower/analytic-tutorial|Analytic power analysis]] for case control studies
  * [[http://bioinformatics.org/spower/empirical-tutorial|Empirical power analysis]] for case control studies
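
As a further illustration, the following Python sketch walks through the PAR allocation described under ''-a/b/c/d'' and ''%%--%%PAR_variable'', and the labeling steps of the resampling procedure. It is not taken from the spower source. The function names are hypothetical, the inverse-MAF weighting is only one plausible reading of ''%%--%%PAR_variable'', and the PAR to odds ratio conversion assumes a Levin-type relation \(\text{PAR} = \frac{p(\text{OR}-1)}{1+p(\text{OR}-1)}\) with the MAF standing in for the exposure frequency \(p\). Only detrimental variants and carriers of a single variant are handled.

```python
import random


def allocate_par(total_par, mafs, variable=False):
    """Split a total PAR (gamma) across M variants.

    Fixed mode gives every variant gamma / M, as described for -a/-b/-c/-d.
    Variable mode weights each variant by 1 / MAF (ASSUMPTION: the tool's
    exact weighting for --PAR_variable may differ).
    """
    if not mafs:
        return []
    if not variable:
        return [total_par / len(mafs)] * len(mafs)
    weights = [1.0 / maf for maf in mafs]
    total_weight = sum(weights)
    return [total_par * w / total_weight for w in weights]


def par_to_odds_ratio(par, freq):
    """ASSUMED Levin-type conversion: PAR = p(OR-1) / (1 + p(OR-1)).

    Solving for OR gives OR = 1 + PAR / ((1 - PAR) * p).
    """
    if not 0.0 <= par < 1.0 or freq <= 0.0:
        raise ValueError("need 0 <= par < 1 and freq > 0")
    return 1.0 + par / ((1.0 - par) * freq)


def penetrance(odds_ratio, baseline):
    """Apply an odds ratio to the wildtype penetrance (-f, default 0.01)."""
    odds = odds_ratio * baseline / (1.0 - baseline)
    return odds / (1.0 + odds)


def label_sample(f, rng):
    """Label a sample as a case with probability f, control otherwise."""
    return "case" if rng.random() < f else "control"


if __name__ == "__main__":
    rng = random.Random(1)          # fixed seed so runs are repeatable
    mafs = [0.001, 0.002, 0.005]    # illustrative rare-variant MAFs
    pars = allocate_par(0.03, mafs, variable=True)
    for maf, par in zip(mafs, pars):
        f = penetrance(par_to_odds_ratio(par, maf), baseline=0.01)
        print(maf, round(par, 5), round(f, 4), label_sample(f, rng))
```

With ''variable=False'' the three variants each receive one third of the total PAR. With ''variable=True'' the rarest variant receives the largest share, and the shares still sum to the total.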