analytic-tutorial [2016/04/13 18:02] (current)
===== Analytic Power Analysis =====
==== Analytic Power and Sample Size Calculation for Case Control Studies ====
=== LOGIT model ===
== Basic example ==
To calculate power at a given sample size, assuming equal numbers of cases and controls (1000 cases, 1000 controls), with an odds ratio of 2 for rare variants and 1 for common variants, evaluated at \(\alpha=0.05\):

<code bash>
spower LOGIT KIT.gdat -a 2 --sample_size 2000 --alpha 0.05 -v2 -o K1AP
</code>\\
To calculate the sample size required for 80% power, assuming equal numbers of cases and controls and the same setup as above:

<code bash>
spower LOGIT KIT.gdat -a 2 --power 0.8 --alpha 0.05 -v2 -o K1AS
</code>\\
To view the results:

<code bash>
spower show K1AP.csv power*
spower show K1AS.csv sample_size*
</code>\\
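To see how power changes with sample size, the basic power command can be repeated over several values of ''--sample_size''. The following is a minimal sketch, not part of spower itself: the helper name ''power_sweep'' and the output prefixes ''K1AP_n<size>'' are hypothetical, and the loop only prints the commands (a dry run) so that they can be inspected first.

```shell
# Dry run: print one spower power calculation per total sample size.
# power_sweep and the K1AP_n<size> output prefixes are hypothetical names
# chosen for this sketch; all spower flags are taken from the basic example.
power_sweep() {
    for n in "$@"; do
        echo "spower LOGIT KIT.gdat -a 2 --sample_size $n --alpha 0.05 -v2 -o K1AP_n$n"
    done
}

power_sweep 1000 2000 4000
```

Piping the output to ''sh'' would run the printed commands, assuming ''spower'' is installed and ''KIT.gdat'' is in the working directory.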
== Adjust effect size ==
The variable effects model below assigns each rare variant an odds ratio \(\in(1,3)\), depending on its MAF:

<code bash>
spower LOGIT KIT.gdat -a 1 -A 3 --sample_size 2000 --alpha 0.05 -v2 -o K1AP
</code>\\
To add effects for common variants, fixed at an odds ratio of 1.2:

<code bash>
spower LOGIT KIT.gdat -a 1 -A 3 -c 1.2 --sample_size 2000 --alpha 0.05 -v2 -o K1AP
spower LOGIT KIT.gdat -a 1 -A 3 -c 1.2 --power 0.8 --alpha 0.05 -v2 -o K1AS
</code>\\
== Adjust variant properties and analysis filters ==
Starting from the basic example, we now change the definition of rare variants to MAF < 5%:

<code bash>
spower LOGIT KIT.gdat -a 2 --def_rare 0.05 --sample_size 2000 --alpha 0.05 -v2 -o K1AP
</code>\\
Power increases substantially, but applying aggregated rare variant analysis to variants of such high frequency is not a reasonable setup. There is usually adequate power to detect associations with common variants when they are analyzed individually.

== Set a random proportion of non-causal variants ==
It is often the case that not all functional rare variants are directly causal to the phenotype. To add such non-causal "noise" to the data and evaluate its impact on power or sample size, we can set a random 50% of variants to be non-causal (''-P'' option) while still including them in the analysis. Because the non-causal variants are assigned at random, the final estimate should be the average over multiple replicates (''-r'' option), for example 100 replicates:

<code bash>
spower LOGIT KIT.gdat -a 2 --def_valid_locus 3 1000 --sample_size 2000 --alpha 0.05 -P 0.5 -r 100 -v2 -o K1APP
spower LOGIT KIT.gdat -a 2 --def_valid_locus 3 1000 --power 0.8 --alpha 0.05 -P 0.5 -r 100 -v2 -o K1APS
spower show K1APP.csv power power_std
</code>\\
Note that the standard deviation across the 100 replicates is also calculated and can be displayed.
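
To make the two reported quantities concrete, the sketch below computes a mean and a standard deviation from a list of per-replicate values, one per line. It is only an illustration: spower computes these summaries itself, the helper name ''mean_std'' is hypothetical, the input numbers are made up, and the use of the sample standard deviation (n-1 denominator) is an assumption, since the tutorial does not state which formula spower uses.

```shell
# mean_std: read one number per line and print "mean std".
# Uses the sample standard deviation (n-1 denominator), which is an
# assumption of this sketch and not necessarily what spower reports.
mean_std() {
    awk '{ s += $1; ss += $1 * $1; n++ }
         END {
             if (n == 0) { print "no data"; exit 1 }
             m = s / n
             v = (n > 1) ? (ss - n * m * m) / (n - 1) : 0
             if (v < 0) v = 0   # guard against floating-point rounding
             printf "%.4f %.4f\n", m, sqrt(v)
         }'
}

# Illustrative per-replicate power values (made up, not spower output).
printf '%s\n' 0.60 0.70 0.80 | mean_std
```

A large standard deviation relative to the mean suggests that the estimate is sensitive to which variants happen to be drawn as non-causal, in which case more replicates may be worthwhile.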

=== PAR model ===
== Basic example ==
To calculate power at a given sample size, assuming equal numbers of cases and controls (1000 cases, 1000 controls), with a PAR of 5% for rare variants and 1% for common variants, evaluated at \(\alpha=0.05\):

<code bash>
spower PAR KIT.gdat -a 0.05 -c 0.01 --sample_size 2000 --alpha 0.05 -v2 -o K1AP
</code>\\
To calculate the sample size required for 80% power, assuming equal numbers of cases and controls and the same setup as above:

<code bash>
spower PAR KIT.gdat -a 0.05 -c 0.01 --power 0.8 --alpha 0.05 -v2 -o K1AS
</code>\\
To view the results:

<code bash>
spower show K1AP.csv power*
spower show K1AS.csv sample_size*
</code>\\
== Adjust effect size ==
The variable effects model below assigns a site-specific PAR to each deleterious rare variant, depending on its MAF:

<code bash>
spower PAR KIT.gdat -a 0.05 --PAR_variable --sample_size 2000 --alpha 0.05 -v2 -o K1AP
spower PAR KIT.gdat -a 0.05 --PAR_variable --power 0.8 --alpha 0.05 -v2 -o K1AS
</code>\\
== Set a random proportion of non-causal variants ==
The ''-P'' and ''-r'' options for modeling the effect of non-causal variants were introduced in the LOGIT model section. The same idea applies to the PAR model; see that section for details.
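
As a sketch of how this might carry over, the function below prints PAR commands that add ''-P'' and ''-r'' to the PAR basic example. Several things here are assumptions: that the two options combine with the PAR model exactly as they do with LOGIT, the helper name ''par_noise_cmds'', and the reuse of the ''K1APP'' and ''K1APS'' output prefixes. The ''--def_valid_locus'' filter from the LOGIT example is left out because the tutorial does not show it with the PAR model. The commands are only printed, not run.

```shell
# Dry run: print PAR commands that include a random non-causal proportion.
# $1 = proportion of non-causal variants (-P), $2 = number of replicates (-r).
# par_noise_cmds is a hypothetical helper; the flags come from the LOGIT
# and PAR examples, and combining them this way is an assumption.
par_noise_cmds() {
    prop=$1
    reps=$2
    common="spower PAR KIT.gdat -a 0.05 -c 0.01 --alpha 0.05 -P $prop -r $reps -v2"
    echo "$common --sample_size 2000 -o K1APP"   # power at a fixed sample size
    echo "$common --power 0.8 -o K1APS"          # sample size at fixed power
}

par_noise_cmds 0.5 100
```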

==== Analytic Power and Sample Size Calculation for Quantitative Traits Analysis ====
=== Linear QT mean shift model ===
== Basic example ==
To calculate power at a given sample size for a randomly ascertained QT sample of 2000 unrelated individuals, with an effect size of [[http://bioinformatics.org/spower/simtraits#quantitative_traits|\(0.25\sigma\)]], evaluated at \(\alpha=0.05\):

<code bash>
spower LNR KIT.gdat -a 0.25 --sample_size 2000 --alpha 0.05 -v2 -o K1AP # power 0.22
</code>\\
To calculate the sample size required for 80% power, using the same setup as above:

<code bash>
spower LNR KIT.gdat -a 0.25 --power 0.8 --alpha 0.05 -v2 -o K1AS # sample size 10806
</code>\\
== Adjust effect size ==
The variable effects model below assigns each rare variant a mean shift \(\in(0.1, 0.5)\), depending on its MAF:

<code bash>
spower LNR KIT.gdat -a 0.1 -A 0.5 --sample_size 2000 --alpha 0.05 -v2 -o K1AP # power 0.289
</code>\\
To add effects for common variants, with the mean shift fixed at 0.15:

<code bash>
spower LNR KIT.gdat -a 0.1 -A 0.5 -c 0.15 --sample_size 2000 --alpha 0.05 -v2 -o K1AP # power 0.86
spower LNR KIT.gdat -a 0.1 -A 0.5 -c 0.15 --power 0.8 --alpha 0.05 -v2 -o K1AS # sample size 1966
</code>\\
== Adjust variant properties and analysis filters ==
Starting from the basic example, we now change the definition of rare variants to MAF < 5%:

<code bash>
spower LNR KIT.gdat -a 0.25 --def_rare 0.05 --sample_size 2000 --alpha 0.05 -v2 -o K1AP # power 0.946
</code>\\
Power increases substantially, but applying aggregated rare variant analysis to variants of such high frequency is not a reasonable setup. There is usually adequate power to detect associations with common variants when they are analyzed individually.

== Set a random proportion of non-causal variants ==
For example, we set a random 50% of variants to have no effect on the quantitative phenotype. These variants are still included in the analysis as noise, so a larger sample size is required to achieve the same power. Because the non-causal variants are chosen at random each time, the final estimate should be the average over multiple replicates, for example 100 replicates:

<code bash>
spower LNR KIT.gdat -a 0.5 --def_valid_locus 3 1000 --sample_size 20000 --alpha 0.05 -P 0.5 -r 100 -v2 -o K1APP --jobs 8 # power 0.8632
spower LNR KIT.gdat -a 0.5 --def_valid_locus 3 1000 --power 0.8 --alpha 0.05 -P 0.5 -r 100 -v2 -o K1APS --jobs 8 # sample size 17420.4501268
spower show K1APP.csv power*
spower show K1APS.csv *size*
</code>\\
Note that the standard deviation across the 100 replicates is also calculated, and can be displayed by using the wildcard symbol "*" in the ''spower show'' command.