User Tools


Differences

This shows you the differences between two versions of the page.

Link to this comparison view

srvbatch [2016/04/13 18:02] (current)
Line 1: Line 1:
 +===== Forward-time Simulation with SRV =====
 +The original ''​srv''​ simulator was authored by Bo Peng, 2011((Bo Peng and Xiaoming Liu (2010). **Simulating Sequences of the Human Genome with Rare Variants**. //Human Heredity//​)) ([[http://​simupop.sourceforge.net/​Cookbook/​SimuRareVariants|download]]). SEQPower expands the ''​srv''​ simulator to allow for additional demographic models and specification of selection coefficients. It also saves generated data in a compressed format compatible with SEQPower'​s power calculation input.
 +
 +<box 80% round blue|**__Note__**>​
 + This feature is available only in the "​Full"​ version of SEQPower.
 +</​box>​
 +==== Highlights ====
 +The ''​srv''​ method repeatedly simulating a population of sequences forward in time, subject to mutation, natural selection and demography. Because most mutants are introduced during the rapid population expansion, most of the alleles will be rare at the end of any replicate of the simulation. Samples simulated using this script can be used to study genetic diseases caused by a large number of rare variants. Data pools of generated mutant-site allele frequencies can be applied to assign genotype of confounder individuals.
 +
 +==== Default Output ====
 +The output contains
 +
 + *  ''​*.sfs''​ file, which contains variant site-specific information of each simulated replicate, such as minor allele frequency (maf), position (pos), selection coefficient (annotation),​ etc. Each row corresponds to each mutant site for a replicate. Columns are replicate ID, maf, sel and pos information.
 + *  ''​*.gdat''​ file, which stores generated haplotype pools in a compressed binary format that can be loaded by SEQPower for power calculations.
 +
 +==== Usage ====
 +Type
 +
 +<code bash>
 +spower simulate -h
 +</​code>​\\
 +To read all command line options, or see below for a complete table of command arguments.
 +
 +==== Examples ====
 +Please read [[http://​bioinformatics.org/​spower/​srvbatch-tutorial|this tutorial]] for simulation examples.
 +
 +==== Commandline Options ====
 +<wrap em hi>
 +-h, ''​%%--%%''​help
 +</​wrap>​
 +
 + * Display this help message and exit
 +
 +<wrap em hi>
 +''​%%--%%''​config | //ARG (default: None)//
 +</​wrap>​
 +
 + * Load parameters from a configuration file ARG
 +
 +<wrap em hi>
 +''​%%--%%''​optimized
 +</​wrap>​
 +
 + * Run the script using an optimized simuPOP module
 +
 +<wrap em hi>
 +''​%%--%%''​gui | //<​batch|interactive|True|Tkinter|wxPython>​ (default: None)//
 +</​wrap>​
 +
 + * Run the script in batch, interactive or GUI mode
 +
 +<wrap em hi>
 +''​%%--%%''​regRange | //ARG (default: [2500, 5000])//
 +</​wrap>​
 +
 + * // Gene Length (range) //
 + * The length of gene region for each time of evolution is taken as a random number within the range of region
 + * If a fixed number of gene length N is required, this parameter should be set as [N, N]
 +
 +<wrap em hi>
 +''​%%--%%''​fileName | //ARG (default: '​MySimuRV'​)//​
 +</​wrap>​
 +
 + * // Output Files Name (prefix) //
 +
 +<wrap em hi>
 +''​%%--%%''​numReps | //ARG (default: 3)//
 +</​wrap>​
 +
 + * // Number of Replicates //
 +
 +<wrap em hi>
 +''​%%--%%''​N | //ARG (default: [8100, 8100, 7900, 9000])//
 +</​wrap>​
 +
 + * // Effective Population Sizes //
 + * Assuming a n (n = array length - 1) stage demographic model, this parameter specifies population sizes at the beginning of evolution and at the end of each stage N_0,...,N_n
 + * If N_i < N_i+1, an exponential population expansion model will be used to expand population from size N_i to N_i+1
 + * If N_i > N_i+1, an instant population reduction will reduce population size to N_i+1
 + * For example, N=[5000, 5000, 800, 30000], which simulates a three stage demographic model where a population beginning with 5000 individuals first undergoes a burn-in stage with constant population size 5000, then goes through a bottleneck of 800 indiviudals,​ and after that expands exponentially to a size of 30000
 +
 +<wrap em hi>
 +''​%%--%%''​G | //ARG (default: [500, 10, 370])//
 +</​wrap>​
 +
 + * // Numbers of Generations per Stage //
 + * Numbers of generations of each stage of a n stage demographic model
 + * This parameter should have n elements, in comparison to n+1 elements for parameter N (Effective Population Sizes)
 +
 +<wrap em hi>
 +''​%%--%%''​mutationModel | //ARG (default: '​finite_sites'​)//​
 +</​wrap>​
 +
 + * // Mutation Model //
 + * Mutation model
 + * The default mutation model is a finite-site model that allows mutations at any locus
 + * If a mutant is mutated, it will be back- mutated to a wildtype allele
 + * Alternatively,​ an infinite-sites model can be simulated where new mutants must happen at loci without existing mutants, unless no vacant locus is available (a warning message will be printed in that case)
 +
 +<wrap em hi>
 +''​%%--%%''​mu | //ARG (default: 1.8e-08)//
 +</​wrap>​
 +
 + * // Mutation Rate //
 + * Mutation rate per base pair
 +
 +<wrap em hi>
 +''​%%--%%''​revertFixedSites | //(default: False)//
 +</​wrap>​
 +
 + * // Revert Fixed Sites? //
 + * Whether or not to revert fixed mutant sites to wild-type sites
 +
 +<wrap em hi>
 +''​%%--%%''​selModel | //ARG (default: '​additive'​)//​
 +</​wrap>​
 +
 + * // Multi-locus Selection Model //
 + * Multi-locus selection model, namely how to obtain an overall individual fitness after obtaining fitness values at all loci
 + * This script supports three models: -- multiplicative:​ Product of individual fitness
 + * ** additive: One minus the combined selection deficiencies**
 + * ** exponential:​ Exponential of combined selection deficiencies**
 + * **Note** Each fitness can be equal to or greater than zero, which represents neutral loci, or loci under positive selection
 +
 +<wrap em hi>
 +''​%%--%%''​selDist | //ARG (default: '​Boyko_2008_European'​)//​
 +</​wrap>​
 +
 + * // Selection Coefficient Distribution Model //
 + * Distribution of selection coefficient for new mutants
 + * Each distribution specifies s (selection coefficient) and h (dominance coefficient,​ default to 0.5 for additivity) that assign fitness values 1, 1-hs and 1-s for genotypes AA (wildtype), Aa and aa, respectively
 + * Note that positive s is used for negative selection so negative s is needed to specify positive selection
 + * Note that we use 2k in the default distribution of Gamma distributions because theoretical estimates of s is for each mutant with 1-2s as fitness value for genotype aa in our model
 + * This script currently supports the following distributions
 + * ** constant: A single selection coefficient that gives each mutant a constant value s**
 + * The default parameter for this model is 0.01, 0.5
 + * You can set selCoef to [0, 0] to simulate neutral cases or a negative value for positive selection
 + * ** Eyre-Walker_2006:​ A basic gamma distribution assuming a constant population size model (Eyre-Walker et al, 2006)**
 + * The default parameters for this model is Pr(s=x)=Gamma(0.23,​ 0.185*2), with h=0.5
 + * A scaling parameter 0.185*2 is used because s in our simulation accounts for 2s for Eyre-Walker et al
 + * ** Boyko_2008_African:​ A gamma distribution assuming a two-epoch population size change model for African population (Boyko et al, 2008)**
 + * The default parameters for this model is Pr(s=x)=Gamma(0.184,​ 0.160*2), with h=0.5
 + * ** Boyko_2008_European:​ A gamma distribution (for s) assuming a complex bottleneck model for European population (Boyko et al, 2008)**
 + * The default parameters for this model is Pr(s=x)=Gamma(0.206,​ 0.146*2) with h=0.5
 + * **Note**: If you would like to define your own selection model, please define your own function and pass it to parameter '​Customized Selection Coefficient'​ next to this one in the interface
 +
 +<wrap em hi>
 +''​%%--%%''​selCoef | //ARG (default: [])//
 +</​wrap>​
 +
 + * // Customized Selection Coefficient //
 + * Customized selection coefficient distribution model
 + * If None is given, the default value of distribution selected in the previous parameter '​Selection Coefficient Distribution Model' will be used
 + * **Note** Length of this parameter determines which type of model to use and values specify distribution coefficients
 + * ** length = 0: [] --  no customized input**
 + * ** length = 2: [s, h] -- constant selection coefficient (s) and dominance coefficient (h)**
 + * ** length = 3: [k, d, h] -- gamma distributed selection coefficient,​ where k, d are shape, scale parameters of gamma distribution and h is dominance coefficient**
 + * ** length = 5: [p, s, k, d, h] -- mixed-gamma distributed selection coefficient,​ where p is the probability of having the selection coefficient equal to s; k, d are shape and scale parameters of gamma distribution;​ h is the dominance coefficient**
 + * Distribution will be truncated at [0.00001, 0.1]
 + * ** length = 7: [p,​s,​k,​d,​h,​l,​u] -- truncated mixed gamma, where l,u are lower and upper bounds**
 + * ** length = 8: [p,​s,​q,​k1,​d1,​k2,​d2,​h] -- complex mixed gamma distribution that can generate a mix of constant, positive gamma and negative gamma distributions**
 + * The negative distribution represents protective variant sites with negative selection coefficients,​ where q is the probability of having the selection coefficient following a positive gamma distribution with shape/scale parameters k1/d1
 + * Thus, the probability of having selection coefficient following a negative/​opposite gamma distribution is 1-p-q
 + * The negative gamma distribution takes parameters k2 and d2
 + * The positive distribution will be truncated at [0.00001, 0.1], while the negative one at [-0.1, -0.00001]
 + * ** length = 12: [p,​s,​q,​k1,​d1,​k2,​d2,​h,​l1,​u1,​l2,​u2] -- truncated complex mixed gamma distribution,​ where [l1,u1], [l2,u2] are lower/upper bounds or [k1,d1] and [k2,d2] gamma distributions,​ respectively**
 + * For example, Parameter [0.001, 0] for a constant model defines a recessive model with fixed s
 + * Recomended parameter for mixed-gamma is [0.37, 0.0, 0.184, 0.160*2, 0.5] for Prob(s=0.0)=0.37 (neutral or synonymous) and Prob(s=x)=(1-0.37)*Gamma(0.184,​0.160*2)
 +
 +<wrap em hi>
 +''​%%--%%''​selRange | //ARG (default: [1e-05, 0.01])//
 +</​wrap>​
 +
 + * Generated selection coefficient is truncated by range limits
 +
 +<wrap em hi>
 +''​%%--%%''​recRate | //ARG (default: 0)//
 +</​wrap>​
 +
 + * // Recombination Rate //
 + * Recombination rate per base pair
 + * If r times loci distance is greater than 0.5, a rate of 0.5 will be used
 +
 +<wrap em hi>
 +''​%%--%%''​verbose | //ARG (default: 1)//
 +</​wrap>​
 +
 + * // Screen Output Mode (-1, 0 or 1) //
 + * -1 -- quiet, no screen output 0 -- minimum, minimum output of simulation progress and time spent for each replicate 1 -- regular, regular screen output of statistics, simulation progress and time spent
 +
 +<wrap em hi>
 +''​%%--%%''​steps | //ARG (default: [100])//
 +</​wrap>​
 +
 + * // Detailed Screen Output Interval per Stage //
 + * Calculate and output statistics at intervals of specified number of generations
 + * A single number or a list of numbers for each stage can be specified
 + * If left unspecified,​ statistics from the beginning to the end of every generation of each stage will be printed out
 +
 +<wrap em hi>
 +''​%%--%%''​saveGenotype | //ARG (default: 0)//
 +</​wrap>​
 +
 + * // Save Genotype for Replicates //
 + * Optional output of genotype information
 + * This option is turned off by default 0 because this format is not efficient in storing a small number of mutants
 + * If specified by a positive number n, genotype in standard .ped format for the first n replicates will be outputted to n files
 + * For example
 + * 1 -- genotype of the first replicate will be saved to file
 + * 3 -- genotype of the first three replicates will be outputted to three files
 + * In particular, if specified as -1, genotype information of ALL simulation replicates will be saved
 +
 +<wrap em hi>
 +''​%%--%%''​saveStat | //ARG (default: 0)//
 +</​wrap>​
 +
 + * // Save Statistics for Replicates //
 + * This optional parameter (default None) may output statistics to files
 + * It should be specified in the same manner as 'Save Genotype for Replicates'​ requires
 +
 +<wrap em hi>
 +''​%%--%%''​variantPool | //(default: True)//
 +</​wrap>​
 +
 + * Output variant data pool in addition to site-frequency spectrum (sfs)