We provide simulated site frequency spectrum (SFS) data as well as real-world data:
SRV contains simulated SFS and haplotype pool data generated with ''spower simulate'' using:
EuropeanAmericanEVS6500.sfs.gz contains real-world SFS data extracted from the Exome Variant Server. The fourth column is the SIFT score.
KIT.gdat contains haplotype pool data for the KIT gene from the 1000 Genomes Project.
The site frequency spectrum input data for SEQPower should have 4 columns.
In the input text, lines starting with “#” are ignored. This allows additional notes or comments to be kept in the input SFS data.
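The comment rule and the 4-column requirement can be checked before a file is passed to SEQPower. The following is a minimal sketch, not part of SEQPower itself: the function name `read_sfs_rows` is hypothetical, and it assumes that columns are separated by whitespace, that blank lines may be skipped, and that files ending in `.gz` are gzip-compressed text. It does not interpret the columns; it only returns them as strings.

```python
import gzip
import io


def read_sfs_rows(path_or_file, n_columns=4):
    """Return the data rows of an SFS text file as lists of strings.

    Hypothetical helper, not a SEQPower function. Assumptions:
    whitespace-separated columns, blank lines skipped, '.gz' means gzip.
    """
    if isinstance(path_or_file, str):
        opener = gzip.open if path_or_file.endswith(".gz") else open
        handle = opener(path_or_file, "rt")
    else:
        # Already an open text stream (for example io.StringIO).
        handle = path_or_file

    rows = []
    with handle:
        for line_number, line in enumerate(handle, start=1):
            # Lines starting with "#" are notes or comments and are ignored.
            if line.startswith("#"):
                continue
            # Assumption: empty lines carry no data and can be skipped.
            if not line.strip():
                continue
            fields = line.split()
            if len(fields) != n_columns:
                raise ValueError(
                    "line %d: expected %d columns, found %d"
                    % (line_number, n_columns, len(fields))
                )
            rows.append(fields)
    return rows


# Placeholder content for illustration only; these are not real SFS values.
demo = io.StringIO("# a note about this file\nA 1 0.01 0.5\n\nB 2 0.02 0.9\n")
demo_rows = read_sfs_rows(demo)
```

For a file on disk, the same call would be `read_sfs_rows("EuropeanAmericanEVS6500.sfs.gz")`, provided the whitespace-delimiter assumption holds for that file.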
Using haplotype pool data preserves the LD structure and the distribution of singletons, doubletons, etc. found in real-world human haplotypes, and could thus result in a more realistic power analysis. Haplotype pool data can be generated via the ''spower simulate'' module, and we provide pre-generated haplotype pools. However, currently (August 2013) there are no publicly available exome-wide haplotype pools with a reasonably large sample size for a single population group for power analysis purposes. To illustrate the feature, we provide KIT.gdat, data from the 1000 Genomes Project that contains the variants and haplotypes for the KIT gene. It is not recommended to use this data set for power analysis, because the sample size is limited and the haplotypes come from more than one population in the 1000 Genomes Project. Please contact the developers for assistance if you find a publicly available real-world haplotype pool that you are interested in converting to the SEQPower input format.