SEQPower Input

Data Download

We provide simulated site frequency spectrum (SFS) data as well as real-world data:

Site Frequency Spectrum Data

The site frequency spectrum (SFS) input data for SEQPower should have 4 columns.

Lines starting with “#” in the input text are ignored, so additional notes or comments can be included in the input SFS data.
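As an illustration of the comment rule above, the following minimal sketch reads SFS-style text, skips lines starting with “#”, and checks that each remaining line has 4 columns. It is not SEQPower's own parser: the function name `read_sfs_rows`, the whitespace delimiter, the skipping of blank lines, and the placeholder field values are all assumptions made for the example.

```python
from typing import Iterable, List


def read_sfs_rows(lines: Iterable[str], n_columns: int = 4) -> List[List[str]]:
    """Return the data rows of SFS-style text, skipping '#' comment lines."""
    rows = []
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue  # note or comment line, ignored
        if not line.strip():
            continue  # assumption: blank lines are skipped as well
        fields = line.split()  # assumption: columns are whitespace-delimited
        if len(fields) != n_columns:
            raise ValueError(
                f"line {line_number}: expected {n_columns} columns, got {len(fields)}"
            )
        rows.append(fields)
    return rows


# Placeholder values only; they do not represent real SFS columns.
example = [
    "# simulated SFS, note for collaborators",
    "a1 b1 c1 d1",
    "# another comment",
    "a2 b2 c2 d2",
]
rows = read_sfs_rows(example)
print(rows)
```

Running such a check before an analysis can catch truncated or mis-delimited lines early, while leaving annotated input files usable as they are.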

Haplotype Pool Data

Using haplotype pool data preserves the LD structure and the distribution of singletons, doubletons, etc. found in real-world human haplotypes, and can therefore give a more realistic power analysis. Haplotype pool data can be generated via the spower simulate module, and we also provide pre-generated haplotype pools. However, as of August 2013 there are no publicly available exome-wide haplotype pools with a reasonably large sample size from a single population group that are suitable for power analysis. To illustrate the feature, we provide KIT.gdat, a data set from the 1000 Genomes Project that contains the variants and haplotypes for the KIT gene. Using this data set for actual power analysis is not recommended, because the sample size is limited and the haplotypes come from more than one population in the 1000 Genomes Project. Please contact the developers for assistance if you find a publicly available real-world haplotype pool that you would like to convert to SEQPower input format.
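To make the "singleton, doubleton, etc. distribution" concrete, the sketch below counts how many variant sites in a small haplotype pool carry the alternative allele exactly once, twice, and so on. The 0/1 matrix layout (one row per haplotype, one column per variant site) and the function name `allele_count_spectrum` are assumptions for illustration only; this does not read or reflect the KIT.gdat format.

```python
from collections import Counter
from typing import Dict, Sequence


def allele_count_spectrum(haplotypes: Sequence[Sequence[int]]) -> Dict[int, int]:
    """Map each alternative-allele count to the number of sites having that count.

    Key 1 gives the number of singletons, key 2 the number of doubletons, etc.
    Sites where no haplotype carries the alternative allele are left out.
    """
    # zip(*haplotypes) iterates over columns, i.e. over variant sites.
    counts_per_site = [sum(site) for site in zip(*haplotypes)]
    return dict(Counter(count for count in counts_per_site if count > 0))


# Hypothetical toy pool: 4 haplotypes (rows) by 4 variant sites (columns).
pool = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
spectrum = allele_count_spectrum(pool)
print(spectrum)  # {1: 2, 2: 1}: two singleton sites and one doubleton site
```

Resampling whole rows from such a pool keeps each haplotype intact, which is why the allele-count distribution and the correlation between sites carry over into the simulated samples.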

1) Adam R. Boyko, Scott H. Williamson, Amit R. Indap, Jeremiah D. Degenhardt, Ryan D. Hernandez, Kirk E. Lohmueller, Mark D. Adams, Steffen Schmidt, John J. Sninsky, Shamil R. Sunyaev, Thomas J. White, Rasmus Nielsen, Andrew G. Clark and Carlos D. Bustamante (2008). Assessing the Evolutionary Impact of Amino Acid Mutations in the Human Genome. PLoS Genetics.
2) G. V. Kryukov, A. Shpunt, J. A. Stamatoyannopoulos and S. R. Sunyaev (2009). Power of deep, all-exon resequencing for discovery of human trait genes. Proceedings of the National Academy of Sciences.