Table of Contents

SEQPower R Interface

In addition to making sample size and power calculations, SEQPower can be used to validate and evaluate novel association methods proposed and implemented by researchers. We incorporate the mechanism to load R script from variant association tools (VAT). Under this mechanism, users write up an R function that analyzes data of an association testing unit (e.g., gene) to perform analysis on simulated association data from SEQPower. A growing collection of examples of RTest via VAT are being published online, including example incorporation of the RAssociation package. The documentation below is adapted from the VAT manual.

Format

You should have one main function in the R program named the same as the R script file name. This is the interface function that interacts with the RTest command, taking input parameters from command line and return output in specified format (see below for details) that can be recognized by RTest and be stored in databases. This main function can call any other R objects as long as they are available from R or implemented elsewhere in your R program.

Input parameters

The main function should be defined in the following format

Click to display ⇲

Click to hide ⇱

download-source.R
ScriptName <- function(dataname, args, **kwargs) {
...
}


where the first argument has to be the data object variable name (e.g. dat in the regression.R example, or any other valid names you specify), followed by a few required positional arguments (e.g. phenotype.name in regression.R example which has to be passed from commandline every time the program is executed), and other keyword arguments that have default values (e.g. family in the regression.R example. If not specified from the command line it will use default value “gaussian”). The required and optional arguments can be assigned from the commandline (e.g., the --phenotype.name 'stroke' argument of vtools associate).

Output configuration

The return object of the main R function should be a list with the properties of each element in the list been pre-specified as comment strings at the beginning of the scripts taking the following format

Click to display ⇲

Click to hide ⇱

# BEGINCONF
# [attribute1]
# name=
# type=
# comment=
# [attribute2]
# n=N
# columns=
# name=
# column_name=
# type=
# comment=
# ENDCONF


and the return R list object is

Click to display ⇲

Click to hide ⇱

list(attribute1=..., attribute2=..., ...)


It is important that the return R object matches the descriptions in the configuration area. If the configuration area is not found in the R script, no output will be written to result databases. This is allowed because there are usage cases that does not need any output, e.g., you write an R program to plot some graphical summary of the association testing unit rather than performing association analysis and calculate p-values.

Data structure

Data from vtools associate are passed into the main R function taking a variable name defined by the first argument of the function. For example if the first argument name is dat then you should manipulate the R variable dat in your R program. The dat object contains 3 default attributes and two optional attributes.

Default attributes

Optional attributes

Cautions

This R interfacing mechanism is flexible, and fragile at the same time, because the RTest method of vtools associate will have no control over what are implemented inside the R program. It assumes the R program the users provide is flawless and can result in exactly the same output as specified in the configuration area. If any errors occurs in the program, RTest will not attempt to fix it. It instead will simply flag an association test as “failed”. If you run an genome-wide association scan with your R program via RTest method and noticed all tests failed, its most likely your R program is problematic.

Debug suggestions

Tip

  • All error messages will be written to the project log file for you to look into. Take a look at the log file to figure out why failures occur and fix your R code.
  • A more advance option is to write out one or two groups of data with the R code to test interactively in R to see what's going wrong. The --data_cache N option will output N R scripts per thread with data-set coded in it, to cache folder. The file name will be [Association Group Name].dat.R. For debug purpose you can add --data_cache 1 to the association command and run the command on a small variant table, find the R data file in cache, load it in R and play with the data-set to make sure your R function works, then remove the --data_cache option to actually perform association scans.

Example

[To be updated]