[BiO BB] multinomial distribution and Ig light chains

A.J. Rossini rossini at blindglobe.net
Wed Apr 9 12:25:20 EDT 2003

Maurya Kaut <mkaut at bu.edu> writes:

> I'm attempting to replicate the analysis of immunoglobulin genes as in
> this article:
> Lossos, et al.  The Inference of Antigen Selection on Ig Genes.  The
> Journal of Immunology 165(9): 5122-5126 (2000)
> The article is available here:
> http://www.jimmunol.org/cgi/content/full/165/9/5122
> The Java applet mentioned in the paper is here:
> http://www-stat.stanford.edu/immunoglobin/
> The contact information in the paper no longer appears to be
> valid. Basically, I'd like to understand the multinomial tail

Both Rob and Naras are still at Stanford Stat.

> having trouble putting it all together.  I've written a Perl script
> that calculates expected replacement frequency for Ig light chain
> germline genes with some success, but the numbers I get for P values
> are two or three orders of magnitude off.  

Sounds like an implementation error -- I doubt the distribution is
pathological enough to cause round-off problems on that order of magnitude.
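
One way to hunt for such an implementation error is to check the code
against an identity that has to hold: the count in any single category of
a multinomial is binomial, so a tail probability obtained by summing
multinomial terms should agree with the plain binomial tail. Below is a
minimal sketch in Python, not the method from the paper. The category
probabilities, the mutation count, and all function names are
hypothetical and chosen only for illustration.

```python
from math import comb, exp, lgamma, log


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts`, accumulated in log space."""
    n = sum(counts)
    log_p = lgamma(n + 1)  # log(n!)
    for k, p in zip(counts, probs):
        if k == 0:
            continue  # a zero count contributes a factor of 1
        if p == 0.0:
            return 0.0  # positive count in an impossible category
        log_p += k * log(p) - lgamma(k + 1)
    return exp(log_p)


def compositions(n, parts):
    """Yield every tuple of `parts` non-negative integers summing to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest


def multinomial_upper_tail(k, n, probs, category=0):
    """P(count in `category` >= k) by brute-force enumeration (small n only)."""
    return sum(
        multinomial_pmf(counts, probs)
        for counts in compositions(n, len(probs))
        if counts[category] >= k
    )


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs: four categories (for example CDR replacement,
# CDR silent, FW replacement, FW silent) and 12 observed mutations.
probs = [0.2, 0.1, 0.5, 0.2]
n_mutations = 12
observed = 6

tail_multi = multinomial_upper_tail(observed, n_mutations, probs)
tail_binom = binomial_upper_tail(observed, n_mutations, probs[0])
print(tail_multi, tail_binom)
```

If the two printed numbers differ by orders of magnitude in your own
implementation, the bug is in the enumeration or in the probability
terms, not in floating-point round-off.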

> Firstly, I would just like to know if there is anyone who is
> familiar with this type of statistical work.  Also, I've heard of
> the "S" engine, and its cousin "R", but I'm not quite sure if they
> are applicable here.  Has anyone used them in conjunction with
> Perl/CGI?  Any advice is appreciated.

The S statistical programming language, as implemented by S (not
generally available), S-PLUS (commercially available) and R
(open-source), is a full-featured programming language, not unlike
a functional, whitespace-agnostic version of Python (in a sense).

I would use R for data analysis (it is generally the only thing I use),
and it'll make programming this problem much simpler (assuming that you
both know how to program and have intuition for statistical data


