PDB Access Methods for FirstGlance in Jmol

PDB Data File Access Methods
Employed by FirstGlance in Jmol

Caution: This document has not been updated since an early version of FirstGlance in Jmol. JSmol now uses jmol.php to obtain files from external servers. Contact if you wish to discuss current methods.

Contents

Descriptions of Access Methods
Summary of formats for specifying atomic coordinate data files, and decision tree for selecting an access method.
Apache Rewrite Rules (RCSB)
Apache Rewrite Rules (General)
Possible Access Methods Not Implemented

Descriptions of Access Methods

Each method below is controlled by variables defined in the file config.js. Numbers below correspond to the method numbers in that file. All variable names mentioned below are found in config.js.

Signed applet. All installations may use the signed applet, which can read an atomic coordinate file directly from any HTTP URL. Such files may be in any format readable by Jmol (including PDB format and mmCIF format).
1. If the applet server has a local copy of the entire PDB database, method #2 below (PDB Id codes from PDB database on applet server) will be used.
2. If method #2 is not available, data files for PDB Id codes will be obtained from the path specified in rcsbURLSignedHead, normally from the Protein Data Bank (e.g. at rcsb.org).
3. Atomic coordinate files specified in URLs (absolute or relative) will be obtained directly by the signed Jmol applet. The signed applet can obtain files by either the "http://" or the "ftp://" protocol.
4. The user is required to trust the applet. This could be problematic since the applet is not signed by a verifiable authority (due to the cost and time that would be involved).
5. This method requires no special action by the server administrator.
Unsigned Applet methods:

For security reasons, unsigned applets are prohibited from reading data from any source except the same domain as that from which the applet was served to the client browser. The advantage of using the unsigned applet is that the user will not be requested to trust the applet, which might (justifiably) cause the user to decline to use the applet. Below, the term "applet server" is used to mean the server that serves the FirstGlance application and the Jmol applet to the client browser.
PDB Id codes from PDB database on applet server. In some cases, the server providing FirstGlance and the unsigned applet also itself has the entire PDB database. In this case, any published PDB Id code can be obtained locally.
1. Whether this method is available on the server is specified in the boolean variable localPDBCodesAvailable.
2. PDB Id codes will be obtained from the path specified in localPDBCodesPathHead, which must have the same domain as the applet server. The PDB Id code will have localPDBCodesPathTail (for example ".pdb.gz") added to its end to construct the data file name.
3. The data files will be obtained directly from the applet server.
4. This method requires that the applet server have a copy of the Protein Data Bank database, automatically updated weekly in synchrony with new releases. FirstGlance cannot detect the absence of an expected PDB file. Therefore, failure to keep the database up to date will cause FirstGlance to fail ungracefully, should a recent PDB code be requested that is not in the local database. FirstGlance will never show the molecule, yet will not produce an error message.
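The file name construction described in item 2 can be sketched as follows. This is a minimal illustration and not the code in top.js: the function name buildLocalPdbPath, the object exampleLocalConfig, and the value "/pdb/" are assumptions made for the example. Whether FirstGlance changes the case of the PDB Id code is not stated here, so the code is passed through unchanged.

```javascript
// Hypothetical values; a real installation sets these variables in config.js.
const exampleLocalConfig = {
  localPDBCodesAvailable: true,
  localPDBCodesPathHead: "/pdb/",    // assumed path in the applet server's domain
  localPDBCodesPathTail: ".pdb.gz",  // tail example taken from the text above
};

// Build the data file path for a PDB Id code: head + code + tail.
// Returns null when the local database is flagged as unavailable.
function buildLocalPdbPath(config, pdbId) {
  if (!config.localPDBCodesAvailable) {
    return null;
  }
  return config.localPDBCodesPathHead + pdbId + config.localPDBCodesPathTail;
}

console.log(buildLocalPdbPath(exampleLocalConfig, "1d66")); // /pdb/1d66.pdb.gz
```

Note that the sketch only builds a path. As item 4 explains, nothing on the client side can confirm that the file actually exists on the server.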
Apache rewrite rules for PDB Id codes. An Apache rewrite rule (see details below) enables the unsigned applet to read the file from a path on the same server that serves the applet, yet the file actually comes from a different server. The rule tells the applet server how to obtain the data file from another server.
1. Whether this method is installed on the server is specified in the boolean variable apacheRCSBRewriteRulesAvailable.
2. Data files corresponding to PDB Id codes will be obtained from the path specified in rcsbPathHead, normally from the Protein Data Bank (e.g. at rcsb.org). The PDB Id code will have rcsbPathTail (for example ".pdb.gz") added to its end to construct the data file name.
3. The data files will be obtained indirectly: first the data file is served from the data source server to the applet server; then the applet server serves the data file to the browser applet on the client computer.
4. This method requires that the server administrator install the rewrite rules.
General Apache rewrite rules for arbitrary URLs. The previous rewrite rule is limited to PDB codes, and is coded to use a single server, typically rcsb.org. It will not work for a data file that is not a published PDB entry, since such a file is typically available only from a different server. A general rewrite rule (see details below) enables the applet to obtain a data file from any URL. In order to limit possible abuse of such a wide-open proxy, the rewrite rule will only obtain filenames ending in .pdb, .mmol, or .cif, each optionally followed by .gz. (mmol is a filename extension used by EBI's Probable Quaternary Structure Server.)
1. Whether this method is installed on the server is specified in the boolean variable apacheGeneralRewriteRulesAvailable.
2. Data file URLs beginning with http:// will be obtained from the local rewrite rules path specified in generalPathHead. See config.js and the Technical Gallery for examples.
3. The data files will be obtained indirectly: first the data file is served from the data source server to the applet server; then the applet server serves the data file to the browser applet on the client computer.
4. This method requires that the server administrator install the general rewrite rules.
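As a rough sketch of item 2, the path requested by the unsigned applet could be built by trimming "http://" from the data file URL and prefixing the result with generalPathHead. The function name buildProxiedPath and the value "/pdbproxy/" are assumptions for illustration. The client-side extension check is also an assumption: the text only says that the server-side rule restricts extensions, not that top.js checks them.

```javascript
// Extensions accepted by the general rewrite rule shown later in this document.
// Case-sensitive, as an Apache RewriteRule pattern is by default.
const PROXY_EXTENSIONS = /\.(pdb|mmol|cif)(\.gz)?$/;

// Turn an absolute http:// URL into a path on the applet server that the
// general rewrite rule can proxy. Returns null if the URL cannot be proxied.
function buildProxiedPath(generalPathHead, url) {
  const scheme = "http://";
  if (!url.startsWith(scheme)) {
    return null; // only http:// URLs are handled by this method
  }
  if (!PROXY_EXTENSIONS.test(url)) {
    return null; // the rewrite rule would not match this file name
  }
  return generalPathHead + url.slice(scheme.length);
}

console.log(
  buildProxiedPath("/pdbproxy/", "http://pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol")
); // /pdbproxy/pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol
```

Rejecting unsupported URLs before the request is made would let the application show an error message instead of failing silently, but that is a design suggestion and not documented behavior.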
PDB files bundled with FirstGlance. A few PDB files are bundled with FirstGlance, so they can be obtained from the applet server. These files are used in the Gallery or Technical Gallery. Anyone making their own installation of FirstGlance on their own server could bundle additional PDB files.
1. This method is always available for the limited set of bundled files.
2. Any filename specified without a preceding path (no http://, no slashes) but with a file extension will be assumed to be in the subdirectory specified by the variable localPDBFiles (normally named localPDBFiles). Since no slashes are permitted, this must be a subdirectory of the directory containing the FirstGlance application files. An example of such a filename would be "1d66.pdb" -- it is the ".pdb" in the filename that distinguishes this from a PDB Id code (which would be simply "1d66").
3. Bundled PDB files will be obtained from the subdirectory specified in localPDBFiles, which must have the same domain as the applet server, and must be a subdirectory of the directory containing the FirstGlance application files.
4. The data files will be obtained directly from the applet server.
5. This method requires no special action by the server administrator.
PDB Id codes for testing during development. This method applies only when the FirstGlance application is not on a server, but is on the disk of a PC, typically during development. In this situation, there is no method for the unsigned applet to obtain PDB Id codes from a remote server.
1. A PDB Id code will be assumed to correspond to a file in the subdirectory specified in localTestPDBFiles (normally named "localTestPDBFiles"), which must be a subdirectory of the directory containing the FirstGlance application files.
2. The extension ".pdb" will be added to the PDB Id code to generate the complete filename.
Relative paths for PDB filenames on the applet server. This method requires no variables in config.js.
1. When a data filename is specified as a relative path, it will be assumed to exist on the applet server. A relative path must not begin with "http://", and must contain at least one slash, for example "../pdbfiles/drugs/viagra.pdb".
2. If the data file does not exist on the applet server, FirstGlance will fail ungracefully, simply failing to show the requested molecule, and failing to give any error message.
3. The data file will be obtained directly from the applet server.
4. This method requires no special action by the server administrator.

Summary of formats for specifying atomic coordinate data files, and decision tree for selecting an access method.

The data file formats illustrated below are specified with a URL query parameter "?mol=", as explained in links.

The logic summarized below is coded in JavaScript in the file top.js.

PDB Id code. Example: "1d66".
- If the applet is signed:
  - If the applet server has a local copy of the PDB database (boolean localPDBCodesAvailable), the PDB file is obtained using localPDBCodesPathHead and localPDBCodesPathTail.
  - Otherwise, the PDB Id is obtained from the URL prefix specified in rcsbURLSignedHead.
- If the applet is unsigned:
  - If the FirstGlance application is coming from a server:
    - If the applet server has a local copy of the PDB database (boolean localPDBCodesAvailable), the PDB file is obtained using localPDBCodesPathHead and localPDBCodesPathTail.
    - Otherwise, if Apache rewrite rules for PDB Id codes have been installed (boolean apacheRCSBRewriteRulesAvailable), then the PDB file is obtained using rcsbPathHead and rcsbPathTail.
    - Otherwise there is an error alert message stating that FirstGlance has no method to obtain a PDB Id code.
  - If the FirstGlance application is not being served, but is coming from a PC disk (because it is under development), the PDB file will be obtained from the subdirectory specified in localTestPDBFiles. The PDB Id code will have ".pdb" added to construct the filename.
Absolute HTTP URL. Example "http://pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol"
- If the applet is signed, the PDB file is obtained with the URL directly by the applet.
- If the applet is unsigned:
  - If general Apache rewrite rules are installed (boolean apacheGeneralRewriteRulesAvailable) the PDB file is obtained by prefixing the URL ("http://" trimmed off) with generalPathHead.
  - Otherwise there is an error alert message stating that FirstGlance has no method to obtain the specified PDB file.
Absolute FTP URL. Example "ftp://www.bio.umass.edu/pub/shareware/rasmol/fourmols/bilagram.pdb"
- If the applet is signed, the PDB file is obtained with the URL directly by the applet.
- If the applet is unsigned, there is an error alert message stating that FirstGlance has no method to obtain the specified PDB file.
Relative URL. Example "../pdbfiles/drugs/viagra.pdb"
- If the applet is signed, the PDB file is obtained directly by the applet using the unmodified relative URL.
- If the applet is unsigned:
  - If FirstGlance is coming from a server, the PDB file is obtained directly by the applet using the unmodified relative URL.
  - If the FirstGlance application is coming from a PC disk (because it is under development), the user is asked to approve use of the signed applet. If declined, there is an error alert message stating that FirstGlance has no method to obtain the specified PDB file.
Simple filename (not a valid PDB Id code). Examples: "1d66.pdb", "1k28.mmol", "viagra", "1k28_chain_a".
- Regardless of whether the applet is signed or unsigned, and regardless of whether the application is served, the applet attempts to obtain the PDB file from the subdirectory specified by localPDBFiles.
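The decision tree above can be condensed into a short sketch. This is not the code in top.js: the names classifyMol, resolveMol, exampleCfg, and the env object are hypothetical, and all values in exampleCfg are placeholders. Three details are assumptions, marked in the comments: the test for a valid PDB Id code, the "/" used to join a subdirectory and a file name, and the absence of a suffix after rcsbURLSignedHead. A return value of null stands for the cases where the text describes an error alert.

```javascript
// Assumed check: a classic PDB Id code is a digit followed by three alphanumerics.
const PDB_ID = /^[0-9][A-Za-z0-9]{3}$/;

function classifyMol(mol) {
  if (mol.startsWith("http://")) return "http";
  if (mol.startsWith("ftp://")) return "ftp";
  if (mol.includes("/")) return "relative";
  return PDB_ID.test(mol) ? "pdbId" : "filename";
}

// env: { signed, served } booleans. cfg: the config.js variables named in the text.
function resolveMol(mol, env, cfg) {
  const localPath = cfg.localPDBCodesPathHead + mol + cfg.localPDBCodesPathTail;
  switch (classifyMol(mol)) {
    case "pdbId":
      if (env.signed) {
        // Assumption: nothing is appended after rcsbURLSignedHead + code.
        return cfg.localPDBCodesAvailable ? localPath : cfg.rcsbURLSignedHead + mol;
      }
      if (!env.served) return cfg.localTestPDBFiles + "/" + mol + ".pdb";
      if (cfg.localPDBCodesAvailable) return localPath;
      if (cfg.apacheRCSBRewriteRulesAvailable) {
        return cfg.rcsbPathHead + mol + cfg.rcsbPathTail;
      }
      return null;
    case "http":
      if (env.signed) return mol;
      return cfg.apacheGeneralRewriteRulesAvailable
        ? cfg.generalPathHead + mol.slice("http://".length)
        : null;
    case "ftp":
      return env.signed ? mol : null;
    case "relative":
      // Unsigned and not served: the user is asked to approve the signed
      // applet; that prompt is not modeled, so the case is reported as null.
      return env.signed || env.served ? mol : null;
    default:
      // Simple filename: always looked up in the bundled-files subdirectory.
      return cfg.localPDBFiles + "/" + mol;
  }
}

// Placeholder configuration for illustration only.
const exampleCfg = {
  localPDBCodesAvailable: false,
  localPDBCodesPathHead: "/pdb/",
  localPDBCodesPathTail: ".pdb.gz",
  rcsbURLSignedHead: "http://data.example.org/pdb/",
  apacheRCSBRewriteRulesAvailable: true,
  rcsbPathHead: "/rcsb/",
  rcsbPathTail: ".pdb.gz",
  apacheGeneralRewriteRulesAvailable: true,
  generalPathHead: "/pdbproxy/",
  localPDBFiles: "localPDBFiles",
  localTestPDBFiles: "localTestPDBFiles",
};

console.log(resolveMol("1d66", { signed: false, served: true }, exampleCfg));
// /rcsb/1d66.pdb.gz
console.log(resolveMol("1d66.pdb", { signed: false, served: true }, exampleCfg));
// localPDBFiles/1d66.pdb
```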

Apache Rewrite Rules (RCSB)
The following instructions are for the server administrator.
The following instructions are quoted from an email message sent by Miguel Howard (principal developer of Jmol) to the jmol-users list (original message, Sept. 15, 2005).

Would you like to have the entire PDB accessible from your web server ?
  ... without having it take up any disk space?
  ... and without ever having to update it?
 
 Read on ...
 
 
 Detail
 ======
 One can configure the Apache web server so that it will 'proxy' the entire
 PDB from the Protein Data Bank. From the user's perspective, it seems as
 though the entire PDB is stored on the web server. In fact, the web server
 dynamically retrieves the file from the PDB and returns it to the client.
 
 This is particularly helpful when working with unsigned applets which can
 only retrieve data from the web server from which they originated.
 
 Apache has 'rewrite rules' that use regular expression pattern matching to
 manipulate URLs. The combination of proxying + rewrite rules is very
 powerful.
 
 I did all of this on Apache 2.0
 
 On my web server I wanted to create a virtual directory in the root of the
 web server called '/pdb'. In that virtual directory I have all of the PDB
 files in both .pdb and .cif format, both uncompressed and .gz compressed.
 This gives me:
 
   http://my.web.server/pdb/1CRN.pdb
   http://my.web.server/pdb/1A00.pdb.gz
   http://my.web.server/pdb/1D66.cif
   http://my.web.server/pdb/1D68.cif.gz
 
 Of course this is a virtual directory with virtual files ... none of it
 really exists.
 
 You can call it whatever you want, and put it at wherever you want in the
 directory tree. In reality, there is no directory tree ... it is just a
 text string that gets manipulated.
 
 From the perspective of the client, it looks and behaves like a URL. And
 the JmolApplet is a client of the web server.
 
 
 So, here is the trick ...
 
  1. Find your httpd.conf file. On Fedora Linux it is in
       /etc/httpd/conf/httpd.conf
 
  2. Locate the following lines:
       LoadModule rewrite_module modules/mod_rewrite.so
       LoadModule proxy_module modules/mod_proxy.so
       LoadModule proxy_http_module modules/mod_proxy_http.so
 
 These lines basically ensure that the proxy code and the rewrite engine
 are loaded into apache.
 
  3. We are going to add some rewrite rules. You will need
     root (administrator) access rights. You can put
     them in httpd.conf if you want. However, it would be
     better to put this in a new file that we will create:
       /etc/httpd/conf.d/pdbproxy.conf
 
 (If you do not understand this, just put it in the bottom of your
 httpd.conf)
 
   4. The magic lines are:
 
 --BEGIN--
 
 <IfModule mod_rewrite.c>
 # turn it on
 RewriteEngine on
 # log it here if you want to see what is happening
 RewriteLog "/tmp/rewrite.log"
 # log everything
 RewriteLogLevel 5
 
 #note that these backslash characters allow
 #you to continue onto more than one line
 
 RewriteRule ^/pdb/(.*)\.pdb$ \
 http://www.rcsb.org/pdb/cgi/export.cgi/\
 $1.pdb?format=PDB&pdbId=$1&compression=None [P]
 
 RewriteRule ^/pdb/(.*)\.pdb.gz$ \
 http://www.rcsb.org/pdb/cgi/export.cgi/\
 $1.pdb.gz?format=PDB&pdbId=$1&compression=gz [P]
 
 RewriteRule ^/pdb/(.*)\.cif$ \
 http://www.rcsb.org/pdb/cgi/export.cgi/\
 $1.cif?format=mmCIF&pdbId=$1&compression=None [P]
 
 RewriteRule ^/pdb/(.*)\.cif.gz$ \
 http://www.rcsb.org/pdb/cgi/export.cgi/\
 $1.cif.gz?format=mmCIF&pdbId=$1&compression=gz [P]
 </IfModule>
 
 --END--
 
 There are 3 parts to these:
 
 ^/pdb/(.*)\.pdb.gz$
 
   This essentially says match things that start with
   '/pdb/' and end with '.pdb.gz' and remember the stuff
   in the middle
 
 http://www.rcsb.org/pdb/cgi/export.cgi/\
 $1.pdb.gz?format=PDB&pdbId=$1&compression=gz
 
   This is the pattern of the new URL, where the $1 is
   replaced with the part that you remembered from
   the previous step
 
 [P]
 
   This says to proxy it, rather than tell the browser
   to redirect. So the server will fetch the contents
   and stream it back to the client rather than telling
   the client where to go and get it.
 
  5. Test your config change
       apachectl -t
 
  6. restart your web server
       apachectl restart
 
  7. open a web browser
 
  8. type in:
     http://your.web.server/pdb/1CRN.pdb.gz
 
 It should pop a dialog box and ask you what you want to do with the file.
 If that works, then you are in good shape.
 
  9. you can look in /tmp/rewrite.log to see what happened.
 
 That is all there is to it!
 
 
 Random thoughts:
 
 * This solution is ideal for sites deploying the JmolApplet. Unsigned
 applets can only retrieve data from the same domain name as the web server
 from which they were launched. So an Unsigned applet cannot fetch data
 from rcsb.org. However, this makes it look like the data is coming from
 the PDB, so the JmolApplet can fetch arbitrary PDB files.
 
 * This is *much* better than writing your own CGI ... because you do not
 have to write (+debug+test+maintain+port) any code. And because it is part
 of Apache it is solid and scalable.
 
 * One should *always* fetch the .gz files, either .pdb.gz or .cif.gz. The
 files are 1/4 of the size and load much faster.
 
 * The PDB has told me that the URL above is stable and should be used as
 their 'standard api' to access data across the web.
 
 * This same technique could, of course, be used to fetch data from other
 chemical databases.
 
 * you will need root (administrator) access in order to configure your web
 server. And don't forget to restart apache to force it to reload the
 modified configuration file:
 
    apachectl -t
    apachectl restart
 
 * A final note regarding ...
 
 Apache includes a caching module to prevent frequently used URLs from
 being fetched repeatedly. This is ideal for the fetching data from the PDB
 because once someone pulls down a file it will automatically be stored
 locally. One can control the amount of time that a file lives in the local
 cache.
 
 Unfortunately, I have not been able to get the cache to function properly
 with the .cgi provided by the PDB. The issue is complicated because of the
 HTML headers returned by their cgi (and most other cgis).
 
 Miguel
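
To make the mapping performed by the four rules in the quoted message concrete, the following sketch reproduces the pattern matching in JavaScript. It only builds the target URL as a string and fetches nothing. The names rcsbRules and proxyTarget are hypothetical, the patterns are copied as written in the message, and the RCSB URL dates from 2005, so it should not be assumed to be served today.

```javascript
const EXPORT_CGI = "http://www.rcsb.org/pdb/cgi/export.cgi/";

// One entry per RewriteRule in the message above, in the same order.
const rcsbRules = [
  { pattern: /^\/pdb\/(.*)\.pdb$/, ext: ".pdb", format: "PDB", compression: "None" },
  { pattern: /^\/pdb\/(.*)\.pdb.gz$/, ext: ".pdb.gz", format: "PDB", compression: "gz" },
  { pattern: /^\/pdb\/(.*)\.cif$/, ext: ".cif", format: "mmCIF", compression: "None" },
  { pattern: /^\/pdb\/(.*)\.cif.gz$/, ext: ".cif.gz", format: "mmCIF", compression: "gz" },
];

// Return the URL that the first matching rule would proxy to, or null if no
// rule matches the requested path.
function proxyTarget(path) {
  for (const rule of rcsbRules) {
    const match = rule.pattern.exec(path);
    if (match) {
      const id = match[1]; // the part "remembered" by (.*), i.e. $1
      return (
        EXPORT_CGI + id + rule.ext +
        "?format=" + rule.format +
        "&pdbId=" + id +
        "&compression=" + rule.compression
      );
    }
  }
  return null;
}

console.log(proxyTarget("/pdb/1CRN.pdb.gz"));
// http://www.rcsb.org/pdb/cgi/export.cgi/1CRN.pdb.gz?format=PDB&pdbId=1CRN&compression=gz
```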

Apache Rewrite Rules (General)

This generic pdb proxy rule enables fetching URLs from other servers ... not just the rcsb.org servers. The rule below will fetch only files with the following file extensions from arbitrary web servers:

 .pdb
 .mmol
 .cif
 .pdb.gz
 .mmol.gz
 .cif.gz

Here is the rewrite rule:

RewriteRule ^/pdbproxy/(.*)\.((pdb|mmol|cif)(\.gz)?) \
http://$1.$2 [P]

To use this rewrite rule to fetch the file at
http://pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol
remove the leading "http://" and add the remainder of the data file URL to the proxy URL as follows:


http://my.web.server/pdbproxy/pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol

Note that you need not name the proxy URL "pdbproxy" -- you can specify any name in the rule.
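
The effect of the general rule can be checked with a small JavaScript sketch that applies the same pattern and substitution to a request path. The name generalProxyTarget is hypothetical, and JavaScript's regular expressions are used here as a stand-in for Apache's, which behave the same way for this pattern.

```javascript
// Same pattern as the RewriteRule above; note that it has no trailing "$".
const generalRule = /^\/pdbproxy\/(.*)\.((pdb|mmol|cif)(\.gz)?)/;

// Return the URL the rule would proxy to (http://$1.$2), or null if the
// request path does not match.
function generalProxyTarget(path) {
  const match = generalRule.exec(path);
  return match ? "http://" + match[1] + "." + match[2] : null;
}

console.log(generalProxyTarget("/pdbproxy/pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol"));
// http://pqs.ebi.ac.uk/pqs-doc/macmol/1k28.mmol
console.log(generalProxyTarget("/pdbproxy/example.org/index.html"));
// null
```

Because the pattern as written is not anchored at the end, a path such as /pdbproxy/example.org/1abc.pdb/extra also matches and is truncated to http://example.org/1abc.pdb. Appending "$" to the pattern would reject such paths outright, if that stricter behavior is wanted.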

Possible Data Access Methods Not Implemented

The following data file access methods were not implemented in the present version of FirstGlance in Jmol.

CGI program applet proxy. Such a program was implemented by Miguel Howard some time ago, but has not been used, to our knowledge, in released applications. The Apache rewrite rule is more efficient. Use of the CGI proxy would require that CGI execution permission be granted by the server administrator in the same domain as the applet server.
Remote CGI program writing PDB file into JavaScript variable. This method was devised by Bob Hanson (see his demonstration). It was tested by Eric Martz, who found it to have serious limitations in the size of the PDB data file it could accommodate. The severity of the limitations appeared to be Linux > Mac OS X > Windows, but even in Windows some realistic PDB files could not be accommodated. For further details, please contact the author. Nevertheless, the advantage of this method is that it permits the unsigned applet to access data from any server, without any intervention by the applet server administrator (no rewrite rules, no CGI proxy), provided a suitable CGI program is installed on a suitable server (which can be a different server than the applet server). If a modification could be devised that removed the data file size limitations, it would be very useful.
