PCD Include

From Bioinformatics.Org Wiki





Proposal

One thing that I am finding would make a strong addition to BioLegato is a PCD feature for inserting or replacing code at run time.

For example, in a menu for running BLAST, it would be nice if the part of the PCD code that lists the databases available on the local system could be read from a file, rather than being hard-coded into the PCD.

The PCD code might look like this:

       var "dbase"
           type        combobox
           label       "Database"
           default     0
           choices
               include $BIRCH/local/dat/BLAST/nt-db.txt
               "User-created file (FASTA format)" "-subject %USERFILE%"


where the referenced file contains the lines that should be inserted at that point in the code, e.g.

               "nt - non-redundant nucleotide"    "-db nt"
               "pdbnt - PDB nucleotide seqs."    "-db pdbnt"
               "vector - Vector seqs."    "-db vector"

The PCD would be parsed as if the original file had been

       var "dbase"
           type        combobox
           label       "Database"
           default     0
           choices
               "nt - non-redundant nucleotide"    "-db nt"
               "pdbnt - PDB nucleotide seqs."    "-db pdbnt"
               "vector - Vector seqs."    "-db vector"
               "User-created file (FASTA format)" "-subject %USERFILE%"

(I am borrowing from the C #include directive.)

This kind of mechanism would be a powerful addition to PCD. I have numerous uses for it. At present, I am simulating the process with Python scripts that process the PCD files when new database files are added or deleted, but there needs to be a cleaner way to do this.

The usefulness of this PCD instruction goes beyond simple lists. Just about any PCD could be included in the input file. This would make it easier to build some very complex menus from fairly simple pieces of code.

Considerations

Should include be indentation agnostic?

    include filename

assumes that the source file is not indented. When it is included, the current indentation level is applied to the included lines.

include filename 

assumes that the source file is indented properly for the context of the insertion point.

Tentative conclusions:

  1. File must be indented correctly for the scope of the insertion site
  2. Still need to decide whether or not include statement must be indented.

Should include be recursive?

Should the included source also allow includes? If so, the PCD parser needs to be able to detect circular dependencies.

  1. Not now. However, adding recursion at a later time won't break existing code or .blmenus files.
  2. If an include is detected in an included file, print an error saying "Recursion not supported in this version."
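
The policy above can be sketched as follows. This is a minimal illustration, not BioLegato's code: the class IncludeExpander, its methods, and the loader callback are hypothetical names, the @include marker is the one this page settles on further down, and quoted paths are ignored to keep the sketch short. Included lines are copied verbatim, so the included file has to carry the right indentation itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical pre-processing sketch; names are illustrative, not BioLegato's. */
final class IncludeExpander {
    private static final String DIRECTIVE = "@include";

    /** Returns the include target, or null if the line is not an include line. */
    static String target(String line) {
        String t = line.trim();
        boolean isInclude = t.startsWith(DIRECTIVE)
                && t.length() > DIRECTIVE.length()
                && Character.isWhitespace(t.charAt(DIRECTIVE.length()));
        return isInclude ? t.substring(DIRECTIVE.length()).trim() : null;
    }

    /**
     * Replaces each include line in the parent with the lines supplied by the
     * loader. An include found inside included text is rejected, as decided above.
     */
    static List<String> expand(List<String> parent, Function<String, List<String>> loader) {
        List<String> out = new ArrayList<>();
        for (String line : parent) {
            String name = target(line);
            if (name == null) {
                out.add(line);
                continue;
            }
            for (String included : loader.apply(name)) {
                if (target(included) != null) {
                    throw new IllegalStateException("Recursion not supported in this version.");
                }
                out.add(included); // copied verbatim, indentation untouched
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> menu = Arrays.asList("choices", "    @include nt-db.txt");
        List<String> db = Arrays.asList("    \"nt - non-redundant nucleotide\"    \"-db nt\"");
        expand(menu, name -> db).forEach(System.out::println);
    }
}
```

Because the nested include is rejected rather than silently skipped, a later version that does support recursion would accept strictly more input, which is why existing menus would keep working.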

File paths

filename

  1. Absolutely no fully-qualified file paths!!! That works against portability.
  2. Probably best if the file path is relative to the directory in which the parent PCD file is found.
  3. If we allow subdirectories in the file path, that could compromise portability if we ever implement PCD on Windows, where the file separator is '\' rather than '/'.
  4. Should the file path allow environment variables? There is actually a strong argument in favor of doing so, because for a chooser or combobox, we might want to take the choices from a file in a different directory. For example, a list of local BLAST databases might be found in some directory in $BIRCH/local, rather than the directory for the parent PCD file. This requires solving the file separator problem above, to allow Windows paths.

Conclusions:

  1. Use the existing method, e.g. BLMain.envreplace("$BL_HOME"), to parse paths.
  2. In most cases, BioLegato uses File.pathSeparator and File.separator to encode paths. The main difference between Linux and Windows handling is that an environment variable must still begin with $, rather than using the %var% convention of Windows.
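
As a rough illustration of these path rules, the sketch below expands $NAME variables and resolves relative paths against the directory of the parent PCD file. It is a stand-in only: the signature of BLMain.envreplace is not shown on this page, so the class IncludePath and its methods expandEnv and resolve are hypothetical, and the environment is passed in as a map so that the behaviour is easy to check.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; a stand-in for BioLegato's own path handling. */
final class IncludePath {
    // $NAME, where NAME is a conventional environment variable identifier.
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces each $NAME with its value from env; unknown names are left as written. */
    static String expandEnv(String raw, Map<String, String> env) {
        Matcher m = VAR.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Expands variables, then resolves a relative result against the parent PCD directory. */
    static Path resolve(String raw, Path parentDir, Map<String, String> env) {
        Path p = Paths.get(expandEnv(raw, env));
        return p.isAbsolute() ? p : parentDir.resolve(p).normalize();
    }

    public static void main(String[] args) {
        // Uses the real process environment; BIRCH may or may not be set.
        System.out.println(resolve("$BIRCH/local/dat/BLAST/nt-db.txt",
                Paths.get("."), System.getenv()));
    }
}
```

With this arrangement a path only becomes absolute through a variable such as $BIRCH, so the menu file itself never contains a fully-qualified path.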

Is include a part of the PCD language, or a preprocessing directive outside of PCD?

This affects both the underlying implementation of how BioLegato processes a PCD file, as well as the syntax of PCD.

If it makes pre-processing easier, we could have some sort of flag, like a hash mark, that could signal doing the inclusion before parsing the PCD. It would not be considered a formal part of the PCD language, but rather a pre-processing directive.

In a way, this might be good, because include wouldn't cause a change in PCD, per se, but rather a change in how PCD is implemented.

Tentative conclusion:

  1. Probably simplest if include was a pre-processing step not part of the PCD language.
  2. Maybe we use some symbol(s) other than # to flag the include

It's probably best to use a symbol other than '#' to indicate an include. An include is fundamentally different from a comment, so from an OO viewpoint it should have a distinct definition. A distinct symbol is also easier to scan for, and it avoids the risk of conflicting with comment detection elsewhere. Granted, C uses #include, but it's possible that this was seen as a mistake in retrospect. I don't know what the C community thinks about this.

For now, let's use '@' as the character for any pre-processing directive, and specifically, @include for an include line.

@include    filename

pcd.exec

Can we come up with a good example that demonstrates the functionality and usefulness of pcd.exec? We also need some precise documentation of how to use this feature.

File naming conventions

Syntax of the include line

Re: quotation - My guess is that we need to support quoting in case a file path component contains (Ugh!) blanks. However, I wouldn't make quoting mandatory.

Re: Spacing - Since the include line is not really a PCD statement anyway, it makes no sense to enforce some sort of indentation rule. I'd say that the path is simply the remainder of the line, following @include and one or more blank spaces.

Implementation: Both optional indentation and optional quotes have been implemented. The syntax of the include line is therefore:

[<whitespace>]@include<blank>[<blank>][quote]path[quote]
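
One way to recognise a line of this form is a single regular expression. The sketch below is an illustration under assumptions, not the parser BioLegato uses: the class IncludeLine and method parsePath are hypothetical, "blank" is taken to mean a space or tab, and "quote" is taken to mean a double quote.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical recogniser for the include-line syntax given above. */
final class IncludeLine {
    // Optional leading whitespace, "@include", one or more blanks, then the
    // path: either wrapped in double quotes or running to the end of the line.
    private static final Pattern INCLUDE =
            Pattern.compile("^\\s*@include[ \\t]+(?:\"([^\"]+)\"|(.*\\S))\\s*$");

    /** Returns the path named on an include line, or empty for any other line. */
    static Optional<String> parsePath(String line) {
        Matcher m = INCLUDE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) != null ? m.group(1) : m.group(2));
    }

    public static void main(String[] args) {
        System.out.println(parsePath("    @include  \"my dir/nt-db.txt\""));
        System.out.println(parsePath("@include $BIRCH/local/dat/BLAST/nt-db.txt"));
        System.out.println(parsePath("    type        combobox"));
    }
}
```

An unquoted path is taken as the remainder of the line with trailing whitespace dropped, so a path containing blanks is also accepted without quotes; the quotes only make the boundaries explicit.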

Implementation

The code for parsing is found in bioLegato/src/BioPCD/parser/src/org/biopcd/parser. Where do we actually implement include? Have a look in BLMain.java.loadPCD.

It looks like PCD (which is generated from pcd.jit) contains the code for actually reading each menu. At line 661 we see:

                   // Open PCD menu file.
                   FileReader infile = new FileReader(path);
                   // Create a new PCD object to store the PCD data read in
                   // from the PCD menu file, and parse the menu file into the
                   // PCD menu object.
                   PCDObject pcdo = loadPCDStream (infile, path.getParentFile(), canvas);

This changes to

                   // Call includePCD to expand the includes in the .blmenu file.
                   File temp1 = PreProcess.includePCD(path);
                   // Open the pre-processed copy instead of the original PCD menu file.
                   //FileReader infile = new FileReader(path);
                   FileReader infile = new FileReader(temp1);
                   // etc.
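
The body of PreProcess.includePCD is not shown on this page. Purely as an assumption about what such a step could look like, the sketch below copies a menu file into a temporary file and splices in each included file along the way. The class name PreProcessSketch is hypothetical, include paths are resolved relative to the parent file only, and quoting, environment variables and the recursion check are left out for brevity.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a file-level include step; not BioLegato's PreProcess class. */
final class PreProcessSketch {
    /** Writes an expanded copy of the menu file to a temporary file and returns it. */
    static File includePCD(File menu) {
        try {
            Path parentDir = menu.getAbsoluteFile().toPath().getParent();
            List<String> out = new ArrayList<>();
            for (String line : Files.readAllLines(menu.toPath(), StandardCharsets.UTF_8)) {
                String t = line.trim();
                if (t.startsWith("@include ") || t.startsWith("@include\t")) {
                    // Splice in the included file, unchanged, in place of the include line.
                    String name = t.substring("@include".length()).trim();
                    out.addAll(Files.readAllLines(parentDir.resolve(name), StandardCharsets.UTF_8));
                } else {
                    out.add(line);
                }
            }
            Path temp = Files.createTempFile("pcd-include-", ".blmenu");
            Files.write(temp, out, StandardCharsets.UTF_8);
            File result = temp.toFile();
            result.deleteOnExit(); // the expanded copy is only needed while parsing
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java PreProcessSketch path/to/menu.blmenu
        System.out.println(includePCD(new File(args[0])));
    }
}
```

Returning a File keeps the calling code almost unchanged, since only the argument to the FileReader constructor differs. The parent directory passed on to the parser should presumably still come from the original path rather than from the temporary file.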

Testing

BIRCHDEV/local/script/bldna.include - calls bioLegato 1.0.6

In principle, it should be possible to create a .blmenus file with nothing but includes, which evaluate to a complete .blmenus file. Can't imagine why anyone would want to do that, but it might be a test that should be passed.

bldna

DONE.

blncbi

DONE for NCBINUC and NCBIPROT.

birchadmin

BLHelper, and Update, Add and Delete menus may be more trouble than they're worth, when it comes to using @include.

One thing that is becoming apparent is that where only a single menu field needs replacing, @include makes things simpler. Where several fields must be substituted, using @include might actually make things harder to understand for the PCD programmer.

blreads

tbl2asn - use @include for choices in qXfield. This may also not be the best idea, since we have to hardwire default values for each of the choosers. As well, there is a documented bug in BioLegato with this menu, which currently has a workaround in the .blmenu code itself.

References

BioPCD manuscript IJCA

Include in other languages

Inclusion vulnerability
