PCD Include

From Bioinformatics.Org Wiki


Proposal

A strong addition to BioLegato would be to extend PCD with a feature for inserting or replacing code at run time.

For example, in a menu for running BLAST, it would be nice if the PCD could specify that the part of the code that lists the databases available on the local system be read from a file, rather than being hard-coded into the PCD.

The PCD code might look like this:

       var "dbase"
           type        combobox
           label       "Database"
           default     0
           choices
               include $BIRCH/local/dat/BLAST/nt-db.txt
               "User-created file (FASTA format)" "-subject %USERFILE%"


where the referenced file contains the lines that should be inserted at that point in the code, e.g.

               "nt - non-redundant nucleotide"    "-db nt"
               "pdbnt - PDB nucleotide seqs."    "-db pdbnt"
               "vector - Vector seqs."    "-db vector"

The PCD would be parsed as if the original file had been

       var "dbase"
           type        combobox
           label       "Database"
           default     0
           choices
               "nt - non-redundant nucleotide"    "-db nt"
               "pdbnt - PDB nucleotide seqs."    "-db pdbnt"
               "vector - Vector seqs."    "-db vector"
               "User-created file (FASTA format)" "-subject %USERFILE%"

(I am borrowing from the C #include directive.)

This kind of mechanism would be a powerful addition to PCD. I have numerous uses for it. At present, I am simulating the process with Python scripts that process the PCD files when new database files are added or deleted, but there needs to be a cleaner way to do this.
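To make the intended behaviour concrete, here is a minimal Python sketch of the expansion step. It is not the BioLegato implementation; the function name expand_includes and the read_file callback are hypothetical and exist only for illustration. It assumes an include line consists of the word include followed by a path, and that the included lines are inserted unchanged.

```python
from typing import Callable, Iterable, List


def expand_includes(lines: Iterable[str],
                    read_file: Callable[[str], List[str]]) -> List[str]:
    """Return the PCD lines with each 'include <path>' line replaced
    by the lines of the named file (hypothetical helper, one level only)."""
    expanded = []
    for line in lines:
        parts = line.split(None, 1)  # split off the first word
        if len(parts) == 2 and parts[0] == "include":
            # Insert the file's lines in place of the include line.
            expanded.extend(read_file(parts[1].strip()))
        else:
            expanded.append(line)
    return expanded
```

Passing the file reader as a callback keeps the sketch independent of where the included files actually live, which is a separate question (see File paths).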

The usefulness of this PCD instruction goes beyond simple lists. Just about any PCD code could be pulled in from an included file. This would make it easier to build some very complex menus from fairly simple pieces of code.

Considerations

Should include be indentation agnostic?

    include filename

assumes that the source file is not indented. When included, the current indentation level will be applied to the included lines.

include filename 

assumes that the source file is indented properly for the context of the insertion point.

Tentative conclusions:

  1. File must be indented correctly for the scope of the insertion site
  2. We still need to decide whether or not the include statement itself must be indented.
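The two indentation policies discussed above can be compared with a short Python sketch. The function names are hypothetical. insert_verbatim corresponds to tentative conclusion 1 (the included file is already indented for its insertion site), while insert_reindented shows the alternative in which the indentation of the include line is applied to every included line.

```python
def leading_whitespace(line):
    """Return the run of whitespace at the start of a line."""
    return line[: len(line) - len(line.lstrip())]


def insert_verbatim(included, include_line):
    """Policy 1: trust the included file's own indentation."""
    return list(included)


def insert_reindented(included, include_line):
    """Policy 2: prefix each non-blank included line with the
    indentation found on the include line itself."""
    prefix = leading_whitespace(include_line)
    return [prefix + line if line.strip() else line for line in included]
```

Under the first policy the indentation of the include line carries no meaning, which is why the second tentative conclusion is still open.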

Should include be recursive?

Should the included source also allow includes? If so, the PCD parser needs to be able to detect circular dependencies.

  1. Not now. However, if recursion is added at a later time, that won't break existing code or .blmenus files
  2. If an include is detected in an included file, print an error saying "Recursion not supported in this version."
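The check in point 2 could look something like the following Python sketch. The names IncludeError and check_no_nested_include are hypothetical, and the sketch assumes that an include line is any line whose first word is include.

```python
class IncludeError(Exception):
    """Raised when an included file itself contains an include line."""


def check_no_nested_include(included_lines, source_name):
    """Return the lines unchanged, or raise if any of them is an include."""
    for number, line in enumerate(included_lines, start=1):
        words = line.split(None, 1)
        if words and words[0] == "include":
            raise IncludeError(
                f"{source_name}:{number}: "
                "Recursion not supported in this version."
            )
    return list(included_lines)
```

Because nested includes are rejected outright, no circular-dependency detection is needed in this version.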

File paths

filename

  1. Absolutely no fully-qualified file paths!!! That works against portability.
  2. Probably best if the file path is relative to the directory in which the parent PCD file is found.
  3. If we allow subdirectories in the file path, that could compromise portability if we ever implement PCD on Windows, since the file separator there is '\' rather than '/'.
  4. Should the file path allow environment variables? There is actually a strong argument in favor of doing so, because for a chooser or combobox, we might want to take the choices from a file in a different directory. For example, a list of local BLAST databases might be found in some directory in $BIRCH/local, rather than the directory for the parent PCD file. This requires solving the file separator problem above, to allow Windows paths.

Conclusions:

  1. Use an existing method, e.g. BLMain.envreplace("$BL_HOME"), to parse paths.
  2. In most cases, BioLegato uses File.pathSeparator and File.separator to encode paths. The main difference in handling Linux vs. Windows is that you must still begin an environment variable with $, rather than using the %var% convention of Windows.
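The path rules above can be illustrated with a Python sketch. This does not describe how BLMain.envreplace works; resolve_include_path is a hypothetical function that only mirrors the rules stated here: literal absolute paths are rejected, variables are written as $NAME on every platform, '/' is converted to the platform separator, and a path that is still relative after expansion is taken relative to the parent PCD file's directory.

```python
import os
import re

_VARIABLE = re.compile(r"\$(\w+)")  # $NAME, also on Windows


def resolve_include_path(raw, parent_dir, env=None):
    """Turn the path written on an include line into a usable file path."""
    env = os.environ if env is None else env
    if os.path.isabs(raw):
        raise ValueError("fully-qualified include paths are not allowed: " + raw)

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError("undefined environment variable: " + name)
        return env[name]

    expanded = _VARIABLE.sub(substitute, raw)
    # normpath converts '/' to the platform separator on Windows.
    expanded = os.path.normpath(expanded)
    if os.path.isabs(expanded):
        return expanded  # the variable supplied an absolute prefix
    return os.path.join(parent_dir, expanded)
```

For example, with BIRCH set to an installation directory, $BIRCH/local/dat/BLAST/nt-db.txt resolves inside that directory, while a bare nt-db.txt resolves next to the parent PCD file.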

Is include a part of the PCD language, or a preprocessing directive outside of PCD?

This affects both the underlying implementation of how BioLegato processes a PCD file and the syntax of PCD.

If it makes pre-processing easier, we could have some sort of flag, like a hash mark, that could signal doing the inclusion before parsing the PCD. For example

#include    filename

would not be considered a formal part of the PCD language, but rather a pre-processing directive.

In a way, this might be good, because include wouldn't cause a change in PCD, per se, but rather a change in how PCD is implemented.

Tentative conclusion:

  1. Probably simplest if include were a pre-processing step, not part of the PCD language.
  2. Maybe we use some symbol(s) other than # to flag the include.

pcd.exec

Can we come up with a good example that demonstrates the functionality and usefulness of pcd.exec? We also need some precise documentation of how to use this feature.

File naming conventions

Syntax of the include line

Re: quotation - My guess is that we need to support quoting in case there are file path components containing (Ugh!) blanks. However, I wouldn't make quoting mandatory.

Re: Spacing - Since the include line is not really a PCD statement anyway, it makes no sense to enforce some sort of indentation rule. I'd say that the path is simply the remainder of the line, following include and 1 or more blank spaces.

So far, that would make the syntax of the include line

[<whitespace>]include<blank>[<blank>][quote]path[quote]
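As a check on the syntax so far, here is a Python sketch of a recognizer for the include line. The names INCLUDE_LINE and parse_include are hypothetical. It assumes double quotes as the quote character and treats an unquoted path as the remainder of the line with trailing whitespace removed.

```python
import re

# Optional leading whitespace, the word include, one or more blanks,
# then either a double-quoted path or the rest of the line.
INCLUDE_LINE = re.compile(r'^\s*include[ \t]+(?:"([^"]+)"|(\S.*?))\s*$')


def parse_include(line):
    """Return the path named on an include line, or None if the line
    is not an include line. An unbalanced quote is not treated
    specially: it is returned as part of an unquoted path."""
    match = INCLUDE_LINE.match(line)
    if match is None:
        return None
    quoted, bare = match.groups()
    return quoted if quoted is not None else bare
```

Quoting is only needed to make a path with blanks easier to read, or to protect trailing blanks, since an unquoted path already runs to the end of the line.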

Testing

BIRCHDEV/local/script/bldna.include - calls bioLegato 1.0.5

bldna

blncbi

birchadmin

References

BioPCD manuscript IJCA

Include in other languages

Inclusion vulnerability
