From Bioinformatics.Org Wiki
One feature that would make a strong addition to BioLegato is an extension to PCD for inserting or replacing code at run time.
For example, in a menu for running BLAST, it would be nice if the PCD could specify that the part of the code that lists the databases available on the local system be read from a file, rather than being hard-coded into the PCD.
For example, the PCD code might look like this:
var "dbase"
    type combobox
    label "Database"
    default 0
    choices
        include $BIRCH/local/dat/BLAST/nt-db.txt
        "User-created file (FASTA format)" "-subject %USERFILE%"
where the referenced file contains the lines that should be inserted at that point in the code, e.g.
"nt - non-redundant nucleotide" "-db nt"
"pdbnt - PDB nucleotide seqs." "-db pdbnt"
"vector - Vector seqs." "-db vector"
The PCD would be parsed as if the original file had been
var "dbase"
    type combobox
    label "Database"
    default 0
    choices
        "nt - non-redundant nucleotide" "-db nt"
        "pdbnt - PDB nucleotide seqs." "-db pdbnt"
        "vector - Vector seqs." "-db vector"
        "User-created file (FASTA format)" "-subject %USERFILE%"
(I am borrowing from the C #include directive.)
This kind of mechanism would be a powerful addition to PCD. I have numerous uses for it. At present, I am simulating the process with Python scripts that process the PCD files when new database files are added or deleted, but there needs to be a cleaner way to do this.
The usefulness of this PCD instruction goes beyond simple lists. Just about any PCD could be included in the input file. This would make it easier to build some very complex menus from fairly simple pieces of code.
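As a rough illustration of the substitution described above, here is a minimal Python sketch of the kind of stand-alone script that could expand include lines before BioLegato reads the menu. It is not the BioLegato implementation. The function names, the regular expression, and the idea of passing in a `read_file` callable are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed directive form: optional indentation, the word "include", then a path.
INCLUDE_LINE = re.compile(r"^\s*include\s+(?P<path>\S.*?)\s*$")


def expand_includes(pcd_text, read_file):
    """Return pcd_text with every include line replaced by the named file's lines.

    read_file is any callable that maps a path string to that file's text.
    """
    out = []
    for line in pcd_text.splitlines():
        match = INCLUDE_LINE.match(line)
        if match:
            # Splice the included lines in place of the include line itself.
            out.extend(read_file(match.group("path")).splitlines())
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def read_from_disk(path):
    """Default reader (hypothetical helper): load the include file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8")
```

Calling `expand_includes(menu_text, read_from_disk)` would give the text that the PCD parser then sees. Indentation, environment variables and nested includes are deliberately ignored here, since those are the open questions discussed below.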
Should include be indentation agnostic?
Agnostic: include assumes that the source file is not indented. When included, the current indentation level will be applied to the included lines.
Not agnostic: include assumes that the source file is indented properly for the context of the insertion point.
- File must be indented correctly for the scope of the insertion site
- Still need to decide whether or not the include statement itself must be indented.
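The difference between the two indentation options can be sketched in a few lines of Python. This is only an illustration. The helper names `indent_of` and `splice` and the `agnostic` switch are hypothetical, not part of BioLegato.

```python
def indent_of(line):
    """Return the leading whitespace of a line."""
    return line[: len(line) - len(line.lstrip())]


def splice(include_line, included_lines, agnostic):
    """Return the lines that replace include_line.

    agnostic=True  -> prefix each non-blank included line with the include
                      line's own indentation
    agnostic=False -> insert the included lines exactly as written, so the
                      include file must already be indented for its scope
    """
    if not agnostic:
        return list(included_lines)
    prefix = indent_of(include_line)
    return [prefix + line if line.strip() else line for line in included_lines]
```

With `agnostic=False`, which matches the decision that the file must be indented correctly for the insertion site, the pre-processor does no work beyond copying lines.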
Should include be recursive?
Should the included source also allow includes? If so, the PCD parser needs to be able to detect circular dependencies.
- Not now. However, if recursion is added at a later time, that won't break existing code or .blmenu files.
- If an include is detected in an included file, print an error saying "Recursion not supported in this version."
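A check along the following lines would be enough to enforce the "no recursion for now" rule. It is a sketch under assumptions: the `IncludeError` class, the function name, and the pattern used to recognise an include line are all hypothetical. Only the error message is taken from the notes above.

```python
import re

# Assumed form of an include line: optional indentation, "include", then a path.
INCLUDE_LINE = re.compile(r"^\s*include\s+\S")


class IncludeError(Exception):
    """Raised when an include file cannot be used (hypothetical error type)."""


def check_no_nested_include(included_text, source_name):
    """Raise IncludeError if the included text itself contains an include line."""
    for number, line in enumerate(included_text.splitlines(), start=1):
        if INCLUDE_LINE.match(line):
            raise IncludeError(
                f"{source_name}, line {number}: Recursion not supported in this version."
            )
    return included_text
```

Because nested includes are rejected outright, no circular-dependency detection is needed in this version.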
File paths
- Absolutely no fully qualified file paths! They work against portability.
- Probably best if the file path is relative to the directory in which the parent PCD file is found.
- If we allow subdirectories in the file path, that could compromise the portability if we ever implement PCD on Windows, since file separators are '\' rather than '/'.
- Should the file path allow environment variables? There is actually a strong argument in favor of doing so, because for a chooser or combobox, we might want to take the choices from a file in a different directory. For example, a list of local BLAST databases might be found in some directory in $BIRCH/local, rather than the directory for the parent PCD file. This requires solving the file separator problem above, to allow Windows paths.
- Use an existing method, e.g. BLMain.envreplace("$BL_HOME"), to parse paths.
- In most cases, BioLegato uses File.pathSeparator and File.separator to encode paths. The main difference in handling Linux vs. Windows is that you still must begin an environment variable with $, rather than using the %var% convention of Windows.
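The path rules above (no literal absolute paths, paths relative to the parent PCD file, $-style environment variables on every platform) could be combined roughly as follows. This Python sketch only illustrates the rules. It does not reproduce BLMain.envreplace, and the function name and the `env` parameter are assumptions.

```python
import os
from pathlib import Path
from string import Template


def resolve_include_path(raw_path, parent_pcd_file, env=None):
    """Turn the path written on an include line into a filesystem path.

    - a path written as absolute ("/x", "\\x" or "C:...") is rejected
    - $NAME variables are expanded from env (defaults to os.environ);
      unknown variables are left untouched
    - anything still relative is taken relative to the parent PCD file's
      directory; '/' is the separator written in the PCD file
    """
    if raw_path.startswith(("/", "\\")) or raw_path[1:2] == ":":
        raise ValueError(f"fully qualified paths are not allowed: {raw_path}")
    mapping = os.environ if env is None else env
    expanded = Template(raw_path).safe_substitute(mapping)
    # pathlib keeps `expanded` as-is when it is absolute after expansion,
    # which is how a file under $BIRCH/local can live outside the menu directory.
    return Path(parent_pcd_file).parent / expanded
```

For example, with BIRCH set to a local installation directory, `$BIRCH/local/dat/BLAST/nt-db.txt` resolves inside that installation, while a bare `blastdb_n.txt` resolves next to the menu file.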
Is include a part of the PCD language, or a preprocessing directive outside of PCD?
This affects both the underlying implementation of how BioLegato processes a PCD file, as well as the syntax of PCD.
If it makes pre-processing easier, we could have some sort of flag, like a hash mark, that signals that the inclusion should be done before the PCD is parsed. A line flagged in this way would not be considered a formal part of the PCD language, but rather a pre-processing directive.
In a way, this might be good, because include wouldn't cause a change in PCD, per se, but rather a change in how PCD is implemented.
- Probably simplest if include were a pre-processing step, not part of the PCD language.
- Maybe we use some symbol(s) other than # to flag the include
Can we come up with a good example that demonstrates the functionality and usefulness of pcd.exec? We also need some precise documentation of how to use this feature.
File naming conventions
- There is no reason I can think of to require include to enforce any file-naming conventions on the include file.
- However, I propose that included files should probably have a .pcd extension (or .blinclude?). This distinguishes them from .blmenu files. However, we need to test whether BioLegato allows the .pcd extension, or whether that is reserved.
- In fact, maybe the one thing that might be enforced is that include files can NOT have the .blmenu extension!
- PCD.getCurrentPWD gives the directory from which the current menu is being read. If we stipulate that the include file has to be in the same directory as the .blmenu file, this could simplify things. We could potentially also require that the two files have the same basename but different extensions, e.g. x.blmenu and x.pcd.
- Alternatively, do we want to implement environment variables as part of the path to the include file?
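The two naming ideas above, forbidding the .blmenu extension and pairing x.blmenu with x.pcd, are simple enough to sketch. The function names are hypothetical, and the .pcd default is only the extension proposed above, not something BioLegato is known to accept.

```python
from pathlib import PurePath


def is_allowed_include_name(path):
    """Accept any include file name except one ending in .blmenu."""
    return PurePath(path).suffix.lower() != ".blmenu"


def paired_include_for(menu_path, extension=".pcd"):
    """Return the same-basename include file for a menu, e.g. x.blmenu -> x.pcd."""
    return PurePath(menu_path).with_suffix(extension)
```

The pairing rule would limit each menu to a single include file, which may be too restrictive for menus such as the BLAST example, where the list of choices lives elsewhere.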
Syntax of the include line
- Three choices regarding quotation of file path:
  - quoting NOT allowed eg. include path
  - quoting required eg. include ["|']path["|']
  - quoting optional
- Spacing between include and path
  - 1 or more spaces?
  - require 4 spaces as in Python?
Re: quotation - My guess is that we need to support quoting in case there are file path components containing (Ugh!) blanks. However, I wouldn't make quoting mandatory.
Re: Spacing - Since the include line is not really a PCD statement anyway, it makes no sense to enforce some sort of indentation rule. I'd say that the path is simply the remainder of the line, following include and 1 or more blank spaces.
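So far, that would make the syntax of the include line: the word include, 1 or more blank spaces, and then the file path (optionally quoted), which takes up the remainder of the line.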
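The quoting and spacing choices discussed above (quoting optional, 1 or more blanks, path is the remainder of the line) can be expressed as a single regular expression. The Python sketch below is one possible reading of those rules, not an agreed specification. The pattern and the function name are assumptions.

```python
import re

# "include", one or more blanks, then the path: either wrapped in matching
# ' or " characters, or unquoted and running to the end of the line.
INCLUDE_SYNTAX = re.compile(
    r"""^\s*include[ \t]+(?:(?P<quote>["'])(?P<quoted>.+?)(?P=quote)|(?P<bare>\S.*?))\s*$"""
)


def parse_include_line(line):
    """Return the include path, or None if the line is not an include line."""
    match = INCLUDE_SYNTAX.match(line)
    if match is None:
        return None
    return match.group("quoted") or match.group("bare")
```

Leading indentation is accepted but not required, which leaves the question of whether the include line must be indented open. A quoted choice label that merely begins with the word include is not mistaken for a directive, because the line then starts with a quotation mark.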
BIRCHDEV/local/script/bldna.include - calls bioLegato 1.0.5
- Database --> BLASTNlocal
  - Menu file: BIRCHDEV/local/dat/bldna/PCD/Database/testBLASTNlocal.blmenu - reads blastdb_n.txt as include file
  - Include file: BIRCHDEV/local/dat/bldna/PCD/Database/blastdb_n.txt - include file for testBLASTNlocal.blmenu
  - Test: BIRCHDEV/local/dat/bldna/PCD/Database/MUSCATAL.gen - GenBank of mouse catalase gene. It is best to test this sequence using the RefSeq Gene database. RefSeq Gene is a small database, so the search should return results almost instantaneously.
- Database --> FEATURES_KEY
  - Menu: BIRCHDEV/local/dat/bldna/PCD/Database/testFEATURES_KEY.blmenu - runs the Features program on a GenBank file
  - Include file: BIRCHDEV/local/dat/bldna/PCD/Database/feakey.txt - list of feature keys for choice menu
  - Test: BIRCHDEV/local/dat/bldna/PCD/Database/mouse_catalase.gen - contains 8 mouse catalase sequences. Try extracting features with key words such as CDS, STS, 5'UTR, repeat_region.
- Database --> Nucleotide - Query NCBI Nucleotide Database
  - Menu: BIRCHDEV/local/dat/blncbi/PCD/Database/testNCBINUC.blmenu - reads feakey.txt for 8 duplicate choice menus
  - Include file: BIRCHDEV/local/dat/bldna/PCD/Database/feakey.txt - read by testNCBINUC.blmenu as include file
  - Test: Primary organism: Pisum AND (TextWord: PR10 OR TextWord: drr206) - should return 13 entries to blncbi.
- Preferences --> BLHelper
- UpdateAddInstall --> BlastDB Report - include a locally-specified FTP site specified in $BIRCH/local. Same for other BlastDB menus
- UpdateAddInstall --> BlastDB Update/Add/Delete - Do we gain anything by using include in these to specify database choices?