Fixed bug: lines were not tokenised correctly when the first field on the line was quoted.
Fixed bug: parsing no longer fails on records delimited with \n; ;\n (mmCIF semicolon-delimited text fields). The method tokeniseFields has been completely rewritten and now does all the work of parsing the oddities of the mmCIF format. A RandomAccessFile is now used so that the file is opened only once; the parser then seeks to the positions it needs to scan at each point. This might be slower, because RandomAccessFile does no buffering, and possibly also because the new tokenisation is not very efficient. parseCifFile now does the whole parsing, calling the submethods itself instead of having them called from the constructor.
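A minimal sketch of the kind of tokenisation described above, assuming the data has already been split into lines. CifTokeniser, tokenise and tokeniseLine are hypothetical names, not the project's tokeniseFields. The sketch handles single- and double-quoted values (a quote only closes when it is followed by whitespace or the end of the line, so `O'5` and `'it's'` survive) and semicolon text fields that open and close with a line starting with `;`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the project's tokeniseFields.
class CifTokeniser {

    /** Splits mmCIF lines into value tokens, handling quotes and ;-delimited text fields. */
    static List<String> tokenise(List<String> lines) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            String line = lines.get(i);
            if (line.startsWith(";")) {
                // Text field: everything up to the next line that starts with ';'
                StringBuilder sb = new StringBuilder(line.substring(1));
                i++;
                while (i < lines.size() && !lines.get(i).startsWith(";")) {
                    sb.append('\n').append(lines.get(i));
                    i++;
                }
                // Surrounding whitespace is trimmed here for simplicity
                tokens.add(sb.toString().trim());
                if (i < lines.size()) {
                    // Any text after the closing ';' is tokenised as ordinary values
                    tokeniseLine(lines.get(i).substring(1), tokens);
                    i++;
                }
            } else {
                tokeniseLine(line, tokens);
                i++;
            }
        }
        return tokens;
    }

    /** Tokenises one line; a quote only closes when followed by whitespace or end of line. */
    static void tokeniseLine(String line, List<String> tokens) {
        int n = line.length();
        int pos = 0;
        while (pos < n) {
            char c = line.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == '\'' || c == '"') {
                int end = pos + 1;
                while (end < n && !(line.charAt(end) == c
                        && (end + 1 == n || Character.isWhitespace(line.charAt(end + 1))))) {
                    end++;
                }
                tokens.add(line.substring(pos + 1, end)); // quotes stripped
                pos = end + 1;
            } else {
                int end = pos;
                while (end < n && !Character.isWhitespace(line.charAt(end))) {
                    end++;
                }
                tokens.add(line.substring(pos, end));
                pos = end;
            }
        }
    }
}
```

For example, the lines `_a 'N1 atom' "x y" ?`, `;`, `multi line`, `;` yield the tokens `_a`, `N1 atom`, `x y`, `?`, `multi line`.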
Extracted constant NULL_CHAIN_CODE from the ...Pdb classes; added copy() methods to NodeSet and EdgeSet; added some functionality to NodesAndEdges; new class SimilarityGraph.
Removed class AA and replaced it with AAinfo, which reads contact types from a separate file, contactTypes.dat. New class ContactType, which contains the atoms for each contact type and residue. A static object for each contact type is loaded into AAinfo when the contactTypes.dat file is read. Changed all references accordingly.
Added constructors for loading from the online PDB.
Fixed some comments
Each element is now parsed in a separate method (re-opening the file each time). pdbx_poly_seq_scheme is parsed first so that we get the chainCode to use when reading the rest. Now handling cases where struct_sheet_range is not a loop element. tokeniseFields now also unquotes double-quoted strings. Tested on a set of 12000 entries.
Now checking the number of fields per line in loop elements and throwing an exception if the count is not correct. Lines are tokenised by a new function that handles quoted strings containing spaces. New exception CiffileFormatError. Checking that the first line of the cif file has the correct format (data_1xxx); if not, an exception is thrown.
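A hedged sketch of the two format checks mentioned here. The CiffileFormatError class below is only a stand-in, since its real definition is not shown; CifChecks and its methods are hypothetical. The header pattern assumes `data_` followed by a 4-character PDB code (a digit and three alphanumerics), which is an assumption about what "data_1xxx" means.

```java
import java.util.regex.Pattern;

// Stand-in for the CiffileFormatError named in the log; the real class is not shown.
class CiffileFormatError extends Exception {
    CiffileFormatError(String msg) {
        super(msg);
    }
}

// Hypothetical checks, not the project's code.
class CifChecks {
    // Assumed header shape: "data_" + 4-character PDB code, e.g. data_1abc
    private static final Pattern HEADER = Pattern.compile("data_[0-9][A-Za-z0-9]{3}\\s*");

    /** Throws if the first line of the file is not a data_ header. */
    static void checkHeader(String firstLine) throws CiffileFormatError {
        if (firstLine == null || !HEADER.matcher(firstLine).matches()) {
            throw new CiffileFormatError("First line is not a valid data_ header: " + firstLine);
        }
    }

    /** Throws if a loop row does not have one value per declared field. */
    static void checkFieldCount(String[] tokens, int expectedFields, int lineNo)
            throws CiffileFormatError {
        if (tokens.length != expectedFields) {
            throw new CiffileFormatError("Line " + lineNo + ": expected " + expectedFields
                    + " fields, found " + tokens.length);
        }
    }
}
```

A per-line count check like this assumes each loop row fits on one line; rows split across lines (for example by text fields) would need the counting to be done on the token stream instead.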
Fixed bug: struct_conf can sometimes be a non-loop element; that case is now handled too.
The bug with '?' in auth_seq_num was not really fixed. It should now be fine: the behaviour is the same as in PdbasePdb.
Fixed bug: alt locs need to be read in advance, in a separate scan of the file, because the order of the elements in the cif file is not guaranteed. Since reading atom_site requires the alt locs, atom_sites_alt has to be parsed first.
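One way to cope with the unguaranteed element order, combined with the single RandomAccessFile approach from the later commit, is to index where each category starts in a first scan and then seek to the categories in whatever order the parser needs (atom_sites_alt before atom_site). This is a hypothetical sketch; CategoryIndex and its methods are illustrative names. It simplifies category boundaries (it stops at `#`, `loop_` or another category's `_name.` line) and does not skip `_` lines inside text fields.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the project's parser.
class CategoryIndex {

    /** First scan: byte offset of the first "_category.field" line of each category. */
    static Map<String, Long> index(RandomAccessFile raf) throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        raf.seek(0);
        long pos = raf.getFilePointer();
        String line;
        while ((line = raf.readLine()) != null) {
            String t = line.trim();
            int dot = t.indexOf('.');
            if (t.startsWith("_") && dot > 1) {
                offsets.putIfAbsent(t.substring(1, dot), pos); // e.g. "atom_site"
            }
            pos = raf.getFilePointer(); // start of the next line
        }
        return offsets;
    }

    /** Seeks to a category and returns its lines, stopping at the next boundary. */
    static List<String> readCategory(RandomAccessFile raf, Map<String, Long> offsets,
                                     String category) throws IOException {
        List<String> lines = new ArrayList<>();
        Long start = offsets.get(category);
        if (start == null) {
            return lines; // category not present in this file
        }
        raf.seek(start);
        String prefix = "_" + category + ".";
        String line;
        while ((line = raf.readLine()) != null) {
            String t = line.trim();
            if (t.equals("#") || t.startsWith("loop_")
                    || (t.startsWith("_") && !t.startsWith(prefix))) {
                break;
            }
            lines.add(line);
        }
        return lines;
    }
}
```

RandomAccessFile.readLine reads one byte at a time and converts each byte to a character on its own, which suits ASCII mmCIF files but is unbuffered, consistent with the slowdown the later commit mentions.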
Fixed bugs:
- HETATM lines were being read as well as ATOM lines in atom_site.
- auth_seq_num values of '?' are no longer used when populating the pdbresser2resser map (same behaviour as in PdbasePdb).
- chainCodeStr and auth_asym_id are now used to identify chains in pdbx_poly_seq_scheme, struct_conf and struct_sheet_range. atom_site is not guaranteed to appear in the file before all the others, so we can't rely on having read a chainCode (asym_id) when parsing the other elements.
Field indices are now taken from the parsed field names. Still only minimally tested.
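A minimal sketch of taking field indices from the parsed field names of a loop_ header, so that columns are looked up by name rather than by a fixed position. LoopHeader and fieldIndices are hypothetical names, and the sketch assumes the header lines for one category have already been collected.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the project's code.
class LoopHeader {

    /** Maps each field name of a loop_ header (the part after "_category.") to its column. */
    static Map<String, Integer> fieldIndices(List<String> headerLines, String category) {
        Map<String, Integer> idx = new HashMap<>();
        String prefix = "_" + category + ".";
        int col = 0;
        for (String raw : headerLines) {
            String line = raw.trim();
            if (line.startsWith(prefix)) {
                idx.put(line.substring(prefix.length()), col++);
            }
        }
        return idx;
    }
}
```

A row's value can then be read as `tokens[idx.get("label_atom_id")]`, which keeps working if a file lists the atom_site fields in a different order.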
First implementation of mmCIF file parser. Tested minimally.