The table is one of the central data structures in the analysis facility. It is described in the introduction to the analysis section, see section An Overview of the Analysis Facility. Here we describe the various commands available for construction and manipulation of tables.
The BUILD command generates a table. Each term in the energy function related to bonded interaction defines a class of tables where the data in the table are properties of the internal coordinate. Likewise, atoms define another class of tables, which contain data that are properties of atoms. Each class has its own rule for the selection of tags as well as criteria for deciding what residue a data point belongs to.
BUILD [COMPARE] [PARM ] class repeat([WRITE unit-number] property del)
      [ DIFF  ] [IUPAC]
      [TITLE string del]

          { BOND   }
          { ANGLE  }
class ::= {TORSION }
          {IMPROPER}
          { HBOND  }
          { ATOM   }

property ::= string
Syntactic ordering: The operands must be specified in the above order, except for TITLE which can be specified anywhere after the class.
The tags and placement of data for each of the classes are given below:
Class     Property of              Placement        Tags
BOND      Bond                     Use first atom   Tags for atoms separated by dashes.
ANGLE     Angles                   Use middle atom  As for bonds.
TORSION   Torsion angles           Use second atom  As for bonds.
IMPROPER  Improper torsion angles  Use first atom   As for bonds.
HBOND     Hydrogen bonds           Use donor atom   The tag for the donor atom is concatenated with a
                                                    string giving its segment id, residue name, and
                                                    residue id.
ATOM      Atom                     Itself           See below
The placement gives the atom which is used to determine which residue the data belongs to. The tag for an atom is specified by the third word of the command. If nothing is specified or if IUPAC is specified, the tag for an atom is its IUPAC name as given in the residue topology file. PARM specifies that the atom type name as given in the parameter file should be used.
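The tag and placement rules for the internal coordinate classes can be sketched in Python. This is only an illustration of the rules in the table above, not CONGEN code: the function name `tag_and_placement`, the list-of-atom-tags input, and the zero-based indices are assumptions made for the example. HBOND and ATOM are left out because their tags follow different rules.

```python
# Zero-based index of the atom whose residue receives the data point,
# following the "Placement" column of the table above.
PLACEMENT_INDEX = {
    "BOND": 0,      # first atom
    "ANGLE": 1,     # middle atom
    "TORSION": 1,   # second atom
    "IMPROPER": 0,  # first atom
}

def tag_and_placement(table_class, atom_tags):
    """Return (tag, tag of the placement atom) for one internal coordinate."""
    tag = "-".join(atom_tags)  # atom tags separated by dashes
    return tag, atom_tags[PLACEMENT_INDEX[table_class]]
```

For instance, an angle over atoms tagged N, CA and C would get the tag `N-CA-C` and be placed in the residue of CA.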
Each class of table allows different sets of properties to be specified. Any number of properties may be specified in a table subject to the limits of what can be contained in memory and what can be easily read when printed.
There are two groups of properties, static properties and dynamic properties. The static properties are those properties which are calculated just from the structure, a parameter set, and one set of coordinates. The dynamic properties are those which are calculated using a dynamics trajectory. In this section, we will discuss only the static properties; the dynamic properties are discussed in the section on dynamics analysis, see section Dynamical Table Properties. Also, a property may be read from a file, see section Table Input and Output. Note that dynamic and static properties as well as properties read from a file may be freely mixed in the table.
All the internal coordinate tables (BOND, ANGLE, TORSION, and IMPROPER) have the same set of properties. The syntax for these properties is as follows:
             {GEOMETRY}
property ::= { ENERGY }
             { NUMBER }
The GEOMETRY property returns the geometry of the internal coordinate. For example, the BOND GEOMETRY is the bond length; the TORSION GEOMETRY is the torsion angle. The ENERGY property returns the energy of the internal coordinate. For example, IMPROPER ENERGY returns the energy of the improper torsion angles. The NUMBER property returns the sequential position of the internal coordinate in the list contained in the PSF. For example, the numbers of the sulfur-sulfur bonds in a cysteine bridge will be very large, because they are added last. The NUMBER property is useful in figuring out error messages from the energy routines as the numbers the energy routines output correspond to the numbers returned by this property.
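To make the GEOMETRY quantities concrete, here is a small Python sketch of a bond length and a torsion angle computed from Cartesian coordinates. It uses the standard textbook construction and is not taken from CONGEN; the function names are hypothetical, and the sign convention of the torsion angle is an assumption, since this section does not state which convention CONGEN follows.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def bond_length(p0, p1):
    """Distance between two atoms."""
    d = _sub(p1, p0)
    return math.sqrt(_dot(d, d))

def torsion_degrees(p0, p1, p2, p3):
    """Dihedral angle about the p1-p2 axis, in degrees (sign convention assumed)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```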
The syntax for hydrogen bond properties is as follows:
             {DISTANCE}
property ::= { ANGLE  }
             { ENERGY }
             { NUMBER }
The DISTANCE property returns the heavy atom donor -- acceptor distance. The ANGLE property returns the complement of the heavy atom donor -- hydrogen -- acceptor angle. The ENERGY property is the hydrogen bond energy. The NUMBER property gives the sequential position of the hydrogen bond.
The following table describes the syntax and function of the various properties available for atoms.
USERE subroutine. See section Interfacing to CONGEN for a description of USERE, which provides more information on writing user energy routines. Note that the USERE routine must be written to return the appropriate data for analysis of energies (versions of USERE routines which do not support the analysis of energies will run in the main part of the program even though they will bomb out if called from analysis).
These three table properties use the GEPOL algorithm(15)(16)(17)(18) to calculate three different surface properties of the atoms: van der Waals, accessible, and molecular surfaces. The van der Waals surface (keyword VSGEPOL) is the surface of the atoms calculated using the van der Waals radii, with accessibility ignored. The accessible surface (keyword ASGEPOL) is the area of the locus of the center of the water probe. It is the same surface as calculated by the Lee and Richards algorithm referenced above. The molecular surface (keyword MSGEPOL) is the locus of points on the surface of the probe sphere when the probe is in contact with at least one atom in the molecule.
When calculating van der Waals and accessible surfaces, the GEPOL algorithm uses points on a tessellated sphere to calculate what parts of the sphere are exposed, and it adds all the contributions of the tesserae to determine the surface. The calculation of the molecular surface uses this accessible surface algorithm, but additional spheres are added to the calculation of the accessible surface, and these additional spheres closely define the molecular surface.
The GEPOL command, see section GEPOL Command -- Set GEPOL Defaults, may be used to set operating parameters for the GEPOL algorithm.
The BUILD command executes very quickly, taking on the order of seconds for a table with several properties. The only exceptions are the SURFACE, CONTACT and GEPOL properties. For PTI, the calculation of the Lee and Richards surface takes about 15 seconds for the explicit hydrogen model (580 atoms) on an Iris 200 series workstation. The GEPOL molecular surface takes minutes on an R4000 based Silicon Graphics workstation.
Normally, the BUILD command will build a table using the main calculation. However, if one specifies COMPARE, the comparison data structures will be used for the table generation. The specification of DIFF directs that a difference table be built. Difference tables require different handling than standard tables. The data in the table results from subtracting the data generated using the comparison calculation from the data generated from the main calculation, i.e. TABLE = MAIN - COMPARISON. Further, atoms must be paired properly in order to ensure that the correct quantities are subtracted. Let us examine the use of the atom pairs more closely.
If a table is constructed from atom properties, we can use the atom pair list directly to define what data points are equivalent, and therefore, should be subtracted from one another. For difference tables of internal coordinate properties (we consider hydrogen bonds to be internal coordinates in this discussion), the process of finding equivalent internal coordinates is more complex. It can be done as follows: for every internal coordinate in the main calculation, we look up each of the atoms in the atom pair list. If any atom is not found, we ignore that internal coordinate. From the equivalent atoms, we construct the equivalent internal coordinate for the comparison structure. The presence of the equivalent internal coordinate in the comparison structure indicates that we have found an equivalent pair. The pairing that is generated after examining all internal coordinates is used to relate the numbers generated for non-atom properties.
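The pairing procedure for internal coordinates can be sketched in Python as follows. This is an illustration of the steps just described, not the CONGEN implementation. The data layout (tuples of atom numbers, a dictionary for the atom pair list) and the name `pair_internal_coordinates` are assumptions. The sketch also assumes that an equivalent coordinate must list its atoms in the same order; whether CONGEN also accepts a reversed order is not stated here.

```python
def pair_internal_coordinates(main_coords, comparison_coords, atom_pairs):
    """Pair internal coordinates of the main and comparison structures.

    main_coords, comparison_coords: lists of tuples of atom numbers.
    atom_pairs: dict mapping a main atom number to its comparison atom.
    Returns a list of (main index, comparison index) pairs.
    """
    lookup = {coord: i for i, coord in enumerate(comparison_coords)}
    pairs = []
    for i, coord in enumerate(main_coords):
        if any(atom not in atom_pairs for atom in coord):
            continue  # some atom has no partner: ignore this coordinate
        # Construct the equivalent coordinate in the comparison structure.
        equivalent = tuple(atom_pairs[atom] for atom in coord)
        j = lookup.get(equivalent)
        if j is not None:  # it exists there, so we have an equivalent pair
            pairs.append((i, j))
    return pairs
```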
The other accommodations that must be made for the difference tables result from the various identifiers used in the table. Segments in the table correspond to a segment pair, so two segment identifiers must be kept for every table segment. Each residue in the table requires two residue names and two residue identifiers. Likewise, the tags for atoms now refer to an atom pair. If the tags for both atoms in a pair are the same, then the tag for the pair is the same as the tag for one atom. If the atoms are different, the tag is formed by concatenating the tags for both atoms and separating them by a colon. Finally, the table is marked as a difference table so that the PRINT command can handle the double segment and residue identifiers correctly.
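The rule for tagging an atom pair is simple enough to state in a few lines of Python. The function name `pair_tag` is hypothetical, and placing the main structure's tag before the colon is an assumption; the text above only says that the two tags are concatenated.

```python
def pair_tag(main_tag, comparison_tag):
    """Tag for an atom pair in a difference table."""
    if main_tag == comparison_tag:
        return main_tag  # identical tags collapse to a single tag
    return f"{main_tag}:{comparison_tag}"  # order of the two tags is assumed
```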
In order to allow users to input their own data into a table and to save the output from a particularly long property calculation (such as an accessible surface), the BUILD command allows properties to be read and written from files. The organization for such files is as follows:
title              (up to 10 lines)
*                  (line must be blank after asterisk)
short header       (80A1)
long header        (80A1)
number of items    (I5)
format             (80A1)
data               (using format given above. One per line)
The title is a short description of the file. The text is printed when the file is read. The short header is the header used over columns in a column printout (see section Printing Tables). The long header is the title for printout when the table is pretty printed (see below). The number of items must be the same as the number of entities in the structure for the particular class. For example, for a table of class ANGLE, the number of items must be equal to NTHETA, the number of angles in the PSF. The data uses an F15.0 format to allow 8 digits of accuracy plus exponents.
The order of the data is determined by the order in which CONGEN keeps its list of internal coordinates, hydrogen bonds, or atoms. For example, the bond list will contain the list of bonds for the chain of a protein followed by the bonds for any disulphide bridges which may have been patched in later. If one is preparing a property input, it must follow this ordering exactly. If a bond is deleted, as is the case for the first bond in an extended atom protein, one must have an entry for it anyway in the property input. The NUMBER static property, which is defined for all classes, gives the correct position for each item in the table. In addition, when a property is written out, identifying information is written as well. Therefore, a property file may also serve as a guide for the order of the data.
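For users preparing a property file with an external script, the layout above might be produced along the following lines. This Python sketch is an assumption-laden illustration, not a tested CONGEN interface: the function name `property_file_lines` is hypothetical, the title lines are written exactly as supplied, and the format line `(F15.0)` and the 15-character exponent fields are guesses based on the F15.0 remark above. A file written by CONGEN itself is the authoritative guide, as noted in the preceding paragraph.

```python
def property_file_lines(title_lines, short_header, long_header, values,
                        fortran_format="(F15.0)"):
    """Return the lines of a property file for the given data values.

    The values must already be in CONGEN's internal order for the class.
    """
    if len(title_lines) > 10:
        raise ValueError("the title may span at most 10 lines")
    lines = list(title_lines)
    lines.append("*")                   # asterisk with nothing after it
    lines.append(short_header[:80])     # short header (80A1)
    lines.append(long_header[:80])      # long header (80A1)
    lines.append(f"{len(values):5d}")   # number of items (I5)
    lines.append(fortran_format)        # format line (assumed content)
    # One value per line, 15 characters wide, 8 significant digits.
    lines.extend(f"{v:15.7E}" for v in values)
    return lines
```

The caller would join the lines with newlines and write them to the file attached to the unit given in the READ property.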
The syntax for a property to be read from a file is as follows:
property ::= READ unit-number
The WRITE option, which may be specified with any property, causes that property to be written to the given unit. The title used in writing that property is the one given by the TITLE option in the BUILD command. Only one title may be specified per BUILD command, so two different properties written out will have the same title. As a result, it is advantageous to write only one property at a time with the BUILD command. It is permissible to specify the WRITE option for a property which is being read in. The title is processed the same way titles in the analysis command WRITE are processed.
The use of difference tables necessitates some different processing for the READ properties and the WRITE option. Since a difference table requires the calculation of two sets of properties, the READ property must use two units. Therefore, the property for the main calculation is read from the given unit, and the property for the comparison calculation is read from the unit plus 1. For example, if we are comparing hemoglobin to myoglobin, and hemoglobin is the main structure, then READ 24 will read the hemoglobin property from unit 24 and the property data for myoglobin will come from unit 25.
The WRITE option is not permitted with a difference table.
The PRINT command prints a table. There are two different methods available for printing a table. The first method, called pretty printing, can only be used for one property at a time. The table is printed in accordance with its hierarchical organization. The second method is the multiple column format. This allows one to print all the data in the table, but in a more amorphous form.
PRINT TABLE {PRETTY [PROPERTY property del] [BYRES] [TAGVAL] [CUTOFF real] }
            {                                                             }
            {COLUMN [SORT repeat([DESCEND] property del) deldel]          }
            {       [IGNORE repeat(property del) deldel]                  }
            [SYSTEM string del] [FRAC integer]
In order to pretty print the table, one uses the options associated with the PRETTY as specified in the syntax above. Since pretty printing can only be done with one property, the PROPERTY option must be specified if the table has more than one property. The table printing can be done in four different ways depending on the appearance of BYRES and TAGVAL in the command line. Let us consider the case when neither option is specified.
If we had an arbitrarily wide piece of paper, a readable way of outputting all the data in a table would be to have one column for every tag in the table, one row for every residue's data, and a page for every segment. Each data point would be output under its tag, and it would be easy to see how a particular type of data varied from one residue to the next.
However, we do not have infinitely wide paper. We can accommodate this shortcoming by taking advantage of the fact that some tags appear very frequently. The program selects those tags which appear most frequently and uses these to be the headers of the columns. As each residue's data is printed, the program checks to see if any of the data's tags are the same as any of the headers. If they are, the data is printed in that column. Any data which is not printed in these columns is printed at the right of the page in tag-value format. Tag-value format means that for each data point, the tag is printed alongside the data so that one can see what the numbers stand for.
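The header selection can be sketched in Python as follows. This is an illustration of the idea only: the name `layout_segment`, the dictionary-per-residue layout, and the `max_columns` parameter (standing in for however many columns fit the line size) are assumptions, and the way CONGEN breaks ties between equally frequent tags is not specified here.

```python
from collections import Counter

def layout_segment(residues, max_columns):
    """Choose column headers for one segment and lay out each residue.

    residues: list of dicts mapping tag -> value, one dict per residue.
    Returns (headers, rows); each row is (column values, tag-value pairs).
    """
    # Count how often each tag appears and keep the most frequent ones.
    counts = Counter(tag for residue in residues for tag in residue)
    headers = [tag for tag, _ in counts.most_common(max_columns)]
    rows = []
    for residue in residues:
        columns = [residue.get(tag) for tag in headers]  # None = empty slot
        leftovers = [(tag, value) for tag, value in residue.items()
                     if tag not in headers]              # tag-value format
        rows.append((columns, leftovers))
    return headers, rows
```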
The TAGVAL option specifies that the entire table is to be printed in tag value format. No selection of column headers is made. The BYRES option causes the residues to be collected into groups of the same name and printed in alphabetical order of the residues. The selection of headers is done on the frequency of tags within each group rather than within the entire segment. As a result, using the BYRES option will increase the amount of data printed in columns. BYRES and TAGVAL may both be specified, which means that the collection of data by residue still occurs, but the data is output using tag value format.
The CUTOFF option is used to mask out numbers whose magnitude is small. When a cutoff is specified and the magnitude of a data point in a column is smaller than the magnitude of the cutoff, an M is printed in place of the number. If a data point is to be printed in tag value format and its magnitude is smaller than the cutoff's magnitude, the tag value pair is not printed at all.
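The masking rule can be written out as a short Python sketch. The function name `apply_cutoff` and the representation of one printed row (a list of column values, with `None` for an empty slot, plus a list of tag-value pairs) are assumptions made for the illustration.

```python
def apply_cutoff(columns, leftovers, cutoff):
    """Apply the CUTOFF rule to one printed row.

    columns: values printed under column headers (None for an empty slot).
    leftovers: (tag, value) pairs printed in tag-value format.
    """
    limit = abs(cutoff)  # only the magnitude of the cutoff matters
    masked = ["M" if v is not None and abs(v) < limit else v for v in columns]
    kept = [(tag, v) for tag, v in leftovers if abs(v) >= limit]
    return masked, kept
```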
Using multiple column output, the PRINT command can print all of the properties in a table at once. This output format is specified using the COLUMN option and all options associated with it as indicated in the syntax of the command, see section Print Command Syntax. The multiple column output is as follows: Each property in the table is assigned a column. In addition, the segment identifier, the concatenation of the residue name and residue identifier, and the tag are also assigned columns. On output the data in each array is printed along with segment, residue, and tag all in a line. If more than one line of data will fit on the page, then they will be packed across the page so that they make use of the available space. If the line is too long for the line size as specified for the page, the line will be broken across multiple printer lines and each line of data will be followed by a blank line.
The table is identified with a title which specifies which class table is being printed, and every column of data is titled individually. If the lines of data are too long to fit in the line size provided, the titles over the columns will be broken over several lines as well.
Using this method of outputting the table, it is possible to sort the data. One may use the SORT option to specify that the table is to be sorted on particular properties. In addition, one may specify sorting on the segment identifier, residue name, residue identifier, and tag. The default order for sorting is ascending. However, one may specify that a particular property should be sorted in descending order. The IGNORE option may be used to have certain properties not be included in the printout. This option also allows the segment, residue, and tag to be specified. The specifications for these additional properties are as follows:
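A sort on several properties with a per-property direction can be sketched in Python. The name `sort_rows` and the row layout are hypothetical, and treating the first property listed as the primary key is an assumption; this section does not state how CONGEN orders multiple sort keys.

```python
def sort_rows(rows, sort_keys):
    """Sort table rows on several keys.

    rows: list of dicts mapping a column name to its value.
    sort_keys: list of (name, descending) pairs, primary key first.
    """
    ordered = list(rows)
    # Python's sort is stable, so sorting on the last key first
    # leaves the first key as the primary one.
    for name, descending in reversed(sort_keys):
        ordered.sort(key=lambda row: row[name], reverse=descending)
    return ordered
```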
This section describes options that pertain to both forms of table printing.
The SYSTEM option helps to further denote what is being printed out. The string specified by this option is printed along with the table's title at the top of each page.
The FRAC option gives the number of fractional digits in the format used to output data points in the table. The selection of formats used by the printing routine is automatic in that the width of the format is determined by the magnitudes of largest positive and negative numbers in the table. The default setting for FRAC depends on the table. If the table has just one property which is the NUMBER property or if pretty printing is done with a NUMBER property, the default value for FRAC is zero; otherwise, it is three.
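The default for FRAC reduces to a small decision, sketched here in Python. The function name `default_frac` and its arguments are hypothetical; the rule itself is the one stated above.

```python
def default_frac(table_properties, pretty_property=None):
    """Default number of fractional digits when FRAC is not given.

    table_properties: names of the properties in the table.
    pretty_property: the property being pretty printed, if one was named.
    """
    if pretty_property is not None:
        return 0 if pretty_property == "NUMBER" else 3
    # Otherwise zero only when NUMBER is the table's sole property.
    return 0 if list(table_properties) == ["NUMBER"] else 3
```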
The Fortran I/O unit on which the table is printed, the line size of the ultimate printout device, and the number of lines per page are all adjustable. See section SET -- Modify Analysis Facility Variables, for more information.
This command will add statistical information to the table. Each clause in the command specifies what information is to be gathered, how it is gathered, and where in the table it is placed. Each clause is evaluated in parallel, i.e. any information added by one add clause is not seen as the other add clauses are evaluated.
ADD repeat(add-clause deldel)

add-clause ::= STATS repeat(stat-option) del
                        { ALLTAG                         }
               [COLLECT { EACHTAG                        } del ]
                        {name [EXCEPT] repeat(tag-pat)   }
               [PLACE [RESIDUE] [SEGMENT] [STRUCTURE] del]

                { AVE }
                { RMS }
                { SUM }
stat-option ::= { SD  }
                { MIN }
                { MAX }
                { M3  }
                { M4  }
                { ALL }

name    ::= string with no blanks
tag-pat ::= string with no blanks
The STATS part of add clause determines what statistics are collected. The following table gives the meaning of each of the `stat-options':
The COLLECT clause specifies what data the statistics are collected over and, indirectly, how the statistics are tagged. There are three options permitted.
The ALLTAG option means that statistics are to be collected for every data point irrespective of tag. The tag assigned to a particular statistic is the same as the `stat-option' (except of course for ALL).
The EACHTAG option specifies that statistics are to be collected individually for each tag that appears. The tag for these statistics is the tag of the data point followed by a colon followed by the `stat-option'.
The final option allows one to select a set of tags for which data is collected and do statistics on that set. The tags are specified using tag patterns which allow the same wild cards as atom or cell selections, see section Atom Selection. The tag used for this option is the same as the name specified in the command, and therefore, specifying more than one `stat-option' will result in an ambiguous table. The use of EXCEPT with this option allows one to collect data over all tags except those specified. This option may be used for collecting statistics over the backbone or sidechain of a protein.
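The tagging of statistics under ALLTAG and EACHTAG can be illustrated with a short Python sketch. The function names and the list of (tag, value) cells are assumptions. Only four of the stat-options are shown, and mapping AVE, SUM, MIN and MAX to the mean, sum, minimum and maximum is our reading of the option names rather than something defined in this text.

```python
import statistics

# Assumed meanings for a subset of the stat-options.
STAT_FUNCS = {
    "AVE": statistics.fmean,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
}

def collect_alltag(cells, stat_options):
    """ALLTAG: one statistic over all cells, tagged with the stat-option."""
    values = [value for _, value in cells]
    return {opt: STAT_FUNCS[opt](values) for opt in stat_options}

def collect_eachtag(cells, stat_options):
    """EACHTAG: statistics per tag, tagged 'tag:stat-option'."""
    by_tag = {}
    for tag, value in cells:
        by_tag.setdefault(tag, []).append(value)
    return {f"{tag}:{opt}": STAT_FUNCS[opt](values)
            for tag, values in by_tag.items() for opt in stat_options}
```

With named collection, every requested statistic would receive the same tag (the name), which is why specifying more than one stat-option there gives an ambiguous table.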
The range of data over which the statistics are collected is determined by the PLACE option described in the next node.
The PLACE option determines two things - the range of the table over which the collection of data occurs and where the statistics get added. When RESIDUE is specified, the statistics are collected over every residue and the new data is placed in every residue. Therefore, with ALLTAG, the statistics are collected for every data point in each residue. With EACHTAG, data is collected for each tag separately, and statistics for each tag are added to every residue (this particular combination of options is not very desirable.). With named collection, the data is collected for all matched tags in each residue and then added to each residue.
When SEGMENT is specified, the same rules apply as for PLACE'ing in residues except that the collection is over segments and statistical information is placed differently. A new residue is created in every segment. The residue has the name STAT and identifier SEG. All the statistical information for the segment is placed in this residue.
When STRUCTURE is specified, the collection operations occur over the whole table. A new residue is created in the last segment of the table, with name STAT and identifier ALL. The statistics are placed in this residue.
Each of these options may be used in any combination in an add clause, and there is only one minor limitation in combining add clauses in an ADD command. That limitation is that EACHTAG may be specified only once for each possible placement.
The DELETE command is used to delete information from a table. This command is useful for omitting data from further analysis because one is not interested in it at the time. For example, if one is interested only in the results of a statistical calculation made by the ADD command, one can delete everything except the statistical information using the DELETE command.
DELETE [COMPare]
       [SEGID [EXCEPT] repeat(segid) del]
       [RESID [EXCEPT] [ALLSEG] repeat(resid) del ]
                       [segid ]
       [RESNAME [EXCEPT] [ALLSEG] repeat(resname) del ]
                         [segid ]
       [TAGS [EXCEPT] [ALLSEG] {ALLRES} repeat(tag) del ]
                      [segid ] {resid }
                                          { { LT }      }
                                          { { GT }      }
       [VALUE [ABS] [PROPERTY propst del] { { LE } real } deldel ]
                                          { { GE }      }
                                          { { EQ }      }
                                          { { NE }      }
       [IDENT cell-selection del]

cell-selection ::= atom-selection
Syntactic ordering: The ordering of options in the sub commands cannot be changed except for the VALUE sub-command; there, the ordering is immaterial.
For the syntax of the `atom-selection', please see section Atom Selection.
Each of the six options specifies deletion by different rules. However, in many of the options, the word, EXCEPT, may be specified. Its usage dictates that the complement of whatever has been specified will be deleted. This allows one to specify what parts of the table one wishes to keep in a concise way.
Deletion by the SEGID option specifies that entire segments in the table are to be deleted. One specifies what segments to delete by the segment identifier. If EXCEPT is specified, then only those segments which are named are kept; all the others are deleted.
Deletion by RESID or RESNAME option specifies that residues are to be deleted. Each option may specify either one segment from which residues are deleted or all segments using the ALLSEG word. The difference between the two options is that RESID specifies deletion by residue identifier; RESNAME specifies deletion by the residue name.
The EXCEPT sub-option works in a more complicated way for these two commands. In order to understand what is done, we must examine the algorithm more closely. Before these two commands are interpreted, two marking arrays are allocated. All residues which are specified by the RESID sub-command are marked in the first mark array, and all residues specified in the RESNAME command are marked in the second mark array. If EXCEPT is specified in the RESID option, then its mark array is complemented. Likewise, if EXCEPT is specified in the RESNAME option, then its array is complemented. Then, any residue which is marked in either array is deleted.
For example, if we build a table on atom properties of pancreatic trypsin inhibitor (PTI), the command,
DELETE RESID EXCEPT 1 2 3 4 5 6 7 8 9 10 -
       11 12 13 14 15 16 17 18 19 20 $ -
       RESNAME EXCEPT GLY ALA
will delete every residue after the twentieth and every residue which is not a glycine or an alanine. In other words, the table will consist of the glycine and alanine residues in the first 20 residues of PTI.
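The two mark arrays and the effect of EXCEPT can be reproduced in a few lines of Python. This is a sketch of the algorithm described above with hypothetical names and a simplified residue list of (identifier, name) pairs; segment handling is omitted.

```python
def residues_to_delete(residues, resids=None, resid_except=False,
                       resnames=None, resname_except=False):
    """Return one boolean per residue: True if it is to be deleted.

    residues: list of (resid, resname) pairs.
    resids, resnames: sets named in the RESID and RESNAME options, or None
    when the option is absent.
    """
    n = len(residues)
    mark_id = [False] * n
    mark_name = [False] * n
    if resids is not None:
        # Mark the named residues; EXCEPT complements the array.
        mark_id = [(rid in resids) != resid_except for rid, _ in residues]
    if resnames is not None:
        mark_name = [(name in resnames) != resname_except
                     for _, name in residues]
    # A residue marked in either array is deleted.
    return [a or b for a, b in zip(mark_id, mark_name)]
```

Because a mark in either array causes deletion, two EXCEPT clauses keep only the residues named in both, which is the intersection seen in the PTI example.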
There are three methods provided for deleting data in a table: by tag, by value, or by identifiers. Deletion by tag allows you to select data for deletion based on its tag in the table; deletion by value allows you to delete data based on a relational test on values of properties; deletion by identifier allows you to delete by all the identifiers in the table using the atom selection syntax described in section Atom Selection.
The TAGS option specifies the deletion of data by tag. After the optional EXCEPT, one may use ALLSEG or a segment identifier. ALLSEG means that the search for the tag will go over all segments; a segment identifier means that the search will take place over only the specified segment. Next, one must specify either ALLRES or a residue identifier. If we specify ALLRES, then the search is done over all residues within the searchable segments. If a residue identifier is specified, then only residues with that identifier will be searched. Finally, all data points within the searched residues which have tags as specified in the command will be deleted. If EXCEPT is given, then all data points which are found during the search are kept, and all others are deleted.
The VALUE option specifies deletion by property values. The marking of cells to be deleted is done by comparing values for properties against a real number you specify using a relational test you specify. All six Fortran relational tests are supported, with the data in the table going on the left side of the comparison and the number you specify on the right. For example, LT 10 will delete all cells in which the property value is less than 10. The relational test is obligatory for this command.
The specification of a property to be checked is optional. If a property is specified, only that property's values will be checked; otherwise, all properties in the cells will be checked. The ABS option specifies that the absolute values are to be taken of numbers before they are compared.
For example, the following two commands will produce a difference torsion angle table where all torsion angle changes whose magnitude is less than 10 degrees will be eliminated. This would be useful for looking for conformational changes between coordinates.
BUILD DIFF TORSION GEOMETRY $ ENERGY $
DELETE VALUE PROPERTY GEOMETRY $ ABS LT 10 $$
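The marking performed by the VALUE option can be sketched in Python. The names are hypothetical and each cell is represented as a dictionary of property values. When no property is named, the sketch marks a cell if any of its properties passes the test; that is one reading of "all properties in the cells will be checked" and should be treated as an assumption.

```python
import operator

# The six relational tests, table value on the left, given number on the right.
RELATIONS = {"LT": operator.lt, "GT": operator.gt, "LE": operator.le,
             "GE": operator.ge, "EQ": operator.eq, "NE": operator.ne}

def cells_to_delete(cells, relation, threshold, prop=None, use_abs=False):
    """Return one boolean per cell: True if the cell is marked for deletion.

    cells: list of dicts mapping a property name to its value.
    """
    test = RELATIONS[relation]
    marks = []
    for cell in cells:
        values = [cell[prop]] if prop is not None else list(cell.values())
        if use_abs:
            values = [abs(v) for v in values]  # ABS: compare magnitudes
        marks.append(any(test(v, threshold) for v in values))
    return marks
```

With `relation="LT"`, `threshold=10`, `prop="GEOMETRY"` and `use_abs=True`, this mirrors the DELETE command in the example: torsion changes smaller than 10 degrees in magnitude are marked.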
The IDENT option allows deletion based on all the identifiers at once. A variant of the atom selection syntax is used, see section Atom Selection. Recall that the structure of a table mirrors that of the PSF -- segments composed of residues composed of cells (atoms in the PSF). For this option, the table is treated as if it were a PSF where the tags replace the atom names. The same atom selection syntax can then be applied. Any tags included in the selection are deleted from the table. The initial state of the selection is empty, i.e., DELETE IDENT del does nothing.
For example, to set up a table which has data only from residues whose resid is divisible by 5, one would build the table and then use the following delete command:
DELETE IDENT CLEAR CELL * #5 * CELL * #0 * $
Once the deletion process is complete, all empty entries in the table are deleted. I.e., any residues which have no cells are deleted, and any segments which have no residues are deleted. This simplifies the use of the SELECT command, see section SELECT Command -- Select Data from the Table, when it is used for making two dimensional scatter plots.
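The final cleanup amounts to a two-level prune, sketched here in Python. The nested dictionary layout (segment identifier to residue identifier to list of cells) and the name `prune_empty` are assumptions made for the illustration.

```python
def prune_empty(table):
    """Drop residues with no cells, then segments with no residues.

    table: dict mapping segid -> dict mapping resid -> list of cells.
    Returns a new table; the input is left unchanged.
    """
    pruned = {}
    for segid, residues in table.items():
        kept = {resid: cells for resid, cells in residues.items() if cells}
        if kept:  # a segment survives only if some residue survives
            pruned[segid] = kept
    return pruned
```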
The form of the DELETE command for difference tables is similar to that of standard tables except that one must decide where segment and residue identifiers come from. In a difference table, there are two sets of these identifiers along with two sets of residue names. If you specify nothing, the command will use the identifiers from the main set of data structures. On the other hand, if you specify COMPARE in the DELETE command, the identifiers will be taken from the comparison identifiers.