The commands described here are used for reading and writing data structures used in the main part of CONGEN. Some of the data structures used in the analysis facility may also be read and written, see section Reading and Writing Analysis Data.
This command reads data into the data structures from external sources. The external sources can be either text files (card images for the ancient among us) or binary files. The Fortran unit number from which the information is read is specified with the unit-spec. Specifying UNIT 5 indicates that the data to be read follows the command in the input stream.
The precise format of all these files is described only in the source code as that serves as the only definitive, accurate, and up to date description of these formats. The description of the data structures provides pointers to the subroutines which should be consulted, see section Data Structures.
READ { {RTF [PRINT]         } { FILE unit-spec   } }
     { {PARAmeter parm-opts } { CARD [unit-spec] } }
     { {IC [APPEnd]         } { 'UNIT 5'         } }
     {                                             }
     { {SEQUence} seq-spec                         }
     { {RESIdue }                                  }
     {                                             }
     { {HBONd     } [FILE] unit-spec               }
     { {PSF       }                                }
     { {CONStraint}                                }
     { {NBONd     }                                }
     {                                             }
     { { IMAGes } [CARD] unit-spec                 }
     {                                             }
     {                      [ MAIN ]               }
     { COORdinate coor-spec [ COMP ]               }
     {                      [ DIFF ]               }

unit-spec ::= UNIT unit-number

parm-opts ::= [NOFAIL] [VERSION int]

             [ST2 int   ]
seq-spec ::= [WATEr int ] [[source-spec] [unit-spec] [rtf-type] [abbrev-spec] seq-opts]

seq-opts ::= [BYATom] [IDREad] [MODEl modelid]

                { CARDs                         }
source-spec ::= { brookhaven-name [CHAIN segid] }

                    { BROOkhaven }
brookhaven-name ::= { BRKHvn     }
                    { TAPE       }

             { PROT }
             { HPRO }
             { ALLH }
rtf-type ::= { DNA  }
             { A94N }
             { A94P }

                        {AA }
abbrev-spec ::= [ABBREV {DNA}]
                        {RNA}

coor-spec ::= { FILE unit-spec [IFILE int]             } coor-option
              { CARD [unit-spec] [OFFS int ]           }
              { IGNO [unit-spec]                       }
              { KONN [unit-spec]                       }
              {                                        }
              { { BROOkhaven } unit-spec [ SEQUence]   } brk-option
              { { BRKHvn     }           [NOSEQUence]  }

coor-option ::= [APPEnd] [INITial] [EXPAnd] [abbrev-spec] atom-selection

brk-option ::= [CHAIN segid] [IDREad] [MODEl modelid] [ALTErnate identifier]
Syntactic ordering: The second field must be specified as shown.
The specification of SEQUence or RESIdue causes the program to accept a sequence of residue names to be used to generate the next segment in the molecule. There are four sources of sequence information. The first source is a CONGEN format sequence file which has the following syntax:
title
number-of-residues
repeat(residue-names)
The form of the title is defined in the syntactic glossary, see section Glossary of Syntactic Terms. The number of residues is specified on the line following the title in free field format. If the number of residues you specify is less than zero, CONGEN will read residues until it encounters a blank line or end of file. If the number is greater than zero, it will also stop once it has read at least as many residues as you've specified. If the number you specify is zero, you will get a warning message as one common error is to forget the number entirely. In this case, the first residue name will be consumed as the number and converted to zero.
The residue names are specified as separate words, each no longer than 4 characters, on as many lines as are required for all the residues. This sequence may be placed immediately following the READ command if the unit number is 5 or may be placed in a separate file.
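The residue count rule above can be sketched in Python. This is a minimal illustration, not CONGEN's actual reader: the function name `read_residue_names` is hypothetical, the title and count lines are assumed to have been consumed already, and two details are assumptions because the text does not settle them (extra names on the line where the count is reached are kept, and a zero count returns an empty list after the warning).

```python
import warnings


def read_residue_names(count, lines):
    """Collect residue names following the count rule described above.

    count < 0 : read until a blank line or the end of the input.
    count > 0 : additionally stop once at least `count` names are read.
    count == 0: warn, since the count line was probably forgotten.
    """
    if count == 0:
        warnings.warn("residue count is zero; was the count line omitted?")
        return []  # ASSUMPTION: nothing is read after the warning
    names = []
    for line in lines:
        words = line.split()
        if not words:
            break  # a blank line ends the sequence
        for word in words:
            if len(word) > 4:
                raise ValueError(f"residue name longer than 4 characters: {word}")
        names.extend(words)
        # ASSUMPTION: names past the count on the same line are kept
        if count > 0 and len(names) >= count:
            break
    return names
```

With a negative count the blank line is the only terminator, so a missing blank line silently pulls in whatever follows the sequence.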
The second source of sequences is a CONGEN coordinate file in CARD format. Currently, the BYATom option reads all residues within the file for inclusion in the sequence.
The third source of sequence information is a Brookhaven Protein Data Bank file. The BROOKHAVEN, BRKHVN, and TAPE options allow the sequence to be read from the SEQRES records in a Brookhaven protein data bank coordinate file. (TAPE is used because the Brookhaven protein data bank used to come on a tape.) If the CHAIN option is specified, then only the sequence of the chain with the specified segid is read. Otherwise, the sequences of all the chains will be read together. Note that the Brookhaven format only allows single letter chain names, so your segid should only have one character.
Alternatively, the sequence may be read directly from the ATOM records by using the BYATOM option. Under the BYATOM option, if there are insertions or deletions within the list of residue identifiers, the IDREAD option will read the sequence identifiers, including insertion codes, directly from the Brookhaven file, rather than automatically generating a residue number based on sequential order. It should be noted that currently, the IDREAD option conflicts with the DISULPHIDE command, since this command assumes that the residue identifiers are those generated automatically. The MODEL option may also be used in conjunction with BYATOM to read the sequence from a particular model number in the file. If not specified, the first model in the Brookhaven file is used.
The final source of sequences is the pair of water options. The WATEr option allows a sequence of water molecules to be specified. The integer which follows the keyword gives the number of waters. Likewise, the ST2 option allows ST2 waters to be specified. Obviously, no sequence on separate lines need be given. For CONGEN topology files, a residue named OH2 (or ST2) must be present. For an AMBER94 topology file, a residue named HOH must be present. If these residues are missing, the GENErate command called afterwards will fail.
When reading is complete, CONGEN will list all the residues it has read.
The options PROT, HPRO, ALLH, and DNA specify what type of CHARMM potential file is being read. They are very important because they specify which patching operations are to take place on the segment once it is generated. The patching operations involve correcting the linkage of prolines, and correcting the charges and chemical types of the ends of the segment. PROT signifies that we are using an extended atom residue topology file as the source of residues. HPRO signifies that an explicit hydrogen topology file is being used. ALLH signifies that we have an all hydrogen topology file. DNA specifies that we are working with the DNA topology file.
In addition, these options may cause additional residues to be added to the sequence. These additional residues serve to terminate the segment. However, if the segment is generated cyclically (see section The Generate Command - Construct a Segment of the PSF), then no termini will be added. In particular, PROT will add a CTER residue that has the C-terminal oxygen. HPRO and ALLH will add a CTER residue along with an NTER residue that holds two additional hydrogens for the N-terminus. DNA will add a 5TER to the beginning and a 3TER residue to the end of the segment.
If an AMBER94 topology file is being used, then the keywords A94P or A94N should be specified to indicate whether a protein or nucleic acid sequence is being read. Use of these keywords will then result in the correct terminal residues being used at the ends of the segment. See section The Generate Command - Construct a Segment of the PSF, for more information about this process.
The ABBREV option allows the specification of residues using one letter abbreviations. When the AA keyword is specified, one letter amino acid codes can be used. For RNA and DNA, one letter nucleotide names will be translated into the appropriate two letter AMBER94 residue names.
The reading of coordinates is done with the READ COOR command, and there are several options (which may change in future versions).
There are four possible file formats that can be used to read in coordinates. They are coordinate binary files, dynamics coordinate trajectories, coordinate card images, and Brookhaven Protein Data Bank files.
For all formats, a subset of the atoms in the PSF may be selected using the standard atom selection syntax. For binary files, this is a risky maneuver, and warning messages are given when this is attempted. Only coordinates of selected atoms may be modified. When reading binary files, or using the IGNORE keyword, coordinate values are mapped into the selected atoms sequentially (NO checking is done!).
The reading of the first two file formats is specified with the FILE option. The program reads the file header to tell which format it is dealing with. The coordinate binary files have a file header of COOR and contain only one set of coordinates. These are created with a WRIT COOR FILE command. The dynamics coordinate trajectories have a file header of CORD and have multiple coordinate sets. These files are created by the dynamics function of the program. To specify which coordinate set in the trajectory is to be read, the IFILE option is provided. One specifies the position of the coordinate set within the file. The default value for this option will cause the first coordinate set to be read.
For binary files, the APPEnd command will 'deselect' all atoms up to the highest one with a known position. This is done in addition to the normal atom selection. This is useful for structures with several distinct segments where it is desirable to keep separate coordinate modules.
The CARD file format is the standard means in CONGEN for providing a human readable and writable coordinate file. The format is as follows:
title
NATOM (I5)
ATOMNO RESNO    RES     TYPE   X      Y      Z     (repeated NATOM times)
  I5    I5   1X A4   1X A4   F10.5  F10.5  F10.5
The title is a title for the coordinates, see section Glossary of Syntactic Terms. Next comes the number of coordinates. If this number is zero or too large, the entire file will be read. Finally, there is one line for each coordinate. The coordinates, but not the initial lines, may contain blank lines for readability.
ATOMNO gives the number of the atom in the file. It is ignored on reading. RESNO gives the residue number of the atom. It must be specified relative to the first residue in the PSF. The OFFSet option should be specified if one wishes to read coordinates into other positions. The APPEnd option adds an additional offset which points to the residue just beyond the highest one with known positions. This option also 'deselects' all atoms below this residue (inclusive). For example, if one is reading in coordinates for the second segment of a two chain protein using two card files, and the APPEnd option is used, RESNO must start at 1 in both files for the file reading to work correctly.
It should also be remembered that for card images, residues are identified by residue number. This will change someday. What this implies, is that if one wishes to read coordinates from an extended atom (PROT) RTF into a structure using an explicit hydrogen (HPRO) RTF, the OFFSet keyword MUST be used to shift the residue numbers by one, (to make room for the NTER) so that the residues will line up. If the reverse process is required, an OFFSet value of -1 is called for.
RES gives the residue name of the atom. RES is checked against the residue name in the PSF for consistency. TYPE gives the IUPAC name of the atom. The coordinates of an atom within a residue need not be specified in any particular order. A search is made within each residue in the PSF for an atom whose IUPAC name is given in the coordinate file.
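The fixed-column atom record of the CARD format can be illustrated with a short Python sketch. It only slices the columns implied by the edit descriptors I5, I5, 1X, A4, 1X, A4, 3F10.5 and applies a residue offset. The names `CardAtom` and `parse_card_atom` are hypothetical, and treating OFFSet as a simple addition to RESNO is an assumption based on the description above, not a transcription of the CONGEN reader.

```python
from typing import NamedTuple


class CardAtom(NamedTuple):
    atomno: int      # ignored by CONGEN on reading
    resno: int       # residue number after applying the offset
    res: str         # residue name, checked against the PSF
    atom_type: str   # IUPAC atom name
    x: float
    y: float
    z: float


def parse_card_atom(line, offset=0):
    """Slice one atom record laid out as I5,I5,1X,A4,1X,A4,3F10.5."""
    line = line.rstrip("\n").ljust(50)
    return CardAtom(
        atomno=int(line[0:5]),
        resno=int(line[5:10]) + offset,  # ASSUMPTION: OFFSet is additive
        res=line[11:15].strip(),
        atom_type=line[16:20].strip(),
        x=float(line[20:30]),
        y=float(line[30:40]),
        z=float(line[40:50]),
    )
```

Unlike a Fortran formatted read, this sketch raises an error on blank numeric fields rather than reading them as zero.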
The MAXERR option controls how many error messages are printed. Its default value is 10. Normally, the coordinate reader will scan the entire file, and it will list errors as it encounters them, up to the MAXERR limit. At the end of reading, it will terminate execution if any fatal errors were encountered.
The KONN option allows the reading of Konnert Hendrickson format files. The file consists of just atom records where each atom coordinate has the following format:
    Res  Segid  Resid  Iupac  X  Y  Z
3X, A4,  A1,    A3,    A4,    3F10.5
The four alphabetic fields are left justified by the program so they can be placed anywhere within their columns. If the Segid is not specified, the program will attempt to place the atoms within a segment which is determined by the APPEnd option (above). If APPEND is not specified, then the first segment in the structure will be used. If APPEND is specified, then the first segment which has a residue with all undefined atoms will be used. Blank lines may be specified between coordinates.
Note that the Segid and Resid fields are too small to hold the maximum length values. Truncations will cause unavoidable problems. However, residue identifiers NTE and CTE are extended to NTER and CTER.
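As a rough illustration of the Konnert Hendrickson record layout, the following Python sketch slices a line according to 3X,A4,A1,A3,A4,3F10.5 and applies the NTE/CTE extension mentioned above. The function name `parse_konnert_atom` is hypothetical, and returning `None` for a blank Segid is simply this sketch's way of signalling that the segment must be chosen by the APPEnd rule; the actual reader may behave differently.

```python
def parse_konnert_atom(line):
    """Slice one atom record laid out as 3X,A4,A1,A3,A4,3F10.5."""
    line = line.rstrip("\n").ljust(45)
    res = line[3:7].strip()            # residue name
    segid = line[7:8].strip() or None  # blank: segment chosen elsewhere
    resid = line[8:11].strip()         # residue identifier (3 columns only)
    iupac = line[11:15].strip()        # IUPAC atom name
    # The 3-column field cannot hold NTER or CTER, so extend the short forms.
    resid = {"NTE": "NTER", "CTE": "CTER"}.get(resid, resid)
    x = float(line[15:25])
    y = float(line[25:35])
    z = float(line[35:45])
    return res, segid, resid, iupac, (x, y, z)
```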
The BROOKHAVEN option (or its synonyms, TAPE or BRKHVN) specifies that the coordinate file is in the Brookhaven Data Bank format. CONGEN can read the ATOM records for coordinates. However, because the Brookhaven format uses slightly different naming conventions, there are a number of inconsistencies you should be aware of when using this option:
Reading the Brookhaven file format is not straightforward, so check the coordinates after they are read to see if they are correct. Energy evaluations (see section Minimization and Dynamics) followed by analysis of the geometric terms (see section The Analysis Facility of CONGEN) are a useful way to do this. Also, the brkchm command (see section brkchm -- Converting Brookhaven to Congen Format) is an alternate way of converting Brookhaven files into a form that can be edited.
The IGNORE option allows one to read in a card coordinate file while bypassing the normal tests of the residue name, number, and atom name. When IGNORE is specified in place of card, the identifying information is ignored completely. Starting from the first selected atom, the coordinates are copied sequentially from the file.
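The sequential mapping used by IGNORE (and by binary files) can be pictured with a small Python sketch. The function name `copy_sequential` and the list-based data layout are assumptions made for illustration; the point is only that file coordinates are paired with selected atoms in order, with no check of names or numbers.

```python
def copy_sequential(coords, selected, file_coords):
    """Copy file coordinates into the selected atoms, in order.

    coords      : list of [x, y, z], one entry per atom in the PSF
    selected    : list of bool, True for atoms in the selection
    file_coords : iterable of (x, y, z) as they appear in the file
    """
    targets = [i for i, flag in enumerate(selected) if flag]
    # zip stops at the shorter input: extra file records or extra
    # selected atoms are silently left alone, mirroring "NO checking".
    for index, xyz in zip(targets, file_coords):
        coords[index] = list(xyz)
    return coords
```

Because nothing is verified, a selection that does not match the file's atom order puts coordinates on the wrong atoms without any message.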
Normally, the coordinates are not reinitialized before new values are read, but if this is desired, the INITIALIZE keyword will cause the coordinate values for all selected atoms to be initialized. Note that only atoms that have been selected will be initialized. The COOR INIT command provides a more general way to initialize coordinates.
The EXPAnd option should be specified if the following conditions apply:
In this case, the coordinates will be shuffled in order to leave room for the hydrogens. The hydrogen bond generation routine, section HBUILD Command, or the builder routines, section The Internal Coordinate Commands, must be called to construct the positions of these hydrogens.
It is also possible to read coordinates into the comparison (or reference) set using the COMP keyword. The DIFF keyword will read coordinates into the coordinate differences (also referred to as the normal mode arrays). It is expected that these "coordinates" are really displacements that will be processed by the vibrational analysis command, see section Vibrational Analysis.
Currently, CONGEN will perform a limited set of name translations on any formatted coordinate reading operation. These translations represent common differences in nomenclature. The isoleucine translations are not needed for the AMBER 94 topology file, see section AMBER94RTF.
The ABBREV option allows the specification of residue names using one letter abbreviations. When the AA keyword is specified, one letter amino acid codes can be used. For RNA and DNA, one letter nucleotide names will be translated into the appropriate two letter AMBER94 residue names.
Finally, the reading of coordinates is always a tricky business. Although standards exist for naming conventions, there are enough minor variations to make the situation difficult. Always check the structure after reading coordinates to ensure that the geometries and energies are reasonable.
In 1995, the parameter file format was completely redone to accommodate the addition of the AMBER potential, see section AMBERPARM. The new format is free field and allows patterns to be supplied for parameters in order to reduce the size of the file and to allow for default parameters to handle molecules which have not been seen before.
For the bond length, bond angle, torsion angle, and improper torsion parameters, CONGEN stores patterns to match the atom types along with the relevant force field parameters. When the program needs to calculate the energy of any internal coordinate, it goes sequentially through the patterns and uses the parameters of the first pattern which matches the atom types of the internal coordinate with the highest "specificity". In this context, "specificity" means the sum of the specificities of each atom type pattern. The specificity of an atom type pattern is 0 if it's a complete wildcard, "*"; 0.5 if any wildcards are present; and 1.0 if there are no wildcards at all. This scheme allows parameters to be specified in different levels of generality, with specific parameters taking precedence over general ones.
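The specificity rule can be sketched as follows. This is an illustration under stated assumptions, not CONGEN's code: Python's `fnmatch` stands in for CONGEN's wildcard matcher, the set of wildcard characters is assumed, the names `atom_specificity` and `best_match` are hypothetical, and no attempt is made to match an internal coordinate in reversed atom order.

```python
from fnmatch import fnmatchcase

# ASSUMPTION: shell-style wildcards stand in for CONGEN's own syntax.
WILDCARD_CHARS = set("*?[")


def atom_specificity(pattern):
    """0 for a bare '*', 0.5 if any wildcard is present, 1.0 otherwise."""
    if pattern == "*":
        return 0.0
    if any(ch in WILDCARD_CHARS for ch in pattern):
        return 0.5
    return 1.0


def best_match(atom_types, entries):
    """Return the parameters of the most specific matching entry.

    entries is an ordered list of (patterns, params) pairs. Ties are
    resolved in favour of the entry that appears first.
    """
    best_params, best_score = None, -1.0
    for patterns, params in entries:
        if len(patterns) != len(atom_types):
            continue
        if all(fnmatchcase(a, p) for a, p in zip(atom_types, patterns)):
            score = sum(atom_specificity(p) for p in patterns)
            if score > best_score:  # strict '>' keeps the earliest on ties
                best_params, best_score = params, score
    return best_params
```

A fully wildcarded entry scores zero, so it acts as a default that any more specific entry overrides regardless of its position in the file.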
In the case of hydrogen bond and non-bonded interactions, all the possible combinations of atom types are computed, and tables for the parameters are constructed. Patterns are used to match against the atom types when the tables are computed.
Parameters are stored as strings with the first character being either "S" or "P", which means string or pattern, respectively. If a pattern is a string, then the program does a simple string comparison; otherwise, a wildcard match is used, see section Interpretation of Atom Selection Tokens. CONGEN checks the patterns you specify to see whether a pattern or string has been specified.
Parameter files can be read in either text or binary format. For text files, the version can be set using the VERSION keyword on the READ PARAMETER command. The default value of 2 specifies the old format. The new version is specified by using a value of 3. For binary files, the header record indicates which version of parameter file is being read.
The text format for the parameter file begins with a title, see section Glossary of Syntactic Terms, followed by a set of free field commands, and terminating with the end of the file or an END statement. The purpose of the commands is to fill the various parameter arrays. The commands are described below:
BOND repeat(word word) FORCe real DISTance real
The BOND command adds bond parameters. The bond energy term for one bond is given by
The force constant is given by the FORCE keyword and the equilibrium bond length is given by the DISTANCE keyword. Each pair of words is treated as a separate entry in the bond parameter arrays, so it is possible to specify the same parameters for many bonds.
{ANGLE} repeat(word word word) FORCe real ANGLE real
{THETA}
The ANGLE command adds angle parameters. The angle energy term is given by
Bond angles are defined over triplets of atoms. The force constant is given by the FORCE keyword and the equilibrium angle is given by the ANGLE keyword. Each triplet of words is treated as a separate entry in the angle parameter arrays, so it is possible to specify the same parameters for many angles.
{TORSION} repeat(word word word word) repeat(torsion-term)
{PHI    }

torsion-term ::= TERM FORCe real PHASe real PERIod real MULTiplicity int END
The TORSION command adds torsion angle parameters. The torsion angle term has the following form
Torsion angles are defined over quadruplets of atoms, and there can be multiple terms per torsion angle so that complex torsions can be established. Each term is specified by strings beginning with TERM and ending with END. The force constant for each term is given by the FORCE keyword. The phase is given by the PHASE keyword. The periodicity is given by the PERIOD keyword, and is limited to values of 1, 2, 3, 4, and 6. The multiplicity is given by the MULTIPLICITY keyword, and is most useful in using the AMBER force field, see section AMBERPARM. At least one term must be specified for a torsion angle. Each quadruplet of words is treated as a separate entry in the torsion parameter arrays, so it is possible to specify the same parameters for many torsions.
{IMPROPER} repeat(word word word word) FORCe real {PHASe real PERIod real}
{IMPHI   }                                        {MIN real              }
The IMPROPER command adds improper torsion parameters. If the dihedral form of improper torsion is selected, the improper torsion term uses the torsion angle term given above. If the harmonic form of the improper torsion is selected, then the improper torsion energy term is given by
Improper torsions are defined over quadruplets of atoms. The force constant is given by the FORCE keyword. If the dihedral form of the energy is used, then the phase and period are given by the PHASE and PERIOD keywords, respectively. The multiplicity is set to 1. If the harmonic form is used, then the equilibrium improper torsion angle is given by the MIN keyword. Each quadruplet of words is treated as a separate entry in the improper torsion parameter arrays, so it is possible to specify the same parameters for many improper torsions.
HBOND repeat(word word) {EMIN real RMIN real             }
                        {CREPulsive real CATTractive real}
The HBOND command adds hydrogen bond parameters. The form of the hydrogen bond term is given by
There are two different ways to calculate hydrogen bond energies. The form in the old CHARMM potential uses the distance between the heavy atom attached to the donor hydrogen and the acceptor, and an angular term based on the heavy atom donor, donor hydrogen, acceptor angle. The form used by the AMBER potential uses the distance between the hydrogen and the acceptor, and no angle term. The DEFAULT command described below allows you to switch from one form to the other.
There are two ways to specify the two coefficients. They may be specified directly using CREPULSIVE to specify the first coefficient, and CATTRACTIVE for the second. The second way is to specify the minimum energy, keyword EMIN, and minimum energy distance, keyword RMIN, and CONGEN will compute the coefficients for you.
The pairs of words in each command specify pairs of atom type patterns to be used for setting the coefficients. The first pattern in the pair gives the atom types for the donor, being heavy atom or hydrogen. The second pattern gives the acceptor.
The actual process of setting hydrogen bond parameters is complicated by the requirement for constructing a table of hydrogen bond codes so that hydrogen bond codes can be looked up rapidly. Pseudocode for the operation is as follows:
For Ih = 1 to Number of Hydrogen bond patterns
    For I = 1 to Number of Atom Types (NATC)
        If atom_type(I) matches pattern(1,Ih)
            For J = 1 to NATC
                If atom_type(J) matches pattern(2,Ih)
                    HBCODE = I*NATC+J-1
                    if (HBCODE is not in current list of HB codes)
                        add new HBCODE and coefficients.
                    fi
                fi
            done
        fi
    done
done
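The pseudocode above translates fairly directly into Python. The sketch below keeps the 1-based indices so that the HBCODE formula is unchanged; the function name `build_hb_codes`, the dictionary used as the code table, and the use of `fnmatch` in place of CONGEN's pattern matcher are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def build_hb_codes(atom_types, hb_patterns):
    """Build a table mapping HBCODE to coefficients.

    atom_types  : list of atom type names (length NATC)
    hb_patterns : ordered list of (donor_pattern, acceptor_pattern, coeffs)
    """
    natc = len(atom_types)
    table = {}
    for donor_pat, acceptor_pat, coeffs in hb_patterns:
        for i, donor in enumerate(atom_types, start=1):
            if not fnmatchcase(donor, donor_pat):
                continue
            for j, acceptor in enumerate(atom_types, start=1):
                if fnmatchcase(acceptor, acceptor_pat):
                    hbcode = i * natc + j - 1
                    # Only the first pattern to produce a code sets it.
                    table.setdefault(hbcode, coeffs)
    return table
```

Since an existing code is never overwritten, the order of the HBOND commands matters: specific pairs should precede general patterns that would also match them.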
{NBOND    } repeat(word) [EMIN real        ]
{NONBONDED}              [RADIUS real      ]
                         [ALPHa real       ]
                         [NEFF real        ]
                         [CREPulsive real  ]
                         [CATTractive real ]
The NBOND command adds non-bonded energy parameters. The nonbonded energy function is
Non-bonded energy parameters are specified only by atom types, and mixed parameters are specified using the combination rules in the CHARMM paper; see section Introduction to Congen for the reference.
In each NBOND command, the words are atom type codes. The options have the following meanings:
The parameters can be specified in three different ways: by the 6-12 coefficients (CREPULSIVE and CATTRACTIVE), by minimum energy (EMIN) and radius (RADIUS), or by radius (RADIUS), number of effective electrons (NEFF), and polarizability (ALPHA). The program does not check if you overspecify options, so pick one method and use it consistently.
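For orientation, the textbook relation between a 6-12 well and its two coefficients can be written as a short Python sketch. It assumes the generic form E(r) = A/r^12 - B/r^6 with a positive well depth and the separation at the minimum as inputs. Whether CONGEN's EMIN carries a sign, how RADIUS relates to the pair separation, and how the combination rules enter are not covered here, so treat the mapping of A and B onto CREPULSIVE and CATTRACTIVE as an assumption. The function names are hypothetical.

```python
def lj_coefficients(well_depth, r_min):
    """Return (A, B) for E(r) = A/r**12 - B/r**6.

    well_depth : positive depth of the minimum
    r_min      : separation at which the minimum occurs
    """
    c_repulsive = well_depth * r_min ** 12
    c_attractive = 2.0 * well_depth * r_min ** 6
    return c_repulsive, c_attractive


def lj_energy(r, c_repulsive, c_attractive):
    """Evaluate the 6-12 energy at separation r."""
    return c_repulsive / r ** 12 - c_attractive / r ** 6
```

Substituting r = r_min gives A/r_min^12 - B/r_min^6 = well_depth - 2*well_depth, which is the expected minimum of -well_depth.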
DEFAULT [IMPRoper [COSIne  ] [NOSYmmetry] END]
                  [HARMonic] [SYMMetry  ]
        [HBOND [H-A] END]
               [D-A]
        [NBOND [VDW14 real] [EL14 real] [HBEXclude] END]
                                        [HBINclude]
The DEFAULT command is used to set defaults which pertain to how some of the energy terms are calculated. These defaults are set in the parameter file because the parameters are developed as an integrated whole. Settings in the DEFAULT command are an integral part of any parameter file.
From the syntax, it can be seen that there are three different energy terms to which these defaults can apply. The IMPROPER options control the following aspects of the improper torsion energy:
The HBOND default options control which distance is used in the hydrogen bond energy. If D-A is specified, the distance is calculated between the heavy atom donor and the acceptor, and the angular term is included. In addition, the parameterization is done based on the heavy atom donor and acceptor. This is the CHARMM form. If H-A is specified, the distance is calculated between the hydrogen and the acceptor, and no angular term is included. The parameterization is done based on the hydrogen and the acceptor. The default is D-A.
The NBOND default options control scaling for 1-4 interactions and the inclusion of van der Waals energies for hydrogen bond pairs. 1-4 interactions are non-bonded interactions of atoms connected by three bonds (see section NBXMOD -- Automatic Generation of Non-bonded Exclusions, for more information). The VDW14 keyword sets the scale factor for the van der Waals energy of 1-4 interactions. The EL14 keyword sets the scale factor for 1-4 electrostatic interactions. The default is 1.0 for both of these scale factors. In the AMBER potential, they are set to 0.5. The HBINCLUDE keyword specifies that van der Waals interactions will be calculated for atoms involved in hydrogen bonds. This is the default. The HBEXCLUDE keyword specifies that van der Waals interactions will be turned off for all possible atom pairs specified as possible hydrogen bonds. This is the default for the AMBER potential. Warning: you must ensure that the hydrogen bond distance cutoff is positive when this option is in use. Otherwise, it is possible to generate infinite energies if a charged hydrogen and its acceptor get too close together.
PRINT [ON ]
      [OFF]
The PRINT command turns on the echoing of commands in the parameter file and the display of all non-bonded parameters. It is useful for debugging. It is off by default.
END
The END statement terminates the parameter file.
In the format for the text parameter file, the data is divided into sections beginning with a keyword line and followed by data lines. The sections may be arranged in any order, and may be divided up as well. Just prefix each set of data with the appropriate keyword. The format for each data section follows, along with the necessary keywords. Please look at the parameter input files in the `CGDATA' directory for examples.
Keyword   Format
BOND    - atom atom force_constant distance                       (2(A4,1X),2F10.0)
THETA   - atom atom atom force_constant theta_min                 (3(A4,1X),2F10.0)
PHI     - atom atom atom atom force_constant periodicity phi_max  (4(A4,1X),3F10.0)
IMPHI   - atom atom atom atom force_constant i_phi_min            (4(A4,1X),2F10.0)
NBOND   - atom polarizability n_effective_electrons vdW_radius    (A4,1X,3F10.0)
HBOND   - atom atom well_depth distance                           (2(A4,1X),2F10.0)
Note that the data lines are NOT free field. However, you can add comments using the exclamation point, see section Controlling a CONGEN Run. Sections end with the occurrence of another keyword or a line with the word END in columns 1-3, the latter terminating parameter reading. Blank lines are allowed in all the sections.
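The section structure of this fixed-column text format can be sketched in Python as follows. The sketch groups data lines under their keyword, strips exclamation point comments, skips blank lines, and stops at END. It assumes that a keyword line is recognized by its first word and that a comment runs from the exclamation point to the end of the line; the function names `split_sections` and `parse_bond` are hypothetical, and only the BOND layout 2(A4,1X),2F10.0 is decoded.

```python
KEYWORDS = {"BOND", "THETA", "PHI", "IMPHI", "NBOND", "HBOND"}


def split_sections(lines):
    """Group data lines by section keyword, stopping at END."""
    sections = {}
    current = None
    for raw in lines:
        line = raw.split("!", 1)[0].rstrip()  # drop trailing comment
        if not line.strip():
            continue  # blank lines are allowed everywhere
        if line[:3] == "END" and line[3:4] in ("", " "):
            break  # END in columns 1-3 terminates parameter reading
        first_word = line.split()[0]
        if first_word in KEYWORDS:
            current = first_word
            sections.setdefault(current, [])  # sections may be split up
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_bond(line):
    """Decode one BOND data line laid out as 2(A4,1X),2F10.0."""
    line = line.ljust(30)
    return (
        line[0:4].strip(),    # first atom type
        line[5:9].strip(),    # second atom type
        float(line[10:20]),   # force constant
        float(line[20:30]),   # equilibrium distance
    )
```

Because only trailing whitespace is removed, the column positions of each data line are preserved for the fixed-format decoding step.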
Some errors in the input file will result in warning messages but not termination of the run.
CONGEN will check for duplicate parameters. If all the corresponding values for a duplicate parameter are the same, then only a warning message is issued. Otherwise, an error message will be issued.
Any errors detected in the reading of the formatted parameter file will result in termination of the run, unless NOFAIL is specified on the READ command.
phi_max is either 0.0 or 180.0 for dihedrals with the minimum staggered or eclipsed respectively.
If successive torsion angle or improper torsion angle parameters are specified with all four atoms and have the same atoms, this is a flag that the energy is to be computed as a sum of these multiple terms. For this special processing to be done, the PSF (or topology file used to generate the PSF) must have successively equal torsion or improper torsion angles which correspond to the parameters. In order to use this option, you must specify NOFAIL on the command line.
NBOND parameters must be present for all of the atom types. The program attempts to check this when reading either card image or binary parameter files.
Here is a description of what is in residue topology files (as they are stored in text files). You may use this format if you specify the CARD option in the READ command. The format of binary files depends on the current implementation of the RTF data structure. See the file `RTF.FCM' in the CONGEN source directory for more details. These files are read by RTFRDR, a subroutine in RTFIO, which should be consulted for formats and the final word on what is actually done with these files.
The purpose of residue topology files is to store the information for generating a representation of a macromolecule from its sequence. For each residue, CONGEN requires a description of all bonds, bond angles, dihedral angles, improper torsion angles, partial charges, chemical types, and hydrogen bond donors and acceptors. By linking residues in the sequence together, segments in the molecule are constructed.
The linkage between successive residues is determined when the segments are generated (see section The Generate Command - Construct a Segment of the PSF). It is specified in the residue by using special prefixes on the atom names which refer to residues either ahead of or behind the current residue. In the case of cyclic segments, the program will wrap references around the cycle.
The residue topology files begin with `rtop'. There are two forms, binary module (`.mod') and card format (usually `.inp'). The card format files are used only for creating binary modules and therefore are structured as input files for CONGEN, beginning with a run title and the command READ RTF CARD, followed by the actual topology file.
The first section of the topology files is a title section in the usual format of up to ten lines delimited by a line containing only a * in column 1.
The next line is a set of up to 20 numbers, of which the first gives the topology file format version number. This number must be set to 200 for CONGEN to read the remainder of the file correctly in free field format. If some other number is present or the number is missing, the program will attempt to read the topology file in the current format.
The remaining information is read in free field format as commands to define the RTF. The ordering of the commands is important in that some information is needed to define others (i.e. the atoms of a residue should be defined before the bonds between them). The recommended structure of this file is:
Initial setup:
    TYPE declaration
    MASS specification for each atom type (also hydrogen bond donor and acceptor classifications)
    DECLarations of out of residue definitions
    ORDEr specification for atom order.
    SET command for charge patching
For each residue:
    RESIdue name and total charge specification
    ATTRibute option to specify ATOM definitions within this residue
    BOND specifications
    ANGLe specifications
    DIHEdral angle specifications
    IMPRoper dihedral angle specifications
    DONOr specifications
    ACCEptor specifications
    BUILD information
    GROUPing definitions
    GENErate options
    COPY option to copy information from other residues
Closing:
    END statement
Display control:
    PRINT option
The format above is not rigid.
There exists the facility to automatically generate the bond angles, torsion angles, hydrogen bond donors and acceptors, and the BUILD information. It is also possible to delete terms that are generated automatically, and therefore, it is possible to correct any deficiencies in the automatic schemes.
It is important to understand how to make references to adjacent residues when constructing a topology file. The following table lists the possible prefixes which may used. Note that the actual atoms referenced are not determined until the generation of segments, see section The Generate Command - Construct a Segment of the PSF.
No reference will be made beyond the end of a segment. In the case of a cyclic peptide, CONGEN will wrap around the sequence in the appropriate direction.
       { PROT }
       { HPRO }
TYPE   { ALLH }
       { DNA  }
       { UNKN }
This option sets the type of the residue topology file. When a segment is generated, the program will check to see that the sequence type matches the RTF type, provided that a sequence type is specified.
MASS int word real [ACCEptor] [DONOr ]
The MASS command specifies the chemical types of atoms, their names, their masses, and optionally whether they are hydrogen bond donors or acceptors. This command is one of the most important in the topology file because it specifies all the permissible atoms in any system.
The int is the numerical chemical type code as used in the parameter file, see section The Format of Parameter Files. Its value may not exceed the parameter MAXATC, which is currently 100. The word is the chemical type name, and this symbol is used in the parameter file. It can also be referenced when analysis tables are built, see section Syntax of the BUILD Command. The real number specifies the mass of each atom type in Atomic Mass Units. Finally, the optional keywords, ACCEPTOR and DONOR, indicate when the atom can participate in a hydrogen bond as an acceptor or donor, respectively. These final keywords are used only by the RTF GENERATE command, see section RTF Generate Command.
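As an illustration of the operands of the MASS command, the following Python sketch parses one MASS line into a record. The function name parse_mass is hypothetical, and the sketch assumes that keywords may be abbreviated to their first four letters, as the capitalization in the syntax suggests. The limit of 100 is the value of MAXATC quoted above.

```python
MAXATC = 100  # limit on the chemical type code quoted in the text


def parse_mass(line):
    """Parse 'MASS int word real [ACCEptor] [DONOr]' into a dictionary."""
    words = line.split()
    if len(words) < 4 or not words[0].upper().startswith("MASS"):
        raise ValueError("not a MASS command: " + line)
    code, name, mass = int(words[1]), words[2], float(words[3])
    if code > MAXATC:
        raise ValueError("chemical type code exceeds MAXATC")
    # Assumption: optional keywords are recognized by their first four letters.
    flags = [w.upper() for w in words[4:]]
    return {
        "code": code,
        "name": name,
        "mass": mass,
        "acceptor": any(f.startswith("ACCE") for f in flags),
        "donor": any(f.startswith("DONO") for f in flags),
    }


record = parse_mass("MASS 38 NH1 14.00670 DONOR")
```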
DECLARE word
When a formatted RTF file is read, CONGEN will check to see that all components of a residue refer to atoms within that residue, and will issue an informational message if they are not. However, since all polymeric structures will have linkages between residues, there will be atoms which refer outside of each residue.
The DECLARE command informs the program that an atom whose name is word is a linkage atom, and that CONGEN should not issue a message for such atoms. Aside from these messages, the DECLARE command is optional.
SET word real
When residues in the topology file are patched when a segment is made from them, some atomic charges must be adjusted. Currently, the program calculates the correct charges for the explicit hydrogen and extended atom topology files, see section Residue Topology Files. However, with other topology files, it does not.
The SET command provides a mechanism for assigning these charges in the topology file. The user specifies a variable name as the first operand in the command, and a charge value or adjustment in the second. These variable name, charge value pairs are stored with the topology file. If no variable name is available, then a default value is used which will give the correct values for the explicit hydrogen topology files.
The following table gives the currently used variables, default values, and their meaning.
Variable          Default   Meaning
CG_C3_PRIME        0.084    Charge increment for the 3' carbon in DNA.
CG_C5_PRIME        0.092    Charge increment for the 5' carbon in DNA.
CG_DISU_CB         0.022    Charge for the beta carbon in a disulphide linkage.
CG_DISU_SG        -0.032    Charge for the gamma sulfur in a disulphide linkage.
CG_FIRSTCA         0.020    Charge increment for the amino terminal alpha carbon.
CG_FIRSTN          0.710    Charge increment for the amino terminal nitrogen in a protein.
CG_LASTC           0.030    Charge increment for the carboxy terminal carbonyl carbon.
CG_LASTO          -0.200    Charge increment for the carboxy terminal carbonyl oxygen.
CG_O3_PRIME        0.163    Charge increment for the 3' oxygen in DNA.
CG_O5_PRIME        0.147    Charge increment for the 5' oxygen in DNA.
CG_PRO_FIRSTN      0.085    Charge of the amino nitrogen in an amino-terminal proline.
                            NOTE: the presence of this variable signifies that
                            prolines get special treatment. If you omit it, then
                            none of the CG_PRO variables will have any effect.
CG_PRO_FIRSTCD     0.079    Charge of the delta carbon in an amino-terminal proline.
CG_PRO_FIRSTCA     0.095    Charge of the alpha carbon in an amino-terminal proline.
CG_PRO_IHT1        0.225    Charge of the first amino terminal peptide hydrogen in an
                            amino-terminal proline.
CG_PRO_IHT2        0.225    Charge of the second amino terminal peptide hydrogen in an
                            amino-terminal proline.
This particular mechanism used for charges should be viewed as a temporary measure. There are plans to replace it with a more robust patching scheme.
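The lookup behaviour of the SET command, a stored value if one was given and a default otherwise, can be pictured with the Python sketch below. The names DEFAULT_CHARGES, parse_set and charge_variable are hypothetical, only a few of the non-proline variables are shown, and the sketch says nothing about how CONGEN stores the pairs internally.

```python
# A few defaults copied from the documented variable list; the remaining
# variables would follow the same pattern.
DEFAULT_CHARGES = {
    "CG_FIRSTCA": 0.020,
    "CG_FIRSTN": 0.710,
    "CG_LASTC": 0.030,
    "CG_LASTO": -0.200,
}


def parse_set(line, overrides):
    """Record 'SET word real' in the overrides kept with the topology file."""
    _, name, value = line.split()
    overrides[name] = float(value)


def charge_variable(name, overrides):
    """Return the value given by SET, falling back to the default."""
    return overrides.get(name, DEFAULT_CHARGES[name])


overrides = {}
parse_set("SET CG_LASTO -0.250", overrides)
last_o = charge_variable("CG_LASTO", overrides)    # taken from the SET command
first_n = charge_variable("CG_FIRSTN", overrides)  # falls back to the default
```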
ORDER repeat(word)
Certain operations within CONGEN expect the atoms in a residue to appear in a given order. Examples of such commands are the RANGE options in an atom selection, see section Atom Selection, or the reading of binary coordinate files, see section Reading coordinates. The ORDER command permits you to specify the atom order for all current and succeeding residues in the topology file.
Each word in the ORDER command is interpreted as a wild card atom selection token, see section Atom Selection. When each residue is completed, all the atoms in the residue are matched against the words in the ORDER command. An exact match takes precedence over a wildcard match, and the last match takes precedence over earlier matches. The order of the atoms in the residue is then rearranged based on the matches into these words. If a set of atoms match the same word in the ORDER command, then these will be ordered according to their original order in the topology file.
As an example, the command
ORDER N H CA * C O
will put the amino nitrogen and hydrogen and the alpha carbon of an amino acid first, the sidechain atoms in the middle, and the carbonyl carbon and oxygen at the end of each residue.
The ORDER command takes effect starting with the next residue completed. Thus, at the beginning of a file, it affects the entire file. If specified after a RESIDUE command, it will affect that residue.
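The matching rules of the ORDER command can be sketched in Python as follows. This is an illustration of the rules as stated, not CONGEN's code: the function name order_atoms is hypothetical, wildcard matching is approximated with the standard fnmatch module (CONGEN's wild card tokens may differ), and atoms matching no word are assumed to be placed last.

```python
from fnmatch import fnmatchcase


def order_atoms(atoms, words):
    """Reorder atom names according to the words of an ORDER command."""

    def slot(atom):
        # An exact match takes precedence over a wildcard match.
        exact = [i for i, w in enumerate(words) if w == atom]
        if exact:
            return exact[-1]  # the last match wins
        wild = [i for i, w in enumerate(words) if fnmatchcase(atom, w)]
        # Assumption: atoms that match no word are placed after all others.
        return wild[-1] if wild else len(words)

    # sorted() is stable, so atoms sharing a word keep their original order.
    return sorted(atoms, key=slot)


atoms = ["CA", "O", "CB", "N", "C", "H", "HB1"]
ordered = order_atoms(atoms, "N H CA * C O".split())
```

With the ORDER command from the example, the backbone atoms N, H and CA come first, the sidechain atoms matched by the wildcard stay in their original relative order, and C and O come last.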
RESIDUE word [real]
The RESIDUE command is used to start a new residue. If another residue has already been specified, then it is completed. The word in the command specifies the residue name, and the optional real number specifies the total electrostatic charge of the residue. If no total charge is specified, then 0.0 is assumed.
COPY word [INVERT]
The COPY command is used to copy the information stored for a previous residue into the current residue. The name of the previous residue is given by the word following the COPY verb. It can only be used at the beginning of a residue specification. The INVERT option causes the program to invert all the torsion angles specified in the BILD commands, see section RTF Build Command.
The COPY command is most useful when two residues are nearly identical, and you do not wish to keep multiple copies of identical information. For example, it is used in the specification of D amino acids.
ATTRIBUTE [DELETE] repeat(word)
The ATTRIBUTE command is used to specify residue attributes. Each word following the ATTRIBUTE command is added to the residue attribute list, unless the DELETE option is specified. In that case, the words are deleted from the residue attribute list.
At present, the only residue attribute that has any significance is the D attribute, which informs the conformational search code, see section Conformational Search, that the residue is a D amino acid, and should be processed accordingly.
ATOM iupac word real repeat(iupac)
The ATOM command specifies the atoms in a residue. The first word in the ATOM command specifies the IUPAC name of the atom. The second word gives the chemical type name as specified in the second operand of the MASS command, see section RTF Mass Command. The third operand specifies the partial charge of the atom as a real number.
Finally, the remaining words specify the names of atoms which are to be excluded from non-bonded interactions. Such exclusions are made because they are directly bonded or separated by only two covalent bonds. Note that 1-2 and 1-3 non-bonded exclusions can be constructed automatically at segment generation time, see section NBXMOD -- Automatic Generation of Non-bonded Exclusions.
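The notion of excluding atoms that are directly bonded (1-2) or separated by two covalent bonds (1-3) can be made concrete with a short Python sketch. It derives such pairs from a bond list only; it is not the NBXMOD implementation, and the function name exclusions is hypothetical.

```python
from collections import defaultdict


def exclusions(bonds):
    """Return sorted 1-2 and 1-3 atom pairs implied by a list of bonds."""
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    pairs = set()
    for a in neighbours:
        for b in neighbours[a]:
            pairs.add(frozenset((a, b)))          # 1-2: directly bonded
            for c in neighbours[b]:
                if c != a:
                    pairs.add(frozenset((a, c)))  # 1-3: two bonds apart
    return sorted(tuple(sorted(p)) for p in pairs)


# Bonds of a water residue: oxygen OH2 bonded to hydrogens H1 and H2.
water = exclusions([("OH2", "H1"), ("OH2", "H2")])
```

For water this yields the pairs OH2-H1, OH2-H2 and H1-H2, which is the same set that would be written by listing H1 and H2 after the oxygen and H2 after H1 in the ATOM commands.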
BOND [DELETE] repeat(iupac iupac)
The BOND command is used for specifying the bonds in a residue. Each pair of iupac atom names specifies a bond. Atoms outside the current residue can be specified using the scheme described in section Linkage Atom Naming.
The DELETE option causes CONGEN to delete the named bonds from the current residue. This option is useful when the COPY command is used.
{ ANGLE }  [DELETE] repeat(iupac iupac iupac)
{ THETA }
The ANGLE command, and its synonym, THETA, are used to specify bond angles in a residue. Each triple of iupac names specifies a bond angle. The RTF Generate Command, see section RTF Generate Command, can be used to generate the bond angles within a residue automatically, but bond angles involving atoms outside the current residue must always be specified "by hand".
The DELETE option causes CONGEN to delete the named angles from the current residue. This option is useful when the COPY command or the automatic generation of angles is used.
{ TORSION  }  [DELETE] repeat(iupac iupac iupac iupac)
{ DIHEDRAL }
The TORSION command, and its synonym, DIHEDRAL, are used to specify torsion angles in a residue. Each quadruple of iupac names specifies a torsion angle with the middle pair of atoms defining the bond being rotated (and used to choose parameters). When the parameter file contains dihedral angles specified by all four atoms, every dihedral angle is first checked to see if it matches any of this type. If so, then the four atom parameter values are used. If a particular four atom dihedral is specified twice in adjacent positions, then it is assumed that the corresponding parameter file specifies two separate parameter values for this four atom dihedral, and both will be used.
The RTF Generate Command, see section RTF Generate Command, can be used to generate the torsion angles within a residue automatically, but torsion angles involving atoms outside the current residue must always be specified "by hand".
The DELETE option causes CONGEN to delete the named torsion angles from the current residue. This option is useful when the COPY command or the automatic generation of torsion angles is used.
{ IMPROPER }  [DELETE] repeat(iupac iupac iupac iupac)
{ IMPHI    }
The IMPROPER command, and its synonym, IMPHI, are used to specify improper dihedrals in a residue. Each quadruple of iupac names specifies an improper dihedral where the first atom is the atom being kept planar or chiral. As with the proper dihedral, four atom specifications may be used and when an improper dihedral is repeated, multiple parameter values will be sought.
There does not exist any automatic mechanism for constructing improper dihedrals. It is entirely a function of the parameters used to build the residues.
The DELETE option causes CONGEN to delete the named improper dihedral angles from the current residue. This option is useful when the COPY command is used.
DONOR [DELETE] [ iupac ] iupac [ iupac iupac ]
HBUILD, to construct the hydrogens automatically.(2)
Note that if the first atom is outside of the current residue, CONGEN will assume it is a hydrogen. Also note that only one hydrogen donor can be specified at a time. This is different from the previous commands.
Hydrogen bond donors can be automatically constructed, but only within the current residue, and without the construction of antecedents. The DELETE option may be used to remove hydrogen bond donors introduced by either the automatic generation scheme, or by the COPY command.
ACCEPTOR [DELETE] iupac [iupac [iupac] ]
The DELETE option may be used to remove an automatically generated acceptor or one copied using the COPY command.
{ BILD  }  [DELETE] iupac iupac iupac iupac real real real real real
{ BUILD }
The BILD command is used to specify the rules for constructing atoms. The first four operands are atom names, and they specify a set of four atoms which are linked together, which are referred to as I, J, K, and L. If the third atom is prefixed by an asterisk, then the rule is an improper torsion, where atom K is in the middle, and atoms I, J, and L are bound to it. Otherwise, the rule is a proper torsion, and the four atoms are bound together in a chain.
In the case of a proper torsion, the five real numbers specify the bond length between I and J, the bond angle between I, J and K, the torsion angle between I, J, K, and L, the bond angle between J, K, and L, and the bond length between K and L. In the case of an improper torsion, the five real numbers specify the bond length between I and K, the bond angle between I, K, and J, the improper torsion between I, J, K, and L, the bond angle between J, K and L, and the bond length between K and L. Units for bonds are in Angstroms; units for angles are in degrees.
In either case, the rules can be used to construct the positions of atoms I or L depending on the positions of the remaining three atoms. If either bond length or either bond angle is specified as 0.0, then the program will search the parameter files for the values to use. The torsion angles are used as specified, unless they are edited by an internal coordinate editing command, see section The Internal Coordinate Commands.
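To show how a proper torsion rule determines the position of atom L from atoms I, J and K, here is a Python sketch of the standard geometric construction. It is not taken from CONGEN: the function names are hypothetical, the torsion sign follows the IUPAC convention (an assumption about CONGEN's convention), and the numerical values in the example are arbitrary.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def place_l(pi, pj, pk, bond_kl, angle_jkl, torsion_ijkl):
    """Position of L from I, J, K, the K-L bond, the J-K-L angle and the torsion."""
    theta = math.radians(angle_jkl)
    phi = math.radians(torsion_ijkl)
    bc = unit(sub(pk, pj))               # unit vector along the J-K bond
    n = unit(cross(sub(pj, pi), bc))     # normal to the I-J-K plane
    m = cross(n, bc)                     # completes the local frame
    return add(pk, add(scale(bc, -bond_kl * math.cos(theta)),
                       add(scale(m, bond_kl * math.sin(theta) * math.cos(phi)),
                           scale(n, bond_kl * math.sin(theta) * math.sin(phi)))))


def dihedral(pa, pb, pc, pd):
    """Torsion angle A-B-C-D in degrees, used here to check the construction."""
    b1 = unit(sub(pc, pb))
    v = sub(sub(pa, pb), scale(b1, dot(sub(pa, pb), b1)))
    w = sub(sub(pd, pc), scale(b1, dot(sub(pd, pc), b1)))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


p_i, p_j, p_k = (1.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)
p_l = place_l(p_i, p_j, p_k, 1.33, 116.0, -60.0)
```

Constructing atom I from J, K and L is the same operation with the atom order reversed and the other bond length and bond angle of the rule.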
The DELETE option may be used to remove a BILD rule either automatically generated, or copied using a COPY command.
GROUP name first-atom-iupac last-atom-iupac
The GROUP command was incorporated to provide electrostatic groups. It is not used.
                 { ANGLES               }
                 { THETAS               }
                 { { TORSIONS } [ ALL ] }
GENERATE repeat( { { PHIS     } [ ONE ] } )
                 { DONORS               }
                 { ACCEPTORS            }
                 { BILDS                }
                 { BUILDS               }
                 { ALL                  }
The GENERATE command may be used to automatically generate parts of the topology file. Angles, torsions, donors, acceptors, and IC constructors may be generated either singly or in any combination. ALL specifies that all of these entities be constructed. Automatic generation is only performed with the atoms defined within a residue. Atoms involved with linkages to other atoms are not used in the automatic generation process, and must be generated "by hand".
The bond angle construction works by examining the current list of bonds for the current residue, and generating angles for all pairs of bonds. The torsion angle construction can work in either of two ways. If the ALL suboption is specified, then for each bond, a torsion angle is constructed for all combinations of atoms attached to this bond. If the ONE suboption is specified, the program will first look for a pair of heavy atoms attached to the central bond, and failing that, it will repeat the search looking at all atoms including hydrogens. The donor and acceptor generation depends on the user specifying which atom types are donors or acceptors in the MASS command above.
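The angle generation and the ALL form of torsion generation can be sketched in Python as below. The sketch follows the description only (angles from all pairs of bonds sharing an atom, torsions from all combinations of atoms attached to each bond); the function names are hypothetical, the ONE suboption is not covered, and the order of the generated terms is not meant to match CONGEN's.

```python
from itertools import combinations


def neighbour_table(bonds):
    """Map each atom to the list of atoms bonded to it."""
    table = {}
    for a, b in bonds:
        table.setdefault(a, []).append(b)
        table.setdefault(b, []).append(a)
    return table


def generate_angles(bonds):
    """One angle (a, centre, c) for every pair of bonds sharing an atom."""
    angles = []
    for centre, attached in neighbour_table(bonds).items():
        for a, c in combinations(attached, 2):
            angles.append((a, centre, c))
    return angles


def generate_torsions_all(bonds):
    """One torsion (i, j, k, l) for every pair of atoms attached to bond j-k."""
    table = neighbour_table(bonds)
    torsions = []
    for j, k in bonds:
        for i in table[j]:
            if i == k:
                continue
            for l in table[k]:
                if l != j and l != i:
                    torsions.append((i, j, k, l))
    return torsions


# Bonds within an alanine residue, linkage atoms left out.
ala_bonds = [("N", "CA"), ("CA", "C"), ("C", "O"), ("N", "H"), ("CA", "CB")]
ala_angles = generate_angles(ala_bonds)
ala_torsions = generate_torsions_all(ala_bonds)
```

Because linkage atoms such as -C and +N are left out of the bond list, angles and torsions involving them do not appear, in keeping with the restriction to atoms defined within the residue.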
The generation of IC constructors is the most difficult generation task, and it is done primarily to provide a set of constructors which can be edited for better results. The program will first find three atoms connected together. Next, it will look at the central atom and see if an adjacent atom needs to be constructed. If so, it will generate an improper torsion constructor. It will then recursively try the new atom in conjunction with the previous two. Once these constructions complete, the program will attempt to add atoms on each end of the original three atoms, and will recurse on any successful constructions.
Because not all of these automatic generation commands work perfectly, there exist two mechanisms to edit the results. First, the RTF may be written out in free field format using a WRITE RTF CARD command and the result can be edited. Second, the DELETE options in other commands may be used to delete particular entries after they are generated. Also, additional entries for a residue may be made after the GENERATE command is given.
Note that non-bonded exclusions are generated automatically by default. See section NBXMOD -- Automatic Generation of Non-bonded Exclusions, for more information.
PRINT { ON } { OFF }
The PRINT command may be used to control the display of lines as they are read by the RTF reader. The initial setting for printing is controlled by the READ command itself. If PRINT is specified, then printing will initially be enabled; otherwise, the commands will not be echoed. PRINT ON turns on echoing of RTF specifications; PRINT OFF turns them off. This command is useful for debugging an addition to a previously tested topology file.
This is a small RTF example.
* title for documentation example
*
   200    1
TYPE HPRO
MASS     1 H      1.00800
MASS    11 C     12.01100
MASS    12 CH1E  13.01900
MASS    13 CH2E  14.02700
MASS    14 CH3E  15.03500
MASS    31 N     14.00670
MASS    38 NH1   14.00670
MASS    51 O     15.99940
MASS    56 OH2   15.99940
DECL -C
DECL -O
DECL +N
DECL +H
DECL +CA
RESI ALA     0.00000
ATOM N    NH1    -0.20000  H    CA   CB   C
ATOM H    H       0.12000  CA
ATOM CA   CH1E    0.07500  CB   C    O    +N
ATOM CB   CH3E    0.02000  C
ATOM C    C       0.35000  O    +N   +H   +CA
ATOM O    O      -0.36500  +N
BOND N    CA        CA   C         C    +N        C    O         N    H
BOND CA   CB
THET -C   N    CA        N    CA   C         CA   C    +N
THET CA   C    O         O    C    +N        -C   N    H
THET H    N    CA        N    CA   CB        C    CA   CB
DIHE -C   N    CA   C         N    CA   C    +N        CA   C    +N   +CA
IMPH N    -C   CA   H         C    CA   +N   O         CA   N    C    CB
DONO H    N    -C   -O
ACCE O
BILD -C   CA   *N   H      0.0000    0.00  180.00    0.00   0.0000
BILD -C   N    CA   C      0.0000    0.00  180.00    0.00   0.0000
BILD N    CA   C    +N     0.0000    0.00  180.00    0.00   0.0000
BILD +N   CA   *C   O      0.0000    0.00  180.00    0.00   0.0000
BILD CA   C    +N   +CA    0.0000    0.00  180.00    0.00   0.0000
BILD N    C    *CA  CB     0.0000    0.00  120.00    0.00   0.0000
RESI OH2     0.00000
ATOM OH2  OH2    -0.40000  H1   H2
ATOM H1   H       0.20000  H2
ATOM H2   H       0.20000
BOND OH2  H1        OH2  H2
THET H1   OH2  H2
DONO H1   OH2
DONO H2   OH2
ACCE OH2
END
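When editing a topology file by hand, one consistency check that can be run outside CONGEN is to compare the total charge given on each RESI line with the sum of the partial charges on the ATOM lines that follow. The Python sketch below does this for the two residues of the example; the function name residue_charges is hypothetical, and whether CONGEN itself performs such a check is not stated here.

```python
def residue_charges(lines):
    """Map residue name to [declared total charge, sum of ATOM charges]."""
    totals = {}
    current = None
    for line in lines:
        words = line.split()
        if not words:
            continue
        key = words[0].upper()[:4]
        if key == "RESI":
            current = words[1]
            # The total charge is optional and defaults to 0.0.
            declared = float(words[2]) if len(words) > 2 else 0.0
            totals[current] = [declared, 0.0]
        elif key == "ATOM" and current is not None:
            totals[current][1] += float(words[3])
    return totals


example = [
    "RESI ALA     0.00000",
    "ATOM N    NH1    -0.20000  H    CA   CB   C",
    "ATOM H    H       0.12000  CA",
    "ATOM CA   CH1E    0.07500  CB   C    O    +N",
    "ATOM CB   CH3E    0.02000  C",
    "ATOM C    C       0.35000  O    +N   +H   +CA",
    "ATOM O    O      -0.36500  +N",
    "RESI OH2     0.00000",
    "ATOM OH2  OH2    -0.40000  H1   H2",
    "ATOM H1   H       0.20000  H2",
    "ATOM H2   H       0.20000",
]
for name, (declared, summed) in residue_charges(example).items():
    assert abs(declared - summed) < 1e-6, name
```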
The parameter files (PARAMETER) and internal coordinate files (IC) can be read as card images or binary files. Specifying CARD signifies card image input; specifying FILE signifies binary file input. Please note that the topology file must be read in before the parameters can be read. More information about the Internal Coordinate files can be found in their I/O routines, READIC and WRITIC, in the file `intcor.flx'.
Hydrogen bond files (HBOND), protein structure files (PSF), harmonic constraints (CONSTRAINT), and non bonded lists (NBOND) can only be read as binary files.
The Image file (IMAGES) containing transformation information can only be read in card image format. This is not to be confused with the Images data structure (see section Symmetry and Molecular Images).
WRITe { { PSF } [FILE] } UNIT unit-number { { HBONd } } { { PARAmeter } } { { NBONd } } { { CONStraint } } { } { { RTF } [FILE] } { [CARD] } { } { { IC } [CARD] } { [FILE] } { } { { COORdinate } [CARD ] coor-spec } { [FILE ] } { [KONNert ] } { [BROOkhaven] } { [BRKHvn ] } { } { IMAGes [CARD] } title coor-spec:== { [MAIN] } [ OFFS int ] [ HNUSe ] [ WRAP ] atom-selection { COMP } [NONHUSe] [NOWRAP] { DIFF }
The primary purpose of this command is to save some of CONGEN's data structures on file in unformatted form. In addition, the coordinate and internal coordinate data structures can be written in formatted form so that they can be edited independent of CONGEN using GNU Emacs or a similar text editor. The option, FILE, specifies that a file is to be written in unformatted form (binary). The option, CARD, specifies that a file is to be written in formatted form. For the coordinate and internal coordinate file, CARD is the default.
A set of title lines must follow the WRITE command. This title will be written at the start of the file and serves to document the file. For your protection, one should always make good use of this title, as it may be the only documentation for the file.
The UNIT keyword specifies what Fortran unit the output should be written to. It cannot be omitted.
Additional options are available for writing coordinates in text format. The option, KONN, will write the coordinates in Konnert-Hendrickson format. The synonymous options, BROOKHAVEN and BRKHVN, will write the coordinates in Brookhaven Protein Data Bank format. The option pair, HNUSE and NOHNUSE, controls whether the hydrogen on the peptide nitrogen is written with a name of `HN' or `H'. The default is NOHNUSE which uses `H'. The option pair, WRAP and NOWRAP, controls whether hydrogens which have a terminating digit are written with the terminating digit first. For example, the arginine atom, HH12, is written as `2HH1' if WRAP is enabled, and written as `HH12' if NOWRAP is enabled. The default is NOWRAP.
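The effect of WRAP on a hydrogen name can be shown in a few lines of Python. The function name wrap_hydrogen_name is hypothetical, and the sketch assumes that hydrogens are recognized by a name beginning with H, which may not be how CONGEN identifies them.

```python
def wrap_hydrogen_name(name, wrap=False):
    """Move a terminating digit of a hydrogen name to the front when wrap is set."""
    # Assumption: a hydrogen is any atom whose name starts with 'H'.
    if wrap and len(name) > 1 and name.startswith("H") and name[-1].isdigit():
        return name[-1] + name[:-1]
    return name


wrapped = wrap_hydrogen_name("HH12", wrap=True)      # WRAP
unwrapped = wrap_hydrogen_name("HH12", wrap=False)   # NOWRAP, the default
```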
PRINt { PSF } { RTF } { CONStraint } { PARAmeter } { RESIdue } { COORdinate coor-spec } { IC } { HBONd [ ANAL ] } { IMAGes } { NMR nmr-options } { FROM unit-number } coor-spec::= { [MAIN] } [ OFFS int ] atom-selection { COMP } { DIFF } nmr-options ::= [ALL ] [NONE] [[NO]NOE] [[NO]JCOUPLING] [[NO]TABLE] [[NO]SORT] [TOP int] [INDIVIDUAL int] [[NO]ENERGY] [[NO]FORCE] [[NO]VIOLATION] [MIN real] [MAX real] [ROWS int] [COLUMNS int]
Syntactic ordering: All commands must be typed in the order shown except for the nmr-options which can be in any order after the NMR option.
This command is used to list information contained in data structures used by the program or to list a formatted file. The information must already have been created through use of a READ, GENERATE, HBONDS, etc., command. The printable output is sent to unit 6.
If the FROM option is used, the PRINT command will print a formatted file onto unit 6. The file will be rewound after printing so it may be used again.
For hydrogen bonds, ANAL gives a geometrical and energy analysis of the hydrogen bonds. Representing the hydrogen bond as A2-A1-X-H....Y-, the distances X-Y, H-Y, the angle (180 - <(X-H-Y) ), the dihedral angle A2-A1-X-H and the hydrogen bond energy contribution are listed.
The PRINT NMR command invokes an analysis of the NMR constraints. There are a number of components in the analysis, and the various nmr-options control which components appear. If no options are specified then all components will be displayed. However, if there are any options, then only those specified by the user will be displayed. The keyword, ALL, may be used to turn on the display of all components, and then the user may modify the display with additional operands.
The keywords are interpreted as follows:
OPEN UNIT integer NAME filename [WRITE] [UNFORMatted] [READ] [FILE] [FORMatted] [CARD]
The OPEN command is used to open logical units to specific files specified from the input file rather than logical name assignments made prior to the run. This is useful in setting up test cases and interactive use of the program. OPEN can be used to redirect the output that appears on unit 6 to different files by opening unit 6 in the middle of a run. However, it may not be possible to restore unit 6 back on some machines, so be careful with this.
CLOSe UNIT integer [SAVE ] [KEEP ] [DELETE] [PRINT ]
The CLOSE command closes a logical unit. This frees the associated file and logical unit so that they can be used for other purposes. The default disposition of the file is SAVE or KEEP.
REWInd UNIT integer
The REWIND command causes the requested logical unit to be rewound. When used with the STREAM command, a particular sequence can be used more than once.
STREam UNIT integer
The STREAM command allows the input of command sequence to be shifted to another file. This is useful when parts of an input file are to be used many times or used by many different calculations. The only input value is the unit number to transfer to.
RETUrn
The RETURN command causes the input of command sequence to return to the stream that called the current stream. Streams may be nested up to 20 calls deep. There are no parameters for this command.
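The interplay of STREAM and RETURN behaves like a stack of unit numbers, which the following Python sketch models. The class StreamStack is hypothetical; it assumes that unit 5 is the initial command stream and that the limit of 20 counts STREAM calls above that initial stream.

```python
MAX_NESTING = 20  # nesting limit quoted in the text


class StreamStack:
    """Track which Fortran unit commands are currently being read from."""

    def __init__(self, initial_unit=5):
        # Assumption: the run starts reading commands from unit 5.
        self.units = [initial_unit]

    @property
    def current(self):
        return self.units[-1]

    def stream(self, unit):
        """STREAM UNIT n: shift command input to another unit."""
        if len(self.units) - 1 >= MAX_NESTING:
            raise RuntimeError("streams nested too deeply")
        self.units.append(unit)

    def ret(self):
        """RETURN: go back to the stream that called the current one."""
        if len(self.units) == 1:
            raise RuntimeError("RETURN with no calling stream")
        self.units.pop()


stack = StreamStack()
stack.stream(10)   # STREAM UNIT 10
stack.stream(11)   # STREAM UNIT 11, issued from within unit 10
stack.ret()        # RETURN brings input back to unit 10
```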