
How to Use CONGEN

This chapter contains an overview of how to use CONGEN. CONGEN executes commands as they are read sequentially from a command file. In general the ordering of these commands is limited only by the requirement that each command must have all of its prerequisite data. For example, the energy cannot be calculated unless the arrays holding the coordinates, the parameters, etc., have already been filled. It is up to the user of the program to ensure that everything is present, so it is vital to understand how the program works in order to use it correctly.

Although this manual is extensive, it does not provide much tutorial help in resolving questions about the program's operations. Also, there is not much guidance on how to deal with errors. The full source code for CONGEN is provided with the program, and it should be consulted if this manual does not answer questions that arise when using the program.

This manual is available in on-line form. If you use GNUemacs and if the INFO form of this documentation has been installed, then the manual can be perused while you edit CONGEN input files. See section Installation of CONGEN on UNIX, or section Installation of CONGEN on VMS, for more information about installing the on-line documentation.

Controlling a CONGEN Run

A CONGEN run is controlled by a command file. The user specifies operations to be performed via these commands, and CONGEN executes them sequentially. There are also commands for changing the file from which input is taken; see section STREAM Command.

A legal command file for CONGEN begins with a specification of the title of the run. See section Glossary of Syntactic Terms, for the syntax of a title. Then, any number of commands may be specified.

Each command consists of a command line possibly followed by other data. The command line is scanned free field. A command line may be longer than one line in the file; to continue it, place a hyphen, -, at the end of each line which is to be continued on the next line. Comments may be placed on a command line by preceding them with an exclamation point. Note that comments may be placed after a hyphen, because the comment is removed before the hyphen is checked for. All lower case characters are converted to upper case. This format is identical to that used by the VAX command language interpreter. In addition, blank lines are permitted to separate blocks of commands for increased readability.
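The line handling described above can be sketched in Python. This is only an illustration of the stated rules, not CONGEN's actual reader: the function name `read_command_lines` is hypothetical, and joining continued lines with a single blank is an assumption.

```python
def read_command_lines(lines):
    """Yield logical command lines from raw input lines.

    Order of operations follows the text: the comment is removed first,
    then the trailing hyphen is checked, then case is folded.
    Assumption: continued pieces are joined with one blank.
    """
    pending = []
    for raw in lines:
        text = raw.split("!", 1)[0].rstrip()    # drop the comment first
        if text.endswith("-"):
            pending.append(text[:-1])           # continued on the next line
            continue
        pending.append(text)
        logical = " ".join(p.strip() for p in pending if p.strip())
        pending = []
        if logical:                             # blank lines are skipped
            yield logical.upper()
    if pending:                                 # input ended mid-continuation
        logical = " ".join(p.strip() for p in pending if p.strip())
        if logical:
            yield logical.upper()


example = [
    "plot horiz 1.0 10.3 60 -   ! axis range",
    "  title bond energy per residue $",
    "",
    "! a comment-only line",
]
commands = list(read_command_lines(example))
```

Here `commands` holds the single logical line `PLOT HORIZ 1.0 10.3 60 TITLE BOND ENERGY PER RESIDUE $`.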

Generally, when a free field command line is read in, it is echoed onto Fortran unit 6. Each such echo is prefixed with a short marker, e.g. CONGEN>, which identifies the line of input as well as the command processor which is interpreting it.

The command line is scanned in units of words and delimited strings. A word is defined as a sequence of non-blank characters. A delimited string consists of a keyword followed by a string of characters of variable length followed by a delimiter string. The delimiter string is either a single delimiter character, two such characters concatenated together with no space in between, or the keyword, END. The initial value for the delimiter is $. It may be changed with the DELIm command, see section Set Delimiter Command -- DELIM.

The first word of every command line specifies the command. Generally, required operands of a command must follow in a precise order. On the other hand, options may generally be specified in any order. Further, any number is always preceded by a key word so that numeric operands can be placed in arbitrary order.

Abbreviations are permitted in various contexts. The first word may be abbreviated to four characters except for commands to the analysis facility. Numerous options and operands may also be abbreviated to four characters. However, key words which are used to mark numbers may never be abbreviated. See the description for individual commands to see what can and cannot be abbreviated.

In general, as each element of the command is interpreted, it is deleted from the command line. When command processing is finished, a check is made to see that nothing is left over. The presence of extraneous junk indicates that something was mistyped.

Some of the options and numeric values are maintained from one invocation of a command to the next. This is done with the energy manipulation options, see section Minimization and Dynamics. Other options have no memory and must be respecified each time they are changed from their default values. Thus the safest approach is to respecify all options whenever they are used.

Although the command line of every command is processed free field, there are commands (especially ones implemented in the early days of CHARMM) which are followed by data in a fixed format. In general (and there are many exceptions), such commands expect integers as I5 and floating point numbers as F10.n. In the event that the documentation is not clear or non-existent on such matters, the source code can be consulted.

Rules for Describing the Syntax (The Meta-Syntax)

The syntax of commands is described using the following rules:

  1. Capitalized words are keywords that must be specified as is. However, if the word is partially capitalized, it may be abbreviated to the capitalized part.

  2. Lower case words are to be replaced by a corresponding data entry.

  3. The symbol `::=' means "has the following syntactic form:".

  4. Anything enclosed in square brackets, `[]', is optional. If several things are stacked in square brackets, one may choose one optionally.

  5. Anything enclosed in curly brackets, `{}', specifies that a selection must be made of the choices stacked vertically inside.

  6. The syntactic entities which appear as an argument to `repeat' may be repeated any number (including zero) times.

  7. Defaults for optional parameters may be enclosed in apostrophes and placed under the entity they stand for. However, defaults are not specified in this manner if the rules for the default are complex.
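The partial-capitalization rule can be illustrated with a small Python check. This is a sketch only: the function name `matches_keyword` is hypothetical, and it assumes that an abbreviation must be a prefix of the full keyword, which the manual does not state explicitly.

```python
def matches_keyword(spec, word):
    """Return True if `word` is an acceptable spelling of keyword `spec`.

    `spec` uses the manual's notation: the leading capitalized part is
    required and the lower-case tail is optional, e.g. "NBONds".
    Assumption: anything typed beyond the required part must agree with
    the full keyword.
    """
    required = ""
    for ch in spec:
        if ch.islower():
            break
        required += ch
    full = spec.upper()
    word = word.upper()         # input is folded to upper case anyway
    return word.startswith(required) and full.startswith(word)
```

With this reading, `matches_keyword("DELIm", "deli")` and `matches_keyword("GENErate", "GENERATE")` are both true, while `matches_keyword("NBONds", "NBO")` is false because the required part is incomplete.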

The syntactic glossary, see section Glossary of Syntactic Terms, contains further syntactic entities which are used in the command descriptions. Finally, the options and operands in each command can usually be specified in any order unless otherwise noted.

As an illustration of the syntax notation, the plot command in the analysis facility has the following syntax:

PLOT [titspec] [HORIZ real real integer]

titspec ::= TITLE string del

Assuming the standard definition of the del, a syntactically acceptable command is

PLOT HORIZ 1.0 10.3 60 TITLE BOND ENERGY PER RESIDUE $
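To show how such a syntax description maps onto scanning, here is a minimal Python sketch that parses the PLOT example. It is not CONGEN's parser: `parse_plot` is a hypothetical name, only the single-character delimiter form is handled, and the delimiter is assumed to be separated from the title by a blank, as in the example above.

```python
def parse_plot(line, delimiter="$"):
    """Parse `PLOT [TITLE string del] [HORIZ real real integer]`.

    Items are removed from the word list as they are recognized; anything
    left over is reported, mirroring the leftover check described earlier.
    """
    words = line.split()
    if not words or words.pop(0) != "PLOT":
        raise ValueError("not a PLOT command")
    result = {"title": None, "horiz": None}

    if "TITLE" in words:                        # delimited string
        start = words.index("TITLE")
        try:
            end = words.index(delimiter, start + 1)
        except ValueError:
            raise ValueError("TITLE string is not terminated") from None
        result["title"] = " ".join(words[start + 1:end])
        del words[start:end + 1]

    if "HORIZ" in words:                        # keyword-marked numbers
        start = words.index("HORIZ")
        low, high, count = words[start + 1:start + 4]
        result["horiz"] = (float(low), float(high), int(count))
        del words[start:start + 4]

    if words:
        raise ValueError("extraneous text: " + " ".join(words))
    return result


parsed = parse_plot("PLOT HORIZ 1.0 10.3 60 TITLE BOND ENERGY PER RESIDUE $")
```

Because each option is located by its keyword, the same result is obtained when TITLE precedes HORIZ.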

Fortran I/O Units Usage by CONGEN

In order to keep CONGEN as machine independent as possible, all specification of files is done through Fortran unit numbers. Two unit numbers have special significance, 5 and 6. Unit 5 is the command file interpreted by CONGEN. Unit 6 is the output file for all printed messages. As commands are read from unit 5, they are echoed on unit 6. On Unix machines, unit 5 is usually the standard input, and unit 6 is the standard output. On VMS machines, unit 5 is assigned to the logical name, FOR005 which defaults to SYS$INPUT, and unit 6 is assigned to the logical name, FOR006 which defaults to SYS$OUTPUT. All other unit numbers have no predefined meaning. The OPEN command, see section Open File Command -- OPEN, can be used within CONGEN to open a file on a particular Fortran unit. The DCL command, ASSIGN, may be used on VMS systems to assign files to units. Unix systems can use whatever mechanism is provided with their Fortran run-time system to achieve this effect. E.g. on the Iris, one can make a symbolic link between `fort.n' and a file in order to associate unit n with the file.

On Unix systems, all file names in OPEN commands are translated to lower case. This permits CONGEN input files to be portable from VMS to Unix.

When CONGEN is about to read a file on a VAX/VMS machine, it opens the file as a shared, readonly file. This is done to allow users to share commonly useful files, such as topology files. One should be aware that this feature prevents a user from overwriting any file which he is reading. Any file which is written is not opened shared, so it cannot be used by any other process until CONGEN completes execution.

On most Unix machines, the normal I/O operation opens all files shared, so it is possible to read any file being read or written by CONGEN. When a file is opened for writing, any previous file is overwritten. If two CONGEN processes attempt to open the same file for writing, the result is unpredictable, and it is unlikely that the user will receive a warning message indicating he has made a mistake.

The CONGEN System of Units: AKMA.

CONGEN uses a distinct system of units, the AKMA system, i.e. Angstroms, Kilocalories / Mole, Atomic mass units. All distances are measured in Angstroms, energies in kcal/mole, mass in atomic mass units, and charge in units of electron charge. Using the constants tabulated in Abramowitz and Stegun (1970), 20 AKMA time units is 0.978 picoseconds and is sometimes referred to as a "ps".
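The size of the AKMA time unit follows from the other three units, since time has the dimensions of sqrt(mass * length**2 / energy). The Python sketch below reproduces the figure quoted above. The constants are current SI/CODATA values and the thermochemical calorie, which is an assumption; they are not necessarily the values CONGEN itself uses.

```python
import math

AMU_KG = 1.66053906660e-27      # atomic mass unit in kilograms
ANGSTROM_M = 1.0e-10            # Angstrom in meters
KCAL_J = 4184.0                 # thermochemical kilocalorie in joules
AVOGADRO = 6.02214076e23        # particles per mole


def akma_time_unit_seconds():
    """One AKMA time unit: sqrt(amu * Angstrom**2 / (kcal/mole))."""
    energy_j = KCAL_J / AVOGADRO            # one kcal/mole, per particle
    return math.sqrt(AMU_KG * ANGSTROM_M ** 2 / energy_j)


def akma_to_picoseconds(t_akma):
    """Convert a time in AKMA units to picoseconds."""
    return t_akma * akma_time_unit_seconds() * 1.0e12


ps_20 = akma_to_picoseconds(20.0)
```

With these constants one AKMA time unit is about 4.89e-14 seconds, so `ps_20` is close to 0.978.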

Angles are given in degrees for the analysis and constraint sections. In parameter files, the minimum positions of angles are specified in degrees, but the force constants for angles, dihedrals, and dihedral constraints are specified in kcal/mole/radian**2. Any numbers used in the documentation may be assumed to be in AKMA units unless otherwise noted.

Data Structures

There are a number of data structures that CONGEN manipulates. Many of these data structures are important for most operations; others, which are less important, are described with the commands that use them. Much more specific information is available in the various common blocks whose extension is `.FCM' in the source directories.

The important data structures are given below. Each data structure name is followed by its abbreviation, which is used as its name in commands.

  1. Residue Topology File (RTF). The residue topology file stores the definitions of all residues. The atoms, atomic properties, bonds, bond angles, torsion angles, improper torsion angles, hydrogen bond donors and acceptors and antecedents, and non-bonded exclusions are all specified on a per residue basis. The RTF also specifies the list of chemical types which are used in the parameter file. This file is required for any calculations.

  2. The Parameters (PARA or PARM). The parameters specify the force constants, equilibrium geometries, Van der Waals radii, and other such data needed for calculating the energy. The list of atom type codes comes from the RTF. The parameters are required for any calculation, and they depend on the list of chemical types provided in the RTF. The parameters must be consistent with the topology file in that they must be designed together. In addition, there must be one and only one non-bonded parameter for each atom type specified in the topology file.

  3. Protein Structure File (PSF). The protein structure file is the concatenation of information in the RTF. It specifies the information for the entire structure. It has a hierarchical organization wherein atoms are grouped into residues which are grouped into segments which comprise the structure. Each atom is uniquely identified within a residue by its IUPAC name; each residue is uniquely identified in the segment by a residue identifier which is the character form of the residue's position in the segment; and each segment is identified by a segment identifier specified by the user. This information is required for any calculations.

  4. The Coordinates (COOR). The coordinates are the Cartesian coordinates for all the atoms in the PSF. There are two sets of coordinates provided. The main set is the default used for all operations involving the positions of the atoms. A comparison set (also called the reference set) is provided for a variety of purposes, such as a reference for rotation or operations which involve differences between coordinates for a particular molecule.

  5. The Non-bonded List (NBON). The non-bonded list contains the list of non-bonded interactions to be used in calculating the energies as well as optional information about the charge, dipole moment, and quadrupole moments of the residues. This data structure depends on the coordinates for its construction and must be periodically updated if the coordinates are being modified.

  6. The Hydrogen Bond List (HBON). The hydrogen bond list contains the list of hydrogen bonds. Like the non-bonded list, this data structure depends on the coordinates and must be periodically updated.

  7. The Constraints (CONS). The constraints are harmonic potentials placed on selected atomic positions or on dihedral angles. The purpose of these constraints is to limit motion of those atoms or torsions or to force the molecule to assume a particular conformation. Normally, there are no constraints on the molecule. One should note that this data structure does not hold either constraints related to SHAKE or motion constraints where atoms are prevented from moving entirely.

  8. The Internal Coordinates (IC). The internal coordinates data structure contains information concerning the relative positions of atoms within a structure. This data structure is most commonly used to build or modify Cartesian coordinates from known or desired internal coordinate values. It is also used in conjunction with the analysis of normal modes. Since there are complete editing facilities, it can be used as a simple but powerful method of examining or analyzing structures.

  9. The Images data structure (IMAGES). The images data structure determines and defines the relative positions and orientations of any symmetric image of the primary molecule(s). The purpose of this data structure is to allow the simulation of crystal symmetry or the use of periodic boundary conditions. Also contained in this data structure is information concerning all nonbonded, H-bonds, and ST2 interactions between primary and image atoms.
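Because CONGEN leaves it to the user to supply prerequisite data in the right order, it can help to write the dependencies down. The Python sketch below encodes only the relationships stated in the list above. The table is a reading of this manual, not CONGEN's internal bookkeeping, and real commands may need more than is shown (for example, whether the non-bonded list also needs the parameters is not stated here).

```python
# Dependencies as stated in the list above (an interpretation, not CONGEN code).
PREREQUISITES = {
    "RTF": [],
    "PARA": ["RTF"],    # parameters depend on the chemical types in the RTF
    "PSF": ["RTF"],     # the PSF is assembled from RTF information
    "COOR": ["PSF"],    # coordinates are for the atoms in the PSF
    "NBON": ["COOR"],   # non-bonded list is built from the coordinates
    "HBON": ["COOR"],   # hydrogen bond list is built from the coordinates
}


def missing_prerequisites(target, filled):
    """Return, in build order, what is still needed before `target`."""
    missing = []
    seen = set()

    def visit(name):
        for dep in PREREQUISITES[name]:
            if dep in seen:
                continue
            seen.add(dep)
            visit(dep)                      # deeper prerequisites come first
            if dep not in filled:
                missing.append(dep)

    visit(target)
    return missing


needed = missing_prerequisites("NBON", {"RTF"})
```

In this example `needed` is `["PSF", "COOR"]`: with only the RTF read, the PSF and then the coordinates must be supplied before the non-bonded list can be built.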

Directory Names

In order to simplify the use of CONGEN, all the directories used for storing CONGEN executables, sources, parameter sets, test inputs, etc., are defined with either environment variables (on UNIX) or logical names (on VAX/VMS).

On Unix machines, the CONGEN directories have separate trees for each machine on which CONGEN can be run. These trees begin under the root CONGEN directory. This structure allows one directory tree to be shared using the Network File System (NFS) by multiple hardware architectures. The program, `$CGROOT/update_tree', is used to update different machine trees from the master copy.

The OPEN command, see section Open File Command -- OPEN, is designed to allow the usage of identical syntax on either VMS or UNIX for using these directory names. The directory is placed before the file name with a colon, :, separating the two. E.g. `CGDATA:RTOPH8.MOD'. Also, all file names are converted to lower case.

When referring to the directory names in operating system commands, it is necessary to keep the operating system in mind. On VMS, one uses a colon to separate the two as above. On UNIX running the C-shell, one uses a dollar sign prefix and a slash. For example, `$CGDATA/rtoph8.mod'.
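The translation from the portable `DIR:FILE' form to a Unix path can be sketched as follows. This is an illustration of the convention described above, not the code CONGEN uses. The function name `translate_filename` and the directory path in the example are hypothetical, and looking the directory up as an environment variable of the same name is an assumption based on the description of directory names.

```python
import os
import posixpath


def translate_filename(name, environ=None):
    """Translate 'DIR:FILE' into a Unix path, lower-casing the file name.

    Assumptions: the directory name is an environment variable with the
    same upper-case name, and a name without a colon is only lower-cased.
    """
    environ = os.environ if environ is None else environ
    if ":" not in name:
        return name.lower()
    directory, filename = name.split(":", 1)
    root = environ[directory.upper()]       # KeyError if it is not defined
    return posixpath.join(root, filename.lower())


# Hypothetical installation path, for illustration only.
path = translate_filename("CGDATA:RTOPH8.MOD", {"CGDATA": "/usr/local/congen/data"})
```

Here `path` is `/usr/local/congen/data/rtoph8.mod`, the same file that the C-shell form `$CGDATA/rtoph8.mod' would name.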

The directory names are given below. Unix syntax is used for the directory names, but a similar structure is also used under VMS.

`MM'
Specifies the disk where CONGEN is kept. This is used only on VMS.

`CGROOT'
Specifies the root directory for CONGEN. This is used only on Unix.

`CG'
`$CGROOT/<machine>/v<version-number>/' Top level for a particular implementation of CONGEN.

`CGBIN'
`$CG/bin/' Location of executables. This directory will be put into your path by an initialization file such as `cgdefs', see section Installation of CONGEN on UNIX.

`CGD'
`$CGROOT/doc/' Documentation and INFO.

`CGDATA'
`$CGROOT/data/' Data files for normal execution.

`CGLIB'
`$CG/lib/' Object libraries and files for building user versions of CONGEN and for the support programs.

`CGP'
`$CG/support/' Support programs for use with CONGEN. See section Support Programs, for details.

`CGPS'
`$CG/support/src/' Source code for support programs for use with CONGEN. There are several subdirectories under this one which contain larger support programs, too bulky to keep in this directory.

`CGS'
`$CG/source/' Source code for CONGEN.

`CGT'
`$CG/test/' Test cases.

`CGTD'
`$CG/testdat/' Data files for testing the recent production version.

Files Available for General Use

There are a number of residue topology files, parameter files, coordinate files, and files of other data structures available. The most important files generally available are residue topology and parameter files. Both classes of files are stored for general use in the directory, `CGDATA:'. The file names used for both consist of an alphabetic part followed by a number, e.g. `PARAM5'. There are two copies of each file: one with extension `.INP', which is a character file used as a command file to generate the binary file, and one with extension `.MOD'. The `.INP' file is meant for human eyes; the `.MOD' file is meant for CONGEN to read.

The numeric part of each name is its version number. In general, one should use the highest version number of a file.

The current list of files which are in the best condition is as follows:

`PARAM5'
Parameters for all protein topology files.
`RTOP8'
Extended atom (no hydrogen) topology file. This topology has not been tested with conformational search.
`RTOPH8'
Explicit hydrogen (polar hydrogens only) topology file.
`RTOPALLH7'
All hydrogen topology file.

The DNA files are not usable without extensive review. The chemical type codes are not consistent with proteins, and neither the linkage scheme nor the patching code has been tested in years.

The directory, `CGTD', contains a few protein coordinates assembled from a variety of sources.

For information on the general use of directories, and the files they contain, see section Directory Names.

Size Limits

At present, the description of the molecule is stored in fixed arrays in Fortran common blocks. As a result, there are limits to the size of various entities used to describe a molecule. The limits are given in the file $CGS/aaalimits.fcm.

Sample CONGEN Runs

There are several examples of CONGEN runs within this manual, e.g. section Examples of CONGEN Commands. Also, the input files found in the test data directory, `CGTD', give many examples of CONGEN input. The test case directory, `CGT', has many CONGEN runs, but these are less useful to the novice because they are designed to test program features, not get any useful work done.

DCL or Shell CONGEN Commands

There are several shell or DCL commands for running CONGEN. In addition to these commands, on Unix systems, CONGEN can be executed directly by typing congen.

Syntax:
runcg [-i] [-n integer] filespec
Function:
This command will make logical name assignments for CONGEN normal I/O units and run CONGEN. The filespec must not specify any file type (file extension) or version. On VMS, RUNCGn will assign filespec.INP to FOR005 and filespec.OUT to FOR006 and then run CONGEN. The -i and -n integer options are not permitted on VMS. On UNIX, it will run CONGEN with standard input and output being set to filespec.inp and filespec.out. The nice command will be used with an argument of 4 to lower the runtime priority. The option, -i, specifies that interactive priority should be used. The option, -n integer, specifies an alternate nice value.

Syntax:
rc
Function:
Run CONGEN. All logical name assignments for Fortran units must be set beforehand.

Syntax:
ruc [-i] [-n integer] filespec [directory]
Function:
Same as RUNCG except the directory from which CONGEN is taken is specified in the command. If no directory is specified, then the current default directory is used.
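The Unix behaviour described for runcg can be modelled in a few lines of Python. This is a sketch of the description above, not the actual script. The function names are hypothetical, and treating -i as "do not use nice at all" is an assumption about what interactive priority means.

```python
import subprocess


def build_runcg(filespec, interactive=False, nice_value=4):
    """Return (argv, stdin_path, stdout_path) for a runcg-style launch."""
    if "." in filespec.rsplit("/", 1)[-1]:
        raise ValueError("filespec must not include a file type")
    argv = ["congen"]
    if not interactive:
        # default: lower the run-time priority, as the manual describes
        argv = ["nice", "-n", str(nice_value)] + argv
    return argv, filespec + ".inp", filespec + ".out"


def run_congen(filespec, **options):
    """Run CONGEN with standard input and output redirected to files."""
    argv, inp, out = build_runcg(filespec, **options)
    with open(inp) as fin, open(out, "w") as fout:
        return subprocess.run(argv, stdin=fin, stdout=fout).returncode


launch = build_runcg("myrun")
```

For the file specification `myrun`, `launch` describes the command `nice -n 4 congen` reading `myrun.inp` and writing `myrun.out`.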

Interfacing to CONGEN

A mechanism has been provided to allow casual users of CONGEN to write their own special purpose subroutines which can be incorporated into the system without threatening its integrity.

There are two "hooks" into the protein system which have been specially provided for casual modifiers. The first is the USER command which invokes the subroutine, USERSB, and performs no other action. USERSB is a subroutine with no arguments. However, parameters may be passed to this subroutine via the system's COMMON blocks. These COMMON blocks store nearly all of the system's data. They may be obtained by including them from the directory containing the sources for the version of the program you are using.

The second hook is the user energy function USERE. USERE is a subroutine that is called whenever the total energy and its derivatives are calculated. USERE has ten parameters (in order):

EU
to be returned with the user energy

X
Y
Z
the current coordinates

DX
DY
DZ
to be returned with the derivatives if ANALYS is zero

ANALYS
a mode flag set to 0 if the total user energy and its derivatives are needed, and 1 if the atom energies are desired (in analysis)

DATA
to be returned with the atom energies if ANALYS is set to one.

NATOM
the current number of atoms.

Thus the call should be:

CALL USERE(EU,X,Y,Z,DX,DY,DZ,ANALYS,DATA,NATOM)

EU should be set to the value of the user energy upon return. If this value is non-zero, it will be automatically printed each time the energy is evaluated. The system supplied version of USERE does nothing except set EU to zero. The coordinates are supplied via X, Y, and Z. Derivatives of the user energy must be added to DX, DY, and DZ. All other information used by USERE must be obtained through the system's common blocks as is the case with USERSB. Older versions of USERE that used only the first seven variables of the call can still be linked and run in the main section of the program, but will fail in the analysis section.

When using USERE, fill only the arrays that are being requested by ANALYS (otherwise you will get access violations). In the analysis section of the program, do not assume that the common block information will be correct for a comparison data structure; it may not be.
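The calling contract of USERE can be modelled in Python to make the two modes concrete. A real USERE is a Fortran subroutine that returns the energy through its first argument, EU; in this model the energy is the function's return value instead. The energy term itself (a harmonic tether of every atom to the origin with force constant `k`) is purely illustrative and not part of CONGEN, and returning the total energy in analysis mode as well is an assumption.

```python
def usere(x, y, z, dx, dy, dz, analys, data, natom, k=1.0):
    """Python model of the USERE contract; the return value plays EU.

    Illustrative energy only: E_i = 0.5 * k * (x_i**2 + y_i**2 + z_i**2).
    """
    eu = 0.0
    for i in range(natom):
        e_i = 0.5 * k * (x[i] ** 2 + y[i] ** 2 + z[i] ** 2)
        eu += e_i
        if analys == 0:
            # derivatives are added to DX, DY, DZ, never stored over them
            dx[i] += k * x[i]
            dy[i] += k * y[i]
            dz[i] += k * z[i]
        else:
            # analysis mode: only the per-atom energies are filled
            data[i] = e_i
    return eu


x, y, z = [1.0, 0.0], [0.0, 2.0], [0.0, 0.0]
dx, dy, dz = [0.5, 0.0], [0.0, 0.0], [0.0, 0.0]    # forces from other terms
data = [0.0, 0.0]
eu = usere(x, y, z, dx, dy, dz, 0, data, 2)
```

After this call `eu` is 2.5, `dx` is `[1.5, 0.0]` (the existing 0.5 is preserved), and `data` is untouched because ANALYS was zero.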

To simplify the use of these hooks and to allow users to replace subprograms in CONGEN with their own versions of said subprograms, the makefile, `$CGS/usermake', has been provided for Unix, and the equivalent MMS descriptor file, `CGS:USERMAKE.MMS' has been provided on VMS. These makefiles will produce a private version of CONGEN in your default directory using your version of `usersb.flx'. If you need to change more files, then copy `usermake' into your working directory, and modify it accordingly.

Before attempting to write your own user functions, you should familiarize yourself with the information available on the implementation of CONGEN, see section The Implementation of CONGEN.

There are several utility routines available to a user routine. Some of them are listed below.

CALL GETE(X,Y,Z) will cause the energy and forces to be computed, and the values are saved in the appropriate common blocks. For this to work properly, NBONDS, HBONDS, and CODES must have been called. This can be done by executing both the NBONds and HBONds commands, or by having previously found the energy (minimization, dynamics, etc.).

CALL PRINTE(IUNIT,ICYCLE,LHDR) will write the current energy values (from common block values) to the specified unit, IUNIT. It will also write out the cycle or iteration number, ICYCLE and optionally write out the standard header if LHDR is TRUE.

Glossary of Syntactic Terms

atom-selection
A description of a set of atoms. See section Atom Selection, for the complete syntax.

char
A character

del
The delimiter - a single character which is used to mark the end of a portion of a command. Initially, it is a dollar sign but can be changed using the DELIM command, see section Set Delimiter Command -- DELIM. It should be noted that the delimiter cannot be a character within any string it is supposed to delimit.

deldel
Two delimiters concatenated together with no space in between.

integer
An integer.

iupac
IUPAC name for an atom. Initially specified in the residue topology file.

keyword
A word, see below, serving to identify some option.

name
A word, see below, serving to identify an object.

property
A table property which is syntactically a string. See section Building Tables, and a description of dynamical properties, see section Dynamical Table Properties.

range
Equivalent to real real integer. The first real is the minimum value in the range, the second real is the maximum value in the range, and the integer gives the number of intervals, i.e. lines or columns.

real
A real number. No decimal point is required for the number to be interpreted correctly.

resid
Residue identifier (a string of up to 4 characters).

resname
Residue name.

segid
Segment identifier (a string of up to 4 characters).

string
An ordered set of characters.

tag
A string which is a tag, i.e. no embedded spaces.

title
A series of 1 to 10 lines of text (max 80 characters per line) terminated by a line which has an asterisk * as the first character. Used for commenting files.

word
A string with no blanks

unit-number
An integer which is a Fortran unit number.

General Glossary

data structure
A collection of arrays, scalars, and possibly other data structures which are related by being part of a larger entity. For example, a coordinate set is a data structure which holds the three dimensional positions of atoms. This data structure consists of one scalar and three arrays. The scalar is the number of coordinates; the three arrays are the X, Y, and Z components of the coordinates.

Internal Coordinates
Bonds, angles, torsions, improper torsions. Also, a data structure used for constructing coordinates.

Iupac Names
The name of an atom within a residue. This name should be unique within a residue and should conform to the IUPAC nomenclature. See Biochemistry 9:3471 (1970).

Hbonds
Hydrogen bonds.

Parameters
Constants in the energy expression (force constants, minima of energy surfaces, charges, Lennard-Jones parameters, van der Waals radii, etc.)

PSF
Protein Structure File: a list of the internal coordinates and related information.

Residue Identifier
A string of four characters or less which uniquely specifies a residue within a segment. This value is currently set by CONGEN to be the character representation of the residue number in the segment starting from the first real monomer unit in it. Special terminating residues get their names as identifiers. For example, if we build a tri-peptide LYS ARG ASP using an explicit hydrogen topology file, we get five residues in the segment, NTER LYS ARG ASP CTER, and the residue identifiers are then `NTER', `1', `2', `3', and `CTER'.

RTF
Residue Topology File: a list of standard internal coordinates, atom charges, atom types, excluded non-bonded interactions, etc.

Segment Identifier
A string of up to four characters uniquely designating a segment. Specified in the GENErate command, see section The Generate Command - Construct a Segment of the PSF.

Sequence
A list of residues.
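The numbering rule given under Residue Identifier above can be sketched in Python. The function name `residue_identifiers` is hypothetical, and the set of special terminating residues is assumed to be just NTER and CTER, the two that appear in the manual's example.

```python
def residue_identifiers(residues, terminators=("NTER", "CTER")):
    """Assign residue identifiers for one segment.

    Terminating residues keep their names; all other residues are
    numbered from 1, starting at the first real monomer unit.
    """
    ids = []
    number = 0
    for name in residues:
        if name in terminators:
            ids.append(name)
        else:
            number += 1
            ids.append(str(number))
    return ids


ids = residue_identifiers(["NTER", "LYS", "ARG", "ASP", "CTER"])
```

For the tri-peptide example this gives `["NTER", "1", "2", "3", "CTER"]`, matching the identifiers listed in the glossary entry.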
