This chapter contains an overview of how to use CONGEN. CONGEN executes commands as they are read sequentially from a command file. In general, the ordering of these commands is limited only by the requirement that each command must have all of its prerequisite data. For example, the energy cannot be calculated unless the arrays holding the coordinates, the parameters, etc., have already been filled. It is up to the user of the program to ensure that everything is present, so it is vital to understand how the program works in order to use it correctly.
Although this manual is extensive, it does not provide much tutorial help in resolving questions about the program's operations. Also, there is not much guidance on how to deal with errors. The full source code for CONGEN is provided with the program, and it should be consulted if this manual does not answer questions that arise while using the program.
This manual is available in on-line form. If you use GNUemacs and if the INFO form of this documentation has been installed, then the manual can be perused while you edit CONGEN input files. See section Installation of CONGEN on UNIX, or section Installation of CONGEN on VMS, for more information about installing the on-line documentation.
A CONGEN run is controlled by a command file. The user specifies operations to be performed via these commands, and CONGEN executes them sequentially. There are commands for changing the file from which input is taken, see section STREAM Command.
A legal command file for CONGEN begins with a specification of the title of the run. See section Glossary of Syntactic Terms, for the syntax of a title. Then, any number of commands may be specified.
Each command consists of a command line possibly followed by other data. The command line is scanned free field. A command line may extend over more than one line in the file; to do this, place a hyphen, -, at the end of each line which is to be continued on the next line. Comments may be placed on a command line by preceding them with an exclamation point. Note that a comment may be placed after a hyphen, because the comment is removed before the hyphen is checked for. All lower case characters are converted to upper case. This format is identical to that used by the VAX command language interpreter. In addition, blank lines are permitted to separate blocks of commands for increased readability.
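The line handling described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not CONGEN's own code: the function name `logical_lines` is hypothetical, the sketch assumes that a continued line is joined directly to the next one, and it does not try to protect an exclamation point that appears inside a delimited string.

```python
def logical_lines(raw_lines):
    """Yield complete, upper-cased command lines from raw input lines."""
    pending = ""  # text collected from lines that ended with a hyphen
    for raw in raw_lines:
        # Remove the comment first, so a hyphen in front of a comment
        # is still recognized as a continuation mark.
        text = raw.split("!", 1)[0].rstrip()
        if text.endswith("-"):
            pending += text[:-1]  # drop the hyphen, keep collecting
            continue
        text = pending + text
        pending = ""
        if text.strip():  # blank lines only separate blocks of commands
            yield text.upper()
    if pending.strip():  # input ended in the middle of a continuation
        yield pending.upper()


lines = [
    "plot horiz 1.0 10.3 60 - ! range of the horizontal axis",
    "title bond energy per residue $",
    "",
    "! a line holding only a comment",
]
for command in logical_lines(lines):
    print(command)
```

Here the two physical lines of the command are merged into one logical line, while the blank line and the comment-only line produce nothing.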
Generally, when a free field command line is read in, it is echoed onto Fortran unit 6. Each such echo is preceded by a short marker, e.g. CONGEN>, which identifies the line of input as well as the command processor which is interpreting it.
The command line is scanned in units of words and delimited strings. A word is defined as a sequence of non-blank characters. A delimited string consists of a keyword followed by a string of characters of variable length followed by a delimiter string. The delimiter string is either a single delimiter character, two such characters concatenated together with no space in between, or the keyword, END. The initial value for the delimiter is a $. It may be changed with the DELIm command, see section Set Delimiter Command -- DELIM.
The first word of every command line specifies the command. Generally, required operands of a command must follow in a precise order. On the other hand, options may generally be specified in any order. Further, every number is preceded by a key word, so numeric operands can be placed in arbitrary order.
Abbreviations are permitted in various contexts. The first word may be abbreviated to four characters except for commands to the analysis facility. Numerous options and operands may also be abbreviated to four characters. However, key words which are used to mark numbers may never be abbreviated. See the description for individual commands to see what can and cannot be abbreviated.
In general, as each part of the command is interpreted, it is deleted from the command line. When command processing is finished, a check is made to see that nothing is left over. The presence of extraneous junk indicates that something was mistyped.
Some of the options and numeric values are maintained from one invocation of a command to the next. This is done with the energy manipulation options, see section Minimization and Dynamics. Other options have no memory and must be respecified each time they are changed from their default values. Thus the safest approach is to respecify all options whenever they are used.
Although the command line of every command is processed free field, there are commands (especially ones implemented in the early days of CHARMM) which are followed by data in a fixed format. In general (and there are many exceptions), such commands expect integers as I5 and floating point numbers as F10.n. In the event that the documentation is unclear or non-existent on such matters, the source code can be consulted.
The syntax of commands is described using the following rules:
1. Capitalized words are keywords that must be specified as is. However, if the word is partially capitalized, it may be abbreviated to the capitalized part.
2. Lower case words are to be replaced by a corresponding data entry.
3. The symbol `::=' means "has the following syntactic form:".
4. Anything enclosed in square brackets, `[]', is optional. If several things are stacked in square brackets, one may choose one optionally.
5. Anything enclosed in curly brackets, `{}', specifies that a selection must be made of the choices stacked vertically inside.
6. The syntactic entities which appear as an argument to `repeat' may be repeated any number (including zero) times.
7. Defaults for optional parameters may be enclosed in apostrophes and placed under the entity they stand for. However, defaults are not specified in this manner if the rules for the default are complex.
The syntactic glossary, see section Glossary of Syntactic Terms, contains further syntactic entities which are used in the command descriptions. Finally, the options and operands in each command can usually be specified in any order except if otherwise noted.
As an illustration of the syntax notation, the plot command in the analysis facility has the following syntax:
PLOT [titspec] [HORIZ real real integer]
titspec ::= TITLE string del
Assuming the standard definition of the del, a syntactically acceptable command is
PLOT HORIZ 1.0 10.3 60 TITLE BOND ENERGY PER RESIDUE $
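To make the notation concrete, here is a minimal Python sketch that reads the PLOT syntax above the way the preceding paragraphs describe command processing: each recognized piece is removed from the line, and anything left over is reported as an error. The function `parse_plot` and its result layout are hypothetical and are not part of CONGEN. The sketch assumes the line has already been converted to upper case, that the delimiter stands as a separate word, and it ignores abbreviations. Blanks inside the title are collapsed as a side effect of splitting the line into words.

```python
def parse_plot(line, delim="$"):
    """Parse an upper-cased PLOT command line into its operands."""
    words = line.split()
    if not words or words[0] != "PLOT":
        raise ValueError("not a PLOT command")
    words = words[1:]  # the command word has been interpreted
    result = {"title": None, "horiz": None}

    # titspec ::= TITLE string del
    if "TITLE" in words:
        start = words.index("TITLE")
        # one delimiter, two delimiters run together, or the keyword END
        closers = (delim, delim * 2, "END")
        end = start + 1
        while end < len(words) and words[end] not in closers:
            end += 1
        if end == len(words):
            raise ValueError("TITLE string has no closing delimiter")
        result["title"] = " ".join(words[start + 1:end])
        del words[start:end + 1]

    # HORIZ real real integer
    if "HORIZ" in words:
        start = words.index("HORIZ")
        operands = words[start + 1:start + 4]
        if len(operands) != 3:
            raise ValueError("HORIZ needs three numbers")
        result["horiz"] = (float(operands[0]), float(operands[1]),
                           int(operands[2]))
        del words[start:start + 4]

    # Nothing may be left over once every operand has been consumed.
    if words:
        raise ValueError("extraneous text: " + " ".join(words))
    return result


print(parse_plot("PLOT HORIZ 1.0 10.3 60 TITLE BOND ENERGY PER RESIDUE $"))
```

Because both operands are introduced by a key word, the same result is obtained when TITLE is given before HORIZ.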
In order to keep CONGEN as machine independent as possible, all specification of files is done through Fortran unit numbers. Two unit numbers have special significance, 5 and 6. Unit 5 is the command file interpreted by CONGEN. Unit 6 is the output file for all printed messages. As commands are read from unit 5, they are echoed on unit 6. On Unix machines, unit 5 is usually the standard input, and unit 6 is the standard output. On VMS machines, unit 5 is assigned to the logical name, FOR005 which defaults to SYS$INPUT, and unit 6 is assigned to the logical name, FOR006 which defaults to SYS$OUTPUT. All other unit numbers have no predefined meaning. The OPEN command, see section Open File Command -- OPEN, can be used within CONGEN to open a file on a particular Fortran unit. The DCL command, ASSIGN, may be used on VMS systems to assign files to units. Unix systems can use whatever mechanism is provided with their Fortran run-time system to achieve this effect. E.g. on the Iris, one can make a symbolic link between `fort.n' and a file in order to associate unit n with the file.
On Unix systems, all file names in OPEN commands are translated to lower case. This permits CONGEN input files to be portable from VMS to Unix.
When CONGEN is about to read a file on a VAX/VMS machine, it opens the file as a shared, readonly file. This is done to allow users to share commonly useful files, such as topology files. One should be aware that this feature prevents a user from overwriting any file which he is reading. Any file which is written is not opened shared, so it cannot be used by any other process until CONGEN completes execution.
On most Unix machines, the normal I/O operation opens all files shared, so it is possible to read any file being read or written by CONGEN. When a file is opened for writing, any previous file is overwritten. If two CONGEN processes attempt to open the same file for writing, the result is unpredictable, and it is unlikely that the user will receive a warning message indicating he has made a mistake.
CONGEN uses a distinct system of units, the AKMA system, i.e. Angstroms, Kilocalories/Mole, Atomic mass units. All distances are measured in Angstroms, energies in kcal/mole, mass in atomic mass units, and charge in units of electron charge. In this system, the unit of time follows from the other units (using constants tabulated in Abramowitz and Stegun (1970)): 20 AKMA time units is 0.978 picoseconds, so one AKMA time unit is about 4.89 x 10^-14 seconds. A span of 20 AKMA time units is sometimes referred to as a "ps".
Angles are given in degrees for the analysis and constraint sections. In parameter files, the minimum positions of angles are specified in degrees, but the force constants for angles, dihedrals, and dihedral constraints are specified in units based on radians (kcal/mole/radian^2). Any numbers used in the documentation may be assumed to be in AKMA units unless otherwise noted.
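The time conversion quoted above is simple arithmetic, sketched here in Python. The function and constant names are hypothetical, and the factor is derived only from the rounded figure given in this section (20 AKMA time units is 0.978 picoseconds), so results carry no more than three significant digits.

```python
# 20 AKMA time units correspond to 0.978 picoseconds (figure from the text).
PS_PER_AKMA_TIME_UNIT = 0.978 / 20.0


def akma_to_ps(t_akma):
    """Convert a time in AKMA units to picoseconds."""
    return t_akma * PS_PER_AKMA_TIME_UNIT


def ps_to_akma(t_ps):
    """Convert a time in picoseconds to AKMA units."""
    return t_ps / PS_PER_AKMA_TIME_UNIT


print(akma_to_ps(20.0))  # time span loosely called a "ps"
print(ps_to_akma(1.0))   # AKMA time units in one true picosecond
```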
There are a number of data structures that CONGEN manipulates. Many of these data structures are important for most operations; others, which are less important, are described with the commands that use them. Much more specific information is available in the various common blocks whose extension is `.FCM' in the source directories.
The important data structures are given below: Each data structure name is followed by its abbreviation which is used as its name in commands.
In order to simplify the use of CONGEN, all the directories used for storing CONGEN executables, sources, parameter sets, test inputs, etc., are defined with either environment variables (on UNIX) or logical names (on VAX/VMS).
On Unix machines, the CONGEN directories have separate trees for each machine on which CONGEN can be run. These trees begin under the root CONGEN directory. This structure allows one directory tree to be shared using the Network File System (NFS) by multiple hardware architectures. The program, `$CGROOT/update_tree', is used to update different machine trees from the master copy.
The OPEN command, see section Open File Command -- OPEN, is designed to allow the usage of identical syntax on either VMS or UNIX for using these directory names. The directory is placed before the file name with a colon, :, separating the two. E.g. `CGDATA:RTOPH8.MOD'. Also, all file names are converted to lower case.
When referring to the directory names in operating system commands, it is necessary to keep the operating system in mind. On VMS, one uses a colon to separate the two as above. On UNIX running the C-shell, one uses a dollar sign prefix and a slash. For example, `$CGDATA/rtoph8.mod'.
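The relationship between the two notations can be sketched in Python as follows. This is an illustration only: the function `unix_path` is hypothetical, the directory value used in the example is made up, and the sketch assumes the directory name is looked up as an environment variable exactly as written. How CONGEN itself performs the translation is not described here.

```python
import os


def unix_path(spec, env=None):
    """Translate a `DIR:FILE' specification into a Unix path."""
    env = os.environ if env is None else env
    directory, colon, name = spec.partition(":")
    if not colon:
        # No directory name given: only the case conversion applies.
        return spec.lower()
    # The directory name is an environment variable on Unix;
    # the file name itself is converted to lower case.
    return env[directory] + "/" + name.lower()


# Hypothetical setting of the CGDATA environment variable.
example_env = {"CGDATA": "/usr/local/congen/data"}
print(unix_path("CGDATA:RTOPH8.MOD", example_env))
```

With the hypothetical setting above, the specification `CGDATA:RTOPH8.MOD' resolves to the same file that the C-shell would name as `$CGDATA/rtoph8.mod'.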
The directory names are given below. Unix syntax is used for the directory names, but a similar structure is also used under VMS.
There are a number of residue topology files, parameter files, coordinate files and files of other data structures available. The most important files generally available are residue topology and parameter files. Both classes of files are stored for general use in the directory, `CGDATA:'. The file names used for both of these consist of an alphabetic part followed by a number, e.g. `PARAM5'. There are two copies of each file: one with extension, `.INP', which is a character file used as a command file to generate the binary file, and the binary file itself, with extension, `.MOD'. The `.INP' file is meant for human eyes; the `.MOD' file is meant for CONGEN to read.
The numeric part of each name is its version number. In general, one should use the highest version number of a file.
The files currently in the best condition are as follows:
The DNA files are not usable without extensive review. The chemical type codes are not consistent with proteins, and neither the linkage scheme nor the patching code has been tested in years.
The directory, `CGTD', contains a few protein coordinates assembled from a variety of sources.
For information on the general use of directories, and the files they contain, see section Directory Names.
At present, the description of the molecule is stored in fixed arrays in Fortran common blocks. As a result, there are limits to the size of various entities used to describe a molecule. The limits are given in the file $CGS/aaalimits.fcm.
There are several examples of CONGEN runs within this manual, e.g. section Examples of CONGEN Commands. Also, the input files found in the test data directory, `CGTD', give many examples of CONGEN input. The test case directory, `CGT', has many CONGEN runs, but these are less useful to the novice because they are designed to test program features, not get any useful work done.
There are several shell or DCL commands for running CONGEN. In addition to these commands, on Unix systems, CONGEN can be executed directly by typing congen.
RUNCGn will assign filespec.INP to FOR005 and filespec.OUT to FOR006 and then run CONGEN. The -i and -n integer options are not permitted on VMS. On UNIX, it will run CONGEN with standard input and output being set to filespec.inp and filespec.out. The nice command will be used with an argument of 4 to lower the runtime priority. The option, -i, specifies that interactive priority should be used. The option, -n integer, specifies an alternate nice value.
A mechanism has been provided to allow casual users of CONGEN to write their own special purpose subroutines which can be incorporated into the system without threatening its integrity.
There are two "hooks" into the protein system which have been specially provided for casual modifiers. The first is the USER command which invokes the subroutine, USERSB, and performs no other action. USERSB is a subroutine with no arguments. However, parameters may be passed to this subroutine via the system's COMMON blocks. These COMMON blocks store nearly all of the system's data. They may be obtained by including them from the directory containing the sources for the version of the program you are using.
The second hook is the user energy function USERE. USERE is a subroutine that is called whenever the total energy and its derivatives are calculated. USERE has ten parameters (in order): EU, X, Y, Z, DX, DY, DZ, ANALYS, DATA, NATOM.
Thus the call should be:
CALL USERE(EU,X,Y,Z,DX,DY,DZ,ANALYS,DATA,NATOM)
EU should be set to the value of the user energy upon return. If this value is non-zero, it will be automatically printed each time the energy is evaluated. The system supplied version of USERE does nothing except set EU to zero. The coordinates are supplied via X, Y, and Z. Derivatives of the user energy must be added to DX, DY, and DZ. All other information used by USERE must be obtained through the system's common blocks as is the case with USERSB. Older versions of USERE that used only the first seven variables of the call can still be linked and run in the main section of the program, but will fail in the analysis section.
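The contract for the first seven arguments can be illustrated with a short Python sketch. CONGEN itself expects a Fortran subroutine, so this is only an analogy: the Python function `usere`, the harmonic restraint toward the origin, and the force constant `k` are all hypothetical, and the ANALYS, DATA and NATOM arguments are left out entirely. The point shown is that the energy is returned as EU while the derivatives are added to the existing contents of DX, DY and DZ rather than overwriting them.

```python
def usere(x, y, z, dx, dy, dz, k=1.0):
    """Return a user energy and accumulate its derivatives.

    Hypothetical term: 0.5 * k * r**2 for every atom, where r is the
    distance of the atom from the origin.
    """
    eu = 0.0
    for i in range(len(x)):
        eu += 0.5 * k * (x[i] ** 2 + y[i] ** 2 + z[i] ** 2)
        # Add to the derivative arrays; other energy terms have
        # already stored their contributions there.
        dx[i] += k * x[i]
        dy[i] += k * y[i]
        dz[i] += k * z[i]
    return eu


x, y, z = [1.0, 0.0], [0.0, 2.0], [0.0, 0.0]
dx, dy, dz = [0.5, 0.0], [0.0, 0.0], [0.0, 0.0]  # earlier contributions
eu = usere(x, y, z, dx, dy, dz)
print(eu, dx, dy, dz)
```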
When using USERE, fill only the arrays that are being requested by ANALYS (otherwise you will get access violations). In the analysis section of the program, do not assume that the common block information will be correct for a comparison data structure; it may not be.
To simplify the use of these hooks and to allow users to replace subprograms in CONGEN with their own versions of said subprograms, the makefile, `$CGS/usermake', has been provided for Unix, and the equivalent MMS descriptor file, `CGS:USERMAKE.MMS' has been provided on VMS. These makefiles will produce a private version of CONGEN in your default directory using your version of `usersb.flx'. If you need to change more files, then copy `usermake' into your working directory, and modify it accordingly.
Before attempting to write your own user functions, you should familiarize yourself with the information available on the implementation of CONGEN, see section The Implementation of CONGEN.
There are several utility routines available to a user routine. Some of them are listed below.
CALL GETE(X,Y,Z) will cause the energy and forces to be computed and the values to be saved in the appropriate common blocks. For this to work properly, NBONDS, HBONDS, and CODES must have been called. This can be done by executing both the NBONds and HBONds commands, or by having previously found the energy (minimization, dynamics, etc.).
CALL PRINTE(IUNIT,ICYCLE,LHDR) will write the current energy values (from common block values) to the specified unit, IUNIT. It will also write out the cycle or iteration number, ICYCLE, and optionally write out the standard header if LHDR is TRUE.