
The Implementation of CONGEN

The implementation of CONGEN is more complex than most molecular modeling programs at present. Such complexity is necessary because of the age of many parts of the code, the desire to preserve useful functions from CHARMM, the requirements of portability, and the need to provide as much functionality as possible. This chapter of the manual and the following one on storage management are essential to anyone interested in modifying the program.

CONGEN is implemented as a single program. As a result, it is big. However, because of the use of dynamic storage allocation, it requires less initial storage than many contemporary modeling programs. By placing everything together, the task of modifying the program is made more reliable because errors introduced by a modification are more likely to be noticed. This philosophy requires that testing be an integral part of the implementation of CONGEN. Ideally, there should be tests that exercise every line of code in CONGEN. As changes are made, the pre-existing tests are run to verify that changes were made correctly. As new features are added, new tests are constructed to ensure their correct operation.

CONGEN is implemented using FORTRAN, the FLECS preprocessor, and C. The reason for this mix is largely inertia. Early work on CHARMM was in FORTRAN because of its "easy" portability and convenience for numerical processing. Later, FLECS was used to make the program easier to read and modify. When the conformational search code was added, recursion became essential, which led to the use of C. There are no plans to rewrite CONGEN into one language because it is anticipated that others may wish to add Fortran code into the program.

The use of two languages in CONGEN requires an interlanguage interface. Most modern computers provide a mechanism for communication between C and Fortran. However, these interfaces are invariably machine dependent. In order to reduce the portability problems this entails, the program, wrapgen, has been written to localize the machine dependencies into one place, and free the CONGEN programmer from this problem while working on the regular code.

Besides the problem of the interlanguage interface, portability is a very important aspect in the implementation of CONGEN. In general, most operations in CONGEN are implemented using standard language features, but there are instances where machine dependent features are essential. In order to accommodate these machine dependencies, all the source code in CONGEN is run through a preprocessor. For the C code, the normal C preprocessor is used. For the FLECS/Fortran code, a modified version of the GNU Emacs C preprocessor is used, namely fcpp, or Fortran C PreProcessor. Certain C preprocessor variables are defined which indicate compilation on a particular machine, and these variables are used to determine which code to compile. A configuration file, `config.h', is used to determine the settings of generic preprocessor variables.

Please note that this section of the manual is intended to be an overview. There is no substitute for studying the source code carefully. Although it is an important goal to keep this documentation up to date, one should not trust this documentation too much since it is effectively a copy of information which is always being changed. If there are any doubts about accuracy, the source code is the final arbiter.

The Structure of CONGEN

CONGEN has a very simple organization. The main program is primarily a command dispatcher. It reads each command from the input file, parses the first word which is a command verb, and executes code to parse the remainder of the command and then perform its function. Data storage is simple in principle -- there are several dozen COMMON blocks which store information about the system being modeled, and the commands normally act to use or modify these COMMON blocks. Thus, many of the subroutines are independent of each other since they only depend on the data in the COMMON blocks.
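The dispatch scheme described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not CONGEN's own code: the verbs ENERGY and MINIMIZE, the structure example_state (standing in for data that CONGEN keeps in COMMON blocks), and the function dispatch_command are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for data that CONGEN would keep in COMMON blocks. */
struct example_state {
    int energy_calls;
    int minimize_calls;
};

typedef void (*command_handler)(struct example_state *state, const char *rest);

static void do_energy(struct example_state *state, const char *rest)
{
    (void)rest;                 /* a real handler would parse the remainder */
    state->energy_calls++;
}

static void do_minimize(struct example_state *state, const char *rest)
{
    (void)rest;
    state->minimize_calls++;
}

/* Verb table; the verbs are illustrative, not CONGEN's actual command set. */
static const struct {
    const char *verb;
    command_handler handler;
} command_table[] = {
    { "ENERGY", do_energy },
    { "MINIMIZE", do_minimize },
};

/* Parse the first word of `line` as the command verb (case-insensitive)
   and run its handler on the rest of the line.
   Returns 1 if a handler ran, 0 if the verb is missing or unknown. */
int dispatch_command(struct example_state *state, const char *line)
{
    char verb[32];
    size_t n = 0;
    size_t i;

    while (isspace((unsigned char)*line))
        line++;
    while (*line != '\0' && !isspace((unsigned char)*line)
           && n < sizeof verb - 1)
        verb[n++] = (char)toupper((unsigned char)*line++);
    verb[n] = '\0';

    for (i = 0; i < sizeof command_table / sizeof command_table[0]; i++) {
        if (strcmp(verb, command_table[i].verb) == 0) {
            command_table[i].handler(state, line);
            return 1;
        }
    }
    return 0;
}
```

Because each handler touches only the shared state, handlers stay independent of one another, which is the property the COMMON block design is meant to provide.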

Within CONGEN, there are also a large number of subroutines and functions that provide a convenient programming environment. For example, there are string manipulation routines, array manipulation routines, storage management capabilities, and operating system interfaces. Most of these routines are found in the source files `string.flx', `array.flx', `util.flx', `cutil.flx', and `cgutil.flx'; others can be found scattered throughout the program. The programming tool autodoc (see section Programming Tools) can be used to list the comments from the subprograms within FLECS files.

Programming Environment

CONGEN is implemented using Fortran 77, the FLECS Fortran preprocessor, and C. All code is passed through a C preprocessor in order to handle conditional compilation. In general only standard features of the languages are used, but this rule is violated when there are strong reasons for doing so. (For example, the six letter maximum length of identifiers in Standard Fortran is far too onerous to bear, so the C limit of 31 characters is used, and there is a tool, makeshort, which can generate C preprocessor definitions to reduce the size of the variables to whatever machine dependent limits are necessary).
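The effect of preprocessor definitions that shorten identifiers can be sketched as follows. The two #define lines imitate what a makeshort-like pass might emit; the long names, the short names cmpnbe and accenc, and the functions themselves are hypothetical and are not taken from CONGEN.

```c
/* Hypothetical output of a makeshort-like pass: each long identifier is
   mapped to a short, unique name before the compiler proper sees it. */
#define compute_total_nonbonded_energy cmpnbe
#define accumulate_energy_contribution accenc

/* The source is written with the long, readable names throughout. */
static double accumulate_energy_contribution(double total, double term)
{
    return total + term;
}

double compute_total_nonbonded_energy(const double *terms, int n)
{
    double total = 0.0;
    int i;

    for (i = 0; i < n; i++)
        total = accumulate_energy_contribution(total, terms[i]);
    return total;
}
```

After preprocessing, the compiler and linker see only cmpnbe and accenc, so the readable names cost nothing on a machine with a short identifier limit.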

The management of revisions is done using RCS, the Revision Control System. We use version 5.5 obtained from the Free Software Foundation. Every source file should be kept under RCS control.

Use of Fortran and FLECS

The main reasons for using Fortran as a base language are its widespread usage among the molecular modeling community and its wide availability. Most hardware manufacturers concentrate their compiler optimization capabilities into Fortran. However, Fortran lacks good control constructs, data structures, and operating system interfaces, so steps have been taken to circumvent these limitations. The control constructs are provided by using FLECS, data structures are provided through either an elaborate storage management scheme (see section Storage Management in Fortran) or by using C, and the operating system interfaces are generally provided using C.

FLECS is a Fortran preprocessor that allows us to use a much more robust set of control constructs than normally provided in Fortran. This allows for much more readable and understandable code than could be obtained via Fortran alone. In addition, the use of FLECS allows the development of new programs much more quickly because much less time must be spent working out the flow of control. It is described in detail in section 'Introduction' in FLECS Manual.

Specifically, FLECS provides a block structured IF -- THEN -- ELSE clause, a structured iteration clause, WHILE and REPEAT WHILE clauses as well as their inverses, CONDITIONAL and SELECT clauses, and internal procedures. The internal procedures are especially valuable because they allow long subroutines to be broken into small pieces while leaving all variables accessible. In addition, one can give very long names to these procedures, which makes their purpose far clearer. For example, INTERPRET-COMMAND-AND-BUILD-CLAUSES tells you a lot more than CALL INTCBC or GOTO 285 does.

FLECS operates by translating any FLECS constructs into Fortran. Any non-FLECS constructs are merely copied. The Fortran compiler must then be invoked to compile the translated code into machine language.

To invoke FLECS, type FLECS file1 file2 .... Any number of files can be translated. The default file extension for FLECS programs is `.flx'. The Fortran translations will be produced in files whose extension is `.f' or `.for' depending on the machine FLECS is executed on. The FLECS listing files appear with extension, `.fli'.

Concerning the portability of FLECS, FLECS was written in itself. It was designed to be transported and has been modified to use the C preprocessor to encode machine dependencies.

Not all of CONGEN has been written using FLECS, as the preprocessor was not adopted until August 1979. All new code should be written using it, and old code in need of major modification should use FLECS as well.

Two non-standard features of most Fortran compilers are routinely used: long variable names and memory overlays. Variable names can be any length. There is a programming tool, makeshort, found in `$CGP', which can be used to translate all long names into short names. (In this day, six character variable names are really quite an anachronism!)

The dynamic storage allocation schemes, see section Storage Management in Fortran, depend on mapping arrays of one type onto arrays of another type. It is necessary that REAL variables occupy the same amount of space as INTEGER variables. Numeric arrays are not mixed with character arrays, which avoids many alignment problems.
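The size requirement can be illustrated with a small C sketch in which one array of integer words also holds real values. It is only an analogy for the Fortran scheme: float and int stand in for REAL and INTEGER, and the heap array and the heap_* functions are hypothetical names, not CONGEN's storage routines.

```c
#include <string.h>

/* The scheme only works if a REAL and an INTEGER occupy the same space.
   float and int stand in for the two Fortran types (an assumption); this
   typedef fails to compile if their sizes differ. */
typedef char real_and_integer_same_size[sizeof(float) == sizeof(int) ? 1 : -1];

#define EXAMPLE_HEAP_WORDS 16
static int example_heap[EXAMPLE_HEAP_WORDS];    /* hypothetical INTEGER heap */

/* Store an INTEGER in heap word `index`; returns 0, or -1 on overflow. */
int heap_store_integer(int index, int value)
{
    if (index < 0 || index >= EXAMPLE_HEAP_WORDS)
        return -1;
    example_heap[index] = value;
    return 0;
}

/* Store a REAL in heap word `index` by copying its bytes into the word;
   returns 0, or -1 on overflow. */
int heap_store_real(int index, float value)
{
    if (index < 0 || index >= EXAMPLE_HEAP_WORDS)
        return -1;
    memcpy(&example_heap[index], &value, sizeof value);
    return 0;
}

/* Read heap word `index` back as a REAL; returns 0, or -1 on overflow. */
int heap_load_real(int index, float *value)
{
    if (index < 0 || index >= EXAMPLE_HEAP_WORDS)
        return -1;
    memcpy(value, &example_heap[index], sizeof *value);
    return 0;
}
```

If the two types differed in size, a word index would no longer address a whole value of either type, which is why the requirement is stated so firmly.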

Use of C

C was first used in CONGEN for implementing the conformational search algorithm. This algorithm required complicated data structure processing and recursion, for which the language is eminently suitable. Later on, C was used for operating system interfaces because many of the operating system calls in C are portable. For example, the storage management Fortran library uses malloc in the C library to obtain additional storage from the operating system without a requirement for machine dependent code.

One limitation in the use of C is I/O. The Fortran and C I/O libraries are not compatible. Therefore, all I/O from C must be done using Fortran routines. See the source code for for_printf, for_scanf, etc. in `cutil.c' for routines which provide this functionality.
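One way such a routine might be organized is sketched below: the text is formatted with the C library, but no C I/O is performed, and the finished line is handed to a single routine on the Fortran side. The actual interfaces of for_printf and its relatives are not shown in this section, so the names example_for_printf and example_fortran_write_line, and the stand-in body that merely records the line, are assumptions made to keep the sketch self-contained.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a Fortran output routine that writes one line to a unit.
   In a real build this would be Fortran code reached through a wrapper;
   here it only records its arguments so the sketch can run alone. */
static char last_line[256];
static int last_unit = -1;

static void example_fortran_write_line(int unit, const char *text, int length)
{
    if (length > (int)sizeof last_line - 1)
        length = (int)sizeof last_line - 1;
    memcpy(last_line, text, (size_t)length);
    last_line[length] = '\0';
    last_unit = unit;
}

/* printf-like front end (hypothetical name). Formatting happens in C;
   the output itself goes through the Fortran side only.
   Returns the number of characters passed on, or a negative value if
   formatting failed. */
int example_for_printf(int unit, const char *format, ...)
{
    char buffer[256];
    va_list args;
    int n;

    va_start(args, format);
    n = vsnprintf(buffer, sizeof buffer, format, args);
    va_end(args);

    if (n < 0)
        return n;
    if (n >= (int)sizeof buffer)
        n = (int)sizeof buffer - 1;     /* the text was truncated */
    example_fortran_write_line(unit, buffer, n);
    return n;
}
```

Keeping every write on one side avoids interleaving problems between two independently buffered I/O libraries.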

FCPP -- The Fortran C Preprocessor

FCPP is a modified version of the GNU Emacs C Preprocessor. This preprocessor provides all the functionality of the old style C preprocessor as well as #elif and the macros; __LINE__, __DATE__, __FILE__, and __TIME__. In addition, it contains additional features for Fortran code, and it has been ported to the VAX/VMS operating system in addition to other Unix platforms.

It is used on all FLECS code in CONGEN, and is described in greater detail in the fcpp manual page.

Note: since fcpp is derived from the GNU Emacs C Preprocessor, it is licensed under the GNU Emacs License, which provides for more freedom to redistribute it than is available with CONGEN itself. Please see the license text in the source file, `cccp.c', or in your CONGEN license agreement, which incorporates the GNU Emacs License.

WRAPGEN -- The Wrapper Generator

The program, wrapgen, is used to generate the interfaces between Fortran and C. It is described in greater detail in the wrapgen manual page. Briefly, wrapgen takes a file of prototypes which describe functions written in either C or Fortran, and it generates procedures which can be called from Fortran or C, respectively. The wrappers take care of the machine dependencies inherent in character string representations, character string lengths, call by value or reference, and naming rules, so that the programmer need not be concerned about these details. All source code must include a wrapper header file generated by wrapgen in order to correctly handle the substitution of wrapper function names.
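To show the kind of detail a wrapper hides, here is a hand-written sketch of one. It assumes one common Unix convention (lower-case external name with a trailing underscore, all arguments by reference, CHARACTER lengths appended as hidden by-value arguments); other machines differ, which is exactly why the wrappers are generated. The routine COUNTC, its C stand-in countc_, and the wrapper count_char are hypothetical, and this is not wrapgen's actual output.

```c
#include <string.h>

/* C stand-in for a Fortran routine
       SUBROUTINE COUNTC(STR, CH, N)
   as the assumed convention would expose it: trailing underscore, every
   argument by reference, and the lengths of the CHARACTER arguments
   appended by value. Fortran strings carry no terminating NUL, so only
   the hidden length bounds the scan. */
void countc_(const char *str, const char *ch, int *n, int str_len, int ch_len)
{
    int i;

    (void)ch_len;
    *n = 0;
    for (i = 0; i < str_len; i++)
        if (str[i] == *ch)
            (*n)++;
}

/* Hand-written equivalent of a generated wrapper: it presents an ordinary
   C interface and supplies the references and hidden lengths itself. */
int count_char(const char *text, char ch)
{
    int n;

    countc_(text, &ch, &n, (int)strlen(text), 1);
    return n;
}
```

A C caller then writes count_char("CONGEN", 'N') and never sees the underscore, the references, or the length arguments.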

MKPROTO -- Make C Function Prototypes

The program, mkproto, will generate ANSI C function prototypes for all the procedures in a C source file. It is described in greater detail in the mkproto manual page.

In CONGEN, mkproto is used to avoid duplicating all the function definitions when preparing prototypes. Currently, it is used for all C files, except for `tree23.c' and `noproto.c'. The file, `tree23.c', is not run through mkproto because it is anticipated that it will be released as a separate procedure. The file, `noproto.c', is used for procedures for which mkproto doesn't work correctly. Presently, this occurs only with files containing machine dependencies where function definitions require the declaration of structures that are specific to a machine. Any necessary prototypes from this file can be put directly into `funct.h'.

The Configuration File, `config.h'

In order to simplify the use of machine dependent features, preprocessor variables have been defined which describe particular features. For example, the declaration of double precision variables in Fortran is given by the symbol, DOUBLE_TYPE. On most machines, this translates into DOUBLE PRECISION, but on the Cray, it translates to REAL. The source code file, `config.h', contains all these configuration variables. The values are set using a small number of machine identifiers which are set by the makefiles used to build CONGEN.
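A fragment in the spirit of `config.h' might look like the following sketch. The machine identifier EXAMPLE_MACHINE_CRAY and the helper example_double_type are hypothetical (the real identifiers are set by the makefiles and are not listed here); only the symbol DOUBLE_TYPE and its two expansions come from the description above. The helper turns the expansion into a string so that the selected setting can be inspected from C.

```c
/* Hypothetical machine identifier. In CONGEN the real identifiers are
   supplied by the makefiles; uncomment to imitate a Cray build. */
/* #define EXAMPLE_MACHINE_CRAY */

/* Generic configuration variable derived from the machine identifier. */
#if defined(EXAMPLE_MACHINE_CRAY)
#define DOUBLE_TYPE REAL
#else
#define DOUBLE_TYPE DOUBLE PRECISION
#endif

/* Two-level stringification so that DOUBLE_TYPE is expanded first. */
#define EXAMPLE_STRINGIFY_(tokens) #tokens
#define EXAMPLE_STRINGIFY(tokens) EXAMPLE_STRINGIFY_(tokens)

/* Report which Fortran declaration the current configuration selects. */
const char *example_double_type(void)
{
    return EXAMPLE_STRINGIFY(DOUBLE_TYPE);
}
```

Source files then test only generic variables such as DOUBLE_TYPE, so supporting a new machine means editing the configuration file rather than every routine.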

Making CONGEN

CONGEN is constructed using either the make command on Unix machines or the MMS utility on VMS. There is a master `makefile' or `descrip.mms' file in `CG', which will invoke all the necessary `makefile's needed to build the entire system. These `makefile's work fine on most machines, but have trouble on the Convex because their make program is older.

In order to handle the different compiler names and flags necessary for each different machine, many of the CONGEN directories have a file named `makefile.gen' which is missing definitions for these macros. In the `CG' directory, there is a set of `*.make' files which contain the macros needed for each type of machine. The names of the `*.make' files are as follows:

`sgim?i?'
There are several files for Silicon Graphics Iris 4D workstations. The digit following the "m" gives the MIPS instruction set (R3000 is 1, R4000 is 2, R8000 is 4). The digit following the "i" gives the major release of Irix (5 or 6). Currently, there are three versions of these files, `sgim1i5.make', `sgim2i5.make', and `sgim6i4.make'.
`unicos'
Cray YMP running Unicos.
`rs6000'
IBM RS/6000 workstation running AIX.
`convex'
Convex computers.
`hpux'
Hewlett Packard 700 series workstations running HPUX.
`sparc'
Sun Sparcstations using gcc as the C compiler.
`alphaosf'
DEC Alpha machines running OSF.
`fujitsu'
Fujitsu VP240.

There is a program, fixmake, in the `CG' directory which will take one of these files and incorporate it into a `makefile'.

The selection of machine is done via the CGPLATFORM environment variable which is set in `$CG/cgdefs' or `$CG/cgprofile'.

The master `makefile' has several different targets that can be specified to build parts of the system. Use them with care. They are listed as follows:

prepare
Prepare for rebuilding the entire system. This will set up to rebuild everything including binary data files. It should be used only when porting CONGEN to a new machine, or when file structures are changed.

clean
Clean up unnecessary copies of executables, intermediate data files, objects, etc.

all
Make everything in place. Executables made by this target are not copied to the directories which are included into user PATH environment variables.

install
Make everything and put them where users will access them. Note that the install script from the X window system is used.(36)

tovax
Copy files to dino. This should be used only at Bristol-Myers Squibb; it will have no effect at other sites unless the file name `/u/dino/fc/congen/...' leads someplace real.

setup
This will construct all the `makefile's from the `makefile.gen's.

In order to construct CONGEN from scratch on a brand new machine, the following steps must be followed:

  1. Modify `$CG/cgdefs' and `$CG/cgprofile' to reflect your directory structure and the hardware platform.

  2. Create a platform file in `$CG' which specifies the necessary switches for compilation.

  3. Set your default directory to `$CG'.

  4. Do a make prepare.

  5. Do a make install.

  6. Compare the test cases found in `$CGT'.

Standards (Rules) for Writing CONGEN Code

The following set of rules is designed to help keep CONGEN readable and modifiable.

  1. All routines should have similar organization.

    Each Fortran subroutine should have the following structure:

          SUBROUTINE DOTHIS(ARG1,ARG2,....
    C
    C     A comment which describes the purpose of this subroutine.
    C     This comment is essential because it provides the only
    C     documentation for nearly all subroutines in CONGEN. The
    C     program, AUTODOC, can be used to get this comment from all
    C     subroutines.
    C     
          <Declarations>
    C
          <Code>
    

    The separation of the code from the declarations by a blank comment aids in reading the code. It becomes obvious where executed code begins.

    Each C procedure should be written like this:

    dothis(type par1, type par2,...)
    /*
    *   A comment which describes the purpose of the routine. This
    *   comment must come here so that automatic documentation can
    *   be implemented (a program similar to AUTODOC is planned.)
    */
    
    {
        local declarations
    
        code
    }
    

  2. Prototypes for all the C functions should be provided through the use of mkproto, see section MKPROTO -- Make C Function Prototypes. See the makefile for CONGEN for details of the mechanism. Note that not all files can be processed correctly, such as functions which are declared with structures or types that are machine dependent. All such functions should be placed in the file, `noproto.c'.

  3. All source files must have a copyright notice at the top. See any source file for the appropriate text.

  4. All C source files must include `config.h' and `wrappers_c.h' in that order. It is a good idea to use an existing source file to provide the copyright and includes to get started.

  5. All FLECS source files must include `config.h' and `wrappers_f.h'.

  6. All code should be written clearly. Since the code must be largely self-documenting, clarity should not be sacrificed for insignificant gains in efficiency. The use of C and the FLECS preprocessor is encouraged as it graphically illustrates the flow of control and allows for internal procedure calls. Variable names should be chosen with care so as to illustrate their purpose. Avoid using one or two letter variable names in any COMMON blocks. Comments should be used where the function of code is not obvious.

  7. All usages of integers, floats, and doubles in C code must use the F77_INTEGER, F77_REAL, and F77_DOUBLE macros defined in `config.h'. Boolean variables should use the BOOLEAN macro, and Fortran logical variables defined in Fortran should use the F77_LOGICAL macro. The F77_INTEGER macro should expand to a long int. If not, you must review calls to the different scanf and printf functions to ensure correct typing.

  8. Be careful to distinguish between Fortran 77 logical variables and C integers being used to hold Boolean variables. The testing conditions are machine dependent. Use the macros provided in `macros.h' for Fortran logicals.

  9. Be careful that the type of any numeric constant matches its usage across the various platforms that CONGEN is implemented on. For example, there may be problems with using a real constant in an intrinsic function call with multiple arguments in Fortran, e.g. SIGN(1.0,P). If P is DOUBLE_DECL, you will have a problem because DOUBLE_DECL can map to either REAL or DOUBLE PRECISION depending on the machine. In such cases, it is better to use a variable or parameter to store the constant.

  10. All usages of DOUBLE PRECISION variables in Fortran must be declared using the DOUBLE_DECL macro. This allows CONGEN to switch double precision variables to single precision on 64-bit computers.

  11. Any variable in Fortran code that holds a pointer to be used by the C code must be declared using the POINTER_DECL macro. On 64 bit architectures, this macro will expand to 64 bit integers. The equivalent type in C is given by the F77_POINTER macro.

  12. Any subroutine defined in C which can be called from FLECS code must have its prototype entered into the source file `wrap_cdef.proto'. Likewise, any subroutine defined in FLECS which can be called from C must have its prototype entered into the source file `wrap_fdef.proto'.

  13. Whenever Fortran common blocks are accessed within C, you must use the predefined macro for the common block name. The macro is the upper case name for the common block. Header files (suffix `.h') are defined for all common blocks used in C code. For example, if you want to refer to the X coordinate array in C, use COORD.x.

  14. There are a number of rules associated with input and output:

    1. All input commands should be free field. The command processor should check that the entire command is consumed.

    2. Short outputs, messages, warnings, and errors should be sent to unit 6 for output.

    3. All inputs should be echoed to unit 6. All values read by the command should also be output to unit 6.

    4. All warning and fatal messages should state which subroutine generated them, so that one can find the location in the source code where the problem arose.

    5. All data structures output with unformatted I/O statements must have a HDR, ICNTRL, and TITLE in the first two records. See any existing binary output subroutines for the exact format.

    6. Unformatted I/O file formats should remain upward compatible. Use an ICNTRL array element to indicate which version of CONGEN wrote the file. Such upward compatibility must be maintained only across production versions of CONGEN. In other words, a file format for the developmental version may be freely changed until a new version is generated, at which point all future versions must be able to read it.

    7. All I/O must be done through Fortran I/O. C I/O is not to be used. See the procedures in the source code file, `cutil.c', for useful analogs of C I/O functions to make this rule easy to follow.

  15. All error conditions must terminate with a CALL DIE. The subroutine, DIE, provides a traceback or core dump so the program statements causing the error can be seen.

  16. Large or variable storage requirements for Fortran code must be met on the stack or heap. In C, cgalloc and cgfree should be used for all variable storage needs.

  17. Array overflows must always be checked for when arrays are being written. This is especially important when the array being constructed might be dynamically allocated. Error checking in general should be as complete as feasible.

  18. The code should use a minimum of non-standard Fortran or C features. All non-standard features must be conditionally compiled so that any CONGEN programmer is informed that the code is special.

  19. In order to make subroutines callable from different contexts, parameter passing should be done through the subroutine call rather than through COMMON blocks.

  20. All common blocks which are shared between multiple subprograms are to be placed in files and #include'd into the program. The common blocks should have comments describing each variable in the common block so that new users will know what's there. No directory should be specified for the #include'd files, so that the -I option to the C preprocessor can be used to select the directory at will. If a common block is to be shared between C and Fortran code, use the existing code in the *.h files to implement the needed name equivalence.

  21. Avoid the use of static memory for initialization purposes. As more sections of CONGEN are implemented on parallel computers, making the subroutines reentrant is essential. Also, avoid the use of EQUIVALENCE and DATA statements in Fortran, since all storage referenced by these statements is allocated statically on the Iris.

  22. When using scanf functions in C, use only long int's or doubles for your I/O, and then convert to your type. This avoids the need to control for machine dependent variations in data lengths.
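Rules 7 and 22 above can be sketched together as follows. The two #define lines are assumed stand-ins for what `config.h' provides (the real definitions are machine dependent), and read_count_and_value is a hypothetical function. The scanf-family call sees only a long int and a double, and the results are converted to the Fortran-compatible types afterwards.

```c
#include <stdio.h>

/* Assumed stand-ins for macros that `config.h' provides; the real
   definitions are machine dependent. */
#define F77_INTEGER long int
#define F77_REAL float

/* Parse "count value" from `text`.
   Returns 1 on success, 0 if either field could not be read. */
int read_count_and_value(const char *text, F77_INTEGER *count, F77_REAL *value)
{
    long int raw_count;     /* scanf target types are fixed: long int ... */
    double raw_value;       /* ... and double, whatever the macros are    */

    if (sscanf(text, "%ld %lf", &raw_count, &raw_value) != 2)
        return 0;

    /* Convert only after the read, so the format string never depends on
       how a given machine defines the F77_* types. */
    *count = (F77_INTEGER)raw_count;
    *value = (F77_REAL)raw_value;
    return 1;
}
```

The format string stays correct even if F77_INTEGER or F77_REAL is redefined for another machine, since neither type is ever passed to sscanf directly.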

Programming Tools

Presently, there is one tool for assisting in the development of CONGEN besides the language tools described within this section.

The program, autodoc, will collect information on all the entry points in a large Fortran or FLECS program and write them out using several different methods. For each entry point, the program collects the module name if different from the entry name, the file that the entry point is in, the definition line of the entry point, and the first block of comments, which hopefully documents the function of the routine. The files to be scanned are specified on the command line, and the output is written to a file whose name is requested from the user when the program executes.

When autodoc is run, it will read the command line for files, and if none are found, it will ask you for files. Then, it will ask for an output file. It will then scan the files, and subsequently, it will ask you if you wish to sort the entry points by name. If not, the output will be in the order the files were read. Then it will ask if you want the short form of the listing. The short form is all the information on each entry except the comments. You will then be presented a list of subroutines which have no comments.
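The last step, finding subroutines that have no comments, can be sketched in C as below. This is not autodoc's implementation: it handles only fixed-form SUBROUTINE headers in upper case (no ENTRY or FUNCTION statements, no FLECS constructs), and the function names are hypothetical. A routine counts as documented when its header is followed directly by a comment block containing some text.

```c
#include <string.h>

/* 1 if `line` is a fixed-form comment line ('C', 'c' or '*' in column 1). */
static int is_comment_line(const char *line)
{
    return line[0] == 'C' || line[0] == 'c' || line[0] == '*';
}

/* 1 if a comment line has anything besides blanks after column 1. */
static int comment_has_text(const char *line)
{
    const char *rest = line + 1;

    return rest[strspn(rest, " \t")] != '\0';
}

/* 1 if `line` is a statement that begins with the keyword SUBROUTINE. */
static int is_subroutine_header(const char *line)
{
    if (is_comment_line(line))
        return 0;
    line += strspn(line, " \t");
    return strncmp(line, "SUBROUTINE", 10) == 0;
}

/* Count the subroutines in `lines` whose header is not followed directly
   by a comment block with text in it. Lines carry no trailing newline. */
int count_undocumented(const char *const *lines, int nlines)
{
    int missing = 0;
    int i;

    for (i = 0; i < nlines; i++) {
        int documented = 0;
        int j;

        if (!is_subroutine_header(lines[i]))
            continue;
        /* Walk the comment lines that immediately follow the header. */
        for (j = i + 1; j < nlines && is_comment_line(lines[j]); j++)
            if (comment_has_text(lines[j]))
                documented = 1;
        if (!documented)
            missing++;
    }
    return missing;
}
```

A subroutine laid out according to the coding standards in this chapter passes this check, because its descriptive comment follows the SUBROUTINE line directly.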

CONGEN Test Cases

The test cases may be found in `CGT' (as well as their developmental counterparts). All of these files generate output files which are to be compared with previous runs. In addition, some of the tests will generate other files which have the same file name, and these should be compared too. Scratch files have file names of `FOO' and file types which begin with the name of the test. For example, `DYNTEST1' generates a number of scratch files named `FOO.DYNTEST1_nn', where nn is the unit number. These files should be deleted when the runs complete. The CPU time listed below is given in minutes for version 2 of CONGEN running on a single CPU of a Silicon Graphics 4D/200 series workstation.

Test cases run on platforms other than the Iris can be found in subdirectories under `$CGT' whose names match the platforms. For example, the Cray test case outputs are found in `$CGT/unicos'.

All the tests are run using the equivalent of the RUNCG command. On Unix machines, there are makefiles in both directories for running the test cases, and on VMS machines, there is a `descrip.mms' file. A target of diffs will make difference files for all the test cases.

           CPU
           Time*
File Name (min) Purpose

AM94CYCLE  1.9  Test of AMBER94 using a cyclic peptide. Modified
                from CGCYCLE.
AM94GENER  0.1  Simple generation test for AMBER94.
AM94SPL    2.6  Test of splicing using AMBER94.
AM94TEST1  4.4  Construction of all major residues in the AMBER94
                topology file.
AM94TEST2  3.1  Repeat of first AMBER 3 demonstration run, energy
                calculation of alpha-lytic protease.
AM94TEST3  0.7  Simple conformational search testing AMBER94 energy
                calculation.
AM94TEST4  0.4  Test minimized ring constructions for AMBER94.
AM94TEST5  0.1  Checks the backbone and sidechain degrees of freedom
                work correctly at the endpoints of chains and
                prolines. Modified from CGTEST9.
AM94TEST6  0.6  Test parser errors when AMBER94 is used. Modified
                from CGTEST1.
AM94TEST7  2.3  Test D amino acid construction. Modified from CGTEST14.
AM94TEST8  2.2  Test AMBER 94 amino acid constructions. Modified from
                CGTEST15.
AMTEST1    0.4  Amber test 1, check terminal charges, part 1
AMTEST2    0.3  Check terminal charges, part 2
AMTEST3    0.6  Check conformational search with DNA protein complex.
AMTEST4    0.1  Test multi-term torsion term and conformational search.
AMTEST5    0.1  Test hydrogen bond term.
AMTEST6    0.3  Test amino acid construction in conformational search.
AMTEST7    0.2  Test antibody loop construction using AMBER potential.
BRBTEST    0.1  Tests Builder, Newton-Raphson minimization, and
                vibrational analysis.
CGCYCLE    6.4  Tests construction of cyclic peptides.
CGFIX      0.2  Test fixed atom construction in CONGEN.
CGFIX2     1.2  Test mixture of fixed atom and regular construction
                in conformational search.
CGHBUILD   0.2  Tests partial sidechain construction in the context of
                rebuilding hydrogen bonds.
CGMERGE    0.2  Tests merging of conformation files.
CGPARA1    1.8  Tests parallel processing in searching. The time
                given is the elapsed time.
CGPBE     17.1  Tests use of Poisson-Boltzmann equation with
                conformational search.
CGPBE2    21.4  Tests parallel implementation of PBE with
                conformational search.
CGPBE3     3.7  Test parallel conformational search using
                serial PBE evaluation.
CGRAND     0.3  Tests random node evaluation.
CGRESTART  2.1  Tests restarting when directed searching is done and
                MIX strategy used. (Currently fails on the CONVEX in
                malloc. No real idea why).
CGRESTART2 1.6  Repeat of CGRESTART, but without restart step. Output
                should match CGRESTART except for command processing.
CGRESTART3 6.9  Tests restarting when depth first search is used.
CGRESTART4 6.8  Repeats CGRESTART3 without restarting. Output should
                match CGRESTART3 except for command processing.
CGTEST1    0.1  Checks the CGEN parser. Many error messages are tested
                and no conformation file is written.
CGTEST2    0.2  Check ALL and FIRST sidechain construction options
CGTEST3    0.2  } Together, CGTEST3 and CGTEST4 check that the optimization
CGTEST4    0.7  } of the sidechain search for FIRST and ALL in the case
                  where sidechains interact. CGTEST3 has the optimization,
                  whereas CGTEST4 omits it. The CG files generated by both
                  tests should match each other except for the first
                  record, but CGTEST4 should take more CPU time.
CGTEST5    0.3  } CGTEST5 and CGTEST6 verify that the CLSA optimization 
CGTEST6    0.5  } used with backbone degrees of freedom works correctly.
                  The CG files should be the same, but CGTEST6 should take
                  longer to get the results.
CGTEST7    0.6  Checks the energy calculations in the sidechain degree of 
                freedom.
CGTEST8    0.5  Checks esoterica of CLSA and CLSD options
CGTEST9    1.9  Checks backbone termini processing and handling of
                prolines in both backbone and chain closure.
CGTEST10   0.9  Checks all sidechain construction options
CGTEST11   0.2  Tests van der Waals avoidance and Nosymmetry options
                in a single sidechain construction.
CGTEST12   2.6  Test of van der Waals avoidance in context of full
                search. Iterative option.
CGTEST13   0.9  Similar to CGTEST12 except Independent option used.
CGTEST14   6.4  Test of D amino acid construction and all amino acid 
                sidechains
CGTEST15   6.4  Similar to CGTEST14, except we test the all hydrogen
                topology file.
CGTEST16   1.1  Simple test of overlapping degrees of freedom.
CGTEST17   0.5  Second test of overlapping degrees of freedom (sidechains).
CGTEST18   0.7  Test of coordinate writing and energy display filters.
CGTEST19   1.9  Test of ALLCISTRANS options.
CGTEST20   0.1  Test of other non-bonded energy calculations.
CGTEST21   0.1  Test RDEPTH search option.
CGTEST22   1.5  Test cavity energy calculation.
CGTEST23   1.7  Test combination of cavity and PBE energies.
CGTEST24   0.5  Test Worst RMS evaluation option.
CGTEST25   0.5  Test SGRID SELECT and AUTO options.
CONGEN     0.3  A simple conformational search over five residues
CONGEN2    0.4  A two part conformational search over five residues
CORMANTST  0.1  Tests some coordinate manipulations.
CORTST1    0.1  A virtually worthless test of the correlation functions
DELTEST    0.1  Tests deletion by value in the analysis section
DJSTEST    0.1  Tests ABNER
DRAWTEST   0.1  Tests drawing capability of the program.
DYNTEST1   0.2  A series of tests on the dynamics algorithms. Not a complete
                test. Checks Gear and Verlet algorithms, SHAKE, ability
                to fix atoms in place. Also checks that the analysis
                facility can rotate a trajectory with respect to a fixed
                coordinate set. Some simple checks of dynamics analysis
                are also present.
GAUSSIAN   6.0  Test of interface to Gaussian 92.
GENERTEST  0.1  Tests some of the generation and patching routines.
GEPOL      0.1  Test GEPOL surface calculation.
GEPOL2     4.9  Test incremental GEPOL options.
GEPOL3     4.8  Another incremental GEPOL test.
H2OTST     0.1  Runs a water dimer to convergence and a true minimum. Also
                tests TLIMIT option.
HBCOMP     0.5  A self comparison of hemoglobin. Tests the comparison
                command in the analysis facility
HBMBCOMP   1.4  A comparison of hemoglobin to myoglobin. Tests comparison
                command and construction of difference tables.
ICTEST     0.1  Tests the routines that deal with internal coordinates.
IMH2OTEST  0.4  Water with periodic boundaries
IMST2TEST  0.5  ST2 water with periodic boundaries.
IMTEST     0.1  Checks Images for a small system with C2 symmetry.
JTEST1     0.2  J coupling calculations on one leucine.
JTEST2     0.2  Ensemble averaging of J coupling calculations on two
                leucines.
JTEST3     0.2  J coupling calculations on one leucine with J errors.
JTEST4     0.3  Ensemble averaging of J coupling calculations on two
                leucines with joining with convergence tests.
JTEST5     0.1  Four leucine J coupling, ensemble averaging test with
                real data.
NANATST1   0.8  Tests most of the features of the analysis facility
NANATST2   0.7  Tests more features of the analysis facility
NANATST3   1.5  Tests the dynamic properties in the analysis facility
NOETEST    0.2  Tests NOE constraint calculations and calculation of
                energy derivatives.
NOETEST2   0.1  Test NOE ensemble averaging on a three atom system
NOETEST3   0.1  Test NOE ensemble averaging on a four atom system
NOETEST4   0.5  Test NOE code on a larger system.
NOETEST5   0.1  Test NOE code on beta hairpin using real data.
PBETEST    9.1  Test Poisson-Boltzmann electrostatics.
PBETEST2   1.5  More PBE testing. Thorough testing of options.
PBETEST3   0.1  Test of dielectric smoothing.
PBETEST4   0.1  Test of dielectric cavity
PBETEST5   0.3  Test of cavity in a Debye-Huckel fluid.
PBETEST6   4.2  Test of molecular surface usage in PBE code.
PBETEST7   5.5  Test of charge anti-aliasing
PBETEST8   1.1  Test of dielectric smoothing in a protein.
PBETEST9   0.1  Test of dielectric combination rules
PBETEST10  1.6  Test of dielectric constant modification based on
                accessible surface.
PBETEST11  0.2  Test of margin option.
PDBTEST1   0.4  Test #1 of Brookhaven Data Bank reading. Read tendamistat.
PDBTEST2   0.7  Test #2 of Brookhaven Data Bank reading. Read Fab KOL.
READTEST   0.1  Incomplete test of coordinate reading.
READTEST2  0.1  Test of sequence reading by atom.
SEARCHNOE  0.3  Tests conformational search with NOE's and also
                runs some simple tests of All Hydrogen construction.
SPHERE     2.7  Rudimentary test of sphere drawing.
ST2TEST    0.2  ST2 water without boundary conditions.
SURFTST    0.2  Checks the accessible surface calculation
TEST       0.1  Short test that hits a lot of stuff. Must always be run.
TESTCONS   0.7  Tests the harmonic constraints.
TESTCONS2  1.9  Tests the interaction of dihedral and J coupling constraints
                with the conformational search.
TESTHB     0.1  Test hydrogen bond calculations.
TESTPARM   0.1  Test AMBER parameter reading code.
TESTRTF    0.4  Tests the RTF I/O commands, and a simple test of PEER
                output
TESTRTF2   0.1  Test of charge generation in the RTF code.
TESTRTF3   0.1  Test automatic generation code on three and four membered
                rings.
TESTSEL    0.4  Tests the atom selection routines and use of wildcards
                in commands in the analysis section
TESTSPL    0.2  Tests SPLICE command
TRANSFORM  1.4  Tests coordinate transformation commands.
TWIST      0.1  Tests TWIST command in the analysis facility
VIBRTST    0.1  Tests vibrational analysis

* The CPU time is for code not optimized by the compiler.

Modifications to CONGEN

The following steps should be taken when making a change to CONGEN. They are intended to ensure that the change will be maintained in the future and does not unwittingly affect other program functions.

  1. If you have not already done so, establish a directory of your own for working on the source, and set up a symbolic link to the source RCS directory, `$CGS/RCS'.

  2. Make your modifications and debug them. Please follow the guidelines in section Standards (Rules) for Writing CONGEN Code, so that the code will be consistent. Use either make (on Unix systems) or MMS (on VMS systems) to rebuild the program.

  3. Run the standard test case and the conformational search test case, and compare each against its previous output. On a Unix machine using the C-shell, do the following:

    cd $CGT
    make test.dif congen.dif
    more test.dif congen.dif
    

    On VMS, do the following:

    $ SET DEFAULT CGT:
    $ MMS TEST.DIF,CONGEN.DIF
    $ TYPE TEST.DIF
    $ TYPE CONGEN.DIF
    

    The files should be identical except for the first four lines, version numbers or locations of files, and the last few lines giving the free list on the heap. If they are different in any other way, you must be able to prove that the results are correct. If you change any commands, the test case must be modified so that it will give the same results as before if possible. If you cannot duplicate the test case, you must eliminate your changes.

  4. Run all the test cases in `CGT'. Use either make diffs on Unix machines, or MMS DIFFS on VMS machines. Any significant changes must be accounted for.

  5. If your modification involves a new feature, you must either modify an existing test or make a new test to demonstrate and check its operation. See section CONGEN Test Cases, for a description of the tests currently available. If you add a new test, please update that node. WARNING: Any additions made without this will stop working as the entropy of programming randomizes your code without detection.

  6. Check in your change (using the ci command), and enter a good descriptive log entry for what you have done.

  7. If your change involves adding or modifying a command or adding or modifying a feature, modify existing documentation or if none is available, make new documentation. Recreate the INFO file and the manual using the makefile in `CGD'.

  8. If you modify or add new energy functions, use the TEST command, see section TEST Command -- Test Internal Functions, to verify that the derivatives of your energy calculations are correct.
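
Step 3 above accepts differences only in the first four lines and in the last few lines of the output. A minimal Bourne shell sketch of such a filtered comparison follows. The function names `body' and `cmp_body', the variable names, and the choice of three trailing lines are assumptions made for illustration; they are not part of CONGEN, and the make targets in `$CGT' may perform the comparison differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of CONGEN: compare two output files while
# ignoring the first SKIP_HEAD lines and the last SKIP_TAIL lines of each.
# SKIP_HEAD=4 follows the manual; SKIP_TAIL=3 is an assumed value for
# "the last few lines giving the free list on the heap".
SKIP_HEAD=${SKIP_HEAD:-4}
SKIP_TAIL=${SKIP_TAIL:-3}

# Print the body of a file: awk buffers every line after the header and
# prints all of them except the final SKIP_TAIL lines.
body() {
    awk -v h="$SKIP_HEAD" -v t="$SKIP_TAIL" '
        NR > h { buf[NR] = $0 }
        END    { for (i = h + 1; i <= NR - t; i++) print buf[i] }
    ' "$1"
}

# Diff the bodies of two files. Returns the exit status of diff:
# 0 when the bodies match, 1 when they differ.
cmp_body() {
    a=${TMPDIR:-/tmp}/cmp_a.$$
    b=${TMPDIR:-/tmp}/cmp_b.$$
    body "$1" > "$a"
    body "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return $status
}
```

After the file is sourced, a call such as `cmp_body old.out new.out' (both file names are placeholders) prints only the differences that still need to be explained. Differences in version numbers or file locations inside the body are still reported and must be judged by eye.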

Making New Versions of CONGEN

This section describes the steps in generating a new version of CONGEN. It is incomplete and constantly in flux, and should be viewed as a guide for future work on the process.

  1. Make sure the version number and date in the opening output of CONGEN.FLX are correct for this new version.

  2. Relink CONGEN if necessary.

  3. Redo a make depend in those directories where it is supported.

  4. Run all the test cases and compare against previous versions.

  5. Recompile the program with optimization, and compare results again.

  6. Rebuild the documentation.

  7. Clean up all directories of garbage.

  8. Make the tar files for distribution.

  9. Do a global setting of symbolic version number for this release.

  10. Back up the directory tree for posterity.
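
Steps 8 and 9 above can be sketched in the Bourne shell as follows. This is only an illustration under stated assumptions: the function names, the release name `V_EXAMPLE', and the directory arguments are hypothetical, and the script prints the commands instead of running them unless DRYRUN=0 is set. It relies on two standard behaviors: `rcs -nNAME:' attaches a symbolic name to the latest revision on the default branch, and `compress' replaces a file with a `.Z' version.

```shell
#!/bin/sh
# Hypothetical release helper, not part of CONGEN. By default it only
# prints the commands it would run; set DRYRUN=0 to execute them.
DRYRUN=${DRYRUN:-1}
RELEASE=${RELEASE:-V_EXAMPLE}   # assumed symbolic name for this release

# Either echo a command or execute it, depending on DRYRUN.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 8: make a compressed tar file of one directory tree.
# The result is named DIR.tar.Z.
make_tar() {
    run tar cf "$1.tar" "$1"
    run compress "$1.tar"
}

# Step 9: give every RCS file under DIR/RCS the same symbolic name.
tag_release() {
    for f in "$1"/RCS/*,v; do
        if [ -f "$f" ]; then
            run rcs -n"$RELEASE": "$f"
        fi
    done
    return 0
}
```

For example, after sourcing the file, `make_tar congen' followed by `tag_release .' in a source directory lists the commands for review before anything is changed.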

Installation of CONGEN on VMS

The installation of CONGEN on VAX/VMS is a very straightforward process. The files are organized so that CONGEN can be installed by either a system manager or an individual user without privilege. There are only a few steps to be taken:

  1. The tape on which CONGEN is shipped contains a single saveset, CONGEN.BAC. Restore the tape while preserving the directory structure into a directory of your own choosing, e.g.
    $ BACKUP MUA0:CONGEN.BAC [CONGEN...]
    
    You will need about 100000 blocks to restore the saveset.

  2. Modify the file, `[CONGEN.V2]CGDEFS.COM', to reflect your own directory structure.

  3. Change either the system site specific startup file, `SYS$MANAGER:SYSTARTUP.COM', or your `LOGIN.COM' file to include a call to `CGDEFS' to set up logical names. Use an argument of SYSNAM or JOBNAM as appropriate.

  4. Change either the system `LOGIN.COM' file or your `LOGIN.COM' file to include a call to CGDEFS that defines the commands: `@CG:CGDEFS COMMANDS'

  5. You may wish to delete rarely used files such as the bulk of the test cases or source code object files.

  6. Copy the INFO files (`congen', `congen-*', `flecsdoc', and `flecsdoc-*') in `CGD:' into the GNU Emacs manual directory and modify the INFO directory file so GNU Emacs can access the CONGEN documentation.

  7. It is helpful if GNU Emacs is installed so you can read the documentation online.

Installation of CONGEN on UNIX

The installation of CONGEN on UNIX is a very straightforward process. The files are organized so that CONGEN can be installed by either a system manager or an individual user without privilege. There are only a few steps to be taken:

  1. There are two parts to the CONGEN distribution, the CONGEN directory tree and the local file tree. Each part is shipped as a compressed tar file, and they are named, `congen.tar.Z' and `local.tar.Z', respectively. Depending on the capacity of the tape that is written, you may receive one or two tapes. These tapes will contain a tar'ed version of these compressed tar files (yes, they are tar'ed twice).

    Restore the tapes while preserving the directory structure into a directory of your own choosing, e.g.

    tar xvfo /dev/tape
    zcat congen.tar.Z | tar xvfo -
    zcat local.tar.Z | tar xvfo -
    

    You should substitute the appropriate device name for /dev/tape.

    You will need about 400000 blocks to restore the two tar files, although many of the test case directories and RCS directories can be deleted. It is possible that your tape drive swaps bytes, and if so, the initial tar operation will fail with a bad checksum. In that case, try the following command instead of the first one above:

    dd if=/dev/tape conv=swab | tar xvfo -
    

  2. Modify the files, `./congen/cgdefs' and `./congen/cgprofile', to reflect your own directory structure. The files, `cgdefs' and `cgprofile', contain some special code for the definition of CGROOT which is used at our site to allow us to have two copies of CONGEN on different machines, and to switch based on the machines' availability. You can remove all the conditionals, and simply give it a definition.

  3. Change either the system-wide profile or `.cshrc' file, or your own profile or `.cshrc' file, to source `cgdefs' or `cgprofile'.

    The file, `cgdefs', is for the C shell, and the file, `cgprofile', is for the Bourne or Korn shells. Once these files are executed, all of the commands will work.

  4. Examine the contents of the local tar file, and install those utilities that you do not have. A brief description of what's present:

    rcs
    We use rcs version 5.5 for CONGEN, and it is slightly incompatible with version 3 supplied with Irix 4.0. On our machines, we have set up our PATH variables so this rcs takes precedence over the system supplied version.

    makeinfo
    Essential for rebuilding the CONGEN documentation. This contains bug fixes over the version normally supplied with GNU Emacs.

    texindex
    Essential for rebuilding the CONGEN documentation or any texinfo document.

    gnuemacs
    A version of gnuemacs that runs on Irix 3.3.

    gnudiff
    A much faster version of diff. It is used by rcs. On Irix 5.0 and higher, SGI has switched to using gnudiff, so this directory is no longer needed.

    gnumake
    The GNU make. It is not used in CONGEN on the Iris, but is useful on the Convex since their make is older.

    lib
    A few procedures needed for the GNU programs above.

    gnugrep
    A faster and more powerful version of grep.

  5. Change your default directory to `$CGT', and run two tests as follows:

    make test.dif congen.dif
    

    Examine those two files. The only differences you should see are file names, version numbers, dates, allocations in the heap, and execution times.

  6. You may wish to delete rarely used files such as the bulk of the test cases or source code object files.

  7. Copy the INFO files (`congen', `congen-*', `flecsdoc', and `flecsdoc-*') in `$CGD' into the GNU Emacs manual directory and modify the INFO directory file so GNU Emacs can access the CONGEN documentation.

  8. It is helpful if TeX is installed. This will allow you to modify the documentation.
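
Step 2 above suggests replacing the conditional definition of CGROOT with a single definition. A minimal sketch of what a simplified fragment for the Bourne or Korn shells might look like follows. The install path and every subdirectory name are assumptions chosen for illustration; only the variable names CGROOT, CGS, CGT, and CGD come from this manual, so check the values against the distributed `cgprofile' before use.

```shell
# Hypothetical simplified cgprofile fragment (Bourne or Korn shell).
# /usr/local/congen is an assumed install location, and the subdirectory
# names below are placeholders, not the layout of the real distribution.
CGROOT=/usr/local/congen
CGS=$CGROOT/source     # assumed: source tree, with its RCS directory
CGT=$CGROOT/test       # assumed: test cases
CGD=$CGROOT/doc        # assumed: documentation
export CGROOT CGS CGT CGD
```

With a single unconditional definition, moving the installation only requires changing the CGROOT line.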
