
Conformational Search

The CONGEN command performs a conformational search over a set of degrees of freedom that you specify. The first section in this chapter provides an introduction to this command. The remaining sections describe how to use the command.

Conformational search can take advantage of parallel processing in a highly efficient manner. See section PARALLEL Command, for information on how to enable parallel processing. In addition, there are some options which affect the efficiency of the processing as described below.

There are a number of commands to assist you in searching conformational space. See section Pointers to Relevant Programs, for more details.

N.B. The conformational search is designed to work only with explicit hydrogen or all hydrogen topology files.

Overview of Conformational Search

The conformational search process is a sampling over degrees of freedom within a macromolecule. With the CONGEN command, the term "degree of freedom" is used somewhat more freely than in the statistical mechanical sense. It means any operation that determines any number of atomic positions (including zero) and which can be iterated at least once over some variable. The reason for this generalization is to allow input, output, and energy evaluation operations to be incorporated into the course of the search in a simple and powerful way.

The sampling process is a series of nested iterations applied over all the degrees of freedom in the order specified by the user. All of the variables are sampled discretely, although there are provisions to solve for certain variables over a continuous range where constraints may be applied.(13) Thus, the computer time required for a search grows exponentially with the number of degrees of freedom. It is easy to set up a run that could run for the age of the universe.
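The exponential growth can be illustrated with a minimal Python sketch of nested iteration over discretely sampled variables. The grid spacing and the number of degrees of freedom below are hypothetical values chosen for illustration; they are not CONGEN defaults.

```python
from itertools import product
from math import prod

def count_conformations(samples_per_dof):
    """Number of complete combinations visited by exhaustive nested iteration."""
    return prod(len(samples) for samples in samples_per_dof)

def enumerate_conformations(samples_per_dof):
    """Lazily yield every combination; the first degree of freedom is the outermost loop."""
    return product(*samples_per_dof)

# Hypothetical setup: six torsions, each sampled on a 30 degree grid.
grid = list(range(-180, 180, 30))      # 12 samples per degree of freedom
dofs = [grid] * 6
total = count_conformations(dofs)      # 12 ** 6 combinations
first = next(iter(enumerate_conformations(dofs)))
```

Adding one more degree of freedom on the same grid multiplies the total by 12, which is why restricting the sampled values matters so much in practice.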

There are several different methods for directing the search process. The simplest method is a depth-first search where the program tries every sample in turn using an algorithm that requires a minimum of temporary storage to keep track of its progress. There are also methods for sampling based on the quality of the partial conformations, and these techniques can result in better quality conformations being generated early in the search process. It is also possible to generate random structures. See section Overview of Directed Searching, for more information.

The program as described in the Biopolymers 1987 paper was originally designed to search the conformational space of a single polypeptide segment within a protein. The version described here provides that capability in a more general way, so that multiple segments can be searched, the local environment around a segment can be considered, or terminal segments can be sampled.

The degrees of freedom presently implemented can be divided into three categories, those which construct atoms within the system, those which do I/O, and finally, one for the evaluation of the conformations.

There are three degrees of freedom involved with construction: Backbone, Chain Closure, and Sidechain. Together, they can search over a polypeptide segment. In addition, by creating new sidechain topology files, the Sidechain degree of freedom can be adapted for any molecule. See section Sidechain Topology File, for more information. The backbone and chain closure degrees of freedom work together to construct the backbone for an internal polypeptide segment. The sidechain degree of freedom is used for making the sidechains. See the menu below for more description of these degrees of freedom.

There are two degrees of freedom for I/O, WRITE and RBEST. The WRITE degree of freedom writes to a CONGEN conformation file the position of all atoms constructed up to that point in the search along with the latest evaluation of the conformation (see section Conformations File). This file can be read back by the RBEST degree of freedom, scanned for a particular conformation using the XCONF command, merged with other conformation files using the MERGE CG command (see section Commands to Assist in Conformational Search), and scanned with the CMPLOOP command (see section Pointers to Relevant Programs). The RBEST degree of freedom is used to read the best conformations from a CONGEN conformation file. By using this degree of freedom, real space renormalization (see H. Scheraga, Biopolymers (1983) 22, 1-14) can be implemented.

Finally, there is the EVL degree of freedom. EVL is used to evaluate the conformation currently being constructed. Any type of energy manipulation is possible (see section Minimization and Dynamics), but typically only energy evaluation is done. The EVL degree of freedom can also be used for comparing generated conformations against a known structure, so that the theoretical limits of the sampling can be assessed. Finally, the EVL option can invoke a user-written evaluation function or it can assign a random number to the evaluation of the conformation.

Although CONGEN was written for searching protein segments, it can be applied to arbitrary molecules. The sidechain degree of freedom reads a topology file, see section Sidechain Topology File, which can be used to describe the conformational degrees of freedom in any molecule. The sidechain degree of freedom is capable of searching any subset of the degrees of freedom. Therefore, search protocols like those used for proteins can be executed, where the central part of a small molecule is searched exhaustively and the peripheral moieties are searched iteratively.

Because long searches are common, CONGEN can periodically save the state of a search in a checkpoint file and restart the run from such a point. In addition, the status of the run can be periodically written to a file which can be typed by the user as the program is executing.

When CONGEN performs a search, it initializes the positions of all atoms involved in the degrees of freedom. This prevents collisions between newly constructed atoms and their prior positions if any. If you are planning several sequential searches, then you should initialize the position of all the atoms involved (using a COOR INIT command, see section Purpose of the Coordinate Manipulation Commands).

In some cases, it is desirable to repeat a search over a particular degree of freedom. For example, consider the problem of finding structures which satisfy a set of NMR constraints. Because many of the constraints involve sidechain atoms, it is desirable to search sidechains after each backbone. However, since constraints bridge across multiple sidechains, it is desirable to rebuild all the sidechains after each new one is added. CONGEN will support this type of operation in general by examining if atoms in one degree of freedom are reconstructed by a later degree of freedom. If so, then these "overlapping" atoms will be removed just prior to the sampling of any degree of freedom which generates new atomic positions. Note that this approach can be quite inefficient, but it might be improved if this capability proves to be useful.

There is a limited capability to treat part of the molecule as a rigid body while other parts are being searched. The backbone and sidechain degrees of freedom both have FIX options, see section Backbone Degree of Freedom, and section Sidechain Degree of Freedom, which specify that atoms be constructed with the same bond lengths, bond angles, and torsion angles that they had when the CONGEN command was invoked. This can be used to explore how two domains interact with one another when a linker joining them is flexible.

It is possible to include a cavity formation term in the energy function used by the conformational search. See section GEPOL Command -- Set GEPOL Defaults, for more information.

Overview of the Backbone and Chain Closure

To generate the positions of the backbone atoms, an extension (Macromolecules (1985) 18, 2767-2773) of the local chain deformation and chain closure procedure of Go and Scheraga (Macromolecules (1970) 3, 178-187) is used. Given fixed, oriented endpoints and a chain of bonded atoms containing six freely rotatable torsions, their procedure determines a set of values for the torsion angles that permit the chain to bridge the endpoints. The free torsions are the phi and psi angles so three is the minimum number of residues for which a search over an internal polypeptide segment can be performed (it is presumed that the omega peptide torsion angle is planar and normally trans, although cis peptides are considered as described below).

It turned out that the original Go & Scheraga procedure was overly restrictive, particularly for bridging regular structures like alpha-helices. To avoid this problem, we have modified the method to permit limited alterations in the bond angles, and the resulting procedure is named CLSCHN. The main option controlling bond angle variations is MAXDT, which gives the maximum variation from standard bond angles. Its default value is 5 degrees, which improves its performance significantly while incurring a bond angle energy penalty of at most 1 kT per angle. The number of solutions obtained from the chain closure method is always even and has not exceeded eight in our experience with peptides.

The ring in proline creates special problems. The proline ring constrains the phi torsion to be close to -65 degrees; any deviation from -65 degrees distorts the ring. Prior to running CONGEN, we determine the minimum energy configuration of the proline ring (specifically, 1,2 dimethyl pyrrolidine) for a range of phi angles (+/- 90 degrees) about -65 degrees using energy minimization with a constraint on phi, and we construct a file (`PRO.CNS') which contains these energies and the construction parameters necessary to calculate the position of CB, CG and CD of the proline. All of these energies are adjusted relative to a minimum ring energy equal to zero. After a chain closure is performed, we discard any conformations which have a proline phi angle whose energy exceeds the minimum energy by more than the parameter, ERINGPRO. Generally, we use a large value for ERINGPRO, 50 kcal/mole, so CLSCHN does not overly restrict proline closures. We handle cis-trans peptide isomerization by trying all possible combinations of cis and trans configurations. The user has complete control over which residues can be built in the cis isomer. Since there are only three residues involved in the chain closure, this results in no more than eight (2^3) attempts at chain closure.
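The cis-trans enumeration can be sketched in Python as follows. This is only an illustration of the combinatorics: the function name and the boolean flags marking which residues may be built as cis are hypothetical and do not correspond to CONGEN options.

```python
from itertools import product

def peptide_isomer_combinations(cis_allowed):
    """List the cis/trans assignments to try for the residues in a chain closure.

    cis_allowed holds one boolean per residue; trans is always tried.
    """
    options = [("trans", "cis") if allowed else ("trans",) for allowed in cis_allowed]
    return list(product(*options))

# All three closure residues may be cis: 2 ** 3 = 8 closure attempts.
all_cis = peptide_isomer_combinations([True, True, True])

# Only the middle residue may be cis: 2 closure attempts.
middle_only = peptide_isomer_combinations([False, True, False])
```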

The backbone search of an N residue segment begins by using backbone degrees of freedom to sample the free torsions of N-3 residues and then using the chain closure degree of freedom to close the chain. As the free torsions are sampled, we can discard any segment if the end of the constructed chain is too far from the other framework end for closure to be possible. See section Backbone Degree of Freedom, for a description of the CLSA and CLSD options which control this process. The determination is made by calculating the distance between the last atom constructed and the other fixed endpoint and comparing that to the distance spanned by m peptides with all torsions being trans and all bond angles increased by MAXDT, where m is the number of peptides still to be constructed.
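The distance test can be sketched as below. This is a simplified illustration, not CONGEN's calculation: it approximates the span of m extended peptides as m times a single per-peptide length, and the value of 3.8 Angstroms (a typical alpha carbon to alpha carbon distance across a trans peptide) is an assumption that ignores the widening of bond angles by MAXDT.

```python
import math

def closure_possible(last_atom, endpoint, peptides_remaining, span_per_peptide=3.8):
    """Return True if the remaining peptides could still bridge the gap.

    last_atom, endpoint: (x, y, z) coordinates in Angstroms.
    span_per_peptide: hypothetical maximum length contributed by one peptide.
    """
    distance = math.dist(last_atom, endpoint)
    return distance <= peptides_remaining * span_per_peptide

# A 10 Angstrom gap can be bridged by three peptides but not by two.
reachable = closure_possible((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 3)
too_far = closure_possible((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), 2)
```

Applying such a test after every backbone sample discards partial chains early, before any further torsions are sampled beneath them.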

The direction of backbone construction is arbitrary, although the endpoints of the search are conserved regardless of the order. The N-terminus of the internal segment is anchored on the peptide nitrogen; the C-terminus is anchored on the alpha carbon. When the construction direction is from the N terminus to the C terminus, the first torsion to be sampled in a residue is the omega angle (which normally is sampled just at 180 degrees, and sampled at 0 degrees and 180 degrees for prolines). It determines the alpha-carbon and the peptide hydrogen positions. The phi angle determines the position of the carbonyl carbon and the beta carbon of the sidechain; and finally, the psi angle determines the carbonyl oxygen and peptide nitrogen of the next residue. When the construction is in the reverse direction, the psi angle determines the peptide nitrogen; the phi angle determines the carbonyl carbon of the preceding residue, the peptide hydrogen, and the beta carbon; and the omega angle determines the position of the preceding residue's alpha carbon and carbonyl oxygen.

Rather than treating each of the three torsion angles in an amino acid residue as three separate degrees of freedom, we combine them into a single degree of freedom. This permits the use of Ramachandran type plots to limit the range of phi, psi values to those that are energetically acceptable and found in known structures.

To determine the allowed phi, psi angles, the CONGEN command uses energy maps. These energy maps are stored as files, see section Torsion Angle Maps, and they are expressed in tabular form with entries composed of omega, phi, and psi angle values along with the energy for that angle combination. Thus, any arbitrary criterion may be used in place of the energy in these maps. There are three different types of maps, one for glycine, one for proline, and one for the other amino acids which are modeled by alanine. Typically, the glycine and alanine maps are computed by modeling a dipeptide and using the van der Waals energy, whereas the proline map is computed using a dipeptide, but the energy in the map is the sum of the van der Waals energy plus the ring energy. The actual values for phi and psi in these tabulations are usually multiples of 15 degrees or 30 degrees with all possible combinations of angles present. Thus, these maps determine the sampling grid. For the alanine and glycine maps, phi and psi both range over -180 degrees to 180 degrees, and the omega angle can be either 0 or 180 degrees. Normally, the omega angle of 0 degrees is not used. In the proline map, the range of phi angles is -150 degrees to 30 degrees; psi goes from 0 degrees to 360 degrees; and omega has values of 0 degrees and 180 degrees.

Each backbone degree of freedom can specify its own map. However, in most applications, each backbone residue will use the correct map for its type (proline, glycine, or alanine), and the grid spacing in all the maps will be the same. Therefore, default maps are typically specified for each type of amino acid, and the user can override these maps for individual residues. Fortran unit numbers for default maps for glycine, proline, and other amino acids are specified with the global variables GLYMAP, PROMAP, and ALAMAP, respectively.

The particular values of torsion angles used for generating conformations are determined by these maps and the so-called EMAX options. The maps specify all the possible angles. The EMAX options restrict these sets, as they specify the maximum allowed energy relative to the minimum energy value found in each map. The global options GLYEMAX, ALAEMAX, and PROEMAX specify the selection for the default backbone maps, see section Global Options for the CONGEN Command. GLYEMAX specifies the limit for the glycine map; PROEMAX for the proline map; and ALAEMAX for the alanine map. The backbone degree of freedom option, EMAX, specifies the allowed energy for an individual backbone degree of freedom. For example, when using the alanine map with a sequence of alanines and a value of ALAEMAX of 2.0 kcal/mole, all the conformations generated will have phi, psi angles corresponding to only right-handed alpha-helices or beta-sheets. For a value around 5 kcal/mole, phi, psi angles for left-handed alpha-helices will also be selected. If the EMAX options are set to very large values, then the entire phi, psi space will be sampled.
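The effect of an EMAX option on a map can be sketched in Python as a simple filter relative to the map minimum. The tuple layout and the energies in the toy map below are invented for illustration; they are not taken from a real CONGEN torsion angle map, and the actual file format is described in section Torsion Angle Maps.

```python
def select_map_entries(entries, emax):
    """Keep the (omega, phi, psi) samples whose energy is within emax of the map minimum.

    entries: iterable of (omega, phi, psi, energy) tuples.
    """
    entries = list(entries)
    e_min = min(energy for (_, _, _, energy) in entries)
    return [(omega, phi, psi)
            for (omega, phi, psi, energy) in entries
            if energy - e_min <= emax]

# Toy map with made-up energies in kcal/mole.
toy_map = [
    (180, -60, -45, 0.0),    # right-handed helical region
    (180, -120, 135, 0.5),   # extended region
    (180, 60, 45, 4.0),      # left-handed helical region
    (180, 0, 0, 30.0),       # sterically poor
]

tight = select_map_entries(toy_map, 2.0)
loose = select_map_entries(toy_map, 5.0)
everything = select_map_entries(toy_map, 100.0)
```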

D amino acids are indicated by the presence of the word, "D", in the residue attributes. These residues are handled by inverting all torsion angles for the backbone maps and for the proline constructor files.

For more details on the commands which implement these degrees of freedom, see section Backbone Degree of Freedom, and section Chain Closure Degree of Freedom.

Overview of Sidechain Degree of Freedom

Given a set of backbone conformations, it remains to generate a set of side chain atom positions for each of the backbone conformations. Before we explore the problems inherent in side chain generation, we describe the side chain atom placement.

As with the backbone atom placement, the side chain atoms are positioned based on free torsion angles. The side chain torsions are processed from the backbone out as each succeeding atom requires the position of the previous atom for its placement. The sampling interval of each torsion (the option SGRID) can be either some fixed number of degrees or the period of the torsion energy. When the latter is used, the sidechain torsions will be at minima in the torsion angle potential involving the free atom and its antecedents. It is also possible to modify the sampling to avoid van der Waals contacts (VAVOID option). It is common for one free torsion to generate the position of more than one atom because of side chain branching, non-rotatable bonds, and rings. For example, although tryptophan has 11 side chain atoms to be placed, it has only two free torsion angles. Also, certain torsions have symmetry so we can reduce the sampling necessary. Finally, a search of the surrounding space is made for any constructed atom to see if there are any close contacts with a repulsive energy greater than MAXEVDW, and if so, that structure is eliminated.

Although this data structure was designed for amino acids, it can be applied to an arbitrary molecule. The only prerequisite is the presence of a few known atomic positions upon which the remaining atoms can be constructed.

The information needed for side chain construction is stored in a side chain topology file, see section Sidechain Topology File.

Given these specifications for generating side chain atomic positions, we need to introduce a protocol that generates only a limited number of conformers. The procedure analogous to the backbone generation procedure would result in a series of nested iterations over each chi torsion angle with the number of levels being equal to the sum of the free torsions in all the side chains of the peptide segment. The large number of free torsions in the side chains and the absence of a connectivity constraint, such as exists for the backbone, result in an enormous number of possible sidechain conformations. Consequently, such a direct approach is not feasible except in limited cases.

However, the situation is not that bleak. First, the backbone construction process provides the position of CB which gives a strong bias to the side chain orientation. Thus, an acceptable course of action is the generation of only one sidechain conformation for each backbone conformation. We must strive to make this one conformation the lowest energy possible for the given backbone. Second, because the side chains close together in sequence frequently are not close together in space, and therefore, do not interact strongly, it is a reasonable approximation to treat the side chains quasi-independently. Instead of finding all combinations of side chain atomic positions, we can handle the side chains sequentially so the time required for side chain placement increases linearly, rather than exponentially, with the number of residues.

In order not to limit the options for using the program, six possible methods for generating side chain positions have been implemented. These are specified using the SIDEOPT option in the sidechain degree of freedom. All of these methods discard conformations which have any repulsive contacts exceeding MAXEVDW in van der Waals energy. The first two methods described, ALL and FIRST, assume no quasi-independence of the sidechains whereas the others do.

The first method, ALL, generates all possible conformations by a series of nested iterations over every sidechain as described above. The second method, FIRST, uses the same algorithm as ALL except that all the iterations terminate when the first conformation for all the sidechains has been found. This method is useful for determining if a backbone conformation will accommodate the sidechains when details about the sidechain energetics are not required.

The next three methods all depend on a function which evaluates the side chain positions as they are generated so that the best ones can be selected. "Best" is defined as the conformation whose evaluation function is numerically smallest. Two evaluation functions are currently provided, one based on positional deviations, and one based on the CHARMM energy function. The evaluation function based on positional deviations is present for testing CONGEN as it provides a means for determining the limit of CONGEN's ability to generate a known structure. If coordinates are present for the peptide gap, this evaluation function will determine the RMS shift between a generated side chain conformation and the initial coordinates. The second evaluation function computes the CHARMM energy of the sidechain atoms omitting the bond and bond angle energies because the generation procedure does not vary either of these two terms. At present, either the r dependent dielectric or the constant dielectric for the electrostatic energy is used. The other electrostatic calculations, see section Generation of Non-bonded Interactions, are not available.

The INDEPENDENT method assumes that the side chains in the peptide chain being generated do not interact with one another. The atoms of each side chain are placed independently, with those of the other side chains in the peptide being ignored; interactions with all other atoms in the system are included. The conformation which has the lowest value for the evaluation function is selected for each side chain. When the RMS evaluation function is used, this method gives the optimum conformation, though it may be sterically inappropriate. Thus, it cannot be used when the energy is the evaluation function unless the possibility of large repulsive van der Waals energies is not important.

The method, COMBINATION, begins by generating a small number of the best side chain conformations for each side chain independently, as above. Then, these side chain conformations are assembled in all possible combinations, and those combinations which do not have bad van der Waals contacts are accepted. The number of conformations saved for each side chain must be small to avoid a combinatorial explosion.
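A minimal Python sketch of this two-stage idea follows. The function and parameter names are hypothetical, each side chain conformation is reduced to a single chi angle, and the evaluation and contact checks are toy stand-ins for the real evaluation function and the MAXEVDW test.

```python
from itertools import product

def combination_method(candidates, evaluate, has_bad_contact, keep=2):
    """Keep the best few conformations per side chain, then combine them.

    candidates: one list of conformations per side chain.
    evaluate: conformation -> number (smaller is better), applied independently.
    has_bad_contact: tuple of conformations -> bool, applied to each combination.
    keep: number of conformations saved per side chain.
    """
    best = [sorted(options, key=evaluate)[:keep] for options in candidates]
    return [combo for combo in product(*best) if not has_bad_contact(combo)]

# Toy data: two side chains, each with three chi values.
chi_options = [[60, 180, -60], [60, 180, -60]]
accepted = combination_method(
    chi_options,
    evaluate=abs,                                      # toy score: prefers small |chi|
    has_bad_contact=lambda combo: combo[0] == combo[1]  # toy clash: identical chi values
)
```

With keep conformations saved for each of n side chains, up to keep ** n combinations are tested, which is why the number saved must stay small.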

The ITERATIVE method starts with an energetically acceptable side chain conformation for all the side chains. This conformation is generated, if possible, using the FIRST method (see above). Starting with this conformation, we regenerate all the possible positions for the side chain atoms of the first residue, and select the conformation with the lowest energy. We also save the value of the evaluation function. This regeneration is done with all the other side chain atoms present so we can account for their effect. The process is repeated sequentially for the rest of the sidechains in the gap. We then return to the first residue and go through the process again until the energies of the side chain atoms do not change or until the number of passes reaches an iteration limit. This method has the virtue that only one conformation is generated per backbone conformation, and it is an energetically reasonable one. However, if there are significant interactions between the sidechain atoms, the first part of the process will bias the iterative process toward the initial side chain arrangement selected, and we may miss the lowest energy side chain conformation.
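The passes described above can be sketched in Python as follows. This is an illustrative outline under simplifying assumptions: each side chain conformation is a single chi value, the energy function is a toy, and the names and the default pass limit are hypothetical rather than CONGEN's.

```python
def iterative_placement(initial, candidates, energy, max_passes=10):
    """Re-optimize each side chain in turn until the energies stop changing.

    initial: starting conformation for each side chain (e.g. from a FIRST-like step).
    candidates: one list of possible conformations per side chain.
    energy(i, conf, current): score of side chain i in conformation conf,
        with the other side chains as given in current.
    """
    current = list(initial)
    previous = None
    for _ in range(max_passes):
        energies = []
        for i, options in enumerate(candidates):
            best = min(options, key=lambda conf: energy(i, conf, current))
            current[i] = best
            energies.append(energy(i, best, current))
        if energies == previous:   # converged: no energy changed in this pass
            break
        previous = energies
    return current

def toy_energy(i, conf, current):
    """Toy score: prefers chi = 60 and penalizes neighbours sharing the same chi."""
    own = abs(conf - 60) / 60.0
    clash = sum(10.0 for j, other in enumerate(current) if j != i and other == conf)
    return own + clash

placed = iterative_placement([180, -60], [[60, 180, -60], [60, 180, -60]], toy_energy)
```

In this toy case the first side chain claims chi = 60, so the second settles for another value. Starting from a different initial arrangement can give a different answer, which is the bias the text warns about.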

The FIXED method is used to construct sidechains in a fixed conformation. When this method is specified, the program calculates the construction bond lengths, angles, and torsions for all atoms in the degree of freedom from the starting coordinates, and will generate just one sample using those values. If there are van der Waals overlaps, then no conformation will be generated.

With any of the methods described above, the CONGEN command can apply any of the minimization algorithms to the generated conformations before they are written out for further analysis. Minimization provides an ability to reduce the small van der Waals repulsions that are inevitable with the coarse torsion grids used.

See section Sidechain Degree of Freedom, for more information on this degree of freedom.

Overview of Directed Searching

The process of conformational search can be viewed as a search of a tree, where a tree means a graph with no cycles. See the following figure:

In this analysis, the root of the tree represents the system at the beginning of the search process, where no degrees of freedom have been sampled. The next level represents the search process after the first degree of freedom has been sampled. In the above figure, there are three nodes at this level, which signifies that three good conformations were found. The third level in the tree represents the effects of sampling the second degree of freedom, and so on.

Each node in the tree represents the result of a particular set of samplings for the degrees of freedom. Leaf nodes are those nodes which exist at the very end of the tree, and represent complete conformations, where every degree of freedom has been completely sampled. Nodes in between the root and leaves are partial conformations. Any node which has been sampled is called an expanded node, and any node which has not been sampled is called an open node.

The goal of a conformational search is to find the lowest energy leaf node. An exhaustive search of the tree can be programmed easily by simply following down the leftmost branch of the tree until either a leaf is hit, or a conformation is blocked because of bad contacts or other problems. The program then backs up to the previous level, and examines the next node over, and again searches down towards the leaves. Effectively, the search proceeds from left to right across the entire tree, and every leaf is eventually generated. This is a depth-first search. Because each degree of freedom increases the number of nodes by an approximately constant factor, the time for an exhaustive search is exponential in the number of degrees of freedom.
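The depth-first traversal described above can be written as a short recursive generator. This Python sketch is an illustration only: nodes are represented as tuples of sample values, and the blocking test is a hypothetical callback standing in for the contact checks.

```python
def depth_first_search(samples_per_dof, is_blocked, partial=()):
    """Yield every unblocked leaf, left to right, using storage proportional to depth.

    samples_per_dof: one list of sample values per degree of freedom.
    is_blocked: partial conformation (tuple) -> bool.
    """
    if len(partial) == len(samples_per_dof):
        yield partial                      # every degree of freedom has been sampled
        return
    for sample in samples_per_dof[len(partial)]:
        candidate = partial + (sample,)
        if is_blocked(candidate):
            continue                       # back up and try the next node over
        yield from depth_first_search(samples_per_dof, is_blocked, candidate)

# Toy tree: three binary degrees of freedom; two consecutive 1s count as a bad contact.
def two_ones(partial):
    return len(partial) >= 2 and partial[-1] == 1 and partial[-2] == 1

leaves = list(depth_first_search([[0, 1]] * 3, two_ones))
```

Blocking a partial conformation removes the whole subtree beneath it, so early rejection is what keeps an exhaustive search tractable.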

In principle, knowledge of the path through the search tree to the best leaf would make it theoretically possible to find the best leaf in time linear with the number of degrees of freedom. Such knowledge is generally not available in advance. However, in some situations, information about the partial conformations can provide a guide towards finding the best leaf. For example, if CONGEN is used to reconstruct the backbone of a protein from the alpha carbon coordinates, then the RMS deviation of the partial conformations from the known alpha carbon coordinates is an excellent guide to a good quality fit.

The use of information about partial conformations is the basis for the directed search options. When these options are used, energies or RMS deviations for conformations at each node in the tree are calculated. The program maintains a sorted index of the open nodes in the tree, and it samples degrees of freedom in an order that depends on the order of the energies or RMS deviations.

The evaluations of the open nodes can be affected by the use of the EINHERIT and EIMMEDIATE keywords for particular degrees of freedom. EINHERIT causes all nodes expanded for a particular degree of freedom to inherit the evaluation of their father. EIMMEDIATE sets the evaluation of nodes at this level to the smallest possible value which forces the program to expand nodes at this level ahead of all others. This is useful when you do not want the energies or RMS's of this degree of freedom to influence the directed search.

Three selection strategies exist at present. The first strategy, the evaluation strategy which is selected by the EVAL keyword, always selects the lowest energy or RMS deviation node. It has a serious drawback when a degree of freedom results in conformers whose energy or RMS deviation is raised. In that case, the program will first expand all nodes before that degree of freedom before moving on towards the leaves.

The second strategy, the deepening evaluation strategy which is selected by the DEVAL keyword, is intended to circumvent the drawback of the EVAL strategy. Here, CONGEN maintains a sorted index of open nodes for each degree of freedom. When the search begins, all the indices are empty except for the first degree of freedom which contains just the root node. The program then cycles through each degree of freedom, and selects the lowest energy or RMS deviation node at that level for expansion. If there are no open nodes available, or if the search reaches the leaves, then it cycles back to the first degree of freedom. This cycling forces the program to make progress toward the leaves.
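The difference between the EVAL and DEVAL strategies can be sketched in Python with priority queues. This is a simplified illustration, not CONGEN's implementation. It assumes that nodes are tuples of samples, that leaves of the EVAL variant wait in the same index as other open nodes, and that a DEVAL sweep skips levels that are still empty before its first expansion. All function names are hypothetical.

```python
import heapq

def eval_search(root, expand, evaluate, is_leaf, max_leaves):
    """EVAL-like: always take the open node with the smallest evaluation."""
    counter = 0                                   # tie-breaker for equal evaluations
    open_nodes = [(evaluate(root), counter, root)]
    leaves = []
    while open_nodes and len(leaves) < max_leaves:
        _, _, node = heapq.heappop(open_nodes)
        if is_leaf(node):
            leaves.append(node)
            continue
        for child in expand(node):
            counter += 1
            heapq.heappush(open_nodes, (evaluate(child), counter, child))
    return leaves

def deval_search(root, expand, evaluate, depth, max_leaves):
    """DEVAL-like: one index per level; each sweep expands the best node at each level."""
    levels = [[] for _ in range(depth)]
    counter = 0
    levels[0].append((evaluate(root), counter, root))
    leaves = []
    while any(levels) and len(leaves) < max_leaves:
        expanded = False
        for level in range(depth):
            if not levels[level]:
                if expanded:
                    break                         # nothing open here: cycle back
                continue
            _, _, node = heapq.heappop(levels[level])
            expanded = True
            for child in expand(node):
                if level + 1 == depth:
                    leaves.append(child)          # children of the last level are leaves
                else:
                    counter += 1
                    heapq.heappush(levels[level + 1], (evaluate(child), counter, child))
    return leaves[:max_leaves]

# Toy tree: three binary degrees of freedom, evaluation = sum of the samples.
def grow(node):
    return [node + (0,), node + (1,)]

eval_leaves = eval_search((), grow, sum, lambda n: len(n) == 3, max_leaves=2)
deval_leaves = deval_search((), grow, sum, depth=3, max_leaves=4)
```

In the DEVAL variant, every sweep that expands a node on the last level produces leaves, which mirrors the forced progress toward the leaves described above.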

The third strategy, the mixed method selected by the MIX keyword, is a combination of EVAL and DEVAL. The MIX strategy maintains both types of indices, and simply alternates between each rule for the selection of nodes to expand. So far, it is the best directed search strategy.

It is possible to use the directed search methods to generate random structures. The trick here is to use random numbers for the evaluation of nodes rather than energy values. There are two possible approaches to the generation of random structures, generating multiple conformations in a single CONGEN run or making multiple runs of CONGEN with each generating one conformation. In the first case, the EVAL search option generally leads to more variation in the structures, although there is substantial overlap of structures. In the second case, the DEVAL option should be used because it is faster for each run.

The directed search capabilities are still under active development, and suggestions for improvement are always welcome.

When the directed search methods are used, the program maintains a search tree in memory. Since each node requires several hundred bytes of memory, it is impractical to keep the entire search tree. Thus, there is a tree pruning mechanism which can be tailored to periodically eliminate nodes that are either unnecessary or unlikely to be sampled in a reasonable amount of time given the search strategy in use. See section Global Options for the CONGEN Command, for a description of the TREE options.

Parallel Processing

Because individual conformations in the search tree can be processed independently of one another, CONGEN can operate on different nodes simultaneously. This allows the program to use multiple CPUs to speed the progress of a search.

In order to minimize the interference between processors, CONGEN partitions its activities into three nearly disjoint operations. The most time consuming operation is the sampling of degrees of freedom. This operation takes one partial conformation, samples it, and generates new partial conformations. The second operation is the decision making one, deciding what conformations should be sampled next, and in addition, taking care of the search tree. The third operation is taking new conformations from the sampling operation and putting them into the search tree.

CONGEN maintains two queues in order to minimize the contention between these operations. The first is the work queue (cg_work_q) which stores conformations to be expanded. The second is the new node queue (cg_new_node_q) which holds new conformations for insertion into the search tree.

In order to obtain maximum parallelism, the work queue should be as large as possible. However, for directed searches, it is best to keep the tree up to date so that decisions about what to explore next can be made with the most information. Therefore, directed searches have a requirement to keep the work queue small. There are two variables which control the size of the queue, QMIN and QMAX. QMIN specifies the target value for the minimum size of the work queue, and QMAX specifies the target value for the maximum size of the work queue. CONGEN automatically sets these variables depending on the type of search being performed, but they can be changed if you wish.

There is one significant pitfall with CONGEN running in parallel: non-determinism. Since one cannot predict when a processor will finish sampling a conformation, the order in which the search tree is traversed will vary from run to run. In the case of exhaustive searches, this doesn't matter because all runs will produce the same set of conformations, albeit in different orders.

However, in the case of a directed search, the decision on which node to expand next is made based on the contents of the search tree at the time. Since the contents will vary from run to run, the searches will all be different. It is not known as yet if all the results will be consistent. In simple trials where the peptide backbone is reconstructed from the alpha carbons, consistent results were observed.

CONGEN Command Syntax

{ CONGEN } repeat(global-options) repeat(dof-commands)
{ CGEN   }

global-options ::=

    GLYMap unit-number
    ALAMap unit-number
    PROMap unit-number
    PROCons unit-number
    STUNIT unit-number
    [GLYEmax energy] default: 100
    [ALAEmax energy] default: 100
    [PROEmax energy] default: 100
    [ERINgpro energy] default: infinity
    [HBCG hbond-spec END]
    [NBCG nbond-spec END]
    [EIGNore EVDW]
    [COOR SAVE END]
    [MAXLeaf int]
    [MAXNode int]
    [DEBUg int]
    [NOSOPT]
    [QMIN int]
    [QMAX int]
    [SEED int]
    [EPCOns real] default: infinite
    [EJCOns real] default: infinite

            { DEPTH                              }
            { BREADTH                            }
            { RDEPTH                             }
    [SEARCH { { EVAL  }                          } END]
            { { DEVAl } [evaluation-option-word] }
            { { MIX   }                          }

    [TREE [PRTFRQ int] [LIMIT int] [REDUction real]
          [TOPSAVE int] [SDSAVE real] END]

    [STATUS [UNIT unit-number] [SETPRN] END]

    [CHECKPOINT [UNIT unit-number] [NODEfreq int] [TIMEfreq int]
                [[NO]FLUSH]
                END]

    [RESTart UNIT unit-number END]

    [PBE [NEWB] [SOLV [VACUum real]] [ONLY] END]
         [FIXB]


                { BACK backbone-options          }
                { CHAIn chain-closure-options    }
                { SIDE sidechain-options         }
dof-command ::= { RBESt rbest-options            } [dof_options] del
                { WRITe write-coordinate-options }
                { STATus status-options          }
                { EVLuate evaluate-options       }

dof_options ::= [MAXNode int] [[NO]EINHERIT] [[NO]EIMMEDIATE]

backbone-options ::=
    STARtres segid resid
    [LASTres segid resid]
    [MAXEvdw real]  default:100
    [CISTrans]
    [ALLCistrans]
    [FORWard]
    [REVErse]
    [GRID real]     default:30
    [NOTERSymmetry]
    [CLSA segid resid iupac]
    [CLSD [DELTa] real]
    [MAXDt real]
    [MAP unit]
    [EMAX real]
    [FIX]

chain-closure-options ::=
    STARtres segid resid
    [MAXDt real]  default:5
    [MAXG real]  default:100
    [MAXEvdw real]  default:100
    [CISTrans]
    [ALLCistrans]

sidechain-options ::=
    repeat( STARtres segid resid [LASTres segid resid])
    repeat( CLUMP clump-spec )
    repeat( [MAXEvdw real] )  default:100
    repeat( [SGRId { real                      } ] )
            [      { MIN                       } ]
            [      { AUTO                      } ]
            [      { SELECT repeat({real}) END } ]
            [      {               {MIN }      } ]
    repeat( { VAVOID   } )
            { NOVAVOID }
    repeat( { SYMMetry   } )
            { NOSYmmetry }
    [SIDEOPT sidechain-option-word]
    [NCOMb number-of-combinations]
    [EVAL evaluation-option-word]
    [MAXSideiter integer]

                   { ALL           }
    clump-spec ::= { word          }
                   { [word]:[word] }

                              { FIRst       }
                              { INdependent }
    sidechain-option-word ::= { All         }
                              { Combination }
                              { ITerative   }
                              { FIXed       }

                               { Energy }
    evaluation-option-word ::= { RMs    }
                               { Wrms   }
                               { RAndom }

rbest-options ::=
    UNIT unit
    [NBESt int    ]  Default: 100
    [MAXEvdw real ]

write-coordinate-options ::=
    CUNIt unit
    [MAXCOnf integer]  default: 2**30
    [REF COMP]
    [CUT real]
    [MINCUt real]
    [MAXCut real]
    [FILTER]

evaluate-options ::=
    [MINI minimization-commands END]
    [RMS                           ]
    [USER                          ]
    [RANDOM                        ]

    [NPRInt integer]  default 0
    [CUT real]
    [MINCut real]
    [MAXCut real]
    [FILTER]

Notes:

  1. Global options may be placed anywhere outside of degrees of freedom.
  2. The order for degrees of freedom is the order of the search.
  3. A title MUST follow the command.
  4. CGEN is a synonym for the command, CONGEN.

CONGEN Input and Output Files

CONGEN has a large requirement for I/O. All of the files are specified via unit numbers in the CONGEN command, so OPEN statements are generally needed to set these up. See section Open File Command -- OPEN, for more information.

Torsion Angle Maps

The torsion angle maps are simple formatted files containing a set of records giving the energy for each possible value of omega, phi, and psi. The format of the file is as follows:

title
count                           (I5)
omega,phi,psi,etrans,ecis       (3F10.0,2F15.0)

where the count is the number of entries in the file. The omega, phi, and psi torsion angles are in degrees, and the energies, etrans and ecis, are in kcal/mol. Torsion angles are selected using the minimum of etrans and ecis.
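
As an illustration of this layout, the following Python sketch reads a map from a string using the column widths implied by the Fortran formats (I5) and (3F10.0,2F15.0). It is not the RDEMAP subroutine. It assumes a single title line, which may not hold for real map files, and the function names are hypothetical. Blank fields are read as zero, as Fortran would.

```python
def _field(line, start, width):
    """Read one fixed-width real field; a blank field counts as 0.0."""
    text = line[start:start + width].strip()
    return float(text) if text else 0.0

def parse_torsion_map(text):
    """Return (title, entries); each entry is (omega, phi, psi, etrans, ecis)."""
    lines = text.splitlines()
    title = lines[0]                      # assumption: exactly one title line
    count = int(lines[1][:5])             # (I5)
    entries = []
    for line in lines[2:2 + count]:
        line = line.ljust(60)
        omega = _field(line, 0, 10)       # 3F10.0
        phi = _field(line, 10, 10)
        psi = _field(line, 20, 10)
        etrans = _field(line, 30, 15)     # 2F15.0
        ecis = _field(line, 45, 15)
        entries.append((omega, phi, psi, etrans, ecis))
    return title, entries
```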

A number of standard maps are available in `CGDATA:'. The files are named as `EMAPresnn.OMP' where `res' is one of `ALA', `GLY', or `PRO'; and `nn' is one of `30', `15', `10', `5', or `MN'. The numeric values are used for files which contain uniform grids over the torsion angles. MN stands for a minimal map which has met with only limited utility in our experience (see the Biopolymers paper for details). When using the `MN' files, be sure to set the EMAX variables (see section Global Options for the CONGEN Command.) all to zero. The energy values in the ALA and GLY maps are computed for alanine and glycine dipeptides, and consist of just the van der Waals energy for those dipeptides. The trans value (etrans) is computed for the second peptide being trans; the cis value (ecis) is computed for the second peptide being cis. In the case of the proline maps, the energy in the file is the sum of the van der Waals energy and the internal energy of the ring as taken from the proline constructor file, see section Proline Constructor File.

The `makefile' in `$CGP/emap' can be used to generate these files. See section emap -- Backbone Maps and Proline Constructors, for more information. The files are read by the subroutine, RDEMAP.

Proline Constructor File

The construction of prolines is more involved than the other amino residues because of the ring. The approach used in CONGEN is to precompute a table of proline geometries based on the particular value of the phi torsion angle. When a ring with a particular phi angle is required, the table is linearly interpolated to find the geometry to use.
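
The table lookup can be sketched as follows. This Python fragment is only an illustration of linear interpolation over a table sorted by phi; the row layout, the function name, and the clamping at the ends of the table are assumptions, and no attempt is made to handle angle wraparound.

```python
from bisect import bisect_left

def interpolate_row(table, phi):
    """table: list of (phi, [values]) sorted by increasing phi.

    Returns the values linearly interpolated at the requested phi.
    Requests outside the table are clamped to the nearest end (assumption).
    """
    phis = [row[0] for row in table]
    if phi <= phis[0]:
        return list(table[0][1])
    if phi >= phis[-1]:
        return list(table[-1][1])
    j = bisect_left(phis, phi)            # table[j-1].phi < phi <= table[j].phi
    (p0, v0), (p1, v1) = table[j - 1], table[j]
    t = (phi - p0) / (p1 - p0)
    return [a + t * (b - a) for a, b in zip(v0, v1)]
```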

The geometries are determined using a model of just the ring with two terminating methyl groups, specifically, 1,2-dimethyl pyrrolidine. The geometry and energies for particular phi angles are computed by constructing the ring with the given phi angle and setting a torsion angle constraint (see section Holding Dihedrals Near Selected Values) with a high force constant. Then, the ring is minimized, and the actual value of phi, and the bond lengths, angles, and torsions needed to construct CB, CG, and CD are used. In addition, the energy of the system e, the van der Waals energy evdw, and the constraint energy ec are stored in the table.

This table is stored in the proline constructor file and has the following format:

title
count                                          (I5)
phi,e,evdw,ec,(bond(i),theta(i),phi(i),i=1,3)  (13F10.5)

where count is the number of table entries. phi is the phi backbone torsion angle after minimization. e, evdw, and ec are given above. The triples of bonds, angles, and torsions give the construction data needed to construct CB, CG, and CD, using internal coordinates, C-N-CA-CB, N-CA-CB-CG, CA-CB-CG-CD, respectively.

A proline constructor file good for general use is stored in `CGDATA:PRO.CNS'. See section emap -- Backbone Maps and Proline Constructors, for a description of its construction. The subroutine, RDPROCONS, is used to read this file.

Sidechain Topology File

The sidechain topology file describes how sidechain atoms are to be constructed. It is assumed that backbone atoms N, CA, and CB have been constructed prior to sidechain construction. The construction of a sidechain is broken into clumps of atoms, where a single free torsion angle determines all the atoms in the clump.

The sidechain topology file may be used for limited conformational searches of any molecule. As long as one has a core of atoms whose positions are either known or easily constructable, one can define a description of how to construct the remaining atoms while sampling over all the rotatable torsion angles.

The sidechain topology file is a free format file containing a series of commands which fill the sidechain topology data structure used by the CONGEN command. This data structure consists of a set of residues where each residue is made up of a set of free torsions (called clumps). Each clump has a rotational symmetry number associated with it as well as an option identifier for the clump. The identifier is useful when one is searching over a subset of clumps.

The clumps consist of a set of atom specifications. Each atom specification has the IUPAC names of four atoms arranged to form a constructor as used in the internal coordinates (see section The Internal Coordinate Commands), and the bond length, bond angle, and torsion angle used in that constructor.

There are two sidechain topology files available, `CGDATA:TOPCGEN3.INP', for explicit hydrogen topology files, and `CGDATA:TOPALLHCG.INP', for all hydrogen topology files. The subroutine, STREAD, is used to read this file.

The commands read in the sidechain topology file closely follow the data structure. The commands are as follows:

RESIDUE command

Syntax

RESIdue res [SPECIAL]

Function

The residue command specifies the start of a new residue specification. The residue name given is matched against the residue names in the PSF (see section Data Structures), when a sidechain degree of freedom is processed. The SPECIAL option is used to signify a sidechain that requires special processing, but presently, it is not used by the program. A RESIDUE command must be specified before any clumps.

CLUMP command

Syntax

CLUMP int [NAME word]

Function

The CLUMP command specifies the start of a new clump within a residue. A clump command must precede any atom specifications.

The integer operand specifies the rotational symmetry of the clump. E.g., a symmetry of 1 means the full 360 degree range of the torsion must be sampled; a symmetry of 3 means only 120 degrees must be sampled. The optional name is used to identify the clump. In the absence of a specification, the name is taken to be the number of the clump within the residue.
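
The effect of the symmetry number on the sampled range can be shown with a short Python sketch. The function name, the starting angle of 0 degrees, and the uniform grid are assumptions made for illustration; the manual states only that a symmetry of n reduces the range to 360/n degrees.

```python
def torsion_samples(symmetry, grid_deg, start_deg=0.0):
    """Uniform torsion samples covering 360/symmetry degrees.

    start_deg and the uniform grid are illustrative assumptions.
    """
    width = 360.0 / symmetry              # range that must be sampled
    count = int(width // grid_deg)        # whole grid steps that fit
    return [start_deg + i * grid_deg for i in range(count)]
```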

ATOM command

Syntax

                                       { FREE     }
ATOM iupac iupac iupac iupac real real { real     }
                                       { ADD real }

Function

The ATOM command specifies how a particular atom is constructed. The first four operands give the IUPAC names of the four atoms that form the constructor; the fourth is the atom being built. Prefixes may be used with these IUPAC names, and they are interpreted in the same way as residue topology file names are, see section Linkage Atom Naming. An IUPAC name of N is processed specially, to wit, if it is not found within the residue, then NT is searched for.

The next two operands specify the bond length in Angstroms and the bond angle in degrees. If specified as 0, then CONGEN will search the parameter file (see section Data Structures), for the equilibrium values. In the files provided with CONGEN, the rings (i.e. His, Phe, Trp and Tyr) have the bond lengths and angles explicitly specified; their values are determined using energy minimization with constraints to ensure planarity and symmetry. (The CONGEN parameters for bond lengths and angles are not perfectly consistent for these rings, which results in small deviations from symmetry and higher energies.)

The torsion angle is specified in the final parameter as either FREE, meaning that it is the degree of freedom for the clump; ADD real, when the real number is added to the value of the degree of freedom (the FREE value); or just a number which is used directly as the torsion angle. All angles are in degrees.
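
The three forms of the torsion operand can be summarized in a few lines of Python. This is a sketch only; the representation of the operand (the string "FREE", a tuple ("ADD", real), or a plain number) and the function name are hypothetical and do not reflect how STREAD stores the data.

```python
def resolve_torsion(spec, free_value):
    """Return the torsion angle in degrees for one atom constructor.

    spec is "FREE", ("ADD", offset), or a fixed number (hypothetical encoding);
    free_value is the current value of the clump's degree of freedom.
    """
    if spec == "FREE":
        return free_value
    if isinstance(spec, tuple) and spec[0] == "ADD":
        return free_value + spec[1]
    return float(spec)
```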

The order of atom constructors is important in that the first three atoms in any constructor must be present when the fourth atom is built. Thus, the order must reflect construction from the backbone out.

PRINT command

Syntax

PRINT { ON  }
      { OFF }

Function

This command controls whether the topology file commands are printed as they are read. The default is no print.

COPY command

Syntax

COPY { res } [INVERT]

Function

This command will copy the information about one residue into the description of a second residue. The INVERT option will cause all the torsion angles specified in the non-free atom constructor to be negated. This results in an inversion of chirality for the residue, and is currently used for constructing D amino acids. The COPY command is useful when two residues have either similar or identical sidechains, and one wishes to avoid the problems with duplicating data that should be the same. The effect of the COPY command is the same as if the commands for the copied residue were inserted when the COPY command is specified.

END command

Syntax

END

Function

This command terminates processing of sidechain topology commands.

Conformations File

The conformations file, which is written by the Write Degree of Freedom (see section Write Conformations Degree of Freedom), stores the list of conformations generated by the CONGEN command. This file is read and written by subroutines in the file, `CGS:CGIO.FLX'. The file is unformatted, has a typical extension of `.CG', and has the following format:

HDR,ICNTRL(20)     ! ICNTRL(1)=NRES, ICNTRL(2)=NATOM, ICNTRL(3)=NCONS
NTITLE,TITLE
IBASE              ! Taken from PSF
RES                ! Taken from PSF
TYPE               ! Taken from PSF
CONSP              ! List of atoms constructed.
! The following record is repeated for each conformation plus one more
! at the beginning of the file for the initial coordinates.
NTITL,((TITLE(I,J),I=1,10),J=1,MIN(10,NTITL)),
      (X(CONSP(I)),Y(CONSP(I)),Z(CONSP(I)),I=1,NCONS),TOTE

The first six records in the file provide enough information to restore coordinates from a conformation in a structure, and they also allow the program, CMPLOOP, see section Pointers to Relevant Programs, to differentiate between backbone and sidechains. The title record is taken from the title specified after the CONGEN command.

The remaining records hold conformations. The first conformation stored is not a real conformation, rather it is the coordinates of the constructed atoms prior to the beginning of the search, the so-called reference set. The variable, TOTE, stores the value returned by the last EVL degree of freedom (see section Evaluate Degree of Freedom), or if no EVL was executed, then the total energy for the sidechains as determined in the Sidechain Degree of Freedom, (see section Sidechain Degree of Freedom), or 0.0 otherwise.

Status File

The status file is a formatted file that reports the time of day when the file was written, the number of times the end of the search was reached, and statistics for open nodes at each level of the search.

Checkpoint File

The checkpoint file is an unformatted file which stores the complete state of a search. All search tree nodes are stored as well as some global information about the search, and various local variables used to control the process. The checkpoint file is necessary for restarting a conformational search which has terminated for either planned or unplanned reasons.

Global Options for the CONGEN Command.

The global options control aspects of the entire searching process. See section CONGEN Command Syntax, for the syntax of the global options.

Search Method Options

The conformational search method specifies how CONGEN explores the conformational space dictated by the degrees of freedom. The SEARCH global option is used to specify which method is used. If no SEARCH option is specified, then the program uses the depth-first search method. The directed search methods are still under development, and are subject to more caveats than the rest of the program. See section Overview of Directed Searching, for more information.

If the DEPTH option is specified, the program does a depth first search of the conformational space. The program tries to go down each branch of the search tree until it is either blocked or reaches the leaves. It then backs up to the closest open node to continue its search. This method is the most space efficient, and should be used for shorter loops where a complete search is feasible.

The RDEPTH option is similar to the DEPTH option, except that the children of a node expansion are randomly permuted before being put into the search tree. Effectively, it randomizes the order of search, but is still exhaustive and space efficient.

The BREADTH option specifies the use of a breadth-first search. Here all open nodes for one degree of freedom are expanded before the next degree of freedom is processed. This method is the worst case search method from the space utilization aspect, but was included for completeness.

The EVAL, DEVAL, and MIX options all specify different types of directed searches. These searches can be directed by four different evaluation criteria: ENERGY, RMS, WRMS, and RANDOM. The ENERGY criterion is the calculation of the energy, and it is applied to all of the atoms constructed at the level of the node. The RMS criterion is the Root Mean Square deviation of the constructed atom coordinates with respect to the coordinates of those atoms at the start of the CONGEN conformational search command. The WRMS keyword is mnemonic for Worst RMS, and directs by the worst match rather than the best. The RANDOM criterion is just a random number between 0 and 1. It is most useful when random structures are to be generated using the directed search. See section Miscellaneous Global Options, for a description of the SEED keyword for setting the seed for the random number generator.

The EVAL option specifies the evaluation directed search. Here, the program always expands the node which has the lowest evaluation.

The DEVAL option specifies the deepening evaluation strategy. Each degree of freedom is selected in turn, and the open node with the lowest evaluation at that level is selected for expansion. This method drives the search toward the leaves, but generally does not yield high quality results.

The MIX option specifies the mixed strategy, which alternates between the EVAL and DEVAL methods. Currently, it is the best directed search strategy.
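
The difference between the three directed strategies lies in how the next node is chosen. The following Python sketch is a simplified reading of the descriptions above, not the program's algorithm. Open nodes are represented as hypothetical (level, value) pairs, DEVAL is assumed to skip levels with no open nodes, and MIX is assumed to alternate on every selection; the manual does not specify these details.

```python
def pick_eval(open_nodes):
    """EVAL: the open node with the lowest evaluation, at any level."""
    return min(open_nodes, key=lambda node: node[1])

def pick_deval(open_nodes, level, n_levels):
    """DEVAL: lowest evaluation at the current level, then advance the level.

    Returns (node, next_level). Empty levels are skipped (assumption).
    """
    for offset in range(n_levels):
        lvl = (level + offset) % n_levels
        candidates = [node for node in open_nodes if node[0] == lvl]
        if candidates:
            best = min(candidates, key=lambda node: node[1])
            return best, (lvl + 1) % n_levels
    raise ValueError("no open nodes")

def pick_mix(open_nodes, step, level, n_levels):
    """MIX: alternate EVAL (even steps) and DEVAL (odd steps) (assumption)."""
    if step % 2 == 0:
        return pick_eval(open_nodes), level
    return pick_deval(open_nodes, level, n_levels)
```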

The options, EINHERIT and EIMMEDIATE, control evaluation and expansion for specific degrees of freedom during a directed search. These options can be specified for any degree of freedom. The option, EINHERIT, specifies that the value of nodes generated for a degree of freedom inherit their values from their parents.

The option, EIMMEDIATE, specifies that nodes generated from a degree of freedom will be expanded immediately. This is done by setting the value of the node to the largest negative number. The effect is to bypass the degree of freedom from the selection process. This option is very useful in energy directed searches where one wants to direct the search based on backbone and sidechain energies together. In such cases, the EIMMEDIATE option is specified for all backbone and chain closure degrees of freedom. See section Energy Directed Search Example, for a nice example.
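
The effect of these two options on the value assigned to a new node can be sketched in Python as follows. The function and argument names are hypothetical, the most negative finite float stands in for "the largest negative number", and the precedence when both options are given is an assumption.

```python
import sys

def node_value(own_value, parent_value, einherit=False, eimmediate=False):
    """Value used to rank a newly generated node in a directed search."""
    if eimmediate:
        return -sys.float_info.max    # always selected first
    if einherit:
        return parent_value           # ranked exactly like its parent
    return own_value
```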

With the directed search strategies, tree pruning is essential. See section Search Tree Pruning Options, for a description of this process and how it can be controlled.

The QMIN and QMAX options control the size of the work queue when parallel processing is used. The default values for these variables depend upon the type of search. For an exhaustive search, they are set to 10 and 40 times the number of CPUs in use, respectively. For a directed search, they are both set to the number of CPUs. See section PARALLEL Command, for a description of how the number of CPUs can be set. QMIN must always be less than or equal to QMAX.
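
The default rule can be written out as a short Python function. The function name and the boolean flag are hypothetical; only the multipliers come from the description above.

```python
def default_queue_limits(n_cpus, directed):
    """Return the default (QMIN, QMAX) for the given number of CPUs."""
    if directed:
        return n_cpus, n_cpus             # keep the work queue small
    return 10 * n_cpus, 40 * n_cpus       # exhaustive search
```

For example, an exhaustive search on 4 CPUs would default to a QMIN of 40 and a QMAX of 160, while a directed search on the same machine would use 4 for both.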

Conformational Search Limits

The MAXLEAF option controls how much of the search is executed. Each time CONGEN samples the last of the degrees of freedom, i.e., reaches the bottom of the search tree, the leaf count (variable LEAFNUM) is incremented. If MAXLEAF is specified, then the search is terminated when LEAFNUM equals or exceeds MAXLEAF.

The MAXNODE option controls how many search tree nodes are generated during a search. Once the number is exceeded, the search is stopped safely. MAXNODE can also be specified for each degree of freedom to limit the number of samples taken for any one partial conformation. This is useful when experimenting with the ALL sidechain option, see section Sidechain Degree of Freedom.

Maps and Construction Tables

The GLYMAP, PROMAP, and ALAMAP options specify the unit numbers for the default torsion angle maps (see section Torsion Angle Maps), for glycine, proline, and all other amino acids, respectively. Maps for individual amino acids can be selected specifically using the MAP and EMAX keywords for the backbone degree of freedom, see section Backbone Degree of Freedom, for more information.

The GLYEMAX, PROEMAX, and ALAEMAX options specify how much of the torsion angle maps is used for glycine, proline, and all other amino acids, respectively. When each map is read in, the minimum energy is determined. Then all entries which have energy within EMAX of this minimum are marked.
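
The marking step can be sketched in Python. The entry layout (omega, phi, psi, etrans, ecis) follows the map file format, and the energy of an entry is taken as the minimum of etrans and ecis. The function name is hypothetical, and treating the EMAX boundary as inclusive is an assumption, chosen so that an EMAX of zero still selects the minimum energy entries.

```python
def mark_allowed(entries, emax):
    """Return one boolean per map entry: True if within emax of the minimum."""
    energies = [min(etrans, ecis) for (_, _, _, etrans, ecis) in entries]
    emin = min(energies)
    return [energy - emin <= emax for energy in energies]
```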

The PROCONS option specifies the unit number for the proline constructor file, see section Proline Constructor File. The ERINGPRO option specifies which constructors are used by energy. When the procons file is read, the minimum of the energy of ring conformations minus the constraint energy used to hold the ring is calculated. All constructors whose energy difference is within ERINGPRO of the minimum difference are used for proline construction. Any constructors outside this range are ignored. If this option is omitted, then all constructors are used.
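
The ERINGPRO selection can be sketched in the same way. In this Python fragment each row is a hypothetical (phi, e, evdw, ec) tuple holding only the energy columns of the constructor file, the function name is invented, and the inclusive comparison is an assumption. The default of infinity mirrors the behavior when the option is omitted.

```python
def select_constructors(rows, eringpro=float("inf")):
    """Keep rows whose (e - ec) lies within eringpro of the smallest (e - ec)."""
    diffs = [e - ec for (_phi, e, _evdw, ec) in rows]
    dmin = min(diffs)
    return [row for row, diff in zip(rows, diffs) if diff - dmin <= eringpro]
```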

The STUNIT option specifies the unit number for the sidechain topology file. See section Sidechain Topology File, for more information.

Energy Calculation Options

The HBCG option specifies the hydrogen bond energy parameters to be used for this search. See section Syntax of the Hydrogen Bond Command, for more details. The NBCG option specifies the non-bonded energy parameters to be used for this search. See section Generation of Non-bonded Interactions, for more details. Please note that only the ATOM and CONS electrostatic options are supported. In addition, the CONS option used for evaluating sidechain conformations is slightly different from that used for other energy evaluations, in that the surrounding field is ignored.

The EIGNORE EVDW option directs the program to ignore the van der Waals term in the calculation of all energies.

Poisson-Boltzmann Options

By using the PBE option, it is possible to use electrostatic energies calculated using the Poisson-Boltzmann equation (PBE) in evaluating conformations. This code is still experimental, and is far from robust. In order to use it, you must first issue a PBE SETUP command, see section PBE SETUP Command which constructs the grids using all the atoms that will be used in the conformational search. Also, using this code is very slow because solving the Poisson-Boltzmann equation is expensive. If you can use parallel processing to speed execution, you should use the LOOPS option in the PARALLEL command, see section PARALLEL Command, in order to conserve memory.

Several possibilities for using the Poisson-Boltzmann equation are provided. By default, the Poisson-Boltzmann electrostatic energy substitutes for the Coulomb energy. If you specify the ONLY keyword, then the PBE energies substitute for the total energy. If you specify the SOLVATION keyword, then the electrostatic solvation energy is used. The electrostatic solvation energy is the difference between the Poisson-Boltzmann energy of the system with a solvent dielectric as specified in the PBE SETUP command and the Poisson-Boltzmann energy of the system with a solvent dielectric as specified in the VACUUM option. By default, the VACUUM dielectric is 1.0. The ONLY option can be specified with the SOLVATION option, and the result is that the electrostatic solvation energy substitutes for the total energy. The NEWB option specifies that the boundary be recalculated for each new conformation, whereas the FIXB option specifies that the boundary be left untouched. NEWB is the default. An example of these options in use is given in the Poisson-Boltzmann section, see section PBE Examples.

Search Tree Pruning Options

The TREE global option is used to control the pruning and display of the search tree. When directed searches are performed on large problems, see section Overview of Directed Searching, the search tree can grow very rapidly, and consume all available virtual memory. Generally, one is not interested in running these searches to completion, so removing open nodes that are not likely to be expanded is an appropriate step to conserve memory. Depending on the search strategy, the method used for pruning the tree will vary.

In the case of the evaluation directed search strategy, the tree pruning strategy is obvious: the nodes with the highest evaluations are deleted. In the deepening evaluation strategy, the tree pruning strategy is much less clear. Depending on the nature of the degree of freedom, the number of open nodes at each level may vary substantially. Only those levels with many nodes should be pruned. In addition, since the top level nodes are visited most often through the search, these levels should not be pruned. The specific strategy for the deepening evaluation strategy is as follows:

  1. The average and standard deviation of the number of open nodes per degree of freedom is calculated. Only those degrees of freedom that have more than one open node are used in the calculation.

  2. No pruning is done within TOPSAVE levels of the root.

  3. No pruning is done where the number of nodes on a level is less than or equal to the average number minus the standard deviation times the value of the SDSAVE option.

  4. All other levels are reduced in number by a factor of the REDUCTION option.
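
The four rules above can be sketched in Python. This is an interpretation, not the program's code. The function names are hypothetical, level 0 is taken to be the level closest to the root, the population standard deviation is used because the manual does not say which form is meant, and the reduced size is truncated to an integer.

```python
from statistics import mean, pstdev

def levels_to_prune(open_counts, topsave=1, sdsave=0.5):
    """Return the levels eligible for pruning under the deepening strategy.

    open_counts[i] is the number of open nodes at level i (0 = nearest root).
    """
    counted = [n for n in open_counts if n > 1]      # rule 1
    if not counted:
        return []
    threshold = mean(counted) - sdsave * pstdev(counted)
    return [level for level, n in enumerate(open_counts)
            if level >= topsave                      # rule 2
            and n > threshold]                       # rule 3

def pruned_size(n_open, reduction=3.0):
    """Rule 4: number of open nodes left on a pruned level."""
    return int(n_open / reduction)
```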

In the case of the mixed strategy, the program applies both pruning strategies and deletes only those nodes that satisfy both rules.

The meaning of each option is given below:

PRTFRQ
The printing frequency for printing a display of search tree in units of generated nodes. If the frequency is set to 0, then no printing will be done. The default is 0.

LIMIT
When the number of nodes in the search tree exceeds LIMIT, pruning is done. The default is 10000 for directed searches and 1000 for depth first and breadth first searches.

REDUCTION
The reduction factor for pruning. When evaluation directed searching is used, the number of nodes will be reduced by this factor. When deepening evaluation directed searching is used, those levels in the search tree that pass their tests will be reduced by this factor. When the mixed strategy is used, then only those nodes that would have been pruned by both strategies will be pruned. The default is 3.0.

TOPSAVE
The number of levels closest to the root node that are not pruned when the deepening evaluation directed search or the mixed strategy is used. The default is 1.

SDSAVE
When the deepening evaluation directed strategy is used, only those levels whose number exceeds the average minus SDSAVE times the standard deviation of the number distribution will be considered for pruning. The default value of SDSAVE option is 0.5.

Checkpoint and Restart Options

Since conformational searches can execute for long periods of time, it is necessary to be able to save the state of a search and restart it at a later point in time. The CHECKPOINT and RESTART options provide this capability. The frequency of checkpoints is also used to specify the frequency of status writing operations. Finally, a checkpoint is written at the end of the run, so that a long run that is terminated by your options can be continued at greater length later.

The CHECKPOINT options are as follows:

UNIT
The UNIT option specifies the file unit to which checkpoints are written. If this is omitted or given a value of -1, then no checkpoints are written. CONGEN handles the file attached to this unit in a special way. When a checkpoint is about to be written, CONGEN attempts to determine the name of the file associated with the unit. It then issues an operating system command to rename or move the file to a different name made up of the original name concatenated with `_bak'. On VMS systems, any previous `_bak' file is deleted first. Then, a new file with the original name is opened, and the checkpoint is written. Thus, one good checkpoint file is always maintained, and usually there are two.
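
The rotation scheme is easy to reproduce outside of CONGEN. The Python sketch below shows the same idea using the standard library; the function name is hypothetical and the checkpoint contents are treated as opaque bytes. It is an illustration of the technique, not a description of how CONGEN issues its operating system commands.

```python
import os

def write_checkpoint(path, data):
    """Write data to path, keeping the previous checkpoint as path + '_bak'."""
    backup = path + "_bak"
    if os.path.exists(path):
        os.replace(path, backup)      # replaces any earlier backup
    with open(path, "wb") as handle:
        handle.write(data)
```

If the write is interrupted, the previous checkpoint is still available under the `_bak' name.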

NODEFREQ
The NODEFREQ option specifies how many nodes must be allocated between checkpoints. The default is 100000.

TIMEFREQ
The TIMEFREQ option specifies the time in minutes between writing checkpoint files. Checkpoints are written when either NODEFREQ or TIMEFREQ tests are passed, and both the counters and timers are reset when the checkpoint is written.
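
The interaction of the two tests can be sketched in Python. The class and method names are hypothetical, times are passed in explicitly in seconds so the logic is easy to follow, and leaving the time test disabled when no TIMEFREQ is given is an assumption, since the manual does not state a default.

```python
class CheckpointTrigger:
    """Decide when a checkpoint is due from node counts and elapsed time."""

    def __init__(self, node_freq=100000, time_freq_minutes=None, start=0.0):
        self.node_freq = node_freq
        self.time_freq_minutes = time_freq_minutes
        self.nodes = 0                # nodes allocated since last checkpoint
        self.last = start             # time of last checkpoint, in seconds

    def node_allocated(self, now):
        """Count one node; return True if a checkpoint should be written."""
        self.nodes += 1
        by_nodes = self.nodes >= self.node_freq
        by_time = (self.time_freq_minutes is not None
                   and now - self.last >= 60.0 * self.time_freq_minutes)
        if by_nodes or by_time:
            self.nodes = 0            # both the counter and the timer reset
            self.last = now
            return True
        return False
```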

[NO]FLUSH
This option controls whether CONGEN periodically flushes the output buffers for the log file (Fortran unit 6) and the output files specified in the write coordinate degrees of freedom. This option is normally on, but it can be turned off using NOFLUSH.

To restart a CONGEN run, edit the CONGEN input file that was used to produce the checkpoint file, and add a RESTART option. The unit number specified in the RESTART option should refer to a valid checkpoint file from a previous run. It is permitted to change the options which control the operation of the search, such as MAXNODE and MAXLEAF, but the degrees of freedom must not be changed. Only limited testing of option changing has been performed after restarting, so be careful.

Status Display Options

The STATUS option specifies whether a status file is written periodically through the conformational search. The status file is written using the same file naming scheme as the checkpoint file described in section Checkpoint and Restart Options. The frequency of writing is also dictated by the CHECKPOINT option. The status file is written to the unit specified by the UNIT option.

The SETPRN option directs the program to set the process name to indicate the progress of the search. This option works only for batch jobs on VMS systems. The process name is set to the number of leaves found thus far followed by the minimum evaluation encountered thus far during the search.

Miscellaneous Global Options

After a conformational search is complete, the coordinates are left in the same state as they were before the search began. The COOR SAVE END option is ignored.

The DEBUG option controls the setting of the CGEN debug variable. The default value of 0 prints no debugging information. Larger values print more information. A debug value of 2 is good for getting an idea of why a search failed without generating too much output. The maximum value of 5 will fill a disk as fast as the computer can write to it. See section Set Debugging Variables -- DEBUG, for more details.

The NOSOPT option will turn off an optimization in the sidechain FIRST placement method. See section Sidechain Degree of Freedom, for more details. This option is used only for debugging.

The SEED option sets the seed for the random number generator used by the RANDOM evaluation option, see section Search Method Options. When CONGEN runs in parallel, each process gets a copy of this seed. The seeds are not saved in the restart file, so a run that is interrupted and restarted will not generate the same random structures.

The options, EPCONS and EJCONS, are used to apply dihedral angle constraints, see section Holding Dihedrals Near Selected Values, or NMR J coupling constraints, see section NMR Constraints, in the generation of atomic positions. This functionality is experimental, and requires some care in its application. Both options specify the maximum allowed energy for constructing an atom which is either the first or last atom in the specification of a dihedral angle or J coupling constraint, and whose antecedent atoms are also in the specification of the constraint. No dihedral angle constraints are applied if the J coupling constraint derives from an average over multiple conformations or if the J's are joined. For example, if you have a J coupling constraint for a backbone phi torsion angle, then the construction of the backbone carbonyl carbon at the end of that constraint may be affected. The option, EPCONS, is used to select the dihedral angle constraints. The option, EJCONS, is used for the J coupling constraints. Both of these options specify energies in kcal/mole. If these values are very large, then they will not have any effect on the conformational search. At the present time, these constraints are not coupled with van der Waals avoidance when sidechains are constructed, see section Overview of Sidechain Degree of Freedom. If there is sufficient interest, they can be.

Degrees of Freedom

There are six "degrees of freedom" currently available in CONGEN. Three (backbone, chain closure, and side chain) are involved with atom construction; two (RBEST and WRITE) are involved with I/O; and the last, EVL, is involved with the evaluation of conformations.

Backbone Degree of Freedom

The backbone degree of freedom constructs the backbone atoms of a polypeptide. For our purposes, the backbone atoms are N, H, CA, CB, C, and O and the ring atoms in proline.

This degree of freedom is implemented like a macro in that the program treats the backbone of each residue as a separate degree of freedom. Thus, when this command is used to generate conformations for n backbones, n backbone degrees of freedom are generated internally.

The STARTRES option specifies the starting residue for which a backbone is to be constructed. The LASTRES option specifies the final residue for construction. Together with the FORWARD and REVERSE options, these options also specify which way the construction should be done, i.e. constructing from N to C or C to N. The following decision table gives the order used for linear structures:

                       FORWARD       REVERSE       NONE
     STARTRES<LASTRES   N->C          C->N         N->C
     STARTRES=LASTRES   N->C          C->N         N->C
     STARTRES>LASTRES   N->C          C->N         C->N

The following decision table gives the order used for cyclic structures:

                       FORWARD       REVERSE       NONE
     STARTRES<LASTRES   N->C          C->N     Use shortest distance.
     STARTRES=LASTRES   N->C          C->N     Forward = true
     STARTRES>LASTRES   N->C          C->N     Use shortest distance.

The range of residues cannot span between two segments of the PSF.
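The two decision tables above can be restated as a small function. This is a sketch in Python rather than CONGEN code. It assumes that STARTRES and LASTRES can be compared as integer positions within one segment, and the function name and the "shortest" return value are hypothetical.

```python
def construction_direction(startres, lastres, keyword=None, cyclic=False):
    """Return 'N->C', 'C->N', or 'shortest' following the decision tables.

    startres, lastres: integer residue positions (assumed comparable).
    keyword: 'FORWARD', 'REVERSE', or None when neither option is given.
    cyclic: True for cyclic structures, False for linear ones.
    """
    # An explicit keyword always wins, for linear and cyclic structures alike.
    if keyword == "FORWARD":
        return "N->C"
    if keyword == "REVERSE":
        return "C->N"
    if keyword is not None:
        raise ValueError("keyword must be FORWARD, REVERSE, or None")

    if cyclic:
        # With no keyword, a single residue is built forward; otherwise the
        # program picks whichever direction gives the shortest distance.
        return "N->C" if startres == lastres else "shortest"

    # Linear structure, no keyword: the residue order decides.
    return "C->N" if startres > lastres else "N->C"
```

For example, `construction_direction(131, 127)` gives `'C->N'` for a linear structure, while adding `keyword="FORWARD"` forces `'N->C'`.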

The MAP option specifies the Fortran unit number for the backbone energy map, see section Torsion Angle Maps, to be used. If no map is specified, then the default map unit is used. The defaults are specified by the PROMAP, GLYMAP, and ALAMAP global options, see section Overview of the Backbone and Chain Closure. The EMAX option specifies the maximum energy of entries in the map for the range of residues. If no EMAX option is given, then the global EMAX option appropriate for the given residue applies. You can use this option to override the default EMAX option for a particular backbone specification.

The MAXEVDW option specifies the maximum energy for any repulsive contact between any generated atom and atoms in the surroundings.

The CISTRANS and ALLCISTRANS options control whether cis peptides are included in the search. By default only trans peptides are used. CISTRANS specifies that only prolines may have cis peptides. ALLCISTRANS specifies that any amino acid may have cis peptides. In this context, CISTRANS and ALLCISTRANS applied to residue n refer to the peptide bond between residue n-1 and n. N.B. If no cis peptides are included in the torsion angle map, then this option has no effect. Generally, all the peptide maps defined over a grid have both cis and trans peptides, but you should check if this option is important to your problem.

The CLSA, CLSD, and MAXDT options control an important optimization in the search. Since this degree of freedom is generally used in conjunction with a chain closure, there is no point constructing backbones that stray too far for closure to take place. The CLSA option specifies the atom which will terminate the chain closure. For a construction in the N->C direction, this atom should be a CA; for the opposite direction, this atom should be an N. The program will construct a model backbone, all torsion angles being trans, and all bond angles involved in chain closures being stretched by MAXDT degrees (see below for more details) which would connect from the residue whose backbone is being searched to that CLSA atom, and it will measure this distance. If a backbone conformation is constructed with a separation distance greater than the CLSA distance, that search path will be abandoned. CLSD permits the closing distance to be modified. If just a number is specified with CLSD, then the program uses this value in place of the model backbone calculation. If the keyword, DELTA, is also used, then the real number is added to the model distance.

The construction of the model peptide attempts to correctly account for the flexibility permitted in peptide bond angles. If MAXDT is specified, then all bond angles in the model calculation are stretched by this amount. If MAXDT is not specified in the backbone command, then CONGEN searches through all the degrees of freedom to see which involve chain closure. If chain closures are found, then the associated bond angles are stretched accordingly. If the search involves no chain closures, then the default value for MAXDT will be used to stretch all bond angles.
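The pruning test described for CLSA and CLSD can be sketched as follows. This Python fragment is an illustration only. It takes the model backbone distance as a given number instead of building the stretched model peptide, and the function names and the choice of which atom marks the end of the partial backbone are assumptions.

```python
import math


def closing_distance(model_distance, clsd=None, delta=False):
    """Distance limit used for pruning.

    model_distance: length spanned by the all-trans model backbone with
        bond angles stretched by MAXDT (computed elsewhere; assumed given).
    clsd: the number supplied with CLSD, or None if CLSD is absent.
    delta: True if the DELTA keyword accompanies CLSD.
    """
    if clsd is None:
        return model_distance            # no CLSD: use the model calculation
    if delta:
        return model_distance + clsd     # CLSD with DELTA: adjust the model distance
    return clsd                          # CLSD alone: replace the model distance


def can_still_close(partial_end, clsa_atom, limit):
    """True if the separation to the CLSA atom does not exceed the limit."""
    return math.dist(partial_end, clsa_atom) <= limit
```

A search path would be abandoned as soon as `can_still_close` returns False for the backbone conformation just constructed.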

The backbone degree of freedom also handles backbones at the N and C termini. The N terminus and C terminus cannot use the standard torsion angle maps because there are fewer atoms than assumed in the construction of the maps and because of the rotational symmetry present at each end. Thus, the CONGEN command will perform a complete sampling for each end using only the MAXEVDW test. The grid for this sampling will be set to the value of the GRID option. Also, the rotational symmetry of the terminus will be used to reduce the search unless NOTERSYMMETRY is specified.

N.B., there is currently an error in the design of the code with regard to the construction of prolines at the amino terminus when using the AMBER94 potential. The problem manifests itself in geometric errors in the amino terminal nitrogen. Until this problem is fixed, you should avoid such constructions.

The FIX option is used to specify that the backbone should be constructed in a single fixed conformation without searching. The bond lengths, angles, and torsions for all the atoms in the degree of freedom are calculated from the current coordinates. The normal test of van der Waals overlap applies.

Chain Closure Degree of Freedom

The chain closure degree of freedom calls the modified Go and Scheraga chain closure procedure. This procedure finds atomic positions for the backbone atoms starting with the residue given by the STARTRES option and continuing to residue STARTRES+2. In residue STARTRES, the position of the N must be known; in residue STARTRES+2, the positions of CA, C, and O must be known. In addition, STARTRES cannot be the first residue in a protein chain, although STARTRES+2 can be the C terminal residue.

The MAXDT option specifies (in degrees) how much variation in peptide bond angles is permitted for a chain closure. The default value of 5 degrees should be adequate for most applications.

The MAXG option controls how bond angle adjustments are made. The default value gives good results. See the source code file, `clschn.flx', for more details.

MAXEVDW specifies the largest energy permitted for a repulsive contact for any atom generated in the chain closure. Its value is specified in kcal/mole.

CISTRANS controls whether the chain closure routine will generate cis prolines for any position within the three residues being constructed. ALLCISTRANS controls whether cis peptides can be generated for any residue, proline or otherwise.

Sidechain Degree of Freedom

The sidechain degree of freedom is the most complex. There are presently six different ways of constructing sidechains. See section Overview of Sidechain Degree of Freedom, for more information about the sidechain degree of freedom.

Unlike the other degrees of freedom, many of the options are associated with particular residues; that is, they are position dependent. These position dependent options are MAXEVDW, SGRID, VAVOID, CLUMP, and NOSYMMETRY. For each of these options, the program will first scan the command string for the first occurrence, and it will use that value as the default for all the residues. Then, starting with this default, it will scan for options and residues (STARTRES and LASTRES options below). Each time it finds an option specification, the current value is changed. Each time it finds a residue, the option settings for that residue will be set to the current value.
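The two-pass scan can be sketched in Python as shown here. This is an illustration under assumptions, not the CONGEN parser: the command is taken to be already split into ordered (kind, value) pairs, and the pair representation, the "RES" marker, and the function name are hypothetical.

```python
def assign_option(items, option):
    """Map each residue to the value of one position dependent option.

    items: ordered list of (kind, value) pairs, where kind is either an
        option name such as "MAXEVDW" or the marker "RES" for a residue.
    option: the option name to track.
    """
    # Pass 1: the first occurrence of the option supplies the default.
    # None stands for "never specified" (the built-in default then applies).
    current = next((value for kind, value in items if kind == option), None)

    # Pass 2: walk the command in order, updating the current value at each
    # option and recording it at each residue.
    settings = {}
    for kind, value in items:
        if kind == option:
            current = value
        elif kind == "RES":
            settings[value] = current
    return settings
```

One consequence worth noting: a residue that appears before the first occurrence of an option still receives that first value, because pass 1 has already installed it as the default.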

The residues whose sidechains are to be constructed are specified with an arbitrary number of STARTRES and LASTRES commands. Each LASTRES command is paired with the previous STARTRES command, and all residues within the pair are constructed. However, a STARTRES command need not have a LASTRES pair, and in that case, only the residue specified in the STARTRES command will be processed. E.g., STARTRES H 30 STARTRES H 10 LASTRES H 12 STARTRES H 25 will search the sidechain conformational space for residues H 10, H 11, H 12, H 25, and H 30. If no STARTRES command is given, then the residue specified in the last BACKBONE degree of freedom will be used.
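The pairing rule can be sketched as follows. This Python fragment is only an illustration. It assumes integer residue identifiers that increase by one within a segment and that STARTRES does not exceed its paired LASTRES; the tuple representation and the function name are hypothetical, and rejecting a pair that spans two segments is an assumption of the sketch.

```python
def expand_residues(tokens):
    """Expand STARTRES/LASTRES commands into a list of (segment, number).

    tokens: ordered list of (keyword, segment, number) tuples, with keyword
        being "STARTRES" or "LASTRES".
    """
    residues = []
    pending = None  # a STARTRES that has not yet been paired or emitted
    for keyword, segment, number in tokens:
        if keyword == "STARTRES":
            if pending is not None:
                residues.append(pending)   # unpaired STARTRES: single residue
            pending = (segment, number)
        elif keyword == "LASTRES":
            if pending is None:
                raise ValueError("LASTRES without a preceding STARTRES")
            start_segment, first = pending
            if start_segment != segment:
                raise ValueError("range spans two segments")
            residues.extend((segment, n) for n in range(first, number + 1))
            pending = None
        else:
            raise ValueError("unexpected keyword: " + keyword)
    if pending is not None:
        residues.append(pending)
    return residues
```

Applied to the example in the text, the function returns the residues in the order they were specified (H 30 first); sorting that list gives H 10, H 11, H 12, H 25, and H 30.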

The MAXEVDW option specifies the maximum repulsive contact for any atom within the sidechain. This option is position dependent as described above.

VAVOID and NOVAVOID control whether van der Waals repulsion avoidance is used. VAVOID signifies that avoidance is to be done. Van der Waals repulsion avoidance results in the program using only those torsion angles which avoid any van der Waals contact greater than MAXEVDW. Every atom that can make contact with the atoms in a clump is checked. Any sampling of the sidechain grid that falls into a repulsive range is moved to the closest torsion angle within an acceptable range.

SGRID controls the sampling grid. If a numeric value is specified, the value is taken for the grid in degrees. The first value for the torsion angle is taken from the global minimum of the torsion angle potential for the free atom and its antecedents. If there are multiple minima, then the smallest value in the range is taken. If MIN is specified, then the grid is set to the local minima in the energy of the free torsion angle of the clump. If SELECT is specified, then you can specify the torsion angle grid as a function of the number of free torsions in the sidechain. Each number or MIN keyword specifies the sidechain grid to use starting with sidechains with one clump on up. The last value applies to all the larger sidechains. The keyword, AUTO, is equivalent to "SELECT 10 30 30 60 60 MIN END".
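The SELECT form amounts to a table lookup in which the last entry covers all larger sidechains. A minimal Python sketch, using the AUTO list quoted above; the function and constant names are hypothetical, and the string "MIN" simply stands for the MIN keyword.

```python
# Grid list equivalent to the AUTO keyword: "SELECT 10 30 30 60 60 MIN END".
AUTO = (10, 30, 30, 60, 60, "MIN")


def grid_for_sidechain(n_clumps, select=AUTO):
    """Return the grid (degrees, or "MIN") for a sidechain with n_clumps.

    The first entry applies to sidechains with one clump, the second to
    two clumps, and so on; the last entry applies to all larger sidechains.
    """
    if n_clumps < 1:
        raise ValueError("a sidechain must have at least one clump")
    index = min(n_clumps, len(select)) - 1
    return select[index]
```

With AUTO, a sidechain with one free torsion is sampled every 10 degrees, one with four or five every 60 degrees, and anything larger uses only the torsional minima.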

The CLUMP option allows the user to specify a subset of clumps for succeeding residues. This is useful when one wishes to search over only a portion of a sidechain or a portion of an arbitrary molecule. The syntax of this option is intended to provide a simple means for specifying a range of clumps. If the option, ALL, is used, then succeeding residues will include all clumps. If a single word is given, then only clumps named by that word will be included. If the colon form of the option is used, then only clumps which are sequentially between the two names in the sidechain topology file entry for the residues will be used. If a word is omitted on either side of the colon, then either the first or last clump of each residue will be used as the default, respectively. A solitary colon is thus the same as the ALL option. If you want to use disjoint subsets of clumps within one residue, simply specify two sets of CLUMP and STARTRES commands.

For example, suppose one wished to search over the gamma carbons and gamma 2 hydrogens of a valine alone. These are currently listed as clumps 1 and 3 in the valine residue in the sidechain topology file. Assume that the valine is in segment MAIN and has residue identifier 46. Then the following command segment would be appropriate:

SIDE CLUMP 1 STARTRES MAIN 46 -
     CLUMP 3 STARTRES MAIN 46 ...
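The CLUMP selection rules can be sketched in Python as follows. This is an illustration under assumptions: the range is treated as inclusive of both named clumps, the colon form is assumed to arrive as one string without spaces, and the function name and the clump names in the usage note are hypothetical.

```python
def select_clumps(clump_names, spec):
    """Return the clumps of one residue selected by a CLUMP specification.

    clump_names: clump names in the order they appear in the sidechain
        topology file entry for the residue.
    spec: "ALL", a single clump name, or a colon form such as "A:B",
        ":B", "A:", or ":".
    """
    if spec == "ALL":
        return list(clump_names)
    if ":" not in spec:
        # Single word: only clumps with exactly that name.
        return [name for name in clump_names if name == spec]
    first, last = spec.split(":", 1)
    # A word omitted on either side defaults to the first or last clump.
    start = clump_names.index(first) if first else 0
    stop = clump_names.index(last) if last else len(clump_names) - 1
    return list(clump_names[start:stop + 1])
```

For a residue whose clumps are named "1" through "4", `select_clumps(names, "2:3")` keeps the middle two, and a solitary colon returns the same list as ALL.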

SYMMETRY and NOSYMMETRY control whether clump symmetry is used. Normally, the rotational symmetry associated with a clump (see section Sidechain Topology File) reduces the search space. However, the elimination of this symmetry is occasionally desirable because the comparison command of the analysis facility (see section Comparisons) does not recognize symmetric elements in residues and can generate artifactually large differences. The effect of NOSYMMETRY is to make all clumps have a symmetry of 1.

SIDEOPT specifies the sidechain construction method. See section Sidechain Degree of Freedom, for more details. The default is FIRST.

There is an important optimization which is performed when the FIRST sidechain construction method is used. (It is also important for the ITERATIVE method, because this method uses a conformation generated by FIRST to get started.) If the FIRST method fails to find a conformation for a particular residue, it will backtrack in the set of residues being constructed to the first residue which made contact with the failed residue. If no such residue exists, then the search fails. In order to make this work in the optimal way, you should group residues whose sidechains interact together when specifying their order. In addition, the global option, NOSOPT, will turn off this optimization.

NCOMB specifies the number of combinations to be used for the COMBINATION method. The default is 2.

EVAL specifies how the sidechain conformations are to be evaluated when the ITERATIVE, INDEPENDENT, and COMBINATION options are used. For the RMS and WRMS options to work, the coordinates stored in the program prior to the execution of the CONGEN command must be the reference set. The WRMS option means Worst RMS.

MAXSIDEITER controls how many iterations over all the residues are performed during an iterative sidechain construction. The default is 10.

Read Best Conformations Degree of Freedom

The Read Best Conformations degree of freedom reads a conformation file (`CG' file) produced by another run of the CONGEN command, and selects the best conformations from it. Each conformation selected is considered one sampling for this degree of freedom. It is potentially useful when a complete search would take too long to complete. In such a case, a search over part of the atoms would be written out, and then a low energy subset would be read back for use.

The UNIT option specifies the Fortran file unit where the input conformation file is read. NBEST specifies how many conformations are read back in the following way: The file is first scanned and the energy of each conformation is recorded in an array. The array is sorted and then, the NBESTth lowest energy is selected. During the actual sampling process, the conformation file is scanned again, and any conformation having an energy less than or equal to the selected energy is used. Thus, if several conformations have the same energy, it is possible for more than NBEST conformations to be read in.
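The two-scan selection can be sketched as follows. This Python fragment is an illustration only: it works on an in-memory list of energies instead of a conformation file, and the function name is hypothetical.

```python
def select_best(energies, nbest):
    """Return the indices of the conformations that would be read back.

    energies: energies in file order, one per conformation.
    nbest: the NBEST value.
    """
    if nbest < 1 or not energies:
        return []
    # First scan: sort the recorded energies and pick the NBESTth lowest.
    ranked = sorted(energies)
    threshold = ranked[min(nbest, len(ranked)) - 1]
    # Second scan: accept every conformation at or below that energy, in
    # file order. Ties at the threshold can yield more than NBEST entries.
    return [i for i, energy in enumerate(energies) if energy <= threshold]
```

For energies 3.0, 1.0, 2.0, 2.0, 5.0 and NBEST 2, the selected energy is 2.0 and three conformations are used, which is the tie case described above.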

MAXEVDW specifies the maximum van der Waals repulsion for any atom read by this degree of freedom. If the parameter file specifies that hydrogen bonds should exclude the non-bonded energy, see section Parameter File Format, then the hydrogen bond potential will be used to calculate the distance corresponding to MAXEVDW for interactions involving hydrogen bondable atoms.

Write Conformations Degree of Freedom

The Write Conformations Degree of Freedom writes the coordinates of all atoms constructed in the degrees of freedom which precede this command. See section Conformations File, for a description of the output file format. Note that the last energy evaluation done by either the EVL degree of freedom (see section Evaluate Degree of Freedom), or the read best degree of freedom (see section Read Best Conformations Degree of Freedom), is written with the conformation.

The CUNIT operand specifies the Fortran unit number where the file is to be written. Note that this file must be opened UNFORMATTED.

The MAXCONF option can be used to truncate the search. Once MAXCONF conformations have been written to the file, the search is stopped.

The REF COMP option directs the program to write the coordinates in the comparison coordinate set. See section The Coordinate Manipulation Commands for more information.

The three cutoff options, CUT, MINCUT, and MAXCUT, are used to limit the number of conformations written to the file. If none of these options is specified, then all conformations are written to the output file. If CUT is set, then only those conformations whose last evaluation was less than or equal to CUT will be written. If MINCUT is set, then only those conformations whose last evaluation is within MINCUT of the lowest evaluation seen so far will be written. If MAXCUT is set, then only those conformations whose last evaluation is within MAXCUT of the highest evaluation seen so far will be written. These last two tests are an incomplete, but hopefully useful, way of limiting the output of CONGEN to conformations within the cutoffs of the extreme energy conformation. If multiple options are specified, then the conformation is written if any test is satisfied.
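The combined test can be sketched as follows. This Python fragment is an illustration, not CONGEN code. The class and method names are hypothetical, and it assumes that the running lowest and highest evaluations are updated with the current value before the tests are applied, which the text does not state.

```python
class CutoffTest:
    """Hypothetical sketch of the CUT, MINCUT, and MAXCUT tests."""

    def __init__(self, cut=None, mincut=None, maxcut=None):
        self.cut = cut
        self.mincut = mincut
        self.maxcut = maxcut
        self.lowest = None    # lowest evaluation seen so far
        self.highest = None   # highest evaluation seen so far

    def passes(self, value):
        """Return True if a conformation with this evaluation is written."""
        # Assumption: extremes include the current value before testing.
        self.lowest = value if self.lowest is None else min(self.lowest, value)
        self.highest = value if self.highest is None else max(self.highest, value)

        tests = []
        if self.cut is not None:
            tests.append(value <= self.cut)
        if self.mincut is not None:
            tests.append(value - self.lowest <= self.mincut)
        if self.maxcut is not None:
            tests.append(self.highest - value <= self.maxcut)
        # No options: everything is written. Otherwise any passing test suffices.
        return any(tests) if tests else True
```

The sketch also shows why the MINCUT and MAXCUT tests are incomplete: an early conformation can be written and later fall outside the cutoff once a lower (or higher) evaluation is found.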

The FILTER option is used in conjunction with the cutoff options to stop conformers from being passed down to succeeding degrees of freedom. When the FILTER option is absent, all conformers pass to succeeding levels. When the FILTER option is present, only those conformers which are printed are passed to the next degree of freedom.

Evaluate Degree of Freedom

The evaluate degree of freedom evaluates the conformation generated up to this point in the search. N.B., the evaluate degree of freedom is specified using the EVL keyword. The letter A is intentionally omitted. In addition to the evaluation, it has the capability of running any energy commands from the main program. These degrees of freedom keep running totals of the minimum and maximum values found, and they are printed at the end of normal execution. In addition, the evaluations are saved for writing to the conformation file if a WRITE degree of freedom follows the evaluation. If no options for this degree of freedom are specified, then the energy is used.

The MINI evaluation option calls the energy minimization and dynamics command (see section Minimization and Dynamics). Prior to calling this option, CONGEN saves the generated coordinates and will reset them when the degree of freedom passes control to the previous degree of freedom. Any modifications to the coordinates are passed to the next degree of freedom. In addition, the coordinates of the surrounding atoms are fixed (see section Fixing Atoms in Place).

The RMS evaluation option computes the RMS difference between the current coordinates and the coordinates when the CONGEN command was invoked.

The RANDOM evaluation option assigns a random number as the evaluation of this conformer. In conjunction with the CUT, MINCUT, MAXCUT, and FILTER options below, it can be used to select a number of conformers at random for succeeding constructions.

The USER evaluation option calls the user defined evaluation. This option is not well developed, see the source code files cgen.c and usersb.flx for more information.

The NPRINT option is used to limit evaluation listings to every NPRINTth evaluation. If NPRINT is set to zero, then all evaluations are listed. At present, NPRINT does not turn off the output of ECNTRL when a minimization is done on the conformations.

The three cutoff options, CUT, MINCUT, and MAXCUT, are used to limit the evaluations listed in the log file. See section Write Conformations Degree of Freedom, for a description of how these tests work.

The FILTER option is used in conjunction with the cutoff options to stop conformers from being passed down to succeeding degrees of freedom. When the FILTER option is absent, all conformers pass to succeeding levels. When the FILTER option is present, only those conformers which are printed are passed to the next degree of freedom.

NPRINT and the cutoff interact as follows: The NPRINT check is made first. If the conformation's energy is to be printed, the cutoff check is then made.
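The order of the two checks can be sketched as follows. This Python fragment is an illustration under assumptions: the evaluation counter is taken to start at 1, "every NPRINTth evaluation" is read as a simple modulus test, and the function name and the callable used for the cutoff check are hypothetical.

```python
def should_list(count, nprint, passes_cutoff):
    """Decide whether an evaluation is listed in the log file.

    count: 1-based index of this evaluation.
    nprint: the NPRINT value; zero means every evaluation is a candidate.
    passes_cutoff: zero-argument callable that performs the cutoff check.
    """
    # The NPRINT check is made first.
    if nprint != 0 and count % nprint != 0:
        return False
    # Only evaluations that survive NPRINT reach the cutoff check.
    return passes_cutoff()
```

Passing the cutoff check as a callable makes the ordering explicit: it is not invoked at all for evaluations that the NPRINT check has already skipped.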

Examples of CONGEN Commands

There are many examples of CONGEN commands in the directory, `CGT:'. Most of the test cases begin with the letters, CG. Here, we give three examples of using CONGEN: a search over five residues, a reconstruction of all sidechains in a protein, and an energy directed search.

Five Residue Search

In this example, we search over the conformational space of residues 127 to 131 in flavodoxin. The search is done using backbone degrees of freedom over residues 127 and 128, with a chain closure over residues 129 to 131. We do searches to find both the minimum energy conformation and the minimum RMS deviation to the X-ray crystal structure.

Construction of helix segment 127-131 in flavodoxin.
In this run, we find both the theoretical lower limit and the
lowest energy conformations.
*
OPEN NAME CGDATA:RTOPH8.MOD UNIT 01 READ UNFORM
READ      RTF UNIT 1
OPEN NAME CGDATA:PARAM5.MOD UNIT 03 READ UNFORM
READ      PARAMETER UNIT    3
OPEN UNIT 10 NAME CGTD:FLVDOXPSF.MOD UNFORM READ
READ PSF FILE UNIT 10
OPEN UNIT 11 NAME CGTD:FLVDOX.MOD UNFORM READ
READ COOR FILE UNIT 11
COOR COPY COMP
OPEN UNIT 60 NAME 127FLVRMS.CG UNFORM WRITE             ! Conformation file
OPEN UNIT 70 NAME CGDATA:TOPCGEN2.INP FORM READ         ! Sidechain topology
OPEN UNIT 51 NAME CGDATA:EMAPGLY30.OMP FORM READ        ! Glycine torsion map
OPEN UNIT 52 NAME CGDATA:EMAPALA30.OMP FORM READ        ! Alanine torsion map
OPEN UNIT 53 NAME CGDATA:EMAPPRO30.OMP FORM READ        ! Proline torsion map
OPEN UNIT 55 NAME CGDATA:PRO.CNS FORM READ              ! Proline constructor
OPEN UNIT 40 NAME 127FLV.STS FORM WRITE                 ! Status file
!
!   Run the search now.
!
CGEN -
STATUS SETPRN UNIT 40 END -
HBCG CUTHB 4.5 CUTHBA 90.0 -
     CTONHA 98.0 CTOFHA 99.0 CTONHB 98.0 CTOFHB 99.0 END -
- !   Evaluate energy using constant dielectric of 50, distance cutoff = 5 A
NBCG CUTNB 5.0 ATOM CTONNB 98.0 CTOFNB 99.0 END -
- !
- !   The following Backbone degree of freedom contains CLSA optimizations to
- !   the correct terminators for the search.
- !
BACK MAXEVDW 20 STARTRES 1 127 LASTRES 1 128 CISTRANS CLSA 1 131 CA $ -
CHAIN STARTRES 1 129 CISTRANS MAXEVDW 20 $ -
SIDE SGRID MIN STARTRES 1 127 LASTRES 1 131 -
     SIDEOPT INDE MAXEVDW 20 EVAL RMS $ -
- !
- !   We compare RMS's to the starting coordinates
- !
EVL RMS $ -
WRITE CUNIT 60 $ -
GLYMAP 51 ALAMAP 52 PROMAP 53 PROCONS 55 -
GLYEMAX 2 ALAEMAX 2 PROEMAX 2 STUNIT 70 -
ERINGPRO 50 -
  ! The following contains the title used in the conformation file
127FLVRMS.CG
Conformations of helix segment 127-131 in flavodoxin.
RMS driven.
*
COOR COPY               ! Restore original coordinates.
OPEN UNIT 60 NAME 127FLVE.CG UNFORM WRITE               ! Conformation file
OPEN UNIT 70 NAME CGDATA:TOPCGEN2.INP FORM READ         ! Sidechain topology
OPEN UNIT 51 NAME CGDATA:EMAPGLY30.OMP FORM READ        ! Glycine torsion map
OPEN UNIT 52 NAME CGDATA:EMAPALA30.OMP FORM READ        ! Alanine torsion map
OPEN UNIT 53 NAME CGDATA:EMAPPRO30.OMP FORM READ        ! Proline torsion map
OPEN UNIT 55 NAME CGDATA:PRO.CNS FORM READ              ! Proline constructor
OPEN UNIT 40 NAME 127FLV.STS FORM WRITE                 ! Status file
!
!   Run the search now.
!
CGEN -
STATUS SETPRN UNIT 40 END -
HBCG CUTHB 4.5 CUTHBA 90.0 -
     CTONHA 98.0 CTOFHA 99.0 CTONHB 98.0 CTOFHB 99.0 END -
- !   Evaluate energy using constant dielectric of 50, distance cutoff = 5 A
NBCG CUTNB 5.0 ATOM CTONNB 98.0 CTOFNB 99.0 END -
- !
- !   The following Backbone degree of freedom contains CLSA optimizations to
- !   the correct terminators for the search.
- !
BACK MAXEVDW 20 STARTRES 1 127 LASTRES 1 128 CISTRANS CLSA 1 131 CA $ -
CHAIN STARTRES 1 129 CISTRANS MAXEVDW 20 $ -
SIDE SGRID MIN STARTRES 1 127 LASTRES 1 131 -
     SIDEOPT ITER MAXEVDW 20 EVAL E $ -
- !
- !   We do a simple energy evaluation for the five residues. The
- !   energies will be written to the conformation file.
- !
EVL MINI ENERGY END $ -
WRITE CUNIT 60 $ -
GLYMAP 51 ALAMAP 52 PROMAP 53 PROCONS 55 -
GLYEMAX 2 ALAEMAX 2 PROEMAX 2 STUNIT 70 -
ERINGPRO 50 -
  ! The following contains the title used in the conformation file
127FLVE.CG
Conformations of helix segment 127-131 in flavodoxin.
Energy calculations.
*

Complete Sidechain Reconstruction Example

This example illustrates how every sidechain in a structure can be reconstructed. This run serves as an important test of CONGEN. Note that cysteines would not be rebridged correctly because the CONGEN command doesn't handle disulphides at present.

Reconstructing side chains on flavodoxin backbones
*
OPEN NAME CGDATA:RTOPH8.MOD UNIT 01 READ UNFORM
READ RTF UNIT 1
OPEN NAME CGDATA:PARAM5.MOD UNIT 03 READ UNFORM
READ PARAMETER UNIT 3
OPEN UNIT 10 NAME CGTD:FLVDOXPSF.MOD UNFORM READ
OPEN UNIT 11 NAME CGTD:FLVDOX.MOD UNFORM READ
READ PSF FILE UNIT 10
READ COOR FILE UNIT 11
COOR COPY COMP
ENERGY
OPEN UNIT 70 NAME CGDATA:TOPCGEN3.INP FORM READ
!
!   The map and proline constructor files are required for all CONGEN
!   runs even if they are not used.
!
OPEN UNIT 51 NAME CGDATA:EMAPGLY30.OMP FORM READ
OPEN UNIT 52 NAME CGDATA:EMAPALA30.OMP FORM READ
OPEN UNIT 53 NAME CGDATA:EMAPPRO30.OMP FORM READ
OPEN UNIT 55 NAME CGDATA:PRO.CNS FORM READ
CONGEN -
SIDE VAVOID MAXEVDW 20 -
     SIDEOPT ITER EVAL E MAXSIDE 30 -
     SGRID MIN -
     STARTRES 1 1 LASTRES 1 138 $ -
EVL MINI ENERGY END $ -
GLYMAP 51 ALAMAP 52 PROMAP 53 PROCONS 55 -
GLYEMAX 2 ALAEMAX 2 PROEMAX 2 STUNIT 70 -
ERINGPRO 50 -
HBCG CUTHB 4.5 CUTHBA 90 CTONHB 98 CTOFHB 99 CTONHA 98 CTOFHA 99 END -
NBCG CUTNB 8.0 ATOM CTONNB 98.0 CTOFNB 99.0 END
Conformations of sidechains in flavodoxin model
*
!
!    Compare sidechain coordinates
!
COOR RMS CLEAR ATOM * * NT ATOM * * N ATOM * * CA ATOM * * C ATOM * * CB -
               ATOM * * O ATOM * * OT1 ATOM * * OT2 ATOM * * H -
               ATOM * NTER * ATOM * 1 HT3 NOT
OPEN UNIT 21 NAME FLVDOXS.MOD WRITE UNFORM
WRITE COOR FILE UNIT 21
FLVDOXS.MOD
Sequentially constructed flavodoxin. Side chains generated by CONGEN.
SGRID 30 SGRID 60 for ARG and LYS.
*
ANAL
SET LINESZ 80
COMPARE COOR COMP $
BUILD DIFF ATOM R
DELETE VALUE ABS LT 0.0005 $               ! Delete backbone comparisons
ADD STATS RMS $ PLACE RESIDUE SEGMENT $    ! Add residue statistics
PRINT TABLE PRETTY
END

Energy Directed Search Example

This example illustrates how an energy directed search can be executed where the search is driven by the energies of complete residues along the loop. Note that we reconstruct sidechains repeatedly in order to get an accurate energy.

Energy directed conformational search
*
!
! Open and read RTF, parameters, PSF, and coordinates.
!
OPEN NAME CGDATA:RTOPH8.MOD UNIT 01 READ UNFORM
READ RTF UNIT 1
OPEN NAME CGDATA:PARAM5.MOD UNIT 03 READ UNFORM
READ PARAMETER UNIT 3
OPEN UNIT 10 NAME CGTD:FLVDOXPSF.MOD UNFORM READ
OPEN UNIT 11 NAME CGTD:FLVDOX.MOD UNFORM READ
READ PSF FILE UNIT 10
READ COOR FILE UNIT 11
ENERGY
!
! Open files needed for the search
!
OPEN UNIT 60 NAME EXAMPLE.CG UNFORM WRITE
OPEN UNIT 70 NAME CGDATA:TOPCGEN3.INP FORM READ
OPEN UNIT 51 NAME CGDATA:EMAPGLY30.OMP FORM READ
OPEN UNIT 52 NAME CGDATA:EMAPALA30.OMP FORM READ
OPEN UNIT 53 NAME CGDATA:EMAPPRO30.OMP FORM READ
OPEN UNIT 55 NAME CGDATA:PRO.CNS FORM READ
OPEN UNIT 40 NAME EXAMPLE.STS FORM WRITE
PARALLEL                        ! Setup a parallel search
CONGEN -
SEARCH EVAL ENERGY END -
- ! Allocate a big tree so many branches can be explored.
TREE LIMIT 1000000 TOPSAVE 4 REDUCTION 2.0 SDSAVE 0.0 PRTFRQ 1000000 END -
- ! Write a status file so we can see the progress.
STATUS UNIT 40 END -
- ! The CHECKPOINT option is used to set the frequency of status file updates.
CHECKPOINT NODE 1000000 TIME 120 UNIT -1 END -
- !
- ! Build the loop from each end toward the middle
- ! Force the program to expand all backbone and chain closure open nodes
- ! so that only complete residues will be used to direct the search process.
-
BACK EIMMED STARTRES 1 38 MAXEVDW 20.0 CISTRANS CLSA 1 48 CA $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 $ -
BACK EIMMED STARTRES 1 48 MAXEVDW 20.0 CISTRANS CLSA 1 39 N REVERSE $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 -
     STARTRES 1 48 $ -
BACK EIMMED STARTRES 1 39 MAXEVDW 20.0 CISTRANS CLSA 1 47 CA $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 39 -
     STARTRES 1 48 $ -
BACK EIMMED STARTRES 1 47 MAXEVDW 20.0 CISTRANS CLSA 1 40 N REVERSE $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 39 -
     STARTRES 1 47 LASTRES 1 48 $ -
BACK EIMMED STARTRES 1 40 MAXEVDW 20.0 CISTRANS CLSA 1 46 CA $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 40 -
     STARTRES 1 47 LASTRES 1 48 $ -
BACK EIMMED STARTRES 1 46 MAXEVDW 20.0 CISTRANS CLSA 1 41 N REVERSE $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 40 -
     STARTRES 1 46 LASTRES 1 48 $ -
BACK EIMMED STARTRES 1 41 MAXEVDW 20.0 CISTRANS CLSA 1 45 CA $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 41 -
     STARTRES 1 46 LASTRES 1 48 $ -
BACK EIMMED STARTRES 1 45 MAXEVDW 20.0 CISTRANS CLSA 1 42 N REVERSE $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 41 -
     STARTRES 1 45 LASTRES 1 48 $ -
CHAIN EIMMED STARTRES 1 42 CISTRANS MAXEVDW 20.0 $ -
SIDE VAVOID MAXEVDW 5.0 SIDEOPT ITER EVAL E MAXSIDE 30 SGRID MIN -
     STARTRES 1 38 LASTRES 1 48 $ -
EVL MINI ENERGY END $ -
WRITE CUNIT 60 MINCUT 3 $ -
GLYMAP 51 ALAMAP 52 PROMAP 53 PROCONS 55 -
GLYEMAX 2 ALAEMAX 2 PROEMAX 2 STUNIT 70 -
ERINGPRO 50 -
HBCG CUTHB 4.5 CUTHBA 90 CTONHB 98 CTOFHB 99 CTONHA 98 CTOFHA 99 END -
NBCG CUTNB 8.0 ATOM CTONNB 98.0 CTOFNB 99.0 END
Directed search example.
*
!
! Now extract the best conformation
!
CLOSE UNIT 60
OPEN UNIT 60 NAME EXAMPLE.CG UNFORM READ
XCONF 60 BEST 1

Pointers to Relevant Programs

There are a number of programs available to assist in analyzing a conformational search. See section cmploop -- Preliminary Analysis of CGEN Files, for a description of a program which determines the RMS deviations for all conformations in a conformation file from the reference coordinates stored in that file. In addition, the energies are printed. The program SORTN can be used to sort the output of CMPLOOP. See section sortn -- Sort a Text File by Numbers (Real or Integer), for more information.
