Go to the previous, next section.

Storage Management in Fortran

One of the great limitations in Fortran is its inability to dynamically allocate storage. In a large program which must deal with problems are highly variable sizes, it is extremely valuable to allocate storage based on the size of the problem. Thus, a given computer can process a mix of problems as long as the aggregate size is not too large.

Even though Fortran does not allow dynamic storage allocation, we can conveniently simulate it so that arrays can be established at run-time with whatever size is necessary. These storage allocation schemes are written entirely in Fortran and do not require any machine dependent features.

The storage schemes described here are ubiquitous, and it is very difficult to follow the code without a good feel for these storage schemes. However, once they are comprehended, modifications are straightforward.

There are numerous sources in the computer science literature on stacks, heaps, and data structures. An excellent starting point is Jeffrey D. Ullman, Fundamental Concepts of Programming Systems, Addison Wesley (1976). Chapters 3, 6, and 7 are most useful.

Stacks: Local storage on a small scale
Heaps: Arbitrary storage access on a large scale.
Dynamic Data Structures Implementing data structures on the heap.
Applicability: Is this all worth it? (You can guess the answer)

Stacks

In all large programs, the need for "local" storage arises very frequently. "Local" storage or temporary storage is defined as storage needed by a subroutine for itself or any subroutine it calls. Invariably, local storage is required for work arrays whose sizes can be determined prior to their use. Local storage is not meant to be accessible when the subroutine is not executing.

An example requirement for local storage is a routine which squares a matrix. Because array multiplication cannot be done in place, one would require local storage to store a temporary array which would hold the squared matrix.

The data structure which is ideally suited for this task is a stack. At minimum, a stack consists of an array and an integer. Storage is allocated and freed from the top end of the array only. The integer is the stack pointer and keeps track of what part of the stack is in use. When a subroutine requires local storage, it gets it from the top of the stack. If it calls another subroutine which requires more local storage, the local storage currently being used is, in effect, buried, and the additional storage is allocated on the top of the stack. "Burying" such storage is reasonable because the first subroutine's execution is suspended when the next one is executing. When it finishes, the storage is freed, and the top of the stack moves back to where it was. The first subroutine then resumes execution with the same stack environment before the other subroutine was called.

As seen above, the stacks allow nested subroutines to each have their own local storage. This is very important because CONGEN uses many subroutines, and they can nest many levels. Using blank common for local storage fails because a nested subroutine would overwrite storage used by its caller.

Implementation of Stacks in CONGEN

CONGEN maintains two stacks, one for numeric data and one for character string data. (The Fortran standard forbids the mixing of numeric and character data in a common block, so for portability, we keep them separate even though one could get away with mixing them on most machines.)

The two stacks are stored in two common blocks named STACK and STACK_ST. Both common blocks are defined in `stack.fcm' or `stack.h'. The current declaration is as follows:

C
C     Variables:
C     
C     LSTUSD      Stack pointer which refers to last used element in
C                 stack array.
C     MAXUSD      Maximum value of LSTUSD seen during execution.
C     STACK       Numeric variable stack.
C     LSTCUSD     Character string version of LSTUSD
C     MAXCUSD     Character string version of LSTUSD
C     CSTACK      Character string stack.
C
C     Parameters:
C
C     STKSIZ      Maximum size of stack.
C     CSTKSIZ     Character string version of STKSIZ
C
      INTEGER LSTUSD,STKSIZ,MAXUSD,STACK,ALLSTK,LSTCUSD,CSTKSIZ,
     2        MAXCUSD,ALLCSTK
      PARAMETER (STKSIZ = 100000, CSTKSIZ = 100000)
      COMMON /STACK/ LSTUSD,MAXUSD,LSTCUSD,MAXCUSD,
     2               STACK(STKSIZ)
      REAL RSTACK(STKSIZ)
      EQUIVALENCE (RSTACK(1),STACK(1))
      CHARACTER*1 CSTACK
      COMMON /STACK_ST/ CSTACK(CSTKSIZ)

LSTUSD and LSTCUSD hold the index of the last element used in each stack; they are the stack pointers. STKSIZ and CSTKSIZ hold the size of the stacks which is used for overflow checking. MAXUSD and MAXCUSD accumulate the maximum value of the stack pointers so that one determine how much storage must be dimensioned for the stack without wasting any space. As one can see, the stack must be dimensioned to some appropriately large size, and all local storage requirements will be taken from it. If the stack is not big enough for the needs of the program, the stack must be redimensioned. Changing the size of the stack is far easier than redimensioning every array that would have declared locally.

There are some valuable techniques for using the character string stack. Note that the stack is declared as an array of CHARACTER*1. This was done to allow any size character string because on some machines (e.g. VAX/VMS), strings are limited to 32767 characters. When an allocated character string is passed to a subroutine, the space should be referenced as a character substring of the allocated length. Then, the subroutine will see the length of character string as the allocated length, and subscript range checking is simplified. If a character string array is allocated, the passed length should still be the length of the string element. Please note that if the passed length derives from a constant, then most compilers will detect the apparent substring range violation. Therefore, it will be necessary to allocate an extra integer variable, assign the constant to it, and use the integer variable in the subroutine call.

To refer to the stack in a subroutine, use a #include statement and refer to the file `STACK.FCM'. For convenience in debugging, the stack is also declared as a REAL array using an equivalence statement.

The stacks are limited in size, and it is not memory effective to increase their size greatly. Therefore, if you have large local storage needs, use heap storage, see section Heaps, for the really big arrays.

Useful Subprograms

A number of subroutines and functions are provided to perform all the necessary manipulations of each stack. They are found in the source code file, `util.flx'.

`INISTK`

To initialize the stacks, one must call INISTK with no arguments. INISTK sets all the stack variables appropriately, and if DBG_ALLSTK is greater than zero (it normally is), the stacks are filled with non-zero elements. The initial filling of the stacks helps to catch error which occur because local storage is not properly initialized.

`ALLSTK` and `ALLCSTK`

To obtain storage on the stacks, one uses the integer function ALLSTK or ALLCSTK. Both functions take one argument, the number of elements required, and returns the index into the array, STACK or CSTACK, of the first element allocated. In the case of ALLSTK, the elements are integers. The function, FSIZEOF, should always be used to calculate the number of integers required for a given data type because the number of integers is machine dependent. Never do this calculation in your own code! Also, ALLSTK will round your request up to the nearest multiple of 2 so that double precision alignment will be maintained. In the case of ALLCSTK, the elements are characters. If you are allocating space for a character string array, pass INICSTK the product of the array length times the string lengths. The functions revise the stack pointers, LSTUSD or LSTCUSD, to indicate the allocation, and they check for stack overflow. Stack overflow results in the termination of the program. Negative arguments are not permitted to these functions. Note that common block predeclares this function for you.

`FRESTK` and `FRECSTK`

To free storage on the stack, the subroutines FRESTK or FRECSTK are used. FRESTK is used for the numeric data stack, and FRECSTK is used for the character string stack. They each take one argument, the number of elements to be freed. They decrement the stack pointers by their arguments to indicate the release of storage, and they check to see that the stack pointers have not become negative. If they have, an error message is output, and program execution stops. Both routines check that their arguments are non-negative. FRESTK will round its argument to the next multiple of 2.

`FSIZEOF`

The integer function, FSIZEOF, accepts two arguments, TYPE and SIZE, and returns the number of integers needed to store SIZE variables of data type, TYPE. The TYPE argument is a character string and may have one of the following values: `REAL', `REAL*8', `DOUBLE', `INTEGER', `INTEGER*1', `INTEGER*2', `LOGICAL', `LOGICAL*1', `SHORT', `BYTE', `CHARACTER', or `COMPLEX'. The lengths provided by FSIZEOF vary from machine to machine.

Other Stack Functions and Variables

The function, PRINSTK, may be used to print the current state of the stack. The debugging variable, DBG_ALLSTK, may be used to debug stack usage. The default setting of 1 results in the initialization of the stack to non-zero values. If DBG_ALLSTK is set to 2 or greater, all calls to the allocation and freeing routines will generate messages displaying the argument to each call.

Using Storage on the Stack

The main method for using the space allocated on the stack is as follows: Suppose we have a subroutine, FOO, which requires local storage to work. Suppose further that FOO takes two arguments, A and B, and needs two working arrays, C and D. The code in subroutine FOO will call ALLSTK twice; once for each work array needed. Since C and D are names of the work arrays, the indices into the stack returned by ALLSTK are assigned to the variables C and D. A second subroutine, FOO1, is defined with four arguments, the original two and two arguments which are work arrays. In FOO1, C and D are defined as arrays. Once FOO has finished allocating the work storage, it calls FOO1 as follows:

        CALL FOO1(A,B,STACK(C),STACK(D))

FOO1 actually does the computational work for FOO. Once FOO1 completes, FOO then calls FRESTK to free the work storage used, and it returns having completed its task. In summary, referencing the work arrays is accomplished by using a subroutine call to map an array onto the stack.

A second method exists for accessing local storage on the stack. This involves using subroutines which access one element of an array at a time. They are not very useful for the stack, but they are discussed further in the section on data structures in the heap, see section Data Structures on the Heap in CONGEN.

Stack Example 1 -- Matrix Squaring

To illustrate the use of the stack, we show how the matrix squaring subroutine is programmed. The main program sets up the stack and calls the routine which is named MSQUAR.

      PROGRAM TSTMSQ
C
C     TESTS MSQUAR, A MATRIX SQUARING ROUTINE. WE READ THE MATRIX IN
C     FROM UNIT 5 AND OUTPUT ITS SQUARE ON UNIT 6. THE SIZE OF THE TEST
C     MATRIX, A, IS 10 BY 10.
C
#     include "stack.fcm"
C
      REAL A(10,10)
C
      CALL INISTK
      READ (5,100) ((A(I,J),J=1,10),I=1,10)
  100 FORMAT(10F8.0,9(/10F8.0))
      CALL MSQUAR(A,10)
      WRITE (6,200) ((A(I,J),J=1,10),I=1,10)
  200 FORMAT(' RESULTS OF CALLING MSQUAR -- THE MATRIX A:'/
     2       10(/1X,1P10G11.4))
      CALL PRINSTK(6)
      STOP
      END
      SUBROUTINE MSQUAR(A,N)
C
C     SQUARES A WHICH IS A REAL ARRAY ASSUMED TO BE N BY N. WE USE THE
C     STACK TO STORE THE SQUARED ARRAY.
C
      REAL A(N,N)
#     include "stack.fcm"
      INTEGER SQUARE
      INTEGER OLDLST
C
C     WE ALLOCATE SPACE FOR THE SQUARED ARRAY. NOTE THAT WE SAVE THE
C     VALUE OF LSTUSD UPON ENTERING THIS SUBROUTINE. THIS MAKES IT
C     EASIER TO FREE THE AMOUNT OF STORAGE ALLOCATED BY EVERY ALLSTK
C     CALL WHEN THE SUBROUTINE QUITS.
C
      OLDLST=LSTUSD
      SQUARE=ALLSTK(N*N)
      CALL MSQUA1(A,N,STACK(SQUARE))
      CALL FRESTK(LSTUSD-OLDLST)
      RETURN
      END
      SUBROUTINE MSQUA1(A,N,SQUARE)
C
C     MSQUA1 DOES THE JOB OF SQUARING THE MATRIX A. WE CAN NOW REFER TO
C     THE CONTENTS OF THE TEMPORARY ARRAY, SQUARE, DIRECTLY.
C
      REAL A(N,N),SQUARE(N,N)
C
      DO 1 I=1,N
        DO 1 J=1,N
          S=0.0
          DO 2 K=1,N
    2       S=S+A(I,K)*A(K,J)
    1     SQUARE(I,J)=S
C
C     NOW COPY SQUARE BACK INTO A
C
      DO 3 I=1,N
        DO 3 J=1,N
    3     A(I,J)=SQUARE(I,J)
      RETURN
      END

Stack Example 2 -- `JOINWD`

In this example, we show the use of the character string stack. JOINWD is used to insert one character string, WD, onto the beginning of a second character string, ST. Stack space is required because we wish to avoid an overlapped copy. JOINWD first copies ST to local storage, copies WD to ST followed by space, and then copies the old value of ST back into ST. Note the use of a substring reference into CSTACK in order to provide COPYST the correct size of the string it is copying into.

      SUBROUTINE JOINWD(ST,STLEN,WD)
C
C     DOES NEARLY THE INVERSE OF NEXTWD, I.E. JOINS WD ONTO 
C     THE BEGINNING OF ST.
C
      IMPLICIT INTEGER(A-Z)
      CHARACTER*(*) ST,WD
#     include "stack.fcm"
C
      STL = STLEN
      WDLEN = LEN(WD)
      COPY = ALLCSTK(STL)
      CALL COPYST(CSTACK(COPY)(1:STL),COPYLN,ST(1:STLEN))
      CALL COPYST(ST,STLEN,WD(1:WDLEN))
      IF (COPYLN .GT. 0) CALL ADDST(ST,STLEN,' ')
      CALL ADDST(ST,STLEN,CSTACK(COPY)(1:COPYLN))
      CALL FRECSTK(STL)
      RETURN
      END

Heaps

The stack is a very efficient means of providing dynamic storage for a great variety of uses. However, its limitation to local storage presents a problem when a subroutine must allocate storage for use by the routine which calls it. An example where the stack cannot be used is as follows: Suppose we wanted to write a routine, NBO, which finds all close non-bonded interactions, and suppose further that we wanted NBO to allocate the storage itself without any preset limitations. The stack cannot be used for this because the storage being allocated for the non-bonded list is not local; it must be used by the subroutine which calls NBO.

To circumvent this problem, we need a storage mechanism which allows completely arbitrary allocation of storage. Such flexibility is obtained with a heap.

A heap consists of an array and a list. The array is used for storing data. The list, referred to as the free list, keeps track of what areas in the heap are free. Before the heap is used, the free list is initialized to indicate that the entire heap is free. When space is required, the free list is searched for a space large enough to accommodate the request. The free list is then modified to indicate that less storage is available. To free storage, the freed area of storage must be put back on the list and recombined with adjacent free areas if possible.

As with the stacks, CONGEN provides two heaps, one for numeric data and one for character string data. The numeric data heap is managed entirely within Fortran, but the character string heap is really nothing more than the heap maintained by the C run time library routines malloc and free. There is a character string heap array, and the pointers returned by malloc are mapped into the address for the character string heap.

Implementation of the Heap for CONGEN

In CONGEN, the heaps are implemented using three common blocks. HEAPCM stores numerical information and the numeric heap, SVHEAP stores a copy of critical heap pointer for error detection, and HEAP_ST stores the initial address for the character string heap. All of the declarations are in the files, `heap.fcm' and `heap.h', and the appropriate file should be #include'd wherever the heaps are used. The real number array, RHEAP and the logical array, QHEAP, are equivalenced to the HEAP so references to other data types are simplified. (It should be noted that the use of these equivalences presumes that these data types have the same sizes as integers. This presumption may not always be true.)

Useful Subprograms

There are a number of subprograms which simplify the use of the heaps.

`INITHP`

To initialize the heap, INITHP is used. First, one sets HEAPSZ to the dimension of the heap, and then, INITHP is called with no arguments.

`ALLHP` and `ALLCHP`

To obtain storage on the heap, we use the integer function ALLHP. ALLHP is called with one argument, the amount of storage needed on the HEAP array, in units of INTEGERs. You should always use the INTEGER function FSIZEOF, see section Stacks, to determine the number of integers required for particular data type and number of elements. ALLHP then returns the index of the first element of the array allocated in the heap. This index shall be referred to as the base of the allocated storage.

ALLCHP works in a similar way to ALLHP except that it operates on the character string heap. The argument to ALLCHP is the number of characters to be allocated, and the return value is the index into CHEAP for the storage. Internally, ALLCHP is implemented using malloc.

`FREHP` and `FRECHP`

To return storage to the heap, one calls FREHP or FRECHP. Unlike FRESTK, FREHP and FRECHP require two arguments, a base and a length. FREHP and FRECHP return the storage demarked by the base and length to their respective free lists. If this storage can be recombined with adjacent free areas of storage, it is done.

Error Detection

ALLHP, ALLCHP, FREHP, and FRECHP make extensive checks on their arguments and on themselves. Since these two routines are vital to the operation of CONGEN, they are designed to catch many programming errors that can result from using the heap. In addition, there are debugging variables that can be used to improve the error detecting capabilities of the heap subprograms. See the source code for the usage of the debugging variables alloc and allhp.

`PRINHP`

To obtain a listing of the free storage areas available on the heap, PRINHP is available. It is called with one argument, a unit number to which the output is sent.

Using Storage on the Heap

Referencing the data in the heap can be done using the same techniques as described for the stack. However, a more sophisticated method of referencing data on the heap is the subject of the next section.

Heap Example -- Unlimited Read of a File

To illustrate the use of a heap, consider the following example: Suppose we want a subroutine which reads in a list of card images from a file and stores them into an array on the heap. Assume that the number of cards which can be read is to be limited only by the space available on the heap.

To implement this, we assume that we can store 4 characters to a word and that our cards have 80 columns. The card array is then dimensioned (20,N). The subroutine which is called to get the card images is called RDCARD.

      PROGRAM TSTRDC
C
C     This program tests RDCARD. It initializes the HEAP, calls RDCARD
C     on unit 5, prints the card images, and then prints the HEAP.
C
#     include "heap.fcm"
      INTEGER CRDBAS,CARDSZ
C
      CALL INITHP
      CARDSZ = 80
      CALL RDCARD(CRDBAS,5,NCARD,CARDSZ)
      CALL PRTCRD(CHEAP(CRDBAS)(1:CARDSZ),NCARD)
      CALL PRINHP(6)
      STOP
      END
      SUBROUTINE RDCARD(BASE,UNIT,NCARD,CARDSZ)
C
C     Reads all the card images from UNIT into the HEAP. BASE is
C     returned giving the first word in the HEAP where the card images
C     are stored. NCARD returns the number of cards found.
C
      INTEGER BASE,UNIT,NCARD,START,END,CARDSZ
#     include "heap.fcm"
      LOGICAL EOF
      INTEGER INC
      PARAMETER (INC = 100)
C
C     Initially, we allocate space for INC cards. If this is exceeded
C     during the reading of the file, space for an additional INC cards
C     will be allocated, and the reading will continue.
C
      LIMIT=INC
      BASE = ALLCHP(LIMIT*CARDSZ)
      NCARD=0
C
C     RDCAR1 actually reads the file, putting card images into the HEAP.
C
      REPEAT UNTIL (EOF)
      CALL RDCAR1(CHEAP(BASE)(1:CARDSZ),NCARD,LIMIT,UNIT,EOF)
      UNLESS (EOF)
C
C     At this point, we know that RDCAR1 ran out of space. We allocate a
C     array larger in size and copy the cards already obtained into the
C     new array, and then free the old array.
C
      NEWBAS = ALLCHP(CARDSZ*(LIMIT+INC))
      CALL COPYCA(CHEAP(NEWBAS)(1:CARDSZ),LIMIT+INC,
     2            CHEAP(BASE)(1:CARDSZ),LIMIT)
      CALL FRECHP(BASE,CARDSZ*LIMIT)
      LIMIT = LIMIT+INC
      BASE = NEWBAS
      FIN
      FIN
      END
      SUBROUTINE RDCAR1(CARDS,NCARD,LIMIT,UNIT,EOF)
C
C     Reads cards from UNIT into CARDS, setting EOF if end of file is
C     found. If the routine returns with EOF not set, it means that
C     space was exhausted in the card array.
C
      CHARACTER*(*) CARDS(LIMIT)
      INTEGER UNIT
      LOGICAL EOF
C
      EOF = .FALSE.
      REPEAT WHILE (NCARD .LT. LIMIT)
      NCARD=NCARD+1
      READ (UNIT,100,END=99) CARDS(NCARD)
  100 FORMAT(A)
      FIN
      RETURN
C
C     Come here on end of file. We must decrement NCARD by one because
C     the READ statement did not add another card image to the array.
C
   99 NCARD = NCARD-1
      EOF = .TRUE.
      RETURN
      END
      SUBROUTINE PRTCRD(CARDS,NCARD)
C
C     Prints the card images in CARDS onto unit 6, the printer.
C
      CHARACTER*(*) CARDS(*)
C
      WRITE (6,200) NCARD
  200 FORMAT('0OUTPUT OF ',I5,' CARDS:')
      DO (I=1,NCARD)
      WRITE (6,201) CARDS(I)
  201 FORMAT(1X,A)
      FIN
      RETURN
      END
      SUBROUTINE COPYCA(ST1A,NST1A,ST2A,NST2A)
C
C     Copies strings from ST2A into ST1A. The lengths of the arrays are
C     given by NST2A and NST1A, respectively. 
C
      IMPLICIT INTEGER(A-Z)
      CHARACTER*(*) ST1A(*),ST2A(*)
      INTEGER NST1A,NST2A
C
      DO (I = 1,MIN(NST1A,NST2A))
      ST1A(I) = ST2A(I)
      FIN
      RETURN
      END

Data Structures on the Heap in CONGEN

Fortran, the language that CONGEN is implemented in, suffers from a lack of flexibility with respect to the storage of data. The structuring of data is limited to scalars, complex numbers, and arrays. For many programming tasks, there is a need for more complicated structures for holding data. For example, in CONGEN, the protein structure file, the PSF, consists of a number of arrays and scalars that collectively describe the structure of a macromolecule. It would very useful to be able to refer to the PSF as a whole as opposed to referring to all of its constituents.

This inability to refer to a number of related scalars and arrays by one name results in a number of problems. Passing the information to a subroutine requires either using a large number of parameters or common blocks.

Using large numbers of parameters makes subroutine calls very bulky and increases the chances of errors resulting from the mistyping of a parameter. On the order of 100 parameters are required to call the analysis subroutine.

Common blocks could be used to pass information to a subroutine. However, these require allocating all needed space at compile time. Also, having more than one instance of a data structure is difficult. For example, the analysis facility requires a second PSF, coordinate set, and other data structures. Without solving this data structure problem, we need to establish another huge set of scalars and arrays. Further, as the program is changed, this second set would have been modified as well which would have a great potential for programming errors. These problems required a better method be introduced to handle the use of data structures in Fortran.

Having decided to implement a more succinct method for using data structures, we must consider other design criteria which would be useful to meet. First of all, the elements of each data structure must convenient to access, and the clarity of the program must not be unduly restricted. Next, since we have a mechanism for dynamic storage allocation, data structures should use only as much space as they need and should be able to grow. Third, the sharing of readonly data is desirable.

Given these design goals, how are they implemented for use in CONGEN? First, the heaps are ideally suited for this task as it allows completely arbitrary use of storage. Scalars are assumed to require one word of storage on the heap. Arrays will take as many words as their dimension and data type requires. Character strings are stored in the character string heap.

To keep track of the location and data types of elements in a data structure, three arrays; the base, length, and type arrays; are used. The base array stores the index of the first word or character of each element in the data structure. The length array gives either the total number of integers required for the element or the total number of characters, depending on the data type. The type array specifies the data type of the element. If the type array contains a positive number, this indicates that the element is a character string, and the number in the type array specifies the length of the character strings. Otherwise, if the number is equal to the parameter, DT_NUMERIC, then the element is numeric. Any other values are illegal. Multiple instances of a data structure require multiple sets of arrays. The naming scheme for the base, length, and type arrays is to prefix a `B', `L', or an `T', respectively, in front of the data structure name.

To identify the word in the base array which corresponds to an element in the data structure, we use a common block which is named for the data structure. If the data structure contains n elements, the common block of indices contains n+1 variables. The first variable gives the size of the base array needed to hold one instance of the data structure. The remaining variables all have the same name as the element in the data structure. Before proceeding any further, an example is required.

Example -- The Non-bonded List Data Structure

Consider a simplified and slightly contrived version of the data structure for storing the non-bonded list. Two arrays; JNB and INBLO; six scalars; NNNB, CTONNB, CTOFNB, CUTNB, WMIN, and NBOPT; and one character string, NAME are needed. We will use the name, NBND, for the data structure. The base, length, and type arrays for this data structure are called BNBND, LNBND, and TNBND, respectively, and the following declarations in a subroutine would suffice to use parts of the non-bonded list for a calculation:

      IMPLICIT INTEGER(A-Z)
#     include "heap.fcm"
      COMMON /INBND/ SNBND,JNB,INBLO,NNNB,CTONNB,CTOFNB,CUTNB,
     2               WMIN,NBOPT,NAME
      INTEGER BNBND(SNBND)

To make reference to the scalar NBOPT, one would write HEAP(BNBND(NBOPT)). To make reference to the character string, NAME, one would write CHEAP(BNBND(NAME))(1:TNBND(NAME)). The substring reference serves to specify a legitimate length for the name, and would be accessible in a called subroutine as the length of the string.

Three points must be noted before we continue. First, we can store any type of data in these data structures, provided that the numeric and character data are identified and separated. All we must know is how much space they require, which is provided by the FSIZEOF function, see section Stacks. For example, DOUBLE PRECISION variables require two integers per variable on most machines, and one integer on the Cray.

Second, we ensure that the index common blocks are the same from one subroutine to the next by using an #include statement which is processed by the Fortran C preprocessor, see section FCPP -- The Fortran C Preprocessor. An #include statement takes a file name and inserts the contents of the file in the place of the #include statement. By setting up files which contain the common block definitions and using #include statements exclusively for getting the common block, we can ensure that the common blocks are the same.

Third, in general, one cannot access the data directly. The cases where direct access is possible is for numeric data whose size is the same as an INTEGER, such as REALs and LOGICALs. Since the heap is declared as INTEGER, we make a direct reference. Single precision real scalars can be accessed directly using RHEAP, and logicals can be reference using QHEAP. References to any other parts of the data structure can be done using two general techniques.

The first of these reference techniques is to call a subroutine with the desired element of the data structure as a parameter. In the subroutine, one declares the parameter for what it is, and one can make references as desired. For example, if we wish to access the JNB array in the non-bonded data structure, we can make a subroutine call as follows:

      CALL SUB(HEAP(BNBND(JNB)))

and declare in the subroutine

      SUBROUTINE SUB(JNB)
      INTEGER JNB(1)

and make references to any element of the JNB array. This is the same method as described for stacks. The overhead involved with this scheme is that one must create a subroutine for the references.

The second technique which has a much higher overhead, but doesn't require a new subroutine is to use referencing and storage functions which have been written for CONGEN and which are found in the source file `array.flx'. The referencing functions take an array and indices into it and return the value of the specified element. For example, the function REFI accepts two arguments, an integer array, A, and an index, I. REFI returns A(I). The storage functions take an array, indices, and a value and store the value into specified element. For example, the subroutine STORI takes an integer array, A; its length, N; and index, I; and a value, VALUE, and executes the following statement: A(I)=VALUE. The higher overhead of this method results from the fact that a subroutine call is required for each array reference.

Useful Subprograms

Next, what subroutines and functions are provided to assist in the use of these data structures? There are many which cover most of the needs of CONGEN.

`STRTDT` and `SETUPI`

The subroutine STRTDT is used to initialize the common blocks. Whenever a new data structure is defined or an old one is changed in size, STRTDT must be changed to accommodate this. STRTDT calls SETUPI.

The subroutine, SETUPI, is used to initialize the index common blocks. It depends on being able to equivalence the integer variables in the index common block to an integer array by using a subroutine call.

`INITDT`

The subroutine INITDT is used to initialize a base array. Before a base array can be used, it must be initialized to the free state. The free state means that all bases are set a value which cannot point into the heap. This value, known as the free value, is a sufficient large negative number such that any references using it will cause an access violation which is useful in catching programming errors before they do subtle damage.

`ALLDT`

The subroutine ALLDT is used to allocate space for a data structure on the heaps. It is called with filled length and type arrays and a free base array. The contents of the length array is summed to get the space requirements for the data structure. ALLHP and ALLCHP are then called to get space on the heaps for the data structure. Finally, the bases are all set to point to their part of data structure. Lengths of zero are permitted. In these cases the bases for the element will be left as the free value. This will catch any illegal references.

`FREEDT`

The subroutine FREEDT is used to free a data structure. The data structure must be allocated before it can be freed. The base array is set to the free value.

`USEDDT`

The logical function USEDDT is used to determine if a base array is free or is pointing to a data structure. It returns .TRUE. if the base array passed to it is in use.

`RESIZE`

RESIZE is used to change the size of any of the elements in a data structure. It is called with a base, length, and type array and another trio of arrays which give elements to be changed and the new sizes. Note that only the lengths of character string elements can be changed, data types of elements may not be. RESIZE allocates space for a copy of the data structure with new sizes; copies the old data into the new structure; and then frees the old data structure.

`READDT` and `WRITDT`

READDT and WRITDT may be used to write a data structure in binary form to a file. The trio of arrays must be specified along with a Fortran unit number, HDR and ICNTRL information, and a title. READDT will read what WRITDT creates.

`DUPLDT`

DUPLDT is used to duplicate a data structure. It simply makes a complete copy of the arrays and the data stored in the heaps.

Sharing Data and `ASGNDT`

With the subroutines used to implement these data structures explained, we can return to a discussion of other implementation details. The next issue is the question of sharing data. Because the base arrays are pointers, there is nothing to prevent two base arrays from pointing to the same data. Obviously, this can lead to great problems if this occurs unintentionally, but one can save space and underscore the identity of data if we allow sharing of read-only data.

Implementing sharing is very easy with the above scheme. We reserve the first element of the base array to be a pointer to the reference count for a particular instance of a data structure. When the data structure is allocated, the reference count is set to 1. If we assign another data structure to this instance by copying its bases, the reference count is incremented by 1. If we free the data structure, the reference count is decremented by one. When the reference count drops to zero, the space on the heap is freed. The use of the first element of the base arrays has an additional benefit because the length of the reference count is 1 word. This fact ensures that the first word in the base array cannot have a free value when the base array is in use.

The subroutine, ASNGDT, implements the assignment of data structures which are to shared. It will take a base and length array and copies them to another base and length array. The reference count for the data structure will be incremented by one.

Switching to Data Structures on the Heap

The final detail to be discussed is the interface between this scheme for manipulating data structures and the usual scheme. Currently, the analysis subroutine and non-bonded list uses this more concise means for handling data structures; the rest of CONGEN works with the separate arrays and scalars. When the analysis subroutine is invoked, the data structures must be built. Copying subroutines exist for the five data structures passed to the analysis subroutine. These copying subroutines will take the arrays and scalars which comprise a data structure and build a data structure on the heap for it. Once the copying operation is complete, the actual analysis subroutine can be called and it can freely use the data structures existing on the heap.

The copying operation presents a naming problem in that the indices kept in the index common blocks have the same names as the arrays and scalars. To circumvent this problem, a second set of index common blocks is kept for all data structures which are not isolated in the analysis subroutine. This second set of index common blocks have the same common block names as the first, but the names of the indices are prefixed with `I'.

A good example of the use of these data structures is the subroutine NBONDS and some of its subsidiary routines in the file, `nbonds.flx'.

General Applicability of These Schemes

One final aspect of these storage schemes must be considered --- when should they be used? It is clear that for new work in CONGEN, C should be used for programming because it provides all these facilities in a straightforward manner. (CHARMM and CONGEN would have been more advanced had C be used from the beginning.) However, if new Fortran code must be integrated or maintained, then these schemes are applicable. The stack and the data structure storage on the heap were developed primarily to deal with an enormous number of arrays whose sizes must be able to change. The other design criteria came later. It should be obvious that one pays a price for these schemes, and one must determine whether the benefits are worth the cost.

Disadvantages

One disadvantage of these methods is that they are unfamiliar, and therefore, they take time to learn. Unfortunately, a subjective judgment is required to estimate how large a disadvantage this is. In my opinion, this chapter will enable an interested person to learn this material in a day or two. The subroutines to use these schemes are sufficiently comprehensive to permit effective use of these schemes without additional programming. Further, the ideas presented in this document are worth learning as they are valuable outside the context of CONGEN. Besides its application for local storage, the stack is useful for problems which require backtracking or which can be solved recursively. The heap is useful for problems which involve management of complex lists.

A second disadvantage of these schemes is that they are more cumbersome than the techniques they replace. There is no arguing this point. However, when used with the conventions described here, clarity is not sacrificed.

Advantages

Let us now consider the advantages. We shall confine ourselves to the analysis facility where they are used extensively.

First, as the size of a system being analyzed grows to the limits of what is supported, one need only change the size of the stack and the heap to accommodate a bigger system. When a subroutine has as many arrays as the analysis facility has, this is vital.

Next, although references to parts of a data structure are more cumbersome, referencing a whole structure is much more concise. The subroutines in the analysis facility illustrates the compactness of the subroutine call despite the large amount of information that is being passed. Further, the inclusion of the comparison data structures is trivial; we merely have to create another set of base and length arrays.

Thirdly, the data structures can be referred to as a unit. This aids the job of programming because now we can deal with bigger units of information without having to worry about the details. These data structures on the heap allow a programmer to ignore their contents until they are ready to be used. This can simplify the task of modifying a program the size of CONGEN.

Go to the previous, next section.