Go to the previous, next section.
One of the great limitations in Fortran is its inability to dynamically allocate storage. In a large program which must deal with problems are highly variable sizes, it is extremely valuable to allocate storage based on the size of the problem. Thus, a given computer can process a mix of problems as long as the aggregate size is not too large.
Even though Fortran does not allow dynamic storage allocation, we can conveniently simulate it so that arrays can be established at run-time with whatever size is necessary. These storage allocation schemes are written entirely in Fortran and do not require any machine dependent features.
The storage schemes described here are ubiquitous, and it is very difficult to follow the code without a good feel for these storage schemes. However, once they are comprehended, modifications are straightforward.
There are numerous sources in the computer science literature on stacks, heaps, and data structures. An excellent starting point is Jeffrey D. Ullman, Fundamental Concepts of Programming Systems, Addison Wesley (1976). Chapters 3, 6, and 7 are most useful.
In all large programs, the need for "local" storage arises very frequently. "Local" storage or temporary storage is defined as storage needed by a subroutine for itself or any subroutine it calls. Invariably, local storage is required for work arrays whose sizes can be determined prior to their use. Local storage is not meant to be accessible when the subroutine is not executing.
An example requirement for local storage is a routine which squares a matrix. Because array multiplication cannot be done in place, one would require local storage to store a temporary array which would hold the squared matrix.
The data structure which is ideally suited for this task is a stack. At minimum, a stack consists of an array and an integer. Storage is allocated and freed from the top end of the array only. The integer is the stack pointer and keeps track of what part of the stack is in use. When a subroutine requires local storage, it gets it from the top of the stack. If it calls another subroutine which requires more local storage, the local storage currently being used is, in effect, buried, and the additional storage is allocated on the top of the stack. "Burying" such storage is reasonable because the first subroutine's execution is suspended when the next one is executing. When it finishes, the storage is freed, and the top of the stack moves back to where it was. The first subroutine then resumes execution with the same stack environment before the other subroutine was called.
As seen above, the stacks allow nested subroutines to each have their own local storage. This is very important because CONGEN uses many subroutines, and they can nest many levels. Using blank common for local storage fails because a nested subroutine would overwrite storage used by its caller.
CONGEN maintains two stacks, one for numeric data and one for character string data. (The Fortran standard forbids the mixing of numeric and character data in a common block, so for portability, we keep them separate even though one could get away with mixing them on most machines.)
The two stacks are stored in two common blocks named STACK
and STACK
_ST.
Both common blocks are defined in `stack.fcm' or `stack.h'. The current
declaration is as follows:
C C Variables: C C LSTUSD Stack pointer which refers to last used element in C stack array. C MAXUSD Maximum value of LSTUSD seen during execution. C STACK Numeric variable stack. C LSTCUSD Character string version of LSTUSD C MAXCUSD Character string version of LSTUSD C CSTACK Character string stack. C C Parameters: C C STKSIZ Maximum size of stack. C CSTKSIZ Character string version of STKSIZ C INTEGER LSTUSD,STKSIZ,MAXUSD,STACK,ALLSTK,LSTCUSD,CSTKSIZ, 2 MAXCUSD,ALLCSTK PARAMETER (STKSIZ = 100000, CSTKSIZ = 100000) COMMON /STACK/ LSTUSD,MAXUSD,LSTCUSD,MAXCUSD, 2 STACK(STKSIZ) REAL RSTACK(STKSIZ) EQUIVALENCE (RSTACK(1),STACK(1)) CHARACTER*1 CSTACK COMMON /STACK_ST/ CSTACK(CSTKSIZ)
LSTUSD
and LSTCUSD
hold the index of the last
element used in each stack; they are the stack pointers. STKSIZ
and CSTKSIZ
hold the size of the stacks which is used for
overflow checking. MAXUSD
and MAXCUSD
accumulate the
maximum value of the stack pointers so that one determine how much
storage must be dimensioned for the stack without wasting any space. As
one can see, the stack must be dimensioned to some appropriately large
size, and all local storage requirements will be taken from it. If the
stack is not big enough for the needs of the program, the stack must be
redimensioned. Changing the size of the stack is far easier than
redimensioning every array that would have declared locally.
There are some valuable techniques for using the character
string stack. Note that the stack is declared as an array of
CHARACTER*1
. This was done to allow any size character string
because on some machines (e.g. VAX/VMS), strings are limited to 32767
characters. When an allocated character string is passed to a
subroutine, the space should be referenced as a character substring of
the allocated length. Then, the subroutine will see the length of
character string as the allocated length, and subscript range checking
is simplified. If a character string array is allocated, the passed
length should still be the length of the string element. Please note
that if the passed length derives from a constant, then most compilers
will detect the apparent substring range violation. Therefore, it will
be necessary to allocate an extra integer variable, assign the constant
to it, and use the integer variable in the subroutine call.
To refer to the stack in a subroutine, use a #include
statement
and refer to the file `STACK.FCM'.
For convenience in debugging, the stack is also declared as a
REAL array using an equivalence statement.
The stacks are limited in size, and it is not memory effective to increase their size greatly. Therefore, if you have large local storage needs, use heap storage, see section Heaps, for the really big arrays.
A number of subroutines and functions are provided to perform all the necessary manipulations of each stack. They are found in the source code file, `util.flx'.
INISTK
To initialize the stacks, one must
call INISTK with no arguments. INISTK sets all the stack variables appropriately,
and if DBG_ALLSTK
is greater than zero (it normally is), the
stacks are filled with non-zero elements.
The initial
filling of the stacks helps to catch error which occur because local
storage is not properly initialized.
ALLSTK
and ALLCSTK
To obtain storage on the stacks, one uses the integer function
ALLSTK
or ALLCSTK
. Both functions take one argument, the
number of elements required, and returns the index into the array,
STACK
or CSTACK
, of the first element allocated. In the
case of ALLSTK
, the elements are integers. The function,
FSIZEOF
, should always be used to calculate the number of
integers required for a given data type because the number of integers
is machine dependent. Never do this calculation in your own
code! Also, ALLSTK
will round your request up to the
nearest multiple of 2 so that double precision alignment will be
maintained. In the case of ALLCSTK
, the elements are characters. If
you are allocating space for a character string array, pass
INICSTK
the product of the array length times the string lengths.
The functions revise the stack pointers, LSTUSD
or
LSTCUSD
, to indicate the allocation, and they check for stack
overflow. Stack overflow results in the termination of the program.
Negative arguments are not permitted to these functions. Note that
common block predeclares this function for you.
FRESTK
and FRECSTK
To free storage on the stack, the subroutines FRESTK
or
FRECSTK
are used. FRESTK
is used for the numeric data
stack, and FRECSTK
is used for the character string stack. They
each take one argument, the number of elements to be freed. They
decrement the stack pointers by their arguments to indicate the release
of storage, and they check to see that the stack pointers have not
become negative. If they have, an error message is output, and program
execution stops. Both routines check that their arguments are
non-negative. FRESTK will round its argument to the next
multiple of 2.
FSIZEOF
The integer function, FSIZEOF
, accepts two arguments,
TYPE
and SIZE
, and returns the number of integers needed
to store SIZE
variables of data type, TYPE
. The
TYPE
argument is a character string and may have one of the
following values: `REAL', `REAL*8',
`DOUBLE', `INTEGER', `INTEGER*1', `INTEGER*2',
`LOGICAL', `LOGICAL*1', `SHORT', `BYTE',
`CHARACTER', or `COMPLEX'. The lengths provided by
FSIZEOF
vary from machine to machine.
The function, PRINSTK
, may be used to print the current state
of the stack. The debugging variable, DBG_ALLSTK
, may be used to
debug stack usage. The default setting of 1 results in the initialization
of the stack to non-zero values. If DBG_ALLSTK
is set to 2 or greater,
all calls to the allocation and freeing routines will generate messages
displaying the argument to each call.
The main method for using the space allocated on the stack is as
follows: Suppose we have a subroutine, FOO
, which requires local
storage to work. Suppose further that FOO
takes two arguments,
A
and B
, and needs two working arrays, C
and
D
. The code in subroutine FOO
will call ALLSTK
twice; once for each work array needed. Since C
and D
are
names of the work arrays, the indices into the stack returned by
ALLSTK
are assigned to the variables C
and D
. A
second subroutine, FOO1
, is defined with four arguments, the
original two and two arguments which are work arrays. In FOO1
,
C
and D
are defined as arrays. Once FOO
has
finished allocating the work storage, it calls FOO1
as follows:
CALL FOO1(A,B,STACK(C),STACK(D))
FOO1
actually does the computational work for FOO
. Once
FOO1
completes, FOO
then calls FRESTK
to free the
work storage used, and it returns having completed its task. In summary,
referencing the work arrays is accomplished by using a subroutine call
to map an array onto the stack.
A second method exists for accessing local storage on the stack. This involves using subroutines which access one element of an array at a time. They are not very useful for the stack, but they are discussed further in the section on data structures in the heap, see section Data Structures on the Heap in CONGEN.
To illustrate the use of the stack, we show how the matrix
squaring subroutine is programmed. The main program sets up the stack
and calls the routine which is named MSQUAR
.
PROGRAM TSTMSQ C C TESTS MSQUAR, A MATRIX SQUARING ROUTINE. WE READ THE MATRIX IN C FROM UNIT 5 AND OUTPUT ITS SQUARE ON UNIT 6. THE SIZE OF THE TEST C MATRIX, A, IS 10 BY 10. C # include "stack.fcm" C REAL A(10,10) C CALL INISTK READ (5,100) ((A(I,J),J=1,10),I=1,10) 100 FORMAT(10F8.0,9(/10F8.0)) CALL MSQUAR(A,10) WRITE (6,200) ((A(I,J),J=1,10),I=1,10) 200 FORMAT(' RESULTS OF CALLING MSQUAR -- THE MATRIX A:'/ 2 10(/1X,1P10G11.4)) CALL PRINSTK(6) STOP END SUBROUTINE MSQUAR(A,N) C C SQUARES A WHICH IS A REAL ARRAY ASSUMED TO BE N BY N. WE USE THE C STACK TO STORE THE SQUARED ARRAY. C REAL A(N,N) # include "stack.fcm" INTEGER SQUARE INTEGER OLDLST C C WE ALLOCATE SPACE FOR THE SQUARED ARRAY. NOTE THAT WE SAVE THE C VALUE OF LSTUSD UPON ENTERING THIS SUBROUTINE. THIS MAKES IT C EASIER TO FREE THE AMOUNT OF STORAGE ALLOCATED BY EVERY ALLSTK C CALL WHEN THE SUBROUTINE QUITS. C OLDLST=LSTUSD SQUARE=ALLSTK(N*N) CALL MSQUA1(A,N,STACK(SQUARE)) CALL FRESTK(LSTUSD-OLDLST) RETURN END SUBROUTINE MSQUA1(A,N,SQUARE) C C MSQUA1 DOES THE JOB OF SQUARING THE MATRIX A. WE CAN NOW REFER TO C THE CONTENTS OF THE TEMPORARY ARRAY, SQUARE, DIRECTLY. C REAL A(N,N),SQUARE(N,N) C DO 1 I=1,N DO 1 J=1,N S=0.0 DO 2 K=1,N 2 S=S+A(I,K)*A(K,J) 1 SQUARE(I,J)=S C C NOW COPY SQUARE BACK INTO A C DO 3 I=1,N DO 3 J=1,N 3 A(I,J)=SQUARE(I,J) RETURN END
JOINWD
In this example, we show the use of the character string stack.
JOINWD
is used to insert one character string, WD
,
onto the beginning of a second character string, ST
.
Stack space is required because we wish to avoid an overlapped copy.
JOINWD
first copies ST
to local storage, copies WD
to ST
followed by space, and then copies the old value of ST
back into ST
. Note the use of a substring reference into CSTACK
in order to provide COPYST
the correct size of the string
it is copying into.
SUBROUTINE JOINWD(ST,STLEN,WD) C C DOES NEARLY THE INVERSE OF NEXTWD, I.E. JOINS WD ONTO C THE BEGINNING OF ST. C IMPLICIT INTEGER(A-Z) CHARACTER*(*) ST,WD # include "stack.fcm" C STL = STLEN WDLEN = LEN(WD) COPY = ALLCSTK(STL) CALL COPYST(CSTACK(COPY)(1:STL),COPYLN,ST(1:STLEN)) CALL COPYST(ST,STLEN,WD(1:WDLEN)) IF (COPYLN .GT. 0) CALL ADDST(ST,STLEN,' ') CALL ADDST(ST,STLEN,CSTACK(COPY)(1:COPYLN)) CALL FRECSTK(STL) RETURN END
The stack is a very efficient means of providing dynamic storage
for a great variety of uses. However, its limitation to local storage
presents a problem when a subroutine must allocate storage for use by
the routine which calls it. An example where the stack cannot be used is
as follows: Suppose we wanted to write a routine, NBO
, which finds all
close non-bonded interactions, and suppose further that we wanted NBO
to
allocate the storage itself without any preset limitations. The stack
cannot be used for this because the storage being allocated for the
non-bonded list is not local; it must be used by the subroutine which
calls NBO
.
To circumvent this problem, we need a storage mechanism which allows completely arbitrary allocation of storage. Such flexibility is obtained with a heap.
A heap consists of an array and a list. The array is used for storing data. The list, referred to as the free list, keeps track of what areas in the heap are free. Before the heap is used, the free list is initialized to indicate that the entire heap is free. When space is required, the free list is searched for a space large enough to accommodate the request. The free list is then modified to indicate that less storage is available. To free storage, the freed area of storage must be put back on the list and recombined with adjacent free areas if possible.
As with the stacks, CONGEN provides two heaps, one for numeric
data and one for character string data. The numeric data heap is managed
entirely within Fortran, but the character string heap is really nothing more
than the heap maintained by the C run time library routines malloc
and
free
. There is a character string heap array, and the pointers
returned by malloc
are mapped into the address for the character
string heap.
In CONGEN, the heaps are implemented using three common blocks.
HEAPCM
stores numerical information and the numeric heap,
SVHEAP
stores a copy of critical heap pointer for error
detection, and HEAP_ST
stores the initial address for the
character string heap. All of the declarations are in the files,
`heap.fcm' and `heap.h', and the appropriate file should be
#include
'd wherever the heaps are used. The real number array,
RHEAP
and the logical array, QHEAP
, are equivalenced to the HEAP
so references to other data types are simplified. (It should be noted
that the use of these equivalences presumes that these data types have
the same sizes as integers. This presumption may not always be true.)
There are a number of subprograms which simplify the use of the heaps.
INITHP
To initialize the heap, INITHP
is used. First, one sets HEAPSZ
to the dimension of the heap, and then, INITHP is called with no
arguments.
ALLHP
and ALLCHP
To obtain storage on the heap, we use the integer function
ALLHP
. ALLHP
is called with one argument, the amount of
storage needed on the HEAP
array, in units of INTEGER
s.
You should always use the INTEGER
function FSIZEOF
, see section Stacks, to
determine the number of integers required for particular data type and
number of elements. ALLHP
then returns the index of the first
element of the array allocated in the heap. This index shall be referred
to as the base of the allocated storage.
ALLCHP
works in a similar way to ALLHP
except that it
operates on the character string heap. The argument to ALLCHP
is the number
of characters to be allocated, and the return value is the index into CHEAP
for the storage. Internally, ALLCHP
is implemented using malloc
.
FREHP
and FRECHP
To return storage to the heap, one calls FREHP
or
FRECHP
. Unlike FRESTK
, FREHP
and FRECHP
require two arguments, a base and a length. FREHP
and
FRECHP
return the storage demarked by the base and length to
their respective free lists. If this storage can be recombined with
adjacent free areas of storage, it is done.
ALLHP
, ALLCHP
, FREHP
, and FRECHP
make extensive checks on their arguments
and on themselves. Since these two routines are vital to the operation
of CONGEN, they are designed to catch many programming errors that can
result from using the heap. In addition, there are debugging variables that
can be used to improve the error detecting capabilities of the heap
subprograms. See the source code for the usage of the debugging variables
alloc
and allhp
.
PRINHP
To obtain a listing of the free storage areas available on the
heap, PRINHP
is available. It is called with one argument, a unit number
to which the output is sent.
Referencing the data in the heap can be done using the same techniques as described for the stack. However, a more sophisticated method of referencing data on the heap is the subject of the next section.
To illustrate the use of a heap, consider the following example: Suppose we want a subroutine which reads in a list of card images from a file and stores them into an array on the heap. Assume that the number of cards which can be read is to be limited only by the space available on the heap.
To implement this, we assume that we can store 4 characters to a
word and that our cards have 80 columns. The card array is then
dimensioned (20,N)
. The subroutine which is called to get the card
images is called RDCARD.
PROGRAM TSTRDC C C This program tests RDCARD. It initializes the HEAP, calls RDCARD C on unit 5, prints the card images, and then prints the HEAP. C # include "heap.fcm" INTEGER CRDBAS,CARDSZ C CALL INITHP CARDSZ = 80 CALL RDCARD(CRDBAS,5,NCARD,CARDSZ) CALL PRTCRD(CHEAP(CRDBAS)(1:CARDSZ),NCARD) CALL PRINHP(6) STOP END SUBROUTINE RDCARD(BASE,UNIT,NCARD,CARDSZ) C C Reads all the card images from UNIT into the HEAP. BASE is C returned giving the first word in the HEAP where the card images C are stored. NCARD returns the number of cards found. C INTEGER BASE,UNIT,NCARD,START,END,CARDSZ # include "heap.fcm" LOGICAL EOF INTEGER INC PARAMETER (INC = 100) C C Initially, we allocate space for INC cards. If this is exceeded C during the reading of the file, space for an additional INC cards C will be allocated, and the reading will continue. C LIMIT=INC BASE = ALLCHP(LIMIT*CARDSZ) NCARD=0 C C RDCAR1 actually reads the file, putting card images into the HEAP. C REPEAT UNTIL (EOF) CALL RDCAR1(CHEAP(BASE)(1:CARDSZ),NCARD,LIMIT,UNIT,EOF) UNLESS (EOF) C C At this point, we know that RDCAR1 ran out of space. We allocate a C array larger in size and copy the cards already obtained into the C new array, and then free the old array. C NEWBAS = ALLCHP(CARDSZ*(LIMIT+INC)) CALL COPYCA(CHEAP(NEWBAS)(1:CARDSZ),LIMIT+INC, 2 CHEAP(BASE)(1:CARDSZ),LIMIT) CALL FRECHP(BASE,CARDSZ*LIMIT) LIMIT = LIMIT+INC BASE = NEWBAS FIN FIN END SUBROUTINE RDCAR1(CARDS,NCARD,LIMIT,UNIT,EOF) C C Reads cards from UNIT into CARDS, setting EOF if end of file is C found. If the routine returns with EOF not set, it means that C space was exhausted in the card array. C CHARACTER*(*) CARDS(LIMIT) INTEGER UNIT LOGICAL EOF C EOF = .FALSE. REPEAT WHILE (NCARD .LT. LIMIT) NCARD=NCARD+1 READ (UNIT,100,END=99) CARDS(NCARD) 100 FORMAT(A) FIN RETURN C C Come here on end of file. We must decrement NCARD by one because C the READ statement did not add another card image to the array. C 99 NCARD = NCARD-1 EOF = .TRUE. RETURN END SUBROUTINE PRTCRD(CARDS,NCARD) C C Prints the card images in CARDS onto unit 6, the printer. C CHARACTER*(*) CARDS(*) C WRITE (6,200) NCARD 200 FORMAT('0OUTPUT OF ',I5,' CARDS:') DO (I=1,NCARD) WRITE (6,201) CARDS(I) 201 FORMAT(1X,A) FIN RETURN END SUBROUTINE COPYCA(ST1A,NST1A,ST2A,NST2A) C C Copies strings from ST2A into ST1A. The lengths of the arrays are C given by NST2A and NST1A, respectively. C IMPLICIT INTEGER(A-Z) CHARACTER*(*) ST1A(*),ST2A(*) INTEGER NST1A,NST2A C DO (I = 1,MIN(NST1A,NST2A)) ST1A(I) = ST2A(I) FIN RETURN END
Fortran, the language that CONGEN is implemented in, suffers from a lack of flexibility with respect to the storage of data. The structuring of data is limited to scalars, complex numbers, and arrays. For many programming tasks, there is a need for more complicated structures for holding data. For example, in CONGEN, the protein structure file, the PSF, consists of a number of arrays and scalars that collectively describe the structure of a macromolecule. It would very useful to be able to refer to the PSF as a whole as opposed to referring to all of its constituents.
This inability to refer to a number of related scalars and arrays by one name results in a number of problems. Passing the information to a subroutine requires either using a large number of parameters or common blocks.
Using large numbers of parameters makes subroutine calls very bulky and increases the chances of errors resulting from the mistyping of a parameter. On the order of 100 parameters are required to call the analysis subroutine.
Common blocks could be used to pass information to a subroutine. However, these require allocating all needed space at compile time. Also, having more than one instance of a data structure is difficult. For example, the analysis facility requires a second PSF, coordinate set, and other data structures. Without solving this data structure problem, we need to establish another huge set of scalars and arrays. Further, as the program is changed, this second set would have been modified as well which would have a great potential for programming errors. These problems required a better method be introduced to handle the use of data structures in Fortran.
Having decided to implement a more succinct method for using data structures, we must consider other design criteria which would be useful to meet. First of all, the elements of each data structure must convenient to access, and the clarity of the program must not be unduly restricted. Next, since we have a mechanism for dynamic storage allocation, data structures should use only as much space as they need and should be able to grow. Third, the sharing of readonly data is desirable.
Given these design goals, how are they implemented for use in CONGEN? First, the heaps are ideally suited for this task as it allows completely arbitrary use of storage. Scalars are assumed to require one word of storage on the heap. Arrays will take as many words as their dimension and data type requires. Character strings are stored in the character string heap.
To keep track of the location and data types of elements in a
data structure, three arrays; the base, length, and type arrays; are
used. The base array stores the index of the first word or character of
each element in the data structure. The length array gives either the
total number of integers required for the element or the total number of
characters, depending on the data type. The type array specifies
the data type of the element. If the type array contains a positive
number, this indicates that the element is a character string, and the
number in the type array specifies the length of the character strings.
Otherwise, if the number is equal to the parameter, DT_NUMERIC
, then
the element is numeric. Any other values are illegal.
Multiple instances of a data structure
require multiple sets of arrays.
The naming scheme for the base, length, and type arrays is to prefix a
`B', `L', or an `T', respectively,
in front of the data structure name.
To identify the word in the base array which corresponds to an element in the data structure, we use a common block which is named for the data structure. If the data structure contains n elements, the common block of indices contains n+1 variables. The first variable gives the size of the base array needed to hold one instance of the data structure. The remaining variables all have the same name as the element in the data structure. Before proceeding any further, an example is required.
Consider a simplified and slightly contrived version of the
data structure for storing
the non-bonded list. Two arrays; JNB
and INBLO
; six scalars; NNNB
,
CTONNB
, CTOFNB
, CUTNB
, WMIN
, and NBOPT
; and
one character string, NAME
are needed. We will use the
name, NBND
, for the data structure. The base, length, and type arrays
for this data
structure are called BNBND
, LNBND
, and TNBND
, respectively,
and the following declarations in a
subroutine would suffice to use parts of the non-bonded list for a
calculation:
IMPLICIT INTEGER(A-Z) # include "heap.fcm" COMMON /INBND/ SNBND,JNB,INBLO,NNNB,CTONNB,CTOFNB,CUTNB, 2 WMIN,NBOPT,NAME INTEGER BNBND(SNBND)
To make reference to the scalar NBOPT
, one would write HEAP(BNBND(NBOPT))
.
To make reference to the character string, NAME
, one would write
CHEAP(BNBND(NAME))(1:TNBND(NAME))
. The substring reference
serves to specify a legitimate length for the name, and would be accessible
in a called subroutine as the length of the string.
Three points must be noted before we continue. First, we can
store any type of data in these data structures, provided that the
numeric and character data are identified and separated.
All we must know is how
much space they require, which is provided by the FSIZEOF
function,
see section Stacks.
For example, DOUBLE
PRECISION
variables require two integers
per variable on most machines, and one integer on the Cray.
Second, we ensure that the index common blocks are the
same from one subroutine to the next by using an
#include
statement which is processed by the Fortran
C preprocessor, see section FCPP -- The Fortran C Preprocessor. An #include
statement takes a file name and inserts
the contents of the file in the place of the #include
statement. By
setting up files which contain the common block definitions and using
#include
statements exclusively for getting the common block, we can
ensure that the common blocks are the same.
Third, in general, one cannot access the data directly. The
cases where direct access is possible is for numeric data whose size is
the same as an INTEGER
, such as REAL
s and LOGICAL
s.
Since the heap is declared as INTEGER
, we make a direct
reference. Single precision real scalars can be accessed directly using
RHEAP
, and logicals can be reference using QHEAP
.
References to any other parts of the data structure can be
done using two general techniques.
The first of these reference techniques is to call a subroutine
with the desired element of the data structure as a parameter. In the
subroutine, one declares the parameter for what it is, and one can make
references as desired. For example, if we wish to access the JNB
array
in the non-bonded data structure, we can make a subroutine call as
follows:
CALL SUB(HEAP(BNBND(JNB)))
and declare in the subroutine
SUBROUTINE SUB(JNB) INTEGER JNB(1)
and make references to any element of the JNB
array. This is the same
method as described for stacks. The overhead involved with this scheme
is that one must create a subroutine for the references.
The second technique which has a much higher overhead, but
doesn't require a new subroutine is to use referencing and storage
functions which have been written for CONGEN and which are found in the
source file `array.flx'. The referencing functions take an array
and indices into it and return the value of the specified element. For
example, the function REFI
accepts two arguments, an integer
array, A
, and an index, I
. REFI
returns
A(I)
. The storage functions take an array, indices, and a value
and store the value into specified element. For example, the subroutine
STORI takes an integer array, A
; its length, N
; and index,
I
; and a value, VALUE
, and executes the following
statement: A(I)=VALUE
. The higher overhead of this method results
from the fact that a subroutine call is required for each array
reference.
Next, what subroutines and functions are provided to assist in the use of these data structures? There are many which cover most of the needs of CONGEN.
STRTDT
and SETUPI
The subroutine STRTDT
is used to initialize the common blocks.
Whenever a new data structure is defined or an old one is changed in
size, STRTDT
must be changed to accommodate this. STRTDT
calls SETUPI
.
The subroutine, SETUPI
, is used to initialize the
index common blocks. It depends on being able to equivalence the integer
variables in the index common block to an integer array by using
a subroutine call.
INITDT
The subroutine INITDT
is used to initialize a base array. Before
a base array can be used, it must be initialized to the free state. The
free state means that all bases are set a value which cannot point into
the heap. This value, known as the free value, is a sufficient large
negative number such that any references using it will cause an access
violation which is useful in catching programming errors before they do
subtle damage.
ALLDT
The subroutine ALLDT
is used to allocate space for a data
structure on the heaps. It is called with filled length and type arrays and a
free base array. The contents of the length array is summed to get the
space requirements for the data structure. ALLHP
and ALLCHP
are then called to get
space on the heaps for the data structure. Finally, the bases are all set
to point to their part of data structure. Lengths of zero are permitted.
In these cases the bases for the element will be left as the free value.
This will catch any illegal references.
FREEDT
The subroutine FREEDT
is used to free a data structure. The data
structure must be allocated before it can be freed. The base array is
set to the free value.
USEDDT
The logical function USEDDT
is used to determine if a base array
is free or is pointing to a data structure. It returns .TRUE.
if the
base array passed to it is in use.
RESIZE
RESIZE
is used to change the size of any of the elements in a
data structure. It is called with a base, length, and type array and another
trio of arrays which give elements to be changed and the new sizes. Note that
only the lengths of character string elements can be changed, data types of
elements may not be.
RESIZE
allocates space for a copy of the data structure with new sizes;
copies the old data into the new structure; and then frees the old data
structure.
READDT
and WRITDT
READDT
and WRITDT
may be used to write a data structure
in binary form to a file. The trio of arrays must be specified along
with a Fortran unit number, HDR
and ICNTRL
information, and a
title. READDT
will read what WRITDT
creates.
DUPLDT
DUPLDT
is used to duplicate a data structure. It simply
makes a complete copy of the arrays and the data stored in the heaps.
ASGNDT
With the subroutines used to implement these data structures explained, we can return to a discussion of other implementation details. The next issue is the question of sharing data. Because the base arrays are pointers, there is nothing to prevent two base arrays from pointing to the same data. Obviously, this can lead to great problems if this occurs unintentionally, but one can save space and underscore the identity of data if we allow sharing of read-only data.
Implementing sharing is very easy with the above scheme. We reserve the first element of the base array to be a pointer to the reference count for a particular instance of a data structure. When the data structure is allocated, the reference count is set to 1. If we assign another data structure to this instance by copying its bases, the reference count is incremented by 1. If we free the data structure, the reference count is decremented by one. When the reference count drops to zero, the space on the heap is freed. The use of the first element of the base arrays has an additional benefit because the length of the reference count is 1 word. This fact ensures that the first word in the base array cannot have a free value when the base array is in use.
The subroutine, ASNGDT
, implements the assignment of data
structures which are to shared. It will take a base and length array and
copies them to another base and length array. The reference count for
the data structure will be incremented by one.
The final detail to be discussed is the interface between this scheme for manipulating data structures and the usual scheme. Currently, the analysis subroutine and non-bonded list uses this more concise means for handling data structures; the rest of CONGEN works with the separate arrays and scalars. When the analysis subroutine is invoked, the data structures must be built. Copying subroutines exist for the five data structures passed to the analysis subroutine. These copying subroutines will take the arrays and scalars which comprise a data structure and build a data structure on the heap for it. Once the copying operation is complete, the actual analysis subroutine can be called and it can freely use the data structures existing on the heap.
The copying operation presents a naming problem in that the indices kept in the index common blocks have the same names as the arrays and scalars. To circumvent this problem, a second set of index common blocks is kept for all data structures which are not isolated in the analysis subroutine. This second set of index common blocks have the same common block names as the first, but the names of the indices are prefixed with `I'.
A good example of the use of these data structures is the
subroutine NBONDS
and some of its subsidiary routines in the file,
`nbonds.flx'.
One final aspect of these storage schemes must be considered --- when should they be used? It is clear that for new work in CONGEN, C should be used for programming because it provides all these facilities in a straightforward manner. (CHARMM and CONGEN would have been more advanced had C be used from the beginning.) However, if new Fortran code must be integrated or maintained, then these schemes are applicable. The stack and the data structure storage on the heap were developed primarily to deal with an enormous number of arrays whose sizes must be able to change. The other design criteria came later. It should be obvious that one pays a price for these schemes, and one must determine whether the benefits are worth the cost.
One disadvantage of these methods is that they are unfamiliar, and therefore, they take time to learn. Unfortunately, a subjective judgment is required to estimate how large a disadvantage this is. In my opinion, this chapter will enable an interested person to learn this material in a day or two. The subroutines to use these schemes are sufficiently comprehensive to permit effective use of these schemes without additional programming. Further, the ideas presented in this document are worth learning as they are valuable outside the context of CONGEN. Besides its application for local storage, the stack is useful for problems which require backtracking or which can be solved recursively. The heap is useful for problems which involve management of complex lists.
A second disadvantage of these schemes is that they are more cumbersome than the techniques they replace. There is no arguing this point. However, when used with the conventions described here, clarity is not sacrificed.
Let us now consider the advantages. We shall confine ourselves to the analysis facility where they are used extensively.
First, as the size of a system being analyzed grows to the limits of what is supported, one need only change the size of the stack and the heap to accommodate a bigger system. When a subroutine has as many arrays as the analysis facility has, this is vital.
Next, although references to parts of a data structure are more cumbersome, referencing a whole structure is much more concise. The subroutines in the analysis facility illustrates the compactness of the subroutine call despite the large amount of information that is being passed. Further, the inclusion of the comparison data structures is trivial; we merely have to create another set of base and length arrays.
Thirdly, the data structures can be referred to as a unit. This aids the job of programming because now we can deal with bigger units of information without having to worry about the details. These data structures on the heap allow a programmer to ignore their contents until they are ready to be used. This can simplify the task of modifying a program the size of CONGEN.
Go to the previous, next section.