
Selection and Plotting Commands

The generation of plots in the analysis facility is done using a data structure known as a selection. A selection consists of a list of data points, each associated with a position number, a residue number, and a residue name (more detail is available in the description of the SELECT command below and in the introduction, see section An Overview of the Analysis Facility). Data for a selection comes from the table, see section Table Manipulation Commands. All plots are generated from the selection. Two such data structures are provided, allowing scatter plots to be produced as well.

SELECT Command -- Select Data from the Table

Syntax

SELECT [ADD] [TWO] {      ALLTAG            }
                   {TAGS repeat(tag-pat) del}

       [PROPERTY property del]

tag-pat ::= string of non-blank characters

Function

The SELECT command takes data from the table and places it in a selection. Data is specified by its tag and its property. Only one property may be specified, and one must specify a property if the table has more than one. Either all the entries in the table can be selected by specifying ALLTAG, or only particular data can be selected by using the TAGS option to give the tags. The tags are specified using string patterns, with wildcards permitted as in the atom or cell selection syntax, see section Atom Selection. If TWO is placed in the command, then the data goes into the second selection; otherwise, it goes into the first selection. If ADD is specified, the data is added to the data already present in the selection. Otherwise, the selection is cleared before data is added to it.

In each selection, every data point has the following associated information: a position number, a residue number, and a residue name. The position number is obtained by counting data points as they are scanned in the table and setting the position number to the current count. By printing a table in tag value format, one can see the order in which the data is scanned. The residue number is generated in the same way, except that the count is kept on residues. The residue name is the name of the residue where the data point was taken. Any comparison residue name is ignored. For standard tables to which no deletions have been applied, the residue numbers will be reasonable in that they will be close to the residue's position in the sequence. However, in a difference table where not all the atoms matched, the numbers will not be meaningful by themselves. Likewise, in a table where deletion of residues has taken place, the residue numbers will bear no resemblance to the residue identifiers.
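The counting scheme described above can be illustrated with a short Python sketch. It is not CONGEN code: the Row and Point types, the field names, and the rule that the residue count advances whenever the residue identifier changes between consecutive rows are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One table entry as it is scanned (hypothetical layout)."""
    resid: str      # residue identifier stored in the table
    resname: str    # residue name
    value: float    # the selected property


class Point(NamedTuple):
    """One data point in a selection."""
    position: int   # running count of data points
    residue: int    # running count of residues
    resname: str
    value: float


def number_points(rows: List[Row]) -> List[Point]:
    """Assign position and residue numbers by counting during the scan."""
    points = []
    residue_count = 0
    previous_resid = None
    for position, row in enumerate(rows, start=1):
        # Assumption: a new residue starts when the identifier changes.
        if row.resid != previous_resid:
            residue_count += 1
            previous_resid = row.resid
        points.append(Point(position, residue_count, row.resname, row.value))
    return points


# Residues 13 and 14 have been deleted from this hypothetical table.
rows = [Row("12", "ALA", -60.0), Row("12", "ALA", -45.0), Row("15", "GLY", 80.0)]
points = number_points(rows)
```

In this example the data point taken from the residue with identifier 15 receives residue number 2, which shows why residue numbers stop resembling residue identifiers once residues have been deleted.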

HISTO Command -- Print a Histogram

Syntax

HISTO [TWO] [RANGE range] [TITLE string del]

Syntactic ordering: TWO, if specified, must come first.

Function

The HISTO command will make a printable histogram from the numbers in a selection. TWO specifies that the second selection should be used; its absence indicates that the first selection should be used. If RANGE is not specified, the program will use the minimum and maximum values in the selection as the extents of the histogram, and will use as many lines as will fit on one printed page. If RANGE is specified, the first two numbers give the minimum and maximum extent of the histogram, and the third number gives the number of lines in the histogram. If the third number is less than 2, then the number of lines will be set to fill the page completely. Finally, the TITLE option allows one to specify a title which will appear at the top of the histogram.
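The way the three RANGE numbers determine the histogram can be sketched in Python as follows. This is only an illustration under stated assumptions: equal-width bins, one bin per printed line, values outside the range silently dropped, and a hypothetical page_lines parameter standing in for the number of lines that fit on a page. None of these details are confirmed by the description above.

```python
from typing import List, Optional, Sequence


def histogram_counts(values: Sequence[float],
                     lo: Optional[float] = None,
                     hi: Optional[float] = None,
                     nlines: int = 0,
                     page_lines: int = 50) -> List[int]:
    """Count values per histogram line (one equal-width bin per line)."""
    # Without RANGE, the extents default to the data minimum and maximum.
    if lo is None:
        lo = min(values)
    if hi is None:
        hi = max(values)
    # A line count below 2 means "fill the page" (page_lines is hypothetical).
    if nlines < 2:
        nlines = page_lines
    width = (hi - lo) / nlines
    counts = [0] * nlines
    for v in values:
        if v < lo or v > hi:
            continue  # assumption: out-of-range values are ignored
        # The maximum value falls into the last bin instead of overflowing.
        index = min(int((v - lo) / width), nlines - 1) if width > 0 else 0
        counts[index] += 1
    return counts


counts = histogram_counts([0.0, 1.0, 2.0, 3.0, 4.0], lo=0.0, hi=4.0, nlines=4)
```

Here the five values are spread over four lines, and the value equal to the upper extent is counted in the last line.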

PLOT Command -- Plot Data Against Position or Residue

Syntax

           [RESIDUE]
PLOT [TWO] [ CODE  ] [UNIT unit-number] [TITLE string del]
           [NUMBER ]

     [VERTICAL range] [HORIZ range]

Syntactic ordering: The first two options must come first.

Function

The PLOT command allows one to plot data in a selection against residue number or position number. The plot can go to the printer, or an output file may be created which is suitable for the plotting program PLT2 (invoke man plt2 for more information). The residue or position number is plotted on the y axis (vertically) and the data point is plotted on the x axis (horizontally).

The specification of TWO in the command directs that the second selection's data be used; otherwise, the first selection is used. Omitting the next option or specifying RESIDUE causes a plot by residue number to be used. Specifying NUMBER directs that a plot by position number be set up. Specifying CODE says that a plot by residue number should be output and that the plot symbols be the one-letter amino acid codes for the residues. D amino acids are plotted using lower case letters. The specification of the UNIT option causes the plot data to be output on the specified unit; otherwise, the output goes on the printer. Note that CODE is meaningless when UNIT is specified, since the codes are not output with the numbers.

The VERTICAL option gives the minimum and maximum labels and the number of lines for the vertical axis. If not specified, the range is set up so that each data point gets a line and the labels are set up in reverse order (smaller number on top). If the number of lines is specified as zero, then CONGEN will set it to the maximum number that will fit the page, see section SET -- Modify Analysis Facility Variables, for a description of the PAGESZ variable. The HORIZ option gives the minimum and maximum values for the horizontal label as well as the number of columns to be used. The number of columns may be adjusted down to make the plot fit in the available line size (see section SET -- Modify Analysis Facility Variables, to change this number). If the number of columns is specified as zero, then the number of columns will be set to maximally fill the page. The TITLE option allows one to specify a title to appear at the top of the plot.

2DPLOT Command -- Make a Scatter Plot

Syntax

       [RESIDUE]
2DPLOT [ CODE  ] [UNIT unit-number] [TITLE string del]
       [NUMBER ]

       [VERTICAL range] [HORIZ range]

Syntactic ordering: The first option must appear first in the command.

Function

This command is used for plotting the second selection as a function of the first selection. Each selection has residue numbers and position numbers associated with each point. These numbers can serve as identifiers for the data points making it possible to relate the data points in two selections and produce matched pairs of data points. Plots of these pairs can be used to demonstrate correlations or patterns between the two selections (for example phi-psi or Ramachandran plots).

The first option in the command determines whether matching will be done by residue number or position number. If all of the data in both selections is uniquely identified by the specified identifier, then a rapid algorithm is used to match the data. If the data is not uniquely identified, then the match will use the order of occurrence in the selections to resolve ambiguities. For example, if the first selection was generated by selecting all dihedral angles, then the selection will be unique by position number but not unique by residue. If the second selection also contained all of the dihedrals from a second structure, it would not be unique by residue either. If NUMBER is specified, then a unique mapping of one selection onto the other exists (although if the sequences of the two structures were different, it may not be the desired mapping). Conversely, RESIDUE will result in multiple possible matchings, of which one must be chosen to produce a plot.

The routine resolves this ambiguous mapping situation by matching the first available data for a residue (or number) in selection one with the first available data for the same residue (or number) in the second selection. In the example above, RESIDUE mapping will align the first dihedrals of each residue, and so on. If a sequence mismatch exists between the structures and the mismatched residues contain different numbers of dihedrals, then for that residue pair the first dihedral will be matched with the first, and so forth, until all of the dihedrals in one residue are matched. Note that the extra data will now be ignored rather than offsetting all subsequent matches. NUMBER mapping would align the first dihedral in each structure, and so on, but would misalign all of the data past the mismatch.
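The "first available" matching rule can be sketched in Python as below. This is an illustration of the rule as described, not CONGEN's implementation: the dictionary layout of a data point and the helper names make_selection and match_pairs are hypothetical, and the sketch makes no attempt to reproduce the separate rapid algorithm used when identifiers are unique.

```python
from collections import defaultdict, deque


def make_selection(values_by_residue):
    """Build a selection, numbering points by position and by residue."""
    points, position = [], 0
    for residue, values in enumerate(values_by_residue, start=1):
        for value in values:
            position += 1
            points.append({"position": position, "residue": residue, "value": value})
    return points


def match_pairs(first, second, key):
    """Pair each point in `first` with the first unused point in `second`
    that carries the same identifier (`key` is "residue" or "position")."""
    waiting = defaultdict(deque)
    for point in second:
        waiting[point[key]].append(point["value"])
    pairs = []
    for point in first:
        queue = waiting.get(point[key])
        if queue:                       # unmatched extra data is ignored
            pairs.append((point["value"], queue.popleft()))
    return pairs


# The second residue has three dihedrals in structure one, two in structure two.
one = make_selection([[11, 12], [21, 22, 23], [31, 32]])
two = make_selection([[110, 120], [210, 220], [310, 320]])

by_residue = match_pairs(one, two, "residue")
by_number = match_pairs(one, two, "position")
```

With residue matching, the extra value 23 is dropped and the third residue stays aligned (31 pairs with 310). With position matching, everything past the mismatch shifts by one, so 23 pairs with 310 and 31 pairs with 320.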

CODE results in residue matching and the use of one letter amino acid codes for plot symbols. The default for the first option is RESIDUE.

The x (horizontal) axis is used for numbers from the first selection; the y (vertical) axis is used for numbers from the second selection.

If UNIT is specified, the pairs are output to the unit and no printer plot is made. Otherwise, the plot goes on the printer.

TITLE gives a title that appears at the top of the plot.

VERTICAL and HORIZ can be used to specify the vertical and horizontal ranges and sizes of the plot. If unspecified, the limits will be set to the minimum and maximum in the data, and the number of columns will be set as large as will fit on a line. If specified, the first two numbers give the limits, and the third number gives the number of rows or columns to be used in the plot. The number of rows or columns may be reduced to get the plot to fit on the page. If either the number of rows or the number of columns is specified as zero, then it will be set to fill the page maximally. See section SET -- Modify Analysis Facility Variables, for a description of the PAGESZ and LINESZ variables which control this.

A rudimentary statistical analysis of the data in the selections is included with the plot.
