AutoStructure-1.0 HowTo?

How to install AutoStructure?
How to set up pc-clusters for parallel DYANA calculation?
How to run AutoStructure?
How to prepare input experimental data set?
How to prepare control-file and input data set?
    General section
    Command section
    PeakList section
How to handle homo-dimer protein?
How to understand the output_dir?
How to measure M-score and average shift?
How to measure I, L scores?
How to examine input data quality?
How to understand and examine the secondary structure file?
How to run AutoStructure in a practical way?
    Ways to examine the number of assigned and unassigned peaks in CYCLE1-0, 1-1, ...

How to install AutoStructure?

Update the following environment variables defined in the script file 'bin/autostructure':

RASMOL rasmol command (viewer command, optional)

DYANA dyana command

DQS command to run parallel calculation (optional)

ASBIN path of AutoStructure bin directory

NALIB path of NOESY_Assign library

AyudaPATH path of PDBSTAT library (optional)

Example (the paths on the right-hand side need to be changed for your system):

#!/bin/sh
#rasmol command
RASMOL=/usr/bin/X11/rasmol
#dyana command
DYANA=/usr/local/bin/dyana
#command to run parallel calculation
DQS=/usr/local/DQS/bin/qsub332
#path of AutoStructure bin
ASBIN=/farm/software/AutoStructure-1.0beta/bin
#path of Noesy_assign lib
NALIB=/farm/software/AutoStructure-1.0beta/noesy_assign-1.0beta/Lib
#path of PDBSTAT lib
AyudaPATH=/farm/software/AutoStructure-1.0beta/PDBSTAT/Lib
#export
export NALIB
export AyudaPATH
export ASBIN
export DQS
export DYANA
export RASMOL
#start with noesy_assign
"$ASBIN/noesy_assign" "$@"
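Before the first run it can help to confirm that the paths written into the script really exist. The following is a minimal sketch, not part of AutoStructure: the function name `missing_settings` is hypothetical, and treating DYANA, ASBIN and NALIB as the required settings is an assumption based only on which variables are not marked optional above.

```python
import os

# Assumption: the variables not marked "optional" above are the required ones.
REQUIRED = ("DYANA", "ASBIN", "NALIB")

def missing_settings(settings, required=REQUIRED):
    """Return the names that are unset or point to a path that does not exist."""
    return [name for name in required
            if not settings.get(name) or not os.path.exists(settings[name])]

# Copy the values from your edited bin/autostructure script into a dict.
settings = {
    "DYANA": "/usr/local/bin/dyana",
    "ASBIN": "/farm/software/AutoStructure-1.0beta/bin",
    "NALIB": "/farm/software/AutoStructure-1.0beta/noesy_assign-1.0beta/Lib",
}
for name in missing_settings(settings):
    print(name, "is unset or points to a missing path")
```

An empty result only means the paths exist; it does not check that the programs themselves run.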

How to set up pc-clusters for parallel DYANA calculation?

'/DQS/bin/qstat -f' shows all jobs in the queue system.
This command will show you a list like this:
     Queue Name    Queue Type    Quan Load          State
     ----------    ----------    ---- ----          -----
     bmw                              batch         1/1   1.01 er      UP
       roberto CalcStrBRCT SmtBRCT5.sh bmw          82863 04/10/102 17:52:13
     broccoli                         batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT16.s broccoli     82852 04/10/102 17:52:13
     cabbage                          batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT3.sh cabbage      82865 04/10/102 17:52:14
     carrot                           batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT2.sh carrot       82866 04/10/102 17:52:14
     corn                             batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT10.s corn         82858 04/10/102 17:52:13
     cucumber                         batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT14.s cucumber     82854 04/10/102 17:52:13
     eggplant                         batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT12.s eggplant     82856 04/10/102 17:52:13
     falcon                           batch         0/1   0.00 er      UP
     ferrari                          batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT15.s ferrari      82853 04/10/102 17:52:13
     garlic                           batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT8.sh garlic       82860 04/10/102 17:52:13
     jaguar                           batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT4.sh jaguar       82864 04/10/102 17:52:14
     lettuce                          batch         1/1   1.00 er      UP
       roberto CalcStrBRCT SmtBRCT1.sh lettuce      82867 04/10/102 17:52:14
     lotus                            batch         1/1   1.01 er      UP
       roberto CalcStrBRCT SmtBRCT9.sh lotus        82859 04/10/102 17:52:13
     olive                            batch         0/1   0.00 er      UP
     onion                            batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT11.s onion        82857 04/10/102 17:52:13
     porsche                          batch         1/1   1.10 er      UP
       roberto CalcStrBRCT SmtBRCT7.sh porsche      82861 04/10/102 17:52:13
     potato                           batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT13.s potato       82855 04/10/102 17:52:13
     spinach                          batch         0/1   0.00 er      UP
     squash                           batch         0/1   0.00 er      UP
     tarzan                           batch         0/1   0.07 er      UP
     tomato                           batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT6.sh tomato       82862 04/10/102 17:52:13

The first column lists all computers in the queue system (the farm). In this example, roberto (account name) is running 16 calculations on the farm.
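Counting the jobs by hand gets tedious on a large farm. The sketch below counts job lines per account in text copied from 'qstat -f'. It assumes the layout shown in the listing above (a job line has seven whitespace-separated fields, the fifth being the numeric job id); the function name `jobs_per_user` is hypothetical and other DQS versions may print a different layout.

```python
from collections import Counter

def jobs_per_user(qstat_text):
    """Count job lines per account name in 'qstat -f' style output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Assumed job-line layout: user, two name fields, host, job id, date, time.
        if len(fields) == 7 and fields[4].isdigit():
            counts[fields[0]] += 1
    return counts

sample = """\
     bmw                              batch         1/1   1.01 er      UP
       roberto CalcStrBRCT SmtBRCT5.sh bmw          82863 04/10/102 17:52:13
     falcon                           batch         0/1   0.00 er      UP
     broccoli                         batch         1/1   0.96 er      UP
       roberto CalcStrBRCT SmtBRCT16.s broccoli     82852 04/10/102 17:52:13
"""
print(jobs_per_user(sample)["roberto"])  # prints 2
```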

How to run AutoStructure?

    1. Prepare the input data files.
    2. Prepare the control file.
    3. Run 'bin/autostructure -c control-file -o output_dir'.
    4. All results are written to output_dir.

How to prepare input data set?

Sequence file --- in BMRB format (example); required
Resonance assignments --- in BMRB format (example); required
Peak lists (example); required
J coupling list --- in BMRB format (example); recommended
Slow NH exchange list (example); recommended
Manual constraints --- in DYANA format (examples: dihedral-angle constraints, upper-limit distance constraints, h-bond constraints); optional
par.tbl (example); optional

How to prepare control-file? --- The control file has three sections: General, Command and PeakList.

control-file Example 1
1. General section: the first part of the control file is the General section. It gives the name of the protein and the names of the sequence file, resonance assignment file, J-list file and NH slow exchange file, that is, basically all input files except the peak lists. When manual analysis results such as upper-limit distance constraints, dihedral-angle constraints and h-bonds are available, these constraints may be added and used in the structure calculation.
Keywords:

proteinName the name of the protein

seqFile the sequence file in bmrb format (example)

chemicalShiftFile the resonance assignment file in bmrb format (example)

JListFile J-List file (optional) in bmrb format (example)

NHSlowList NH slow exchange file (optional) (example)
AutoStructure uses slow amide exchange data to determine secondary structures and identify hydrogen bonds.

ACO dihedral angle constraint file (optional) in DYANA format (example)

HBOND h-bonds file (optional) (example)

UPL upper-limit distance constraint file (optional) in DYANA format (example)

par parameter file (optional, for advanced usage)

nCycles the maximum number of bootstrapping cycles. AutoStructure stops when no more assignments can be made or after nCycles bootstrapping cycles.
When AutoStructure stops normally, it prints 'The program is finished.' at the end of the *_NA.ovw file. Sometimes, when the queue system is unstable, the calculation may be stopped by the queue system or hang forever. In these cases, 'The program is finished.' is not printed in that file.
CYCLE*-0 is a bootstrapping cycle. In this cycle, new assignments are made based on the 3D structures from the last iteration. The maximum number of bootstrapping cycles is specified by nCycles.
CYCLE*-1 (or CYCLE*-2, etc.) is a validation cycle. In this cycle, assignments that are consistently and significantly violated in the 3D structures from the last iteration are unassigned, and the 3D structures are re-computed. No new assignments are made in a validation cycle. There is no limit on the number of validation cycles.
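Because a hung or killed run leaves no finish message, a quick check of the overview file tells you whether a run ended normally. This is a minimal sketch; the function name `finished_normally` is hypothetical, and it assumes the message is the last non-empty line of the file.

```python
FINISHED_MARKER = "The program is finished."

def finished_normally(ovw_text):
    """True if the last non-empty line of the overview text ends with the finish message."""
    lines = [line.strip() for line in ovw_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1].endswith(FINISHED_MARKER)

print(finished_normally("...report...\nThe program is finished.\n"))  # prints True
print(finished_normally("...report...\n"))                            # prints False
```

To check a real run, read the *_NA.ovw file from your output directory and pass its text to the function.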

Example 1 (the file names and values need to be changed for your data set):

[General]
proteinName=FGF
#input files except peak lists
seqFile=INPUT/seq.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrbStereo
JListFile=INPUT/FGFJval.bmrb
NHSlowList=INPUT/NHSlowList
ACO=INPUT/FGF.acoManual
HBOND=INPUT/hbond.dyaManual
UPL=INPUT/FGF.uplManual
#you can comment this next line out if you want to use the default one
par=INPUT/par.tbl
#max. number of bootstrapping cycles
nCycles=20

Example 2:

[General]
proteinName=TMZIP
#input files except peak lists
seqFile=INPUT/sequence.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrb
JListFile=INPUT/TMZIP.Jvalloose
#you can comment this next line out if you want to use the default one
par=INPUT/par.tbl
#max. number of bootstrapping cycles
nCycles=20

2. Command section: the second part of the control file is the Command section. All the command scripts are in the $ASBIN directory. Only one line may follow each command entry; it is treated as a shell command line and can be commented out if it is not used.
Keywords:

viewerCommand View is a script to run the rasmol command (viewer command, optional)

hyperCommand hyper command; all options of hyper can be added after -N (optional) (example)

dyanaCommand CreateProc is a script to run parallel DYANA computing over the DQS system. 'CreateProc TMZIP x y z' means calculating x*y structures using x machines and selecting the best z. On each machine, y structures are calculated.
CreateProcOne is a script that uses one CPU; no DQS system is required. 'CreateProcOne TMZIP 1 x y' means calculating x structures on one machine and selecting the best y.
Example:
'CreateProc TMZIP 14 4 10' means calculating 14*4 = 56 structures using 14 machines and selecting the best 10. On each machine, 4 structures are calculated.
'CreateProcOne TMZIP 1 10 5' means calculating 10 structures on one machine and selecting the best 5.

cnsCommand a script to run cns command (under development, coming soon)
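The three numbers on a dyanaCommand line are easy to mix up, so here is a small sketch that spells out the arithmetic described above. The function name `dyana_plan` is hypothetical; it only splits the line and multiplies, and does not call DYANA or DQS.

```python
def dyana_plan(command_line):
    """Split a CreateProc or CreateProcOne line into machines, per-machine count, total and kept."""
    script, protein, machines, per_machine, kept = command_line.split()
    machines, per_machine, kept = int(machines), int(per_machine), int(kept)
    return {
        "script": script.rsplit("/", 1)[-1],  # drop a leading $ASBIN/ if present
        "protein": protein,
        "machines": machines,
        "per_machine": per_machine,
        "total": machines * per_machine,      # structures calculated in all
        "kept": kept,                         # best structures selected
    }

print(dyana_plan("$ASBIN/CreateProc FGF 14 4 10")["total"])   # prints 56
print(dyana_plan("$ASBIN/CreateProcOne TMZIP 1 10 5")["total"])  # prints 10
```

For CreateProcOne the machine count is always 1, so the same product gives the number of structures on the single machine.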

Example 1:

# there can only be one line after each command entry which is treated as a shell command line and can be commented out.
# here only dyanaCommand is active
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
#$ASBIN/View
[hyperCommand]
#$ASBIN/hyper -N
[dyanaCommand]
$ASBIN/CreateProc FGF 14 4 10

Example 2:

# there can only be one line after each command entry which is used as a shell command line and can be commented out.
# viewerCommand: for demo and view the 3d using rasmol
# hyperCommand: call for hyper
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
$ASBIN/View
[hyperCommand]
$ASBIN/hyper -N
[dyanaCommand]
$ASBIN/CreateProc TMZIP 14 4 10

3. PeakList section: each peak list is an entry in the control file.
Keywords:

dimension the dimension of the peak list
dimension = 2 means that the peak list has hx1 and hx2 dimensions.
dimension = 3 means that the peak list has hx1, x1 and hx2 dimensions.
dimension = 4 means that the peak list has hx1, x1, hx2 and x2 dimensions. Only 4D CC-NOESY is supported right now.

IC, haveIC for a monomer: IC = 0, haveIC = 0
for a homo-dimer (for details see the next section, How to handle homo-dimer protein?):
      IC = 0, haveIC = 0 means the NOESY peak list has only intra-chain NOEs.
      IC = 0, haveIC = 1 means the NOESY peak list has both intra-chain and inter-chain NOEs.
      IC = 1 means the NOESY peak list has only inter-chain NOEs. It is an X-filtered experiment.

waterFlag if in water solution, waterFlag = 1;
if in D2O solution, waterFlag = 0.

sign sign=1 tells the program to use half-dwell sampling in C/N to filter the list of possible assignments.
sign=0 tells the program that no half-dwell sampling filter is applied.
Both positive and negative peaks are considered in both cases.
The current version only supports the C/N half-dwell sampling filter in 3D spectra. The coming (not yet released) version will support the half-dwell sampling filter in any H/C/N dimension.

iperc the noise level = highest intensity in the peak list * iperc.
All peaks below this noise level are not assigned by the Unique method.
If not specified, the default value is used.

column col.intensity the intensity column

col.label the label column. The label is a comment string for the user, who can write any string in that column, such as NOE assignments.
This column is read in but not used by NOESY_Assign.

col.id the id column

col.hx1 the hx1 column

col.hx2 the hx2 column

col.x1 the x1 column, the X nucleus to which hx1 is attached (hx1 --> x1); not used for 2D NOESY (dimension = 2)

col.x2 the x2 column, the X nucleus to which hx2 is attached (hx2 --> x2); not used for 2D or 3D NOESY (dimension = 2 or 3)

tol hx1.tol
hx2.tol
x1.tol
x2.tol Match tolerance for the hx1, hx2, x1 and x2 dimensions in ppm.
x1.tol and x2.tol are not used for 2D NOESY.
x2.tol is not used for 3D NOESY.

sw hx1.sw
hx2.sw
x1.sw
x2.sw Sweep width for the hx1, hx2, x1 and x2 dimensions in ppm.
It is used to determine all possible aliased chemical shift positions. In the 1.0beta version, only C/N aliasing is supported. In the coming version, H aliasing will also be supported.
The program may run faster if a large sweep width, such as sw=1000 or 10000, is given for an unaliased dimension.

shift hx1.shift
hx2.shift
x1.shift
x2.shift 'shift' is used for global referencing.
If x1.shift=0.1, then 0.1 ppm is added to all chemical shifts in the x1 dimension.
If your spectrum is well referenced to your resonance assignments, set all shift=0.

type hx1.type
hx2.type
x1.type
x2.type atom type for the hx1, hx2, x1 and x2 dimensions.
type = H for proton
type = N15 for nitrogen
type = C13 for carbon
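To make the roles of sw, tol and shift concrete, here is a minimal sketch of the general idea, not the NOESY_Assign implementation. It assumes that an aliased resonance can appear at its true shift plus or minus whole multiples of the sweep width, and that the global shift is added to the peak position before matching (the text above does not say which side it is applied to). The names `aliased_positions`, `matches`, `low` and `high` are hypothetical.

```python
import math

def aliased_positions(shift_ppm, sw, low, high):
    """All positions shift_ppm + k*sw (k integer) that fall inside the window [low, high]."""
    k_min = math.ceil((low - shift_ppm) / sw)
    k_max = math.floor((high - shift_ppm) / sw)
    return [shift_ppm + k * sw for k in range(k_min, k_max + 1)]

def matches(peak_ppm, assigned_ppm, tol, shift=0.0):
    """True if the peak position, after the global shift, lies within tol of the assignment."""
    return abs(peak_ppm + shift - assigned_ppm) <= tol

# With a realistic N15 sweep width several aliased positions must be tried ...
print(aliased_positions(120.0, 27.0, 60.0, 150.0))    # prints [66.0, 93.0, 120.0, 147.0]
# ... while a very large sweep width leaves only the true position.
print(aliased_positions(120.0, 1000.0, 60.0, 150.0))  # prints [120.0]
print(matches(8.20, 8.32, tol=0.03))                  # prints False
print(matches(8.20, 8.32, tol=0.03, shift=0.1))       # prints True
```

The second call illustrates why a large sw for an unaliased dimension can speed things up: fewer candidate positions have to be compared.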

Example: peak list entries for a monomer protein (the file names and values need to be changed for your data set)

The four entries below are, in order: 2D NOESY, 3D N15-NOESY (peak list example), 3D C13-NOESY (peak list example) and 4D CC-NOESY.

[INPUT/2d.noesy]
# line above is the peak list file name
#it is a 2d noesy
dimension=2
IC=0
haveIC=0
#in h2o
waterFlag=1
#half-dwell sampling filter off
sign=0
# intensity is in column 5
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
col.intensity=5
col.label=2
col.id=1
col.hx1=3
col.hx2=4

# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.03
hx1.sw=1000
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H

[INPUT/n15.noesy]
# line above is the peak list file name
#it is a 3d noesy
dimension=3
IC=0
haveIC=0
#in h2o
waterFlag=1
#half-dwell sampling filter off
sign=0
# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5

# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=13.44
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=27.0
x1.shift=0
x1.type=N15
[INPUT/c13.noesy]
# line above is the peak list file name
#it is a 3d noesy
dimension=3
IC=0
haveIC=0
#in d2o
waterFlag=0
#half-dwell sampling filter on
sign=1
# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5

# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13
[INPUT/c13.noesy]
# line above is the peak list file name
#it is a 4d noesy
dimension=4
IC=0
haveIC=0
#in d2o
waterFlag=0
#half-dwell sampling filter on
sign=1
# intensity is in column 7
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is column 5
# x2 is column 6
col.intensity=7
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
col.x2=6
# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13
# the match tolerance, sweep width, global reference, atom type for x2
x2.tol=0.5
x2.sw=20.7
x2.shift=0
x2.type=C13
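Peak list entries are long and usually written by copy and paste, so a key can go missing unnoticed, for example when a block of x1.* lines is repeated where x2.* was intended. The sketch below checks one entry for the keys its dimension needs. The function names are hypothetical, and the list of required keys is an assumption drawn from the examples above, not from the program itself.

```python
def parse_entry(text):
    """Collect the 'key=value' lines of one peak-list entry, skipping comments and the [file] line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

def missing_keys(settings):
    """Return the keys that the entry's dimension calls for but that are absent."""
    axes_by_dimension = {2: ["hx1", "hx2"], 3: ["hx1", "hx2", "x1"], 4: ["hx1", "hx2", "x1", "x2"]}
    axes = axes_by_dimension[int(settings["dimension"])]
    required = ["col.intensity", "col.id"] + ["col." + axis for axis in axes]
    required += [f"{axis}.{field}" for axis in axes for field in ("tol", "sw", "shift", "type")]
    return [key for key in required if key not in settings]

entry = """[INPUT/example2d.noesy]
# hypothetical 2D entry with hx2.sw left out
dimension=2
col.intensity=5
col.id=1
col.hx1=3
col.hx2=4
hx1.tol=0.03
hx1.sw=1000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.shift=0
hx2.type=H
"""
print(missing_keys(parse_entry(entry)))  # prints ['hx2.sw']
```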

How to handle homo-dimer protein?

For dimeric proteins, pseudo linkers (i.e. PL, LL5 and LP) that connect the two chains must be added in order to run DYANA calculations. AutoStructure sets this up internally to handle homo-dimer proteins. For a monomer, no change is needed.
Example: GlyTM1bZip data sets

The three entries below are, in order: a traditional 3D N15-NOESY peak list (it has both inter- and intra-chain NOEs), a traditional 3D C13-NOESY peak list (it has both inter- and intra-chain NOEs) and a 3D X-filtered C13-NOESY peak list (it has only inter-chain NOEs).

[INPUT/n15.noesy]
dimension=3
IC=0
haveIC=1
waterFlag=1
sign=0

col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=N15
[INPUT/c13.noesy]
dimension=3
IC=0
haveIC=1
waterFlag=1
sign=0

col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13
[INPUT/c13IC.noesy]
dimension=3
IC=1
waterFlag=1
sign=0
iperc=0.07
col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13
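The X-filtered entry above sets iperc=0.07. As a rough illustration of what that means, the sketch below computes the noise level as the highest intensity times iperc and lists the peaks that fall below it. The function names are hypothetical, and comparing negative peaks by magnitude is an assumption; the program may treat them differently.

```python
def noise_level(intensities, iperc):
    """Noise level = highest intensity in the peak list * iperc (magnitudes assumed)."""
    return max(abs(value) for value in intensities) * iperc

def below_noise(intensities, iperc):
    """Intensities whose magnitude is below the noise level."""
    level = noise_level(intensities, iperc)
    return [value for value in intensities if abs(value) < level]

peaks = [100.0, 5.0, -8.0, 50.0]
print(below_noise(peaks, 0.07))  # prints [5.0]
```

With a highest intensity of 100 the noise level is about 7, so only the peak of intensity 5 falls below it.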

How to understand the output_dir?

1. Under the output_dir directory:

*_NA.ovw general report about the AutoStructure calculation.

*_NA.sec information about the secondary structure analysis.

*_NA.note report of the preprocessing of the input files.

*_NA.exm complete report about the cycle0-0 analysis, providing information about why each peak is assigned or not assigned.

*_NA.unassign peaks that are unassigned during validation cycles.

*.noise peaks that are excluded from the noesy_assign analysis (noise peaks).

*_NA.Val the file to check if you calculate the M score. It also provides guidance for chemical shift refinement. For atoms whose shifts are consistently off by more than 0.3 ppm for C/N or 0.03 ppm for H, it is recommended to adjust them manually.

source a subdirectory containing all input files used in this calculation.

2. In each structure calculation cycle:

*.upl the upper-limit distance constraints.

*.aco the angle constraints.

hbond.lol the h-bond lower-limit distance constraints.

hbond.upl the h-bond upper-limit distance constraints.

*.pdb the coordinates generated by DYANA using the above constraints.

*.ovw the overview file generated by DYANA.

*_assOrder reports the NOE assignments for each peak, ordered by intra, seq, mid and long-range.

*_assSparky reports the NOE assignments in Sparky format. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.

*_match.gz reports the details of the analysis process.

log* DYANA logs for the structure calculation.

3. Your final structure and constraints are in the last cycle.
4. Until a structure calculation cycle is finished, all constraint files and intermediate results are stored in the WorkingCycle directory.

How to measure M-score and average shift? --- The M score measures the input data quality. It is the fraction of the expected short-range, two- and three-bond connected NOEs that are not found in the peak list. A high M score indicates problems with resonance assignments, peak lists or global referencing.

1. Run 'bin/autostructure -v -c control-file -o output_dir'.
2. Check the file *_NA.Val in your output directory.

Note: The 1.0beta version only measures the M score for 3D NOESY.

Example: extracted from TMZIP_NA.Val report for TMZIP:

...........
    # Summary for n15.noesy
    Total simulated peaks: 44 Number of peaks NOT matched in the Peak List: 4
    M score = 0.091
    Average shift in HX: 0.001 Average ab-shift in HX: 0.0067(Assignment used: 39)
    Average shift in X: 0.019 Average ab-shift in X: 0.075(Assignment used: 36)
    Average shift in HX2: -0.018 Average ab-shift in HX2: 0.022(Assignment used: 40)
    ...........
    # Summary for c13.noesy
    Total simulated peaks: 270 Number of peaks NOT matched in the Peak List: 83
    M score = 0.31
    Average shift in HX: 0.0061 Average ab-shift in HX: 0.0095(Assignment used: 100)
    Average shift in X: 0.022 Average ab-shift in X: 0.083(Assignment used: 86)
    Average shift in HX2: -0.0026 Average ab-shift in HX2: 0.014(Assignment used: 117)
    ..............
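The M score in the report can be reproduced from the two counts on the line above it: peaks not matched divided by total simulated peaks. The sketch below pulls both counts out of the summary lines with a regular expression. It assumes the wording of the summary line is exactly as in the excerpt above; the function name `m_scores` is hypothetical.

```python
import re

SUMMARY = re.compile(
    r"Total simulated peaks:\s*(\d+)\s+"
    r"Number of peaks NOT matched in the Peak List:\s*(\d+)"
)

def m_scores(val_text):
    """Return the M score (not matched / simulated) for each summary line found."""
    return [int(missed) / int(total) for total, missed in SUMMARY.findall(val_text)]

report = """
    # Summary for n15.noesy
    Total simulated peaks: 44 Number of peaks NOT matched in the Peak List: 4
    # Summary for c13.noesy
    Total simulated peaks: 270 Number of peaks NOT matched in the Peak List: 83
"""
print([round(score, 3) for score in m_scores(report)])  # prints [0.091, 0.307]
```

These agree with the reported values of 0.091 and 0.31.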

How to measure I, L scores? --- In general, the I and L scores should be similar to the M scores. If I or L is significantly high, it is an indication that the fold is not right.

Option: -q structure-file
1. Run 'bin/autostructure -c control-file -o output_dir -q structure-file'.
2. Check the file *_NA.Ref in your output directory.
Note: The 1.0beta version only measures the I and L scores for 3D NOESY.

How to examine input data quality?

Things to check:
1. M score and average shift - the M score measures the input data quality. It is the fraction of the expected short-range, two- and three-bond connected NOEs that are not found in the peak list. A high M score indicates a problem with resonance assignments, peak lists or relative global referencing. The average shift in each dimension should be close to 0 ppm; otherwise global referencing is needed for that dimension. (see How to measure M-score and average shift?)
2. Most of the expected two-, three- and four-bond connected intra and close sequential NOEs should be assigned in CYCLE1-0. If this is not the case, it indicates a problem related to resonance assignments, peak lists or global referencing.
Quick ways to check the expected assignments:
1) Look at the *_NA.ovw file:
TMZIP example: from TMZIP_NA.ovw

PeakList n15.noesy:
                                    #NOESY_Assign Process#
    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak
                 ------------- ---------------
                 new   cum     new    cum
    1-0   E      159   159     168    168     -                   222               13
.......

This table shows that there are 159 expected NOE assignments. There are 38 residues, and for each residue the HN-HA(i,i), HN-HB(i,i), HA-HN(i,i+1) and HB-HN(i,i+1) NOEs are expected. The expected total is about 4*38 = 152, which is close to 159.
The same check applies to the c13-NOESY.
2) For details, look at the *_assOrder and *.upl files in CYCLE1-0/.
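The comparison in check 1) can be scripted. The sketch below splits one row of the '#NOESY_Assign Process#' table into named fields and computes the rough expectation of four NOEs per residue. The field names are hypothetical labels taken from the column headers in the excerpt, and the row layout (nine whitespace-separated cells, '-' for an empty cell) is assumed from that excerpt.

```python
def to_int(cell):
    """Table cells hold an integer or '-' (no value)."""
    return None if cell == "-" else int(cell)

def parse_process_row(row):
    """Split one row of the process table into a dict of named fields."""
    names = ("peaks_new", "peaks_cum", "assignments_new", "assignments_cum",
             "assignments_removed", "peaks_not_assigned", "noise_peaks")
    cells = row.split()
    record = {"cycle": cells[0], "method": cells[1]}
    record.update(zip(names, map(to_int, cells[2:])))
    return record

def expected_n15_noes(n_residues, per_residue=4):
    """Rough expectation: four intra/sequential NOEs per residue."""
    return n_residues * per_residue

row = parse_process_row("1-0   E      159   159     168    168     -    222    13")
print(row["peaks_new"], expected_n15_noes(38))  # prints 159 152
```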

How to understand and examine the secondary structure file?

Secondary structure plays an important role in AutoStructure. We use the CSI method to identify the secondary structure elements. These elements are then refined based on the J, NH-slow and NOE data. The alignments between b-strands are also identified from the NOE data. The file *_NA.sec reports the secondary structure information.
CSI Method --- check the consensus column. 1 -> helix, -1 -> sheet. These results should be generally consistent with your 3D structures, except for the N and C termini.
    TMZIP example: (from TMZIP_NA.sec file)

........
    Summary:
    #       AA      HA      CA      CB      SUM     CONSENSUS
    -       --      --      --      --      ---     ---------
    1       G       0       -1      0       NA      0
    2       A       0       0       0       NA      0
    3       G       0       0       0       NA      0
    4       S       0       0       1       NA      0
    5       S       0       0       0       NA      0
    6       S       0       0       0       NA      0
    7       L       0       1       0       NA      1
    8       E       -1      1       -1      NA      1
    9       A       -1      1       -1      NA      1
    10      V       0       1       0       NA      1
    11      R       -1      1       0       NA      1
    12      R       -1      1       0       NA      1
    13      K       -1      1       0       NA      1
    14      I       -1      1       0       NA      1
    15      R       -1      1       0       NA      1
    16      S       -1      1       0       NA      1
    17      L       0       1       1       NA      1
    18      Q       -1      1       -1      NA      1
    19      E       -1      1       0       NA      1
    20      Q       0       1       -1      NA      1
    21      N       -1      1       0       NA      1
    22      Y       -1      1       -1      NA      1
    23      H       -1      1       -1      NA      1
    24      L       -1      1       1       NA      1
    25      E       -1      1       -1      NA      1
........

The consensus column shows that residues 7 to 25 (and onward) are helix.
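Reading long consensus columns by eye is error-prone, so the runs of 1 and -1 can be collected automatically. The sketch below assumes the seven-column summary layout shown in the excerpt (residue number first, CONSENSUS last); the function name `consensus_segments` is hypothetical, and it does not check that residue numbers are consecutive.

```python
from itertools import groupby

LABELS = {1: "helix", -1: "sheet"}

def consensus_segments(sec_text):
    """Return (first_residue, last_residue, label) for each run of non-zero consensus."""
    rows = []
    for line in sec_text.splitlines():
        fields = line.split()
        # Data rows: residue number, amino acid, HA, CA, CB, SUM, CONSENSUS.
        if len(fields) == 7 and fields[0].isdigit():
            rows.append((int(fields[0]), int(fields[6])))
    segments = []
    for value, run in groupby(rows, key=lambda row: row[1]):
        run = list(run)
        if value in LABELS:
            segments.append((run[0][0], run[-1][0], LABELS[value]))
    return segments

excerpt = """
    #       AA      HA      CA      CB      SUM     CONSENSUS
    5       S       0       0       0       NA      0
    6       S       0       0       0       NA      0
    7       L       0       1       0       NA      1
    8       E       -1      1       -1      NA      1
    9       A       -1      1       -1      NA      1
"""
print(consensus_segments(excerpt))  # prints [(7, 9, 'helix')]
```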
Combined analysis of CSI, NOE, Jval and/or NH -- the N/C termini of secondary structure elements may be extended or shortened relative to the CSI results.
    Alignments between b-strands are reported here.
    1) All possible registers and their scores.
    FGF Example:
            ** Poss: Register for Antiparallel Strands 10-12 Found : 63 75 score: 8
    2) Validation -- registers that are inconsistent with other registers are removed. The final self-consistent registers define the b-sheets of the protein.
Final secondary structure and antiparallel (or parallel) strand registers - These are used by AutoStructure to rule in and rule out possible assignments. NOEs expected from the secondary structures are assigned, and NOEs inconsistent with the secondary structures are ruled out. Long-range cross-strand NOEs that are consistent with the alignments are assigned. Cross-strand and helical h-bonds are also added to the structure calculation.
The registers are identified based on the frequency (the total number of expected long-range NOEs) found in the peak list. When there is a lot of noise in the peak list, noise can increase the frequency of an incorrect register and the strands can then be aligned in the wrong way. In this case, it is necessary to check manually all assigned long-range NOEs related to each register. We have noticed that noise in the HN-HN region disturbs the results; a clean HN-HN region helps a lot. Other regions such as HA-HA and HA-HN are also important.
*** Warning: if the alignment is wrong, you will definitely get a wrong fold. ***

How to run AutoStructure in a practical way?

First, we set nCycles=2 in the control file, run AutoStructure and examine:
1) the input data quality of the resonance assignments, peak lists and referencing (see How to examine input data quality?).
2) the secondary structure results (see How to understand and examine the secondary structure file?)
3) the peaks unassigned in CYCLE1-1/, CYCLE1-2/, ...
If many peaks are unassigned in CYCLE1-1, CYCLE1-2, ... (see the *_NA.ovw file), it is an indication that there is a problem. It is usually just a subset of the long-range constraints that causes the problem. If only a few peaks are unassigned, the assignments from CYCLE1-0 are self-consistent and it is OK to use the initial model for the iterative calculations.
Ways to examine the number of assigned and unassigned peaks in CYCLE1-0, 1-1, ...
Things to check:
1) *_NA.unassign - This file lists all peaks that were unassigned.
2) *_assOrder in CYCLE1-0/ - This file lists all peaks that were assigned.
3) *_assSparky in CYCLE1-0/ - This is a peak list file with the assignments from AutoStructure. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
4) Because the resonance assignments are incomplete, NOE assignments made by the Unique method or the SYM method may be incorrect. Many intra peaks may be given long-range assignments because one of the real resonances is not assigned. In many cases, unassigned long-range NOE assignments turn out to be intra assignments to one of the unassigned atoms.
    Example 1: an example that indicates there is a problem (from the old RBFA data set):

......
PeakList hrnoesy3:
                                    #NOESY_Assign Process#
    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak
                 ------------- ---------------
                 new   cum     new    cum
    1-0   E      301   301     307    307      -                  8367              5708
    1-0   U      22    323     22     329      -                  8345              5708
    1-0   SYM    40    363     40     369      -                  8305              5708
    1-0   E      5     368     7      376      -                  8300              5708
    1-0   CF     3     371     3      379      -                  8297              5708
    1-1   VIO    -     333     -      341      38                 8335              5708
    2-0   PDB    430   763     444    785      -                  7905              5708
.......

There are 38 peaks unassigned before CYCLE2-0.
Example 2: the report for the RBFA data set after we made more resonance assignments and looked carefully at the NOESY cross peaks corresponding to ALL (violated and satisfied) long-range constraints using SPARKY ("guided peak list editing"):

......
PeakList c13_0928_3:
                                    #NOESY_Assign Process#
    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak
                 ------------- ---------------
                 new   cum     new    cum
    1-0   E      886   886     948    948      -                  2229              621
    1-0   U      9     895     9     957      -                  2220              621
    1-0   SYM    70    965     70     1027     -                  2150              621
    1-0   E      45    1010    59     1086     -                  2105              621
    1-0   CF     4     1014    4      1090     -                  2101              621
    1-1   VIO    -     1011    -      1087     3                  2104              621
    2-0   PDB    542   1553    695    1782     -                  1562              621
......

Now, more peaks are assigned and far fewer are unassigned.
4) This calculation may need to be re-run several times. Once we are happy with the input data and the initial model, we set nCycles=20, run AutoStructure again and examine the results from the last cycle. If the final results are not good, better resonance assignments and better peak lists are needed.
Things to check:
1) the percentage of peaks assigned in the last cycle (reported in the *_NA.ovw file). We think that at least 60% of peaks should be assigned for c13-NOESY and ~70% of peaks should be assigned for n15-NOESY.
Example:

    .....
    PeakList c13_0928_3:
    ......
                                       #Peak-Assignment Stat#
    Cycle Total_Assignable_Peak Unambiguous_Assigned Ambiguous_Assigned Total_Assigned Peak_Not_Assigned
          --------------------- -------------------- ------------------ -------------- -----------------
          #         %           #         %          #        %         #      %       #       %
    1-0   2494      100         948       0.38       66       0.03      1014   0.41    1480    0.59
    1-1   2494      100         945       0.38       66       0.03      1011   0.41    1483    0.59
    2-0   2494      100         1356      0.54       206      0.08      1562   0.63    932     0.37
    2-1   2494      100         1329      0.53       196      0.08      1525   0.61    969     0.39
    ......
    20-0 2494      100         1390      0.56       412      0.17      1802   0.72    692     0.28
    20-1 2494      100         1391      0.56       411      0.16      1802   0.72    692     0.28
    20-2 2494      100         1391      0.56       411      0.16      1802   0.72    692     0.28
    ......

For this data set, about 72% of the peaks are assigned in the last cycle.
2) Quality factors: the I and L scores. In general, the I and L scores should be similar to the M scores. If the I or L score is significantly high, it is an indication that the fold is not right.
3) Number of constraints per residue. On average, we expect at least 15 conformational constraints per residue.
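
Check 1) can be scripted. The following is a minimal sketch, assuming the Peak-Assignment Stat rows in the overview file have exactly the 11 whitespace-separated fields shown in the example above, with the Total_Assigned fraction in field 9. The function name `last_cycle_assigned` is illustrative and is not part of AutoStructure.

```shell
#!/bin/sh
# last_cycle_assigned MIN_PERCENT
# Reads overview text on stdin and prints, for each peak list,
#   <peaklist> <last cycle> <percent assigned> <OK|LOW>
# Assumption: a stat row has 11 fields and starts with a cycle id such as 20-2;
# field 9 is the Total_Assigned fraction (0.72 means 72%).
last_cycle_assigned() {
    awk -v min="$1" '
        function report() {
            if (seen) print name, cycle, pct, (pct + 0 >= min + 0 ? "OK" : "LOW")
            seen = 0
        }
        BEGIN { name = "-" }
        # A new "PeakList <name>:" line closes the previous table.
        /^[ \t]*PeakList[ \t]/ { report(); name = $2; sub(/:$/, "", name) }
        # Keep overwriting, so the last matching row of the table wins.
        NF == 11 && $1 ~ /^[0-9]+-[0-9]+$/ {
            cycle = $1; pct = sprintf("%.0f", $9 * 100); seen = 1
        }
        END { report() }
    '
}
```

For example, `last_cycle_assigned 60 < your_NA.ovw` (file name is a placeholder) flags a c13-NOESY list as LOW when fewer than 60% of its peaks are assigned in the final cycle; use 70 for n15-NOESY.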

can not open input file OUTPUT2/Workingcycle/*.pdb

During structure calculation, all constraints are stored in the Workingcycle directories. There are 14 log files in
that directory, corresponding to the 14 calculations on the farm. They all look like this:
     DYANA, version 1.5 (gnu, double precision)
     Copyright (c) 1996-98 ETH Zurich
     dyana> dyana> dyana>   - calc_para1: readdata IL13
         Library file "/usr/local/dyana-1.5/lib/lib/dyana.lib" read, 54 residue types.
         Sequence file "IL13.seq" read, 113 residues.
     *** ERROR: Illegal atom name "S" for residue CYS 29.
     dyana>
This means that the atom name S in your manual constraints is not supported by DYANA. I think the atom name is SG in
DYANA.
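
When the *.pdb files are missing, it can help to scan all DYANA logs at once. A minimal sketch, assuming the logs are the `log*` files in the Workingcycle directory and that DYANA error lines start with `*** ERROR` as in the excerpt above; `dyana_errors` is an illustrative name, not an AutoStructure command.

```shell
#!/bin/sh
# dyana_errors DIR
# Print each distinct "*** ERROR" line found in DIR/log*, with leading
# whitespace removed. Identical errors repeated across logs are shown once.
dyana_errors() {
    cat "$1"/log* 2>/dev/null \
        | grep '^[[:space:]]*\*\*\* ERROR' \
        | sed 's/^[[:space:]]*//' \
        | sort -u
}
```

Running `dyana_errors OUTPUT2/Workingcycle` on the case above would print the single `Illegal atom name` line instead of 14 identical copies.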

Thanks

We express our thanks to Dr. Keith L Constantine and Dr. Robert Powers for their comments and input on AutoStructure.

yphuang@cabm.rutgers.edu

proteinName	the name of the protein
seqFile	the sequence file in bmrb format (example)
chemicalShiftFile	the resonance assignment file in bmrb format (example)
JListFile	J-List file (optional) in bmrb format (example)
NHSlowList	NH slow exchange file (optional) (example). AutoStructure uses slow amide exchange data to determine secondary structures and identify hydrogen bonds.
ACO	dihedral angle constraint file (optional) in DYANA format (example)
HBOND	h-bonds file (optional) (example)
UPL	upper-limit distance constraint file (optional) in DYANA format (example)
par	parameter file (optional, for advanced usage)
nCycles	The maximum number of bootstrapping cycles. AutoStructure stops when no more assignments can be made or after nCycles bootstrapping cycles. When AutoStructure stops normally, it prints 'The program is finished.' at the end of the _NA.ovw file. Sometimes, when the queue system is unstable, the calculation may be stopped by the queue system or hang forever; in these cases, 'The program is finished.' is not printed in the .ovw file. CYCLE*-0 is a bootstrapping cycle: new assignments are made based on the 3D structures from the last iteration. CYCLE*-1 (or CYCLE*-2, etc.) is a validation cycle: assignments that are consistently and significantly violated in the 3D structures from the last iteration are unassigned, and the 3D structures are re-computed. No new assignments are made in a validation cycle. There is no limit on the number of validation cycles.
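
Because a run killed by the queue system leaves no completion marker, a quick check of the overview file tells you whether a run ended normally. A minimal sketch: the function name `finished_normally` is illustrative, and looking only at the last 5 lines (to tolerate trailing blank lines) is an assumption.

```shell
#!/bin/sh
# finished_normally FILE
# Succeeds (exit 0) if 'The program is finished.' appears near the end of
# FILE, which should be a *_NA.ovw file. Returns 2 if FILE is unreadable.
# Assumption: "at the end" is taken to mean within the last 5 lines.
finished_normally() {
    [ -r "$1" ] || return 2
    tail -n 5 "$1" | grep 'The program is finished\.' >/dev/null
}
```

A call such as `finished_normally yourprotein_NA.ovw || echo "run did not finish"` (file name is a placeholder) can be added to a wrapper script that resubmits hung jobs.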

viewCommand	View is a script that runs the rasmol command (viewer command, optional)
hyperCommand	hyper command; any hyper options can be appended after -N (optional) (example)
dyanaCommand	CreateProc is a script to run parallel DYANA calculations over the DQS system. `CreatProc TMZIP x y z' means calculating x*y structures using x machines (y structures on each machine) and selecting the best z. CreateProcOne is a script that uses one CPU; no DQS system is required. `CreatProcOne TMZIP 1 x y' means calculating x structures on one machine and selecting the best y. Example: `CreatProc TMZIP 14 4 10' means calculating 14*4 = 56 structures using 14 machines (4 structures on each machine) and selecting the best 10. `CreatProcOne TMZIP 1 10 5' means calculating 10 structures on one machine and selecting the best 5.
cnsCommand	a script to run cns command (under development, coming soon)
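
The arithmetic behind the dyanaCommand arguments can be sanity-checked before submitting a job. A minimal sketch, assuming the total is the number of machines times the structures per machine; `dyana_plan` is a hypothetical helper and does not call CreateProc or DQS.

```shell
#!/bin/sh
# dyana_plan MACHINES PER_MACHINE KEEP
# Prints how many structures a parallel run would calculate and keep.
# Fails if KEEP exceeds the total, since the best KEEP cannot then be selected.
dyana_plan() {
    total=$(( $1 * $2 ))
    if [ "$3" -gt "$total" ]; then
        echo "cannot keep $3 of $total structures" >&2
        return 1
    fi
    echo "calculate $total structures on $1 machines ($2 each), keep best $3"
}
```

With the arguments from the example, `dyana_plan 14 4 10` reports 56 structures in total.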

dimension		The dimension of the peak list. dimension = 2 means that the peak list has hx1 and hx2 dimensions. dimension = 3 means that the peak list has hx1, x1 and hx2 dimensions. dimension = 4 means that the peak list has hx1, x1, hx2 and x2 dimensions. For 4D data, only CC-NOESY is currently supported.
IC haveIC		For a monomer: IC = 0, haveIC = 0. For a homo-dimer (for details see the next section, How to handle homo-dimer protein?): IC = 0, haveIC = 0 means the NOESY peak list has only intra-chain NOEs; IC = 0, haveIC = 1 means the NOESY peak list has both intra-chain and inter-chain NOEs; IC = 1 means the NOESY peak list has only inter-chain NOEs (an X-filtered experiment).
waterFlag		If the sample is in water solution, waterFlag = 1; if in D2O solution, waterFlag = 0.
sign		sign=1 tells the program to use half-dwell sampling in C/N to filter the list of possible assignments. sign=0 tells the program that no half-dwell sampling filter is applied. Both positive and negative peaks are considered in both cases. The current version only supports the C/N half-dwell sampling filter in 3D spectra. The coming (not yet released) version is different: it supports the half-dwell sampling filter in any H/C/N dimension.
iperc		The noise level = highest intensity in the peak list * iperc. All peaks below this noise level are not assigned by the Unique method. If not specified, the default value is used.
column	col.intensity	the intensity column
	col.label	the label column. Label is a comment string column for the user. The user can write any string in that column, such as NOE assignments. This column is read in but not used by NOESY_Assign.
	col.id	the id column
	col.hx1	the hx1 column
	col.hx2	the hx2 column
	col.x1	the x1 column (not used for 2D NOESY, if dimension = 2) and hx1--> x1
	col.x2	the x2 column (not used for 2D or 3D NOESY, if dimension = 2 or 3) and hx2-->x2
tol	hx1.tol hx2.tol x1.tol x2.tol	Match tolerances for the hx1, hx2, x1 and x2 dimensions in ppm. x1.tol and x2.tol are not used for 2D NOESY; x2.tol is not used for 3D NOESY.
sw	hx1.sw hx2.sw x1.sw x2.sw	Sweep widths for the hx1, hx2, x1 and x2 dimensions in ppm. They are used to determine all possible aliased chemical shift positions. In the 1.0beta version, only C/N aliasing is supported. In the coming version, H aliasing is also supported. The program may run faster if a large sweep width is given for an unaliased dimension, such as sw=1000 or 10000.
shift	hx1.shift hx2.shift x1.shift x2.shift	`shift' is used for global referencing. If x1.shift=0.1, then 0.1 ppm is added to all chemical shifts in the x1 dimension. If your spectrum is well referenced to your resonance assignments, set all shift=0.
type	hx1.type hx2.type x1.type x2.type	Atom types for the hx1, hx2, x1 and x2 dimensions. type = H for proton; type = N15 for nitrogen; type = C13 for carbon.
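
To see how many peaks a given iperc would place below the noise level before running AutoStructure, the rule "highest intensity * iperc" can be applied to a peak list directly. This is a rough sketch under several assumptions that are not stated in this document: the peak list is whitespace-separated, the intensity column is given as a 1-based field number, lines whose intensity field is not a number (headers, comments) are skipped, and absolute intensities are used so that negative peaks are compared by magnitude. `noise_level` is a hypothetical helper.

```shell
#!/bin/sh
# noise_level FILE COL IPERC
# Prints: noise_level=<level> peaks=<n> below=<count below level>
# Assumptions: whitespace-separated columns, COL is 1-based, absolute
# intensities are compared, non-numeric rows are ignored.
noise_level() {
    awk -v col="$2" -v iperc="$3" '
        $col ~ /^[-+]?[0-9]+\.?[0-9]*([eE][-+]?[0-9]+)?$/ {
            v = $col + 0
            if (v < 0) v = -v          # compare by magnitude (assumption)
            vals[++n] = v
            if (v > max) max = v
        }
        END {
            if (n == 0) exit 1          # no numeric intensities found
            level = max * iperc
            for (i = 1; i <= n; i++) if (vals[i] < level) below++
            printf "noise_level=%g peaks=%d below=%d\n", level, n, below + 0
        }
    ' "$1"
}
```

If a large share of the list falls below the level, either the peak picking was too permissive or iperc is set too high for this spectrum.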

*_NA.ovw	general report about AutoStructure calculation.
*_NA.sec	information about secondary structure analysis.
*_NA.note	report of the preprocessing of the input files.
*_NA.exm	complete report on the cycle0-0 analysis, explaining why each peak is assigned or not assigned.
*_NA.unassign	peaks that were unassigned during validation cycles.
*.noise	peaks that were excluded from the noesy_assign analysis (noise peaks).
*_NA.Val	If you calculate the M-score, *_NA.Val is the file to check. It also provides guidance for chemical shift refinement. For atoms whose shifts consistently deviate by more than 0.3 ppm for C/N or 0.03 ppm for H, manual adjustment is recommended.
source	a subdirectory containing all input files used in this calculation.

*.upl	the upper limit distance constraints.
*.aco	the angle constraints.
hbond.lol	the h-bond lower limit distance constraints.
hbond.upl	the h-bond upper limit distance constraints.
*.pdb	the coordinates generated by DYANA using the above constraints.
*.ovw	the overview file generated by DYANA.
*_assOrder	reports the NOE assignments for each peak, ordered by intra, seq, mid and long-range.
*_assSparky	reports the NOE assignments in Sparky format. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
*_match.gz	reports the details of the analysis process.
log*	DYANA logs for the structure calculation.