AutoStructure-1.0 HowTo? 

How to install AutoStructure?
How to set up pc-clusters for parallel DYANA calculation?
How to run AutoStructure?
How to prepare input experimental data set?
How to prepare control-file and input data set?
    General section
    Command section
    PeakList section
How to handle homo-dimer protein?
How to understand the output_dir?
How to measure M-score and average shift?
How to measure I, L scores?
How to examine input data quality?
How to understand and examine the secondary structure file?
How to run AutoStructure in a practical way?
    Ways to examine number of assigned and unassigned peaks in CYCLE1-0, 1-1, ...

How to install AutoStructure?

Update the following environment variables defined in the script file 'bin/autostructure':
 
RASMOL     rasmol command (viewer command, optional)
DYANA      dyana command
DQS        command to run parallel calculation (optional)
ASBIN      path of the AutoStructure bin directory
NALIB      path of the NOESY_Assign library
AyudaPATH  path of the PDBSTAT library (optional)


Example (the path on the right-hand side of each assignment needs to be changed for your system):

 
#!/bin/sh

#rasmol command
RASMOL=/usr/bin/X11/rasmol

#dyana command
DYANA=/usr/local/bin/dyana

#command to run parallel calculation
DQS=/usr/local/DQS/bin/qsub332

#path of AutoStructure bin
ASBIN=/farm/software/AutoStructure-1.0beta/bin

#path of Noesy_assign lib
NALIB=/farm/software/AutoStructure-1.0beta/noesy_assign-1.0beta/Lib

#path of PDBSTAT lib
AyudaPATH=/farm/software/AutoStructure-1.0beta/PDBSTAT/Lib

#export
export NALIB
export AyudaPATH
export ASBIN
export DQS
export DYANA
export RASMOL

#start with noesy_assign, passing all command-line arguments through unchanged
"$ASBIN/noesy_assign" "$@"


How to set up pc-clusters for parallel DYANA calculation?

'/DQS/bin/qstat -f'  shows all jobs in the queue system.

This command will show you a list like this:

     Queue Name    Queue Type    Quan  Load          State
     ----------    ----------    ----  ----          -----
     bmw                              batch         1/1   1.01  er      UP
       roberto  CalcStrBRCT SmtBRCT5.sh bmw          82863 04/10/102 17:52:13
     broccoli                         batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT16.s broccoli     82852 04/10/102 17:52:13
     cabbage                          batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT3.sh cabbage      82865 04/10/102 17:52:14
     carrot                           batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT2.sh carrot       82866 04/10/102 17:52:14
     corn                             batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT10.s corn         82858 04/10/102 17:52:13
     cucumber                         batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT14.s cucumber     82854 04/10/102 17:52:13
     eggplant                         batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT12.s eggplant     82856 04/10/102 17:52:13
     falcon                           batch         0/1   0.00  er      UP
     ferrari                          batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT15.s ferrari      82853 04/10/102 17:52:13
     garlic                           batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT8.sh garlic       82860 04/10/102 17:52:13
     jaguar                           batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT4.sh jaguar       82864 04/10/102 17:52:14
     lettuce                          batch         1/1   1.00  er      UP
       roberto  CalcStrBRCT SmtBRCT1.sh lettuce      82867 04/10/102 17:52:14
     lotus                            batch         1/1   1.01  er      UP
       roberto  CalcStrBRCT SmtBRCT9.sh lotus        82859 04/10/102 17:52:13
     olive                            batch         0/1   0.00  er      UP
     onion                            batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT11.s onion        82857 04/10/102 17:52:13
     porsche                          batch         1/1   1.10  er      UP
       roberto  CalcStrBRCT SmtBRCT7.sh porsche      82861 04/10/102 17:52:13
     potato                           batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT13.s potato       82855 04/10/102 17:52:13
     spinach                          batch         0/1   0.00  er      UP
     squash                           batch         0/1   0.00  er      UP
     tarzan                           batch         0/1   0.07  er      UP
     tomato                           batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT6.sh tomato       82862 04/10/102 17:52:13
 

The first column lists all computers in the queue system (the farm). In this example, roberto (account name) is running 16 calculations on the farm.
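The listing above can also be summarized with a short script. The following is a minimal Python sketch, not part of AutoStructure or DQS. It assumes the column layout seen in the sample listing (a queue line has 6 fields, a job line has 7 fields with a numeric job id in the fifth), and it reads the "Quan" column as used/total slots, which is an interpretation of the sample rather than documented behavior.

```python
from collections import Counter

# Lines copied from the listing above: header, queue lines and job lines.
SAMPLE = """\
     Queue Name    Queue Type    Quan  Load          State
     ----------    ----------    ----  ----          -----
     bmw                              batch         1/1   1.01  er      UP
       roberto  CalcStrBRCT SmtBRCT5.sh bmw          82863 04/10/102 17:52:13
     falcon                           batch         0/1   0.00  er      UP
     ferrari                          batch         1/1   0.96  er      UP
       roberto  CalcStrBRCT SmtBRCT15.s ferrari      82853 04/10/102 17:52:13
"""


def summarize_queue(text):
    """Return (jobs per account, names of queues with no running job)."""
    jobs = Counter()
    idle = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[1] == "batch":
            # Queue line: name, type, used/total slots (assumed), load, flags, state.
            used = fields[2].split("/")[0]
            if used == "0":
                idle.append(fields[0])
        elif len(fields) == 7 and fields[4].isdigit():
            # Job line: account, job name, script, host, job id, date, time.
            jobs[fields[0]] += 1
    return dict(jobs), idle


jobs, idle = summarize_queue(SAMPLE)
print(jobs, idle)
```

Applied to the full listing, the same function should count 16 jobs for roberto, matching the count given above.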


How to run AutoStructure?

    1. prepare input data files
    2. prepare control file
    3. run 'bin/autostructure -c control-file -o output_dir'
    4. All results are in output_dir.
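If the run is driven from a script, the command line in step 3 can be assembled as below. This is a minimal Python sketch; the helper name autostructure_argv is hypothetical. The -v and -q options are the ones described in the M-score and I, L score sections of this HowTo; whether they can be combined in one run is not stated there, so the sketch simply passes on what it is given.

```python
import shlex


def autostructure_argv(control_file, output_dir, validate=False,
                       structure_file=None, program="bin/autostructure"):
    """Build the argument list for one AutoStructure run."""
    argv = [program]
    if validate:
        argv.append("-v")                      # M-score / average shift report
    argv += ["-c", control_file, "-o", output_dir]
    if structure_file is not None:
        argv += ["-q", structure_file]         # I, L scores against a structure
    return argv


argv = autostructure_argv("control-file", "output_dir")
print(shlex.join(argv))
# To launch the run: subprocess.run(argv, check=True)
```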


How to prepare input data set?


How to prepare control-file? The control file has three sections: General, Command and PeakList.

control-file Example 1

1. General Section: the first part of the control-file is the General section. This section gives the name of the protein and the names of the sequence file, resonance assignment file, J-list file and NH slow exchange file, that is, basically all input files except the peak lists. When manual analysis results such as upper-limit distance constraints, dihedral angle constraints and h-bonds are available, these constraints may be added and used in the structure calculation.

Keywords:

 
proteinName the name of the protein 
seqFile the sequence file in bmrb format (example)
chemicalShiftFile the resonance assignment file in bmrb format (example)
JListFile J-List file (optional) in bmrb format (example)
NHSlowList NH slow exchange file (optional) (example)
AutoStructure uses slow amide exchange data to determine secondary structures and identify hydrogen bonds.
ACO dihedral angle constraint file (optional) in DYANA format (example)
HBOND h-bonds file (optional) (example)
UPL upper-limit distance constraint file (optional)  in DYANA format (example)
par parameter file (optional, for advanced usage) 
nCycles the maximum number of bootstrapping cycles. AutoStructure will stop when no more assignments can be made or after nCycles cycles of bootstrapping. 

When AutoStructure stops normally, it will print 'The program is finished.' at the end of the *_NA.ovw file. Sometimes, when the queue system is unstable, the calculation may be stopped by the queue system or hang forever. In these cases, 'The program is finished.' will not be printed in the *_NA.ovw file.

CYCLE*-0 is a bootstrapping cycle. In this cycle, new assignments are made based on the 3D structures from the last iteration. The maximum number of bootstrapping cycles is specified by nCycles.

CYCLE*-1 (or CYCLE*-2, etc.) is a validation cycle. In this cycle, assignments that are consistently and significantly violated in the 3D structures from the last iteration are unassigned, and the 3D structures are re-computed. No new assignments are made in a validation cycle. There is no limit on the number of validation cycles.
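The completion message makes it easy to tell a finished run from one that was killed or hung. A minimal Python sketch follows; the function name and the choice to look only at the last few non-empty lines are assumptions, and the file written in the demo is a stand-in, not real AutoStructure output.

```python
import tempfile
from pathlib import Path

FINISHED_MARKER = "The program is finished."


def run_finished(ovw_path, tail=5):
    """True if the completion message is among the last non-empty lines."""
    text = Path(ovw_path).read_text(errors="replace")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return any(FINISHED_MARKER in ln for ln in lines[-tail:])


# Demo with a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    ovw = Path(tmp) / "FGF_NA.ovw"
    ovw.write_text("...cycle reports...\nThe program is finished.\n")
    print(run_finished(ovw))
```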

Example 1 (the protein name and file names need to be changed for your data set):
 
[General]
proteinName=FGF

#input files except peak lists
seqFile=INPUT/seq.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrbStereo
JListFile=INPUT/FGFJval.bmrb
NHSlowList=INPUT/NHSlowList
ACO=INPUT/FGF.acoManual
HBOND=INPUT/hbond.dyaManual
UPL=INPUT/FGF.uplManual

#you can comment this next line out if you want to use the default one
par=INPUT/par.tbl

#max. number of bootstrapping cycles
nCycles=20

Example 2:
 
[General]

proteinName=TMZIP

#input files except peak lists
seqFile=INPUT/sequence.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrb
JListFile=INPUT/TMZIP.Jvalloose

#you can comment this next line out if you want to use the default one
par=INPUT/par.tbl

#max. number of bootstrapping cycles
nCycles=20

2. Command Section: the second part of the control-file is the Command section. All the command scripts are in the $ASBIN directory. Each command entry can be followed by only one line, which is treated as a shell command line; comment it out if you are not using it.

Keywords:

 
viewerCommand View is a script to run the rasmol command (viewer command, optional) 
hyperCommand hyper command; all options of hyper can be added after -N (optional) (example)
dyanaCommand CreateProc is a script to run parallel DYANA computing over the DQS system. `CreateProc TMZIP x y z' means calculating x*y structures using x machines and selecting the best z. On each machine, y structures are calculated. 

CreateProcOne is a script that uses one cpu; no DQS system is required. `CreateProcOne TMZIP 1 x y' means calculating x structures on one machine and selecting the best y. 

Example: 
`CreateProc TMZIP 14 4 10' means calculating 14*4 = 56 structures using 14 machines and selecting the best 10. On each machine, 4 structures are calculated. 
`CreateProcOne TMZIP 1 10 5' means calculating 10 structures on one machine and selecting the best 5. 
 

cnsCommand a script to run cns command (under development, coming soon)
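The arithmetic behind a dyanaCommand line can be checked before submitting jobs. This is a minimal Python sketch under the argument meaning given above (protein name, number of machines, structures per machine, number of best structures kept); the function name dyana_plan is hypothetical and the script itself is never invoked.

```python
def dyana_plan(command):
    """Interpret a CreateProc / CreateProcOne command line."""
    parts = command.split()
    script = parts[0].rsplit("/", 1)[-1]       # drop a leading $ASBIN/ path
    if script not in ("CreateProc", "CreateProcOne"):
        raise ValueError("not a CreateProc/CreateProcOne command: " + command)
    machines, per_machine, keep = (int(p) for p in parts[2:5])
    return {
        "protein": parts[1],
        "machines": machines,
        "total_structures": machines * per_machine,
        "kept": keep,
    }


print(dyana_plan("$ASBIN/CreateProc TMZIP 14 4 10"))
print(dyana_plan("$ASBIN/CreateProcOne TMZIP 1 10 5"))
```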
Example 1:
 
# there can only be one line after each command entry which is treated as a shell command line and can be commented out. 
#here only dyanaCommand is active
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
#$ASBIN/View 
[hyperCommand]
#$ASBIN/hyper -N 
[dyanaCommand]
$ASBIN/CreateProc FGF 14 4 10
Example 2:
 
# there can only be one line after each command entry which is used as a shell command line and can be commented out. 
# viewerCommand: for demo and view the 3d using rasmol 
# hyperCommand: call for hyper 
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
$ASBIN/View
[hyperCommand]
$ASBIN/hyper -N 
[dyanaCommand]
$ASBIN/CreateProc TMZIP 14 4 10
3. PeakList Section: each peak list is an entry in the control file.

Keywords:

 
dimension
the dimension of peak list
dimension = 2 means that the peak list has hx1 and hx2 dimensions.
dimension = 3 means that the peak list has hx1, x1 and hx2 dimensions.
dimension = 4 means that the peak list has hx1, x1, hx2 and x2 dimensions.  Only 4D CC-NOESY is supported right now. 
IC
haveIC
for monomer: IC = 0, haveIC = 0
for homo-dimer (for details see the next section, How to handle homo-dimer protein?):
      IC = 0, haveIC = 0 means the NOESY peak list has only intra-chain NOEs.
      IC = 0, haveIC = 1 means the NOESY peak list has both intra-chain and inter-chain NOEs.
      IC = 1 means the NOESY peak list has only inter-chain NOEs. It is an X-filtered experiment. 
waterFlag
if in water (H2O) solution, waterFlag = 1;
if in D2O solution, waterFlag = 0;
sign
sign=1 tells the program to use half-dwell sampling in C/N to filter the possible assignments list. 
sign=0 tells the program that no half-dwell sampling filter is applied. 
Both positive and negative peaks are considered in both cases.
The current version only supports the C/N half-dwell sampling filter in 3D spectra. The coming version (not yet released) is different: it supports the half-dwell sampling filter in any H/C/N dimension. 
iperc
the noise level = highest intensity in the peak list * iperc.
All peaks that are below this noise level are not assigned by the Unique method. 
If not specified, the default value is used. 
column
col.intensity the intensity column 
col.label the label column. Label is a comment string column for the user. The user can write any string in that column, such as NOE assignments. 
This column is read in but not used by NOESY_Assign. 
col.id the id column
col.hx1 the hx1 column
col.hx2 the hx2 column
col.x1 the x1 column, the heavy atom attached to hx1 (hx1 --> x1); not used for 2D NOESY (dimension = 2)
col.x2 the x2 column, the heavy atom attached to hx2 (hx2 --> x2); not used for 2D or 3D NOESY (dimension = 2 or 3)
tol
hx1.tol
hx2.tol
x1.tol
x2.tol
Match tolerance for hx1, hx2, x1 and x2 dimensions in ppm. 
x1.tol and x2.tol are not used for 2D NOESY
x2.tol is not used for 3D NOESY
sw
hx1.sw
hx2.sw
x1.sw
x2.sw
Sweep width for hx1, hx2, x1 and x2 dimensions in ppm.
It is used to determine all possible aliased chemical shift positions. In the 1.0beta version, only C/N aliasing is supported. In the coming version, H aliasing is also supported.
The program may run faster when given a large sweep width, such as sw=1000 or 10000, for an unaliased dimension.
shift
hx1.shift
hx2.shift
x1.shift
x2.shift
`shift' is used to do global referencing. 
If x1.shift=0.1, then 0.1 ppm is added to all chemical shifts in the x1 dimension. 
If your spectrum is well referenced with your resonance assignments, set all shift=0.
type
hx1.type
hx2.type
x1.type
x2.type
atom type for hx1, hx2, x1 and x2 dimensions. 
type = H for proton
type = N15 for nitrogen
type = C13 for carbon
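To make the roles of `sw' and `shift' concrete, here is a minimal Python sketch. It uses the standard folding relation, in which an aliased peak can correspond to positions that differ from the observed one by whole multiples of the sweep width. This is an illustration only, not necessarily how NOESY_Assign implements it, and the 0 to 80 ppm window in the demo is a hypothetical range chosen for the example.

```python
import math


def apply_global_shift(position_ppm, shift_ppm):
    """Global referencing as described above: add the shift to the position."""
    return position_ppm + shift_ppm


def aliased_candidates(observed_ppm, sw_ppm, lo_ppm, hi_ppm):
    """Positions observed + k*sw (k any integer) that fall inside [lo, hi]."""
    k_min = math.ceil((lo_ppm - observed_ppm) / sw_ppm)
    k_max = math.floor((hi_ppm - observed_ppm) / sw_ppm)
    return [observed_ppm + k * sw_ppm for k in range(k_min, k_max + 1)]


# A peak observed at 30.0 ppm with sw = 20.7 ppm: several candidate positions.
print([round(p, 1) for p in aliased_candidates(30.0, 20.7, 0.0, 80.0)])
# The same peak with a very large sweep width: only the observed position.
print(aliased_candidates(30.0, 1000, 0.0, 80.0))
```

With a very large sweep width there is only one candidate per peak, which is consistent with the remark above that a large sw for an unaliased dimension may make the program run faster.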
Example: peak list entries for a monomer protein (the file names, column numbers and per-dimension values need to be changed for your dataset)
 
The four entries below are, in order: 2D NOESY, 3D N15-NOESY (peak list example), 3D C13-NOESY (peak list example) and 4D CC-NOESY.
[INPUT/2d.noesy]
# line above is the peak list file name

#it is a 2d noesy
dimension=2

IC=0
haveIC=0

#in h2o
waterFlag=1

#half-dwell sampling filter off
sign=0

# intensity is in column 5
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
col.intensity=5
col.label=2
col.id=1
col.hx1=3
col.hx2=4
 
 

# the match tolerance, sweep width, global reference, atom type  for hx1
hx1.tol=0.03
hx1.sw=1000
hx1.shift=0
hx1.type=H

# the match tolerance, sweep width, global reference, atom type  for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H

 

[INPUT/n15.noesy]
# line above is the peak list file name

#it is a 3d noesy 
dimension=3

IC=0
haveIC=0

#in h2o
waterFlag=1

#half-dwell sampling filter off
sign=0

# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
 

# the match tolerance, sweep width, global reference, atom type  for hx1
hx1.tol=0.05
hx1.sw=13.44
hx1.shift=0
hx1.type=H

# the match tolerance, sweep width, global reference, atom type  for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H

# the match tolerance, sweep width, global reference, atom type  for x1
x1.tol=0.5
x1.sw=27.0
x1.shift=0
x1.type=N15

[INPUT/c13.noesy]
# line above is the peak list file name

#it is a 3d noesy
dimension=3

IC=0
haveIC=0

#in d2o
waterFlag=0

#half-dwell sampling filter on
sign=1

# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
 

# the match tolerance, sweep width, global reference, atom type  for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H

# the match tolerance, sweep width, global reference, atom type  for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H

# the match tolerance, sweep width, global reference, atom type  for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13

[INPUT/c13.noesy]
# line above is the peak list file name

#it is a 4d noesy
dimension=4

IC=0
haveIC=0

#in d2o
waterFlag=0

#half-dwell sampling filter on
sign=1

# intensity is in column 7
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is in column 5
# x2 is in column 6
col.intensity=7
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
col.x2=6

# the match tolerance, sweep width, global reference, atom type  for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H

# the match tolerance, sweep width, global reference, atom type  for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H

# the match tolerance, sweep width, global reference, atom type  for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13

# the match tolerance, sweep width, global reference, atom type  for x2
x2.tol=0.5
x2.sw=20.7
x2.shift=0
x2.type=C13
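A control file with these three kinds of sections can be read with a few lines of Python, which is handy for catching a missing or mistyped keyword before a long run. This is a minimal sketch, not the parser AutoStructure uses. It assumes that any section whose name ends in "Command" holds shell command lines and that every other section holds key=value settings. Treating all four per-dimension keywords (tol, sw, shift, type) as required is also an assumption. The sample text is deliberately incomplete (x1.type is left out) to show the check.

```python
AXES = {2: ["hx1", "hx2"], 3: ["hx1", "hx2", "x1"], 4: ["hx1", "hx2", "x1", "x2"]}
PER_AXIS = ["tol", "sw", "shift", "type"]

SAMPLE = """\
[General]
proteinName=FGF
#max. number of bootstrapping cycles
nCycles=20
[hyperCommand]
#$ASBIN/hyper -N
[dyanaCommand]
$ASBIN/CreateProc FGF 14 4 10
[INPUT/n15.noesy]
dimension=3
col.hx1=3
col.hx2=4
col.x1=5
hx1.tol=0.05
hx1.sw=13.44
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=27.0
x1.shift=0
"""


def parse_control_file(text):
    """Return {section name: {"settings": {...}, "commands": [...]}}."""
    sections, name, current = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                            # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            current = sections.setdefault(name, {"settings": {}, "commands": []})
        elif current is None:
            raise ValueError("content before the first section: " + line)
        elif name.endswith("Command"):
            current["commands"].append(line)    # one shell command line
        else:
            key, _, value = line.partition("=")
            current["settings"][key.strip()] = value.strip()
    return sections


def missing_axis_keys(settings):
    """Keywords expected for the declared dimension but absent from the entry."""
    axes = AXES[int(settings["dimension"])]
    wanted = [f"{axis}.{key}" for axis in axes for key in PER_AXIS]
    wanted += [f"col.{axis}" for axis in axes]
    return [k for k in wanted if k not in settings]


control = parse_control_file(SAMPLE)
print(control["dyanaCommand"]["commands"])
print(missing_axis_keys(control["INPUT/n15.noesy"]["settings"]))
```

Because sections are keyed by name, two entries with the same file name would be merged by this sketch, so each peak list entry should have its own name.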


How to handle homo-dimer protein?
1) For dimeric proteins, pseudo linkers (i.e. PL, LL5 and LP) that connect the two chains must be added in order to run DYANA calculations. This is set up internally by AutoStructure to handle homodimer proteins. For a monomer, no change is needed.

Example:  GlyTM1bZip datasets

 
The three peak list entries below are, in order:
a traditional 3D n15-NOESY peak list (it has both inter-chain and intra-chain NOEs),
a traditional 3D c13-NOESY peak list (it has both inter-chain and intra-chain NOEs),
a 3D X-filtered C13-NOESY peak list (it has only inter-chain NOEs).
[INPUT/n15.noesy]
dimension=3
IC=0
haveIC=1

waterFlag=1
sign=0
 

col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4

hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H

hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H

x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=N15

[INPUT/c13.noesy]
dimension=3
IC=0
haveIC=1

waterFlag=1
sign=0
 

col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4

hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H

hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H

x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13

[INPUT/c13IC.noesy]
dimension=3

IC=1

waterFlag=1
sign=0
iperc=0.07

col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4

hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H

hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H

x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13
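The meaning of the IC and haveIC flags in the three entries above can be summarized in a small lookup. This is a minimal Python sketch of the rules stated in the PeakList keyword list; the function name is hypothetical.

```python
def noe_content(ic, have_ic=0):
    """Describe which NOEs a peak list is declared to contain."""
    if ic == 1:
        return "inter-chain only (X-filtered)"
    if have_ic == 1:
        return "intra-chain and inter-chain"
    return "intra-chain only"


for name, ic, have_ic in [("n15.noesy", 0, 1), ("c13.noesy", 0, 1), ("c13IC.noesy", 1, 0)]:
    print(name, "->", noe_content(ic, have_ic))
```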


How to understand the output_dir?

1. under the Output_dir directory:
 
*_NA.ovw general report about the AutoStructure calculation.
*_NA.sec information about the secondary structure analysis.
*_NA.note report of the preprocessing of the input files.
*_NA.exm complete report about the cycle0-0 analysis, explaining why each peak is assigned or not assigned.
*_NA.unassign peaks that are unassigned during validation cycles.
*.noise peaks that are excluded from the noesy_assign analysis (noise peaks).
*_NA.Val the file to check when you calculate the M score. It also provides a guide for chemical shift refinement. For atoms whose shifts consistently deviate by more than 0.3 ppm for C/N or 0.03 ppm for H, it is recommended to adjust them manually.
source a subdirectory of all input files used in this calculation. 
2. In each Structure Calculation cycle:
 
*.upl the upper limit distance constraints.
*.aco the angle constraints.
hbond.lol the h-bond lower limit distance constraints.
hbond.upl the h-bond upper limit distance constraints.
*.pdb the coordinates generated by DYANA by using above constraints.
*.ovw the overview file generated by DYANA.
*_assOrder reports the NOE assignments for each peak, ordered by intra, seq, mid and long-range.
*_assSparky reports the NOE assignments in Sparky format. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
*_match.gz  reports the details of the analysis process.
log* DYANA logs for structure calculation. 
3. Your final structure and constraints are in the last cycle.

4. Before a structure calculation cycle is finished, all constraint files and intermediate results are stored in the WorkingCycle directory.


How to measure M-score and average shift? --- The M score measures the input data quality. It is the fraction of the expected short-range two and three-bond connected NOEs that are not found in the peak list.  High M score indicates problems with resonance assignments, peak lists or global referencing.

    1. run `bin/autostructure -v -c control-file -o output_dir'
    2. check file  *_NA.Val in your output directory

    Note: The 1.0beta version only measures M score for 3D noesy.
 

    Example:  extracted from TMZIP_NA.Val report for TMZIP:

 
........... 
    # Summary for n15.noesy 
    Total simulated peaks: 44 Number of peaks NOT matched in the Peak List: 4 
    M score = 0.091 

    Average shift in HX: 0.001 Average ab-shift in HX: 0.0067(Assignment used: 39) 
    Average shift in X: 0.019 Average ab-shift in X: 0.075(Assignment used: 36) 
    Average shift in HX2: -0.018 Average ab-shift in HX2: 0.022(Assignment used: 40) 
    ........... 

    # Summary for c13.noesy 
    Total simulated peaks: 270 Number of peaks NOT matched in the Peak List: 83 
    M score = 0.31 

    Average shift in HX: 0.0061 Average ab-shift in HX: 0.0095(Assignment used: 100) 
    Average shift in X: 0.022 Average ab-shift in X: 0.083(Assignment used: 86) 
    Average shift in HX2: -0.0026 Average ab-shift in HX2: 0.014(Assignment used: 117) 
    ..............
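The M score values in this report follow directly from the two counts on the "Total simulated peaks" line: 4/44 = 0.091 for n15.noesy and 83/270 = 0.31 for c13.noesy. A minimal Python sketch that extracts them is shown below; the regular expression is written against the excerpt above and may need adjusting if other versions of the report differ.

```python
import re

SUMMARY_RE = re.compile(
    r"Total simulated peaks:\s*(\d+)\s+"
    r"Number of peaks NOT matched in the Peak List:\s*(\d+)"
)

REPORT = """\
    # Summary for n15.noesy
    Total simulated peaks: 44 Number of peaks NOT matched in the Peak List: 4
    M score = 0.091

    # Summary for c13.noesy
    Total simulated peaks: 270 Number of peaks NOT matched in the Peak List: 83
    M score = 0.31
"""


def m_score(total_simulated, not_matched):
    """Fraction of expected short-range NOEs not found in the peak list."""
    return not_matched / total_simulated


def m_scores_from_report(text):
    """Return {peak list name: M score} recomputed from the summary lines."""
    scores, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# Summary for "):
            name = line[len("# Summary for "):].strip()
            continue
        match = SUMMARY_RE.search(line)
        if match and name is not None:
            scores[name] = m_score(int(match.group(1)), int(match.group(2)))
    return scores


scores = m_scores_from_report(REPORT)
print({name: round(value, 3) for name, value in scores.items()})
```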


How to measure I, L scores? In general, the I and L scores should be similar to M scores. If I/L is significantly high, it is an indication that the fold is not right.

option: -q structure-file

run `bin/autostructure -c control-file -o output_dir  -q structure-file'
check file  *_NA.Ref in your output directory

Note: The 1.0beta version only measures I,L score for 3D noesy.


How to examine input data quality?

Things to check:
1. M score and average shift - the M score measures the input data quality. It is the fraction of the expected short-range two and three-bond connected NOEs that are not found in the peak list. A high M score indicates problems with resonance assignments, peak lists or relative global referencing. The average shift in each dimension should be close to 0 ppm. Otherwise global referencing is needed for that dimension. (see How to measure M-score and average shift?)
2. Most of the expected two, three and four-bond connected intra and close sequential NOEs should be assigned in CYCLE1-0. If this is not the case, it indicates a problem related to resonance assignments, peak lists or global referencing.
Quick ways to check the expected assignments:

1) Look at the *_NA.ovw file:

TMZIP example: from TMZIP_NA.ovw
 
 
PeakList n15.noesy: 

                                    #NOESY_Assign Process# 

    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak 
                 ------------- --------------- 
                 new   cum     new    cum 
    1-0   E      159   159     168    168     -                   222               13 
.......

This table shows that 159 peaks were given expected NOE assignments in CYCLE1-0. There are 38 residues, and for each residue the HN-HA(i,i), HN-HB(i,i), HA-HN(i,i+1) and HB-HN(i,i+1) NOEs are expected. The estimated total is 4*38 = 152, which is close to 159.

The same check applies to the c13-NOESY.
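The back-of-the-envelope estimate above is easy to repeat for another protein. This is a minimal Python sketch; the figure of four expected NOEs per residue is the rule of thumb from the text for an n15-NOESY, not a value computed by AutoStructure.

```python
def expected_backbone_noes(n_residues, per_residue=4):
    """Rough count of expected intra and sequential HN NOEs."""
    return per_residue * n_residues


estimate = expected_backbone_noes(38)
assigned_in_cycle_1_0 = 159          # from the TMZIP_NA.ovw table above
print(estimate, round(assigned_in_cycle_1_0 / estimate, 2))
```

A ratio near 1 suggests the expected assignments were found; a ratio far below 1 points to the resonance assignment, peak list or referencing problems described above.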

2) For details, look at the *_assOrder and *.upl files in CYCLE1-0/.


How to understand and examine the secondary structure file?

The secondary structure plays an important role in AutoStructure. We use the CSI method to identify the secondary structure elements. Those secondary structure elements are then refined based on the J, NH-slow and NOE data. The alignments between b-strands are also identified from NOE data. The file *_NA.sec reports the secondary structure information.

CSI Method --- check the consensus column. 1 -> helix, -1 -> sheet. These results should be generally consistent with your 3D structures, except for the N/C termini.

    TMZIP example: (from TMZIP_NA.sec file)
 
 ........ 
    Summary: 
    #       AA      HA      CA      CB      SUM     CONSENSUS 
    -       --      --      --      --      ---     --------- 
    1       G       0       -1      0       NA      0 
    2       A       0       0       0       NA      0 
    3       G       0       0       0       NA      0 
    4       S       0       0       1       NA      0 
    5       S       0       0       0       NA      0 
    6       S       0       0       0       NA      0 
    7       L       0       1       0       NA      1 
    8       E       -1      1       -1      NA      1 
    9       A       -1      1       -1      NA      1 
    10      V       0       1       0       NA      1 
    11      R       -1      1       0       NA      1 
    12      R       -1      1       0       NA      1 
    13      K       -1      1       0       NA      1 
    14      I       -1      1       0       NA      1 
    15      R       -1      1       0       NA      1 
    16      S       -1      1       0       NA      1 
    17      L       0       1       1       NA      1 
    18      Q       -1      1       -1      NA      1 
    19      E       -1      1       0       NA      1 
    20      Q       0       1       -1      NA      1 
    21      N       -1      1       0       NA      1 
    22      Y       -1      1       -1      NA      1 
    23      H       -1      1       -1      NA      1 
    24      L       -1      1       1       NA      1 
    25      E       -1      1       -1      NA      1 
 ........

The consensus column shows that residues 7 to 25 (and onward, in the rows omitted here) are helix.
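Reading long consensus columns by eye is error-prone, so the runs of equal values can be collected automatically. This is a minimal Python sketch that assumes the seven-column layout of the Summary table above; the label "other" for a consensus value of 0 is a naming choice made here, since the file only defines 1 and -1.

```python
LABELS = {1: "helix", -1: "sheet", 0: "other"}

SUMMARY = """\
    #       AA      HA      CA      CB      SUM     CONSENSUS
    -       --      --      --      --      ---     ---------
    5       S       0       0       0       NA      0
    6       S       0       0       0       NA      0
    7       L       0       1       0       NA      1
    8       E       -1      1       -1      NA      1
    9       A       -1      1       -1      NA      1
"""


def parse_summary(text):
    """Return [(residue number, consensus value)] from the Summary rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].isdigit():
            rows.append((int(fields[0]), int(fields[6])))
    return rows


def consensus_segments(rows):
    """Group consecutive residues with the same consensus value."""
    segments = []
    for number, state in rows:
        if segments and segments[-1][0] == state and segments[-1][2] == number - 1:
            segments[-1] = (state, segments[-1][1], number)
        else:
            segments.append((state, number, number))
    return [(LABELS[state], first, last) for state, first, last in segments]


print(consensus_segments(parse_summary(SUMMARY)))
```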

Combined analysis of CSI, NOE, Jval and/or NH -- the N/C termini of secondary structure elements may be extended or shortened relative to the CSI results.

    Alignments between b-strands are reported here.
    1) all possible registers and their scores.

    FGF Example:
            ** Poss:  Register for Antiparallel Strands 10-12 Found : 63 75 score: 8

    2) Validation -- registers that are inconsistent with other registers are removed. The final self-consistent registers define the b-sheets of the protein.

Final secondary structure and AntiParallel (or Parallel) Strands Registers - These are used by AutoStructure to rule in and rule out possible assignments. Expected NOEs from the secondary structures are assigned, and NOEs inconsistent with the secondary structures are ruled out. Long-range cross-strand NOEs that are consistent with the alignments are assigned. Cross-strand and helical H-bonds are also added to the structure calculation.

The registers are identified based on the frequency (total number of expected long-range NOEs) found in the peak list. When there is a lot of noise in the peak list, noise can increase the frequency of an incorrect register and the strands can then be aligned in a wrong way. In this case, it is necessary to manually check all the assigned long-range NOEs related to each register. We have noticed that noise in the HN-HN region disturbs the results. A clean HN-HN region helps a lot. Other regions such as HA-HA and HA-HN are also important.

*** Warning: if the alignment is wrong, you will definitely get a wrong fold. ***


How to run AutoStructure in a practical way?

First, we set nCycles=2 in the control-file, run AutoStructure and examine:
1) the input data quality of the resonance assignments, peak lists and referencing (see How to examine input data quality?).
2) the secondary structure results (see How to understand and examine the secondary structure file?)
3) peaks unassigned in CYCLE1-1/, CYCLE1-2/, ...
If many peaks are unassigned in CYCLE1-1, CYCLE1-2, ... (see the *_NA.ovw file), it is an indication that there is a problem. Usually just a subset of the long-range constraints is causing the problem. Conversely, if only a few peaks are unassigned, the assignments from CYCLE1-0 are self-consistent and it is OK to use the initial model for the iterative calculations.

Ways to examine number of assigned and unassigned peaks in CYCLE1-0, 1-1, ...

Things to check:
1)  *_NA.unassign - This file lists all peaks that are unassigned.
2)  *_assOrder in CYCLE1-0/ - This file lists all peaks that are assigned.
3)  *_assSparky in CYCLE1-0/ - This is a peak list file with assignments from AutoStructure. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
4)  When the resonance assignments are incomplete, NOE assignments made by the Unique method or the SYM method may be incorrect. Many intra peaks may be given long-range assignments because one of the real resonances is not assigned. In many cases, long-range NOE assignments that were unassigned turned out to be intra assignments to one of the unassigned atoms.

    Example 1: An Example that indicates there is a problem (from old RBFA data set):
 
......
PeakList hrnoesy3: 
                                    #NOESY_Assign Process# 

    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak 
                 ------------- --------------- 
                 new   cum     new    cum 
    1-0   E      301   301     307    307      -                  8367              5708 
    1-0   U      22    323     22     329      -                  8345              5708 
    1-0   SYM    40    363     40     369      -                  8305              5708 
    1-0   E      5     368     7      376      -                  8300              5708 
    1-0   CF     3     371     3      379      -                  8297              5708 
    1-1   VIO    -     333     -      341      38                 8335              5708 
    2-0   PDB    430   763     444    785      -                  7905              5708 
.......

There are 38 peaks unassigned before CYCLE2-0.

 Example 2: After we made more resonance assignments and looked carefully at the NOESY crosspeaks corresponding to ALL (violated and satisfied) long-range constraints using SPARKY ("guided peak list editing"), the report for the RBFA data set looks like this:
 
......
PeakList c13_0928_3: 

                                    #NOESY_Assign Process# 

    Cycle Method Peak_Assigned Assignment_Made Assignment_Removed Peak_Not_Assigned Noise_Peak 
                 ------------- --------------- 
                 new   cum     new    cum 
    1-0   E      886   886     948    948      -                  2229              621 
    1-0   U      9     895     9     957      -                  2220              621 
    1-0   SYM    70    965     70     1027     -                  2150              621 
    1-0   E      45    1010    59     1086     -                  2105              621 
    1-0   CF     4     1014    4      1090     -                  2101              621 
    1-1   VIO    -     1011    -      1087     3                  2104              621 
    2-0   PDB    542   1553    695    1782     -                  1562              621
......

  Now, there are more peaks assigned and much fewer unassigned.
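The two reports can be compared by the share of assigned peaks that the validation cycle removed: 38 of 371 (about 10%) in Example 1 against 3 of 1014 (about 0.3%) in Example 2. This is a minimal Python sketch that reads those numbers from the table. It assumes the nine-column row layout shown above and that validation rows carry the method label VIO, both inferred from the excerpts.

```python
EXAMPLE_1 = """\
    1-0   CF     3     371     3      379      -                  8297              5708
    1-1   VIO    -     333     -      341      38                 8335              5708
    2-0   PDB    430   763     444    785      -                  7905              5708
"""

EXAMPLE_2 = """\
    1-0   CF     4     1014    4      1090     -                  2101              621
    1-1   VIO    -     1011    -      1087     3                  2104              621
"""


def validation_removals(text):
    """Return {cycle: (peaks removed, fraction of assigned peaks removed)}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[1] == "VIO":
            removed = int(fields[6])            # Assignment_Removed column
            still_assigned = int(fields[3])     # cumulative peaks assigned
            result[fields[0]] = (removed, removed / (still_assigned + removed))
    return result


for label, table in (("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)):
    for cycle, (removed, fraction) in validation_removals(table).items():
        print(label, cycle, removed, round(fraction, 3))
```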

4) This calculation may need to be re-run several times. Once we are happy with the input data and the initial model, we set nCycles=20, run it and examine the results from the last cycle. If the final results are not good, better resonance assignments and better peak lists are needed.
Things to check:
1) percent of peaks assigned in the last cycle (reported in the *_NA.ovw file). We think that at least 60% of peaks should be assigned for c13-NOESY and ~70% of peaks should be assigned for n15-NOESY.

Example:
 
    ..... 
    PeakList c13_0928_3: 

    ...... 
                                       #Peak-Assignment Stat# 

    Cycle Total_Assignable_Peak Unambiguous_Assigned Ambiguous_Assigned Total_Assigned Peak_Not_Assigned 
          --------------------- -------------------- ------------------ -------------- ----------------- 
          #         %           #         %          #        %         #      %       #       % 
    1-0   2494      100         948       0.38       66       0.03      1014   0.41    1480    0.59 
    1-1   2494      100         945       0.38       66       0.03      1011   0.41    1483    0.59 
    2-0   2494      100         1356      0.54       206      0.08      1562   0.63    932     0.37 
    2-1   2494      100         1329      0.53       196      0.08      1525   0.61    969     0.39 
    ...... 
    20-0  2494      100         1390      0.56       412      0.17      1802   0.72    692     0.28 
    20-1  2494      100         1391      0.56       411      0.16      1802   0.72    692     0.28 
    20-2  2494      100         1391      0.56       411      0.16      1802   0.72    692     0.28 
    ......

For this data set, about 72% of peaks are assigned.
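The 72% figure is Total_Assigned divided by Total_Assignable_Peak in the last cycle (1802/2494). A minimal Python sketch of that check against the guideline values quoted above follows; the dictionary keys "c13" and "n15" are labels chosen here, not AutoStructure keywords.

```python
# Guideline fractions from the text: at least 60% for c13-NOESY, ~70% for n15-NOESY.
GUIDELINE = {"c13": 0.60, "n15": 0.70}


def fraction_assigned(total_assignable, total_assigned):
    """Fraction of assignable peaks that carry an assignment."""
    return total_assigned / total_assignable


def meets_guideline(kind, total_assignable, total_assigned):
    """Compare the assigned fraction with the guideline for this spectrum type."""
    return fraction_assigned(total_assignable, total_assigned) >= GUIDELINE[kind]


# Cycle 20-2 and cycle 1-0 of the c13_0928_3 table above.
print(round(fraction_assigned(2494, 1802), 2), meets_guideline("c13", 2494, 1802))
print(round(fraction_assigned(2494, 1014), 2), meets_guideline("c13", 2494, 1014))
```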

2) quality-factors - I,L scores. In general, the I and L scores should be similar to M scores. If I/L is significantly high, it is an indication that the fold is not right.
3) number of constraints per residue. On average, we expect at least 15 conformational constraints for each residue.
 


can not open input file OUTPUT2/Workingcycle/*.pdb

During structure calculation, all constraints are stored in the Workingcycle directory. There are 14 log files in that directory, corresponding to the 14 calculations on the farm. They all look like this:

     DYANA, version 1.5 (gnu, double precision)

     Copyright (c) 1996-98 ETH Zurich

     dyana> dyana> dyana>   - calc_para1: readdata IL13
         Library file "/usr/local/dyana-1.5/lib/lib/dyana.lib" read, 54 residue types.
         Sequence file "IL13.seq" read, 113 residues.
     *** ERROR: Illegal atom name "S" for residue CYS 29.
     dyana>

This means that the atom name S in your manual constraints is not supported by DYANA. We think the atom name for DYANA is SG.
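When a cycle fails like this, scanning all DYANA logs for error lines is quicker than opening them one by one. This is a minimal Python sketch that assumes errors are printed as "*** ERROR: ..." as in the log above and that the log files match the log* pattern listed in the output_dir section. The directory and file written in the demo are stand-ins.

```python
import re
import tempfile
from pathlib import Path

ERROR_RE = re.compile(r"^[ \t]*\*\*\* ERROR:[ \t]*(.*)$", re.MULTILINE)

SAMPLE_LOG = """\
     dyana> dyana> dyana>   - calc_para1: readdata IL13
         Sequence file "IL13.seq" read, 113 residues.
     *** ERROR: Illegal atom name "S" for residue CYS 29.
     dyana>
"""


def dyana_errors(log_text):
    """Return the message of every '*** ERROR:' line in a DYANA log."""
    return [message.strip() for message in ERROR_RE.findall(log_text)]


def scan_logs(directory):
    """Return {log file name: [error messages]} for logs that contain errors."""
    found = {}
    for path in sorted(Path(directory).glob("log*")):
        errors = dyana_errors(path.read_text(errors="replace"))
        if errors:
            found[path.name] = errors
    return found


print(dyana_errors(SAMPLE_LOG))

# Demo on a stand-in working directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "log1").write_text(SAMPLE_LOG)
    (Path(tmp) / "log2").write_text("     dyana>\n")
    print(scan_logs(tmp))
```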




Thanks

We express our thanks to Dr. Keith L Constantine and Dr. Robert Powers for their comments and input on AutoStructure.
 



yphuang@cabm.rutgers.edu