Update the following environment variables defined in the script file 'bin/autostructure':
RASMOL       rasmol command (viewer command, optional)
DYANA        dyana command
DQS          command to run parallel calculation (optional)
ASBIN        path of the AutoStructure bin directory
NALIB        path of the NOESY_Assign library
AyudaPATH    path of the PDBSTAT library (optional)
Example: (the paths on the right-hand side need to be changed for your system)
#!/bin/sh
# rasmol command
RASMOL=/usr/bin/X11/rasmol
# dyana command
DYANA=/usr/local/bin/dyana
# command to run parallel calculation
DQS=/usr/local/DQS/bin/qsub332
# path of AutoStructure bin
ASBIN=/farm/software/AutoStructure-1.0beta/bin
# path of Noesy_assign lib
NALIB=/farm/software/AutoStructure-1.0beta/noesy_assign-1.0beta/Lib
# path of PDBSTAT lib
AyudaPATH=/farm/software/AutoStructure-1.0beta/PDBSTAT/Lib
# export
export NALIB
export AyudaPATH
export ASBIN
export DQS
export DYANA
export RASMOL
# start with noesy_assign
$ASBIN/noesy_assign $*
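Before the first run it can help to confirm that the paths set in the script exist on your machine. The following is a minimal Python sketch, not part of AutoStructure: it assumes the script uses plain NAME=/path assignments as in the example above, and the function names are hypothetical.

```python
import os
import re

# Matches plain shell assignments such as DYANA=/usr/local/bin/dyana
ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(\S+)$")


def read_assignments(script_text):
    """Return {name: value} for NAME=value lines, skipping comments."""
    values = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def missing_paths(values, optional=("RASMOL", "DQS", "AyudaPATH")):
    """List non-optional variables whose path does not exist."""
    return [name for name, path in values.items()
            if name not in optional and not os.path.exists(path)]
```

For example, `missing_paths(read_assignments(open("bin/autostructure").read()))` returns the names of required variables that still point to a non-existent location.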
'/DQS/bin/qstat -f' shows all jobs in the queue system. This command will show you a list like this:
Queue Name Queue Type Quan Load State
---------- ---------- ---- ---- -----
bmw batch 1/1 1.01 er UP
roberto CalcStrBRCT SmtBRCT5.sh bmw 82863 04/10/102 17:52:13
broccoli batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT16.s broccoli 82852 04/10/102 17:52:13
cabbage batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT3.sh cabbage 82865 04/10/102 17:52:14
carrot batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT2.sh carrot 82866 04/10/102 17:52:14
corn batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT10.s corn 82858 04/10/102 17:52:13
cucumber batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT14.s cucumber 82854 04/10/102 17:52:13
eggplant batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT12.s eggplant 82856 04/10/102 17:52:13
falcon batch 0/1 0.00 er UP
ferrari batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT15.s ferrari 82853 04/10/102 17:52:13
garlic batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT8.sh garlic 82860 04/10/102 17:52:13
jaguar batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT4.sh jaguar 82864 04/10/102 17:52:14
lettuce batch 1/1 1.00 er UP
roberto CalcStrBRCT SmtBRCT1.sh lettuce 82867 04/10/102 17:52:14
lotus batch 1/1 1.01 er UP
roberto CalcStrBRCT SmtBRCT9.sh lotus 82859 04/10/102 17:52:13
olive batch 0/1 0.00 er UP
onion batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT11.s onion 82857 04/10/102 17:52:13
porsche batch 1/1 1.10 er UP
roberto CalcStrBRCT SmtBRCT7.sh porsche 82861 04/10/102 17:52:13
potato batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT13.s potato 82855 04/10/102 17:52:13
spinach batch 0/1 0.00 er UP
squash batch 0/1 0.00 er UP
tarzan batch 0/1 0.07 er UP
tomato batch 1/1 0.96 er UP
roberto CalcStrBRCT SmtBRCT6.sh tomato 82862 04/10/102 17:52:13
The first column lists all computers in the queue system (the farm). In this example, roberto (account name) is running 16 calculations on the farm.
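The count of running calculations can also be taken from the listing automatically. This is a small Python sketch under the assumption that job lines have the seven-field layout seen above (user, two name fields, host, numeric job id, date, time); the function name is hypothetical.

```python
from collections import Counter


def jobs_per_user(qstat_text):
    """Count job lines per account name in 'qstat -f' style output."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Assumed job-line layout: user, name, script, host, job id, date, time.
        # Queue lines have six fields and the header has no numeric job id.
        if len(fields) == 7 and fields[4].isdigit():
            counts[fields[0]] += 1
    return counts
```

Applied to the listing above, the counter should report 16 jobs for roberto, one per busy machine.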
control-file Example 1
1. General Section: the first part of the control-file is the General section. This section gives the name of the protein and the names of the sequence file, resonance assignment file, J-list file and NH slow exchange file; basically all input files except peak lists. When manual analysis results such as upper limit distance constraints, dihedral angle constraints and h-bonds are available, these constraints may be added and used in the structure calculation.
Keywords:
proteinName          the name of the protein
seqFile              the sequence file in bmrb format
chemicalShiftFile    the resonance assignment file in bmrb format
JListFile            J-List file (optional) in bmrb format
NHSlowList           NH slow exchange file (optional). AutoStructure uses slow amide exchange data to determine secondary structures and identify hydrogen bonds.
ACO                  dihedral angle constraint file (optional) in DYANA format
HBOND                h-bonds file (optional)
UPL                  upper-limit distance constraint file (optional) in DYANA format
par                  parameter file (optional, for advanced usage)
nCycles              the maximum number of bootstrapping cycles. AutoStructure will stop when no more assignments can be made or after nCycles of bootstrapping. When AutoStructure stops normally, it prints 'The program is finished.' at the end of the *_NA.ovw file. Sometimes, when the queue system is unstable, the calculation may be stopped by the queue system or hang forever. In these cases, 'The program is finished.' will not be printed in the *_NA.ovw file.
CYCLE*-0 is a bootstrapping cycle. In this cycle, new assignments are made based on the 3D structures from the last iteration. The maximum number of bootstrapping cycles is specified by nCycles.
CYCLE*-1 (or CYCLE*-2, etc) is a validation cycle. In this cycle, assignments that are consistently and significantly violated in the 3D structures from the last iteration are unassigned, and the 3D structures are re-computed. No new assignments are made in a validation cycle. There is no limit on the number of validation cycles.
Example 1: (the values need to be changed for your data set)
[General]
proteinName=TMZIP
# input files except peak lists
seqFile=INPUT/sequence.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrb
JListFile=INPUT/TMZIP.Jvalloose
# you can comment this next line out if you want to use the default one
par=INPUT/par.tbl
# max. number of bootstrapping cycles
nCycles=20
Example 2:
[General]
proteinName=FGF
# input files except peak lists
seqFile=INPUT/seq.bmrb
chemicalShiftFile=INPUT/chemicalshift.bmrbStereo
JListFile=INPUT/FGFJval.bmrb
NHSlowList=INPUT/NHSlowList
ACO=INPUT/FGF.acoManual
HBOND=INPUT/hbond.dyaManual
UPL=INPUT/FGF.uplManual
# you can comment this next line out if you want to use the default one
par=INPUT/par.tbl
# max. number of bootstrapping cycles
nCycles=20
2. Command Section: the second part of the control-file is the command section. All the command scripts are in the $ASBIN directory. There can only be one line after each command entry; it is treated as a shell command line and can be commented out if it is not used.
Keywords:
viewerCommand    View is a script to run the rasmol command (viewer command, optional).
hyperCommand     hyper command (optional). All hyper options can be added after -N.
dyanaCommand     CreateProc is a script to run parallel DYANA computing over the DQS system. `CreateProc TMZIP x y z' means calculating x*y structures using x machines and selecting the best z. On each machine, y structures are calculated.
                 CreateProcOne is a script that uses one cpu; no DQS system is required. `CreateProcOne TMZIP 1 x y' means calculating x structures on one machine and selecting the best y.
                 Example:
                 `CreateProc TMZIP 14 4 10' means calculating 14*4 structures using 14 machines and selecting the best 10. On each machine, 4 structures are calculated.
                 `CreateProcOne TMZIP 1 10 5' means calculating 10 structures on one machine and selecting the best 5.
cnsCommand       a script to run the cns command (under development, coming soon).
Example 1:
# there can only be one line after each command entry which is used as a shell command line and can be commented out.
# viewerCommand: for demo and view the 3d using rasmol
# hyperCommand: call for hyper
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
$ASBIN/View
[hyperCommand]
$ASBIN/hyper -N
[dyanaCommand]
$ASBIN/CreateProc TMZIP 14 4 10
Example 2:
# there can only be one line after each command entry which is treated as a shell command line and can be commented out.
# here only dyanaCommand is active
# dyanaCommand: calc structures on 14 machines, each calc 4 and select best 10
[viewerCommand]
#$ASBIN/View
[hyperCommand]
#$ASBIN/hyper -N
[dyanaCommand]
$ASBIN/CreateProc FGF 14 4 10
3. PeakList Section: Each peak list is an entry in the control file.
Keywords:
(The examples that follow this keyword list are peak list entries for a monomer protein; their values need to be changed for your data set.)
dimension    the dimension of the peak list
             dimension = 2 means that the peak list has hx1 and hx2 dimensions.
             dimension = 3 means that the peak list has hx1, x1 and hx2 dimensions.
             dimension = 4 means that the peak list has hx1, x1, hx2 and x2 dimensions. Only 4D CC-NOESY is supported right now.
IC, haveIC   for a monomer: IC = 0, haveIC = 0
             for a homo-dimer (for details see the next section, How to handle homo-dimer protein?):
             IC = 0, haveIC = 0 means the NOESY peak list has only intra-chain NOEs.
             IC = 0, haveIC = 1 means the NOESY peak list has both intra-chain and inter-chain NOEs.
             IC = 1 means the NOESY peak list has only inter-chain NOEs. It is an X-filtered experiment.
waterFlag    if in water solution, waterFlag = 1;
             if in D2O solution, waterFlag = 0;
sign         sign=1 tells the program to use half-dwell sampling in C/N to filter the list of possible assignments.
             sign=0 tells the program that no half-dwell sampling filter is applied.
             Both positive and negative peaks are considered in both cases.
             The current version only supports the C/N half-dwell sampling filter in 3D spectra. The coming version (not yet released) is different: it supports the half-dwell sampling filter in any H/C/N dimension.
iperc        the noise level = highest intensity in the peak list * iperc.
             All peaks below this noise level are not assigned by the Unique method.
             If not specified, the default value is used.
column       col.intensity   the intensity column
             col.label       the label column. Label is a comment string column for the user. The user can write any string in that column, such as NOE assignments. This column is read in but not used by NOESY_Assign.
             col.id          the id column
             col.hx1         the hx1 column
             col.hx2         the hx2 column
             col.x1          the x1 column (not used for 2D NOESY, i.e. if dimension = 2); hx1 --> x1
             col.x2          the x2 column (not used for 2D or 3D NOESY, i.e. if dimension = 2 or 3); hx2 --> x2
tol          hx1.tol, hx2.tol, x1.tol, x2.tol
             Match tolerance for the hx1, hx2, x1 and x2 dimensions in ppm.
             x1.tol and x2.tol are not used for 2D NOESY.
             x2.tol is not used for 3D NOESY.
sw           hx1.sw, hx2.sw, x1.sw, x2.sw
             Sweep width for the hx1, hx2, x1 and x2 dimensions in ppm.
             It is used to determine all possible aliased chemical shift positions. In the 1.0beta version, only C/N aliasing is supported. In the coming version, H aliasing is also supported.
             The program may run faster when a large sweep width is given for an unaliased dimension, such as sw=1000 or 10000.
shift        hx1.shift, hx2.shift, x1.shift, x2.shift
             `shift' is used to do global referencing.
             If x1.shift=0.1, then 0.1 ppm is added to all chemical shifts in the x1 dimension.
             If your spectrum is well referenced with your resonance assignments, set all shift=0.
type         hx1.type, hx2.type, x1.type, x2.type
             Atom type for the hx1, hx2, x1 and x2 dimensions.
             type = H for proton
             type = N15 for nitrogen
             type = C13 for carbon
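The iperc, shift and sw keywords each describe a simple calculation. The Python sketch below illustrates them; it is not taken from NOESY_Assign. The function names are hypothetical, and the aliasing helper assumes that an aliased peak appears displaced by an integer multiple of the sweep width, which is an assumption of this sketch rather than a statement about the program's internals.

```python
def noise_level(intensities, iperc):
    """Noise level = highest intensity in the peak list * iperc."""
    return max(intensities) * iperc


def apply_global_shift(shifts_ppm, shift):
    """Global referencing: add the same offset to every chemical shift."""
    return [value + shift for value in shifts_ppm]


def aliased_positions(shift_ppm, sweep_width_ppm, max_folds=1):
    """Candidate positions shift + k * sw for k = -max_folds..max_folds.

    Assumption: aliasing displaces a peak by whole sweep widths.
    """
    return [shift_ppm + k * sweep_width_ppm
            for k in range(-max_folds, max_folds + 1)]
```

With a very large sweep width such as sw=1000, every aliased candidate falls far outside the chemical shift range, which is in line with the remark that a large sw can be given for an unaliased dimension.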
The four peak list entries below are, in order: 2D NOESY, 3D N15-NOESY, 3D C13-NOESY and 4D CC-NOESY.
[INPUT/2d.noesy]
# line above is the peak list file name
# it is a 2d noesy
dimension=2
IC=0
haveIC=0
# in h2o
waterFlag=1
# half-dwell sampling filter off
sign=0
# intensity is in column 5
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
col.intensity=5
col.label=2
col.id=1
col.hx1=3
col.hx2=4
# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.03
hx1.sw=1000
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
[INPUT/n15.noesy]
# line above is the peak list file name
# it is a 3d noesy
dimension=3
IC=0
haveIC=0
# in h2o
waterFlag=1
# half-dwell sampling filter off
sign=0
# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is in column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=13.44
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=27.0
x1.shift=0
x1.type=N15

[INPUT/c13.noesy]
# line above is the peak list file name
# it is a 3d noesy
dimension=3
IC=0
haveIC=0
# in d2o
waterFlag=0
# half-dwell sampling filter on
sign=1
# intensity is in column 6
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is in column 5
col.intensity=6
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13

[INPUT/c13.noesy]
# line above is the peak list file name
# it is a 4d noesy
dimension=4
IC=0
haveIC=0
# in d2o
waterFlag=0
# half-dwell sampling filter on
sign=1
# intensity is in column 7
# label is in column 2
# id is in column 1
# hx1 is in column 3
# hx2 is in column 4
# x1 is in column 5
# x2 is in column 6
col.intensity=7
col.label=2
col.id=1
col.hx1=3
col.hx2=4
col.x1=5
col.x2=6
# the match tolerance, sweep width, global reference, atom type for hx1
hx1.tol=0.05
hx1.sw=9.16
hx1.shift=0
hx1.type=H
# the match tolerance, sweep width, global reference, atom type for hx2
hx2.tol=0.05
hx2.sw=1000
hx2.shift=0
hx2.type=H
# the match tolerance, sweep width, global reference, atom type for x1
x1.tol=0.5
x1.sw=20.7
x1.shift=0
x1.type=C13
# the match tolerance, sweep width, global reference, atom type for x2
x2.tol=0.5
x2.sw=20.7
x2.shift=0
x2.type=C13
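The control-file layout described above (bracketed entries, key=value lines, '#' comments, and a single shell command line under each command entry) is simple enough to check with a few lines of code. The following Python sketch is a hypothetical reader written for illustration only; it is not the parser used by AutoStructure, and the "command" key is an invention of this sketch.

```python
def parse_control_file(text):
    """Return {entry_name: {key: value}} for a control-file string.

    A non-comment line without '=' is stored under the hypothetical
    key "command" (the single shell command line of a command entry).
    Note: entries that share a name overwrite each other here.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
        elif current is not None:
            sections[current]["command"] = line
    return sections
```

A quick use is to confirm that every peak list entry defines the keys its dimension requires, for example that a 4D entry has x2.tol, x2.sw, x2.shift and x2.type.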
1) For dimeric proteins, pseudo linkers (i.e. PL, LL5 and LP) that connect the two chains must be added in order to run dyana calculations. This is set up internally by AutoStructure to handle homodimer proteins. For a monomer, no change is needed.
Example: GlyTM1bZip datasets. The three peak list entries below are, in order:
  traditional 3D n15-NOESY peak list (it has both inter- and intra-chain NOEs)
  traditional 3D c13-NOESY peak list (it has both inter- and intra-chain NOEs)
  3D X-filtered C13-NOESY peak list (it has only inter-chain NOEs)
[INPUT/n15.noesy]
dimension=3
IC=0
haveIC=1
waterFlag=1
sign=0
col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=N15

[INPUT/c13.noesy]
dimension=3
IC=0
haveIC=1
waterFlag=1
sign=0
col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13

[INPUT/c13IC.noesy]
dimension=3
IC=1
waterFlag=1
sign=0
iperc=0.07
col.intensity=5
col.label=6
col.id=1
col.hx1=2
col.hx2=3
col.x1=4
hx1.tol=0.05
hx1.sw=10000
hx1.shift=0
hx1.type=H
hx2.tol=0.05
hx2.sw=10000
hx2.shift=0
hx2.type=H
x1.tol=0.5
x1.sw=10000
x1.shift=0
x1.type=C13
1. Under the Output_dir directory:
*_NA.ovw        general report about the AutoStructure calculation.
*_NA.sec        information about the secondary structure analysis.
*_NA.note       report of the preprocessing of the input files.
*_NA.exm        complete report about the cycle0-0 analysis, providing information about why each peak is assigned or not assigned.
*_NA.unassign   peaks that were unassigned during validation cycles.
*.noise         peaks that are excluded from the noesy_assign analysis (noise peaks).
*_NA.Val        if you calculate the M-score, *_NA.Val is the file to check. It also provides a guide for chemical shift refinement. For atoms whose shifts are consistently off by more than 0.3 ppm for C/N or 0.03 ppm for H, it is recommended to adjust them manually.
source          a subdirectory with all input files used in this calculation.
2. In each structure calculation cycle:
*.upl           the upper limit distance constraints.
*.aco           the angle constraints.
hbond.lol       the h-bond lower limit distance constraints.
hbond.upl       the h-bond upper limit distance constraints.
*.pdb           the coordinates generated by DYANA using the above constraints.
*.ovw           the overview file generated by DYANA.
*_assOrder      reports the NOE assignments for each peak, ordered by intra, seq, mid and long-range.
*_assSparky     reports the NOE assignments in Sparky format. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
*_match.gz      reports the details of the analysis process.
log*            DYANA logs for the structure calculation.
3. Your final structure and constraints are in the last cycle.
4. Before a structure calculation cycle is finished, all constraint files and intermediate results are stored in the WorkingCycle directory.
Note: The 1.0beta version only measures the M score for 3D noesy.
Example: extracted from TMZIP_NA.Val report for TMZIP:
...........
# Summary for n15.noesy
Total simulated peaks: 44 Number of peaks NOT matched in the Peak List: 4
M score = 0.091
Average shift in HX: 0.001 Average ab-shift in HX: 0.0067 (Assignment used: 39)
Average shift in X: 0.019 Average ab-shift in X: 0.075 (Assignment used: 36)
Average shift in HX2: -0.018 Average ab-shift in HX2: 0.022 (Assignment used: 40)
...........
# Summary for c13.noesy
Total simulated peaks: 270 Number of peaks NOT matched in the Peak List: 83
M score = 0.31
Average shift in HX: 0.0061 Average ab-shift in HX: 0.0095 (Assignment used: 100)
Average shift in X: 0.022 Average ab-shift in X: 0.083 (Assignment used: 86)
Average shift in HX2: -0.0026 Average ab-shift in HX2: 0.014 (Assignment used: 117)
..............
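The M scores in this report agree with the ratio of unmatched to simulated peaks (4/44 = 0.091 and 83/270 = 0.31). The Python sketch below recomputes that ratio from the summary lines; the regular expression is tied to the wording of the report shown above, and the function name is hypothetical.

```python
import re

# Matches the two counts on a "Total simulated peaks" summary line.
SUMMARY = re.compile(
    r"Total simulated peaks:\s*(\d+)\s+"
    r"Number of peaks NOT matched in the Peak List:\s*(\d+)"
)


def m_scores(report_text):
    """Return unmatched/total for each summary line, in report order."""
    return [int(missed) / int(total)
            for total, missed in SUMMARY.findall(report_text)]
```

Comparing the recomputed values with the printed "M score" lines is a quick sanity check that the report was read for the intended peak list.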
option: -q structure-file
run `bin/autostructure -c control-file -o output_dir -q structure-file'
check file *_NA.Ref in your output directory
Note: The 1.0beta version only measures the I,L scores for 3D noesy.
Things to check:
1. M score and average shift - the M score measures the input data quality. It is the fraction of the expected short-range two- and three-bond connected NOEs that are not found in the peak list. A high M score indicates a problem with resonance assignments, peak lists or relative global referencing. The average shift in each dimension should be close to 0 ppm. Otherwise global referencing is needed for that dimension. (see How to measure M-score and average shift?)
2. Most of the expected two-, three- and four-bond connected intra and close sequential NOEs should be assigned in CYCLE1-0. If this is not the case, it indicates a problem related to resonance assignments, peak lists or global referencing.
Quick ways to check the expected assignments:
1) Look at the *_NA.ovw file:
TMZIP example: from TMZIP_NA.ovw
PeakList n15.noesy:
#NOESY_Assign Process#
Cycle  Method  Peak_Assigned  Assignment_Made  Assignment_Removed  Peak_Not_Assigned  Noise_Peak
               -------------  ---------------
               new    cum     new    cum
1-0    E       159    159     168    168       -                   222                13
.......
This table shows that there are 159 expected NOE assignments. There are 38 residues, and for each residue the HN-HA(i,i), HN-HB(i,i), Ha-HN(i,i+1) and Hb-HN(i,i+1) NOEs are expected. The expected total is 4*38 = 152, which is close to 159. The same check applies to c13-NOESY.
2) For details, look at the *_assOrder and *.upl files in CYCLE1-0/.
The secondary structure plays an important role in AutoStructure. We use the CSI method to identify the secondary structure elements. Those secondary structure elements are then refined based on the J, NH-slow and NOE data. The alignments between b-strands are also identified from NOE data. The file *_NA.sec reports the secondary structure information.
CSI Method --- check the consensus column. 1 -> helix, -1 -> sheet. These results should be generally consistent with your 3D structures, except for the N/C termini.
TMZIP example: (from TMZIP_NA.sec file)
........
Summary:
# AA HA CA CB SUM CONSENSUS
- -- -- -- -- --- ---------
1 G 0 -1 0 NA 0
2 A 0 0 0 NA 0
3 G 0 0 0 NA 0
4 S 0 0 1 NA 0
5 S 0 0 0 NA 0
6 S 0 0 0 NA 0
7 L 0 1 0 NA 1
8 E -1 1 -1 NA 1
9 A -1 1 -1 NA 1
10 V 0 1 0 NA 1
11 R -1 1 0 NA 1
12 R -1 1 0 NA 1
13 K -1 1 0 NA 1
14 I -1 1 0 NA 1
15 R -1 1 0 NA 1
16 S -1 1 0 NA 1
17 L 0 1 1 NA 1
18 Q -1 1 -1 NA 1
19 E -1 1 0 NA 1
20 Q 0 1 -1 NA 1
21 N -1 1 0 NA 1
22 Y -1 1 -1 NA 1
23 H -1 1 -1 NA 1
24 L -1 1 1 NA 1
25 E -1 1 -1 NA 1
........
The consensus column shows that residues 7 through 25 (and beyond, in the part not shown) are helical.
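Reading helix or sheet segments off the consensus column can be automated. This Python sketch assumes the seven-column summary layout shown above (residue number, amino acid, HA, CA, CB, SUM, CONSENSUS); the function names are hypothetical and the code is not part of AutoStructure.

```python
def parse_csi_rows(summary_text):
    """Return [(residue_number, consensus)] from CSI summary lines."""
    rows = []
    for line in summary_text.splitlines():
        fields = line.split()
        # Data rows start with a residue number; header rows do not.
        if len(fields) == 7 and fields[0].isdigit():
            rows.append((int(fields[0]), int(fields[-1])))
    return rows


def consensus_segments(rows, value=1):
    """Group consecutive residues whose consensus equals value.

    value=1 selects helix, value=-1 selects sheet.
    Returns a list of (first_residue, last_residue) tuples.
    """
    segments = []
    start = prev = None
    for number, consensus in rows:
        if consensus == value:
            if start is None or number != prev + 1:
                if start is not None:
                    segments.append((start, prev))
                start = number
            prev = number
        elif start is not None:
            segments.append((start, prev))
            start = prev = None
    if start is not None:
        segments.append((start, prev))
    return segments
```

For the 25 residues listed above, `consensus_segments(parse_csi_rows(text))` yields a single helical segment from residue 7 to residue 25.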
Combined analysis of CSI, NOE, Jval and/or NH -- the N/C termini of secondary structure elements may be extended or shortened relative to the CSI results.
Alignments between b-strands are reported here.
1) All possible registers and their scores.
FGF Example:
** Poss: Register for Antiparallel Strands 10-12 Found : 63 75 score: 8
2) Validation -- registers that are inconsistent with other registers are removed. The final self-consistent registers define the b-sheets of the protein.
Final secondary structure and AntiParallel (or Parallel) Strand Registers - These are used by AutoStructure to rule in and rule out possible assignments. NOEs expected from the secondary structures are assigned, and NOEs inconsistent with the secondary structures are ruled out. Long-range cross-strand NOEs that are consistent with the alignments are assigned. Cross-strand and helical H-bonds are also added to the structure calculation.
The registers are identified based on the frequency (total number of expected long-range NOEs) with which they are found in the peak list. When there is a lot of noise in the peak list, noise can increase the frequency of an incorrect register and the strands can then be aligned in the wrong way. In this case, it is necessary to manually check all assigned long-range NOEs related to each register. We have noticed that noise in the HN-HN region disturbs the results. A clean HN-HN region helps a lot. Other regions such as HA-HA and HA-HN are also important.
*** Warning: if the alignment is wrong, you will definitely get a wrong fold. ***
First, we set nCycles=2 in the control-file, run AutoStructure and examine:
1) the input data quality of the resonance assignments, peak lists and referencing (see How to exam input data quality?).
2) the secondary structure results (see How to understand and exam the secondary structure file?)
3) peaks unassigned in CYCLE1-1/, CYCLE1-2/, ... If there are many peaks unassigned in CYCLE1-1, CYCLE1-2, ... (in the *_NA.ovw file), it is an indication that there is a problem. It is usually just a subset of the long-range constraints that is causing the problem. If the number of unassigned peaks is small, it indicates that all assignments from CYCLE1-0 are self-consistent and it is OK to use the initial model for the iterative calculations.
4) This calculation may need to be re-run several times. After we are happy with the input data and the initial model, we then set nCycles=20 and run it.
Ways to examine the number of assigned and unassigned peaks in CYCLE1-0, 1-1, ... (for item 3 above):
Things to check:
1) *_NA.unassign - This file lists all peaks that were unassigned.
2) *_assOrder in CYCLE1-0/ - This file lists all peaks that are assigned.
3) *_assSparky in CYCLE1-0/ - This is a peak list file with the assignments from AutoStructure. It can be loaded into Sparky to display the assignments. You can convert this file to other formats for display.
4) Because of incomplete resonance assignments, NOE assignments made by the Unique method or the SYM method may be incorrect. Many intra peaks may be given long-range assignments when one of the real resonances is not assigned. In many cases, long-range NOE assignments that were later unassigned turned out to be intra assignments to one of the unassigned atoms.
Example 1: An example that indicates there is a problem (from an old RBFA data set):
......
PeakList hrnoesy3:
#NOESY_Assign Process#
Cycle  Method  Peak_Assigned  Assignment_Made  Assignment_Removed  Peak_Not_Assigned  Noise_Peak
               -------------  ---------------
               new    cum     new    cum
1-0    E       301    301     307    307       -                   8367               5708
1-0    U       22     323     22     329       -                   8345               5708
1-0    SYM     40     363     40     369       -                   8305               5708
1-0    E       5      368     7      376       -                   8300               5708
1-0    CF      3      371     3      379       -                   8297               5708
1-1    VIO     -      333     -      341       38                  8335               5708
2-0    PDB     430    763     444    785       -                   7905               5708
.......
There are 38 peaks unassigned before CYCLE2-0.
Example 2: We then made more resonance assignments and looked carefully at the NOESY crosspeaks corresponding to ALL (violated and satisfied) long-range constraints using SPARKY ("guided peak list editing"). Here is what the report looks like for the RBFA data set afterwards:
......
PeakList c13_0928_3:
#NOESY_Assign Process#
Cycle  Method  Peak_Assigned  Assignment_Made  Assignment_Removed  Peak_Not_Assigned  Noise_Peak
               -------------  ---------------
               new    cum     new    cum
1-0    E       886    886     948    948       -                   2229               621
1-0    U       9      895     9      957       -                   2220               621
1-0    SYM     70     965     70     1027      -                   2150               621
1-0    E       45     1010    59     1086      -                   2105               621
1-0    CF      4      1014    4      1090      -                   2101               621
1-1    VIO     -      1011    -      1087      3                   2104               621
2-0    PDB     542    1553    695    1782      -                   1562               621
......
Now there are more peaks assigned and far fewer peaks unassigned.
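The number of peaks unassigned in each validation cycle can be pulled out of a process table with a short script. This Python sketch assumes that validation rows carry the method VIO and have the nine-field layout of the two tables above; the function name is hypothetical.

```python
def removed_in_validation(table_text):
    """Map each validation cycle (e.g. '1-1') to its removed count."""
    removed = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Assumed row layout: cycle, method, peak new/cum,
        # assignment new/cum, removed, not assigned, noise.
        if len(fields) == 9 and fields[1] == "VIO":
            removed[fields[0]] = int(fields[6])
    return removed
```

For the two RBFA tables this gives {'1-1': 38} and {'1-1': 3}, matching the comparison made in the text.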
After the run with nCycles=20, we examine the results from the last cycle. If the final results are not good, better resonance assignments and better peak lists are needed.
Things to check:
1) percent of peaks assigned in the last cycle (reported in the *_NA.ovw file). We think that at least 60% of peaks should be assigned for c13-NOESY and ~70% of peaks should be assigned for n15-NOESY.
Example:
.....
PeakList c13_0928_3:
......
#Peak-Assignment Stat#
Cycle  Total_Assignable_Peak  Unambiguous_Assigned  Ambiguous_Assigned  Total_Assigned  Peak_Not_Assigned
       #      %               #      %              #     %             #      %        #      %
1-0    2494   100             948    0.38           66    0.03          1014   0.41     1480   0.59
1-1    2494   100             945    0.38           66    0.03          1011   0.41     1483   0.59
2-0    2494   100             1356   0.54           206   0.08          1562   0.63     932    0.37
2-1    2494   100             1329   0.53           196   0.08          1525   0.61     969    0.39
......
20-0   2494   100             1390   0.56           412   0.17          1802   0.72     692    0.28
20-1   2494   100             1391   0.56           411   0.16          1802   0.72     692    0.28
20-2   2494   100             1391   0.56           411   0.16          1802   0.72     692    0.28
......
For this data set, about 72% of the peaks are assigned.
2) quality factors - I,L scores. In general, the I and L scores should be similar to the M scores. If I or L is significantly higher, it is an indication that the fold is not right.
3) number of constraints per residue. On average, we expect at least 15 conformational constraints for each residue.
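The first check is a simple ratio. The Python sketch below recomputes the assigned fraction from the last row of the statistics table (1802 of 2494 peaks) and compares it with the rough guidelines quoted above; the function names and the dictionary of thresholds are hypothetical conveniences, not program settings.

```python
# Rough guidelines from the text: 60% for c13-NOESY, about 70% for n15-NOESY.
GUIDELINES = {"c13": 0.60, "n15": 0.70}


def assigned_fraction(total_assignable, total_assigned):
    """Fraction of assignable peaks that received an assignment."""
    return total_assigned / total_assignable


def meets_guideline(fraction, spectrum):
    """True if the fraction reaches the guideline for 'c13' or 'n15'."""
    return fraction >= GUIDELINES[spectrum]
```

For cycle 20-2 of the c13 peak list, `assigned_fraction(2494, 1802)` is about 0.72, which is above the 60% guideline.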
can not open input file OUTPUT2/Workingcycle/*.pdb
When running a structure calculation, all constraints are stored in the Workingcycle directory. There are 14 log files in that directory, corresponding to the 14 calculations on the farm. They all look like this:
DYANA, version 1.5 (gnu, double precision)
Copyright (c) 1996-98 ETH Zurich
dyana> dyana> dyana> - calc_para1: readdata IL13
Library file "/usr/local/dyana-1.5/lib/lib/dyana.lib" read, 54 residue types.
Sequence file "IL13.seq" read, 113 residues.
*** ERROR: Illegal atom name "S" for residue CYS 29.
dyana>
This means that the atom name S in your manual constraints is not supported by DYANA. We think the atom name is SG in DYANA.
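When a calculation stops with this message, the DYANA logs can be scanned for error lines instead of being read one by one. This is a minimal Python sketch that only assumes errors are flagged with "*** ERROR:" as in the log above; the function name is hypothetical.

```python
def dyana_errors(log_text):
    """Return the lines of a DYANA log that contain '*** ERROR:'."""
    return [line.strip() for line in log_text.splitlines()
            if "*** ERROR:" in line]
```

Running it over each log* file in the working directory shows at a glance whether all calculations failed for the same reason.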
We express our thanks to Dr. Keith L Constantine and Dr. Robert Powers
for their comments and input on AutoStructure.