
DQS 3 Installation and Maintenance

Distributed Queueing System - 3.1.3

Installation And Maintenance Manual

August 28, 1996


Introduction


The Distributed Queuing System (DQS)

The Distributed Queuing System (DQS) is an experimental batch queuing system which has been under development at the Supercomputer Computations Research Institute (SCRI) at Florida State University for the past 7 years. The first years of this activity were funded by Department of Energy contract DE-FC0585ER250000. DQS is freely distributed to all parties with the understanding that it continues to be an evolving development system, and no warranties should be implied by this distribution.

DQS is intended to provide a mechanism for the management of requests for execution of batch jobs on one or more members of a homogeneous or heterogeneous network of computers. Facilities for load-balancing, prioritization and expediting of a wide variety of computational jobs are included to assist each site in tailoring the behavior of the system to their particular environment.

SCRI support

SCRI will make every effort, within its resources, to ensure that DQS is suitable for operation as a batch queuing system in as many site situations as possible. SCRI staff will respond to requests for assistance from those utilizing DQS, as well as investigate bugs, incorporate repairs and update documentation. However it is not possible, at this time, to make a formal commitment to the long-term support and enhancement of this system. Any user or organization which decides to adopt DQS assumes all risks of that undertaking.

With this release, DQS 3.1.3, the previous version, DQS 3.1.2.4, will continue to be distributed and supported for at least the balance of calendar year 1996. Depending on the need for continued support and on SCRI resource availability, some level of support may be continued beyond that time. We feel, however, that since DQS 3.1.3 is based on the DQS 3.1.2.4 release, most users will adopt DQS 3.1.3 in preference to DQS 3.1.2.4 in the near future.

DQS 3.1.3 and future enhancements can be obtained by Internet ftp from "ftp.scri.fsu.edu".

Announcements of new releases and improvements will be emailed to anyone who contacts SCRI to add their name to the announcement list. This is done by:

send email to: dqs-announce@scri.fsu.edu

Leave the "subj:" field blank

Send a one line message: subscribe

Names can be removed from this announcement list by:

send email to: dqs-announce@scri.fsu.edu

Leave the "subj" field blank

Send a one line message: unsubscribe

Bug reports should be sent to: dqs@scri.fsu.edu

DQS user information exchange is provided by Rensselaer Polytechnic Institute. To add your name and email address to this list:

Send email to dqs-l@vm.its.rpi.edu

Leave the "subj:" line blank

Send a one line message: SUBSCRIBE dqs-l Firstname Lastname

To remove name and email address:

Send email to dqs-l@vm.its.rpi.edu

Leave the "subj:" line blank

Send a one line message: UNSUBSCRIBE dqs-l Firstname Lastname

Where Firstname is the user's first name and Lastname is the user's last name.

With the release of DQS 3.1.3 the user intercommunication list dqs_user@scri.fsu.edu will be re-instituted. All messages, inquiries and announcements from any user or from the DQS development staff will be relayed to all other users automatically.

What's New in DQS 3.1.3

The release of DQS 3.0 was a major departure for the DQS evolution. It was based on several years' experience with DQS 2.1 in a variety of computing environments. Although it retained many features of the 2.1 version, DQS 3.0 was a major restructuring and re-coding of the basic system, with a major focus on supporting parallel (clustered) computation on two or more UNIX-based hardware platforms. The newly emerging message-passing standard (MPI) was considered throughout the DQS 3.0 implementation.

In early 1995 DQS 3.0-3.1 was subjected to extensive testing, and the contributions of numerous users were incorporated to produce DQS 3.1.2, which was released in March and augmented over a period of six months to become DQS 3.1.2.4. With the exception of some minor "improvements", this system has been fairly stable and in operational use for nine months.

Operational experience at SCRI and other large production sites revealed several features which needed to be added or adapted to make the system easier to use or to manage. Several sites provided the DQS development team with valuable insight, advice and code which has been incorporated into this new release. Although the user interfaces have not changed (beyond some "enhancements"), the internals of this system have undergone considerable change, hence the naming of this release as 3.1.3 instead of 3.1.2.5. We took this opportunity to restructure the documentation (one more time!) in response to numerous requests to make it easier to access. In addition to numerous bug-fixes for DQS 3.1.2.4 provided by several very helpful sites (see "acknowledgments"), a number of new features have been added to the system.

The "new" features of DQS 3.1.3 tend to be somewhat invisible to the DQS user. The bulk of this effort has been focused on further "bulletproofing" the system to minimize, if not eliminate, the unreported termination of daemons , utilities and jobs. Some features are "semi-visible" such as the revised scheduling system. A few are quite evident to all, as the "job pre-validation" feature returns immediate feedback on the complete absence of a requested resource. With this in mind we list here the major changes which appear in DQS 3.1.3:

Job pre-validation

When a job is submitted to DQS using the QSUB utility it is checked to ensure that:

  1. The "fixed" resources requested as HARD are present somewhere in an existing DQS complex. If the resource is in use by another job it is still considered "present" for the purposes of pre-validation.
  2. The "consumable" resources reqested as HARD are present in at least one DQS Consumables file. If the resource is in use by another job it is still considered "present" for the purposes of pre-validation.

Consumable Resources

Many sites are confronted with the need to allocate scarce resources to jobs during the scheduling process. Resources such as FORTRAN compiler licenses, database licenses, shared memory and disk space can be assigned names and values by the DQS administrator. Job scheduling then reconciles requests for Consumable Resources, and when a job is placed into execution the available amount of these resources is reduced until the job terminates or releases the resource with a DQS system utility. Facilities for managing the Consumable Resource reservoir have been added to the QCONF utility.

qhold, qrls

The QHOLD and QRLS utilities have been implemented. These permit a user or administrator to place a "hold" on an already submitted job until it has been released by the user or the DQS administrator or removed using the QDEL utility.
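For example (the job identifier is assumed; both utilities take a job identifier in the manner of their POSIX namesakes):

    qhold313 12        (place a hold on job 12)
    qrls313 12         (release the hold on job 12)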

qmove(multi-cell job transfer)

The POSIX utility QMOVE has been partially implemented for this release. In a single-cell system a queued job can be moved from one queue to another using the QALTER utility, either by using the "-q" option to explicitly identify a target queue or, when queues are implicitly specified on the basis of resource requests (the "-l" option), by changing the resource request.

In a multi-cell system the QMOVE utility must be used to initiate the transfer of a job from consideration by one cell to another. The QMOVE request is pre-validated like any other QSUB submission, and the job will not be moved if it cannot pass this first-level test.
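A sketch of the two styles of transfer (job number, queue and cell names are assumed, and the destination-first argument order follows the POSIX qmove convention):

    qalter313 -q big_queue 12      (single cell: retarget job 12 to queue "big_queue")
    qmove313 cell-B 12             (multi-cell: hand job 12 to cell-B, after pre-validation)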

"fair use" scheduling

The DQS scheduler has been rewritten... again. Of the many components in an operating system, the scheduling process is the most perplexing and complex feature to provide in an adequately general form. The DQS 3.1.3 scheduler code has been commented and blocked out in a manner which we hope will make site modifications easier and more comprehensible. The scheduling methodology now in use at SCRI is provided as the default in this release. It attempts to prevent one or two users from dominating the utilization of the system resources, while keeping all hosts as busy as possible.

Those submitting massive quantities of jobs to the system at one time will discover four levels at which their jobs are handled by the scheduler. First, there is a limit on how many jobs will be accepted at QSUB time. Second, there is a limit on how many jobs in the queue for a single user will be considered by the scheduler. Third, the user's jobs which are considered for scheduling will be assigned sub-priorities according to their DQS sequence number and the number of jobs for that user preceding them in the queue. Finally, a queue can be assigned a time delay which is imposed between consecutive allocations of that queue to the same user.
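Several of these limits surface as configuration parameters shown later in this manual (the mapping of parameters to levels is our reading; the values are the samples used there):

    MAXUJOBS 10          (conf_file: jobs per user considered in one scheduling pass)
    max_user_jobs 4      (queue configuration: per-queue limit for a single user)
    last_user_delay 0    (queue configuration: delay imposed between consecutive
                          allocations of the queue to the same user)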

FORTRAN / "C" resource requests

In DQS 3.1.2.4 resources are requested using the form "-l qty.eq.1,mem.gt.32,disk.gt.64" (for example). DQS 3.1.3 retains this format, but the user may now use either FORTRAN or "C" syntax for these requests. The above example could then appear as "-l qty==1&&mem>32&&disk>64" or, alternatively, "-l qty.eq.1.AND.mem.gt.32.AND.disk.gt.64". The logical operators ".NOT." (or "!") and ".OR." (or "||") may also be used, as well as parentheses to increase readability. Future releases will permit more complex, compound resource requests with the ability to specify alternative resources which could satisfy the request. (This is different from using the HARD and SOFT classifications.) For the time being parentheses only assist in readability, as in "-l qty==1&&(mem>32)&&(disk>64)".
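The three equivalent spellings of the same request, side by side:

    -l qty.eq.1,mem.gt.32,disk.gt.64            (DQS 3.1.2.4 form, still accepted)
    -l qty==1&&mem>32&&disk>64                  ("C" syntax)
    -l qty.eq.1.AND.mem.gt.32.AND.disk.gt.64    (FORTRAN syntax)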

subordinate queues

DQS 2.1 introduced a feature known as "subordinate queues" which provided the capability to identify a queue as being subordinate to another queue. If a job is running in the subordinate queue and a job is launched in its "superior" queue, the subordinate job is suspended until termination of the "superior" job. This feature is particularly important when managing a system where hosts can function both as single processor and multiple processor platforms. DQS 3.1.3 provides a re-implementation of this feature.
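A sketch of how this might be expressed in a queue configuration (the queue names are assumed; the subordinate_list field appears in the queue template shown later in this chapter, and we assume here that the "superior" queue lists its subordinates there):

    Q_name            parallel_ibms30
    ...
    subordinate_list  ibms30

With such a configuration, a job running in queue "ibms30" would be suspended whenever a job is launched in "parallel_ibms30".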

SMP AFS re-authentication

DQS 3.1.2.4 provided a simple facility for operating in an AFS environment. Actual use of this system uncovered a number of problems. The most significant of these was solved by Axel Brandes and Bodo Bechenback and is incorporated in DQS 3.1.3. A key element of their solution involves the use of a temporary daemon which we call the Process Leader and others call the "process sheep-herder".

The Process Leader is spawned by the dqs_execd and does the actual job launching and cleanup. It can respond to system requests which the job is not equipped to deal with, such as the AFS periodic re-authentication task. This capability also makes it possible to run multiple jobs for the same queue on the same host, and to detach the job from the dqs_execd daemon in case that daemon needs to be restarted, without killing the job.

qmaster<->dqs_execd synchronization

A glaring shortcoming in DQS 3.1.2.4 was the lack of synchronization among the DQS daemons. Under some circumstances the queue status maintained by the qmaster did not reflect the actual state of jobs handed off to the dqs_execd. There was no mechanism for making the two states congruent, other than the "clean queue" (QCONF -cq) mechanism, which only affected the qmaster view of the system. DQS 3.1.3 has implemented auxiliary communications between the qmaster and dqs_execd to provide for automatic and manual methods of re-synchronizing the system.

Programmed aborts of the dqs_execd using the system "abort" or "exit" calls have been eliminated. Instead, all dqs_execd errors previously considered fatal are now communicated to the qmaster, which emails an urgent message to the DQS administrator and pauses the dqs_execd until the administrator can intervene. Note that if a job is running under Process Leader management, it will continue execution, ignorant of the dqs_execd pause. (If the dqs_execd error is due to a failure in the dqs_execd<->qmaster interface, the dqs_execd independently mails the urgent cry for help to the administrator.)

parallel job consistency and accounting

In DQS 3.1.2 parallel job scheduling handed off parallel jobs when sufficient queues became available for the execution of the requested number of processes. However, only the dqs_execd which was managing the MASTER process was aware of the parallel job, and the only accounting information obtained for the job was from the MASTER host.

DQS 3.1.3 scheduling alerts all of the SLAVE queue managers to the fact that they will be running one of the parallel job processes. When the parallel job is launched by the MASTER dqs_execd, each SLAVE dqs_execd verifies that it is permitted to participate in that job before the slave process is started. A Process Leader is used to launch each of these slave processes, and at their termination accounting information is gathered and sent to the qmaster. This ensures that DQS is in charge of the execution of all parallel job components. In the event that a LINDA parallel job is involved, the Process Leader is initiated and waits for the LINDA process to be started by the master process on the MASTER host. It then attaches itself to this process (since it cannot launch it itself) in order to handle termination and accounting reporting.

qidle integration

In DQS 3.1.2.4 the QIDLE utility was part of the X-windows component of the system and interfaced with DQS by invoking the QMOD utility as a separate task. This created several problems, the principal one being that at many sites the "system console" was connected to a host which was also managing a DQS queue. Since such a console usually has many users accessing it, there is not one single "owner" for the queue on that machine with permission to invoke the suspension of the queue in order to use the console.

The QIDLE function in DQS 3.1.3 is now an authenticated system utility like QMOD, QDEL, etc. It communicates with the qmaster directly and can suspend queues on any host on which the QIDLE function is permitted to run in an X-windows environment.

enhanced status displays

The somewhat cryptic symbols "a", "e", "r", "u" and "s" in the QSTAT display have been replaced with the more descriptive words ALARM, ENABLED, RUNNING, UNKNOWN and SUSPENDED. More important, the reasons why a job is residing in the PENDING queue are now listed. Between job pre-validation and this description of PENDING causes, DQS 3.1.3 should have eliminated the most common problems of jobs never executing because they requested non-existent or illogical combinations of resources.

accounting tools

The DQS accounting information can play a key role in the management and optimization of system resources. In the operational environment at SCRI we have developed a small collection of tools for extracting, summarizing and analyzing DQS accounting data. These have been included in DQS 3.1.3 as a starting point for other sites to develop their own methods.

"Streamlined Installation"

Many sites will find that the installation of DQS has been "streamlined", requiring less interaction to prepare a basic system for configuration and testing. Sites which are already running DQS 3.1.2.4 and need extensive local adaptations will use the more complex "custom" installation process, or the manual editing of Makefile.proto, def.h, and dqs.h with which they are already familiar. The new installation process is based on the GNU Autoconf package.

job "wrapper" scripts

DQS 3.1.3 provides a mechanism for executing site-defined scripts upon termination of the queued job. This script is executed by the Process Leader and hence possesses root permissions, which can be handy for specialized cleanup operations. This is important for systems which support PVM and P4 daemons, which may have to be stopped by the system when the MASTER process terminates abnormally.

elective linking or copying of output files during job execution

DQS 3.1.3 supports the special handling of files on a host's local disk during execution of a job, without the intervention of the user. Options set in the DQS conf_file determine whether the output files are to be left in place on the local disk, linked to a site-defined file system or copied to a site-defined file system.
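The default disposition appears as a conf_file entry in the configuration listing later in this manual; the linking and copying behaviors are selected by other values of the same parameter (their exact spellings are not reproduced here):

    OUTPUT_HANDLING LEAVE_OUTPUT_FILES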

Logging improvements

All DQS log entries are now time stamped with the local time of the qmaster host system. The DEBUG and DEBUG_EXT output is now written to a file (defined in def.h) instead of stderr. This minimizes the jumbling of file output when several processes attempt to write the file simultaneously. All error messages are now numbered and an appendix to this document lists these error messages and suggests remedial actions when appropriate.

Documentation

The DQS 3.1.3 documentation has been reorganized... again. The POSIX specification has been extricated from the document body and is now an appendix. The reference manual pertains only to the DQS 3.1.3 implementation, and all confusing references to "standard" and "non-standard" options have been removed.

The documentation consists of three principal chapters and three appendices. The Installation and Maintenance Manual is primarily aimed at the DQS system administrator. The User Guide is obviously targeted at the DQS user community. The Reference Manual will be accessed by both users and administrators. Appendix A contains a catalog of all DQS error messages with information on methods for dealing with each error. Appendix B contains the POSIX specification on which DQS 3.1.3 is based. Appendix C contains several miscellaneous sections, including installation variants and system tuning guidelines.

The documentation is supplied in several forms:

  1. Microsoft WORD (6.0 or 7.0)
  2. PostScript
  3. HTML format (can be viewed with MOSAIC or any of the commercial WEB browser products).

Installation

DQS is designed to be installed on almost every existing UNIX platform. The installation process must therefore cope with many differences and idiosyncrasies of the varied hardware configurations and operating systems. DQS 3.1.3 attempts to detect and resolve these differences to minimize the need for operator actions, but even with the simplest installation there will be a need for some input from the DQS administrator.

Obtaining DQS 3.1.3

DQS 3.1.3 can be obtained by ftp download from ftp.scri.fsu.edu/pub/dqs. The README.313 file in that directory will indicate which version should be downloaded. To reduce download bandwidth, improvements and bug-fixes will be distributed on a file-by-file replacement basis rather than requiring a complete download of the DQS 3.1.3 system. For this reason we do not envision distributing versions such as DQS 3.1.3.1 … DQS 3.1.3.n in the future. (But you never know.)

Setting up for installation

DQS 3.1.3 is distributed as a compressed TAR file. After this file is uncompressed it is recommended that the DQS system be extracted (with TAR) into a directory which is accessible by all operating systems for which DQS will be built. The DQS installation process will create a separate directory in the sub-directory …/DQS/ARCS for each different architecture/operating system.

Once the DQS tree has been extracted, the installation process can be commenced by changing to ../DQS as a working directory and typing "install". This UNIX script will execute the system evaluation procedures and produce a description of the system on which the installation is being done. Three choices are offered to the administrator: "Quick Install", "Custom Install" and "Quit Installation".
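A minimal sketch of the steps just described (the archive file name is assumed for illustration):

    uncompress DQS313.tar.Z     # archive name assumed
    tar xf DQS313.tar           # extract into a directory visible to all build hosts
    cd DQS
    ./install                   # run the system evaluation and choose an install mode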

"Quick" Install

A very simplistic "Quick Install" feature is provided to assist in an initial installation of DQS. For those sites testing DQS for the first time we recommend using this method. Choosing all of the defaults will result in an unrealistic operating environment for DQS 3.1.3, but will offer a sample of the system.

The choice of "Quick Install" produces a list of defaults which will be used for the installation. The user is asked to review this list to ensure that it meets their requirements. The default-cell name and default initial queue name are derived from the host-name of the machine on which the installation process is being executed. If the installation is being executed as "root" the system will be setup to use "reserved" ports for communication, otherwise "non-reserved" ports will be utilized.

The "quick" installation method is intended for new DQS sites which wish to experiment/evaluate DQS 3.1.3 and develop some experience on which to base an operational system setup. If the installation parameters are acceptable the user type "y" to accept them and begin the actual installation.

The installation proceeds in six stages:

  1. First the GNU configure program is used to determine installation parameters for the host being used for the installation process. One of the directories modified by the GNU configure program is the DQS CONFIG directory. Once it is updated, the DQS config utility is built on that host platform.
  2. The DQS config program then asks the user to provide a base directory to use for the installation of DQS binaries, libraries and documentation, as well as the DQS configuration and resolve files and directories. The default paths offered by the dialogue are based on the current working directory (if running as non-root) or /usr/local/DQS (when running as root). This latter path is commonly used at DQS sites, as all hosts of a common architecture often share the path "/usr/local". The simple install will only request one starting point for building a DQS313 tree. If the administrator wishes to differentiate the various components (binaries, libraries, spool directories, etc.), they can type "CUSTOM" when asked to enter an alternative base path.
  3. The next step invokes the make operation to create all of the DQS 3.1.3 executables. The binaries are placed in a subdirectory within the ../DQS/ARCS directory named for the specific platform being built. This provides a separate repository for each type of host system in the cluster. NOTE: The addition of "qidle" has created some installation problems on SOLARIS platforms not using the GNU "C" compiler. If error messages appear related to missing X Windows include files or libraries, the DQS administrator may have to add appropriate compiler or linker directives to the Makefile.proto AFTER the "configure" step is completed.
  4. The fourth step moves the binaries to the directory from which they will be executed, and moves the sample conf_file and resolve file to the conf directory. This process renames the executables by placing a tag "313" at the end of each name, to differentiate these binaries from other DQS versions which might have preceded DQS 3.1.3.
  5. The next step involves the addition of the three DQS 3.1.3 entries to the /etc/services file on one or more hosts (see the example entries after this list). This step must be done with root permission and by someone with UNIX system administration knowledge. While DQS attempts to identify proper port numbers to be used in the /etc/services file, local conditions may dictate another choice. Upon successful completion of the installation the administrator can proceed to "Testing the Installation". If error messages appear and the installation is aborted, the administrator should refer to "Solving Installation Problems".
  6. Finally the administrator should proceed to the step "Testing the DQS313 system".
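For illustration, the three /etc/services entries added in step 5 might look like this (the service names are the defaults suggested in the conf_file discussed later; the port numbers are site-specific assumptions, and must be greater than 1024 when reserved ports are not in use):

    dqs313_qmaster        2001/tcp
    dqs313_dqs_execd      2002/tcp
    dqs313_dqs_intercell  2003/tcp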

Custom Install

"Custom" Installation presents the administrator with the same default configuration as the "Quick" install process. Any of the parameters can be changed by the administrator before the installation proceeds. Two choices are presented to the administrator. The first initiates an interactive session where each parameter is displayed, the proposed default and, if a previous installation has been completed the prior setup value. The administrator may choose either of the displayed values or enter their own parameter.

During this interactive exchange each parameter is validated for consistency with the host system as well as DQS. Upon completing the interactive setup the administrator may proceed with the same installation steps as the "Quick" installation:

The installation proceeds in five stages; during most of these the DQS administrator must make a selection as requested by the config program.

  1. First the GNU configure program is used to determine installation parameters for the host being used for the installation process. One of the directories modified by the GNU configure program is the DQS CONFIG directory. Once it is updated, the DQS config utility is built on that host platform. The GNU configure program will attempt to build all the Makefiles to use the GNU "C" compiler "gcc". If the administrator wishes to use an alternative compiler for any phase, the following files must be modified AFTER the GNU configure step is complete: CONFIG/Makefile.proto.in, SRC/Makefile.proto.in and DQS/Makefile.proto.
  2. The DQS config program then asks the user to provide a base directory to use for the installation of DQS binaries, libraries and documentation, as well as the DQS configuration and resolve files and directories. This base directory will then be used to provide a "default path" for all items requiring the entry of a file path. The default paths offered by the dialogue are based on the current working directory (if running as non-root) or /usr/local/DQS (when running as root); this latter path is commonly used at DQS sites. At each interactive step, a default value is presented. Typing a question mark "?" will provide a brief comment about that entry (which is intended to be helpful). A more detailed explanation of each item to be entered may be found in Appendix C Miscellaneous - "Key System Variables and Manual Installation".
  3. The next step invokes the make operation to create all of the DQS 3.1.3 executables. The binaries are placed in a subdirectory within the ../DQS/ARCS directory named for the specific platform being built. This provides a separate repository for each type of host system in the cluster.
  4. The fourth step moves the binaries to the directory from which they will be executed; the target binary directory is the one prescribed by the administrator during the configure process. This process renames the executables by placing a tag "313" at the end of each name, and also moves the sample conf_file and resolve file to the conf directory. This is done to differentiate these binaries from other DQS versions which might have preceded DQS 3.1.3.
  5. The next step involves the addition of the three DQS 3.1.3 entries to the /etc/services file on one or more hosts. This step must be done with root permission and by someone with UNIX system administration knowledge. While DQS attempts to identify proper port numbers to be used in the /etc/services file, local conditions may dictate another choice.
  6. Upon successful completion of the installation the administrator can proceed to "Testing the Installation". If error messages appear and the installation is aborted, the administrator should refer to "Solving Installation Problems".

An optional approach, which omits all interaction, is available to the knowledgeable DQS administrator. This requires the editing of three DQS files used during the make process. Details for this approach may be found in Appendix C Miscellaneous - "Key System Variables and Manual Installation".

The Graphical Interface

The X-windows based DQS graphical interface is installed as a separate step. Change directory to …/DQS/XSRC and read the INSTALL script. The X-Windows interface is being restructured and will be integrated fully in future DQS releases.

Testing the installation

The installation process creates a series of directories and subdirectories and two crucial files, the "conf_file" (configuration file) and the "resolve_file". If the system installation was completed correctly, the conf_file will contain information which is read by every DQS binary when it is started. This includes the DQS daemons, qmaster and dqs_execd, and the DQS interface "utilities": qsub, qdel, qmod, qconf, qstat, qrls, qhold and qmove. It is best that these two files be accessible through NFS/AFS/DFS file cross-mounting. If that is not possible, then the administrator must ensure that identical copies of these files are present on each host.

Once the binaries have been moved to their execution directory (we will use the path "/usr/local/DQS/bin" for all future examples), the qmaster can be started. If during the installation process the administrator chose "FALSE (NO)" when asked the question "Reserved ports?", then the /etc/services file will have been updated (by a root user) with the three entries suggested by the config process (or a rational alternative). The conf_file will contain the names of these entries along with the DEFAULT_CELL name, which must match the first entry on the first (non-commented) line in the resolve file. The administrator should make a visual check of these three crucial files, conf_file, resolve_file and /etc/services, to make sure that they conform to these requirements.
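As an illustration of that cross-check (the cell name is the sample used in the configuration listing later in this manual; the resolve_file layout is an assumption based on the description above, with the cell name as the first entry on its first non-commented line):

    conf_file:      DEFAULT_CELL user-network
    resolve_file:   user-network ...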

QMASTER

<The qmaster manages all resources for a single DQS cell.>

Once satisfied that all is well, the qmaster can be started by typing "/usr/local/DQS/bin/qmaster313". (We will use the 313 appendage in all future discussions.) On this first occasion it is useful to check that the process has actually started by viewing the UNIX process status (ps). If the qmaster name does not appear in the host's process list, the administrator should check the "err_file" in the qmaster spool directory (chosen during the DQS config stage; default: "/usr/local/DQS/common/conf").

If the qmaster appears to be operating, it can be tested by executing the command "/usr/local/DQS/bin/qstat313 -f" on the same host where the qmaster313 is running. A normal response to this command would be one or more lines of output describing the status of the current queues. For brand new installations this will be simply a header with no other lines. Error messages may appear if things are not quite "in harmony"; refer to "DQS Error Messages" and "Solving Installation Problems" for assistance in this case.

DQS_EXECD

<The dqs_execd is a DQS daemon which resides on each host which has at least one queue and will be executing DQS managed jobs.>

If the "qstat313" command succeeds, it is time to start a dqs_execd, which actually manages a particular queue. For this test, on the same host where the qmaster "dwelleth" type the command "/usr/local/DQS/bin/dqs_execd313".. Again the UNIX process status should be examined (ps). If the dqs_execd is not executing, refer to the err_file for significant error messages. Consult "DQS Error Messages" and "Solving Installation Problems: for assistance.

Executing the command "qconf -aq" ( queue configuration, add queue) will produce an edit session with the default editor on that host. If the "qconf" command yields an error message and shuts down consult "Solving Installation Problems". A queue "template" will be displayed which can be modified using the editor commands. For this test the queue name, and queue host name should be changed to match the name of the host on which the dqs_execd is executing. We will deal with the remaining entries later (see .The Queue Configuration).

        Q_name            ibms30
        hostname          ibms30.scri.fsu.edu
        seq_no            0
        load_masg         1
        load_alarm        175
        priority          0
        type              batch
        rerun             FALSE
        quantity          1
        tmpdir            /tmp
        shell             /bin/csh
        klog              /usr/local/bin/klog
        reauth_time       6000
        last_user_delay   0
        max_user_jobs     4
        notify            60
        owner_list        NONE
        user_acl          NONE
        xuser_acl         NONE
        subordinate_list  NONE
        complex_list      NONE
        consumables       NONE
        s_rt              7fffffff
        h_rt              7fffffff
        s_cpu             7fffffff
        h_cpu             7fffffff
        s_fsize           7fffffff
        h_fsize           7fffffff
        s_data            7fffffff
        h_data            7fffffff
        s_stack           7fffffff
        h_stack           7fffffff
        s_core            7fffffff
        h_core            7fffffff
        s_rss             7fffffff
        h_rss             7fffffff
    When the queue name and queue host name have been modified, exit the editor in the normal manner (ESC-ZZ for vi or CTRL-X CTRL-C for emacs). This will trigger the qconf utility to parse the submitted definition and, if no syntactical errors are discovered, create the requested queue. Executing the "/usr/local/DQS/bin/qstat313 -f" command will now display the new queue:

        Queue Name   Queue Type   Quan   Load   State
        ----------   ----------   ----   ----   -----
        ibms30       batch        0/1    0.10   dr DISABLED

    Note that the status entry in the right column of the qstat output displays the word "DISABLED". All new queues are initiated in DISABLED mode. To enable the queue we need to invoke another DQS command, "/usr/local/DQS/bin/qmod313 -e <queue name>" (modify queue, enable the queue named <queue name>).

    Again execute the "/usr/local/DQS/bin/qstat313 -f" command:

        Queue Name   Queue Type   Quan   Load   State
        ----------   ----------   ----   ----   -----
        ibms30       batch        0/1    0.10   er UP

    TEST SCRIPT

    Once the qmaster and at least one dqs_execd daemon are running, a simple test can be performed. The directory ../DQS/tests contains a collection of sample scripts. The entire contents of this directory should be copied to a user (non-root) directory owned by the administrator. As a first test, change directory to this non-root directory and type "/usr/local/DQS/bin/qsub313 dqs.sh". This will submit the following simple script to DQS:

        #!/bin/csh
        #$ -l qty.eq.1
        #$ -N UTESTJOB
        #$ -A dummy_account
        #$ -cwd
        echo 'we are now doing something else'
        printenv
        sleep 30
        echo 'end of script'

    A message should appear in response to the qsub313 command:

    "your job 1 has been submitted".

    After 30 seconds the job should complete and in the directory where the job was submitted two output files should appear:

    UTESTJOB.e1.25674 and UTESTJOB.o1.25674

    The title UTESTJOB was established by the DQS directive "#$ -N UTESTJOB". The next field (either e1 or o1) contains the job number preceded by the type of file: the stderr file for the job will have an "e" in that position and the stdout file an "o". The UTESTJOB.e1.25674 file should be zero length; if not, examine its contents for the cause of any error. The stdout file should begin with the line 'we are now doing something else', followed by a display of the user's environment, and end with the line 'end of script'.
    COMPLETION OF INSTALLATION

    If the test script completes correctly, hosts can be added, additional queues created and more complex job tests submitted. If the "Quick Install" method was chosen, the time has probably arrived to plan an operational cell organization and set up resource files and queues. In order to lay out an effective system it is important to understand how DQS is constructed, the capabilities of its components and how they may be tailored for a specific site.

    System Topology and Operation

    A basic DQS system consists of at least one computer host which is running the qmaster program and at least one instantiation of the dqs_execd daemon, which manages the actual execution of jobs on the host which they 'inhabit'. All of the resources managed and monitored by a qmaster are considered to be a "cell".

    Within a cell there are three classes of programs operating: the qmaster daemon, the dqs_execd daemon and the DQS utilities, which include qsub, qstat, qmod, qconf, qdel, qhold, qrls and qmove.

    1. The qmaster maintains all of the critical files and tables for a cell. There are actually two types of tables managed by the qmaster which are called "queues". The first is the job queue which is a linear, ordered list of all jobs in the system. This list is sorted by job priority, an internal job sub-priority (based on a site parameterized "fair use" policy) and then by the order in which jobs have been submitted. The second table type is a list of "execution queues", where each potential target for running a job is defined by a queue configuration for that target.
    2. The qmaster possesses a set of "auxiliary" files which are used to maintain information for system security and to parameterize DQS for specific site characteristics. Access control lists, static and consumable resource definitions, and a table of "trusted hosts" who are permitted to contact the qmaster are "mirrored" in memory and disk at all times so that the qmaster can survive interruptions such as power-outages.
    3. The primary mode of operation of the qmaster is "listening and waiting". The qmaster listens for messages from other qmasters (which are managing their own cells) [e], from its own dqs_execd daemons [a] and from the DQS utilities [b]. Periodically the qmaster examines the job list and attempts to find an execution queue which can satisfy the requirements of one or more jobs in the table.
    4. The basic operation of the dqs_execd is "sleep through class... and wake up in time to answer a teacher's question or hear the end-of-class bell". The "class bell" in this case is a periodic event where the dqs_execd gathers information on the health and state of the host machine on which it resides. This period is defined in the "conf_file" and can be varied by each site. At this point the "load average" is sent to the qmaster [a] to provide the qmaster with information to help it distribute jobs among the available hosts. (If the conf_file parameter "DEFAULT_SORT_SEQ_NO" is set to TRUE, the load average report is subservient to the sequence number of a queue.)
    5. The "teacher's question" in this case is a probe from the qmaster for a system integrity test or a system request, usually to begin execution of a job [c]. At this prodding the dqs_execd sets to work as we will see later.
    6. In a quiescent system, with no jobs queued, and none executing, the qmaster and dqs_execd daemons continue their "sleepy handshaking" described above. The term "sleepy" was chosen because these programs have been designed to utilize minimal system resources (memory and cpu cycles) on their hosts. Thus both programs are either sleeping or performing the minor handshaking indicated by the [a] in the diagram. In DQS313, a qmaster in one cell does not poll or communicate with other qmasters except to request an action such as moving a job from one queue to another.
    7. Into the idle system described here, a user submits a job from one of the "trusted hosts" in the system. This could be a host in the cell which also houses the qmaster or a dqs_execd, or a host running neither daemon which was made a trusted host by the administrator using the "qconf -ah … " command. Two validation steps occur upon invocation of the "qsub" command.
      1. The qsub command line and the script file are scanned for DQS directives. DQS directives may occur in either stream, but the scanning stops when a string is encountered which is neither a comment nor a DQS directive. (The default flag for a DQS directive is the character pair '#$'.) All DQS directives are "parsed" for syntactical errors, and the job is rejected at this point if problems are found.
      2. The syntactically verified command line and script file are then sent to the qmaster (shown as [b] in the diagram). The qmaster then performs a "semantic" validation of the job request. By "semantic" here we mean "does the request make sense in the context of this system at this time".

    The second test compares the user's request for site-defined resources with those actually present in the system at the moment. Unless the submitted job possesses the DQS directive "-F" (force the acceptance of the resource request), the job is rejected if one or more of the requested resources do not exist. (Please note that this test verifies that a resource is present in the system, not whether or not it is in use by another job!)

    1. If the job request passes these tests it is placed in the job queue [c]. This queue is "mirrored" on disk so that it may be recovered after a system restart. When a new job is placed in the list, the qmaster scheduler scans all the jobs in the list and tries to find an execution queue which will satisfy each entry's request for resources. This process does NOT begin with the newly arrived job but begins at the top of the list, so it is possible that the job submission may trigger the scheduling of a previously submitted job and leave this job "awaiting another time".
    2. At some point, motivated by the submission of a new job, the termination of a running job or a period of seconds defined by the "SCHEDULE_TIME" parameter in the conf_file, the qmaster will scan the job list and find a job which meets the resource requirements. The job description and script file are "packaged up" and sent to the target dqs_execd [d]. The status information for the target queue is updated to indicate the change of state and the identity of the job's host machine. Where parallel jobs have been specified, the qmaster will assign additional hosts and mark their status as running the selected job. Slave processes, however, are initiated by the dqs_execd managing the Master process, and not by the qmaster.
    3. The dqs_execd first records the job request information in its own "mirror" disk file, so that it may be retrieved in the event of a system restart while the job is executing. Then the job is prepared for execution. This process consists of first creating a separate UNIX process to monitor and manage the executing job. In DQS313 this is called the "shepherd" process. It is the presence of this "shepherd" which permits a single dqs_execd to manage multiple job executions on the same host, and which deals with the need for AFS re-authentication invisibly to the executing jobs.
    4. The first step for the "shepherd" is to establish an environment for the job which matches that of the submitting user, modified by the parameters in the job script and on the command line. Next the "shepherd" determines how the system and the user wish to handle the stdout and stderr files for the job. This is directed by DQS directives and the system-wide parameters in the "conf_file".
    5. If one of the forms of parallel job execution has been specified (the "-p" option in the DQS directives), the Master dqs_execd will "remote-shell" the DQS task "dsh" (distributed shell) to the target Slave hosts. The dqs_execd on each Slave host will start a process to manage the SLAVE task. (In this release of DQS 3.1.3 this task is NOT identical to the process shepherd and does not support AFS re-authentication of the SLAVE process.)
    6. After the user's environment has been setup and any SLAVE process managers started on other hosts, the DQS313 "process shepherd" sends the job startup notice (if requested) and then launches the job.
    7. The "process shepherd" then enters its own "sleep" loop, occasionally awakening to peek at the running job and copy output files (as directed) to their target directories.
    8. Upon job termination the "process shepherd" executes a system defined "add-on script" which usually performs additional job-cleanup operations. The dqs_execd then forms an accounting record including job execution statistics, which is sent to the qmaster, signaling the completion of all activities related to the job [a]. Any SLAVE processes terminate their own portion of a job independently. These SLAVE tasks are usually shut down by their master process, according to the methodology of the specific parallel paradigm: P4, MPI, TCGMSG or PVM.
    9. As with the "qsub" job submission program, all DQS313 utilities interact only with the qmaster. The qmaster rejects any requests if the originating computer is not in the cell's host list. The qmaster then checks to see if the user has permission to perform the actions. For example at most sites any user can request a display of the queue status (qstat command), while only a DQS administrator is permitted to add, delete or disable queues.
    10. Thus in this system a valid request by a user to delete one of their running jobs consists of the following sequence: the qdel utility sends the <job> request to the qmaster; the qmaster validates the request; the qmaster sends a job terminate message to the appropriate dqs_execd; the qmaster sends an acknowledgment to the qdel utility; qdel posts a message to the submitting user; the dqs_execd sends a UNIX SIGKILL to the job; job termination triggers the dqs_execd to gather usage data and send an end-job message to the qmaster; the qmaster logs the accounting information; the qmaster deletes all job information; the qmaster marks the queue as available for scheduling.

    Cells, Hosts, Queues

    In the previous section a diagram of the elements constituting a "cell" was displayed. A DQS313 site may have several independent cells, or they may be aggregated into a common operating environment:

    This example displays three cells, A, B and C, each managed by its own qmaster: QM-A, QM-B or QM-C. The hosts are labeled A1 and A2 for Cell-A; B1, B2 and B3 for Cell-B; and C1 and C2 for Cell-C. For this discussion we will assign the qmasters to a separate host in each cell: QM-A will thus be on host A0, QM-B on host B0 and QM-C on host C0.

    Communications among the various hosts in a cell and between cells are structured by the inclusion of a host within a qmaster's host list. In the above example qmaster QM-A has four hosts in its table: A0 (its own host), A1, A2 and B0 (the qmaster host for cell B). Instead of a completely symmetrical inter-cell arrangement, here we have chosen not to link QM-A with QM-C. Thus neither of these qmasters will have the other cell's qmaster host in its own hosts table.

    An option, which is less secure, is to permit a host from one cell to contact the qmaster in another cell (as shown by path [c]). In this case host B3 could execute utilities and perhaps launch jobs in Cell-C as well as Cell-B. Even without this "sneak path", hosts in cells A and C can interrogate the status of queues in Cell-B, if the user permissions allow such an activity.

    Note, once again, that a host in a cell may have no queues assigned to it for execution, or it may have one or more queues assigned to it. It is also quite common to have a dqs_execd running on the same host as the qmaster daemon. The DQS313 utilities can be executed on any host in a cell, regardless of whether that host is running a dqs_execd daemon.

    The first level of security within DQS is then a "trust" relationship among a cell's hosts and between each cell's qmasters. The next level of security is the level of permissions established by a qmaster's "manager" and "administrator" lists. The third level of security is defined by specific user permissions or exclusions for each queue. Certain activities are permitted to a DQS administrator or manager which a queue owner may not invoke. Among them are deleting the queue itself or changing its configuration. A queue owner and the DQS managers may perform activities such as queue suspension, which of course the average user is prohibited from doing.

    System Directories

    To manage system security, queues, jobs and user access, a number of directories are created during the startup process. The DQS administrator will normally not have to deal with these directories nor their contents. However, when all DQS files cannot or should not be cross-mounted, it is important that the function of these elements is understood so that they can be placed correctly in the system.

    Shared & Local

    As indicated in the installation instructions, the easiest method for managing a DQS system is to have all the system files and directories mounted by NFS/AFS or DFS on all hosts. The one exception is the directories containing the binaries for all DQS executables, which, of course, should only be shared by hosts with identical architecture and operating system configurations. A knowledgeable administrator may wish to make changes directly to the contents of one of these directories; where appropriate a hint or two are provided to assist the system manager. A typical installation will possess a directory tree somewhat like the following (names beginning with "/" are directories, other names are files):

    /usr
        /local
            /DQS
                /bin
                /common
                    /conf
                        conf_file
                        resolve_file
                        act_file
                        log_file
                        /qmaster
                            /QM-A
                                /common_dir
                                    acl_file
                                    complex_file
                                    consumables_file
                                    generic_queue
                                    host_file
                                    man_file
                                    op_file
                                    seq_num_file
                                /job_dir
                                    job1
                                    job2
                                    ...
                                /queue_dir
                                    queue-A1
                                    queue-A2
                                    queue-A3
                                /tid_dir
                                    tid_#xxxx
                                    tid_#xxxx
                                pid_file
                                stat_file
                                core
                        /dqs_execd
                            /host-A1 ... /host-An
                                /exec_dir
                                    script_file
                                /job_dir
                                    job1
                                    job2
                                    ...
                                /rusage_dir
                                    current_usage
                                /tid_dir
                                    tid_#xxxx
                                    tid_#xxxx
                                pid_file
                                core

    Four system files are classed as "should be shared by all hosts, if at all possible". They are:

    conf_file --- This file is created during the DQS313 "config" step of the installation or a system update. It contains system-wide configuration information which is read by the qmaster, the dqs_execd and all DQS utilities when they start up. If it is necessary to make changes to this file, the qmaster and all dqs_execd's should be shut down and restarted after the changes are complete, so that they will possess the latest configuration. Failure to observe this step may result in bizarre and unexplained behavior of the system, if not an outright collapse. If this file cannot be cross-mounted by all hosts, then an IDENTICAL COPY of this file needs to be distributed to all hosts before restarting the qmaster or dqs_execd daemons or any of the command utilities.

    The location from which this file is read is "hard-wired" into the compiled DQS code by the #define CONF_FILE statement in the dqs.h file, which is also created by the DQS "config" step. It is important to understand that the default installation setup places the conf_file in the "/usr/local/DQS/common/conf" directory, which is also used as the default location for the qmaster and dqs_execd spool directories. While those directories can be relocated by changing the conf_file and restarting the daemons, the location of the resolve_file and conf_file can only be changed by modifying "dqs.h" with an editor or by re-executing the "config" program.
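    For example, the hard-wired location might appear in dqs.h as follows (the path shown is the default quoted above; the RESOLVE_FILE macro name is an assumption, included only to illustrate the companion file):

        #define CONF_FILE    "/usr/local/DQS/common/conf/conf_file"
        #define RESOLVE_FILE "/usr/local/DQS/common/conf/resolve_file"  /* macro name assumed */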

    The following are the initial entries in the conf_file, with a description of each line's effect on the system.

    QMASTER_SPOOL_DIR /usr/local/DQS/common/conf

        This parameter points to the starting directory from which the qmaster's sub-directories are created. While at some sites with several cells the resulting tree can be shared by multiple qmasters, it is only necessary that the qmaster have access to the sub-directories for itself. This tree appears above as "…/qmaster/QM-A".

    EXECD_SPOOL_DIR /usr/local/DQS/common/conf

        This parameter points to the starting directory from which all of the dqs_execd's in the cell will find their individual queue management directories. In the default DQS setup all dqs_execd's in a cell use this same directory tree, terminating in their own specific set of sub-directories. This is illustrated in the preceding diagram by "../dqs_execd/host-A1".

    DEFAULT_CELL user-network

        The system-wide, unique name for a given cell. This can be any arbitrary ASCII string and defaults to the qmaster's host domain name during the installation process. If this name is changed the corresponding string in the "resolve_file" must be changed accordingly… and vice-versa.

    RESERVED_PORT TRUE

        This parameter indicates that all daemons and utilities in a cell will be using UNIX reserved ports for socket communications. UNIX system port numbers from 0 to 1023 are designated as "reserved". If this parameter is set to TRUE then all of the DQS313 programs MUST execute with root ownership. If this parameter is set to FALSE then the /etc/services port numbers for DQS313 services must be greater than 1024.

    DQS_EXECD_SERVICE dqs313_dqs_execd

        Any arbitrary ASCII string can be used to identify the tcp port number to be used when the qmaster or the DQS utility "dsh" is communicating with the dqs_execd. The only requirement is that this name must be unique among all names in the /etc/services file.

    QMASTER_SERVICE dqs313_qmaster

        Any arbitrary ASCII string can be used to identify the tcp port number to be used when the dqs_execd or DQS utilities are communicating with the qmaster. The only requirement is that this name must be unique among all names in the /etc/services file.

    INTERCELL_SERVICE dqs313_dqs_intercell

        Any arbitrary ASCII string can be used to identify the tcp port number to be used when one qmaster is communicating with another qmaster. The only requirement is that this name must be unique among all names in the /etc/services file.

    KLOG /usr/local/bin/klog

        The re-authentication process in AFS systems will use the klog program. This entry is only used when AFS support was selected during DQS installation.

    REAUTH_TIME 60

        If AFS has been selected, all daemons and executing jobs will be re-authenticated at intervals of this number of seconds.

    MAILER /bin/mail

        All jobs can select options to send brief "job startup", "job end" and "job abort" messages to one or more designated users. In addition, the DQS313 system will send mail messages to the administrator in the event of extraordinary system events.

    DQS_BIN /usr/local/DQS/bin

        The qmaster, dqs_execd and all user initiated utilities locate their binaries in the BIN_DIR established during the "config" step of installation. This entry is set by that step, and acts as a "place-holder" for that target directory. This parameter is, however, used by the parallel queue management system. If the administrator wishes, this parameter can be changed to point to a different directory where PVM, P4, TCGMSG and MPI support programs may reside. Doing so will not affect the continued use of the BIN_DIR for the remaining DQS executables.

    ADMINISTRATOR admin@host_machine

        On startup of the qmaster this entry is used to identify the primary DQS administrator for this cell. This also forms the email address used to send system error messages.

    DEFAULT_ACCOUNT GENERAL

        Any arbitrary ASCII string (without separator characters such as blanks, periods or commas) can be used as an account identifier. Each job submission can provide its own account identifier, which overrides this default string. No validation is performed on this or the user submitted account name string. When a job terminates, a record is created from hardware and software usage data; the account string is appended and the record is appended to the qmaster's "act_file".

    LOGMAIL FALSE

        By default, none of the mail generated by DQS, either to users or to the system's managers, is logged. Setting this parameter to TRUE will cause the qmaster to create a mail log file, where all system emails are recorded and time-stamped.

    DEFAULT_RERUN FALSE

        It is our sincere hope to have the rerun feature of DQS implemented in future versions. In DQS313 this parameter is ignored.

    DEFAULT_SORT_SEQ_NO FALSE

        During the qmaster's scheduling process two major steps occur. First the jobs themselves are sorted according to their submitted priorities and internal policy criteria. Second, all of the available queues are scanned to find one which suits the needs of the first job to be scheduled. The ordering of this queue scanning process can be changed by this parameter. When this parameter is FALSE all of the queue entries are sorted by their host's usage data (as reported by the dqs_execd), so that the first queue examined will be the least "busy" queue, in an effort to spread the workload across the system.

        If this parameter is set to TRUE the queues are examined in the order of the sequence numbers assigned by the administrator in each queue configuration. Many sites use this method to ensure that their most powerful hosts are scanned first, by assigning very low sequence numbers to the queues on those hosts.
  • SYNC_IO FALSE
  • In multi-host systems utilizing NFS-mounted files it is possible for I/O actions to become disordered in their results; the ordering of lines of output sent to stdout or stderr can become totally confused. DQS313 is intended to have a feature in its "process shepherd" to ensure that all stdout and stderr output is properly time-sequenced, even when multiple SLAVE processes are involved. In the initial DQS313 release this feature is not active.
  • USER_ACCESS ACCESS_FREE
  • This feature for differentiating levels of access for users or classes of users is not implemented in DQS313.
  • LOGFACILITY LOG_VIA_COMBO
  • Many system messages are generated to aid in the maintenance and diagnosis of DQS operation. Three files are used for this activity: the "err_file", the "log_file" and the "syslog_file". Depending on the level of attention required, messages are directed to one of these files. All messages with levels of ERR, CRIT or WARNING are always sent to the err_file. Messages with levels of INFO, WARNING or NOTICE can be sent to the system log or the normal activity log file. The normal mode is to use both the system log and the normal log file. In DQS313 the system log has been disabled, so that all non-error messages are directed to the "log_file".
  • LOGLEVEL LOG_INFO
  • Information is logged depending on the level assigned within the DQS. In increasing order of severity the levels are LOG_INFO, LOG_NOTICE, LOG_WARNING, LOG_ERR, LOG_CRIT, LOG_ALERT and LOG_EMERG. Setting the LOGLEVEL parameter establishes the minimum level of messages to be recorded. A value of LOG_INFO ensures that all messages will appear in the "log_file".
  • MIN_UID 10
  • MIN_GID 10
  • For security reasons it is desirable to establish a minimum user and group identifier (uid or gid) which will be permitted in execution of any of the DQS utilities. The qmaster and dqs_execd, of course, normally operate at root level. The recommended setting for these parameters is "10", as most critical UNIX processes run with uid and gid values below "10". It is strongly recommended that these default values be retained.
  • Attempts to run DQS utilities such as qsub, qalter, qstat, etc. as root will fail when these default values are used; this is the "correct", albeit confusing (to new system managers), behavior of DQS.
  • MAXUJOBS 10
  • There are a number of DQS "system policy" parameters available to the DQS313 administrator. One of these is a system-wide limit on the total number of jobs a user may have considered for scheduling at any one time. This is not a limit on the total number of jobs which a user can have queued up in the system, but it does instruct the qmaster not to consider more than MAXUJOBS jobs for a user during a scheduling pass. The effect of this limit can be quite subtle. For example, if a limit of 10 is established and a user submits 100 jobs, the jobs will be ordered in sequence of their priority and submission time. If the first ten of these jobs require system resources not currently available, they cannot be scheduled; neither will any of the following jobs, even though they may need only resources which are actually available. An additional per-user limit can be found in each queue configuration.
  • OUTPUT_HANDLING LEAVE_OUTPUT_FILES
  • When a job is started by the qmaster it may produce large stdout or stderr files. The writing of these files to a remote, NFS-mounted file system can have negative impacts on system performance. In some cases, retaining these files on a host's local filesystem can prevent network congestion and minimize I/O delays for the running job. DQS313 provides three options for handling these output files. The default, LEAVE_OUTPUT_FILES, causes the stdout and stderr files to be left in the working directory established by the user's "qsub" script.
  • This parameter can be changed to LINK_OUTPUT_FILES. In this case the administrator must create a special file in one or all of the dqs_execd spool directories. The name of this file defaults to "netpath" during the DQS "config" step. This default name may be changed in the dqs.h file by the administrator, if they are prepared to recompile the entire DQS313 system. The "netpath" file should contain one ASCII line defining the fully qualified network path of the target directory into which the stdout and stderr files are actually to be placed, as in the sketch below.
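    As a hedged illustration, assuming the "host:directory" form of a fully qualified network path (both names here are hypothetical), the "netpath" file might contain the single line:

    fileserver.user.com:/export/dqs_output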
  • If the parameter is set to COPY_OUTPUT_FILES the DQS313 "process shepherd" creates temporary standard output and standard error files local to the host executing the job. A special "copy" process is started which wakes up periodically (at an interval set by the hard-wired COPY_FILE_DELAY in the dqs.h file) and copies the current contents of those files to their actual destination.
  • ADDON_SCRIPT NONE
  • It is sometimes necessary to conduct system cleanup tasks at the conclusion of a user's job, in the working space of that job. This is particularly true of parallel processing tasks, which may leave "orphan" daemons running in the event of unplanned process termination. A system script maintained within the DQS can be created and invoked at the conclusion of EVERY user job. This parameter must then contain the fully qualified path-name of that script file.
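    A minimal sketch of such a cleanup script follows, assuming a PVM environment whose daemons leave /tmp/pvmd.<uid> and /tmp/pvml.<uid> files behind; the actual leftovers to remove will vary by site:

    #!/bin/csh
    # Hypothetical ADDON_SCRIPT: remove PVM daemon leftovers belonging
    # to the job owner. Adjust the file names for the parallel
    # environment actually in use at the site.
    rm -f /tmp/pvmd.`id -u` /tmp/pvml.`id -u`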
  • ADDON_INFO NONE
  • When OUTPUT_HANDLING is set to anything other than LEAVE_OUTPUT_FILES, the system administrator may wish to maintain a diagnostic awareness of the "process shepherd" handling of the copying or linking of a user's stdout and stderr files. If this parameter is set to something other than NONE, the parameter string should be a fully qualified path to a file containing an ASCII string to be appended to the "stdout" file along with other job information.
  • LOAD_LOG_TIME 30
  • Upon startup the dqs_execd uses this parameter (specified in seconds) as the minimum period at which it delivers system usage statistics to the qmaster.
  • STAT_LOG_TIME 600
  • Various system statistics, beyond the host usage provided by the dqs_execd daemons, are gathered periodically, based on the value of this parameter (specified in seconds).
  • SCHEDULE_TIME 60
  • The qmaster scans the cell's job queue after every new job is submitted to the system or upon termination of a running job. Absent these occurrences the qmaster will trigger a scheduling pass of the jobs based on this parameter (in seconds).
  • MAX_UNHEARD 90
  • The qmaster does not poll other daemons for their status. Instead it updates the queue status for each dqs_execd which reports in. If a dqs_execd fails to report to the qmaster within this threshold (seconds) the qmaster will mark all queues managed by that dqs_execd as "status UNKNOWN". This status is updated every interval, and will be changed from UNKNOWN to UP once the dqs_execd has succeeded in updating the qmaster.
  • ALARMS 3
  • ALARMM 4
  • ALARML 5
  • The admonition in the installation instructions to avoid changing these parameters is well founded. These parameters control the amount of time permitted before the UNIX system interrupts an attempt at inter-host communications. The ALARMS value is the time in seconds before a DQS utility such as qsub or qmod is interrupted. The user will see the message "Alarm Clock Shutdown", indicating that the utility could not contact the qmaster within ALARMS seconds. The ALARMM parameter sets a similar limit on dqs_execd<->qmaster communications attempts. ALARML is the longest period established for inter-process interchange attempts, and is used to control qmaster<->qmaster communications.
  • In systems where the qmaster host is also running other jobs, or where the network interconnect can become congested, it is possible for one or more communications attempts to fail due to an ALARM time-out. If the err_file contains frequent "ALARM CLOCK Shutdown" warnings, or utility execution fails often with similar error messages, the three ALARM parameters should be increased. These values should nevertheless be kept as small as practical to prevent a failing DQS element from tying up the host's tcp/ip interface.
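    For example, on a cell whose qmaster host is heavily loaded, the conf_file entries might be raised from their defaults to something like the following (values illustrative only):

    ALARMS 6
    ALARMM 8
    ALARML 10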
  • resolve_file --- This file is also created during the DQS "config" process. It is the equivalent of a combination of the UNIX "resolv.conf" and "hosts.equiv" files for managing network security. The default resolve_file is:

  • # NOTE! blank lines NOT permitted #
  • # NOTE! fields must be separated by one(1) AND ONLY one space #
  • # 1st field = cell_name
  • # 2nd field = primary qmaster
  • # 3rd field = primary qmaster alias
  • # 4th field = secondary qmaster
  • # 5th field = secondary qmaster alias
  • user-network QM-A0 QM-A0.user.com NONE NONE
  • The comment lines direct the DQS manager as to the format of new entries or entry changes. Some aspects of this file need further explanation.

    1. The cell name appearing in the first field of the first non-commented line MUST be identical to the name appearing as the DEFAULT_CELL parameter in the conf_file.
    2. DQS313 does not yet support alternate qmasters, and thus the last two fields of each non-commented line must be "NONE" and "NONE".
    3. Additional cells may be defined by adding lines to the resolve_file following the primary cell entry. If a host in one cell is permitted to contact a qmaster in another cell (via a "sneak path") then the cell name and qmaster name for that other cell must appear in the source cell's resolve_file, as in the sketch following this list.
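    As a hedged sketch (the second cell's names are hypothetical), a resolve_file permitting contact with a second cell might read:

    user-network QM-A0 QM-A0.user.com NONE NONE
    other-network QM-B0 QM-B0.other.com NONE NONE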

    err_file --- The qmaster, dqs_execd and all DQS utilities may originate error messages which are directed to a hard-wired filename, "err_file". This name is created during the DQS "config" step and implanted in the "dqs.h" include file in the ../DQS/SRC directory. The installation process assumes that all DQS313 programs will have write-access to the path name which appears as QMASTER_SPOOL_DIR in the conf_file. If this path name is inappropriate for ALL DQS programs the administrator may choose to change the definition of ERR_FILE in the include file "dqs.h". This will require recompilation of the entire DQS313 system.

    As an alternative, the administrator may choose to let each program write to its own "err_file" and gather and collate all the files when it is necessary to examine error information. In this case, however, the path-name accessible by each host must be identical to the QMASTER_SPOOL_DIR name.

    log_file --- The qmaster, dqs_execd and all DQS utilities may originate informational messages which are directed to a hard-wired filename, "log_file". This name is created during the DQS "config" step and implanted in the "dqs.h" include file in the ../DQS/SRC directory. The installation process assumes that all DQS313 programs will have write-access to the path name which appears as QMASTER_SPOOL_DIR in the conf_file. If this path name is inappropriate for ALL DQS programs the administrator may choose to change the definition of LOG_FILE in the include file "dqs.h". This will require recompilation of the entire DQS313 system.

    As an alternative, the administrator may choose to let each program write to its own "log_file" and gather and collate all the files when it is necessary to examine log information. In this case, however, the path-name accessible by each host must be identical to the QMASTER_SPOOL_DIR name.

    Qmaster

    The qmaster directory contains a major sub-directory for each qmaster registered in this cell. Each qmaster's directory contains four sub-directories whose contents change constantly during DQS313 operation, and which hence must permit write operations on all files. There are also two files created by the qmaster, the pid_file and the stat_file. An additional, unwelcome file may appear here as well: in the event of a qmaster crash, its core file will be placed in this directory.

    common_dir

    This directory contains files common to the scheduling and dispatching of jobs by the qmaster.

    complex_file --- This file contains all of the definitions of complexes created by the add complex command (qconf -ac).

    consumables_file --- This file contains all of the definitions of consumable resources created by the add consumable resource command (qconf -acons).

    generic_queue --- This file is read by the qmaster each time the create queue command (qconf -aq) is performed and no name is provided as a parameter following the "-aq" option flag. The contents of this file form the starting template presented in the editor for modification by the administrator.

    host_file --- The host_file is read at startup of the qmaster and contains a list of all the hosts known to the qmaster, occasionally called the "trusted hosts". Any program attempting to contact the qmaster must have its host's name in this list or be rejected. On the initial startup of the qmaster this file will not be present; the qmaster will post a warning in the err_file and create the host_file.

    man_file --- This file contains the login names of all individuals identified as cell "managers". A cell "manager" is given permission to access all DQS313 system files and to execute every option of every DQS313 utility.

    op_file --- This file contains the login names of all individuals identified as cell "operators". A cell "operator" is given permission to perform a number of system operations normally reserved to the system manager, and prohibited to the standard system user. The functions qdel, qmod, qmove, and qrls are permitted to operators. Functions such as creating or deleting queues, or adding and deleting managers and operators, are, of course, limited to cell managers.

    seq_num_file --- Jobs are assigned an internal sequence number. The next number to be assigned by the qmaster appears as a single binary value in this file. It is thus not possible to manually reset sequence numbers, other than by deleting this file, which forces the numbering sequence to begin again with "1".

    acl_file --- This file contains all of the access control list ("acl") names for all queues. This is actually a list of lists: an "acl" is a list of names to be given access to one or more queues. A queue definition can include these individuals by naming the corresponding "acl" in its "user_acl" parameter.

    job_dir

    This directory contains a file for each job currently in the queuing system. Each file contains the submitted script file along with tables and lists created by the qsub operation and used to manage the job while it is in the queue awaiting assignment to a host, as well as during actual job execution.

    queue_dir

    This directory contains a file for each queue. The file name is, in fact, the name assigned to that queue. Each file contains the queue configuration, encoded in binary form, along with various tables which the queue manager utilizes to manage the queues.

    tid_dir --- To maintain internal coherency during system operation, in the face of multiple hosts executing multiple processes, a unique identifier label is generated by the qmaster and dqs_execd for every inter-host communication. This label is called a "task identifier" or "tid". An empty file for each generated "tid" is created in this directory. An acknowledgment by the receiving host for a transaction causes the corresponding tid file to be deleted from this directory.

    In the event of aberrant behavior of a hardware or DQS313 software element some "orphan tids" may be found in this directory; however, the administrator is cautioned NOT to clear out tid files manually without careful analysis. This scheme was created to ensure inter-host synchronization despite multiple restarts of the qmaster or the dqs_execd.

    pid_file --- This file contains the process id of the running qmaster. This is a "canonical" location where site procedures may find this pid for system management actions.

    stat_file --- Based on the period defined as "STAT_LOG_TIME", the qmaster records summary information about all the queues it is managing. This data is time-stamped so that DQS managers can determine when unexpected queue status changes occur.

    dqs_execd

    The dqs_execd directory contains a major sub-directory for each dqs_execd operating in this cell. Each dqs_execd directory contains four sub-directories plus one file, the "pid_file", which contains the process id of the dqs_execd. Of course there is also the possibility of a core file being placed here in the event of a dqs_execd crash.

    exec_dir

    The exec_dir contains the actual job file for the executing job. When the dqs_execd launches a job, the script file is copied here and executed.

    job_dir

    The job_dir contains a file for each job which the dqs_execd is managing (usually only one). In addition to the job's DQS script this file contains all the tables and information necessary for the qmaster and the dqs_execd to manage this job.

    rusage_dir

    Upon job termination, usage data is collected and formatted into a "termination record" to be sent to the qmaster. This record is written to this directory and retained until the qmaster has received and recorded the information. This procedure prevents vital data from being lost, particularly from long-running jobs, in the event of an interruption of dqs_execd or qmaster service.

    tid_dir --- As in the qmaster directory, a unique "task identifier" ("tid") label is generated by the qmaster and dqs_execd for every inter-host communication. An empty file for each generated "tid" is created in this directory, and an acknowledgment by the receiving host for a transaction causes the corresponding tid file to be deleted.

    Temporary Files

    The dqs_execd creates and deletes a number of temporary files in the "/tmp" directory of its host. These are deleted after use, but if the dqs_execd has been shut down during job launching and execution these files may inadvertently be left in the "/tmp" directory. Since they are given names unique to the job execution they will remain until removed by the system manager.

    The Queue Configuration

    The queue configuration was introduced during the discussion of setting up an initial DQS313 cell and queue. The queue configuration is the primary means of tailoring a DQS system to a particular site's requirements. Unlike the more static "conf_file", the queue configuration can be changed dynamically by the DQS cell manager without requiring a shutdown and restart of either the qmaster or the dqs_execd. Changing the queue configuration will not affect any jobs already in execution; the modified configuration will be considered during the next scheduling pass of the qmaster after the change has been completed. A description of each element follows:

  • Q_name QA1
  • Any ASCII string of numbers and letters may be used in the queue name. It must be a unique queue name within a given cell.

  • hostname QA1_host
  • The hostname entered here may be any form of the host's name which is used by the network members. DQS will convert the entered name to the fully qualified host name and insert that into the registered queue configuration.

  • seq_no 0
  • The seq_no is an arbitrary sequence number assigned by the DQS administrator. It is ignored if the conf_file parameter "DEFAULT_SORT_SEQ_NO" is set to FALSE. If "DEFAULT_SORT_SEQ_NO" is set to TRUE the qmaster will scan the queue list in the order of the sequence numbers, starting with zero ("0").

    The DQS administrator may choose one of several strategies for assigning sequence numbers. At SCRI the lowest sequence number is assigned to the most powerful computing engines, with less powerful machines being assigned higher sequence numbers.

  • load_masg 1
  • Each dqs_execd collects information about the state of its host's overall computational and I/O load as reported by the UNIX system through the "rusage" structure. A "total system load" is provided as an integer value representing a fractional percentage of the system usage: a value of 1 represents a load of 0.01, a value of 10 represents a load of 0.10, and a value of 100 represents a load of 1.0.

    When DEFAULT_SORT_SEQ_NO is set to FALSE the qmaster attempts to assign jobs to the least loaded queues which meet the resources requested by the job. The queues are sorted into increasing order of load average, weighted by multiplying the reported load average by the "massage factor" (the load_masg value). For example, if one host reports a load of 50 with a load_masg of 1 (weighted load 50) and another reports a load of 40 with a load_masg of 2 (weighted load 80), the first host's queue is examined first. The load_masg factor thus permits the administrator to adjust the system-wide relationships between different hosts, which may be necessitated by variations in usage measurements or background task activity.

  • load_alarm 175
  • A threshold value can be set beyond which a queue will not be considered for scheduling by the qmaster. When a host reports a load average greater than this threshold it is in an "ALARM" state, and this flag is displayed in qstat output. The default load_alarm represents a load average of 1.75.

  • priority 0
  • This field may be confusing at this point because jobs also possess a submission priority. The difference is that the job priority determines only how a job is ordered among other jobs in competition for system resources. The job submission priority has no influence on the UNIX priority with which that job is executed.

    The queue priority field here IS the UNIX priority assigned to any job executed in this queue, and thus may range from -19 to +19 (where, following UNIX "nice" conventions, negative values give higher scheduling priority).

  • type batch
  • DQS was designed to support the scheduling and management of batch and interactive jobs. DQS313 supports only batch queues, so this parameter is ignored.

  • rerun FALSE
  • Automatic job rerun is not enabled in DQS313, so this field is ignored.

  • quantity 1
  • A DQS313 queue can manage more than one job in execution at a time, though this is usually not a practical way to operate a single-cpu host.

  • tmpdir /tmp
  • During job startup and execution several temporary files are created. This parameter should be the fully qualified path name of the host's temporary directory.

  • shell /bin/csh
  • The default shell for executing jobs in this queue. This default can be overridden by commands in the job script.

  • klog /usr/local/bin/klog
  • The path name to the AFS klog executable.

  • reauth_time 6000
  • The time period in milliseconds for performing an AFS re-authentication of the executing job.

  • last_user_delay 0
  • To prevent a single user from dominating the utilization of a queue, the administrator can set this time-out value (in seconds) during which a user's job will not be considered for scheduling following the termination of a previous job for that user.

  • max_user_jobs 4
  • This is the second system parameter available for implementing scheduling policies for DQS313 at a site. The MAXUJOBS parameter in the conf_file limits the total number of jobs a user can have considered for scheduling across the entire system. The queue configuration "max_user_jobs" establishes a limit on the number of jobs a user can have queued which will be considered for scheduling for this queue. See "SCHEDULING" for a more complete discussion of this topic.

  • notify 60
  • A user job may invoke the "-notify" option, which instructs the system to send the job a SIGUSR1 or SIGUSR2 signal as a warning in advance of a SIGSTOP or SIGTERM signal. This "notify" parameter in the queue configuration establishes the number of seconds between the sending of the warning signal and the SIGTERM or SIGSTOP.

  • owner_list NONE
  • In addition to the DQS manager and DQS operator, an individual can be designated a queue "owner". A queue owner can perform many of the system management tasks permitted to the managers and operators, but limited to this queue; job deletion, queue suspension, and enabling and disabling are among those actions. One or more login names can be entered for this parameter.

  • user_acl NONE
  • The administrator can create one or more access lists using the "qconf -au" command. This command adds one or more users to a named list (the named list will be created if it doesn't exist). These named lists (of names) can be used to include or exclude groups of users in access to a specific queue. This queue configuration parameter, "user_acl", can contain a list of one or more acl_list names which will be permitted to use the queue. (That is, the parameter can itself be a list of names of lists of names... confused?)

  • xuser_acl NONE
  • The administrator can create one or more access lists using the "qconf -au" command. That command adds one or more users to a named list (the named list will be created if it doesn't exist). These named lists (of names) can be used to include or exclude groups of users in access to a specific queue. This queue configuration parameter, "xuser_acl", can contain a list of one or more acl_list names which will be excluded from access to the queue.

  • subordinate_list NONE
  • One or more DQS313 queues can be subordinated to another queue. The queue specifying a list of subordinates with this parameter is called the "superior queue". A "superior queue" can NOT itself be a subordinate to another queue, and a queue can only be subordinated to one other queue. The "subordinate_list" parameter can contain a list of one or more queue names in the same cell as the queue defining this parameter.

    Superior queues are analyzed for scheduling in the same manner as all queues. If a job is assigned to a superior queue, the qmaster will suspend the execution of jobs in all of the queues in the superior queue's subordinate list, as sketched below.
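    As a hedged sketch (the queue names are hypothetical, and the exact separator for multiple names follows whatever the queue configuration editor presents), a superior queue whose jobs should preempt two desktop queues might carry the entry:

    subordinate_list QB1 QB2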

  • complex_list NONE
  • This parameter can contain one or more names of complexes defined by the "add complex" function of the qconf command (qconf -ac). See "Complexes and Consumables". Any complex name can be preceded by the DQS reserved word "REQUIRED" (must be all caps). This indicates that no job will be scheduled for this queue UNLESS it requests a resource described in that complex.

  • consumables NONE
  • This parameter can contain one or more names of consumable resources defined by the "add consumable" function of the qconf command (qconf -acons). See "Complexes and Consumables". Any consumable name can be preceded by the DQS reserved word "REQUIRED" (must be all caps). This indicates that no job will be scheduled for this queue UNLESS it requests a resource described in that consumable.

  • s_rt 7fffffff
  • h_rt 7fffffff
  • s_cpu 7fffffff
  • h_cpu 7fffffff
  • s_fsize 7fffffff
  • h_fsize 7fffffff
  • s_data 7fffffff
  • h_data 7fffffff
  • s_stack 7fffffff
  • h_stack 7fffffff
  • s_core 7fffffff
  • h_core 7fffffff
  • s_rss 7fffffff
  • h_rss 7fffffff
  • These parameters establish the "hard" or "soft" limits on a host's resource utilization by a job executing under the control of this queue. (The default value 7fffffff is the largest 32-bit signed value, i.e. effectively no limit.) The "hard" limits are transferred to the job's execution environment in the hope that the host operating system provides support for these limits. Note, however, that if a host does support these limits they apply only on a process-by-process basis! If a job script contains multiple invocations of processes, as in a FORTRAN compilation and execution, the limits apply to each individual step in the job.
  • DQS313 does check the "soft" and "hard" real-time limits (s_rt & h_rt) and will terminate jobs based on the values of those parameters. A job exceeding the "soft" real-time limit is sent a SIGTERM signal, which can be intercepted by the job using the "-notify" option in the job script. If the job exceeds the "hard" real-time limit it is sent a SIGKILL signal, which cannot be caught by the user job.
  • Complexes & Consumables

    The most valuable aspect of DQS, and easily its most confusing property, is the ability to define and utilize a variety of system "resources" which can then be requested in a user's DQS job script. These resource requests are used to differentiate and assign jobs to the variety of system capabilities found in today's heterogeneous computing environments. Let us look at an example of how and why resource definitions are created at a site, considering five DQS hosts with different capabilities.

    Many users will have created an application compiled for one machine architecture, say AIX. In this environment the user could run their application on one of the AIX machines by specifying the queue name, say QN1. The negative aspect of this simple approach is that the job may be kept waiting for QN1 because of a previous job on that machine, while either QN3 or QN5 might be available.

    The solution for this situation is for the DQS administrator to create a resource definition, called a "complex", covering all AIX machines in the cell, and to name it "AIX1". The user can then submit a job using the qsub command with the "-l" option. The steps needed to accomplish this are:

    1. A complex is created by typing "qconf -ac AIX1" (create a complex named AIX1)
    2. The default text editor is started and an empty page displayed. The administrator enters an arbitrary string such as "our_AIX", then saves the result and closes the editor.
    3. Now that we have a complex defined (AIX1) we can add that complex to a queue definition.
    4. Assuming that the queue has already been defined we will modify it using the qconf command. Typing "qconf -mq QN1" opens up another editor window with the complete queue definition displayed.
    5. Replace the "complex_list" parameter entry of NONE with AIX1 (the name given to the complex definition, NOT the contents of that definition).
    6. In the same manner add the complex name AIX1 to the queues QN4 and QN5.
    7. Advertise the resource name "our_AIX" to all users.
    8. A user can then direct their jobs to any one of the AIX machines by including the resource request "-l our_AIX" in their DQS job script, as in the sketch following this list.
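    For reference, a minimal job script exploiting this resource might then look like the following (the program and data names are hypothetical):

    #!/bin/csh
    #$ -l our_AIX
    myprogram mydata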

    This simple example illustrates two key points.

    1. The complex name is used by the administrator to assist in designing and managing collections of resources and queues. The complex name IS NOT USED by the user in resource requests.
    2. Resource requests in job submissions use the descriptions within one or more complex definitions.

    Let us expand the example slightly and create a new complex which cuts across machine architectures but captures a different shared attribute:

    1. Create a complex for systems supporting PVM by typing "qconf -ac PVM1".
    2. When the editor window opens, enter a single line "our_PVM".
    3. Save the file, close the editor, and advertise the resource name "our_PVM" to the users.
    4. Add the complex name PVM1 to the complex_list parameter of queues QN2 and QN4.
    5. A user wishing to submit a job to a queue which is running on an AIX machine which provides PVM support would use the resource request "-l our_AIX.and.our_PVM".

    So far the sample resource definitions have been a single string such as "our_AIX" or "our_PVM". An alternative form can be used for describing alternatives, as with AIX versus HPUX: this form would replace the string we entered in the complex files with "arch=our_AIX" and "arch=our_HPUX". The string "arch" is created by the administrator and could be any arbitrary name. A resource request would then have the form "-l arch=our_AIX" or "-l arch=our_HPUX".

    Resource definitions can contain numeric values, and the corresponding resource requests can perform numeric comparisons on these values to satisfy a criterion. A complex called BigMemory could be defined containing the line "mem=128". For our example let QN1 and QN2 both be operating on hosts which have 128 megabytes of memory each. The complex BigMemory would be added to QN1 and QN2. A request for an AIX machine with at least 64 megabytes of memory might be stated as "-l our_AIX.and.mem.ge.64".

    Resource definitions can also contain more than the single line of the examples above. A complex definition named "BIG_HUMMER" might look like:

  • AIX414
  • mem=1028
  • Horsepower=10
  • IO_bandwidth=250
  • A resource request which needs a BIG_HUMMER host would, in this case, look like:

    "-l AIX414.and.mem.ge.1028.and Horsepoer.ge.10.and.IO_bandwidth.ge.250"

    There is one type of resource we have singled out for special handling in DQS313: resources which are not static during the operation of a DQS cell. While machine horsepower, memory size, operating systems and compilers remain fixed for long periods of time (on the order of days or weeks), shared-memory multiprocessor cpus will have varying amounts of shared memory available to them as different jobs are executed on others of their cpus. An increasingly common resource situation is "licensed software" such as compilers and data-base management systems; in many cases there are fewer licenses available within a system than there are hosts able to execute the software.

    This type of resource is called a "consumable" in DQS313. The definition of a consumable resource is somewhat different from a DQS "complex", in that the administrator describes the total quantity of a resource which is available in the system, and the quantity of that resource consumed by a satisfied resource request. In the case of a FORTRAN compiler license, a site usually purchases a number of licenses for their system which are managed by a "license server". The consumable resource manager in DQS313 does not supplant a license server, nor can it effectively mimic such a server. Instead it provides a mechanism parallel to the license server which attempts to keep track of how many licenses are in use by DQS clients.

    The administrator defines a consumable resource by executing the command "qconf -acons FORTRAN" (using the compiler as an example). The default editor will open a window with the following template:

    Consumable xlf

    Available = <the amount of resources available>

    Consume_by = <quantum by which resource is reduced by a request>

    Current = <currently available resources>

    The Available field should be filled in with the number of FORTRAN licenses authorized to this system. The Consume_by value will be 1 for software such as compilers. The Current field will usually be equal to the Available field, unless there are licenses already in use at the time this consumable is being defined. The Current field is also used to reset the DQS313 consumable counter when DQS313 gets out of sync with the actual license manager.
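    A filled-in definition for a site holding, say, four xlf licenses with none in use would then read (the counts are illustrative only):

    Consumable xlf
    Available = 4
    Consume_by = 1
    Current = 4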

    Queues which must manage this consumable resource should then have the consumable name added to the "consumables" parameter list in the queue configuration. The user need not be aware of the distinction between standard complexes and consumables; their resource requests are stated in the same way: "-l our_AIX.and.mem.ge.64.and.xlf". The qmaster will determine if an xlf license is available by examining its internal counters (which may NOT match the license server's). If the license and other resources are available the job will be launched, and at the time the job is started the consumable count for the FORTRAN resource will be decremented.

    Upon job termination this resource count will be incremented. Obviously this is not a satisfactory situation for a user who submits a job which does a quick FORTRAN compile and then runs the resulting executable for a week: the consumable count would remain decremented for the duration of the job, while the license manager will have had the license "token" returned at the conclusion of the compilation.

    For this situation the cooperation of the user is required, to avoid breaking jobs up into separate compile-only and compute-only jobs. The "qalter" command has been modified to permit any user to execute it, but only with the "-rc" (return consumable) option. The user job would then have a script file which might look like:

  • #!/bin/csh
  • #$ -l xlf.and.our_AIX
  • xlf -o myprogram myprogram.f
  • qalter -rc xlf 1
  • myprogram mydata
  • The qalter command here specifies the name of the resource being returned, followed by the quantity being returned. When resources such as high-performance disk or shared memory are defined as a consumable resource, often a "quantum" of the resource is granted and recovered. For example, a UNIX page might be the minimum quantum, or an integral number of pages could form the quantum. Where licenses are normally doled out one at a time, memory might be allocated 1 MB at a time; hence the Consume_by field in the consumable definition.

    REQUIRED Complexes and Consumables

    A job submission may contain one or more resource requests (the "-l" option). A job with no specific resource requests is thus a candidate for assignment to any available queue. In many installations some queues are best utilized by very specific job configurations. An example might be a site which possesses a heterogeneous collection of cpus with very wide differences in computing capacity; in some cases the more powerful computers should not be assigned "tiny" but persistent jobs. DQS 3.1.3 provides a special keyword, "REQUIRED", which can precede any complex or consumable which a user MUST request in order for that job to be considered for scheduling on that queue.

    Job Scheduling

    The crux of any resource allocation and management system is its ability to provide resources in an "efficient" and "fair" manner. "Efficiency" is usually measured in terms of maximizing job throughput and effective utilization of the available resources, and can be quantified in ways usually referred to the hardware hosts in a system. "Fairness" is less easily described; it is often measured by perceptions and is most often referred to the human users of a system. Further, priorities for efficiency and fairness, and their relative values, can vary widely from site to site. The burden of meeting these objectives falls upon the system's job scheduling mechanism.

    Forty years of experience with attempts at creating comprehensive job scheduling algorithms has demonstrated several points:

    1. It is virtually impossible to produce a "one size fits all" algorithm which will satisfy the demands for efficiency plus fairness at every site.
    2. Scheduling systems which attempt to provide a 'flexible' software solution do so by offering to the administrator numerous parameters for adjusting the methods used for allocating resources. The plethora of variables presented is ultimately confusing if not confounding.
    3. Most sites with complex requirements and knowledgeable support personnel end up writing their own scheduling code or modifying the code provided with the system.

    DQS313 therefore attempts to provide only a minimal amount of job scheduling technology. Hopefully small sites will be able to achieve a good level of balance in host usage and perceived "fairness" with the system as it is delivered. As a site develops experience with batch job management, the staff will experiment with the few parameters provided in DQS. At some point the administrator will want to probe the module dqs_schedule.c, adding to or subtracting from its capabilities. To that end we will describe the basic features of DQS scheduling and try to illuminate the routines most likely to be modified.

    A user job passes through two screening processes before being considered by the qmaster for scheduling:

    1. At the time of job submission a user job is checked to see if it meets two system criteria:

    a. Are resources present in the system which meet the requirements specified for the job (usually through the "-l" parameter in a qsub script)?

    b. Is this user under the maximum threshold established for using system resources?

    If a job fails these tests it is rejected at the time of submission and an error message is returned to the submitting user. (In the event that a job is submitted in anticipation of resources being added to the system, such as a new host architecture, the user can choose to override the first test by using the "force" option ("-F") in the qsub command.)

    2. Once a user job has been accepted into the system it will be placed into the qmaster's job list where it will remain until it has been executed or deleted. If a job's submission exceeds the MAXUJOBS limit placed in the conf_file, it will remain in the queue BUT it will not be considered during scheduling passes by the qmaster.

    The qmaster conducts an examination (or "pass") over the job list:

    1. Every time a job is added to the list
    2. Every time a job terminates
    3. If neither of these events occurs, the qmaster will scan the list periodically, based on the number of seconds in the "SCHEDULE_TIME" parameter in the conf_file.

    The scanning process consists of sorting the jobs according to their submitted priority (the "-p" option), then by an internally generated "subpriority", and finally by the job sequence number (establishing submission order). After the jobs are sorted they are examined in order, testing each available queue (each ordered by load average or sequence number) looking for the first one which matches the resources requested by that job. If a match is found the job is dispatched and the next job is examined.

    Manipulation of a job's subpriority before the sorting step is the easiest way to affect the basic scheduling algorithm. In DQS313 this simply consists of increasing the subpriority field of a job based on the number of previously submitted jobs (at the same priority level) for that user. Thus two or more users with several jobs queued at the same priority and for the same system resource will have their jobs interleaved, so that no one user can dominate a resource by submitting a large quantity of jobs.

    The system administrator will probably experiment with this subpriority computation as a first step in customizing DQS. Flirting with the resource matching is considered to be a more risky affair as the side effects of such changes are harder to predict or detect.

    AFS Operation

    DQS313 provides a minimal AFS support capability. The introduction of the "process shepherd" has made the job re-authentication in DQS conform to AFS security requirements. The output file handling feature addresses the 'cross platform' security problems of dealing with stdout and stderr.

    Multi-Cell Operation

    A limited multi-cell operation capability is provided in DQS313. Jobs may be moved from cell to cell if they are not yet in execution, and users authenticated in one cell can view the status of the queues in another cell.

    Accounting

    Site accounting methods vary as widely as any other aspect of a batch processing system. DQS313 records as much information as possible about a job's scheduling and execution in a single ASCII line of text. Each entry is preceded by an ASCII string giving the standard UNIX GMT time of the entry.

    Extraction of the accounting information simply requires using a structure definition for the act_file entries in one's own "C" extraction program. An example of this technique is the program acte.c, which can be found in the ../DQS/tools directory. Also included in the tools directory is a script, "dostats", which employs acte to create a series of system summary files for the administrator.

    System Management

    The process of DQS system management first consists of laying out the physical and logical structure of a cell. The physical organization is described by adding hosts and assigning them to queues. The logical organization consists of defining resource "complexes" and consumable resources and assigning these to their appropriate queue hosts. Finally, setting system parameters in the conf_file and each queue configuration establishes the operating environment for DQS operation.

    The ongoing management steps should include:

    1. Review of the queue status information to spot queues in UNKNOWN or ALARM state. (DQS313 will send email to the administrator whenever possible, but a sudden crash of a daemon may only be detected from the qstat command display.)

    2. Regular review of the err_file, log_file, stat_file and act_file, looking for operational anomalies. Some will be obvious, such as dqs_execd's which have vanished or been restarted. One key thing to look for is a sequence of jobs aborting on the same host (a potential problem with DQS or the host) or a sequence of jobs aborting for the same user (which may point to a problem with the user's jobs or the user's permissions). Job aborts may be detected by examining the exit_status of jobs in the act_file.
    3. Changing queue parameters, adding and deleting jobs, and performing queue suspend/unsuspend or disable/enable operations as required.

    The majority of the DQS313 utility set and its options are provided for the system management function. While users may employ the qalter command, for example, to change the characteristics of a submitted job, more often the administrator will avail themselves of this function. A not-uncommon occurrence is for the administrator to increase the submission priority of a job to move it ahead of other jobs in the scheduling.

    One utility should be highlighted here: the "qidle" function. Many DQS hosts may actually reside on someone's desk and serve as their personal workstation. At the same time these machines are utilized for their computational capabilities in a cell. To serve both functions, it must be possible for the workstation user to have priority access to their machine and not suffer keyboard and mouse response deficiencies because the host is being shared with DQS. A first step is to make the "owner" of the workstation also an "owner" of all queues assigned to that host. Then, when the workstation owner wishes to have exclusive use of the machine, they will have DQS permission to suspend any queues on that machine.

    Enter the "qidle" utility. This is an X-Windows based program, since we presume that workstation users will be operating with X-Windows. It can be started at any workstation and performs the following functions on behalf of the workstation "owner" who the administrator has also designated a queue "owner" in the queue configuration.

    1. If the workstation mouse and keyboard are used in some way (mouse movement, button clicks, keyboard typing), all queues on that host are suspended.
    2. If the keyboard and mouse have not been used for a period of time specified in the qidle command, then all queue suspensions are removed.

    What happens in the case where more than one user has access to a workstation? The "system console" is an example where many users may be permitted to operate the keyboard and mouse. Making all users "owners" of that station's queues could result in an unmanageable list and is a potential security problem, since a queue owner has privileges beyond queue suspension actions.

    The qidle in DQS313 has thus been modified from its DQS 3.1.2.4 form. It is now a member of the DQS313 utilities group and communicates directly with the qmaster rather than indirectly through the qmod utility. It can be started on any workstation by any user who has permission to login to that workstation. Once started it performs the same functions described above.

    Problem Solving

    Solving Installation Problems

    Most installation difficulties fall into one of five categories (in order of probability):

    1. One or more bugs remain in the DQS 3.1.3 installation procedure. This release has not been tested on all available UNIX platforms (hardware or software versions).
    2. The interactive interface has produced messages or questions which may confuse the reader. Some of these are natural warnings from the make process or compiler. A few will be labeled "error" even when they do not affect the installation process; these often occur when an installation is being performed over an old one and the target directories already exist.
    3. The administrator is running as non-root and attempting operations not permitted in that mode.
    4. Host machines to be used for the qmaster and/or dqs_execd do not have uniform access (through NFS, AFS or DFS) to the DQS binary files or the spool directories defined during the installation procedure.
    5. Attempts to use qstat313, qsub313, etc. receive a message "... unable to contact qmaster". This is usually due to a user trying to invoke one of the DQS utilities on a host not known to the qmaster. The qmaster maintains a list of all "trusted hosts" in the cell which it manages. Hosts are added automatically when a queue is configured for them ("qconf313 -aq") or by an explicit host addition ("qconf313 -ah <host name>").

    Identify the symptoms of the installation failure and refer to one of the following sections:

    INSTALL fails during the make process of the "config" program.

    The GNU configure program uses the "Makefile.in" template in the DQS/CONFIG directory to produce the Makefile for the DQS config utility. It is possible that a new configuration of compilers or linkers can cause the GNU facility to create an erroneous Makefile. Visually check the Makefile for correctness.

    Although DQS313 installation has been tested on many platforms, variants of the compiler or operating systems can create WARNING messages during the compilation which we have not made provision for. Even different versions of GNU "C" yield different warning messages. If the error is fatal to the compilation please contact the DQS313 support team for assistance.

    INSTALL fails during the execution of the DQS config program.

    During the config process the system attempts to create a number of directories and sub-directories. The default starting point for this process is the current working directory of the user if running as non-root, or /usr/local/DQS if running as root. If any of the directories already exist, an error message is displayed on stdout, but the config program continues. If the user discovers that they have erroneously specified directory names, config can be interrupted by typing CTRL-C. This will unwind many aspects of the configuration process; however, NO DIRECTORIES will be removed, and the administrator will have to clean up any relevant directories manually. After reviewing the "directory already exists" messages the administrator can choose to ignore those which are expected because the directories were previously created.

    INSTALL fails during the "make" process.

    During the DQS config step, all of the target directories are created except for the ones associated with the compiled object output ('.o' files) and the interim executables (qmaster, dqs_execd…). If a previous installation occurred under a "root" user and the current "make" is being done as non-root, the attempt to create the ARCS sub-directories will fail for lack of permissions. The solution is to perform the "make" as root, or to change the owner of the ARCS sub-directories to the user doing the installation of DQS313.

    The GNU CC compiler is chosen as the default compiler for the "make" process if it is available. Some sites may experience a large number of "gcc" warning messages if there have been local modifications to the gnu include files. If this occurs, or if the site prefers to use the native "C" compilers, then the following steps should be taken:

    1. Stop the "make" operation. The GNU configure program and the DQS config utility will have been executed and all Makefile templates will contain the GCC default. Change directory to …DQS/SRC and edit the Makefile.proto file.
    2. Search the Makefile.proto for any lines which match "CC=gcc" and replace the string "gcc" with the native compiler name (usually "cc").
    3. Change directory back to the base directory, …DQS and type "make" to restart the process.

    If only "warning" messages appear in the stdout results you can feel reasonably secure with the installation. However we will try to eliminate these in future releases and would appreciate receiving information on these occurrences. If an error fatal to the compilation occurs please contact the DQS support staff.

    INSTALL fails during the "make installbin" phase

    Once the make process has created the temporary executables in the ARCS directory they should be moved to their "final resting place" as chosen during the DQS config step. For operational installations this step should be performed as root. If the INSTALL script was started as non-root and the target directory requires root permissions the INSTALL process will fail at this point.

    If this occurs the administrator should switch to "root", change directory to …./DQS and type "make installbin".

    Since the DQS config process attempts to create the BIN target directory, this phase may generate several warning messages that "directory already exists". Ignore these warnings. If, however, the message is "error, permission denied", the process should be repeated in "root" mode.

    To prevent confusion between DQS313 binaries and previously installed versions, the string "313" is appended to each binary name during the installbin process. The usual next step is to provide soft-links in /usr/local/bin to these binaries, something of the form:

    "ln -s /usr/local/DQS/bin/qmaster313 /usr/local/bin/qmaster

    INSTALL fails during the "make installconf" phase

    After the binaries have been installed in their directory, the "resolve_file" and "conf_file" will be moved to their target directory (a possible default might be "/usr/local/DQS/common/conf"). In our "quick install example" this process should proceed automatically. If the INSTALL script was initiated by a non-root user and the destination directory is restricted to the root user, this step will fail with a "permission denied" error message. Note that when a series of different platform types are being aggregated into a single cell, only one conf_file and resolve_file need be moved to the common/conf directory; if this has already been done then this step can be skipped.

    Startup of the qmaster fails.

    The principal reason for the qmaster not executing during initial testing is the absence of the /etc/services entries directed by the installation process. The err_file should be examined. Warning messages about absent hosts, acl and complex files should be ignored. Look for an entry "Bad Service", which points to the /etc/services file.

    An obvious error, but one which occurs often, is trying to start the qmaster in user-mode while RESERVED_PORTS TRUE appears in the conf_file.

    If attempts at starting the qmaster still fail after checking root-mode and the /etc/services file, the administrator should set the environment variable DEBUG to 1 and then restart the qmaster as follows: "qmaster313 >& debug.out &" (assuming a C shell environment; see below). After the qmaster crashes, send the file "debug.out" to the DQS support staff.
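    In a C shell this amounts to:

    setenv DEBUG 1
    qmaster313 >& debug.out &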

    Startup of the dqs_execd fails

    The principal reason for the dqs_execd not executing during initial testing is the absence of the /etc/services entries on its host, as directed by the installation process. The err_file should be examined. Warning messages should be ignored. Look for an entry "Bad Service", which points to the /etc/services file.

    An obvious error, but one which occurs often, is trying to start the dqs_execd in user-mode while RESERVED_PORTS TRUE appears in the conf_file.

    If the dqs_execd is not able to check in with the qmaster during dqs_execd startup, the daemon will shut down (once executing, the dqs_execd will not shut down if the qmaster is absent). Make sure the qmaster is running before attempting to start the dqs_execd.

    If attempts at starting the dqs_execd still fail after checking root-mode and the /etc/services file, the administrator should set the environment variable DEBUG to 1 and then restart the dqs_execd as follows: "dqs_execd313 >& debug.out &" (assuming a C shell environment). After the dqs_execd crashes, send the file "debug.out" to the DQS support staff.

    Startup of qconf fails

    If the first attempt at using qconf produces error messages and qconf terminates, there are several possible causes:

    1. The user is attempting to execute qconf in root-mode while the MIN_UID and MIN_GID are non-zero. For security reasons root users are not normally permitted to execute DQS utilities unless the MIN_UID is set to zero.
    2. qconf is being started in user-mode but the utility itself is NOT owned by root and does not have the permissions for the owner set correctly. This can occur when a manager uses a path to the ARCS directory for the utility rather than the BIN_DIR target where installbin is supposed to put all DQS binaries.
    3. qconf is being started on a host not yet known to the qmaster. Here we have a cart-and-horse situation: we need to use the qconf function to add hosts, but cannot execute qconf because its host is not "legal". The only solution is to initiate qconf on the same host where the qmaster resides.

    qstat display shows queue status as UNKNOWN

    During the initial test phase, the manager will have created one queue using qconf. After it has been created, execution of qstat should show the presence of a queue with a status of DISABLED. An UNKNOWN status indicates a failure of the dqs_execd to contact the qmaster in the time prescribed as MAX_UNHEARD in the conf_file. Check the err_file for messages relating to the dqs_execd being unable to contact the qmaster. Since the dqs_execd would not even start if it could not check in with the qmaster, some new problem must have developed; check to see if the dqs_execd is still running.

    qsub fails to submit test job

    The test script should be accepted by the DQS system at this point with no problem, since utility<->qmaster interaction has been operating successfully in the previous steps. The most likely reason for a failure of this qsub test is a message of the form "ALARM CLOCK shutdown". This is due to the qmaster or the network interfaces being overburdened; often the host on which the qmaster is running may be executing some non-DQS-managed computational hog. If the ALARM message occurs, try increasing the ALARM values in the conf_file and re-executing the qsub command. (Note that for this experiment the dqs_execd and qmaster need not be restarted after changing the conf_file, as the qsub is the only one complaining. However, if the new values of the ALARM parameters prove satisfactory, the daemons should be restarted as soon as practicable.)

    Test job ends with no output

    If the permissions for the user submitting the test script are not sufficient for the target host the job launching process will be terminated and a message sent to the err_file. An accounting record will also be sent to the DQS act_file. Check these files for information.

    Test script produces a non-zero length stderr file

    The test script should create two output files, one containing the stdout information and the other the stderr output. If the stderr output is not of zero length then some "very unlikely" event occurred during the job execution. Examine this stderr file and the err_file to determine the cause.

    Operational errors

    Once the system has succeeded in running the test script, the administrator will configure hosts, queues and resources for its operational settings. A myriad of situations can then occur which may appear to be, or in fact are, DQS system errors. For this reason DQS produces a large number of informational, warning and error messages which are posted to the system err_file.

    In the event that an operational aberration is detected, the err_file should be examined closely. If no explanation is obvious, the DQS support staff should be contacted and sent a relevant extract from the err_file and act_file.
