Roberto Tejero's Personal Page
Distributed Queueing System - 3.1.3
User Guide
March 31, 1996
Introduction
DQS is actually a simple system which provides a multitude of options to accommodate the requirements of a wide variety of sites and users. As the number of options increases, as it does with each succeeding generation of DQS, a user might mistakenly come to view the system as quite complex. This user guide is intended to provide an introduction to DQS for the new user and to explain the features most often used by the experienced user. In particular, the concept of "resources" is explored, with attention focused on the new DQS 3.1.3 feature, "consumable resources".

A DQS job
Any job which a user needs to execute on one or more computers can be a "DQS job". For those whose sole contact with computers has been through personal UNIX workstations, the concept of running jobs in "batch" mode may be somewhat disconcerting. Users accustomed to submitting their jobs to mainframe computers will find the attributes of DQS more familiar. Unlike a mainframe system, however, the DQS batch environment usually includes multiple autonomous UNIX-based computational platforms that differ in hardware architecture and operating system variant.

In its most fundamental form, a DQS job is an extension of a UNIX script used to run an application, as one might run it on one's own workstation. Let us use the "traditional" example of compiling and executing a simple FORTRAN application:

    FTN test.f -o test
    ./test

where test simply produces the classic "Hello World" output on standard output. To run this same application within a UNIX script, we would create a file called "test.run" containing the following lines:

    #!/bin/sh
    FTN test.f -o test 2> test.errors
    ./test > test.out

Note that the compiler's error output (stderr) is redirected to "test.errors" and the program's standard output (stdout) to "test.out". The user would then execute this script on a machine of their choice, most likely their own workstation.
What, then, is needed to turn this script into a DQS job? Nothing, as long as one does not care which machine it will be executed on. All that is needed is to "submit" the job to the DQS batch queuing system.

Submitting a job
The simple example becomes a "DQS job" when it is submitted to the DQS system with the "qsub" utility:

    qsub test.run

The qsub utility contacts the qmaster and requests that the job be "validated" for execution within the system. This validation process determines whether or not the job requires something which does not exist in the current system. Since our test script makes no explicit requests for resources (the FTN command is not recognized as a request for a compiler resource known to DQS), all that is needed is for any host in this hypothetical cell to be idle and available to execute the job.

Let us now take advantage of some basic DQS facilities. First, we would like an email message sent to us when the job terminates. We instruct DQS to perform this task by inserting a "DQS directive" into the test.run script. By default, DQS interprets a line of the script as a DQS directive if the first two characters of the line are "#$". The user can change this prefix (see the qsub -C option in the Reference Manual). Thus we add one line to our script:

    #!/bin/sh
    #$ -me
    FTN test.f -o test 2> test.errors
    ./test > test.out

The DQS directive "#$ -me" tells the system to send a mail message to the person submitting the job when the job ends. We could also have asked for mail at the beginning of the job and if the job aborts, with the directive "#$ -meab". The order of the letters 'e', 'a' and 'b' in this list is not significant. Note that the directive could also be passed to DQS on the qsub command line.
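To see which lines of a job script would count as directives under the default "#$" prefix, a small shell function can list them. This is a minimal sketch, not a DQS utility: the name list_dqs_directives is hypothetical, and it only imitates the prefix rule described above (it does not interpret the options). The optional second argument stands in for a prefix chosen with qsub -C.

```sh
# list_dqs_directives FILE [PREFIX]
# Hypothetical helper (not part of DQS): print the text after the directive
# prefix (default "#$") on every line of FILE that begins with that prefix.
list_dqs_directives() {
    prefix='#$'
    if [ $# -ge 2 ]; then
        prefix=$2                      # e.g. a prefix chosen with qsub -C
    fi
    # "|| [ -n "$line" ]" also handles a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "$prefix"*)
                rest=${line#"$prefix"}
                # strip leading blanks so "#$ -me" yields "-me"
                rest=${rest#"${rest%%[! ]*}"}
                printf '%s\n' "$rest"
                ;;
        esac
    done < "$1"
}
```

Run against the four-line test.run above, list_dqs_directives test.run would print only "-me"; the "#!/bin/sh" line does not start with "#$" and is ignored.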
Instead of inserting the directive in the script, we could perform the submission with:

    qsub -me test.run

This approach is convenient when only a few directives are needed, but, as the user will see, many job submissions benefit from more complex sets of DQS directives, which are better "captured" in the job script.

Querying job status
Once users have relinquished their jobs to the "welcoming arms" of a queuing system, they need a means of monitoring and controlling their destiny. A first step is to query the system to establish the status and "DQS identity" of the job. The "qstat" utility displays the state of queues and jobs. There are three forms of this display:
The simplest way to get in touch with our job, then, is to execute the command:

    qstat

and scan through the output looking for the jobs we have submitted. Instead of being deluged with information about every other job in the system, one can execute:

    qstat -u <my user name>

where <my user name> is the login name of the user who submitted the job. The output of this variant might look like:

    <my user name> my-job-name dqs-job-number 0:0 QUEUED 03/25 20:40

which would dismay <my user name>, since the job is not RUNNING on any machine in the system. It is, however, queued with a priority of zero (the number to the left of the colon in "0:0") and a sub-priority of zero (the number to the right), indicating that there are no prior jobs for this user. More optimistically, the display might offer:

    <my user name> my-job-name a-host-name dqs-job-number 0:1 r RUNNING 06/26/97 12:35: 8

which would hearten us, because our job is (apparently) executing. Let us examine the output of qstat -f -u:

    capuchon$ qstat -f -u user
    Queue Name   Queue Type   Quan   Load   State
    ----------   ----------   ----   ----   -----
    hardhat      batch        1/1    0.00   er UP
            user foo 144 0:1 r RUNNING 06/26/97 12:43:42
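The priority and sub-priority can be pulled out of a qstat job line with awk. This is a minimal sketch that assumes, as in the sample lines above, that the first field of the form N:M is the priority field and that it precedes any HH:MM time stamp; the function name show_priority is hypothetical, and the qstat output format may differ between versions.

```sh
# show_priority
# Hypothetical helper: read qstat job lines on stdin and, for each line,
# print "priority sub-priority" taken from the first field shaped like N:M.
show_priority() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^-?[0-9]+:[0-9]+$/) {   # e.g. 0:1 or -5:0
                split($i, p, ":")
                print p[1], p[2]
                break                         # ignore later fields such as 20:40
            }
        }
    }'
}
```

A typical use would be qstat -u <my user name> | show_priority, which prints one "priority sub-priority" pair per job line.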
The job line of the qstat -f display is less cryptic. It begins with the user name, followed by the job name (foo) and the DQS-assigned job number (144). The values 0:1 give the submission priority of the job, which defaults to zero, and the sub-priority, 1, which indicates that this is the first job running for this user. The user assigns the submission priority with the qsub option flag "-p", while the sub-priority is an internal parameter computed for all the queues during each scheduling pass. The command "qstat -ext" produces a comprehensive display of queue and job parameters in addition to the status obtained with the "-f" option. Relevant portions of these extended displays are discussed in later sections.

Modifying a job request
Users will often find one or more of their jobs in the pending queue awaiting assignment to an execution queue. After reviewing their pending jobs, a user may decide to change a job's submission parameters to affect its future scheduling. One way to do this would be to delete the job and resubmit it. A more convenient technique is to use the QALTER utility to modify one or more of the parameters which the user assigned at QSUB time, or which DQS defaulted when the user did not set them explicitly. In the simple example given here, the user provided no parameters to the QSUB command, and hence the submission priority has been set to the default value of zero. To increase that priority, the user would invoke the QALTER utility with:

    qalter -jid <job number> -p <new priority>

The <job number> is the number DQS assigned to the job in the pending queue, and the <new priority> value must be in the range -1024 to +1023. Apart from the job number, any parameter which can be given to the QSUB command can be used with the QALTER command, including a replacement for the script file which originally accompanied the QSUB command.
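Because QALTER expects the new priority in the range -1024 to +1023, a small wrapper can check the value before calling it. This is a minimal sketch: the function names is_valid_priority and qalter_priority are hypothetical, and only the qalter -jid ... -p ... form shown above is assumed.

```sh
# is_valid_priority VALUE
# Hypothetical helper: succeed only if VALUE is an integer (optionally
# signed) in the range -1024 to +1023 given for the -p option.
is_valid_priority() {
    case $1 in
        # empty, a bare sign, non-digit characters, or a sign after position 1
        ''|[+-]|*[!0-9+-]*|?*[+-]*) return 1 ;;
    esac
    n=${1#+}                                  # drop a leading "+" before comparing
    [ "$n" -ge -1024 ] && [ "$n" -le 1023 ]
}

# qalter_priority JOB PRIORITY
# Hypothetical wrapper: validate PRIORITY, then run the qalter command.
qalter_priority() {
    if ! is_valid_priority "$2"; then
        echo "priority must be an integer from -1024 to 1023: $2" >&2
        return 1
    fi
    qalter -jid "$1" -p "$2"
}
```

With this check, qalter_priority 144 2000 fails locally with a message instead of sending an out-of-range value to DQS.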
The QALTER command may not be used for jobs already in the RUNNING state, with the exception of returning "consumable resources" (see below).

Holding and deleting jobs
The user has a number of tools available to work with their jobs once those jobs are in the queuing system. For example, they may decide to place a "hold" on one of their jobs in the pending queue, so that another job may progress ahead of it or to delay scheduling until some other event or job has occurred. First, the user may choose to submit a job with a "hold" already placed on it, using the "-h" option of the QSUB command. Once a job has been submitted, the user can use the "QHOLD" utility to place a hold on it, provided it is still in the PENDING queue. QHOLD uses the same "-h" option. The "-h" option serves system administration tasks as well as users, so the DQS 3.1.3 Reference Manual describes four alternatives; users are permitted only the "u" (user hold) and "n" (no hold) variants. Thus at job submission the user might place a hold:

    qsub ... -h u ... test.run

or, if the job is in the pending queue:

    qhold -jid <job number> -h u

Once a hold has been placed on a job in the pending queue, the job will not be considered eligible for scheduling until it has either been "released" from the hold or deleted from the queue entirely. A job can be released from a user-invoked hold with the QRLS utility:

    qrls -jid <job number> -h u

or the user may modify the hold state with the QALTER command:

    qalter -jid <job number> ... -h n

which sets the user-accessible hold to "none". A user may delete one or more of their own jobs from the queuing system if the jobs are in either the pending queue or the executing queue:

    qdel <job number>

or:

    qdel <job number>,<job number>,...

Note that the job numbers are separated by commas and NOT spaces.
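Since qdel expects its job numbers separated by commas rather than spaces, a list of numbers held as shell arguments can be joined before the call. A minimal sketch; join_job_ids is a hypothetical helper, and only the comma-separated qdel form shown above is assumed.

```sh
# join_job_ids ID...
# Hypothetical helper: print the arguments joined by commas, e.g. for qdel.
# The body runs in a subshell ( ... ) so the IFS change does not leak out.
join_job_ids() (
    IFS=,
    printf '%s\n' "$*"   # "$*" joins the arguments with the first char of IFS
)
```

For example, qdel "$(join_job_ids 101 102 103)" runs qdel 101,102,103.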
Requesting resources
The simple example we have been using so far (test.run) has made no unusual demands for system resources. It presumes that all queues in the system have a FORTRAN compiler and that the FORTRAN dialect in our test program is accepted by all the compilers. Further, memory, disk space and database locality are of no consequence in this example. In most cases these are unrealistic assumptions. Most sites using DQS contain heterogeneous collections of hardware and software and often subdivide these collections by type of use (long-term jobs, short-term jobs, etc.). The DQS administrator is supplied with tools for organizing the system and defining the resources accessible to the user. Typical resources are CPU memory sizes, hardware architectures and operating system versions.

Hard and soft resources
Most jobs will have one or more imperative requirements. One of the most common is the need for a particular hardware/software system (e.g. AIX-3.2.5). By default, requested resources are considered essential (or "hard") unless the user precedes the request in the QSUB command with the option "-soft". Requirements for multiples of various resources in parallel jobs, such as two or more CPUs, can be either "hard" or "soft". Many users choose to request at least two CPUs to run their parallel job as a hard requirement, and then request more processors after the "-soft" option flag on the QSUB command line or in the job script. While a non-parallel user might expect to use the "-soft" option for a request of the form "I need at least 32 MB of memory but would be much happier with 64 MB", most site resource allocations will not make effective use of such a request. The most common use of the "-soft" option for non-parallel jobs is to state a preference for a queue without making it a "hard" demand.

Consumable resources
Site resources are by and large static over periods of days or weeks.
CPU memory sizes and CPU computing power are not subject to moment-by-moment changes, and when they are modified, the DQS site manager can adjust the resource descriptions to match the new configurations. There is, however, a class of resource which varies within short periods of time. A very common commercial practice these days is to manage software licenses for compilers, database managers, etc. dynamically at a given site, since many sites do not purchase licenses for all of their extant platforms. A job submitted to DQS must not be scheduled for execution if it needs one or more software licenses in order to complete but those licenses are already in use by another job. Another common time-varying resource is the amount of shared memory available to a processor in a shared-memory multiprocessor system. Shared local disk space might be another resource which is depleted and restored as jobs start up and terminate. DQS calls resources of this type "consumable resources".

Forming resource requests
A user specifies the resources they require on the QSUB command line or in the DQS script file. The most direct method is to name a specific queue as the place for the submitted job to execute:

    qsub -q <my queue>

That request will require <my queue> for execution. If the user would prefer, but not insist on, that queue, they might make the command-line request:

    qsub -soft -q <my queue>

Note that DQS scans the command line and script directives from left to right. During that scan, any resource request following a "-hard" or "-soft" option flag (up to the next such flag) is interpreted as requiring that type of resource. Hence one could mix hard and soft resources thus:

    qsub -soft -q <my queue> ... -hard <some other resource> ...

The typical job request will not demand a specific queue. Instead, the user will request one or more classes of resources which have been established by the DQS administrator.
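The left-to-right rule can be illustrated by a small shell function that labels each -q or -l request with the most recent -hard or -soft flag, hard being the default. This is a sketch of the rule as described in this guide, not DQS's actual parser; classify_requests is a hypothetical name, and every other option (and any argument it takes) is simply skipped as an ordinary word.

```sh
# classify_requests ARG...
# Hypothetical sketch: print "hard|soft OPTION VALUE" for every -q or -l
# request, using whichever of -hard/-soft appeared most recently to its left.
classify_requests() {
    mode=hard                        # requests are hard unless -soft precedes them
    while [ $# -gt 0 ]; do
        case $1 in
            -hard) mode=hard ;;
            -soft) mode=soft ;;
            -q|-l)
                if [ $# -lt 2 ]; then
                    echo "missing value after $1" >&2
                    return 1
                fi
                printf '%s %s %s\n' "$mode" "$1" "$2"
                shift ;;             # consume the value as well
        esac
        shift
    done
}
```

For instance, classify_requests -soft -q myqueue -me -hard -l AIX325 prints "soft -q myqueue" followed by "hard -l AIX325".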
Let us presume a site with three different hardware platform architectures, each with several CPUs available. The site administrator has named the resources after their operating system tags: AIX325, IRIX53 and SOLARIS24. In addition, this example site owns one FORTRAN license for each of the operating systems, which the administrator names XLF, SGIFTN and FORTRAN. To further complicate our example, each brand of CPU has a different amount of memory on each of its three separate CPUs: 32 Megabytes, 64 Megabytes and 128 Megabytes. The example we have been using (test.run) will now be submitted in a more realistic manner:

    qsub -me -l 'AIX325.and.(mem.gt.32).and.(XLF.eq.1)' test.run

The command line now has the resource request appended to it. Requests for resources other than specific queue names begin with the "-l" flag and consist of a string of resource names interspersed with logical and relational operators. Since the string must have NO embedded blanks, parentheses may be used to aid readability. (When the request is typed at a shell prompt, quote it as shown, because the shell treats unquoted parentheses as special characters.) The resource request is interpreted by DQS as follows:
A command line or DQS script may contain one or more request strings beginning with the "-l" option flag. Each of these strings requests at least one queue that meets the requirement. Thus:

    qsub -l AIX325 -l AIX325

would request that two queues/CPUs be allocated to this job. The same request can be stated more simply as:

    qsub -l '(qty.eq.2).and.AIX325'

Depending upon the topology of the DQS site and the requirements of a given job, resource requests can contain a number of elements. Obviously, parallel jobs will require more complex resource requests than simple single-processor jobs.

Note: Relational operators can be given in FORTRAN or "C" syntax (.eq. ==, .ne. !=, .lt. <, .gt. >, .le. <=, .ge. >=). Logical operators can also be given in either language's syntax (.and. &&, .or. ||, .not. !). For compatibility with DQS 3.2.4, the comma (,) may be used in place of the logical ".and." operator.

A RUNNING job can return the consumable resource "XLF" it requested to the license pool by executing the DQS command QALTER with the "-rc" option:

    qalter -rc XLF=1

This command would return one XLF license to the system.

"Potential" resources
DQS 3.1.3 performs a pre-validation of jobs before accepting them into the queuing system. This pre-validation consists of searching all queue definitions to see whether the "hard" resources requested for the job actually exist, even if they may be in use by some other job at the time this job is submitted. If any of the "hard" resources do not exist, the job is rejected, and an error message giving the reason for the rejection is returned to the QSUB utility and displayed for the user. In some cases a user may be aware that a resource (such as a new queue) will be added or returned to DQS at some point in the future. They may wish to submit their job and place it in the pending queue to await the appearance of the new resource.
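The pre-validation step can be pictured as a membership check: every hard resource name in the request must appear somewhere among the queue definitions. The sketch below assumes the resource names have already been extracted from the request string and that the defined names are available as a blank-separated list; missing_resources is a hypothetical helper, not a DQS command.

```sh
# missing_resources "DEFINED NAMES" NAME...
# Hypothetical sketch of pre-validation: print each requested NAME that does
# not appear in the blank-separated list of resources defined by the queues.
# No output means every requested resource exists (it may still be in use).
missing_resources() {
    defined=" $1 "                 # pad so every name is bounded by blanks
    shift
    for name in "$@"; do
        case $defined in
            *" $name "*) ;;        # defined somewhere: passes
            *) printf '%s\n' "$name" ;;
        esac
    done
}
```

With the example site's names, a request for AIX325 and XLF produces no output, while a request naming wild_eyed_scheme would be reported as missing.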
A job that must wait for a resource which does not yet exist can be submitted by adding the "FORCE REQUEST" flag ("-F") to the QSUB command line or DQS script:

    qsub -F -l '(wild_eyed_scheme).and.mem.gt.1000000'

The "-F" flag should be used with care: no pre-validation is performed, so a job with an erroneous resource request may be left "orphaned" in the pending queue until someone deletes it.

Moving jobs
Once a job has been placed into the RUNNING state and is executing in one or more queues, its parameters cannot be modified, nor can it be moved to another location in the system. Pending jobs can be moved from one target queue to another by one of the following methods:
Cells and queues
What is a cell? It is the collection of computer hosts and DQS software which make up a single entity managed by a daemon called the "qmaster". Each host on which jobs can be executed runs a daemon called "dqs_execd". A "trusted host" is any host that can submit jobs to a cell; all of the hosts running qmaster or dqs_execd are trusted, and additional hosts can be added. When a job is submitted with qsub, it is sent to the qmaster daemon, which holds it until it is ready to be executed. The qmaster daemon then passes the job to the selected dqs_execd daemon, which runs the job. The qmaster also contacts the dqs_execd daemons to gather statistics, such as the load on each host, to help select the host to which a new job is sent.

A DQS site may have more than one cell. The site administrator may choose to keep each cell independent and separate from the others. Alternatively, they may organize the system so that one or more cells have authorized communications with others. In that case, a user logged into a host in one cell can submit jobs to the other cells or perform the QSTAT function for them.

Qmove
Users can move one of their jobs from the pending queue in one cell to the pending queue in another cell. The qmove utility is provided for this inter-cell transfer purpose only. The usual command would be:

    qmove <job number>@CELL_C2

which would move the numbered job from CELL_C2 to the cell in which the qmove utility is being executed. A user in CELL_C3 who wishes to move a job from CELL_C2 to CELL_C1 would use the command:

    qmove -cell CELL_C1 <job number>@CELL_C2

The effects of this move process can be somewhat surprising:
Suspending queues and jobs
The user will note that from time to time one or more queues may display the SUSPENDED status. When this occurs, any job executing on that queue is also suspended, but NOT terminated. When the queue is un-suspended, the job continues from the point where it was suspended. During the period of its suspension, all of its files remain open and all memory and paging space allocated to the job remain allocated.

When does a queue get suspended? The DQS administrator and anyone designated as the queue's owner can suspend that queue using the QMOD command. An additional method may appear in some site configurations. If a queue is assigned to a host which also serves as the personal workstation of some user of the system, that user may choose to run the QIDLE command at the workstation. This utility is an X Window facility which monitors the keyboard and mouse of a workstation; if these devices are being used, QIDLE suspends the queues on that workstation. One further way in which a queue may be suspended is when the DQS administrator designates it as a subordinate of another queue. The usual application of this facility is a host which serves as both a parallel and a single-processor resource: the single-processor queue is made subordinate to the parallel queue, so that when a parallel job is started, the subordinate queue and any job being executed there are suspended.

Parallel jobs
A major feature of DQS is its support for the scheduling and management of parallel jobs to be run on two or more of the hosts in a system. There are three components in submitting parallel jobs:
For example:

    qsub -me -l '(qty.eq.4).and.(exec.eq.mpirun).and.AIX325'

This will request four AIX325 hosts to run a parallel job. After the job is put into execution, but before the user's job script is executed, the function "mpirun" will be executed in the working directory of that user.

Job execution environment
The simple "test.run" example we have been using so far will have operated with the following characteristics:
For detailed instructions on changing a job's environment, please see QSUB in the DQS Reference Manual.

DQS scheduling strategies
Once a job has disappeared into the maw of DQS, it is subjected to a variety of manipulations intended to use the entire system's resources as efficiently as possible while ensuring that each user is given "fair" access to those resources. Each site often adapts the default operation of the scheduler to its own requirements. The basic process consists of:
After a job has been validated as requesting "real" resources, it is tested against the site's queues to determine which ones it would be eligible for. The "maximum user jobs" value of each eligible queue is extracted and the smallest one selected. At the same time, the number of this user's jobs in the RUNNING state is computed. If the smallest queue maximum-user-jobs value is not greater than the number of that user's RUNNING jobs, the job is rejected at QSUB time and an error message is returned to the user. This last scheduling pre-validation may well confuse the reader, but it is the core of the "fair play" method developed at SCRI, and it needs to be used for a while to demonstrate its behavior and value.

Problem solving
Even when starting with the simple test case with which we began this User Guide, it is possible to run into one or more dead ends on one's first, second, or subsequent attempts at using DQS. We will proceed through a number of typical problems which a user may encounter along the way:
The DQS error file (err_file) and accounting file (acc_file) contain valuable information which can help the knowledgeable user analyze and correct problems with the system. Refer to DQS 3.1.3 Error Messages for further information.
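The "fair play" pre-validation rule described under DQS scheduling strategies reduces to a single comparison, which can be written as a short shell function. This is a minimal sketch of the rule as stated in this guide, not SCRI's implementation; fair_play_ok is hypothetical, and it takes the user's number of RUNNING jobs followed by the "maximum user jobs" value of each eligible queue.

```sh
# fair_play_ok RUNNING MAXJOBS...
# Hypothetical sketch: succeed if the smallest "maximum user jobs" value
# among the eligible queues is greater than the user's RUNNING job count;
# otherwise the job would be rejected at QSUB time.
fair_play_ok() {
    running=$1
    shift
    if [ $# -eq 0 ]; then
        return 1                   # no eligible queue at all
    fi
    min=$1
    for limit in "$@"; do
        if [ "$limit" -lt "$min" ]; then
            min=$limit
        fi
    done
    [ "$min" -gt "$running" ]
}
```

For example, a user with 2 RUNNING jobs whose eligible queues allow 5, 3 and 4 jobs per user passes (3 is greater than 2); with 3 RUNNING jobs the same request would be rejected.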