
HOWTO

  1. Introduction
  2. User guide
    1. Description of the different commands
    2. Visualisation tools
  3. Admin guide
    1. Admin commands
    2. Database schema
    3. Some tricks



    1. Introduction

          OAR is a resource manager (or batch scheduler) for large clusters. In functionality, it is close to PBS, LSF, CCS and Condor. It is suitable for production platforms and research experiments.

    2. User guide

            2.1. Description of the different commands
           
                All user commands are installed on the cluster login nodes, so you must first connect to one of these machines.

                         The oarstat command prints jobs in execution mode on the terminal.

                        Options:
                            -f prints each job in full detail
                            -a prints more details and keeps table format

                        Examples:
                            # oarstat
                            # oarstat -f

                         The oarnodes command prints information about cluster nodes (state, which jobs run on which nodes, node properties, ...)
      
                         Example:
                            # oarnodes

                         The user submits a job with the oarsub command.
                         So what is a job in our context?
                            A job is defined by the resources it needs and a script/program to run. The user must therefore specify how many nodes and what kind of resources the application needs. OAR then grants the request or not, and controls the execution.
                            When a job is launched, OAR executes the user program only on the first reserved node. This program can read the following environment variables to learn about its environment:
                               $OAR_NODEFILE    contains the name of a file which lists all nodes reserved for this job
                               $OAR_JOBID       contains the OAR job identifier
                               $OAR_NB_NODES    contains the number of reserved nodes
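
A job program can read these variables when it starts. The following Python sketch shows one way to do so. The helper name reserved_nodes is only illustrative, and the sketch assumes that the node file holds one hostname per line (the file format is not specified here).

```python
import os


def reserved_nodes(env=os.environ):
    """Return (job id, node count, hostnames) from the OAR job environment."""
    job_id = env["OAR_JOBID"]
    node_count = int(env["OAR_NB_NODES"])
    # Assumption: the node file holds one hostname per line.
    with open(env["OAR_NODEFILE"]) as node_file:
        hostnames = [line.strip() for line in node_file if line.strip()]
    return job_id, node_count, hostnames
```

Since OAR runs the program only on the first reserved node, a list like this is what the program would use to start its work on the other nodes.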

                         Options:
                            -q queuename : specify the queue for this job
                            -I : turn on INTERACTIVE mode (OAR gives you a shell instead of executing a script)
                            -l : define the resource list requested for this job; the parameters are:
                               nodes : number of nodes requested
                               weight : the weight that you want to reserve on each node
                               walltime : maximum time requested. Format is [hour:mn:sec|hour:mn|hour]; once this time has elapsed, the job is killed
                            -p "properties" : specify reservation properties with SQL syntax
                            -r "2004-05-11 23:32:03" : ask for a reservation job that begins at the date given as argument
                            -c jobId : connect to a reservation in Running state
                            -v : turn on verbose mode
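
The three accepted walltime forms can be illustrated with a small Python sketch. The function name walltime_to_seconds is hypothetical, and OAR's own parsing may differ (for instance in how it validates out-of-range values).

```python
def walltime_to_seconds(walltime):
    """Convert 'hour', 'hour:mn' or 'hour:mn:sec' to a number of seconds."""
    fields = walltime.split(":")
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"bad walltime: {walltime!r}")
    # Missing minutes and seconds default to zero.
    fields += ["0"] * (3 - len(fields))
    hours, minutes, seconds = (int(field) for field in fields)
    return hours * 3600 + minutes * 60 + seconds
```

With this reading, "2:15:00" and "2:15" describe the same duration, and "1" means one hour.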

                         Examples:
                            # oarsub test.sh
                            (the "test.sh" script will be run on 1 node of default weight in the default queue with a walltime of 1 hour)
                            # oarsub -l nodes=2,walltime=2:15:00 test.sh
                            (the "test.sh" script will be run on 2 nodes of default weight in the default queue with a walltime of 2:15:00)
                            # oarsub -p "hostname = 'host2' OR hostname = 'host3'" test.sh
                            (the "test.sh" script will be run on the node host2 or on the node host3)
                            # oarsub -I
                            (gives a shell on a node)
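
When submissions are generated by a program, the command lines above can be assembled as argument lists. This is a minimal Python sketch. The function oarsub_command and its parameter names are hypothetical; it only uses the options -q, -l, -p and -I described above.

```python
def oarsub_command(script=None, nodes=None, walltime=None, queue=None,
                   properties=None, interactive=False):
    """Build an oarsub argument list from the options described above."""
    args = ["oarsub"]
    if queue is not None:
        args += ["-q", queue]
    resources = []
    if nodes is not None:
        resources.append(f"nodes={nodes}")
    if walltime is not None:
        resources.append(f"walltime={walltime}")
    if resources:
        # Resource parameters are comma-separated, as in the examples.
        args += ["-l", ",".join(resources)]
    if properties is not None:
        args += ["-p", properties]
    if interactive:
        args.append("-I")
    elif script is not None:
        args.append(script)
    else:
        raise ValueError("give a script, or set interactive=True")
    return args
```

On a login node the resulting list could be handed to subprocess.run. Passing a list rather than a single string avoids shell quoting problems with the -p expression.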

                         The user can delete his jobs with the oardel command.

                         Example:
                         # oardel 14
                         (delete job 14)

            2.2. Visualisation tools

                         Monika is a web CGI normally installed on the cluster front end. This tool executes oarnodes and oarstat, then formats the data into an HTML page. You thus get a global view of the cluster state and of where your jobs are running.
                         (Monika screenshot)

                         DrawOARGantt is also a web CGI. It creates a Gantt chart which shows how jobs are distributed over the nodes through time. It is very useful for seeing past cluster occupation and for knowing when a job will be launched in the future.
                         (DrawOARGantt screenshot)

    3. Admin guide

            3.1. Admin commands

                         The oarnodesetting command must be run by the oar user. It changes a node's state dynamically, or adds a new node to the OAR database if it does not already exist.

                         Options:
                            -s : specify the new node state (Alive, Absent or Dead)
                            -h : specify the node name
                            -w : specify maxWeight for this node. This option is relevant only for a new node; otherwise it is ignored.

                        Examples:
                            # oarnodesetting -s Alive -h host1.imag.fr -w 2
                            (add a new node "host1" to the OAR database with a maxWeight of 2)
                            # oarnodesetting -s Absent -h host1.imag.fr
                            (put node "host1" in the Absent state, so it becomes inaccessible in OAR)

            3.2. Database schema

    TABLE jobs :
        Each oarsub inserts a new line in this table.

            idJob INT UNSIGNED NOT NULL AUTO_INCREMENT
                job identity number. It is given to users when they submit a job.
            jobType ENUM('INTERACTIVE','PASSIVE') DEFAULT 'PASSIVE' NOT NULL
                INTERACTIVE means that the user asks for a shell on the reserved nodes. PASSIVE means that the job is a script/executable to run on the reserved nodes.
            infoType VARCHAR( 255 )
                string with syntax "host:port". This field is not NULL for interactive jobs. "host" is the host where oarsub command was launched and port is the socket port where oarsub waits for a connection. When the interactive job is run, OAR connects to the oarsub socket to wake it up and launch its bipbip (log on the reserved nodes).
            state ENUM('Waiting','Hold','toLaunch','toError','toAckReservation','Launching','Running','Terminated','Error') NOT NULL
                this is the job state.
            message VARCHAR( 255 )
                log message. This is useful when a job is in the Error state and we want to know why.
            user VARCHAR( 20 ) NOT NULL
                user name.
            nbNodes INT UNSIGNED NOT NULL
                number of nodes to reserve.
            weight INT UNSIGNED NOT NULL
                weight to reserve on each node
            command VARCHAR( 255 ) NOT NULL
                the command to launch if it is a PASSIVE job.
            bpid VARCHAR( 255 )
                string with syntax "host:pid:port". "host" is the hostname where bipbip is launched. "pid" is the process id of bipbip. "port" is the socket port which is opened by bipbip. Leon connects to this socket to give kill instructions.
            queueName VARCHAR( 100 ) NOT NULL
                queue used for this job.
            reservation ENUM('None','toSchedule','Scheduled') DEFAULT 'None' NOT NULL
                This is for jobs in reservation mode (the different states of a reservation job).
            maxTime TIME NOT NULL
                walltime for this job.
            properties VARCHAR( 255 )
                this string is a sub-request that gives constraints on the nodes this job wants. It is a "WHERE" clause in SQL syntax on the table nodeProperties.
            launchingDirectory VARCHAR( 255 ) DEFAULT ' ' NOT NULL
                This is the directory from which the user launched the oarsub command.
            submissionTime DATETIME NOT NULL
                Time when the job was inserted in the database.
            startTime DATETIME NOT NULL
                Time when the job started its execution.
            stopTime DATETIME NOT NULL
                Time when the job finished or was killed.
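
The infoType and bpid fields of this table pack several values into one colon-separated string. The Python sketch below splits them back into their parts. The names InfoType, Bpid, parse_info_type and parse_bpid are hypothetical, and the sketch assumes that the port and pid are decimal integers.

```python
from typing import NamedTuple


class InfoType(NamedTuple):
    host: str
    port: int


class Bpid(NamedTuple):
    host: str
    pid: int
    port: int


def parse_info_type(value):
    """Split an infoType string of the form "host:port"."""
    host, port = value.rsplit(":", 1)
    return InfoType(host, int(port))


def parse_bpid(value):
    """Split a bpid string of the form "host:pid:port"."""
    host, pid, port = value.rsplit(":", 2)
    return Bpid(host, int(pid), int(port))
```

Splitting from the right keeps the numeric fields intact whatever the host name looks like.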


    TABLE admissionRules :
        This table is used when a new job is submitted. It lets you define default behavior when the user does not give all properties. For example, you can specify a default walltime when it is not set on the oarsub command line.

            rule VARCHAR( 255 ) NOT NULL
                this is a string in the Perl language.


    TABLE nodes :
        This table contains node information.

            hostname VARCHAR( 100 ) NOT NULL
                node name.
            state ENUM('Alive','Dead','Suspected','Absent') NOT NULL
                node state:
                    - Alive : this node can be reserved
                    - Absent : the node is not in pool but will come back soon.
                    - Dead : this node is out of order and will not come back soon.
                    - Suspected : OAR suspects that the node is down.
            maxWeight INT UNSIGNED DEFAULT 1 NOT NULL
                maximum weight for the node. For example, you can give a weight of 2 to a dual-processor computer, so that users can reserve half a node.
            weight INT UNSIGNED NOT NULL
                current weight used on this node.
            nextState ENUM('UnChanged','Alive','Dead','Absent','Suspected') DEFAULT 'UnChanged' NOT NULL
                this field is used for dynamic nodes. When a node wants to change its state, this field is set to the next state and OAR manages this action. For example, OAR can kill some jobs when a node leaves.
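
The relation between state, maxWeight and weight can be sketched in Python as follows. The rule in can_host is an assumption made for illustration (a node is usable when it is Alive and has enough unused weight); the real schedulers may apply other criteria. The class Node and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    hostname: str
    state: str        # 'Alive', 'Dead', 'Suspected' or 'Absent'
    max_weight: int   # maxWeight column
    weight: int = 0   # weight currently in use


def can_host(node, requested_weight):
    """Assumed rule: the node is Alive and has enough free weight left."""
    free_weight = node.max_weight - node.weight
    return node.state == "Alive" and requested_weight <= free_weight
```

With a maxWeight of 2 and a weight of 1 already in use, a second job of weight 1 still fits, which is the "half node" case mentioned above.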

    TABLE nodeState_log :
            hostname VARCHAR( 100 ) NOT NULL
                node hostname.
            changeState ENUM('Alive','Dead','Suspected','Absent') NOT NULL
                node state after the change.
            date DATETIME NOT NULL
                event date.

    TABLE nodeProperties :
        This table specifies some node properties. You can add any properties you want (just add new fields).

            hostname VARCHAR( 100 ) NOT NULL
                hostname that you can find in the nodes table.
            besteffort ENUM('YES','NO') DEFAULT 'YES' NOT NULL
                This property indicates whether a node accepts besteffort jobs.
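
The properties string of a job (the -p option of oarsub) is a WHERE clause on this table. The Python sketch below illustrates the idea with an in-memory SQLite database standing in for the real one. The function matching_nodes, the sample rows and the exact query are assumptions, not the query OAR actually runs.

```python
import sqlite3

# In-memory SQLite stand-in for the nodeProperties table. The schema above
# uses ENUM columns; TEXT is used here only to keep the sketch portable.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodeProperties (hostname TEXT NOT NULL, "
           "besteffort TEXT NOT NULL DEFAULT 'YES')")
db.executemany("INSERT INTO nodeProperties (hostname, besteffort) VALUES (?, ?)",
               [("host1", "YES"), ("host2", "NO"), ("host3", "YES")])


def matching_nodes(connection, properties):
    """Return the hostnames whose properties satisfy the job's WHERE clause."""
    # The properties string is spliced in verbatim as a WHERE clause;
    # it cannot be passed as a bound parameter.
    query = f"SELECT hostname FROM nodeProperties WHERE {properties} ORDER BY hostname"
    return [row[0] for row in connection.execute(query)]
```

For example, matching_nodes(db, "hostname = 'host2' OR hostname = 'host3'") selects host2 and host3 from the sample rows. Because the clause is inserted as is, any new field added to the table can be used in a request without further changes.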


    TABLE processJobs :
        This table links current jobs and nodes (so you can find out where a job is launched).

            idJob INT UNSIGNED NOT NULL
                job identity
            hostname VARCHAR( 100 ) NOT NULL
                node where the job is running


    TABLE processJobs_log :
        This is the same table as processJobs but it contains old jobs.

            idJob INT UNSIGNED NOT NULL
            hostname VARCHAR( 100 ) NOT NULL


    TABLE fragJobs :
        When a job is killed, this table is filled in.

            fragIdJob INT UNSIGNED NOT NULL
                job identity to kill
            fragDate DATETIME NOT NULL
                request date.
            fragState ENUM('LEON','TIMER_ARMED','LEON_EXTERMINATE','FRAGGED') DEFAULT 'LEON' NOT NULL
                job kill state:
                    - LEON : "soft" Leon must be run on this job.
                    - TIMER_ARMED : a Leon was launched and we are waiting for the end of this job.
                    - LEON_EXTERMINATE : "hard" Leon must be run on this job.
                    - FRAGGED : job is fragged, nothing to do.


    TABLE queue :
        This table gives the scheduler to use for each queue.

            queueName VARCHAR( 100 ) NOT NULL
                queue name, which you can also find in the jobs table.
            priority INT UNSIGNED NOT NULL
                queue priority.
            schedulerPolicy VARCHAR( 100 ) NOT NULL
                name of the scheduler program which implements this policy.
            state ENUM('Active','notActive') NOT NULL DEFAULT 'Active'
                a queue can be activated or deactivated.

    TABLE ganttJobsPrediction :
        This table stores scheduler decisions. It tells you when a job will start.

            idJob INT UNSIGNED NOT NULL
                job identity.
            startTime DATETIME NOT NULL
                date when the job will start.

    TABLE ganttJobsNodes :
        This table indicates which nodes will be assigned to a job.

            idJob INT UNSIGNED NOT NULL
                job identity.
            hostname VARCHAR( 100 ) NOT NULL
                node assigned by the scheduler
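
The two Gantt tables share the idJob column, so they can be joined to find out when and where a job is planned to run. The Python sketch below uses an in-memory SQLite database as a stand-in. The function predicted_schedule and the sample rows for job 14 are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE ganttJobsPrediction (idJob INTEGER NOT NULL, startTime TEXT NOT NULL);
    CREATE TABLE ganttJobsNodes (idJob INTEGER NOT NULL, hostname TEXT NOT NULL);
    -- Hypothetical scheduler decision for job 14.
    INSERT INTO ganttJobsPrediction VALUES (14, '2004-05-11 23:32:03');
    INSERT INTO ganttJobsNodes VALUES (14, 'host2'), (14, 'host3');
""")


def predicted_schedule(connection, job_id):
    """Return (start time, sorted hostnames) predicted for a job, or None."""
    rows = connection.execute(
        "SELECT p.startTime, n.hostname "
        "FROM ganttJobsPrediction AS p "
        "JOIN ganttJobsNodes AS n ON n.idJob = p.idJob "
        "WHERE p.idJob = ? ORDER BY n.hostname",
        (job_id,),
    ).fetchall()
    if not rows:
        return None
    return rows[0][0], [hostname for _, hostname in rows]
```

A job that the scheduler has not placed yet has no row in these tables, which is why the function returns None in that case.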

    DEFAULT DATA IN DATABASE:
        INSERT IGNORE INTO `admissionRules` ( `rule` ) VALUES ('if (not defined($maxTime)) {$maxTime = "1:00:00";}');
            The default walltime is 1 hour.
        INSERT IGNORE INTO `admissionRules` ( `rule` ) VALUES ('if (not defined($queueName)) {$queueName="default";}');
            The default job queue is "default".
        INSERT IGNORE INTO `admissionRules` ( `rule` ) VALUES ('if ((defined($maxTime)) && ($jobType eq "INTERACTIVE") && (sql_to_duration($maxTime) > sql_to_duration("12:00:00"))) {$maxTime = "12:00:00";}');
            The maximum walltime for an interactive job is 12 hours.
        INSERT IGNORE INTO `admissionRules` ( `rule` ) VALUES ('if (($queueName eq "admin") && ($user ne "oar")) {$queueName="default";}');
            Only the oar user can use the admin queue, so that his jobs can pass before all other waiting jobs. Any other user who asks for it is put in the default queue.
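
To make the combined effect of these four rules easier to follow, here is a Python transcription of them. It is only an illustration: the real rules are the Perl strings stored in the table, the sketch assumes they are evaluated in the order shown, and the function to_seconds is a stand-in for the sql_to_duration helper that the third rule calls.

```python
def to_seconds(duration):
    """Convert an "h:mm:ss" duration to seconds (stand-in for sql_to_duration)."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def apply_default_rules(job):
    """Apply the four default admission rules, in order, to a job dict."""
    # Rule 1: the default walltime is 1 hour.
    if job.get("maxTime") is None:
        job["maxTime"] = "1:00:00"
    # Rule 2: the default queue is "default".
    if job.get("queueName") is None:
        job["queueName"] = "default"
    # Rule 3: an interactive job is capped at 12 hours.
    if (job["jobType"] == "INTERACTIVE"
            and to_seconds(job["maxTime"]) > to_seconds("12:00:00")):
        job["maxTime"] = "12:00:00"
    # Rule 4: only the oar user may stay in the admin queue.
    if job["queueName"] == "admin" and job["user"] != "oar":
        job["queueName"] = "default"
    return job
```

For instance, an interactive job submitted by an ordinary user with a walltime of 24:00:00 in the admin queue ends up with a walltime of 12:00:00 in the default queue.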

        INSERT IGNORE INTO `queue` (`queueName` , `priority` , `schedulerPolicy`) VALUES ('default','1','oar_sched_fifo_queue_killer');
            Define the scheduler to use for the default queue.
        INSERT IGNORE INTO `queue` (`queueName` , `priority` , `schedulerPolicy`) VALUES ('besteffort','0','oar_sched_fifo_queue');
            Define the scheduler to use for the besteffort queue.

            3.3. Some tricks