Europar Tutorial GRMSr

download Europar Tutorial GRMSr

of 153

Transcript of Europar Tutorial GRMSr

  • 8/13/2019 Europar Tutorial GRMSr

    1/153

    University Dortmund

    Tutorial

    Grid Resource

    Management and Scheduling

    EuroPar 2004

    Pisa, Italy

    Ramin Yahyapour

    CEI, University of Dortmund

  • 8/13/2019 Europar Tutorial GRMSr

    2/153

    2

    Agenda

    Background on Resource Management and Scheduling

    Job scheduling and resource management on HPC resources

    Transition to Grid RM and Grid Scheduling

    Current state-of-the-art

    Future of GRMS

    Requirements, outlook for upcoming GRMS

  • 8/13/2019 Europar Tutorial GRMSr

    3/153

    3

    Introduction

    We all know what the Grid is

    one of the many definitions:

    Resource sharing & coordinated problem solving in dynamic, multi-

    institutional virtual organizations (Ian Foster)

    however, the actual scope of the Grid is still quite controversial

    Many people consider High Performance Computing (HPC) as the

    main Grid application.

    todays Grids are mostly Computational Gridsor Data Gridswith HPCresources as building blocks

    thus, Grid resource management is much related to resource

    management on HPC resources (our starting point).

    we will return to a broader Grid scope and its implications later

  • 8/13/2019 Europar Tutorial GRMSr

    4/153

    4

    Resource Management

    on HPC Resources

    HPC resources are usually parallel computers or large scale clusters

    The local resource management systems (RMS) for such resources

    includes:

    configuration management

    monitoring of machine state

    job management

    There is no standard for this resource management.

    Several different proprietary solutions are in use.

    Examples for job management systems: PBS, LSF, NQS, LoadLeveler, Condor

  • 8/13/2019 Europar Tutorial GRMSr

    5/153

    5

    HPC Management Architecture

    in General

    Compute Resources/Processing Nodes

    Master

    Server

    Control Service

    Job Master

    Resource/

    Job Monitor

    Resource/

    Job Monitor

    Resource/

    Job Monitor

    Compute

    Node

    Compute

    Node

    Compute

    Node

    Resource and Job

    Monitoring and Management Services

  • 8/13/2019 Europar Tutorial GRMSr

    6/153

    6

    Computational Job

    A job is a computational task that requires processing capabilities (e.g. 64 nodes) and

    is subject to constraints (e.g. a specific other job must finish before thestart of this job)

    The job information is provided by the user resource requirements

    CPU architecture, number of nodes, speed

    memory size per CPU

    software libraries, licenses

    I/O capabilities

    job description additional constraints and preferences

    The format of job description is not standardized, but usually verysimilar

  • 8/13/2019 Europar Tutorial GRMSr

    7/153

    7

    Example: PBS Job Description

    Simple job script:

    #!/bin/csh

    # resource limits: allocate needed nodes#PBS -l nodes=1

    ## resource limits: amount of memory and CPU time([[h:]m:]s).

    #PBS -l mem=256mb#PBS -l cput=2:00:00

    # path/filename for standard output#PBS -o master:/mypath/myjob.out

    ./my-task

    whole job file is a shell script

    information for the RMS are

    comments

    the actual job is started in the

    script

  • 8/13/2019 Europar Tutorial GRMSr

    8/153

    8

    Job Submission

    The user submits the job to the RMSe.g. issuing qsub jobscript.pbs

    The user can control the job

    qsub: submit

    qstat: poll status information

    qdel: cancel job

    It is the task of the resource management system to start a job on the

    required resources Current system state is taken into account

  • 8/13/2019 Europar Tutorial GRMSr

    9/153

    9

    Job ExecutionJob & Resource

    Monitor

    Processing Node

    Job ExecutionJob & Resource

    Monitor

    Processing Node

    PBS Structure

    Job Submission

    Management

    ServerScheduler

    Job ExecutionJob & Resource

    Monitor

    Processing Node

    qsub jobscript

  • 8/13/2019 Europar Tutorial GRMSr

    10/153

    10

    Execution Alternatives

    Time sharing:

    The local scheduler starts multiple processes per physical CPU withthe goal of increasing resource utilization.

    multi-tasking

    The scheduler may also suspend jobs to keep the system load undercontrol

    preemption

    Space sharing:

    The job uses the requested resources exclusively; no other job isallocated to the same set of CPUs.

    The job has to be queued until sufficient resources are free.

  • 8/13/2019 Europar Tutorial GRMSr

    11/153

    11

    Job Classifications

    Batch Jobs vs interactive jobs batch jobs are queued until execution

    interactive jobs need immediate resource allocation

    Parallel vs. sequential jobs

    a job requires several processing nodes in parallel

    the majority of HPC installations are used to run batch jobs in space-

    sharing mode!

    a job is not influenced by other co-allocated jobs

    the assigned processors, node memory, caches etc. are exclusivelyavailable for a single job.

    overhead for context switches is minimized

    important aspects for parallel applications

  • 8/13/2019 Europar Tutorial GRMSr

    12/153

    12

    Preemption

    A job is preempted by interrupting its current execution the job might be on hold on a CPU set and later resumed; job still

    resident on that nodes (consumption of memory)

    alternatively a checkpoint is written and the job is migrated to anotherresource where it is restarted later

    Preemption can be useful to reallocate resources due to new jobsubmissions (e.g. with higher priority)

    or if a job is running longer then expected.

  • 8/13/2019 Europar Tutorial GRMSr

    13/153

    13

    Job Scheduling

    A job is assigned to resources through a scheduling process responsible for identifying available resources

    matching job requirements to resources

    making decision about job ordering and priorities

    HPC resources are typically subject to high utilization

    therefore, resources are not immediately available and jobs are

    queued for future execution

    time until execution is often quite long (many production systems have an

    average delay until execution of >1h)

    jobs may run for a long time (several hours, days or weeks)

  • 8/13/2019 Europar Tutorial GRMSr

    14/153

    14

    Typical Scheduling Objectives

    Minimizing the Average Weighted Response Time

    Maximize machine utilization/minimize idle time

    conflicting objective

    criteria is usually static for an installation and implicit given by the

    scheduling algorithm

    Jobsj

    j

    Jobsj

    jjj

    w

    )r(tw

    AWRT

    r : submission time of a job

    t : completion time of a job

    w : weight/priority of a job

  • 8/13/2019 Europar Tutorial GRMSr

    15/153

    15

    Job Steps

    Scheduler

    Schedule

    time

    lokale

    Job-Queue

    HPC

    Machine

    Grid-

    User

    Job Execution

    Management

    Node Job

    MgmtNode Job

    MgmtNode Job

    Mgmt

    Job

    Description

    A user job enters a job queue,

    the scheduler (its strategy)

    decides on start time andresource allocation of the job.

  • 8/13/2019 Europar Tutorial GRMSr

    16/153

    16

    Scheduling Algorithms:

    FCFS

    Well known and very simple: First-Come First-Serve

    Jobs are started in order of submission

    Ad-hoc scheduling when resources become free again

    no advance scheduling

    Advantage: simple to implement

    easy to understand and fair for the users

    (job queue represents execution order)

    does not require a priori knowledge about job lengths

    Problems:

    performance can extremely degrade; overall utilization of a machine can

    suffer if highly parallel jobs occur, that is, if a significant share of nodes is

    requested for a single job.

  • 8/13/2019 Europar Tutorial GRMSr

    17/153

    17

    FCFS Schedule

    Scheduler

    Schedule

    time

    Job-Queue

    Compute

    Resource

    Resources

    Procssing Nodes

    Time

    Queue

    1.

    2.

    3.

    4

  • 8/13/2019 Europar Tutorial GRMSr

    18/153

    18

    Scheduling Algorithms:

    Backfilling

    Improvement over FCFS A job can be started before an earlier submitted job if it does not

    delay the first job in the queue

    may still cause delay of other jobs further down the queue

    Some fairness is still maintained Advantage:

    utilization is improved

    Information about the job execution length is needed

    sometimes difficult to provide user estimation not necessarily accurate

    Jobs are usually terminated after exceeding its allocated execution time;

    otherwise users may deliberately underestimate the job length to get anearlier job start time

  • 8/13/2019 Europar Tutorial GRMSr

    19/153

    19

    Backfill Scheduling

    Scheduler

    Schedule

    time

    Job-Queue

    Compute

    Resource

    Queue

    1.

    2.

    3.

    4

    Job 3 is started before Job 2 as it does not delay it

    Resources

    Procssing Nodes

    Time

  • 8/13/2019 Europar Tutorial GRMSr

    20/153

    20

    Backfill Scheduling

    Scheduler

    Schedule

    time

    Job-Queue

    Compute

    Resource

    Resources

    Procssing Nodes

    Time

    However, if a job finishes earlier than expected, the backfilling causesdelays that otherwise would not occur

    need for accurate job length information (difficult to obtain)

    Job finishes earlier!

    Queue

    1.

    2.

    3.

    4

  • 8/13/2019 Europar Tutorial GRMSr

    21/153

    21

    Job Execution Manager

    After the scheduling process,the RMS is responsible for the job execution:

    sets up the execution environment for a job,

    starts a job,

    monitors job state, and cleans-up after execution (copying output-files etc.)

    notifies the used (e.g. sending email)

  • 8/13/2019 Europar Tutorial GRMSr

    22/153

    22

    Scheduling Options

    Parallel job scheduling algorithms are well studied; performance isusually acceptable

    Real implementations may have addition requirements instead of need

    of more complex theoretical algorithms:

    Prioritization of jobs, users, or groups while maintaining fairness

    Partitioning of machines

    e.g.: interactive and development partition vs. production batch partitions

    Combination of different queue characteristics

    For instance, the Maui Scheduler is often deployed as it is quite flexible

    in terms of prioritization, backfilling, fairness etc.

  • 8/13/2019 Europar Tutorial GRMSr

    23/153

    University Dortmund

    Transition to Grid Resource

    Management and Scheduling

    Current state of the art

  • 8/13/2019 Europar Tutorial GRMSr

    24/153

    24

    Transition to the Grid

    More resource typescome into play: Resources are any kind of entity, service or capability to perform a specific

    task

    processing nodes, memory, storage, networks, experimental devices, instruments

    data, software, licenses

    people

    The task/job/activity can also be of a broader meaning

    a job may involve different resources and consists of several activities in a

    workflow with according dependencies

    The resources are distributed and may belong to different administrative

    domains

    HPC is still key the application for Grids. Consequently, the main resources in

    a Grid are the previously considered HPC machines with their local RMS

    I li i G id R

  • 8/13/2019 Europar Tutorial GRMSr

    25/153

    25

    Implications to Grid Resource

    Management

    Several security-related issues have to be considered: authentication,authorization,accounting

    who has access to a certain resource?

    what information can be exposed to whom?

    There is lack of global information:

    what resources are when available for an activity?

    The resources are quite heterogeneous:

    different RMS in use

    individual access and usage paradigms

    administrative policies have to be considered

  • 8/13/2019 Europar Tutorial GRMSr

    26/153

    26

    Scope of Grids

    Cluster Grid Enterprise Grid Global Grid

    Source: Ian Foster

  • 8/13/2019 Europar Tutorial GRMSr

    27/153

    27

    Resource Management Layer

    Grid Resource Management System consists of : Local resource management system (Resource Layer)

    Basic resource management unit

    Provide a standard interface for using remote resources

    e.g. GRAM, etc.

    Global resource management system (Collective Layer) Coordinate all Local resource management system within multiple or distributed

    Virtual Organizations (VOs)

    Provide high-level functionalities to efficiently use all of resources

    Job Submission

    Resource Discovery and Selection

    Scheduling Co-allocation

    Job Monitoring, etc.

    e.g. Meta-scheduler, Resource Broker, etc.

  • 8/13/2019 Europar Tutorial GRMSr

    28/153

    28

    Grid Middleware

    Application

    FabricControl local execution

    ConnectivityCommunication with internalresource functions and services

    ResourceAdd resource: Negotiate access,control access and utilization

    Collective

    Coordination of severalresources: infrastructureservices, application services

    Internet

    Transport

    Application

    Link

    InternetProtocolArchitecture

    Source: Ian Foster

  • 8/13/2019 Europar Tutorial GRMSr

    29/153

    29

    Grid Middleware (2)

    Resource

    Broker

    Grid Resource

    Manager

    Grid Resource

    Manager

    Grid Resource

    Manager

    Information

    Services

    Monitoring

    Services

    SecurityServices

    Core Grid

    Infrastructure Services

    Grid Middleware

    PBS LSF

    Resource Resource Resource

    Local ResourceManagement

    Higher-Level

    Services

    User/

    Application

  • 8/13/2019 Europar Tutorial GRMSr

    30/153

    30

    Globus Grid Middleware

    Globus Toolkit common source for Grid middleware

    GT2

    GT3Web/GridService-based

    GT4WSRF-based

    GRAM is responsible for providing a service for a given job

    specification that can:

    Create an environment for a job

    Stage files to/from the environment

    Submit a job to a local scheduler

    Monitor a job

    Send job state change notifications

    Stream a jobs stdout/err during execution

  • 8/13/2019 Europar Tutorial GRMSr

    31/153

  • 8/13/2019 Europar Tutorial GRMSr

    32/153

    32

    Globus GT2 Execution

    User/Application Resource Broker

    Resource

    Allocation

    PBS LSF

    Resource Resource Resource

    GRAM GRAM GRAM

    MDS

    RSL

    Specialized

    RSL

    RSL

  • 8/13/2019 Europar Tutorial GRMSr

    33/153

    33

    RSL

    Grid jobs are described in the resource specificationlanguage (RSL)

    RSL Version 1 is used in GT2

    It has an LDAP filter-like syntax that supports boolean

    expressions:

    Example:

    & (executable = a.out)(directory = /home/nobody )

    (arguments = arg1 "arg 2")(count = 1)

  • 8/13/2019 Europar Tutorial GRMSr

    34/153

    34

    Globus Job States

    suspended

    pending stage-in active done

    failed

    stage-out

  • 8/13/2019 Europar Tutorial GRMSr

    35/153

    35

    Globus GT3

    With transition to Web/Grid-Services, the job management becomes the Master Managed Job Factory Service (MMJFS)

    the Managed Job Factory Service (MJFS)

    the Managed Job Service (MJS)

    The client contacts the MMJFS

    MMJFS informs the MJFS to create a MJS for the job

    The MJS takes care of managing the job actions.

    interact with local scheduler

    file staging store job status

  • 8/13/2019 Europar Tutorial GRMSr

    36/153

    36

    Globus GT3 Job Execution

    User/Application

    File Streaming

    Factor Service

    Managed

    Job Service

    Master Managed

    Job Factory Service

    Resource Information

    Provider Service

    Managed

    Factory Job Service

    Local Scheduler

    File Streaming

    Service

    Globus as a toolkit does not perform scheduling and automatic

    resource selection

    Example: Extending the Globus

  • 8/13/2019 Europar Tutorial GRMSr

    37/153

    37

    Providers

    ResourcePreferenceProvider

    Example: Extending the Globus

    Architecture at KAIST

    JobSubmission

    Service

    Client

    JobMonitoring

    Service

    SchedulingService

    Resource

    Selection

    Service

    ResourceInformation

    Service

    LocalResource

    MonitoringService (RIPS)

    InformationProvider

    InformationProvider

    Job MangerService (MJS)

    LocalResource

    Manager (PBS)

    ResourceReservation

    Service

    WorkflowMonitoring Information Flow

    1

    2

    9

    3 4

    57

    86

    Source: Jin-Soo Kim

  • 8/13/2019 Europar Tutorial GRMSr

    38/153

    38

    Job Description with RSL2

    The version 2 of RSL is XML-based Two namespaces are used:

    rsl: for basic types as int, string, path, url

    gram: for the elements of a job

    *GNS = http://www.globus.org/namespaces

  • 8/13/2019 Europar Tutorial GRMSr

    39/153

    39

    RSL2 Attributes

    (type = rsl:integerType) Number of processes to run (default is 1)

    (type = rsl:integerType)

    On SMP multi-computers, number of nodes to distribute the count processes

    across

    count/hostCount = number of processes per host

    (type = rsl:stringType)

    Queue into which to submit job

    (type = rsl:longType)

    Maximum wall clock runtime in minutes

    (type = rsl:longType)

    Maximum CPU runtime in minutes

    (type = rsl:longType)

    Only applies if above are not used

    Maximum wall clock or cpu runtime (schedulerss choice) in minutes

  • 8/13/2019 Europar Tutorial GRMSr

    40/153

    40

    Job Submission Tools

    GT 3 provides the Java class GramClient GT 2.x: command line programs for job submission

    globus-job-run: interactive jobs

    globus-job-submit: batch jobs

    globusrun: takes RSL as input

  • 8/13/2019 Europar Tutorial GRMSr

    41/153

    41

    Globus 2 Job Client Interface

    A multirequest specifies multiple resources for a job

    globus-job-run -dumprsl -: host1 /bin/uname -a \-: host2 /bin/uname a+ ( &(resourceManagerContact="host1")(subjobStartType=strict-barrier) (label="subjob 0")

    (executable="/bin/uname") (arguments= "-a"))( &(resourceManagerContact="host2")(subjobStartType=strict-barrier)(label="subjob 1")(executable="/bin/uname") (arguments= "-a"))

    A simple job submission requiring 2 nodes:globus-job-run np 2 s myprog arg1 arg2

  • 8/13/2019 Europar Tutorial GRMSr

    42/153

    42

    Globus 2 Job Client Interface

    The full flexibility of RSL is available through the command line toolglobusrun

    Support for file staging: executable and stdin/stdout

    Example:

    globusrun -o r hpc1.acme.com/jobmanager-pbs

    '&(executable=$(HOME)/a.out) (jobtype=single)

    (queue=time-shared)

    Problem: Job Submission

  • 8/13/2019 Europar Tutorial GRMSr

    43/153

    43

    Problem: Job Submission

    Descriptions differ

    The deliverables of the GGF Working Group JSDL:

    A specification for an abstract standard Job Submission Description

    Language(JSDL) that is independent of language bindings, including;

    the JSDL feature set and attribute semantics,

    the definition of the relationship between attributes,

    and the range of attribute values.

    A normative XML Schemacorresponding to the JSDL specification.

    A document of translation tablesto and from the scheduling languages of a

    set of popular batch systems for both the job requirements and resource

    description attributes of those languages, which are relevant to the JSDL.

  • 8/13/2019 Europar Tutorial GRMSr

    44/153

    44

    JSDL Attribute Categories

    The job attribute categories will include:

    Job Identity Attributes

    ID, owner, group, project, type, etc.

    Job Resource Attributes

    hardware, software, including applications, Web and Grid Services, etc.

    Job Environment Attributes

    environment variables, argument lists, etc.

    Job Data Attributes

    databases, files, data formats, and staging, replication, caching, and disk

    requirements, etc.

    Job Scheduling Attributes start and end times, duration, immediate dependencies etc.

    Job Security Attributes

    authentication, authorisation, data encryption, etc.

    Problem: Resource Management Systems

  • 8/13/2019 Europar Tutorial GRMSr

    45/153

    45

    Problem: Resource Management Systems

    DifferAcross Each Component

    Interface Format Execution Environment Platform Mix

    LSFHas API plus Batch Utilities via

    LSF Scripts

    User: Local disk exported

    System: Remote initialized (option)Unix, Windows

    Grid EngineGDI API Interface plus

    Command line interface

    System: Remote initialized, with SGE local

    variables exportedUnix only

    PBSAPI (script option)

    Batch Utilities via PBS Scripts

    System: Remote initialized, with PBS local

    variables exportedUnix only

    Application

    Scheduler

    Task definition

    Submit

    Task Analysis

    Executing Host

    Staging

    Task

    Source: Hrabri Rajic

  • 8/13/2019 Europar Tutorial GRMSr

    46/153

    46

    GGF-WG DRMAA

    GGF Working Group Distributed Resource ManagementApplication API

    From the charter: Develop an API specification for the submission and

    control of jobs to one or more Distributed ResourceManagement (DRM) systems.

    The scope of this specification is all the high levelfunctionality which is necessary for an application to

    consign a job to a DRM system including commonoperations on jobs like termination or suspension.

    The objective is to facilitate the direct interfacing ofapplications to today's DRM systems by application'sbuilders, portal builders, and Independent SoftwareVendors (ISVs).

  • 8/13/2019 Europar Tutorial GRMSr

    47/153

    47

    DRMAA State Diagram

    The remote job could be in followingstates:

    system hold

    user hold

    system and user holdsimultaneously

    queued active

    system suspended

    user suspended

    system and user suspended

    simultaneously running

    finished (un)successfully

    Source: Hrabri Rajic

  • 8/13/2019 Europar Tutorial GRMSr

    48/153

    48

    Example: Condor-G

    Condor-G is a Condor system enhanced to manage Globus jobs.

    It provides two main features

    Globus Universe:interface for submitting, queuing and monitoring jobs that use Globus

    resources

    GlideIn:system for efficient execution of jobs on remote Globus resources

    Condor-G runs as a personal Condor system daemons run as non-privileged user processes

    each user runs her/his Condor-G

  • 8/13/2019 Europar Tutorial GRMSr

    49/153

    49

    Condor-G: GlideIn

    Globus is used to run the Condor daemons on Grid resources Condor daemons run as a Globusmanaged job

    GRAM service starts daemons rather than the Condor jobs

    When the resources run these GlideIn jobs, they will join the personal

    Condor pool

    These daemons can be used to launch a job from Condor-G to aGlobus resource

    Jobs are submitted as Condor jobs and they will be matched and run

    on the Grid resources

    the daemons receive jobs from the users Condor queue

    combines the benefits of Globus and Condor

  • 8/13/2019 Europar Tutorial GRMSr

    50/153

    50

    Using GlideIn

    Schedd JobManager

    LSF

    Startd

    Collector

    Condor-G Grid Resource

    GridManager

    600 Condorjobs

    glide-ins

    Source: ANL/USC ISI

    E l DAGM

  • 8/13/2019 Europar Tutorial GRMSr

    51/153

    51

    Example: DAGMan

    Directed Acyclic Graph Manager DAGMan allows you to specify the dependenciesbetween your Condor-G

    jobs, so it can managethem automatically for you.

    (e.g., Dont run job B until job A has completed successfully.)

    A DAG is defined by a .dag file, listing each of its nodes and their

    dependencies:# diamond.dagJob A a.subJob B b.subJob C c.subJob D d.subParent A Child B C

    Parent B C Child D

    each node will run the Condor-G job specified by its accompanying Condor

    submit file

    Job A

    Job B Job C

    Job D

    Source: Miron Livny

    University Dortmund

  • 8/13/2019 Europar Tutorial GRMSr

    52/153

    Grid Scheduling

    How to select resources in the Grid?

    Diff L l f S h d li

  • 8/13/2019 Europar Tutorial GRMSr

    53/153

    53

    Different Level of Scheduling

    Resource-level scheduler low-level scheduler, local scheduler, local resource manager

    scheduler close to the resource, controlling a supercomputer, cluster, ornetwork of workstations, on the same local area network

    Examples: Open PBS, PBS Pro, LSF, SGE

    Enterprise-level scheduler Scheduling across multiple local schedulers belonging to the same

    organization

    Examples: PBS Pro peer scheduling, LSF Multicluster

    Grid-level scheduler

    also known as super-scheduler, broker, community scheduler Discovers resources that can meet a jobs requirements

    Schedules across lower level schedulers

    G id L l S h d l

  • 8/13/2019 Europar Tutorial GRMSr

    54/153

    54

    Grid-Level Scheduler

    Discovers & selects the appropriate resource(s) for a job If selected resources are under the control of several local

    schedulers, a meta-scheduling action is performed

    Architecture:

    Centralized: all lower level schedulers are under the control of a singleGrid scheduler

    not realistic in global Grids

    Distributed: lower level schedulers are under the control of several gridscheduler components; a local scheduler may receive jobs from several

    components of the grid scheduler

    G id S h d li

  • 8/13/2019 Europar Tutorial GRMSr

    55/153

    55

    Grid Scheduling

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 1

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 2

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 3

    Grid-SchedulerGrid User

    A ti iti f G id S h d l

  • 8/13/2019 Europar Tutorial GRMSr

    56/153

    56

    Activities of a Grid Scheduler

    GGF Document:10 Actions of Super

    Scheduling (GFD-I.4)1. Authorization Filtering

    3. Min. Requirement Filtering

    2. Application Definition

    Phase One-Resource Discovery

    5. System Selection

    4. Information Gathering

    Phase Two - System Selection

    7. Job Submission

    6. Advance Reservation

    9. Monitoring Progress

    8. Preparation Tasks

    11. Clean-up Tasks

    10 Job Completion

    Phase Three- Job Execution

    Source: Jennifer Schopf

    G id S h d li

  • 8/13/2019 Europar Tutorial GRMSr

    57/153

    57

    Grid Scheduling

    A Grid scheduler allows the user to specify the requiredresources and environment of the job without having toindicate the exact location of the resources

    A Grid scheduler answers the question: to which local resourcemanger(s) should this job be submitted?

    Answering this question is hard:

    resources may dynamically join and leave a computational grid

    not all currently unused resources are available to grid jobs:

    resource owner policies such as maximum number of grid jobs

    allowed

    it is hard to predict how long jobs will wait in a queue

    S l t R f E ti

  • 8/13/2019 Europar Tutorial GRMSr

    58/153

    58

    Select a Resource for Execution

    Most systems do not provide advance information about future jobexecution

    user information not accurate as mentioned before

    new jobs arrive that may surpass current queue entries due to higherpriority

    Grid scheduler might consider current queue situation,however this does not give reliable information for future executions:

    A job may wait long in a short queue while it would have been executedearlier on another system.

    Available information:

    Grid information service gives the state of the resources and possiblyauthorization information

    Prediction heuristics: estimate jobs wait time for a given resource, basedon the current state and the jobs requirements.

    S l ti C it i

  • 8/13/2019 Europar Tutorial GRMSr

    59/153

    59

    Selection Criteria

    Distribute jobs in order to balance load across resources not suitable for large scale grids with different providers

    Data affinity: run job on the resource where data is located

    Use heuristics to estimate job execution time.

    Best-fit: select the set of resources with the smallest capabilities and

    capacities that can meet jobs requirements

    Quality of Service of

    a resource or

    its local resource management system what features has the local RMS?

    can they be controlled from the Grid scheduler?

    S h d li Att ib t

  • 8/13/2019 Europar Tutorial GRMSr

    60/153

    60

    Scheduling Attributes

    Working Group in the Global Grid Forum toDefine the attributes of a lower-level scheduling instance that can be exploited

    by a higher-level scheduling instance.

    The following attributes have been defined:

    Attributes of allocation properties

    Revocation of an allocation The local scheduler reserves the right to withdraw a given allocation.

    Guaranteed completion time of allocation,

    A deadline for job completion is provided by the local scheduler.

    Guaranteed Number of Attempts to Complete a Job A local scheduler will retry a given job task, e.g. useful for data transfer actions.

    Allocations run-to-completion A job is not preempted if it has been started

    Exclusive Allocations

    A job has exclusive access to the given resources; e.g. no time-sharing is performed

    Malleable Allocations

    The given resource set may change during runtime; e.g. a computational job will gain(moldable job) or lose processors

    S h d li Att ib t (2)

  • 8/13/2019 Europar Tutorial GRMSr

    61/153

    61

    Scheduling Attributes (2)

    Attributes of available information Access to tentative schedule

    The local scheduler exposes his schedule of future allocations

    Option: Only the projected start time of a specified allocation is available

    Option: Only partial information on the current schedule is available

    Exclusive control

    The local scheduler is exclusively in charge of the resources; no other jobs can appear onthe resources

    Event Notification

    The local scheduler provides an event subscription service

    Attributes for manipulating allocation execution

    Preemption

    Checkpointing

    Migration

    Restart

    Scheduling Attributes (3)

  • 8/13/2019 Europar Tutorial GRMSr

    62/153

    62

    Scheduling Attributes (3)

    Attributes for requesting resources Allocation Offers

    The local system can provides an interface to request offers for an allocation

    Allocation Cost or Objective Information

    The local scheduler can provide cost or objective information

    Advance Reservation

    Allocations can be reserved in advance

    Requirement for Providing Maximum Allocation Length in Advance

    The higher-level scheduler must provide a maximum job execution length

    Deallocation Policy

    A policy applies for the allocation that must be met to stay valid

    Remote Co-Scheduling

    A schedule can be generated by a higher-level instance and imposed on the localscheduler

    Consideration of Job Dependencies

    The local scheduler can deal with dependency information of jobs; e.g. for workflows

    CSFCommunity Scheduler

  • 8/13/2019 Europar Tutorial GRMSr

    63/153

    63

    y

    Framework

    An open source implementation of OGSA-based metascheduler forVOs.

    Supports emerging WS-Agreement spec

    Supports GT GRAM

    Fills in gaps in existing resource management picture

    Contributed from platform to the Globus Toolkit

    Extensible, Open Source framework for implementing meta-

    schedulers

    Provides basic protocols and interfaces to help resources work

    together in heterogeneous environments

    CSF Architecture

  • 8/13/2019 Europar Tutorial GRMSr

    64/153

    64

    CSF Architecture

    Platform

    LSF User

    Globus

    Toolkit User

    Platform LSF

    LSF

    Meta-

    schedulerPlugin

    SGE PBS

    Grid Service Hosting

    Environment

    Job

    Service

    Reservation

    Service

    Meta-SchedulerGlobal

    Information

    Service

    RIPS

    GRAM

    SGE RIPS

    GRAM

    PBS RIPS

    RM

    Adapter

    RIPS = Resource Information

    Provider Service

    Queuing

    Service

    Source: Chris Smith

    Global Information Service

  • 8/13/2019 Europar Tutorial GRMSr

    65/153

    65

    Global Information Service

    Global Information Service

    RIPS DB

    Index Service

    RegistrySD

    Aggregator

    SDProvider

    Manager

    Data

    Storage

    Rsrc

    Info

    Req

    Cluster

    Info

    Req

    Data

    Store

    Req

    Data

    Load

    Req

    Cluster

    Register Rsrc,

    job rsv

    info

    RIPS

    Data

    Store

    Load

    Job

    Info

    Rsv

    Info

    Source: Chris Smith

    Support for Virtual Organizations

  • 8/13/2019 Europar Tutorial GRMSr

    66/153

    66

    User of VO A

    Virtual

    Organization A

    Organization 1Organization 2

    Organization 3

    Community

    SchedulerCommunity

    Scheduler

    VirtualOrganization B

    User of VO B

    Support for Virtual Organizations

    Source: Chris Smith

    CSF Grid Services

  • 8/13/2019 Europar Tutorial GRMSr

    67/153

    67

    CSF Grid Services

    Job Service: creates, monitors and controls compute jobs

    Reservation Service:

    guarantees resources are available for running a job

    Queuing Service:

    provides a service where administrators can customize and definescheduling policies at the VO level and/or at the different resource

    manager level

    Defines an API for plug in schedulers

    RM Adaptor Service: provides a Grid service interface that bridges the Grid service protocol

    and resource managers (LSF, PBS, SGE, Condor and other RMs)

    GT3 Job Submission /

  • 8/13/2019 Europar Tutorial GRMSr

    68/153

    68

    Architecture

    Site AMMJFS on node1

    SGE

    MJS for SGE

    MMJFS

    RIPS

    Index Service

    PBS

    MJS for PBS

    MMJFS

    RIPS

    LSF

    MJS for LSF

    MMJFS

    RIPS

    managed-job-

    globusrun

    managed-job-

    globusrun

    managed-job-

    globusrun

    Site BMMJFS on node2

    Site CMMJFS on node3

    MMJFS = Master Managed Job Factory Service

    MJS = Managed Job ServiceBlueindicates a Grid Service hosted in a GT3 container

    Source: Chris Smith

    GT3 + CSF Architecture

  • 8/13/2019 Europar Tutorial GRMSr

    69/153

    69

    GT3 + CSF Architecture

    Queuing

    Service

    Job

    Service

    Reservation

    Service

    Virtual

    Organization

    Index Service

    PBS

    Site B

    RM Adapter

    for PBS

    RIPS

    LSF

    Site A

    RM Adapterfor LSF

    RIPS

    SGE

    Site C

    MMJFS/MJS

    RIPS

    Source: Chris Smith

    Queue Service

  • 8/13/2019 Europar Tutorial GRMSr

    70/153

    70

    Queue Service

    In CSF, Job Service instances are submitted to theQueue Service for dispatch to a resource manager.

    The Queue Service provides a plug in API for extending

    the scheduling algorithms provided by default with CSF.

    The Queue Service is responsible for: loads and validates configuration information

    loads all configured scheduler plugins

    calls the plugin API functions

    schedInit() after loading the plugin successfully

    schedOrder() when a new job is submitted

    schedMatch() during the scheduling cycle

    schedPost() before the scheduling cycle ends, and after scheduling

    decisions are sent to the job service instances

  • 8/13/2019 Europar Tutorial GRMSr

    71/153

  • 8/13/2019 Europar Tutorial GRMSr

    72/153

    Co-allocation

  • 8/13/2019 Europar Tutorial GRMSr

    73/153

    73

    Co-allocation

    It is often requested that several resources are used for a single job. that is, a scheduler has to assure that all resources are available when

    needed.

    in parallel (e.g. visualization and processing)

    with time dependencies (e.g. a workflow)

    The task is especially difficult if the resources belong to different

    administrative domains.

    The actual allocation time must be known for co-allocation

    or the different local resource management systems must synchronize

    each other (wait for availability of all resources)

    Example Multi-Site Job Execution

  • 8/13/2019 Europar Tutorial GRMSr

    74/153

    74

    Example Multi-Site Job Execution

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 2

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 3

    A job uses several resources at different sites in parallel.

    Network communication is an issue.

    Scheduler

    Schedule

    time

    Job-Queue

    Machine 1

    Grid-Scheduler

    Multi-Side Job

    Advanced Reservation

  • 8/13/2019 Europar Tutorial GRMSr

    75/153

    75

    Advanced Reservation

    Co-allocation and other applications require a priori information aboutthe precise resource availability

    With the concept of advanced reservation, the resource provider

    guarantees a specified resource allocation.

    includes a two- or three-phase commit for agreeing on the reservation

    Implementations:

    GARA/DUROC/SNAP provide interfaces for Globus to create advanced

    reservation

    implementations for network QoS available.

    setup of a dedicated bandwidth between endpoints

    Limitations of current Grid RMS

  • 8/13/2019 Europar Tutorial GRMSr

    76/153

    76

    Limitations of current Grid RMS

    The interaction between local scheduling and higher-level Gridscheduling is currently a one-way communication

    current local schedulers are not optimized for Grid-use

    limited information available about future job execution

    a site is usually selected by a Grid scheduler and the job enters the

    remote queue.

    The decision about job placement is inefficient.

    Actual job execution is usually not known

    Co-allocation is a problem as many systems do not provide advance

    reservation

    Example of Grid Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    77/153

    77

    Decision Making

    Scheduler

    Schedulet

    ime

    Job-Queue

    Machine 1

    Scheduler

    Schedulet

    ime

    Job-Queue

    Machine 2

    Scheduler

    Schedulet

    ime

    Job-Queue

    Machine 3

    Grid-SchedulerGrid User

    15 jobs running

    20 jobs queued

    5 jobs running

    2 jobs queued

    40 jobs running

    80 jobs queued

    Where to put the Grid job?

    Available Information from the

  • 8/13/2019 Europar Tutorial GRMSr

    78/153

    78

    Local Schedulers

    Decision making is difficult for the Grid scheduler limited information about local schedulers is available

    available information may not be reliable

    Possible information:

    queue length, running jobs detailed information about the queued jobs

    execution length, process requirements,

    tentative schedule about future job executions

    These information are often technically not provided by the localscheduler

    In addition, these information may be subject to privacy concerns!

    Consequence

  • 8/13/2019 Europar Tutorial GRMSr

    79/153

    79

    Consequence

    Consider a workflow with 3 short steps (e.g. 1 minute each) thatdepend on each other

    Assume available machines with an average queue length of 1 hour.

    The Grid scheduler can only submit the subsequent step if theprevious job step is finished.

    Result:

    The completion time of the workflow may be larger than 3 hours(compared to 3 minutes of execution time)

    Current Grids are suitable for simple jobs, but still quite inefficient in

    handling more complex applications

    Need for better coordination of higher- and lower-level scheduling!

    University Dortmund

  • 8/13/2019 Europar Tutorial GRMSr

    80/153

    GRMS in

    Next Generation Grids

    Outlook on future Grid ResourceManagement and Scheduling

    Example Grid Scenario

  • 8/13/2019 Europar Tutorial GRMSr

    81/153

    81

    Example Grid Scenario

    Remote CenterReads and Generates TB of Data

    LAN/WAN Transfer

    WAN Transfer ComputeResources

    Visualization

    Assume a data-intensive simulation

    that should be visualized and

    steered during runtime!

    Resource Request of a Simple

    G id J b

  • 8/13/2019 Europar Tutorial GRMSr

    82/153

    82

    Grid Job

    A specified architecture with

    48 processing nodes,

    1 GB of available memory, and

    a specified licensed software package

    for 1 hour between 8am and 6pm of the following day

    Time must be known in advance.

    A specific visualization device during program execution

    Minimum bandwidth between the VR device and the main computer during program

    execution

    Input: a specified data set from a data repository

    at most 4

    preference of cheaper job execution over an earlier execution.

    actually a pretty simple example (no complex workflows)

    Use-Case Example:Coordinated

    Si l ti d Vi li ti

  • 8/13/2019 Europar Tutorial GRMSr

    83/153

    83

    Simulation and Visualization

    Expected output of a Grid scheduler:

    time

    Data Transfer

    Loading Data Parallel Computation Providing Data

    Data TransferNetwork 1

    Computer 1

    Parallel ComputationComputer 2

    Communication for Computation

    Network 3

    VR-Cave Visualization

    Data Data Access Storing Data

    Communication for Visualization

    Network 2

    Software UsageSoftware License

    Data StorageStorage

    resources

    Reservations are

    necessary!

    Need for a Grid Scheduling

    A hit t

  • 8/13/2019 Europar Tutorial GRMSr

    84/153

    84

    Architecture

    Grid-

    Scheduler

    Grid-

    User/Application

    Information

    Services

    Monitoring

    Services

    Security

    Services

    Accounting/Billing

    Other

    Grid ServicesLocal RMS

    Schedule

    time

    Other

    Resources/

    Services

    Schedule

    time

    Schedule

    time

    Schedule

    time

    Grid

    Middleware

    Local RMS

    Grid

    Middleware

    Local RMS

    Grid

    Middleware

    Local RMS

    Grid

    Middleware

    Data

    Resources

    Compute

    Resources

    Network

    Resources

  • 8/13/2019 Europar Tutorial GRMSr

    85/153

    Service Oriented Architectures

  • 8/13/2019 Europar Tutorial GRMSr

    86/153

    86

    Service Oriented Architectures

    Services are used to abstract all resources and functionalities. Concept of OGSI and WSRF

    using WebServices, SOAP, XML to implement the services

    OGSI idea of GridServices is implemented in GT3

    transition to WSRF with GT4

    Core service for building a Grid are discussed in the Open Grid

    Services Architecture (OGSA)

    Open Grid Services Architecture

  • 8/13/2019 Europar Tutorial GRMSr

    87/153

    87

    Web Services: Basic Functionality

    OGSA

    OGSI: Interface to Grid Infrastructure

    Applications in Problem Domain X

    Compute, Data & Storage Resources

    Distributed

    Application & Integration Technology for Problem Domain X

    Users in Problem Domain X

    Virtual Integration Architecture

    Generic Virtual Service Access and Integration Layer

    -

    Structured DataIntegration

    Structured Data Access

    Structured DataRelational XML Semi-structured

    Transformation

    Registry

    Job Submission

    Data Transport Resource Usage

    Banking

    Brokering Workflow

    Authorisation

    p

    OGSA Outlook

  • 8/13/2019 Europar Tutorial GRMSr

    88/153

    88

    Context

    Services

    Status +

    MonitoringServices

    Infrastructure

    Services

    Security

    Services

    Resource

    Management

    Services

    Job &

    Workflow

    Management

    Data

    Services

    Policy +

    Agreement

    Virtual

    Organization

    Data Access

    Data Integration

    Data Provision

    Data Catalog

    FirewallTransition

    Delegation

    Authorization

    Authentication

    WS-RF

    (OGSI)Notification

    WS

    DistributedManagement

    Event

    Problem

    Determination

    Information

    Service

    Job

    Manager

    Job

    Service

    Logging

    Service

    Broker

    Execution

    Planning

    Service

    Workflow

    Manager

    Workload

    Manager

    Provisioning

    Application

    Content

    Manager

    Deploy +Configuration

    Service

    Service

    Container

    Reservation

    Service

    OGSA Execution Planning

  • 8/13/2019 Europar Tutorial GRMSr

    89/153

    89

    g

    Work load Mgmt.

    Framework

    User/Job

    Proxies

    Environment

    Mgmt.

    Policies

    SupplyDemand

    CMM

    Resource Mgmt. Framewo rk

    Reservation

    Opt imiz ing Framework

    Resource Selection

    Resource Workload

    Optimal Mapping

    Workload Optimization

    Workload Post Balancing

    Resource Provisioning

    Work load Opt imiz ing Framework

    Workload Optimization

    Workload Orchestration

    Workload Models (History/Prediction)

    Dependency management

    Scheduling Resource Opt imiz ing Framework

    Capacity Management

    Resource Placement

    Primary Interaction

    Meta - Interaction

    Represents one or more OGSA services

    Resource

    Allocation

    (or Binding)

    Job

    Factory

    Admission Control (Resources)

    Admission Control (Workload)

    SLA Management (Workload)

    Quality of Service (Resources)

    Queuing Services

    Resource

    Factory

    Information Provider

    Selection Context (e.g. VO)

    Functional Requirements

    for Grid Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    90/153

    90

    for Grid Scheduling

    Functional Requirements:

    Cooperation between different resource providers

    Interaction with local resource management systems

    Support for reservations and service level agreements

    Orchestration of coordinated resources allocations

    Automatic handling of accounting and billing

    Distributed Monitoring

    Failure Transparency

    What are Basic Blocks for

    a Grid Scheduling Architecture?

  • 8/13/2019 Europar Tutorial GRMSr

    91/153

    91

    a Grid Scheduling Architecture?

    Scheduling

    Service

    Data Management

    Service

    Network Management

    Service

    Information Service

    Resources

    Data

    Network

    Network-Resources

    Management System

    Network

    Network Manager

    Management

    System

    Compute/ Storage /Visualization etc

    Compute Manager Data Manager

    Data-Resources

    Query for resources

    Maintain information

    Maintain information

    static &

    scheduled/forecasted

    Reservation

    Accounting

    and Billing

    Service

    Job

    SupervisorService

    Scheduling-relevant Interfaces

    of Basic Blocks are still to be

    defined!

    Information Service

    Resource Discovery

  • 8/13/2019 Europar Tutorial GRMSr

    92/153

    92

    Resource Discovery

    Relevant for Grid Scheduling: Access to static and dynamic information

    Dynamic information include data about planned or forecasted future

    events

    e.g.: existing reservations, scheduled tasks, future availabilities

    need for anonymous and limited information (privacy concerns)

    Information about all resource types

    including e.g. data and network

    future reservation, data transfers etc.

    Job/Workflow Description

    Requirement Description

  • 8/13/2019 Europar Tutorial GRMSr

    93/153

    93

    Requirement Description

    Information about the job specifics (what is the job) and job requirements (what is required for the job)

    including data access and creation

    Need for common workflow description

    e.g. a DAG formulation

    include static and dynamic dependencies

    need for the ability to extract workflow information to schedule a whole

    workflow in advance

    Reservation Management

    Agreement and Negotiation

  • 8/13/2019 Europar Tutorial GRMSr

    94/153

    94

    Agreement and Negotiation

    Interaction between scheduling instances, betweenresource/agreement providers and agreement initiators (higher-level

    scheduler)

    access to tentative information necessary

    negotiations might take very long

    individual scheduling objectives to be considered probably market-oriented and economic scheduling needed

    Need for combining agreements from different providers

    coordinate complex resource requests or workflows

    Maintain different negotiations at the same time

    probable several levels of negotiations, agreement commitment andreservation

    Accounting and Billing

  • 8/13/2019 Europar Tutorial GRMSr

    95/153

    95

    g g

    Interaction to budget information Charging for allocations, reservations;

    preliminary allocation of budgets

    Concepts for reliable authorization of Grid schedulers to spend

    money on behalf of the user

    Re-funding in terms of resource/SLA failure, re-scheduling etc.

    Reliable monitoring and accounting

    required for tracing whether a party fulfilled an agreement

    Monitoring Services

  • 8/13/2019 Europar Tutorial GRMSr

    96/153

    96

    Monitoring of resource conditions

    agreements

    schedules

    program execution

    SLA conformance workflow

    Monitoring must be reliable as it is part of accountability

    fail or fulfillment of a service/resource provider must be clearly identifiable

    Conclusions for Grid Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    97/153

    97

    Grids ultimately require coordinated scheduling services.

    Support for different scheduling instances

    different local management systems

    different scheduling algorithms/strategies

    For arbitrary resources

    not only computing resources, also

    data, storage, network, software etc.

    Support for co-allocation and reservation

    necessary for coordinated grid usage

    (see data, network, software, storage)

    Different scheduling objectives cost, quality, other

    Scheduling Model

  • 8/13/2019 Europar Tutorial GRMSr

    98/153

    98

    Using a Brokerage/Trading strategy:

    Submit Grid

    Job Description

    Analyze Query

    Query for

    Allocation Offers

    Higher-level

    scheduling

    Lower-levelscheduling

    Discover ResourcesSelect Offers

    Collect Offers

    Coordinate Allocations

    Generate

    Allocation Offer

    Consider individual owner

    policies

    Consider individual userpolicies

    Properties ofMulti Level Scheduling Model

  • 8/13/2019 Europar Tutorial GRMSr

    99/153

    99

    Multi-Level Scheduling Model

    Multi-level scheduling must support different RM systems andstrategies.

    Provider can enforce individual policies in generating resource

    offers.

    User receive resource allocation optimized to the individualobjective

    Different higher-level scheduling strategies can be applied.

    Multiple levels of scheduling instances are possible

    Support for fault-tolerant and load-balanced services.

    Negotiation in Grids

  • 8/13/2019 Europar Tutorial GRMSr

    100/153

    100

    Multilevel Grid scheduling architecture Lower level local scheduling instance

    Implementation of owner policies

    Higher level Grid scheduling instance

    Resource selection and coordination

    (Static) Interface definition between both instances Different types of resources

    Different local scheduling systems with different properties

    Different owner policies

    (Dynamic) Communication between both instances

    Resource discovery Job monitoring

    Using Service Level Agreements

  • 8/13/2019 Europar Tutorial GRMSr

    101/153

    101

    The mapping of jobs to resources can be abstracted using theconcept of Service Level Agreement (SLAs) (Czajkowski, Foster,Kesselman & Tuecke)

    SLA: Contract negotiated between

    resource provider, e.g. local scheduler

    resource consumer, e.g., grid scheduler, application SLAs provide a uniform approach for the client to

    specify resource and QoS requirements, while

    hiding from the client details about the resources,

    such as queue names and current workload

    Service Level Agreement Types

  • 8/13/2019 Europar Tutorial GRMSr

    102/153

    102

    Resource SLA (RSLA)A promise of resource availability

    Client must utilize promise in subsequent SLAs

    Advance Reservation is an RSLA

    Task SLA (TSLA)

    A promise to perform a task Complex task requirements

    May reference an RSLA

    Binding SLA (BSLA) Binds a resource capability to a TSLA

    May reference an RSLA (i.e. a reservation) May be created lazily to provision the task

    Allows complex resource arrangements

    Agreement-Based Negotiation

  • 8/13/2019 Europar Tutorial GRMSr

    103/153

    103

    A client (application) submits a task to a Grid scheduler The client negotiates a TSLA for the task with the Grid Scheduler

    In order to provision the TSLA, the Grid Scheduler mayobtain an RSLA with the Grid resource or may use a pre-existing RSLA that the Grid scheduler has negotiatedspeculatively

    TSLA that refers to an RSLA assures the jobs gets the reservedresources at a specified time

    TSLA without an RSLA tells little about when the resources will be

    available to the job

  • 8/13/2019 Europar Tutorial GRMSr

    104/153

  • 8/13/2019 Europar Tutorial GRMSr

    105/153

    GGF: GRAAP-WG

  • 8/13/2019 Europar Tutorial GRMSr

    106/153

    106

    Goal: Defining WebService-based protocols for negotiation and

    agreement management WS-Agreement Protocol:

    Towards Grid Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    107/153

    107

    Grid Scheduling Methods: Support for individual scheduling objectives and policies

    Multi-criteria scheduling models

    Economic scheduling methods to Grids

    Architectural requirements:

    Generic job description

    Negotiation interface between higher- and lower-level scheduler

    Economic management services

    Workflow management

    Integration of data and network management

    Grid Scheduling Strategies

  • 8/13/2019 Europar Tutorial GRMSr

    108/153

    108

    Current approach:

    Extension of job scheduling for parallel computers.

    Resource discovery and load-distribution to a remote resource

    Usually batch job scheduling model on remote machine

    But actually required for Grid scheduling is:

    Co-allocation and coordinationof different resource allocations for a Grid job

    Instantaneous ad-hoc allocation is not always suitable

    This complex task involves:

    Cooperation between different resource providers

    Interaction with local resource management systems Support for reservations and service level agreements

    Orchestration of coordinated resources allocations

    User Objective

  • 8/13/2019 Europar Tutorial GRMSr

    109/153

    109

    Local computingtypically has:

    A given scheduling objective as minimization of response time

    Use of batch queuing strategies

    Simple scheduling algorithms: FCFS, Backfilling

    Grid Computingrequires: Individual scheduling objective

    better resources

    faster execution

    cheaper execution

    More complex objective functions apply for individual Grid jobs!

    Provider/Owner Objective

  • 8/13/2019 Europar Tutorial GRMSr

    110/153

    110

    Local computingtypically has:

    Single scheduling objective for the whole system:

    e.g. minimization of average weighted response time

    or high utilization/job throughput

    In Grid Computing: Individual policies must be considered:

    access policy,

    priority policy,

    accounting policy, and other

    More complex objective functionsapply for individual resource

    allocations!

    User and owner policies/objectives may be subject to privacy

    considerations!

    Grid EconomicsDifferent Business Models

  • 8/13/2019 Europar Tutorial GRMSr

    111/153

    111

    Different Business Models

    Cost model

    Use of a resource

    Reservation of a resource

    Individual scheduling objective functions

    User and owner objective functions

    Formulation of an objective function Integration of the function in a scheduling algorithm

    Resource selection

    The scheduling instances act as broker Collection and evaluation of resource offers

    Scheduling Objectives in the Grid

  • 8/13/2019 Europar Tutorial GRMSr

    112/153

    112

    In contrast to local computing, there is no general scheduling

    objective anymore

    minimizing response time

    minimizing cost

    tradeoff between quality, cost, response-time etc.

    Cost and different service quality come into play the user will introduce individual objectives

    the Grid can be seen as a market where resource are concurring

    alternatives

    Similarly, the resource provider has individual scheduling policies

    Problem: the different policies and objectives must be integrated in the scheduling

    process

    different objectives require different scheduling strategies

    part of the policies may not be suitable for public exposition

    (e.g. different pricing or quality for certain user groups)

    Grid Scheduling Algorithms

  • 8/13/2019 Europar Tutorial GRMSr

    113/153

    113

    Due to the mentioned requirements in Grids its not to be expected

    that a single scheduling algorithm or strategy is suitable for all

    problems.

    Therefore, there is need for an infrastructure that

    allows the integration of different scheduling algorithms the individual objectives and policies can be included

    resource control stays at the participating service providers

    Transition into a market-oriented Grid scheduling model

    Economic Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    114/153

    114

    Market-oriented approaches are a suitable way to implement theinteraction of different scheduling layers

    agents in the Grid market can implement different policies and strategies

    negotiations and agreements link the different strategies together

    participating sites stay autonomous

    Needs for suitable scheduling algorithms and strategies for creating

    and selecting offers

    need for creating the Pareto-Optimal scheduling solutions

    Performance relies highly on the available information

    negotiation can be hard task if many potential providers are available.

    Economic Scheduling (2)

  • 8/13/2019 Europar Tutorial GRMSr

    115/153

    115

    Several possibilities for market models:

    auctions of resources/services

    auctions of jobs

    Offer-request mechanisms support:

    inclusion of different cost models, price determination individual objective/utility functions for optimization goals

    Market-oriented algorithms are considered:

    robust

    flexible in case of errors simple to adapt

    markets can have unforeseeable dynamics

  • 8/13/2019 Europar Tutorial GRMSr

    116/153

    Offer Creation (2)

  • 8/13/2019 Europar Tutorial GRMSr

    117/153

    117

    R1 R2 R3 R4 R5 R6 R7 R8

    t

    Offer 1

    End of current

    schedule

  • 8/13/2019 Europar Tutorial GRMSr

    118/153

    Offer Creation (4)

  • 8/13/2019 Europar Tutorial GRMSr

    119/153

    119

    R1 R2 R3 R4 R5 R6 R7 R8

    t

    Offer 3

    End of current

    schedule

    New end ofcurrent schedule

    Evaluate Offers

  • 8/13/2019 Europar Tutorial GRMSr

    120/153

    120

    Evaluation with utility functions

    )( 21max pricealatencyaUutil

    A utility function is a mathematical representation of a users

    preference

    The utility function may be complex and

    contain several different criteria

    Example using response time (or delay time) and price:

    Optimization Space

  • 8/13/2019 Europar Tutorial GRMSr

    121/153

    121

    latency

    price

    A1

    A1

    A3

    A3

    A2

    A2

    The utility depends on the user and provider objectives

    Example: NWIRE

  • 8/13/2019 Europar Tutorial GRMSr

    122/153

    122

    Following are some slides to illustrate the scheduling process in the

    NWIRE project.

    NWIRE models a Grid infrastructure without central services

    A P2P scheduling model is employed where Grid Scheduler/Managersthat reside on each local site are queried for offers.

    A Grid schedulers asks known other Schedulers. Similar to P2P systems, no Grid manager knows the whole grid but

    keeps a list of know other Grid managers

    Those schedulers can forward the request to other schedulers

    The scope and depths of the query can be parameterized

    Individual objective functions are used to evaluate the utility for a userand the resource provider.

    The scheduler selects the allocation offer with the highest utility value

    The whole model represents a Grid market and auction system.

    Evaluation of Economic Approach

  • 8/13/2019 Europar Tutorial GRMSr

    123/153

    123

    Real Workload Data of Grids is not yet available.

    However, available are workloads of single HPC installations.

    Example of evaluation results:

    considered: traces from Cornell Theory Center (430 nodes IBM RS/6000 SP)

    4 workload adapted from the real traces, each 10000 jobs

    Assumed Synthetic Grid machine configurations:

    Name Configuration Largest Maschine

    m64 4*64 + 6*32 + 8*8 64

    m128 4*128 128

    m256 2*256 256

    m384 1*384 + 1*64 + 4*16 384

    m512 1*512 512

    Scheduling Process

  • 8/13/2019 Europar Tutorial GRMSr

    124/153

    124

    ResourceManager ResourceManager ResourceManager

    Grid

    ManagerApplication

    Request

    RequirementsJob Attributes

    ObjectiveFunction

    1) Client sends a

    request

    Example Request

  • 8/13/2019 Europar Tutorial GRMSr

    125/153

    125

    KEY Hops {VALUE HOPS {2}}

    KEY MaxOfferNumber {VALUE MaxOfferNumber {10}}

    KEY MinMulti-Site {VALUE MinMulti-Site {10}}

    ...

    KEY Utility {

    ELEMENT 1 {

    CONDITION{((OperatingSystem EQ Linux)&&( NumberOfProcessors >= 8))}

    VALUE UtilityValue {-EndTime}

    VALUE RunTime {5}

    }

    ...

    ELEMENT 2 {

    CONDITION{ ... }VALUE UtilityValue {-JobCost}

    }

    ...

    }

    Scheduling Process

  • 8/13/2019 Europar Tutorial GRMSr

    126/153

    126

    Grid

    ManagerGrid

    Manager

    Resource

    Manager

    Resource

    Manager

    Resource

    Manager

    Grid

    ManagerApplication

    RequestRequest

    RequestRequest Request

    Scheduler requests offers from other Grid Managers/Schedulers.

    Search depths and scope can be parameterized.

    Request

    Requirements

    Job AttributesObjectiveFunction

    2) Query for

    other offers

    3) Limitations of

    direction and depth

    Scheduling Process

  • 8/13/2019 Europar Tutorial GRMSr

    127/153

    127

    Grid

    ManagerGrid

    Manager

    Resource

    Manager

    Resource

    Manager

    Resource

    Manager

    Grid

    ManagerApplication

    RequestRequest

    Offer

    AttributesObjectiveFunction

    RequestRequest Request

    Offer

    Selection of allocation according to the utility value

    Returned offers are collected.

    4) Allocation;

    Maximization of the

    objectives

    Scheduling Process

  • 8/13/2019 Europar Tutorial GRMSr

    128/153

    128

    Grid

    ManagerGrid

    Manager

    Resource

    Manager

    Resource

    Manager

    Resource

    Manager

    Grid

    ManagerApplication

    RequestRequest

    RequestRequest Request

    Offer

    The user can directly be informed about the schedule for his execution.

    The Grid Manager may re-schedule his allocations to optimize the result further.

    ScheduleAllokation_1Allokation_2Allokation_3

    5) User and Resource

    Managers are informed

    about allocations

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    129/153

    129

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    130/153

    130

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    131/153

    131

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

  • 8/13/2019 Europar Tutorial GRMSr

    132/153

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    133/153

    133

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

  • 8/13/2019 Europar Tutorial GRMSr

    134/153

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    135/153

    135

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

    Offer Creation

  • 8/13/2019 Europar Tutorial GRMSr

    136/153

    136

    Zeit

    1 2 3 4 5 6 7

    Resources

    B

    C

    D

    A

    AWRT for Economic SchedulingCompared to Conservative Backfilling

  • 8/13/2019 Europar Tutorial GRMSr

    137/153

    137

    Changes in average completion time of

    market-economic vs conventional scheduler

    -15,000

    -10,000

    -5,000

    0,000

    5,000

    10,000

    M128 M256 M384

    Resource Configuration

    Changesonpct

    Conventional

    algoritms better

    Markted-oriented

    better

    Utilization

  • 8/13/2019 Europar Tutorial GRMSr

    138/153

    138

    Changes in Utilization of

    market-economic vs conventional schedulers

    -3,00

    -2,00

    -1,00

    0,00

    1,00

    2,00

    3,00

    M128 M256 M384

    Resource Configurations

    Changesinpc

    t

    Conventional

    better

    Markted-oriented

    better

  • 8/13/2019 Europar Tutorial GRMSr

    139/153

    Results on Economic Models

  • 8/13/2019 Europar Tutorial GRMSr

    140/153

    140

    Economical models provide results in the range of conventional

    algorithms in terms of AWRT

    Economic methods leave much more flexibilities in defining the

    desired resources

    Problems of site autonomy, heterogeneous resources and individualowner and user preferences are solved

    Advantage of reservation of resources ahead of time

    Further analysis needed:

    Usage of other utility functions

  • 8/13/2019 Europar Tutorial GRMSr

    141/153

  • 8/13/2019 Europar Tutorial GRMSr

    142/153

    Data Management

  • 8/13/2019 Europar Tutorial GRMSr

    143/153

    143

    Access to information about the location of data sets

    Information about transfer costs

    Scheduling of data transfers and data availability

    optimize data transfers in regards to available network bandwidth and

    storage space

    Coordination with network or other resources

    Similarities with general grid scheduling:

    access to similar services

    similar tasks to execute

    interaction necessary

    Functional View of GridData Management

  • 8/13/2019 Europar Tutorial GRMSr

    144/153

    144

    Location based on

    data attributes

    Location of one or

    more physical replicas

    State of grid resources,

    performance measurementsand predictions

    MetadataService

    Application

    Replica Location

    Service

    Information Services

    Planner:Data location,Replica selection,Selection of compute

    and storage nodes

    Security and Policy

    Executor:Initiates

    data transfers andcomputations

    Data Movement

    Data Access

    Compute Resources Storage Resources

    Source: Carl Kesselmann

    Replica Location Service InContext

  • 8/13/2019 Europar Tutorial GRMSr

    145/153

    145

    Replica Location ServiceReliable Data

    Transfer Service

    GridFTP

    Reliable Replication Service

    Replica Consistency Management Services

    Metadata

    Service

    The Replica Location Service is one component in a layered datamanagement architecture

    Provides a simple, distributed registry of mappings

    Consistency management provided by higher-level services

    Source: Carl Kesselmann

    Example of a Scheduling Process

  • 8/13/2019 Europar Tutorial GRMSr

    146/153

    146

    Scheduling Service:

    1. receives job description2. queries Information Service for static resource

    information

    3. prioritizes and pre-selects resources

    4. queries for dynamic information about resourceavailability

    5. queries Data and Network Management Services

    6. generates schedule for job

    7. reserves allocation if possibleotherwise selects another allocation

    8. delegates job monitoring to Job Supervisor

    Job Supervisor/Network and Data Management:service, monitor and initiate allocation

    Example:

    40 resources of requested type are

    found.

    12 resources are selected.

    8 resources are available.

    Network and data dependencies are

    detected.

    Utility function is evaluated.

    6thtried allocation is confirmed.

    Data/network provided and

    job is started

    Use-Case Example:CoordinatedSimulation and Visualization

  • 8/13/2019 Europar Tutorial GRMSr

    147/153

    147

    Expected output of a Grid scheduler:

    time

    Data Transfer

    Loading Data Parallel Computation Providing Data

    Data TransferNetwork 1

    Computer 1

    Parallel ComputationComputer 2

    Communication for Computation

    Network 3

    VR-Cave Visualization

    Data Data Access Storing Data

    Communication for Visualization

    Network 2

    Software UsageSoftware License

    Data StorageStorage

    resources

    Re-Scheduling

  • 8/13/2019 Europar Tutorial GRMSr

    148/153

    148

    Reconsidering a schedule with already made agreements may be a

    good idea from time to time because resource situation may have changed, or

    workload situation has changed

    Optimization of the schedule can only work with the bounds of made

    agreements and reservations given guarantees must be observed

    The schedulers can try to maximize the utility values of the overallschedule

    a Grid scheduler may negotiate with other resource providers in order toget better agreements; may cancel previous agreements

    a local scheduler may optimize the local allocations to improve theschedule.

  • 8/13/2019 Europar Tutorial GRMSr

    149/153

    Conclusion

  • 8/13/2019 Europar Tutorial GRMSr

    150/153

    150

    Resource management and scheduling is a key service in an Next

    Generation Grid. In a large Grid the user cannot handle this task.

    Nor is the orchestration of resources a provider task.

    System integration is complex but vital.

    The local systems must be enabled to interact with the Grid.

    Providing sufficient information, expose services for negotiation

    Basic research is still required in this area.

    No ready-to-implement solution is available.

    New concepts are necessary.

    Current efforts provide the basic Grid infrastructure. Higher-levelservices as Grid scheduling are still lacking.

    Future RMS systems will provide extensible negotiation interfaces

    Grid scheduling will include coordination of different resources

    Further Outlook

  • 8/13/2019 Europar Tutorial GRMSr

    151/153

    151

    Commercial Interest in the Grid may have stronger focus on non HPC-

    related aspects eCommerce-Solutions

    Dynamically migrating Application Server Load

    Inter-operation between companies

    Supply-Chain Management

    request distributed resources under certain constraints Enterprise Application Integration EAI

    dynamically discover and use available services

    The problems tackled by Grid RMS is common to many application

    fields. Grid solutions may become a key technology infrastructure

    or solutions from other areas (e.g. the WebService-Community) maysurpass Grid efforts?

    References

  • 8/13/2019 Europar Tutorial GRMSr

    152/153

    152

    Book: Grid Resource Management: State of the Art

    and Future Trends,co-editors Jarek Nabrzyski, Jennifer M. Schopf, and

    Jan Weglarz, Kluwer Publishing, 2004

    PBS, PBS pro: www.openpbs.organd

    www.pbspro.com

    LSF, CSF: www.platform.com

    Globus: www.globus.org

    Global Grid Forum: www.ggf.org, see SRM area

    http://www.openpbs.org/http://www.pbspro.com/http://www.platform.com/http://www.globus.org/http://www.ggf.org/http://www.ggf.org/http://www.globus.org/http://www.platform.com/http://www.pbspro.com/http://www.openpbs.org/
  • 8/13/2019 Europar Tutorial GRMSr

    153/153

    Contact:

    [email protected]

    Computer Engineering Institute

    University of Dortmund

    mailto:[email protected]:[email protected]