
SAPA: A Domain-independent Heuristic Temporal Planner

Minh B. Do & Subbarao Kambhampati
Arizona State University

[Speaker notes] Good morning, friends. Obviously this is Binh Minh's paper. In any case, I convinced him that it would be better for him to spend his time working on another upcoming paper rather than on visiting Toledo, a midwestern town in Ohio.

I understand this is basically the same strategy Malik used to present Romain's paper as well.

Talk Outline

– Temporal planning and Sapa
– Action representation and search algorithm
– Objective functions and heuristics
  – Admissible/inadmissible
  – Resource adjustment
– Empirical results
– Related & future work

Planning

Most academic research has been done in the context of classical planning:

– Already PSPACE-complete
– Useful techniques are likely to be applicable in more expressive planning problems

Real-world applications normally have more complex requirements:
– Non-instantaneous (durative) actions
– Temporal constraints on goals
– Resource consumption

Classical planning has recently been able to scale up to large problems.

Can winning strategies in classical planning be applicable in more expressive environments?

Related Work

Planners that can handle similar types of temporal and resource constraints: TLPlan, HSTS, IxTeT, Zeno
– Cannot scale up without domain knowledge

Planners that can handle a subset of the constraints:
– Only temporal: TGP
– Only resources: LPSAT, GRT-R
– Subset of temporal and resource constraints: TP4, Resource-IPP

SAPA

Forward state-space planner, based on [Bacchus & Ady]:
– Makes resource reasoning easier

Handles:
– Temporal constraints
– Actions with static and dynamic durations
– Temporal goals with deadlines
– Continuous resource consumption and production
– Heuristic functions to support a variety of objective functions

Action Representation

Example: Flying(?airplane ?city1 ?city2)
– Preconditions: (in-city ?airplane ?city1), (fuel ?airplane) > 0
– Effects: delete (in-city ?airplane ?city1), add (in-city ?airplane ?city2)
– Consumes: (fuel ?airplane)

Actions are durative, with end time E_A = S_A + D_A.

Instantaneous effects e occur at time t_e = S_A + d, 0 ≤ d ≤ D_A.

Preconditions need to be true at the starting point, and protected during a period of time d, 0 ≤ d ≤ D_A.

An action can consume or produce a continuous amount of some resource.

Action conflicts:
– Two actions consuming the same resource
– One action's effect conflicting with another's precondition or effect
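As an illustration, the timing model above can be sketched in Python (the class and field names are hypothetical, not taken from the Sapa implementation):

```python
from dataclasses import dataclass

@dataclass
class DurativeAction:
    name: str
    start: float      # S_A: start time
    duration: float   # D_A: duration
    # instantaneous effects as (delay d, effect), with 0 <= d <= D_A
    effects: list

    @property
    def end(self):
        # E_A = S_A + D_A
        return self.start + self.duration

    def effect_times(self):
        # each instantaneous effect e fires at t_e = S_A + d
        return [(self.start + d, e) for d, e in self.effects]
```

For instance, a flight starting at t = 2 with duration 3 deletes its start location immediately (d = 0) and adds the destination at its end point (d = D_A).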

Searching time-stamped states

Search through the space of time-stamped states

S = (P, M, Π, Q, t), where:

– P: set of pairs <pi, ti> of predicates pi and the time of their last achievement ti < t
– M: set of functions representing resource values
– Π: set of protected persistent conditions
– Q: event queue
– t: time stamp of S
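A minimal Python sketch of this time-stamped state, assuming the event queue is ordered by firing time (all names are hypothetical, not from the Sapa implementation):

```python
from dataclasses import dataclass, field
import heapq

@dataclass
class TimeStampedState:
    P: dict    # predicate -> time of its last achievement
    M: dict    # resource function -> current value
    Pi: set    # protected persistent conditions
    Q: list = field(default_factory=list)  # event queue: (time, event)
    t: float = 0.0                         # time stamp of the state

    def push_event(self, time, event):
        # delayed effects are queued, ordered by their firing time
        heapq.heappush(self.Q, (time, event))

    def advance_time(self):
        # move the clock to the next queued event and return it
        time, event = heapq.heappop(self.Q)
        self.t = time
        return event
```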

Search Algorithm (cont.)

Goal satisfaction: S = (P, M, Π, Q, t) ⊨ G if for each <pi, ti> ∈ G either:
– there is <pi, tj> ∈ P with tj < ti and no event in Q deletes pi, or
– there is an event e ∈ Q that adds pi at time te < ti.

Action application: an action A is applicable in S if:
– All instantaneous preconditions of A are satisfied by P and M.
– A's effects do not interfere with Π and Q.
– No event in Q interferes with the persistent preconditions of A.

When A is applied to S:
– P and M are updated according to A's instantaneous effects.
– Persistent preconditions of A are put in Π.
– Delayed effects of A are put in Q.
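The goal-satisfaction test above can be sketched as follows (a simplified propositional version; the event and goal formats are assumptions made for illustration):

```python
def satisfies(P, Q, goals):
    """Check S |= G: each goal pi must hold by its deadline ti.

    P: dict mapping predicate -> time of last achievement
    Q: list of (time, kind, predicate) events, kind in {"add", "delete"}
    goals: list of (predicate, deadline) pairs
    """
    for p, deadline in goals:
        # case 1: already achieved early enough, and no queued event deletes it
        achieved = p in P and P[p] < deadline and not any(
            kind == "delete" and pred == p for _, kind, pred in Q)
        # case 2: some queued event adds it before the deadline
        added_in_time = any(
            kind == "add" and pred == p and t < deadline
            for t, kind, pred in Q)
        if not (achieved or added_in_time):
            return False
    return True
```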


Heuristic Control

Temporal planners have to deal with more branching possibilities, so it is more critical to have good heuristic guidance.

The design of the heuristics depends on the objective function.

Classical planning:
– Number of actions
– Parallel execution time
– Solving time

Temporal resource planning:
– Number of actions
– Makespan
– Resource consumption
– Slack
– …

In temporal planning, heuristics focus on richer objective functions that guide both planning and scheduling.

Objectives in Temporal Planning

– Number of actions: the total number of actions in the plan.
– Makespan: the shortest duration in which we can possibly execute all actions in the solution.
– Resource consumption: the total amount of resources consumed by actions in the solution.
– Slack: the duration between the time a goal is achieved and its deadline.
  – Optimize the maximum, minimum, or average slack value.
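For example, the slack-based objectives could be computed with a helper like this (a hypothetical illustration of the definition, not Sapa code):

```python
def slack_objectives(achievement_times, deadlines):
    """Slack of each goal = deadline - achievement time.

    achievement_times: {goal: time the goal is achieved}
    deadlines:         {goal: deadline}
    """
    slacks = [deadlines[g] - t for g, t in achievement_times.items()]
    return {"min": min(slacks),
            "max": max(slacks),
            "avg": sum(slacks) / len(slacks)}
```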

Deriving heuristics for SAPA

We use a phased relaxation approach to derive different heuristics:

– Relax the negative logical and resource effects to build the relaxed temporal planning graph (RTPG).
– Derive admissible heuristics to minimize the solution's makespan or to maximize slack-based objective functions; these prune bad states while preserving completeness.
– Find a relaxed solution, which is used as a distance heuristic.
– Adjust the heuristic values using negative interactions (future work) [AltAlt, AIJ 2001].
– Adjust the heuristic values using resource consumption information.

Relaxed Temporal Planning Graph

Heuristics in Sapa are derived from the Graphplan-style bi-level relaxed temporal planning graph (RTPG).

Relaxed actions have:
– No delete effects
– No resource consumption

[Figure: RTPG for a simple logistics problem. A person and an airplane start at city A; the relaxed actions Load(P,A), Fly(A,B), Fly(B,A), Unload(P,A), and Unload(P,B) are propagated along a timeline from the initial state at t = 0 to the goal deadline tg.]

The RTPG is built by the following propagation loop:

  while (true)
    forall A ≠ advance-time applicable in S
      S = Apply(A, S)
      if S ⊨ G then Terminate{solution}
    S' = Apply(advance-time, S)
    if ∃ (pi, ti) ∈ G such that ti < Time(S') and pi ∉ S
      then Terminate{non-solution}
      else S = S'
  end while
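Under the relaxation above (no delete effects, no resource consumption), a simplified fixed-point propagation of earliest achievement times might look like this (a sketch, not the Sapa implementation; effects are assumed to fire at the action's end):

```python
def build_rtpg(init_facts, actions, goals, deadline):
    """Propagate the earliest time each fact can appear, ignoring
    delete effects and resource consumption.

    actions: list of (preconditions, duration, add_effects) triples
    Returns {fact: earliest time}, or None if some goal cannot
    appear before the deadline.
    """
    earliest = {f: 0.0 for f in init_facts}
    changed = True
    while changed:
        changed = False
        for pre, dur, adds in actions:
            if all(p in earliest for p in pre):
                start = max(earliest[p] for p in pre) if pre else 0.0
                for f in adds:
                    t = start + dur  # relaxed: effect at the action's end
                    if t < earliest.get(f, float("inf")):
                        earliest[f] = t
                        changed = True
    if any(g not in earliest or earliest[g] > deadline for g in goals):
        return None  # a goal misses its deadline: prune this state
    return earliest
```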

Heuristics directly from RTPG

For makespan: the distance from a state S to the goals equals the duration between time(S) and the time the last goal appears in the RTPG.

For min/max/sum slack: the distance from a state to the goals equals the minimum, maximum, or summation of the slack estimates for all individual goals, computed using the RTPG.

These heuristics are ADMISSIBLE. Proof: all goals appear in the RTPG at times smaller than or equal to their earliest achievable times.
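Reading these estimates off the RTPG could be sketched as follows (a hypothetical helper, where goal_times are the times at which goals first appear in the graph):

```python
def rtpg_heuristics(state_time, goal_times, deadlines):
    """Admissible distance estimates read directly off the RTPG.

    state_time: time stamp of the current state S
    goal_times: {goal: time it first appears in the RTPG}
    deadlines:  {goal: deadline}
    """
    # makespan estimate: time until the last goal appears
    makespan_h = max(goal_times.values()) - state_time
    # slack estimate per goal: deadline minus appearance time
    slacks = [deadlines[g] - t for g, t in goal_times.items()]
    return {"makespan": makespan_h,
            "min_slack": min(slacks),
            "max_slack": max(slacks),
            "sum_slack": sum(slacks)}
```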


Heuristics from Solution Extracted from RTPG

The RTPG can be used to find a relaxed solution, which is then used to estimate the distance from a given state to the goals.

– Sum actions: the distance from a state S to the goals equals the number of actions in the relaxed plan.
– Sum durations: the distance from a state S to the goals equals the summation of the action durations in the relaxed plan.
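Both estimates fall out directly from the relaxed plan (the (action, duration) pair format here is an assumption for illustration):

```python
def relaxed_plan_heuristics(relaxed_plan):
    """Distance estimates from a relaxed plan extracted from the RTPG.

    relaxed_plan: list of (action_name, duration) pairs.
    """
    return {"sum_actions": len(relaxed_plan),
            "sum_durations": sum(d for _, d in relaxed_plan)}
```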


Resource-based Adjustments to Heuristics

Resource-related information, ignored originally, can be used to improve the heuristic values.

Adjusted sum-action:

h ← h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉

Adjusted sum-duration:

h ← h + Σ_R ⌈(Con(R) − (Init(R) + Pro(R))) / Δ_R⌉ · Dur(A_R)

where Con(R) is the amount of resource R consumed by actions in the relaxed plan, Init(R) its initial level, Pro(R) the amount produced, Δ_R the largest amount of R produced by a single action, and Dur(A_R) the duration of that action.

These adjustments do not preserve admissibility.
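A sketch of the adjusted sum-action computation, with Δ_R as the largest amount of R produced by one action (field names are hypothetical; as noted above, the adjustment is not admissible):

```python
import math

def adjusted_sum_action(h, resources):
    """Resource-based adjustment to the sum-action heuristic.

    resources: list of dicts with keys
      con   - amount consumed by the relaxed plan (Con(R))
      init  - initial level (Init(R))
      pro   - amount produced by the relaxed plan (Pro(R))
      delta - largest amount produced by one action (Delta_R)
    """
    for r in resources:
        shortfall = r["con"] - (r["init"] + r["pro"])
        if shortfall > 0:
            # extra producing actions needed to cover the shortfall
            h += math.ceil(shortfall / r["delta"])
    return h
```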

Aims of Empirical Study

– Evaluate the effectiveness of the different heuristics.
– Ablation studies: test whether the resource adjustment technique helps the different heuristics.
– Compare with other temporal planning systems.

Empirical Results

               Adjusted Sum-Action                  Sum-Duration
Prob     time    #act  nodes      dur       time    #act  nodes      dur
Zeno1    0.317   5     14/48      320       0.35    5     20/67      320
Zeno2    54.37   23    188/1303   950       -       -     -          -
Zeno3    29.73   13    250/1221   430       6.20    13    60/289     450
Zeno9    13.01   13    151/793    590       98.66   13    4331/5971  460
Log1     1.51    16    27/157     10.0      1.81    16    33/192     10.0
Log2     82.01   22    199/1592   18.87     38.43   22    61/505     18.87
Log3     10.25   12    30/215     11.75     -       -     -          -
Log9     116.09  32    91/830     26.25     -       -     -          -

– Sum-action finds solutions faster than sum-duration.
– Admissible heuristics do not scale up to bigger problems.
– Sum-duration finds shorter-duration solutions in most cases.
– Resource-based adjustment helps sum-action, but not sum-duration.
– Very few irrelevant actions; better quality than TemporalTLPlan, and so (transitively) better than LPSAT.

Comparison to other planners

Planners with similar capabilities:
– IxTeT, Zeno: poor scale-up
– HSTS, TLPlan: domain-dependent search control

Planners with limited capabilities:
– TGP and TP4, compared on a set of random temporal logistics problems. The domain specification and problems were defined by TP4's creator ("P@trik" Haslum): no resource requirements, and no deadline constraints or actions with dynamic duration.

Empirical Results (cont.)

Logistics domain with driving restricted to intra-city(traditional logistics domain)

[Plot: percentage of problems solved vs. solving time (s), from 0 to 1000 s, for SAPA, TP4, and TGP.]

Sapa is the only planner that can solve all 80 problems

Empirical Results (cont.)

The "sum-action" heuristic used as the default in Sapa can be misled by long-duration actions...

Logistics domain with inter-city driving actions

Future work on fixed point time/level propagation

[Plot: percentage of problems solved vs. solving time (s), from 0 to 1000 s, for SAPA, TP4, and TGP.]

Conclusion

Presented Sapa, a domain-independent forward temporal planner that can handle:
– Durative actions
– Deadline goals
– Continuous resources

– Developed different heuristic functions based on the relaxed temporal planning graph to address both satisficing and optimizing search
– A method to improve heuristic values by resource reasoning
– Promising initial empirical results

Future Work

Exploit mutex information in:
– Building the temporal planning graph
– Adjusting the heuristic values in the relaxed solution

– Relevance analysis
– Improving solution quality
– Relaxing constraints and integrating with a full-scale scheduler