The Strategy Language

Following the previous Overview chapter, we now provide more details on Delphyne's strategy language. We provide a quick overview of the most useful techniques and concepts and refer to the Reference for more details and explanations (follow the hypertext links).

Defining Strategies

A strategy is a program with unresolved choice points, which can be reified into a search tree. Delphyne allows writing strategies as Python generators that yield internal tree nodes and receive associated actions in return. The Strategy type has three type parameters, corresponding to the strategy's signature (i.e., the type of nodes it can emit), its associated inner policy type and its return type. Strategy functions are typically defined via the strategy decorator, which creates functions that return StrategyInstance values, wrapping the underlying generator while adding some metadata and convenience methods (e.g., using).

@strategy # (1)!
def find_param_value(
    expr: str) -> Strategy[Branch | Fail, FindParamValueIP, int]: ...

Example from the previous chapter.

Query functions can have arguments of arbitrary type (including functions). When launching strategies from commands and in the presence of type annotations, arguments are automatically unserialized using Pydantic. Thus, it is useful for top-level strategies to have serializable argument types that are properly annotated.

Branching nodes can be yielded via the branch function:

def branch[P, T](cands: Opaque[P, T]) -> Strategy[Branch, P, T]: ...

Crucially, branch does not take a query as an argument but an opaque space (Opaque). An opaque space can be seen as a mapping from the ambient inner policy to an iterator of values (more precisely, a search stream). Opaque spaces can be produced from queries or strategy instances, by mapping the ambient inner policy to a prompting policy or a policy respectively:

class Query[T]:
    def using[Pout](self,
        get_policy: Callable[[Pout], PromptingPolicy]) -> Opaque[Pout, T]: ...

class StrategyInstance[N: Node, P, T]:
    def using(self,
        get_policy: Callable[[Pout], Policy[N, P]]) -> Opaque[Pout, T]: ...

Importantly, search policies such as dfs are unaware of whether an opaque space originates from a query or a strategy. This guarantees modularity and allows queries to be transparently refined into dedicated strategies whenever more guidance is desired. Opaque spaces also allow queries with different signatures (i.e., yielding different kinds of tree nodes) to be composed, while being associated independent search policies.

Strategy Inlining

It is also possible to inline a strategy call within another strategy, provided that both strategies share the same signature and inner policy type. This can be done using the inline method, which unwraps a StrategyInstance value and gives access to the underlying generator.

For an example of a strategy that branches over results of a sub-strategy, see the prove_program_via_abduction_and_branching strategy from the find_invariants example. In particular, see how doing so results in nested inner policy records.

Source for examples/find_invariants/abduct_and_branch.py

"""
A simple Delphyne strategy to discover loop invariants with Why3.
"""

from collections.abc import Callable, Sequence
from dataclasses import dataclass

import delphyne as dp
from delphyne import Branch, Compute, Fail, Strategy, Value, strategy

import why3_utils as why3

# fmt: off


#####
##### Inner Policy Types
#####


@dataclass
class ProposeInvsIP:
    propose: dp.PromptingPolicy
    novel: dp.PromptingPolicy


@dataclass
class ProveProgIP:
    propose: "dp.Policy[Branch | Fail, ProposeInvsIP]"
    eval: dp.PromptingPolicy
    quantify_eval: "Callable[[ProofStateMetrics], float] | None"


#####
##### Type Definitions
#####


@dataclass
class ProofStateMetrics:
    has_redundant_invs: bool


type Proposal = Sequence[why3.Formula]


type Blacklist = Sequence[Proposal]



#####
##### Strategies
#####


@strategy
def prove_program_via_abduction_and_branching(
    prog: why3.File,
) -> Strategy[Branch | Value | Fail | Compute, ProveProgIP, why3.File]:
    annotated: why3.File = prog
    while True:
        feedback = yield from dp.compute(why3.check)(prog, annotated)
        yield from dp.ensure(feedback.error is None, "invalid program")
        if feedback.success:
            return annotated
        remaining = [o for o in feedback.obligations if not o.proved]
        yield from dp.ensure(
            len(remaining) == 1, "too many remaining obligations")
        unproved = remaining[0]
        yield from dp.ensure(
            not why3.invariant_init_obligation(unproved), "init")
        if annotated != prog:  # if invariant annotations are present
            yield from dp.value(
                EvaluateProofState(unproved)
                    .using(lambda p: p.eval, ProveProgIP),
                lambda p: p.quantify_eval)
        new_invariants = yield from dp.branch(
            dp.iterate(
                lambda prior:
                    propose_invariants(unproved, prior)
                        .using(lambda p: p.propose, ProveProgIP)))
        annotated = why3.add_invariants(annotated, new_invariants)


@strategy
def propose_invariants(
    obligation: why3.Obligation,
    blacklist: Sequence[Proposal] | None,
) -> Strategy[Branch | Fail, ProposeInvsIP, tuple[Proposal, Blacklist]]:
    if blacklist is None:
        blacklist = []
    proposal = yield from dp.branch(
        ProposeInvariants(obligation, blacklist)
            .using(lambda p: p.propose, ProposeInvsIP))
    sanity_check = all(why3.no_invalid_formula_symbol(inv) for inv in proposal)
    yield from dp.ensure(sanity_check, "sanity check failed")
    if blacklist:
        novel = yield from dp.branch(
            IsProposalNovel(proposal, blacklist)
                .using(lambda p: p.novel, ProposeInvsIP))
        yield from dp.ensure(novel, "proposal is not novel")
    return proposal, [*blacklist, proposal]


#####
##### Queries
#####


@dataclass
class ProposeInvariants(dp.Query[Proposal]):
    unproved: why3.Obligation
    blacklist: Sequence[Proposal]

    __parser__ = dp.last_code_block.yaml


@dataclass
class IsProposalNovel(dp.Query[bool]):
    proposal: Proposal
    blacklist: Sequence[Proposal]

    __parser__ = dp.last_code_block.yaml


@dataclass
class EvaluateProofState(dp.Query[ProofStateMetrics]):
    unproved: why3.Obligation

    __parser__ = dp.last_code_block.yaml


#####
##### Policies
#####


def prove_program_via_abduction_and_branching_policy(
    fancy_model: str = "gpt-4o",
    base_model: str = "gpt-4o",
    max_depth: int = 2,
    max_requests_per_proposal: int = 5,
    root_proposal_penalty: float = 0.7,
    nonroot_proposal_penalty: float = 1.0,
    max_nonroot_proposals: int = 3,
    enable_state_evaluation: bool = False,
    min_value: float = 0.3,
) -> dp.Policy[Branch | Value | Fail | Compute, ProveProgIP]:

    def compute_value(metrics: ProofStateMetrics) -> float:
        prob = 1
        if metrics.has_redundant_invs:
            prob = 0
        return max(min_value, prob)

    def child_confidence_prior(depth: int, prev_gen: int) -> float:
        if depth >= 1:
            if prev_gen >= max_nonroot_proposals:
                return 0
            return nonroot_proposal_penalty ** prev_gen
        return root_proposal_penalty ** prev_gen

    n = dp.NUM_REQUESTS
    fancy = dp.openai_model(fancy_model)
    base = dp.openai_model(base_model)

    propose_ip = ProposeInvsIP(
        propose=dp.few_shot(fancy),
        novel=dp.take(1) @ dp.few_shot(base),
    )
    proposal_limit = dp.BudgetLimit({n: max_requests_per_proposal})
    prove_prog_ip = ProveProgIP(
        propose=dp.with_budget(proposal_limit) @ dp.dfs() & propose_ip,
        eval=dp.take(1) @ dp.few_shot(base),
        quantify_eval=compute_value if enable_state_evaluation else None,
    )
    bestfs = dp.best_first_search(
        child_confidence_prior=child_confidence_prior,
        max_depth=max_depth)

    return bestfs @ dp.elim_compute() & prove_prog_ip

Queries

New query types can be defined by subclassing the Query class. We refer to the associated API documentation for details. Some highlights:

Queries can be associated system prompts and instance prompts. Prompts can be defined inline or in separate Jinja files (see find_invariants example).
Query types can have fields of any type as long as they can be serialized and unserialized using Pydantic (this includes custom dataclasses).
Prompts can be parameterized, and parameters instantiated on the policy side (e.g., params argument of few_shot). This is useful for testing prompt variations, specializing specific prompt fragments for particular, etc...
It is possible to define several answer modes for a query, each mode being associated a distinct parser (see tests/example_strategies.py:GetFavoriteDish for an example).
Queries support structured output and tool calls.

Trees and Reification

Strategies can be reified (i.e. compiled) into trees using the reify function (see reference documentation for details and caveats). The Tree class is defined as follows:

@dataclass(frozen=True)
class Tree[N: Node, P, T]:
    node: N | Success[T]
    child: Callable[[Value], Tree[N, P, T]]
    ref: GlobalNodeRef # (1)!

To ignore on first reading. See documentation on references.

A tree contains either a node (Node) or a success leaf (Node). When applicable, children trees are indexed by actions (Value). Actions result from combining elements of local spaces (Space). For example, if node has type Branch, then an action corresponds to a branching candidate.

Adding New Effects

New types of effects beyond Branch and Fail can be added easily, by subclassing Node. Here are a number of additional effects defined in the Delphyne standard library:

Join: allows evaluating a sequence of independent sub-strategies, possibly in parallel.
Compute: allows performing expensive and possibly non-replicable/nondeterministic computations in strategies (see details in How-To Guide).
Value: allows adding value information into strategy trees. Such information can be leveraged by search policies (e.g. best_first_search).
Flag: allows providing a finite number of alternative implementations for sub-tasks, to be selected either offline or at runtime.
Message: allows decorating trees with debugging messages.

Node types are dataclasses whose fields can be of several kinds:

Nonparametric local spaces (Space), the main types of which are opaque spaces (OpaqueSpace) and embedded trees (EmbeddedTree).
Parametric local spaces, which are functions from local values (i.e. assembly of elements from local spaces) to local spaces.
Data fields that contain policy metadata, debugging information, etc...

More details are available in the API Reference. For examples of defining new effects, you can refer to the source code of the aforementioned effects in the Delphyne standard library (in the delphyne.stdlib.nodes module).