Evaluating Demonstrations
Evaluation Feedback
DemoFeedback
DemoFeedback = StrategyDemoFeedback | QueryDemoFeedback
Feedback sent by the server for each demonstration in a file.
QueryDemoFeedback
dataclass
Feedback sent by the server for a standalone query demonstration.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | Literal['query'] | Always "query". |
| diagnostics | list[Diagnostic] | Global diagnostics. |
| answer_diagnostics | list[tuple[int, Diagnostic]] | Diagnostics attached to specific answers. |
Source code in src/delphyne/analysis/feedback.py
StrategyDemoFeedback
dataclass
Feedback sent by the server for each strategy demonstration.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | Literal['strategy'] | Always "strategy". |
| trace | Trace | The resulting browsable trace, which includes all visited nodes. |
| answer_refs | dict[TraceAnswerId, DemoAnswerId] | A mapping from answer ids featured in the trace to the position of the corresponding answer in the demonstration. This mapping may be partial. For example, using value hints (e.g., |
| saved_nodes | dict[str, TraceNodeId] | Nodes saved using the |
| test_feedback | list[TestFeedback] | Feedback for each test in the demonstration. |
| global_diagnostics | list[Diagnostic] | Diagnostics that apply to the whole demonstration (individual tests have their own diagnostics). |
| query_diagnostics | list[tuple[DemoQueryId, Diagnostic]] | Diagnostics attached to specific queries. |
| answer_diagnostics | list[tuple[DemoAnswerId, Diagnostic]] | Diagnostics attached to specific answers. |
| implicit_answers | dict[ImplicitAnswerCategory, list[ImplicitAnswer]] | Implicit answers that were generated on the fly and that can be explicitly added to the demonstration, grouped by category. The dictionary should have no empty value: each category it mentions should have at least one implicit answer. |
Source code in src/delphyne/analysis/feedback.py
TestFeedback
dataclass
Feedback returned by the demo interpreter for a single test.
The test is considered successful if no diagnostic is a warning or an
error. Most of the time, and even when unsuccessful, a test stops at
a given node, which can be inspected in the UI and which is
identified by the node_id field.
Attributes:
| Name | Type | Description |
|---|---|---|
| diagnostics | list[Diagnostic] | List of diagnostics for the test. |
| node_id | TraceNodeId \| None | Identifier of the node where the test stopped. |
Source code in src/delphyne/analysis/feedback.py
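The success rule above says that a test passes exactly when every diagnostic is informational. This reference does not list Diagnostic's fields, so the sketch uses a hypothetical stand-in with assumed severity and message fields. Its severity reuses the DiagnosticType literal documented below.

```python
from dataclasses import dataclass
from typing import Literal

DiagnosticType = Literal["error", "warning", "info"]


# Hypothetical stand-in: the (severity, message) fields are assumptions.
@dataclass
class Diagnostic:
    severity: DiagnosticType
    message: str


def is_successful(diagnostics: list[Diagnostic]) -> bool:
    """A test succeeds iff no diagnostic is a warning or an error."""
    return all(d.severity == "info" for d in diagnostics)
```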
ImplicitAnswer
dataclass
An implicit answer that is not part of the demonstration but was generated on the fly.
The VSCode extension then offers to add such answers explicitly in
the demonstration. This is particularly useful for handling
Compute nodes in demonstrations.
Attributes:
| Name | Type | Description |
|---|---|---|
| query_name | str | Query name. |
| query_args | dict[str, object] | Arguments passed to the query. |
| answer_mode | str \| None | Answer mode. |
| answer_content | str \| object | Answer content, as raw text or as a JSON value for structured output. |
| answer_structured | bool | Whether the answer is structured. |
| answer_tool_calls | Sequence[ImplicitAnswerToolCall] | Associated tool calls. |
| answer_justification | str \| None | Justification for the answer. |
| comment | str \| None | An optional comment that can be added to provide context, to be logged in the extension's output channel. |
Source code in src/delphyne/analysis/feedback.py
DemoAnswerId
DemoAnswerId = tuple[int, int]
A (query_id, answer_index) pair that identifies an answer in a demo.
DemoQueryId
DemoQueryId = int
Index of the query in the queries section of a demo.
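A DemoAnswerId is a (query_id, answer_index) pair, where query_id is a DemoQueryId, so resolving one takes two indexing steps. The sketch below also applies a (possibly partial) answer_refs mapping of the kind found in StrategyDemoFeedback. Its assumption is a hypothetical demo layout in which each query is a dict with an "answers" list. It does not read an actual demonstration file.

```python
# Assumption: a demo's queries section is modeled as a list of dicts,
# each holding an "answers" list.
DemoQueryId = int
DemoAnswerId = tuple[int, int]


def lookup_answer(queries: list[dict], answer_id: DemoAnswerId) -> object:
    """Return the demo answer designated by a (query_id, answer_index) pair."""
    query_id, answer_index = answer_id
    return queries[query_id]["answers"][answer_index]


def resolve_trace_answers(
    queries: list[dict], answer_refs: dict[int, DemoAnswerId]
) -> dict[int, object]:
    """Map trace answer ids to demo answers; ids absent from the (possibly
    partial) answer_refs mapping are simply left out."""
    return {tid: lookup_answer(queries, ref) for tid, ref in answer_refs.items()}
```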
Diagnostic
dataclass
A diagnostic message shown in the editor.
Source code in src/delphyne/analysis/feedback.py
DiagnosticType
DiagnosticType = Literal['error', 'warning', 'info']
Diagnostic type.
Browsable Traces
Trace
dataclass
A browsable trace.
Raw traces contain all the information needed to recompute a trace, but they are not easily manipulated by tools. Browsable traces, in comparison, offer a more redundant but also more explicit view. This module provides a way to convert a trace from the former format to the latter.
Attributes:
| Name | Type | Description |
|---|---|---|
| nodes | dict[TraceNodeId, Node] | A mapping from node ids to their description. |
Info
A browsable trace features answer identifiers whose meaning must be
provided externally. For example, the demonstration interpreter also
produces a mapping from answer ids to their position in the
demonstration file. Similarly, commands like run_strategy return a
raw trace (core.traces.Trace) alongside the browsable version, and
this raw trace maps answer ids to their actual content.
Source code in src/delphyne/analysis/feedback.py
Node
dataclass
Information about a node.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | str | Name of the node type, or |
| success_value | ValueRepr \| None | The success value if the node is a success leaf, or |
| summary_message | str \| None | A short summary message (see the |
| leaf_node | bool | Whether the node is a leaf node |
| label | str \| None | A label describing the node, which can be useful for writing node selectors (although there is currently no guarantee that the label constitutes a valid selector leading to the node). Currently, the label shows all node tags, separated by "&". |
| tags | list[str] | The list of all tags attached to the node. |
| properties | list[tuple[Reference, TraceSpaceId \| None, NodeProperty]] | List of node properties (attached queries, nested trees, data fields...). Each property is accompanied by a pretty-printed, local space reference. |
| actions | list[Action] | A list of explored actions. |
| origin | NodeOrigin | The origin of the node in the global trace. |
Source code in src/delphyne/analysis/feedback.py
NodeOrigin
NodeOrigin = (
Literal["root"]
| tuple[Literal["child"], TraceNodeId, TraceActionId]
| tuple[Literal["nested"], TraceNodeId, TraceNodePropertyId]
)
Origin of a node.
A node can be the global root, the child of another node, or the root of a nested tree.
Action
dataclass
An action associated with a node.
Attributes:
| Name | Type | Description |
|---|---|---|
| ref | Reference | Pretty-printed local reference for the action. |
| hints | list[str] \| None | If the trace results from executing a demonstration, this provides the list of hints that can be used to recover the action through navigation. Otherwise, it is |
| related_success_nodes | list[TraceNodeId] | List of related success nodes. A related success node is a node whose attached value was used in building the action. This is used by the VSCode extension's Path View, which shows a sequence of actions and, for each of them, the success paths that were involved in building it. |
| related_answers | list[TraceAnswerId] | List of related answers. A related answer is an answer to a local query that is used in building the action. Storing this information makes it possible to detect useless answers that are not used in any action. |
| destination | TraceNodeId | Id of the child node that the action leads to. |
Source code in src/delphyne/analysis/feedback.py
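The related_answers field enables a simple check for useless answers: collect every answer id in the trace and subtract those referenced by some action. This is a sketch with hypothetical trimmed mirrors of Action and Node. In the real Node, answers are reached through the Query entries of properties, not through the answer_ids field assumed here.

```python
from dataclasses import dataclass, field


# Hypothetical trimmed mirrors of Action and Node.
@dataclass
class Action:
    related_answers: list[int]
    destination: int


@dataclass
class Node:
    # Assumption: answers to local queries flattened into a single list.
    answer_ids: list[int] = field(default_factory=list)
    actions: list[Action] = field(default_factory=list)


def unused_answers(nodes: dict[int, Node]) -> set[int]:
    """Return ids of answers that are not used in building any action."""
    all_answers = {a for n in nodes.values() for a in n.answer_ids}
    used = {
        a for n in nodes.values() for act in n.actions for a in act.related_answers
    }
    return all_answers - used
```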
NodeProperty
NodeProperty = Data | NestedTree | Query
Description of a node property (see TraceNodePropertyId).
Data
dataclass
Generic property that displays some data.
Attributes:
| Name | Type | Description |
|---|---|---|
kind |
Literal['data']
|
Always "data". |
content |
str
|
string representation of the data content. |
Source code in src/delphyne/analysis/feedback.py
NestedTree
dataclass
A nested tree.
Attributes:
| Name | Type | Description |
|---|---|---|
kind |
Literal['nested']
|
Always "nested". |
strategy |
str
|
Name of the strategy function that induces the tree. |
args |
dict[str, ValueRepr]
|
Arguments passed to the strategy function. |
tags |
list[str]
|
Tags attached to the space induced by the tree. |
node_id |
TraceNodeId | None
|
Identifier of the root node of the nested tree, or
|
Source code in src/delphyne/analysis/feedback.py
Query
dataclass
Information about a query.
Attributes:
| Name | Type | Description |
|---|---|---|
kind |
Literal['query']
|
Always "query". |
name |
str
|
Name of the query. |
args |
dict[str, object]
|
Query arguments, serialized in JSON. |
tags |
list[str]
|
Tags attached to the space induced by the query. |
answers |
list[Answer]
|
All answers to the query present in the trace. |
Source code in src/delphyne/analysis/feedback.py
Answer
dataclass
An answer to a query.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | TraceAnswerId | Unique answer identifier. |
| value | ValueRepr | Parsed answer value. |
| hint | tuple[] \| tuple[str] \| None | If the trace results from executing a demonstration (vs running a policy with tracing enabled), then |
Source code in src/delphyne/analysis/feedback.py
Reference
dataclass
A reference to a space or to a value.
Several human-readable representations are provided:
Attributes:
| Name | Type | Description |
|---|---|---|
with_ids |
str
|
A pretty-printed, id-based reference. |
with_hints |
str | None
|
A pretty-printed, hint-based reference. These are typically available in the output of the demonstration interpreter, but not when converting arbitrary traces that result from running policies. |
Source code in src/delphyne/analysis/feedback.py
ValueRepr
dataclass
Multiple representations for a Python object.
We allow providing several representations for Python objects: short, one-liner string descriptions, detailed descriptions, JSON representation... All of these can be leveraged by different tools and UI components.
Attributes:
| Name | Type | Description |
|---|---|---|
short |
str
|
A short representation, typically obtained using the
|
long |
str | None
|
A longer, often multi-line representation, typically
obtained using the |
json |
object
|
A JSON representation of the object. |
json_provided |
bool
|
Whether a JSON representation is provided (the
JSON field is |
Source code in src/delphyne/analysis/feedback.py
TraceAnswerId
TraceAnswerId = int
Global answer id, as set by core.traces.Trace.
TraceActionId
TraceActionId = int
Index of an action within a given node.
TraceNodePropertyId
TraceNodePropertyId = int
Index of a property within a given node. A property is an element that can be listed in the UI, which is either an attached query, a nested tree or some data.
Demonstration Interpreter
evaluate_demo
evaluate_demo(
demo: Demo,
*,
object_loader: ObjectLoader,
answer_database_loader: AnswerLoader,
implicit_answer_generators: Sequence[ImplicitAnswerGenerator],
) -> DemoFeedback
Evaluate a query or strategy demonstration.
This is the main entrypoint of the demonstration interpreter.
Parameters:
| Name | Type | Description |
|---|---|---|
| demo | Demo | The demonstration to evaluate. |
| object_loader | ObjectLoader | An object loader that can be used to resolve query and strategy names. |
| extra_objects | | Additional objects that can be resolved by name (with higher precedence). |
| implicit_answer_generators | Sequence[ImplicitAnswerGenerator] | Load the implicit answer generators (e.g. including the one handling |
Returns:
| Type | Description |
|---|---|
| DemoFeedback | A feedback object containing the results of the evaluation. |
Warning
This function creates an ObjectLoader internally and is
therefore not thread-safe.
Source code in src/delphyne/analysis/demo_interpreter.py
ObjectLoader
Utility class for loading Python objects.
Demonstration and command files may refer to Python identifiers that
need to be resolved. This is done relative to a list of directories
to be added to sys.path, along with a list of modules.
An exception is raised if an object with the requested identifier can be found in several modules.
Source code in src/delphyne/analysis/object_loaders.py
__init__
__init__(
*,
strategy_dirs: Sequence[Path],
modules: Sequence[str],
extra_objects: dict[str, object] | None = None,
initializers: Sequence[str | ObjectLoaderInitializer] = (),
)
Parameters:
| Name | Type | Description |
|---|---|---|
| strategy_dirs | Sequence[Path] | A list of directories in which strategy modules can be found, to be added to |
| modules | Sequence[str] | A list of modules in which Python object identifiers should be resolved. Modules can be part of packages and so their name may feature |
| extra_objects | dict[str, object] \| None | Additional objects that can be resolved by name (with higher precedence). |
| initializers | Sequence[str \| ObjectLoaderInitializer] | A sequence of initialization functions to call before any object is loaded. Each element specifies a qualified function name, or a pair of a qualified function name and of a dictionary of arguments to pass. Each initializer function is called at most once per Python process (subsequent calls with possibly different arguments are ignored). |
Raises:
| Type | Description |
|---|---|
| ModuleNotFound | A module could not be found. |
Source code in src/delphyne/analysis/object_loaders.py
trivial
staticmethod
trivial() -> ObjectLoader
Create a trivial object loader that always fails at loading objects.
Source code in src/delphyne/analysis/object_loaders.py
find_object
find_object(name: str) -> Any
Find an object with a given name.
If the name is unqualified (it contains no dot), the object is looked
up in every registered module, in order. If the name is qualified,
only the specified registered module is searched.
Raises:
| Type | Description |
|---|---|
| ObjectNotFound | The object could not be found. |
| AmbiguousObjectIdentifier | The object name is ambiguous, i.e. it is found in several modules. |
Source code in src/delphyne/analysis/object_loaders.py
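The lookup rules for find_object can be illustrated with importlib over standard-library modules. Unqualified names are searched in every registered module. Qualified names are split at the last dot and searched only in the named module. Finding distinct objects in several modules is an error. The ObjectNotFound and AmbiguousObjectIdentifier classes below are local stand-ins named after the documented exceptions. The standalone find_object function taking a module list is a hypothetical simplification, not the ObjectLoader method itself.

```python
import importlib
from typing import Any


class ObjectNotFound(Exception):
    pass


class AmbiguousObjectIdentifier(Exception):
    def __init__(self, identifier: str, modules: list[str]):
        super().__init__(f"{identifier!r} is defined in several modules: {modules}")
        self.identifier = identifier
        self.modules = modules


def find_object(modules: list[str], name: str) -> Any:
    """Resolve name against registered module names.

    Unqualified names are searched in every module, in order; qualified
    names (module.attr) only in the named module, which must be registered.
    """
    module_name, _, attr = name.rpartition(".")
    if module_name:
        if module_name not in modules:
            raise ObjectNotFound(name)
        candidates = [module_name]
    else:
        candidates = list(modules)
    found: dict[str, Any] = {}
    for mod_name in candidates:
        mod = importlib.import_module(mod_name)
        if hasattr(mod, attr):
            found[mod_name] = getattr(mod, attr)
    if not found:
        raise ObjectNotFound(name)
    # Only *different* objects under the same name count as ambiguous.
    if len({id(obj) for obj in found.values()}) > 1:
        raise AmbiguousObjectIdentifier(attr, list(found))
    return next(iter(found.values()))
```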
load_and_call_function
load_and_call_function(name: str, args: dict[str, Any]) -> Any
Load and call a function by wrapping a call to find_object.
Source code in src/delphyne/analysis/object_loaders.py
load_strategy_instance
load_strategy_instance(name: str, args: dict[str, Any]) -> StrategyComp[Any, Any, Any]
Load and instantiate a strategy function with given arguments.
Raises:
| Type | Description |
|---|---|
| ObjectNotFound | If the strategy function cannot be found. |
| AmbiguousObjectIdentifier | If an ambiguous name is given. |
| StrategyLoadingError | If the object is not a strategy function or if the arguments are invalid. |
Source code in src/delphyne/analysis/object_loaders.py
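The "arguments are invalid" case of StrategyLoadingError can be detected before calling anything, by binding the arguments to the function's signature with inspect.signature(...).bind. This sketch uses a local StrategyLoadingError stand-in and a hypothetical instantiate helper. It only checks that the object is callable, whereas the real loader checks that it is a strategy function.

```python
import inspect
from typing import Any, Callable


class StrategyLoadingError(Exception):
    pass


def instantiate(fn: Callable[..., Any], args: dict[str, Any]) -> Any:
    """Call fn with keyword arguments, reporting a non-callable object or
    mismatched arguments as StrategyLoadingError instead of TypeError."""
    if not callable(fn):
        raise StrategyLoadingError(f"{fn!r} is not callable")
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        inspect.signature(fn).bind(**args)
    except TypeError as e:
        fname = getattr(fn, "__name__", repr(fn))
        raise StrategyLoadingError(f"invalid arguments for {fname}: {e}") from e
    return fn(**args)
```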
load_query
load_query(name: str, args: dict[str, Any]) -> AbstractQuery[Any]
Load a query by name and instantiate it with given arguments.
Raises:
| Type | Description |
|---|---|
| ObjectNotFound | If the query cannot be found. |
| AmbiguousObjectIdentifier | If an ambiguous name is given. |
| AssertionError | If the object is not a query. |
Source code in src/delphyne/analysis/object_loaders.py
ObjectLoaderInitializer
dataclass
Specification of a function to be called upon creation of an object loader.
Source code in src/delphyne/analysis/object_loaders.py
ImplicitAnswerGenerator
ImplicitAnswerGenerator = Callable[
[AnyTree, AttachedQuery[Any]], tuple[ImplicitAnswerCategory, Answer] | None
]
A function that optionally maps a tree node along with a query within this
node to an implicit answer for the query. This is useful in particular
for supporting Compute nodes in demonstrations.
Object Loader Exceptions
ModuleNotFound
dataclass
Bases: Exception
Raised by ObjectLoader when a module is not found.
Source code in src/delphyne/analysis/object_loaders.py
ObjectNotFound
dataclass
Bases: Exception
Raised by ObjectLoader when an object cannot be found.
Source code in src/delphyne/analysis/object_loaders.py
StrategyLoadingError
dataclass
Bases: Exception
Raised by ObjectLoader when a strategy instance cannot be loaded.
Source code in src/delphyne/analysis/object_loaders.py
AmbiguousObjectIdentifier
dataclass
Bases: Exception
Raised when attempting to load an object with an ambiguous name.
Attributes:
| Name | Type | Description |
|---|---|---|
| identifier | str | The ambiguous identifier. |
| modules | Sequence[str] | A list of modules where different objects with the same identifier were found. |
Source code in src/delphyne/analysis/object_loaders.py