Evaluating Demonstrations
Evaluation Feedback
DemoFeedback
DemoFeedback = StrategyDemoFeedback | QueryDemoFeedback
Feedback sent by the server for each demonstration in a file.
QueryDemoFeedback
dataclass
Feedback sent by the server for a standalone query demonstration.
Attributes:
Name | Type | Description |
---|---|---|
kind | Literal['query'] | Always "query". |
diagnostics | list[Diagnostic] | Global diagnostics. |
answer_diagnostics | list[tuple[int, Diagnostic]] | Diagnostics attached to specific answers. |
Source code in src/delphyne/analysis/feedback.py
StrategyDemoFeedback
dataclass
Feedback sent by the server for each strategy demonstration.
Attributes:
Name | Type | Description |
---|---|---|
kind | Literal['strategy'] | Always "strategy". |
trace | Trace | The resulting browsable trace, which includes all visited nodes. |
answer_refs | dict[TraceAnswerId, DemoAnswerId] | A mapping from answer ids featured in the trace to the position of the corresponding answer in the demonstration. This mapping may be partial. For example, using value hints (e.g., |
saved_nodes | dict[str, TraceNodeId] | Nodes saved using the |
test_feedback | list[TestFeedback] | Feedback for each test in the demonstration. |
global_diagnostics | list[Diagnostic] | Diagnostics that apply to the whole demonstration (individual tests have their own diagnostics). |
query_diagnostics | list[tuple[DemoQueryId, Diagnostic]] | Diagnostics attached to specific queries. |
answer_diagnostics | list[tuple[DemoAnswerId, Diagnostic]] | Diagnostics attached to specific answers. |
implicit_answers | dict[ImplicitAnswerCategory, list[ImplicitAnswer]] | Implicit answers that were generated on the fly and that can be explicitly added to the demonstration, grouped by category. The dictionary should have no empty value: each mentioned category should have at least one implicit answer. |
Source code in src/delphyne/analysis/feedback.py
TestFeedback
dataclass
Feedback returned by the demo interpreter for a single test.
The test is considered successful if no diagnostic is a warning or an
error. Most of the time, and even when unsuccessful, a test stops at
a given node, which can be inspected in the UI and which is
indicated in field node_id.
Attributes:
Name | Type | Description |
---|---|---|
diagnostics | list[Diagnostic] | List of diagnostics for the test. |
node_id | TraceNodeId | None | Identifier of the node where the test stopped. |
Source code in src/delphyne/analysis/feedback.py
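The success rule stated above can be sketched as a small predicate. The type aliases repeat the documented definitions of Diagnostic and DiagnosticType; the function name is_successful is an assumption made for this example and is not part of the library.

```python
from typing import Literal

DiagnosticType = Literal["error", "warning", "info"]
Diagnostic = tuple[DiagnosticType, str]


def is_successful(diagnostics: list[Diagnostic]) -> bool:
    """A test succeeds when none of its diagnostics is a warning or an error."""
    return all(kind not in ("warning", "error") for kind, _message in diagnostics)


print(is_successful([("info", "reached success leaf")]))
print(is_successful([("info", "ok"), ("warning", "unused hint")]))
```

A test with no diagnostics at all counts as successful under this reading.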
ImplicitAnswer
dataclass
An implicit answer that is not part of the demonstration but was generated on the fly.
The VSCode extension then offers to add such answers explicitly in
the demonstration. This is particularly useful for handling
Compute nodes in demonstrations.
Attributes:
Name | Type | Description |
---|---|---|
query_name | str | Query name. |
query_args | dict[str, object] | Arguments passed to the query. |
answer_mode | str | None | Answer mode. |
answer_content | str | object | Answer content, as raw text or as a JSON value for structured output. |
answer_structured | bool | Whether the answer is structured. |
answer_tool_calls | Sequence[ImplicitAnswerToolCall] | Associated tool calls. |
answer_justification | str | None | Justification for the answer. |
comment | str | None | An optional comment that can be added to provide context, to be logged in the extension's output channel. |
Source code in src/delphyne/analysis/feedback.py
DemoAnswerId
DemoAnswerId = tuple[int, int]
A (query_id, answer_index) pair that identifies an answer in a demo.
DemoQueryId
DemoQueryId = int
Index of the query in the queries section of a demo.
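To illustrate how these identifiers fit together, here is a minimal sketch that resolves a trace answer id to an answer in a demonstration through an answer_refs mapping. The layout of queries (a list of dictionaries with an "answers" list) is a hypothetical stand-in, not the library's demonstration format, and locate_answer is a name invented for this example.

```python
from typing import Any, Optional

DemoQueryId = int
DemoAnswerId = tuple[int, int]  # (query_id, answer_index)
TraceAnswerId = int


def locate_answer(
    answer_refs: dict[TraceAnswerId, DemoAnswerId],
    queries: list[dict[str, Any]],
    trace_answer_id: TraceAnswerId,
) -> Optional[Any]:
    """Return the demo answer behind a trace answer id, if the mapping has one."""
    ref = answer_refs.get(trace_answer_id)
    if ref is None:
        # The mapping may be partial, so a missing entry is not an error.
        return None
    query_id, answer_index = ref
    return queries[query_id]["answers"][answer_index]


queries = [{"answers": ["a0", "a1"]}, {"answers": ["b0"]}]
answer_refs = {7: (0, 1), 9: (1, 0)}
print(locate_answer(answer_refs, queries, 7))
print(locate_answer(answer_refs, queries, 8))
```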
Diagnostic
Diagnostic = tuple[DiagnosticType, str]
A diagnostic gathers a type (i.e. severity) and a message.
DiagnosticType
DiagnosticType = Literal['error', 'warning', 'info']
Diagnostic type.
Browsable Traces
Trace
dataclass
A browsable trace.
Raw traces contain all the information necessary to recompute a trace but are not easily manipulated by tools. In comparison, browsable traces offer a more redundant but also more explicit view. This module provides a way to convert a trace from the former format to the latter.
Attributes:
Name | Type | Description |
---|---|---|
nodes | dict[TraceNodeId, Node] | A mapping from node ids to their description. |
Info
A browsable trace features answer identifiers, for which a
meaning must be provided externally. For example, the
demonstration interpreter also produces a mapping from answer
ids to their position in the demonstration file. Commands like
run_strategy also return a raw trace (core.traces.Trace) alongside
the browsable version; the raw trace maps answer ids to their
actual content.
Source code in src/delphyne/analysis/feedback.py
Node
dataclass
Information about a node.
Attributes:
Name | Type | Description |
---|---|---|
kind | str | Name of the node type, or |
success_value | ValueRepr | None | The success value if the node is a success leaf, or |
summary_message | str | None | A short summary message (see the |
leaf_node | bool | Whether the node is a leaf node |
label | str | None | A label describing the node, which can be useful for writing node selectors (although there is currently no guarantee that the label constitutes a valid selector leading to the node). Currently, the label shows all node tags, separated by "&". |
tags | list[str] | The list of all tags attached to the node. |
properties | list[tuple[Reference, NodeProperty]] | List of node properties (attached queries, nested trees, data fields...). Each property is accompanied by a pretty-printed, local space reference. |
actions | list[Action] | A list of explored actions. |
origin | NodeOrigin | The origin of the node in the global trace. |
Source code in src/delphyne/analysis/feedback.py
NodeOrigin
NodeOrigin = (
Literal["root"]
| tuple[Literal["child"], TraceNodeId, TraceActionId]
| tuple[Literal["nested"], TraceNodeId, TraceNodePropertyId]
)
Origin of a node.
A node can be the global root, the child of another node, or the root of a nested tree.
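Because every node records its origin, the chain of enclosing nodes can be recovered by following origins back to the global root. The sketch below assumes node ids are plain integers and represents the trace as a dictionary from node id to origin. Both choices are simplifications for illustration, and path_from_root is a hypothetical helper.

```python
from typing import Any


def path_from_root(origins: dict[int, Any], node: int) -> list[int]:
    """Follow origins upward and return node ids from the global root to `node`."""
    path = [node]
    while True:
        origin = origins[path[-1]]
        if origin == "root":
            break
        # Both ("child", parent, action_id) and ("nested", parent, property_id)
        # carry the id of the originating node in second position.
        _kind, parent, _index = origin
        path.append(parent)
    path.reverse()
    return path


origins = {
    0: "root",
    1: ("child", 0, 0),
    2: ("nested", 1, 0),
    3: ("child", 2, 1),
}
print(path_from_root(origins, 3))  # [0, 1, 2, 3]
```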
Action
dataclass
An action associated with a node.
Attributes:
Name | Type | Description |
---|---|---|
ref | Reference | Pretty-printed local reference for the action. |
hints | list[str] | None | If the trace results from executing a demonstration, this provides the list of hints that can be used to recover the action through navigation. Otherwise, it is |
related_success_nodes | list[TraceNodeId] | List of related success nodes. A related success node is a node whose attached value was used in building the action. Indeed, in the VSCode extension's Path View, we get a sequence of actions and for each of them the list of success paths that were involved in building that action. |
related_answers | list[TraceAnswerId] | List of related answers. A related answer is an answer to a local query that is used in building the action. Storing this information is useful to detect useless answers that are not used in any action. |
destination | TraceNodeId | Id of the child node that the action leads to. |
Source code in src/delphyne/analysis/feedback.py
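The related_answers field makes it possible to detect answers that no action uses. The following sketch works on simplified stand-in classes: the Node stand-in stores the ids of all answers to its attached queries directly in a hypothetical answer_ids field, whereas the documented Node exposes them through its query properties.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    related_answers: list[int]
    destination: int


@dataclass
class Node:
    # Simplification: ids of all answers to queries attached to this node.
    answer_ids: list[int]
    actions: list[Action] = field(default_factory=list)


def unused_answers(nodes: dict[int, Node]) -> set[int]:
    """Return ids of answers that are not related to any action in the trace."""
    present = {a for node in nodes.values() for a in node.answer_ids}
    used = {
        a
        for node in nodes.values()
        for action in node.actions
        for a in action.related_answers
    }
    return present - used


nodes = {
    0: Node(answer_ids=[1, 2], actions=[Action(related_answers=[1], destination=1)]),
    1: Node(answer_ids=[3]),
}
print(unused_answers(nodes))
```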
NodeProperty
NodeProperty = Data | NestedTree | Query
Description of a node property (see NodePropertyId).
Data
dataclass
Generic property that displays some data.
Attributes:
Name | Type | Description |
---|---|---|
kind | Literal['data'] | Always "data". |
content | str | String representation of the data content. |
Source code in src/delphyne/analysis/feedback.py
NestedTree
dataclass
A nested tree.
Attributes:
Name | Type | Description |
---|---|---|
kind | Literal['nested'] | Always "nested". |
strategy | str | Name of the strategy function that induces the tree. |
args | dict[str, ValueRepr] | Arguments passed to the strategy function. |
tags | list[str] | Tags attached to the space induced by the tree. |
node_id | TraceNodeId | None | Identifier of the root node of the nested tree, or |
Source code in src/delphyne/analysis/feedback.py
Query
dataclass
Information about a query.
Attributes:
Name | Type | Description |
---|---|---|
kind | Literal['query'] | Always "query". |
name | str | Name of the query. |
args | dict[str, object] | Query arguments, serialized in JSON. |
tags | list[str] | Tags attached to the space induced by the query. |
answers | list[Answer] | All answers to the query present in the trace. |
Source code in src/delphyne/analysis/feedback.py
Answer
dataclass
An answer to a query.
Attributes:
Name | Type | Description |
---|---|---|
id | TraceAnswerId | Unique answer identifier. |
value | ValueRepr | Parsed answer value. |
hint | tuple[] | tuple[str] | None | If the trace results from executing a demonstration (vs running a policy with tracing enabled), then |
Source code in src/delphyne/analysis/feedback.py
Reference
dataclass
A reference to a space or to a value.
Several human-readable representations are provided:
Attributes:
Name | Type | Description |
---|---|---|
with_ids | str | A pretty-printed, id-based reference. |
with_hints | str | None | A pretty-printed, hint-based reference. These are typically available in the output of the demonstration interpreter, but not when converting arbitrary traces that result from running policies. |
Source code in src/delphyne/analysis/feedback.py
ValueRepr
dataclass
Multiple representations for a Python object.
We allow providing several representations for Python objects: short, one-liner string descriptions, detailed descriptions, JSON representation... All of these can be leveraged by different tools and UI components.
Attributes:
Name | Type | Description |
---|---|---|
short | str | A short representation, typically obtained using the |
long | str | None | A longer, often multi-line representation, typically obtained using the |
json | object | A JSON representation of the object. |
json_provided | bool | Whether a JSON representation is provided (the JSON field is |
Source code in src/delphyne/analysis/feedback.py
TraceAnswerId
TraceAnswerId = int
Global answer id, as set by core.traces.Trace.
TraceActionId
TraceActionId = int
Index of an action within a given node.
TraceNodePropertyId
TraceNodePropertyId = int
Index of a property within a given node. A property is an element that can be listed in the UI, which is either an attached query, a nested tree or some data.
Demonstration Interpreter
evaluate_demo
evaluate_demo(
demo: Demo,
context: DemoExecutionContext,
*,
extra_objects: dict[str, object],
answer_database_loader: AnswerDatabaseLoader,
load_implicit_answer_generators: ImplicitAnswerGeneratorsLoader,
) -> DemoFeedback
Evaluate a query or strategy demonstration.
This is the main entrypoint of the demonstration interpreter.
Parameters:
Name | Type | Description |
---|---|---|
demo | Demo | The demonstration to evaluate. |
context | DemoExecutionContext | The execution context in which to resolve Python identifiers. |
extra_objects | dict[str, object] | Additional objects that can be resolved by name (with higher precedence). |
load_implicit_answer_generators | ImplicitAnswerGeneratorsLoader | Load the implicit answer generators (e.g. including the one handling |
Returns:
Type | Description |
---|---|
DemoFeedback | A feedback object containing the results of the evaluation. |
Warning
This function creates an ObjectLoader internally and is
therefore not thread-safe.
Source code in src/delphyne/analysis/demo_interpreter.py
DemoExecutionContext
dataclass
Demonstration Execution Context.
Attributes:
Name | Type | Description |
---|---|---|
strategy_dirs | Sequence[Path] | A list of directories in which strategy modules can be found, to be added to |
modules | Sequence[str] | A list of modules in which python object identifiers should be resolved. Modules can be part of packages and so their name may feature |
Source code in src/delphyne/analysis/demo_interpreter.py
ObjectLoader
Utility class for loading Python objects.
Demonstration and command files may refer to Python identifiers that
need to be resolved. This is done relative to an execution context
(DemoExecutionContext) that specifies a list of directories to be
added to sys.path, along with a list of modules.
An exception is raised if an object with the requested identifier can be found in several modules.
Source code in src/delphyne/analysis/demo_interpreter.py
__init__
__init__(
ctx: DemoExecutionContext,
extra_objects: dict[str, object] | None = None,
reload: bool = True,
)
Parameters:
Name | Type | Description |
---|---|---|
ctx | DemoExecutionContext | The execution context in which to resolve Python identifiers. |
extra_objects | dict[str, object] | None | Additional objects that can be resolved by name (with higher precedence). |
reload | bool | Whether to reload all modules specified in the execution context upon initialization. Setting this value to |
Raises:
Type | Description |
---|---|
ModuleNotFound | A module could not be found. |
Source code in src/delphyne/analysis/demo_interpreter.py
find_object
find_object(name: str) -> Any
Find an object with a given name.
If the name is unqualified (it features no "."), the object is
looked up in every registered module, in order. If the name
is qualified, only the specified registered module is searched.
Raises:
Type | Description |
---|---|
ObjectNotFound | The object could not be found. |
AmbiguousObjectIdentifier | The object name is ambiguous, i.e. it is found in several modules. |
Source code in src/delphyne/analysis/demo_interpreter.py
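The resolution rules described above can be sketched as follows. This is not the library's implementation: it is a minimal illustration that takes an explicit dictionary of already-imported modules, defines local stand-ins for the two documented exceptions, and assumes that a qualified name has the form module.attribute. The function name resolve is hypothetical.

```python
from types import ModuleType
from typing import Any, Optional


class ObjectNotFound(Exception):
    """Local stand-in for the documented exception."""


class AmbiguousObjectIdentifier(Exception):
    """Local stand-in for the documented exception."""


def resolve(
    name: str,
    modules: dict[str, ModuleType],
    extra_objects: Optional[dict[str, Any]] = None,
) -> Any:
    # Extra objects take precedence over anything found in modules.
    if extra_objects and name in extra_objects:
        return extra_objects[name]
    if "." in name:
        # Qualified name: look only at the specified registered module.
        module_name, attr = name.rsplit(".", 1)
        module = modules.get(module_name)
        if module is None or not hasattr(module, attr):
            raise ObjectNotFound(name)
        return getattr(module, attr)
    # Unqualified name: try every registered module in order.
    found = [getattr(m, name) for m in modules.values() if hasattr(m, name)]
    if not found:
        raise ObjectNotFound(name)
    if len(found) > 1:
        raise AmbiguousObjectIdentifier(name)
    return found[0]


a = ModuleType("a")
a.foo = 1
a.shared = "from a"
b = ModuleType("pkg.b")
b.bar = 2
b.shared = "from b"
modules = {"a": a, "pkg.b": b}

print(resolve("foo", modules))
print(resolve("pkg.b.shared", modules))
```

Qualifying a name is the way to disambiguate an identifier such as shared, which both toy modules define.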
load_and_call_function
load_and_call_function(name: str, args: dict[str, Any]) -> Any
Load and call a function by wrapping a call to find_object.
Source code in src/delphyne/analysis/demo_interpreter.py
load_strategy_instance
load_strategy_instance(name: str, args: dict[str, Any]) -> StrategyComp[Any, Any, Any]
Load and instantiate a strategy function with given arguments.
Raises:
Type | Description |
---|---|
ObjectNotFound | If the strategy function cannot be found. |
AmbiguousObjectIdentifier | If an ambiguous name is given. |
StrategyLoadingError | If the object is not a strategy function or if the arguments are invalid. |
Source code in src/delphyne/analysis/demo_interpreter.py
load_query
load_query(name: str, args: dict[str, Any]) -> AbstractQuery[Any]
Load a query by name and instantiate it with given arguments.
Raises:
Type | Description |
---|---|
ObjectNotFound | If the query cannot be found. |
AmbiguousObjectIdentifier | If an ambiguous name is given. |
AssertionError | If the object is not a query. |
Source code in src/delphyne/analysis/demo_interpreter.py
ImplicitAnswerGenerator
ImplicitAnswerGenerator = Callable[
[AnyTree, AttachedQuery[Any]], tuple[ImplicitAnswerCategory, Answer] | None
]
A function that optionally maps a tree node along with a query within this
node to an implicit answer for the query. This is useful in particular
for supporting Compute nodes in demonstrations.
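As an illustration of the shape of such a function, the sketch below defines a toy generator and groups the answers it produces by category. Everything here is a stand-in: categories are plain strings, queries are dictionaries, and the policy that the first applicable generator wins is an assumption of this example, not documented behavior. The names double_generator and collect are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Optional

# Stand-in for the documented alias, with simplified argument and result types.
Generator = Callable[[Any, Any], Optional[tuple[str, Any]]]


def double_generator(node: Any, query: Any) -> Optional[tuple[str, Any]]:
    """Toy generator: only answers queries named "Double"."""
    if query["name"] != "Double":
        return None
    return ("computations", 2 * query["args"]["x"])


def collect(
    generators: list[Generator], node: Any, queries: list[Any]
) -> dict[str, list[Any]]:
    """Group generated answers by category, creating no empty category."""
    grouped: dict[str, list[Any]] = defaultdict(list)
    for query in queries:
        for generate in generators:
            result = generate(node, query)
            if result is not None:
                category, answer = result
                grouped[category].append(answer)
                break  # assumption: the first applicable generator wins
    return dict(grouped)


queries = [{"name": "Double", "args": {"x": 21}}, {"name": "Other", "args": {}}]
print(collect([double_generator], None, queries))
```

A category only appears once an answer is appended to it, so the result never contains an empty list, which matches the constraint stated for implicit_answers.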