This page is the reference for the spans AgentMark collects and the attributes it reads off them. Use it to look up a span type, an attribute name, or a SpanOptions field. To wire any of this into your application, see Set up tracing.
Collected spans
AgentMark classifies every ingested OpenTelemetry span by type:
| Span type | Description | Key attributes |
|---|---|---|
| GENERATION | A model call. Detected from gen_ai.* attributes or AI SDK span names (ai.generateText, ai.streamText, and their doGenerate / doStream children) | ai.model.id, ai.prompt, ai.response.text, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens; streaming calls add ai.response.msToFirstChunk, ai.response.avgCompletionTokensPerSecond |
| SPAN | Any other operation: tool calls, retrieval steps, custom span() / observe() wrappers | ai.toolCall.name, ai.toolCall.args, ai.toolCall.result, agentmark.* |
The spans API accepts a third type, EVENT, for point-in-time records.
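The detection rule for GENERATION versus SPAN can be sketched as a small classifier. This illustrates the rule as described on this page and is not AgentMark's ingest code: the function name classify_span, the dict-shaped attributes, and the exact child span names (ai.generateText.doGenerate, ai.streamText.doStream) are assumptions made for the example. EVENT is left out because this page does not say how it is detected.

```python
from typing import Any, Mapping

# AI SDK span names treated as model calls, including the
# doGenerate / doStream children (assumed full names).
AI_SDK_GENERATION_NAMES = {
    "ai.generateText",
    "ai.generateText.doGenerate",
    "ai.streamText",
    "ai.streamText.doStream",
}


def classify_span(name: str, attributes: Mapping[str, Any]) -> str:
    """Return "GENERATION" for model calls and "SPAN" for everything else."""
    # Any attribute in the gen_ai.* namespace marks the span as a model call.
    has_gen_ai_attribute = any(key.startswith("gen_ai.") for key in attributes)
    if has_gen_ai_attribute or name in AI_SDK_GENERATION_NAMES:
        return "GENERATION"
    return "SPAN"
```

Under these assumptions, a tool-call span that carries only ai.toolCall.* attributes is classified as SPAN, while any span carrying a gen_ai.* attribute is classified as GENERATION regardless of its name.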
Span attributes
Each span carries attributes in the following groups:
- Model information: ai.model.id (e.g., "gpt-5-mini"), ai.model.provider (e.g., "openai")
- Token usage: gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.total_tokens (the ai.usage.promptTokens / ai.usage.completionTokens aliases are also accepted)
- Telemetry metadata: ai.telemetry.functionId, ai.telemetry.metadata.*
- Response details: ai.response.text, ai.response.toolCalls, ai.response.finishReason
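If you read token usage off exported spans yourself, the alias handling might look like the sketch below. The helper names and the choice to prefer the gen_ai.* key over its ai.usage.* alias when both are present are assumptions for illustration; this page only states that both spellings are accepted.

```python
from typing import Any, Dict, Mapping, Optional


def _first_int(attributes: Mapping[str, Any], *keys: str) -> Optional[int]:
    """Return the first present attribute among keys as an int, else None."""
    for key in keys:
        value = attributes.get(key)
        if value is not None:
            return int(value)
    return None


def read_token_usage(attributes: Mapping[str, Any]) -> Dict[str, Optional[int]]:
    """Read token counts, falling back to the ai.usage.* aliases."""
    return {
        # Assumed precedence: gen_ai.* first, then the AI SDK alias.
        "input_tokens": _first_int(
            attributes, "gen_ai.usage.input_tokens", "ai.usage.promptTokens"
        ),
        "output_tokens": _first_int(
            attributes, "gen_ai.usage.output_tokens", "ai.usage.completionTokens"
        ),
        "total_tokens": _first_int(attributes, "gen_ai.usage.total_tokens"),
    }
```

A count that is absent under both spellings comes back as None rather than 0, so a missing value stays distinguishable from a real zero.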
Prompt identity and version linking
Prompt runs are linked to the exact prompt version, not just the prompt name:
- agentmark.prompt_name: the prompt’s frontmatter name. Both the TypeScript and Python SDKs emit this attribute on the prompt-run span; it powers the prompt label column on the trace surfaces.
- agentmark.metadata.commit_sha: the git commit the prompt content was served at. When a prompt is loaded from AgentMark Cloud, the gateway stamps the served-at commit into the prompt’s agentmark_meta.commit_sha (the pinned environment commit for keys bound to a pinned environment, the latest synced commit otherwise). The local CLI dev server stamps your repo’s HEAD the same way. The SDK’s run path echoes that commit onto the trace automatically, with no code change needed.
On ingest, the gateway verifies the SDK-supplied commit against the environment’s server-side deployment pointer (trust-but-verify): for API keys bound to an environment, the stamped CommitSha always reflects the server’s recorded deployment, so a client can never claim an arbitrary commit.
Trace-level input and output
The trace list and trace detail views show a single input/output per trace. These are derived from spans at read time, identically in AgentMark Cloud and the local dev server’s GET /v1/traces/:id:
- Root span first. The prompt-run (root) span’s agentmark.input / agentmark.output attributes win when present. The WebhookRunner records these automatically on every run: the formatted messages as input right after the prompt renders, and the final text/object as output when the event stream drains, in both streaming and non-streaming modes.
- GENERATION fallback. When the root span carries no I/O (third-party OTEL instrumentation pointed straight at the collector), the first GENERATION span’s input and the last GENERATION span’s output are used instead, in timestamp order.
The fields resolve independently: a trace with only a root-span output still gets its input from the GENERATION fallback.
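The resolution order above can be sketched as follows. This is a minimal model of the described behavior, not the actual read path: the Span record, its gen_input / gen_output fields (standing in for wherever a GENERATION span's I/O is stored), and the function name resolve_trace_io are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Span:
    span_type: str                      # "GENERATION" or "SPAN"
    start_time: float                   # used for timestamp ordering
    is_root: bool = False
    attributes: Dict[str, Any] = field(default_factory=dict)
    gen_input: Optional[str] = None     # hypothetical: GENERATION input
    gen_output: Optional[str] = None    # hypothetical: GENERATION output


def resolve_trace_io(spans: List[Span]) -> Tuple[Optional[str], Optional[str]]:
    """Resolve trace input and output independently of each other."""
    root = next((s for s in spans if s.is_root), None)
    root_attrs = root.attributes if root else {}
    generations = sorted(
        (s for s in spans if s.span_type == "GENERATION"),
        key=lambda s: s.start_time,
    )

    # Input: root attribute wins, else the first GENERATION span's input.
    trace_input = root_attrs.get("agentmark.input")
    if trace_input is None and generations:
        trace_input = generations[0].gen_input

    # Output: root attribute wins, else the last GENERATION span's output.
    trace_output = root_attrs.get("agentmark.output")
    if trace_output is None and generations:
        trace_output = generations[-1].gen_output

    return trace_input, trace_output
```

Because each field falls back on its own, a root span that records only agentmark.output still yields an input taken from the earliest GENERATION span.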
Two practical implications:
- Executors never set trace I/O. If agentmark doctor --smoke (the CLI’s instrumentation smoke test) reports a trace missing input/output, the fix is in instrumentation, not in your executor.
- GENERATION spans come from your model SDK’s instrumentation, e.g. experimental_telemetry with the Vercel AI SDK, or Agent.instrument_all(InstrumentationSettings(version=3)) with Pydantic AI.
SpanOptions
SpanOptions configure a span created with span() / span_context(). To set them up in code, see Grouping operations into a span.
| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name for the span |
| metadata | Record<string, string> | No | Custom key-value metadata (strings only) |
| promptName | string | No | Prompt name, emitted as the agentmark.prompt_name attribute (same key in Python via prompt_name) |
| sessionId | string | No | Group traces into a session |
| sessionName | string | No | Human-readable session name |
| userId | string | No | Associate trace with a user |
| datasetRunId | string | No | Link to a dataset run |
| datasetRunName | string | No | Human-readable dataset run name |
| datasetItemName | string | No | Specific dataset item name |
| datasetExpectedOutput | string | No | Expected output for evaluation |
| datasetPath | string | No | Path to the dataset file |
| experimentKey | string | No | Stable identity of the evaluation for experiment runs. Set it whenever you set datasetRunId |
| sourceTreeHash | string | No | Git tree hash of the code state the run executed against |
| datasetInput | string | No | JSON-serialized dataset row input, used to match rows across experiment runs |
The option names above are TypeScript. Python’s SpanOptions uses the snake_case equivalents: user_id, session_id, session_name, prompt_name, dataset_run_id, and so on.
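The naming rule can be sketched as a mechanical camelCase to snake_case conversion. Only the Python names listed above are confirmed by this page; the sketch assumes the remaining options follow the same pattern, and the helper name to_python_option_name is made up for the example.

```python
import re


def to_python_option_name(ts_name: str) -> str:
    """Convert a TypeScript option name to its assumed snake_case form."""
    # Insert "_" before every uppercase letter except at the start,
    # then lowercase the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", ts_name).lower()
```

For example, datasetRunId maps to dataset_run_id and userId maps to user_id, matching the names given above.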
observe() options
observe() wraps an async function with automatic input/output capture and lets you set a SpanKind. For the call patterns, see Wrapping functions with observe().
| Option | Type | Description |
|---|---|---|
| name | string | Display name for the span (defaults to function name) |
| kind | SpanKind | Type of operation (defaults to SpanKind.FUNCTION) |
| captureInput / capture_input | boolean | Record function arguments (default: true) |
| captureOutput / capture_output | boolean | Record return value (default: true) |
| processInputs / process_inputs | function | Transform arguments before recording (useful for redacting sensitive data) |
| processOutputs / process_outputs | function | Transform return value before recording |
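A redaction function of the kind process_inputs is meant for might look like the sketch below. This page does not specify the shape observe() passes to the function, so the sketch assumes a plain dict of named arguments; the function name redact_inputs and the key names in SENSITIVE_KEYS are hypothetical.

```python
from typing import Any, Dict

# Hypothetical argument names to mask; adjust to your own function signatures.
SENSITIVE_KEYS = {"password", "api_key", "token"}


def redact_inputs(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the inputs with sensitive values masked.

    The original dict is left untouched, so the wrapped function
    still receives the real values.
    """
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in inputs.items()
    }
```

Returning a new dict rather than mutating the argument keeps recording separate from execution: only the recorded copy is masked.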
SpanKind values
SpanKind sets the semantic kind of a span, which drives how spans are filtered and grouped on dashboards.
| Kind | Description |
|---|---|
| SpanKind.FUNCTION | Generic computation step (default) |
| SpanKind.LLM | A call to a language model |
| SpanKind.TOOL | An external tool or API call |
| SpanKind.AGENT | An orchestration loop |
| SpanKind.RETRIEVAL | A vector database query or document search |
| SpanKind.EMBEDDING | A call to an embedding model |
| SpanKind.GUARDRAIL | A content safety or validation check |
SpanResult
span() returns a SpanResult in TypeScript:
interface SpanResult<T> {
  result: Promise<T>; // The result of your callback (as a Promise)
  traceId: string;    // The trace ID for correlation
}
result is Promise<T>, not T. You need to await it to get the resolved value.
In Python, span_context() is an async context manager that exposes trace_id on the context object:
async with span_context(SpanOptions(name="my-operation")) as ctx:
    print(ctx.trace_id)  # Available immediately
    result = await my_async_function()