AgentMark uses OpenTelemetry to provide distributed tracing for your prompt executions. This gives you complete visibility into how your prompts perform in production.
Tracing must first be enabled in your application by a developer. See Tracing setup for instructions.
[Figure: Traces panel showing prompt execution timeline with spans, token usage, and response times]
The Traces panel lists each execution with columns for Name, Status, Latency, Cost, Tokens, Spans, Tags, and Timestamp. Click a row to open the trace detail view with the full span tree and attribute drill-down.

Understanding traces

A trace represents the complete execution of a prompt, including all its steps, tool calls, and metadata. Each trace contains:
  • Execution timeline: See exactly when each step occurred and how long it took.
  • Token usage: Track input tokens, output tokens, and total tokens consumed.
  • Costs: Monitor spending on a per-request basis.
  • Tool calls: View all tool executions, their parameters, and results.
  • Custom metadata: Add context like user IDs, session IDs, and custom attributes.
  • Error information: Detailed error messages and stack traces when issues occur.

Collected spans

AgentMark records the following OpenTelemetry spans:
| Span type | Description | Attributes |
| --- | --- | --- |
| ai.inference | Full length of the inference call | operation.name, ai.operationId, ai.prompt, ai.response.text, ai.response.toolCalls, ai.response.finishReason |
| ai.toolCall | Individual tool executions | operation.name, ai.operationId, ai.toolCall.name, ai.toolCall.args, ai.toolCall.result |
| ai.stream | Streaming response data | ai.response.msToFirstChunk, ai.response.msToFinish, ai.response.avgCompletionTokensPerSecond |

Span kinds

Each span carries a semantic kind that categorizes the type of operation it represents. Span kinds affect how spans can be filtered and how analytics are grouped on the dashboard.
| Kind | Description |
| --- | --- |
| function | Generic computation step (default) |
| llm | A call to a language model |
| tool | An external tool or API call |
| agent | An orchestration loop that decides what to do next |
| retrieval | A vector database query or document search |
| embedding | A call to an embedding model |
| guardrail | A content safety or validation check |
Span kinds are set in code by wrapping functions with observe() — see SpanKind values for implementation details.

LLM span attributes

Each LLM span contains attributes that vary slightly depending on the adapter you use. The table below shows common attributes across integrations:
| Attribute | Description |
| --- | --- |
| ai.model.id | Model identifier |
| ai.model.provider | Model provider name |
| ai.usage.promptTokens | Number of prompt tokens |
| ai.usage.completionTokens | Number of completion tokens |
| ai.settings.maxRetries | Maximum retry attempts |
| ai.telemetry.functionId | Function identifier |
| ai.telemetry.metadata.* | Custom metadata |
| ai.response.text | Response text |
| ai.response.toolCalls | Tool calls array |
| ai.response.finishReason | Finish reason |
All adapters also support custom metadata via agentmark.metadata.* attributes.
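
As a rough illustration of how the usage attributes can be consumed, the sketch below sums prompt and completion tokens per model. It assumes spans have already been loaded as plain dictionaries with an `attributes` mapping keyed by the attribute names listed above. That in-memory shape and the `summarize_usage` helper are hypothetical and are not part of the AgentMark API.

```python
from typing import Any, Dict, List


def summarize_usage(spans: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Sum prompt and completion tokens per model id.

    Assumption: each span is a dict with an "attributes" mapping that uses
    the attribute names from the table (hypothetical in-memory shape).
    """
    totals: Dict[str, Dict[str, int]] = {}
    for span in spans:
        attrs = span.get("attributes", {})
        model = attrs.get("ai.model.id")
        if model is None:
            continue  # not an LLM span, or the adapter omitted the model id
        entry = totals.setdefault(model, {"prompt": 0, "completion": 0, "total": 0})
        prompt = int(attrs.get("ai.usage.promptTokens", 0))
        completion = int(attrs.get("ai.usage.completionTokens", 0))
        entry["prompt"] += prompt
        entry["completion"] += completion
        entry["total"] += prompt + completion
    return totals
```

Spans without an `ai.model.id` attribute (tool calls, for example) are skipped rather than counted under a placeholder model.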

Grouping traces

Organize related traces together using custom grouping. This is useful for understanding complex workflows that span multiple prompt executions.
[Figure: Grouped traces view showing a parent trace with nested child traces in the timeline]
Grouped traces show a parent-child hierarchy in the trace list, with child spans indented under their parent. Use this to model multi-step agent workflows, nested component execution, and parallel processing pipelines.
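
The parent-child hierarchy described above can be pictured as a tree built from parent references. The sketch below renders such a tree as indented lines. The field names `spanId`, `parentSpanId`, and `name` are assumptions chosen for illustration; OpenTelemetry spans do carry a parent span ID, but the exact keys in AgentMark's data are not confirmed here.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def render_tree(spans: List[Dict[str, Any]]) -> List[str]:
    """Return span names indented by depth, parents before children.

    Assumption: spans are dicts with hypothetical keys "spanId",
    "parentSpanId", and "name".
    """
    known_ids = {span["spanId"] for span in spans}
    children: Dict[Optional[str], List[Dict[str, Any]]] = defaultdict(list)
    for span in spans:
        parent = span.get("parentSpanId")
        # A span whose parent is absent from the input is treated as a root.
        key = parent if parent in known_ids else None
        children[key].append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["spanId"], depth + 1)

    walk(None, 0)
    return lines
```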

Viewing traces

Access traces in the AgentMark Dashboard under the Traces tab. Each trace shows:
  • Complete prompt execution timeline
  • Tool calls and their durations
  • Token usage and costs
  • Custom metadata and attributes
  • Error information (if any)
  • Graph visualization (when graph metadata is present)
  • Manual annotations for quality assessment
You can filter traces by model, status, latency, cost, tokens, metadata, scores, and more. Filters can be combined, saved as views, and shared via URL. Learn more about filtering and search.

Integration

AgentMark works with any application that uses OpenTelemetry. For detailed setup instructions, see Tracing setup.

MCP trace server

For debugging traces directly from your IDE, AgentMark provides an MCP server that exposes list_traces and get_trace tools. This lets you query and inspect traces without leaving your development environment.

Traces and spans API

You can query traces and spans programmatically using the REST API or the CLI. Both the local dev server and the AgentMark Cloud gateway expose /v1/traces, /v1/traces/{traceId}, and /v1/spans, so you can develop against local data and switch to Cloud without changing your integration. Bulk export (/v1/traces/export) is Cloud-only.
```bash
# List traces from the local dev server
npx agentmark api traces list --limit 20

# Get a specific trace with its spans
npx agentmark api traces get <traceId>

# Query spans across all traces
npx agentmark api spans list --limit 50

# Target AgentMark Cloud instead
npx agentmark api traces list --remote --limit 20
```
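
The same endpoints can be reached over HTTP. The sketch below builds URLs for /v1/traces and /v1/traces/{traceId} and fetches JSON using only the Python standard library. The base URL is whatever your dev server or Cloud gateway listens on (not specified here), the `limit` query parameter is assumed to mirror the CLI flag, and authentication headers are omitted; `trace_url` and `get_json` are hypothetical helper names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def trace_url(base_url: str, trace_id: str = "", **params) -> str:
    """Build a URL for /v1/traces, or /v1/traces/{traceId} when an id is given."""
    path = "/v1/traces"
    if trace_id:
        # Percent-encode the id so it is always a single path segment.
        path += "/" + quote(trace_id, safe="")
    query = urlencode(params)
    return base_url.rstrip("/") + path + ("?" + query if query else "")


def get_json(url: str):
    """Fetch a URL and decode its JSON body (assumes no auth is required)."""
    with urlopen(url) as response:
        return json.load(response)
```

For example, `get_json(trace_url(base_url, limit=20))` would request the first 20 traces, where `base_url` is a placeholder you supply. Because the paths are shared, only `base_url` should need to change between the local dev server and Cloud.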
See the API reference for all available endpoints, filters, and response schemas. You can also create scores for spans and traces programmatically.

The GET /v1/spans endpoint lets you search spans across all traces in your project. Unlike the traces API, which returns traces and their nested spans, the spans endpoint queries individual spans directly, regardless of which trace they belong to. This is useful when you need to:
  • Find all LLM calls using a specific model across your entire project
  • Identify slow operations by filtering on duration thresholds
  • Audit error spans across traces without browsing each trace individually
  • Analyze usage patterns for a particular span type (e.g., all GENERATION spans)
Available filters:
| Parameter | Description |
| --- | --- |
| type | Span type: GENERATION, SPAN, or EVENT |
| status | Span status: STATUS_CODE_UNSET, STATUS_CODE_OK, or STATUS_CODE_ERROR |
| name | Partial match on span name |
| model | Partial match on model name |
| minDuration | Minimum duration in milliseconds |
| maxDuration | Maximum duration in milliseconds |
| limit | Results per page (1-500, default 100) |
| offset | Pagination offset |
```bash
# Find all error spans
npx agentmark api spans list --status STATUS_CODE_ERROR

# Find slow generations (over 5 seconds)
npx agentmark api spans list --type GENERATION --minDuration 5000

# Search spans by model
npx agentmark api spans list --model claude --limit 20

# Target cloud gateway
npx agentmark api spans list --remote --type GENERATION --model gpt-4o
```
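
When calling GET /v1/spans directly, it can help to validate filters on the client before sending a request. The sketch below builds the query string from the parameters in the table above and rejects values outside the documented ranges. The `spans_query` helper is hypothetical, and the check that `minDuration` does not exceed `maxDuration` is an added assumption rather than documented server behavior.

```python
from typing import Optional
from urllib.parse import urlencode

SPAN_TYPES = {"GENERATION", "SPAN", "EVENT"}
SPAN_STATUSES = {"STATUS_CODE_UNSET", "STATUS_CODE_OK", "STATUS_CODE_ERROR"}


def spans_query(
    *,
    span_type: Optional[str] = None,
    status: Optional[str] = None,
    name: Optional[str] = None,
    model: Optional[str] = None,
    min_duration: Optional[int] = None,  # milliseconds
    max_duration: Optional[int] = None,  # milliseconds
    limit: int = 100,
    offset: int = 0,
) -> str:
    """Return the path and query string for a GET /v1/spans request."""
    if span_type is not None and span_type not in SPAN_TYPES:
        raise ValueError(f"type must be one of {sorted(SPAN_TYPES)}")
    if status is not None and status not in SPAN_STATUSES:
        raise ValueError(f"status must be one of {sorted(SPAN_STATUSES)}")
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must not be negative")
    if min_duration is not None and max_duration is not None and min_duration > max_duration:
        raise ValueError("minDuration must not exceed maxDuration")

    params = {
        "type": span_type,
        "status": status,
        "name": name,
        "model": model,
        "minDuration": min_duration,
        "maxDuration": max_duration,
        "limit": limit,
        "offset": offset,
    }
    # Drop filters that were not set so only explicit values are sent.
    return "/v1/spans?" + urlencode({k: v for k, v in params.items() if v is not None})
```

To page through results, increase `offset` by `limit` on each request until a page comes back with fewer than `limit` spans.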
Each span in the response includes its traceId, so you can drill into the full trace for any span that matches your search.
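
One way to use the `traceId` on each span is to rank traces by how many matching spans they contain, then open the worst offenders first. The sketch below assumes the matching spans are available as a list of dictionaries, each with a `traceId` key; the surrounding response envelope is not shown here, and `rank_traces` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def rank_traces(spans: List[Dict[str, Any]]) -> List[Tuple[str, int]]:
    """Return (traceId, matching span count) pairs, most matches first."""
    counts = Counter(span["traceId"] for span in spans)
    return counts.most_common()
```

Each returned trace ID can then be passed to `npx agentmark api traces get <traceId>` to inspect the full trace.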

Best practices

  • Use meaningful IDs — Choose descriptive function IDs for easy filtering and debugging.
  • Add context — Include relevant metadata like user IDs, session IDs, and business context.
  • Monitor regularly — Check traces frequently to catch issues early.
  • Set up alerts — Configure alerts for cost, latency, or error thresholds.
  • Analyze patterns — Use the Dashboard’s filtering to identify trends and patterns.

Next steps

  • Sessions: Group related traces together
  • Alerts: Get notified of critical issues
  • Annotations: Manually label and score traces
  • Tracing setup: Integrate observability in your app
