API reference - AgentMark Docs

The AgentMark Gateway API provides direct HTTP access to trace ingestion, scoring, and template retrieval.

Base URL

Cloud: https://api.agentmark.co
Local: http://localhost:9418

The local dev server (agentmark dev) and Cloud implement the same /v1/* endpoints. What differs is which handlers run where:

Both surfaces: /v1/config, /v1/traces (ingest + read), /v1/sessions, /v1/spans, /v1/scores (full CRUD + batch), /v1/datasets, /v1/experiments, /v1/templates, /v1/capabilities, /v1/pricing, /v1/filter-schema, and the root /health liveness check.
Cloud-only (local returns 404 or a 501 not_available_locally stub): /v1/metrics, /v1/scores/aggregations, the structured-search endpoints (/v1/traces/search, /v1/spans/search, /v1/scores/search), annotation queues (/v1/annotation-queues/*), and the dependency health endpoints (/v1/health/*).
Split: /v1/prompts lists every prompt file on Local. Cloud serves ?name=X lookups only and returns 501 not_available_on_cloud for the full no-name listing.
Deprecated: /v1/runs/{runId}/traces still works on local for backwards compatibility with older SDK versions, but new code should use /v1/traces?dataset_run_id={runId}. Both paths resolve to the same query. The Cloud endpoint has always returned 501.

Call GET /v1/capabilities to probe which features a server supports at runtime. Every endpoint carries the /v1/ prefix except the root health check.
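As a rough illustration of the probing step, the sketch below builds endpoint URLs from the two base URLs listed earlier and fetches /v1/capabilities without credentials, since the endpoint table marks that route as not requiring auth. The helper names (endpoint_url, fetch_capabilities) are hypothetical, and the shape of the capabilities response is not documented on this page, so the function simply returns whatever JSON the server sends.

```python
import json
import urllib.request

# Base URLs as listed in the Base URL section.
BASE_URLS = {
    "cloud": "https://api.agentmark.co",
    "local": "http://localhost:9418",
}


def endpoint_url(surface: str, path: str) -> str:
    """Join a base URL and a path, enforcing the /v1/ prefix rule.

    Hypothetical helper: only the root /health check may omit /v1/.
    """
    if path != "/health" and not path.startswith("/v1/"):
        raise ValueError(f"expected a /v1/ path or /health, got {path!r}")
    return BASE_URLS[surface] + path


def fetch_capabilities(surface: str):
    """GET /v1/capabilities and return the parsed JSON body as-is.

    The response schema is an unknown here, so no fields are assumed.
    """
    url = endpoint_url(surface, "/v1/capabilities")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling fetch_capabilities("local") requires agentmark dev to be running; the URL helper works offline.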

Available endpoints

The generated pages in this sidebar, built from the OpenAPI spec, are the authoritative per-endpoint reference for parameters, schemas, and responses; this table is a map. The Where column shows which environments implement each route. “Cloud + Local” means the same handler semantics on both; “Cloud only” / “Local only” mean the other side returns 501 (with a not_available_on_cloud / not_available_locally error code) or 404.

Endpoint	Method	Where	Description
`/v1/traces`	`POST`	Cloud + Local	Ingest trace data in OTLP format (supports gzip)
`/v1/traces`	`GET`	Cloud + Local	List traces with filtering. Supports `dataset_run_id` and `session_id` for scoped listings
`/v1/traces/search`	`POST`	Cloud only	Search traces with structured JSON filters (AND of predicates + OR-groups)
`/v1/traces/{traceId}`	`GET`	Cloud + Local	Get a single trace with all its spans
`/v1/traces/{traceId}/spans`	`GET`	Cloud + Local	List every span belonging to a trace
`/v1/traces/{traceId}/spans/{spanId}`	`GET`	Cloud + Local	Get full input/output payload for a single span
`/v1/traces/{traceId}/graph`	`GET`	Cloud + Local · deprecated	Use `/v1/traces/{traceId}?fields=graph` instead. Removal scheduled for October 21, 2026
`/v1/sessions`	`GET`	Cloud + Local	List sessions. `?search=` matches session ID or name (case-insensitive substring)
`/v1/sessions/{sessionId}/traces`	`GET`	Cloud + Local · deprecated	Use `/v1/traces?session_id={sessionId}` instead. Removal scheduled for October 21, 2026
`/v1/spans`	`GET`	Cloud + Local	Query spans across traces with filtering by type, status, model, and duration
`/v1/spans/search`	`POST`	Cloud only	Search spans with structured JSON filters (AND of predicates + OR-groups)
`/v1/spans/{spanId}`	`GET`	Cloud only	Get a single span by its globally unique ID, including metadata and input/output payload, without needing the trace ID
`/v1/scores`	`POST`	Cloud + Local	Create a score record for a span or trace
`/v1/scores/batch`	`POST`	Cloud + Local	Create up to 1000 scores in one request (per-item results, 207-style)
`/v1/scores`	`GET`	Cloud + Local	List scores for a specific span or trace
`/v1/scores/{scoreId}`	`GET`	Cloud + Local	Get a single score by ID
`/v1/scores/{scoreId}`	`DELETE`	Cloud + Local	Delete a score record
`/v1/scores/names`	`GET`	Cloud + Local	List distinct score names (for UI filters)
`/v1/scores/aggregations`	`GET`	Cloud only	Aggregated score statistics grouped by name
`/v1/scores/search`	`POST`	Cloud only	Search scores with structured JSON filters over name, score, source, user, resource, and time
`/v1/filter-schema`	`GET`	Cloud + Local	Machine-readable filter schema: fields, operators, and limits per searchable resource
`/v1/score-configs`	`GET`	Cloud + Local	List score configurations (reusable score schemas)
`/v1/score-configs/{name}`	`GET`	Cloud + Local	Get a single score configuration by name
`/v1/metrics`	`GET`	Cloud only	Aggregated analytics (trace volume, latency, cost, tokens, error rates)
`/v1/config`	`GET`	Cloud + Local	Retrieve the synced `agentmark.json` project configuration plus the current commit SHA
`/v1/datasets`	`GET`	Cloud + Local	List datasets with per-dataset metadata (`row_count`, `created_at`), an exact-match `?name=` filter on the leaf name (case-sensitive), and canonical `{ data, pagination }` envelope
`/v1/datasets/{datasetName}/rows`	`POST`	Cloud + Local	Append a canonical dataset row with `input`, `expected_output`, and `metadata`
`/v1/datasets/{datasetName}/rows/from-traces`	`POST`	Cloud + Local	Import one or more traces into canonical dataset rows using optional field mapping
`/v1/datasets/{datasetName}/rows/from-spans`	`POST`	Cloud + Local	Import one or more spans into canonical dataset rows using optional field mapping
`/v1/experiments`	`GET`	Cloud + Local	List experiments
`/v1/experiments/{experimentId}`	`GET`	Cloud + Local	Get an experiment by ID
`/v1/experiments/baseline`	`GET`	Cloud + Local	Get per-row baseline scores for an `experiment_key` + `tree_hash`, driving the `run-experiment` regression gate
`/v1/prompts`	`GET`	Cloud + Local	List prompt file paths. Local lists every prompt file in the project; Cloud answers `?name=` lookups only and returns `501` for the full listing.
`/v1/runs/{runId}/traces`	`GET`	Local only · deprecated	Use `/v1/traces?dataset_run_id={runId}` instead. Both paths hit the same predicate. Kept on Local for older SDK versions; Cloud returns `501`.
`/v1/capabilities`	`GET`	Cloud + Local	Check which features the server supports (no auth required)
`/v1/templates?path={filePath}`	`GET`	Cloud + Local	Retrieve a prompt template by file path. Required `path` query param must end in `.mdx` or `.jsonl`; the `promptKind` query param (`image`, `speech`, `text`, or `object`) applies to `.prompt.mdx` files only, and you omit it for datasets and components.
`/v1/pricing`	`GET`	Cloud + Local	Per-model LLM pricing data (no auth required)
`/v1/annotation-queues`	`GET` · `POST`	Cloud only	List / create annotation queues for human review
`/v1/annotation-queues/{queueId}`	`GET` · `PATCH` · `DELETE`	Cloud only	Read / update / delete a queue
`/v1/annotation-queues/{queueId}/items`	`GET` · `POST`	Cloud only	List items or add traces/spans/sessions to a queue
`/v1/annotation-queues/{queueId}/items/{itemId}`	`GET` · `PATCH` · `DELETE`	Cloud only	Read / update / remove a single queue item
`/v1/annotation-queues/{queueId}/items/{itemId}/reviews`	`POST`	Cloud only	Submit a review. LLM-as-judge pipelines can land annotations the same way human reviewers do
`/v1/api-keys`	`GET` · `POST`	Cloud only	List API keys (metadata only, no plaintext) or mint a new key. The `POST` response returns the plaintext value of a newly created key exactly once, and it’s unrecoverable afterward.
`/v1/api-keys/{apiKeyId}`	`DELETE`	Cloud only	Revoke an API key. The gateway rejects revoked keys immediately.
`/v1/apps`	`GET` · `POST`	Cloud only	List apps or create a new app
`/v1/apps/{appId}`	`GET` · `PATCH` · `DELETE`	Cloud only	Read, update, or delete an app
`/v1/apps/{appId}/git`	`GET`	Cloud only	Get the app’s git connection status
`/v1/apps/{appId}/git/connect`	`POST`	Cloud only	Start a git connection (install) flow for the app
`/v1/apps/{appId}/git/repositories`	`GET`	Cloud only	List repositories available to the app’s git connection
`/v1/apps/{appId}/git/branches`	`GET`	Cloud only	List branches for the connected repository
`/v1/apps/{appId}/git/link`	`POST` · `DELETE`	Cloud only	Link or unlink a repository/branch to the app
`/v1/alerts`	`GET` · `POST`	Cloud only	List alerts or create a new alert
`/v1/alerts/{alertId}`	`GET` · `PUT` · `DELETE`	Cloud only	Read, update, or delete an alert
`/v1/alerts/{alertId}/history`	`GET`	Cloud only	List trigger history for an alert
`/v1/alerts/slack-channels`	`GET`	Cloud only	List Slack channels available for alert notifications
`/v1/deployments`	`GET`	Cloud only	List deployments
`/v1/deployments/{deploymentId}`	`GET`	Cloud only	Get a single deployment by ID
`/v1/environments`	`GET` · `POST`	Cloud only	List environments or create a new environment
`/v1/environments/{id}`	`GET` · `DELETE`	Cloud only	Get or delete an environment by ID
`/v1/environments/{id}/deployments`	`GET`	Cloud only	List deployments for an environment
`/v1/environments/{id}/promote`	`POST`	Cloud only	Promote a deployment into the environment
`/v1/environments/{id}/rollback`	`POST`	Cloud only	Roll the environment back to a previous deployment
`/health`	`GET`	Cloud + Local	Root health check (no auth required)
`/v1/health/ingestion`	`GET`	Cloud only	Ingestion pipeline health with dependency statuses
`/v1/health/files`	`GET`	Cloud only	Files service health with dependency statuses
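
The three deprecated routes in the table each have a documented replacement, so a client can rewrite old paths mechanically. The following is a minimal sketch of that rewrite; modernize_path is a hypothetical helper name, and the patterns cover only the deprecations listed in the table above.

```python
import re

# Deprecated path patterns and their replacements, taken from the endpoint table.
_REWRITES = [
    (re.compile(r"^/v1/runs/([^/]+)/traces$"), r"/v1/traces?dataset_run_id=\1"),
    (re.compile(r"^/v1/sessions/([^/]+)/traces$"), r"/v1/traces?session_id=\1"),
    (re.compile(r"^/v1/traces/([^/]+)/graph$"), r"/v1/traces/\1?fields=graph"),
]


def modernize_path(path: str) -> str:
    """Return the non-deprecated equivalent of a path, or the path unchanged."""
    for pattern, replacement in _REWRITES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return path
```

Paths that are not deprecated pass through untouched, so the helper can wrap every outgoing request.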

Use the sidebar to browse interactive documentation for each endpoint.

Two programmatic surfaces, same OpenAPI spec under the hood:

From shell / CI: call the REST endpoints with curl and an AGENTMARK_API_KEY (or the session bearer from ~/.agentmark/auth.json after agentmark login).
From an IDE agent: run the agentmark-mcp MCP server. It fetches this spec at startup and exposes one MCP tool per operation (for example, list_traces, create_app, start_app_git_connect), so your Claude Code / Cursor / etc. agent can drive the gateway headlessly.

Response format

All responses are JSON unless otherwise noted (for example, CSV exports). Error responses follow a consistent canonical envelope:

{
  "error": {
    "code": "string_snake_case",
    "message": "Description of what went wrong"
  }
}

The error.code field is the programmatic discriminator: use it to branch on specific error cases. The error.message field is the human-readable description to show to users. The one exception to the flat shape is 400 validation errors, which nest per-field messages in an error.details map (see Authentication). Any other additional context (for example, retry_after_seconds or jobId) appears as extra fields directly inside error, alongside code and message:

{
  "error": {
    "code": "span_limit_exceeded",
    "message": "Monthly span limit of 20000 reached. Upgrade your plan to raise this limit.",
    "currentCount": 20000,
    "limit": 20000,
    "upgradeUrl": "https://app.agentmark.co/settings/billing"
  }
}

The shape matches Stripe, OpenAI, and Anthropic error conventions, so one parser works across all endpoints.
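
A single parser for this envelope might look like the sketch below. It splits an error body into code, message, and whatever extra fields accompany them. ApiError and parse_error are hypothetical names chosen for illustration, not part of any AgentMark SDK.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ApiError:
    code: str                                   # programmatic discriminator
    message: str                                # human-readable description
    extra: dict = field(default_factory=dict)   # everything else inside "error"


def parse_error(body: str) -> ApiError:
    """Parse the canonical error envelope from a JSON response body."""
    err = json.loads(body)["error"]
    extra = {k: v for k, v in err.items() if k not in ("code", "message")}
    return ApiError(code=err["code"], message=err["message"], extra=extra)
```

Branch on the code attribute, and read extra for context such as limit, or for details on a 400 validation error.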

Rate limiting

Requests are rate-limited per tenant. When you exceed your rate limit, the API returns a 429 status code. Trace ingestion has additional monthly span and storage quotas depending on your plan. See Authentication for details.
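
One way a client could react to a 429 is sketched below. It assumes the error object may carry a retry_after_seconds field, which this page mentions only as an example of extra context, and it falls back to capped exponential backoff otherwise. The function name and the default base and cap values are illustrative choices, not documented behavior.

```python
from typing import Optional


def retry_delay(status: int, error: dict, attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not rate-limited.

    `error` is the object under the "error" key of the response body.
    `attempt` is zero-based: 0 for the first retry.
    """
    if status != 429:
        return None
    hinted = error.get("retry_after_seconds")  # assumed field, see lead-in
    if isinstance(hinted, (int, float)) and hinted >= 0:
        return float(hinted)
    # No usable hint: exponential backoff, capped.
    return min(cap, base * 2 ** attempt)
```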

Versioning

Every endpoint except the root /health check carries the /v1/ prefix. Breaking changes ship under new version prefixes (/v2/, etc.) with a 90+ day deprecation window, so /v1/ keeps working while you migrate. See API versioning & stability for the full policy on what’s breaking, what’s additive, and how AgentMark announces deprecations.

Why there is no `PATCH /v1/traces`

Traces are immutable in AgentMark. Once AgentMark stores a span, the row representing what happened during that execution becomes permanent. No endpoint mutates it. Other observability platforms expose a “patch trace” endpoint that lets clients backfill metadata, attach a label, or correct a field after ingestion. AgentMark covers those workflows through three separate, append-only resources instead:

Scores (POST /v1/scores, POST /v1/scores/batch): attach a graded value (numeric, categorical, or boolean) to a trace or span after the fact. AgentMark versions scores by created_at, and they never overwrite the underlying span.
Comments: free-form human notes on a trace or span, stored alongside the trace as a separate resource.
Annotation queues (/v1/annotation-queues/*): structured human-in-the-loop review that produces new score and comment records, again without modifying the trace itself.

The three resources above are the migration targets for any “patch trace” workflow you’d build on a competitor. This split is intentional: it keeps the audit trail clean (you can always tell what the model did versus what a reviewer added later) and lets retention, RBAC, and export rules apply differently to raw execution data than to human-attached metadata. This is a permanent design choice, not a missing feature. PATCH /v1/traces won’t ship in /v1/, /v2/, or any future version.

The `filter` query grammar

GET /v1/traces and GET /v1/spans accept a filter query parameter, a human-readable string expression. The same expression works on both endpoints: write it once, reuse it for trace and span listings. A filter is one or more clauses combined with and. A clause is a single predicate, or a parenthesized OR-group of predicates:

filter      = clause ("and" clause)*
clause      = predicate | "(" predicate ("or" predicate)* ")"
predicate   = field operator [value]

The `or` keyword is valid only inside parentheses, and the `and` keyword only outside them, so there are no precedence rules to learn. Groups don’t nest. URL-encode the whole expression. For example, you send filter=metadata.env = "prod" and status = ERROR as:

GET /v1/traces?filter=metadata.env%20%3D%20%22prod%22%20and%20status%20%3D%20ERROR
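
The encoding above can be reproduced with Python's standard library. This sketch percent-encodes the whole expression with urllib.parse.quote; filter_url is a hypothetical helper name.

```python
from urllib.parse import quote


def filter_url(path: str, expression: str) -> str:
    """Build a request path with the filter expression fully percent-encoded."""
    # safe="" encodes every reserved character, including "/" and "=".
    return f"{path}?filter={quote(expression, safe='')}"


print(filter_url("/v1/traces", 'metadata.env = "prod" and status = ERROR'))
```

Passing safe="" matters because quote leaves "/" unencoded by default.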

Operators

Operator	Meaning	Applies to
`=` `!=`	equals / not equals	all fields (`status` accepts `=` only; see Fields)
`>` `>=` `<` `<=`	numeric comparison	numeric fields + `score__<name>`
`contains` `not contains`	substring match	string + `metadata.*`; `contains` also applies to `tags` (tag membership)
`starts with` `ends with`	prefix / suffix match	string + `metadata.*`
`exists` `does not exist`	key present / absent (no value)	`metadata.*`

Quote any value containing spaces or special characters ("..." or '...'). Bare values are fine for simple tokens (status = ERROR, cost > 0.01).

Fields

Field	Kind	Operators
`model`, `user_id`, `session_id`, `trace_id`, `prompt_name`, `input`, `output`, `props`, `semantic_kind`	string	`=` `!=` `contains` `not contains` `starts with` `ends with`
`latency_ms`, `cost`, `prompt_tokens`, `completion_tokens`	numeric	`=` `!=` `>` `>=` `<` `<=`
`status`	enum (`OK` / `ERROR`)	`=`
`tags`	array	`=` `!=` `contains`
`metadata.<key>`	custom metadata	all string operators + `exists` / `does not exist`
`score__<name>`	evaluation score	numeric operators

<key> matches [a-zA-Z_][a-zA-Z0-9_]{0,63}. Up to 20 predicates per request, counting every predicate inside OR-groups.
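
The grammar and limits above can be enforced on the client before a request is sent. The sketch below composes clauses into a filter string, wraps OR-groups in parentheses, validates metadata keys against the documented pattern, and rejects expressions with more than 20 predicates. The helper names are hypothetical, and the predicates themselves are passed in as ready-made strings rather than parsed.

```python
import re

MAX_PREDICATES = 20
_KEY = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]{0,63}")


def metadata_field(key: str) -> str:
    """Return the metadata.<key> field name, validating the key pattern."""
    if not _KEY.fullmatch(key):
        raise ValueError(f"invalid metadata key: {key!r}")
    return f"metadata.{key}"


def build_filter(clauses: list) -> str:
    """Join clauses with 'and'. A clause is a predicate string, or a
    list of predicate strings that becomes a parenthesized OR-group."""
    parts, count = [], 0
    for clause in clauses:
        if isinstance(clause, str):
            parts.append(clause)
            count += 1
        else:
            group = list(clause)
            if not group:
                raise ValueError("empty OR-group")
            parts.append("(" + " or ".join(group) + ")")
            count += len(group)  # predicates inside OR-groups count too
    if not parts:
        raise ValueError("a filter needs at least one clause")
    if count > MAX_PREDICATES:
        raise ValueError(f"{count} predicates exceeds the limit of {MAX_PREDICATES}")
    return " and ".join(parts)
```

Because groups are built from a flat list of strings, the helper cannot produce nested groups, which matches the rule that groups don’t nest.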

Examples

GET /v1/traces?filter=metadata.debug_screenshot_url exists
GET /v1/traces?filter=metadata.debug_screenshot_url = "https://example.com/x"
GET /v1/traces?filter=cost > 0.01 and latency_ms <= 2000
GET /v1/spans?filter=model starts with "gpt-4" and metadata.env = "prod"
GET /v1/traces?filter=(model = "gpt-4o" or model = "o3") and status = ERROR

A malformed or unsupported filter returns 400 with an invalid_filter code and a message describing the problem. It’s never silently ignored.

Structured JSON filters (search endpoints)

POST /v1/traces/search, POST /v1/spans/search, and POST /v1/scores/search accept the same filters as a JSON request body. Use this form when building filters programmatically (SDKs, agents) instead of assembling DSL expressions as strings:

{
  "filters": [
    { "field": "status", "operator": "equals", "value": "ERROR" },
    { "or": [
      { "field": "model", "operator": "equals", "value": "gpt-4o" },
      { "field": "model", "operator": "equals", "value": "o3" }
    ]},
    { "field": "latency_ms", "operator": "between", "value": [1000, 5000] }
  ],
  "limit": 50,
  "sort_by": "start_time",
  "sort_order": "desc"
}

Semantics mirror the string DSL exactly: the list is an AND of clauses, a clause is a predicate or a one-level OR-group, and both forms compile to the same query. Operator names are the canonical camelCase set (equals, notEquals, contains, notContains, startsWith, endsWith, gt, gte, lt, lte, exists, doesNotExist). The JSON form additionally supports membership and range operators that have no DSL syntax:

Operator	Value	Meaning
`in`	array (≤ 50 values)	field matches any listed value (`tags`: trace carries any listed tag)
`notIn`	array (≤ 50 values)	field matches none of the listed values
`between`	`[min, max]`	inclusive range (numeric fields; `created_at` on scores)
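
Building the JSON form programmatically could look like the sketch below, which reproduces the request body shown earlier and checks the documented limits on in, notIn, and between. The helper names are hypothetical. Omitting value for exists and doesNotExist is an assumption carried over from the string DSL, where those operators take no value; GET /v1/filter-schema is the place to confirm it.

```python
MAX_LIST_VALUES = 50  # documented limit for in / notIn


def predicate(field: str, operator: str, value=None) -> dict:
    """Build one predicate, checking the documented value shapes."""
    if operator in ("in", "notIn"):
        if not isinstance(value, list) or len(value) > MAX_LIST_VALUES:
            raise ValueError(f"{operator} needs a list of at most {MAX_LIST_VALUES} values")
    if operator == "between":
        if not isinstance(value, (list, tuple)) or len(value) != 2:
            raise ValueError("between needs [min, max]")
    result = {"field": field, "operator": operator}
    if operator not in ("exists", "doesNotExist"):  # assumption: no value key
        result["value"] = value
    return result


def any_of(*predicates: dict) -> dict:
    """Wrap predicates in a one-level OR-group."""
    return {"or": list(predicates)}


def search_body(filters: list, **options) -> dict:
    """Assemble the request body: an AND of clauses plus paging/sort options."""
    return {"filters": filters, **options}
```

Serialize the result with json.dumps and send it as the POST body.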

Trace/span search uses the field set above; score search filters on name, score, source, user_id, resource_id, label, and created_at. GET /v1/filter-schema returns the full machine-readable schema (fields, operators, and limits per resource), so a client (or an agent via MCP) can construct valid filters without trial and error.

Search endpoints apply guardrails the frozen GET contracts don’t: requests default to the last 7 days when start_date is unset, the maximum window is 90 days, and the routes are rate-limited per tenant.

The search endpoints are Cloud-only (the local dev server answers them with 501 not_available_locally), but the gateway serves GET /v1/filter-schema on both surfaces from the same generated contract.
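
A client can mirror the search time-window guardrails locally to fail fast. The sketch below is one reading of the rules: it assumes the 7-day default is measured back from the end of the window, and that an unset end means "now". Neither detail is spelled out on this page, and resolve_window is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_WINDOW = timedelta(days=7)   # applied when start_date is unset
MAX_WINDOW = timedelta(days=90)      # documented maximum window


def resolve_window(now: datetime,
                   start_date: Optional[datetime] = None,
                   end_date: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return the (start, end) a search would cover, or raise if too wide."""
    end = end_date or now                      # assumption: end defaults to now
    start = start_date or end - DEFAULT_WINDOW
    if start > end:
        raise ValueError("start_date is after end_date")
    if end - start > MAX_WINDOW:
        raise ValueError("search window exceeds 90 days")
    return start, end


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
start, end = resolve_window(now)
```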

The filter string grammar (GET endpoints) and the JSON form (POST search endpoints) are the same filter language in two encodings; use whichever fits the call site. JSON filters are only accepted in POST request bodies, never as a GET query parameter.

Reading custom metadata

The read endpoints return custom metadata you attach at ingestion (OTLP agentmark.metadata.* attributes, or the SDK’s metadata option):

GET /v1/traces/{traceId}: a trace-level metadata object (the root span’s metadata) and a per-span metadata object on each entry of spans.
GET /v1/traces/{traceId}/spans/{spanId}: a metadata object alongside the span’s input / output.
GET /v1/spans and GET /v1/spans/{spanId}: metadata on each span.

The endpoints return metadata as a flat string→string object. The public metadata object excludes reserved internal namespaces (such as graph.node.*, which GET /v1/traces/{traceId}?fields=graph surfaces separately).

Reading traces requires an API key with the trace.read permission (and span.read / session.read for the span/session endpoints). Write-only SDK keys (trace.write + score.write) can ingest traces and post scores but can’t read traces back. A programmatic consumer (CI pipeline, agent) that needs readback must use a key minted with the read permissions (the read-only or full-access preset, or check trace.read in the key’s permission picker). This is least-privilege by design: it keeps a leaked write-only SDK key from exfiltrating trace contents.

GET /v1/scores also accepts a session_id query parameter, which scopes the listing to one session, alongside start_date, end_date, and source.
