The AgentMark Gateway API provides direct HTTP access to trace ingestion, scoring, and template retrieval.
Most developers should use the AgentMark SDK instead of calling the REST API directly. The SDK handles authentication, retries, and serialization automatically.

Base URL

https://api.agentmark.co
The local dev server (npx agentmark dev) and Cloud share the same /v1/* wire contract — the same Zod schemas are the source of truth on both surfaces. What differs is which handlers are implemented where:
  • Both surfaces: /v1/config, /v1/traces (ingest + read), /v1/sessions, /v1/spans, /v1/scores (create, read, delete + batch), /v1/score-configs, /v1/datasets, /v1/experiments, /v1/templates, /v1/capabilities, /v1/pricing.
  • Cloud-only (local returns 404 or a 501 not_available_locally stub): /v1/metrics, /v1/scores/aggregations, /v1/traces/export, annotation queues (/v1/annotation-queues/*), API key management (/v1/api-keys), the worker connection (/v1/connect), and the health endpoints (/health, /v1/health/*).
  • Local-only (Cloud returns 501 not_available_on_cloud): /v1/prompts lists the prompt files on disk — the Cloud handler is a documented stub pending an implementation decision.
  • Deprecated: /v1/runs/{runId}/traces still works on local for backwards compatibility with older SDK versions, but new code should use /v1/traces?dataset_run_id={runId} — both paths hit the same ClickHouse predicate. The Cloud endpoint has always returned 501.
Call GET /v1/capabilities to probe which features a server supports at runtime. All endpoints are prefixed with /v1/ except the root health check.
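As a complement to probing /v1/capabilities, a client can also recognize the "not implemented on this surface" responses described above. The following is a minimal sketch, not part of the AgentMark SDK: the function name is hypothetical, and it assumes the response body has already been parsed from JSON. Note that a 404 can also mean a missing resource (for example an unknown trace ID), so this check is most meaningful on collection routes.

```python
from typing import Any

# Error codes the text above names for routes a surface does not implement.
UNAVAILABLE_CODES = {"not_available_locally", "not_available_on_cloud"}


def is_route_unavailable(status: int, body: Any = None) -> bool:
    """Return True when a response signals the route is not implemented here.

    Hypothetical helper: `status` is the HTTP status code and `body` is the
    decoded JSON body (or None when there is no JSON body).
    """
    if status == 404:
        # Caveat: a 404 may also mean "resource not found" on item routes.
        return True
    if status != 501:
        return False
    error = body.get("error") if isinstance(body, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    return isinstance(code, str) and code in UNAVAILABLE_CODES
```

A caller might use this to fall back from a Cloud-only route such as /v1/metrics to a local computation when running against the dev server.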

Available endpoints

The Where column shows which environments implement each route. “Cloud + Local” means the same handler semantics on both; “Cloud only” / “Local only” mean the other side returns 501 (with a not_available_on_cloud / not_available_locally error code) or 404.
| Endpoint | Method | Where | Description |
| --- | --- | --- | --- |
| /v1/traces | POST | Cloud + Local | Ingest trace data in OTLP format (supports gzip) |
| /v1/traces | GET | Cloud + Local | List traces with filtering; supports dataset_run_id for run-scoped listings |
| /v1/traces/{traceId} | GET | Cloud + Local | Get a single trace with all its spans |
| /v1/traces/{traceId}/spans | GET | Cloud + Local | List every span belonging to a trace |
| /v1/traces/{traceId}/spans/{spanId} | GET | Cloud + Local | Get full input/output payload for a single span |
| /v1/traces/{traceId}/graph | GET | Cloud + Local | Return nodes + edges for visualizing a trace’s agent-execution flow |
| /v1/traces/export | GET | Cloud only | Export traces as JSONL, CSV, or OpenAI fine-tuning format |
| /v1/sessions | GET | Cloud + Local | List sessions with filtering by name and user |
| /v1/sessions/{sessionId}/traces | GET | Cloud + Local | List traces for a specific session |
| /v1/spans | GET | Cloud + Local | Query spans across traces with filtering by type, status, model, and duration |
| /v1/scores | POST | Cloud + Local | Create a score record for a span or trace |
| /v1/scores/batch | POST | Cloud + Local | Create up to 1000 scores in one request (per-item results, 207-style) |
| /v1/scores | GET | Cloud + Local | List scores for a specific span or trace |
| /v1/scores/{scoreId} | GET | Cloud + Local | Get a single score by ID |
| /v1/scores/{scoreId} | DELETE | Cloud + Local | Delete a score record |
| /v1/scores/names | GET | Cloud + Local | List distinct score names (for UI filters) |
| /v1/scores/aggregations | GET | Cloud only | Aggregated score statistics grouped by name |
| /v1/score-configs | GET | Cloud + Local | List score configurations (reusable score schemas) |
| /v1/score-configs/{name} | GET | Cloud + Local | Get a single score configuration by name |
| /v1/metrics | GET | Cloud only | Aggregated analytics (trace volume, latency, cost, tokens, error rates) |
| /v1/config | GET | Cloud + Local | Retrieve the synced agentmark.json project configuration plus the current commit SHA |
| /v1/datasets | GET | Cloud + Local | List datasets with per-dataset metadata (row_count, created_at), case-insensitive ?name= substring filter, and canonical { data, pagination } envelope |
| /v1/datasets/{datasetName}/rows | POST | Cloud + Local | Append a canonical dataset row with input, expected_output, and metadata |
| /v1/datasets/{datasetName}/rows/from-traces | POST | Cloud + Local | Import one or more traces into canonical dataset rows using optional field mapping |
| /v1/datasets/{datasetName}/rows/from-spans | POST | Cloud + Local | Import one or more spans into canonical dataset rows using optional field mapping |
| /v1/experiments | GET | Cloud + Local | List experiments |
| /v1/experiments/{experimentId} | GET | Cloud + Local | Get an experiment by ID |
| /v1/prompts | GET | Local only | List prompt file paths in the project. Cloud returns 501 not_available_on_cloud. |
| /v1/runs/{runId}/traces | GET | Local only · deprecated | Use /v1/traces?dataset_run_id={runId} instead; both paths hit the same predicate. Kept on Local for older SDK versions; Cloud returns 501. |
| /v1/capabilities | GET | Cloud + Local | Check which features the server supports (no auth required) |
| /v1/templates/{templatePath} | GET | Cloud + Local | Retrieve a prompt template by file path |
| /v1/pricing | GET | Cloud + Local | Per-model LLM pricing data (no auth required) |
| /v1/annotation-queues | GET · POST | Cloud only | List / create annotation queues for human review |
| /v1/annotation-queues/{queueId} | GET · PATCH · DELETE | Cloud only | Read / update / delete a queue |
| /v1/annotation-queues/{queueId}/items | GET · POST | Cloud only | List items or add traces/spans/sessions to a queue |
| /v1/annotation-queues/{queueId}/items/{itemId} | GET · PATCH · DELETE | Cloud only | Read / update / remove a single queue item |
| /v1/annotation-queues/{queueId}/items/{itemId}/reviews | POST | Cloud only | Submit a review; LLM-as-judge pipelines can land annotations the same way human reviewers do |
| /v1/api-keys | GET · POST | Cloud only | List API keys (metadata only, no plaintext) or mint a new key. The plaintext value of a newly created key is returned exactly once in the POST response and is unrecoverable afterward. |
| /v1/api-keys/{apiKeyId} | DELETE | Cloud only | Revoke an API key. Revoked keys are rejected immediately. |
| /v1/connect | GET (WebSocket upgrade) | Cloud only | Persistent connection for deployed workers to receive dispatched jobs. |
| /health | GET | Cloud only | Root health check (no auth required) |
| /v1/health/ingestion | GET | Cloud only | Ingestion pipeline health with dependency statuses |
| /v1/health/files | GET | Cloud only | Files service health with dependency statuses |
Use the sidebar to browse interactive documentation for each endpoint.
You can also access these endpoints from the command line using npx agentmark api. By default it targets the local dev server; pass --remote for Cloud. Run npx agentmark api __schema to discover available resources (requires a running server). See the CLI reference for details.
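The table above caps POST /v1/scores/batch at 1000 scores per request, so larger sets have to be split on the client side. A minimal sketch of that splitting follows. The helper name is hypothetical, the score items are treated as opaque dictionaries, and the request body shape for the batch endpoint is not shown on this page, so sending each chunk is left to the caller.

```python
from typing import Iterator, List, Sequence

# Per-request cap stated in the endpoint table for /v1/scores/batch.
MAX_BATCH_SIZE = 1000


def chunk_scores(scores: Sequence[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of `scores`, each holding at most `size` items."""
    if size < 1 or size > MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(scores), size):
        yield list(scores[start:start + size])


# Usage: 2500 placeholder scores become three requests' worth of items.
# The "name" and "value" keys are illustrative only, not the documented schema.
placeholder_scores = [{"name": "example", "value": i} for i in range(2500)]
batch_sizes = [len(batch) for batch in chunk_scores(placeholder_scores)]
```

Because the endpoint reports per-item results, a caller would still need to inspect each chunk's response rather than treat the whole upload as all-or-nothing.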

Response format

All responses are JSON unless otherwise noted (e.g., CSV exports). Error responses follow a consistent canonical envelope:
{
  "error": {
    "code": "string_snake_case",
    "message": "Description of what went wrong"
  }
}
The error.code field is the programmatic discriminator — use it to branch on specific error cases. The error.message field is the human-readable description to show to users. Additional context (e.g. retry_after_seconds, jobId) appears as a sibling details object inside the error:
{
  "error": {
    "code": "span_limit_exceeded",
    "message": "Monthly unit limit exceeded. Upgrade your plan for unlimited units.",
    "details": {
      "currentCount": 20000,
      "limit": 20000,
      "upgradeUrl": "https://app.agentmark.co/settings/billing"
    }
  }
}
The shape follows the nested error-object convention used by Stripe, OpenAI, and Anthropic, so one parser works across all AgentMark endpoints.
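One way to consume the envelope described above is a small parser that extracts code, message, and the optional details object. This is a sketch under the assumption that the response body has already been decoded from JSON; the ApiError class and parse_error function are hypothetical names, not part of the AgentMark SDK.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ApiError:
    """Hypothetical container for the canonical error envelope."""
    code: str
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


def parse_error(body: Any) -> Optional[ApiError]:
    """Return an ApiError if `body` matches the envelope, otherwise None."""
    if not isinstance(body, dict):
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    code, message = error.get("code"), error.get("message")
    if not isinstance(code, str) or not isinstance(message, str):
        return None
    details = error.get("details")
    # `details` is optional; fall back to an empty dict when it is absent.
    return ApiError(code, message, details if isinstance(details, dict) else {})


# Usage with the quota example shown above.
parsed = parse_error({
    "error": {
        "code": "span_limit_exceeded",
        "message": "Monthly unit limit exceeded. Upgrade your plan for unlimited units.",
        "details": {"currentCount": 20000, "limit": 20000},
    }
})
```

Branching should key off parsed.code, with parsed.message reserved for display, as the text above recommends.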

Rate limiting

Requests are rate-limited per tenant. When you exceed your rate limit, the API returns a 429 status code. Trace ingestion has additional monthly span and storage quotas depending on your plan. See Authentication for details.
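A client that calls the REST API directly will usually want to wait before retrying a 429. The sketch below computes a wait time with capped exponential backoff. It assumes, without confirmation from this page, that a 429 response may carry retry_after_seconds in error.details (the field is mentioned above only as an example of additional context); when it is absent the function falls back to backoff. The function name and the default base and cap values are illustrative.

```python
from typing import Any, Optional


def retry_delay(attempt: int, details: Optional[dict] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    Assumption: `details` is the error.details object from a 429 response and
    may contain a numeric `retry_after_seconds` hint.
    """
    hint: Any = (details or {}).get("retry_after_seconds")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(hint, (int, float)) and not isinstance(hint, bool) and hint >= 0:
        return float(hint)
    # Fallback: exponential backoff (base * 2^attempt), capped at `cap`.
    return min(cap, base * (2 ** attempt))
```

In production code it is common to add random jitter to the fallback delay so that many clients do not retry at the same instant; it is omitted here to keep the result deterministic.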

Versioning

Every endpoint is prefixed with /v1/. Breaking changes ship under new version prefixes (/v2/, etc.) with a 90+ day deprecation window — /v1/ keeps working while you migrate. See API versioning & stability for the full policy on what’s breaking, what’s additive, and how deprecations are announced.

Why there is no PATCH /v1/traces

Traces are immutable in AgentMark. Once a span lands in ClickHouse, the row representing what happened during that execution is frozen — there is deliberately no endpoint that mutates it. Other observability platforms expose a “patch trace” endpoint that lets clients backfill metadata, attach a label, or correct a field after ingestion. AgentMark covers those workflows through three separate, append-only resources instead:
  • Scores (POST /v1/scores, POST /v1/scores/batch) — attach a graded value (numeric, categorical, or boolean) to a trace or span after the fact. Scores are versioned by created_at and never overwrite the underlying span.
  • Comments — free-form human notes on a trace or span, stored alongside the trace as a separate resource.
  • Annotation queues (/v1/annotation-queues/*) — structured human-in-the-loop review that produces new score and comment records, again without modifying the trace itself.
The three resources above are the migration targets for any “patch trace” workflow you’d build on a competitor. This split is intentional: it keeps the audit trail clean (you can always tell what the model did vs. what a reviewer added later) and lets retention, RBAC, and export rules apply differently to raw execution data than to human-attached metadata. This is a permanent design choice, not a missing feature — PATCH /v1/traces will not ship in /v1/, /v2/, or any future version.

Filtering on /v1/spans and /v1/scores

/v1/spans and /v1/scores accept the same filter vocabulary as /v1/traces, so a single filter expression can be written once and reused for trace listings, span listings, score listings, and saved-filter exports. /v1/spans accepts:
  • start_date, end_date — ISO 8601 timestamps. Inclusive on both ends.
  • user_id, session_id — scope the result to a specific user or session.
  • filter — a JSON-encoded filter DSL, identical to the one /v1/traces accepts. Example:
    GET /v1/spans?filter=%7B%22op%22%3A%22and%22%2C%22exprs%22%3A%5B%7B%22field%22%3A%22model%22%2C%22op%22%3A%22eq%22%2C%22value%22%3A%22gpt-4o%22%7D%2C%7B%22field%22%3A%22latency_ms%22%2C%22op%22%3A%22gt%22%2C%22value%22%3A2000%7D%5D%7D
    
    Decoded:
    {
      "op": "and",
      "exprs": [
        { "field": "model", "op": "eq", "value": "gpt-4o" },
        { "field": "latency_ms", "op": "gt", "value": 2000 }
      ]
    }
    
    See Filtering & search for the full operator list.
/v1/scores accepts session_id, a newer parameter that scopes scores to a session, alongside the previously supported start_date, end_date, and source.
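The encoded filter parameter in the /v1/spans example above can be produced by serializing the filter to compact JSON and percent-encoding it. The sketch below uses only the Python standard library; the function name build_spans_url is hypothetical, and only the query parameters named on this page (filter, start_date, end_date, user_id, session_id) are assumed to exist.

```python
import json
from urllib.parse import quote, urlencode


def build_spans_url(filter_expr: dict, **params: str) -> str:
    """Build a /v1/spans path and query string with a JSON-encoded filter."""
    # Compact separators avoid spaces, which would otherwise be encoded too.
    compact = json.dumps(filter_expr, separators=(",", ":"))
    query = {"filter": compact, **params}
    # quote_via=quote with safe="" percent-encodes every reserved character.
    return "/v1/spans?" + urlencode(query, quote_via=quote, safe="")


# The same filter as the decoded example above: gpt-4o spans slower than 2000 ms.
slow_gpt4o = {
    "op": "and",
    "exprs": [
        {"field": "model", "op": "eq", "value": "gpt-4o"},
        {"field": "latency_ms", "op": "gt", "value": 2000},
    ],
}
url = build_spans_url(slow_gpt4o)
```

Passing session_id="abc" as a keyword argument would append &session_id=abc after the filter parameter.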