AgentMark helps you build reliable AI agents: write prompts as code, run them against any model, trace every execution, score the results, and monitor what ships to production. Prompts, evals, and datasets live in your codebase; traces start on your machine; nothing requires an account. When you want it, AgentMark Cloud adds visual editing, rich trace exploration, team collaboration, and production monitoring.

Two ways to work

Local

Everything on your machine. Create prompts as files, run them via SDK or CLI, trace executions locally, and run evaluations from your terminal. No account needed. No data leaves your environment.

Cloud

Edit prompts in a browser, explore traces visually, share dashboards and annotations with your team, and get alerted in production, all on top of your local workflow. No setup beyond connecting a repo.
Most teams start local and add Cloud as they grow. Some stay local-only. Both are fully supported. See pricing for Cloud tier details.

The local workflow keeps working after you adopt Cloud. Your .prompt.mdx files, local traces, and agentmark dev stay unchanged. Cloud is additive.

Team and Enterprise tiers add SSO (SAML), custom roles, and app-level permissions. For data residency options, contact us.

What you can do

Build agents

Create prompts as .prompt.mdx files in your editor, or use the visual editor in the Dashboard. Both produce the same format, so you can switch between them freely.
  • TemplateDX syntax with variables, expressions, logic, and reusable components
  • Multiple output types: text, structured objects, images, and speech
  • Tools and function calling for agentic workflows
  • Version control built in: every change tracked with history and rollback
Learn more about Build

Evaluate quality

Run evaluators from code or CLI to score outputs automatically. Use the Dashboard for human annotations and shared experiment results.
  • Datasets for bulk testing against input/output pairs
  • Custom evaluators: numeric scores, pass/fail, classifications, LLM-as-judge
  • Experiments to compare prompt versions and track performance over time
  • Annotations for human-in-the-loop scoring and labeling
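
The evaluator types listed above can be pictured as plain functions that take an output and return a score. The sketch below is a generic Python illustration of a pass/fail evaluator and a numeric evaluator run over a small dataset. It is not AgentMark's evaluator API: `EvalResult`, `exact_match`, `keyword_coverage`, and the dataset rows are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical result shape: a numeric score plus a pass/fail flag."""
    score: float
    passed: bool


def exact_match(output: str, expected: str) -> EvalResult:
    """Pass/fail evaluator: compares whitespace- and case-normalized strings."""
    ok = output.strip().lower() == expected.strip().lower()
    return EvalResult(score=1.0 if ok else 0.0, passed=ok)


def keyword_coverage(output: str, keywords: list[str], threshold: float = 0.5) -> EvalResult:
    """Numeric evaluator: fraction of required keywords found in the output."""
    if not keywords:
        return EvalResult(score=1.0, passed=True)
    text = output.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    score = hits / len(keywords)
    return EvalResult(score=score, passed=score >= threshold)


# Hypothetical dataset rows: each pairs an expected value with a model output.
dataset = [
    {"input": "Capital of France?", "expected": "Paris", "output": "paris "},
    {"input": "Capital of Japan?", "expected": "Tokyo", "output": "Kyoto"},
]

# Score every row, then summarize as a pass rate for the whole dataset.
results = [exact_match(row["output"], row["expected"]) for row in dataset]
pass_rate = sum(r.passed for r in results) / len(results)
```

Returning both a score and a pass flag lets the same result feed a threshold check and a trend chart. An LLM-as-judge evaluator could return the same shape, with the score coming from a model call instead of string logic.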
Learn more about Evaluate

Observe in production

Add the SDK to your app and it captures a trace of every execution automatically, with no manual logging. View traces in your terminal locally, or open the Dashboard to search, filter, chart, and set alerts on them.
  • Distributed tracing built on OpenTelemetry, tracking inference spans, tool calls, and streaming
  • Sessions to group related traces across multi-turn conversations
  • Cost and token tracking across models and time periods
  • Alerts for latency spikes, cost thresholds, error rates, and quality drops
  • REST API for programmatic access to traces, scores, and metrics
  • MCP server (agentmark-mcp) that exposes the gateway as MCP tools. It works with both the local dev server and Cloud, and is what your IDE agent (Claude Code, Cursor) uses to query AgentMark headlessly
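
To make the sessions and cost-tracking items above concrete, here is a minimal sketch of how per-trace token counts could be rolled up by model or by session. Everything in it is an assumption for illustration: the trace record fields, the `PRICE_PER_1K` table (placeholder numbers, not real vendor pricing), and the `trace_cost` and `cost_by` helpers are hypothetical and do not reflect AgentMark's actual data model.

```python
from collections import defaultdict

# Placeholder prices in USD per 1,000 tokens; not real vendor pricing.
PRICE_PER_1K = {
    "model-a": {"input": 0.5, "output": 1.5},
    "model-b": {"input": 0.25, "output": 0.75},
}

# Hypothetical trace records, each tagged with the session it belongs to.
traces = [
    {"session_id": "s1", "model": "model-a", "input_tokens": 1000, "output_tokens": 2000},
    {"session_id": "s1", "model": "model-b", "input_tokens": 5000, "output_tokens": 1000},
    {"session_id": "s2", "model": "model-a", "input_tokens": 2000, "output_tokens": 0},
]


def trace_cost(trace: dict) -> float:
    """Cost of one trace from its token counts and the price table."""
    price = PRICE_PER_1K[trace["model"]]
    total = trace["input_tokens"] * price["input"] + trace["output_tokens"] * price["output"]
    return total / 1000


def cost_by(records: list[dict], key: str) -> dict[str, float]:
    """Sum trace costs grouped by a field such as 'model' or 'session_id'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += trace_cost(record)
    return dict(totals)


cost_per_model = cost_by(traces, "model")
cost_per_session = cost_by(traces, "session_id")
```

Grouping by `session_id` gives the cost of a whole multi-turn conversation, while grouping by `model` shows where spend concentrates. A cost-threshold alert would amount to comparing one of these totals against a limit.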
Learn more about Observe

Integrate with your stack

AgentMark works with the tools you already use.
  • TypeScript: Vercel AI SDK, Claude Agent SDK, Mastra
  • Python: Pydantic AI, Claude Agent SDK
  • Any framework via custom adapters and OpenTelemetry
Learn more about Integrations

Get started

Quickstart

Create your first prompt and see traces in under 5 minutes

Core Concepts

Organizations, apps, branches, and how they fit together

API Reference

Query traces, scores, and metrics via REST API

CLI Reference

Manage prompts, run evals, and query the API from your terminal

Have Questions?

We’re here to help! Choose the best way to reach us: