What is AgentMark?

AgentMark is a prompt engineering and LLM observability platform for teams building AI agents. It covers the full lifecycle: create prompts, run them against any model, trace every execution, evaluate quality, and monitor production.

Unlike most AI platforms, AgentMark doesn’t require a cloud account to get started. Your prompts live in your codebase as .prompt.mdx files, traces stay on your machine, and evaluations run from your terminal. AgentMark Cloud adds visual editing, rich trace exploration, team collaboration, and production monitoring when you want them.

Two ways to work

Local

Everything on your machine. Create prompts as files, run them via SDK or CLI, trace executions locally, run evaluations from your terminal. No account needed. No data leaves your environment.

Cloud

Visual tools and collaboration on top of your local workflow. A prompt editor, trace explorer, dashboards, alerts, annotations, and team management — accessible from any browser.

Most teams start local and add Cloud as they grow. Some stay local-only. Both are fully supported. See pricing for Cloud tier details.

The local workflow keeps working after you adopt Cloud. Your .prompt.mdx files, local traces, and agentmark dev stay unchanged: Cloud is additive.

What you can do

Build prompts

Create prompts as .prompt.mdx files in your editor, or use the visual editor in the Dashboard. Both produce the same format — you can switch between them freely.

TemplateDX syntax with variables, expressions, logic, and reusable components
Multiple output types: text, structured objects, images, and speech
Tools and function calling for agentic workflows
Version control built in — every change tracked with history and rollback

Learn more about Build

Evaluate quality

Run evaluators from code or CLI to score outputs automatically. Use the Dashboard for human annotations and shared experiment results.

Datasets for bulk testing against input/output pairs
Custom evaluators — numeric scores, pass/fail, classifications, LLM-as-judge
Experiments to compare prompt versions and track performance over time
Annotations for human-in-the-loop scoring and labeling
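
To make the evaluator types above concrete, here is a minimal sketch in plain Python of what a custom evaluator and a dataset run could look like. It does not use the AgentMark SDK: the names EvalResult, exact_match, token_overlap, and pass_rate, and the row keys "output" and "expected", are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalResult:
    """Hypothetical result shape covering the three scoring styles."""
    score: float  # numeric score in [0, 1]
    passed: bool  # pass/fail verdict
    label: str    # classification label


def exact_match(output: str, expected: str) -> EvalResult:
    """Pass/fail evaluator: case- and whitespace-insensitive equality."""
    ok = output.strip().lower() == expected.strip().lower()
    return EvalResult(1.0 if ok else 0.0, ok, "match" if ok else "mismatch")


def token_overlap(output: str, expected: str) -> EvalResult:
    """Numeric evaluator: fraction of expected tokens found in the output."""
    out_tokens = set(output.lower().split())
    exp_tokens = set(expected.lower().split())
    score = len(out_tokens & exp_tokens) / len(exp_tokens) if exp_tokens else 0.0
    label = "full" if score == 1.0 else "partial" if score > 0 else "none"
    return EvalResult(score, score >= 0.5, label)


Evaluator = Callable[[str, str], EvalResult]


def pass_rate(rows: List[Dict[str, str]], evaluator: Evaluator) -> float:
    """Run one evaluator over every dataset row and return the pass fraction."""
    results = [evaluator(row["output"], row["expected"]) for row in rows]
    return sum(r.passed for r in results) / len(results)


# A tiny in-memory dataset of model outputs paired with expected outputs.
dataset = [
    {"output": "Paris", "expected": "paris"},
    {"output": "Lyon", "expected": "Paris"},
]
```

Calling pass_rate(dataset, exact_match) on this dataset scores one row as a pass and one as a fail. An LLM-as-judge evaluator would keep the same function shape but ask a model for the score instead of computing it directly.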

Learn more about Evaluate

Observe in production

Instrument with the SDK to capture traces automatically. Explore them in your terminal (local) or in the Dashboard with filtering, search, dashboards, and alerts.

Distributed tracing built on OpenTelemetry: tracks inference spans, tool calls, and streaming
Sessions to group related traces across multi-turn conversations
Cost and token tracking across models and time periods
Alerts for latency spikes, cost thresholds, error rates, and quality drops
REST API for programmatic access to traces, scores, and metrics
The agentmark-mcp MCP server exposes the gateway as MCP tools. It works with both the local dev server and Cloud, and it is what your IDE agent (Claude Code, Cursor, …) uses to query AgentMark headlessly
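
As a rough illustration of how sessions and cost tracking relate, the sketch below groups trace records by session and totals their tokens and cost. It is plain Python and not AgentMark's data model or API: the TraceRecord fields, the model names, and the per-token prices are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceRecord:
    """Hypothetical trace shape; real trace fields may differ."""
    trace_id: str
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int


# Made-up prices in USD per one million (input, output) tokens.
PRICES = {"model-a": (1.00, 2.00), "model-b": (0.50, 1.50)}


def trace_cost(trace: TraceRecord) -> float:
    """Cost of a single trace from its token counts and the price table."""
    input_price, output_price = PRICES[trace.model]
    total = trace.input_tokens * input_price + trace.output_tokens * output_price
    return total / 1_000_000


def summarize_sessions(traces: List[TraceRecord]) -> Dict[str, dict]:
    """Group traces by session_id and total their count, tokens, and cost."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"traces": 0, "tokens": 0, "cost_usd": 0.0}
    )
    for trace in traces:
        entry = summary[trace.session_id]
        entry["traces"] += 1
        entry["tokens"] += trace.input_tokens + trace.output_tokens
        entry["cost_usd"] += trace_cost(trace)
    return dict(summary)


# Two turns of one conversation plus a trace from a separate session.
traces = [
    TraceRecord("t1", "s1", "model-a", 1000, 500),
    TraceRecord("t2", "s1", "model-b", 2000, 1000),
    TraceRecord("t3", "s2", "model-a", 100, 100),
]
sessions = summarize_sessions(traces)
```

Here sessions["s1"] aggregates both turns of the first conversation, which is the kind of per-session rollup that cost thresholds and alerts could be evaluated against.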

Learn more about Observe

Integrate with your stack

AgentMark works with the tools you already use.

TypeScript: Vercel AI SDK, Claude Agent SDK, Mastra
Python: Pydantic AI, Claude Agent SDK
Any framework via custom adapters and OpenTelemetry

Learn more about Integrations

Get started

Quickstart

Create your first prompt and see traces in under 5 minutes

Core Concepts

Organizations, apps, branches, and how they fit together

API Reference

Query traces, scores, and metrics via REST API

CLI Reference

Manage prompts, run evals, and query the API from your terminal

Have Questions?

We’re here to help! Choose the best way to reach us:

Email us at hello@agentmark.co for support
Schedule an Enterprise Demo to learn about our business solutions
