
What is AgentMark?

AgentMark is a prompt engineering and LLM observability platform for teams building AI agents. It covers the full lifecycle: create prompts, run them against any model, trace every execution, evaluate quality, and monitor production. Unlike most AI platforms, AgentMark doesn’t require a cloud account to get started. Your prompts live in your codebase as .prompt.mdx files, traces stay on your machine, and evaluations run from your terminal. The AgentMark Dashboard adds visual editing, rich trace exploration, team collaboration, and production monitoring — when you want it.

Two Ways to Work

Local

Everything on your machine. Create prompts as files, run them via SDK or CLI, trace executions locally, run evaluations from your terminal. No account needed. No data leaves your environment.

Cloud

Visual tools and collaboration on top of your local workflow. A prompt editor, trace explorer, dashboards, alerts, annotations, and team management — accessible from any browser.

Most teams start local and add Cloud as they grow; some stay local-only, and both are fully supported. Developers keep the same local experience either way. Adding Cloud is seamless: once you sync your app, your existing local workflow carries over unchanged.

What You Can Do

Build Prompts

Create prompts as .prompt.mdx files in your editor, or use the visual editor in the Dashboard. Both produce the same format — you can switch between them freely.
  • TemplateDX syntax with variables, expressions, logic, and reusable components
  • Multiple output types: text, structured objects, images, and speech
  • Tools and function calling for agentic workflows
  • Version control built in — every change tracked with history and rollback
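As a rough illustration, a prompt file pairs frontmatter configuration with a TemplateDX body. This is a sketch only; the frontmatter keys (`text_config`, `model_name`) and component names (`System`, `User`, `props`) are assumptions here, so treat the Build docs as the authoritative reference for the schema.

```mdx
---
name: customer-reply
text_config:
  model_name: gpt-4o
---

<System>
  You are a support agent. Keep answers under three sentences.
</System>

<User>
  {props.question}
</User>
```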
Learn more about Build

Evaluate Quality

Run evaluators from code or CLI to score outputs automatically. Use the Dashboard for human annotations and shared experiment results.
  • Datasets for bulk testing against input/output pairs
  • Custom evaluators — numeric scores, pass/fail, classifications, LLM-as-judge
  • Experiments to compare prompt versions and track performance over time
  • Annotations for human-in-the-loop scoring and labeling
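Conceptually, a custom evaluator is just a function from an output to a score. The sketch below is illustrative and is not AgentMark's actual evaluator API; the `Evaluation` shape and the pass/fail logic are assumptions for the example.

```typescript
// Illustrative only: a pass/fail evaluator that checks an LLM output
// stays within a length budget and mentions a required keyword.
// The Evaluation shape is an assumption, not AgentMark's real API.
type Evaluation = { name: string; score: number; passed: boolean };

function lengthAndKeywordEvaluator(
  output: string,
  keyword: string,
  maxChars: number,
): Evaluation {
  const withinBudget = output.length <= maxChars;
  const hasKeyword = output.toLowerCase().includes(keyword.toLowerCase());
  return {
    name: "length-and-keyword",
    // Partial credit: one point per satisfied check, normalized to [0, 1].
    score: (Number(withinBudget) + Number(hasKeyword)) / 2,
    passed: withinBudget && hasKeyword,
  };
}

// Example: score a short answer against the "refund" keyword.
const result = lengthAndKeywordEvaluator(
  "You can request a refund within 30 days.",
  "refund",
  200,
);
```

The same function shape extends naturally to numeric scores, classifications, or LLM-as-judge evaluators that call a model instead of running string checks.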
Learn more about Evaluate

Observe in Production

Instrument with the SDK to capture traces automatically. Explore them in your terminal (local) or in the Dashboard with filtering, search, dashboards, and alerts.
  • Distributed tracing built on OpenTelemetry — tracks inference spans, tool calls, and streaming responses
  • Sessions to group related traces across multi-turn conversations
  • Cost and token tracking across models and time periods
  • Alerts for latency spikes, cost thresholds, error rates, and quality drops
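Cost tracking across models reduces to aggregating per-span token usage against a price table. The following is a minimal sketch of that idea, not AgentMark's data model; the span fields and per-1K-token prices are made-up assumptions.

```typescript
// Illustrative only: summing cost across inference spans.
// Field names and prices are assumptions for this sketch.
interface InferenceSpan {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical per-1K-token prices, keyed by model name.
const pricePer1k: Record<string, { input: number; output: number }> = {
  "model-a": { input: 0.005, output: 0.015 },
};

function totalCost(spans: InferenceSpan[]): number {
  return spans.reduce((sum, s) => {
    // Unknown models contribute zero cost rather than throwing.
    const p = pricePer1k[s.model] ?? { input: 0, output: 0 };
    return (
      sum +
      (s.inputTokens / 1000) * p.input +
      (s.outputTokens / 1000) * p.output
    );
  }, 0);
}

const spans: InferenceSpan[] = [
  { model: "model-a", inputTokens: 1000, outputTokens: 2000 },
  { model: "model-a", inputTokens: 500, outputTokens: 500 },
];
const cost = totalCost(spans);
```

Grouping the same spans by session ID or time window gives the per-conversation and per-period views described above.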
Learn more about Observe

Integrate with Your Stack

AgentMark works with the tools you already use.
  • TypeScript: Vercel AI SDK, Claude Agent SDK, Mastra
  • Python: Pydantic AI, LlamaIndex
  • Any framework via custom adapters and OpenTelemetry
Learn more about Integrations

Get Started

Quickstart

Create your first prompt and see traces in under 5 minutes

Core Concepts

Organizations, apps, branches, and how they fit together

Have Questions?

We’re here to help! Choose the best way to reach us: