What is AgentMark?
AgentMark is a prompt engineering and LLM observability platform for teams building AI agents. It covers the full lifecycle: create prompts, run them against any model, trace every execution, evaluate quality, and monitor production. Unlike most AI platforms, AgentMark doesn't require a cloud account to get started. Your prompts live in your codebase as .prompt.mdx files, traces stay on your machine, and evaluations run from your terminal. The AgentMark Dashboard adds visual editing, rich trace exploration, team collaboration, and production monitoring, when you want it.
Two Ways to Work
Local
Everything on your machine. Create prompts as files, run them via SDK or CLI, trace executions locally, run evaluations from your terminal. No account needed. No data leaves your environment.
Cloud
Visual tools and collaboration on top of your local workflow. A prompt editor, trace explorer, dashboards, alerts, annotations, and team management — accessible from any browser.
What You Can Do
Build Prompts
Create prompts as .prompt.mdx files in your editor, or use the visual editor in the Dashboard. Both produce the same format, so you can switch between them freely.
- TemplateDX syntax with variables, expressions, logic, and reusable components
- Multiple output types: text, structured objects, images, and speech
- Tools and function calling for agentic workflows
- Version control built in — every change tracked with history and rollback
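The pieces above come together in a single file. Below is a minimal sketch of what a .prompt.mdx file can look like; the frontmatter fields, component names, and model identifier shown here are illustrative assumptions, not a definitive schema.

```mdx
---
name: summarize-ticket
text_config:
  model_name: your-model-id   # placeholder, not a real model identifier
---

<System>You are a support assistant. Summarize tickets in one sentence.</System>

<User>
  Summarize the following ticket: {props.ticket}
</User>
```

Because the file is plain text in your repo, every change flows through your normal version control workflow alongside the history and rollback the platform tracks.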
Evaluate Quality
Run evaluators from code or CLI to score outputs automatically. Use the Dashboard for human annotations and shared experiment results.
- Datasets for bulk testing against input/output pairs
- Custom evaluators — numeric scores, pass/fail, classifications, LLM-as-judge
- Experiments to compare prompt versions and track performance over time
- Annotations for human-in-the-loop scoring and labeling
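The evaluator shapes listed above (pass/fail and numeric scores) can be sketched as plain functions. The types and names below are illustrative only, not the AgentMark SDK's actual API.

```typescript
// Illustrative evaluator shapes -- names and types are hypothetical,
// not the AgentMark SDK's real interface.
type EvalResult = { score: number; label: "pass" | "fail" };

// A pass/fail evaluator: exact match against a dataset's expected output.
function exactMatch(expected: string, output: string): EvalResult {
  const pass = expected.trim() === output.trim();
  return { score: pass ? 1 : 0, label: pass ? "pass" : "fail" };
}

// A numeric evaluator: fraction of output tokens found in the expected text.
function tokenOverlap(expected: string, output: string): number {
  const exp = new Set(expected.toLowerCase().split(/\s+/));
  const out = output.toLowerCase().split(/\s+/);
  if (out.length === 0) return 0;
  const hits = out.filter((t) => exp.has(t)).length;
  return hits / out.length;
}

console.log(exactMatch("Paris", "Paris").label); // "pass"
console.log(tokenOverlap("the capital is Paris", "Paris is the capital"));
```

An LLM-as-judge evaluator follows the same shape: it calls a model instead of comparing strings, but still returns a score or label that datasets and experiments can aggregate.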
Observe in Production
Instrument with the SDK to capture traces automatically. Explore them in your terminal (local) or in the Dashboard with filtering, search, dashboards, and alerts.
- Distributed tracing built on OpenTelemetry that tracks inference spans, tool calls, and streaming
- Sessions to group related traces across multi-turn conversations
- Cost and token tracking across models and time periods
- Alerts for latency spikes, cost thresholds, error rates, and quality drops
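Cost tracking of the kind described above boils down to simple per-trace arithmetic over token usage. The sketch below illustrates the idea; the model names and per-million-token prices are placeholder assumptions, not real rates.

```typescript
// Sketch of cost-and-token accounting across models.
// Model names and prices are hypothetical placeholders.
type Usage = { model: string; inputTokens: number; outputTokens: number };

const pricePerMTok: Record<string, { input: number; output: number }> = {
  "model-a": { input: 3, output: 15 },     // hypothetical USD per 1M tokens
  "model-b": { input: 0.5, output: 1.5 },
};

// Cost of a single trace's inference span.
function traceCost(u: Usage): number {
  const p = pricePerMTok[u.model];
  if (!p) throw new Error(`unknown model: ${u.model}`);
  return (u.inputTokens * p.input + u.outputTokens * p.output) / 1_000_000;
}

// Aggregate over a set of traces, e.g. for a dashboard time window.
function totalCost(traces: Usage[]): number {
  return traces.reduce((sum, u) => sum + traceCost(u), 0);
}

const traces: Usage[] = [
  { model: "model-a", inputTokens: 1200, outputTokens: 400 },
  { model: "model-b", inputTokens: 5000, outputTokens: 2000 },
];
console.log(totalCost(traces).toFixed(4)); // "0.0151"
```

Alerts on cost thresholds are then just a comparison of this running total against a limit for the chosen window.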
Integrate with Your Stack
AgentMark works with the tools you already use.
- TypeScript: Vercel AI SDK, Claude Agent SDK, Mastra
- Python: Pydantic AI, LlamaIndex
- Any framework via custom adapters and OpenTelemetry
Get Started
Quickstart
Create your first prompt and see traces in under 5 minutes
Core Concepts
Organizations, apps, branches, and how they fit together
Have Questions?
We’re here to help! Choose the best way to reach us:
- Email us at hello@agentmark.co for support
- Schedule an Enterprise Demo to learn about our business solutions