2026-06-04
Custom Metadata in the Read API + String Filter Grammar

Features & Improvements

  • Custom metadata is now readable via the API: GET /v1/traces/{traceId} returns a trace-level metadata object plus per-span metadata; GET /v1/traces/{traceId}/spans/{spanId} and GET /v1/spans return span metadata alongside the I/O. Whatever you attach at ingestion (OTLP agentmark.metadata.* attributes or the SDK metadata option) is now retrievable programmatically and through the agentmark-mcp trace tools. Reserved internal namespaces (for example, graph.node.*) are excluded from the public object.
  • Filter by metadata.* over the API — the filter parameter on GET /v1/traces and GET /v1/spans now honors metadata.<key> predicates with exists, does not exist, =, and the string operators. See the filter grammar.
  • String filter grammar: filter is now a readable expression, for example filter=metadata.env = "prod" and status = ERROR. A malformed or unsupported filter returns a 400 with a clear message instead of being silently ignored.
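To illustrate the metadata mapping described above, here is a minimal TypeScript sketch. It assumes span attributes arrive as a flat key/value map and that reserved keys such as graph.node.* would otherwise appear under the agentmark.metadata. prefix. extractPublicMetadata and RESERVED_PREFIXES are hypothetical names for illustration, not part of the AgentMark SDK or API.

```typescript
// Hypothetical sketch: derive a public metadata object from flat OTLP-style attributes.
// Only the "agentmark.metadata." prefix comes from the changelog; everything else is assumed.
type AttributeValue = string | number | boolean;

const METADATA_PREFIX = "agentmark.metadata.";
// Assumed reserved namespaces; the changelog names graph.node.* as one example.
const RESERVED_PREFIXES = ["graph.node."];

function extractPublicMetadata(
  attributes: Record<string, AttributeValue>,
): Record<string, AttributeValue> {
  const metadata: Record<string, AttributeValue> = {};
  for (const [name, value] of Object.entries(attributes)) {
    if (!name.startsWith(METADATA_PREFIX)) continue; // not user-supplied metadata
    const key = name.slice(METADATA_PREFIX.length);
    if (key === "") continue; // ignore a bare prefix with no key
    if (RESERVED_PREFIXES.some((p) => key.startsWith(p))) continue; // internal namespace
    metadata[key] = value;
  }
  return metadata;
}
```

With this approach, an attribute such as agentmark.metadata.env = "prod" surfaces as metadata.env, which is the key shape the metadata.* filter predicates refer to.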

Breaking Changes

  • The filter parameter no longer accepts the previous JSON-array form; use the string grammar.
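Because filter is now a free-form string containing spaces, quotes, and "=", it must be URL-encoded when sent as a query parameter. The following is a minimal TypeScript sketch using the standard URL API. buildTracesUrl and the example base URL are assumptions, and authentication headers are omitted.

```typescript
// Hypothetical helper: build a GET /v1/traces URL carrying a string-grammar filter.
// Assumes baseUrl is an origin only (e.g. "https://example.com"); an absolute path
// passed to new URL() replaces any path already on the base.
function buildTracesUrl(baseUrl: string, filter: string): string {
  const url = new URL("/v1/traces", baseUrl);
  url.searchParams.set("filter", filter); // encodes spaces, quotes, and "=" safely
  return url.toString();
}

const filter = 'metadata.env = "prod" and status = ERROR';
const requestUrl = buildTracesUrl("https://example.com", filter);
```

URLSearchParams uses form encoding, so spaces become "+" and the double quotes become %22; the server sees the original expression after decoding.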
2026-04-15
Local/Cloud API Parity, New API Endpoints for Datasets, Experiments, Prompts, and Runs

Features & Improvements

  • Local/cloud API parity — The local dev server now supports the same API endpoints as the cloud gateway, so you can develop and test API integrations without a cloud account
  • New API endpoints for datasets, experiments, prompt logs, and runs — query evaluation data programmatically via REST API or agentmark api CLI
  • New capabilities endpoint to check which features a server supports, useful when writing code that targets both local and cloud environments
  • The agentmark api CLI command now supports datasets, experiments, prompts, runs, and capabilities resources
2026-04-11
REST API Reference, Score Analytics Widgets, PII Masking

Features & Improvements

  • New API Reference tab with full REST API documentation for the AgentMark Gateway
  • Interactive endpoint docs for traces, scoring, templates, and health checks generated from an OpenAPI 3.1 spec
  • Authentication guide covering API key setup, request examples in curl, TypeScript, and Python, plus rate limit and quota details
  • New agentmark api CLI command for querying the gateway API directly from your terminal — auto-generated from the OpenAPI spec, targets local dev server by default or cloud with --remote (both the CLI command and --remote were retired in CLI 0.13/0.14; use the agentmark-mcp MCP server or direct REST instead)
  • Score analytics widgets — four new widget types for custom dashboards: summary cards, score distribution histograms, trend-over-time charts, and cross-score comparison (confusion matrix or scatter plot). Start from the Score Analytics template or add individual widgets to any dashboard.
  • PII masking — redact sensitive data from traces before they leave your application. Available in both TypeScript and Python SDKs with a built-in masker for common patterns (email, phone, SSN, credit card, IP address), custom regex patterns, and zero-code environment variable suppression via AGENTMARK_HIDE_INPUTS / AGENTMARK_HIDE_OUTPUTS. Masking runs client-side with fail-closed behavior — no unmasked data ever reaches the network.
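To make the masking description concrete, here is a minimal TypeScript sketch of a regex masker with fail-closed behavior. The patterns, labels, and the maskText/safeMask names are illustrative assumptions, not the AgentMark SDK's built-in masker or its API. The simplified regexes will also miss some real-world formats.

```typescript
// Illustrative built-in patterns; order matters (SSN is checked before phone numbers).
const BUILT_IN_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["SSN", /\b\d{3}-\d{2}-\d{4}\b/g],
  ["CREDIT_CARD", /\b\d(?:[ -]?\d){12,15}\b/g], // 13 to 16 digits, optional separators
  ["IP", /\b(?:\d{1,3}\.){3}\d{1,3}\b/g],
  ["PHONE", /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g],
];

// Replace each match with a [LABEL] placeholder; custom patterns run after built-ins.
function maskText(text: string, custom: Array<[string, RegExp]> = []): string {
  let out = text;
  for (const [label, pattern] of [...BUILT_IN_PATTERNS, ...custom]) {
    out = out.replace(pattern, `[${label}]`);
  }
  return out;
}

// Fail closed: if the masker throws, drop the payload instead of sending it unmasked.
function safeMask(text: string, masker: (s: string) => string = maskText): string {
  try {
    return masker(text);
  } catch {
    return "[REDACTED]";
  }
}
```

Running the masker before export, and replacing the whole value on error, is what keeps unmasked text from ever reaching the network in this sketch.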
2026-04-10
Prompt Playground with Multi-Variant Comparison Mode

Features & Improvements

  • New Playground for comparing prompts across multiple models and configurations side-by-side
  • Run up to 6 variants simultaneously with independent model, temperature, and prompt override settings
  • Per-variant prompt editing with a “Modified” badge to highlight customized text
  • Output metadata chips showing model name, latency, token usage, and finish reason for each variant
  • One-click Apply to write the winning variant’s configuration back to the editor
  • Responsive 3-column grid layout that scales from 2 to 6 variants
2026-04-10
Semantic Span Kinds, Observe Function, Span Kind Filters and Analytics

Features & Improvements

  • Expanded SpanKind enum from 3 to 7 values — added agent, retrieval, embedding, and guardrail span kinds for richer trace categorization
  • New observe() function for wrapping functions with automatic input/output capture and span kind tagging (available in both TypeScript and Python SDKs)
  • New “Span Kind” filter in the trace list — filter traces by function, llm, tool, agent, retrieval, embedding, or guardrail
  • New span kind breakdown analytics endpoint — view cost, latency, and token usage grouped by span type
  • Graph view now renders embedding and guardrail nodes with dedicated icons and colors
  • Cross-platform compatibility — the SDK now sets both agentmark.span.kind and openinference.span.kind attributes for portability with Langfuse and Phoenix
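The observe() entry above describes wrapping a function so its inputs, outputs, and span kind are captured automatically. The following TypeScript sketch shows that wrapping pattern under stated assumptions. observeSketch, RecordedSpan, and the in-memory recorded array are hypothetical stand-ins for a real tracer, not the SDK's observe() signature. The sketch also sets only agentmark.span.kind, with an assumed lowercase value; the SDK additionally sets openinference.span.kind, whose value mapping is not covered here.

```typescript
// The seven span kinds listed in this release.
type SpanKind = "function" | "llm" | "tool" | "agent" | "retrieval" | "embedding" | "guardrail";

interface RecordedSpan {
  name: string;
  attributes: Record<string, string>;
  input: unknown[];
  output?: unknown;
  error?: string;
  durationMs?: number;
}

// In-memory sink standing in for a real tracer/exporter (assumption for illustration).
const recorded: RecordedSpan[] = [];

// Wrap an async function so every call records its arguments, result or error, and kind.
function observeSketch<A extends unknown[], R>(
  name: string,
  kind: SpanKind,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    const span: RecordedSpan = {
      name,
      attributes: { "agentmark.span.kind": kind }, // assumed value format
      input: args,
    };
    try {
      const result = await fn(...args);
      span.output = result;
      return result;
    } catch (err) {
      span.error = err instanceof Error ? err.message : String(err);
      throw err; // record the failure but do not swallow it
    } finally {
      span.durationMs = Date.now() - start;
      recorded.push(span);
    }
  };
}
```

Recording in finally means a span is emitted for both successful and failing calls, which is what lets error spans appear alongside their inputs.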

Bug Fixes

  • Embedding spans are now correctly tracked for cost analytics (previously excluded)
2026-03-09
Custom Roles & Permissions, Trace Search, Custom Dashboards, Billing Tiers, and More

Features & Improvements

  • Custom roles & granular permissions — create custom roles with fine-grained permission controls, prerequisite enforcement, and app-level role assignments
  • Permission-aware navigation — UI elements and pages automatically show/hide based on user permissions
  • Entitlement-gated permissions — permissions are automatically adjusted when billing plans change
  • Trace search & saved filters — search trace inputs/outputs, filter by span attributes, and save filter presets for quick access
  • Saved views with aggregate groupBy on the requests page
  • Custom analytics dashboards — build personalized dashboards with configurable widgets
  • Eval score chips on trace tree span nodes for at-a-glance evaluation results
  • Rich hover tooltips on graph view nodes with metadata display
  • Trace I/O aggregation — generation span inputs/outputs surfaced at the trace root level
  • Streamlined publish UI — single-button publish flow with async confirmation dialog
  • Tiered billing plans — Team plan card, tier-aware checkout, and per-tier pricing
  • Tiered span limits and query-time data retention enforcement
  • Tiered feature gating system for plan-based access control
  • Alert commit SHA tracking — see which deployment triggered an alert
  • Automated model configuration sync — new models are auto-configured from the registry
  • Auto-generated editor schemas from the model registry
  • XML tag passthrough — lowercase XML tags now pass through in prompt content
  • Webhook secret masking for improved security
  • SSO SAML attribute mapping for user display names
  • Span usage notifications — improved delivery reliability
  • Billing confirmation dialog before plan upgrades
  • Trace tunnel — agentmark dev for one-step local-to-platform trace forwarding

Bug Fixes

  • Fixed experiments page crash when the analytics backend is unavailable
  • Fixed trace view I/O tab not showing data for root traces
  • Fixed trace view tooltip crash on large JSON payloads
  • Fixed invite flow not persisting correctly for custom roles
  • Fixed profile visibility — tenant members can now view all profiles
  • Fixed MCP trace connection issues
  • Fixed deployment trigger button permission gating
  • Fixed score fetch errors silently failing in trace view
2026-02-04
Terms of Service, Admin Control Panel, Profile Updates, MCP API, and More

Features & Improvements

  • Terms of Service agreement — users are now prompted to accept terms before using the platform
  • Non-blocking terms update policy — users are notified of terms changes without being blocked
  • Streamlined registration — skip company setup during onboarding for faster account creation
  • Admin Control Panel — platform admins can manage users, roles, and organization settings
  • Profile email updates — update your profile email directly from account settings
  • Email sync — profile email now stays in sync when your authentication email changes
  • MCP Server API endpoints — programmatic access to the platform via MCP
  • Platform admin changelog notifications — send changelog updates to users via email
  • Annotations improvements and reliability fixes
  • Organization loading performance improvements

Bug Fixes

  • Fixed OAuth registration flow
  • Fixed image prompt timeout for large image generation
  • Strengthened password validation with improved error messaging
  • Fixed request filters not working correctly
  • Fixed dashboard display issues
  • Fixed git sync and user invite flows
  • Fixed dataset run execution
  • Fixed GitLab commit history display
  • Fixed template frontmatter handling
2025-12-31
Multi-Tenant Membership, GitLab Integration, Tracing Revamp

Features & Improvements

  • Multi-tenant membership — users can now belong to multiple organizations and switch between them
  • GitLab integration — connect GitLab repositories alongside GitHub for prompt management
  • Tracing revamp — improved observability UI with better data presentation

Bug Fixes

  • Fixed trace span selection unexpectedly jumping to top span
  • Fixed edit prompt processing for ‘done’ message type
  • Fixed annotations not appearing after being added
2025-10-17
Annotations, Interactive Trace Graph, MCP Documentation

Features & Improvements

  • Annotations — add notes and labels to trace scores for better evaluation tracking
  • Interactive trace graph — visualize trace execution flow with resizable panels
  • MCP integration documentation — guides and examples for using AgentMark as an MCP server

Bug Fixes

  • Fixed aspect ratio issue that could break the app layout
  • Fixed GitHub username not being saved on first OAuth login
  • Fixed webhook delivery reliability
2025-08-18
Evaluation Support, Dataset V2, Alerts, and More

Features & Improvements

  • Evaluation support — evaluate prompts and dataset outputs with scoring
  • Eval-based alerts — get notified when evaluations detect issues
  • Dataset V2 — improved dataset management with file loading support
  • Enhanced dataset metrics and evaluation charts for better insights
  • Automated pricing updates — model pricing stays current automatically
  • Build failure notifications — project owners are notified when builds fail
  • Improved permission checks for server actions

Bug Fixes

  • Fixed OAuth registration login and redirect issues
  • Fixed incorrect trace count when using filters
  • Improved database error messages with readable descriptions
2025-07-07
AgentMark init, CLI, Auth, Webhook, Streaming, Rollbacks, and more

Features & Improvements

  • AgentMark init
  • Updated examples in CLI init
  • Rebrand Puzzlet -> AgentMark
  • CLI: “run-prompt” for dataset + single props
  • Webhook Helpers
  • Alerts enhancements
  • Google/GitHub auth
  • Dataset runs directly via prompts
  • Vercel v4 webhook helper
  • Streaming to the platform
  • Commit History + Rollbacks

Bug Fixes

  • Ollama fix on init
2025-06-08
Attachments, Dataset references, MCP Server, Editor UI, and more

Features & Improvements

  • File/Image Attachments
  • Dataset references in prompts
  • MCP Server on init
  • Popout editor to prompt input
  • Improved datasets UI in editor

Bug Fixes

  • Disabled the publish button for read-only users
2025-05-09
AgentMark v3, Image/Speech Models, Vercel Adapter, Type-Safety, and more

Features & Improvements

  • AgentMark v3
  • Support Image + Speech Models
  • Vercel AI Adapter
  • Type-Safety for tools
  • Internal: Feature flag support
  • Commit messages

Bug Fixes

  • General bug fixes
2025-04-09
JSONL Datasets, Evals/Scoring, and more

JSONL Datasets

Datasets are now supported with JSONL files. This allows you to test your prompts in bulk against large datasets, and supports streaming.
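JSONL stores one JSON object per line, which is what makes row-by-row streaming possible. Here is a minimal TypeScript sketch of line-at-a-time parsing, assuming each row is a plain JSON object. The actual AgentMark dataset fields are not described here, so the row type is left generic, and parseJsonl is a hypothetical helper rather than an AgentMark API.

```typescript
// Parse one JSONL line, reporting the 1-based line number on failure.
function parseLine<T>(line: string, lineNumber: number): T {
  try {
    return JSON.parse(line) as T;
  } catch {
    throw new Error(`Invalid JSON on line ${lineNumber}`);
  }
}

// Lazily yield one parsed record per non-blank line.
function* parseJsonl<T = unknown>(text: string): Generator<T> {
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue; // tolerate blank lines and a trailing newline
    yield parseLine<T>(line, i + 1);
  }
}
```

Because parseJsonl is a generator, rows are parsed only as the consumer pulls them. For files too large to hold in memory, the same per-line logic could be driven by a line stream such as Node's readline module.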

Evals & Scoring

AgentMark rolled out its initial evals support. Evals allow you to evaluate your prompts against a set of data and get a score. More to come here soon.

Other

  • Consolidated prompts, evals, and datasets into single “files”
  • Officially rolled out alerts
  • Some CLI improvements
  • Minor bug fixes
2025-03-12
Sessions, Alerts, Trace UI Improvements, Onboarding Improvements

Sessions

Sessions provide a way to group related traces together, making it easier to monitor and debug complex workflows in your LLM applications. By organizing traces into sessions, you can track the entire lifecycle of a user interaction or a multi-step process.
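Here is a minimal TypeScript sketch of the grouping idea, assuming each trace carries an optional session identifier and a start timestamp. TraceSummary, sessionId, and groupBySession are hypothetical names chosen for illustration; this is not how the AgentMark platform stores or groups sessions.

```typescript
// Hypothetical trace shape for illustration only.
interface TraceSummary {
  traceId: string;
  sessionId?: string;
  startTime: number; // epoch milliseconds
}

// Group traces that share a session ID, ordering each session chronologically.
function groupBySession(traces: TraceSummary[]): Map<string, TraceSummary[]> {
  const sessions = new Map<string, TraceSummary[]>();
  for (const trace of traces) {
    if (!trace.sessionId) continue; // traces without a session stay ungrouped
    const group = sessions.get(trace.sessionId) ?? [];
    group.push(trace);
    sessions.set(trace.sessionId, group);
  }
  // Sorting by start time lets a session be read as the lifecycle of one interaction.
  sessions.forEach((group) => group.sort((a, b) => a.startTime - b.startTime));
  return sessions;
}
```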

Alerts

You can now get notified when your application is experiencing increased errors, latency, or costs. Configure alerts to notify you via Slack or a webhook.

Traces UI Improvements

Traces now have a more user-friendly UI, with a focus on providing important information at a glance.

Onboarding Improvements

AgentMark improved its onboarding. Now, you can see your dashboard without having to sync your repo first. AgentMark also supports modular onboarding, so you can skip steps you don’t need.
2025-02-18
Add Trace Examples to Datasets, Load Trace in Prompt, Re-indexing, App UI Improvements, Bug Fixes

Adding Examples to Datasets

You can now add production trace data to your datasets with a single click.

Adding Examples to Prompts

You can now add production trace examples to your prompts. This allows you to iterate on and test your prompts against real data.

Re-indexing

You can now re-index your prompts and datasets. This performs a fresh pull of the content from your synced repository.

App UI Improvements

You can now easily view your app’s repo configuration, including repo names, branch, and more.
2025-01-27
Type Safety, Datasets, and more

Type Safety

AgentMark aims to provide developers with the best developer experience possible. As part of this, type safety has just been added to the platform.
  • Types can now be generated via the CLI
  • Fetching prompts from the CDN or AgentMark is now type-safe
  • Prompts now support run/compile/deserialize functions

Datasets

Datasets now allow you to test your prompts in bulk against a large set of data.
  • Run your datasets in bulk against your prompts
  • View previous runs, with inputs/outputs
  • View traces associated with each run
  • View high-level metrics for each run

Trace Grouping

Traces can now be grouped using the trace and component functions. The trace function creates a group at the root level, while the component function creates sub-groups within it.
  • New function added: trace
  • New function added: component

CLI Improvements

The CLI has been improved to provide a better developer experience.
  • AgentMark init can optionally create an example app
  • Added pull-models to walk through adding new models to your platform

Bug Fixes

  • Fixed a bug which could cause an app’s templates to be deleted when a new app was created
  • Fixed a bug which could cause some branches not to show up in the UI
  • Fixed a bug which could prevent newly created local prompts from being synced to the platform

Other

  • Improved UI for prompts input/output
  • Added pagination for traces
  • Improved UI theme for prompts
2025-01-16
Prompt Management, Observability, Datasets, CLI, Platform Management, and Evals

Overview

AgentMark is a git-based Prompt Engineering Platform where application developers and prompt engineers collaborate on GenAI products. AgentMark enables application developers to manage their configuration, prompts, datasets, and evals in a git-based workflow while also providing a hosted platform for collaboration with non-technical team members.

Features

  • Prompt Management
  • Observability
  • Datasets
  • CLI
  • Platform Management
  • Evals

Prompt Management

AgentMark takes a developer-first approach to prompt management, treating prompts as files that live in your repository while still providing a platform for non-technical team members. All prompts are saved in AgentMark, a markdown-based format that’s easy to write and read.

Observability

AgentMark builds on top of OpenTelemetry for collecting telemetry data from your prompts. This helps you monitor, debug, and optimize your LLM applications in production. AgentMark provides traces, logs, metrics, and more.

Datasets

Create datasets to easily test your prompts in bulk against a large set of data.

CLI

AgentMark provides a CLI for initializing your AgentMark app, customizing it, and deploying it to the cloud. Add new models to your platform with a single command. You can also develop with AgentMark locally using the serve command.
npx create-agentmark

Platform Management

AgentMark offers an intuitive platform for creating new git-synced apps, adding team members with roles, and setting up API keys for users.

AgentMark SDK

AgentMark’s SDK is simple and easy to use. It offers features like single-line-of-code observability, secure prompt fetching from the CDN, and more.
2025-01-03
Initial AgentMark Release

Features

  • Initial release of AgentMark
  • Support for OpenAI, Anthropic, and other LLM providers
  • MDX-based prompt templating
  • Type-safe prompt development
  • Tools and agents support

Documentation

  • Added comprehensive documentation
  • Included examples and guides
  • API reference documentation