Core concepts - AgentMark Docs

The diagram shows AgentMark’s hierarchy. An Organization (yellow) holds multiple Apps (green). Each App runs its own Environments (purple), typically dev, staging, and prod, and each environment keeps its own snapshot of that app’s resources (blue): Prompts, Datasets, Traces, and Evals. Here staging and prod each carry their own resource set, while dev tracks your branch’s latest commit (HEAD) live. Every app follows the same shape. Promoting staging → prod copies a snapshot forward, and environments stay isolated: a prod key can’t read staging’s traces, templates, or datasets.

Organizations

Each organization typically represents a single company and has its own billing configuration. An organization can have multiple users, each with one of the following roles: Owner, Admin, Write, or Read. See Users and access control for what each role can do. An organization often contains multiple apps.

Apps

An organization can contain many apps. You can sync each app to a Git repository. GitHub is available on all plans; GitLab connections require a Growth plan or higher. Apps stay isolated from each other, and each contains its own prompt templates, traces, metrics, and API keys. Use separate apps for unrelated projects or products; to run one app across dev, staging, and prod, use environments instead.

Branches

Each app maps to a default branch in its connected Git repository. AgentMark reads your prompt templates, datasets, and configuration files from this branch. You can work on additional branches for previews, staging, or review workflows, and AgentMark syncs each one independently.

Environments

An environment is an isolated runtime that serves one specific version of your prompts and code. A typical app uses the following environments:

dev tracks your connected branch’s HEAD live, so every push deploys instantly.
staging and prod (which you create) run pinned, immutable snapshots that change only when you promote a tested version into them.

Because you promote prod from a version you already validated in staging, pushing a fix to dev never silently changes production. Learn more about Environments and promotions.
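The difference between a live dev environment and pinned snapshots can be sketched with a small model. This is an illustration only, not AgentMark's implementation or API: the `App` class, its `push`, `version`, and `promote` methods, and the commit IDs are all hypothetical, and promoting from dev into staging is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical model of one app's environments (not the AgentMark API)."""

    head: str  # latest commit on the connected branch
    pinned: dict[str, str] = field(default_factory=dict)  # env name -> pinned commit

    def push(self, commit: str) -> None:
        # A push moves HEAD; only dev follows it.
        self.head = commit

    def version(self, env: str) -> str:
        # dev always serves HEAD; every other environment serves its pinned snapshot.
        if env == "dev":
            return self.head
        return self.pinned[env]

    def promote(self, source: str, target: str) -> None:
        # Copy the version currently served by `source` into `target`.
        if target == "dev":
            # Assumption of this sketch: dev is never a promotion target.
            raise ValueError("dev tracks HEAD and cannot be a promotion target")
        self.pinned[target] = self.version(source)


app = App(head="a1")
app.promote("dev", "staging")   # staging now pins a1
app.promote("staging", "prod")  # prod now pins a1
app.push("b2")                  # a later push changes dev only

print(app.version("dev"), app.version("staging"), app.version("prod"))
```

In this model, staging and prod keep serving the promoted commit after the later push, which is the isolation property described above.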

Prompts

You define prompts in .prompt.mdx files, AgentMark’s file format that bundles prompt content, reusable components, and associated evals into a single versioned artifact. Fetch them from your Git repository, or from AgentMark’s secure CDN to iterate on prompts separately from your application code. Learn more about Build.

Traces

Traces capture every step from input to output. Each individual step is a span. For example, a prompt chain with 3 tool calls produces one trace containing multiple spans. Learn more about Traces.
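The trace and span relationship can be pictured as a simple data structure. The sketch below is a hypothetical model, not AgentMark's trace schema: the `Trace` and `Span` classes, the span names, and the durations are invented for illustration, and summing durations assumes the spans run one after another.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step inside a trace (hypothetical fields)."""

    name: str
    duration_ms: float


@dataclass
class Trace:
    """One end-to-end run, from input to output (hypothetical fields)."""

    trace_id: str
    spans: list[Span] = field(default_factory=list)

    def total_duration_ms(self) -> float:
        # Assumes sequential spans; overlapping spans would need start/end times.
        return sum(span.duration_ms for span in self.spans)


# A run with one model call and three tool calls: one trace, four spans.
trace = Trace(
    trace_id="t-1",
    spans=[
        Span("llm_call", 120.0),
        Span("tool:search", 40.0),
        Span("tool:lookup", 25.0),
        Span("tool:format", 15.0),
    ],
)

print(len(trace.spans), trace.total_duration_ms())
```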

Datasets

Datasets are collections of data you use to test prompts in bulk. Create datasets from your own data, public datasets, traces you’ve already captured in AgentMark, synthetic data, or manual entry. Learn more about Datasets.

Metrics

Metrics show you at a high level how users interact with your application: cost, latency, model usage, active users, and more. Filter metrics by time period, model, or other dimensions to drill in. Learn more about Dashboards.

Evals

Evals are functions you register on your AgentMark client and reference by name from a prompt’s test_settings.evals; they automatically grade the outputs of your prompts. Run them locally via the CLI or SDK, or in AgentMark Cloud. Use evals to catch quality regressions before deploying to production. Learn more about Evals.
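The register-then-reference-by-name pattern can be sketched in plain Python. This is not the AgentMark SDK: the `EVALS` registry, the `register_eval` decorator, the `run_evals` helper, and the result shape are all hypothetical stand-ins chosen to show the idea.

```python
from typing import Callable

# Hypothetical registry mapping eval names to grading functions.
EvalFn = Callable[[str, str], dict]
EVALS: dict[str, EvalFn] = {}


def register_eval(name: str) -> Callable[[EvalFn], EvalFn]:
    """Register a grading function under a name (illustrative, not the SDK)."""

    def decorator(fn: EvalFn) -> EvalFn:
        EVALS[name] = fn
        return fn

    return decorator


@register_eval("exact_match")
def exact_match(output: str, expected: str) -> dict:
    # Pass only when the output equals the expected text, ignoring outer whitespace.
    passed = output.strip() == expected.strip()
    return {"passed": passed, "score": 1.0 if passed else 0.0}


def run_evals(names: list[str], output: str, expected: str) -> dict[str, dict]:
    # Look each eval up by name, the way a prompt would list eval names.
    return {name: EVALS[name](output, expected) for name in names}


results = run_evals(["exact_match"], " 42\n", "42")
print(results)
```

Keeping evals as named functions means the same grader can be reused across prompts; each prompt only needs to list the names it wants.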

Sessions

Sessions group related traces to represent multi-turn conversations or workflows. For example, AgentMark tracks a chat conversation with multiple back-and-forth exchanges as a single session containing multiple traces. Learn more about Sessions.
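Grouping traces into sessions amounts to bucketing them by a shared identifier. The sketch below assumes each trace record carries a `session_id` field; that field name, the record shape, and the `group_by_session` helper are hypothetical and not taken from AgentMark.

```python
from collections import defaultdict


def group_by_session(traces: list[dict]) -> dict[str, list[str]]:
    """Bucket trace IDs by session ID, preserving the order they arrived in."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for trace in traces:
        sessions[trace["session_id"]].append(trace["trace_id"])
    return dict(sessions)


# Three chat turns in one conversation, plus one unrelated trace.
traces = [
    {"trace_id": "t-1", "session_id": "chat-42"},
    {"trace_id": "t-2", "session_id": "chat-42"},
    {"trace_id": "t-3", "session_id": "chat-7"},
    {"trace_id": "t-4", "session_id": "chat-42"},
]

sessions = group_by_session(traces)
print(sessions)
```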

Alerts

Alerts notify you when your application crosses important thresholds. Configure alerts for cost limits, latency spikes, error rates, and quality metrics to catch issues before they impact users. Learn more about Alerts.
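Conceptually, an alert compares an observed metric against a configured limit. The following is a minimal sketch of that comparison, not how AgentMark evaluates alerts: the metric names, the threshold values, and the `breached` helper are all hypothetical.

```python
def breached(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of metrics that exceed their configured threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        # A metric with no observation is treated as 0.0 in this sketch.
        if metrics.get(name, 0.0) > limit
    )


thresholds = {"cost_usd": 10.0, "p95_latency_ms": 1000.0, "error_rate": 0.05}
observed = {"cost_usd": 12.5, "p95_latency_ms": 800.0, "error_rate": 0.01}

alerts = breached(observed, thresholds)
print(alerts)
```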

Annotations

Annotations provide a human-in-the-loop quality assessment workflow. Team members manually label and review trace outputs to build ground-truth datasets and ensure prompt quality. Learn more about Annotations.

Next steps

Ready to see these pieces in practice? Start with the Quickstart.

Have questions?

Reach out any time:

Email the team at hello@agentmark.co for support
Schedule an Enterprise Demo to learn about AgentMark’s business solutions
