The diagram shows AgentMark’s three-level hierarchy. An Organization (yellow) contains multiple Apps (green), and each App owns its own set of resources (blue): Prompts, Traces/Logs, Datasets, and Evals. Resources are isolated per app — a prompt in App 1 is not visible to App 2.
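The hierarchy can be pictured as plain data. This minimal TypeScript sketch makes the isolation rule concrete. Every type and function name in it (`Organization`, `App`, `findResource`) is hypothetical and not part of the AgentMark SDK. The only point it illustrates is that resource lookups are scoped to a single app, so App 2 cannot resolve a prompt owned by App 1.

```typescript
// Hypothetical model of the Organization -> App -> resource hierarchy.
// These names are illustrative only, not AgentMark SDK types.

type ResourceKind = "prompt" | "trace" | "dataset" | "eval";

interface Resource {
  kind: ResourceKind;
  name: string;
}

interface App {
  id: string;
  resources: Resource[]; // each resource belongs to exactly one app
}

interface Organization {
  id: string;
  apps: App[];
}

// Look up a resource by kind and name, scoped to a single app.
// Returns undefined when the app is unknown or the resource lives in another app.
function findResource(
  org: Organization,
  appId: string,
  kind: ResourceKind,
  name: string,
): Resource | undefined {
  const app = org.apps.find((a) => a.id === appId);
  return app?.resources.find((r) => r.kind === kind && r.name === name);
}

const org: Organization = {
  id: "acme",
  apps: [
    { id: "app-1", resources: [{ kind: "prompt", name: "greeting" }] },
    { id: "app-2", resources: [] },
  ],
};

console.log(findResource(org, "app-1", "prompt", "greeting")?.name); // greeting
console.log(findResource(org, "app-2", "prompt", "greeting")); // undefined
```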

Organizations

An organization typically represents a single company and has its own billing configuration. An organization can have multiple users, each assigned one of four roles: Owner, Admin, Write, or Read. See Users and access control for what each role can do. An organization often contains multiple apps.

Apps

An organization can contain many apps. Each app can be synced to a Git repository (GitHub or GitLab). Apps are isolated from each other: each app contains its own prompt templates, traces, metrics, and API keys. Use separate apps to keep environments such as development, staging, and production apart.

Branches

Each app is backed by a default branch in its connected Git repository. AgentMark reads your prompt templates, datasets, and configuration files from this branch. You can work on additional branches — for previews, staging, or review workflows — and AgentMark syncs each one independently.

Prompts

Prompts are defined in .prompt.mdx files, AgentMark's serialized format that bundles prompt content, reusable components, and associated evals into a single versioned artifact. Load them from your Git repository, or fetch them from AgentMark's secure CDN so you can iterate on prompts separately from your application code. Learn more about Build.

Traces

Traces capture every step of a request, from input to output. Each individual step is recorded as a span. For example, a prompt chain that makes 3 tool calls produces a single trace containing multiple spans. Learn more about Traces.
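One way to picture a trace is as a list of spans that share one trace ID, with each span pointing to its parent. The sketch below models the prompt-chain example with three tool calls. The `Span` fields and the `buildChainTrace` helper are illustrative assumptions, not AgentMark's trace schema, and the durations are made up.

```typescript
// Hypothetical span shape; not AgentMark's actual trace schema.
interface Span {
  traceId: string;
  spanId: string;
  parentId?: string; // undefined for the root span
  name: string;
  durationMs: number;
}

// Build one trace for a prompt chain that makes several tool calls.
function buildChainTrace(traceId: string, toolNames: string[]): Span[] {
  const root: Span = { traceId, spanId: "root", name: "prompt-chain", durationMs: 0 };
  const toolSpans: Span[] = toolNames.map((tool, i) => ({
    traceId,
    spanId: `tool-${i}`,
    parentId: root.spanId,
    name: `tool:${tool}`,
    durationMs: 100 * (i + 1), // made-up durations for illustration
  }));
  // Assume the tool calls run sequentially, so the root spans their total time.
  root.durationMs = toolSpans.reduce((sum, s) => sum + s.durationMs, 0);
  return [root, ...toolSpans];
}

const spans = buildChainTrace("trace-1", ["search", "fetch", "summarize"]);
console.log(spans.length); // 4
console.log(new Set(spans.map((s) => s.traceId)).size); // 1
```

The result is one root span plus three child spans, all belonging to a single trace.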

Datasets

Datasets are collections of data you use to test prompts in bulk. Create datasets from your own data, public datasets, traces you’ve already captured in AgentMark, synthetic data, or manual entry. Learn more about Datasets.
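As a rough illustration of testing a prompt in bulk, the sketch below parses a dataset stored as JSON Lines and runs a stand-in prompt function over each row. The JSON Lines storage and the `input` / `expected_output` field names are assumptions made for this example. `runPrompt` is a hypothetical placeholder for a real model call.

```typescript
// Hypothetical row shape: the field names are an assumption for illustration.
interface DatasetRow {
  input: Record<string, unknown>;
  expected_output?: string;
}

interface RowResult {
  row: DatasetRow;
  output: string;
}

// Parse JSON Lines text (one JSON object per non-empty line) into rows.
function parseJsonl(text: string): DatasetRow[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as DatasetRow);
}

// Run a prompt over every row and keep each output next to the row it came from.
function runDataset(
  rows: DatasetRow[],
  runPrompt: (input: Record<string, unknown>) => string,
): RowResult[] {
  return rows.map((row) => ({ row, output: runPrompt(row.input) }));
}

const rows = parseJsonl(`
{"input": {"name": "Ada"}, "expected_output": "Hello, Ada!"}
{"input": {"name": "Linus"}, "expected_output": "Hello, Linus!"}
`);

// Stand-in for a real model call so the sketch runs offline.
const results = runDataset(rows, (input) => `Hello, ${String(input["name"])}!`);
console.log(results.filter((r) => r.output === r.row.expected_output).length); // 2
```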

Metrics

Metrics give you a high-level view of how users interact with your application: cost, latency, model usage, active users, and more. Filter metrics by time period, model, or other dimensions to drill down. Learn more about Dashboards.
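Conceptually, a metrics view aggregates per-trace records, with optional filters applied first. This hypothetical sketch computes total cost, average latency, and active users, filtered by model and a time window. The `TraceRecord` shape and the `model-a` / `model-b` names are placeholders, not AgentMark's metrics API.

```typescript
// Hypothetical per-trace record; field names are illustrative.
interface TraceRecord {
  model: string;
  timestamp: number; // ms since epoch
  costUsd: number;
  latencyMs: number;
  userId: string;
}

interface MetricFilter {
  model?: string;
  from?: number; // inclusive lower bound on timestamp
  to?: number; // exclusive upper bound on timestamp
}

interface MetricSummary {
  totalCostUsd: number;
  avgLatencyMs: number;
  activeUsers: number;
  count: number;
}

// Apply the filter, then aggregate cost, latency, and distinct users.
function summarize(records: TraceRecord[], filter: MetricFilter = {}): MetricSummary {
  const selected = records.filter(
    (r) =>
      (filter.model === undefined || r.model === filter.model) &&
      (filter.from === undefined || r.timestamp >= filter.from) &&
      (filter.to === undefined || r.timestamp < filter.to),
  );
  const totalCostUsd = selected.reduce((sum, r) => sum + r.costUsd, 0);
  const totalLatency = selected.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    totalCostUsd,
    avgLatencyMs: selected.length > 0 ? totalLatency / selected.length : 0,
    activeUsers: new Set(selected.map((r) => r.userId)).size,
    count: selected.length,
  };
}

const records: TraceRecord[] = [
  { model: "model-a", timestamp: 1000, costUsd: 0.5, latencyMs: 200, userId: "u1" },
  { model: "model-a", timestamp: 2000, costUsd: 0.25, latencyMs: 400, userId: "u2" },
  { model: "model-b", timestamp: 3000, costUsd: 1, latencyMs: 100, userId: "u1" },
];

const byModel = summarize(records, { model: "model-a" });
console.log(byModel.totalCostUsd, byModel.avgLatencyMs); // 0.75 300
```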

Evals

Evals are functions, declared in .prompt.mdx files, that automatically grade the outputs of your prompts. Run them locally via the CLI or SDK, or in AgentMark Cloud. Use evals to catch quality regressions before deploying to production. Learn more about Evals.
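Here is a minimal sketch of what an eval function might look like. It takes an input, the prompt's output, and an optional expected output, and returns a score. The `EvalParams` / `EvalResult` shapes, the exact-match grading rule, and the 80% pass-rate gate are all assumptions for illustration, not AgentMark's actual eval signature.

```typescript
// Hypothetical eval shapes; AgentMark's real eval API may differ.
interface EvalParams {
  input: unknown;
  output: string;
  expectedOutput?: string;
}

interface EvalResult {
  score: number; // 0 or 1 for this grader
  passed: boolean;
  reason: string;
}

type EvalFn = (params: EvalParams) => EvalResult;

// Exact-match grader: passes when output equals expectedOutput,
// ignoring surrounding whitespace and letter case.
const exactMatch: EvalFn = ({ output, expectedOutput }) => {
  if (expectedOutput === undefined) {
    return { score: 0, passed: false, reason: "no expected output provided" };
  }
  const passed = output.trim().toLowerCase() === expectedOutput.trim().toLowerCase();
  return { score: passed ? 1 : 0, passed, reason: passed ? "exact match" : "output differs from expected" };
};

// Fraction of results that passed; 0 for an empty list.
function passRate(results: EvalResult[]): number {
  return results.length > 0 ? results.filter((r) => r.passed).length / results.length : 0;
}

const evalResults = [
  exactMatch({ input: { country: "France" }, output: " Paris ", expectedOutput: "paris" }),
  exactMatch({ input: { country: "Italy" }, output: "Milan", expectedOutput: "Rome" }),
];
const rate = passRate(evalResults);
console.log(rate); // 0.5

// Hypothetical regression gate: block a deploy below an 80% pass rate.
if (rate < 0.8) console.log("eval pass rate below threshold; blocking deploy");
```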

Sessions

Sessions group related traces to represent multi-turn conversations or workflows. For example, a chat conversation with multiple back-and-forth exchanges is tracked as a single session containing multiple traces. Learn more about Sessions.
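Grouping traces into sessions amounts to bucketing them by a shared session identifier and ordering each bucket by time. The sketch below assumes each trace carries a `sessionId` field. That field name and the `groupBySession` helper are hypothetical.

```typescript
// Hypothetical trace shape; only the fields needed for grouping are modeled.
interface SessionTrace {
  traceId: string;
  sessionId: string;
  startedAt: number; // ms since epoch
}

// Group traces by session ID and sort each session's traces by start time,
// so a session reads as an ordered list of conversation turns.
function groupBySession(traces: SessionTrace[]): Map<string, SessionTrace[]> {
  const sessions = new Map<string, SessionTrace[]>();
  for (const trace of traces) {
    const turns = sessions.get(trace.sessionId) ?? [];
    turns.push(trace);
    sessions.set(trace.sessionId, turns);
  }
  sessions.forEach((turns) => turns.sort((a, b) => a.startedAt - b.startedAt));
  return sessions;
}

const sessions = groupBySession([
  { traceId: "t2", sessionId: "chat-1", startedAt: 2000 },
  { traceId: "t1", sessionId: "chat-1", startedAt: 1000 },
  { traceId: "t3", sessionId: "chat-2", startedAt: 1500 },
]);

console.log(sessions.size); // 2
console.log(sessions.get("chat-1")?.map((t) => t.traceId)); // [ 't1', 't2' ]
```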

Alerts

Alerts notify you when important thresholds are crossed in your application. Configure alerts for cost limits, latency spikes, error rates, and quality metrics to catch issues before they impact users. Learn more about Alerts.
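At its simplest, an alert rule is a metric, a direction, and a threshold. This sketch evaluates a set of hypothetical rules against a snapshot of current metric values. The metric names, rule fields, and threshold values are invented for illustration and do not reflect AgentMark's alert configuration format.

```typescript
// Hypothetical alert model; metric names and rule fields are illustrative only.
type AlertMetric = "costUsd" | "latencyMs" | "errorRate" | "evalScore";

interface AlertRule {
  name: string;
  metric: AlertMetric;
  direction: "above" | "below"; // fire when the value crosses the threshold this way
  threshold: number;
}

type MetricSnapshot = Record<AlertMetric, number>;

// Return the names of every rule whose threshold the snapshot crosses.
function firedAlerts(rules: AlertRule[], snapshot: MetricSnapshot): string[] {
  return rules
    .filter((rule) => {
      const value = snapshot[rule.metric];
      return rule.direction === "above" ? value > rule.threshold : value < rule.threshold;
    })
    .map((rule) => rule.name);
}

const rules: AlertRule[] = [
  { name: "cost-limit", metric: "costUsd", direction: "above", threshold: 100 },
  { name: "latency-spike", metric: "latencyMs", direction: "above", threshold: 2000 },
  { name: "error-rate", metric: "errorRate", direction: "above", threshold: 0.05 },
  { name: "quality-drop", metric: "evalScore", direction: "below", threshold: 0.8 },
];

const snapshot: MetricSnapshot = { costUsd: 120, latencyMs: 900, errorRate: 0.02, evalScore: 0.7 };
console.log(firedAlerts(rules, snapshot)); // [ 'cost-limit', 'quality-drop' ]
```

Quality metrics such as an eval score usually alert when they fall *below* a threshold, which is why each rule carries a direction instead of assuming "above".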

Annotations

Annotations provide a human-in-the-loop quality assessment workflow. Team members manually label and review trace outputs to build ground-truth datasets and ensure prompt quality. Learn more about Annotations.
