The diagram shows AgentMark’s three-level hierarchy. An Organization (yellow) contains multiple Apps (green), and each App owns its own set of resources (blue): Prompts, Traces/Logs, Datasets, and Evals. Resources are isolated per app: a prompt in App 1 is not visible to App 2.
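The three-level hierarchy can be pictured as a small data model. The Python sketch below is illustrative only: the class names `Organization` and `App`, their fields, and the `find_prompt` helper are hypothetical and are not part of any AgentMark SDK. It shows only the isolation rule, namely that a resource lookup is scoped to a single app.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class App:
    """Hypothetical model of an app and the resources it owns."""
    name: str
    prompts: dict = field(default_factory=dict)
    traces: list = field(default_factory=list)
    datasets: dict = field(default_factory=dict)
    evals: dict = field(default_factory=dict)


@dataclass
class Organization:
    """Hypothetical model of an organization that contains apps."""
    name: str
    apps: dict = field(default_factory=dict)

    def add_app(self, app_name: str) -> App:
        app = App(name=app_name)
        self.apps[app_name] = app
        return app

    def find_prompt(self, app_name: str, prompt_name: str) -> Optional[str]:
        # The lookup is scoped to one app, so prompts owned by
        # other apps in the same organization are never returned.
        return self.apps[app_name].prompts.get(prompt_name)


org = Organization(name="Acme")
app1 = org.add_app("App 1")
app2 = org.add_app("App 2")
app1.prompts["greeting"] = "Say hello to the user."

print(org.find_prompt("App 1", "greeting"))  # Say hello to the user.
print(org.find_prompt("App 2", "greeting"))  # None
```

Keeping every resource collection on the app, rather than on the organization, is what makes the isolation automatic in this model: there is no shared container that a second app could read from.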
Organizations
Each organization is typically associated with an individual company. Organizations each have their own billing configuration. Each organization can have multiple users, with the following roles: Owner, Admin, Write, or Read. See Users and access control for what each role can do. An organization often has multiple apps within it.
Apps
Many apps can exist within an organization. Each app can be synced to a Git repository (GitHub or GitLab). Apps are isolated from each other, and each contains its own prompt templates, traces, metrics, and API keys. Use separate apps for staging, production, or dev environments.
Branches
Each app is backed by a default branch in its connected Git repository. AgentMark reads your prompt templates, datasets, and configuration files from this branch. You can work on additional branches (for previews, staging, or review workflows), and AgentMark syncs each one independently.
Prompts
Prompts are defined in .prompt.mdx files, AgentMark’s serialized format that bundles prompt content, reusable components, and associated evals into a single versioned artifact. Fetch them from your Git repository, or from AgentMark’s secure CDN to iterate on prompts separately from your application code. Learn more about Build.
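As a rough illustration of treating prompts as files in a repository, the sketch below walks a local checkout and lists every file whose name ends in .prompt.mdx. It uses only Python's standard library and does not call any AgentMark API; the function name `find_prompt_files` and the example directory layout are assumptions made for this example.

```python
import tempfile
from pathlib import Path


def find_prompt_files(repo_root: Path) -> list:
    """Return every *.prompt.mdx file under repo_root, as sorted relative paths."""
    return sorted(
        p.relative_to(repo_root)
        for p in repo_root.rglob("*.prompt.mdx")
        if p.is_file()
    )


# Build a throwaway directory that stands in for a repository checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "prompts" / "support").mkdir(parents=True)
    (root / "prompts" / "greeting.prompt.mdx").write_text("placeholder")
    (root / "prompts" / "support" / "triage.prompt.mdx").write_text("placeholder")
    (root / "README.md").write_text("not a prompt")
    found = [p.as_posix() for p in find_prompt_files(root)]

print(found)
# ['prompts/greeting.prompt.mdx', 'prompts/support/triage.prompt.mdx']
```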
Traces
Traces capture every step from input to output. Each individual step is a span. For example, a prompt chain with 3 tool calls produces one trace containing multiple spans. Learn more about Traces.
Datasets
Datasets are collections of data you use to test prompts in bulk. Create datasets from your own data, public datasets, traces you’ve already captured in AgentMark, synthetic data, or manual entry. Learn more about Datasets.
Metrics
Metrics show you at a high level how users interact with your application: cost, latency, model usage, active users, and more. Filter metrics by time period, model, or other dimensions to drill in. Learn more about Dashboards.
Evals
Evals are functions, declared in .prompt.mdx files, that automatically grade the outputs of your prompts. Run them locally via the CLI or SDK, or in AgentMark Cloud. Use evals to catch quality regressions before deploying to production. Learn more about Evals.
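To make the idea of an eval concrete, here is a minimal sketch of a grading function applied across a small set of rows. The function name `exact_match`, the result shape (`score`, `passed`), and the rows themselves are hypothetical; AgentMark's actual eval signature may differ, so treat this as an illustration of the concept rather than the SDK's interface.

```python
def exact_match(output: str, expected: str) -> dict:
    """Hypothetical eval: full score when output equals expected,
    ignoring case and surrounding whitespace."""
    passed = output.strip().lower() == expected.strip().lower()
    return {"score": 1.0 if passed else 0.0, "passed": passed}


# Hypothetical rows: each pairs a prompt output with the expected answer.
rows = [
    {"output": "Paris", "expected": "paris"},
    {"output": "Lyon", "expected": "Paris"},
    {"output": " Berlin ", "expected": "Berlin"},
]

results = [exact_match(r["output"], r["expected"]) for r in rows]
pass_rate = sum(r["passed"] for r in results) / len(results)

print(f"pass rate: {pass_rate:.2f}")  # pass rate: 0.67
```

Comparing a number like this pass rate between two versions of a prompt is one simple way to notice a quality regression before it reaches production.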
Sessions
Sessions group related traces to represent multi-turn conversations or workflows. For example, a chat conversation with multiple back-and-forth exchanges is tracked as a single session containing multiple traces. Learn more about Sessions.
Alerts
Alerts notify you when important thresholds are crossed in your application. Configure alerts for cost limits, latency spikes, error rates, and quality metrics to catch issues before they impact users. Learn more about Alerts.
Annotations
Annotations provide a human-in-the-loop quality assessment workflow. Team members manually label and review trace outputs to build ground-truth datasets and ensure prompt quality. Learn more about Annotations.
Have Questions?
We’re here to help! Choose the best way to reach us:
- Email us at hello@agentmark.co for support
- Schedule an Enterprise Demo to learn about our business solutions