Quickstart - AgentMark Docs

agentmark init scaffolds the project: it writes agentmark.json, creates an empty agentmark/ directory, pins @agentmark-ai/cli as a local dev dependency (so CI and teammates run the same version), wires the AgentMark MCP servers into the IDE clients you pick, installs the AgentMark agent skill (under .agents/skills/agentmark/), and hands off to your AI tool. The AI tool reads your project, asks the docs MCP for the right integration pattern, and wires the SDK into your existing code.

Prerequisites

Node.js 18+
An AI coding tool: Claude Code, Cursor, VS Code (Copilot Chat), or Zed
An LLM provider API key (OpenAI, Anthropic, etc.) for the model you want to run

Step 1: Install the CLI

Install once, globally, so the bare agentmark command is on your PATH:

npm install -g @agentmark-ai/cli

Step 2: Bootstrap your project

Run from inside your project directory (or pass a folder name to scaffold a fresh one):

agentmark init

Prefer not to install globally? npm create agentmark@latest (also yarn create agentmark / pnpm create agentmark) is a thin wrapper that runs the exact same agentmark init flow and produces identical output.

The CLI asks two short questions, scaffolds, and exits:

? Where would you like to set up AgentMark?  .
? Wire AgentMark MCP into which IDE clients?
  Space to toggle. Enter to submit. Skip all = empty selection.
  ◉ Claude Code
  ◉ Cursor
  ◉ VS Code
  ◉ Zed

✅ agentmark.json
✅ agentmark/ (empty, ready for your .prompt.mdx files)
✅ package.json (@agentmark-ai/cli ^0.15.0 + scripts: dev, agentmark:build, agentmark:experiment)
✅ MCP wired (Claude Code): .mcp.json
✅ MCP wired (Cursor): .cursor/mcp.json
✅ MCP wired (VS Code): .vscode/mcp.json
✅ MCP wired (Zed): .zed/settings.json

📚 Installing AgentMark agent skill...
✅ Agent skill installed at ./.agents/skills/agentmark/

✨ AgentMark is wired up.

   Next: open this project in Claude Code, Cursor, VS Code, or Zed and say:

       "Set up AgentMark in this project."

The local pin means npm run dev (and CI) resolve the pinned @agentmark-ai/cli from node_modules/.bin before any global install, so your project always builds against the version it was scaffolded with. In an existing project that already has a dev script, the AgentMark scripts land under namespaced keys (agentmark:dev, …) instead of clobbering yours.
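Here is a rough model of that namespacing rule, written as a merge over package.json's scripts map. It's a hypothetical sketch rather than the CLI's code: mergeAgentmarkScripts is an illustrative name, and the command strings are placeholders because this page only lists the script names (dev, agentmark:build, agentmark:experiment).

```typescript
type Scripts = Record<string, string>;

// Own-property check, so names like "constructor" aren't mistaken for scripts.
const hasScript = (scripts: Scripts, name: string): boolean =>
  Object.prototype.hasOwnProperty.call(scripts, name);

// Hypothetical model of the merge: keep every existing script; when an
// un-namespaced key such as "dev" is already taken, add ours as "agentmark:dev".
function mergeAgentmarkScripts(existing: Scripts, incoming: Scripts): Scripts {
  const merged: Scripts = { ...existing };
  for (const [name, command] of Object.entries(incoming)) {
    const key =
      hasScript(merged, name) && !name.startsWith("agentmark:") ? `agentmark:${name}` : name;
    // Never overwrite a script the user already defined under the target key.
    if (!hasScript(merged, key)) {
      merged[key] = command;
    }
  }
  return merged;
}

// Placeholder commands: only the script names come from the docs.
const agentmarkScripts: Scripts = {
  dev: "<dev command>",
  "agentmark:build": "<build command>",
  "agentmark:experiment": "<experiment command>",
};
```

In a fresh project, dev lands as-is. In one that already defines dev (say, next dev), the existing script survives and the AgentMark one becomes agentmark:dev.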

Non-interactive (CI / scripting):

agentmark init my-app --client all --overwrite

Flags: --path <dir> • --client <id|all> (ids: claude-code, codex, cursor, vscode, zed; comma-separated) • --yes/-y (accept the default for every prompt) • --overwrite (replace existing agentmark.json) • positional folder name.
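To make the --client syntax concrete, here is a hypothetical parser for its value (parseClientFlag is an illustrative name, not part of the CLI). The id list comes from the flags above. Rejecting unknown ids and dropping repeats are assumptions about behavior this page doesn't specify.

```typescript
// Client ids accepted by `agentmark init --client`, per the flag list above.
const CLIENT_IDS = ["claude-code", "codex", "cursor", "vscode", "zed"] as const;
type ClientId = (typeof CLIENT_IDS)[number];

function isClientId(id: string): id is ClientId {
  return (CLIENT_IDS as readonly string[]).includes(id);
}

// Hypothetical parser: "all" selects every client; otherwise the value is a
// comma-separated list of ids (whitespace around commas tolerated).
function parseClientFlag(value: string): ClientId[] {
  if (value.trim() === "all") return [...CLIENT_IDS];
  const ids = value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");
  const unknown = ids.filter((id) => !isClientId(id));
  if (unknown.length > 0) {
    throw new Error(`unknown --client id(s): ${unknown.join(", ")}`);
  }
  // Keep the first occurrence of each id, in the order the user typed them.
  return ids.filter(isClientId).filter((id, i, arr) => arr.indexOf(id) === i);
}
```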

Step 3: Ask your AI tool to wire AgentMark into your code

Open your project in Claude Code, Cursor, VS Code, or Zed and send the agent this message:

Set up AgentMark in this project, including the client and a deployable handler.

The AgentMark skill takes over. It:

Detects your project’s framework (Next.js, FastAPI, Hono, plain Node, etc.)
Queries the docs MCP for the right integration recipe
Proposes a concrete plan back to you: packages to install, where the client file goes, and what your first prompt looks like
After you confirm, installs the SDK, writes the client (agentmark.client.ts / agentmark_client.py), the dev entry that agentmark dev boots (dev-entry.ts / .agentmark/dev_server.py), and a deployable handler file, then scaffolds a first prompt and smoke-tests it
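This page doesn't say how the skill detects your framework. As an illustration only, a detection pass could start from your dependency manifests, as in this hypothetical heuristic. detectFramework and its labels are made up for the example, and the real skill may rely on other signals entirely.

```typescript
type Framework = "Next.js" | "Hono" | "FastAPI" | "plain Node" | "unknown";

interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Hypothetical heuristic: inspects dependency manifests only, never source files.
function detectFramework(pkg?: PackageJson, requirementsTxt?: string): Framework {
  if (pkg) {
    const deps = { ...pkg.dependencies, ...pkg.devDependencies };
    if ("next" in deps) return "Next.js";
    if ("hono" in deps) return "Hono";
    return "plain Node";
  }
  if (requirementsTxt !== undefined) {
    // A requirement line starts with the package name, e.g. "fastapi==0.110.0"
    // or "fastapi[standard]>=0.100"; cut it at the first specifier character.
    const names = requirementsTxt
      .split(/\r?\n/)
      .map((line) => line.trim().split(/[\s\[=<>!~;]/)[0].toLowerCase());
    if (names.includes("fastapi")) return "FastAPI";
  }
  return "unknown";
}
```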

It won’t touch your existing LLM SDK call sites during setup. Migrating them is a separate step that needs your confirmation, so ask the agent when you’re ready.

Step 4: Add your provider key

The agent tells you which env var to set for the model it picked. For OpenAI’s gpt-5.5 (the seeded default) that’s:

echo "OPENAI_API_KEY=sk-..." >> .env
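One caveat with appending via >>: if .env doesn't end in a newline, the new KEY=VALUE lands on the same line as the previous entry. The sketch below shows a newline-safe append plus a deliberately minimal KEY=VALUE reader to check the result. Both helpers are hypothetical, and full dotenv parsers accept more syntax than this one.

```typescript
// Newline-safe version of `echo "KEY=VALUE" >> .env`: if the existing
// contents don't end in a newline, add one before the new entry.
function appendEnvLine(existing: string, line: string): string {
  const separator = existing === "" || existing.endsWith("\n") ? "" : "\n";
  return `${existing}${separator}${line}\n`;
}

// Minimal .env reader: KEY=VALUE per line, # comments and blank lines skipped,
// one pair of matching surrounding quotes removed from the value.
function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=": not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value.charAt(0);
    if (value.length >= 2 && (quote === '"' || quote === "'") && value.endsWith(quote)) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}
```

For example, appending OPENAI_API_KEY to a file containing just ANTHROPIC_API_KEY=abc (no trailing newline) with plain >> yields the single line ANTHROPIC_API_KEY=abcOPENAI_API_KEY=sk-..., so OPENAI_API_KEY is never set; appendEnvLine keeps the two entries on separate lines.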

Verify your setup at any point with agentmark doctor. It statically checks your config, prompts, client, and dependencies and flags the common silent failures (unregistered models, the agentmarkPath: "/" mistake, a missing client or dev-entry, an ungitignored .env) with a fix for each.
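To give a feel for what static checks like these involve, here is a hypothetical re-creation of two of them: the agentmarkPath: "/" case and the ungitignored .env. It is not doctor's implementation. The messages are illustrative, and the .gitignore check is an exact-line match rather than full gitignore pattern matching.

```typescript
// Only the key this sketch inspects; agentmark.json holds more.
interface AgentmarkJson {
  agentmarkPath?: string;
}

// Hypothetical subset of the checks `agentmark doctor` reports.
function lintProject(config: AgentmarkJson, gitignore: string): string[] {
  const problems: string[] = [];
  if (config.agentmarkPath === "/") {
    problems.push('agentmark.json: agentmarkPath is "/" (run agentmark doctor for the fix)');
  }
  // Simplified: exact-line match against a few common patterns; negations,
  // nested paths, and other gitignore semantics are not handled.
  const envPatterns = [".env", "/.env", ".env*", "*.env"];
  const envIgnored = gitignore
    .split(/\r?\n/)
    .some((line) => envPatterns.includes(line.trim()));
  if (!envIgnored) {
    problems.push(".env is not gitignored, so provider keys could be committed");
  }
  return problems;
}
```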

Step 5: Run your first prompt

Cloud

Commit and push your project to a Git repository (GitHub or GitLab).
In the AgentMark Dashboard, click Create App and give it a display name (you can rename it later from the app’s settings menu; the URL identifier stays fixed). Then open the app’s settings menu and choose Link Repository to pick your repo. The first time anyone in your organization connects a repo, you’ll complete a one-time Connect Git Repository step first: it installs the GitHub or GitLab app for your org, then mirrors your .outerlayer/ context directory into the Dashboard on every push.
Add your LLM provider API key in Settings → Environment Variables.

Apps list in the AgentMark Dashboard showing the Create App button

Once connected, AgentMark Cloud syncs your prompts on every push. The Run button and Experiments stay disabled until your handler has a deployment; see Client setup to add and deploy one. After it’s deployed, open a prompt and click Run; the output streams back in real time.

Running a prompt in the AgentMark Dashboard

Local

Start the dev server (keep it running in a separate terminal). It boots your dev-entry.ts / .agentmark/dev_server.py; see Client setup if you haven’t created one yet:

agentmark dev

Then run the prompt the agent scaffolded (the agent will tell you the actual path; substitute it for <your-prompt> below):

agentmark run-prompt agentmark/<your-prompt>.prompt.mdx --props '{"message":"hello"}'

The CLI prints the model output, token counts (in/out/total), and a 📊 View trace URL you can open in the browser to see the full span tree.
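The --props argument is JSON wrapped in single quotes, which breaks as soon as a value contains an apostrophe (say, "it's"). Here is a hedged sketch of two ways around that: build the command line with POSIX single-quote escaping, or skip the shell and pass an argv array. The helper names are hypothetical; the command shape is the documented one.

```typescript
// POSIX shell single-quoting: wrap in '...' and turn each embedded ' into '\''
// (close the quote, emit an escaped quote, reopen).
function shellQuote(arg: string): string {
  return `'${arg.replace(/'/g, `'\\''`)}'`;
}

// Builds `agentmark run-prompt <path> --props <json>` for a shell, serializing
// props with JSON.stringify instead of writing JSON by hand.
function runPromptCommand(promptPath: string, props: Record<string, unknown>): string {
  return [
    "agentmark",
    "run-prompt",
    shellQuote(promptPath),
    "--props",
    shellQuote(JSON.stringify(props)),
  ].join(" ");
}

// The same invocation as an argv array, for spawning without a shell
// (no quoting needed because no shell parses the arguments).
function runPromptArgv(promptPath: string, props: Record<string, unknown>): string[] {
  return ["run-prompt", promptPath, "--props", JSON.stringify(props)];
}
```

From a Node script, the argv form pairs with child_process, for example spawn("agentmark", runPromptArgv(path, props), { stdio: "inherit" }).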

The dev server listens on ports 9418 (API), 9417 (webhook), and 3000 (UI app). Override with --api-port / --webhook-port / --app-port if you need different ports.
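If you script the dev server, a small helper can add only the override flags you need and catch two servers set to the same port. This is a hypothetical sketch: the defaults and flag names come from this page, but devArgs is an illustrative name, and passing the port as the next argument (--app-port 3001) is an assumption about the flag syntax.

```typescript
type Server = "api" | "webhook" | "app";

// Defaults and override flags as documented for `agentmark dev`.
const DEFAULT_PORTS: Record<Server, number> = { api: 9418, webhook: 9417, app: 3000 };
const PORT_FLAGS: Record<Server, string> = {
  api: "--api-port",
  webhook: "--webhook-port",
  app: "--app-port",
};

// Builds `agentmark dev` arguments, emitting a flag only for non-default ports
// and refusing configurations where two servers would bind the same port.
function devArgs(overrides: Partial<Record<Server, number>> = {}): string[] {
  const ports: Record<Server, number> = { ...DEFAULT_PORTS, ...overrides };
  const servers = Object.keys(PORT_FLAGS) as Server[];
  const args = ["dev"];
  servers.forEach((server, i) => {
    const port = ports[server];
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`${server} port out of range: ${port}`);
    }
    const clash = servers.slice(0, i).find((other) => ports[other] === port);
    if (clash) throw new Error(`${server} and ${clash} both use port ${port}`);
    if (port !== DEFAULT_PORTS[server]) args.push(PORT_FLAGS[server], String(port));
  });
  return args;
}
```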

Step 6: Run an experiment

An experiment runs a prompt against a dataset and scores each row.

Cloud

Your datasets and score configs sync from the repo on every push. In the AgentMark Dashboard, open Experiments, click New Experiment, choose the prompt, dataset, and evaluations, then run. Results stream in live.

Experiment results in the AgentMark Dashboard showing per-row scores and aggregate metrics

The experiment detail view shows each dataset row’s input, the AI output, expected output, and evaluator scores, alongside aggregate metrics for the run (average score, latency, cost, tokens). See Running experiments for the full flow.

Local

Add a test_settings block to your prompt’s frontmatter pointing at a .jsonl dataset (see Datasets for the row shape), then:

agentmark run-experiment agentmark/<your-prompt>.prompt.mdx --threshold 80
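The fields a dataset row needs are defined on the Datasets page, so this sketch doesn't assume any. It only checks the JSONL framing itself (one JSON object per line) and reports the line number of the first bad row, which is handy before a long experiment run. parseJsonl is a hypothetical helper, and skipping blank lines is this sketch's choice, not necessarily the CLI's.

```typescript
type Row = Record<string, unknown>;

// Reads JSONL: one JSON object per line. Blank lines (including a trailing
// newline) are skipped; every other line must parse to a plain object.
function parseJsonl(text: string): Row[] {
  const rows: Row[] = [];
  const lines = text.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (line === "") continue;
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`line ${i + 1}: not valid JSON`);
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`line ${i + 1}: expected a JSON object`);
    }
    rows.push(value as Row);
  }
  return rows;
}
```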

The CLI runs every row, applies your evaluators, prints a results table, and exits non-zero if the pass rate falls below --threshold, so you can wire it into CI to gate prompt regressions. Only evals that return a boolean passed (like the exact_match example in Client setup) count toward the gate; score-only evals don’t feed it, so gate those with test_settings.score_thresholds instead.

Terminal output of agentmark run-experiment showing a per-row results table with evaluator scores

The CLI prints a per-row results table with each item’s evaluator scores, followed by the run’s aggregate pass rate.
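The gate described above can be modeled in a few lines. This is a hypothetical sketch under stated assumptions: the pass rate pools every boolean passed result across rows as a percentage, score-only results are skipped, and a run with no boolean results isn't gated. The CLI's exact aggregation isn't spelled out on this page, and thresholdGate and the score field are illustrative names.

```typescript
// One evaluator's result for one dataset row. Evals that return a boolean set
// `passed`; score-only evals set only `score` (a hypothetical field name).
interface EvalResult {
  name: string;
  passed?: boolean;
  score?: number;
}

// Hypothetical model of the --threshold gate (threshold given as a percentage).
function thresholdGate(
  rows: EvalResult[][],
  thresholdPct: number,
): { passRate: number | null; exitCode: 0 | 1 } {
  let total = 0;
  let passed = 0;
  for (const row of rows) {
    for (const result of row) {
      if (typeof result.passed !== "boolean") continue; // score-only: ignored
      total += 1;
      if (result.passed) passed += 1;
    }
  }
  if (total === 0) return { passRate: null, exitCode: 0 }; // nothing to gate on
  const passRate = (passed * 100) / total;
  // Falling below the threshold fails; meeting it exactly passes.
  return { passRate, exitCode: passRate < thresholdPct ? 1 : 0 };
}
```

In a CI script you might set process.exitCode = thresholdGate(rows, 80).exitCode. In this model, four passes out of five boolean results (80%) meets a threshold of 80 but fails one of 81.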

Need worked examples? See Example prompts for four copy-paste recipes covering all four generation types (object, text+tools, image, speech).

What’s in your project after bootstrap

File	Source	Purpose
`agentmark.json`	CLI	Project config: `version`, `mdxVersion`, `agentmarkPath`, and one seeded model in `builtInModels` (plus `$schema`) at bootstrap. Add models with `agentmark pull-models`; `modelSchemas` and `scores` are optional keys you add later
`agentmark/.gitkeep`	CLI	Empty prompts directory (drop `.prompt.mdx` files here)
`.mcp.json` (and per-IDE configs)	CLI	MCP wiring: `agentmark-docs` (docs), `agentmark` (Cloud), `agentmark-local` (dev)
`.agents/skills/agentmark/`	CLI (via `npx skills add`)	Agent skill that teaches Claude Code, Cursor, and other AI tools how to work with AgentMark
`agentmark.client.ts` (or `agentmark_client.py`)	Skill	Configured SDK client, added when you ask the AI tool to integrate
`dev-entry.ts` (or `.agentmark/dev_server.py`)	You / Skill	Local webhook entry that `agentmark dev` boots; see Client setup Step 3
`handler.ts` (or `handler.py`)	You / Skill	Cloud deployment entry; see Client setup Step 4
Your first `.prompt.mdx`	Skill	Scaffolded by the AI tool, named for your use case
`.env`	You	Provider API key(s); `AGENTMARK_API_KEY` / `AGENTMARK_APP_ID` for Cloud

The CLI ships only the unopinionated bits. Everything stack-specific comes from the AI tool reading your project plus the docs MCP, so the integration matches whatever framework you’re already on.

Next steps

Build prompts

Author .prompt.mdx files: text, object, image, speech

Example prompts

Copy-paste starters for all four generation types

Evaluate

Test prompts with datasets + evaluators; gate CI on regressions

Observe

Traces, sessions, cost-and-token tracking

Connect your SDK

Any SDK via the neutral render plus reference executors

Set up your client

Host your client and connect it to AgentMark Cloud

Have questions?

Reach out any time:

Email the team at hello@agentmark.co for support
Schedule an Enterprise Demo to learn about AgentMark’s business solutions
