# Set up your AgentMark client

> Create your AgentMark client, run prompts and experiments locally, and deploy it so the Dashboard can execute against your code.

The AgentMark client is the piece of your code that actually **executes prompts**. It renders each `.prompt.mdx` to AgentMark's neutral `{ messages, text_config }` shape; your code (or an [executor](#connect-your-sdk)) passes that to whatever LLM SDK you already use. One client powers three surfaces:

| Surface               | Entry point (TS / Python)                     | What runs it                                  |
| --------------------- | --------------------------------------------- | --------------------------------------------- |
| Your application code | `agentmark.client.ts` / `agentmark_client.py` | You                                           |
| Local development     | `dev-entry.ts` / `.agentmark/dev_server.py`   | `agentmark dev`                               |
| AgentMark Cloud       | `handler.ts` / `handler.py`                   | The [deployment pipeline](/deploy/deployment) |

Run buttons in the Dashboard (playground runs, experiments) dispatch to your **deployed** client. Until a deployment exists, the Dashboard disables those buttons and points you to this page.

<Tip>
  Your AI tool detects your stack and writes the client for it. See [Let your agent set it up](#let-your-agent-set-it-up). The steps below are the manual path.
</Tip>

## Prerequisites

* An AgentMark project: `agentmark.json` + an `agentmark/` directory with at least one prompt ([Quickstart](/getting-started/quickstart))
* Node.js 18+ (the `agentmark` CLI runs on Node for both languages); Python projects also need Python 3.12+
* Your model provider's API key (for example, `OPENAI_API_KEY`)

<Tabs>
  <Tab title="TypeScript">
    ## Step 1: Install the client and CLI

    The client is SDK-neutral. Install it once and keep whatever LLM SDK you already call:

    ```bash theme={null}
    npm install @agentmark-ai/prompt-core @agentmark-ai/sdk
    npm install -D @agentmark-ai/cli tsx typescript @types/node
    ```

    <Note>
      You only need the two runtime packages above plus your own SDK. The model call lives in an **executor** you own. Copy a ready-made one from [Connect your SDK](#connect-your-sdk) below (Vercel AI SDK, raw OpenAI, raw Anthropic, agent frameworks), or write your own following the same contract. There is no per-SDK AgentMark package to install or version to track.
    </Note>

    ## Step 2: Create `agentmark.client.ts`

    Create this file at your **project root** (next to `agentmark.json`), not inside `src/`. The CLI (`agentmark dev`, `agentmark doctor`) loads `agentmark.client.ts` from the project root, and `dev-entry.ts` / `handler.ts` import it from there.

    The client wires together three things: a **loader** (where prompts come from), the neutral adapter (which renders prompts to `{ messages, text_config }`), and your **evals** (registered once, here; everything else sources them from the client):

    ```typescript agentmark.client.ts theme={null}
    import { createAgentMark } from "@agentmark-ai/prompt-core";
    import { ApiLoader } from "@agentmark-ai/prompt-core/loader-api";
    import type AgentmarkTypes from "./agentmark.types";

    // Local dev (no app id): prompts come from `agentmark dev`'s API server.
    // Linked to cloud (app id present): prompts come from AgentMark Cloud.
    // Gate on AGENTMARK_APP_ID, not the API key — see the warning below.
    const loader = process.env.AGENTMARK_APP_ID
      ? ApiLoader.cloud({
          apiKey: process.env.AGENTMARK_API_KEY!,
          appId: process.env.AGENTMARK_APP_ID,
          baseUrl: process.env.AGENTMARK_BASE_URL,
        })
      : ApiLoader.local({ baseUrl: "http://localhost:9418" });

    // Evals live on the client. The webhook runner (Steps 3-4) sources them
    // from here, so they RUN in Cloud experiments and LIST in the Dashboard's
    // New Experiment dialog. Start empty and add as you go. `passed` is what
    // run-experiment's --threshold gate counts; score-only evals don't feed it.
    export const client = createAgentMark<AgentmarkTypes>({
      loader,
      scorers: {
        exact_match: ({ output, expectedOutput }) => ({
          score: output === expectedOutput ? 1 : 0,
          passed: output === expectedOutput,
        }),
      },
    });
    ```

    Generate the `agentmark.types` file with `agentmark generate-types --root-dir agentmark > agentmark.types.ts`. It's safe to start without type arguments: drop the `<AgentmarkTypes>` generic and the import until you're ready.

    <Warning>
      Two loader gotchas:

      * **Gate the loader on `AGENTMARK_APP_ID`, not `AGENTMARK_API_KEY`.** The API key is **also** the trace-exporter credential, so if the key is the switch, turning on tracing silently repoints prompt-loading to Cloud, which 404s every prompt until you've actually deployed. The app id is the loader-specific signal: you only have one once an app is provisioned.
      * **Don't point `ApiLoader.local` at `process.env.AGENTMARK_BASE_URL`.** That variable overrides the **cloud** endpoint (managed deployments inject it), and reusing it for the local loader silently breaks `agentmark dev` whenever it's set.
    </Warning>

    The neutral client doesn't resolve models or tools: it renders, and your call site (or executor) handles the rest. For the pieces this file wires up, see [Loaders](/configure/loaders), [Tools and agents](/build/tools-and-agents), [MCP](/build/mcp), [Type safety](/configure/type-safety), and [Writing evals](/evaluate/writing-evals) for real eval functions.

    ## Step 3: Run locally with `agentmark dev`

    `agentmark dev` starts a local API server (serves your prompt files) and a webhook server (executes prompts through your client). The webhook server boots from a `dev-entry.ts` file at your project root.

    It builds an **executor** (your one model call), wires it to your client with `createWebhookRunner`, and serves that runner locally:

    ```typescript dev-entry.ts theme={null}
    // Local dev webhook server — `agentmark dev` runs this with tsx.
    import { createWebhookServer } from "@agentmark-ai/cli/runner-server";
    import { createExecutor } from "@agentmark-ai/prompt-core";
    import { AgentMarkSDK, createWebhookRunner } from "@agentmark-ai/sdk";

    async function main() {
      const { client } = await import("./agentmark.client");

      // Local tracing: the runner wires span HOOKS, but spans only export once
      // tracing is initialized. Point the exporter at the local dev API server
      // (unauthenticated — no cloud keys needed); without this, runs work but
      // `agentmark doctor --smoke` and the local trace UI never see a trace.
      new AgentMarkSDK({
        apiKey: "local-dev",
        appId: "local-dev",
        baseUrl: process.env.AGENTMARK_DEV_SERVER ?? "http://localhost:9418",
      }).initTracing({ disableBatch: true });

      // The executor is the one place your SDK gets called. Copy a complete
      // one for your SDK from the "Connect your SDK" reference below and drop
      // it in here; the placeholder below shows the shape.
      const executor = createExecutor({
        name: "my-sdk",
        text: async (formatted) => {
          // Call your SDK with formatted.text_config.model_name + formatted.messages,
          // return { text, usage: { inputTokens, outputTokens } }.
          return { text: "", usage: { inputTokens: 0, outputTokens: 0 } };
        },
      });

      // The runner shares your app client — loader AND evals come from it
      // (register evals in agentmark.client.ts; they both RUN in experiments
      // and LIST in the Dashboard's New Experiment dialog).
      const runner = createWebhookRunner({ client, executor });

      const args = process.argv.slice(2);
      const portArg = args.find((arg) => arg.startsWith("--webhook-port="));
      const port = portArg ? parseInt(portArg.split("=")[1], 10) : 9417;

      // A WebhookRunner already satisfies the webhook server's handler contract.
      await createWebhookServer({ handler: runner, port });
    }

    main().catch((err) => {
      console.error(err);
      process.exit(1);
    });
    ```

    The placeholder executor above returns empty text: paste a real executor from [Connect your SDK](#connect-your-sdk) first to get actual model output. Then start the dev stack and run a prompt:

    ```bash theme={null}
    agentmark dev
    # in another terminal:
    agentmark run-prompt ./agentmark/my-prompt.prompt.mdx
    ```

    ```text theme={null}
    === Text Prompt Results ===
    The capital of France is Paris.
    ────────────────────────────────────────────────────────────
    🪙 12 in, 8 out, 20 total
    ```

    Experiments work the same way; datasets resolve through the local API server. Your prompt needs a dataset first (`test_settings.dataset` in its frontmatter; see [Datasets](/evaluate/datasets)):

    ```bash theme={null}
    agentmark run-experiment ./agentmark/my-prompt.prompt.mdx
    ```

    If `agentmark dev` exits with `No dev server entry point found`, the `dev-entry.ts` file above is what it's looking for.

    ## Step 4: Add a deployment entry point (`handler.ts`)

    AgentMark Cloud executes your client through a single **handler** function. Each Dashboard run (playground or experiment) arrives as one `{ type, data }` event; the runner's `dispatch` routes it:

    ```typescript handler.ts theme={null}
    // AgentMark Cloud deployment entry point. The deployment pipeline bundles
    // this file and wraps it in a managed HTTP server.
    //
    // The deployed handler depends only on prompt-core, @agentmark-ai/sdk
    // (tracing + the traced runner factory), and your LLM SDK — never the
    // CLI's dev-only dashboard tree (Next.js/React/MUI/better-sqlite3).
    import { AgentMarkSDK, createWebhookRunner } from "@agentmark-ai/sdk";
    import { createExecutor } from "@agentmark-ai/prompt-core";
    import type { WebhookRequest } from "@agentmark-ai/prompt-core/webhook-runner";
    import { client } from "./agentmark.client";

    // Initialize tracing once, at module load — before any run is dispatched.
    // Without this, prompt and experiment runs triggered from the Dashboard
    // still execute, but emit no spans, so no traces (and no experiment trace
    // detail) show up in the app. The deployment pipeline injects
    // AGENTMARK_API_KEY / AGENTMARK_APP_ID / AGENTMARK_BASE_URL automatically.
    // registerGlobally: true is REQUIRED to capture your SDK's own model
    // (generation) span if it emits through the global OTel tracer — without it
    // you get the wrapper span but no model, token, or input/output data.
    // See /observe/tracing-setup.
    const sdk = new AgentMarkSDK({
      apiKey: process.env.AGENTMARK_API_KEY!,
      appId: process.env.AGENTMARK_APP_ID!,
      baseUrl: process.env.AGENTMARK_BASE_URL,
    });
    sdk.initTracing({ registerGlobally: true });

    // Same executor + runner as your dev entry — copy a complete executor from
    // the "Connect your SDK" reference below. Evals come from the client (register
    // them in agentmark.client.ts) so they both run in experiments and list
    // in the New Experiment dialog.
    const executor = createExecutor({
      name: "my-sdk",
      text: async (formatted) => {
        return { text: "", usage: { inputTokens: 0, outputTokens: 0 } };
      },
    });
    const runner = createWebhookRunner({ client, executor });

    // The deployed handler IS the runner's dispatch — it routes prompt-run,
    // dataset-run, and the control-plane get-evals (which the New Experiment
    // dialog reads), sourcing evals from the runner's client. No hand-rolled
    // branches, no separate client to thread.
    export default (body: WebhookRequest) => runner.dispatch(body);
    ```

    `@agentmark-ai/sdk` owns tracing initialization. The managed server is long-lived, so the default batch span processor flushes on its own; you don't call `shutdown()` here (that's only for short-running scripts).

    The pipeline resolves your handler in this order: the `handler` key in `agentmark.json` if set, then `handler.py`, then `handler.ts` at the repository root. See [handler detection](/deploy/deployment#handler-detection).

    ## Step 5: Deploy

    1. **Connect your repository** in the Dashboard (the app's setup card, or **Deployments**). Every push then triggers the [deployment pipeline](/deploy/deployment): file sync, then a code deploy of your handler to a managed machine.
    2. **Set your provider keys** under **Settings → Environment Variables** (e.g. `OPENAI_API_KEY`). AgentMark Cloud injects `AGENTMARK_API_KEY`, `AGENTMARK_APP_ID`, and `AGENTMARK_BASE_URL` automatically, so your client's Cloud loader is already wired for them.
    3. **Push.** Watch the build under **Deployments**; when it goes green, Run buttons in the playground and experiments go live against your deployed client.

    <Note>
      Deployed **experiments** stream their datasets from your linked repository (at the environment's branch or pinned commit). The repo connection does more than sync: it's how your datasets reach the deployed client at run time.
    </Note>
  </Tab>

  <Tab title="Python">
    ## Step 1: Install the client

    The client is SDK-neutral. Install it once and keep whatever LLM SDK you already call:

    ```bash theme={null}
    pip install agentmark-prompt-core agentmark-sdk python-dotenv
    ```

    <Note>
      **Package vs import names:** you install `agentmark-prompt-core` but import it as `from agentmark.prompt_core import ...`, and `agentmark-sdk` imports as `agentmark_sdk`. `ApiLoader`, `FileLoader`, `create_agentmark`, `create_executor`, `create_webhook_runner`, and `serve_webhook_runner` all ship with `agentmark-prompt-core`; there's no separate loader or adapter package on PyPI. `create_agentmark` takes no adapter argument and always returns the neutral render. The model call lives in an **executor** you own; copy a ready-made one from [Connect your SDK](#connect-your-sdk) below or write your own following the same contract.
    </Note>

    ## Step 2: Create `agentmark_client.py`

    Create this file at your **project root** (next to `agentmark.json`), not inside a package directory. The CLI (`agentmark dev`, `agentmark doctor`) loads `agentmark_client.py` from the project root, and `dev_server.py` / `handler.py` import it from there.

    The client wires together three things: a **loader** (where prompts come from), the neutral `DefaultAdapter` (built in, so you don't pass it; it renders prompts to `{ messages, text_config }`), and your **evals** (registered once, here; everything else sources them from the client):

    ```python agentmark_client.py theme={null}
    import os

    from dotenv import load_dotenv
    from agentmark.prompt_core import create_agentmark, ApiLoader

    load_dotenv()

    # Local dev (no app id): prompts come from `agentmark dev`'s API server.
    # Linked to cloud (app id present): prompts come from AgentMark Cloud.
    # Gate on AGENTMARK_APP_ID, not the API key: the key also powers tracing,
    # so keying off it would repoint prompt-loading to Cloud the moment you
    # enable traces (404s until you've deployed).
    if os.environ.get("AGENTMARK_APP_ID"):
        loader = ApiLoader.cloud(
            api_key=os.environ["AGENTMARK_API_KEY"],
            app_id=os.environ["AGENTMARK_APP_ID"],
        )
    else:
        loader = ApiLoader.local(base_url="http://localhost:9418")

    # Evals live on the client. The webhook runner (Steps 3-4) sources them
    # from here, so they RUN in Cloud experiments and LIST in the Dashboard's
    # New Experiment dialog. Start empty and add as you go. "passed" is what
    # run-experiment's --threshold gate counts; score-only evals don't feed it.
    def exact_match(params):
        match = params["output"] == params.get("expectedOutput")
        return {"score": 1.0 if match else 0.0, "passed": match}

    client = create_agentmark(loader=loader, scorers={"exact_match": exact_match})
    ```

    <Note>
      `ApiLoader.cloud` falls back to the `AGENTMARK_BASE_URL` environment variable automatically, so managed deployments reach the right gateway without extra configuration. Don't reuse that variable for the local branch; it would re-point local dev at the Cloud endpoint whenever it's set.
    </Note>
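
    The gating above reduces to a small decision you can unit-test on its own. A minimal sketch, assuming only the environment variables named on this page; `pick_loader_target` is a hypothetical helper for illustration (it is not part of `agentmark.prompt_core`), and a `None` base URL stands for "unset, let `ApiLoader.cloud` apply its own fallback":

    ```python theme={null}
    from typing import Mapping, Optional, Tuple

    LOCAL_DEV_URL = "http://localhost:9418"


    def pick_loader_target(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
        """Hypothetical helper: report which loader agentmark_client.py would build."""
        if env.get("AGENTMARK_APP_ID"):
            # Cloud branch: AGENTMARK_BASE_URL only ever matters here.
            return "cloud", env.get("AGENTMARK_BASE_URL")
        # Local branch: AGENTMARK_API_KEY (the tracing credential) and
        # AGENTMARK_BASE_URL (the cloud endpoint override) are deliberately ignored.
        return "local", LOCAL_DEV_URL
    ```

    Calling `pick_loader_target(os.environ)` in a test is a cheap way to confirm that turning on tracing (setting only `AGENTMARK_API_KEY`) leaves prompt-loading on the local dev server.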

    The neutral client doesn't resolve models or tools: it renders, and your call site (or executor) handles the rest. Evals DO register here (`create_agentmark(loader=..., scorers=...)`): the webhook runner sources them from the client, so they run in Cloud experiments and list in the New Experiment dialog. For the pieces this file wires up, see [Loaders](/configure/loaders), [Tools and agents](/build/tools-and-agents), [MCP](/build/mcp), and [Writing evals](/evaluate/writing-evals).
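
    Because a scorer is a plain function over a params dict, you can unit-test it before a runner is involved. A minimal sketch, assuming the `output` / `expectedOutput` keys that `exact_match` above reads; `normalized_match` and `brevity` are hypothetical scorers added for illustration:

    ```python theme={null}
    def exact_match(params):
        # Same scorer as in agentmark_client.py.
        match = params["output"] == params.get("expectedOutput")
        return {"score": 1.0 if match else 0.0, "passed": match}


    def normalized_match(params):
        # Hypothetical: compare ignoring case and surrounding whitespace.
        expected = params.get("expectedOutput")
        if expected is None:
            return {"score": 0.0, "passed": False}
        match = str(params["output"]).strip().lower() == str(expected).strip().lower()
        return {"score": 1.0 if match else 0.0, "passed": match}


    def brevity(params):
        # Hypothetical score-only eval: no "passed" key, so it reports a score
        # but doesn't feed run-experiment's --threshold gate.
        words = len(str(params["output"]).split())
        return {"score": 1.0 if words <= 20 else 20 / words}
    ```

    Register extra scorers the same way as the first: `create_agentmark(loader=loader, scorers={"exact_match": exact_match, "brevity": brevity})`.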

    ## Step 3: Run locally with `agentmark dev`

    `agentmark dev` starts a local API server (serves your prompt files) and a webhook server (executes prompts through your client). For Python projects it boots `.agentmark/dev_server.py`.

    The dev server builds an **executor** (your one model call) and wraps it in a runner with `create_webhook_runner`:

    ```python .agentmark/dev_server.py theme={null}
    """AgentMark dev webhook server entry point."""

    import os
    import sys
    from pathlib import Path

    sys.path.insert(0, str(Path(__file__).parent.parent))

    from agentmark.prompt_core import (
        ExecutorTextResult,
        UsageData,
        create_executor,
        create_webhook_runner,
        serve_webhook_runner,
    )
    from agentmark_sdk import AgentMarkSDK
    from agentmark_client import client

    # Local tracing: the runner wires span HOOKS, but spans only export once
    # tracing is initialized. Point the exporter at the local dev API server
    # (unauthenticated — no cloud keys needed); without this, runs work but
    # `agentmark doctor --smoke` and the local trace UI never see a trace.
    AgentMarkSDK(
        api_key="local-dev",
        app_id="local-dev",
        base_url=os.environ.get("AGENTMARK_DEV_SERVER", "http://localhost:9418"),
    ).init_tracing(disable_batch=True)

    # The executor is the one place your SDK gets called. Copy a complete one for
    # your SDK from the "Connect your SDK" reference below; the placeholder shows
    # the shape.
    def _text(formatted, ctx) -> ExecutorTextResult:
        # Call your SDK with formatted.text_config.model_name +
        # [m.model_dump() for m in formatted.messages] (it's a Pydantic
        # model), return text + canonical token usage.
        return ExecutorTextResult(text="", usage=UsageData(input_tokens=0, output_tokens=0))

    executor = create_executor(name="my-sdk", text=_text)

    # The runner shares your app client — loader AND evals come from it
    # (register evals in agentmark_client.py; they both RUN in experiments
    # and LIST in the Dashboard's New Experiment dialog).
    runner = create_webhook_runner(client, executor)

    # Serve the runner over HTTP. `agentmark dev` spawns this file with
    # --webhook-port and expects the process to keep running;
    # serve_webhook_runner parses that flag and blocks serving runner.dispatch.
    serve_webhook_runner(runner)
    ```

    The last line is what keeps the process alive: without it the file builds a runner and exits, and `agentmark dev` reports the webhook server stopped. See [Python dev server](/reference/python-dev-server) for the entry-point details (argument parsing, ports, and the wire contract it serves).
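
    For intuition about what "parses that flag" involves, here's a standalone sketch. It is not `serve_webhook_runner`'s source: `parse_webhook_port` is a hypothetical helper, accepting both `--webhook-port=N` and `--webhook-port N` is an assumption, and the 9417 fallback mirrors the TypeScript `dev-entry.ts` default rather than anything documented for Python:

    ```python theme={null}
    import argparse
    from typing import Optional, Sequence

    DEFAULT_WEBHOOK_PORT = 9417  # mirrors dev-entry.ts; assumed, not documented for Python


    def parse_webhook_port(argv: Optional[Sequence[str]] = None) -> int:
        """Hypothetical helper: read --webhook-port from argv (sys.argv[1:] if None)."""
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--webhook-port", type=int, default=DEFAULT_WEBHOOK_PORT)
        # parse_known_args tolerates any other flags the CLI happens to pass.
        args, _unknown = parser.parse_known_args(argv)
        return args.webhook_port
    ```

    `parse_known_args` is the design choice that matters here: the sketch picks out the one flag it cares about and ignores everything else on the command line.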

    Start the dev stack (the CLI detects your virtualenv) and run a prompt:

    ```bash theme={null}
    agentmark dev
    # in another terminal:
    agentmark run-prompt ./agentmark/my-prompt.prompt.mdx
    ```

    ```text theme={null}
    === Text Prompt Results ===
    The capital of France is Paris.
    ────────────────────────────────────────────────────────────
    🪙 12 in, 8 out, 20 total
    ```

    Experiments work the same way; datasets resolve through the local API server. Your prompt needs a dataset first (`test_settings.dataset` in its frontmatter; see [Datasets](/evaluate/datasets)):

    ```bash theme={null}
    agentmark run-experiment ./agentmark/my-prompt.prompt.mdx
    ```

    More on the Python dev server (custom entry points, ports, environment): [Python dev server](/reference/python-dev-server).

    ## Step 4: Add a deployment entry point (`handler.py`)

    AgentMark Cloud executes your client through a single async **handler** function. Each Dashboard run (playground or experiment) arrives as one `{type, data}` event; the runner's `dispatch` routes it:

    ```python handler.py theme={null}
    """AgentMark Cloud deployment entry point.

    The deployment pipeline wraps this file in a managed HTTP server. Each
    dashboard run (playground or experiment) arrives as one {type, data} event.
    """

    import os

    from agentmark_sdk import AgentMarkSDK
    from agentmark.prompt_core import (
        ExecutorTextResult,
        UsageData,
        create_executor,
        create_webhook_runner,
    )
    from agentmark_client import client

    # Initialize tracing once, at import time — before any run is dispatched.
    # Without it, runs and experiments triggered from the Dashboard still
    # execute, but emit no traces, so nothing shows up in the app. The
    # deployment pipeline injects AGENTMARK_API_KEY / AGENTMARK_APP_ID /
    # AGENTMARK_BASE_URL automatically. See /observe/tracing-setup.
    _sdk = AgentMarkSDK(
        api_key=os.environ["AGENTMARK_API_KEY"],
        app_id=os.environ["AGENTMARK_APP_ID"],
        base_url=os.environ.get("AGENTMARK_BASE_URL", "https://api.agentmark.co"),
    )
    _sdk.init_tracing()

    # Same executor + runner as your dev server — copy a complete executor from
    # the "Connect your SDK" reference below. Evals come from the client (register
    # them in agentmark_client.py) so they both run in experiments and list
    # in the New Experiment dialog.
    def _text(formatted, ctx) -> ExecutorTextResult:
        return ExecutorTextResult(text="", usage=UsageData(input_tokens=0, output_tokens=0))

    executor = create_executor(name="my-sdk", text=_text)
    runner = create_webhook_runner(client, executor)

    # The deployed handler IS the runner's dispatch — it routes prompt-run,
    # dataset-run, and the control-plane get-evals the New Experiment dialog
    # reads, sourcing evals from the runner's client. Nothing to hand-roll.
    handler = runner.dispatch
    ```
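
    To make "dispatch routes it" concrete, here's a toy router keyed on the event's `type`. It illustrates the routing idea only and is not `runner.dispatch`'s implementation: the handlers are stubs, and the `"prompt-run"` / `"dataset-run"` / `"get-evals"` strings are taken from the comment above and assumed to match the wire values:

    ```python theme={null}
    import asyncio
    from typing import Any, Awaitable, Callable, Dict

    Handler = Callable[[Any], Awaitable[Any]]


    async def _prompt_run(data: Any) -> Dict[str, Any]:
        return {"kind": "prompt-run", "received": data}  # stub


    async def _dataset_run(data: Any) -> Dict[str, Any]:
        return {"kind": "dataset-run", "received": data}  # stub


    async def _get_evals(data: Any) -> Dict[str, Any]:
        # Stub: the real runner lists the evals registered on your client.
        return {"kind": "get-evals", "evals": ["exact_match"]}


    ROUTES: Dict[str, Handler] = {
        "prompt-run": _prompt_run,
        "dataset-run": _dataset_run,
        "get-evals": _get_evals,
    }


    async def toy_dispatch(event: Dict[str, Any]) -> Any:
        """Route one {type, data} event to its handler; reject unknown types."""
        handler = ROUTES.get(event.get("type", ""))
        if handler is None:
            raise ValueError(f"unsupported event type: {event.get('type')!r}")
        return await handler(event.get("data"))
    ```

    Try it with `asyncio.run(toy_dispatch({"type": "get-evals", "data": None}))`. The point of the table-driven shape is that a single entry point covers every event kind, which is why `handler = runner.dispatch` is all the deployed file needs.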

    The pipeline resolves your handler in this order: the `handler` key in `agentmark.json` if set, then `handler.py`, then `handler.ts` at the repository root. See [handler detection](/deploy/deployment#handler-detection).
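
    The detection order fits in a few lines. A minimal sketch as a pure function, assuming the `handler` value is a repo-relative path; `resolve_handler` is hypothetical, and the linked handler-detection page remains the authority:

    ```python theme={null}
    from typing import Any, Mapping, Optional, Set


    def resolve_handler(config: Mapping[str, Any], root_files: Set[str]) -> Optional[str]:
        """Hypothetical helper mirroring the documented handler detection order."""
        # 1. An explicit "handler" key in agentmark.json wins.
        configured = config.get("handler")
        if configured:
            return configured
        # 2. Otherwise handler.py, then 3. handler.ts, at the repository root.
        for name in ("handler.py", "handler.ts"):
            if name in root_files:
                return name
        return None
    ```

    To check a checkout, pass `json.loads(Path("agentmark.json").read_text())` and `{p.name for p in Path(".").iterdir()}`. A repo with both `handler.py` and `handler.ts` and no `handler` key resolves to the Python file.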

    ## Step 5: Deploy

    1. **Connect your repository** in the Dashboard (the app's setup card, or **Deployments**). Every push then triggers the [deployment pipeline](/deploy/deployment): file sync, then a code deploy of your handler to a managed machine.
    2. **Set your provider keys** under **Settings → Environment Variables** (e.g. `OPENAI_API_KEY`). AgentMark Cloud injects `AGENTMARK_API_KEY`, `AGENTMARK_APP_ID`, and `AGENTMARK_BASE_URL` automatically, so your client's Cloud loader is already wired for them.
    3. **Push.** Watch the build under **Deployments**; when it goes green, Run buttons in the playground and experiments go live against your deployed client.

    <Note>
      Deployed **experiments** stream their datasets from your linked repository (at the environment's branch or pinned commit). The repo connection does more than sync: it's how your datasets reach the deployed client at run time.
    </Note>
  </Tab>
</Tabs>

## Connect your SDK

Steps 3 and 4 wire an **executor** into your runner: the one function that calls your SDK. This section is the reference for that function. Most apps that run prompts in their own code never need it (see [Running prompts](/build/running-prompts)); it matters when AgentMark runs a prompt for you, whether locally through `agentmark dev` or in **AgentMark Cloud** via the Dashboard **Run** button and Cloud-driven **experiments**. Copy the setup that matches your SDK into the `dev-entry` and `handler` from the steps above.

### Write an executor

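`createExecutor` takes one handler per output kind: `text` for text prompts, `object` for structured output (optional, as in the text-only setups below), and an optional `streamText` generator (see the Vercel AI SDK setup). Each handler receives `formatted` (the neutral rendered prompt) and a `ctx` argument (in TypeScript it carries the abort `signal`), and returns `{ text | object, usage }`. That's the whole contract; Steps 3 and 4 above cover wiring it into a runner and serving it.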

<CodeGroup>
  ```ts TypeScript theme={null}
  import { createExecutor } from "@agentmark-ai/prompt-core";

  // `bedrock` is a placeholder for your own client wrapper (not defined here),
  // assumed to return `outputText` and usage as { inputTokens, outputTokens }.
  export const executor = createExecutor({
    name: "bedrock-converse",
    // `formatted.text_config.model_name` and `.messages` come typed, no cast.
    text: async (formatted) => {
      const res = await bedrock.converse({ modelId: formatted.text_config.model_name, messages: formatted.messages });
      return { text: res.outputText, usage: res.usage };
    },
    // Object prompts carry their model on `object_config`, not `text_config`.
    object: async (formatted) => {
      const res = await bedrock.converse({ modelId: formatted.object_config.model_name, messages: formatted.messages });
      return { object: JSON.parse(res.outputText), usage: res.usage };
    },
  });
  ```

  ```python Python theme={null}
  from agentmark.prompt_core import create_executor, ExecutorTextResult, ExecutorObjectResult, UsageData
  import json

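  # `bedrock` is a placeholder for your own client wrapper (not defined here);
  # `output_text`, `in_tokens`, and `out_tokens` are illustrative field names.
  # `formatted` is the neutral render: a TextConfigSchema / ObjectConfigSchema
  # Pydantic model, so use attribute access, and model_dump the messages.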
  def _text(formatted, ctx) -> ExecutorTextResult:
      res = bedrock.converse(
          modelId=formatted.text_config.model_name,
          messages=[m.model_dump(exclude_none=True) for m in formatted.messages],
      )
      return ExecutorTextResult(text=res.output_text, usage=UsageData(input_tokens=res.in_tokens, output_tokens=res.out_tokens))

  def _object(formatted, ctx) -> ExecutorObjectResult:
      res = bedrock.converse(
          modelId=formatted.object_config.model_name,
          messages=[m.model_dump(exclude_none=True) for m in formatted.messages],
      )
      return ExecutorObjectResult(object=json.loads(res.output_text), usage=UsageData(input_tokens=res.in_tokens, output_tokens=res.out_tokens))

  executor = create_executor(name="bedrock-converse", text=_text, object=_object)
  ```
</CodeGroup>
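
Every executor returns usage in one canonical shape, but each SDK names its token counts differently. If you support several SDKs, a small normalizer keeps that mapping in one place. A minimal sketch: `canonical_usage` is hypothetical, and the raw field names are the ones OpenAI (`prompt_tokens` / `completion_tokens`), Anthropic (`input_tokens` / `output_tokens`), and the Vercel AI SDK and Bedrock Converse (`inputTokens` / `outputTokens`) report:

```python theme={null}
from typing import Any, Mapping, Tuple

# (input key, output key) pairs, checked in order.
_USAGE_KEYS = (
    ("prompt_tokens", "completion_tokens"),  # OpenAI chat completions
    ("input_tokens", "output_tokens"),       # Anthropic messages
    ("inputTokens", "outputTokens"),         # Vercel AI SDK, Bedrock Converse
)


def canonical_usage(raw: Mapping[str, Any]) -> Tuple[int, int]:
    """Hypothetical helper: map SDK-specific usage fields to (input, output)."""
    for in_key, out_key in _USAGE_KEYS:
        if in_key in raw or out_key in raw:
            # Missing or None counts become 0, like the `?? 0` fallbacks in the OpenAI setup.
            return int(raw.get(in_key) or 0), int(raw.get(out_key) or 0)
    return 0, 0
```

In a Python executor you'd unpack it straight into the result type, for example `i, o = canonical_usage(res.usage.model_dump())` followed by `UsageData(input_tokens=i, output_tokens=o)`.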

### Reference setups

Complete, copy-paste executors for the SDKs teams reach for most. Each calls your SDK directly. Copy the closest one, adjust the model mapping, and you're done. Every one takes the neutral render and returns `{ text | object, usage }`.

#### Vercel AI SDK

Wraps the `ai` package's `generateText` / `streamText` (and `generateObject` for structured output), with both one-shot and streaming text paths:

```ts theme={null}
import { createExecutor } from "@agentmark-ai/prompt-core";
import { generateText, streamText, generateObject, jsonSchema } from "ai";
import { openai } from "@ai-sdk/openai";

// Minimal model resolution: strip an optional "openai/" prefix. Swap in your
// own provider map (anthropic, google, …) keyed off the model name as needed.
const model = (name: string) => openai(name.replace(/^openai\//, ""));

export const executor = createExecutor({
  name: "vercel-ai-sdk",
  text: async (formatted, ctx) => {
    const { text, usage } = await generateText({
      model: model(formatted.text_config.model_name),
      messages: formatted.messages,
      abortSignal: ctx.signal,
    });
    // The executor wants { inputTokens, outputTokens }; the AI SDK reports usage in the same shape.
    return { text, usage: { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens } };
  },
  streamText: async function* (formatted, ctx) {
    const result = streamText({
      model: model(formatted.text_config.model_name),
      messages: formatted.messages,
      abortSignal: ctx.signal,
    });
    // Consume `fullStream`, not `textStream`: a failed model call (bad API
    // key, rate limit, unknown model) surfaces ONLY as an `error` part on
    // fullStream. textStream just ends silently, and `await result.usage`
    // then rejects with the AI SDK's generic "No output generated. Check the
    // stream for errors.", hiding the real cause from the dashboard.
    for await (const part of result.fullStream) {
      if (part.type === "error") throw part.error; // builder emits it as a terminal error event
      if (part.type === "text-delta") yield { type: "text-delta", text: part.text };
      if (part.type === "finish") {
        const usage = part.totalUsage ?? part.usage;
        yield {
          type: "finish",
          reason: part.finishReason ?? "stop",
          usage: { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens },
        };
      }
    }
  },
  object: async (formatted, ctx) => {
    const { object, usage } = await generateObject({
      model: model(formatted.object_config.model_name),
      messages: formatted.messages,
      schema: jsonSchema(formatted.object_config.schema),
      abortSignal: ctx.signal,
    });
    return { object, usage: { inputTokens: usage.inputTokens, outputTokens: usage.outputTokens } };
  },
});
```
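
The streaming handler above yields any number of `text-delta` events followed by one `finish` event that carries the reason and usage. To see why that terminal event matters, here's a Python sketch of a consumer folding such a stream into one result. It's illustrative only: it assumes plain dicts with the same field names the TypeScript handler yields, and AgentMark's runner, not this code, is the real consumer:

```python theme={null}
from typing import Any, Dict, Iterable, Iterator


def collect_stream(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold text-delta events plus one finish event into a single result."""
    chunks = []
    for event in events:
        if event["type"] == "text-delta":
            chunks.append(event["text"])
        elif event["type"] == "finish":
            return {
                "text": "".join(chunks),
                "reason": event.get("reason", "stop"),
                "usage": event["usage"],
            }
    # A stream that ends without a finish event has no usage to report.
    raise RuntimeError("stream ended without a finish event")


def fake_stream() -> Iterator[Dict[str, Any]]:
    # Stand-in for a streaming handler, emitting the same event shapes.
    yield {"type": "text-delta", "text": "The capital of France "}
    yield {"type": "text-delta", "text": "is Paris."}
    yield {"type": "finish", "reason": "stop", "usage": {"inputTokens": 12, "outputTokens": 8}}
```

`collect_stream(fake_stream())` returns the joined text with the finish event's usage. A stream that stops early raises instead of returning partial text with no token counts, which is the same reason the TypeScript handler rethrows `error` parts rather than letting the stream end quietly.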

#### OpenAI (raw SDK)

The official OpenAI SDK's `chat.completions.create`. In TypeScript, cast the neutral messages to OpenAI's `ChatCompletionMessageParam[]`: the shapes line up at runtime, but TypeScript won't infer the match, so the call doesn't type-check without the cast. In Python, `formatted` is a Pydantic model, so `model_dump` the messages:

<CodeGroup>
  ```ts TypeScript theme={null}
  import { createExecutor } from "@agentmark-ai/prompt-core";
  import OpenAI from "openai";

  const openai = new OpenAI();
  const model = (name: string) => name.replace(/^openai\//, "");

  export const executor = createExecutor({
    name: "openai",
    text: async (formatted, ctx) => {
      const res = await openai.chat.completions.create(
        {
          model: model(formatted.text_config.model_name),
          messages: formatted.messages as OpenAI.Chat.ChatCompletionMessageParam[],
        },
        { signal: ctx.signal },
      );
      return {
        text: res.choices[0].message.content ?? "",
        finishReason: res.choices[0].finish_reason,
        usage: { inputTokens: res.usage?.prompt_tokens ?? 0, outputTokens: res.usage?.completion_tokens ?? 0 },
      };
    },
    object: async (formatted, ctx) => {
      const res = await openai.chat.completions.create(
        {
          model: model(formatted.object_config.model_name),
          messages: formatted.messages as OpenAI.Chat.ChatCompletionMessageParam[],
          response_format: {
            type: "json_schema",
            json_schema: { name: "response", schema: formatted.object_config.schema, strict: true },
          },
        },
        { signal: ctx.signal },
      );
      return {
        object: JSON.parse(res.choices[0].message.content ?? "{}"),
        usage: { inputTokens: res.usage?.prompt_tokens ?? 0, outputTokens: res.usage?.completion_tokens ?? 0 },
      };
    },
  });
  ```

  ```python Python theme={null}
  from agentmark.prompt_core import create_executor, ExecutorTextResult, UsageData
  from openai import OpenAI

  sdk = OpenAI()

  def _model(name: str) -> str:
      return name.removeprefix("openai/")

  def _text(formatted, ctx) -> ExecutorTextResult:
      res = sdk.chat.completions.create(
          model=_model(formatted.text_config.model_name),
          messages=[m.model_dump(exclude_none=True) for m in formatted.messages],
      )
      usage = res.usage  # can be None; fall back to 0 like the TypeScript version
      return ExecutorTextResult(
          text=res.choices[0].message.content or "",
          usage=UsageData(
              input_tokens=usage.prompt_tokens if usage else 0,
              output_tokens=usage.completion_tokens if usage else 0,
          ),
      )

  executor = create_executor(name="openai", text=_text)
  ```
</CodeGroup>

#### Anthropic (raw SDK)

This executor calls `messages.create` from `@anthropic-ai/sdk`. Anthropic takes `system` as a top-level field and requires `max_tokens`, so split the system message out of the neutral render:

```ts theme={null}
import { createExecutor } from "@agentmark-ai/prompt-core";
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();
const model = (name: string) => name.replace(/^anthropic\//, "");

export const executor = createExecutor({
  name: "anthropic",
  text: async (formatted, ctx) => {
    const system = formatted.messages.filter((m) => m.role === "system").map((m) => m.content).join("\n");
    const messages = formatted.messages.filter((m) => m.role !== "system");
    const res = await anthropic.messages.create(
      {
        model: model(formatted.text_config.model_name),
        max_tokens: formatted.text_config.max_tokens ?? 1024,
        system: system || undefined, // omit the field when the prompt has no system message
        messages: messages as Anthropic.MessageParam[],
      },
      { signal: ctx.signal },
    );
    const text = res.content.map((b) => (b.type === "text" ? b.text : "")).join("");
    return { text, usage: { inputTokens: res.usage.input_tokens, outputTokens: res.usage.output_tokens } };
  },
});
```

#### Amazon Bedrock (Python)

Bedrock's `invoke_model` wraps the Anthropic request differently: `anthropic_version` goes in the JSON body, `max_tokens` is required, system text still goes in a top-level `system` field, and the model ID is a full cross-region inference profile ID rather than the short alias in the prompt's `model_name`. Map it explicitly.

The runner automatically sets `gen_ai.operation.name = "chat"` and records the config alias as the model on the span, so the Requests view and cost attribution work with no extra code. To show the **full inference profile ID** in the dashboard instead of the alias, override `gen_ai.request.model` on the span after your call (`set_attribute` is last-write-wins):

```python theme={null}
import json
import boto3
from agentmark.prompt_core import create_executor, ExecutorTextResult, UsageData

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Map the short config alias to the full cross-region inference profile ID.
MODEL_MAP: dict[str, str] = {
    "us.anthropic.claude-opus-4-8": "us.anthropic.claude-opus-4-8-20251101-v1:0",
    "us.anthropic.claude-sonnet-4-6": "us.anthropic.claude-sonnet-4-6-20251001-v1:0",
    # add more as needed
}

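def _text(formatted, ctx) -> ExecutorTextResult:
    model_id = MODEL_MAP.get(formatted.text_config.model_name, formatted.text_config.model_name)

    # Like the Anthropic API, Claude on Bedrock takes system text as a top-level field.
    messages = [m.model_dump(exclude_none=True) for m in formatted.messages]
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": formatted.text_config.max_tokens or 1024,
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system:
        body["system"] = system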
    response = bedrock.invoke_model(
        modelId=model_id, body=json.dumps(body),
        contentType="application/json", accept="application/json",
    )
    payload = json.loads(response["body"].read())
    text = "".join(b["text"] for b in payload["content"] if b.get("type") == "text")
    usage = payload.get("usage", {})

    # Show what actually ran (the resolved profile ID), not the config alias.
    span = (ctx.extra or {}).get("span")
    if span:
        span.set_attribute("gen_ai.request.model", model_id)

    return ExecutorTextResult(
        text=text,
        usage=UsageData(input_tokens=usage.get("input_tokens", 0), output_tokens=usage.get("output_tokens", 0)),
    )

executor = create_executor(name="bedrock", text=_text)
```
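The JSON handling in `_text` does not depend on AWS, so it can be pulled into a pure function and unit-tested offline. A sketch, assuming the Anthropic Messages response shape the code above already relies on (`content` blocks plus a `usage` object); `parse_bedrock_anthropic` and the sample body are hypothetical, not captured from a real call:

```python
import json
from typing import Any


def parse_bedrock_anthropic(raw_body: bytes) -> tuple[str, int, int]:
    """Hypothetical helper: extract (text, input_tokens, output_tokens) from an
    Anthropic-on-Bedrock invoke_model response body."""
    payload: dict[str, Any] = json.loads(raw_body)
    # Concatenate text blocks only; tool_use and other block types are skipped.
    text = "".join(b["text"] for b in payload.get("content", []) if b.get("type") == "text")
    usage = payload.get("usage", {})
    # Missing counts default to 0 rather than raising.
    return text, usage.get("input_tokens", 0), usage.get("output_tokens", 0)


# A hand-written sample body in the same shape.
sample = json.dumps({
    "content": [
        {"type": "text", "text": "Hello"},
        {"type": "tool_use", "id": "t1", "name": "lookup", "input": {}},
        {"type": "text", "text": ", world"},
    ],
    "usage": {"input_tokens": 12, "output_tokens": 5},
}).encode()

text, input_tokens, output_tokens = parse_bedrock_anthropic(sample)
# text == "Hello, world", input_tokens == 12, output_tokens == 5
```

Inside `_text`, the three parsing lines after `invoke_model` would then become `text, in_tok, out_tok = parse_bedrock_anthropic(response["body"].read())`.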

#### Agent frameworks (Pydantic AI, Mastra, Claude Agent SDK)

Agent frameworks follow the identical shape; the only difference is that your handler runs an **agent loop** instead of a single completion. Feed the render's messages into your agent, run it, and return its final output plus token usage:

<CodeGroup>
  ```ts TypeScript theme={null}
  import { createExecutor } from "@agentmark-ai/prompt-core";

  export const executor = createExecutor({
    name: "my-agent-framework",
    text: async (formatted, ctx) => {
      // runMyAgent is a placeholder for your framework's call (a Mastra Agent,
      // the Claude Agent SDK's query(), …): pass formatted.messages and honor ctx.signal.
      const result = await runMyAgent(formatted.messages, { signal: ctx.signal });
      return { text: result.output, usage: { inputTokens: result.inputTokens, outputTokens: result.outputTokens } };
    },
  });
  ```

  ```python Python theme={null}
  from agentmark.prompt_core import create_executor, ExecutorTextResult, UsageData

  async def _text(formatted, ctx) -> ExecutorTextResult:
      # run_my_agent is a placeholder for your framework's call (Pydantic AI's
      # Agent.run, the Claude Agent SDK's query(), …): pass it formatted.messages.
      result = await run_my_agent([m.model_dump(exclude_none=True) for m in formatted.messages])
      return ExecutorTextResult(
          text=result.output,
          usage=UsageData(input_tokens=result.input_tokens, output_tokens=result.output_tokens),
      )

  executor = create_executor(name="my-agent-framework", text=_text)
  ```
</CodeGroup>

For token-by-token output and tool-call events, use a streaming handler and yield `text-delta` / `tool-call` / `tool-result` events as the agent emits them. See [Streaming SDKs](#streaming-sdks).
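An agent loop usually makes several model calls, so the usage you return should cover the whole run, not just the final turn. Many frameworks already aggregate this for you; if yours only reports per-step counts, summing them is enough. A minimal sketch, where the `Step` record and `total_usage` helper are hypothetical stand-ins, not any framework's API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical record of one model call inside an agent run."""
    input_tokens: int
    output_tokens: int


def total_usage(steps: list[Step]) -> tuple[int, int]:
    """Sum token counts over every model call in the run."""
    return (
        sum(s.input_tokens for s in steps),
        sum(s.output_tokens for s in steps),
    )


# Three turns: plan, tool call, final answer.
steps = [Step(120, 30), Step(180, 12), Step(210, 64)]
input_tokens, output_tokens = total_usage(steps)
# input_tokens == 510, output_tokens == 106
```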

### Streaming SDKs

If your SDK streams (for example, Bedrock `ConverseStream`), use the streaming handlers instead of buffering. Your handler `yield`s the same content events (`text-delta`, `tool-call`, …) as they arrive, then reports usage and the finish reason on a `finish` event; the builder captures it and emits the single terminal `finish` for you:

<CodeGroup>
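  ```ts TypeScript theme={null}
  import { createExecutor } from "@agentmark-ai/prompt-core";
  import { BedrockRuntime } from "@aws-sdk/client-bedrock-runtime";

  const bedrock = new BedrockRuntime({ region: "us-east-1" });

  const executor = createExecutor({
    name: "bedrock-converse-stream",
    streamText: async function* (formatted, ctx) {
      // Converse takes content blocks; system prompts go in a separate field.
      const system = formatted.messages.filter((m) => m.role === "system").map((m) => ({ text: String(m.content) }));
      const messages = formatted.messages.filter((m) => m.role !== "system")
        .map((m) => ({ role: m.role as "user" | "assistant", content: [{ text: String(m.content) }] }));
      const modelId = formatted.text_config.model_name; // map to a Bedrock ID first (see below)
      const res = await bedrock.converseStream({ modelId, messages, system: system.length ? system : undefined }, { abortSignal: ctx.signal });
      let reason = "stop";
      for await (const event of res.stream ?? []) {
        if (event.contentBlockDelta?.delta?.text) yield { type: "text-delta", text: event.contentBlockDelta.delta.text };
        else if (event.messageStop) reason = event.messageStop.stopReason ?? reason; // arrives before usage
        // Report usage + provider stop reason on a finish event: the builder emits the one terminal finish.
        else if (event.metadata?.usage) yield { type: "finish", reason, usage: { inputTokens: event.metadata.usage.inputTokens ?? 0, outputTokens: event.metadata.usage.outputTokens ?? 0 } };
      }
    },
  });
  ```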

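  ```python Python theme={null}
  import boto3
  from agentmark.prompt_core import create_executor, TextDeltaEvent, FinishEvent, UsageData

  bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

  async def _stream_text(formatted, ctx):
      msgs = [m.model_dump(exclude_none=True) for m in formatted.messages]
      # Converse takes content blocks; system prompts go in a separate field.
      system = [{"text": m["content"]} for m in msgs if m["role"] == "system"]
      response = bedrock.converse_stream(
          modelId=formatted.text_config.model_name,  # map to a Bedrock ID first (see below)
          messages=[{"role": m["role"], "content": [{"text": m["content"]}]} for m in msgs if m["role"] != "system"],
          **({"system": system} if system else {}),
      )
      reason = "stop"
      # boto3 returns a blocking iterator of one-key event dicts.
      for event in response["stream"]:
          if "contentBlockDelta" in event:
              text = event["contentBlockDelta"]["delta"].get("text")
              if text:
                  yield TextDeltaEvent(text=text)
          elif "messageStop" in event:
              reason = event["messageStop"]["stopReason"]  # arrives before usage
          elif "metadata" in event:
              # Report usage + provider stop reason on a FinishEvent: the builder
              # captures it and emits the one terminal finish.
              usage = event["metadata"]["usage"]
              yield FinishEvent(
                  reason=reason,
                  usage=UsageData(input_tokens=usage["inputTokens"], output_tokens=usage["outputTokens"]),
              )

  executor = create_executor(name="bedrock-converse-stream", stream_text=_stream_text)
  ```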
</CodeGroup>
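When unit-testing a streaming handler, it helps to fold its events back into what a one-shot handler would return, so you can compare the two paths on the same input. A sketch using plain dicts as stand-ins for the event objects; `collect` is a hypothetical test helper, not an AgentMark API:

```python
from typing import Any


def collect(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical test helper: fold streamed events into a one-shot-style result.

    Events are plain dicts mirroring the shapes above:
    {"type": "text-delta", "text": ...} and
    {"type": "finish", "reason": ..., "usage": {...}}.
    """
    parts: list[str] = []
    finish: dict[str, Any] | None = None
    for event in events:
        if event["type"] == "text-delta":
            parts.append(event["text"])
        elif event["type"] == "finish":
            finish = event  # keep the reported reason and usage
        # Other event types (tool-call, tool-result, ...) are ignored in this sketch.
    if finish is None:
        raise ValueError("stream ended without a finish event")
    return {"text": "".join(parts), "reason": finish["reason"], "usage": finish["usage"]}


result = collect([
    {"type": "text-delta", "text": "Hel"},
    {"type": "text-delta", "text": "lo"},
    {"type": "finish", "reason": "end_turn", "usage": {"input_tokens": 9, "output_tokens": 2}},
])
# result == {"text": "Hello", "reason": "end_turn", "usage": {"input_tokens": 9, "output_tokens": 2}}
```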

<Note>
  Streaming **object** handlers yield `object-delta` / `object-final` events (`ObjectDeltaEvent` / `ObjectFinalEvent` in Python) and a `finish` carrying usage. If your SDK only streams cumulative partials (no explicit final), the builder uses the last delta as the resolved value, so AgentMark Cloud always receives a complete object.
</Note>

### Validate your executor

Run the conformance suite. One call confirms your executor emits a protocol-correct stream for every kind, streaming and one-shot, including the error path. `errorInput` is a malformed render your handler rejects before any network call:

<CodeGroup>
  ```ts TypeScript theme={null}
  import { runExecutorConformance } from "@agentmark-ai/prompt-core";

  await runExecutorConformance(executor, {
    text: { messages: [{ role: "user", content: "hello" }], text_config: { model_name: "openai/gpt-4o" } },
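    object: {
      messages: [{ role: "user", content: "give me JSON" }],
      object_config: {
        model_name: "openai/gpt-4o",
        // OpenAI strict mode requires every property in `required` and no extra properties.
        schema: { type: "object", properties: { answer: { type: "string" } }, required: ["answer"], additionalProperties: false },
      },
    },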
    errorInput: { messages: null },
  });
  ```

  ```python Python theme={null}
  import asyncio
  from agentmark.prompt_core import (
      run_executor_conformance,
      TextConfigSchema, TextSettingsSchema, ObjectConfigSchema, ObjectSettingsSchema,
  )

  asyncio.run(run_executor_conformance(
      executor,
      text=TextConfigSchema(
          name="conformance",
          messages=[{"role": "user", "content": "hello"}],
          text_config=TextSettingsSchema(model_name="openai/gpt-4o"),
      ),
      object=ObjectConfigSchema(
          name="conformance",
          messages=[{"role": "user", "content": "give me JSON"}],
          object_config=ObjectSettingsSchema(
              model_name="openai/gpt-4o",
              schema={"type": "object", "properties": {"answer": {"type": "string"}}},
          ),
      ),
      error_input={"messages": None},  # a malformed render your handler rejects
  ))
  ```
</CodeGroup>

Unless you pin `ctx`, the suite runs your executor **twice**: once streaming and once one-shot. If you supply both a one-shot and a streaming handler, both branches are validated, so a broken one-shot path can't hide behind a working stream.
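The error-path check relies on your handler rejecting `errorInput` before any network call. A sketch of such a guard, shown over a plain dict like the `{"messages": None}` input above; in a real handler you would apply the same checks to `formatted.messages`. The `check_render` name and the exact checks are illustrative assumptions, not what the suite mandates:

```python
from typing import Any


def check_render(render: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical guard: raise on a malformed render before any network call."""
    messages = render.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("render has no messages")
    for i, message in enumerate(messages):
        # Every message needs at least a role and some content.
        if not isinstance(message, dict) or "role" not in message or "content" not in message:
            raise ValueError(f"message {i} is missing role or content")
    return messages
```

Calling `check_render({"messages": None})` raises `ValueError` immediately, which is the behavior the error-path input is meant to exercise.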

<Note>
  Provider-specific **parameter** mapping (tool wiring, custom settings, full request control) also lives in your executor: its handlers receive the neutral render and build the exact request your SDK expects. See the [resolve-by-name tools pattern](/build/tools-and-agents) for wiring frontmatter tool names to implementations.
</Note>

### Model names vs provider model IDs

`formatted.text_config.model_name` is the prompt's `model_name` verbatim: a registry ID in `provider/model` form. Your executor owns the translation to whatever ID your SDK expects. Two common shapes:

**Strip the provider prefix** when the registry ID *is* your SDK's model ID, the usual case (`openai/gpt-4o` → `gpt-4o`):

<CodeGroup>
  ```ts TypeScript theme={null}
  const modelId = formatted.text_config.model_name.replace(/^[^/]+\//, "");
  ```

  ```python Python theme={null}
  model_id = formatted.text_config.model_name.split("/", 1)[-1]
  ```
</CodeGroup>
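Splitting on the first `/` matters for two edge cases: registry IDs whose model part itself contains a slash, and names with no provider prefix at all. The TypeScript regex behaves the same way, since `[^/]+` stops at the first slash and there is no `g` flag. A quick check of both cases (the model names are made-up examples):

```python
def strip_provider(model_name: str) -> str:
    """Drop everything up to and including the first "/"; leave unprefixed names as-is."""
    return model_name.split("/", 1)[-1]


assert strip_provider("openai/gpt-4o") == "gpt-4o"
# Only the first segment is removed, so nested IDs keep their inner slash.
assert strip_provider("openrouter/meta-llama/llama-3-70b") == "meta-llama/llama-3-70b"
# No prefix: the name passes through unchanged.
assert strip_provider("gpt-4o") == "gpt-4o"
```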

**Map names explicitly** when your prompts declare one provider's names but your executor calls another (for example, prompts on `anthropic/claude-sonnet-4-6`, production on Bedrock). Keep the dict in the executor so it's versioned with the code, and fail loudly on unmapped names instead of passing them through; otherwise an unmapped name surfaces as a confusing provider-side 404:

<CodeGroup>
  ```ts TypeScript theme={null}
  const BEDROCK_IDS: Record<string, string> = {
    "anthropic/claude-opus-4-6": "global.anthropic.claude-opus-4-6-v1",
    "anthropic/claude-sonnet-4-6": "global.anthropic.claude-sonnet-4-6",
    "anthropic/claude-haiku-4-5": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
  };
  const modelId = BEDROCK_IDS[formatted.text_config.model_name];
  if (!modelId) throw new Error(`No Bedrock mapping for ${formatted.text_config.model_name}`);
  ```

  ```python Python theme={null}
  BEDROCK_IDS = {
      "anthropic/claude-opus-4-6": "global.anthropic.claude-opus-4-6-v1",
      "anthropic/claude-sonnet-4-6": "global.anthropic.claude-sonnet-4-6",
      "anthropic/claude-haiku-4-5": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
  }
  model_id = BEDROCK_IDS.get(formatted.text_config.model_name)
  if model_id is None:
      raise ValueError(f"No Bedrock mapping for {formatted.text_config.model_name}")
  ```
</CodeGroup>

Either way, declare the names your prompts use in `builtInModels` (a non-empty list is an allowlist). `agentmark pull-models --provider bedrock` lists the registry's Bedrock IDs.
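Because a missing mapping only surfaces when a prompt using that name runs, it can be worth checking the table against your declared models at startup or in CI. A sketch, assuming `builtInModels` is a top-level list of model-name strings in `agentmark.json` (check your file's actual layout); `unmapped_models` and `check_agentmark_json` are hypothetical helpers:

```python
import json
from collections.abc import Iterable
from pathlib import Path

# Same table as the executor above (import it from there in real code).
BEDROCK_IDS = {
    "anthropic/claude-opus-4-6": "global.anthropic.claude-opus-4-6-v1",
    "anthropic/claude-sonnet-4-6": "global.anthropic.claude-sonnet-4-6",
    "anthropic/claude-haiku-4-5": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
}


def unmapped_models(declared: Iterable[str], table: dict[str, str]) -> list[str]:
    """Return declared model names that have no entry in the mapping table."""
    return sorted(name for name in set(declared) if name not in table)


def check_agentmark_json(path: Path = Path("agentmark.json")) -> None:
    """Fail fast if agentmark.json declares a model the executor can't map."""
    config = json.loads(path.read_text())
    # Assumption: builtInModels is a top-level list of model-name strings.
    missing = unmapped_models(config.get("builtInModels", []), BEDROCK_IDS)
    if missing:
        raise SystemExit(f"No Bedrock mapping for: {', '.join(missing)}")
```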

## Let your agent set it up

The [AgentMark skill](/coding-agents/agent-skill) gives your AI tool (Claude Code, Cursor, etc.) a setup workflow that scaffolds everything on this page (the client file, the dev entry, and the handler) matched to your stack's language and SDK. Prompt it with:

```text theme={null}
Set up AgentMark in this project, including the client and a deployable handler.
```

The agent verifies its work the same way you would: `agentmark run-prompt` against the dev server.

## Troubleshooting

| Symptom                                    | Cause                                                                 | Fix                                                                                                                 |
| ------------------------------------------ | --------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| `No dev server entry point found`          | Missing dev entry                                                     | Create `dev-entry.ts` at the project root (TS) or `.agentmark/dev_server.py` (Python); see Step 3                   |
| Local experiment fails `Not authorized`    | `AGENTMARK_API_KEY` is present, so the client picked the Cloud loader | Unset `AGENTMARK_API_KEY` (check `.env`) when running against `agentmark dev`, or set a valid Cloud key; see Step 2 |
| Run buttons disabled in the Dashboard      | No deployment for the selected environment                            | Deploy (Step 5)                                                                                                     |
| Deployed run fails `Authentication failed` | Stale deployment credentials                                          | Trigger a **Rebuild** under Deployments                                                                             |
| Deployed experiment finds no dataset       | Dataset isn't in the linked repo branch                               | Commit the `.jsonl` under `agentmark/` and push                                                                     |

<div className="mt-8 rounded-lg bg-blue-50 p-6 dark:bg-blue-900/30">
  <h3 className="font-semibold mb-3">Have questions?</h3>
  <p className="mb-4">Reach out any time:</p>

  <ul>
    <li>
      Email the team at <a href="mailto:hello@agentmark.co" className="text-blue-600 hover:text-blue-800 dark:text-blue-400 dark:hover:text-blue-200">[hello@agentmark.co](mailto:hello@agentmark.co)</a> for support
    </li>

    <li>
      Schedule an <a href="https://cal.com/ryan-randall/enterprise" className="text-blue-600 hover:text-blue-800 dark:text-blue-400 dark:hover:text-blue-200">Enterprise Demo</a> to learn about AgentMark's business solutions
    </li>
  </ul>
</div>
