Already calling an LLM directly with the raw AWS Bedrock ConverseCommand, the OpenAI SDK, or a hand-rolled HTTP client? You don’t need to switch SDKs or write a full adapter to adopt AgentMark. Pick the path that matches what you want.

Observe & evaluate (lightweight)

Keep your SDK calls. Add prompt management, tracing, and experiments in your own app. No Executor, no adapter.

Cloud-executed prompts

Let the Dashboard “Run” your prompts and run managed experiments. Add a small Executor so AgentMark can call your SDK.

Path A: observe and evaluate in your own app

This covers the common migration: your code keeps calling Bedrock; AgentMark manages the prompts, traces the calls, and runs experiments. Everything runs in your process and reports to AgentMark Cloud. Zero adapter, zero executor.

1. Prompt management (neutral render)

Use the neutral render. createAgentMark (from @agentmark-ai/prompt-core) renders a .prompt.mdx to AgentMark’s neutral { messages, text_config } shape, which you pass straight to your SDK, and it’s fully type-safe.
npm install @agentmark-ai/prompt-core
import { createAgentMark } from "@agentmark-ai/prompt-core";
import { FileLoader } from "@agentmark-ai/prompt-core/loader-file";
// import { ApiLoader } from "@agentmark-ai/prompt-core/loader-api"; // ← Cloud-managed prompts instead

const client = createAgentMark<AgentmarkTypes>({ // AgentmarkTypes from: npx @agentmark-ai/cli generate-types
  // FileLoader reads the output of `npx @agentmark-ai/cli build` (pre-compiled JSON),
  // not your source prompt directory.
  loader: new FileLoader("./dist/agentmark"),
});

const prompt = await client.loadTextPrompt("support.prompt.mdx");
const { messages, text_config } = await prompt.format({ props: { question } });

// Your existing Bedrock call — unchanged.
const res = await bedrock.converse({ modelId: text_config.model_name, messages });
What the neutral render returns: model configuration (name, temperature, max tokens, and other frontmatter parameters), the rendered system/user/assistant messages, tool names referenced in frontmatter, and, for object prompts, the output schema. Map it to any provider’s API format (OpenAI, Anthropic, Google, Bedrock, Ollama, …). It’s also useful for inspecting the raw config AgentMark produces.
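If you call the raw Bedrock Converse API rather than a wrapper, the mapping step might look like the sketch below. It assumes rendered messages are { role, content } pairs with string content, and that text_config exposes temperature and max_tokens under those names; both field names are assumptions for illustration, and toConverseRequest is a hypothetical helper, not part of AgentMark. The request fields (system, content blocks, inferenceConfig.maxTokens) follow the Converse request shape.

```typescript
// Neutral render as used in this guide. `model_name` and the { role, content }
// messages appear in the examples above; `temperature` and `max_tokens` are
// ASSUMED field names, used here only for illustration.
interface NeutralMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface NeutralTextRender {
  messages: NeutralMessage[];
  text_config: { model_name: string; temperature?: number; max_tokens?: number };
}

// The subset of the Bedrock Converse request body this sketch fills in.
interface ConverseRequest {
  modelId: string;
  system?: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
  inferenceConfig?: { temperature?: number; maxTokens?: number };
}

// Hypothetical helper: Converse takes system prompts in a separate `system`
// field and wraps each message's text in a content block.
function toConverseRequest(render: NeutralTextRender, modelId: string): ConverseRequest {
  const system: { text: string }[] = [];
  const messages: ConverseRequest["messages"] = [];

  for (const message of render.messages) {
    if (message.role === "system") {
      system.push({ text: message.content });
    } else {
      messages.push({ role: message.role, content: [{ text: message.content }] });
    }
  }

  const request: ConverseRequest = { modelId, messages };
  if (system.length > 0) request.system = system;

  // Only send inference settings the prompt actually declared.
  const { temperature, max_tokens } = render.text_config;
  const inferenceConfig: { temperature?: number; maxTokens?: number } = {};
  if (temperature !== undefined) inferenceConfig.temperature = temperature;
  if (max_tokens !== undefined) inferenceConfig.maxTokens = max_tokens;
  if (Object.keys(inferenceConfig).length > 0) request.inferenceConfig = inferenceConfig;

  return request;
}
```

The model ID is passed in separately so that the name translation covered under "Model names vs provider model IDs" stays in one place.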

2. Tracing

Initialize once, then wrap any call. AgentMark’s tracing is SDK-agnostic.
import { AgentMarkSDK, observe } from "@agentmark-ai/sdk";

const sdk = new AgentMarkSDK({ apiKey: process.env.AGENTMARK_API_KEY!, appId: process.env.AGENTMARK_APP_ID! });
sdk.initTracing();

const tracedConverse = observe((messages) => bedrock.converse({ modelId, messages }), {
  name: "bedrock.converse",
});
await tracedConverse(messages); // → shows up in the AgentMark Dashboard

3. Experiments + evals

runExperiment takes your function as the task. It drives the dataset, traces each row, runs evaluators, and applies the regression gate.
const result = await sdk.runExperiment({
  experimentKey: "support-quality",
  dataset: [{ input: { question: "How do I get a refund?" }, expectedOutput: "refund" }],
  task: async (input) => {
    const { messages } = await (await client.loadTextPrompt("support.prompt.mdx")).format({ props: input });
    return (await bedrock.converse({ modelId, messages })).outputText;
  },
  evaluators: [{ name: "mentions_topic", evaluate: ({ output, expectedOutput }) => ({ score: output.includes(expectedOutput) ? 1 : 0 }) }],
  scoreThresholds: { mentions_topic: 1 },
});
if (!result.passed) process.exit(1); // gate in CI
That’s the full loop, connected to AgentMark Cloud: login, Cloud-managed prompts, traces, and experiment results in the Dashboard, with no AgentMark-specific model code.

Path B: let AgentMark Cloud run your prompts

To use the Dashboard Run prompt button, agentmark dev’s webhook, or Cloud-driven experiments, AgentMark needs to call your SDK, so you provide an Executor. The createExecutor builder takes your one-shot handlers (text and object) and guarantees the AgentEvent wire protocol for you.
import { createAgentMark, createExecutor } from "@agentmark-ai/prompt-core";
import { FileLoader } from "@agentmark-ai/prompt-core/loader-file";
import { createWebhookRunner } from "@agentmark-ai/sdk";

const executor = createExecutor({
  name: "bedrock-converse",
  // `formatted` is the neutral rendered prompt, typed as `TextConfig` /
  // `ObjectConfig` — `formatted.text_config.model_name` and `.messages` are
  // typed, no cast. (Pairing with a custom adapter? `createExecutor<MyText,
  // MyObject>({...})` retypes `formatted` to your adapter's output shape.)
  text: async (formatted) => {
    const res = await bedrock.converse({ modelId: formatted.text_config.model_name, messages: formatted.messages });
    return { text: res.outputText, usage: res.usage };
  },
  object: async (formatted) => {
    const res = await bedrock.converse({ modelId: formatted.text_config.model_name, messages: formatted.messages });
    return { object: JSON.parse(res.outputText), usage: res.usage };
  },
});

// Register the loader AND your evals once, on the client — the runner sources
// both from it. Evals registered here both RUN in experiments and LIST in the
// Dashboard's New Experiment dialog (the get-evals control-plane job). Omit
// them and that dialog is silently empty — there's no other place a custom
// app's evals come from.
const client = createAgentMark({
  // FileLoader reads the output of `npx @agentmark-ai/cli build`, not your source dir.
  loader: new FileLoader("./dist/agentmark"),
  evals: {
    mentions_topic: ({ output, expectedOutput }) => ({
      score: output.includes(expectedOutput) ? 1 : 0,
    }),
  },
});

// One call wires your client + executor + tracing into a runner the CLI
// dev-server / gateway dispatches to.
const runner = createWebhookRunner({ client, executor });
Same primitives in Python: from agentmark.prompt_core import create_agentmark, create_executor, create_webhook_runner. Build the runner the same way (client = create_agentmark(loader=loader, evals=my_evals), then runner = create_webhook_runner(client, executor)). The deployed managed handler is one line: handler = runner.dispatch, which routes prompt-run / dataset-run / get-evals and sources evals from the runner’s client. Pass text= / object= (or stream_text= / stream_object=) handlers to create_executor, and validate with run_executor_conformance. Streaming handlers yield the same stream events in both languages (TS as object literals like { type: "text-delta", text }, Python as dataclasses like TextDeltaEvent(text=...)) and report usage on a yielded finish (FinishEvent(reason=..., usage=...)); the builder folds that usage onto the single terminal finish.

Serve the runner

createWebhookRunner gives you a runner with runPrompt / runExperiment methods. AgentMark Cloud reaches it over HTTP, so you mount it one of two ways.

Local development: agentmark dev and the Dashboard talk to a local webhook server:
import { createWebhookServer } from "@agentmark-ai/cli/runner-server";

// A WebhookRunner already satisfies the handler contract.
await createWebhookServer({ handler: runner, port: 9417 });
Managed deployment: your deployed app exposes a single handler(body) entry, and that’s just the runner’s dispatch, which routes the { type, data } jobs the gateway sends (prompt-run, dataset-run, and the control-plane get-evals), sourcing evals from the runner’s client:
// handler.ts (compiled to handler.mjs for the managed deploy)
// `runner` is the WebhookRunner built above — keep it in the same module.
// dispatch lives on the runner (from prompt-core), so the deployed app depends
// only on prompt-core + your SDK — not the CLI's dev-server-only tree.
import { AgentMarkSDK } from "@agentmark-ai/sdk";

// Initialize tracing once, at module load. `createWebhookRunner` already wired
// the span *hooks*, but spans only export once the provider is initialized —
// without this, Dashboard-dispatched runs and experiments produce no traces.
// registerGlobally: true also captures your SDK's own model span if it emits
// through the global OTel tracer. See /observe/tracing-setup.
new AgentMarkSDK({
  apiKey: process.env.AGENTMARK_API_KEY!,
  appId: process.env.AGENTMARK_APP_ID!,
  baseUrl: process.env.AGENTMARK_BASE_URL,
}).initTracing({ registerGlobally: true });

export default (body) => runner.dispatch(body);
That’s the whole Path B: createAgentMark → createExecutor → createWebhookRunner → serve. The Dashboard can now Run your prompts and drive managed experiments against your SDK, with traces flowing for every Dashboard-dispatched run.

Streaming SDKs

If your SDK streams (e.g. Bedrock ConverseStream), use the streaming handlers instead of buffering, with the same protocol guarantees. Streaming handlers yield the same stream events the rest of the protocol uses (text-delta, tool-call, …) and report usage plus the finish reason on a finish event you yield; the builder emits the single terminal finish for you:
const executor = createExecutor({
  name: "bedrock-converse-stream",
  streamText: async function* (formatted) {
    for await (const chunk of bedrock.converseStream({ modelId: formatted.text_config.model_name, messages: formatted.messages })) {
      if (chunk.delta) yield { type: "text-delta", text: chunk.delta };
      // Report usage + provider stop reason on a finish event — the builder
      // captures it and emits the one terminal finish.
      else if (chunk.usage) yield { type: "finish", reason: chunk.stopReason ?? "stop", usage: chunk.usage };
    }
  },
});
Streaming object handlers yield object-delta / object-final events and a finish (carrying usage). If your SDK only streams cumulative partials (no explicit object-final), the builder uses the last delta as the resolved value, so AgentMark Cloud always receives a complete object.
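To make the fallback rule concrete, here is a minimal sketch of how a consumer could resolve the final object from a finished stream. It is an illustration of the rule described above, not the builder's actual implementation. The event type names come from this guide; the object payload field, the Usage shape, and the resolveObjectStream helper are assumptions.

```typescript
// Hypothetical event shapes. The `type` names come from the text above; the
// `object` payload field and the Usage shape are ASSUMED for this sketch.
type Usage = Record<string, number>;

type ObjectStreamEvent =
  | { type: "object-delta"; object: unknown }
  | { type: "object-final"; object: unknown }
  | { type: "finish"; reason: string; usage?: Usage };

interface ResolvedObject {
  object: unknown;
  reason: string;
  usage: Usage | undefined;
}

// Prefer an explicit object-final; otherwise fall back to the last cumulative
// object-delta, so the caller always receives a complete object.
function resolveObjectStream(events: ObjectStreamEvent[]): ResolvedObject {
  let lastDelta: unknown;
  let finalObject: unknown;
  let sawDelta = false;
  let sawFinal = false;
  let reason = "stop";
  let usage: Usage | undefined;

  for (const event of events) {
    switch (event.type) {
      case "object-delta":
        lastDelta = event.object;
        sawDelta = true;
        break;
      case "object-final":
        finalObject = event.object;
        sawFinal = true;
        break;
      case "finish":
        reason = event.reason;
        usage = event.usage ?? usage;
        break;
    }
  }

  if (!sawFinal && !sawDelta) {
    throw new Error("stream ended without any object event");
  }
  return { object: sawFinal ? finalObject : lastDelta, reason, usage };
}
```

The fallback only works when deltas are cumulative. An SDK that streams incremental patches would need to merge them in the handler before yielding.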

Validate your executor

Run the conformance suite. One call confirms your executor emits a protocol-correct stream for every kind, including the error path:
import { runExecutorConformance } from "@agentmark-ai/prompt-core";

await runExecutorConformance(executor, {
  text: { messages: [{ role: "user", content: "hello" }], text_config: { model_name: "..." } },
  object: { messages: [{ role: "user", content: "give me JSON" }], object_config: { model_name: "..." } },
  // A payload your handler rejects before any network call — e.g. messages your
  // SDK validates as malformed — so the terminal-error path is exercised.
  errorInput: { messages: null },
});
Unless you pin ctx, the suite runs your executor twice, once streaming and once one-shot, so if you supply both a text and a streamText handler, both branches are validated (a broken one-shot path won’t hide behind a working stream).
Provider-specific parameter mapping (tool wiring, custom settings, full request control) also lives in your executor: its handlers receive the neutral render and build the exact request your SDK expects. See the resolve-by-name tools pattern for wiring frontmatter tool names to implementations.

Model names vs provider model IDs

formatted.text_config.model_name is the prompt’s model_name verbatim: a registry ID in provider/model form (what pull-models writes to builtInModels). Your executor owns the translation to whatever ID your SDK expects. Two common shapes:

Strip the provider prefix when the registry ID is your SDK’s model ID. This is the usual case; for example openai/gpt-4o → gpt-4o for the OpenAI SDK, or bedrock/global.anthropic.claude-opus-4-6-v1 → global.anthropic.claude-opus-4-6-v1 for Bedrock (the registry’s bedrock provider lists real Bedrock model IDs, including the global./us./eu. cross-region inference profiles):
const modelId = formatted.text_config.model_name.replace(/^[^/]+\//, "");
Map names explicitly when your prompts declare one provider’s names but your executor calls another, for example prompts standardized on anthropic/claude-sonnet-4-6 while production runs on Bedrock. Keep the lookup table in the executor so the mapping is versioned with the code that uses it, and fail loudly on unmapped names instead of passing them through (otherwise an unmapped name surfaces as a confusing provider-side 404):
const BEDROCK_IDS: Record<string, string> = {
  "anthropic/claude-opus-4-6": "global.anthropic.claude-opus-4-6-v1",
  "anthropic/claude-sonnet-4-6": "global.anthropic.claude-sonnet-4-6",
  "anthropic/claude-haiku-4-5": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
};
const modelId = BEDROCK_IDS[formatted.text_config.model_name];
if (!modelId) throw new Error(`No Bedrock mapping for ${formatted.text_config.model_name}`);
Either way, declare the names your prompts actually use in builtInModels (a non-empty list is an allowlist). npx @agentmark-ai/cli pull-models --provider bedrock lists the registry’s Bedrock IDs.
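If your prompts mix both shapes, the two approaches can sit behind one function. The sketch below is one possible arrangement, not an AgentMark API: resolveBedrockModelId and EXPLICIT_BEDROCK_IDS are hypothetical names, and the single mapping entry is copied from the table above. It checks the explicit map first, strips the prefix only for names already in the bedrock/ namespace, and throws for anything else.

```typescript
// Hypothetical lookup table; the entry is copied from the mapping example above.
const EXPLICIT_BEDROCK_IDS: ReadonlyMap<string, string> = new Map([
  ["anthropic/claude-opus-4-6", "global.anthropic.claude-opus-4-6-v1"],
]);

// Hypothetical helper combining both shapes:
//  1. an explicit mapping wins;
//  2. a `bedrock/...` registry ID has its provider prefix stripped;
//  3. anything else fails loudly instead of reaching the provider.
function resolveBedrockModelId(
  modelName: string,
  explicit: ReadonlyMap<string, string>,
): string {
  const mapped = explicit.get(modelName);
  if (mapped !== undefined) return mapped;

  const slash = modelName.indexOf("/");
  if (slash > 0 && modelName.slice(0, slash) === "bedrock") {
    return modelName.slice(slash + 1);
  }

  throw new Error(`No Bedrock mapping for ${modelName}`);
}
```

Restricting the prefix strip to one provider is a deliberate choice in this sketch: a blanket strip would turn openai/gpt-4o into gpt-4o and send it to Bedrock, which is the silent pass-through the text above warns against.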

Migrating from the adapter packages

The SDK-specific adapter packages (@agentmark-ai/ai-sdk-v4-adapter, @agentmark-ai/ai-sdk-v5-adapter, @agentmark-ai/mastra-v0-adapter, agentmark-pydantic-ai-v0, and the Claude Agent SDK adapters) are removed. Everything they did is covered by the two paths above. Here’s the map:

TypeScript. Whatever the adapter package exported your client as, the replacement is one import:
import { createAgentMark } from "@agentmark-ai/prompt-core";
If you used an adapter’s model registry + format() to call your SDK, switch to the neutral render (Path A above): format() gives you { messages, ...config }, and you make the SDK call yourself.

If you used an adapter’s webhook handler (VercelAdapterWebhookHandler et al.) for agentmark dev or managed runs, replace it with Path B: createExecutor (your SDK call in a one-shot handler) → createWebhookRunner → serve runner.dispatch.

Python. The Python adapter packages have no shim; replace them with agentmark-prompt-core directly:
from agentmark.prompt_core import create_agentmark, create_executor, create_webhook_runner, ExecutorTextResult, UsageData

def _text(formatted, ctx) -> ExecutorTextResult:
    # formatted is a TextConfigSchema Pydantic model — attribute access, not subscripts
    res = my_sdk_call(
        model=formatted.text_config.model_name,
        messages=[m.model_dump(exclude_none=True) for m in formatted.messages],
    )
    return ExecutorTextResult(text=res.text, usage=UsageData(input_tokens=res.in_tokens, output_tokens=res.out_tokens))

executor = create_executor(name="my-sdk", text=_text)
client = create_agentmark(loader=loader)  # register evals here too: create_agentmark(loader=loader, evals=my_evals)
runner = create_webhook_runner(client, executor)
handler = runner.dispatch  # managed deploys serve this for you
For local dev, .agentmark/dev_server.py serves the runner over HTTP with serve_webhook_runner(runner), the Python counterpart of the TS createWebhookServer (see the Python dev server reference and client setup for a complete file). Tracing still needs an explicit AgentMarkSDK(...).init_tracing(...) call at startup: the runner wires span hooks, but spans only export once tracing is initialized. Working SDK-specific handler bodies (OpenAI, Anthropic, Bedrock, agent frameworks) live in reference executors.
