Bring your own SDK - AgentMark Docs

Already calling an LLM directly — the raw AWS Bedrock ConverseCommand, the OpenAI SDK, a hand-rolled HTTP client? You don’t need to switch SDKs or write a full adapter to adopt AgentMark. Pick the path that matches what you want.

Observe & evaluate (lightweight)

Keep your SDK calls. Add prompt management, tracing, and experiments in your own app. No Executor, no adapter.

Cloud-executed prompts

Let the dashboard “Run” your prompts and run managed experiments. Add a small Executor so AgentMark can call your SDK.

Path A — observe & evaluate in your own app

This covers the common migration: your code keeps calling Bedrock; AgentMark manages the prompts, traces the calls, and runs experiments. Everything runs in your process and reports to the cloud. Zero adapter, zero executor.

1. Prompt management (neutral render)

Use the Default integration — it renders a .prompt.mdx to AgentMark’s neutral { messages, text_config } shape, which you pass straight to your SDK. It’s fully type-safe.

import { createAgentMarkClient } from "@agentmark-ai/fallback-adapter";
import { FileLoader } from "@agentmark-ai/loader-file";
// import { ApiLoader } from "@agentmark-ai/sdk"; // ← cloud-managed prompts instead

const client = createAgentMarkClient<AgentmarkTypes>({
  loader: new FileLoader(process.cwd()),
});

const prompt = await client.loadTextPrompt("support.prompt.mdx");
const { messages, text_config } = await prompt.format({ props: { question } });

// Your existing Bedrock call — unchanged.
const res = await bedrock.converse({ modelId: text_config.model_name, messages });
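
If your SDK expects a different message layout than the neutral render, a small mapping function bridges the two. The sketch below assumes the neutral messages are `{ role, content }` objects with string content (an assumption; check the rendered output in your project). It maps them to the layout the Bedrock Converse API takes: system prompts in a separate `system` array, and each message's content as a list of `{ text }` blocks. `toConverseInput` and `NeutralMessage` are hypothetical names, not part of AgentMark.

```typescript
// Assumed neutral shape: a role plus string content. Verify against your render.
type NeutralMessage = { role: "system" | "user" | "assistant"; content: string };

// Subset of the Bedrock Converse request layout used in this sketch.
type ConverseInput = {
  modelId: string;
  system: { text: string }[];
  messages: { role: "user" | "assistant"; content: { text: string }[] }[];
};

// Hypothetical helper: split system prompts out and wrap text in content blocks.
function toConverseInput(modelId: string, neutral: NeutralMessage[]): ConverseInput {
  const input: ConverseInput = { modelId, system: [], messages: [] };
  for (const m of neutral) {
    if (m.role === "system") {
      input.system.push({ text: m.content });
    } else {
      input.messages.push({ role: m.role, content: [{ text: m.content }] });
    }
  }
  return input;
}

const converseInput = toConverseInput("example-model-id", [
  { role: "system", content: "You are a support agent." },
  { role: "user", content: "How do I get a refund?" },
]);
```

Keeping the mapping in one function means the prompt file stays provider-neutral and only this helper changes if you switch SDKs.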

2. Tracing

Initialize once, then wrap any call — AgentMark’s tracing is SDK-agnostic.

import { AgentMarkSDK, observe } from "@agentmark-ai/sdk";

const sdk = new AgentMarkSDK({ apiKey: process.env.AGENTMARK_API_KEY!, appId: process.env.AGENTMARK_APP_ID! });
sdk.initTracing();

// `bedrock` and `modelId` come from your existing code.
const tracedConverse = observe((messages) => bedrock.converse({ modelId, messages }), {
  name: "bedrock.converse",
});
await tracedConverse(messages); // → shows up in the AgentMark dashboard
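
To see why a wrapper like this can be SDK-agnostic, here is a minimal sketch of the general pattern. It is not AgentMark's implementation: `traceSketch`, the `Span` type, and the in-memory `spans` array are hypothetical stand-ins for whatever the real tracer records and exports.

```typescript
// Hypothetical span record; a real tracer would export this instead of storing it.
type Span = { name: string; durationMs: number; ok: boolean; error?: string };
const spans: Span[] = [];

// Wrap any async function: time the call, record success or failure, rethrow errors.
function traceSketch<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  opts: { name: string },
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    try {
      const result = await fn(...args);
      spans.push({ name: opts.name, durationMs: Date.now() - start, ok: true });
      return result;
    } catch (err) {
      spans.push({
        name: opts.name,
        durationMs: Date.now() - start,
        ok: false,
        error: err instanceof Error ? err.message : String(err),
      });
      throw err;
    }
  };
}

// Stand-in for an SDK call; the wrapper never inspects what it does.
const fakeConverse = async (prompt: string): Promise<string> => `echo: ${prompt}`;
const tracedFake = traceSketch(fakeConverse, { name: "fake.converse" });
```

The wrapper only sees arguments, a return value, and timing, which is why the same approach works for Bedrock, OpenAI, or a hand-rolled HTTP client.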

3. Experiments + evals

runExperiment takes your function as the task — it drives the dataset, traces each row, runs evaluators, and applies the regression gate.

const result = await sdk.runExperiment({
  experimentKey: "support-quality",
  dataset: [{ input: { question: "How do I get a refund?" }, expectedOutput: "refund" }],
  task: async (input) => {
    const { messages } = await (await client.loadTextPrompt("support.prompt.mdx")).format({ props: input });
    return (await bedrock.converse({ modelId, messages })).outputText;
  },
  evaluators: [{ name: "mentions_topic", evaluate: ({ output, expectedOutput }) => ({ score: output.includes(expectedOutput) ? 1 : 0 }) }],
  scoreThresholds: { mentions_topic: 1 },
});
if (!result.passed) process.exit(1); // gate in CI
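
The loop that an experiment runner performs can be sketched in a few lines. This is an illustration under stated assumptions, not AgentMark's implementation: `runExperimentSketch` is a hypothetical name, tracing is omitted, and the gate rule (every evaluator's average score must meet its threshold) is an assumption about how `scoreThresholds` is applied.

```typescript
type Row<I> = { input: I; expectedOutput: string };
type Evaluator = {
  name: string;
  evaluate: (args: { output: string; expectedOutput: string }) => { score: number };
};

// Hypothetical runner: executes the task per row, averages each evaluator's
// score, and passes only if every average meets its threshold (assumed rule).
async function runExperimentSketch<I>(opts: {
  dataset: Row<I>[];
  task: (input: I) => Promise<string>;
  evaluators: Evaluator[];
  scoreThresholds: Record<string, number>;
}): Promise<{ passed: boolean; averages: Record<string, number> }> {
  const totals: Record<string, number> = {};
  for (const row of opts.dataset) {
    const output = await opts.task(row.input);
    for (const ev of opts.evaluators) {
      const { score } = ev.evaluate({ output, expectedOutput: row.expectedOutput });
      totals[ev.name] = (totals[ev.name] ?? 0) + score;
    }
  }
  const averages: Record<string, number> = {};
  let passed = true;
  for (const ev of opts.evaluators) {
    const rows = opts.dataset.length;
    const avg = rows > 0 ? (totals[ev.name] ?? 0) / rows : 0;
    averages[ev.name] = avg;
    const threshold = opts.scoreThresholds[ev.name];
    if (threshold !== undefined && avg < threshold) passed = false;
  }
  return { passed, averages };
}

// Same dataset and evaluator as above, with a canned task instead of a model call.
const sketchRun = runExperimentSketch({
  dataset: [{ input: { question: "How do I get a refund?" }, expectedOutput: "refund" }],
  task: async (input) => `To get a refund, reply to: ${input.question}`,
  evaluators: [
    {
      name: "mentions_topic",
      evaluate: ({ output, expectedOutput }) => ({ score: output.includes(expectedOutput) ? 1 : 0 }),
    },
  ],
  scoreThresholds: { mentions_topic: 1 },
});
```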

That’s the full loop — cloud-connected (login + cloud prompts + traces + experiment results in the dashboard) — with no AgentMark-specific model code.

Path B — let the cloud run your prompts

To use the dashboard Run prompt button, agentmark dev’s webhook, or platform-driven experiments, AgentMark needs to call your SDK — so you provide an Executor. The createExecutor builder makes this a pair of one-shot handlers; it guarantees the AgentEvent wire protocol for you (no async-generator plumbing, no usage/finish/error footguns).

import { createExecutor } from "@agentmark-ai/prompt-core";
import { createWebhookRunner } from "@agentmark-ai/sdk";
import { FileLoader } from "@agentmark-ai/loader-file"; // used by createWebhookRunner below

const executor = createExecutor({
  name: "bedrock-converse",
  // `formatted` is the neutral rendered prompt, typed as `TextConfig` /
  // `ObjectConfig` — `formatted.text_config.model_name` and `.messages` are
  // typed, no cast. (Pairing with a custom adapter? `createExecutor<MyText,
  // MyObject>({...})` retypes `formatted` to your adapter's output shape.)
  text: async (formatted) => {
    const res = await bedrock.converse({ modelId: formatted.text_config.model_name, messages: formatted.messages });
    return { text: res.outputText, usage: res.usage };
  },
  object: async (formatted) => {
    const res = await bedrock.converse({ modelId: formatted.text_config.model_name, messages: formatted.messages });
    return { object: JSON.parse(res.outputText), usage: res.usage };
  },
});

// One call wires the executor + neutral adapter + tracing into a runner the
// CLI dev-server / gateway dispatches to.
const runner = createWebhookRunner({ executor, loader: new FileLoader(process.cwd()) });

The same primitive exists in Python: from agentmark.prompt_core import create_executor, pass text=/object= (or stream_text=/stream_object=) handlers, and validate with run_executor_conformance. Streaming handlers yield the same stream events in both languages: object literals in TypeScript ({ type: "text-delta", text }) and dataclasses in Python (TextDeltaEvent(text=...)). They report usage on a yielded finish event (FinishEvent(reason=..., usage=...)), which the builder folds into the single terminal finish.
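
As a rough mental model of what a builder does with a one-shot handler, the sketch below turns a single `{ text, usage }` result into an event stream that always ends with exactly one terminal event. It is an illustration only: `oneShotToEvents`, the `error` event shape, and the `Usage` field names are assumptions, not the actual AgentEvent protocol.

```typescript
type Usage = { inputTokens?: number; outputTokens?: number }; // assumed field names
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "finish"; reason: string; usage?: Usage }
  | { type: "error"; message: string }; // error shape is an assumption

// Hypothetical adapter: run a one-shot handler and emit its result as a stream
// that ends with exactly one terminal event (finish on success, error on failure).
async function* oneShotToEvents(
  handler: () => Promise<{ text: string; usage?: Usage }>,
): AsyncGenerator<StreamEvent> {
  try {
    const { text, usage } = await handler();
    yield { type: "text-delta", text };
    yield { type: "finish", reason: "stop", usage };
  } catch (err) {
    yield { type: "error", message: err instanceof Error ? err.message : String(err) };
  }
}

// Small helper to drain a stream into an array for inspection.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const ev of stream) out.push(ev);
  return out;
}
```

Because the handler's failure is caught and converted, a consumer never has to distinguish between a thrown exception and a protocol-level error.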

Serve the runner

createWebhookRunner gives you a runner with runPrompt / runExperiment methods. The cloud reaches it over HTTP, so you mount it one of two ways.

Local development. agentmark dev and the dashboard talk to a local webhook server:

import { createWebhookServer } from "@agentmark-ai/cli/runner-server";

// A WebhookRunner already satisfies the handler contract.
await createWebhookServer({ handler: runner, port: 9417 });

Managed deployment. Your deployed app exposes a single handler(body) entry. handleWebhookRequest bridges the runner’s methods to the { type, data } request the gateway dispatches:

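// handler.ts (compiled to handler.mjs for the managed deploy)
// `runner` is the WebhookRunner built with createWebhookRunner; import it from
// wherever your project defines it.
import { handleWebhookRequest } from "@agentmark-ai/cli/runner-server";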

export default (body) => handleWebhookRequest(body, runner);

That’s the whole Path B: createExecutor → createWebhookRunner → serve. The dashboard can now Run your prompts and drive managed experiments against your SDK.

Streaming SDKs

If your SDK streams (e.g. Bedrock ConverseStream), use the streaming handlers instead of buffering — same protocol guarantees. Streaming handlers yield the same stream events the rest of the protocol uses (text-delta, tool-call, …) and report usage + the finish reason on a finish event you yield; the builder emits the single terminal finish for you:

const executor = createExecutor({
  name: "bedrock-converse-stream",
  streamText: async function* (formatted) {
    for await (const chunk of bedrock.converseStream({ modelId: formatted.text_config.model_name, messages: formatted.messages })) {
      if (chunk.delta) yield { type: "text-delta", text: chunk.delta };
      // Report usage + provider stop reason on a finish event — the builder
      // captures it and emits the one terminal finish.
      else if (chunk.usage) yield { type: "finish", reason: chunk.stopReason ?? "stop", usage: chunk.usage };
    }
  },
});
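
The "single terminal finish" behavior can be pictured as a wrapper around the handler's stream. The sketch below is an assumption about the general technique, not the builder's actual code: `withSingleFinish` is a hypothetical name, the `usage` shape is invented for the example, and the default `"stop"` reason for handlers that never yield a finish is also an assumption.

```typescript
type Ev =
  | { type: "text-delta"; text: string }
  | { type: "finish"; reason: string; usage?: { totalTokens: number } };

// Hypothetical wrapper: forward deltas as they arrive, hold back any finish the
// handler yields, and emit a single finish once the handler is done.
async function* withSingleFinish(source: AsyncIterable<Ev>): AsyncGenerator<Ev> {
  let captured: Extract<Ev, { type: "finish" }> | undefined;
  for await (const ev of source) {
    if (ev.type === "finish") captured = ev; // a later finish overwrites an earlier one
    else yield ev;
  }
  // Assumed default when the handler never reported a finish.
  const fallback: Ev = { type: "finish", reason: "stop" };
  yield captured ?? fallback;
}

// A handler that reports finish mid-stream, before its last delta.
async function* noisyHandler(): AsyncGenerator<Ev> {
  yield { type: "text-delta", text: "Hel" };
  yield { type: "finish", reason: "stop", usage: { totalTokens: 7 } };
  yield { type: "text-delta", text: "lo" };
}

// Drain a stream into an array for inspection.
async function drain(source: AsyncIterable<Ev>): Promise<Ev[]> {
  const out: Ev[] = [];
  for await (const ev of source) out.push(ev);
  return out;
}
```

Draining `withSingleFinish(noisyHandler())` gives both deltas first and the captured finish last, so the handler is free to report usage whenever the provider sends it.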

Streaming object handlers yield object-delta / object-final events and a finish (carrying usage). If your SDK only streams cumulative partials (no explicit object-final), the builder uses the last delta as the resolved value — so the cloud always receives a complete object.
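
The fallback rule for cumulative partials is simple enough to sketch directly. The event field name `object` and the `resolveObject` helper are assumptions made for illustration; only the event type names come from the text above.

```typescript
type ObjectEvent =
  | { type: "object-delta"; object: unknown } // cumulative partial (assumed field name)
  | { type: "object-final"; object: unknown }
  | { type: "finish"; reason: string };

// Hypothetical resolver: prefer an explicit object-final, otherwise fall back
// to the most recent cumulative delta.
function resolveObject(events: ObjectEvent[]): unknown {
  let lastDelta: unknown = undefined;
  for (const ev of events) {
    if (ev.type === "object-final") return ev.object;
    if (ev.type === "object-delta") lastDelta = ev.object;
  }
  return lastDelta;
}

// No object-final here, so the second (most complete) delta is the result.
const resolved = resolveObject([
  { type: "object-delta", object: { city: "Par" } },
  { type: "object-delta", object: { city: "Paris", country: "FR" } },
  { type: "finish", reason: "stop" },
]);
```

This only works when deltas are cumulative; an SDK that streams incremental fragments would need to merge them before yielding.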

Validate your executor

Run the conformance suite — one call confirms your executor emits a protocol-correct stream for every kind, including the error path:

import { runExecutorConformance } from "@agentmark-ai/prompt-core";

await runExecutorConformance(executor, {
  text: { messages: [{ role: "user", content: "hello" }], text_config: { model_name: "..." } },
  object: { messages: [{ role: "user", content: "give me JSON" }], object_config: { model_name: "..." } },
  // A payload your handler rejects before any network call — e.g. messages your
  // SDK validates as malformed — so the terminal-error path is exercised.
  errorInput: { messages: null },
});

Unless you pin ctx, the suite runs your executor twice — once streaming, once one-shot — so if you supply both a text and a streamText handler, both branches are validated (a broken one-shot path won’t hide behind a working stream).
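
To give a feel for the kind of property a conformance check verifies, here is a minimal sketch of one such rule: a stream must contain exactly one terminal event, and nothing may follow it. `checkStream` is a hypothetical helper, and treating `finish` and `error` as the terminal event types is an assumption; the real suite may check more or different things.

```typescript
type WireEvent = { type: string };

// Hypothetical check: a stream is well-formed if it contains exactly one
// terminal event (finish or error) and that event is the last one.
function checkStream(events: WireEvent[]): string[] {
  const problems: string[] = [];
  const isTerminal = (e: WireEvent): boolean => e.type === "finish" || e.type === "error";
  const terminals = events.filter(isTerminal).length;
  if (terminals === 0) problems.push("no terminal event");
  if (terminals > 1) problems.push("more than one terminal event");
  const last = events[events.length - 1];
  if (terminals > 0 && last !== undefined && !isTerminal(last)) {
    problems.push("events emitted after the terminal event");
  }
  return problems;
}

// A well-formed stream yields no problems; a delta after finish is flagged.
const good = checkStream([{ type: "text-delta" }, { type: "finish" }]);
const bad = checkStream([{ type: "finish" }, { type: "text-delta" }]);
```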

createExecutor only covers the model-call (Executor) half. If you also need provider-specific parameter mapping (tool wiring, custom settings) rather than the neutral config, write a custom adapter and pass your executor to createWebhookRunner the same way as shown above.

