When you run a dataset in the AgentMark platform, it sends a dataset-run event to your webhook endpoint. The event carries the run name and the prompt AST; the dataset itself is resolved from the prompt's frontmatter (test_settings.dataset).

Event Format

{
  "event": {
    "type": "dataset-run",
    "data": {
      "datasetRunName": "string",
      "prompt": "// Prompt AST object"
    }
  }
}
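The inner event object can be modeled with a small TypeScript type and a type guard before dispatching. These type names are illustrative, not exported by the SDK:

```typescript
// Hypothetical types mirroring the webhook payload's inner "event" object.
interface DatasetRunEvent {
  type: "dataset-run";
  data: {
    datasetRunName: string;
    prompt: unknown; // Prompt AST object
  };
}

// Narrow an incoming event to a dataset-run event before processing it.
function isDatasetRunEvent(event: { type: string; data?: unknown }): event is DatasetRunEvent {
  return event.type === "dataset-run";
}
```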

Processing Dataset Runs

The webhook handler processes dataset runs by executing the prompt for each item in the dataset:

if (event.type === "dataset-run") {
  const data = event.data;
  const frontmatter = getFrontMatter(data.prompt) as any;
  const runId = crypto.randomUUID();

  if (frontmatter.text_config) {
    const prompt = await agentmarkClient.loadTextPrompt(data.prompt);
    const dataset = await prompt.formatWithDataset({
      datasetPath: frontmatter?.test_settings?.dataset,
      telemetry: { isEnabled: true },
    });

    const encoder = new TextEncoder();
    const stream = new ReadableStream({
      async start(controller) {
        let index = 0;
        for await (const item of dataset) {
          const traceId = crypto.randomUUID();
          const result = await generateText({
            ...item.formatted,
            experimental_telemetry: {
              ...item.formatted.experimental_telemetry,
              metadata: {
                ...item.formatted.experimental_telemetry?.metadata,
                dataset_run_id: runId,
                dataset_path: frontmatter?.test_settings?.dataset,
                dataset_run_name: data.datasetRunName,
                dataset_item_name: index,
                traceName: `ds-run-${data.datasetRunName}-${index}`,
                traceId,
                dataset_expected_output: item.dataset.expected_output,
              },
            },
          });

          const chunk =
            JSON.stringify({
              type: "dataset",
              result: {
                input: item.dataset.input,
                expectedOutput: item.dataset.expected_output,
                actualOutput: result.text,
                tokens: result.usage?.totalTokens,
              },
              runId,
              runName: data.datasetRunName,
            }) + "\n";

          // Encode to bytes: Response bodies expect a stream of Uint8Array chunks.
          controller.enqueue(encoder.encode(chunk));
          index++;
        }
        controller.close();
      },
    });

    return new Response(stream, {
      headers: {
        "AgentMark-Streaming": "true",
      },
    });
  }

  // Handle object_config similarly...
}

Streaming Response

Dataset runs return a streaming, newline-delimited JSON response so results arrive as each item completes. Each chunk in the stream contains:

{
  type: "dataset",
  result: {
    input: any,              // Original dataset item input
    expectedOutput: any,     // Expected output from dataset
    actualOutput: any,       // Generated output from model
    tokens: number,          // Token usage for this item
  },
  runId: string,            // Unique run identifier
  runName: string,          // Dataset run name
}
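Because each chunk is a JSON object terminated by a newline, a consumer can buffer the raw stream text and parse only complete lines. A minimal sketch (the chunk type follows the structure above; the parser itself is illustrative):

```typescript
interface DatasetChunk {
  type: "dataset";
  result: {
    input: unknown;
    expectedOutput: unknown;
    actualOutput: unknown;
    tokens?: number;
  };
  runId: string;
  runName: string;
}

// Split buffered stream text into complete newline-delimited JSON objects,
// returning any trailing partial line so it can be prepended to the next read.
function parseNdjson(buffer: string): { chunks: DatasetChunk[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop() ?? ""; // last element is an incomplete line (or "")
  const chunks = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DatasetChunk);
  return { chunks, rest };
}
```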

Telemetry

Each dataset item's generation call attaches the following metadata via experimental_telemetry:

const telemetry = {
  dataset_run_id: runId, // required
  dataset_path: frontmatter?.test_settings?.dataset, // required
  dataset_run_name: data.datasetRunName, // required
  dataset_item_name: index, // required
  traceName: `ds-run-${data.datasetRunName}-${index}`, // required
  traceId: traceId, // required
  dataset_expected_output: item.dataset.expected_output, // required
};

Error Handling

Handle errors appropriately in your webhook:

import { NextResponse } from "next/server";

try {
  // Process dataset run
} catch (error) {
  console.error("Dataset run error:", error);
  return NextResponse.json(
    { message: "Error processing dataset run" },
    { status: 500 }
  );
}
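Validating the prompt configuration up front avoids starting a run that cannot complete. A hedged sketch of such a guard, assuming the frontmatter shape used elsewhere in this document (text_config / object_config plus test_settings.dataset):

```typescript
// Hypothetical guard: confirm the prompt's frontmatter names a supported
// config and a dataset path before the run starts.
function validateFrontmatter(frontmatter: {
  text_config?: unknown;
  object_config?: unknown;
  test_settings?: { dataset?: string };
}): { ok: true } | { ok: false; reason: string } {
  if (!frontmatter.text_config && !frontmatter.object_config) {
    return { ok: false, reason: "prompt has neither text_config nor object_config" };
  }
  if (!frontmatter.test_settings?.dataset) {
    return { ok: false, reason: "prompt frontmatter is missing test_settings.dataset" };
  }
  return { ok: true };
}
```

On a failed check, return a 400-level response instead of starting the stream.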

Best Practices

  1. Streaming

    • Always return streaming responses for dataset runs
    • Use proper headers: "AgentMark-Streaming": "true"
    • Handle stream errors appropriately
  2. Telemetry

    • Include all required metadata in experimental_telemetry
    • Use unique traceId and runId for each execution
    • Track dataset progress and results
  3. Error Handling

    • Validate prompt configuration before processing
    • Handle individual item failures gracefully
    • Return appropriate HTTP status codes
  4. Performance

    • Process dataset items sequentially to avoid overwhelming the model
    • Use appropriate timeouts for long-running datasets
    • Monitor memory usage for large datasets
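Handling individual item failures gracefully can be sketched by wrapping each item's generation in a try/catch and recording an error outcome instead of aborting the whole stream. The helper and result shape here are illustrative, not part of the documented chunk format:

```typescript
// Run one item's generation, returning either a value or an error marker so
// a single failing item does not terminate the rest of the dataset run.
async function runItemSafely<T>(
  index: number,
  generate: () => Promise<T>
): Promise<
  | { index: number; ok: true; value: T }
  | { index: number; ok: false; error: string }
> {
  try {
    return { index, ok: true, value: await generate() };
  } catch (err) {
    return { index, ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Inside the stream loop, an `ok: false` result could be serialized as an error chunk so the platform can surface which items failed.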
