Datasets are JSONL files containing test cases to validate prompt behavior. Each line has an input (required) and an optional expected_output. The same files power both Cloud and Local — in Cloud they sync to the Dashboard through the deployment pipeline, and in Local you run them directly from the CLI.
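
As a minimal sketch of that row format, the Python below parses JSONL text and checks that every row carries the required input field. The helper name load_dataset_rows and the {"text": ...} shape of input are illustrative assumptions, not part of AgentMark; the real shape of input depends on your prompt.

```python
import json


def load_dataset_rows(jsonl_text):
    """Parse JSONL text into rows, checking the documented row contract.

    Hypothetical helper for illustration; not an AgentMark API.
    """
    rows = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        if not isinstance(row, dict) or "input" not in row:
            raise ValueError(f"line {line_number}: missing required 'input'")
        rows.append(row)
    return rows


# Assumed example content: one row with expected_output, one without.
SAMPLE = "\n".join([
    '{"input": {"text": "I love this product"}, "expected_output": "positive"}',
    '{"input": {"text": "Arrived on Tuesday"}}',
])

rows = load_dataset_rows(SAMPLE)
```

The second sample row omits expected_output, which the format allows.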

Datasets in the Dashboard

Datasets live as JSONL files in your repo. The git deployment pipeline syncs them to AgentMark Cloud, where you select them when you create experiments and when you configure review queues.

[Image: New Experiment dialog in the AgentMark Dashboard showing the dataset selector]

The New Experiment dialog includes a Dataset field listing the datasets synced to your app. When you select a prompt, the dataset auto-fills from its test_settings frontmatter.

How datasets reach Cloud

1. Add a JSONL file to your repo

   Create the dataset alongside your prompts, for example agentmark/datasets/sentiment.jsonl.

2. Reference it from a prompt

   Set test_settings.dataset in the prompt frontmatter so the dialog can auto-fill it.

3. Deploy to sync

   Push to your connected branch. The deployment pipeline syncs the dataset to AgentMark Cloud, where it appears in the dataset selector.

The dataset structure is identical to Local. See the Local tab for the JSONL schema, what to test, sizing guidance, held-out sets, and statistical significance.
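
Step 1 above could be scripted roughly as follows. This is a sketch under stated assumptions: write_dataset is a hypothetical helper, and the example rows (including the {"text": ...} shape of input) are made up for illustration. Only the one-object-per-line layout and the input / expected_output field names come from this page.

```python
import json
from pathlib import Path


def write_dataset(path, rows):
    """Write rows as JSONL: one JSON object per line.

    Hypothetical helper for illustration; not an AgentMark API.
    """
    path = Path(path)
    # Create the parent directories if they do not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path


# Assumed example rows; expected_output is optional per row.
EXAMPLE_ROWS = [
    {"input": {"text": "I love this product"}, "expected_output": "positive"},
    {"input": {"text": "Arrived on Tuesday"}},
]
```

Calling write_dataset("agentmark/datasets/sentiment.jsonl", EXAMPLE_ROWS) from the repo root would create the file at the example path from step 1, ready to commit and push.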

Where dataset rows appear

  • Experiment detail — each dataset row’s input and expected_output are shown next to the actual AI output and evaluator scores. See Running experiments.
  • Review queues — set a default dataset on a queue so the “Save to dataset” action is pre-filled during annotation review.

Appending rows

Rows are appended to a synced dataset in two ways:
  • Save to dataset — during annotation review, save a reviewed trace’s input and output to the queue’s default dataset. Saved items are staged and committed when the queue is marked completed. See Human annotation.
  • REST API — POST a row to /v1/datasets/{datasetName}/rows. See Programmatic access in the Local tab for the request shape (the same endpoint serves Cloud and Local).
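
The REST option might look like the sketch below, which builds the request without sending it. Only the path /v1/datasets/{datasetName}/rows comes from this page. The base URL, the bearer-token header, the use of a bare dataset name such as "sentiment", and a body that mirrors a JSONL row are all assumptions; check Programmatic access in the Local tab for the actual request shape.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder host. Replace with the real API base URL.
BASE_URL = "https://api.example.com"


def build_append_row_request(dataset_name, row, api_key):
    """Build (but do not send) a POST to /v1/datasets/{datasetName}/rows.

    Hypothetical helper; the auth scheme and body shape are assumed.
    """
    encoded_name = urllib.parse.quote(dataset_name, safe="")
    url = f"{BASE_URL}/v1/datasets/{encoded_name}/rows"
    return urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth is not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_append_row_request(
    "sentiment",
    {"input": {"text": "Great support"}, "expected_output": "positive"},
    api_key="YOUR_API_KEY",
)
# To send it: urllib.request.urlopen(request)
```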

Next steps

  • Evaluations: Write evaluation functions
  • Running Experiments: Test your datasets
  • Testing overview: Learn testing concepts
