Creating Datasets

In Agentmark, you can create datasets directly through the user interface (UI) or locally as a JSONL file.

Dataset Items

Each item in a dataset must have the following components:

  • Input: This represents the input props that the prompt expects.
  • Expected Output: The expected output from the prompt when the given input is provided. This serves as the ground truth for evaluating the prompt’s performance.

Here’s an example of how a dataset might look:

{ "input": "What is the capital of France?", "expected_output": "Paris" }
{ "input": "Translate 'Hello' to French.",  "expected_output": "Bonjour" }
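Because each line is an independent JSON object, a dataset file is easy to generate or load with a short script. Here is a minimal Python sketch; the filename dataset.jsonl is illustrative:

```python
import json

# Two example dataset items. The keys match the fields described above:
# "input" holds the props the prompt expects, and "expected_output" is
# the ground truth used to evaluate the prompt.
items = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "Translate 'Hello' to French.", "expected_output": "Bonjour"},
]

# JSONL is one JSON object per line, so write each item with json.dumps.
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for item in items:
        f.write(json.dumps(item) + "\n")

# Reading it back is the reverse: parse each non-empty line independently.
with open("dataset.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(loaded[0]["expected_output"])  # Paris
```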

Supported Formats

Agentmark currently supports datasets in JSONL format only.

Run Datasets

Bulk Testing

Once you have created a dataset, you can run it in bulk against your prompts. This allows you to evaluate how well your prompt performs across all the items in the dataset. Agentmark will execute each item in the dataset and compare the actual output of the prompt with the expected output.
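Conceptually, a bulk run iterates over the dataset, calls the prompt for each input, and compares the actual output with the expected output. The sketch below illustrates that loop; run_prompt is a placeholder standing in for your real inference call, not an Agentmark API:

```python
def run_prompt(input_value):
    # Placeholder for your real inference call (model, API, etc.).
    # Hard-coded answers here so the sketch is runnable on its own.
    answers = {
        "What is the capital of France?": "Paris",
        "Translate 'Hello' to French.": "Bonjour !",
    }
    return answers.get(input_value, "")

def run_dataset(items):
    """Run every dataset item and compare actual vs. expected output."""
    results = []
    for item in items:
        actual = run_prompt(item["input"])
        results.append({
            "input": item["input"],
            "expected": item["expected_output"],
            "actual": actual,
            # Exact-match comparison for illustration; real evaluations
            # may apply fuzzier or model-graded checks.
            "passed": actual == item["expected_output"],
        })
    return results

demo = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "Translate 'Hello' to French.", "expected_output": "Bonjour"},
]
results = run_dataset(demo)
print(sum(r["passed"] for r in results), "of", len(results), "items passed")
```

Note how the second item fails the exact-match comparison ("Bonjour !" vs. "Bonjour"), which is exactly the kind of discrepancy a bulk run surfaces.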

Viewing Results

After running a dataset, you can view the results of each run. Agentmark provides detailed insights, including:

  • Inputs and Outputs: For each item in the dataset, you can see the input provided to the prompt and the output generated by the prompt.

  • Traces: You can also view the traces associated with each run item. Traces provide a detailed breakdown of the prompt’s execution, helping you understand how the output was generated.

Webhook Integration

Agentmark gives you full control over the inference process by allowing you to set up a webhook. When a dataset is run, Agentmark calls your webhook, which is responsible for running inference. This setup lets you integrate Agentmark seamlessly with your existing infrastructure.
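To make the idea concrete, here is a hypothetical handler: it receives a request body containing a dataset item's input, runs inference with your own stack, and returns the output. The field names ("input", "output") and the uppercase stand-in for inference are assumptions for illustration, not Agentmark's actual webhook contract:

```python
import json

def handle_webhook(request_body):
    """Hypothetical webhook handler: parse the item, run inference, return output.

    The payload schema used here is an assumption for illustration;
    consult the webhook documentation for the real contract.
    """
    payload = json.loads(request_body)
    # Call into your own inference stack here; an uppercase echo is a stand-in.
    output = payload["input"].upper()
    return json.dumps({"output": output})

# Example: the body your endpoint might receive for one dataset item.
body = json.dumps({"input": "What is the capital of France?"})
print(handle_webhook(body))
```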

To learn more about setting up a webhook, check out the documentation.

Have Questions?

We’re here to help! Choose the best way to reach us: