Datasets enable bulk testing of prompts against diverse inputs and expected outputs.
In Agentmark, you can create datasets directly through the user interface (UI) or locally as a JSONL file.
Each item in a dataset must have the following components:

Input: the data passed to your prompt for that item.
Expected Output: the output the prompt should produce, which Agentmark compares against the actual output.
Here’s an example of how a dataset might look (the exact field names shown are illustrative):
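```jsonl
{"input": {"question": "What is the capital of France?"}, "expected_output": "Paris"}
{"input": {"question": "Who wrote Hamlet?"}, "expected_output": "William Shakespeare"}
```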
Currently, Agentmark only supports datasets in JSONL format.
Once you have created a dataset, you can run it in bulk against your prompts to evaluate how a prompt performs across every item. Agentmark executes each item in the dataset and compares the prompt’s actual output with the item’s expected output.
After running a dataset, you can view the results of each run. Agentmark provides detailed insights, including:
Inputs and Outputs: For each item in the dataset, you can see the input provided to the prompt and the output it generated.
Traces: You can also view the traces associated with each run item. Traces provide a detailed breakdown of the prompt’s execution, helping you understand how the output was generated.
Agentmark gives you full control over the inference process by letting you set up a webhook. When a dataset is run, Agentmark calls your webhook, which is responsible for performing inference. This setup ensures that you can integrate Agentmark seamlessly with your existing infrastructure.
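Below is a minimal sketch of what such a webhook could look like, assuming Agentmark POSTs a JSON payload containing the prompt and a dataset item’s input, and expects the generated output back in the response. The endpoint path, payload shape, response shape, and the runInference helper are assumptions for illustration, not Agentmark’s actual contract; consult the webhook documentation for the real one.

```typescript
// Hypothetical webhook: Agentmark POSTs dataset items here during a run.
// The payload shape ({ prompt, input }) and response shape ({ output }) are
// assumptions for illustration; see the webhook docs for the real contract.
import express from "express";

const app = express();
app.use(express.json());

app.post("/agentmark/webhook", async (req, res) => {
  const { prompt, input } = req.body;

  // Call your own inference stack (any model provider works here).
  const output = await runInference(prompt, input);

  // Return the generated output so Agentmark can compare it
  // against the item's expected output.
  res.json({ output });
});

// Placeholder for your existing inference code (hypothetical helper).
async function runInference(prompt: string, input: unknown): Promise<string> {
  // e.g. call your model provider or an in-house model server here.
  return `model output for ${JSON.stringify(input)}`;
}

app.listen(3000, () => console.log("Webhook listening on :3000"));
```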
To learn more about setting up a webhook, check out the documentation.
We’re here to help! Choose the best way to reach us:
Join our Discord community for quick answers and discussions
Email us at hello@agentmark.co for support
Schedule an Enterprise Demo to learn about our business solutions