Experiments in AgentMark
Experiments allow you to systematically test prompts against datasets and compare results across different prompt versions, configurations, or models.
This feature is coming soon. In the meantime, you can run experiments programmatically using the AgentMark SDK. See Running Experiments in the Development documentation.
What You’ll Be Able to Do
- Compare Prompt Versions - Test multiple versions of a prompt side-by-side to see which performs better.
- A/B Testing - Compare different models, temperature settings, or prompt strategies.
- Track Performance - View aggregated metrics across dataset runs to identify improvements or regressions.
- Historical Analysis - Compare current results against previous experiment runs.
- Visual Dashboards - See experiment results in easy-to-understand charts and tables.
Typical Workflow
- Create Experiment - Select a prompt and dataset to test
- Configure Variants - Set up different versions or configurations to compare
- Run Experiment - Execute all variants against the dataset
- Analyze Results - Compare metrics, scores, and outputs
- Deploy Winner - Merge the best-performing version to production
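The workflow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the AgentMark SDK API: `Variant`, `run_experiment`, `exact_match`, and `stub_model` are hypothetical names, and `stub_model` stands in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical type for illustration; this is not an AgentMark SDK name.
@dataclass
class Variant:
    name: str
    template: str       # prompt text with an {input} placeholder
    temperature: float  # carried as configuration; the stub below ignores it


def exact_match(output: str, expected: str) -> float:
    """Evaluation function: 1.0 if the output matches the expected answer."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_experiment(variants, dataset, call_model, evaluate):
    """Run every variant against every dataset row and return mean scores."""
    results = {}
    for variant in variants:
        scores = []
        for row in dataset:
            prompt = variant.template.format(input=row["input"])
            output = call_model(prompt, variant.temperature)
            scores.append(evaluate(output, row["expected"]))
        results[variant.name] = mean(scores)
    return results


def stub_model(prompt: str, temperature: float) -> str:
    """Stand-in for a real LLM call: answers only when asked to be brief."""
    answers = {"2+2": "4", "3+3": "6"}
    if "one word" not in prompt:
        return "I am not sure."
    for question, answer in answers.items():
        if question in prompt:
            return answer
    return "unknown"


# 1. Create the experiment: a dataset plus the prompt variants to compare.
dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "3+3", "expected": "6"},
]
# 2. Configure variants (for example, the main version versus a branch).
variants = [
    Variant("main", "Solve: {input}", 0.7),
    Variant("branch", "Solve in one word: {input}", 0.0),
]
# 3. Run every variant against the dataset.
results = run_experiment(variants, dataset, stub_model, exact_match)
# 4. Analyze the results and pick the best-scoring variant.
winner = max(results, key=results.get)
print(results, winner)
```

In a real setup, `stub_model` would be replaced by an actual model call, and the final step (deploying the winner) would happen through your normal release process rather than in code like this.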
Experiment Types
- Version Comparison - Test different versions of the same prompt (e.g., comparing a branch to main)
- Model Comparison - Compare performance across different LLM models (e.g., GPT-4 vs Claude)
- Configuration Testing - Test different parameter settings (temperature, max_tokens, etc.)
- Evaluation Testing - Run multiple evaluation functions to assess different quality dimensions
Integration with Other Features
Experiments work seamlessly with other AgentMark platform features:
- Datasets - Use your existing test datasets
- Evaluations - Apply evaluation functions to measure quality
- Annotations - Manually review experiment outputs
- Webhooks - Trigger custom workflows when experiments complete
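To make the role of evaluation functions concrete, here is a rough sketch of scoring one run along several quality dimensions and comparing it with an earlier run. Everything in it is an assumption for illustration: `contains_expected`, `is_concise`, `score_run`, and `find_regressions` are hypothetical helpers rather than AgentMark functions, and the rows and previous scores are made-up data.

```python
from statistics import mean


# Hypothetical evaluation functions; each maps (output, expected) to a score in [0, 1].
def contains_expected(output: str, expected: str) -> float:
    return 1.0 if expected.lower() in output.lower() else 0.0


def is_concise(output: str, expected: str, max_words: int = 5) -> float:
    return 1.0 if len(output.split()) <= max_words else 0.0


EVALUATORS = {"correctness": contains_expected, "conciseness": is_concise}


def score_run(rows):
    """Average each evaluator's score over all rows of one experiment run."""
    return {
        name: mean(fn(row["output"], row["expected"]) for row in rows)
        for name, fn in EVALUATORS.items()
    }


def find_regressions(current, previous, tolerance=0.05):
    """Return the dimensions whose score dropped by more than `tolerance`."""
    return [
        name for name in current
        if previous.get(name, 0.0) - current[name] > tolerance
    ]


# Illustrative outputs recorded from a run (made-up data).
rows = [
    {"output": "Paris", "expected": "Paris"},
    {"output": "The capital city of Germany is, of course, Berlin", "expected": "Berlin"},
]
current = score_run(rows)
previous = {"correctness": 1.0, "conciseness": 1.0}  # scores from an earlier run
regressions = find_regressions(current, previous)
print(current, regressions)
```

Keeping each dimension as a separate function makes it easy to see which aspect of quality changed between runs, instead of folding everything into a single score.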
Have Questions?
We’re here to help! Choose the best way to reach us:
- Join our Discord community for quick answers and discussions
- Email us at [email protected] for support
- Schedule an Enterprise Demo to learn about our business solutions