
## CLI Usage

### Quick Start

Before running an experiment, make sure you have:

- A dataset configured in the prompt frontmatter
- The development server running (`agentmark dev`)
- Optional: evaluation functions defined
### Full Command Signature

The `--server` flag defaults to the `AGENTMARK_WEBHOOK_URL` environment variable if set, otherwise `http://localhost:9417`.
### Command Options

- `--skip-eval` - skip evaluations (output-only mode)
- `--threshold` - requires evals that return a `passed` field
- `--server` - use a custom server URL
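Putting the options together, invocations might look like the sketch below. The exact signature is an assumption here (this page does not show it); in particular, passing the prompt file as a positional argument and the example file path are guesses.

```shell
# Run an experiment against the dataset in the prompt's frontmatter
# (positional prompt-file argument and path are illustrative assumptions)
agentmark run-experiment prompts/sentiment.prompt.mdx

# Output-only mode, skipping evaluations
agentmark run-experiment prompts/sentiment.prompt.mdx --skip-eval

# Point at a custom webhook server
agentmark run-experiment prompts/sentiment.prompt.mdx --server http://localhost:9417
```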
### Output Example
| # | Input | AI Result | Expected Output | sentiment_check |
|---|---|---|---|---|
| 1 | {"text":"I love it"} | positive | positive | PASS (1.00) |
| 2 | {"text":"Terrible"} | negative | negative | PASS (1.00) |
| 3 | {"text":"It's okay"} | neutral | neutral | PASS (1.00) |
The command accepts both `.mdx` source files and pre-built `.json` files (from `agentmark build`). Media outputs (images, audio) are saved to `.agentmark-outputs/` with clickable file paths.
### How It Works

The `run-experiment` command:

1. Loads your prompt file (`.mdx` or pre-built `.json`) and parses the frontmatter
2. Reads the dataset specified in `test_settings.dataset`
3. Sends the prompt and dataset to the webhook server (default: `http://localhost:9417`)
4. The server runs the prompt against each dataset row
5. Evaluates results using the evals specified in `test_settings.evals`
6. Streams results back to the CLI as they complete
7. Displays formatted output (table, CSV, JSON, or JSONL)
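In streaming mode, each result arrives as one JSON object per line (NDJSON). A minimal sketch of consuming such a stream and tallying eval passes follows; the row schema here is an illustrative assumption, not the documented result format.

```typescript
// Two hypothetical result rows, one JSON object per line (NDJSON).
// Field names ("input", "result", "evals", "passed", "score") are assumptions.
const ndjson = [
  '{"input":{"text":"I love it"},"result":"positive","evals":{"sentiment_check":{"passed":true,"score":1}}}',
  '{"input":{"text":"Terrible"},"result":"negative","evals":{"sentiment_check":{"passed":true,"score":1}}}',
].join("\n");

// Parse each line as it arrives and count passing evals.
let passed = 0;
let total = 0;
for (const line of ndjson.split("\n")) {
  const row = JSON.parse(line);
  for (const evalResult of Object.values(row.evals) as { passed: boolean }[]) {
    total += 1;
    if (evalResult.passed) passed += 1;
  }
}
console.log(`${passed}/${total} evals passed`); // 2/2 evals passed
```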
### Configuration

Link the dataset and evals in the prompt frontmatter via `test_settings`. Additional props can be supplied under `test_settings.props`.
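A sketch of what that frontmatter linkage might look like; only the `test_settings.dataset`, `test_settings.evals`, and `test_settings.props` keys come from this page, and the values are illustrative.

```yaml
---
name: sentiment
test_settings:
  dataset: ./datasets/sentiment.jsonl   # path to the JSONL dataset (illustrative)
  evals:
    - sentiment_check                   # evaluation function name (illustrative)
  props:
    language: en                        # example prop; exact semantics not shown on this page
---
```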
### Workflow

1. **Develop prompts** - iterate on your prompt design
2. **Create datasets** - add test cases covering your scenarios
3. **Write evaluations** - define success criteria
4. **Run experiments** - test against the dataset

## SDK Usage
Run experiments programmatically using `formatWithDataset()`. Each yielded item contains:

- `dataset` - the test case (`input` and `expected_output`)
- `formatted` - the formatted prompt, ready for your AI SDK
- `evals` - the list of evaluation names to run
- `type` - always `"dataset"`

Options (`FormatWithDatasetOptions`):

- `datasetPath?: string` - override the dataset from the frontmatter
- `format?: 'ndjson' | 'json'` - buffer all rows (`'json'`) or stream them as they become available (`'ndjson'`, the default)
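The iteration pattern might look like the following sketch. Because this page does not show the SDK's import path or call signature, `formatWithDataset` is stubbed here with a local async generator; only the shape of each yielded item mirrors the fields listed above.

```typescript
// Shape of each item yielded by formatWithDataset(), per the fields above.
interface DatasetItem {
  dataset: { input: unknown; expected_output?: unknown };
  formatted: unknown;   // formatted prompt, ready for your AI SDK
  evals: string[];      // evaluation names to run
  type: "dataset";
}

// Local stub standing in for the real SDK call (import path and
// signature are assumptions; replace with the actual SDK import).
async function* formatWithDataset(): AsyncGenerator<DatasetItem> {
  yield {
    dataset: { input: { text: "I love it" }, expected_output: "positive" },
    formatted: { messages: [{ role: "user", content: "I love it" }] },
    evals: ["sentiment_check"],
    type: "dataset",
  };
}

async function main(): Promise<number> {
  let rows = 0;
  for await (const item of formatWithDataset()) {
    // Send item.formatted to your AI SDK, then run item.evals on the result.
    rows += 1;
  }
  return rows;
}
```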
Use the SDK approach when you need:

- Custom test logic in your test framework
- Fine-grained control over test execution
- Integrating with existing test infrastructure
- Running experiments in application code
## Troubleshooting

### CLI Issues

**Dataset not found:**

- Check the dataset path in the frontmatter
- Verify the file exists and is valid JSONL
**Cannot reach the webhook server:**

- Ensure `agentmark dev` is running
- Check that ports are available (default webhook port: 9417)
- Verify the `--server` URL if using a custom server
**Invalid dataset format:**

- Each line must be valid JSON
- Required: `input` field
- Optional: `expected_output` field
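A valid dataset file is therefore one JSON object per line, with `input` required and `expected_output` optional (example values taken from the output table above):

```jsonl
{"input": {"text": "I love it"}, "expected_output": "positive"}
{"input": {"text": "Terrible"}, "expected_output": "negative"}
{"input": {"text": "It's okay"}}
```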
**Evals not running:**

- Add `evals` to `test_settings` in the frontmatter
- Or use the `--skip-eval` flag for output-only mode
**Threshold check failing:**

- The `--threshold` flag requires evals that return a `passed` field
- Verify your eval functions return `{ passed: true/false, ... }`
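An eval compatible with `--threshold` might be shaped like the sketch below. The eval's call signature is an assumption; the only requirement documented here is that the returned object includes a `passed` field.

```typescript
// Hypothetical eval: compares the AI result to the expected output.
// Only the returned `passed` field is documented; `score` is illustrative.
function sentimentCheck(result: string, expected: string) {
  const passed = result.trim().toLowerCase() === expected.trim().toLowerCase();
  return { passed, score: passed ? 1 : 0 };
}

console.log(sentimentCheck("positive", "positive")); // { passed: true, score: 1 }
```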