Annotations allow you to manually add scores, labels, and contextual information to traces and spans. Use them for human-in-the-loop evaluation, debugging, and creating training datasets from production data.
What Are Annotations?
Annotations are manual evaluations added to spans in your traces. Unlike automated evaluations that run during experiments, annotations are created by team members directly in the AgentMark dashboard.
Each annotation contains:
| Field | Description |
|---|---|
| Name | A short title describing what you’re evaluating |
| Label | A categorical assessment (e.g., “correct”, “incorrect”, “regression”) |
| Score | A numeric value representing quality or performance |
| Reason | Detailed explanation of why you assigned this score and label |
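To make the structure concrete, here is a minimal sketch of an annotation modeled as a typed record. The field names mirror the table above; the `Annotation` type name and the `traceId`/`spanId` identifiers are illustrative assumptions, not AgentMark's actual schema.

```typescript
// Hypothetical shape of an annotation record, mirroring the table above.
// The traceId/spanId fields are assumptions about how an annotation
// attaches to a span; they are not a documented AgentMark type.
interface Annotation {
  traceId: string; // assumed: the trace this annotation belongs to
  spanId: string;  // assumed: the span being annotated
  name: string;    // short title describing what you're evaluating
  label: string;   // categorical assessment, e.g. "correct"
  score: number;   // numeric quality value; decimals allowed
  reason: string;  // detailed explanation for the score and label
}

const example: Annotation = {
  traceId: "trace_123",
  spanId: "span_456",
  name: "Response Quality",
  label: "good",
  score: 0.85,
  reason: "Accurate and well-formatted, but could be more concise.",
};
```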
Use Cases
Quality Assessment
Review production traces to identify issues and track improvements:
Name: Response Quality
Label: good
Score: 0.85
Reason: The response was accurate and well-formatted, but could have been more concise.
Edge Case Documentation
Flag unusual inputs or unexpected behavior for follow-up:
Name: Edge Case
Label: unexpected_behavior
Score: 0.3
Reason: Model hallucinated when given empty input. Should add input validation.
Training Data Curation
Label production traces to build high-quality datasets from real usage:
Name: Training Suitability
Label: include
Score: 1.0
Reason: Clean input/output pair suitable for fine-tuning dataset.
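As a rough sketch of what curation could look like downstream, suppose you have exported annotated spans as JSON. The export file name, the `input`/`output` fields, and the record shape below are all assumptions for illustration, not a documented AgentMark export format.

```typescript
import { readFileSync, writeFileSync } from "node:fs";

// Assumed export shape: each record pairs a span's input/output with
// its annotations. Illustrative only, not AgentMark's export schema.
interface AnnotatedSpan {
  input: string;
  output: string;
  annotations: { name: string; label: string; score: number }[];
}

const spans: AnnotatedSpan[] = JSON.parse(
  readFileSync("annotated_traces.json", "utf8") // hypothetical export file
);

// Keep only spans a reviewer marked suitable for training.
const lines = spans
  .filter((s) =>
    s.annotations.some(
      (a) => a.name === "Training Suitability" && a.label === "include"
    )
  )
  .map((s) => JSON.stringify({ prompt: s.input, completion: s.output }));

// One JSON object per line: a common fine-tuning dataset format.
writeFileSync("finetune.jsonl", lines.join("\n"));
```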
Adding Annotations
From the Traces View
- Navigate to the Traces page in your AgentMark dashboard
- Click on any trace to open the trace details drawer
- Select a span from the trace tree
- Click on the Evaluation tab
- Click the Add annotation button
- Fill in the annotation fields:
- Name — Short identifier for this annotation
- Label — Category or classification
- Score — Numeric value (can be decimal)
- Reason — Detailed explanation
- Click Save
Viewing Annotations
Annotations appear in the Evaluation tab alongside automated evaluation scores. They are distinguished by a filled badge labeled “annotation” (vs. “eval” for automated scores).
Annotations vs. Automated Evals
| | Annotations | Automated Evals |
|---|---|---|
| Created by | Team members in the dashboard | Eval functions during experiments |
| When | Anytime, on any trace | During experiment runs |
| Best for | Subjective quality, edge cases, training data | Automated regression testing |
| Scale | Individual review | Bulk dataset evaluation |
Use both together: automated evals catch regressions at scale, while annotations add human judgment on individual cases.