AgentMark automatically tracks costs and token usage for every LLM call. Costs are calculated from token counts using provider pricing tables, and are available at the individual trace level and aggregated across your dashboard.

[Image: Dashboard showing average cost per request, token metrics, and cost chart over time]
Developers set up observability in your application code. See the Development documentation for setup instructions.

What AgentMark tracks

AgentMark records the following token and cost data for each LLM generation span:
  • Input tokens (prompt tokens): The number of tokens in the prompt sent to the model
  • Output tokens (completion tokens): The number of tokens in the model’s response
  • Total tokens: The sum of input and output tokens
  • Reasoning tokens: Additional tokens used by models that support chain-of-thought reasoning (such as OpenAI o1 and o3). These tokens represent the model’s internal reasoning steps before producing a response.
  • Cost: The dollar cost of the request, calculated from token counts and the model’s pricing
Token counts are reported directly by the LLM provider’s response. AgentMark does not estimate token counts — it uses the exact values returned by the API.
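Since counts come straight from the provider response, recording them amounts to reading the response's usage fields. A minimal sketch, assuming OpenAI-style field names (`prompt_tokens`, `completion_tokens`, and the optional reasoning-token detail); other providers expose similar structures:

```typescript
// Usage object as it appears in an OpenAI-style API response.
// Field names are illustrative assumptions, not AgentMark internals.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Map the provider's exact counts onto the values tracked per span.
function extractTokenCounts(usage: Usage) {
  return {
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    totalTokens: usage.total_tokens,
    // Reasoning tokens are only present for models that support them.
    reasoningTokens: usage.completion_tokens_details?.reasoning_tokens ?? 0,
  };
}
```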

How costs are calculated

AgentMark computes cost automatically based on the model used and current provider pricing:
Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)
AgentMark maintains a pricing table for common models across major providers. Costs are calculated at ingestion time and stored alongside each trace, so you always see accurate cost data without manual configuration.
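The formula above can be sketched as a small function. The `ModelPricing` shape and the example prices ($2.50 per 1M input tokens, $10 per 1M output tokens) are hypothetical, not AgentMark's actual pricing table:

```typescript
// Per-token prices for a model, in dollars per single token.
// This shape is an illustrative assumption.
interface ModelPricing {
  inputPricePerToken: number;
  outputPricePerToken: number;
}

// Cost = input_tokens x input_price + output_tokens x output_price
function computeCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    inputTokens * pricing.inputPricePerToken +
    outputTokens * pricing.outputPricePerToken
  );
}

// Hypothetical pricing: $2.50 / 1M input tokens, $10 / 1M output tokens.
const pricing: ModelPricing = {
  inputPricePerToken: 2.5e-6,
  outputPricePerToken: 10e-6,
};

// 1,000 input + 500 output tokens => 0.0025 + 0.005 = $0.0075
computeCost(1_000, 500, pricing);
```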
For custom or self-hosted models not in the built-in pricing table, you can define pricing in your agentmark.json using model schemas. See Custom model pricing below.

Supported providers

AgentMark maintains pricing for models from these providers:
  • OpenAI — GPT-4o, GPT-4, GPT-3.5, o1, o3, and variants
  • Anthropic — Claude 4, Claude 3.5, Claude 3, and variants
  • Google — Gemini 2.5, Gemini 2.0, Gemini 1.5, and variants
  • Meta — Llama models (when accessed through supported APIs)
  • Mistral — Mistral Large, Medium, Small, and variants
  • Cohere — Command R, Command R+, and variants
Pricing is updated regularly as providers release new models or change pricing.

Where to view cost data

Dashboard metrics

The Metrics page shows aggregate cost data across your application:
  • Total cost over your selected time range
  • Cost by model to see which models drive your spending
  • Cost trends over time to identify usage patterns
  • Average cost per request to understand per-call economics

Trace list

Each trace in the Traces list displays its cost and token counts. Use this to inspect individual requests and understand their resource consumption.

Trace detail

When you open a trace, each generation span shows its own token breakdown:
  • Input tokens, output tokens, and total tokens
  • Reasoning tokens (when the model supports it)
  • Cost for that specific LLM call
For traces with multiple LLM calls, the trace-level cost is the sum of all generation spans within it.
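The trace-level roll-up is a simple sum over spans. A sketch, assuming a span record with `cost` and `totalTokens` fields (field names are illustrative):

```typescript
// Minimal per-span record; field names are assumptions for illustration.
interface GenerationSpan {
  cost: number;
  totalTokens: number;
}

// Trace-level totals are the sum over all generation spans in the trace.
function traceTotals(spans: GenerationSpan[]) {
  return spans.reduce(
    (acc, s) => ({
      cost: acc.cost + s.cost,
      totalTokens: acc.totalTokens + s.totalTokens,
    }),
    { cost: 0, totalTokens: 0 }
  );
}
```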

Sessions

The Sessions view aggregates cost and token usage across all traces in a session. This is useful for understanding the total cost of multi-turn conversations or agent workflows.

Per-user cost attribution

The dashboard tracks cost and token usage per user when you pass a userId to the SDK’s trace() function. Use this for billing, capacity planning, or identifying heavy users. Filter traces by user ID in the Filtering and search view.
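Conceptually, per-user attribution is a group-by over traces keyed on the `userId` you passed to `trace()`. A sketch of that roll-up (the `TraceRecord` shape is an assumption, not the SDK's actual type):

```typescript
// Illustrative trace record; only the fields needed for attribution.
interface TraceRecord {
  userId: string;
  cost: number;
}

// Sum cost per userId, as the dashboard's per-user view would.
function costByUser(traces: TraceRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    totals.set(t.userId, (totals.get(t.userId) ?? 0) + t.cost);
  }
  return totals;
}
```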

Filtering by cost and tokens

You can filter traces by cost or token count using numeric operators in the filter bar. This helps you quickly find expensive or token-heavy requests. Available cost and token filters:
  • Cost — Filter traces where cost equals, exceeds, or falls below a threshold (for example, cost > $0.10)
  • Input tokens — Filter by prompt token count
  • Output tokens — Filter by completion token count
  • Total tokens — Filter by combined token count
Available operators for numeric filters:
  • equals / notEquals — Exact match
  • gt / gte — Greater than / greater than or equal
  • lt / lte — Less than / less than or equal
Combine cost filters with model or user filters to answer questions like “Which GPT-4o requests cost more than $0.05?” or “Which users have the most expensive requests?”
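The numeric operators above can be modeled as a small predicate. `NumericOp` mirrors the operator names from the list; `matches` is a hypothetical helper, not part of the AgentMark API:

```typescript
// Operator names taken from the filter bar's numeric operators.
type NumericOp = "equals" | "notEquals" | "gt" | "gte" | "lt" | "lte";

// Evaluate one numeric filter against a value.
function matches(value: number, op: NumericOp, threshold: number): boolean {
  switch (op) {
    case "equals":
      return value === threshold;
    case "notEquals":
      return value !== threshold;
    case "gt":
      return value > threshold;
    case "gte":
      return value >= threshold;
    case "lt":
      return value < threshold;
    case "lte":
      return value <= threshold;
  }
}

// "Which requests cost more than $0.10?"
// traces.filter(t => matches(t.cost, "gt", 0.10))
```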

Aggregate analysis

Switch to the aggregate view on the Requests page to group requests by a dimension and compare cost and token usage:
  • Group by model to compare cost efficiency across models
  • Group by user to see per-user spending
  • Group by prompt name to identify which prompts are most expensive
Each aggregate row shows total requests, total cost, total tokens, input tokens, output tokens, average latency, and success rate.
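The aggregate view boils down to grouping requests on a dimension and summing the metrics per group. A sketch with a generic key function (the `Req` shape and field names are illustrative assumptions):

```typescript
// Illustrative request record with the fields an aggregate row sums.
interface Req {
  model: string;
  userId: string;
  cost: number;
  totalTokens: number;
}

// Group requests by a dimension (model, user, prompt name, ...)
// and total requests, cost, and tokens per group.
function aggregateBy(reqs: Req[], key: (r: Req) => string) {
  const rows = new Map<
    string,
    { requests: number; cost: number; totalTokens: number }
  >();
  for (const r of reqs) {
    const k = key(r);
    const row = rows.get(k) ?? { requests: 0, cost: 0, totalTokens: 0 };
    row.requests += 1;
    row.cost += r.cost;
    row.totalTokens += r.totalTokens;
    rows.set(k, row);
  }
  return rows;
}

// Group by model: aggregateBy(reqs, r => r.model)
// Group by user:  aggregateBy(reqs, r => r.userId)
```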

Custom model pricing

For models not in the built-in pricing table (such as self-hosted models, fine-tuned models, or newer providers), you can define custom pricing in your agentmark.json using model schemas:
{
  "modelSchemas": {
    "my-fine-tuned-model": {
      "label": "My Fine-Tuned GPT-4o",
      "cost": {
        "inputCost": 0.005,
        "outputCost": 0.015,
        "unitScale": 1000
      }
    }
  }
}
  • inputCost — Cost per unit for input tokens
  • outputCost — Cost per unit for output tokens
  • unitScale — Number of tokens per unit (e.g., 1000 = cost per 1K tokens, 1000000 = cost per 1M tokens)
Custom model pricing is applied at ingestion time, the same as built-in pricing. Token counts are always tracked regardless of whether pricing is configured. For full details on model schema configuration, see Adding Models.
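Under these schema semantics, cost works out to tokens divided by `unitScale`, times the per-unit price. A worked sketch using the example schema above ($0.005 / 1K input tokens, $0.015 / 1K output tokens); the function name is hypothetical:

```typescript
// Custom pricing as defined in agentmark.json: dollars per `unitScale` tokens.
interface CustomCost {
  inputCost: number;
  outputCost: number;
  unitScale: number;
}

// cost = (tokens / unitScale) * per-unit price, summed over input and output.
function customModelCost(
  inputTokens: number,
  outputTokens: number,
  c: CustomCost
): number {
  return (
    (inputTokens / c.unitScale) * c.inputCost +
    (outputTokens / c.unitScale) * c.outputCost
  );
}

// Schema from the example: $0.005 / 1K input, $0.015 / 1K output.
// 2,000 input + 1,000 output tokens => 2 * 0.005 + 1 * 0.015 = $0.025
customModelCost(2_000, 1_000, {
  inputCost: 0.005,
  outputCost: 0.015,
  unitScale: 1000,
});
```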

Best practices

  • Monitor cost trends regularly. Check the Metrics dashboard to spot unexpected cost increases early. A sudden spike may indicate a prompt regression or unexpected traffic.
  • Use cost filters to find expensive requests. Filter traces where cost exceeds your expected per-request budget, and investigate high-cost traces to see if prompts can be optimized.
  • Track per-user costs for billing. If you bill customers based on AI usage, the Users page provides the cost attribution data you need.
  • Compare model costs. Use the aggregate view grouped by model to evaluate whether cheaper models can handle certain tasks without quality loss.
  • Set up alerts for cost thresholds. Configure Alerts to notify you when cost metrics exceed acceptable levels.

Next steps

  • Metrics — View aggregate cost and usage metrics
  • Traces and Logs — Inspect individual request costs
  • Alerts — Get notified of cost spikes
  • Filtering and Search — Filter traces by cost and tokens
