AgentMark automatically tracks costs and token usage for every LLM call. Costs are calculated from token counts using provider pricing tables, and are available at the individual trace level and aggregated across your dashboard.

[Image: Dashboard showing average cost per request, token metrics, and cost chart over time]
Developers set up observability in your application code. See the Development documentation for setup instructions.

What AgentMark tracks

AgentMark records the following token and cost data for each LLM generation span:
  • Input tokens (prompt tokens): The number of tokens in the prompt sent to the model
  • Output tokens (completion tokens): The number of tokens in the model’s response
  • Total tokens: The sum of input and output tokens
  • Reasoning tokens: Additional tokens used by models that support chain-of-thought reasoning (such as OpenAI o1 and o3). These tokens represent the model’s internal reasoning steps before producing a response.
  • Cost: The dollar cost of the request, calculated from token counts and the model’s pricing
Token counts are reported directly by the LLM provider’s response. AgentMark does not estimate token counts — it uses the exact values returned by the API.
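Since counts come straight from the provider response, recording them amounts to reading the response's usage fields. A minimal sketch, assuming OpenAI-style field names (`prompt_tokens`, `completion_tokens`, and the optional reasoning-token detail); other providers expose similar structures:

```typescript
// Usage object as it appears in an OpenAI-style API response.
// Field names are illustrative assumptions, not AgentMark internals.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  completion_tokens_details?: { reasoning_tokens?: number };
}

// Map the provider's exact counts onto the values tracked per span.
function extractTokenCounts(usage: Usage) {
  return {
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    totalTokens: usage.total_tokens,
    // Reasoning tokens are only present for models that support them.
    reasoningTokens: usage.completion_tokens_details?.reasoning_tokens ?? 0,
  };
}
```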

How costs are calculated

AgentMark computes cost automatically based on the model used and current provider pricing:
Cost = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)
AgentMark maintains a pricing table for common models across major providers. Costs are calculated at ingestion time and stored alongside each trace, so you always see accurate cost data without manual configuration.
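The formula above can be sketched as a small function. The `ModelPricing` shape and the example prices ($2.50 per 1M input tokens, $10 per 1M output tokens) are hypothetical, not AgentMark's actual pricing table:

```typescript
// Per-token prices for a model, in dollars per single token.
// This shape is an illustrative assumption.
interface ModelPricing {
  inputPricePerToken: number;
  outputPricePerToken: number;
}

// Cost = input_tokens x input_price + output_tokens x output_price
function computeCost(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing
): number {
  return (
    inputTokens * pricing.inputPricePerToken +
    outputTokens * pricing.outputPricePerToken
  );
}

// Hypothetical pricing: $2.50 / 1M input tokens, $10 / 1M output tokens.
const pricing: ModelPricing = {
  inputPricePerToken: 2.5e-6,
  outputPricePerToken: 10e-6,
};

// 1,000 input + 500 output tokens => 0.0025 + 0.005 = $0.0075
computeCost(1_000, 500, pricing);
```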
For custom or self-hosted models not in the built-in pricing table, you can define pricing in your agentmark.json using model schemas. See Custom model pricing below.

Supported providers

AgentMark maintains pricing for models from these providers:
  • OpenAI — GPT-4o, GPT-4, GPT-3.5, o1, o3, and variants
  • Anthropic — Claude 4, Claude 3.5, Claude 3, and variants
  • Google — Gemini 2.5, Gemini 2.0, Gemini 1.5, and variants
  • Meta — Llama models (when accessed through supported APIs)
  • Mistral — Mistral Large, Medium, Small, and variants
  • Cohere — Command R, Command R+, and variants
Pricing is updated regularly as providers release new models or change pricing.

Where to view cost data

Dashboard metrics

The Metrics page shows aggregate cost data across your application:
  • Total cost over your selected time range
  • Cost by model to see which models drive your spending
  • Cost trends over time to identify usage patterns
  • Average cost per request to understand per-call economics

Trace list

Each trace in the Traces list displays its cost and token counts. Use this to inspect individual requests and understand their resource consumption.

Trace detail

When you open a trace, each generation span shows its own token breakdown:
  • Input tokens, output tokens, and total tokens
  • Reasoning tokens (when the model supports it)
  • Cost for that specific LLM call
For traces with multiple LLM calls, the trace-level cost is the sum of all generation spans within it.
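The trace-level roll-up is a simple sum over spans. A sketch, assuming a span record with `cost` and `totalTokens` fields (field names are illustrative):

```typescript
// Minimal per-span record; field names are assumptions for illustration.
interface GenerationSpan {
  cost: number;
  totalTokens: number;
}

// Trace-level totals are the sum over all generation spans in the trace.
function traceTotals(spans: GenerationSpan[]) {
  return spans.reduce(
    (acc, s) => ({
      cost: acc.cost + s.cost,
      totalTokens: acc.totalTokens + s.totalTokens,
    }),
    { cost: 0, totalTokens: 0 }
  );
}
```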

Sessions

The Sessions view aggregates cost and token usage across all traces in a session. This is useful for understanding the total cost of multi-turn conversations or agent workflows.

Per-user cost attribution

The dashboard tracks cost and token usage per user when you pass a userId to the SDK’s trace() function. Use this for billing, capacity planning, or identifying heavy users. Filter traces by user ID in the Filtering and search view.
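Conceptually, per-user attribution is a group-by over traces keyed on the `userId` you passed to `trace()`. A sketch of that roll-up (the `TraceRecord` shape is an assumption, not the SDK's actual type):

```typescript
// Illustrative trace record; only the fields needed for attribution.
interface TraceRecord {
  userId: string;
  cost: number;
}

// Sum cost per userId, as the dashboard's per-user view would.
function costByUser(traces: TraceRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    totals.set(t.userId, (totals.get(t.userId) ?? 0) + t.cost);
  }
  return totals;
}
```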

Filtering by cost and tokens

You can filter traces by cost or token count using numeric operators in the filter bar. This helps you quickly find expensive or token-heavy requests. Available cost and token filters:
  • Cost — Filter traces where cost equals, exceeds, or falls below a threshold (for example, cost > $0.10)
  • Input tokens — Filter by prompt token count
  • Output tokens — Filter by completion token count
  • Total tokens — Filter by combined token count
Available operators for numeric filters:
  • equals / notEquals — Exact match
  • gt / gte — Greater than / greater than or equal
  • lt / lte — Less than / less than or equal
Combine cost filters with model or user filters to answer questions like “Which GPT-4o requests cost more than $0.05?” or “Which users have the most expensive requests?”
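The numeric operators above can be modeled as a small predicate. `NumericOp` mirrors the operator names from the list; `matches` is a hypothetical helper, not part of the AgentMark API:

```typescript
// Operator names taken from the filter bar's numeric operators.
type NumericOp = "equals" | "notEquals" | "gt" | "gte" | "lt" | "lte";

// Evaluate one numeric filter against a value.
function matches(value: number, op: NumericOp, threshold: number): boolean {
  switch (op) {
    case "equals":
      return value === threshold;
    case "notEquals":
      return value !== threshold;
    case "gt":
      return value > threshold;
    case "gte":
      return value >= threshold;
    case "lt":
      return value < threshold;
    case "lte":
      return value <= threshold;
  }
}

// "Which requests cost more than $0.10?"
// traces.filter(t => matches(t.cost, "gt", 0.10))
```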

Aggregate analysis

Switch to the aggregate view on the Requests page to group requests by a dimension and compare cost and token usage:
  • Group by model to compare cost efficiency across models
  • Group by user to see per-user spending
  • Group by prompt name to identify which prompts are most expensive
Each aggregate row shows total requests, total cost, total tokens, input tokens, output tokens, average latency, and success rate.
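The aggregate view boils down to grouping requests on a dimension and summing the metrics per group. A sketch with a generic key function (the `Req` shape and field names are illustrative assumptions):

```typescript
// Illustrative request record with the fields an aggregate row sums.
interface Req {
  model: string;
  userId: string;
  cost: number;
  totalTokens: number;
}

// Group requests by a dimension (model, user, prompt name, ...)
// and total requests, cost, and tokens per group.
function aggregateBy(reqs: Req[], key: (r: Req) => string) {
  const rows = new Map<
    string,
    { requests: number; cost: number; totalTokens: number }
  >();
  for (const r of reqs) {
    const k = key(r);
    const row = rows.get(k) ?? { requests: 0, cost: 0, totalTokens: 0 };
    row.requests += 1;
    row.cost += r.cost;
    row.totalTokens += r.totalTokens;
    rows.set(k, row);
  }
  return rows;
}

// Group by model: aggregateBy(reqs, r => r.model)
// Group by user:  aggregateBy(reqs, r => r.userId)
```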

Custom model pricing

For models not in the built-in pricing table (such as self-hosted models, fine-tuned models, or newer providers), you can define custom pricing in your agentmark.json using model schemas:
{
  "modelSchemas": {
    "my-fine-tuned-model": {
      "label": "My Fine-Tuned GPT-4o",
      "cost": {
        "inputCost": 0.005,
        "outputCost": 0.015,
        "unitScale": 1000
      }
    }
  }
}
  • inputCost — Cost per unit for input tokens
  • outputCost — Cost per unit for output tokens
  • unitScale — Number of tokens per unit (e.g., 1000 = cost per 1K tokens, 1000000 = cost per 1M tokens)
Custom model pricing is applied at ingestion time, the same as built-in pricing. Token counts are always tracked regardless of whether pricing is configured. For full details on model schema configuration, see Adding Models.
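Under these schema semantics, cost works out to tokens divided by `unitScale`, times the per-unit price. A worked sketch using the example schema above ($0.005 / 1K input tokens, $0.015 / 1K output tokens); the function name is hypothetical:

```typescript
// Custom pricing as defined in agentmark.json: dollars per `unitScale` tokens.
interface CustomCost {
  inputCost: number;
  outputCost: number;
  unitScale: number;
}

// cost = (tokens / unitScale) * per-unit price, summed over input and output.
function customModelCost(
  inputTokens: number,
  outputTokens: number,
  c: CustomCost
): number {
  return (
    (inputTokens / c.unitScale) * c.inputCost +
    (outputTokens / c.unitScale) * c.outputCost
  );
}

// Schema from the example: $0.005 / 1K input, $0.015 / 1K output.
// 2,000 input + 1,000 output tokens => 2 * 0.005 + 1 * 0.015 = $0.025
customModelCost(2_000, 1_000, {
  inputCost: 0.005,
  outputCost: 0.015,
  unitScale: 1000,
});
```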

Best practices

  • Monitor cost trends regularly. Check the Metrics dashboard to spot unexpected cost increases early. A sudden spike may indicate a prompt regression or unexpected traffic.
  • Use cost filters to find expensive requests. Filter traces where cost exceeds your expected per-request budget, and investigate high-cost traces to see if prompts can be optimized.
  • Track per-user costs for billing. If you bill customers based on AI usage, the Users page provides the cost attribution data you need.
  • Compare model costs. Use the aggregate view grouped by model to evaluate whether cheaper models can handle certain tasks without quality loss.
  • Set up alerts for cost thresholds. Configure Alerts to notify you when cost metrics exceed acceptable levels.

Next steps

  • Metrics — View aggregate cost and usage metrics
  • Traces and Logs — Inspect individual request costs
  • Alerts — Get notified of cost spikes
  • Filtering and Search — Filter traces by cost and tokens
