In the AgentMark Dashboard, click Create App and select your GitHub repository
Add your LLM provider API key in Settings > Environment Variables
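The variable name to add depends on the model provider your prompts call. As an assumption (these are the providers' own conventional key names, not anything AgentMark-specific), the entries typically look like:

```sh
# Illustrative only: add the key name(s) your provider expects.
OPENAI_API_KEY=sk-...          # prompts using OpenAI models
ANTHROPIC_API_KEY=sk-ant-...   # prompts using Anthropic models
```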
The Apps list shows every app in your organization with its name, linked Git repository, and last-sync status. Click Create App in the top right to start a new one; the modal walks you through selecting a GitHub repository and naming the app.

Once your repository is connected, AgentMark Cloud syncs your prompt files and deploys your handler automatically.
Open a prompt in the Dashboard and click Run. AgentMark Cloud executes it on your deployed handler and streams results back in real time.

The prompt editor shows your .prompt.mdx content, the selected model, input-variable fields, and a streaming output pane.
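For reference, a text prompt file is an MDX document: frontmatter that names the prompt and selects the model, followed by tagged message blocks that can reference input variables. The sketch below is illustrative only; the frontmatter keys and the props variables are assumptions, so treat the example prompts generated in your repository as the authoritative shape.

```mdx
---
# Illustrative sketch; key names here are assumptions, not a canonical schema.
name: party-planner
text_config:
  model_name: gpt-4o-mini
---

<System>You are a helpful party-planning assistant.</System>

{/* props.theme and props.guestCount are hypothetical input variables;
    they surface as input fields in the Dashboard's prompt editor. */}
<User>Plan a {props.theme} party for {props.guestCount} guests.</User>
```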
Experiments test a prompt against a dataset and score the results with evaluators. Your project includes example prompts and datasets ready to go.
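A dataset is a list of rows pairing an input (the prompt's variables) with an expected output for evaluators to score against. A minimal JSONL sketch might look like the following; the input and expected_output field names, and JSONL itself, are assumptions here, so mirror the example datasets included in your project:

```jsonl
{"input": {"theme": "space", "guestCount": 12}, "expected_output": "An outline for a space-themed party for 12 guests"}
{"input": {"theme": "dinosaur", "guestCount": 6}, "expected_output": "An outline for a dinosaur-themed party for 6 guests"}
```

Each row becomes one run in the experiment: the input fills the prompt's variables, and evaluators compare the model's output against the expected output.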
Navigate to the party-planner prompt in the Dashboard
Open the Experiments tab
Click Run Experiment
Review the results — scores, pass rates, and individual outputs
The experiment results page shows each dataset row with its input, the AI output, the expected output, and evaluator pass/fail scores — plus aggregate metrics like average score, latency, total cost, and total tokens across the run.
Every prompt and experiment execution is automatically traced. Navigate to the Traces page to see the full execution timeline: span details, token usage, cost, and latency.

The Traces page lists every execution with columns for name, status, latency, cost, tokens, spans, tags, and timestamp. Filter by time range from the toolbar, or click a row to drill into the full span tree for that execution.
The CLI prints a Text Prompt Results section with the model output, token counts, cost estimate, and a 📊 View trace URL you can open in the browser for the full span tree.
The CLI runs every item in the dataset, applies your evaluators, and prints a results table with one row per dataset item, showing the input, AI output, expected output, and each evaluator's pass/fail score. A summary line at the bottom reports the overall pass rate.
Every prompt and experiment execution is automatically traced. Open http://localhost:3000 and navigate to Traces to see your execution history with span trees, input/output, and timing.

The trace detail page in the local dev server shows the span tree on the left, with the input/output and per-span timing for the selected span on the right.