Generation types

Generation types define what kind of output your prompt produces. AgentMark supports four types, each suited to different use cases:

Text: natural language responses for chatbots, content generation, and analysis
Object: structured JSON data with schema validation for APIs and data extraction
Image: visual content from models like DALL-E 3
Speech: spoken audio for voice applications and text-to-speech

Choosing the right type

Type	Best for	Output format	Example use cases
Text	Conversational AI, content writing	String	Chatbots, summarization, Q&A
Object	Structured data extraction	JSON with schema	Form parsing, data normalization, API responses
Image	Visual content creation	Image file	Marketing assets, illustrations, prototypes
Speech	Voice applications	Audio file	Podcasts, audiobooks, voice assistants

You declare each type with its frontmatter config key (text_config, object_config, image_config, or speech_config) and load it with the matching client method (loadTextPrompt, loadObjectPrompt, loadImagePrompt, loadSpeechPrompt). This page covers text, image, and speech; object generation has its own page for schema validation and $ref reuse.
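The pairing between config key and loader method can be written down as a small lookup. The sketch below is illustrative only: the key and method names come from the paragraph above, while `GENERATION_TYPES` and `detectGenerationType` are hypothetical helpers (not part of AgentMark) that assume the frontmatter has already been parsed into a plain object.

```typescript
// Hypothetical lookup: config keys and loader names are taken from this page.
const GENERATION_TYPES = {
  text: { configKey: "text_config", loader: "loadTextPrompt" },
  object: { configKey: "object_config", loader: "loadObjectPrompt" },
  image: { configKey: "image_config", loader: "loadImagePrompt" },
  speech: { configKey: "speech_config", loader: "loadSpeechPrompt" },
} as const;

type GenerationType = keyof typeof GENERATION_TYPES;

// Returns the generation type whose config key appears in the parsed
// frontmatter, or undefined when none of the four keys is present.
function detectGenerationType(
  frontmatter: Record<string, unknown>
): GenerationType | undefined {
  const types = Object.keys(GENERATION_TYPES) as GenerationType[];
  return types.find((t) => GENERATION_TYPES[t].configKey in frontmatter);
}

const detected = detectGenerationType({
  name: "image",
  image_config: { model_name: "openai/dall-e-3" },
});
console.log(detected); // "image"
```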

Text generation

AgentMark generates text with prompts that declare a text_config in frontmatter. Text prompts use message-role tags (<System>, <User>, <Assistant>) and return a string.

Example configuration

example.prompt.mdx

---
name: example
text_config:
  model_name: openai/gpt-5-mini
---

<System>You are a math tutor that can perform calculations.</System>
<User>What's 235 * 18?</User>

These docs write model_name in the provider-prefixed form, such as openai/gpt-5-mini. The value is a free-form string that AgentMark passes through to your call site unchanged, which is why the Running prompts examples strip the openai/ prefix before constructing the provider model.
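Because the string arrives unchanged, splitting off the provider prefix is up to your call site. A minimal sketch of one way to do that follows; `splitModelName` and `ModelRef` are hypothetical names for illustration, not AgentMark APIs.

```typescript
// Hypothetical helper: split a provider-prefixed model_name into its parts.
interface ModelRef {
  provider: string | undefined; // undefined when the string has no prefix
  model: string;
}

function splitModelName(modelName: string): ModelRef {
  // Split on the first slash only, so model IDs that contain slashes survive.
  const slash = modelName.indexOf("/");
  if (slash === -1) {
    return { provider: undefined, model: modelName };
  }
  return {
    provider: modelName.slice(0, slash),
    model: modelName.slice(slash + 1),
  };
}

const ref = splitModelName("openai/gpt-5-mini");
console.log(ref.provider, ref.model); // openai gpt-5-mini
```

You would then pass `ref.model` to whichever provider client you construct.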

Tags

Tag	Description
`<System>`	System-level instructions
`<User>`	User message
`<Assistant>`	Assistant message (optional; include for few-shot examples or prior-turn context)

Available configuration

Property	Type	Description	Required
`model_name`	`string`	The name of the model to use for text generation.	Yes
`max_tokens`	`number`	Maximum number of tokens to generate.	No
`temperature`	`number`	Controls the randomness of the output; higher values are more random.	No
`max_calls`	`number`	Maximum number of LLM calls allowed (for agent workflows).	No
`top_p`	`number`	Cumulative probability for nucleus sampling.	No
`top_k`	`number`	Limits next-token selection to the top `k` tokens.	No
`presence_penalty`	`number`	Penalizes tokens based on presence in the text so far, encouraging new topics.	No
`frequency_penalty`	`number`	Penalizes tokens based on frequency in the text so far, reducing verbatim repetition.	No
`stop_sequences`	`string[]`	Strings that, if encountered, stop generation.	No
`seed`	`number`	Random-number seed for reproducibility.	No
`max_retries`	`number`	Maximum number of retries on failure.	No
`tool_choice`	`"auto" \| "none" \| "required" \| { type: "tool", tool_name: string }`	Controls how the model uses tools during generation.	No
`tools`	`string[]`	List of tool names or MCP URIs available to the model. You resolve these names to implementations at your call site (see Tools and agents).	No
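If you read the frontmatter from TypeScript, the table above translates into a type along these lines. This is an illustrative sketch: the field names and types mirror the table, but `TextConfig` and `ToolChoice` are names chosen here, and the `calculator` tool is hypothetical.

```typescript
// Illustrative types only; fields mirror the configuration table on this page.
type ToolChoice =
  | "auto"
  | "none"
  | "required"
  | { type: "tool"; tool_name: string };

interface TextConfig {
  model_name: string; // the only required property
  max_tokens?: number;
  temperature?: number;
  max_calls?: number;
  top_p?: number;
  top_k?: number;
  presence_penalty?: number;
  frequency_penalty?: number;
  stop_sequences?: string[];
  seed?: number;
  max_retries?: number;
  tool_choice?: ToolChoice;
  tools?: string[];
}

const config: TextConfig = {
  model_name: "openai/gpt-5-mini",
  temperature: 0.2,
  max_tokens: 512,
  // Force a specific tool; "calculator" is a hypothetical tool name.
  tool_choice: { type: "tool", tool_name: "calculator" },
  tools: ["calculator"],
};
console.log(config.model_name);
```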

Running a text prompt

See the SDK usage section of Running prompts (under the Local tab) for the text-generation SDK code patterns: render the prompt with prompt.format(), then make your own model call (generateText in TypeScript, the OpenAI client in Python).

Image generation

AgentMark generates images with prompts that declare image_config in frontmatter. The image description itself goes in an <ImagePrompt> tag.

Example configuration

example.prompt.mdx

---
name: image
image_config:
  model_name: openai/dall-e-3
  num_images: 1
  size: 1024x1024
  aspect_ratio: 1:1
  seed: 12345
---

<ImagePrompt>
A futuristic cityscape at sunset with flying cars and neon lights
</ImagePrompt>

Tags

Tag	Description
`<ImagePrompt>`	The text description for image generation. AgentMark reads the contents at compile time and sends it to the model as the prompt.

Available configuration

Property	Type	Description	Required
`model_name`	`string`	The name of the model to use for image generation.	Yes
`num_images`	`number`	Number of images to generate.	No
`size`	`string`	Image dimensions in format `WIDTHxHEIGHT` (for example, `1024x1024`, `512x512`).	No
`aspect_ratio`	`string`	Aspect ratio in format `WIDTH:HEIGHT` (for example, `1:1`, `16:9`, `9:16`).	No
`seed`	`number`	Random-number seed for reproducibility.	No

Set size for pixel-exact dimensions or aspect_ratio for proportional sizing. AgentMark passes both through to the model unchanged, so provider support varies. If you set both, the provider chooses which to honor.
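Since AgentMark does not reconcile the two settings, a pre-flight check can flag a config where they disagree. The following is a minimal sketch, assuming the `WIDTHxHEIGHT` and `WIDTH:HEIGHT` formats from the table above; `parsePair` and `sizeAgreesWithRatio` are hypothetical helpers, not part of AgentMark.

```typescript
// Parse "1024x1024" (separator "x") or "16:9" (separator ":") into two numbers.
function parsePair(value: string, separator: "x" | ":"): [number, number] | null {
  const match = new RegExp(`^(\\d+)${separator}(\\d+)$`).exec(value);
  if (!match) return null;
  return [Number(match[1]), Number(match[2])];
}

// Returns true when size and aspect_ratio describe the same proportions,
// false when they conflict, and null when either string is malformed.
function sizeAgreesWithRatio(size: string, aspectRatio: string): boolean | null {
  const dims = parsePair(size, "x");
  const ratio = parsePair(aspectRatio, ":");
  if (!dims || !ratio) return null;
  const [width, height] = dims;
  const [ratioW, ratioH] = ratio;
  // Cross-multiply to compare proportions without floating-point division.
  return width * ratioH === height * ratioW;
}

console.log(sizeAgreesWithRatio("1024x1024", "1:1")); // true
console.log(sizeAgreesWithRatio("1024x1024", "16:9")); // false
```

A `false` result means the provider will have to pick one setting, so it is usually simpler to drop whichever one you care about less.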

Running an image prompt

See the SDK usage section of Running prompts (under the Local tab) for the image-generation SDK code pattern using Vercel AI SDK’s experimental_generateImage.

Tracing image generation

AgentMark’s prompt runner (used for deployed agents and experiments) captures the generated image on the span automatically, and the trace’s Output tab renders it inline. Calls made from your own application code are not covered: unlike generateText, the Vercel AI SDK’s experimental_generateImage emits no OpenTelemetry telemetry, so tracing captures nothing on its own. Instrument the call yourself and set the generated media as the span output:

import { span } from "@agentmark-ai/sdk";
import { experimental_generateImage as generateImage } from "ai";
import { openai } from "@ai-sdk/openai";

const prompt = "a red sneaker on a white background";

// Wrap the call in a span so its input and output are recorded on the trace.
const { result } = span({ name: "generate-image", kind: "llm" }, async (ctx) => {
  ctx.setInput({ prompt });

  const { images } = await generateImage({
    model: openai.image("gpt-image-1"),
    prompt,
  });

  // Set the media as the output so it renders inline in the trace.
  ctx.setOutput(images.map((i) => ({ mediaType: i.mediaType, base64: i.base64 })));
  return images;
});
await result;

For the image to render, the output must be (or contain) objects shaped { mimeType, base64 }; mediaType is accepted in place of mimeType, as in the example above. A value of any other shape renders as text. The same pattern applies to audio from experimental_generateSpeech. See Tracing setup for initializing the tracer.
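To catch a wrongly shaped output before it reaches the trace, you could check values against the documented shape yourself. The sketch below is an assumption-laden illustration: `MediaOutput` and `isMediaOutput` are hypothetical names, and the check mirrors the shape described above rather than AgentMark's actual rendering logic.

```typescript
// The shape described above: base64 data plus either mimeType or mediaType.
interface MediaOutput {
  base64: string;
  mimeType?: string;
  mediaType?: string;
}

// Hypothetical type guard; not AgentMark's own check.
function isMediaOutput(value: unknown): value is MediaOutput {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  const hasType =
    typeof candidate["mimeType"] === "string" ||
    typeof candidate["mediaType"] === "string";
  return typeof candidate["base64"] === "string" && hasType;
}

const outputs: unknown[] = [
  { mediaType: "image/png", base64: "iVBORw0KGgo=" },
  "plain text",
];
const media = outputs.filter(isMediaOutput);
console.log(media.length); // 1
```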

Speech generation

AgentMark generates speech audio with prompts that declare speech_config in frontmatter. The text to speak goes in a <SpeechPrompt> tag.

Example configuration

example.prompt.mdx

---
name: speech
speech_config:
  model_name: openai/tts-1-hd
  voice: "nova"
  speed: 1.0
  output_format: "mp3"
---

<System>
Please read this text aloud.
</System>

<SpeechPrompt>
This is a test for the speech prompt to be spoken aloud.
</SpeechPrompt>

Tags

Tag	Description
`<SpeechPrompt>`	The text to convert to speech. AgentMark reads the contents at compile time and sends it to the TTS model.
`<System>`	Optional system-level instructions for models that support them. At compile time the tag body becomes `instructions` in the compiled `speech_config` (or `""` when the tag is absent). AgentMark ignores an `instructions` value written directly in frontmatter.

Available configuration

Property	Type	Description	Required
`model_name`	`string`	The name of the model to use for speech generation.	Yes
`voice`	`string`	Voice identifier (provider-specific; OpenAI TTS, for example, offers `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`).	No
`output_format`	`string`	Audio output format (for example, `mp3`, `opus`, `aac`, `flac`).	No
`speed`	`number`	Playback speed multiplier.	No

The compiled speech_config also carries text (from the <SpeechPrompt> tag) and instructions (from the <System> tag). AgentMark populates both at compile time from the tags; you don’t author them in frontmatter.
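As a rough picture of what that compiled config looks like and how you might hand it to a speech call, consider the sketch below. The `CompiledSpeechConfig` fields follow this page; `SpeechCallArgs` and `toSpeechCallArgs` are hypothetical, and the camelCase argument names are an assumption about what your speech function expects, so check them against the SDK you actually call. The example values assume the tag bodies compile to their trimmed text.

```typescript
// Shape of the compiled config as described on this page.
interface CompiledSpeechConfig {
  model_name: string;
  text: string; // from the <SpeechPrompt> tag
  instructions: string; // from the <System> tag, "" when the tag is absent
  voice?: string;
  output_format?: string;
  speed?: number;
}

// Hypothetical argument object for a speech-generation call.
interface SpeechCallArgs {
  text: string;
  voice: string | undefined;
  outputFormat: string | undefined;
  speed: number | undefined;
  instructions: string | undefined;
}

function toSpeechCallArgs(config: CompiledSpeechConfig): SpeechCallArgs {
  return {
    text: config.text,
    voice: config.voice,
    outputFormat: config.output_format,
    speed: config.speed,
    // An absent <System> tag compiles to "", so treat that as "no instructions".
    instructions: config.instructions === "" ? undefined : config.instructions,
  };
}

const compiled: CompiledSpeechConfig = {
  model_name: "openai/tts-1-hd",
  text: "This is a test for the speech prompt to be spoken aloud.",
  instructions: "Please read this text aloud.",
  voice: "nova",
  speed: 1.0,
  output_format: "mp3",
};
console.log(toSpeechCallArgs(compiled).outputFormat); // mp3
```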

Running a speech prompt

See the SDK usage section of Running prompts (under the Local tab) for the speech-generation SDK code pattern using Vercel AI SDK’s experimental_generateSpeech.

Have questions?

Reach out any time:

Email the team at hello@agentmark.co for support
Schedule an Enterprise Demo to learn about AgentMark’s business solutions
