From Text to World: Generating Fantasy Art in Real-Time

Lorenzo Lorenzo · 3 min read

When the AI Game Master describes a torch-lit dungeon, Embertold can show you that dungeon. Here's how real-time image generation works behind the scenes.

The Challenge

Generating images during gameplay has unique constraints:

  1. Speed — Players are waiting. Every second counts.
  2. Relevance — The image must match what was just narrated.
  3. Style consistency — Images across an adventure should feel like they belong together.
  4. Cost efficiency — We can't generate a new image for every sentence.

Solving all four at once shapes every part of the pipeline: how prompts are written, how styles are applied, when images are generated, and how they are cached.

Prompt Engineering

The AI Game Master doesn't just send the narration text to the image generator. It writes a separate prompt built specifically for image generation, following three rules:

  • Concrete over abstract. "A stone bridge over a rushing river in a pine forest at sunset" generates better than "a beautiful natural scene."
  • Composition cues. We include hints about framing: "wide shot," "looking up at," "seen from a distance."
  • Style modifiers. Each adventure can have a visual style that's appended to every prompt, ensuring consistency. A gritty dark fantasy adventure gets different modifiers than a whimsical fairy tale.
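To make the three rules concrete, here is a minimal sketch of how such a prompt could be assembled. It is an illustration rather than Embertold's actual code: the `SceneDescription` class, its fields, and `build_image_prompt` are hypothetical names, and the sketch assumes the concrete scene details have already been extracted from the narration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneDescription:
    """Concrete visual facts taken from the narration (hypothetical structure)."""
    subject: str                 # e.g. "a stone bridge over a rushing river"
    setting: str                 # e.g. "in a pine forest at sunset"
    framing: str = "wide shot"   # composition cue


def build_image_prompt(scene: SceneDescription, style_modifiers: list[str]) -> str:
    """Join the framing cue, the concrete scene, and the style modifiers."""
    parts = [scene.framing, f"{scene.subject} {scene.setting}".strip()]
    parts.extend(style_modifiers)
    # Drop empty pieces so a missing field never leaves a dangling comma.
    return ", ".join(p.strip() for p in parts if p and p.strip())


scene = SceneDescription(
    subject="a stone bridge over a rushing river",
    setting="in a pine forest at sunset",
)
print(build_image_prompt(scene, ["dark fantasy", "muted palette"]))
```

Keeping the style modifiers as a separate argument means the same scene can be rendered for a gritty adventure or a whimsical one without touching the scene description itself.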

The Style System

Embertold supports configurable image styles. Each style includes prompt modifiers that define the visual aesthetic:

  • Art style (painterly, realistic, stylized)
  • Color palette tendencies
  • Lighting preferences
  • Level of detail

This means two adventures in the same universe can look completely different based on their style configuration. A horror-themed adventure feels darker and grittier than an epic heroic quest.
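A style configuration like this could be modeled as a small record with one field per aspect listed above. The sketch below is an assumption about the shape of the data, not the real schema: `ImageStyle`, `STYLES`, `apply_style`, and the example modifier strings are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageStyle:
    """One field per aspect of the visual aesthetic (hypothetical schema)."""
    art_style: str       # painterly, realistic, stylized
    color_palette: str
    lighting: str
    detail_level: str

    def modifiers(self) -> list[str]:
        """Return the prompt modifiers in a fixed order."""
        return [self.art_style, self.color_palette, self.lighting, self.detail_level]


# Example styles; the modifier wording here is made up.
STYLES = {
    "horror": ImageStyle("painterly", "desaturated cold palette", "low-key torchlight", "gritty fine detail"),
    "heroic": ImageStyle("painterly", "rich saturated palette", "golden hour light", "sweeping detail"),
}


def apply_style(base_prompt: str, style_name: str) -> str:
    """Append the chosen style's modifiers to a base prompt."""
    style = STYLES[style_name]
    return ", ".join([base_prompt, *style.modifiers()])


print(apply_style("a ruined chapel on a hill", "horror"))
print(apply_style("a ruined chapel on a hill", "heroic"))
```

Because every prompt in an adventure passes through the same style entry, the same base scene yields visibly different prompts per adventure while staying consistent within one.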

When Images Are Generated

The AI is selective about when to generate images. Not every message gets one — that would be overwhelming and expensive. Images are generated for:

  • Major location changes — Entering a new area
  • Dramatic reveals — A hidden chamber, a breathtaking vista
  • Key narrative moments — The villain's lair, the final battle, a moment of triumph

The AI judges the narrative importance of each moment and only triggers image generation when it will have real impact.
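One way to picture that decision is as a simple gate over the model's own assessment of the moment. The sketch below assumes the AI returns a moment category and an importance score between 0 and 1; the `MomentKind` categories mirror the list above, but the score, the `0.6` threshold, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class MomentKind(Enum):
    ROUTINE = "routine"
    LOCATION_CHANGE = "location_change"
    DRAMATIC_REVEAL = "dramatic_reveal"
    KEY_NARRATIVE = "key_narrative"


@dataclass(frozen=True)
class NarrationAssessment:
    """The model's judgment of one narrated message (assumed structure)."""
    kind: MomentKind
    importance: float  # 0.0 to 1.0, assumed to be supplied by the model


# Only these categories are ever eligible for an image.
IMAGE_WORTHY = {
    MomentKind.LOCATION_CHANGE,
    MomentKind.DRAMATIC_REVEAL,
    MomentKind.KEY_NARRATIVE,
}


def should_generate_image(assessment: NarrationAssessment, threshold: float = 0.6) -> bool:
    """Generate only for eligible moments that are important enough."""
    return assessment.kind in IMAGE_WORTHY and assessment.importance >= threshold


print(should_generate_image(NarrationAssessment(MomentKind.DRAMATIC_REVEAL, 0.9)))
print(should_generate_image(NarrationAssessment(MomentKind.ROUTINE, 0.9)))
```

Raising the threshold trades visual richness for lower cost, which is the balance described above.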

Caching at Scale

As covered in our caching article, generated images are cached using similarity matching. A "torch-lit stone dungeon" generated for one player benefits all future players encountering similar scenes.

Over time, common fantasy environments (taverns, forests, dungeons, cities) accumulate rich caches of images. The system gets faster and cheaper with every session played.

The Results

The combination of crafted prompts, consistent styles, selective generation, and aggressive caching means that Embertold can show you your adventure in real time, with visual coherence, without breaking the bank.

It's not perfect yet. We're constantly refining prompts, adjusting style modifiers, and expanding the cache. But every day, the visual experience gets richer. From text, a world emerges.