I. The Trigger: An Interesting Article

Recently, I came across fofr.ai's JSON prompting tutorial. Its core argument:

Using JSON-structured prompts can make AI image generation more precise and repeatable.

Sounds great. But does it actually work?

As someone who's been frequently frustrated by AI image generation, I decided to test it out.


II. Experiment Design

Test Scenario:

Generate character art for a Japanese RPG game protagonist

  • Requirements: Black-haired shrine priestess, shrine background, golden hour lighting

  • Style: Anime watercolor

  • Purpose: Game character design

Comparison Methods:

  1. Method A: Natural language prompts (my usual approach)

  2. Method B: JSON-structured prompts (following fofr's method)

Evaluation Criteria:

  • Hit rate (percentage of usable images)

  • Time spent

  • Adjustability (how easy to modify specific details)


III. Experiment Results

Method A: Natural Language (Failed)

My prompt:

"A female shrine priest with long black hair, wearing traditional outfit, standing at a shrine during sunset"

Results:

  • Generated: 100 images

  • Usable: 5 images

  • Hit rate: 5%

  • Time spent: 2 hours

Actual generation result (one example):

[Image: result from natural language. Some elements are correct, but the overall effect differs significantly from expectations.]

Failure reasons:

  • First 10 images: Hair is black, but either short or twin-tailed

  • Image 20: Finally long hair, but the scene moved to a forest

  • Image 35: Got the shrine, but the character became blonde

  • Image 42: All elements correct, but lighting is noon sun (not the golden hour I wanted)

Root problem:

Every word is ambiguous. How long is "long hair"? What time and color temperature is "sunset"? AI interprets differently each time—it's like rolling dice.

Method B: JSON Structured (Significant Improvement)

Following fofr's method, I broke down the requirements into 4 fields:

{
  "subject": "female shrine priest with long black hair, white miko outfit with red hakama, holding wooden staff",
  "style": "anime watercolor",
  "mood": "peaceful and sacred",
  "lighting": "golden_hour side_light"
}

Results:

  • Generated: 10 images

  • Usable: 8 images

  • Hit rate: 80%

  • Time spent: 15 minutes

Actual generation result (one example):

[Image: result from the JSON-structured prompt, with precise control over lighting, mood, and details.]

Comparison:

Left (natural language): Lighting, mood, and details not precise enough
Right (JSON): Each parameter locked down, results closer to expectations

Improvement reasons:

  • golden_hour is more precise than "sunset" (1 hour before sunset, warm golden color)

  • side_light locked down the light direction (not top light or back light)

  • mood: "peaceful and sacred" influenced color tone and detail density

Data comparison:

  • Hit rate improvement: 16x (5% → 80%)

  • Time efficiency improvement: 8x (120 minutes → 15 minutes)

  • Generation quota savings: 90%


IV. Deep Dive: Why Does JSON Work?

The Core Issue: Natural Language Is Too Vague

Have you ever wondered why, despite your detailed descriptions, AI still misunderstands?

Because every adjective is ambiguous.

"Japanese RPG style"—Is it Final Fantasy's European-fantasy-meets-Japanese? Or Zelda's cel-shaded cartoon? Or Persona's urban dark aesthetic? 1000 people have 1000 different interpretations of "Japanese RPG."

"Long hair"—Waist-length? Shoulder-length? Or ankle-length? With bangs? Straight or curly? Tied up or loose?

"Golden hour"—What time? Summer or winter? Orange-red sunset clouds, or pale purple twilight?

Every adjective is a dice roll.

[Image: left, vague natural language (chaos); right, structured JSON (order).]

A More Critical Misconception: Detailed ≠ Precise

Many people think "being detailed enough equals precision."

Wrong.

It's like going to a coffee shop and saying: "Give me a good coffee, not too bitter, not too weak, medium temperature, preferably with some milk flavor but not too creamy."

Can the barista make what you want? Maybe. But more likely—what you get is worlds apart from what you imagined.

Compare this: "Medium Americano, light ice, no sugar, add an extra shot."

Which one gets you closer to what you want?

[Image: precise instructions vs. vague descriptions.]

Why Does JSON Work?

Here's the key insight:

Natural language is for humans, JSON is for AI.

AI models' training data contains massive amounts of JSON-formatted content: config files, API responses, database records, code snippets. A model has seen, countless times, exactly which visual effect "lighting": "golden_hour" corresponds to.

Using JSON prompts is like speaking AI's native language. Interpretation errors drop dramatically.

And JSON isn't programming. It's just filling in a form.

You don't need to know how to code, just what to fill in.
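
Concretely, the "form" in this approach has just four blanks, which the next section breaks down. A minimal empty template you could copy and fill in looks roughly like this (the placeholder text only describes what each field expects):

{
  "subject": "what to draw",
  "style": "what it should look like",
  "mood": "what it should feel like",
  "lighting": "how it is lit"
}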


V. Core Discovery: 4 Fields Are Enough

fofr's article mentions many fields, but through actual testing, these 4 are the most critical:

[Image: Subject, Style, Mood, and Lighting are the core elements for controlling an image.]

1. Subject - Main Subject

What to draw. Can be simple ("cat") or detailed (nested structure controlling hairstyle, clothing, pose)

Basic format:

{
  "subject": "female priest"
}

Detailed format:

{
  "subject": {
    "character": "female priest",
    "hair": "long black hair, straight, waist-length",
    "clothing": "traditional white shrine maiden outfit with red hakama",
    "accessories": "wooden staff with paper talismans",
    "pose": "standing upright, staff in right hand"
  }
}

Common structures:

  • Characters: demographics (age, gender), face, hair, body, pose, attire

  • Animals: species, breed, age, pose

  • Objects: type, material, condition, placement

  • Scenes: location, time, weather
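
For instance, a scene subject can use the same nested pattern (the values here are purely illustrative):

{
  "subject": {
    "location": "mountain lake shore",
    "time": "early morning",
    "weather": "light fog over the water"
  }
}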

2. Style - Visual Style

Determines what the image looks like:

Common types:

  1. Art styles: watercolor, oil painting, anime, pixel art

  2. Technical styles: photorealistic, 3D render, sketch

  3. Period styles: retro 80s, art nouveau, cyberpunk

Hybrid format:

{
  "style": "anime watercolor with soft edges"
}

You can combine multiple style terms, but follow two principles to avoid pitfalls:

  • Don't use more than 3 (too many terms will conflict)

  • Make sure they don't contradict each other (e.g., "photorealistic anime" is a contradiction)
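
For example, this combination stays within both principles; the three terms reinforce rather than fight each other (the specific terms are just one illustration):

{
  "style": "watercolor, soft pastel colors, storybook illustration"
}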

3. Mood - Atmosphere

The emotion and feeling of the image. Easy to overlook, but hugely impactful.

Common terms:

  • peaceful

  • energetic

  • mysterious

  • dramatic

  • melancholic

  • nostalgic

Combination tips:

2-3 words are enough, more creates confusion.

{
  "mood": "peaceful and nostalgic"
}

Actual impact:

Mood directly affects:

  • Color tone (warm vs. cool)

  • Light softness

  • Detail density (dramatic images usually have more details)

  • Contrast

4. Lighting - Illumination

The most important field.

Lighting determines 80% of an image's atmosphere. Same scene, different lighting—completely different images.

Time-based lighting:

  • dawn: Dawn, cool blue-purple tones

  • golden_hour: Golden hour (1 hour after sunrise / before sunset), warm golden color

  • noon: Midday, strong top light, high contrast

  • dusk: Dusk, orange-red

  • night: Nighttime, low light

  • blue_hour: Blue hour (20 minutes after sunset), deep blue-purple

Light source types:

  • sunlight: Natural sunlight

  • neon: Neon lights (cyberpunk essential)

  • candlelight: Candlelight

  • moonlight: Moonlight

  • studio_light: Studio lighting

Direction:

  • front_light: Front light (flat feel)

  • back_light: Backlight (rim light, dramatic)

  • side_light: Side light (strong dimensionality)

  • top_light: Top light (mysterious)

Quick reference:

  • Want warmth? → golden_hour + front_light

  • Want coolness? → blue_hour + side_light

  • Want drama? → dusk + back_light
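
These keywords can be combined in a single string, as in the basic template ("golden_hour side_light"), or spelled out as a nested object, which the advanced examples later in this article also use. A minimal sketch of the nested form:

{
  "lighting": {
    "time": "golden_hour",
    "type": "sunlight",
    "direction": "side_light"
  }
}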


VI. Experiment: The Power of Lighting

To verify lighting's impact, I did a comparison experiment:

[Image: same scene, different lighting, completely different atmospheres.]

Experiment design:

Same character (shrine priestess), only changing the lighting parameter.

Technical note:

To ensure character consistency, I used the reference image technique:

  1. First generate 1 baseline image

  2. Use it as reference to generate 3 other variants

  3. Tell AI: "Keep the character and composition, only change lighting"

(This is a key practical technique, explained in detail below)

Results:

1. Golden Hour + Side Light (Warmth)

[Image]

"lighting": { "time": "golden_hour", "direction": "side_light" }

2. Blue Hour + Side Light (Coolness)

[Image]

"lighting": { "time": "blue_hour", "direction": "side_light" }

3. Dusk + Back Light (Drama)

[Image]

"lighting": { "time": "dusk", "direction": "back_light" }

4. Noon + Top Light (Strong Contrast)

[Image]

"lighting": { "time": "noon", "direction": "top_light" }

Conclusion:

Lighting truly dominates 80% of an image's atmosphere. Same character and scene, change one parameter, completely different effect.


VII. Practical Cases: Three Different Scenarios

To test JSON's versatility, I tried it on three completely different scenarios:

Case 1: Game Character Art

Basic version (recommended for beginners):

{
  "subject": "female shrine priest with long black hair, white miko outfit with red hakama, holding wooden staff",
  "style": "anime watercolor",
  "mood": "peaceful and sacred",
  "lighting": "golden_hour side_light"
}

The basic version is sufficient for most cases. When you need more precise control, use the advanced nested-structure version.

Advanced version (when needing more control):

{
  "subject": {
    "character": "female shrine priest",
    "hair": "long straight black hair, waist-length",
    "clothing": "white shrine maiden outfit with red hakama pants",
    "accessories": "wooden staff with paper talismans",
    "pose": "standing confidently, staff in right hand"
  },
  "style": "anime with soft watercolor texture",
  "mood": "peaceful and sacred",
  "lighting": {
    "time": "golden_hour",
    "type": "sunlight",
    "direction": "side_light",
    "details": "warm golden light filtering through torii gate"
  },
  "background": {
    "setting": "traditional Japanese shrine",
    "elements": "red torii gate, stone lanterns, cherry blossom petals",
    "depth": "shallow focus, background slightly blurred"
  }
}

Effect: You'll get a shrine priestess illustration with warm lighting, sacred atmosphere, and precise details. Golden hour light shines from the side, filtering through the torii gate, creating soft shadows on the character. Background slightly blurred, highlighting the subject.

Compared to natural language version:
"A female shrine priest with long black hair, wearing traditional outfit, standing at a shrine during sunset"

See the difference? The JSON version sharpens "sunset" into "golden_hour" (1 hour before sunset), locks the light direction to "side_light," and defines the mood as "peaceful and sacred." Every detail is pinned down.


Case 2: Cyberpunk Street Scene

{
  "subject": {
    "main": "rain-soaked cyberpunk street",
    "elements": "neon signs, street vendors, hovering vehicles"
  },
  "style": "photorealistic with cinematic color grading",
  "mood": "mysterious and energetic",
  "lighting": {
    "time": "night",
    "type": "neon",
    "colors": ["pink", "cyan", "purple"],
    "details": "reflections on wet pavement, strong color contrast"
  },
  "composition": {
    "angle": "low angle, looking up at buildings",
    "rule": "leading lines from street converging to vanishing point"
  },
  "weather": "light rain, mist"
}

Effect: You'll get a Blade Runner 2049-style street scene—rain-soaked pavement reflecting neon lights, with pink, cyan, and purple lights creating strong contrast. Low-angle shot looking up, street extending to a distant vanishing point, towering buildings. Mist and light rain add mystery.

[Image: using JSON to precisely control neon colors (pink/cyan/purple), camera angle (low angle), and weather (light rain).]

Why it works:

  • lighting.colors directly specifies neon colors

  • composition.angle locks down camera position

  • weather adds atmospheric details


Case 3: Product Rendering

{
  "subject": {
    "product": "wireless earbuds",
    "material": "matte white plastic with metallic accents",
    "placement": "floating on clean surface"
  },
  "style": "3D render, minimalist",
  "mood": "clean and modern",
  "lighting": {
    "type": "studio_light",
    "setup": "three-point lighting",
    "details": "soft shadows, rim light on edges"
  },
  "background": {
    "color": "gradient from light gray to white",
    "texture": "smooth, no distractions"
  },
  "technical": {
    "focus": "razor sharp",
    "depth_of_field": "infinite"
  }
}

Effect: Professional-grade product rendering, earbuds floating on clean background, three-point lighting creating soft shadows, rim light outlining edges. Background is a gradient from light gray to white, completely non-distracting.

[Image: three-point lighting + infinite depth of field + clean background, a professional product shot generated in one go.]

Key points:

  • lighting.setup: "three-point lighting" directly invokes photography's standard lighting scheme

  • technical.depth_of_field: "infinite" ensures entire product is sharp

  • background.texture: "smooth, no distractions" avoids background clutter


VIII. Unexpected Discovery: Consistency Is a Challenge

Problem encountered during experiments:

When I wanted to generate "multiple versions of the same character," I found that even using identical JSON, AI generates characters with slightly different appearances (facial features, poses, details) each time.

This is AI's randomness. fofr's article didn't emphasize this point.

Maintaining Character Consistency: Advanced Techniques

If you need "multiple versions of the same character" (like different lighting, different expressions), there are two methods:

Method 1: Fixed seed (random seed)

{
  "subject": "cat",
  "style": "watercolor",
  "mood": "peaceful",
  "seed": 12345
}

Change the mood or lighting but keep the seed (that locked number) unchanged, and the character will stay relatively consistent.
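
For example, a second version of the same cat with a different mood would keep the seed and change only that one field:

{
  "subject": "cat",
  "style": "watercolor",
  "mood": "mysterious",
  "seed": 12345
}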

Tool support:

  • Midjourney: Use --seed 12345

  • ComfyUI / Automatic1111: Set seed in interface

  • DALL-E: Not supported (cannot fix seed)

Limitations:

  • Only ensures "relative consistency," not 100% identical

  • When parameters change too much, seed also fails

Method 2: Use reference image (most stable)

This is the method I used to generate demo images:

Steps:

  1. First generate 1 satisfactory baseline image

  2. Upload this image as reference in the tool (or use image URL)

  3. Write prompt: "Keep the character, pose, and composition from the reference image, only change lighting to blue_hour"

Tool support:

  • Midjourney: Use image URL + prompt

  • DALL-E (ChatGPT): Upload image, say "keep this character, change..."

  • ComfyUI: Use ControlNet or img2img nodes

Effect:

  • This is the most stable method

  • Character appearance and pose highly consistent

  • My shrine priestess and cat demos were generated this way

When do you need consistency?

Don't need:

  • You just want to quickly generate one image

  • Variation between generations doesn't matter

Need:

  • Doing character design (multiple expressions/poses of same character)

  • Making tutorial demos (showing parameter comparisons)

  • Creating brand materials (maintaining visual unity)

Beginner advice:

Don't worry about consistency at first, focus on using JSON to control single-image effects. Once proficient, explore seed and reference images.


IX. Supplementary Experiment: Mood Atmosphere Control

Using the simplest template, I did another comparison experiment on the Mood parameter:

Baseline template:

{
  "subject": "cat",
  "style": "watercolor",
  "mood": "peaceful",
  "lighting": "soft_morning_light"
}

Only changing the mood parameter to see the difference:

Peaceful: [Image]

Mysterious: [Image]

Dramatic: [Image]
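
For example, the mysterious variant is the exact same template with only the mood value changed:

{
  "subject": "cat",
  "style": "watercolor",
  "mood": "mysterious",
  "lighting": "soft_morning_light"
}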

Conclusion:

Same subject, same style, only changing mood, completely different atmosphere. This is JSON's charm.


X. Advanced Fields: Not Needed 80% of the Time

The basic 4 fields (subject / style / mood / lighting) solve 80% of scenarios. But if you want more precise control, here are additional options:

Composition-related:

  • composition_rule: "rule of thirds," "golden ratio"

  • camera_angle: "bird's eye view," "worm's eye view"

  • focal_length: "24mm wide angle," "85mm portrait"

Detail control:

  • color_palette: ["#FF6B6B", "#4ECDC4", "#FFE66D"] (use hex color values directly)

  • weather: "fog", "snow", "storm"

  • season: "autumn", "spring"

Technical parameters:

  • aspect_ratio: "16:9", "1:1", "9:16"

  • detail_level: "high detail", "minimalist"

  • texture: "rough", "glossy", "matte"

Metadata:

  • negative_prompt: List elements you don't want

  • seed: Fix random seed to reproduce same results

When to use?

Only add when the basic 4 fields can't meet your needs. Don't bite off more than you can chew.
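
For reference, here's a sketch that layers a few of these advanced fields onto the basic four (the field names come from the lists above; the values are just illustrative):

{
  "subject": "lighthouse on a rocky cliff",
  "style": "oil painting",
  "mood": "dramatic",
  "lighting": "dusk back_light",
  "composition_rule": "rule of thirds",
  "weather": "storm",
  "aspect_ratio": "16:9",
  "negative_prompt": ["people", "boats"]
}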


XI. Practical Advice

Getting Started

  1. Start with 4 basic fields (subject / style / mood / lighting)

  2. Use simple versions, don't jump into complex nested structures

  3. Don't worry about "consistency" at first, focus on single-image effects

  4. Generate more, compare how parameter changes affect results

Advanced Control

  • Only use nested structures when needing more details

  • Use seed or reference image when character consistency is needed

  • Gradually add fields based on needs (composition / weather / background)

  • Don't be greedy, basic 4 solve 80% of scenarios


XII. Common Problem Troubleshooting

Q1: I copied the JSON, tool says "format error"

  • Check JSON format: bracket pairing, quotation marks paired, comma placement

  • Validate online: JSONLint

  • Some tools don't support JSON directly; convert it to natural language first (ChatGPT can do the conversion for you)

Q2: Generated image is completely wrong, far from what I wanted

  • Simplify first: Only use 4 basic fields, remove all nesting

  • Check if field values are accurate (e.g., is golden_hour spelled correctly?)

  • Try a different tool (Midjourney / DALL-E / Flux have different comprehension abilities)

Q3: The elements I want never appear

  • Put the most important elements at the front of subject

  • Use negative_prompt to exclude distractions: "negative_prompt": ["forest", "short hair"]

  • Generate multiple images, pick the closest one
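
As a sketch, here's the shrine priestess prompt from earlier with the distractions excluded via negative_prompt (the excluded terms are just examples of what to rule out):

{
  "subject": "female shrine priest with long black hair, white miko outfit with red hakama, holding wooden staff",
  "style": "anime watercolor",
  "mood": "peaceful and sacred",
  "lighting": "golden_hour side_light",
  "negative_prompt": ["forest", "short hair", "blonde hair"]
}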

Q4: Style/mood different from expectations

  • Make style more specific: "anime" → "Studio Ghibli anime style"

  • Combine multiple moods: "peaceful and nostalgic"

  • Reference others' prompts, see how they describe

Q5: Character different every time, what to do?

  • This is normal, AI has randomness

  • If consistency needed, see "Maintaining Character Consistency" section above

  • Use seed or reference image

Q6: Which tools actually support JSON?

  • Native support: Midjourney (paste directly), Claude (understands structure)

  • Indirect support: DALL-E (needs conversion to natural language, but understands structured descriptions)

  • Needs plugins: ComfyUI / Automatic1111 (need custom nodes)

Still not working?

  • Post your JSON and generation results to community for help

  • Reddit r/StableDiffusion and Midjourney Discord are both very active


XIII. Conclusion

Does JSON prompting really work?

Yes. In my tests, the hit rate improved from 5% to 80%, and time spent dropped by a factor of 8.

But it's not as simple as it sounds:

  • Consistency requires additional techniques (seed / reference image)

  • Different tools have different levels of support

  • Need practice to find what works for you

  • 80% hit rate depends on specific scenarios and requirements

Core value isn't "guaranteed success," but:

  • Repeatability: Same JSON, relatively consistent results

  • Adjustability: Want to change a specific detail? Directly modify the corresponding field, no need to redescribe the entire scene

  • Understandability: Looking at someone else's JSON, can quickly understand each parameter's role

Advice for beginners:

  1. Start with the simplest 4-field template

  2. Don't be intimidated by complex nesting, basic version is sufficient

  3. Experiment more, compare parameter changes

  4. Seek community help when encountering problems

It takes 5 minutes to get started, and the results are night and day. Give it a try and share your results in the comments.

