This article is a spinoff of Local image generation on Mac: 10 models compared, my top pick flipped. Per-model deep dive, v8.

TL;DR

  • Qwen-Image Lightning is the upstream Qwen-Image (Alibaba 20B) with lightx2v's 8-step distillation LoRA applied
  • On Mac M1 Max 64GB / Apple MPS, ~10 min per image: 9× faster than upstream Full (50-step / 93 min)
  • The surprise wasn't speed; it was quality. There are prompts where finger count, object placement, and storefront depiction are better than Full
  • Full's text-fudging tics (drops "I", warps "MAX") disappear in Lightning
  • For Asian-circle article illustrations, the best local option. Apache 2.0, commercial OK

Why include this model

Upstream Qwen-Image was the only local model that fits on a 64GB Mac and could be expected to handle kanji rendering and Asian food culture. Testing confirmed: it writes the 3 kanji "居酒屋" vertically, and it doesn't put cilantro in ramen.

But the cost was 93 min per image. 12 hours for 8 prompts. Even running it overnight, it can't support a workflow like "I want 2 candidates by morning."

So I tried lightx2v's Qwen-Image-Lightning-8steps-V1.0.safetensors. Just apply an 8-step distillation LoRA on top of Full's base model.

Distillations usually "trade quality for speed." I'd seen plenty of that with SDXL Turbo (1-step). My expectation was "a 5–6× faster degraded version of Full."

Reality flipped expectations. 9× faster + quality improvement on some prompts.
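The speed figures above can be checked with a few lines of arithmetic. This is only a sketch that takes the article's rounded numbers (93 min and ~10 min per image, 8 prompts) as inputs; the variable names are illustrative.

```python
# Rounded per-image times quoted in the article (minutes).
FULL_MIN_PER_IMAGE = 93
LIGHTNING_MIN_PER_IMAGE = 10
NUM_PROMPTS = 8

# Batch cost for the 8-prompt comparison set.
full_batch_hours = FULL_MIN_PER_IMAGE * NUM_PROMPTS / 60
lightning_batch_min = LIGHTNING_MIN_PER_IMAGE * NUM_PROMPTS

# Speedup of Lightning over Full on a per-image basis.
speedup = FULL_MIN_PER_IMAGE / LIGHTNING_MIN_PER_IMAGE

print(f"Full batch:      {full_batch_hours:.1f} h")
print(f"Lightning batch: {lightning_batch_min} min")
print(f"Speedup:         {speedup:.1f}x")
```

With these inputs the Full batch comes to a bit over 12 hours and the speedup to a bit over 9×, which matches the rounded figures used throughout the article.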

Environment setup

# Python 3.14 (3.10+ works)
python -m venv venv && source venv/bin/activate

pip install diffusers==0.37.1 torch==2.11.0 transformers
pip install peft  # ★ required for LoRA application; it dies with LOAD FAILED otherwise

Load code:

from diffusers import QwenImagePipeline
import torch

pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image",
    torch_dtype=torch.bfloat16,
).to("mps")

pipe.load_lora_weights(
    "lightx2v/Qwen-Image-Lightning",
    weight_name="Qwen-Image-Lightning-8steps-V1.0.safetensors",
)

image = pipe(
    prompt="...",
    num_inference_steps=8,
    true_cfg_scale=1.0,  # Lightning runs without CFG (important)
    height=1024,
    width=1024,
).images[0]

true_cfg_scale=1.0 is the key. Full uses 4.0; Lightning is designed to run without CFG. Leaving it at Full's 4.0 defeats Lightning's purpose, so be careful.
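One way to avoid carrying Full's settings over by accident is to keep the two configurations side by side and pick one by name. The sketch below is hypothetical (the `SamplerPreset` class and `call_kwargs` helper are not from the article); the step counts and CFG values are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerPreset:
    """Sampling settings for one way of running the Qwen-Image base model."""
    num_inference_steps: int
    true_cfg_scale: float
    use_lightning_lora: bool


# Values taken from the article: Full = 50 steps / CFG 4.0,
# Lightning = 8 steps / CFG 1.0 (i.e. no CFG).
PRESETS = {
    "full": SamplerPreset(50, 4.0, use_lightning_lora=False),
    "lightning": SamplerPreset(8, 1.0, use_lightning_lora=True),
}


def call_kwargs(mode: str) -> dict:
    """Return the keyword arguments to pass to pipe(...) for the given mode."""
    preset = PRESETS[mode]
    return {
        "num_inference_steps": preset.num_inference_steps,
        "true_cfg_scale": preset.true_cfg_scale,
    }


print(call_kwargs("lightning"))
```

Usage would then look like `pipe(prompt="...", height=1024, width=1024, **call_kwargs("lightning"))`, so the step count and CFG scale always travel together.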

Hardware requirements:

Item | Value
Mac | M1 Max / 64GB
GPU limit | iogpu.wired_limit_mb=61440 (60GB)
Base model | 40GB (bf16)
LoRA | tens of MB
Per image | ~10 min (8 steps / 1024px / MPS)

Without bumping iogpu.wired_limit_mb, the bf16 40GB doesn't fit. Default (system-managed) drops to swap and you get nowhere near 10 min.
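The memory budget behind that limit is simple arithmetic. The sketch below converts the 60GB figure to the MB value shown in the table and prints the matching command. The `sudo sysctl key=value` form is an assumption about how the key is set; check it against your macOS version before relying on it. The helper names are illustrative.

```python
TOTAL_RAM_GB = 64   # M1 Max in the article
LIMIT_GB = 60       # GPU wired limit chosen in the article
MODEL_GB = 40       # bf16 base model size quoted in the article


def wired_limit_mb(limit_gb: int) -> int:
    """Convert a limit in GB to the MB value the sysctl key expects."""
    return limit_gb * 1024


def sysctl_command(limit_gb: int) -> list[str]:
    """Build the command line (assumed form) that sets the GPU wired limit."""
    return ["sudo", "sysctl", f"iogpu.wired_limit_mb={wired_limit_mb(limit_gb)}"]


print(" ".join(sysctl_command(LIMIT_GB)))
print(f"GPU-addressable headroom beyond the model: {LIMIT_GB - MODEL_GB} GB")
print(f"Left for macOS and other apps: {TOTAL_RAM_GB - LIMIT_GB} GB")
```

The second and third lines make the trade-off visible: a 60GB limit leaves only 4GB for everything else on the machine, so closing other apps before a run is a reasonable precaution.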

All 8 prompts

# | Prompt
01 | a cute cat sitting on a wooden bench in a sunny park
02 | a bowl of ramen with chashu and soft-boiled egg
03 | a wooden sign with "LOCAL AI"
04 | a developer's t-shirt with "M1 MAX 64GB" retro 80s style
05 | a woman developer working at a laptop
06 | a glowing AI brain made of circuits and neon
07 | three robots playing chess in a sunlit library
08 | a wooden izakaya sign with the kanji "居酒屋"

Total for 8 images: 80.8 min (vs Full's 12 hours).
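A batch like this can be timed with a small harness. The article's own `compare.py` is not shown, so the sketch below is a hypothetical stand-in: `run_batch` takes any callable that turns a prompt into an image (for example a thin wrapper around `pipe(...)`) and records wall-clock seconds per prompt.

```python
import time
from typing import Callable

# The 8 prompts as listed in the table above.
PROMPTS = {
    "01": "a cute cat sitting on a wooden bench in a sunny park",
    "02": "a bowl of ramen with chashu and soft-boiled egg",
    "03": 'a wooden sign with "LOCAL AI"',
    "04": 'a developer\'s t-shirt with "M1 MAX 64GB" retro 80s style',
    "05": "a woman developer working at a laptop",
    "06": "a glowing AI brain made of circuits and neon",
    "07": "three robots playing chess in a sunlit library",
    "08": 'a wooden izakaya sign with the kanji "居酒屋"',
}


def run_batch(generate: Callable[[str], object],
              prompts: dict[str, str]) -> dict[str, float]:
    """Call generate(prompt) for each prompt and return seconds per prompt."""
    timings: dict[str, float] = {}
    for key, prompt in prompts.items():
        start = time.perf_counter()
        generate(prompt)  # e.g. lambda p: pipe(prompt=p, ...).images[0]
        timings[key] = time.perf_counter() - start
    return timings
```

Passing the generator in as a callable keeps the harness independent of the model, so the same loop can time Full, Lightning, or any other pipeline.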

Per-prompt evaluation

01 Cat: top photorealism among local models

Cat on a bench. Lighting, fur, background bokeh: all natural. It would pass as a stock photo without a second look (though a slight illustration feel remains).

Contrast with Flux dev, which drifts to anime style on the same prompt. Flux dev has a "pretty / anime" tic even on photorealistic prompts; Qwen Lightning, while keeping a slight illustration feel, passes as photorealistic when viewed at distance.

Flux dev (2024) | Qwen Full (2025) | Qwen Lightning (2025)
Flux dev cat | Qwen Full cat | Qwen Lightning cat
Drifts to anime / illustration | Photorealistic, fur present | Mostly photorealistic, slight illustration feel on close inspection. Could fix with a different prompt

02 Ramen: no cilantro, only the mystery green vegetable is regrettable

Chashu, egg, nori, clean composition. Fully avoids Flux dev's cilantro problem.

But a mystery green vegetable (spinach? leafy greens?) is on top. Off for Japanese ramen. Possibly Alibaba's training data mixed in "Chinese-style hot noodle soup."

Flux dev (2024) | Gemini (2025) | Qwen Full (2025) | Qwen Lightning (2025)
Flux dev ramen | Gemini ramen | Qwen Full ramen | Qwen Lightning ramen
Cilantro inside (SE Asian crossover) | Naruto (pink swirl fish cake) / menma (fermented bamboo) / side dishes: a full "ramen shop photo" | No cilantro ✓ clean | Just the mystery green vegetable, otherwise perfect

03 LOCAL AI: Full's fudge disappears, perfect

All "LOCAL AI" letters come out clearly readable. Wooden sign, sunset meadow, depth of field โ€” all natural.

This is a major improvement over Full. Full had the tic of dropping the "I" of "AI" and fudging it into a logo, but Lightning erases that tic.

Qwen Full (2025) | Qwen Lightning (2025)
Qwen Full LOCAL AI | Qwen Lightning LOCAL AI
Drops "I" of "AI", fudges | "LOCAL AI" perfect, Full's tic is gone

This was unexpected. Distillations usually "trade quality for speed," and text rendering is one of the first sacrifices. In Lightning it's the opposite. Best explanation: the 8-step-optimized LoRA is pulling the base model's text rendering capability through.

04 M1 MAX 64GB t-shirt: fudge gone here too

"M1 MAX 64GB" rendered as a clean rainbow gradient on a navy t-shirt. The 3 letters "MAX" don't warp either; Full had the tic of warping "MAX," and Lightning erased it.

Qwen Full (2025) | Qwen Lightning (2025)
Qwen Full M1 MAX | Qwen Lightning M1 MAX
T-shirt perfect, "MAX" warped | "M1 MAX 64GB" all perfect

Combined with 03, this confirms: "Qwen family text-fudging is Full-specific; Lightning doesn't manifest it." Another core finding of this article. Call it the "over-step fudge phenomenon" if you want.

05 Woman developer: major improvement over Full, decisive Full-vs-Lightning gap

Full had the PC floating in the air with unnatural left-hand fingers. Lightning fixes:

  • 5 fingers, properly placed
  • PC actually sitting on the desk
  • Hand-mug positioning natural
  • Slight back-view composition with clean aspect

Qwen Full (2025) | Qwen Lightning (2025)
Qwen Full woman | Qwen Lightning woman
PC floating, fingers off | Fingers OK, PC placed OK, natural composition

What broke at 50 steps stabilized at 8. This is the heart of this article.

06 AI brain: not bad, improved from Full

Neon circuit brain, cyan / magenta palette, depth of field, cyberpunk vibe properly assembled. The letters "AI" are embedded as part of the circuit centerpiece, a natural addition.

Resolution and color contrast improved over Full. Loses to Flux dev on resolution feel for the same prompt, but enters practical territory as abstract art.

Flux dev (2024) | Qwen Full (2025) | Qwen Lightning (2025)
Flux dev AI brain | Qwen Full AI brain | Qwen Lightning AI brain
Particles, light streaks, overwhelming resolution (top of local) | Detail thin | Improved over Full, practical, "AI" letters natural too

07 Robots and chess: composition OK, expressions a bit immature

Three robots in a library playing chess. Lighting, bookshelves, chess board โ€” all OK. Robot expressions read slightly cartoonish; loses on photorealism to Flux dev / Gemini.

Flux dev (2024) | Qwen Full (2025) | Qwen Lightning (2025)
Flux dev robots | Qwen Full robots | Qwen Lightning robots
Count OK, expression / hand / chess piece detail rich | Count OK, expressions cartoonish | Count OK, expressions slightly improved

→ Count specification works on both Qwen and Flux families. SD 3.5 reduces it to 2 (see v4; SDXL base could draw 3), while both Qwen Lightning and Flux dev clear it without breaking. Photorealism of expression is where Flux dev pulls ahead.

08 Izakaya: major improvement from Full, only local model that combines kanji + atmosphere

Full could write the kanji but the storefront was thin, while in Lightning:

  • "居酒屋" 3 characters, vertical, perfect
  • Lantern texture, warm lighting
  • Wooden storefront, Japanese alley vibe
  • Brushed-ink feel on the sign

Qwen Full (2025) | Qwen Lightning (2025)
Qwen Full izakaya | Qwen Lightning izakaya
Kanji OK but storefront thin | Kanji + storefront + warm light all perfect

Lightning is the only local model that combines "居酒屋" + atmosphere. The SD family produces fake characters, Flux dev produces the fake "典桔," and Full's atmosphere is weak.

What worked

  1. Avoids the over-step collapse: 50 → 8 steps actually improves quality on some prompts (woman developer, izakaya)
  2. 9× faster: 10 min/image is practical. 8 images in 80 min, enough for a comparison set over a lunch break
  3. Kanji + Asian food culture: only local model that produces images Japanese readers don't find odd
  4. Apache 2.0: commercial OK; license honor student alongside Flux schnell
  5. Same base model as Full: if you already have Full's weights, just add the LoRA

What didn't

  1. 40GB base model first download: HuggingFace pull takes 30 min – 1 hr on fiber
  2. Mystery green vegetable: shows up on every ramen. "Chinese-style hot noodle" training bias (same in Full)
  3. Slightly behind Flux dev on abstract art: AI brain is practical, but resolution / particle sparkle wins for Flux dev
  4. Forgetting true_cfg_scale=1.0 kills it: leaving Full's 4.0 defeats Lightning's purpose

Where this model earns its keep

Bottom line: best Asian-circle model. The "Asian-circle = Qwen Lightning" half of this article's adoption plan. Lightning is not a degraded version of Full; it's the completed version with Full's tics removed.

  • ✅ Asian-circle article illustrations: Japanese blogs, Asian food blogs, Japanese architecture explainers; the only practical local option
  • ✅ Sign / signage with kanji: only Qwen family writes "居酒屋" perfectly on local
  • ✅ Accurate Asian food culture depiction: no cilantro ramen (avoids Flux family's bias)
  • ✅ Designs containing important English text: "LOCAL AI" / "M1 MAX 64GB" all readable, Full's fudge tic disappears in Lightning, on par with Flux dev quality
  • ✅ Stock-photo-style developer scenes: 5 fingers, PC placed OK; not as polished as Gemini, but passes as "clean material"
  • ✅ Apache 2.0, commercial OK: clearly above Flux schnell in the same commercial-OK slot (writes kanji)
  • ⚠️ 40GB base model: tight on Mac M1 Max 64GB / wired_limit 60GB. On NVIDIA, an 80GB-class GPU is recommended
  • ⚠️ Abstract art / cyberpunk: practical even on Lightning, but for resolution priority go to Flux dev
  • ❌ Conversational editing: "make this red" doesn't work (need a different model, Qwen-Image-Edit)

Gotchas / tips

1. LOAD FAILED without peft

!! LOAD FAILED: PEFT backend is required for this method.

Run pip install peft before calling load_lora_weights(). Diffusers alone can't apply LoRA. My first attempt died at 1.3 min because of this.
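A pre-flight check can catch this before the 40GB model is loaded. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper name, and the package list is the one from the environment setup section.

```python
import importlib.util

# Top-level import names needed before calling load_lora_weights().
REQUIRED = ["torch", "diffusers", "transformers", "peft"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Install before loading the model: pip install " + " ".join(missing))
else:
    print("All required packages found.")
```

Running this at the top of the generation script fails in under a second instead of after the model load.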

2. Don't forget true_cfg_scale=1.0

Lightning is designed to run without CFG. Leaving it at Full's 4.0 produces images, but loses Lightning-style stability. The point of the LoRA is gone.

3. Generation time varies a lot by prompt

# | Prompt | Sec
01 | cat | 326.8
02 | ramen | 327.0
03 | LOCAL AI (text-heavy) | 606.8
04 | M1 MAX (text-heavy) | 800.8
05 | woman dev | 483.5
06 | AI brain | 712.7
07 | robots chess | 654.0
08 | izakaya | 763.5

Text-heavy prompts (03, 04) take roughly 2× as long as the simplest prompts (01, 02). Prompts 06–08 are also slow, so text is not the only factor. Likely the attention layers spend more time resolving text coordinates.
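The per-prompt numbers can be summarized directly. This sketch uses only the seconds from the table above; the grouping into "simple" (01, 02) and "text-heavy" (03, 04) follows the labels in the table, and the variable names are illustrative.

```python
from statistics import mean

# Seconds per prompt, copied from the timing table.
TIMINGS_SEC = {
    "01": 326.8, "02": 327.0, "03": 606.8, "04": 800.8,
    "05": 483.5, "06": 712.7, "07": 654.0, "08": 763.5,
}

total_min = sum(TIMINGS_SEC.values()) / 60
mean_min = mean(TIMINGS_SEC.values()) / 60

simple_sec = mean([TIMINGS_SEC["01"], TIMINGS_SEC["02"]])
text_heavy_sec = mean([TIMINGS_SEC["03"], TIMINGS_SEC["04"]])
text_ratio = text_heavy_sec / simple_sec

print(f"Sum of generation times: {total_min:.1f} min")
print(f"Mean per image:          {mean_min:.1f} min")
print(f"Text-heavy vs simple:    {text_ratio:.2f}x")
```

The per-image times sum to a little under 78 min, slightly below the 80.8 min batch total quoted earlier; the gap is presumably model load and other overhead, though the article does not break it down.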

4. Output stops if macOS sleeps

If the Mac sleeps mid-generation, MPS dies. Use caffeinate -i to prevent sleep:

caffeinate -i ./venv/bin/python compare.py qwen_image_lightning
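The same guard can be applied from inside Python by wrapping the command before launching it. This is a sketch, not the article's setup: `caffeinated` and `run_without_sleep` are hypothetical helpers, and `compare.py qwen_image_lightning` is the script invocation shown above.

```python
import subprocess


def caffeinated(cmd: list[str]) -> list[str]:
    """Prefix a command with `caffeinate -i` so macOS won't idle-sleep while it runs."""
    return ["caffeinate", "-i", *cmd]


def run_without_sleep(cmd: list[str]) -> int:
    """Run cmd under caffeinate and return its exit code (raises on failure)."""
    return subprocess.run(caffeinated(cmd), check=True).returncode


command = caffeinated(["./venv/bin/python", "compare.py", "qwen_image_lightning"])
print(" ".join(command))
```

Calling `run_without_sleep(["./venv/bin/python", "compare.py", "qwen_image_lightning"])` would then launch the batch; `caffeinate` holds the sleep assertion until the child process exits.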

5. Full and Lightning coexist

Same 40GB base model, so disk-wise Full + Lightning is still 40GB. Just add the tens-of-MB LoRA.
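Because both modes share one base model, a single loaded pipeline can in principle be toggled between them by attaching and detaching the LoRA. The sketch below assumes the pipeline exposes `load_lora_weights()` and `unload_lora_weights()` as Diffusers LoRA-capable pipelines do; the `ModeSwitcher` class is hypothetical, and the article itself does not describe switching in-process.

```python
LIGHTNING_REPO = "lightx2v/Qwen-Image-Lightning"
LIGHTNING_WEIGHTS = "Qwen-Image-Lightning-8steps-V1.0.safetensors"


class ModeSwitcher:
    """Attach or detach the Lightning LoRA on an already-loaded pipeline."""

    def __init__(self, pipe):
        self.pipe = pipe
        self.lightning_loaded = False

    def use(self, mode: str) -> None:
        if mode == "lightning":
            if not self.lightning_loaded:
                self.pipe.load_lora_weights(
                    LIGHTNING_REPO, weight_name=LIGHTNING_WEIGHTS
                )
                self.lightning_loaded = True
        elif mode == "full":
            if self.lightning_loaded:
                # Back to the bare base model.
                self.pipe.unload_lora_weights()
                self.lightning_loaded = False
        else:
            raise ValueError(f"unknown mode: {mode!r}")
```

Remember to change the sampling settings along with the mode: 8 steps with `true_cfg_scale=1.0` for Lightning, 50 steps with 4.0 for Full.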

Comparison article and next models

Next (planned):

  • Part 3: AI drawing different national cuisines, visualizing geographic bias in training data (draft)

Test environment: Mac M1 Max 64GB / macOS 25.4 / Python 3.14 / Diffusers 0.37.1 / peft 0.19.1 / PyTorch 2.11 (MPS)
Run log: 2026-04-30, Qwen-Image Lightning (lightx2v/Qwen-Image-Lightning, 8steps V1.0)