This article is a spinoff of Local image generation on Mac: 10 models compared, my top pick flipped. Per-model deep dive, v1.

TL;DR

  • Stable Diffusion 1.5 was released in 2022 and is the ancestor of local image generation as we know it
  • On Mac M1 Max 64GB / Apple MPS, ~13 sec per image (20 step / 512px), 7 sec to load
  • By 2026 standards: picture-book illustration quality, can't even spell English text
  • Still useful as an absolute baseline. The "wow, that's good" feeling you get from Flux dev / Qwen Lightning is calibrated by what SD 1.5 produces
  • License is CreativeML OpenRAIL-M (the standard at release), commercial use allowed but not recommended at modern quality bars
  • Verdict: historical reference / fallback for ultra-low-resource environments only

Why include this model

To make "the starting point of local image generation" part of the comparison, so the 4-year evolution is visible. When this series says "Flux dev tops photorealism" or "Qwen Lightning is the best local option," you need an absolute reference to answer "good compared to what?" SD 1.5 is that reference.

What I expected:

  • Material for laughable failures: ramen with 5 eggs on it, hallucinated English text that reads like a foreign language, an izakaya prompt that came back as a Buddhist scripture scroll, etc.
  • A baseline for "how far have we come in 4 years"
  • Crushing lightness: at 4GB it barely affects anything else running on the Mac

What I didn't expect:

  • Anything usable as modern blog illustration
  • Legible text rendering
  • Coherent object placement

Result: it satisfies the "historical reference" role, but never enters the candidate list for actual modern illustrations. After looking at Lightning / Flux dev / Gemini, SD 1.5 reads as "the thing that made today possible."

Environment setup

pip install diffusers==0.37.1 torch==2.11.0 transformers

Load code:

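from diffusers import StableDiffusionPipeline
import torch

# Load the SD 1.5 weights in half precision and move the pipeline to Apple's MPS backend
pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("mps")

# 20 steps, guidance 7.5, native 512x512: the settings used for every image in this article
image = pipe(
    prompt="...",
    num_inference_steps=20,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]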

Hardware requirements:

Item | Value
Mac | M1 Max / 64GB (works on 16GB realistically)
Model | 4GB (fp16)
Resolution | 512x512 (1024 wasn't trained, breaks down)
Per image | ~13 sec (20 step / 512px / MPS)
Load time | ~7 sec

By far the lightest among local image-gen models. Running it in the background while doing other Mac work has negligible impact on the rest of the system.
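The load and per-image figures above are wall-clock times. A minimal sketch of how such timings could be taken, using only the standard library; `timed` is a hypothetical helper name of mine, not something Diffusers provides, and the commented usage assumes the `pipe` setup from the load code.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical usage with the pipeline from the load code (needs diffusers + torch):
#   image, gen_s = timed(lambda: pipe(prompt="...", num_inference_steps=20,
#                                     guidance_scale=7.5, height=512, width=512).images[0])
#   print(f"generation: {gen_s:.1f}s")
```

Timing the `from_pretrained(...).to("mps")` call the same way gives the load time; the first run also includes any model download, so it is worth measuring a second run.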

All 8 prompts

# | Prompt | Time
01 | a cute cat sitting on a wooden bench in a sunny park | 13.5s
02 | a bowl of ramen with chashu and soft-boiled egg | 13.0s
03 | a wooden sign with "LOCAL AI" | 13.0s
04 | a developer's t-shirt with "M1 MAX 64GB" retro 80s style | 12.9s
05 | a woman developer working at a laptop | 13.0s
06 | a glowing AI brain made of circuits and neon | 12.9s
07 | three robots playing chess in a sunlit library | 12.9s
08 | a wooden izakaya sign with the kanji "居酒屋" | 12.9s

Total for 8 images: ~104 seconds = 1m44s. Compared to Qwen Full's roughly 12 hours for the same set, that's about 430× faster, but quality is in a completely different league.
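The arithmetic behind that comparison can be checked with a short sketch. The per-image times are copied from the prompt table; the Qwen-Image (Full) figure of about 93 minutes per image is the one quoted in this series' timeline table, and treating it as constant across all 8 prompts is my assumption.

```python
# Per-image times in seconds, copied from the prompt table.
TIMES_S = [13.5, 13.0, 13.0, 12.9, 13.0, 12.9, 12.9, 12.9]

# Assumption: Qwen-Image (Full) takes about 93 minutes for every image.
QWEN_FULL_MIN_PER_IMAGE = 93


def fmt_duration(seconds: float) -> str:
    """Format seconds as 'XmYYs', or 'XhYYmZZs' once it passes an hour."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h{minutes:02d}m{secs:02d}s"
    return f"{minutes}m{secs:02d}s"


sd15_total = sum(TIMES_S)
qwen_total = QWEN_FULL_MIN_PER_IMAGE * 60 * len(TIMES_S)
speedup = qwen_total / sd15_total

print("SD 1.5 total:", fmt_duration(sd15_total))
print("Qwen Full total:", fmt_duration(qwen_total))
print("speedup:", round(speedup))
```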

Per-prompt evaluation

01 Cat: vertically stretched face, eyes misaligned

Cat on a bench. The face is stretched vertically and the eyes are off-center. Classic "Stable Diffusion can't quite do animal faces" symptom. Background is surprisingly natural (wood grain on the bench, greenery, depth of field).

Flux dev (2024) | Qwen Lightning (2025) | SD 1.5 (2022)
[Image: Flux dev cat] | [Image: Qwen Lightning cat] | [Image: SD 1.5 cat]
Drifts to anime / illustration style | Mostly photorealistic, slight illustration feel | Stretched face, misaligned eyes, picture-book illustration

02 Ramen: five eggs

Five eggs on a single bowl. The prompt clearly said "soft-boiled egg" (singular), and SD 1.5 disagreed four times. Textbook example of "the model has an opinion about quantity, and it's not yours." (For context: Japanese ramen normally has one egg, halved. Five would be a culinary war crime.)

Flux dev (2024) | Qwen Lightning (2025) | SD 1.5 (2022)
[Image: Flux dev ramen] | [Image: Qwen Lightning ramen] | [Image: SD 1.5 ramen]
Has cilantro (Southeast Asian crossover) | Just the mystery green vegetable, otherwise near-perfect | Five eggs on one bowl

→ In 4 years, ramen actually became ramen. The "wow, this looks good" reaction to Flux dev / Qwen Lightning is anchored by what SD 1.5 produces.

03 LOCAL AI: OOLDD AIXNIA, an invented language

Can't even spell Latin characters correctly. What comes out is OOLDD AIXNIA or something close to it: corrupted strings. SD 1.5 treats text as "shape" and has no ability to spell meaningful strings.

SDXL base (2023) | Flux dev (2024) | SD 1.5 (2022)
[Image: SDXL base LOCAL AI] | [Image: Flux dev LOCAL AI] | [Image: SD 1.5 LOCAL AI]
Degrades to "LOCAL LL" | "LOCAL AI" perfect | OOLDD AIXNIA mystery language

→ You can see at a glance how each text-encoder generation enabled actual spelling. Moving from CLIP-L alone (77-token window) to pipelines that add T5-XXL (a much longer window, up to 512 tokens in Flux dev) progressively improved English text rendering. SD 1.5 sits at the starting point, where even spelling Latin characters doesn't work.

04 M1 MAX 64GB t-shirt: same illegibility

The 80s t-shirt vibe is there, but the text is mostly illegible. It gets processed as decorative graphics rather than letters.

Flux dev (2024) | SD 1.5 (2022)
[Image: Flux dev M1 MAX] | [Image: SD 1.5 M1 MAX]
"M1 MAX 64GB" perfect + 80s synthwave fully captured | Text is symbolized, illegible

05 Woman developer: surprisingly passable

Woman, glasses, laptop, in front of a window. No finger-disappearance issue: hands are hidden under the desk, which is a slightly cheaty form of stability compared to SDXL. The white mug and laptop bag in the background look natural.

Flux dev (2024) | Qwen Lightning (2025) | SD 1.5 (2022)
[Image: Flux dev woman] | [Image: Qwen Lightning woman] | [Image: SD 1.5 woman]
Photorealistic, plants and notebook props included | Photorealistic, natural as stock photo | Looks fine at first glance (hides hands to dodge)

→ For simple compositions, SD 1.5 can pass at a glance. Same "stability through omission" tendency as SDXL Turbo. But modern models have moved on to "actually draw the hands without breaking them," so SD 1.5's hide-to-survive approach is dated.

06 AI brain: neon circuit, NES vibes

Green neon circuits with a brain silhouette. NES-era retro digital feel. It has a vibe, but is nowhere near the resolution of Flux dev / Qwen Lightning. Cute as naive style, unusable as modern illustration.

Flux dev (2024) | Qwen Lightning (2025) | SD 1.5 (2022)
[Image: Flux dev AI brain] | [Image: Qwen Lightning AI brain] | [Image: SD 1.5 AI brain]
Neon particles, light streaks, top of local | Practical, improved over Full | NES-era retro digital

→ For abstract art, SD 1.5's naivety can ironically read as flavor, but at modern illustration standards it loses on detail and resolution. For cyberpunk-style work, go to Flux dev.

07 Robots and chess: forgets the robots

Prompt was "three robots playing chess." What came out: two children playing chess. Textbook example of SD 1.5's incomplete prompt parsing.

  • "three robots" → quantity ignored + robots forgotten
  • "in a sunlit library" → library rendered (bookshelves in background)
  • "warm afternoon light" → warm lighting present

Each element gets reflected to a different degree. Counts, and even the main subject ("three robots"), can get dropped entirely. This is the limit of SD 1.5's text encoder (CLIP ViT-L/14).

SDXL base (2023) | Flux dev (2024) | SD 1.5 (2022)
[Image: SDXL base robots] | [Image: Flux dev robots] | [Image: SD 1.5 robots]
Three robots, drawn correctly | Three robots, rich expression and hand detail | Forgets the robots, draws two children

→ Response to "three robots" improves dramatically across generations: SD 1.5 (forgotten) → SDXL base (3 robots OK) → Flux family (3 + facial detail). This tracks the strengthening of the text encoder (CLIP-L alone → CLIP-L+G → CLIP-L + T5-XXL).

08 Izakaya: comes back as a Buddhist scripture scroll

I expected an izakaya (Japanese pub) sign. What I got: a hanging scroll of Buddhist scripture. No building, no paper lantern, no warm light, just a calligraphy work like you'd see at a temple gift shop.

The structure: SD 1.5 tries to retrieve "居酒屋" (kanji for izakaya) from the training data, hits the wrong association (kanji → calligraphy → scroll → temple), and lands on something completely different. Imagine asking for "a pub sign" and getting "a parchment of medieval prayer text": that level of category miss.

Flux dev (2024) | Qwen Lightning (2025) | SD 1.5 (2022)
[Image: Flux dev izakaya] | [Image: Qwen Lightning izakaya] | [Image: SD 1.5 izakaya]
Kyoto townhouse style + made-up kanji "典桔" | "居酒屋" 3 chars perfect + warm-light storefront | Buddhist scripture scroll (no building at all)

→ SD 1.5's handling of Asian elements is catastrophic. Flux dev produces fake kanji, Qwen Full hedges, but SD 1.5 doesn't even reach the category "izakaya." Until Qwen Lightning shipped, there was no local option that could write kanji; that 4-year history shows in this single comparison.

What worked

  1. Ultra-light: 4GB, 7-sec load, coexists with everything else on the Mac
  2. Stable as a classic: 4 years of operational track record, and troubleshooting info is abundant
  3. CreativeML OpenRAIL-M license: commercial OK (though I wouldn't recommend it on quality)
  4. Rich ecosystem: massive number of LoRAs, ControlNet variants, inpainting tools
  5. Value as an "absolute baseline": when used as a comparison, the awesomeness of modern models becomes legible

What didn't

  1. Picture-book illustration quality: not usable at modern illustration standards
  2. Text rendering NG: can't even spell "LOCAL AI"
  3. Asian elements catastrophic: kanji prompt collapses into a temple scroll
  4. Incomplete prompt parsing: ignores quantity specifications and plurals
  5. Locked to 512 resolution: 1024 has no training data, breaks

Where this model still earns its keep

Honestly: SD 1.5's quality is below what 2026 calls "practical."

  • 02 ramen has 5 eggs on it
  • 03 LOCAL AI becomes OOLDD AIXNIA, an invented language
  • 07 "three robots" silently turn into "two children"
  • 08 izakaya comes out as a Buddhist scripture scroll (wrong religion, wrong building, wrong everything)
  • Locked to 512 resolution, no 1024 training data

These never happen with Flux schnell / Qwen Lightning, so there is no scenario where SD 1.5 is your modern illustration candidate. If you need reasons to still use it:

  • ✅ Historical reference: showcase "the starting point of local image generation," visualize 4 years of progress
  • ✅ Absolute baseline: the floor when evaluating other models. The "wow" of Flux dev / Qwen Lightning is calibrated by what SD 1.5 puts out
  • ✅ Ecosystem experiments: the volume of LoRA / ControlNet / inpainting tooling is largest for SD 1.5; still useful if you need niche use cases
  • ✅ Ultra-low-resource environments: the floor of local image generation, a 4GB model that fits in about 4GB of VRAM
  • ❌ Modern illustration candidate: 2022-level quality, looks picture-book by today's standards
  • ❌ Part of this article's adoption plan: never lands on the plan (English-language content = Flux dev / Asian-language content = Qwen Lightning)

Gotchas / tips

1. Don't run it at 1024x1024

# โŒ It breaks if you go SDXL-style
image = pipe(prompt=p, height=1024, width=1024, ...)

# โœ… SD 1.5 was trained on 512
image = pipe(prompt=p, height=512, width=512, ...)

SD 1.5's training data is 512x512-centric. 1024 is out of distribution: expect repeating patterns or composition collapse. There's no point running SD 1.5 at 1024 (use SDXL for that).
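If a script receives sizes meant for newer models, one way to stay near SD 1.5's native resolution is to rescale the request before calling the pipeline. This is only a sketch: `sd15_size` is a hypothetical helper of mine, and the two assumptions are that keeping the longer side at 512 is close enough to the training distribution and that sizes should be multiples of 8 (which the Stable Diffusion pipeline requires for `height` and `width`).

```python
def sd15_size(width: int, height: int, native: int = 512, multiple: int = 8):
    """Rescale (width, height) so the longer side equals `native`,
    keeping the aspect ratio and snapping both sides to `multiple`."""
    scale = native / max(width, height)

    def snap(value: int) -> int:
        # Round to the nearest multiple, never going below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)


print(sd15_size(1024, 1024))  # (512, 512)
print(sd15_size(1024, 768))   # (512, 384)
```

The result can be passed straight through, for example `w, h = sd15_size(1024, 768)` followed by `pipe(prompt=p, width=w, height=h)`. Non-square sizes are untested in this article, so treat them as an experiment.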

2. Keep prompts simple, use negative prompts

SD 1.5 workflows lean heavily on negative prompts. The Flux family stopped using them, but for SD 1.5:

neg = "low quality, blurry, distorted, extra fingers, bad anatomy, worst quality"
image = pipe(prompt=p, negative_prompt=neg).images[0]

This mitigates finger problems and face distortion somewhat. Flux-era models bake equivalent processing into the model, so it's unnecessary there, but SD 1.5 needs it explicit.
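Why the negative prompt has this much leverage: under classifier-free guidance, which is my understanding of how the Diffusers SD 1.5 pipeline applies `guidance_scale`, the negative prompt takes the place of the empty unconditional prompt, so every denoising step is pushed away from it. The notation below is mine, not the article's: x_t is the noisy latent at step t, c_pos and c_neg are the encoded prompt and negative prompt, and s is the guidance scale (7.5 in this article's runs).

```latex
\hat{\epsilon}_t
  = \epsilon_\theta(x_t, c_{\text{neg}})
  + s \cdot \bigl( \epsilon_\theta(x_t, c_{\text{pos}}) - \epsilon_\theta(x_t, c_{\text{neg}}) \bigr),
  \qquad s = 7.5
```

With s greater than 1, the prediction is extrapolated past the prompt-conditioned estimate in the direction that leads away from whatever the negative prompt describes.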

3. Try multiple seeds: it's a lottery

SD 1.5 has high generation variance. Rolling 5 to 10 different seeds on the same prompt usually gives you one decent shot. Flux dev / Qwen Lightning hit near-max quality on the first seed; SD 1.5 expects you to play the lottery.
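A seed sweep is easy to script. The sketch below is one possible shape, not the setup used for this article: `sweep_seeds` and `make_sd15_render` are hypothetical helper names, and the only library calls assumed are `torch.Generator(...).manual_seed(...)` and the pipeline's `generator` argument.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def sweep_seeds(render: Callable[[int], T], seeds: Iterable[int]) -> Dict[int, T]:
    """Call render(seed) once per seed and collect the results keyed by seed."""
    return {seed: render(seed) for seed in seeds}


def make_sd15_render(pipe, prompt: str, steps: int = 20, guidance: float = 7.5):
    """Build a render(seed) function around an already loaded SD 1.5 pipeline."""
    import torch  # imported lazily so sweep_seeds stays usable without torch

    def render(seed: int):
        # A CPU generator keeps the starting noise reproducible for a given seed.
        generator = torch.Generator("cpu").manual_seed(seed)
        return pipe(
            prompt=prompt,
            num_inference_steps=steps,
            guidance_scale=guidance,
            height=512,
            width=512,
            generator=generator,
        ).images[0]

    return render
```

Usage would look like `images = sweep_seeds(make_sd15_render(pipe, p), range(8))`, followed by saving each result with `images[seed].save(f"seed_{seed}.png")` and picking the best one by eye. At roughly 13 seconds per image, eight seeds cost under two minutes.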

4. Understand LoRA / ControlNet

Bare SD 1.5 breaks, but specialize it via LoRA + lock composition with ControlNet and you can still get usable material in 2026. A "naked SD 1.5" review like this one is the floor sans ecosystem.

5. Be aware of CLIP text encoder limits

SD 1.5's text encoder is CLIP ViT-L/14 with a 77-token limit. Long prompts get truncated from the back, so put important keywords first. The Flux family's T5-XXL accepts much longer prompts (up to 512 tokens in Flux dev) and is far more expressive.
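To see whether a prompt fits before generating, you can count its tokens. A minimal sketch, assuming a tokenizer that returns an object with `.input_ids` when called on a string (the pipeline's own `pipe.tokenizer` behaves this way); `token_budget` and `check_prompt` are hypothetical helper names.

```python
CLIP_MAX_TOKENS = 77  # SD 1.5's CLIP window, counting the start and end tokens


def token_budget(token_ids, limit: int = CLIP_MAX_TOKENS) -> dict:
    """Report how many tokens a prompt uses and how many fall past the limit."""
    used = len(token_ids)
    return {"used": used, "limit": limit, "dropped": max(0, used - limit)}


def check_prompt(prompt: str, tokenizer, limit: int = CLIP_MAX_TOKENS) -> dict:
    """Tokenize `prompt` without truncation and check it against the window."""
    return token_budget(tokenizer(prompt).input_ids, limit)


# Hypothetical usage with the pipeline from the load code:
#   report = check_prompt(p, pipe.tokenizer)
#   if report["dropped"]:
#       print(f"{report['dropped']} tokens at the end of the prompt will be cut")
```

A nonzero `dropped` count is the cue to shorten the prompt or move the important keywords toward the front.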

Witnessing 4 years of progress

Lined up in chronological order across this article series:

Year | Model | Size | Resolution | Character
2022 | SD 1.5 | 4GB | 512 | Picture-book level, text NG, Asian elements broken
2023 | SDXL base | 7GB | 1024 | Mid-tier stable, element-addition habit, classic SDXL
2023 | SDXL Turbo | 7GB | 512 | ADD distillation, 1-sec gen, blurry
2024 | SD 3.5 Medium | 5GB | 1024 | DiT, thumbnail OK, full-size NG
2024 | Flux schnell | 23GB | 1024 | 4-step, Apache 2.0
2024 | Flux dev | 23GB | 1024 | Top photorealism, English text perfect, Asian NG
2025 | Qwen-Image (Full) | 40GB | 1024 | Kanji OK, 93 min/image
2025 | Qwen-Image Lightning | 40GB | 1024 | 8-step LoRA, top local candidate

→ In 4 years: 10× the size, 4× the pixel count (512 → 1024 per side), text rendering went from broken to perfect, Asian elements went from broken to perfect. Watching SD 1.5 lets you feel just how big that delta is.


Test environment: Mac M1 Max 64GB / macOS 25.4 / Python 3.14 / Diffusers 0.37.1 / PyTorch 2.11 (MPS)
Run log: 2026-04-29, Stable Diffusion 1.5 (stable-diffusion-v1-5/stable-diffusion-v1-5, 20-step / guidance 7.5 / 512px)