ComfyUI Image Upscaling White Paper

Author: GaryAIGC | YouTube | Bilibili | Dream3D.vip

1. Latent Space Upscale (Latent Upscale)

Core Principle: Interpolation of Information Entropy

How it works: the latent tensor is stretched directly by interpolation, so the result must go through a second KSampler pass to be redrawn.
  • Physical limit: the image was already compressed at the VAE Encode stage, which fixes the upper bound on its information entropy. Interpolation cannot raise that bound.
  • Essence: it "draws the image again" while respecting the original composition. The model tries to fill in details it "thinks should be there", which may be correct or may be hallucinated.
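The entropy cap can be illustrated with a toy example. This is a minimal sketch, not ComfyUI's implementation: it assumes an SD-style VAE with an 8x spatial downscale and 4 latent channels (true for SD 1.5 and SDXL, not for every model), and it uses a hand-written bilinear resize on a single channel. The helper names `latent_shape` and `bilinear_resize` are hypothetical.

```python
def latent_shape(width, height, downscale=8, channels=4):
    """Latent shape (C, H, W) for an image, ASSUMING an SD-style VAE."""
    return (channels, height // downscale, width // downscale)


def bilinear_resize(grid, new_h, new_w):
    """Resize a 2D list of floats with bilinear interpolation."""
    old_h, old_w = len(grid), len(grid[0])
    out = []
    for y in range(new_h):
        # Map the output row back to a fractional source row.
        sy = y * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, old_h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = x * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, old_w - 1)
            fx = sx - x0
            # Blend the four nearest source values.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# Under the stated assumptions, a 1024x1024 image maps to a 4x128x128 latent.
print(latent_shape(1024, 1024))

# Stretching one latent channel 2x only blends neighbouring values.
channel = [[0.0, 1.0], [2.0, 3.0]]
stretched = bilinear_resize(channel, 4, 4)
flat = [v for row in stretched for v in row]
print(len(stretched), len(stretched[0]), min(flat), max(flat))
```

Every stretched value is a weighted average of existing ones, so the tensor gets larger without carrying any new information. Any new detail has to come from the KSampler pass that follows.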
✅ Recommended: the intermediate stage where the composition is settled but the image is not yet final. Good for quick validation, and even for letting the model offer a pleasant surprise.
❌ Use with caution
  • Logos / text: the resampling pass tries to interpret the lettering and usually redraws it wrong.
  • Hyper-realistic product shots: prone to "over-optimizing" and altering the product's features.
  • Finished images: it will only damage a result you were already happy with.

⚠️ 4 Pitfalls of Latent Upscale

01
Greedy Scale

❌ 1x → 4x in a single step
✅ Multiple steps, 1.5x or 2x each
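The "several small steps" rule can be turned into a small planning helper. This is a sketch under assumptions: the function names are hypothetical, the passes use equal factors, and sizes are rounded to multiples of 8 because an SD-style VAE is assumed to map 8 image pixels to one latent pixel.

```python
import math


def upscale_schedule(total_scale, max_step=2.0):
    """Split total_scale into equal per-pass factors, each <= max_step."""
    if max_step <= 1.0:
        raise ValueError("max_step must be greater than 1")
    if total_scale <= 1.0:
        return []
    # Smallest number of passes whose per-pass factor stays within max_step.
    passes = math.ceil(math.log(total_scale) / math.log(max_step) - 1e-9)
    per_pass = total_scale ** (1.0 / passes)
    return [per_pass] * passes


def pass_sizes(width, height, factors, multiple=8):
    """Image size after each pass, rounded to a multiple of `multiple`."""
    sizes = []
    w, h = float(width), float(height)
    for factor in factors:
        w, h = w * factor, h * factor
        sizes.append((round(w / multiple) * multiple,
                      round(h / multiple) * multiple))
    return sizes


two_x = upscale_schedule(4.0, max_step=2.0)
gentle = upscale_schedule(4.0, max_step=1.5)
print(two_x, pass_sizes(512, 512, two_x))
print([round(f, 3) for f in gentle], pass_sizes(512, 512, gentle))
```

With a 2x cap, a 4x target becomes two passes. With a 1.5x cap it becomes four passes of about 1.41x each. Each pass would be followed by its own KSampler resample.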

02
Wrong Denoise

Too low leaves a pixelated, mosaic-like result; too high changes the face (identity shift).
It needs precise control.

03
Prompt Not Narrowed

❌ Adding new content (e.g. "new dress")
✅ Locking in details (e.g. "detailed skin")

04
Wrong Positioning

It is a "generation tool",
not a "delivery tool".

2. Image / Pixel Space Upscale

2.1 Traditional Upscale

UltraSharp / SwinIR
  • Features: no diffusion and no redrawing. Composition and pixel positions are preserved 100%.
  • Limits: generates no new texture, and cannot rescue a blurry image.
  • Key insight: any pipeline that skips re-encoding cannot truly increase detail density. Only by entering latent space does the model get a chance to create detail "out of nothing".

2.2 Diffusion Resampling (Hires Fix)

Upscale Image -> VAE Encode -> KSampler (Denoise 0.3-0.5)
  • Pros: high consistency.
  • Tip: pair it with ControlNet Tile to keep the composition from drifting during the redraw.
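The chain above can be written out as a ComfyUI API-format workflow, built here as a plain Python dict. Treat it as a sketch rather than the author's exact graph. It assumes the stock node classes `CheckpointLoaderSimple`, `LoadImage`, `ImageScaleBy`, `CLIPTextEncode`, `VAEEncode`, `KSampler`, `VAEDecode` and `SaveImage`, and it uses a plain `ImageScaleBy` resize for the "Upscale Image" step. The file names, prompts, step count and CFG value are placeholders, and the ControlNet Tile branch is left out.

```python
def build_hires_fix_workflow(image_name, ckpt_name, positive, negative,
                             scale_by=2.0, denoise=0.4, seed=0):
    """Upscale Image -> VAE Encode -> KSampler, as an API-format graph."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    # Links are [source_node_id, output_index].
    # CheckpointLoaderSimple outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "3": {"class_type": "ImageScaleBy",
              "inputs": {"image": ["2", 0], "upscale_method": "lanczos",
                         "scale_by": scale_by}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "6": {"class_type": "VAEEncode",
              "inputs": {"pixels": ["3", 0], "vae": ["1", 2]}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "seed": seed, "steps": 20,
                         "cfg": 7.0, "sampler_name": "euler",
                         "scheduler": "normal", "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "denoise": denoise}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["1", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "hires_fix"}},
    }


def dangling_links(workflow):
    """Return every link that points at a node id missing from the graph."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            if isinstance(value, list) and value[0] not in workflow:
                missing.append((node_id, name, value[0]))
    return missing


# Hypothetical file names; replace with files that exist in your install.
wf = build_hires_fix_workflow("input.png", "model.safetensors",
                              "detailed skin", "blurry")
print(wf["7"]["inputs"]["denoise"], dangling_links(wf))
```

The default denoise of 0.4 sits in the middle of the 0.3-0.5 range given above. The positive prompt reuses "detailed skin" because this pass should lock in detail rather than add new subjects.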

2.3 Tiled Upscale

Includes SD Upscale, TTP Upscale, and similar tools.

The Essence of ControlNet Tile

Tile does not "understand" the image; it constrains how local texture is reconstructed. Even at Denoise=1, it holds the skeleton of the original image together.
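The tiling step behind the tools in 2.3 can be sketched generically as follows. This is not the actual implementation of SD Upscale or TTP Upscale. The function name is hypothetical and the tile and overlap sizes are illustrative. The idea is to cover the image with fixed-size boxes that overlap, so the seams can be blended after each tile is resampled.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")

    def starts(length):
        # A single tile is enough when the side fits inside one tile.
        if length <= tile:
            return [0]
        stride = tile - overlap
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height)
            for x in starts(width)]


boxes = tile_boxes(2048, 2048)
print(len(boxes), boxes[0], boxes[-1])
```

Each box would then be cropped, resampled on its own, and pasted back. Pushing the last tile flush with the edge keeps every tile at full size, at the cost of a larger overlap on the final row and column.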

3. Specialized Restoration

StableSR / CCSR

Leans toward engineering-style restoration, with less over-beautification.

APISR

Upscaling dedicated to anime and 2D illustration, with excellent line handling.

SUPIR (High VRAM)

The last resort when an image is already "unsalvageable".

  • Logic: semantic reconstruction. It looks at a mosaic, decides "this should be a person", and then draws a person.
  • Cost: very strong realism, but no guarantee of pixel-level or identity consistency (identity shift).

4. 2026 Mainstream: Generative Image Upscaling (Video Models)

Why have video models come to dominate image upscaling?

The real demand today is no longer "lossless upscaling" but "washing" (polishing) the image: "better looking, yet still the same thing".

Underlying logic: latent resampling in single-frame video generation mode.

The consistency advantage of video models comes at a price: they give up extreme freedom of detail in exchange for global structural stability. For most people, "not failing" matters more than "most stunning".

SeedVR (Recommended)

Run it directly for outputs up to 4K; for 8K, combine it with TTP tiling.
  • Keywords: Seed (the starting point of generation), Consistency, Realism (perceived look).
  • Philosophy: allows moderate re-creation, trading generative power for image quality.
  • Status: it finds the best balance between consistency and visual appeal.

FlashVSR (Conservative)

The fallback when SeedVR's results are unsatisfactory.
  • Logic: also a video model; it optimizes under global spatio-temporal constraints.
  • Pros: rarely invents content, faces rarely drift, and structure is extremely safe.
  • Cons: limited detail gain, an "engineered" look, and it will not actively make the image prettier.

💡 Conclusion

Primary Goal                              Recommended Path
Structural consistency (no failures)      Video models (SeedVR / FlashVSR)
Detail freedom (embellishment)            Image models (SD Hires Fix / Tiled)
Reviving a dead image (rescue)            Semantic reconstruction (SUPIR)