NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model (v2). The most accessible option: it fits on a single 48 GB GPU and generates roughly 6x faster (8 sampling steps instead of 50). The best balance of speed, quality, and VRAM.
v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.
- think_recaption mode for highest quality

Memory Requirements:

| Component | Memory |
|---|---|
| Weight loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| Total | ~41-49 GB |
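As a quick sanity check, the total row is simply the weight footprint plus the inference overhead range:

```python
# Back-of-envelope check of the memory table: total = weights + inference overhead.
weights_gb = 29
inference_low_gb, inference_high_gb = 12, 20

total_low = weights_gb + inference_low_gb    # lower bound of the total row
total_high = weights_gb + inference_high_gb  # upper bound of the total row
print(f"~{total_low}-{total_high} GB")
```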
Recommended Hardware:
This is the CFG-Distilled variant: `cfg_distilled: true` means no classifier-free guidance is needed.

Layers quantized to NF4:

Kept in full precision (BF16):
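For reference, this kind of quantize-most/skip-some split can be expressed with a bitsandbytes-style 4-bit config in `transformers`. This is an illustrative sketch only: the module names passed to `llm_int8_skip_modules` are placeholders, not the exact modules skipped in this checkpoint.

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative NF4 config: quantize weights to 4-bit NF4, compute in BF16,
# and keep selected sensitive modules (names are placeholders) in full precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_skip_modules=["lm_head", "embed_tokens"],  # placeholder names
)
```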
This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

Load the model with `nf4` precision.

The Instruct model supports three generation modes:
| Mode | Description | Speed |
|---|---|---|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |
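The difference between the three modes comes down to how the prompt is prepared before generation. A minimal sketch, where `rewrite` and `reason` are stand-ins for the model's own recaption and chain-of-thought passes (not real API calls):

```python
# Hypothetical sketch of the three generation modes' prompt handling.
# `rewrite` and `reason` are placeholder callables, not actual model APIs.
def prepare_prompt(prompt, mode,
                   rewrite=lambda p: f"detailed: {p}",
                   reason=lambda p: f"plan for: {p}"):
    if mode == "image":
        return prompt                   # prompt used as-is
    if mode == "recaption":
        return rewrite(prompt)          # rewritten into a detailed description
    if mode == "think_recaption":
        return rewrite(reason(prompt))  # CoT reasoning, then enhancement
    raise ValueError(f"unknown mode: {mode}")
```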
Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.
| blocks_to_swap | VRAM Saved | Recommended For |
|---|---|---|
| 0 | 0 GB | 96GB+ GPU (no swap needed) |
| 4 | ~5 GB | 80-90GB GPU |
| 8 | ~10 GB | 64-80GB GPU |
| 16 | ~19 GB | 48-64GB GPU |
| -1 (auto) | varies | Let the system decide |
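The swap schedule described above can be sketched without any framework. This is a framework-free illustration of the idea only; the actual nodes move real PyTorch modules between CPU and GPU memory:

```python
# Illustrative block-swap schedule: the first `blocks_to_swap` blocks live on
# CPU between steps and are moved to GPU only for their own forward pass.
class Block:
    """Stand-in for one transformer block; tracks which device holds it."""
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def __call__(self, x):
        assert self.device == "gpu", "block must be resident on GPU to run"
        return x + 1  # dummy compute

def run_step(blocks, blocks_to_swap, x):
    """One diffusion step with on-demand swapping."""
    swap_set = set(range(min(blocks_to_swap, len(blocks))))
    for i, blk in enumerate(blocks):
        if blk.device != "gpu":
            blk.to("gpu")    # swap in on demand
        x = blk(x)
        if i in swap_set:
            blk.to("cpu")    # swap out to free VRAM for the next swapped block
    return x
```

After a step, the swapped blocks are back on CPU while the resident blocks stay on GPU, so only `len(blocks) - blocks_to_swap` blocks (plus one in flight) occupy VRAM at once.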
This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.
This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License