NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model (v2). The most accessible option: it fits on a single 48 GB GPU and generates roughly 6x faster (8 sampling steps instead of 50). The best balance of speed, quality, and VRAM.
v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.
- think_recaption mode for highest quality

Memory Requirements:

| Component | Memory |
|---|---|
| Weight loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| Total | ~41-49 GB |
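As a quick sanity check, the total row is simply the weight footprint plus the inference overhead range:

```python
# Back-of-envelope check of the memory table: total = weights + inference overhead.
weights_gb = 29
inference_low_gb, inference_high_gb = 12, 20

total_low = weights_gb + inference_low_gb    # lower bound of the total row
total_high = weights_gb + inference_high_gb  # upper bound of the total row
print(f"~{total_low}-{total_high} GB")
```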
Recommended Hardware:
This is the CFG-Distilled variant: `cfg_distilled: true` means no classifier-free guidance is needed.

Layers quantized to NF4:

Kept in full precision (BF16):
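For reference, this kind of quantize-most/skip-some split can be expressed with a bitsandbytes-style 4-bit config in `transformers`. This is an illustrative sketch only: the module names passed to `llm_int8_skip_modules` are placeholders, not the exact modules skipped in this checkpoint.

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative NF4 config: quantize weights to 4-bit NF4, compute in BF16,
# and keep selected sensitive modules (names are placeholders) in full precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_skip_modules=["lm_head", "embed_tokens"],  # placeholder names
)
```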
This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

Load the model with `nf4` precision.

The Instruct model supports three generation modes:
| Mode | Description | Speed |
|---|---|---|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |
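The difference between the three modes comes down to how the prompt is prepared before generation. A minimal sketch, where `rewrite` and `reason` are stand-ins for the model's own recaption and chain-of-thought passes (not real API calls):

```python
# Hypothetical sketch of the three generation modes' prompt handling.
# `rewrite` and `reason` are placeholder callables, not actual model APIs.
def prepare_prompt(prompt, mode,
                   rewrite=lambda p: f"detailed: {p}",
                   reason=lambda p: f"plan for: {p}"):
    if mode == "image":
        return prompt                   # prompt used as-is
    if mode == "recaption":
        return rewrite(prompt)          # rewritten into a detailed description
    if mode == "think_recaption":
        return rewrite(reason(prompt))  # CoT reasoning, then enhancement
    raise ValueError(f"unknown mode: {mode}")
```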
Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.
| blocks_to_swap | VRAM Saved | Recommended For |
|---|---|---|
| 0 | 0 GB | 96GB+ GPU (no swap needed) |
| 4 | ~5 GB | 80-90GB GPU |
| 8 | ~10 GB | 64-80GB GPU |
| 16 | ~19 GB | 48-64GB GPU |
| -1 (auto) | varies | Let the system decide |
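The swap schedule described above can be sketched without any framework. This is a framework-free illustration of the idea only; the actual nodes move real PyTorch modules between CPU and GPU memory:

```python
# Illustrative block-swap schedule: the first `blocks_to_swap` blocks live on
# CPU between steps and are moved to GPU only for their own forward pass.
class Block:
    """Stand-in for one transformer block; tracks which device holds it."""
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def __call__(self, x):
        assert self.device == "gpu", "block must be resident on GPU to run"
        return x + 1  # dummy compute

def run_step(blocks, blocks_to_swap, x):
    """One diffusion step with on-demand swapping."""
    swap_set = set(range(min(blocks_to_swap, len(blocks))))
    for i, blk in enumerate(blocks):
        if blk.device != "gpu":
            blk.to("gpu")    # swap in on demand
        x = blk(x)
        if i in swap_set:
            blk.to("cpu")    # swap out to free VRAM for the next swapped block
    return x
```

After a step, the swapped blocks are back on CPU while the resident blocks stay on GPU, so only `len(blocks) - blocks_to_swap` blocks (plus one in flight) occupy VRAM at once.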
This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.
This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License