HappyHorse-1.0 Architecture

A deep dive into the 15B-parameter single-stream Transformer

Architecture Overview

Inputs: text tokens, image latent, noisy video + audio tokens

40-Layer Unified Transformer

Layers 1–4: Modality-specific projections
Layers 5–36: Shared parameters (per-head gating)
Layers 37–40: Modality-specific projections

Outputs: denoised video, denoised audio
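
The layer layout above can be written down as a small lookup. This is only an illustrative sketch: the function name `layer_kind` and the string labels are hypothetical, and only the layer ranges come from the overview.

```python
from collections import Counter

NUM_LAYERS = 40  # from the overview: a 40-layer unified Transformer


def layer_kind(layer: int) -> str:
    """Return the parameter-sharing scheme for a 1-indexed layer.

    The ranges follow the overview; the labels are hypothetical names
    chosen for this sketch.
    """
    if not 1 <= layer <= NUM_LAYERS:
        raise ValueError(f"layer must be in 1..{NUM_LAYERS}, got {layer}")
    if layer <= 4 or layer >= 37:
        return "modality_specific"
    return "shared"


# Count how many layers fall into each group.
layout = Counter(layer_kind(i) for i in range(1, NUM_LAYERS + 1))
print(dict(layout))
```

Counting the ranges this way shows that 32 of the 40 layers use shared parameters, with 4 modality-specific layers at each end.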

Key Technologies

Single-Stream Architecture

Unlike multi-stream models, which process each modality separately, HappyHorse packs text, video, and audio tokens into one unified sequence. Every layer sees all modalities at once, which enables shared representations and reduces inference overhead.
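
A minimal sketch of what "one unified sequence" could look like in practice. The `Segment` type and `pack_single_stream` function are hypothetical names for illustration, and the integer token IDs are toy values; the page does not describe the actual packing code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    modality: str  # e.g. "text", "video", "audio"
    start: int     # inclusive index into the packed sequence
    end: int       # exclusive index into the packed sequence


def pack_single_stream(tokens_by_modality: dict[str, list[int]]):
    """Concatenate per-modality token lists into one sequence.

    Returns the packed sequence plus the span each modality occupies, so
    the outputs can be split apart again after the Transformer.
    """
    sequence: list[int] = []
    segments: list[Segment] = []
    for modality, tokens in tokens_by_modality.items():
        start = len(sequence)
        sequence.extend(tokens)
        segments.append(Segment(modality, start, len(sequence)))
    return sequence, segments


# Toy token IDs; real inputs would be embeddings, not small integers.
sequence, segments = pack_single_stream({
    "text": [101, 102, 103],
    "video": [201, 202, 203, 204],
    "audio": [301, 302],
})
print(len(sequence), [(s.modality, s.start, s.end) for s in segments])
```

Keeping the spans alongside the packed sequence is one simple way to recover the video and audio outputs after the shared layers have processed everything together.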

DMD-2 Distillation

Distribution Matching Distillation v2 (DMD-2) compresses the denoising process from 50+ steps to just 8 and removes the need for Classifier-Free Guidance (CFG). Result: at least 6× fewer denoising steps, and therefore much faster inference, without a noticeable loss in quality.
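
To make the savings concrete, the sketch below counts denoiser forward passes. It rests on two assumptions that the page does not state: a baseline of exactly 50 steps, and a baseline that uses CFG with two model evaluations per step (conditional and unconditional). The function name `denoiser_calls` is hypothetical.

```python
def denoiser_calls(steps: int, uses_cfg: bool) -> int:
    """Number of denoiser forward passes needed for one sample.

    Assumes CFG evaluates the model twice per step (conditional and
    unconditional), which is the usual formulation.
    """
    return steps * (2 if uses_cfg else 1)


baseline = denoiser_calls(steps=50, uses_cfg=True)   # assumed baseline
distilled = denoiser_calls(steps=8, uses_cfg=False)  # figures from the text
print(baseline, distilled, baseline / distilled)
```

Under these assumptions the distilled model needs 8 forward passes instead of 100. If the baseline did not use CFG, the ratio would be 50 to 8 instead. Forward-pass counts are only a proxy for wall-clock time.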

MagiCompiler

A full-graph compilation system that fuses operators across Transformer layers. Delivers ~1.2× end-to-end speedup over standard PyTorch eager execution on H100.

Per-head Gating

Each attention head has a learnable gate that controls how much each modality influences the others. This is critical for stable joint video+audio training because it prevents one modality from dominating.
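
The page does not give the exact gating formula, so the following is only a sketch of the simplest form such a gate could take: one scalar logit per head, passed through a sigmoid and used to scale that head's output. The real gates may also depend on the modality or on the input. `gate_heads` and `gate_logits` are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_heads(head_outputs: list[list[float]], gate_logits: list[float]) -> list[float]:
    """Scale each head's output by sigmoid(gate logit), then concatenate.

    head_outputs[h] is the output vector of head h for one token.
    gate_logits[h] stands in for a learnable scalar (hypothetical form).
    """
    if len(head_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per head")
    gated: list[float] = []
    for output, logit in zip(head_outputs, gate_logits):
        g = sigmoid(logit)  # in (0, 1): near 0 mutes the head, near 1 passes it through
        gated.extend(g * value for value in output)
    return gated


# Two heads with 2-dim outputs; a logit of 0.0 gives a gate of exactly 0.5.
out = gate_heads([[1.0, 2.0], [4.0, -4.0]], gate_logits=[0.0, 0.0])
print(out)  # [0.5, 1.0, 2.0, -2.0]
```

Because each gate is bounded between 0 and 1, training can turn down a head whose cross-modal contribution is destabilizing without removing it entirely.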

FP8 Quantization

8-bit floating-point (FP8) inference significantly reduces VRAM requirements, enabling single-GPU deployment on a 40GB A100 without significant quality degradation.
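
A back-of-the-envelope estimate shows why FP8 matters for a 40GB card. The sketch assumes the unquantized weights are stored in BF16 (2 bytes per parameter), which the page does not state, and it counts weights only, ignoring activations and any other runtime buffers.

```python
PARAMS = 15e9  # 15B parameters, as stated at the top of this page

# Bytes per parameter; the BF16 baseline is an assumption of this sketch.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for model weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(dtype, weight_memory_gb(PARAMS, dtype), "GB")
```

Under these assumptions the weights shrink from 30 GB to 15 GB, which leaves far more headroom on a 40GB card for activations during generation.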

Joint Audio-Video Denoising

Text, video frames, and audio spectrograms are tokenized into a single sequence, and the video and audio tokens are denoised jointly. No separate audio post-processing step is needed.
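
The toy loop below illustrates the idea of joint denoising: one model call per step updates video and audio tokens together, while the text tokens act as fixed conditioning. Everything here is a stand-in. `toy_denoiser` is a hypothetical placeholder that always predicts zeros, and the linear blend is a simple interpolation chosen for this sketch, not the sampler the model actually uses. The step count of 8 comes from the DMD-2 section.

```python
import random

NUM_STEPS = 8  # step count quoted for the distilled model


def toy_denoiser(sequence: list[float]) -> list[float]:
    """Hypothetical stand-in for the Transformer: predicts a clean value per token.

    A real model would attend across the whole sequence; this toy version
    just predicts 0.0 everywhere so the loop is runnable.
    """
    return [0.0 for _ in sequence]


def joint_denoise(text: list[float], video: list[float], audio: list[float]):
    """Denoise video and audio tokens together in one packed sequence."""
    n_text, n_video = len(text), len(video)
    sequence = text + video + audio
    for step in range(NUM_STEPS):
        prediction = toy_denoiser(sequence)     # one call covers both modalities
        blend = 1.0 / (NUM_STEPS - step)        # reaches 1.0 on the final step
        for i in range(n_text, len(sequence)):  # text is conditioning, left untouched
            sequence[i] += blend * (prediction[i] - sequence[i])
    return sequence[n_text:n_text + n_video], sequence[n_text + n_video:]


random.seed(0)
text = [1.0, 2.0]
video = [random.gauss(0, 1) for _ in range(4)]
audio = [random.gauss(0, 1) for _ in range(3)]
clean_video, clean_audio = joint_denoise(text, video, audio)
print(len(clean_video), len(clean_audio))
```

The point of the sketch is structural: there is no second pass or separate audio model, because the audio tokens come out of the same loop as the video tokens.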

Hardware Requirements

| Configuration | VRAM | Generation time (5 s, 1080p) | Use Case |
|---|---|---|---|
| H100 80GB | 80GB | ~38 seconds | Production / Research |
| A100 80GB | 80GB | ~55 seconds | Research |
| A100 40GB + FP8 | 40GB | ~90 seconds | Budget Production |
| 2× A6000 48GB | 96GB (total) | ~70 seconds | Multi-GPU Setup |
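
For capacity planning, the approximate figures above can be turned into compute time per second of generated video and a slowdown relative to the H100. The sketch only restates the quoted numbers; it is not a benchmark, and the dictionary name `GENERATION_TIME` is chosen here for illustration.

```python
CLIP_SECONDS = 5  # the figures above are for a 5-second 1080p clip

# Approximate generation times in seconds, copied from the hardware figures.
GENERATION_TIME = {
    "H100 80GB": 38,
    "A100 80GB": 55,
    "A100 40GB + FP8": 90,
    "2x A6000 48GB": 70,
}

for config, seconds in GENERATION_TIME.items():
    per_video_second = seconds / CLIP_SECONDS
    vs_h100 = seconds / GENERATION_TIME["H100 80GB"]
    print(f"{config}: {per_video_second:.1f} s per video-second, {vs_h100:.2f}x H100 time")
```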

Related Open Source

daVinci-MagiHuman — The Foundation

The most likely architectural foundation for HappyHorse-1.0. Released March 23, 2026 by GAIR Lab (Shanghai) + Sand.ai. Fully open source on GitHub and HuggingFace.
