The Problem
Running SD 1.5 with ControlNets requires more VRAM than low-end GPUs provide, because existing tools load the full UNet into GPU memory at once. OnnxStream demonstrated that block-by-block streaming works on CPU; the question is whether the same approach is viable on GPU via Vulkan compute, and whether the streaming overhead is acceptable.
The Approach
Hand-written Vulkan compute shaders (GLSL → SPIR-V) that process each SD 1.5 UNet block in sequence, releasing VRAM between blocks. No Python, no CUDA: Vulkan runs cross-vendor on NVIDIA, AMD, and Intel. LLM inference is coordinated through llama-server, with llguidance grammars for constrained generation. MIDI generation uses a Rust reimplementation of SkyTNT's MIDI model, a transformer that composes new MIDI from seed sequences.
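The payoff of block streaming is that peak VRAM becomes roughly "largest block plus live activations" instead of "sum of all blocks". The sketch below simulates that with a byte counter rather than real Vulkan allocations; `Block`, `VramPool`, `run_streamed`, and the byte sizes are hypothetical names and toy numbers, not the project's actual types or measurements.

```rust
// Hypothetical simulation of block-by-block streaming under a VRAM budget.
// No real GPU memory is touched; `VramPool` just counts bytes.

#[derive(Debug)]
struct Block {
    name: &'static str,
    weight_bytes: u64,
}

/// Tracks simulated device allocations and the high-water mark.
struct VramPool {
    budget: u64,
    used: u64,
    peak: u64,
}

impl VramPool {
    fn new(budget: u64) -> Self {
        VramPool { budget, used: 0, peak: 0 }
    }

    fn alloc(&mut self, bytes: u64) -> Result<(), String> {
        if self.used + bytes > self.budget {
            return Err(format!("need {} bytes, {} free", bytes, self.budget - self.used));
        }
        self.used += bytes;
        self.peak = self.peak.max(self.used);
        Ok(())
    }

    fn free(&mut self, bytes: u64) {
        self.used -= bytes;
    }
}

/// Runs blocks in order with only one block's weights resident at a time.
/// Simplification: activations are a single fixed-size buffer; a real UNet
/// also keeps down-block skip connections alive until the matching up block.
fn run_streamed(blocks: &[Block], activation_bytes: u64, pool: &mut VramPool) -> Result<(), String> {
    pool.alloc(activation_bytes)?;
    for block in blocks {
        // Upload this block's weights; on failure, release the activations first.
        if let Err(e) = pool.alloc(block.weight_bytes) {
            pool.free(activation_bytes);
            return Err(format!("{}: {}", block.name, e));
        }
        // ... record and submit this block's compute dispatches here ...
        // In real Vulkan, wait on the submission's fence before releasing.
        pool.free(block.weight_bytes);
    }
    pool.free(activation_bytes);
    Ok(())
}

fn main() {
    let blocks = [
        Block { name: "down", weight_bytes: 600 },
        Block { name: "mid", weight_bytes: 400 },
        Block { name: "up", weight_bytes: 700 },
    ];
    let mut pool = VramPool::new(1000);
    run_streamed(&blocks, 200, &mut pool).unwrap();
    // Largest block (700) + activations (200), not 1700 + 200.
    println!("peak = {}", pool.peak);
}
```

The same three blocks loaded all at once would need 1900 units against a 1000-unit budget; streamed, they peak at 900.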
The Result
The Vulkan SD pipeline has a complete custom shader set and a 4-phase GPU optimization strategy, but it has not yet been validated end-to-end. The concept is largely unexplored territory: no existing tool is known to do block-by-block Vulkan streaming for SD inference on GPU. The Rust MIDI generation model works as a standalone library, and LLM orchestration via llama-server is functional.
Tech Stack
Zero-overhead orchestration. When coordinating real-time inference workloads, the orchestrator cannot be the bottleneck.
Direct GPU compute via hand-written GLSL shaders. Cross-vendor (NVIDIA, AMD, Intel) via SPIR-V. No CUDA dependency, no Python runtime.
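With hand-written compute shaders, the host side has to size every `vkCmdDispatch` itself. A minimal sketch, assuming one shader invocation per output pixel and a hypothetical 16×16 local size (as declared by `layout(local_size_x = 16, local_size_y = 16) in;` in GLSL); the project's real shaders may tile differently.

```rust
// Sketch: workgroup counts for a one-invocation-per-pixel compute shader.
// The 16x16 local size is an assumption, not the project's configuration.

/// Workgroup counts (x, y, z) needed to cover a `width` x `height` grid.
/// Ceiling division covers partial tiles at the edges; the shader must then
/// skip invocations whose gl_GlobalInvocationID lies outside the grid.
fn dispatch_dims(width: u32, height: u32, local_x: u32, local_y: u32) -> (u32, u32, u32) {
    assert!(local_x > 0 && local_y > 0, "local size must be non-zero");
    (width.div_ceil(local_x), height.div_ceil(local_y), 1)
}

fn main() {
    // A 512x512 SD 1.5 image maps to a 64x64 latent (8x VAE downsampling).
    let (gx, gy, gz) = dispatch_dims(64, 64, 16, 16);
    println!("latent 64x64 -> dispatch({gx}, {gy}, {gz})");
    // Grids that are not a multiple of the tile size still get full coverage.
    println!("{:?}", dispatch_dims(100, 30, 16, 16));
}
```

`u32::div_ceil` needs Rust 1.73 or later; on older toolchains `(n + d - 1) / d` works for values that cannot overflow.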
Coordinated llama-server management — model loading, prompt routing, constrained generation with llguidance grammars.
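Constrained generation through llama-server comes down to sending a grammar alongside the prompt. The sketch below only builds a JSON body for the server's `/completion` endpoint using its `prompt`, `grammar`, and `n_predict` fields; the helper names, the prompt, and the yes/no grammar (plain GBNF) are hypothetical, and how llguidance-specific grammars are marked should be checked against the llama.cpp llguidance documentation rather than taken from here.

```rust
// Sketch: building a llama-server /completion request body without serde.
// Helper names and the example grammar are illustrative assumptions.

/// Minimal JSON string escaper (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// JSON body carrying the prompt, a grammar, and a token limit.
fn completion_body(prompt: &str, grammar: &str, n_predict: u32) -> String {
    format!(
        "{{\"prompt\":{},\"grammar\":{},\"n_predict\":{}}}",
        json_str(prompt),
        json_str(grammar),
        n_predict
    )
}

fn main() {
    // Hypothetical grammar restricting the answer to "yes" or "no".
    let body = completion_body("Is the door locked? Answer yes or no.", r#"root ::= "yes" | "no""#, 4);
    // POST this to llama-server's /completion (default http://127.0.0.1:8080).
    println!("{body}");
}
```

In a real orchestrator a JSON crate and an HTTP client would replace the hand-rolled escaper; it is kept here only so the sketch has no dependencies.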
Custom block-streaming architecture for SD 1.5. The research question: can block-by-block Vulkan compute make SD viable on 2GB VRAM?
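Why 2 GB is a hard target can be seen from the weight footprint alone. A back-of-the-envelope estimate, assuming the commonly cited figure of about 860M UNet parameters stored in fp16 (2 bytes each):

```latex
\begin{aligned}
\text{UNet weights (fp16)} &\approx 860\times10^{6}\ \text{params} \times 2\ \tfrac{\text{B}}{\text{param}} = 1.72\times10^{9}\ \text{B} \approx 1.60\ \text{GiB} \\
\text{headroom on a 2 GiB card} &\approx 2 - 1.60 = 0.40\ \text{GiB}
\end{aligned}
```

That remaining headroom would have to hold the text encoder, VAE, latents, activations, any ControlNet weights, and driver overhead at the same time, which is the case for keeping only one UNet block resident at a time.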
Rust reimplementation of SkyTNT's MIDI generation model — a transformer that composes new MIDI from seed sequences. Same architecture, native speed, embeddable as a library.
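Composing "from seed sequences" is autoregressive continuation: feed the token history in, pick the next token, append, repeat. A minimal sketch of that loop, assuming a hypothetical `NextToken` trait and greedy (argmax) selection; the actual SkyTNT-derived library's interface, tokenization, and sampling strategy are not shown here and may differ substantially.

```rust
// Sketch: greedy autoregressive continuation of a seed token sequence.
// `NextToken` and `Counter` are hypothetical stand-ins for the real model.

/// Maps the token history to logits over the vocabulary.
trait NextToken {
    fn logits(&mut self, history: &[u32]) -> Vec<f32>;
}

/// Extends `seed` one token at a time until `eos` or `max_new` new tokens.
fn continue_seed<M: NextToken>(model: &mut M, seed: &[u32], max_new: usize, eos: u32) -> Vec<u32> {
    let mut tokens = seed.to_vec();
    for _ in 0..max_new {
        let logits = model.logits(&tokens);
        // Argmax; a real sampler would apply temperature / top-k / top-p.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .expect("empty vocabulary");
        tokens.push(next);
        if next == eos {
            break;
        }
    }
    tokens
}

/// Toy model: always predicts (last token + 1) mod vocab.
struct Counter {
    vocab: usize,
}

impl NextToken for Counter {
    fn logits(&mut self, history: &[u32]) -> Vec<f32> {
        let last = history.last().copied().unwrap_or(0) as usize;
        let mut l = vec![0.0; self.vocab];
        l[(last + 1) % self.vocab] = 1.0;
        l
    }
}

fn main() {
    let mut model = Counter { vocab: 5 };
    // Token 4 plays the role of end-of-sequence in this toy setup.
    println!("{:?}", continue_seed(&mut model, &[1], 10, 4));
}
```

Keeping the model behind a trait is one way a library like this can stay embeddable: the caller owns the loop and the stopping rule, and the model only answers "what comes next".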
Current Status
Early development; the Vulkan pipeline is on hold. The shader set is written and the block-streaming architecture is designed, but end-to-end SD generation hasn't been validated yet. MIDI generation and LLM orchestration are functional. Vulkan work will resume once Uncanny Realm's content pipeline stabilizes.
What's Next
Validate the Vulkan SD pipeline end-to-end, add ControlNet support, and determine the integration path into the game engine.