This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.
Modern GPU texture formats such as ASTC and BC7 are execution formats designed for fast, fixed-function decoding on the GPU. They are not optimized for storage or distribution. Historically, textures have been shipped directly in these GPU-ready formats because encoding and transcoding were assumed to be too expensive or disruptive to do on the user's machine.
Systems like Basis Universal formalize a different model: textures are treated as derived data, generated from a compact, portable intermediate representation and cached locally after conversion to a GPU-specific format. This aligns texture handling with workflows already common for shaders and other assets, which are compiled and cached just before use or in the background during execution.
From a systems perspective, shaders and textures now share many of the same properties:
| Aspect | Shaders | Textures (Modern Model) |
|---|---|---|
| GPU-specific output | Yes | Yes |
| Portable source representation | Yes | Yes |
| Expensive to produce | Yes | Yes |
| Generated on demand | Yes | Yes |
| Cached on disk | Yes | Yes |
| Compiled form larger than source | Yes | Yes |
In both cases, the cost of producing the final GPU-specific representation is typically paid once and amortized through caching. The compiled form is optimized for execution, not for distribution.
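The pay-once, cache-forever pattern described above can be sketched as follows. All names here are hypothetical; `transcode_to_gpu_format` stands in for a real transcoder such as the one in Basis Universal, and the cache layout is illustrative, not any engine's actual scheme.

```python
# Sketch of the "derived data" model: the portable intermediate file is
# transcoded once per (asset, GPU format) pair and the GPU-ready result
# is cached on disk, so the conversion cost is amortized across runs.
import hashlib
from pathlib import Path

def cache_key(source_bytes: bytes, gpu_format: str) -> str:
    """Derive a stable cache key from the source data and target format."""
    h = hashlib.sha256()
    h.update(source_bytes)
    h.update(gpu_format.encode())
    return h.hexdigest()

def get_gpu_texture(source_path: Path, gpu_format: str, cache_dir: Path,
                    transcode_to_gpu_format) -> bytes:
    """Return GPU-ready texture data, transcoding only on a cache miss."""
    source_bytes = source_path.read_bytes()
    cached = cache_dir / f"{cache_key(source_bytes, gpu_format)}.{gpu_format}"
    if cached.exists():
        return cached.read_bytes()  # amortized path: pay once, reuse
    gpu_bytes = transcode_to_gpu_format(source_bytes, gpu_format)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(gpu_bytes)
    return gpu_bytes
```

Keying the cache on both the source bytes and the target format means a driver or hardware change that alters the preferred GPU format simply produces a new cache entry, without invalidating anything by hand.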
ASTC supports a wide range of block sizes (e.g. 4×4 through 12×12, including non-square variants). Traditionally, only a small subset of these has been used, often treating ASTC like the BCn formats with their fixed block sizes. ASTC's flexibility has been largely underutilized, which is surprising given that it is the most widely deployed GPU texture format in the world, with billions of shipped devices containing hardware decoders.
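The impact of block size is easy to quantify: every ASTC block occupies 128 bits regardless of its dimensions, so larger blocks spend fewer bits per texel. The snippet below is pure format arithmetic, not a measurement of any particular encoder.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint

def astc_texture_bytes(width: int, height: int, bw: int, bh: int) -> int:
    """Compressed size in bytes of a width x height texture at block size bw x bh."""
    blocks_x = math.ceil(width / bw)
    blocks_y = math.ceil(height / bh)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8

for bw, bh in [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12)]:
    size = astc_texture_bytes(2048, 2048, bw, bh)
    bpt = ASTC_BLOCK_BITS / (bw * bh)
    print(f"{bw}x{bh}: {bpt:.2f} bits/texel, 2048x2048 -> {size / 1024:.0f} KiB")
```

At 4×4 the rate is 8 bits/texel (4 MiB for a 2048×2048 texture); at 12×12 it drops to roughly 0.89 bits/texel, a ~9× reduction before any additional compression of the intermediate representation.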
When textures are distributed in a highly compressed intermediate form and transcoded locally, ASTC block size becomes a content and distribution decision, rather than a hardware constraint. Larger block sizes can significantly reduce source data size for suitable content. At runtime, textures are converted to the best available GPU format (ASTC where supported, BC7 or others where not), with optional adaptive deblocking applied either during transcoding or in a simple pixel shader. Deblocking has been used in modern video and image compression for decades, but not in GPU texture compression until now.
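The runtime decision above ("ASTC where supported, BC7 or others where not") amounts to walking a preference list against the device's capabilities. This is a minimal sketch; the format names and preference order are illustrative assumptions, not the API of any particular engine or of Basis Universal.

```python
# Pick the best GPU execution format the device actually supports,
# falling back from ASTC to BC7 and finally to uncompressed RGBA8.
PREFERRED_FORMATS = ["astc", "bc7", "rgba8"]  # best to worst

def choose_gpu_format(device_supported: set) -> str:
    """Return the first preferred format the device supports."""
    for fmt in PREFERRED_FORMATS:
        if fmt in device_supported:
            return fmt
    raise RuntimeError("device supports no known texture format")
```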
This model is supported by the characteristics of current, widely deployed hardware.
Texture transcoding scales well across threads and can run on background workers. When combined with modern SSD caching, the cost of generating GPU-ready textures is typically paid once per asset, in the background. In this context, reducing distribution size can yield much larger overall gains than minimizing local compute time.
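Because each texture transcodes independently, the background-worker model above maps directly onto a thread pool. A minimal sketch, where `transcode_one` is a stand-in for a real per-texture transcode call:

```python
# Transcode many textures concurrently on worker threads while the
# main thread continues; results come back in input order.
from concurrent.futures import ThreadPoolExecutor

def transcode_all(sources, transcode_one, max_workers=8):
    """Run transcode_one over every source in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcode_one, sources))
```

In a real engine the transcoder would be native code that releases the interpreter lock (or the pool would be a process pool), but the scheduling structure is the same.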
The shift toward treating textures as derived data is driven by changing conditions.
This makes it practical to ship smaller, more flexible intermediate texture representations and synthesize GPU-specific formats locally via install-time transcoding on 8+ CPU cores, applying the same principles already used for shader compilation.
This approach does not replace GPU texture formats. It changes when and where they are produced, allowing distribution and storage decisions to be made at a higher level while preserving the same GPU execution formats at runtime. Essentially, there is little point in having the fastest GPU texture decompressor (which, notably, is now slower than modern SSD read throughput) if your downloads are several times larger than those of competitors using modern approaches.