This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.
Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE.
This page is "what XUASTC LDR is, from a JPEG perspective". Anyone who understands JPEG will find XUASTC LDR conceptually familiar, because the underlying compression architecture is structurally very similar. XUASTC LDR can be seen as applying a JPEG-like codec to ASTC’s interpolation field, rather than to pixels. Alternatively, it can be viewed as a JPEG for the ASTC latent space: a highly structured, parametric representation consisting of endpoints, partitions, and interpolation fields, with the GPU’s ASTC decoder acting as a fixed, hardware-implemented generative decoder.
Alternatively, this document is about what GPU texture compression fundamentally is.
Although XUASTC LDR operates on ASTC block metadata rather than pixels, the architecture is structurally similar to JPEG in its use of prediction, transform coding, and quantization. The format applies the classic predictive-transform-quantize pipeline of JPEG, but adapts it to ASTC’s internal representation of an image: endpoints, weight grids, and interpolation rules. In practice, this makes XUASTC the functional equivalent of “JPEG for ASTC”: a transform coder built for the most widely deployed hardware GPU texture format in the world.
Several deep parallels make this description technically accurate:
JPEG uses the YCbCr colorspace to decorrelate RGB channels globally. ASTC endpoints define a block’s principal color axis locally, effectively performing a highly adaptive block-by-block color channel decorrelation transform via PCA. ASTC weight grid values then describe projection along this axis, just as JPEG’s Y channel carries most of the structured variation.
JPEG predicts each block’s DC coefficient from its neighbors. XUASTC predicts ASTC block configuration and endpoint values across nearby blocks using RUN, SOLID, and REUSE_CFG commands, and endpoint DPCM. These elements play the same architectural role as JPEG’s DC prediction, exploiting coherence between adjacent blocks.
Weight grids in ASTC function like tiny grayscale images controlling interpolation across a block. XUASTC applies a 2D DCT to these grids, uses zigzag ordering, performs run-length coding of AC coefficients, and uses a JPEG luma quantization matrix resampled to the grid size. This closely mirrors JPEG’s AC coefficient coding pipeline.
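To make the transform stage concrete, here is a minimal, stdlib-only Python sketch of an orthonormal 2D DCT-II plus a zigzag scan applied to a small weight grid. The function names (`dct2`, `zigzag`) and the naive O(n⁴) transform are illustrative only; they are not taken from the XUASTC implementation, which would use an optimized transform.

```python
import math

def dct2(grid):
    """Naive orthonormal 2D DCT-II of a small rows x cols grid of floats."""
    rows, cols = len(grid), len(grid[0])
    def alpha(k, n):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            s = 0.0
            for y in range(rows):
                for x in range(cols):
                    s += (grid[y][x]
                          * math.cos(math.pi * (2 * y + 1) * u / (2 * rows))
                          * math.cos(math.pi * (2 * x + 1) * v / (2 * cols)))
            out[u][v] = alpha(u, rows) * alpha(v, cols) * s
    return out

def zigzag(coeffs):
    """Scan coefficients in JPEG zigzag order: DC first, then ACs by
    increasing spatial frequency, alternating diagonal direction."""
    rows, cols = len(coeffs), len(coeffs[0])
    cells = [(u, v) for u in range(rows) for v in range(cols)]
    # Odd anti-diagonals run top-right to bottom-left (sort by u),
    # even anti-diagonals run the other way (sort by v).
    cells.sort(key=lambda uv: (uv[0] + uv[1],
                               uv[0] if (uv[0] + uv[1]) % 2 else uv[1]))
    return [coeffs[u][v] for u, v in cells]
```

A run-length coder over the trailing zeros of this scan, plus the quantization matrix described below, would complete the JPEG-style AC pipeline.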
Unlike baseline JPEG, in addition to DCT coding an XUASTC encoder can choose to code weight grids predictively on a per-block basis using weight grid DPCM. This provides a robust alternative to DCT on blocks where transform coding is disabled, inefficient, or prone to artifacts. In the current encoder implementation, the weight-grid DPCM path is lossless for the reconstructed weights. A lossy variant is also possible by restricting/quantizing DPCM residual choices during rate–distortion optimization.
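As a sketch of the lossless DPCM alternative, the fragment below codes a flattened weight grid as a first value plus left-neighbor residuals. The real format's predictor, scan order, and residual coding are not specified here, so treat this purely as an illustration of the principle.

```python
def dpcm_encode(weights):
    """Code a flattened integer weight grid as (first value, residuals vs.
    the previous weight). Smooth grids yield small, cheap-to-code residuals."""
    return [weights[0]] + [weights[i] - weights[i - 1]
                           for i in range(1, len(weights))]

def dpcm_decode(residuals):
    """Invert dpcm_encode exactly: this path is lossless by construction."""
    out = [residuals[0]]
    for r in residuals[1:]:
        out.append(out[-1] + r)
    return out
```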
JPEG incorporates perceptual weighting through its fixed 8×8 quantization matrix, which allocates finer precision to visually important spatial frequencies. XUASTC builds on this concept but goes further: it resamples the JPEG luma quant table to the ASTC weight-grid size and then applies a per-block adaptive scale factor derived from endpoint RGBA span length, grid weight quantization precision, block geometry, and the global quality setting. This allows XUASTC to allocate more detail (or more output bits) to visually complex blocks and use heavier quantization (or fewer output bits) on simpler blocks, achieving finer control than JPEG's globally scaled matrix.
After dequantization, XUASTC performs an inverse DCT (mathematically a DCT-III). Minor floating-point differences are allowed, because weight grids do not participate in predictive loops. The final grid is clamped and re-quantized into ASTC’s ISE weight domain, just as JPEG clamps reconstructed pixels.
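The clamp-and-requantize step can be sketched as snapping each reconstructed weight to the nearest representable level. Note that real ASTC ISE weight ranges include non-power-of-two (trit/quint) level counts with spec-defined dequantization tables, which this uniform sketch deliberately ignores.

```python
def requantize_weight(w, levels):
    """Clamp a reconstructed weight to [0, 1], then snap it to the nearest of
    `levels` evenly spaced values, mimicking re-entry into the weight domain."""
    w = min(max(w, 0.0), 1.0)
    step = levels - 1
    return round(w * step) / step
```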
Conceptually, if a luminance ASTC 8×8 block’s endpoints are set to 0 and 255, and its 8×8 weight grid uses 5-bit values, the block becomes a tiny 5-bit grayscale image. Applying a DCT to that weight grid is literally equivalent to encoding a miniature 8×8 image (or a single JPEG MCU block) with JPEG. XUASTC generalizes this idea to all ASTC block sizes and grid dimensions: the interpolation weights form an implicit low-resolution image surface, and transform-coding those weights produces the same kind of efficiency gains JPEG achieves on pixels.
Because of these structural correspondences, XUASTC LDR is best understood as a complete transform-predictive codec layered on top of ASTC’s internal image model. It does for ASTC blocks what JPEG does for pixel blocks—delivering a highly efficient “supercompressed ASTC” representation while preserving full compatibility with billions of hardware ASTC decoders.
Structural comparison:
| JPEG | XUASTC |
|---|---|
| Pixel blocks | ASTC latent blocks |
| RGB → YCbCr | Endpoint PCA-like axis |
| DC prediction | Endpoint + config DPCM |
| 8×8 DCT | Weight-grid DCT + DPCM fallback |
| Quant matrix | Resampled + adaptive matrix |
| IDCT → pixels | IDCT → weights → ASTC decode |
The above outlines how XUASTC LDR and JPEG are similar. The next section concerns the key differences.
Even though XUASTC LDR reuses some familiar JPEG-style machinery (most notably transform coding and quantization), it is not a JPEG-like image codec, and it does not operate in the same optimization regime.
The key difference is where and how dimensionality reduction occurs, and how encoding decisions are made.
In classical image codecs such as JPEG, the representation is fixed up front: pixels are arranged on a uniform grid, optionally transformed into a different color space, and then passed through a transform (DCT, wavelet, etc.). Quantization is applied to the transform coefficients, and the codec accepts whatever error results from that process. The only degrees of freedom are coefficient magnitudes and bit allocation within a fixed basis.
XUASTC LDR inverts this process.
Instead of starting with a fixed representation and hoping the transform behaves well, the XUASTC LDR encoder first performs aggressive, content-adaptive dimensionality reduction using the ASTC block model. Each ASTC block configuration (Color Endpoint Mode, endpoint/weight quantization levels, weight grid resolution, number of planes, number of partitions and pattern, etc.) defines a different low-dimensional approximation of the original signal. This stage is effectively a parametric model selection problem, analogous to choosing a low-rank subspace or PCA-like basis that best explains the block.
Crucially, dimensionality reduction happens before transform coding, not as a side effect of it.
For each viable ASTC configuration, the encoder then transform-codes (or DPCM-codes) the block's weight grid, quantizes the result, reconstructs the block through the full decode path, and measures the resulting error.
This joint optimization process is repeated for dozens (and sometimes hundreds) of candidate ASTC configurations. All candidates are evaluated using identical transform and quantization settings, meaning they all produce the same class of distortion. The encoder then selects the configuration that minimizes post-transform error within this fixed distortion model.
This has an important consequence:
Because the transform operates on an already-reduced, structured signal (ASTC weights rather than raw pixels), classic JPEG failure modes such as ringing and "mosquito" noise are less common and less visible: high-frequency error is introduced within the ASTC interpolation domain, where it tends to be absorbed by the interpolation model rather than radiating around edges in pixel space.
In short, XUASTC LDR is best understood as a model-selection and dimensionality-reduction system with transform coding as a secondary refinement step, rather than as a traditional transform codec. While it reuses some of the same mathematical tools, it applies them in a fundamentally different order and for a different purpose.
During development and testing of XUASTC LDR, we observed unexpectedly clean results at bitrates where JPEG typically exhibits significant ringing, blocking, and "mosquito" noise. After considerable analysis, we determined that ASTC's reconstruction model possesses emergent artifact-suppression properties that were not intentionally designed into either ASTC or XUASTC - but arise from the interaction of multiple independent engineering decisions made for unrelated reasons.
This section documents these properties, which we believe are not widely understood even among the original ASTC specification authors. They were certainly surprising to us.
Classical DCT artifacts arise from a fundamental mathematical limitation: representing a sharp discontinuity with a truncated set of periodic basis functions produces oscillation around the edge (Gibbs phenomenon). JPEG applies DCT directly to pixels, so when quantization discards high-frequency coefficients needed to represent an edge, the reconstruction overshoots and undershoots near that edge. This manifests as visible ringing or JPEG-style "mosquito artifacts".
XUASTC applies DCT to ASTC weight grids, not pixels. The path from DCT coefficients to final pixel values passes through multiple stages of ASTC's procedural block generation/reconstruction pipeline, each of which independently suppresses the conditions that produce ringing. These properties were engineered into ASTC for GPU efficiency reasons - not for DCT artifact suppression - but they combine to make XUASTC highly resilient to DCT-induced ringing.
1. Bilinear upsampling is a low-pass filter.
When DCT quantization introduces mild oscillation into the weight grid, the ASTC weight grid bilinear upsampling step acts as a smoothing operator before weights reach texel reconstruction. Any ringing would have to survive both inverse DCT and bilinear interpolation - two cascaded low-pass operations. In practice, it is heavily attenuated.
When the encoder selects a full-resolution weight grid (e.g., 6×6 weights for a 6×6 block), no upsampling occurs and this smoothing effect is absent. In those configurations, ringing resilience depends on the other factors described below, particularly endpoint bounding, partition patterns, and the mode-selection process.
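The smoothing effect of weight-grid upsampling can be illustrated with a simple corner-aligned bilinear resampler. The ASTC specification defines an equivalent fixed-point "infill" procedure with its own texel-position mapping; this floating-point version is only a sketch of the low-pass behavior.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resample a coarse weight grid (at least 2x2) up to block resolution.
    Every output value is a convex combination of four neighbors, so the
    result never leaves the input range: oscillation is attenuated, never
    amplified."""
    gh, gw = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        fy = y * (gh - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = min(int(fy), gh - 2)
        ty = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (gw - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = min(int(fx), gw - 2)
            tx = fx - x0
            top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
            bot = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out
```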
2. Endpoint interpolation bounds the output range.
In JPEG, a reconstructed pixel can take any value in [0, 255], allowing arbitrary overshoot. In ASTC, each texel is computed as:
output = lerp(endpoint_low, endpoint_high, weight)
The output is constrained to the line segment between two endpoint colors. Even if weights oscillate between 0.0 and 1.0, pixel values cannot exceed the endpoint bounds. Ringing requires overshoot; endpoint interpolation severely limits overshoot.
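A few lines of Python make the bound obvious: whatever the (clamped) weight does, the reconstructed texel stays on the endpoint segment.

```python
def astc_texel(e_lo, e_hi, w):
    """ASTC-style reconstruction: per-channel lerp between two endpoint colors.
    After clamping w to [0, 1], each channel lies between the two endpoint
    values, so ringing-style overshoot past the endpoints is impossible."""
    w = min(max(w, 0.0), 1.0)
    return tuple(a + (b - a) * w for a, b in zip(e_lo, e_hi))
```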
3. Partition patterns absorb edge discontinuities.
JPEG treats all 64 pixels in an 8×8 block uniformly. If an edge crosses the block, the DCT must represent it with high-frequency coefficients, which then ring when quantized.
XUASTC supports up to 1,024 partition patterns for 2-subset blocks and another 1,024 for 3-subset blocks. (In practice there are slightly fewer unique patterns, because of duplicates.) Each partition assigns texels to different endpoint pairs. Crucially, the encoder can select a partition pattern that aligns with image edges, representing the discontinuity as a boundary between partitions rather than as variation in the weight grid.
This means sharp edges can be absorbed by the block's structural configuration, not encoded as high-frequency weight variation that could ring. The edge becomes a parameter choice, not a signal reconstruction problem.
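To see an edge "become a parameter choice", consider this hypothetical sketch of partitioned reconstruction: a hard vertical edge is reproduced exactly by a two-subset mask with a perfectly flat weight grid, leaving no high-frequency signal for a transform to ring on. (The mask and endpoint layout below are illustrative, not the output of ASTC's actual partition function.)

```python
def reconstruct_block(mask, endpoint_pairs, weights):
    """mask[y][x] selects a partition; each partition owns one (low, high)
    endpoint pair; weights[y][x] interpolates along that pair per texel."""
    out = []
    for y, mask_row in enumerate(mask):
        row = []
        for x, p in enumerate(mask_row):
            lo, hi = endpoint_pairs[p]
            w = weights[y][x]
            row.append(tuple(a + (b - a) * w for a, b in zip(lo, hi)))
        out.append(row)
    return out

# A hard red/blue vertical edge: the mask carries the discontinuity, so the
# weight grid can stay perfectly flat (all zeros -> no AC energy at all).
mask = [[0, 0, 1, 1]] * 4
pairs = [((255, 0, 0), (255, 0, 0)), ((0, 0, 255), (0, 0, 255))]
flat = [[0.0] * 4 for _ in range(4)]
block = reconstruct_block(mask, pairs, flat)
```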
4. The 13,659-mode configuration space enables content-adaptive representation.
JPEG has one mode: 8×8 DCT on pixels. XUASTC has 13,659 valid block configurations arising from the combinatorial product of Color Endpoint Modes, endpoint and weight quantization levels, weight grid resolutions, plane counts, and partition counts and patterns.
XUASTC LDR's analytical encoder ranks and evaluates many of these configurations per block and selects the one that best survives transform coding. Configurations that interact poorly with DCT (those that would produce artifacts) are naturally rejected by the encoder's joint optimization process.
5. Weight grid DPCM provides an escape hatch.
If no configuration using DCT is suitable for a block, the encoder can instead choose to code its weight grid with DPCM, avoiding transform artifacts entirely.
This is not a single anti-ringing mechanism. It is a massive search over representations, where artifact-prone representations lose to artifact-free alternatives.
None of the above properties were designed for DCT supercompression:
| ASTC Design Decision | Original Purpose | Emergent Effect |
|---|---|---|
| Bilinear upsampling | Fast hardware texel lookup | Low-pass filter attenuates oscillation before pixels |
| Endpoint interpolation | Fast fixed-point math in silicon | Bounded output range limits overshoot |
| Partition patterns | Edge fidelity in 128-bit blocks | Edges absorbed structurally, not encoded as signal |
| Thousands of modes | Quality/bitrate flexibility | Content-adaptive search rejects artifact-prone configs |
ARM and AMD designed ASTC to be an efficient GPU texture format optimized for memory bandwidth and hardware sampling. The specification was frozen and deployed to billions of devices before anyone considered using it as a transform coding target.
When we began applying JPEG-style techniques to ASTC weight grids, we expected to encounter familiar artifacts and planned to develop mitigation strategies. Instead, we found a reconstruction model that is structurally resilient to the conditions that produce those artifacts.
ASTC's reconstruction model - designed for an entirely different purpose - happens to define an output space that is highly resilient to DCT artifacts.
The practical result is that XUASTC LDR achieves competitive compression ratios with visually cleaner results than would be expected from a straightforward application of transform coding. The emergent properties of ASTC's procedural reconstruction provide artifact suppression that no amount of encoder tuning could achieve in a traditional pixel-domain codec.
While XUASTC avoids DCT ringing, it has its own artifact profile. Edges that align with partition boundaries are razor-sharp: represented structurally rather than as signal variation within the weight grid. But texture detail can soften, especially at larger block sizes, because weight grids are limited to a maximum of 64 weight samples by ASTC (regardless of block size, including 12×12), and DCT quantization further smooths high frequencies. Block boundary artifacts can occur (mitigated by deblocking), and high-chroma edges at large block sizes occasionally show artifacts where partition patterns and coarse weight grids struggle with simultaneous luminance and chroma discontinuities. The net result: XUASTC preserves major structural edges cleanly but trades fine texture sharpness for the avoidance of ringing.
XUASTC LDR's weight-grid DCT path is built on one of the most scientifically grounded components in image compression: the JPEG baseline "perceptually lossless" luminance quantization matrix. See Annex K.1, Table K.1, page 143 of the JPEG standard.
16 11 10 16 24 40 51 61
12 12 14 19 26 58 60 55
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62
18 22 37 56 68 109 103 77
24 35 55 64 81 104 113 92
49 64 78 87 103 121 120 101
72 92 95 98 112 100 103 99
(XUASTC LDR implementation note: The (0, 0) DC DCT coefficient term is quantized and handled separately, and not via this matrix. The DC value in this matrix is modified from 16 to 4, so when the matrix is sampled via bilinear filtering the very lowest frequency AC coefficients (of the largest ASTC block sizes) are quantized correctly.)
The 64 values in JPEG's 8×8 luminance quantization table were informed by and derived from controlled psychophysical experiments conducted in the late 1980s and early 1990s. Researchers including Ahumada and Peterson at NASA Ames Research Center, and later Watson, Yang, Solomon, and Villasenor (for wavelets), measured human detection thresholds for individual DCT basis functions presented at varying amplitudes against uniform backgrounds. Using forced-choice protocols across multiple observers, they characterized contrast sensitivity across the 2D spatial frequency space of an 8×8 DCT block.
The resulting quantization matrix approximates these measured detection thresholds, encoding a frequency-by-frequency map of human visual sensitivity. While committee tuning, normalization, and integerization were applied for standardization, the table remains fundamentally grounded in experimentally measured perceptual limits rather than purely heuristic distortion weighting.
XUASTC LDR adopts this matrix to anchor quantization decisions in empirically measured visual sensitivity. Because ASTC weight grids vary in size and aspect ratio (from a total of 4 to 64 grid samples, including non-square configurations), the 8×8 matrix values are bilinearly resampled to the dimensions of each weight grid, mapping each coefficient to its corresponding position in the matrix's frequency space. This resampling is well-behaved because the underlying contrast sensitivity function is smooth across spatial frequency. The matrix entries are samples of a continuous perceptual surface, and bilinear interpolation between them produces appropriate thresholds at intermediate frequencies.
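A sketch of the resampling step, using the Annex K.1 table with the DC entry set to 4 as noted above. Corner-aligned bilinear sampling is assumed here; the precise coordinate mapping used by XUASTC, and the function name `resample_quant_table`, are illustrative.

```python
# JPEG Annex K.1 luminance quantization table, with the DC entry set to 4
# (as described above) so bilinear sampling behaves at the lowest frequencies.
JPEG_LUMA = [
    [ 4, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def resample_quant_table(gw, gh):
    """Bilinearly resample the 8x8 table to a gw x gh weight grid, mapping
    each coefficient to its position in the table's frequency space."""
    out = []
    for v in range(gh):
        fy = v * 7 / (gh - 1) if gh > 1 else 0.0
        y0 = min(int(fy), 6)
        ty = fy - y0
        row = []
        for u in range(gw):
            fx = u * 7 / (gw - 1) if gw > 1 else 0.0
            x0 = min(int(fx), 6)
            tx = fx - x0
            a = JPEG_LUMA[y0][x0] * (1 - tx) + JPEG_LUMA[y0][x0 + 1] * tx
            b = JPEG_LUMA[y0 + 1][x0] * (1 - tx) + JPEG_LUMA[y0 + 1][x0 + 1] * tx
            row.append(a * (1 - ty) + b * ty)
        out.append(row)
    return out
```

At 8×8 the resampled table reproduces the source values exactly; smaller grids sample the same smooth contrast-sensitivity surface at coarser frequency positions.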
As in standard JPEG, each DCT coefficient is then quantized using a step size derived from the matrix value at that frequency position - larger steps at frequencies where the eye is less sensitive, smaller steps where sensitivity is greatest. The adaptive quantization layer further normalizes weight-domain error back into the perceptual domain in which the original thresholds were characterized, conservatively preserving the calibration of frequency sensitivity even as block geometry and endpoint span vary.
Because the underlying measurements were obtained using uniform backgrounds without masking, the resulting thresholds are conservative relative to typical natural images, where spatial and contrast masking often reduce visibility further. This provides a safety margin against perceptually visible loss while maintaining strong compression efficiency.
Anchoring on the JPEG matrix also provides a practical advantage: its perceptual behavior has been validated across decades of deployment and billions of images. Artifact characteristics are well understood and predictable. By reusing this matrix rather than introducing a novel perceptual model, XUASTC inherits both the empirical foundation and the extensive validation history of the most widely deployed lossy image codec in the world.
From an image codec theory perspective, XUASTC can be understood as a latent-space, analysis-by-synthesis (AbS) psychovisual transform codec whose synthesis operator is fixed, standardized, and already deployed in hardware. Classic JPEG machinery (block DCT, quantization matrices, coefficient prediction, entropy coding) is applied not to pixels, but to ASTC’s latent interpolation fields (weight grids and endpoint parameters). The encoder repeatedly synthesizes candidate reconstructions through the full decode path (including transform loss, quantization, and ASTC decode) and selects representations based on measured output error. This places XUASTC squarely within the same architectural family as modern analysis-by-synthesis and learned latent codecs, differing primarily in that the latent space and synthesis operator are hand-designed, deterministic, and fixed rather than learned and evolving.
In contrast to learned decoders trained on curated datasets, XUASTC's fixed and standardized synthesis operator embodies a mature, hand-designed generative model whose inductive biases have been validated through extensive real-world texture and image deployment across diverse content and hardware.
Crucially, while final hardware synthesis is block-local, XUASTC is not limited to block-local modeling. Cross-block prediction is performed in latent space during supercompression, and optional but well-defined deblocking or reconstruction filters can be applied during transcoding or shading, with the encoder optimizing against the post-filtered result. As a result, most codec intelligence - including global redundancy exploitation, rate–distortion optimization, and error shaping - resides entirely on the encoder and transcoder side, not in the hardware decoder. The practical implication is that XUASTC demonstrates a modern codec design point in which a fixed, widely deployed synthesis operator (ASTC, BC7, etc.) is treated as a latent image model, and competitive compression performance is achieved through encoder-side analysis-by-synthesis, transform coding, prediction, and entropy modeling alone.
A common historical approach to "GPU texture supercompression" has been to treat the GPU block format as an opaque byte stream and then apply byte-oriented preprocessing and general-purpose compressors (e.g. LZ77-family) to the resulting data. This is one of the methods we pioneered over a decade ago in our crunch library (RDO mode vs. .CRN), and later in a universal (texture-format-independent) way in bc7enc_rdo (see ert.cpp). While these techniques can be useful, they tend to plateau quickly because they largely ignore the semantics of what the block format is actually encoding.
Byte-stream approaches are constrained by what the serialized bits look like. After the easy wins have been extracted, the remaining stream is often close to high-entropy at the byte level. This is not accidental: GPU block formats are engineered to efficiently pack perceptually meaningful parameters into compact, highly quantized representations. That packing tends to destroy the long repeated substrings that LZ-style compressors exploit best.
The result is that improvements taper and become content-fragile: once the "low-hanging fruit" is gone, additional engineering yields diminishing returns.
This is not to say these tricks are useless; they are just fundamentally second-order once the stream is already near-random to LZ.
In contrast, XUASTC LDR models the latent parameters of ASTC and applies transform coding to the implicit image represented by the ASTC weight grid. This changes the problem from "make bytes compressible" to "compress a signal representation".
Once you move to transform-based latent coding, you inherit a large toolbox of compounding improvements that do not depend on byte-level repetition: better predictors, transforms, quantization matrices, perceptual models, and entropy coders. Each of these can produce incremental gains, and importantly, the gains tend to stack rather than saturate quickly.
This is the same story as image/video codecs: once you stop compressing bytes and start compressing a signal representation, the roadmap becomes clear.
The key shift is recognizing GPU block formats as a deployed generator (hardware decoder) driven by a structured latent (endpoints, partitions, weights). With that framing, the natural place to apply transform coding is inside the latent space rather than after-the-fact on serialized bits.
What is happening with ASTC supercompression closely mirrors earlier transitions in other compression domains. In each case, the field crossed a boundary from direct data approximation into latent signal modeling, after which long-term codec evolution became possible:
| Domain | Before | After |
|---|---|---|
| Image compression | Pixels | Frequency coefficients → quantized latent space |
| Video compression | Frames | Motion-compensated prediction → residual signals |
| Audio compression | Waveform Samples | Psychoacoustic models → transform coefficients |
| GPU texture compression (now) | Bit Packed GPU Block Data | ASTC latent parameters → transform-coded blocks |
The pattern is the same in every case: stop approximating the raw data directly, identify the structured latent signal behind it, and apply prediction, transform coding, and quantization inside that latent representation.
ASTC has now crossed this same conceptual boundary — later than other domains, but in essentially the same way.
Once a domain becomes signal compression, several long-term dynamics become inevitable:
- **Encoders grow more complex.** Complexity compounds constructively as better predictors, transforms, and perceptual models are layered on.
- **Decoders remain fixed.** The hardware ASTC decoder becomes a stable synthesis operator, enabling continuous encoder innovation without ecosystem disruption.
- **Rate–distortion improves incrementally but relentlessly.** Gains arrive through many small advances rather than format replacement, exactly as seen in mature image and video codecs.
- **Machine learning becomes additive, not disruptive.** ML techniques naturally integrate as improved predictors, transforms, or perceptual models rather than replacing the entire codec architecture.
- **Expertise compounds instead of resetting.** Knowledge builds over time instead of being discarded with each new format generation.
In other words, a codec ecosystem forms.
This represents a structural transition for GPU texture compression: from ad-hoc engineering under rigid constraints to a modern, theory-backed signal compression discipline with a clear and extensible path forward.