This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.
ASTC's complexity has created an ecosystem where correctness is an ongoing research problem rather than a fully solved one. ASTC is a format where the spec's ambition outran the ecosystem's ability to implement it consistently.
We've been trying to track down all the available open-source software CPU ASTC decoders in the wild, to use for testing and verification purposes. There are also several known GPU compute shader ASTC decoders, but it's unknown to us how correct or reliable they are. (Given their exposure to driver and shader compiler variability, we’re wary.)
There is at least one known hardware ASTC decoding bug.
The ASTC specification is complex enough that, when fuzzing, it’s possible to generate random 128-bit blocks that one decoder accepts and another rejects, and it can be a challenge to determine which decoder is actually correct. For example, there are software decoders that don't fully enforce the specification on reserved bits in void-extent blocks, and void-extent NaN/Inf handling can differ between software decoders.
This problem is serious enough that we decided not to implement LZ match-injection style RDO encoding for ASTC, because the resulting bitwise-distorted blocks may not be reliably decoded by all software or GPU hardware decoders. (By comparison, this is not a problem with BC1-BC7.) In short, you can't safely bit-distort ASTC blocks to create LZ matches, because different decoders might reject the resulting blocks. (If you're only targeting a single vendor's GPU, perhaps you can safely do so, but if they change their decoder in the future, who knows.)
The Khronos ASTC specification is so dense that it's barely enough for implementors to use to create robust decoders (let alone encoders). For example, we discovered an issue with 8-bit alpha decoding in ASTC, because the spec was (and still is) too vague/incomplete. Also see "Inconsistent decode formats".
This is from a newer version of the Khronos ASTC spec here:
NOTE: There are a number of implementations in the wild which have
small inaccuracies in the decoded result.
Future hardware should be bit-exact, so software should rely on the
behavior documented in this specification.
That's not a spec, that's hope.
The end result: ASTC decoding is unfortunately not quite bit-exact in all cases, depending on decode profile and whether sRGB conversion is enabled. This further complicates decoder fuzz testing and deciding when a decode is "correct" or not.
See "Arm GPU Errata for Application Developers, Software Developer Errata Notice, v3.0, Date of issue: March 06, 2025", section "3922301 ASTC decompression incorrectly rounds linear color endpoints when using unorm8 decode mode".
The entire "Weight Application" section should just be pseudocode. Something like:
for each component c in {R, G, B, A}:
    C0 = endpoint0[c] // 8-bit, 0-255
    C1 = endpoint1[c] // 8-bit, 0-255

    // 16-bit expansion (all 4 components, same expansion)
    if srgb_mode:
        C0 = (C0 << 8) | 0x80
        C1 = (C1 << 8) | 0x80
    else:
        C0 = (C0 << 8) | C0 // note: this is the ARM Errata 3922301 if decode_unorm8 is active; ARM uses "| 0x80" instead
        C1 = (C1 << 8) | C1

    // Interpolation (weight is 0-64, from weight unquantization and weight infill)
    C = floor( (C0 * (64 - weight) + C1 * weight + 32) / 64 )

    // Output depends on operation mode and decode mode
    if srgb_mode:
        // decode mode is ignored in sRGB operation mode
        if c in {R, G, B}:
            output[c] = srgb_eotf_u8(C >> 8) // RGB only: Apply sRGB EOTF (sRGB to linear conversion) to UNORM8 value (top 8 bits of C). Input is UNORM8 in [0,255].
        else:
            output[c] = C >> 8 // Alpha: final result is the top 8 bits of C
    else if decode_mode == decode_unorm8:
        output[c] = C >> 8 // all components: final result is the top 8 bits of C
    else if decode_mode == decode_float16:
        // Per ASTC specification:
        // If C == 65535, the final result is 1.0 (0x3C00).
        if C == 65535:
            output[c] = 1.0 // 0x3C00 in FP16
        else:
            // Otherwise, take the infinite-precision result of dividing C by 65536,
            // and convert that value to a 16-bit floating-point number (FP16) using
            // round-to-zero rounding semantics.
            output[c] = round_to_zero_fp16(C / 65536.0)
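As a rough illustration, the expansion and interpolation steps of the pseudocode above can be written as a small C++ sketch. The function names `expand_endpoint` and `interpolate_unorm8` are hypothetical (they are not from astcenc or basisu), only the UNORM8 output path is covered, and in sRGB mode the returned value is the one *before* the sRGB EOTF is applied to R, G, and B.

```cpp
#include <cassert>
#include <cstdint>

// Expand an 8-bit endpoint to 16 bits, following the pseudocode above.
// replicate == true  -> (v << 8) | v     (linear path)
// replicate == false -> (v << 8) | 0x80  (sRGB path)
inline uint32_t expand_endpoint(uint8_t v, bool replicate)
{
    return (static_cast<uint32_t>(v) << 8) | (replicate ? v : 0x80u);
}

// Interpolate one component and return the top 8 bits of the 16-bit result.
// weight must be in [0, 64]. Hypothetical helper, for illustration only.
inline uint8_t interpolate_unorm8(uint8_t e0, uint8_t e1, uint32_t weight, bool srgb_mode)
{
    assert(weight <= 64);
    const uint32_t c0 = expand_endpoint(e0, !srgb_mode);
    const uint32_t c1 = expand_endpoint(e1, !srgb_mode);

    // All operands are non-negative, so integer division is the floor()
    // used in the pseudocode. The largest intermediate fits in 32 bits.
    const uint32_t c = (c0 * (64 - weight) + c1 * weight + 32) / 64;

    return static_cast<uint8_t>(c >> 8);
}
```

A decoder that wants to mimic the hardware behavior described in the errata could, under this sketch, pass `replicate = false` on the linear unorm8 path as well; whether that matches a given GPU bit-for-bit is something to verify rather than assume.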
ASTC HDR doesn't support encoding negative values in general (non-void-extent) blocks, but there is an (overly clever) loophole: void-extent FP16 colors can be negative. However, the spec says "In the HDR case, if the decoding mode is decode_rgb9e5, then any negative color component values are set to 0 before conversion to the shared exponent format".
One ramification of this loophole: the only way you can determine if an ASTC HDR texture can be fully decoded to rgb9e5 format, or transcoded to the popular (higher precision) BC6H unsigned format, is to examine every single block of each mipmap level, texture array layer, and/or cubemap face, to check for negative values in void-extent blocks.
This loophole breaks the ability to make format decisions from metadata alone. In our opinion, this should have been disallowed in the spec.
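The per-block scan described above might look like the following sketch. It assumes the void-extent layout as we read it from the spec: the low 9 bits of the block are `0x1FC`, bit 9 selects FP16 (HDR) constant colors, and the four 16-bit color components occupy the top 64 bits in R, G, B, A order. The function names are hypothetical. Only R, G, and B are checked, since rgb9e5 and BC6H carry no alpha; negative zero is not counted as negative, and a NaN with its sign bit set would be flagged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the 16-byte block is an HDR (FP16) void-extent block whose
// R, G, or B constant color is negative. Hypothetical helper, for illustration.
inline bool block_has_negative_void_extent_rgb(const uint8_t* block)
{
    const uint32_t low16 = block[0] | (static_cast<uint32_t>(block[1]) << 8);

    if ((low16 & 0x1FF) != 0x1FC)
        return false; // not a void-extent block
    if ((low16 & 0x200) == 0)
        return false; // UNORM16 constant color, cannot be negative

    for (int c = 0; c < 3; c++)
    {
        // FP16 bit pattern of component c (little-endian, starting at byte 8)
        const uint32_t h = block[8 + 2 * c] | (static_cast<uint32_t>(block[9 + 2 * c]) << 8);

        // Sign bit set and a nonzero magnitude, so -0.0 is ignored
        if ((h & 0x8000) && (h & 0x7FFF))
            return true;
    }
    return false;
}

// Scans every block of one mip level / layer / face.
inline bool image_has_negative_void_extent_rgb(const uint8_t* blocks, size_t num_blocks)
{
    for (size_t i = 0; i < num_blocks; i++)
        if (block_has_negative_void_extent_rgb(blocks + i * 16))
            return true;
    return false;
}
```

A transcoder would have to run a check like this over every mip level, array layer, and cubemap face before committing to an unsigned target format.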
ARM's reference decoder silently swallows invalid blocks.
For fuzzing, we had to slightly modify astcenc to return proper error codes if it encounters invalid blocks during decompression. Otherwise, the caller has to examine all the texels of every single output block to determine which ones were set to the error color. That check isn't reliable: the actual texture could legitimately contain the "invalid" error color, and because ASTC compression itself is lossy, an input block that is merely close to the error color could accidentally be compressed to it. For a reference decoder, this behavior (of silently failing) isn't acceptable.
To fix this issue, you can patch astcenc_decompress_image() in source/astcenc_entry.cpp like this (new/changed lines marked):
// Copyright 2011-2025 Arm Limited, Licensed under the Apache License, Version 2.0
astcenc_error astcenc_decompress_image(
    astcenc_context* ctxo,
    const uint8_t* data,
    size_t data_len,
    astcenc_image* image_outp,
    const astcenc_swizzle* swizzle,
    unsigned int thread_index
) {
    astcenc_error status;
    astcenc_image& image_out = *image_outp;
    astcenc_contexti* ctx = &ctxo->context;

    // ..... rg: lines removed .....

    image_block blk {};
    blk.texel_count = static_cast<uint8_t>(block_x * block_y * block_z);

    // Decode mode inferred from the output data type
    blk.decode_unorm8 = image_out.data_type == ASTCENC_TYPE_U8;

    // If context thread count is one then implicitly reset
    if (ctx->thread_count == 1)
    {
        astcenc_decompress_reset(ctxo);
    }

    // Only the first thread actually runs the initializer
    ctxo->manage_decompress.init(block_count, nullptr);

    bool any_failed = false; // <-- rg: new line

    // All threads run this processing loop until there is no work remaining
    while (true)
    {
        unsigned int count;
        unsigned int base = ctxo->manage_decompress.get_task_assignment(128, count);
        if (!count)
        {
            break;
        }

        for (unsigned int i = base; i < base + count; i++)
        {
            // Decode i into x, y, z block indices
            int z = i / plane_blocks;
            unsigned int rem = i - (z * plane_blocks);
            int y = rem / row_blocks;
            int x = rem - (y * row_blocks);

            unsigned int offset = (((z * yblocks + y) * xblocks) + x) * 16;
            const uint8_t* bp = data + offset;

            symbolic_compressed_block scb;

            physical_to_symbolic(*ctx->bsd, bp, scb);

            if (scb.block_type == SYM_BTYPE_ERROR) // <-- rg: new line
                any_failed = true; // <-- rg: new line

            decompress_symbolic_block(ctx->config.profile, *ctx->bsd,
                                      x * block_x, y * block_y, z * block_z,
                                      scb, blk);

            store_image_block(image_out, blk, *ctx->bsd,
                              x * block_x, y * block_y, z * block_z, *swizzle);
        }

        ctxo->manage_decompress.complete_task_assignment(count);
    }

    return any_failed ? ASTCENC_ERR_BAD_PARAM : ASTCENC_SUCCESS; // <-- rg: changed line (can also create a new error code like ASTCENC_ERR_BAD_BLOCK)
}
| Decoder | Coverage | State | Notable Issues & Notes |
|---|---|---|---|
| ARM astcenc | LDR + HDR | Active | Needs decoder error patch; reference; should match hardware (but doesn't - see ARM Errata) |
| Binomial basisu_astc_helpers.h | LDR + HDR | Active | Fuzzed ~1.8M configs vs. ARM astcenc's decoder |
| Android tcuAstcUtil.cpp | LDR + HDR | Active (forked) | Reserved bit check wrong; NaN/Inf throws; tangled with test code; tricky to isolate/extract source, must add linear unorm8 decoding |
| Google astc-codec | LDR only | Archived | Known bugs; Binomial abandoned it |
| Mesa texcompress_astc.cpp | LDR only | Active | Agrees with ARM on unorm8/sRGB; easily isolated; MIT |
| oastc | LDR only | Stale | Mesa's base; Mesa is likely newer |
| FasTC | LDR only | Unknown | 16-bit output only; untested by Binomial |
ARM's astcenc also includes their encoder, so it's a big library. One would hope (or pray) their reference decoder matches the actual hardware and the latest ASTC spec.
As pointed out above, invalid blocks don't cause actual error codes to be returned, so out of the box it's not ideal for robust verification purposes.
basisu_astc_helpers.h is a single header file library
for unpacking logical and physical ASTC LDR/HDR blocks to texels,
packing logical blocks to physical blocks, unpacking physical blocks to
logical blocks, BISE quantization helpers, etc. Fuzzed vs. ARM's (as of
3/3/2026) decoder for HDR FP16 decoding, sRGB, and unorm8 linear, on
387,471 LDR-only configs at 12x12 and 1,835,364 total LDR/HDR configs at
12x12. We follow this ASTC
spec, dated 2/12/2025 (KDFS 1.4.0).
For testing, we have also iterated through all ASTC configurations
our LDR/HDR encoders use, packed physical ASTC blocks using these
configurations, unpacked them using various decoders to texels, and
ensured the results are the same. Our ASTC/XUASTC LDR compressor unpacks
each candidate block using both this decoder and
tcuAstcUtil.cpp for verification purposes.
We have also created millions of ASTC blocks filled with random bits, fed them into ARM's decoder and ours in HDR mode at 12x12, and ensured both decoders agree on whether each block is valid. We also ensured the accepted random blocks decompressed to the exact same texels. To do this, we had to disable NaN/Inf void-extent color checks in our decoder (and Android's), and patch ARM's reference decoder so it returns proper error codes on invalid blocks.
Android's tcuAstcUtil.cpp is the best (non-ARM) LDR/HDR decoder we've found so far, but it still needed some fixes. decodeVoidExtentBlock() doesn't check that the reserved bits (at bit positions 10 and 11) are 1, although the spec requires it ("Bits 10 and 11 are reserved and must be 1.") and other software decoders do check them. It also throws an exception on NaN/Inf void-extent colors, which goes against the spec ("In the HDR case, if the color component values are infinity or NaN, this will result in undefined behavior. As usual, this must not lead to an API’s interruption or termination.").
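For reference, the reserved-bit rule quoted above comes down to a two-bit test on the block header. The sketch below is not taken from any of the decoders discussed here; the enum and function names are hypothetical. It assumes a 2D block, with the void-extent marker in the low 9 bits (`0x1FC`) and the reserved bits at positions 10 and 11, and it does not validate the extent coordinates.

```cpp
#include <cassert>
#include <cstdint>

enum class VoidExtentCheck { NotVoidExtent, Ok, ReservedBitsInvalid };

// Classifies a 16-byte 2D ASTC block by its void-extent header only.
// Hypothetical helper: a full decoder must also validate the extent
// coordinates stored in bits 12..63.
inline VoidExtentCheck check_void_extent_2d_reserved_bits(const uint8_t* block)
{
    const uint32_t low16 = block[0] | (static_cast<uint32_t>(block[1]) << 8);

    if ((low16 & 0x1FF) != 0x1FC)
        return VoidExtentCheck::NotVoidExtent;

    // "Bits 10 and 11 are reserved and must be 1."
    const uint32_t reserved = (low16 >> 10) & 3;
    return (reserved == 3) ? VoidExtentCheck::Ok : VoidExtentCheck::ReservedBitsInvalid;
}
```

A decoder that skips this test will accept random blocks that stricter decoders reject, which is the kind of disagreement differential fuzzing surfaces.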
Our variant with minor fixes and improvements is here. We had to add support for linear unorm8 decoding, and also add our own half-float related helper code to get it to compile in an isolated form. We use this forked version for verification vs. our decoder internally. We have fuzzed our fork vs. ARM's decoder and our decoder.
This decoder is not easy to cleanly separate from the library it lives in. We had to track down many helper routines/functions to isolate it.
Google's astc-codec is LDR only. It has a couple of known decoding issues (see its GitHub Issues), is slow in debug builds, and is archived. Early on we used it for verification purposes, but as the issues piled up we had to move on to Android's tcuAstcUtil.cpp.
An unknown variant of this decoder is also in the Android open source code. It's also LDR only and code remarks state it doesn't support sRGB decoding.
Mesa's texcompress_astc.cpp is LDR only. As of 3/3/2026, its unorm8 and sRGB decompression agrees with ARM's decoder, our decoder, and our fork of Android's tcuAstcUtil.cpp. It's MIT licensed, so we'll be integrating this decoder into our encoder in the next major release. We found this decoder easy to cleanly separate from the Mesa library.
oastc is LDR only. Mesa used this as a base, so Mesa's is likely the latest version.
FasTC is LDR only, with only 16-bit per channel output (no sRGB/unorm8). We haven't tried it yet. It looks tricky to isolate/extract from the rest of the codebase.
The opposite functionality (taking a logical ASTC block description and packing a standard ASTC physical block from it) is useful when writing your own encoders or transcoders:
Binomial: astc_helpers.h
Includes a logical to physical block packer for LDR/HDR. See function
pack_astc_block().
Intel: ispc_texcomp
This ASTC encoder includes a working ASTC logical to physical block
packer. Note this encoder is archived, and in our analysis has several
serious quality issues and only supports up to 8x8 - but it does more or
less work. See function pack_block() here.
Note neither packer supports suboptimal equal CEM encodings. We'll be releasing an upgrade to ours that does in the next major release.
How many unique, valid ASTC block structural configurations exist? We couldn't find the answer. We've uploaded two CSV files on github here if you want to see them all.
We first computed this using top-down validation, by iterating through the space of all potentially valid configuration parameters (weight grid sizes, weight BISE levels, single vs. dual plane flag, number of partitions, the CEM indices for each partition, and the resulting endpoint BISE levels that fit into the remaining bits in the ASTC block). For each potential config, we created an ASTC logical block structure containing it, along with random weight and endpoint values, and tried to pack a physical block using our block packer in astc_helpers. If the logical block was valid ASTC, we then unpacked the resulting physical block using several different decoders for validation/cross checking.
The configuration totals:
LDR CEMs only: 387,471 total configs
LDR+HDR: 1,835,364 total configs
These counts don't include the 2 LDR/HDR void-extent configurations, or the suboptimal equal CEM configs (see the next section). Adding the two void-extent configs, the grand total is 1,835,366 "normal" configs.
As a second, bottom-up validation: we computed >1.8 billion random (fuzzed) 128-bit blocks, and attempted to unpack them to 12x12 using several decoders in HDR mode (ARM's, ours, and tcuAstcUtil's). (LDR CEMs work and are spec compliant in HDR decoding mode, and using a block size of 12x12 means all weight grid resolutions are valid to decode.) To normalize each decoder's behavior: ARM's decoder was patched to return error codes on invalid blocks (see above), the other two decoders were modified to disable void-extent NaN/Inf checking (to be compatible with ARM's decoder), and tcuAstcUtil was fixed to check the 2 reserved void-extent bits (also mentioned above) like the other decoders already do.
In each test case the results matched: either all decoders rejected the block as invalid, or all accepted it as decodable and returned the same decoded FP16 texels.
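The accept/reject comparison described above can be sketched as a small differential fuzzing loop. This is not our actual harness: `Decoder`, `FuzzStats`, and `differential_fuzz` are hypothetical names, and each real decoder (ARM's, ours, tcuAstcUtil's) would need a thin adapter that returns `std::nullopt` for an invalid block and the decoded texels otherwise.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <random>
#include <vector>

using Block   = std::array<uint8_t, 16>;  // one 128-bit physical block
using Texels  = std::vector<uint16_t>;    // e.g. FP16 bit patterns, RGBA per texel
// A decoder adapter returns std::nullopt when it rejects the block as invalid.
using Decoder = std::function<std::optional<Texels>(const Block&)>;

struct FuzzStats { uint64_t trials = 0, accepted = 0, rejected = 0, mismatches = 0; };

// Feeds random blocks to every decoder and counts disagreements, either in
// accept/reject status or in the decoded texels.
inline FuzzStats differential_fuzz(const std::vector<Decoder>& decoders,
                                   uint64_t trials, uint64_t seed)
{
    assert(!decoders.empty());
    std::mt19937_64 rng(seed);
    FuzzStats stats;

    for (uint64_t t = 0; t < trials; t++)
    {
        // Fill the block with 128 random bits
        Block blk;
        for (size_t i = 0; i < 16; i += 8)
        {
            const uint64_t r = rng();
            for (size_t j = 0; j < 8; j++)
                blk[i + j] = static_cast<uint8_t>(r >> (8 * j));
        }

        const std::optional<Texels> ref = decoders[0](blk);

        bool agree = true;
        for (size_t d = 1; d < decoders.size() && agree; d++)
        {
            // optional comparison covers both validity and texel contents
            agree = (decoders[d](blk) == ref);
        }

        stats.trials++;
        if (!agree)   stats.mismatches++;
        else if (ref) stats.accepted++;
        else          stats.rejected++;
    }
    return stats;
}
```

Any nonzero `mismatches` count is worth dumping the offending blocks for, since it means at least one decoder is wrong or the spec is ambiguous for that encoding.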
When a test case passed validation, we entered the decoded block's logical parameters into a hash table set up to only save unique configurations. After 1.8 billion fuzz candidates we found a total of 1,835,366 valid decodable, unique configurations (of the "normal" variety - excluding suboptimal CEM configs).
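One way to structure such a table is to pack the structural parameters into a single integer key and store the keys in a hash set. The sketch below is illustrative rather than our actual code: `BlockConfig`, `config_key`, and `UniqueConfigs` are hypothetical, and the field widths are assumptions chosen to fit grid sizes up to 12 and endpoint range indices up to 20. CCS and partition seed values are deliberately left out of the key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Structural parameters of a decoded block (hypothetical layout).
struct BlockConfig
{
    uint8_t grid_width = 0, grid_height = 0;
    bool    dp = false;            // dual plane
    uint8_t weight_range = 0;      // weight BISE range index
    uint8_t num_parts = 1;         // 1..4
    uint8_t cem[4] = {0, 0, 0, 0}; // CEM index per partition
    uint8_t endpoint_range = 0;    // endpoint BISE range index
};

// Packs a config into a 37-bit key. Unused CEM slots are forced to zero so
// equal configs always produce equal keys.
inline uint64_t config_key(const BlockConfig& c)
{
    assert(c.num_parts >= 1 && c.num_parts <= 4);
    uint64_t k = 0;
    auto push = [&k](uint64_t v, int bits) { k = (k << bits) | (v & ((1ull << bits) - 1)); };

    push(c.grid_width, 4);
    push(c.grid_height, 4);
    push(c.dp ? 1 : 0, 1);
    push(c.weight_range, 4);
    push(c.num_parts, 3);
    for (int i = 0; i < 4; i++)
        push(i < c.num_parts ? c.cem[i] : 0, 4);
    push(c.endpoint_range, 5);
    return k;
}

class UniqueConfigs
{
public:
    // Returns true if the config had not been seen before.
    bool record(const BlockConfig& c) { return seen_.insert(config_key(c)).second; }
    size_t size() const { return seen_.size(); }
private:
    std::unordered_set<uint64_t> seen_;
};
```

Because the endpoint range is part of the key, two blocks that differ only in how their CEM indices were packed show up as separate entries.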
Note these totals purposely don't include all CCS index values, or partition seed values, just the physical block config.
Surprisingly, there are some valid ASTC configs with encodings which can be packed using more than one BISE endpoint range, which we discovered while fuzzing decoders. Normally, an ASTC physical block packer will choose the BISE endpoint range with the largest number of levels that fits into the remaining block bits after everything else (header bits, weights, partition pattern seed index, encoded CEM indices, etc.) has been packed. However, an ASTC block packer has a choice on how it encodes the CEM indices, and it's possible to purposely encode equal CEM indices in a suboptimal way. This results in fewer available bits remaining to pack the BISE encoded endpoint bits. (See the spec: "In multi-partition mode, the CEM field is of variable width, from 6 to 14 bits.." Even if the CEM indices are equal, an encoder can still use multi-partition mode, which requires extra bits stored below the weight bits.)
There are 9,588 total suboptimal LDR/HDR CEM configs that change the endpoint BISE level in some way. (Suboptimal CEM configs that don't nudge the endpoint BISE range index down aren't being counted, because they don't seem useful. There are 5,792 of them.) This brings the grand total LDR/HDR configs up to 1,844,954. (This includes void-extent, normal configs, and suboptimal CEM configs that change the endpoint BISE level.)
For example, this config can be packed using endpoint BISE range 20 (256 levels) or 19 (192 levels), depending on how the CEM indices are packed into the physical block:
grid_width = 3;
grid_height = 7;
dp = false;
weight_range = 1;
num_parts = 2;
cem0 = 5;
cem1 = 5;
There are other (actually useful) redundant encodings, like this one which can be encoded using endpoint BISE range 17 (128 levels) or 16 (96 levels):
grid_width = 5;
grid_height = 2;
dp = true;
weight_range = 2;
num_parts = 2;
cem0 = 5;
cem1 = 5;
This capability can be very useful to an encoder. Sometimes it's beneficial to use a slightly lower BISE endpoint range, because when it's decoded the available dequantized values align better with the block's actual content.
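To make the bit budget behind these two examples concrete, here is a rough sketch of the arithmetic. It rests on several assumptions that are ours, not statements from the spec text quoted here: `weight_range` indexes the same 21-entry BISE table as the endpoint ranges (so 1 means 3 levels and 2 means 4 levels); the header costs are 11 bits of block mode, 2 bits of partition count, 10 bits of partition seed, a 6-bit base CEM field (4 bits for single partition), and 2 CCS bits for dual plane; the non-equal CEM encoding costs 2, 5, or 8 extra bits for 2, 3, or 4 partitions; and the smallest legal endpoint range is index 4 (6 levels). All function names are hypothetical.

```cpp
#include <cassert>

// BISE ranges 0..20: level count and its trit/quint/bit decomposition.
struct BiseRange { int levels, trits, quints, bits; };

static const BiseRange kBiseRanges[21] = {
    {2,0,0,1},   {3,1,0,0},   {4,0,0,2},   {5,0,1,0},  {6,1,0,1},  {8,0,0,3},
    {10,0,1,1},  {12,1,0,2},  {16,0,0,4},  {20,0,1,2}, {24,1,0,3}, {32,0,0,5},
    {40,0,1,3},  {48,1,0,4},  {64,0,0,6},  {80,0,1,4}, {96,1,0,5}, {128,0,0,7},
    {160,0,1,5}, {192,1,0,6}, {256,0,0,8}
};

// Bits needed to BISE-encode num_values integers at the given range index.
inline int bise_bits(int range, int num_values)
{
    assert(range >= 0 && range <= 20);
    const BiseRange& r = kBiseRanges[range];
    int total = r.bits * num_values;
    if (r.trits)  total += (8 * num_values + 4) / 5; // ceil(8N/5)
    if (r.quints) total += (7 * num_values + 2) / 3; // ceil(7N/3)
    return total;
}

// Endpoint integers used by one partition with the given CEM index.
inline int num_endpoint_values(int cem) { return 2 * (cem / 4 + 1); }

// Bits left for endpoints. split_cem_encoding selects the longer CEM encoding
// (the "suboptimal" one when all CEM indices are equal).
inline int endpoint_bits_available(int grid_w, int grid_h, bool dual_plane,
                                   int weight_range, int num_parts,
                                   bool split_cem_encoding)
{
    assert(num_parts >= 1 && num_parts <= 4);
    const int num_weights = grid_w * grid_h * (dual_plane ? 2 : 1);
    int used = 11 + 2;                       // block mode + partition count
    if (num_parts == 1)
        used += 4;                           // single CEM
    else
    {
        used += 10 + 6;                      // partition seed + base CEM field
        if (split_cem_encoding)
            used += 3 * num_parts - 4;       // 2, 5 or 8 extra CEM bits
    }
    if (dual_plane) used += 2;               // CCS
    used += bise_bits(weight_range, num_weights);
    return 128 - used;
}

// Largest endpoint range index that fits, or -1 if none does.
inline int largest_endpoint_range(int bits_available, int num_values)
{
    for (int r = 20; r >= 4; r--)
        if (bise_bits(r, num_values) <= bits_available)
            return r;
    return -1;
}
```

Under these assumptions the sketch gives 65 vs. 63 available bits for the first config (ranges 20 and 19) and 57 vs. 55 bits for the second (ranges 17 and 16), each with 8 endpoint values, which agrees with the ranges quoted above. The two extra CEM bits are what push the packer down a range.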
Typically the BISE endpoint range indices only differ by 1, but sometimes they differ by 2 in our (ongoing) fuzzing:
Found 1231823 degenerate configs, expected endpoint BISE range 17, block packer returned range 18
Found 1231824 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Found 1231825 degenerate configs, expected endpoint BISE range 16, block packer returned range 17
Trial 385997000, Total unique configs found: 1772728
Found 1231826 degenerate configs, expected endpoint BISE range 10, block packer returned range 11
Found 1231827 degenerate configs, expected endpoint BISE range 9, block packer returned range 10
Found 1231828 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Trial 385998000, Total unique configs found: 1772728
Found 1231829 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Found 1231830 degenerate configs, expected endpoint BISE range 16, block packer returned range 17
Found 1231831 degenerate configs, expected endpoint BISE range 4, block packer returned range 5
Found 1231832 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Trial 385999000, Total unique configs found: 1772729
Found 1231833 degenerate configs, expected endpoint BISE range 19, block packer returned range 20
Found 1231834 degenerate configs, expected endpoint BISE range 11, block packer returned range 12
Found 1231835 degenerate configs, expected endpoint BISE range 14, block packer returned range 15
Trial 386000000, Total unique configs found: 1772730
Found 1231836 degenerate configs, expected endpoint BISE range 9, block packer returned range 10
Found 1231837 degenerate configs, expected endpoint BISE range 6, block packer returned range 7
Found 1231838 degenerate configs, expected endpoint BISE range 13, block packer returned range 14
Trial 386001000, Total unique configs found: 1772730
Found 1231839 degenerate configs, expected endpoint BISE range 8, block packer returned range 9
Found 1231840 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Found 1231841 degenerate configs, expected endpoint BISE range 13, block packer returned range 14
Trial 386002000, Total unique configs found: 1772730
Found 1231842 degenerate configs, expected endpoint BISE range 17, block packer returned range 18
Trial 386003000, Total unique configs found: 1772730
Found 1231843 degenerate configs, expected endpoint BISE range 6, block packer returned range 7
Found 1231844 degenerate configs, expected endpoint BISE range 13, block packer returned range 14
Found 1231845 degenerate configs, expected endpoint BISE range 14, block packer returned range 15
Found 1231846 degenerate configs, expected endpoint BISE range 11, block packer returned range 12
Found 1231847 degenerate configs, expected endpoint BISE range 16, block packer returned range 17
Trial 386004000, Total unique configs found: 1772730
Found 1231848 degenerate configs, expected endpoint BISE range 6, block packer returned range 8
Found 1231849 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
Found 1231850 degenerate configs, expected endpoint BISE range 16, block packer returned range 17
Found 1231851 degenerate configs, expected endpoint BISE range 18, block packer returned range 20
Trial 386005000, Total unique configs found: 1772730
Found 1231852 degenerate configs, expected endpoint BISE range 8, block packer returned range 10
Found 1231853 degenerate configs, expected endpoint BISE range 8, block packer returned range 9
Found 1231854 degenerate configs, expected endpoint BISE range 4, block packer returned range 5
Found 1231855 degenerate configs, expected endpoint BISE range 7, block packer returned range 8
This means the mapping from logical configs to physical blocks isn't unique, which complicates any exhaustive enumeration or testing. Additionally, it means decoders must be tested with both "normal" and suboptimal CEM encodings.
Interestingly, a particularly strong ASTC encoder could purposely exploit the useful suboptimal CEM encodings to access more BISE endpoint ranges than a "normal" encoder. There are a lot of these configs, and they are totally valid. We're not aware of any ASTC encoder that does this yet.
We've locally added suboptimal CEM usage detection to basisu's .astc file "-peek" command, and it turns out that ARM's astcenc uses these suboptimal CEM configs quite often, but apparently only accidentally (i.e. not an explicit optimization). It seems the vast majority of the time it uses them they don't usefully change the BISE endpoint level.
As a final verification, we've fed thousands of test images (both artificial and from a custom corpus) to ARM's astcenc, using various decode profiles, at various quality settings. We then ensured the resulting compressed ASTC blocks could be fully unpacked with several decoders, then cross checked the resulting decoded texels vs. ARM's decoder.