This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.

ASTC and XUASTC LDR Usage Guide

Quick Start
Intro
ASTC Features Supported
Block Sizes/Categories
Available Supercompression Modes
- XUASTC
- RDO ASTC
Rate vs. Distortion Control
WebGL KTX2 Encoding/Transcoding Testbed Specific Notes
Important Notes About ASTC Decode Profiles and Decode Mode Extensions
All ASTC/XUASTC Related Command Line Options
C++ Encoder API Parameters
Transcoder Decode Flags
Supported LDR Texture Formats (Transcode Targets), and Format Specific Flags/Options
Advanced Low-Level Compressor and Transcoder Usage
More Tips and Low-Level System Notes

Quick Start

See the project's README for information on how to either compile the basisu command line tool or use our precompiled, platform-independent .wasm executables (checked into the repository's bin directory) with a WASM runtime such as Wasmtime. Alternatively, most codec options and basic .KTX2 viewing are supported in our WebGL KTX2 encoding/transcoding testbed, which works on desktop and mobile browsers.

For near-lossless quality (add a higher -effort setting, up to -effort 9, for even higher quality at the cost of slower encoding):

basisu -xuastc_ldr_4x4 input.png

Use -astc_ldr_4x4 if you want plain (standard) ASTC texture data.

Importantly, this example assumes the ASTC LDR texture will be sampled by the GPU using the sRGB decode profile (i.e. in OpenGL you'll be using the GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR internal format). If you use linear sampling (GL_COMPRESSED_RGBA_ASTC_4x4_KHR in OpenGL) instead, add -linear or -tl. Due to how ASTC decoding works on GPUs, the encoder must know how you'll be sampling it to maximize quality. See the notes below for more details.

For high-quality photographic content:

basisu -xuastc_ldr_6x6 -quality 90 -effort 3 input.png
basisu -xuastc_ldr_8x6 -quality 75 -effort 5 input.png

You can use any of the 14 ASTC block sizes.

To save more memory and bandwidth:

basisu -xuastc_arith -xuastc_ldr_8x8 -quality 60 -effort 7 input.png

See our simple pixel shader deblocking sample for how to filter block artifacts during sampling, which allows larger ASTC block sizes (i.e. lower bitrates and lower VRAM consumption) to be much more useable. The shader is fully compatible with mipmapping. Virtually any graphics programmer who understands shaders can integrate this ~95 line shader into their engine.

For tangent space RGB normal maps:

basisu -linear -xuastc_ldr_4x4 -effort 6 -quality 90 input.png

For lossless supercompression of XUASTC using arithmetic coding, i.e. no Weight Grid DCT or windowed RDO, just pure lossless ASTC latent compression using context-based arithmetic coding:

basisu -xuastc_arith -xuastc_ldr_4x4 input.png

For fastest transcoding, or full compatibility with standard ASTC, use plain ASTC with windowed RDO (which is optional):

basisu -astc_ldr_4x4 -xy -ls_thresh_psnr 5 -ls_thresh_edge_psnr 2.5 input.png

XUASTC LDR Texture Video:

In our testing, XUASTC LDR 12x12, Zstd Profile, quality 60, effort 9 can compress texture video sequences down to ~0.25 bpp. See guide here.

To unpack .KTX2 to individual .ASTC, .PNG, .DDS, .KTX, etc. files:

basisu input.ktx2

Tools like ARM's astcenc, RenderDoc, AMD Compressonator, DirectX Texture Tools etc. can be used to convert or view the resulting .astc/.dds/.ktx files. Under Windows 11 .dds files can be previewed in Windows Explorer, and the OSX Finder also supports basic previewing of .astc files. Unpacking also outputs .PNG files using the basisu library's built-in reference block decoders.

Add -stats and -debug to see output statistics and debug/development information.
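
The command lines above all follow the same pattern, so a build script can assemble them programmatically. The following is a minimal sketch, not part of the basisu distribution: build_basisu_args is a hypothetical helper name, it only uses flags shown in this guide, and it only builds the argument list (it does not run the tool).

```python
# Hypothetical helper: assembles a basisu argument list from options named in this guide.
ASTC_BLOCK_SIZES = {
    "4x4", "5x4", "5x5", "6x5", "6x6", "8x5", "8x6",
    "10x5", "10x6", "8x8", "10x8", "10x10", "12x10", "12x12",
}

def build_basisu_args(input_path, block_size="4x4", xuastc=True,
                      quality=None, effort=None, linear=False, arith=False):
    if block_size not in ASTC_BLOCK_SIZES:
        raise ValueError("block size must be one of the 14 ASTC block sizes")
    if quality is not None and not 1 <= quality <= 100:
        raise ValueError("-quality must be in [1,100]")
    if effort is not None and not 0 <= effort <= 10:
        raise ValueError("-effort must be in [0,10]")
    if arith and not xuastc:
        raise ValueError("-xuastc_arith selects an XUASTC entropy profile")

    args = ["basisu"]
    if linear:
        args.append("-linear")          # linear ASTC decode profile (same as -tl)
    if arith:
        args.append("-xuastc_arith")    # arithmetic entropy coding profile
    args.append(("-xuastc_ldr_" if xuastc else "-astc_ldr_") + block_size)
    if quality is not None:
        args += ["-quality", str(quality)]
    if effort is not None:
        args += ["-effort", str(effort)]
    args.append(input_path)
    return args

print(" ".join(build_basisu_args("input.png", "6x6", quality=90, effort=3)))
```

The resulting list can be handed to subprocess.run() as-is, which avoids shell quoting issues with file names.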

Intro

Low-level details on the XUASTC LDR format are here. ASTC is very well documented all across the web.

Internally the same compressor and transcoders are used to support both ASTC LDR and XUASTC LDR, so there is overlap between the options, capabilities, and transcoding flags in each mode. In ASTC mode our compressor internally compresses to the XUASTC latent space and then outputs plain ASTC blocks. XUASTC treats the ASTC decoder as a standardized, hardware generative latent decoder.

XUASTC is a psychovisual codec: it shapes quantization error to align with human visual sensitivity, concentrating distortion into spectrally and spatially less‑noticeable components of the image or texture.

Both ASTC and XUASTC images and textures (in any block size) can be transcoded to any other supported LDR texture format (i.e. BC1-7, ETC1, PVRTC1, etc.) or plain raster images (RGBA32 etc.). (The key exception: when transcoding to ASTC LDR, the output GPU texture's block size must match the source file's ASTC/XUASTC block size in our APIs.) When adaptive deblocking is not being used, the ASTC/XUASTC transcoding pipeline decodes only the minimum number of texture block rows it needs into temporarily allocated memory before packing them to the output texture.

Adaptive deblocking is supported, and enabled by default, on the larger block sizes (beyond 8x6) when transcoding to other LDR formats or images. Adaptive deblocking can be either fully disabled, or enabled on all block sizes, using transcoder flags. For extra performance, solid color and one subset XUASTC blocks are by default transcoded directly to BC7 blocks at 4x4, 6x6, and 8x6 block sizes, which can also be disabled using transcoder flags for a slight gain in overall PSNR. Additionally, transcoder flags can enable slightly higher quality but slower non-analytical transcoding to BC7.

When adaptive deblocking is used, each mipmap level is decompressed into memory temporarily, deblocked, and then packed to the output format using real-time encoders such as bc7f, etc1f, etc. If this temporary memory consumption isn't acceptable, you can disable automatic deblocking completely using a transcoder flag. (It's possible to stream the deblocking pass into a pipeline to minimize the memory consumption, at greater complexity, but our next focus is pixel shader-based deblocking.)

Typically, basisu command line tool options directly correspond to individual C/C++ library compressor setting member variables, or JavaScript/Python API parameters.

Both transcoders have been extensively fuzzed. Importantly, the ASTC transcoder is NOT limited to the XUASTC subset, i.e. it can handle ANY valid, standard ASTC blocks, including ones emitted by other ASTC compressors.

Like the other codecs supported by Basis Universal, the ASTC/XUASTC transcoders only support transcoding to PVRTC1 textures with power of 2 texture dimensions. (This constraint could be easily relaxed, but PVRTC1 is a very old format at this point and most devices that support it today also support more flexible texture formats like ASTC LDR, ETC1, or BC1.) There are no such limitations for any of the other supported target formats.
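
An application can check the PVRTC1 constraint before choosing a transcode target. This is a minimal sketch of such a check; can_target_pvrtc1 is a hypothetical helper name, not a library function.

```python
def is_power_of_two(n):
    # A positive integer is a power of two when exactly one bit is set.
    return n > 0 and (n & (n - 1)) == 0

def can_target_pvrtc1(width, height):
    # Hypothetical helper: PVRTC1 transcoding requires power of 2 dimensions.
    return is_power_of_two(width) and is_power_of_two(height)

print(can_target_pvrtc1(512, 256))
print(can_target_pvrtc1(600, 400))
```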

ASTC Features Supported

At 12x12 block resolution, 13,659 total ASTC block configurations are supported in both codec modes (i.e. the XUASTC latent space). Our fully analytical ASTC compressor supports all 14 ASTC block sizes from 4x4 to 12x12, and nearly the entire ASTC format.

The system supports all standard ASTC Weight Grid dimensions including non-square configurations, all standard endpoint/weight BISE quantization levels, eight total L/LA/RGB/RGBA LDR CEMs (Color Endpoint Modes), Base+Offset CEM variants for RGB/RGBA, RGB/RGBA Base+Scale CEMs, optional Blue Contraction endpoint encodings, single or dual planes, and all unique partition patterns for 2-3 subsets at each block size.

Notably, Weight Grid DCT is compatible with any XUASTC block configuration: all block sizes, single or dual planes, 1-3 subsets, all partition patterns, all BISE endpoint/weight quantization levels.

At lower effort levels, some ASTC LDR features won't be used, to speed up encoding. For example, the Base+Offset CEMs are only used at effort levels 4 or higher, and subsets aren't used at effort levels 0-1. The higher the effort level, the more ASTC LDR features will be exploited, the more partition patterns tried, the more the candidate output ASTC blocks are refined, etc.

Block Sizes/Categories

In ASTC/XUASTC mode all 14 standard ASTC block sizes are fully supported throughout the entire pipeline, i.e. any block size can be transcoded to any other supported LDR texture format (BC1-BC7, ETC1, etc.) or raster image pixel format (RGBA32, etc.):

Small block sizes: 4x4, 5x4, 5x5, 6x5
ASTC VRAM usage: 8 bpp-4.27 bpp
Medium block sizes: 6x6, 8x5, 8x6
ASTC VRAM usage: 3.56 bpp-2.67 bpp
Large block sizes: 10x5, 10x6, 8x8, 10x8, 10x10, 12x10, 12x12
ASTC VRAM usage: 2.56 bpp-0.89 bpp
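
The VRAM figures above follow from the fact that every ASTC block occupies 128 bits regardless of its footprint, so the bitrate is 128 divided by the number of texels in the block. A short sketch that reproduces the table (the function names are our own, for illustration):

```python
ASTC_BLOCK_SIZES = [
    (4, 4), (5, 4), (5, 5), (6, 5),                                # small
    (6, 6), (8, 5), (8, 6),                                        # medium
    (10, 5), (10, 6), (8, 8), (10, 8), (10, 10), (12, 10), (12, 12),  # large
]

def astc_bits_per_pixel(block_w, block_h):
    # Every ASTC block is 128 bits (16 bytes), whatever its dimensions.
    return 128.0 / (block_w * block_h)

def astc_level_bytes(width, height, block_w, block_h):
    # Partial blocks at the right/bottom edges still cost a full 16 bytes.
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * 16

for w, h in ASTC_BLOCK_SIZES:
    print(f"{w}x{h}: {astc_bits_per_pixel(w, h):.2f} bpp")
```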

The default ASTC/XUASTC transcoder behavior is to adaptively deblock the large block sizes when transcoding to other LDR texture/pixel formats like BC7 etc. (This behavior can be overridden using a transcoder flag.)
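
The default deblocking decision described in this guide can be summarized as a small predicate. This is a sketch of the documented behavior, not the transcoder's actual code: the function and parameter names are hypothetical, and the precedence of "disable" over "force" when both are requested is our assumption.

```python
# The "large" block size category listed above.
LARGE_BLOCK_SIZES = {(10, 5), (10, 6), (8, 8), (10, 8), (10, 10), (12, 10), (12, 12)}

def deblocking_applies(block_size, target_is_astc,
                       no_deblock=False, force_deblock=False):
    if target_is_astc:
        return False   # deblocking only applies when transcoding to other formats/images
    if no_deblock:
        return False   # assumption: "disable all deblocking" wins over "force"
    if force_deblock:
        return True    # deblock every block size, including small/medium
    return block_size in LARGE_BLOCK_SIZES  # default: large block sizes only

print(deblocking_applies((12, 12), target_is_astc=False))
print(deblocking_applies((4, 4), target_is_astc=False))
print(deblocking_applies((4, 4), target_is_astc=False, force_deblock=True))
```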

For directly sampling an ASTC texture with a deblocking filter in a simple pixel shader, see our shader_deblocking demo. This shader is compatible with mipmapping, bilinear and trilinear filtering. It's compatible with any ASTC block size, but was designed with the large block sizes in mind.

Available Supercompression Modes

The system supports two primary forms of lossy supercompression (i.e. compression applied beyond standard ASTC itself). The first is perceptually/spectrally shaped, or structured, distortion: Weight Grid DCT, which is essentially a port of JPEG's key machinery into ASTC, with the combination of the two methods being surprisingly resilient to ringing artifacts. The second is perceptually unstructured distortion: windowed/bounded RDO via nearby ASTC config/endpoint reuse. Windowed RDO results in relatively minor compression gains, while Weight Grid DCT (XUASTC only) can result in enormous compression gains. Weight Grid DCT errors are not random; they map to visually less-noticeable spectral energy.

The system also supports completely lossless supercompression of ASTC/XUASTC data when neither Weight Grid DCT nor windowed RDO are enabled.

XUASTC Mode

XUASTC implements latent-aware, DCT transform-based supercompression with windowed RDO of ASTC block compressed data. XUASTC has several built-in entropy coding profiles (internally individually selected for each mipmap level): full Zstd, full arithmetic, or a hybrid mixture of the two profiles.

The arithmetic profile is slower than Zstd, but it uses hundreds of adaptive contexts for additional compression gains: roughly 3%-18% vs. the Zstd profile at the same PSNR. On the flip side, the Zstd profile has higher per-mipmap level header overhead. Currently, for mipmap levels with fewer than 64 total blocks, the arithmetic profile is always selected.
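
To see which levels of a mipmap chain fall under the 64-block rule, count the blocks in each level. The sketch below assumes a conventional mip chain (each dimension halved, rounding down, until 1x1); the function names are hypothetical and the profile labels are plain strings, not library identifiers.

```python
def blocks_in_level(width, height, block_w, block_h):
    # Ceiling division: partial blocks at the edges count as whole blocks.
    return (-(-width // block_w)) * (-(-height // block_h))

def mip_chain(width, height):
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

def profile_per_level(width, height, block_w, block_h, requested="zstd"):
    result = []
    for w, h in mip_chain(width, height):
        n = blocks_in_level(w, h, block_w, block_h)
        # Levels with fewer than 64 blocks always use the arithmetic profile.
        result.append((w, h, n, "arith" if n < 64 else requested))
    return result

for entry in profile_per_level(256, 256, 8, 8):
    print(entry)
```

For a 256x256 texture at 8x8, the 64x64 level has exactly 64 blocks and keeps the requested profile, while the 32x32 level (16 blocks) and everything smaller switch to arithmetic coding.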

XUASTC is a fully self-contained compressed format: the final entropy coding layer (selected by the profile) emits the final compressed bitstream into the generated .KTX2/.basis file. .KTX2's built-in Zstd supercompression is not used in this mode, because doing so would be nearly pointless and wasteful: the output data is already compressed.

The primary controllable rate-distortion tradeoff available in XUASTC mode is Weight Grid DCT, although windowed RDO is also available for relatively minor gains independent of DCT. Weight Grid DCT and windowed RDO are orthogonal features, i.e. each can be enabled or disabled relative to the other. Alternatively both can be disabled, leaving just entropy coding on top of XUASTC (i.e. leaving just purely lossless supercompression applied to the XUASTC block configuration and endpoint/weight data).

In XUASTC mode, if the unified -quality X option is used, windowed RDO is automatically enabled at quality levels below 100, unless the -xyd command line parameter is specified to disable it.

The XUASTC transcoder, at specific block sizes 4x4, 6x6 and 8x6, supports faster direct to BC7 transcoding on solid color and single subset blocks: Direct transcoding here means the ASTC's block parameters (or "latent") can be directly and rapidly converted to BC7's block parameters without texel-wise recompression. In cases where direct transcoding can't be done or isn't supported, it falls back to the bc7f analytical real-time encoder. Other real-time encoders are used for ETC1 (etc1f, also fully analytical), BC1-5, PVRTC1 etc.

RDO ASTC Mode

Basis Universal also supports compressing to, and transcoding from, fully standard ASTC LDR 4x4-12x12 textures in KTX2 or .basis files. The big advantage to using plain ASTC: standard ASTC format files require no extra transcoding cost on devices supporting ASTC. The transcoder supports converting any ASTC LDR block size to any other supported LDR format (with minimum memory overhead - just a few rows at most), and the ASTC decoder supports the entire ASTC specification (i.e. not just the XUASTC modes). By default, large block size ASTC textures are deblocked automatically on the CPU while transcoding to other LDR texture formats. This can be disabled using the cDecodeFlagsNoDeblockFiltering transcoder decode flag.

The KTX2 file format standard (but not the .basis file format) supports additional, but optional, lossless Zstd supercompression applied on top of the raw ASTC block texture data. Our encoder is capable of trading off quality for increased LZ compression efficiency in a perceptually unstructured way.

The primary rate-distortion tradeoff available in this mode is optional windowed/bounded RDO (enabled via -xy), which is controlled via the PSNR window size settings: -ls_min_psnr X etc. In windowed RDO mode the compressor will reuse nearby ASTC configurations if doing so wouldn't drop the block's PSNR too much (where "too much" is controllable by the user).

In ASTC mode, by default windowed RDO is disabled unless the -xy command line parameter is used to enable it.

This codec ignores the unified DCT quality setting (-quality X).

In ASTC mode the transcoder always uses pixel-wise recompression using the bc7f analytical real-time encoder, i.e. direct BC7 transcoding is not currently supported.

Note that the current primary development focus is XUASTC LDR and Weight Grid DCT followed by entropy coding profiles, not ASTC RDO+LZ, which was a lower priority. (Standard ASTC LDR mode piggybacks heavily on our XUASTC LDR encoder.)

Rate vs. Distortion Control

The codec's basic knobs, what they impact, and the distortion type:

Knob	Affects	Distortion Type
Block size	Bitrate (major), VRAM size, transcode speed, block artifacts, quality ceiling	Structural
DCT quality	Bitrate (heavily), spectral energy	Structured, perceptual
Windowed RDO	Bitrate (minor), bit reuse	Unstructured, blocky
Entropy profile	Encoding density, transcode speed	None
Effort	Search depth, encode speed	Quality ceiling, ringing resistance

The rate vs. distortion tradeoff can be controlled via two primary features, beyond changing the block size:

Weight Grid DCT

XUASTC only: Perceptually structured distortion using ASTC Weight Grid DCT (Discrete Cosine Transform) combined with JPEG's standard luminance quantization table applied independently to each ASTC weight plane for large reductions in bitrate. -quality X controls the amount of perceptual quantization allowed, where X ranges from 1 (lowest quality) to 99 (100=DCT disabled). Typical useable settings are roughly 12-99. The lowest useable setting is highly content dependent.

At larger block sizes, higher -effort X settings are recommended to allow the encoder to explore more of the XUASTC latent space (for higher quality).

The block size and the Weight Grid DCT quality setting are the primary quality vs. bitrate knobs available in XUASTC LDR.

Weight Grid DCT shapes distortion in the frequency domain (like JPEG), making pixel-wise error metrics (PSNR, MSE, RMSE) unreliable (also like JPEG). Perceptual metrics like SSIMULACRA 2 or PSNR-HVSM are recommended for quality evaluation.

Bounded/Windowed RDO

Note: this is sometimes called "lossy supercompression" in the library.

Both XUASTC and ASTC: Permits a small, controlled drop in block PSNR by reusing nearby ASTC configurations for relatively minor bitrate reductions. This type of RDO introduces perceptually unstructured distortion. Note only ASTC configuration and endpoints are reused from nearby blocks, not weight grids, so the current implementation is not particularly aggressive. Aggressive threshold settings risk introducing very noticeable block artifacts.

If the initial PSNR of the block is below a controllable threshold (-ls_min_psnr X etc.), no additional drop in PSNR is allowed.

The amount of PSNR reduction permitted for different block categories (edge vs. non-edge) can be controlled by the user or developer using the -ls_thresh_psnr X etc. options.

XUASTC specific note: when using the -quality parameter, windowed RDO mode is automatically enabled (along with Weight Grid DCT) unless the -xyd command line parameter is used.

Other features can also impact the rate vs. distortion tradeoff in a relatively minor and indirect way:

-xs, -xp: Disables 2/3 subset usage and dual plane usage during encoding. The primary intention of these options is to speed up direct XUASTC->BC7 transcoding (which only supports solid blocks or 1-subset ASTC configurations), but they also impact the overall rate-distortion behavior of each codec.
-effort X: Lower effort levels use less of the XUASTC latent space, which results in fewer overall bits used to encode each block. Conversely, higher effort levels allow more of the latent space to be explored, resulting in more overall bits per block.

WebGL KTX2 Encoding/Transcoding Testbed Specific Notes

In ASTC mode, Windowed RDO is enabled by default in the testbed. This is the opposite of the basisu command-line tool and C++ API defaults. You can disable it by unchecking “Bounded/windowed RDO lossy supercompression.”
In XUASTC mode, when “Use unified quality/effort options” is checked, Windowed RDO is enabled whenever the quality factor is < 100 and disabled when the quality factor is 100 (i.e. the "Bounded/windowed RDO lossy supercompression" checkbox is ignored).

To manually control Windowed RDO in XUASTC mode, uncheck the unified option and toggle Windowed RDO explicitly. We plan to simplify and better align these defaults in a future update.

Important Notes about ASTC Decode Profiles and Decode Mode Extensions

Note: The ASTC spec uses the phrase "Operation Mode" instead of "Decode Profiles", which is used here. Also see our page ASTC Decoding: Software Decoders, Spec Issues, and ARM Errata.

Importantly, how you sample the ASTC texture at run-time on the GPU, which decode mode extension you use on the GPU, which ASTC decode profile you use to unpack ASTC blocks with software decoders, and how you configure our ASTC/XUASTC encoder (linear vs. sRGB) all matter if you want maximum quality. This is unfortunately true for all ASTC encoders, not just ours. In our opinion this is a subtle weakness of the ASTC design and specification which makes using it in practice more brittle than it should be (but that ship has sailed).

Linear vs. sRGB ASTC Decode Profile

The ASTC/XUASTC encoder supports two ASTC decode profiles: sRGB (the default) and linear. These correspond to the -ts and -tl options in the command line tool (see below), and map directly to the ASTC LDR specification's sRGB and linear decode profiles.

For maximum decoded quality, it's important that the linear vs. sRGB setting specified at encode time always matches how the texture will be sampled by the GPU at run-time. For example in OpenGL, if you create a 4x4 LDR texture using the GL_COMPRESSED_RGBA_ASTC_4x4_KHR internal format, you should encode with -tl (linear) so the encoder's decoding matches how the GPU will be sampling it, and if you use GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR (which converts the texture sample from sRGB to linear upon sampling) you should encode with -ts (sRGB).
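
A build pipeline that already knows which OpenGL internal format a texture will be uploaded with can derive the encoder flag from it. A minimal sketch, assuming the standard KHR naming pattern for the other block sizes (e.g. GL_COMPRESSED_RGBA_ASTC_6x6_KHR); encoder_profile_flag is a hypothetical helper name.

```python
def encoder_profile_flag(gl_internal_format):
    # sRGB formats convert sRGB to linear on sampling, so encode with -ts.
    if gl_internal_format.startswith("GL_COMPRESSED_SRGB8_ALPHA8_ASTC_"):
        return "-ts"
    # Plain RGBA formats are sampled linearly, so encode with -tl.
    if gl_internal_format.startswith("GL_COMPRESSED_RGBA_ASTC_"):
        return "-tl"
    raise ValueError("not an ASTC LDR internal format: " + gl_internal_format)

print(encoder_profile_flag("GL_COMPRESSED_SRGB8_ALPHA8_ASTC_4x4_KHR"))
print(encoder_profile_flag("GL_COMPRESSED_RGBA_ASTC_4x4_KHR"))
```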

More technical detail on why this matters: The encoder's internal distortion estimates and error metrics themselves are purely plain channel weighted SSE/MSE (i.e. independent of linear vs. sRGB), but the decode profile impacts how the encoder's analysis-by-synthesis (AbS) stages decompress candidate ASTC blocks to texels before computing SSE/PSNR distortion metrics.

The decode profile selected during encoding will also impact how ASTC/XUASTC blocks will be unpacked to plain 32bpp RGBA texels if they have to be transcoded to another LDR format. This setting is stored in the KTX2 file's DFD, and XUASTC stores the exact linear vs. sRGB profile setting used by the encoder into each stream header to always ensure correct software transcoding/decoding.

The transcoder module ensures that the correct decode profile is used when it must decode ASTC/XUASTC blocks to plain texels for transcoding purposes, but we can't control how the developer will sample the texture with the GPU.

decode_unorm8 Decode Mode Extension

Our encoder optimizes for 8-bit precision (the upper 8 bits after the 16-bit ASTC interpolation) during candidate evaluation. This is the correct result for sRGB sampling (our default) and for linear sampling with the decode_unorm8 extension. If we instead optimized for full 16-bit linear precision, and the developer or engine decodes using sRGB sampling or enables decode_unorm8, both of which only use the upper 8 bits, we would be introducing more error into the bits that actually matter. The 8-bit optimization is the safer default because it minimizes error for the most common decode paths, and in our simulations the quality loss on the 16-bit linear path without the extension is typically under 0.05 dB for the most common weight precisions (essentially noise).

This impacts the linear decode profile only (not sRGB), when not using the decode_unorm8 extension: While decoding ASTC LDR blocks, we only utilize the upper 8 bits of the decoded (final interpolated) 16-bit channel values while computing weighted SSE or PSNR. The encoder assumes you will be using the ASTC decode mode extension (VK_EXT_astc_decode_mode in Vulkan, or GL_EXT_texture_compression_astc_decode_mode in OpenGL) to limit the decompressed ASTC LDR precision to UNORM8. This extension increases GPU texture cache utilization efficiency.

See the ASTC spec here:

If sRGB conversion is not enabled and the decode mode is decode_unorm8,
then the top 8 bits of the interpolation result for the R, G, B
and A channels are used as the final result.
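
To make "the top 8 bits of the interpolation result" concrete, here is a per-channel sketch of LDR interpolation as we understand the ASTC specification: 8-bit endpoints are expanded to 16 bits, blended with a weight in [0,64], and the upper byte is kept. The function names are our own; this is an illustration, not one of the decoders in the repository.

```python
def expand_endpoint(c8, srgb):
    # Linear profile: replicate the byte into the low 8 bits.
    # sRGB profile: the low 8 bits are set to 0x80.
    return (c8 << 8) | (0x80 if srgb else c8)

def interpolate16(e0, e1, weight):
    # 16-bit blend of the two expanded endpoints; weight is in [0, 64].
    return (e0 * (64 - weight) + e1 * weight + 32) >> 6

def decode_channel_unorm8(c0, c1, weight, srgb=False):
    v16 = interpolate16(expand_endpoint(c0, srgb), expand_endpoint(c1, srgb), weight)
    return v16 >> 8  # keep only the top 8 bits of the 16-bit result

print(decode_channel_unorm8(0, 255, 32))
print(decode_channel_unorm8(0, 255, 32, srgb=True))
```

An encoder that scores candidates against this 8-bit value matches what the GPU returns under sRGB sampling or decode_unorm8; the low byte of the 16-bit result only matters on the full-precision linear path.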

This doesn't apply to the sRGB profile (our default setting), which on GPUs only converts the upper 8 bits of the decoded output from sRGB to linear.

Unfortunately, as far as we know, not all ASTC devices actually support this extension, and some that do don't implement it correctly: there is a known ARM erratum related to this extension, documented in "Arm GPU Errata for Application Developers, Software Developer Errata Notice, v3.0, Date of issue: March 06, 2025", section "3922301 ASTC decompression incorrectly rounds linear color endpoints when using unorm8 decode mode".

This is actually a very tricky problem and we're still determining what's best to do and recommend overall. Currently, all ASTC LDR decoders in our repo follow the latest ASTC spec. There seems to be no action we can take that is correct across all deployed hardware, so following the spec seems the best thing for us to do long term.

In reality, ASTC is more fragile in actual production than BC1-7. The encoder shouldn't need to be intimately aware of how the texture will actually be sampled at run-time. Too much can go wrong.

To avoid these issues: Use sRGB sampling (which is our command line tool's default, and also the KTX2 WebGL testbed's default) when you can. The sRGB decode path is simpler, uses only the upper 8 bits on all hardware, is unaffected by the decode_unorm8 extension and errata, is the correct choice for color/albedo textures in a linear lighting pipeline, and is also correct in an sRGB image/photo display pipeline (because you want filtering to occur in linear light). Convert from linear to sRGB (as needed) after sampling.

For linear sampling (normal maps etc.), it's hard to recommend using the decode_unorm8 extension because so many devices don't implement it correctly.

All ASTC/XUASTC Related Command Line Options

Codec Mode and Block Size Setting

-ldr_4x4 through -ldr_12x12, or -astc_ldr_4x4 through -astc_ldr_12x12

Enables ASTC LDR 4x4 - 12x12 mode. The block size must be one of the 14 standard ASTC block sizes.

-ldr_4x4i through -ldr_12x12i, or -xuastc_ldr_4x4 through -xuastc_ldr_12x12

Enables XUASTC LDR 4x4 - 12x12 mode. The block size must be one of the 14 standard ASTC block sizes.

-quality X: The primary unified "quality" option (XUASTC only)

Enables lossy Weight Grid DCT and sets the DCT quality level [1,100]. Higher=better quality, but higher bitrate. Good values to try first are 30-90. Default is no Weight Grid DCT. 100=no DCT.

In XUASTC mode, windowed RDO will be automatically enabled if the quality factor is less than 100. Windowed RDO can be disabled using -xyd.

Effort and Profile Settings

-effort X: The primary unified "effort" option, which controls the encoding time vs. max achievable quality tradeoff. Must be in the range [0,10]. Higher values=slower but higher overall quality. Default=3, 0=Fastest, 9=Max Practical, 10=Insane.

The higher the effort setting, the more of the ASTC/XUASTC LDR latent space the encoder will explore. Higher settings are recommended on the larger block sizes for better quality. Higher effort settings result in noticeably higher quality when Weight Grid DCT is enabled. Effort 0 sacrifices a significant amount of quality.

-xuastc_arith, -xuastc_hybrid, -xuastc_zstd: Selects the entropy coding profile, which controls the transcoding speed vs. overall compression ratio tradeoff.

Zstd is fastest/lowest ratio, arith is slowest/highest ratio (3-18% better vs. Zstd).

Default is -xuastc_zstd (fastest, lowest ratio).

Currently, if a mipmap level has fewer than 64 blocks it always uses the arithmetic profile independent of this setting.

Linear vs. sRGB ASTC Decode Profile

Ideally, this parameter should exactly match how the developer will decode or sample the ASTC texture data on the GPU. If you use sRGB to linear sampling on the texture use -ts (the default), otherwise you probably should use -tl. These parameters only impact how candidate ASTC LDR blocks are decompressed for weighted SSE or PSNR calculations. (See above for more technical info.)

-ts: Use LDR sRGB ASTC decoding profile - the default. Inverse of -tl. Same as -srgb.
-tl: Use LDR Linear ASTC decoding profile (same as -linear). Inverse of -ts.

These options only change how the encoder evaluates candidates, which transfer function value is written to the KTX2's DFD, and which decode profile value (a single bit) is written into each mipmap's XUASTC LDR header.

Latent Space Constraints Settings

The XUASTC->BC7 transcoder supports a faster direct path to BC7 for 4x4, 6x6 and 8x6 block sizes, but only for solid color and single subset ASTC blocks. When these options are used the probability of the direct BC7 path being used increases. Disabling subsets and dual plane usage also speeds up encoding and can result in slightly better rate-distortion performance.

-xs: Force disable 2-3 subset usage in all effort levels
-xp: Force disable RGB dual plane usage in all effort levels

Encoder Channel Weights

-weights X Y Z W: Set unsigned integer channel error weights. Defaults are 1,1,1,1. Useful to favor certain channels during compression. Each weight must be in the range [1,256]. For alpha textures, setting the alpha channel weight above 1 can be quite useful.
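
As noted earlier, the encoder's distortion metric is channel-weighted SSE/MSE. The sketch below shows what that means for a block of RGBA texels. The PSNR normalization (dividing by the texel count times the sum of the weights) is our assumption for illustration; the library's exact normalization may differ.

```python
import math

def weighted_sse(orig, decoded, weights=(1, 1, 1, 1)):
    # orig/decoded: sequences of (R, G, B, A) tuples with 8-bit channel values.
    return sum(w * (a - b) ** 2
               for po, pd in zip(orig, decoded)
               for w, a, b in zip(weights, po, pd))

def weighted_psnr(orig, decoded, weights=(1, 1, 1, 1)):
    sse = weighted_sse(orig, decoded, weights)
    if sse == 0:
        return float("inf")
    # Assumed normalization: average over texels and total channel weight.
    mse = sse / (len(orig) * sum(weights))
    return 10.0 * math.log10(255.0 ** 2 / mse)

src = [(0, 0, 0, 0)]
dec = [(1, 2, 3, 4)]
print(weighted_sse(src, dec))                # equal weights
print(weighted_sse(src, dec, (1, 1, 1, 4)))  # alpha errors cost 4x more
```

Raising the alpha weight makes the same alpha error count for more, so candidates that preserve alpha are preferred over ones that preserve color.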

Windowed RDO Control

-xy: Enables windowed/bounded RDO for extra compression

Default is disabled in both ASTC and XUASTC modes, except for XUASTC when the unified -quality X option is used with a quality below 100, in which case windowed RDO is automatically enabled unless the -xyd option is specified.

-xyd: Disables windowed/bounded RDO (the default is disabled, unless Weight Grid DCT is enabled in XUASTC LDR mode at DCT quality levels < 100)
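
The interaction between -quality, -xy and -xyd can be summarized as a small predicate. This is a sketch of the rules as described in this guide, not the tool's actual option parser. In particular, what happens when both -xy and -xyd are given is not documented here, so letting -xyd win is our assumption.

```python
def windowed_rdo_enabled(xuastc, quality=None, xy=False, xyd=False):
    if xyd:
        return False   # explicit disable (assumed to take precedence over -xy)
    if xy:
        return True    # explicit enable, in either ASTC or XUASTC mode
    # Automatic enable: XUASTC only, and only when -quality is below 100.
    return bool(xuastc and quality is not None and quality < 100)

print(windowed_rdo_enabled(xuastc=True, quality=75))
print(windowed_rdo_enabled(xuastc=True, quality=75, xyd=True))
print(windowed_rdo_enabled(xuastc=False, xy=True))
```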

Windowed RDO Settings

-ls_min_psnr X: If an RGB block's PSNR is below this value, no changes can be made. Default is 35 dB.
-ls_min_alpha_psnr X: If an RGBA block's PSNR is below this value, no changes can be made. Default is 38 dB.
-ls_thresh_psnr X: Allow the non-edge RGB block's PSNR to fall at most by X dB. Default is 1.5 dB.
-ls_thresh_alpha_psnr X: Allow the non-edge RGBA block's PSNR to fall at most by X dB. Default is 0.75 dB.
-ls_thresh_edge_psnr X: Allow the edge RGB block's PSNR to fall at most by X dB. Default is 1.0 dB.
-ls_thresh_edge_alpha_psnr X: Allow the edge RGBA block's PSNR to fall at most by X dB. Default is 0.5 dB.
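
Taken together, these six settings define an acceptance test for each candidate reuse. The sketch below encodes that rule with the default values listed above; the function and dictionary key names are hypothetical, and the real encoder's logic may differ in detail.

```python
# Defaults from the option list above, in dB.
DEFAULT_WINDOW = {
    "min_psnr": 35.0, "min_alpha_psnr": 38.0,
    "thresh_psnr": 1.5, "thresh_alpha_psnr": 0.75,
    "thresh_edge_psnr": 1.0, "thresh_edge_alpha_psnr": 0.5,
}

def rdo_candidate_allowed(initial_psnr, candidate_psnr,
                          is_edge=False, has_alpha=False, window=DEFAULT_WINDOW):
    # Blocks that already start below the floor may not be degraded further.
    floor = window["min_alpha_psnr"] if has_alpha else window["min_psnr"]
    if initial_psnr < floor:
        return False
    # Pick the allowed drop for this block category (edge/non-edge, RGB/RGBA).
    key = "thresh" + ("_edge" if is_edge else "") + ("_alpha" if has_alpha else "") + "_psnr"
    return (initial_psnr - candidate_psnr) <= window[key]

print(rdo_candidate_allowed(40.0, 38.6))                # 1.4 dB drop, non-edge RGB
print(rdo_candidate_allowed(40.0, 38.6, is_edge=True))  # 1.4 dB drop, edge RGB
print(rdo_candidate_allowed(34.0, 34.0))                # starts below the 35 dB floor
```

Edge blocks get tighter limits than non-edge blocks, and RGBA blocks get tighter limits than RGB blocks, which is why the second call is rejected while the first is accepted.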

Experimental Settings

-xuastc_blurring: Also consider several pre-Gaussian blurred latent candidates of each input block during compression. Very slow, but higher quality. Not well tested and currently experimental. (But not that experimental: Our ASTC/UASTC HDR 6x6 encoder uses this technique by default on the very hardest blocks.)

Basic Codec Testing

-test_xuastc_ldr: Performs a basic test of the encoder and transcoder. Run in the repo's bin subdirectory. Uses test .PNG files from the test_files subdirectory.

C++ Encoder Parameters

Also see our Pure C API - Compression reference, which is a simple wrapper on top of the (harder to use, but more powerful) C++ API.

The following struct basis_compressor_params members in encoder/basisu_comp.h directly control ASTC/XUASTC specific options:

Format and Block Size - Unified Options

Call set_format_mode_and_quality_effort() with basist::basis_tex_format::cASTC_LDR_4x4 etc. or basist::basis_tex_format::cXUASTC_LDR_4x4 etc. to enable ASTC LDR or XUASTC LDR mode. The quality and effort parameters control the encoder's unified quality and effort parameters. In XUASTC mode, Weight Grid DCT and windowed RDO are automatically enabled if the quality level is less than 100, and disabled otherwise.

This is the recommended API to control the mode, quality and effort levels.

Format and Block Size - Lower Level Control

Call set_format_mode() to set the mode only, and then manually set the m_quality_level and m_xuastc_ldr_effort_level members. Alternatively, call set_format_mode_and_effort() to set the mode and effort level in a single call, then manually set m_quality_level.

Enable Weight Grid DCT by setting m_xuastc_ldr_use_dct to true. The quality level must be less than 100 to fully enable Weight Grid DCT.

Enable windowed RDO mode by setting m_xuastc_ldr_use_lossy_supercompression to true.

Profile Control

m_xuastc_ldr_syntax sets the profile: basist::astc_ldr_t::xuastc_ldr_syntax::cFullZstd, etc. See transcoder/basisu_transcoder_internal.h.

The default profile is Zstd, except for mipmap levels with fewer than 64 blocks, which always use the arithmetic profile.

Latent Space Control

m_xuastc_ldr_force_disable_subsets: disables 2/3 subset usage.
m_xuastc_ldr_force_disable_rgb_dual_plane: disables dual plane usage.

RGBA Channel Weights

m_xuastc_ldr_channel_weights[4]: controls the encoder's RGBA channel weights.

Windowed RDO Control

m_xuastc_ldr_use_lossy_supercompression: Enables or disables windowed/bounded RDO. (Beware that set_format_mode_and_quality_effort() may override this setting automatically.) Note: Importantly, despite the member name, this does not control Weight Grid DCT.
m_ls_min_psnr and m_ls_min_alpha_psnr: Blocks below these PSNRs won't be modified by windowed RDO (the "alpha" variants are for blocks using alpha).
m_ls_thresh_psnr, m_ls_thresh_alpha_psnr: The amount a non-edge block's PSNR is allowed to decrease.
m_ls_thresh_edge_psnr, m_ls_thresh_edge_alpha_psnr: The amount an edge block's PSNR is allowed to decrease.

Transcoder Decode Flags

The ASTC LDR transcoder (which fully supports the entire ASTC format, not just the XUASTC subset) and the XUASTC LDR transcoders can have their default behaviors overridden by passing in "decode flags". The following transcoder decoder flags (enum basisu_decode_flags, defined in transcoder/basisu_transcoder.h) control the ASTC/XUASTC transcoder's behavior:

cDecodeFlagsTranscodeAlphaDataToOpaqueFormats: Transcode the alpha channel to the output instead of RGB.
cDecodeFlagsHighQuality: Prefer higher quality real-time block encoders. For BC7F, this disables fully analytical mode. It also disables direct BC7 transcoding (which can be disabled on its own via the cDecodeFlagXUASTCLDRDisableFastBC7Transcoding flag).
cDecodeFlagsNoDeblockFiltering: Disable all adaptive deblocking (faster, less temporary memory on large block sizes when transcoding to other formats/raw pixels)
cDecodeFlagsStrongerDeblockFiltering: Use stronger deblocking filter coefficients
cDecodeFlagsForceDeblockFiltering: Always use deblock filter, even on small/medium block sizes (by default only large block sizes are deblocked).
cDecodeFlagXUASTCLDRDisableFastBC7Transcoding: Disable direct XUASTC->BC7 block transcoding (slightly higher quality, but slower)

These bit flags are passed into the decode_flags parameter of the C++ transcoding APIs: see ktx2_transcoder::transcode_image_level(), basisu_transcoder::transcode_image_level(), or basisu_transcoder::transcode_slice().

These same decode flags are also available to JavaScript users: see transcode_uastc_image2() or transcode_uastc_image() in webgl/transcoder/basis_wrappers.cpp.

They are also available via the pure C API: See the bt_ktx2_transcode_image_level() function in encoder/basisu_wasm_transcoder_api.h, parameter decode_flags.

With XUASTC LDR 4x4 specifically, if you're targeting a non-ASTC format such as BC7, you can use very low Weight Grid DCT quality factors (1-15) if you force adaptive deblocking on all block sizes (which includes 4x4) and also enable stronger deblocking. (Deblocking permits the very lowest bitrates, or the largest block sizes, to become usable, but is currently only available when transcoding to a format other than ASTC.)
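The flag combination described above is just a bitwise OR. The sketch below uses a local enum with placeholder bit values so it stands alone; in real code, include transcoder/basisu_transcoder.h and OR together the actual cDecodeFlags* values instead. The consistency check is an extra sanity test added for this sketch, since this guide doesn't say which flag wins if deblocking is both disabled and forced.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder bit values for illustration only. They are NOT the real
// values from enum basisu_decode_flags.
enum decode_flags_sketch : uint32_t
{
    kNoDeblockFiltering       = 1u << 0, // cf. cDecodeFlagsNoDeblockFiltering
    kStrongerDeblockFiltering = 1u << 1, // cf. cDecodeFlagsStrongerDeblockFiltering
    kForceDeblockFiltering    = 1u << 2  // cf. cDecodeFlagsForceDeblockFiltering
};

// Flags for very low quality XUASTC LDR 4x4 transcoded to e.g. BC7:
// deblock every block size, with stronger filter coefficients.
inline uint32_t low_bitrate_bc7_flags()
{
    return kForceDeblockFiltering | kStrongerDeblockFiltering;
}

// Sanity check: asking to disable and to force deblocking at once is
// contradictory, so this sketch rejects that combination up front.
inline bool flags_are_consistent(uint32_t flags)
{
    const bool disabled = (flags & kNoDeblockFiltering) != 0;
    const bool forced = (flags & kForceDeblockFiltering) != 0;
    return !(disabled && forced);
}
```

The resulting value is what you would pass as the decode_flags parameter of the transcoding call.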

Supported LDR Texture Formats (Transcode Targets) and Format-Specific Flags/Options

transcoder_texture_format::cTFETC1_RGB
Supports transcoding RGB or alpha to ETC1 using the cDecodeFlagsTranscodeAlphaDataToOpaqueFormats flag.
transcoder_texture_format::cTFETC2_RGBA
Always packs RGBA to output.
cDecodeFlagsHighQuality flag supported for slightly higher quality.
transcoder_texture_format::cTFBC1_RGB
Always packs RGB to output (alpha ignored).
cDecodeFlagsHighQuality flag supported for slightly higher quality.
transcoder_texture_format::cTFBC3_RGBA
Always packs RGBA to output.
cDecodeFlagsHighQuality flag supported for slightly higher quality.
transcoder_texture_format::cTFBC4_R
Packs R channel by default (can override using channel0 transcoder parameter).
transcoder_texture_format::cTFBC5_RG
Packs R,A channels by default (can override using channel0/channel1 transcoder parameters).
transcoder_texture_format::cTFBC7_RGBA
Always packs RGB or RGBA to output (automatically determined per output block).
Default is fully analytical encoding. cDecodeFlagsHighQuality flag supported for slightly higher quality (partially analytical).
transcoder_texture_format::cTFPVRTC1_4_RGB, transcoder_texture_format::cTFPVRTC1_4_RGBA
Important: The texture dimensions MUST be a power of 2.
Low quality fallback: prefer any other GPU texture format if available.
Transcoder must temporarily allocate a buffer of 4*num_dst_blocks_x*num_dst_blocks_y bytes on the heap to transcode the output.
Supports transcoding RGB or alpha to PVRTC1 using the cDecodeFlagsTranscodeAlphaDataToOpaqueFormats flag.
transcoder_texture_format::cTFASTC_LDR_4x4_RGBA through transcoder_texture_format::cTFASTC_LDR_12x12_RGBA (all 14 ASTC block sizes)
Important: The transcode texture format's ASTC block size MUST match the source file's block size.
transcoder_texture_format::cTFETC2_EAC_R11
Packs R channel by default (can override using channel0 transcoder parameter).
cDecodeFlagsHighQuality flag supported for slightly higher quality (much slower transcoding).
transcoder_texture_format::cTFETC2_EAC_RG11
Packs R,A channels by default (can override using channel0/channel1 transcoder parameters).
cDecodeFlagsHighQuality flag supported for slightly higher quality (much slower transcoding).

Plain pixel formats:

transcoder_texture_format::cTFRGBA32, transcoder_texture_format::cTFRGB565, transcoder_texture_format::cTFBGR565, transcoder_texture_format::cTFRGBA4444
Just plain quantization, no dithering, no resizing.
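For the PVRTC1 targets listed above, the two constraints (power of 2 dimensions and the temporary heap buffer) can be checked ahead of time. A minimal sketch, assuming PVRTC1 4bpp's 4x4 texel blocks; the helper names are hypothetical and not part of the transcoder API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if v is a non-zero power of 2.
inline bool is_pow2(uint32_t v)
{
    return (v != 0) && ((v & (v - 1)) == 0);
}

// Temporary heap usage when transcoding to PVRTC1 4bpp:
// 4 * num_dst_blocks_x * num_dst_blocks_y bytes, with 4x4 texel blocks.
inline size_t pvrtc1_scratch_bytes(uint32_t width, uint32_t height)
{
    assert(is_pow2(width) && is_pow2(height)); // PVRTC1 requires power of 2 dimensions

    const size_t num_dst_blocks_x = (width + 3) / 4;
    const size_t num_dst_blocks_y = (height + 3) / 4;
    return 4 * num_dst_blocks_x * num_dst_blocks_y;
}
```

For a 1024x1024 texture this works out to 4 * 256 * 256 = 262144 bytes (256 KiB) of temporary memory.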

Advanced Low-Level Compressor and Transcoder Usage

These are undocumented and "unofficial" APIs, but they are available to advanced users who want (or need) to bypass all .ktx2/.basis logic entirely and do their own thing with their own custom container formats.

The low-level ASTC/XUASTC compressor's definitions are located in encoder/basisu_astc_ldr_encode.h. struct basisu::astc_ldr::astc_ldr_encode_config handles compressor configuration, basisu::astc_ldr::compress_image() compresses to XUASTC/ASTC. Most of the parameters are the same ones passed into the high-level compressor API or command line tool.

The compressor function returns the compressed data in a dynamic byte array, along with a 2D array of logical ASTC blocks, which can be easily packed to physical (standard 128-bit) ASTC blocks using helpers in transcoder/astc_helpers.h. See function astc_helpers::pack_astc_block().

Low-level ASTC/XUASTC transcoding is entirely file format independent. See basist::basisu_lowlevel_xuastc_ldr_transcoder in transcoder/basisu_transcoder.h.

For even lower level XUASTC-only decompression to logical ASTC blocks using a simple callback API, see the xuastc_ldr_decompress_image() function.

Note we may change these functions at any time, but odds are they'll remain fairly stable even as new codecs are added to the system.

More Tips and Low-Level System Notes

This section documents low-level encoder behavior, tradeoffs, and intentional design decisions.

About the various primary system "knobs":

The block size is the overall "VRAM size vs. detail preservation/fuzziness/blockiness/bitrate range/transcoding speed" knob. It sets the bitrate ceiling.
The Weight Grid DCT -quality factor is the "spectral distortion knob", and is fairly block size independent. Lower quality factors transcode faster due to fewer non-zero AC coefficients, which also results in faster IDCTs.
The [1,100] quality level values are roughly libjpeg-ish, because quite similar math is used internally to create coefficient quantization matrices (excluding ASTC-specific changes, because the DCT is applied to normalized weight values quantized to various weight BISE levels, not to 8-bit pixels). Levels beyond 90 are very high quality, levels below ~20 are very low quality, and levels between 50-90 are likely the sweet spot.
The Profile (Zstd, arithmetic, or hybrid) is the "bitrate/transcoding speed" knob. Output quality is independent of profile (it only impacts entropy coding efficiency).
The -effort level impacts compression speed by trading off maximum achievable quality, ringing prevention, and overall sharpness.
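The "bitrate ceiling" set by the block size follows directly from the ASTC format: every physical block is 128 bits regardless of its footprint, so the rate before supercompression is 128 / (block width * block height) bits per texel. A small sketch that tabulates this for the 14 2D block sizes (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdio>

struct astc_block_dims { int w, h; };

// The 14 2D ASTC block sizes.
static const astc_block_dims k_astc_2d_block_sizes[] = {
    {4, 4}, {5, 4}, {5, 5}, {6, 5}, {6, 6}, {8, 5}, {8, 6},
    {10, 5}, {10, 6}, {8, 8}, {10, 8}, {10, 10}, {12, 10}, {12, 12}
};

// Every ASTC block occupies 128 bits, whatever its footprint.
inline double astc_bits_per_texel(astc_block_dims d)
{
    assert(d.w > 0 && d.h > 0);
    return 128.0 / (d.w * d.h);
}

// Prints one line per block size, e.g. "4x4: 8.00 bpp".
inline void print_astc_bitrate_table()
{
    for (const astc_block_dims& d : k_astc_2d_block_sizes)
        std::printf("%dx%d: %.2f bpp\n", d.w, d.h, astc_bits_per_texel(d));
}
```

This gives 8 bpp at 4x4, 2 bpp at 8x8, and about 0.89 bpp at 12x12. The supercompressed XUASTC LDR file size then depends on the other knobs.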

Block Size & Content Characteristics

The larger the block, the more effective DCT can be. 4x4 block size is least effective. Small blocks raise the minimum representable spatial frequency, forcing even smooth signals to project onto many DCT coefficients, resulting in less energy compaction and making quantization expensive.
At the individual block level, XUASTC/ASTC excels on smooth gradients (due to upsampled weight grids and dual plane support). Conversely, on blocks with highly complex, high frequency chroma variation it is weakest, because multiple subsets place higher bitrate pressure on the rest of the block's metadata.
Textures with very complex alpha channels look better with smaller block sizes, even if the RGB content isn't very demanding.
Block edges and other high frequency details can be represented by partition patterns at the texel level, and/or by full-res or upsampled weight grids. The AbS encoder will try to pick the XUASTC+DCT configuration that minimizes overall error, ringing, etc. The higher the -effort level, the more effective this process is.
The larger the block size (especially beyond ~8x6), the more the encoder has to rely on subsets and partition patterns to preserve edges/structure. Very large block sizes will have fuzzier high frequency details when they can't be represented using a partition pattern.

DCT & Weight Grids

The DCT is applied only to each plane's weight grids, either within the encoder (to estimate SSE/distortion introduced by smaller weight grids relative to the block size), or during transcoding to unpack weights. It is never applied to raw pixels.
Weight grids are not reused across blocks or predicted against, allowing Weight Grid DCT to be implemented using floating point or integer math. The current implementation uses floating point math.
The current floating point IDCT could be greatly improved for various dimensions, even without SIMD, using more advanced techniques.

Lossless Guarantees

Per-block CEM encoded endpoints are losslessly encoded, unless optional windowed/bounded RDO is enabled.
Weight Grid DPCM is always lossless in the current encoder.
When DCT is disabled and windowed RDO is disabled, the supercompression (profile encoder) stage is always purely lossless relative to the input XUASTC LDR data.
Setting "Min Acceptable PSNR" and "Min Acceptable Alpha PSNR" in the WebGL KTX2 testbed (equivalent to -ls_min_psnr and -ls_min_alpha_psnr) to 95.0 dB each essentially disables windowed RDO (because no block with lower than 95.0 dB PSNR can be modified, which is a very high bar for ASTC LDR).
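To get a feel for why 95.0 dB is such a high bar, the sketch below converts between PSNR and mean squared error. It assumes the conventional 8-bit definition, PSNR = 10 * log10(255^2 / MSE); this guide doesn't spell out exactly how the encoder computes per-block PSNR, so treat the numbers as indicative only.

```cpp
#include <cassert>
#include <cmath>

// Conventional PSNR in dB for a given mean squared error (peak = 255 for 8-bit data).
inline double psnr_from_mse(double mse, double peak = 255.0)
{
    assert(mse > 0.0);
    return 10.0 * std::log10((peak * peak) / mse);
}

// Inverse: the mean squared error that corresponds to a given PSNR.
inline double mse_from_psnr(double psnr_db, double peak = 255.0)
{
    return (peak * peak) / std::pow(10.0, psnr_db / 10.0);
}
```

Under this definition, a 4x4 RGB block (48 components) in which a single component is off by one has an MSE of 1/48, or about 64.9 dB. Reaching 95 dB would need an MSE of roughly 0.00002, so very few lossy ASTC blocks would qualify for modification.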

Validation & Correctness

The encoder is paranoid:
- It currently always validates the candidate XUASTC solutions before the profile's backend compressor creates the final bitstream.
- Packed (standard) physical ASTC blocks are decoded and validated using three separate physical block decoders at compression time: Our generic decoder, our XUASTC LDR-only decoder, and an open source 3rd party decoder from the Android testing framework. Mismatches (which should never happen unless there's a regression) immediately cause the compressor to fail with an error. These ASTC block decoders and our block encoders were fuzzed vs. ARM's reference decoder in astcenc.
Compressed XUASTC LDR streams have a small marker at the end of the compressed mipmap to detect decode desyncs. Markers at the beginning of each stream are used to ensure correctness before decoding the entire stream.
The encoder is fuzzed with artificially generated test inputs. All three entropy profiles of the transcoder, and the physical block ASTC decoder, have also been fuzzed.

Transcoding & Performance

The larger the block size, the faster transcoding goes: fewer symbols, less XUASTC metadata, fewer weight grids, fewer endpoints etc.
Transcoding performance can radically differ on modern mobile CPUs vs. modern x64 desktop CPUs. Our optimization priorities have been plain WASM (non-SIMD) and mobile CPUs first, followed by desktop CPUs.
All transcoders are thread safe at the .KTX2/.basis file format level, i.e. you can transcode different textures in parallel. Different XUASTC LDR mipmaps can be transcoded in parallel too (once loaded there is no mutable shared state to transcode XUASTC LDR mipmaps). We'll be documenting this more and shipping a sample in the future.

Determinism & Threading

We've invested development time ensuring the encoder is deterministic when threading is enabled (which is tricky because, to prevent constant slow heap reallocations, it can reuse temporary basic encoder structures across threads). The encoder uses a lot of floating point math, which may be optimized differently by various compilers/compiler settings. If fully deterministic compression across compilers, compiler optimization settings, and platforms is really important to your use case, use the WASM WASI single threaded build in bin/basisu_st.wasm with Wasmtime.

Future Improvements

The XUASTC LDR encoder is a conservative design. 10-15% compression gains are possible with more work on joint optimization, smarter bitstream compression, smarter choosing between DPCM vs. DCT, etc. without changing the format.
The block encoder won't consider L or L/A CEMs unless R=G=B. Also, alpha CEMs will always be used if any texel in the block has A<255, which isn't always optimal but simplified the encoder. Similarly, many of the search heuristics used to speed up compression can be improved.
RDO ASTC could be greatly improved by adding more aggressive searching/reuse, but it's not been the priority. An RDO ASTC codec that uses preprocessing techniques (already popular in the BC1-BC7 world), by splitting the ASTC bitstream into multiple independent sections, is also possible.
