Transcoder-Internals-Analytical-Real-Time-Encoders

This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.

Several low-level, portable (no SIMD required), thread-safe, real-time analytical/predictive GPU texture block encoders are available in the single-file .cpp library transcoder module, transcoder/basisu_transcoder.cpp. Three of them (bc7f, etc1f, and bc6hf) run in strictly bounded O(1) time per block.

These encoders are used internally by various transcoders but are also available for external use. Over time we'll ship examples and expose these encoder APIs via the C API and to Python/JavaScript users.

The transcoder module MUST be initialized first (by calling basist::basisu_transcoder_init()) before using these block encoding functions.

Note: We may change these currently internal APIs at any time, but the core functionality will remain.

bc6hf

void basist::astc_6x6_hdr::fast_encode_bc6h(const basist::half_float* pPixels, basist::bc6h_block* pBlock, const fast_bc6h_params &params);

Encodes a 4x4 block of positive RGB FP16 half-float values to 1- or 2-subset BC6H (unsigned variant) using the bc6hf encoder. Supports all BC6H modes. Accepts arbitrary input blocks and is used as a fallback when transcoding ASTC HDR 6x6 content to BC6H.
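Because bc6hf targets the unsigned BC6H variant and expects positive input, a caller holding arbitrary FP16 data may want to sanitize it first. The sketch below is a hypothetical helper, not part of basist. It assumes half floats are stored as raw 16-bit IEEE binary16 patterns and picks one possible policy: negatives and NaNs become +0, and +Inf becomes the largest finite half (0x7BFF).

```cpp
#include <cstdint>

// Hypothetical helper (not part of basist). Assumes a half float is a raw
// IEEE 754 binary16 bit pattern held in a uint16_t.
// Policy: negatives (incl. -0, -Inf) -> +0, NaN -> +0, +Inf -> 0x7BFF (65504.0).
uint16_t sanitize_half_for_bc6h_unsigned(uint16_t h)
{
    if (h & 0x8000)                        // sign bit set: not representable in unsigned BC6H
        return 0;
    if ((h & 0x7C00) == 0x7C00)            // exponent all ones: +Inf or NaN
        return (h & 0x03FF) ? 0 : 0x7BFF;  // NaN -> 0, +Inf -> largest finite half
    return h;                              // finite and non-negative: unchanged
}

// Sanitizes one 4x4 block of RGB half floats (16 texels x 3 channels) in place.
void sanitize_rgb_half_block(uint16_t texels[16 * 3])
{
    for (int i = 0; i < 16 * 3; ++i)
        texels[i] = sanitize_half_for_bc6h_unsigned(texels[i]);
}
```

Mapping +Inf to 0x7BFF rather than 0 keeps very bright texels bright; other policies may suit your content better.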

bc7f

uint32_t basist::bc7f::fast_pack_bc7_auto_rgba(uint8_t* pBlock, const color_rgba* pPixels, uint32_t flags);

Encodes a 4x4 block of LDR/SDR RGBA pixel values to BC7 using the bc7f encoder, a follow-up to our popular bc7e.ispc encoder. The entire BC7 format is supported. See the bc7f flags in this namespace for low-level control over the encoder. bc7f supports a fully analytical mode and a slower, partially analytical mode with higher PSNR (see the flags).

bc7f is several times faster than our bc7e.ispc encoder at level 1, with a slightly lower average PSNR. In quality terms bc7f is roughly like bc7e.ispc at level ~0.8, but it is dramatically faster because it's analytical/predictive. It uses closed-form expressions to very rapidly estimate endpoint and weight MSE for each BC7 mode, based on elementary block statistics (block covariance, per-channel variances, channel-pair Pearson correlation coefficients, etc.) that must be computed anyway or are dirt cheap to compute. It also heavily exploits divergence between blocks (taking different code paths for different blocks), something bc7e.ispc can't do because of its SIMD execution model. bc7f is a "one-shot" encoder.
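To make "elementary block statistics" concrete, here is a minimal sketch that computes per-channel means, the 4x4 RGBA covariance matrix (whose diagonal holds the per-channel variances), and channel-pair Pearson correlation coefficients for one 4x4 block. The `rgba8` and `block_stats` types and the `compute_block_stats` function are hypothetical and are not bc7f's internals.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical types and function for illustration; not bc7f's internals.
struct rgba8 { uint8_t c[4]; };

struct block_stats
{
    double mean[4];
    double cov[4][4];   // population covariance; cov[i][i] is channel i's variance
    double corr[4][4];  // Pearson correlation coefficient for each channel pair
};

block_stats compute_block_stats(const rgba8 px[16])
{
    block_stats s{};

    // Per-channel means.
    for (int i = 0; i < 16; ++i)
        for (int c = 0; c < 4; ++c)
            s.mean[c] += px[i].c[c] / 16.0;

    // Covariance matrix (divides by N = 16, not N - 1).
    for (int i = 0; i < 16; ++i)
        for (int a = 0; a < 4; ++a)
            for (int b = 0; b < 4; ++b)
                s.cov[a][b] += (px[i].c[a] - s.mean[a]) * (px[i].c[b] - s.mean[b]) / 16.0;

    // Pearson correlation: cov(a, b) / sqrt(var(a) * var(b)).
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b)
        {
            const double denom = std::sqrt(s.cov[a][a] * s.cov[b][b]);
            // Undefined if either channel is constant; this sketch uses 1 on the diagonal, 0 elsewhere.
            s.corr[a][b] = (denom > 0.0) ? s.cov[a][b] / denom : (a == b ? 1.0 : 0.0);
        }
    return s;
}
```

All of this is a handful of multiply-adds over 16 texels. As one example of how such numbers can guide a BC7 encoder, a channel that is only weakly correlated with the others is a natural candidate for BC7's separate dual-plane weights; how bc7f actually combines these statistics is not shown here.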

Note that PSNR isn't the whole story. Unlike bc7e.ispc level 1, bc7f heavily exploits the entire BC7 format: all modes, all dual-plane channels, all mode 4/5 index options, and all partition patterns. bc7f works fine without SIMD, but the code contains optional SIMD optimizations, which are currently disabled.

Also available is a fast but low-quality (brittle) encoder that runs bc7f in mode-6-only mode and transcodes the results to ASTC 4x4 LDR: void fast_pack_astc(void* pBlock, const color_rgba* pPixels). It's currently disabled in the code because we don't use it for anything, but it's there if you want to experiment with it. There are also optional SIMD (SSE 4.1) optimizations checked in, but they are only ~10% faster and not well tested yet.

Also see the blog post "bc7f: Prediction, Not Search" for more mathematical details. A copy of the post follows:

bc7f: A New Real-Time Analytical BC7 Encoder
bc7f: Prediction, Not Search

The portable, non-SIMD bc7f encoder relies on an analytical, statistics-driven error model rather than iterative search. This full-featured (all BC7 modes, all mode features, all dual-plane channels, all partition patterns), strictly bounded O(1) real-time encoder uses simple closed-form expressions to predict which BC7 mode family (4/5, 0/2, 1/3/7, or 6) is worth considering. It then estimates the block’s SSE/MSE for each candidate using lightweight block statistics derived from covariance analysis, together with the mode’s weight and endpoint quantization characteristics. All of this happens before any BC7 mode is encoded. In purely analytical mode, bc7f predicts the best configuration, encodes the input using that single BC7 mode configuration (without any decoding or error measurement), and returns.

BC7 block decoding is an affine interpolation between quantized endpoints using quantized weights, which allows first-order error propagation to be modeled directly. For a given block, the encoder computes basic statistics such as the covariance of the input texels; the principal axis derived from the covariance is used both for endpoint fitting and to estimate the orthogonal least-squares (“line fit”) residual error as trace(covariance) − λ₁. Quantization noise from endpoints and weights is modeled independently using uniform quantization assumptions, with endpoint error contributing an additive term and weight/index error contributing a span-dependent term proportional to the squared endpoint distance. These closed-form estimates are sufficient to predict relative SSE across BC7 mode families, partitions, and dual-plane configurations without trial encodes. As a result, bc7f can select parameters and emit a single BC7 block in strictly bounded time, producing deterministic, high-quality results without brute-force search or refinement.
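The error model above can be sketched numerically. The hypothetical function below (RGB only; names and the exact quantization terms are assumptions, not bc7f's code) computes the covariance C, finds λ₁ with a few power iterations, takes trace(C) − λ₁ as the per-texel line-fit residual, and adds two uniform-quantization terms of the form step²/12: an additive endpoint term, and a weight term scaled by the squared span of the texels along the principal axis, used here as a stand-in for the squared endpoint distance.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical sketch of the closed-form error model (RGB only). Names and the
// exact quantization terms are illustrative assumptions, not bc7f's code.
struct rgb8 { uint8_t r, g, b; };

struct error_estimate
{
    double line_fit;   // trace(C) - lambda1: per-texel squared distance to the best-fit line
    double endpoint;   // additive endpoint quantization noise, per texel
    double weight;     // span-dependent weight quantization noise, per texel
    double total_sse() const { return 16.0 * (line_fit + endpoint + weight); }
};

error_estimate estimate_block_error(const rgb8 px[16], int endpoint_bits, int weight_bits)
{
    // 1. Mean and 3x3 (population) covariance matrix C of the 16 texels.
    double v[16][3], mean[3] = {0.0, 0.0, 0.0}, C[3][3] = {};
    for (int i = 0; i < 16; ++i)
    {
        v[i][0] = px[i].r; v[i][1] = px[i].g; v[i][2] = px[i].b;
        for (int c = 0; c < 3; ++c) mean[c] += v[i][c] / 16.0;
    }
    for (int i = 0; i < 16; ++i)
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b)
                C[a][b] += (v[i][a] - mean[a]) * (v[i][b] - mean[b]) / 16.0;

    // 2. Principal axis by power iteration, seeded with the covariance row of the
    //    highest-variance channel. A zero row means a solid block (axis stays zero).
    auto normalize = [](double* x) -> bool {
        const double len = std::sqrt(x[0] * x[0] + x[1] * x[1] + x[2] * x[2]);
        if (len == 0.0) return false;
        for (int j = 0; j < 3; ++j) x[j] /= len;
        return true;
    };
    int k = 0;
    for (int c = 1; c < 3; ++c) if (C[c][c] > C[k][k]) k = c;
    double axis[3] = {C[k][0], C[k][1], C[k][2]};
    if (normalize(axis))
        for (int iter = 0; iter < 8; ++iter)
        {
            double next[3];
            for (int a = 0; a < 3; ++a) next[a] = C[a][0] * axis[0] + C[a][1] * axis[1] + C[a][2] * axis[2];
            if (!normalize(next)) break;
            std::copy(next, next + 3, axis);
        }

    // 3. lambda1 as the Rayleigh quotient axis^T C axis (0 for a solid block).
    double lambda1 = 0.0;
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b) lambda1 += axis[a] * C[a][b] * axis[b];

    // 4. Span of the projections onto the axis. Projections are centered, so min <= 0 <= max.
    double lo = 0.0, hi = 0.0;
    for (int i = 0; i < 16; ++i)
    {
        double t = 0.0;
        for (int c = 0; c < 3; ++c) t += (v[i][c] - mean[c]) * axis[c];
        lo = std::min(lo, t);
        hi = std::max(hi, t);
    }
    const double span = hi - lo;

    // 5. Uniform quantization noise: a step of size d contributes d^2 / 12.
    const double ep_step = 255.0 / ((1 << endpoint_bits) - 1);  // endpoint step in 8-bit units
    const double w_step = 1.0 / ((1 << weight_bits) - 1);       // weight step on [0, 1]
    error_estimate e{};
    e.line_fit = std::max(0.0, C[0][0] + C[1][1] + C[2][2] - lambda1);
    e.endpoint = 3.0 * ep_step * ep_step / 12.0;                 // three channels
    e.weight = span * span * w_step * w_step / 12.0;
    return e;
}
```

Comparing `total_sse()` across candidate endpoint/weight bit depths is the kind of trial-encode-free prediction the text describes; a real BC7 encoder must also account for partitions, p-bits, and dual planes, which this sketch ignores.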

bc7f is significantly faster than bc7e.ispc level 1, but because it exploits the entire BC7 format, it isn’t as brittle. It's a “one-shot”, non-AbS (analysis-by-synthesis), but full-featured encoder. The follow-up, “bc7g”, is in the works, and it will be released as open source as well.

Binomial first developed these techniques for our full-featured (all block sizes) ASTC encoder, which is vastly more complex, and later used them to implement bc7f. We expect these predictive, analytical encoding techniques to be rapidly adopted.

etc1f

void basist::etc1f::pack_etc1_solid(uint8_t* pBlock, const color_rgba& color, pack_etc1_state& state, bool init_flag = false);
void pack_etc1(uint8_t* pBlock, const color_rgba* pPixels, pack_etc1_state& state);
void pack_etc1_grayscale(uint8_t* pBlock, const uint8_t* pPixels, pack_etc1_state& state);

Packs a 4x4 block of LDR/SDR RGBA pixel values to ETC1 using etc1f. etc1f is significantly faster and more portable than etcpak, with substantially higher visual quality, especially on smooth (low-variance) blocks. etc1f is also significantly faster than our previous ETC1 encoders, such as rg_etc1. etc1f uses the entire ETC1 format, including subblocks.
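A solid-color entry point like pack_etc1_solid has to map one color onto ETC1's base-color-plus-intensity-modifier model. The sketch below shows that search space with a hypothetical brute-force helper (`find_etc1_solid` is not part of basist). It assumes differential mode with a zero delta, so both subblocks share a 5-bit-per-channel base color; etc1f's actual approach may differ (for example, it may use precomputed tables), and packing the result into the 64-bit block is omitted.

```cpp
#include <algorithm>
#include <cstdint>

// Standard ETC1 intensity modifier tables. Row = table codeword (0..7); column = pixel
// index (msb << 1) | lsb, so indices 0..3 map to +a, +b, -a, -b.
static const int kEtc1Modifiers[8][4] = {
    {2, 8, -2, -8},     {5, 17, -5, -17},   {9, 29, -9, -29},     {13, 42, -13, -42},
    {18, 60, -18, -60}, {24, 80, -24, -80}, {33, 106, -33, -106}, {47, 183, -47, -183}};

// Hypothetical result type (not etc1f's): one solid-color ETC1 configuration.
struct etc1_solid_choice
{
    int base5[3];  // 5-bit base color for both subblocks (differential mode, delta = 0)
    int table;     // intensity table codeword, 0..7
    int selector;  // pixel index shared by all 16 texels, 0..3
    int error;     // squared error of one texel, summed over R, G, B
};

// Differential-mode 5-bit components expand to 8 bits by replicating the top bits.
inline int expand5(int v) { return (v << 3) | (v >> 2); }

// Brute force: for each (table, selector) pair, choose the best 5-bit base per channel.
etc1_solid_choice find_etc1_solid(uint8_t r, uint8_t g, uint8_t b)
{
    const int target[3] = {r, g, b};
    etc1_solid_choice best{{0, 0, 0}, 0, 0, 1 << 30};
    for (int t = 0; t < 8; ++t)
        for (int s = 0; s < 4; ++s)
        {
            etc1_solid_choice cur{{0, 0, 0}, t, s, 0};
            for (int c = 0; c < 3; ++c)
            {
                int best_err = 1 << 30;
                for (int v = 0; v < 32; ++v)
                {
                    // Decoded value: expanded base + modifier, clamped to [0, 255].
                    const int d = std::min(255, std::max(0, expand5(v) + kEtc1Modifiers[t][s]));
                    const int err = (d - target[c]) * (d - target[c]);
                    if (err < best_err) { best_err = err; cur.base5[c] = v; }
                }
                cur.error += best_err;
            }
            if (cur.error < best.error) best = cur;
        }
    return best;
}
```

For mid gray (128, 128, 128), for example, base 15 expands to 123, and table 1's +5 modifier reproduces 128 exactly. Smooth and solid blocks are where such exact or near-exact matches matter most visually.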

BC1-5

void basist::encode_bc1(void* pDst, const uint8_t* pPixels, uint32_t flags);
void basist::encode_bc1_alt(void* pDst, const uint8_t* pPixels, uint32_t flags);
void basist::encode_bc1_solid_block(void* pDst, uint32_t fr, uint32_t fg, uint32_t fb);
void basist::encode_bc4(void* pDst, const uint8_t* pPixels, uint32_t stride);

Fairly fast BC1 and BC4 block encoders. BC3 and BC5 are just variants of these. Note that the BC1-5 encoders in rgbcx.h were developed after these encoders and are more capable.
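BC4 stores one channel as two 8-bit endpoints plus sixteen 3-bit indices. BC3's alpha block uses the same layout, and BC5 is simply two BC4 blocks (red, then green), which is why they are variants of these encoders. The hypothetical `encode_bc4_minmax` below uses the simplest endpoint choice (block min/max); it is not basist::encode_bc4, whose endpoint selection may differ. A matching `decode_bc4` is included for round-trip checks, and the rounding of interpolated palette entries is an assumption (hardware decoders may differ slightly).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical min/max BC4 encoder and matching decoder (not basist::encode_bc4).
// Block layout: byte 0 = endpoint0, byte 1 = endpoint1, bytes 2..7 = sixteen 3-bit
// indices, little-endian, texel 0 in the lowest bits.

// Builds the 8-entry palette for the given endpoints.
static void bc4_palette(int pal[8], int e0, int e1)
{
    pal[0] = e0;
    pal[1] = e1;
    if (e0 > e1)
    {
        for (int i = 1; i <= 6; ++i) pal[i + 1] = ((7 - i) * e0 + i * e1 + 3) / 7;  // 8-value mode
    }
    else
    {
        for (int i = 1; i <= 4; ++i) pal[i + 1] = ((5 - i) * e0 + i * e1 + 2) / 5;  // 6-value mode
        pal[6] = 0;
        pal[7] = 255;
    }
}

void encode_bc4_minmax(uint8_t out[8], const uint8_t px[16])
{
    // endpoint0 = max, endpoint1 = min selects 8-value mode. If max == min, the decoder
    // uses 6-value mode, but palette entry 0 still equals the solid value.
    const int hi = *std::max_element(px, px + 16);
    const int lo = *std::min_element(px, px + 16);
    int pal[8];
    bc4_palette(pal, hi, lo);

    uint64_t bits = 0;
    for (int t = 0; t < 16; ++t)
    {
        int best = 0, best_err = 1 << 30;
        for (int j = 0; j < 8; ++j)
        {
            const int d = pal[j] - px[t];
            if (d * d < best_err) { best_err = d * d; best = j; }
        }
        bits |= uint64_t(best) << (3 * t);
    }
    out[0] = uint8_t(hi);
    out[1] = uint8_t(lo);
    for (int b = 0; b < 6; ++b) out[2 + b] = uint8_t(bits >> (8 * b));
}

void decode_bc4(uint8_t px[16], const uint8_t in[8])
{
    int pal[8];
    bc4_palette(pal, in[0], in[1]);
    uint64_t bits = 0;
    for (int b = 0; b < 6; ++b) bits |= uint64_t(in[2 + b]) << (8 * b);
    for (int t = 0; t < 16; ++t) px[t] = uint8_t(pal[(bits >> (3 * t)) & 7]);
}
```

Min/max endpoints are cheap but waste precision when a block has outliers; better encoders refine the endpoints, which is one reason newer encoders such as those in rgbcx.h can do better.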

ETC2 EAC (alpha portion)

static void basist::pack_eac(eac_block& blk, const uint8_t* pPixels, uint32_t stride);
static void basist::pack_eac_high_quality(eac_block& blk, const uint8_t* pPixels, uint32_t stride);

Encodes the alpha portion of ETC2 RGBA. You can use ETC1 for the color portion, since ETC1 blocks are valid ETC2 color blocks. These encoders are also useful for ETC2 EAC R11 and ETC2 EAC RG11 if you don't need more than 8-bit component precision. They are older and less modern than etc1f/bc7f (too search-based, only minimally analytical), and they aren't very fast.

PVRTC1 4bpp RGB/RGBA

void basist::encode_pvrtc1(block_format fmt, void* pDst_blocks, const basisu::vector2D<color32>& temp_image, uint32_t dst_num_blocks_x, uint32_t dst_num_blocks_y, bool from_alpha);

Fast but low-quality PVRTC1 4bpp encoder for the block_format::cPVRTC1_4_RGB and block_format::cPVRTC1_4_RGBA formats. Quality is good enough for basic uses, especially as a fallback. Input image dimensions must be square powers of 2.
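Before calling encode_pvrtc1, a caller can check the square, power-of-2 constraint and derive the block grid. The helper below is hypothetical (not part of basist); it assumes dst_num_blocks_x/y correspond to width/4 and height/4, uses the fact that a PVRTC1 4bpp block covers 4x4 texels in 8 bytes, and adds its own extra requirement of at least one full block.

```cpp
#include <cstdint>

// Hypothetical helper (not part of basist) that checks the input constraint stated above
// and derives the block grid. A PVRTC1 4bpp block covers 4x4 texels in 8 bytes.
struct pvrtc1_4bpp_layout
{
    bool valid;
    uint32_t blocks_x, blocks_y;  // assumed to correspond to dst_num_blocks_x / dst_num_blocks_y
    uint32_t size_in_bytes;
};

inline bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

pvrtc1_4bpp_layout get_pvrtc1_4bpp_layout(uint32_t width, uint32_t height)
{
    // Square, power-of-2 dimensions; requiring at least one full 4x4 block is an extra assumption.
    if (width != height || !is_pow2(width) || width < 4)
        return {false, 0, 0, 0};
    const uint32_t bx = width / 4, by = height / 4;
    return {true, bx, by, bx * by * 8};
}
```

For example, a 256x256 image gives a 64x64 block grid and 32768 bytes of output, while 256x128 or 100x100 inputs are rejected.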