This content was automatically converted from the project's wiki Markdown to HTML. See the Basis Universal GitHub wiki for the latest content.
Copyright (C) 2025-2026 Binomial LLC. All rights reserved except as granted under the Apache 2.0 LICENSE. If you modify the Basis Universal source code, specifications, or wiki documents and redistribute the files, you must cause any modified files to carry prominent notices stating that you changed the files (see Apache 2.0 §4(b)).
Note: "Range Coding" and "Arithmetic Coding" are used interchangeably in the XUASTC LDR specification.
This document provides a specification for implementing the range coder used by XUASTC LDR, focusing exclusively on decompression (decoding). The decoder is based on the range coding approach described by Amir Said in his Hewlett-Packard Laboratories technical report HPL-2004-76, "Introduction to Arithmetic Coding - Theory and Practice" (April 21, 2004), and on his open-source implementations on GitHub. (Also see the Lossless Compression Handbook, ed. Khalid Sayood, 2002.) This specification is derived directly from the independent implementation in the Basis Universal compression library, which closely follows Said's original design with extensions and minor improvements. A compatible encoder can be created by following Said's technical report and open-source code.
The decoder supports adaptive binary and multi-symbol models, as well as specialized encoding schemes like truncated binary, Rice, and Gamma codes. It uses 32-bit unsigned integer arithmetic to avoid overflows, with renormalization to maintain precision. The design assumes input streams of at least 5 bytes and handles adaptive model updates during decoding.
At a high level, the decoder works by maintaining a current range (defined by a base value and length) within a large integer space, progressively narrowing this range based on decoded symbols or bits. As the range shrinks below a minimum threshold, renormalization expands it by shifting in more bytes from the input stream. Adaptive models (binary or multi-symbol) are used to estimate probabilities, which guide how the range is subdivided for each decoding step. These models update their statistics after each decode to adapt to the data distribution, improving compression efficiency over time.
Specialized decoders like Gamma or Rice build on basic bit decoding but incorporate adaptive contexts or fixed structures for efficient representation of non-negative integers. The overall system is modular: the core decoder state interacts with models during adaptive decoding, and non-adaptive routines serve as building blocks for higher-level ones.
This spec is structured from the lowest level (constants and basic structures) to the highest level (specialized decoding methods).
These constants define the fundamental limits and scaling factors for
the entire decoder system. They ensure consistent behavior across models
and decoding operations, preventing overflows and setting thresholds for
renormalization. For example, shifts like DM_LEN_SHIFT and
BM_LEN_SHIFT determine the precision of probability scaling
in multi-symbol and binary models, respectively, while
ARITH_MIN_LEN triggers renormalization in the core decoder
to maintain numerical stability. These values are used throughout the
classes and routines to bound arrays, scale probabilities, and control
loop behaviors.
ARITH_MAX_SYMS: 2048 (maximum symbols in a multi-symbol
model).
DM_LEN_SHIFT: 15 (shift for multi-symbol probabilities;
DM_MAX_COUNT = 1 << 15 = 32768).
BM_LEN_SHIFT: 13 (shift for binary probabilities;
BM_MAX_COUNT = 1 << 13 = 8192).
ARITH_MIN_LEN: 1 << 24 = 16777216 (threshold for
renormalization).
ARITH_MAX_LEN: 4294967295 (UINT32_MAX, initial/full range
length).
ARITH_MIN_EXPECTED_DATA_BUF_SIZE: 5 (minimum input buffer
size; shorter streams are invalid).
MAX_PUT_BITS_LEN: 20 (maximum bits for direct multi-bit
decoding, as a safety limit).
ARITH_GAMMA_MAX_PREFIX_CTX: 3 (contexts for Gamma
prefix).
ARITH_GAMMA_MAX_TAIL_CTX: 4 (contexts for Gamma tail).
All arithmetic uses unsigned 32-bit integers (uint32_t) unless noted otherwise.
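The constants above can be collected into a single C header fragment. This is a sketch: the identifier spellings follow this document, not necessarily the actual Basis Universal headers.

```c
#include <stdint.h>

/* Decoder-wide constants from the specification above. */
enum {
    ARITH_MAX_SYMS = 2048,                /* max symbols per multi-symbol model */
    DM_LEN_SHIFT   = 15,                  /* multi-symbol probability precision */
    DM_MAX_COUNT   = 1 << DM_LEN_SHIFT,   /* 32768 */
    BM_LEN_SHIFT   = 13,                  /* binary probability precision */
    BM_MAX_COUNT   = 1 << BM_LEN_SHIFT,   /* 8192 */
    ARITH_MIN_LEN  = 1 << 24,             /* renormalization threshold */
    ARITH_MIN_EXPECTED_DATA_BUF_SIZE = 5, /* shorter streams are invalid */
    MAX_PUT_BITS_LEN = 20,                /* max bits per get_bits call */
    ARITH_GAMMA_MAX_PREFIX_CTX = 3,       /* Gamma prefix contexts */
    ARITH_GAMMA_MAX_TAIL_CTX   = 4        /* Gamma tail contexts */
};

/* ARITH_MAX_LEN is UINT32_MAX, which doesn't fit in a plain int enum. */
static const uint32_t ARITH_MAX_LEN = 0xFFFFFFFFu;
```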
These data structures form the backbone of adaptive decoding, storing probability estimates and counts that evolve as data is decoded. The binary model handles simple 0/1 decisions, the multi-symbol model manages larger alphabets, and gamma contexts provide specialized binary models for Gamma coding. They interact with the core decoder by providing probability splits for range subdivision during decoding, and they are updated post-decode to refine estimates. Initialization and reset routines prepare them for use, often called at the start of decoding or when switching contexts in an application.
Adaptive Binary Bit Model (arith_bit_model)

The binary model is an adaptive probability estimator for bits (0 or 1). It maintains live counts of observed bits and a snapshot probability used for decoding. During decoding, the core decoder queries the model's probability to split the range, decodes the bit based on where the current value falls, then updates the counts. If the update countdown reaches zero, the model rescales and recomputes its probability. This adaptation allows it to learn from the data stream, improving accuracy for skewed distributions. It interacts with the decoder's adaptive bit routine and is used in contexts like Gamma coding.
bit0_count: uint32_t, count of 0 bits (live counter).
bit_count: uint32_t, total bits (live counter).
bit0_prob: uint32_t, scaled probability of 0 (snapshot from last update).
bits_until_update: int, countdown to next update.
update_interval: uint32_t, interval between updates.

This routine resets the model to a uniform prior (equal chance for 0 and 1), setting initial counts and intervals. It's called when creating a new model or resetting statistics, ensuring a clean start before decoding begins.
function reset_binary_model(model):
model.bit0_count = 1
model.bit_count = 2
model.bit0_prob = 1 << (BM_LEN_SHIFT - 1) // 4096
model.update_interval = 4
model.bits_until_update = 4
This routine is invoked after a certain number of bits have been decoded (when the countdown hits zero). It checks for count overflow and halves them if needed to prevent saturation, then recomputes the probability snapshot using scaling. The interval is adjusted to grow over time, delaying updates as the model stabilizes. This interacts with the adaptive bit decoder, which decrements the countdown after each decode.
function update_binary_model(model):
if model.bit_count >= BM_MAX_COUNT:
model.bit_count = (model.bit_count + 1) >> 1
model.bit0_count = (model.bit0_count + 1) >> 1
if model.bit0_count == model.bit_count:
model.bit_count += 1
scale = 0x80000000U / model.bit_count
model.bit0_prob = (model.bit0_count * scale) >> (31 - BM_LEN_SHIFT)
model.update_interval = clamp((5 * model.update_interval) >> 2, 4, 128)
model.bits_until_update = model.update_interval
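The binary model routines above translate directly into C. The sketch below follows the pseudocode; the struct and function names are this document's, not necessarily the library's. Note that right after a reset, an update leaves bit0_prob at 4096 (the uniform prior), since bit0_count / bit_count = 1/2.

```c
#include <stdint.h>

#define BM_LEN_SHIFT 13
#define BM_MAX_COUNT (1u << BM_LEN_SHIFT)

typedef struct {
    uint32_t bit0_count;       /* live count of decoded 0 bits */
    uint32_t bit_count;        /* live total bit count */
    uint32_t bit0_prob;        /* snapshot probability of a 0 bit */
    int      bits_until_update;
    uint32_t update_interval;
} arith_bit_model;

static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

static void reset_binary_model(arith_bit_model *m) {
    m->bit0_count = 1;
    m->bit_count = 2;
    m->bit0_prob = 1u << (BM_LEN_SHIFT - 1); /* 4096: uniform prior */
    m->update_interval = 4;
    m->bits_until_update = 4;
}

static void update_binary_model(arith_bit_model *m) {
    if (m->bit_count >= BM_MAX_COUNT) {      /* rescale to avoid saturation */
        m->bit_count = (m->bit_count + 1) >> 1;
        m->bit0_count = (m->bit0_count + 1) >> 1;
        if (m->bit0_count == m->bit_count)
            m->bit_count += 1;
    }
    /* bit0_count * scale stays below 2^31, so uint32_t math is safe. */
    uint32_t scale = 0x80000000u / m->bit_count;
    m->bit0_prob = (m->bit0_count * scale) >> (31 - BM_LEN_SHIFT);
    m->update_interval = clamp_u32((5 * m->update_interval) >> 2, 4, 128);
    m->bits_until_update = (int)m->update_interval;
}
```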
Adaptive Multi-Symbol Data Model (arith_data_model)

The multi-symbol model estimates probabilities for an alphabet of up to 2048 symbols. It keeps live frequency counts and a snapshot of cumulative probabilities for efficient range splitting. During symbol decoding, the core decoder uses the cumulatives to find the symbol via binary search, then updates the frequency and total. Updates rescale frequencies on overflow and rebuild cumulatives. This model interacts with the adaptive symbol decoder, adapting to symbol distributions in the stream for better compression.
num_data_syms: uint32_t, number of symbols (0 if uninitialized).
total_sym_freq: uint32_t, sum of frequencies (live).
sym_freqs: array of uint32_t, frequencies per symbol (size num_data_syms, live histogram).
cum_sym_freqs: array of uint32_t, cumulative frequencies (size num_data_syms + 1, snapshot from last update).
update_interval: uint32_t, interval between updates.
num_syms_until_next_update: int, countdown to next update.

This sets up the model with a given number of symbols, allocates the arrays, and calls reset. It's the entry point for creating a model instance, preparing it for use in decoding.
function init_multi_model(model, num_syms, faster_update):
model.num_data_syms = num_syms
allocate model.sym_freqs[num_syms]
allocate model.cum_sym_freqs[num_syms + 1]
reset_multi_model(model, faster_update)
This fully clears the model's state, releasing arrays and resetting
to an uninitialized form. It's complementary to init for
object reuse.
function clear_multi_model(model):
clear model.cum_sym_freqs // e.g., resize to 0 or free
clear model.sym_freqs // e.g., resize to 0 or free
model.num_data_syms = 0
model.total_sym_freq = 0
model.update_interval = 0
model.num_syms_until_next_update = 0
This reinitializes frequencies to uniform (1 each), resets totals and
intervals, and triggers an initial update to build cumulatives. The
faster_update flag accelerates adaptation for small
alphabets. It's used to restart the model mid-stream or at the
beginning.
function reset_multi_model(model, faster_update):
if model.num_data_syms == 0:
return
for i = 0 to model.num_data_syms - 1:
model.sym_freqs[i] = 1
model.total_sym_freq = model.num_data_syms
model.update_interval = model.num_data_syms
model.num_syms_until_next_update = 0
update_multi_model(model)
if faster_update:
model.update_interval = clamp((model.num_data_syms + 7) / 8, 4, (model.num_data_syms + 6) << 3)
model.num_syms_until_next_update = model.update_interval
Invoked after decoding enough symbols (countdown zero), this halves frequencies on overflow, recomputes scaled cumulatives for probability mapping, and adjusts the update interval. It ensures the model doesn't saturate and adapts its update frequency.
function update_multi_model(model):
while model.total_sym_freq >= DM_MAX_COUNT:
model.total_sym_freq = 0
for n = 0 to model.num_data_syms - 1:
model.sym_freqs[n] = (model.sym_freqs[n] + 1) >> 1
model.total_sym_freq += model.sym_freqs[n]
scale = 0x80000000U / model.total_sym_freq
sum = 0
for i = 0 to model.num_data_syms - 1:
model.cum_sym_freqs[i] = (scale * sum) >> (31 - DM_LEN_SHIFT)
sum += model.sym_freqs[i]
model.cum_sym_freqs[model.num_data_syms] = DM_MAX_COUNT
model.update_interval = clamp((5 * model.update_interval) >> 2, 4, (model.num_data_syms + 6) << 3)
model.num_syms_until_next_update = model.update_interval
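The reset and update routines above can be sketched in C. This is a simplified illustration, not the library's code: it uses a small fixed-size array instead of heap allocation, and the update-interval growth/clamp logic is omitted (see the pseudocode above for the full behavior). For a freshly reset 4-symbol model, the rebuilt cumulative table is {0, 8192, 16384, 24576, 32768}.

```c
#include <stdint.h>

#define DM_LEN_SHIFT 15
#define DM_MAX_COUNT (1u << DM_LEN_SHIFT)
#define MAX_SYMS 16 /* small bound for this sketch; the spec allows up to 2048 */

typedef struct {
    uint32_t num_data_syms;
    uint32_t total_sym_freq;
    uint32_t sym_freqs[MAX_SYMS];
    uint32_t cum_sym_freqs[MAX_SYMS + 1];
    uint32_t update_interval;
    int      num_syms_until_next_update;
} arith_data_model;

static void update_multi_model(arith_data_model *m) {
    /* Halve frequencies until the total fits below DM_MAX_COUNT. */
    while (m->total_sym_freq >= DM_MAX_COUNT) {
        m->total_sym_freq = 0;
        for (uint32_t n = 0; n < m->num_data_syms; n++) {
            m->sym_freqs[n] = (m->sym_freqs[n] + 1) >> 1;
            m->total_sym_freq += m->sym_freqs[n];
        }
    }
    /* Rebuild the scaled cumulative table used for range subdivision.
       scale * sum stays below 2^31, so uint32_t math is safe. */
    uint32_t scale = 0x80000000u / m->total_sym_freq;
    uint32_t sum = 0;
    for (uint32_t i = 0; i < m->num_data_syms; i++) {
        m->cum_sym_freqs[i] = (scale * sum) >> (31 - DM_LEN_SHIFT);
        sum += m->sym_freqs[i];
    }
    m->cum_sym_freqs[m->num_data_syms] = DM_MAX_COUNT;
    /* Interval growth/clamp omitted in this sketch; see the pseudocode. */
    m->num_syms_until_next_update = (int)m->update_interval;
}

static void reset_multi_model(arith_data_model *m, uint32_t num_syms) {
    m->num_data_syms = num_syms;
    for (uint32_t i = 0; i < num_syms; i++)
        m->sym_freqs[i] = 1; /* uniform prior */
    m->total_sym_freq = num_syms;
    m->update_interval = num_syms;
    update_multi_model(m); /* build the initial cumulative table */
}
```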
Gamma Decoding Contexts (arith_gamma_contexts)

This structure groups multiple binary models for contextual Gamma decoding: prefix models for the unary length, tail models for the binary bits. Each context is a separate binary model, allowing adaptation per bit position. It interacts with the Gamma decoder, which selects a model based on bit position for better prediction in variable-length codes.
ctx_prefix: array of arith_bit_model (size ARITH_GAMMA_MAX_PREFIX_CTX = 3), for the unary prefix.
ctx_tail: array of arith_bit_model (size ARITH_GAMMA_MAX_TAIL_CTX = 4), for the binary suffix.

Usage: Reset each arith_bit_model individually using the binary reset method.
function init_gamma_contexts(ctxs):
for i = 0 to ARITH_GAMMA_MAX_PREFIX_CTX - 1:
reset_binary_model(ctxs.ctx_prefix[i])
for i = 0 to ARITH_GAMMA_MAX_TAIL_CTX - 1:
reset_binary_model(ctxs.ctx_tail[i])
Decoder State (arith_dec)

The decoder state encapsulates the input buffer and current range (value and length). It serves as the central hub, providing the context for all decoding operations. Basic bit decoders modify the range directly, while adaptive ones use models to guide splits. Renormalization is called from within decoding routines when the length drops too low, loading more input bytes.
p_data_buf: const uint8_t*, start of input buffer.
p_data_buf_last_byte: const uint8_t*, last byte of buffer.
p_data_buf_cur: const uint8_t*, current read position.
data_buf_size: size_t, buffer size.
value: uint32_t, current code value.
length: uint32_t, current range length.

This clears all state, preparing for a new initialization. It's useful for reusing the decoder object.
function clear_decoder(dec):
dec.p_data_buf = null
dec.p_data_buf_last_byte = null
dec.p_data_buf_cur = null
dec.data_buf_size = 0
dec.value = 0
dec.length = 0
This sets up the decoder with an input buffer, loading the initial 32-bit value and setting the full range. It skips the first 4 bytes for the initial value load, assuming the stream starts with encoded data. This is the starting point for any decoding session.
function init_decoder(dec, pBuf, buf_size):
if buf_size < 5:
return false
dec.p_data_buf = pBuf
dec.p_data_buf_last_byte = pBuf + buf_size - 1
dec.p_data_buf_cur = dec.p_data_buf + 4
dec.data_buf_size = buf_size
dec.value = (pBuf[0] << 24) | (pBuf[1] << 16) | (pBuf[2] << 8) | pBuf[3]
dec.length = ARITH_MAX_LEN
return true
Renormalization (renorm)

Renormalization is a core utility routine that expands the decoding
range when it becomes too narrow for precise subdivision. It's called
inline from all decoding operations whenever length falls below
ARITH_MIN_LEN, shifting in 8 bits (a byte) from the input
stream each time. If the end of the buffer is reached, it pads with 0s.
This maintains the decoder's precision and allows continued operation on
long streams without overflow.
function renorm(dec):
while dec.length < ARITH_MIN_LEN:
next_byte = if dec.p_data_buf_cur > dec.p_data_buf_last_byte then 0 else *dec.p_data_buf_cur++
dec.value = (dec.value << 8) | next_byte
dec.length <<= 8
These routines provide fundamental non-adaptive bit extraction, serving as primitives for higher-level decoders. They directly manipulate the decoder state's range without models, narrowing it based on the decoded bits and triggering renormalization as needed. They are building blocks for truncated binary, Rice, and adaptive methods.
Decode a Raw Bit (get_bit)

This decodes a single bit by halving the range and checking where the value falls. If it's in the upper half, the bit is a 1 and the range shifts down; otherwise it's a 0. Renormalization follows if needed. It's the simplest operation, used extensively in unary prefixes and binary tails.
function get_bit(dec):
dec.length >>= 1
bit = if dec.value >= dec.length then 1 else 0
if bit == 1:
dec.value -= dec.length
if dec.length < ARITH_MIN_LEN:
renorm(dec)
return bit
Decode Raw Bits (get_bits, num_bits from 1 to MAX_PUT_BITS_LEN = 20)

This extracts a fixed-length integer by dividing the range into 2^num_bits equal parts and finding the sub-range containing the value. It updates the state accordingly and renormalizes. It's efficient for small fixed-width fields, like the remainders in Rice coding.
function get_bits(dec, num_bits):
if num_bits < 1 or num_bits > 20:
return 0 // or error
dec.length >>= num_bits
v = dec.value / dec.length
dec.value -= dec.length * v
if dec.length < ARITH_MIN_LEN:
renorm(dec)
return v
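The decoder state, renormalization, and the two raw-bit primitives combine into a small runnable C sketch. The behavior follows directly from the pseudocode: for example, on the stream 80 00 00 00 00 the first call to get_bit yields 1, simply because the initial code value lands in the upper half of the range.

```c
#include <stdint.h>
#include <stddef.h>

#define ARITH_MIN_LEN (1u << 24)
#define ARITH_MAX_LEN 0xFFFFFFFFu

typedef struct {
    const uint8_t *p_data_buf, *p_data_buf_last_byte, *p_data_buf_cur;
    size_t data_buf_size;
    uint32_t value, length;
} arith_dec;

static int init_decoder(arith_dec *d, const uint8_t *p, size_t n) {
    if (n < 5) return 0;          /* streams shorter than 5 bytes are invalid */
    d->p_data_buf = p;
    d->p_data_buf_last_byte = p + n - 1;
    d->p_data_buf_cur = p + 4;    /* first 4 bytes seed the code value */
    d->data_buf_size = n;
    d->value = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | p[3];
    d->length = ARITH_MAX_LEN;
    return 1;
}

static void renorm(arith_dec *d) {
    while (d->length < ARITH_MIN_LEN) { /* shift in a byte; pad with 0 past EOF */
        uint32_t b = (d->p_data_buf_cur > d->p_data_buf_last_byte)
                         ? 0 : *d->p_data_buf_cur++;
        d->value = (d->value << 8) | b;
        d->length <<= 8;
    }
}

static uint32_t get_bit(arith_dec *d) {
    d->length >>= 1;              /* split the range in half */
    uint32_t bit = (d->value >= d->length);
    if (bit) d->value -= d->length;
    if (d->length < ARITH_MIN_LEN) renorm(d);
    return bit;
}

static uint32_t get_bits(arith_dec *d, uint32_t num_bits) {
    d->length >>= num_bits;       /* split into 2^num_bits equal parts */
    uint32_t v = d->value / d->length;
    d->value -= d->length * v;
    if (d->length < ARITH_MIN_LEN) renorm(d);
    return v;
}
```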
These build on basic bit operations to decode variable-length codes
for non-negative integers, without adaptation. They are self-contained
but can be used in contexts where models aren't needed, and they
interact with the decoder state via get_bit and
get_bits.
Truncated Binary Decoding (decode_truncated_binary, n >= 2)

This efficiently decodes a uniform value from 0 to n-1 using a variable number of bits (floor(log2(n)) or one more). It computes a threshold and potentially decodes an extra bit, minimizing bits for non-power-of-2 ranges. It's a low-level optimizer for bounded integers.
function decode_truncated_binary(dec, n):
k = floor_log2(n)
u = (1 << (k + 1)) - n
result = get_bits(dec, k)
if result >= u:
result = (result << 1) | get_bits(dec, 1)
result -= u
return result
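floor_log2 is the only helper decode_truncated_binary needs; a common C implementation uses a count-leading-zeros intrinsic (GCC/Clang's __builtin_clz here). The sketch also shows how the threshold u partitions the n codewords: for n = 5, k = 2 and u = 3, so values 0..2 take 2 bits and values 3..4 take 3 bits; for a power of two like n = 8, u = n and the extra-bit branch never triggers.

```c
#include <stdint.h>

/* floor(log2(n)) for n >= 1, via a count-leading-zeros builtin
   (GCC/Clang; use an equivalent intrinsic on other compilers). */
static uint32_t floor_log2(uint32_t n) {
    return 31u - (uint32_t)__builtin_clz(n);
}

/* Truncated binary threshold: codewords below u use k bits, the rest k+1. */
static uint32_t truncated_binary_threshold(uint32_t n) {
    uint32_t k = floor_log2(n);
    return (1u << (k + 1)) - n;
}
```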
Rice Decoding (decode_rice, m > 0)

Rice coding decodes large values efficiently: a unary quotient
(sequence of 1s ended by 0) followed by a fixed m-bit remainder. It uses
get_bit for the unary part and get_bits for
the binary, with a sanity limit on quotient size. Useful for geometric
distributions.
function decode_rice(dec, m):
q = 0
while true:
k = get_bit(dec)
if k == 0:
break
q += 1
if q > 64:
return 0 // error
return (q << m) + get_bits(dec, m)
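The value the Rice decoder reconstructs is just (q << m) + r. For example, with m = 3 the value 13 splits into quotient q = 1 and remainder r = 5. The encoder-side split can be sketched as a small hypothetical helper (for illustration only; it is not part of the decoder):

```c
#include <stdint.h>

/* Split a value into the Rice quotient/remainder pair that
   decode_rice reassembles as (q << m) + r. */
static void rice_split(uint32_t v, uint32_t m, uint32_t *q, uint32_t *r) {
    *q = v >> m;              /* encoded as q 1-bits followed by a 0 */
    *r = v & ((1u << m) - 1); /* encoded as m raw bits via get_bits */
}
```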
These top-level routines incorporate adaptation or contexts, updating models after each decode. They represent the full power of the system for compressed streams.
Adaptive Bit Decoding (decode_bit, using an arith_bit_model)

This decodes a bit using the model's current probability to split the range asymmetrically. After decoding, it updates the counts and potentially the model. It combines range coding with adaptation for binary streams.
function decode_bit(dec, dm):
x = dm.bit0_prob * (dec.length >> BM_LEN_SHIFT)
bit = if dec.value >= x then 1 else 0
if bit == 0:
dec.length = x
dm.bit0_count += 1
else:
dec.value -= x
dec.length -= x
dm.bit_count += 1
if dec.length < ARITH_MIN_LEN:
renorm(dec)
dm.bits_until_update -= 1
if dm.bits_until_update <= 0:
update_binary_model(dm)
return bit
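The asymmetric split in decode_bit can be sanity-checked numerically. With a freshly reset model (bit0_prob = 4096, i.e., probability 1/2 at 13-bit precision) and a full range (length = 0xFFFFFFFF), the split point x lands just below the range midpoint; a 3/4 probability (bit0_prob = 6144) lands near the three-quarter point. A minimal sketch of just the split computation:

```c
#include <stdint.h>

#define BM_LEN_SHIFT 13

/* The decode_bit split point: the range below x maps to bit 0,
   the range at or above x maps to bit 1. */
static uint32_t split_point(uint32_t bit0_prob, uint32_t length) {
    return bit0_prob * (length >> BM_LEN_SHIFT);
}
```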
Adaptive Gamma Decoding (decode_gamma, using arith_gamma_contexts)

Gamma decoding handles variable-length codes for positive integers: an adaptive unary prefix (using the prefix contexts) determines the bit length k, then an adaptive binary suffix (using the tail contexts) fills in the value. It uses decode_bit for each bit, adapting per position for better efficiency on power-law data.
function decode_gamma(dec, ctxs):
k = 0
while decode_bit(dec, ctxs.ctx_prefix[min(k, ARITH_GAMMA_MAX_PREFIX_CTX-1)] ) == 1:
k += 1
if k > 16:
return 0 // error
n = 1 << k
for i = k-1 downto 0:
bit = decode_bit(dec, ctxs.ctx_tail[min(i, ARITH_GAMMA_MAX_TAIL_CTX-1)] )
n |= (bit << i)
return n
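The gamma code maps each n in [2^k, 2^(k+1)) to a k-bit unary prefix plus k tail bits; for example n = 6 has k = 2 and tail bits '10'. The encoder-side split can be sketched as a hypothetical helper (for illustration; not part of the decoder):

```c
#include <stdint.h>

/* Split n >= 1 into the (k, tail) pair that decode_gamma
   reconstructs as (1 << k) | tail. Uses GCC/Clang's clz builtin. */
static void gamma_split(uint32_t n, uint32_t *k, uint32_t *tail) {
    *k = 31u - (uint32_t)__builtin_clz(n); /* bit length minus 1 */
    *tail = n & ((1u << *k) - 1);          /* bits below the leading 1 */
}
```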
Adaptive Symbol Decoding (decode_sym, using an arith_data_model)

This decodes a symbol by binary-searching the model's cumulatives to find the matching sub-range, then narrowing the decoder's range. Post-decode, it increments the symbol's frequency and potentially updates the model. Ideal for arbitrary alphabets with changing probabilities.
function decode_sym(dec, dm):
x = 0
y = dec.length
dec.length >>= DM_LEN_SHIFT
low_idx = 0
hi_idx = dm.num_data_syms
mid_idx = hi_idx >> 1
do:
z = dec.length * dm.cum_sym_freqs[mid_idx]
if z > dec.value:
hi_idx = mid_idx
y = z
else:
low_idx = mid_idx
x = z
mid_idx = (low_idx + hi_idx) >> 1
while mid_idx != low_idx
dec.value -= x
dec.length = y - x
if dec.length < ARITH_MIN_LEN:
renorm(dec)
dm.sym_freqs[low_idx] += 1
dm.total_sym_freq += 1
dm.num_syms_until_next_update -= 1
if dm.num_syms_until_next_update <= 0:
update_multi_model(dm)
return low_idx
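The binary search in decode_sym can be exercised standalone. With a uniform 4-symbol cumulative table {0, 8192, 16384, 24576, 32768} (as built by the multi-model update for four equal frequencies) and a full range, a code value at the range midpoint selects symbol 2. This sketch mirrors only the search loop, with the range-narrowing and model update omitted:

```c
#include <stdint.h>

#define DM_LEN_SHIFT 15

/* Find the symbol whose scaled cumulative interval contains `value`,
   mirroring the do/while search in decode_sym. `cum` has num_syms + 1
   entries; scaled * cum[i] stays below 2^32 by construction. */
static uint32_t find_sym(const uint32_t *cum, uint32_t num_syms,
                         uint32_t length, uint32_t value) {
    uint32_t scaled = length >> DM_LEN_SHIFT;
    uint32_t low = 0, hi = num_syms, mid = hi >> 1;
    do {
        if (scaled * cum[mid] > value)
            hi = mid;   /* value lies below this cumulative boundary */
        else
            low = mid;  /* value lies at or above it */
        mid = (low + hi) >> 1;
    } while (mid != low);
    return low;
}
```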
Note: floor_log2(n) may be implemented with a count-leading-zeros intrinsic (e.g., 31 - clz(n) in C). The reference decoder implementation can be found in transcoder/basisu_transcoder_internal.h.