Encoding ETC1S and XUASTC LDR Texture Video

Intro
XUASTC LDR Examples
ETC1S Texture Video
XUASTC LDR Texture Video

Intro

Texture Video support is compatible with any LDR/HDR Basis Universal codec format, including XUASTC LDR, but currently only ETC1S has any temporal optimizations (which are quite minimal). To the encoder/transcoder, Texture Video files are essentially large texture arrays. They are useful for animated textures (including animated normal maps with XUASTC LDR, which are poorly suited to conventional video codecs). Texture Video files support mipmapping and alpha channels. Only ETC1S and XUASTC LDR have usable (0.25-1 bpp) bitrates.

The current compressor and encoder were originally designed with typical 2D or cubemap texture use cases in mind, so they load all frames into memory at once. There are practical limits to the maximum video size of a single .basis/.ktx2 file. Texture Video isn't a big development focus for us, but we do ensure it works.
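Because all frames are held in memory at once, a rough lower bound on the encoder's input footprint can be worked out before starting a long encode. The sketch below assumes each frame is held as uncompressed 32-bit RGBA (4 bytes per pixel); that is an assumption for illustration, not a documented detail of the encoder, and it ignores codebook generation and other working memory. The function name is hypothetical.

```python
def uncompressed_frames_bytes(width, height, frames, bytes_per_pixel=4):
    """Bytes needed to hold every source frame uncompressed.

    Assumes 4 bytes per pixel (RGBA8). The real encoder's internal
    representation and working buffers are not modeled here.
    """
    return width * height * frames * bytes_per_pixel


# Example: 1,000 frames of 1280x720 video.
total = uncompressed_frames_bytes(1280, 720, 1000)
print(f"{total / (1024 ** 3):.2f} GiB of raw frame data")
```

Treat the result as a floor: actual memory use during encoding will be higher.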

ETC1S is low quality, but it does support optimizing the VQ codebooks across all frames and skip blocks. ETC1S transcodes to all other formats quite rapidly. The average bitrate is ~0.4-0.5 bpp.

XUASTC LDR is much higher quality but only supports I-frames (so no skip blocks or any other temporal optimizations). At 12x12 with the Zstd profile it can achieve surprisingly low average bitrates (~0.25-0.4 bpp). The tradeoff vs. ETC1S is slower transcoding. XUASTC LDR transcodes to ASTC the most rapidly, followed by BC7, then the other formats. XUASTC LDR is surprisingly temporally stable for a GPU texture codec, but subtle temporal artifacts become more visible as the block size increases.
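To turn the average bitrates quoted above into an expected file size, multiply the total pixel count by the bits per pixel and divide by 8. The following is a minimal sketch of that arithmetic; the function name is hypothetical, mipmaps and container overhead are ignored, and real results depend heavily on content.

```python
def estimated_size_bytes(width, height, frames, bits_per_pixel):
    """Approximate compressed size at a given average bitrate (bpp)."""
    total_pixels = width * height * frames
    return total_pixels * bits_per_pixel / 8.0


# 200 frames of 1280x720 at the approximate bitrate ranges quoted above.
for label, bpp in [("ETC1S @ 0.4 bpp", 0.4),
                   ("ETC1S @ 0.5 bpp", 0.5),
                   ("XUASTC LDR 12x12 @ 0.25 bpp", 0.25),
                   ("XUASTC LDR 12x12 @ 0.4 bpp", 0.4)]:
    mib = estimated_size_bytes(1280, 720, 200, bpp) / (1024 * 1024)
    print(f"{label}: about {mib:.1f} MiB")
```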

The video_test WebGL example can be used to view .basis Texture Video files, as long as they fit into available WASM memory. This sample doesn't support our HDR formats yet, just LDR. (The library, JavaScript wrappers, and command line tool do support HDR Texture Video, just not this sample yet.) It supports ETC1S, ASTC LDR/XUASTC LDR in all 14 block sizes, along with the other LDR codec formats.

Texture video files can be directly created using the C++ API.

XUASTC LDR Examples

Screenshots resampled to fit the wiki.

12x12 block size (ignore the "block size: 4" debug text: that's the transcode target's (BC7) block size)

4x4 block size

ETC1S Texture Video

ETC1S Texture Video support was a stretch goal of ours. Video is significantly more challenging than textures. Supporting this use case helped us create a better-looking system overall, as well as helping us gain experience with video. ETC1S Texture Video has noticeable block artifacts, but the tradeoff is fast transcode times. In Texture Video mode ETC1S supports Conditional Replenishment (CR, or "skip blocks"), which can reduce the bitrate of some videos (highly dependent on how dynamic the content is) by over 50%.
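The idea behind Conditional Replenishment can be illustrated with a toy example: split each frame into 4x4 blocks (the ETC1S block size) and mark a block as "skipped" when it has not changed relative to the previous frame. The sketch below is not the BasisU encoder's actual decision logic; the function name, the grayscale input, and the per-pixel threshold test are all assumptions made for illustration.

```python
def count_skip_blocks(prev, cur, width, height, block=4, threshold=0):
    """Count blocks whose pixels all changed by at most `threshold`.

    prev and cur are flat, row-major lists of grayscale pixel values.
    Returns (skipped_blocks, total_blocks).
    """
    skipped = 0
    total = 0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            total += 1
            max_diff = 0
            # Clamp to the frame edge so partial blocks are handled.
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    i = y * width + x
                    max_diff = max(max_diff, abs(cur[i] - prev[i]))
            if max_diff <= threshold:
                skipped += 1
    return skipped, total


# An 8x8 frame where only one pixel in the top-left block changes.
prev_frame = [0] * 64
cur_frame = list(prev_frame)
cur_frame[0] = 255
print(count_skip_blocks(prev_frame, cur_frame, 8, 8))
```

On mostly static content a large share of blocks qualify as skips, which is why the savings depend so strongly on how dynamic the video is.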

The current system only supports I-frames and basic P-frames with skip blocks; however, it does use global endpoint/selector codebooks across all frames in the Texture Video sequence. Currently, the first frame is always an I-frame, and all subsequent frames are P-frames, although this limitation is not imposed by the file format itself, just the encoder. (We plan on removing this limitation by allowing the developer to specify a periodic I-frame interval rate for seeking.)

To compress small video sequences, first use a tool like ffmpeg or VirtualDub to extract the video frames to individual .PNG files:

ffmpeg -i input.mp4 pic%04d.png

Then, to compress the first 200 frames to a .basis file (.KTX2 works too):

basisu -basis -comp_level 2 -tex_type video -multifile_printf "pic%04u.png" -multifile_num 200 -multifile_first 1 -max_selectors 16128 -max_endpoints 16128 -endpoint_rdo_thresh 1.05 -selector_rdo_thresh 1.05

The -resample_factor .5 option, which resamples each input frame to half size, can be useful for testing.
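When scripting encodes, it can help to assemble the command line programmatically and to check which frame filenames the printf-style pattern will expand to. The sketch below only builds the argument list and does not run anything; it uses the flags shown in the command above, while the function names and default values are hypothetical conveniences.

```python
import shlex


def build_etc1s_video_command(pattern, num_frames, first_frame=1,
                              comp_level=2, max_codebook=16128,
                              rdo_thresh=1.05, resample_factor=None):
    """Return the basisu argument list for an ETC1S Texture Video encode."""
    args = ["basisu", "-basis",
            "-comp_level", str(comp_level),
            "-tex_type", "video",
            "-multifile_printf", pattern,
            "-multifile_num", str(num_frames),
            "-multifile_first", str(first_frame),
            "-max_selectors", str(max_codebook),
            "-max_endpoints", str(max_codebook),
            "-endpoint_rdo_thresh", str(rdo_thresh),
            "-selector_rdo_thresh", str(rdo_thresh)]
    if resample_factor is not None:
        # Optional: resample each input frame, e.g. 0.5 for half size.
        args += ["-resample_factor", str(resample_factor)]
    return args


def expected_frame_names(pattern, num_frames, first_frame=1):
    """Expand the printf-style pattern into the filenames it refers to."""
    return [pattern % i for i in range(first_frame, first_frame + num_frames)]


cmd = build_etc1s_video_command("pic%04u.png", 200, resample_factor=0.5)
print(shlex.join(cmd))
print(expected_frame_names("pic%04u.png", 3))
```

The list returned by `build_etc1s_video_command` can be passed to `subprocess.run` once the expected input files are confirmed to exist.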

Texture Video stresses the encoder beyond its typical use, so some extra configuration is recommended. For Texture Video use -comp_level 2 or 3. The default is 1, which isn't quite good enough for Texture Video. Higher -comp_level values result in reduced ETC1S artifacts. Level 5 is extremely slow, so unless you have a very powerful machine, levels 1-4 are recommended. For nearly maximum achievable ETC1S mode quality with the current format and encoder (completely ignoring encoding speed!), use:

-comp_level 5 -max_endpoints 16128 -max_selectors 16128 -no_selector_rdo -no_endpoint_rdo

The -no_selector_rdo -no_endpoint_rdo parameters are optional. Using these options hurts rate-distortion performance, but they increase quality. An alternative is to use -selector_rdo_thresh X and -endpoint_rdo_thresh X, with X in the range [1,2] (higher = lower quality/better compression; see the tool's help text).

For ETC1S video encoding, the more cores and memory your machine has, the better. (Generating globally optimized codebooks across 100s or 1,000s of frames is expensive.) BasisU is intended for smaller videos of a few dozen seconds or so. On a powerful enough machine you should be able to encode up to a few thousand 720p frames using a single set of codebooks.

The .basis file will contain multiple ETC1S image frames (or slices) in a large 2D texture array, all using the same global codebooks. You can retrieve each frame using the .basis transcoder's "image" API. In Texture Video mode, ETC1S image frames must be requested from the transcoder in sequence from first to last, and random access is only allowed to I-frames (and currently only the very first frame is an I-frame).
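The access rule described above (frames in sequence, random access only to I-frames) can be modeled with a small cursor object that a playback loop consults before requesting a frame. This is an illustrative sketch only: the class and method names are hypothetical, it does not call the real transcoder, and it assumes frame 0 is the only I-frame, matching the current encoder's behavior.

```python
class SequentialFrameCursor:
    """Tracks which Texture Video frame may legally be requested next."""

    def __init__(self, total_frames, iframe_indices=(0,)):
        self.total_frames = total_frames
        self.iframes = set(iframe_indices)
        self.last_decoded = None  # nothing decoded yet

    def can_decode(self, index):
        if not 0 <= index < self.total_frames:
            return False
        if index in self.iframes:
            return True  # I-frames allow random access
        # Any other frame must directly follow the last decoded frame.
        return self.last_decoded is not None and index == self.last_decoded + 1

    def decode(self, index):
        if not self.can_decode(index):
            raise ValueError(f"frame {index} cannot be requested out of order")
        # A real player would call the transcoder for this image index here.
        self.last_decoded = index


cursor = SequentialFrameCursor(total_frames=5)
for frame in range(5):
    cursor.decode(frame)   # in-order playback is always allowed
cursor.decode(0)           # looping: seek back to the I-frame
print(cursor.can_decode(1), cursor.can_decode(3))
```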

Be sure to experiment with increasing the endpoint RDO threshold (-endpoint_rdo_thresh X). This setting controls how aggressively the compressor's backend will merge nearby blocks so they use the same block endpoint codebook vectors, for better coding efficiency. The default setting is a modest 1.5, which means the backend is allowed to increase the overall error by 1.5x while searching for merge candidates. The higher this setting, the better the compression, with the tradeoff of more block artifacts. Settings up to ~2.25 can work well and make the codec stronger. -endpoint_rdo_thresh 1.75 is a good setting on many textures.

For more info on controlling the ETC1S encoder's quality vs. encoding speed tradeoff, see ETC1S Compression Effort Levels.

Note the default ETC1S->BC7 transcoder includes an adaptive chroma filter, which greatly reduces block artifacts on highly saturated blocks. This results in slower transcoding to specifically BC7, so it can be disabled using the cDecodeFlagsNoETC1SChromaFiltering transcoder decode flag.

XUASTC LDR Texture Video

XUASTC LDR looks substantially better for video compared to ETC1S, but it's an I-frame only codec (i.e. there are no temporal optimizations at all, not even ETC1S-style skip blocks), and transcoding is noticeably slower. XUASTC LDR 10x10-12x12 is capable of surprisingly low bitrates (~0.25-0.4 bpp) on Texture Video content.

This example command creates a .basis Texture Video file of 2864 frames (numbered from 1-2864), using XUASTC LDR 12x12 Zstd at DCT quality level 60 and effort 9 (highest practical effort), with debug output enabled. The resulting .basis file can be played back using our video_test WebGL example (assuming it's not too large).

basisu -basis -tex_type video -multifile_printf "pic%04u.png" -multifile_num 2864 -multifile_first 1 -xuastc_ldr_12x12 -quality 60 -effort 9 -debug

The -resample_factor .5 option, which resamples each input frame to half size, can be useful for testing.

The Zstd profile is recommended. The arithmetic profile is likely too slow at usable resolutions for real-time decoding without using threads to decode multiple frames in parallel. XUASTC LDR transcodes most rapidly to ASTC. If it has to transcode to BC7, that's additional overhead, and be aware that by default the transcoder will deblock going to BC7 and other LDR formats (even more overhead, but this can be disabled using the cDecodeFlagsNoDeblockFiltering transcoder decode flag). For ASTC usage on large block sizes, GPU shader deblocking may be useful.

The larger the block size, the faster the transcoding and the lower the overall bitrate. The larger ASTC block sizes (8x6 and beyond) are probably the most useful for Texture Video.
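Block size also determines the size of each frame once transcoded to ASTC in GPU memory. Every ASTC block occupies 128 bits regardless of its dimensions, so the transcoded footprint in bits per pixel is 128 divided by the block area. The sketch below tabulates this for the 14 2D ASTC block sizes. Note that this is the footprint of the transcoded ASTC data, a related but different quantity from the XUASTC LDR file bitrates quoted earlier; the names used are hypothetical.

```python
# The 14 standard 2D ASTC block dimensions (width, height).
ASTC_2D_BLOCK_SIZES = [
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
    (8, 8), (10, 5), (10, 6), (10, 8), (10, 10), (12, 10), (12, 12),
]


def astc_bits_per_pixel(block_w, block_h):
    """Each ASTC block is 128 bits, whatever its dimensions."""
    return 128.0 / (block_w * block_h)


for w, h in ASTC_2D_BLOCK_SIZES:
    print(f"{w}x{h}: {astc_bits_per_pixel(w, h):.2f} bpp after transcoding to ASTC")
```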

The .ktx2 file format supports Texture Video too, but we haven't made a playback sample for it yet.

Future Directions

We could easily add several lightweight temporal optimizations to XUASTC LDR. Skip blocks would be very easy to add. Reusing block configs/endpoints from the previous frame, or DPCM coding vs. endpoints on the previous frame would also be very easy to implement. We could likely cut bitrates approximately in half with minimal changes. We're unsure if anyone is really interested in this use case, so it's been a low priority.