The AV1 decoder dav1d 0.7.0 "Frigatebird" has been released.
Changes for 0.7.0 "Frigatebird":
0.7.0 is a major release for dav1d:
[*]Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single-threaded)
[*]10b/12b ARM64 optimizations are mostly complete:
[*]ipred (paeth, smooth, dc, pal, filter, cfl)
[*]itxfm (only 10b)
[*]AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
[*]AVX2 for cfl 4:4:4
[*]AVX-512 CDEF filter
[*]ARM64 8b improvements for cfl_ac and itxfm
[*]ARM64 implementation for emu_edge in 8b/10b/12b
[*]ARM32 implementation for emu_edge in 8b
[*]Improvements on the dav1dplay utility player to support 10 bit, non-4:2:0 pixel formats and film grain on the GPU
Starting from VP8 in 2010, the WebM Project has delivered up to 50% video bitrate savings with VP9 in 2013 and an additional 30% with AV1 in 2018 – with adoption by YouTube, Facebook, Netflix, Twitch, and more. Equally importantly, the WebM team co-founded the Alliance for Open Media which has freely licensed the IP of over 40 major tech companies in support of open and free codecs.
[*]Improved performance by using a picture buffer pool; the improvement can reach 10% in some cases on Windows
[*]Support for Apple ARM Silicon
[*]ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
[*]ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg, put/prep 8tap/bilin, wiener and CDEF filters
[*]ARM64 optimizations for cfl_ac 444 for all bitdepths
[*]x86 optimizations for MC 8-tap, mc_scaled in AVX2
[*]x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3
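The picture buffer pool item above refers to reusing already-allocated frame buffers instead of allocating and freeing one per decoded picture. A minimal sketch of that idea follows, written in Python for brevity (dav1d itself is written in C). The `PicturePool` class, its methods and the frame size are hypothetical names chosen for illustration; this is not dav1d's implementation or API.

```python
class PicturePool:
    """Toy buffer pool: hands out reusable bytearrays keyed by size.

    Hypothetical illustration only; not dav1d's actual data structure.
    """

    def __init__(self):
        self._free = {}        # size in bytes -> list of idle buffers
        self.allocations = 0   # number of fresh allocations performed

    def acquire(self, size):
        idle = self._free.get(size)
        if idle:
            return idle.pop()  # reuse an idle buffer, no new allocation
        self.allocations += 1
        return bytearray(size)

    def release(self, buf):
        # Return the buffer to the pool so a later acquire() can reuse it.
        self._free.setdefault(len(buf), []).append(buf)


pool = PicturePool()
frame_size = 1920 * 1080 * 3 // 2  # bytes in one 8-bit 4:2:0 1080p frame

for _ in range(100):
    buf = pool.acquire(frame_size)
    # ... a decoder would write one picture into buf here ...
    pool.release(buf)

print(pool.allocations)  # number of real allocations for 100 pictures
```

In this toy loop only one picture is in flight at a time, so a single allocation serves all 100 iterations. A real decoder keeps several reference frames alive at once, so its pool would hold more buffers, but the saving comes from the same reuse.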
[*]Further quality-speed tradeoffs tuning for VOD use cases
[*]Improved TPL support within 1-pass and 2-pass CRF mode
[*]Continued non-optimized support for 2-pass VBR and CRF
[*]Align kernel nomenclature by prefixing kernels brought from libaom with svt_aom to avoid symbol conflicts
Build and Testing
[*]Bug fixes
[*]Improve CI
[*]Added CI support for gitlab
[*]Improve Unit Test Coverage
[*]Address C vs asm mismatches
[*]Fix static analysis warnings / errors
[*]Add address sanitizer
[*]Fix symbol conflicts with libaom and libvpx when statically linked to ffmpeg
[*]Feature optimizations: creating new mode decision / encode-decode feature levels allowing better speed / quality trade-off granularity
[*]Preset repositioning after adopting new levels
[*]Preset 8 achieving similar speed levels to that of x265 medium in the VOD (shot-based encoding) use-case while maintaining quality gains
[*]New 1-pass and 2-pass VBR implementation ported from libaom and adapted to the SVT architecture – still a WIP
[*]Cleaned up old VBR and CVBR RC code along with the lookahead mechanism associated with them
[*]Improvements for TPL algorithm to handle long clips and easy content
[*]Added HDR support and color primaries SEI signaling (off by default until integrated with ffmpeg)
[*]Memory optimizations and data-structure cleanup, reducing memory usage by up to 2× in a multi-threaded VBR environment
[*]Additional AVX2 and AVX512 optimizations
[*]Cleaned up unused command line parameters except the config params that are linked to ffmpeg
[*]Update user guide and documentation
0.9.0 is a major version of dav1d, adding notably 10b acceleration on x64.
Details:
[*]x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide a large boost for high-bitdepth decoding on modern x86 computers and servers.
[*]ARM64 neon implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
[*]New API to signal events happening during the decoding process
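The release note does not describe how the event-signalling API is shaped. One common pattern for this kind of interface is a bitmask of event flags that the decoder accumulates and the caller reads and clears in a single call. The sketch below illustrates that pattern only; `EventFlags`, `ToyDecoder`, `feed` and `get_event_flags` are hypothetical names, not dav1d's actual functions or enums.

```python
from enum import IntFlag


class EventFlags(IntFlag):
    """Hypothetical event bits a decoder might report."""
    NEW_SEQUENCE = 1 << 0        # a new sequence header was seen
    NEW_OP_PARAMS_INFO = 1 << 1  # operating parameters changed


class ToyDecoder:
    """Illustrative stand-in for a decoder context (assumption, not dav1d)."""

    def __init__(self):
        self._events = EventFlags(0)

    def feed(self, unit_kind):
        # A real decoder would parse bitstream units; here a string
        # stands in for the kind of unit that was just consumed.
        if unit_kind == "sequence_header":
            self._events |= EventFlags.NEW_SEQUENCE
        elif unit_kind == "op_params":
            self._events |= EventFlags.NEW_OP_PARAMS_INFO

    def get_event_flags(self):
        # Return everything signalled since the last call, then reset.
        flags, self._events = self._events, EventFlags(0)
        return flags


dec = ToyDecoder()
dec.feed("sequence_header")
dec.feed("frame")

flags = dec.get_event_flags()
if EventFlags.NEW_SEQUENCE in flags:
    print("new sequence: reconfigure the output")
```

Clearing the flags on read means each event is reported once, so the caller can poll after every decoded picture without keeping its own record of what it has already handled.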
0.9.1 is a middle-size revision of dav1d, adding notably 10b acceleration for SSSE3:
[*]10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge), prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener, sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
[*]Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
[*]Fixes for filmgrain on ARM
[*]itx 4x4 for SSE4
[*]Misc improvements on SSE2, SSE4
0.9.2 is a small update of dav1d on the 0.9.x branch:
[*]x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
[*]x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
[*]x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
[*]ARM NEON optimizations for FilmGrain Gen_grain functions
[*]Optimizations for splat_mv in SSE2/AVX2 and NEON
[*]x86: SGR improvements for SSSE3 CPUs
[*]x86: AVX2 optimizations for cfl_ac
[*]New faster presets M9-M12, M12 reaching similar complexity level to that of x264 veryfast
[*]New multi-pass and single pass VBR implementation minimizing the quality difference vs CRF while reducing the cycle overhead
[*]Quality vs density tradeoffs improvements across all presets in CRF mode
[*]Added support for CRF with capped bitrate
[*]Added support for overlay frames and super resolution
[*]Fixed film grain synthesis bugs
[*]Added experimental support for higher than 4k resolutions
[*]Added experimental support for the low delay prediction structure
[*]Significant memory reduction especially for faster presets in a multi-threaded environment
[*]API configuration structure cleanup removing all invalid or out of date parameters
[*]Speed up legacy CPUs for faster development by adding SSE code for corresponding C kernels
[*]Updated the code license from BSD 2-clause to BSD 3-clause clear
[*]Cleaned up the code for various kernels
[*]Updated the user guide and feature documentation
Build and Testing
[*]Bug fixes
[*]Improve CI coverage
[*]Improve Unit Test Coverage
[*]Address C vs asm mismatches
[*]Fix static analysis warnings / errors