v1.1.0

Pre-release
@ggerganov ggerganov released this 15 Jan 12:00
8738427

Overview

The major change in this pre-release is the improved decoding implementation in whisper.cpp:

  • Support for fallback criteria based on average logprob and entropy
  • Support for temperature T > 0
  • Improved Greedy decoder via the best_of parameter for T > 0
  • Add beam search decoding (controlled by the beam_size parameter)
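
The fallback criteria listed above can be illustrated with a minimal, self-contained sketch. It is not the whisper.cpp implementation: the struct names, the `decode_at` callback, and the default threshold values are hypothetical and chosen only for illustration. The idea shown is to decode at T = 0 first and retry at a higher temperature whenever the average logprob of the result is too low or the entropy of its tokens is too low (a sign of repetitive output).

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <vector>

// Hypothetical parameter set; the values are illustrative assumptions.
struct fallback_params {
    float temperature_inc = 0.2f;  // step added after each rejected attempt
    float temperature_max = 1.0f;  // last temperature that is tried
    float logprob_thold   = -1.0f; // reject if average logprob is below this
    float entropy_thold   = 2.4f;  // reject if token entropy is below this
};

// Hypothetical result of one decoding attempt.
struct decode_result {
    std::vector<int>   tokens;
    std::vector<float> logprobs; // log-probability of each sampled token
};

// Mean log-probability of the sampled tokens.
static float avg_logprob(const std::vector<float> & logprobs) {
    assert(!logprobs.empty());
    double sum = 0.0;
    for (float lp : logprobs) sum += lp;
    return (float) (sum / logprobs.size());
}

// Shannon entropy (in nats) of the empirical token distribution.
// Repetitive output concentrates on few tokens and yields a low value.
static float token_entropy(const std::vector<int> & tokens) {
    assert(!tokens.empty());
    std::map<int, int> counts;
    for (int id : tokens) counts[id]++;
    double h = 0.0;
    for (const auto & kv : counts) {
        const double p = (double) kv.second / tokens.size();
        h -= p * std::log(p);
    }
    return (float) h;
}

static bool needs_fallback(const fallback_params & p, const decode_result & r) {
    return avg_logprob(r.logprobs) < p.logprob_thold ||
           token_entropy(r.tokens) < p.entropy_thold;
}

// Try increasing temperatures until an attempt passes both criteria.
// If every attempt is rejected, the last attempt is returned.
static decode_result decode_with_fallback(
        const fallback_params & p,
        const std::function<decode_result(float)> & decode_at,
        float & t_used) {
    assert(p.temperature_inc > 0.0f);
    decode_result r;
    for (float t = 0.0f; t <= p.temperature_max + 1e-6f; t += p.temperature_inc) {
        r = decode_at(t);
        t_used = t;
        if (!needs_fallback(p, r)) break;
    }
    return r;
}
```

In this sketch `decode_at` stands in for whichever decoder is in use (greedy with several candidates, or beam search); the fallback loop only inspects the returned tokens and their log-probabilities.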

More information about the decoding changes can be found in #291
Additionally, there are a few performance improvements for Apple Silicon, WASM and non-F16C platforms.
Support for POWER9 architectures has been added.

This is a pre-release rather than an official release for two reasons: the new implementation has not been sufficiently tested yet, and the existing bindings for other languages have not been updated to support the API changes. The official 1.1.x release will be created once there is enough feedback about the new decoding implementation and the bindings have been updated, so please send your feedback in the discussion created for this pre-release. For now, the 1.0.4 release should be considered more stable.

What's Changed

Core ggml / whisper

  • ggml : POWER9 support by @fitzsim in #320, #349, #369
  • ggml : simplify the SIMD code by @ggerganov in #324
  • ggml : add SSE3 and fp16 conversion lookup table by @abitofevrything in #368
  • ggml : utilise Accelerate's vDSP for some computations d51fc3e
  • ggml : speed-up softmax compute via Accelerate and loop unrolling d61d55c
  • ggml : do not start extra threads when using BLAS d347a59
  • whisper : do sample_to_timestamp calculation with 64 bit precision to avoid overflow by @boolemancer in #388
  • whisper : various code clean-up and improvements by @asmaloney in #317 #318 #319 #322 etc
  • whisper : improve decoding by @ggerganov in #291
  • whisper : account for speed_up flag for short audio #405
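
To illustrate why the sample-to-timestamp change in #388 matters, here is a minimal sketch of the conversion done with 64-bit arithmetic. It assumes a 16 kHz sample rate and timestamps expressed in units of 10 ms; the function and constant names are chosen for illustration and may differ from the actual code. With 32-bit arithmetic the intermediate product `100 * i_sample` exceeds the `int32_t` range once the sample index passes about 21.4 million, which at 16 kHz is roughly 22 minutes of audio.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed sample rate of the input audio, in Hz.
constexpr int64_t k_sample_rate = 16000;

// Convert a sample index to a timestamp in units of 10 ms.
// The multiplication is done in 64 bits so it cannot overflow for
// any realistic audio length.
static int64_t sample_to_timestamp(int64_t i_sample) {
    assert(i_sample >= 0);
    return (100 * i_sample) / k_sample_rate;
}

// True if the intermediate product would not fit in a 32-bit signed int,
// i.e. the case in which a 32-bit computation would overflow.
static bool product_overflows_int32(int64_t i_sample) {
    return 100 * i_sample > std::numeric_limits<int32_t>::max();
}
```

For example, sample index 30,000,000 (about 31 minutes at 16 kHz) is already past the 32-bit limit, while the 64-bit version still returns the expected timestamp.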

C-style API

  • Add loader class to allow loading from buffer and others by @prsyahmi in #353
  • Add whisper_token_data::plog
  • Add whisper_init_from_file()
  • Add whisper_init_from_buffer()
  • Change whisper_init()
  • Remove whisper_sample_best()
  • Remove whisper_sample_timestamp()
  • Add whisper_n_audio_ctx()
  • Add whisper_get_logits()
  • Remove whisper_get_probs()
  • Change struct whisper_full_params

Bindings

Examples

  • whisper.android : remove android ABI constraint by @Digipom in #301
  • whisper.swiftui : SwiftUI example by @Digipom in #308
  • main : add -ocsv (--output-csv) to write a CSV file containing millisecond timestamps by @NielsMayer in #340
  • command : refactor to split command list & general transcription modes by @asmaloney in #331
  • command : always-prompt mode by @dnhkng in #383
  • stream : fix data race on bool + avoid division-by-zero a466c34
  • stream : fix a bug that inserted a lot of empty audio at the start a6dbd91
  • bench.wasm : print system info fafd789

New Contributors

Full Changelog: v1.0.4...v1.1.0

Highlights
