CHANGES IN audio.whisper VERSION 0.4.1

  • Added function predict.whisper_transcription, which assigns each transcription segment to either the left or the right channel based on Voice Activity Detection
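A minimal sketch of the idea follows. It assumes the VAD output for each channel is a data.frame of voiced regions with start/end columns in seconds, and it assigns each segment to the channel whose voiced regions overlap it most. The helpers voiced_overlap and assign_channel_vad, and the column names, are hypothetical. They are not the package's API or the actual implementation of predict.whisper_transcription.

```r
# Hypothetical helpers illustrating VAD-based channel assignment;
# not the implementation of predict.whisper_transcription.

# Total time (seconds) that the interval [from, to] overlaps with the voiced regions
voiced_overlap <- function(from, to, regions) {
  sum(pmax(0, pmin(to, regions$end) - pmax(from, regions$start)))
}

# Assign each segment to the channel with the larger voiced overlap (ties go to "left")
assign_channel_vad <- function(from, to, vad_left, vad_right) {
  vapply(seq_along(from), function(i) {
    left <- voiced_overlap(from[i], to[i], vad_left)
    right <- voiced_overlap(from[i], to[i], vad_right)
    if (left >= right) "left" else "right"
  }, character(1))
}

# Assumed VAD output: voiced regions per channel, in seconds
vad_left <- data.frame(start = c(0, 6), end = c(2.5, 7))
vad_right <- data.frame(start = 3, end = 5.5)
assign_channel_vad(from = c(0, 3, 6), to = c(2.5, 5.5, 7),
                   vad_left = vad_left, vad_right = vad_right)
```

Ties go to the left channel here purely as an arbitrary choice for the sketch.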

CHANGES IN audio.whisper VERSION 0.4

  • Allow passing multiple offsets/durations
  • Allow specifying sections of the audio (e.g. detected with a voice activity detector) so that only these (voiced) data are transcribed; the amount of time that was cut out is added to the from/to timestamps so that the resulting timepoints remain aligned with the original audio file
  • The data element of the output of predict.whisper now includes a column called segment_offset, indicating the offset of the provided sections or offsets
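The timestamp realignment can be sketched as follows. The sketch assumes that sections are given as start/duration pairs in milliseconds of the original file, and that the transcription was made on the concatenation of these sections. Each timestamp is located in the section it falls into, and the time cut out before that section is added back. realign_to_original is a hypothetical helper, not a package function.

```r
# Hypothetical helper: map a time (ms) measured on the concatenated (cut) audio
# back to the original audio file, assuming non-negative times.
realign_to_original <- function(t, sections) {
  # Start of each kept section within the concatenated audio
  cut_start <- cumsum(c(0, head(sections$duration, -1)))
  # Index of the section each timestamp falls into
  idx <- findInterval(t, cut_start)
  # Add back the time that was cut out before that section
  t - cut_start[idx] + sections$start[idx]
}

# Two voiced sections: 1000-3000 ms and 5000-8000 ms of the original file
sections <- data.frame(start = c(1000, 5000), duration = c(2000, 3000))
realign_to_original(c(0, 1500, 2000, 4500), sections)
```

A timestamp exactly at a section boundary (2000 ms above) is attributed to the later section, which maps it to that section's original start.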

CHANGES IN audio.whisper VERSION 0.3.3

  • Fix typos in the documentation of functions
  • Add stereo.wav file
  • Allow diarization of 2-channel audio by comparing the energy of the signal in each channel for each segment
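A minimal sketch of energy-based channel assignment is shown below. It assumes the two channels are available as numeric sample vectors at a known sample rate (16 kHz here) and that segment boundaries are in seconds. assign_channel_energy is a hypothetical helper, not the package's implementation.

```r
# Hypothetical sketch: assign each segment to the channel with the higher signal energy
assign_channel_energy <- function(left, right, from, to, sample_rate = 16000) {
  vapply(seq_along(from), function(i) {
    # 1-based sample indices covered by segment i, clipped to the signal length
    first <- floor(from[i] * sample_rate) + 1
    last <- min(length(left), floor(to[i] * sample_rate))
    idx <- first:last
    # Energy = sum of squared sample values within the segment
    if (sum(left[idx]^2) >= sum(right[idx]^2)) "left" else "right"
  }, character(1))
}

sr <- 16000
left <- c(rep(0.8, sr), rep(0.05, sr))   # left speaker is loud in the first second
right <- c(rep(0.05, sr), rep(0.8, sr))  # right speaker is loud in the second second
assign_channel_energy(left, right, from = c(0, 1), to = c(1, 2), sample_rate = sr)
```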

CHANGES IN audio.whisper VERSION 0.3.2

  • Document the arguments of predict.whisper
  • Add option to download quantised models
    • tiny-q5_1, tiny.en-q5_1
    • base-q5_1, base.en-q5_1
    • small-q5_1, small.en-q5_1
    • medium-q5_0, medium.en-q5_0
    • large-v2-q5_0 and large-v3-q5_0
  • Allow disabling the printing of the transcription progress during prediction with the trace argument
  • Enable O3 optimisations by default
  • Allow speeding up transcriptions on Linux by compiling with cuBLAS against CUDA
    • if you have a GPU with CUDA, run Sys.setenv(WHISPER_CUBLAS = "1") before installing the package

CHANGES IN audio.whisper VERSION 0.3.1

  • Makevars
    • Added detection of AVX512F to add the corresponding compilation flags to PKG_CFLAGS/PKG_CPPFLAGS
    • Enable Metal to speed up transcriptions on the GPU on Mac
    • Enable compiling with OpenBLAS to speed up the transcriptions
  • Add whisper_languages to get a data.frame of all languages the Whisper model can handle
  • whisper_download_model
    • change the default timeout to 10 minutes if no timeout is set by the user
    • the output list now contains the element 'download_success' instead of 'download_failed'
    • model_dir now defaults to the directory set in the environment variable WHISPER_MODEL_DIR or, if this is not set, to the current working directory
  • whisper
    • Add option use_gpu to run the prediction on a GPU (e.g. Metal)
  • predict.whisper
    • Add option to pass an initial prompt
    • The params list element of the output of predict.whisper now includes the duration of the wav file in seconds
    • Gains an extra argument indicating whether to transcribe or translate
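The documented default for model_dir can be expressed as shown below. default_model_dir is a hypothetical helper that only mirrors the rule described in this entry. It is not the package's code, and treating an empty value of WHISPER_MODEL_DIR as unset is an assumption.

```r
# Hypothetical helper mirroring the documented default of model_dir:
# use WHISPER_MODEL_DIR when it is set (assumed: and non-empty), else the working directory
default_model_dir <- function() {
  dir <- Sys.getenv("WHISPER_MODEL_DIR", unset = "")
  if (nzchar(dir)) dir else getwd()
}

Sys.setenv(WHISPER_MODEL_DIR = tempdir())
default_model_dir()   # the temporary directory
Sys.unsetenv("WHISPER_MODEL_DIR")
default_model_dir()   # the current working directory
```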

CHANGES IN audio.whisper VERSION 0.3

  • Upgrade to whisper.cpp version v1.5.4
  • whisper_download_model allows downloading 'large-v1', 'large-v2' and 'large-v3'; model 'large' should no longer be used

CHANGES IN audio.whisper VERSION 0.2.2

CHANGES IN audio.whisper VERSION 0.2.1-1

  • whisper_download_model now deprecates downloading from https://ggml.ggerganov.com and downloads models from Hugging Face instead (Issue #18)

CHANGES IN audio.whisper VERSION 0.2.1

  • Add option to compile with custom PKG_CFLAGS by setting the environment variable WHISPER_CFLAGS
  • Add option to compile with extra PKG_CPPFLAGS by setting the environment variable WHISPER_CPPFLAGS

CHANGES IN audio.whisper VERSION 0.2.0

  • Incorporate whisper.cpp version v1.2.1

CHANGES IN audio.whisper VERSION 0.1.3

  • Ongoing work on improving compilation instructions to speed up transcription while remaining CRAN compliant
  • Add whisper_benchmark

CHANGES IN audio.whisper VERSION 0.1.2

CHANGES IN audio.whisper VERSION 0.1.1

CHANGES IN audio.whisper VERSION 0.1.0