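I use the following two-step process to transcribe videos to text: extract the audio as a 16 kHz, mono, 16-bit PCM WAV file with ffmpeg, then transcribe that file with whisper-cli: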
ffmpeg -i video1570237543.mp4 \
-ar 16000 \
-ac 1 \
-c:a pcm_s16le \
video1570237543.wav
whisper-cli -m ./models/whisper/ggml-large-v3-q5_0.bin \
-f video1570237543.wav \
--max-context 0 \
--entropy-thold 2.8 \
--output-txt --output-file video1570237543
Some of the videos are in Hebrew, because I work at an Israeli company. For those, I add --language he --translate to the whisper-cli invocation, so the output is an English translation:
whisper-cli \
--model ./models/whisper/ggml-large-v3-q5_0.bin \
--language he --translate \
--max-context 0 \
--output-txt --output-file video1570237543 \
./video1570237543.wav
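The two steps can be wrapped in a small shell function so the file name is typed only once. This is a minimal sketch, not part of the original workflow: `transcribe`, `run`, `MODEL`, and `DRY_RUN` are hypothetical names introduced here. It passes the stem without an extension to --output-file, because whisper-cli appends .txt itself. It also keeps --entropy-thold 2.8 for every run, which is an assumption, since the Hebrew invocation above omits it.

```shell
#!/usr/bin/env bash
# Sketch only: transcribe, run, MODEL and DRY_RUN are hypothetical names.

# Model path from the commands above; override by setting MODEL beforehand.
MODEL="${MODEL:-./models/whisper/ggml-large-v3-q5_0.bin}"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: transcribe INPUT [extra whisper-cli flags...]
transcribe() {
  if [ "$#" -lt 1 ]; then
    echo "usage: transcribe INPUT [whisper-cli flags...]" >&2
    return 2
  fi
  local input="$1"
  local base="${input%.*}"   # strip the last extension, e.g. video.mp4 -> video
  shift

  # Step 1: extract 16 kHz mono 16-bit PCM audio.
  run ffmpeg -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$base.wav" || return 1

  # Step 2: transcribe; whisper-cli appends .txt to the --output-file value.
  run whisper-cli -m "$MODEL" -f "$base.wav" \
    --max-context 0 --entropy-thold 2.8 \
    --output-txt --output-file "$base" "$@"
}
```

After sourcing the file, `transcribe video1570237543.mp4` covers the plain case and `transcribe video1570237543.mp4 --language he --translate` covers the Hebrew one. Prefixing either with `DRY_RUN=1` prints the two commands without running them.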
Prerequisites
Install ffmpeg and whisper.cpp with Homebrew:
brew install ffmpeg whisper-cpp
Download the quantized large-v3 model:
mkdir -p ./models/whisper
curl -L -o ./models/whisper/ggml-large-v3-q5_0.bin \
"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-q5_0.bin?download=true"
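Before the first run, it can help to confirm that both tools are on the PATH and that the model download did not leave an empty file. The check below is a rough sketch under that assumption; `missing_tools` and `model_ready` are hypothetical helper names, and the check does not verify the model's contents, only that the file exists and is non-empty.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: succeed only if the file exists and is non-empty.
model_ready() {
  [ -s "$1" ]
}

missing="$(missing_tools ffmpeg whisper-cli)"
if [ -n "$missing" ]; then
  echo "missing tools:" $missing >&2
fi

if ! model_ready ./models/whisper/ggml-large-v3-q5_0.bin; then
  echo "model file not found or empty" >&2
fi
```

If nothing is printed, the prerequisites are in place.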