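I use the following two-step process to transcribe videos to text. First, ffmpeg extracts the audio as a 16 kHz, mono, 16-bit PCM WAV file, which is the input format whisper.cpp expects. Then whisper-cli transcribes that file: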

ffmpeg -i video1570237543.mp4 \
       -ar 16000 \
       -ac 1 \
       -c:a pcm_s16le \
       video1570237543.wav
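The flags -ar 16000, -ac 1 and -c:a pcm_s16le are what make the WAV usable by whisper.cpp. Below is a minimal sketch for double-checking a converted file with ffprobe, which ships with ffmpeg. It assumes the file's first audio stream is the one to check. The function name check_wav is hypothetical, and "16000,1" is the sample_rate,channels line that ffprobe is expected to print for such a file.

```shell
# Hypothetical helper: succeed only if the first audio stream of a WAV file
# is 16 kHz mono, as produced by the ffmpeg command above.
check_wav() {
  local wav="$1"
  local info
  # Prints "sample_rate,channels" for the first audio stream, e.g. "16000,1".
  info=$(ffprobe -v error -select_streams a:0 \
           -show_entries stream=sample_rate,channels \
           -of csv=p=0 "$wav") || return 1
  if [ "$info" = "16000,1" ]; then
    return 0
  fi
  echo "unexpected audio format in $wav: $info" >&2
  return 1
}
```

Usage: check_wav video1570237543.wav && echo "ready for whisper-cli"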

whisper-cli -m ./models/whisper/ggml-large-v3-q5_0.bin \
            -f video1570237543.wav \
            --max-context 0 \
            --entropy-thold 2.8 \
            --output-txt --output-file video1570237543
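The two commands can be wrapped in one shell function so that any video goes through the same steps. This is a sketch, not a tested tool. The function name transcribe and the MODEL variable are hypothetical, and it assumes the model sits at the path used above. It also adds ffmpeg's -y flag so that re-running it overwrites an existing WAV instead of prompting.

```shell
# Hypothetical wrapper: video.mp4 -> video.wav -> video.txt
MODEL=${MODEL:-./models/whisper/ggml-large-v3-q5_0.bin}

transcribe() {
  local video="$1"
  local base="${video%.*}"   # strip the extension: video123.mp4 -> video123

  # Extract 16 kHz mono 16-bit PCM audio; -y overwrites an existing WAV.
  ffmpeg -y -i "$video" -ar 16000 -ac 1 -c:a pcm_s16le "$base.wav" || return 1

  # --output-file takes a name without extension; --output-txt appends ".txt".
  whisper-cli -m "$MODEL" \
              -f "$base.wav" \
              --max-context 0 \
              --entropy-thold 2.8 \
              --output-txt --output-file "$base"
}
```

Usage: transcribe video1570237543.mp4 leaves video1570237543.wav and video1570237543.txt next to the video.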

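Some videos are in Hebrew, because I work at an Israeli company. For those, I add --language he --translate to the whisper-cli invocation. --language he tells whisper the spoken language, and --translate makes it write an English translation rather than a Hebrew transcript: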

whisper-cli \
  --model ./models/whisper/ggml-large-v3-q5_0.bin \
  --language he --translate \
  --max-context 0 \
  --output-txt --output-file video1570237543 \
  ./video1570237543.wav
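To avoid keeping two variants of the command, a small helper can add --language and --translate only when the audio is not English. This is a minimal sketch that uses just the options the two invocations share. The function name transcribe_wav, its optional language argument (default "en"), and the MODEL variable are hypothetical, and it assumes the same model path as above.

```shell
# Hypothetical helper: transcribe a 16 kHz WAV file with whisper-cli.
# Optional 2nd argument: spoken language code (default "en"). For any other
# language, --translate makes whisper write an English translation.
MODEL=${MODEL:-./models/whisper/ggml-large-v3-q5_0.bin}

transcribe_wav() {
  local wav="$1"
  local lang="${2:-en}"
  # Build the argument list; --output-file takes a name without extension.
  set -- -m "$MODEL" -f "$wav" --max-context 0 \
         --output-txt --output-file "${wav%.wav}"
  if [ "$lang" != "en" ]; then
    set -- "$@" --language "$lang" --translate
  fi
  whisper-cli "$@"
}
```

Usage: transcribe_wav video1570237543.wav he for a Hebrew recording, or transcribe_wav video1570237543.wav for an English one.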

Prerequisites

brew install ffmpeg whisper-cpp
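The whisper-cpp formula is what provides the whisper-cli command used above. Here is a minimal sketch for checking which required commands are still missing from PATH. The function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print each given command that is not on PATH,
# one per line; print nothing if all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

Usage: missing_tools ffmpeg whisper-cli prints nothing once both packages are installed.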

Download the model:

mkdir -p ./models/whisper
curl -L -o ./models/whisper/ggml-large-v3-q5_0.bin \
  "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-q5_0.bin?download=true"
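Without --fail, curl writes whatever the server returns, so a failed download can leave a small error page saved under the model's file name. Below is a minimal sanity check, assuming a real model file is far larger than the 100 MB threshold chosen here, which is an arbitrary lower bound rather than the model's actual size. The function name model_ok is hypothetical.

```shell
# Hypothetical check: succeed only if the file exists and is at least
# min_bytes long (default 100000000 bytes, an arbitrary lower bound).
model_ok() {
  local path="$1"
  local min_bytes="${2:-100000000}"
  local size
  [ -f "$path" ] || return 1
  # wc -c pads its output with spaces on macOS; strip them before comparing.
  size=$(wc -c < "$path" | tr -d ' ') || return 1
  [ "$size" -ge "$min_bytes" ]
}
```

Usage: model_ok ./models/whisper/ggml-large-v3-q5_0.bin || echo "model download looks incomplete"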