How to Run Whisper in Node.js With Word-Level Timestamps
Daniel Hayes
Full-Stack Engineer · Leapcell
To use OpenAI's Whisper model for speech recognition with word-level timestamps in Node.js, you can use the nodejs-whisper package. It provides Node.js bindings for the Whisper model and supports word-level timestamps.
Key Takeaways
- The nodejs-whisper package enables easy word-level timestamp integration in Node.js.
- Word-level timestamps give finer timing precision for subtitles and analysis.
- Whisper offers flexible model and output options to suit different project needs.
Step-by-Step Guide
- Install Build Tools:
  Ensure that your system has the necessary build tools installed. On Debian-based systems, you can install them using:

      sudo apt update
      sudo apt install build-essential

  For Windows users, installing MinGW-w64 or MSYS2 is recommended. After installation, ensure that mingw32-make or make is available in your system's PATH.
- Install the nodejs-whisper Package:
  Use npm to install the package:

      npm install nodejs-whisper
- Download the Whisper Model:
  After installing the package, download the desired Whisper model:

      npx nodejs-whisper download

  The available models include:
  - tiny
  - tiny.en
  - base
  - base.en
  - small
  - small.en
  - medium
  - medium.en
  - large-v1
  - large
  - large-v3-turbo

  Choose a model that balances speed and accuracy for your requirements: smaller models such as tiny and base run faster, while larger ones are generally more accurate.
- Transcribe Audio with Word-Level Timestamps:
  Create a JavaScript or TypeScript file (e.g., transcribe.js) and add the following code:

      const path = require('path');
      const { nodewhisper } = require('nodejs-whisper');

      // Provide the exact path to your audio file
      const filePath = path.resolve(__dirname, 'YourAudioFileName.wav');

      (async () => {
        await nodewhisper(filePath, {
          modelName: 'base.en', // Specify the downloaded model name
          autoDownloadModelName: 'base.en', // (Optional) Auto-download the model if not present
          removeWavFileAfterTranscription: false, // (Optional) Remove WAV file after transcription
          withCuda: false, // (Optional) Use CUDA for faster processing if available
          logger: console, // (Optional) Logging instance, defaults to console
          whisperOptions: {
            outputInCsv: false, // Output result in CSV file
            outputInJson: false, // Output result in JSON file
            outputInJsonFull: false, // Output result in JSON file with detailed information
            outputInLrc: false, // Output result in LRC file
            outputInSrt: true, // Output result in SRT file
            outputInText: false, // Output result in TXT file
            outputInVtt: false, // Output result in VTT file
            outputInWords: true, // Output result in WTS file for karaoke
            translateToEnglish: false, // Translate from source language to English
            wordTimestamps: true, // Enable word-level timestamps
            timestamps_length: 20, // Amount of dialogue per timestamp pair
            splitOnWord: true, // Split on word rather than on token
          },
        });
      })();

  Replace 'YourAudioFileName.wav' with the path to your audio file. This script will process the audio and generate an SRT file with word-level timestamps.
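Once the script has run, the SRT file can be read back to recover the timing of each cue. The sketch below is one hedged way to do that. `parseSrt` and `toSeconds` are hypothetical helper names, not part of nodejs-whisper, and the code assumes standard SRT layout (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the text). The output file name and how many words land in each cue depend on your options, so check the generated file first.

```javascript
// Hypothetical helpers for reading an SRT file; not part of nodejs-whisper.

// Convert an SRT timestamp such as "00:01:02,500" to seconds.
function toSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{1,3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Unrecognized timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms.padEnd(3, '0')) / 1000;
}

// Parse SRT text into an array of { start, end, text } cues.
function parseSrt(srtText) {
  return srtText
    .replace(/\r\n/g, '\n') // normalize Windows line endings
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => block.trim().split('\n'))
    .filter((lines) => lines.length >= 2)
    .map((lines) => {
      const timeIndex = lines.findIndex((line) => line.includes('-->'));
      if (timeIndex === -1) return null; // skip blocks without a timing line
      const [start, end] = lines[timeIndex].split('-->');
      return {
        start: toSeconds(start),
        end: toSeconds(end),
        text: lines.slice(timeIndex + 1).join(' ').trim(),
      };
    })
    .filter((cue) => cue !== null);
}
```

For example, `parseSrt(fs.readFileSync(srtPath, 'utf8'))` would return one entry per cue, where `srtPath` is a placeholder for wherever the SRT file was written on your system.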
Additional Notes
- Audio Format: The nodejs-whisper package automatically converts audio files to WAV format with a 16000 Hz sample rate, which the Whisper model requires.
- CUDA Support: If you have a compatible NVIDIA GPU and CUDA installed, you can set withCuda: true for faster processing.
- Output Formats: The whisperOptions allow you to specify various output formats, including CSV, JSON, LRC, SRT, TXT, VTT, and WTS. Adjust these options based on your needs.
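Because each output format is a separate boolean flag, it can be convenient to build the whisperOptions object from a list of format names. The following is a minimal sketch. `buildOutputOptions` and `FORMAT_FLAGS` are hypothetical names, not part of nodejs-whisper, and the flag names are simply those used in the transcription script in this article, so verify them against the version you have installed.

```javascript
// Hypothetical helper; not part of nodejs-whisper.
// Maps short format names to the whisperOptions flags used in this article.
const FORMAT_FLAGS = {
  csv: 'outputInCsv',
  json: 'outputInJson',
  jsonFull: 'outputInJsonFull',
  lrc: 'outputInLrc',
  srt: 'outputInSrt',
  txt: 'outputInText',
  vtt: 'outputInVtt',
  wts: 'outputInWords',
};

// Build a whisperOptions object with only the requested formats enabled.
// `extra` is merged last, so it can carry options such as wordTimestamps.
function buildOutputOptions(formats, extra = {}) {
  const options = {};
  for (const flag of Object.values(FORMAT_FLAGS)) {
    options[flag] = false; // start with every output format disabled
  }
  for (const format of formats) {
    if (!Object.prototype.hasOwnProperty.call(FORMAT_FLAGS, format)) {
      throw new Error(`Unknown output format: ${format}`);
    }
    options[FORMAT_FLAGS[format]] = true;
  }
  return { ...options, ...extra };
}
```

A call such as `buildOutputOptions(['srt', 'vtt'], { wordTimestamps: true, splitOnWord: true })` could then be passed as the `whisperOptions` value.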
FAQs
- To enable word-level timestamps, set wordTimestamps: true in the whisperOptions configuration.
- Word-level timestamps provide precise timing for each word, which is ideal for detailed transcription and subtitle alignment.
- Enabling word-level timestamps may slightly increase processing time due to the higher precision.
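To see what word-level timestamps cost on your own audio, you could time one run with wordTimestamps enabled and one without. The sketch below is a generic timing wrapper. `timeAsync` is a hypothetical helper, not part of nodejs-whisper, and the numbers it reports will vary with the model, hardware, and audio length.

```javascript
// Hypothetical helper: run an async function and report how long it took.
async function timeAsync(fn) {
  const startedAt = process.hrtime.bigint(); // high-resolution start time in nanoseconds
  const result = await fn();
  const elapsedMs = Number(process.hrtime.bigint() - startedAt) / 1e6;
  return { result, elapsedMs };
}
```

For instance, `await timeAsync(() => nodewhisper(filePath, options))` inside an async function would return the elapsed milliseconds for one transcription, with `options` standing in for whichever configuration you are comparing.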
Conclusion
By following these steps, you can effectively perform speech recognition with word-level timestamps in a Node.js environment using the Whisper model.