How to Run Whisper in Node.js With Word-Level Timestamps

To run OpenAI's Whisper speech recognition model with word-level timestamps in Node.js, you can use the nodejs-whisper package. It provides Node.js bindings for the Whisper model and supports word-level timestamps.

Key Takeaways

- The nodejs-whisper package makes it straightforward to get word-level timestamps in Node.js.
- Word-level timestamps give per-word timing, which improves timing precision for subtitles and analysis.
- Whisper offers several model sizes and output formats to suit different project needs.

Step-by-Step Guide

Install Build Tools:

Ensure that your system has the necessary build tools installed. On Debian-based systems, you can install them using:
```
sudo apt update
sudo apt install build-essential
```
For Windows users, installing MinGW-w64 or MSYS2 is recommended. After installation, ensure that mingw32-make or make is available in your system's PATH.
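
Before installing the package, it can help to confirm that a make binary is actually reachable from Node.js. The following is a minimal sketch, not part of nodejs-whisper: the helper name hasCommand is made up for illustration, and it assumes the tool accepts a --version flag (GNU make and mingw32-make do).

```javascript
const { spawnSync } = require('node:child_process');

// Hypothetical helper: true if `command --version` can be spawned and exits with 0.
function hasCommand(command) {
  const result = spawnSync(command, ['--version'], { stdio: 'ignore' });
  // `error` is set when the binary cannot be started (for example, not on PATH).
  return !result.error && result.status === 0;
}

const candidates = ['make', 'mingw32-make'];
const found = candidates.filter(hasCommand);

console.log(
  found.length > 0
    ? `Build tool found: ${found.join(', ')}`
    : 'No make binary found on PATH'
);
```

If the script reports that nothing was found, revisit the installation step above and check your PATH.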

Install the nodejs-whisper Package:

Use npm to install the package:

```
npm install nodejs-whisper
```

Download the Whisper Model:

After installing the package, download the desired Whisper model:
```
npx nodejs-whisper download
```
The available models include:
- tiny
- tiny.en
- base
- base.en
- small
- small.en
- medium
- medium.en
- large-v1
- large
- large-v3-turbo
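Choose a model that balances speed and accuracy for your requirements. Smaller models such as tiny and base run faster but are less accurate, larger models are slower but more accurate, and the .en variants are English-only.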
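
To catch a mistyped model name before a long transcription run, you could validate it against the list above. This is only a sketch: describeModel is a hypothetical helper, not part of nodejs-whisper, and it relies on Whisper's naming convention that a .en suffix marks an English-only model.

```javascript
// Model names copied from the list above.
const MODELS = [
  'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en',
  'medium', 'medium.en', 'large-v1', 'large', 'large-v3-turbo',
];

// Hypothetical helper: validates a model name and reports whether it is English-only.
function describeModel(name) {
  if (!MODELS.includes(name)) {
    throw new Error(`Unknown model "${name}". Expected one of: ${MODELS.join(', ')}`);
  }
  return { name, englishOnly: name.endsWith('.en') };
}

console.log(describeModel('base.en'));
console.log(describeModel('large-v3-turbo'));
```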

Transcribe Audio with Word-Level Timestamps:

Create a JavaScript or TypeScript file (e.g., transcribe.js) and add the following code:

```
const path = require('path');
const { nodewhisper } = require('nodejs-whisper');

// Provide the exact path to your audio file
const filePath = path.resolve(__dirname, 'YourAudioFileName.wav');

(async () => {
  await nodewhisper(filePath, {
    modelName: 'base.en', // Specify the downloaded model name
    autoDownloadModelName: 'base.en', // (Optional) Auto-download the model if not present
    removeWavFileAfterTranscription: false, // (Optional) Remove WAV file after transcription
    withCuda: false, // (Optional) Use CUDA for faster processing if available
    logger: console, // (Optional) Logging instance, defaults to console
    whisperOptions: {
      outputInCsv: false, // Output result in CSV file
      outputInJson: false, // Output result in JSON file
      outputInJsonFull: false, // Output result in JSON file with detailed information
      outputInLrc: false, // Output result in LRC file
      outputInSrt: true, // Output result in SRT file
      outputInText: false, // Output result in TXT file
      outputInVtt: false, // Output result in VTT file
      outputInWords: true, // Output result in WTS file for karaoke
      translateToEnglish: false, // Translate from source language to English
      wordTimestamps: true, // Enable word-level timestamps
      timestamps_length: 20, // Amount of dialogue per timestamp pair
      splitOnWord: true, // Split on word rather than on token
    },
  });
})();
```

Replace 'YourAudioFileName.wav' with the name of your audio file, then run the script with node transcribe.js. With the options above, it generates an SRT file with word-level timestamps and, because outputInWords is enabled, a WTS file as well.
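
Once the SRT file exists, you will often want the timings as data rather than as text. The sketch below parses standard SRT content into objects with start and end times in seconds. The function names parseSrt and srtTimeToSeconds are illustrative and not part of nodejs-whisper, and the sample string stands in for the contents of the generated file.

```javascript
// Convert an SRT timestamp such as "00:01:02,345" to seconds.
function srtTimeToSeconds(stamp) {
  const match = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(stamp.trim());
  if (!match) throw new Error(`Bad SRT timestamp: ${stamp}`);
  const [, h, m, s, ms] = match;
  return Number(h) * 3600 + Number(m) * 60 + Number(s) + Number(ms) / 1000;
}

// Parse SRT text into an array of { start, end, text } entries.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, '\n')
    .trim()
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => {
      const lines = block.split('\n');
      const timing = lines.findIndex((line) => line.includes('-->'));
      if (timing === -1) return null; // skip anything that is not a cue
      const [start, end] = lines[timing].split('-->');
      return {
        start: srtTimeToSeconds(start),
        end: srtTimeToSeconds(end),
        text: lines.slice(timing + 1).join(' ').trim(),
      };
    })
    .filter(Boolean);
}

// Sample input in standard SRT layout (made up for illustration).
const sample = `1
00:00:00,000 --> 00:00:00,500
Hello

2
00:00:00,500 --> 00:00:01,250
world
`;

console.log(parseSrt(sample));
```

In a real project you would read the generated file with fs.readFileSync and pass its contents to parseSrt; the exact output path is left out here because it depends on where your audio file lives.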

Additional Notes

- Audio Format: The nodejs-whisper package automatically converts audio files to WAV format with a 16,000 Hz (16 kHz) sample rate, which is what the Whisper model expects.
- CUDA Support: If you have a compatible NVIDIA GPU and CUDA installed, you can set withCuda: true for faster processing.
- Output Formats: The whisperOptions let you choose among several output formats, including CSV, JSON, LRC, SRT, TXT, VTT, and WTS. Enable only the ones you need.
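
If you switch output formats often, toggling eight booleans by hand gets error-prone. One possible approach is to build the flags from a short list of format names. The flag names below are the ones used in the script earlier in this article; the helper buildOutputFlags and the short format keys (such as jsonFull and wts) are assumptions made for this sketch.

```javascript
// Short format name -> whisperOptions flag, as used in the transcription script.
const FORMAT_FLAGS = new Map([
  ['csv', 'outputInCsv'],
  ['json', 'outputInJson'],
  ['jsonFull', 'outputInJsonFull'],
  ['lrc', 'outputInLrc'],
  ['srt', 'outputInSrt'],
  ['txt', 'outputInText'],
  ['vtt', 'outputInVtt'],
  ['wts', 'outputInWords'],
]);

// Hypothetical helper: every flag defaults to false, requested formats become true.
function buildOutputFlags(formats) {
  const flags = {};
  for (const flag of FORMAT_FLAGS.values()) flags[flag] = false;
  for (const format of formats) {
    const flag = FORMAT_FLAGS.get(format);
    if (!flag) throw new Error(`Unsupported format: ${format}`);
    flags[flag] = true;
  }
  return flags;
}

// Combine the generated flags with the word-level timestamp settings.
const whisperOptions = {
  ...buildOutputFlags(['srt', 'vtt']),
  wordTimestamps: true,
  splitOnWord: true,
};

console.log(whisperOptions);
```

The resulting object can be passed as the whisperOptions value in the call to nodewhisper.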

FAQs

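How do I enable word-level timestamps in Whisper?

Set wordTimestamps: true in the whisperOptions configuration.

What is the benefit of word-level timestamps?

They provide precise timing for each word, which is ideal for detailed transcription and subtitle alignment.

Does enabling word-level timestamps impact processing speed?

Yes. Enabling word-level timestamps may slightly increase processing time, because the output is timed at a finer granularity.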
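
As an illustration of subtitle alignment, per-word timings can be regrouped into readable caption lines. The sketch below assumes word entries shaped like { start, end, text } with times in seconds; that shape is an assumption for this example, not something nodejs-whisper returns directly. The helper name groupWords and the default limits (32 characters, 0.8 seconds of silence) are arbitrary choices.

```javascript
// Group word-level entries into caption cues.
// A new cue starts when the line would exceed maxChars
// or when the pause before the next word is longer than maxGap seconds.
function groupWords(words, maxChars = 32, maxGap = 0.8) {
  const cues = [];
  let current = null;
  for (const word of words) {
    const text = word.text.trim();
    if (!text) continue; // ignore empty entries
    const fits =
      current !== null &&
      current.text.length + 1 + text.length <= maxChars &&
      word.start - current.end <= maxGap;
    if (fits) {
      current.text += ` ${text}`;
      current.end = word.end;
    } else {
      current = { start: word.start, end: word.end, text };
      cues.push(current);
    }
  }
  return cues;
}

// Made-up word timings for illustration.
const words = [
  { start: 0, end: 0.4, text: 'Hello' },
  { start: 0.4, end: 0.9, text: 'there' },
  { start: 2.5, end: 3, text: 'friend' },
];

console.log(groupWords(words, 20, 0.8));
```

Each cue keeps the start time of its first word and the end time of its last word, so the grouped captions stay aligned with the audio.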

Conclusion

By following these steps, you can effectively perform speech recognition with word-level timestamps in a Node.js environment using the Whisper model.

We are Leapcell, your top choice for hosting Node.js projects.

Leapcell is the Next-Gen Serverless Platform for Web Hosting, Async Tasks, and Redis:

Multi-Language Support

Develop with Node.js, Python, Go, or Rust.

Deploy unlimited projects for free

pay only for usage — no requests, no charges.

Unbeatable Cost Efficiency

Pay-as-you-go with no idle charges.
Example: $25 supports 6.94M requests at a 60ms average response time.

Streamlined Developer Experience

Intuitive UI for effortless setup.
Fully automated CI/CD pipelines and GitOps integration.
Real-time metrics and logging for actionable insights.

Effortless Scalability and High Performance

Auto-scaling to handle high concurrency with ease.
Zero operational overhead — just focus on building.

Explore more in the Documentation!
