Skip to content

whisper

stable

Audio signal processing utilities for preparing raw PCM audio. Provides sample format conversion, resampling, energy analysis, windowing, and framing — useful for feeding audio into speech recognition pipelines.

use plugin whisper::{pcm_i16_to_f32, f32_to_pcm_i16, resample_linear, …}
9 functions Audio & Media
/ filter jk navigate Esc clear
Functions (9)
  1. pcm_i16_to_f32 Convert raw i16 PCM bytes to normalized float samples
  2. f32_to_pcm_i16 Convert normalized float samples back to i16 PCM bytes
  3. resample_linear Resample audio to a different sample rate
  4. rms_energy Compute root-mean-square energy of a sample buffer
  5. is_silence Check if a sample buffer is below an energy threshold
  6. zero_crossing_rate Compute zero-crossing rate of a sample buffer
  7. window_hann Generate a Hann window of the given size
  8. apply_window Multiply samples element-wise by a window
  9. frame_audio Split a sample buffer into overlapping frames

Convert raw i16 PCM bytes to normalized float samples

Converts raw little-endian 16-bit PCM bytes to a table of normalized float samples in the range [-1.0, 1.0]. The byte slice length must be even.

use plugin whisper::{pcm_i16_to_f32}

let raw = file.read_bytes("audio.pcm")
let samples = pcm_i16_to_f32(raw)
print("samples: {#samples}")

Convert normalized float samples back to i16 PCM bytes

Converts a table of normalized float samples back to little-endian 16-bit PCM bytes. Values are clamped to [-1.0, 1.0] before conversion.

use plugin whisper::{pcm_i16_to_f32, f32_to_pcm_i16, resample_linear}

let raw = file.read_bytes("audio.pcm")
let samples = pcm_i16_to_f32(raw)
let resampled = resample_linear(samples, 44100.0, 16000.0)
let out = f32_to_pcm_i16(resampled)
file.write_bytes("audio_16k.pcm", out)

Resample audio to a different sample rate

Resamples a float sample buffer from from_rate Hz to to_rate Hz using linear interpolation. Returns a new sample table at the target rate.

use plugin whisper::{pcm_i16_to_f32, resample_linear}

let samples = pcm_i16_to_f32(file.read_bytes("mic.pcm"))
let s16k = resample_linear(samples, 48000.0, 16000.0)
print("resampled to {#s16k} samples")

Compute root-mean-square energy of a sample buffer

Computes the root-mean-square energy of the sample buffer. Returns 0.0 for an empty buffer. Use the result to gauge loudness or detect voice activity.

use plugin whisper::{pcm_i16_to_f32, rms_energy, is_silence}

let samples = pcm_i16_to_f32(file.read_bytes("chunk.pcm"))
let energy = rms_energy(samples)
print("RMS energy: {energy}")

Check if a sample buffer is below an energy threshold

Returns true if the RMS energy of samples is below threshold. Use with a threshold around 0.010.05 for typical microphone input to skip silent frames.

use plugin whisper::{pcm_i16_to_f32, is_silence}

let samples = pcm_i16_to_f32(chunk)
if is_silence(samples, 0.02) {
  print("silent frame, skipping")
}

Compute zero-crossing rate of a sample buffer

Returns the fraction of consecutive sample pairs that cross the zero axis. Higher values indicate noise or unvoiced fricatives; lower values indicate voiced or silent audio.

use plugin whisper::{pcm_i16_to_f32, zero_crossing_rate}

let samples = pcm_i16_to_f32(chunk)
let zcr = zero_crossing_rate(samples)
print("ZCR: {zcr}")

Generate a Hann window of the given size

Generates a Hann (raised cosine) window of size floating-point coefficients. Use with apply_window before computing an FFT or feeding frames into a model.

use plugin whisper::{window_hann, apply_window}

let win = window_hann(512)

Multiply samples element-wise by a window

Multiplies each sample by the corresponding window coefficient. samples and window must have the same length. Returns the windowed sample table.

use plugin whisper::{window_hann, apply_window}

let win = window_hann(512)
let windowed = apply_window(frame, win)

Split a sample buffer into overlapping frames

Splits the sample buffer into overlapping frames of frame_size samples, advancing by hop_size samples between frames. Returns a 1-indexed table of frames, each itself a table of float samples.

use plugin whisper::{pcm_i16_to_f32, resample_linear, frame_audio, window_hann, apply_window}

let samples = resample_linear(pcm_i16_to_f32(file.read_bytes("speech.pcm")), 44100.0, 16000.0)
let frames = frame_audio(samples, 400, 160)
let win = window_hann(400)
print("frames: {#frames}")
enespt-br