whisper
stable
Audio signal processing utilities for preparing raw PCM audio. Provides sample format conversion, resampling, energy analysis, windowing, and framing, all useful for feeding audio into speech recognition pipelines.
use plugin whisper::{pcm_i16_to_f32, f32_to_pcm_i16, resample_linear, …}
Functions (9)
- pcm_i16_to_f32: Convert raw i16 PCM bytes to normalized float samples
- f32_to_pcm_i16: Convert normalized float samples back to i16 PCM bytes
- resample_linear: Resample audio to a different sample rate
- rms_energy: Compute root-mean-square energy of a sample buffer
- is_silence: Check if a sample buffer is below an energy threshold
- zero_crossing_rate: Compute zero-crossing rate of a sample buffer
- window_hann: Generate a Hann window of the given size
- apply_window: Multiply samples element-wise by a window
- frame_audio: Split a sample buffer into overlapping frames
pcm_i16_to_f32
Convert raw i16 PCM bytes to normalized float samples
Converts raw little-endian 16-bit PCM bytes to a table of normalized float samples in the range [-1.0, 1.0]. Each sample occupies two bytes, so the byte slice length must be even.
use plugin whisper::{pcm_i16_to_f32}
let raw = file.read_bytes("audio.pcm")
let samples = pcm_i16_to_f32(raw)
print("samples: {#samples}")
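The decoding step can be illustrated outside the plugin with a minimal Python sketch. This is not the plugin's implementation: the name pcm_i16_to_f32_sketch and the divisor 32768 are assumptions, since this page does not state which scale factor the plugin uses.

```python
import struct


def pcm_i16_to_f32_sketch(raw):
    """Decode little-endian signed 16-bit PCM bytes into a list of floats.

    Assumption: samples are scaled by 1/32768. The plugin may use a
    different divisor (for example 32767).
    """
    if len(raw) % 2 != 0:
        raise ValueError("byte length must be even: each sample is 2 bytes")
    count = len(raw) // 2
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    ints = struct.unpack(f"<{count}h", raw)
    return [value / 32768.0 for value in ints]
```

With this divisor the most negative sample maps to exactly -1.0, while the most positive one lands just below 1.0.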
f32_to_pcm_i16
Convert normalized float samples back to i16 PCM bytes
Converts a table of normalized float samples back to little-endian 16-bit PCM bytes. Values are clamped to [-1.0, 1.0] before conversion.
use plugin whisper::{pcm_i16_to_f32, f32_to_pcm_i16, resample_linear}
let raw = file.read_bytes("audio.pcm")
let samples = pcm_i16_to_f32(raw)
let resampled = resample_linear(samples, 44100.0, 16000.0)
let out = f32_to_pcm_i16(resampled)
file.write_bytes("audio_16k.pcm", out)
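The clamp-then-encode behaviour described above might look roughly like the following Python sketch. The name f32_to_pcm_i16_sketch, the scale factor 32767, and the use of rounding are all assumptions; the plugin's exact scaling and rounding mode are not documented on this page.

```python
import struct


def f32_to_pcm_i16_sketch(samples):
    """Encode floats as little-endian signed 16-bit PCM bytes.

    Assumption: clamped samples are scaled by 32767 and rounded to the
    nearest integer. The plugin may scale or round differently.
    """
    ints = []
    for sample in samples:
        # Clamp to [-1.0, 1.0] so out-of-range input cannot overflow i16.
        clamped = max(-1.0, min(1.0, sample))
        ints.append(int(round(clamped * 32767.0)))
    return struct.pack(f"<{len(ints)}h", *ints)
```

Clamping before scaling means an input of 2.0 encodes to the same bytes as 1.0 rather than wrapping around.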
resample_linear
Resample audio to a different sample rate
Resamples a float sample buffer from from_rate Hz to to_rate Hz using linear interpolation. Returns a new sample table at the target rate.
use plugin whisper::{pcm_i16_to_f32, resample_linear}
let samples = pcm_i16_to_f32(file.read_bytes("mic.pcm"))
let s16k = resample_linear(samples, 48000.0, 16000.0)
print("resampled to {#s16k} samples")
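Linear-interpolation resampling can be sketched in a few lines of Python. The name resample_linear_sketch, the output-length rule, and the edge handling are assumptions made for illustration; the plugin may compute the output length or treat the final sample differently. Note that this sketch applies no low-pass filter before downsampling.

```python
def resample_linear_sketch(samples, from_rate, to_rate):
    """Resample by linearly interpolating between neighbouring samples.

    Assumptions: the output has floor(len * to_rate / from_rate) samples,
    and positions past the last input sample reuse that last sample.
    """
    if not samples or from_rate <= 0 or to_rate <= 0:
        return []
    out_len = int(len(samples) * to_rate / from_rate)
    step = from_rate / to_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two nearest input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

Under these assumptions, one second of 48000 Hz input resampled to 16000 Hz produces 16000 samples.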
rms_energy
Compute root-mean-square energy of a sample buffer
Computes the root-mean-square energy of the sample buffer. Returns 0.0 for an empty buffer. Use the result to gauge loudness or detect voice activity.
use plugin whisper::{pcm_i16_to_f32, rms_energy}
let samples = pcm_i16_to_f32(file.read_bytes("chunk.pcm"))
let energy = rms_energy(samples)
print("RMS energy: {energy}")
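For reference, root-mean-square energy is the square root of the mean of the squared samples. A minimal Python sketch follows; the name rms_energy_sketch is hypothetical, and the empty-buffer case mirrors the behaviour documented above.

```python
import math


def rms_energy_sketch(samples):
    """Return sqrt(mean(s^2)) over the buffer, or 0.0 if it is empty."""
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square)
```

Because the samples are squared, the sign of each sample does not matter: a buffer alternating between 0.5 and -0.5 has the same RMS energy as a constant 0.5.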
is_silence
Check if a sample buffer is below an energy threshold
Returns true if the RMS energy of samples is below threshold. Use with a threshold around 0.01–0.05 for typical microphone input to skip silent frames.
use plugin whisper::{pcm_i16_to_f32, is_silence}
let chunk = file.read_bytes("chunk.pcm")
let samples = pcm_i16_to_f32(chunk)
if is_silence(samples, 0.02) {
print("silent frame, skipping")
}
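The silence check amounts to comparing RMS energy against the threshold. The Python sketch below shows that comparison and one way to use it as a gate over a list of chunks. The names is_silence_sketch and drop_silent_chunks are hypothetical, and the strict less-than comparison is an assumption based on the wording "below threshold".

```python
import math


def is_silence_sketch(samples, threshold):
    """Return True when the buffer's RMS energy is strictly below threshold."""
    if samples:
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    else:
        rms = 0.0  # an empty buffer has RMS energy 0.0
    return rms < threshold


def drop_silent_chunks(chunks, threshold=0.02):
    """Keep only the chunks that are not classified as silent."""
    return [chunk for chunk in chunks if not is_silence_sketch(chunk, threshold)]
```

Gating like this before recognition avoids spending work on frames that contain no speech.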
zero_crossing_rate
Compute zero-crossing rate of a sample buffer
Returns the fraction of consecutive sample pairs that cross the zero axis, a value between 0.0 and 1.0. Higher values typically indicate noise or unvoiced fricatives; lower values typically indicate voiced or silent audio.
use plugin whisper::{pcm_i16_to_f32, zero_crossing_rate}
let chunk = file.read_bytes("chunk.pcm")
let samples = pcm_i16_to_f32(chunk)
let zcr = zero_crossing_rate(samples)
print("ZCR: {zcr}")
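Counting zero crossings can be sketched in Python as follows. The name zero_crossing_rate_sketch is hypothetical, and two details are assumptions: a sample equal to zero is treated as non-negative, and a buffer with fewer than two samples yields 0.0. The plugin may handle either case differently.

```python
def zero_crossing_rate_sketch(samples):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumptions: zero counts as non-negative, and buffers with fewer
    than two samples return 0.0.
    """
    if len(samples) < 2:
        return 0.0
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev >= 0.0) != (cur >= 0.0):
            crossings += 1
    # There are len(samples) - 1 adjacent pairs in total.
    return crossings / (len(samples) - 1)
```

A buffer that alternates sign on every sample reaches the maximum rate of 1.0, while a buffer that never changes sign gives 0.0.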
window_hann
Generate a Hann window of the given size
Generates a Hann (raised cosine) window as a table of floating-point coefficients whose length is given by the size argument. Use with apply_window before computing an FFT or feeding frames into a model.
use plugin whisper::{window_hann, apply_window}
let win = window_hann(512)
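A Hann window follows a raised-cosine curve that starts near zero, peaks in the middle, and falls back toward zero. The Python sketch below uses the symmetric form; whether the plugin uses the symmetric or the periodic form is not stated here, so that choice, along with the name window_hann_sketch and the handling of sizes 0 and 1, is an assumption.

```python
import math


def window_hann_sketch(size):
    """Symmetric Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (size - 1))).

    Assumption: the symmetric form is used. The periodic variant divides
    by size instead of size - 1.
    """
    if size <= 0:
        return []
    if size == 1:
        return [1.0]  # avoids dividing by zero for a one-point window
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * n / (size - 1)))
        for n in range(size)
    ]
```

In the symmetric form the first and last coefficients are zero and the window is mirror-symmetric around its centre.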
apply_window
Multiply samples element-wise by a window
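Multiplies each sample by the corresponding window coefficient. samples and window must have the same length. Returns the windowed sample table. In the example, frame stands for an existing 512-sample table, such as one frame produced by frame_audio.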
use plugin whisper::{window_hann, apply_window}
let win = window_hann(512)
let windowed = apply_window(frame, win)
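The element-wise multiplication can be sketched in Python as shown below. The name apply_window_sketch is hypothetical, and raising an error on a length mismatch is an assumption; this page only says the lengths must match, not how the plugin reports a violation.

```python
def apply_window_sketch(samples, window):
    """Multiply each sample by the window coefficient at the same index."""
    if len(samples) != len(window):
        # Assumption: a mismatch is reported as an error.
        raise ValueError("samples and window must have the same length")
    return [s * w for s, w in zip(samples, window)]
```

The input tables are left unchanged; the result is a new list of the same length.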
frame_audio
Split a sample buffer into overlapping frames
Splits the sample buffer into overlapping frames of frame_size samples, advancing by hop_size samples between frames. Returns a 1-indexed table of frames, each itself a table of float samples.
use plugin whisper::{pcm_i16_to_f32, resample_linear, frame_audio, window_hann, apply_window}
let samples = resample_linear(pcm_i16_to_f32(file.read_bytes("speech.pcm")), 44100.0, 16000.0)
let frames = frame_audio(samples, 400, 160)
let win = window_hann(400)
print("frames: {#frames}")
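Framing with a hop can be sketched in Python as follows. The name frame_audio_sketch is hypothetical, and dropping a trailing partial frame (rather than zero-padding it) is an assumption, since this page does not describe how the plugin treats leftover samples. Python lists are 0-indexed, unlike the 1-indexed table the plugin returns.

```python
def frame_audio_sketch(samples, frame_size, hop_size):
    """Split samples into frames of frame_size, advancing hop_size each time.

    Assumption: a trailing partial frame is dropped rather than padded.
    """
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    frames = []
    start = 0
    while start + frame_size <= len(samples):
        frames.append(samples[start:start + frame_size])
        start += hop_size
    return frames
```

Under this assumption, one second of 16000 Hz audio with frame_size 400 and hop_size 160 (25 ms frames every 10 ms) yields 98 frames, and consecutive frames overlap by 240 samples.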