whisper-large-v3 Adapter¶
Transcribes speech audio to text across 99 languages using openai/whisper-large-v3.
Model details¶
| Field | Value |
|---|---|
| Model | openai/whisper-large-v3 |
| Task | transcribe |
| Domain | audio, multilingual |
| License | Apache 2.0 |
Install¶
pip install synapse-adapter-sdk
pip install transformers torch
Verified output schema¶
The transformers automatic-speech-recognition pipeline returns a single dict:
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
result = pipe("audio.mp3")
# {"text": " And so my fellow Americans..."}
The adapter strips leading/trailing whitespace and sets payload.content to the transcription string, replacing the original audio reference. Provenance confidence is fixed at 1.0.
Audio input formats¶
payload.content carries the audio input. The transformers pipeline accepts:
| Format | Example |
|---|---|
| File path string | "/data/earnings_call.mp3" |
| NumPy float32 array at 16 kHz | np.array([...], dtype=np.float32) |
| Dict with array and sampling rate | {"array": np.ndarray, "sampling_rate": 16000} |
Common audio formats supported: MP3, WAV, FLAC, OGG, M4A.
Supported task types¶
transcribe
Supported domains¶
audiomultilingual
Usage example¶
import time
from transformers import pipeline
from whisper_large_v3_adapter import WhisperLargeV3Adapter
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
adapter = WhisperLargeV3Adapter()
# 1. Prepare model input — payload.content holds the audio reference
model_input = adapter.ingress(ir)
# {"audio": "/data/earnings_call.mp3"}
# 2. Run the model (caller's responsibility)
t0 = time.monotonic()
model_output = pipe(model_input["audio"])
latency_ms = int((time.monotonic() - t0) * 1000)
# {"text": " Good morning. Revenue for Q3 came in at..."}
# 3. Convert output back to canonical IR
result_ir = adapter.egress(model_output, ir, latency_ms=latency_ms)
# 4. Access the transcription — original audio reference is REPLACED
transcript = result_ir.payload.content
# "Good morning. Revenue for Q3 came in at..."
Whisper large-v3 automatically detects the spoken language — no language tag is required in the IR. To force a specific language or enable translation to English, pass generate_kwargs to the pipeline (caller's responsibility, outside the adapter contract).