Speech to Text
Learn how to turn audio into text.
Overview
The Audio API provides two speech-to-text endpoints, transcriptions and translations, based on our state-of-the-art Whisper model.
Transcriptions
Transcribe audio into whatever language the audio is in.
Translations
Translate and transcribe the audio into English.
Transcriptions
The transcriptions API takes as input the audio file you want to transcribe and the desired output format for the transcription.
Transcribe audio
import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
// Upload the audio file and request a transcription in its original language.
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/audio.mp3"),
  model: "whisper-1",
});
console.log(transcription.text);
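By default the endpoint returns JSON containing the transcript text. Other formats can be requested through the response_format parameter, which accepts json, text, srt, verbose_json, or vtt. A minimal sketch requesting SubRip subtitles:

import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
// With a non-JSON response_format, the SDK returns the raw string.
const srt = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/audio.mp3"),
  model: "whisper-1",
  response_format: "srt",
});
console.log(srt);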
Translations
The translations API takes audio in any supported language and transcribes it into English.
Translate audio
import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
// Translate the audio into English, regardless of the source language.
const translation = await openai.audio.translations.create({
  file: fs.createReadStream("/path/to/file/german.mp3"),
  model: "whisper-1",
});
console.log(translation.text);
Timestamps
Set the timestamp_granularities parameter to get structured, timestamped output at the segment or word level for precise transcripts and video edits.
Timestamp options
import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  // Timestamp granularities require the verbose_json response format.
  response_format: "verbose_json",
  timestamp_granularities: ["word"],
});
console.log(transcription.words);
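Segment-level timestamps are requested the same way. A minimal sketch using the segment granularity; each entry in transcription.segments carries its text plus start and end times in seconds:

import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["segment"],
});
// Log each segment with its start and end times.
console.log(transcription.segments);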
Longer Inputs
The API limits audio uploads to 25 MB per file, so longer recordings need to be split into smaller chunks.
Splitting audio files
from pydub import AudioSegment
song = AudioSegment.from_mp3("good_morning.mp3")
# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000
first_10_minutes = song[:ten_minutes]
# Export the first ten minutes as a separate file for transcription
first_10_minutes.export("good_morning_10.mp3", format="mp3")
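Once the audio is split, each chunk can be transcribed separately and the pieces joined. A minimal sketch, assuming the chunks were exported as chunk_0.mp3 and chunk_1.mp3 (hypothetical file names):

import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
const chunks = ["chunk_0.mp3", "chunk_1.mp3"]; // hypothetical chunk files
let fullTranscript = "";
for (const chunk of chunks) {
  // Transcribe each chunk and append its text to the running transcript.
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream(chunk),
    model: "whisper-1",
  });
  fullTranscript += transcription.text + " ";
}
console.log(fullTranscript.trim());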
Prompting
Use the prompt parameter to improve transcription quality, for example to help the model recognize specific words or acronyms it might otherwise misspell.
Using prompts
import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/speech.mp3"),
  model: "whisper-1",
  response_format: "text",
  prompt: "ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array",
});
// With response_format: "text", the SDK returns the transcript as a plain string.
console.log(transcription);
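Prompts can also steer the transcript's style. The model sometimes drops punctuation, and a short prompt that itself contains punctuation can encourage the model to keep it. A sketch; the prompt text below is illustrative:

import fs from "fs";
import OpenAI from "openai";
const openai = new OpenAI();
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/speech.mp3"),
  model: "whisper-1",
  // A punctuated prompt nudges the model toward punctuated output.
  prompt: "Hello, welcome to my lecture.",
});
console.log(transcription.text);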
Supported Languages
English
Spanish
French
German
Italian
Portuguese
Dutch
Russian
Japanese
Chinese
Korean
Arabic
Hindi
Turkish
Vietnamese
Polish
Ukrainian
Greek
Many more languages are supported through the underlying Whisper model.