Speech to Text

Learn how to turn audio into text.

Overview

The Audio API provides two speech-to-text endpoints, transcriptions and translations, based on our state-of-the-art Whisper model.

Transcriptions

Transcribe audio into whatever language the audio is in.

Translations

Translate and transcribe the audio into English.

Transcriptions

The transcriptions API takes as input the audio file you want to transcribe and the desired output format for the transcript. File uploads are limited to 25 MB, and supported input formats include mp3, mp4, mpeg, mpga, m4a, wav, and webm.

Transcribe audio

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/audio.mp3"),
  model: "whisper-1",
});

console.log(transcription.text);
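
By default, the response is JSON that includes the transcribed text. You can request other output formats with the response_format parameter: json, text, srt, verbose_json, or vtt. As a minimal sketch, here is the same request asking for SRT subtitles:

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

// Request the transcript as SRT subtitles rather than JSON.
const srt = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/audio.mp3"),
  model: "whisper-1",
  response_format: "srt",
});

// With a non-JSON response_format, the SDK returns the raw string.
console.log(srt);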

Translations

The translations API takes audio in any supported language and transcribes it into English. English is currently the only supported output language.

Translate audio

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

const translation = await openai.audio.translations.create({
  file: fs.createReadStream("/path/to/file/german.mp3"),
  model: "whisper-1",
});

console.log(translation.text);

Timestamps

Get structured, timestamped output at the segment or word level for precise transcripts and video edits. The timestamp_granularities parameter requires response_format to be set to verbose_json.

Timestamp options

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["word"],
});

console.log(transcription.words);
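
The example above requests word-level timestamps. For segment-level timing, pass "segment" instead. A minimal sketch that prints each segment with its start and end offsets in seconds:

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

// Segment-level timestamps suit subtitles and coarse video edits.
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("audio.mp3"),
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["segment"],
});

// Each segment carries start/end offsets in seconds and its text.
for (const segment of transcription.segments) {
  console.log(`${segment.start.toFixed(2)}s - ${segment.end.toFixed(2)}s: ${segment.text.trim()}`);
}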

Longer Inputs

The API accepts files up to 25 MB. For longer inputs, split the audio into chunks below that limit or use a compressed audio format. One way to split audio is with the open source pydub Python package.

Splitting audio files

from pydub import AudioSegment

song = AudioSegment.from_mp3("good_morning.mp3")

# PyDub handles time in milliseconds
ten_minutes = 10 * 60 * 1000

first_10_minutes = song[:ten_minutes]

first_10_minutes.export("good_morning_10.mp3", format="mp3")
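
# Export the remainder as a second chunk (a sketch; longer recordings
# may need more slices to stay under the 25 MB limit)
rest_of_song = song[ten_minutes:]
rest_of_song.export("good_morning_rest.mp3", format="mp3")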

Prompting

Use the prompt parameter to improve transcription quality, for example to help the model recognize product names, acronyms, or other specific words it might otherwise misspell.

Using prompts

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("/path/to/file/speech.mp3"),
  model: "whisper-1",
  response_format: "text",
  prompt:"ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array",
});

// With response_format "text", the response is the plain transcript string.
console.log(transcription);
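
Prompts also help preserve context across split files: pass the transcript of the preceding chunk as the prompt for the next one. The model only considers the final 224 tokens of the prompt. A sketch, assuming the two chunks exported in the splitting example above:

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

// Transcribe the first chunk normally.
const first = await openai.audio.transcriptions.create({
  file: fs.createReadStream("good_morning_10.mp3"),
  model: "whisper-1",
});

// Pass the first transcript as the prompt so names and spellings
// stay consistent across the chunk boundary.
const second = await openai.audio.transcriptions.create({
  file: fs.createReadStream("good_morning_rest.mp3"),
  model: "whisper-1",
  prompt: first.text,
});

console.log(first.text + second.text);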

Supported Languages

The following languages are supported through both the transcriptions and translations endpoints:

English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Chinese, Korean, Arabic, Hindi, Turkish, Vietnamese, Polish, Ukrainian, Greek, and many more.

The underlying Whisper model was trained on 98 languages, so languages beyond this list may also work, though typically with lower accuracy.