Hacker News

minimaxir 10/01/2024

From the Realtime API blog post: https://openai.com/index/introducing-the-realtime-api/

> Audio in the Chat Completions API will be released in the coming weeks, as a new model `gpt-4o-audio-preview`. With `gpt-4o-audio-preview`, developers can input text or audio into GPT-4o and receive responses in text, audio, or both.

> The Realtime API uses both text tokens and audio tokens. Text input tokens are priced at $5 per 1M and $20 per 1M output tokens. Audio input is priced at $100 per 1M tokens and output is $200 per 1M tokens. This equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output. Audio in the Chat Completions API will be the same price.
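
To make the quoted pricing concrete, here is a minimal sketch of the arithmetic. The per-token and per-minute prices are taken from the quote above; the helper names and the 60-minute example are only illustrative assumptions, and the per-minute figures are described as approximate in the post, so the implied token rates (roughly 600 audio tokens per minute of input and 1,200 per minute of output) are estimates as well.

```python
# Prices from the quoted blog post, in USD per 1M audio tokens.
AUDIO_INPUT_USD_PER_M = 100.0
AUDIO_OUTPUT_USD_PER_M = 200.0

# Approximate per-minute prices from the same quote, in USD.
AUDIO_INPUT_USD_PER_MIN = 0.06
AUDIO_OUTPUT_USD_PER_MIN = 0.24


def implied_tokens_per_minute(usd_per_minute: float, usd_per_million: float) -> float:
    """Back out how many tokens one minute of audio must use at these prices."""
    return usd_per_minute / usd_per_million * 1_000_000


def cost_usd(minutes: float, usd_per_minute: float) -> float:
    """Estimated cost of a given number of audio minutes."""
    return minutes * usd_per_minute


input_tpm = implied_tokens_per_minute(AUDIO_INPUT_USD_PER_MIN, AUDIO_INPUT_USD_PER_M)
output_tpm = implied_tokens_per_minute(AUDIO_OUTPUT_USD_PER_MIN, AUDIO_OUTPUT_USD_PER_M)

print(f"implied input rate:  {input_tpm:.0f} tokens/min")
print(f"implied output rate: {output_tpm:.0f} tokens/min")

# Hypothetical example: one hour of generated speech.
print(f"60 min of audio output: ${cost_usd(60, AUDIO_OUTPUT_USD_PER_MIN):.2f}")
```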

As usual, OpenAI failed to emphasize the real game-changer feature at their Dev Day: audio output from the standard generation API.

This has severe implications for text-to-speech apps, particularly if the audio output style is as steerable as the gpt-4o voice demos.


Replies

OutOfHere 10/01/2024

> and $0.24 per minute of audio output

That is substantially more expensive than TTS (text-to-speech), which is already quite expensive.
