However, I find that when I realize a podcast is generated using AI and synthetic audio, I immediately lose interest. For me, the value of podcasts lies in authentic human conversations, and AI-generated content just doesn’t have the same appeal.
Probably it's just me being obsessed with old-school podcasts, though. I do believe there are listeners (not sure if many or few) who don't mind if a podcast is AI-generated.
Take some article or book written for adults. Maybe some archaeological discovery, interesting stuff from HN. Or science books from the 1960s.
Then have it turned into a conversation between a father and a curious seven-year-old daughter. And convert it to audio with two different speakers.
While it’s been fun to build this, I never ended up letting my kids use it. It just feels wrong. The educational equivalent of Harlow’s Monkeys.
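For anyone curious, the middle step (dialogue text to per-speaker audio segments) can be sketched roughly like this. It assumes the generated script labels each line as `Father:` or `Daughter:`; the `Turn` type, the `parse_script` helper and the voice mapping are illustrative names of mine, not taken from the actual project. Each resulting turn would then be sent to whatever TTS service you use, one request per turn.

```python
import re
from typing import NamedTuple


class Turn(NamedTuple):
    speaker: str
    voice: str
    text: str


# Assumed mapping from speaker label to a TTS voice name; swap in your own.
VOICES = {"father": "onyx", "daughter": "nova"}

# Matches lines such as "Father: some text".
TURN = re.compile(r"^(\w+)\s*:\s*(.+)$")


def parse_script(script: str) -> list[Turn]:
    """Split a labelled dialogue into turns, one per speaker change."""
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN.match(line)
        if match and match.group(1).lower() in VOICES:
            speaker = match.group(1).lower()
            turns.append(Turn(speaker, VOICES[speaker], match.group(2).strip()))
        elif turns:
            # Unlabelled line: treat it as a continuation of the previous turn.
            prev = turns[-1]
            turns[-1] = prev._replace(text=f"{prev.text} {line}")
        # Text before the first labelled line is silently dropped.
    return turns
```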
Do you have any samples of the audio? It would be great to hear what it's like before trying it out.
Also, have you considered doing this all in client side JS? Would be a good way to protect the API key (at least in this demo case).
Like others have mentioned, I’d be scared to accidentally upload a 100-page PDF only for it to cost me $100 without me really knowing up front.
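A guard along these lines would address that. This is only a sketch: it assumes the TTS provider bills per character, the rate is something you pass in yourself after checking current pricing, and `estimate_tts_cost` and `check_budget` are hypothetical helper names. It also only covers the text actually sent to TTS, not any LLM calls used to write the script.

```python
def estimate_tts_cost(num_chars: int, usd_per_million_chars: float) -> float:
    """Estimate the TTS cost in USD, assuming per-character billing."""
    if num_chars < 0 or usd_per_million_chars < 0:
        raise ValueError("character count and rate must be non-negative")
    return num_chars * usd_per_million_chars / 1_000_000


def check_budget(text: str, usd_per_million_chars: float, max_usd: float) -> float:
    """Return the estimated cost, or raise before any paid API call is made."""
    cost = estimate_tts_cost(len(text), usd_per_million_chars)
    if cost > max_usd:
        raise RuntimeError(
            f"Estimated cost ${cost:.2f} exceeds the budget of ${max_usd:.2f}"
        )
    return cost
```

Showing the estimate and asking for confirmation before the upload is processed would be enough for the demo case.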
Just wondering why the choice of OpenAI TTS instead of ElevenLabs?
I don't like podcasts. I tune out after about 30 seconds of chit chat and intros and blah blah blah and end up missing stuff and can't search for it or copy and paste it.
That said, if it's a topic that I'm really, really ignorant about, a little podcast/YouTube can be helpful. For example, Yannic Kilcher's YouTube videos, especially how he annotates and breaks down the math equations, can be very useful if the paper's domain is new to me.
I think about it as pre-reading the paper.
A more focused first and second reading mode, may I propose, would add even more value. In these modes, the paper would be read more faithfully.
A problem that text to speech has when you feed it a regular PDF is that it will choke on titles, headings, footers, inline citations, page numbers, acronyms, abbreviations, numerical tables, charts, and diagrams.
So I would like to build or see something that conversationally reads the PDF as if it were a peer reading to me, unpacking abbreviations, mentioning titles and authors and years of citations (when I want that), describing charts, and perhaps even letting me interrupt to discuss specific misunderstandings I'm having.
There's obviously a challenge that reading a paper is an active engagement depending on your own knowledge state. We might gloss over formulas, footnotes, and citations on a first read, for example.
Still, a low-hanging fruit would be a converter mode that accurately strips out page numbers and headers. There is little in this world more aggravating than listening to a 30-page paper and having to hear the paper title and authors repeated an additional 15 times because it's reading the header.
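As a rough idea of what that converter mode could look like: the sketch below assumes you already have the text of each page as a separate string (from whatever PDF extractor you use) and treats any line that recurs near the top or bottom of many pages as a running header or footer. The function name, the thresholds and the page-number pattern are my own guesses, not a tested recipe, and a body line that consists only of a number would also be dropped.

```python
import re
from collections import Counter

# Lines that are only a page number, e.g. "7", "Page 7" or "7 / 30".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+(?:\s*(?:of|/)\s*\d+)?\s*$", re.IGNORECASE)


def _normalize(line: str) -> str:
    # Replace digit runs so "Title | 3" and "Title | 4" compare equal.
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()


def strip_running_headers(
    pages: list[str], min_fraction: float = 0.5, edge_lines: int = 2
) -> list[str]:
    """Drop page numbers and repeated header/footer lines from per-page text."""
    # Count on how many pages each line appears near the top or bottom.
    counts: Counter[str] = Counter()
    for page in pages:
        lines = [ln for ln in page.splitlines() if ln.strip()]
        edges = lines[:edge_lines] + lines[-edge_lines:]
        counts.update({_normalize(ln) for ln in edges})

    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {key for key, n in counts.items() if n >= threshold}

    seen: set[str] = set()
    cleaned = []
    for page in pages:
        kept = []
        for line in page.splitlines():
            if PAGE_NUMBER.match(line):
                continue
            key = _normalize(line)
            if key in repeated:
                if key in seen:
                    continue  # a repeat of a running header or footer
                seen.add(key)  # keep the first occurrence so the title is heard once
            kept.append(line)
        cleaned.append("\n".join(kept))
    return cleaned
```

Keeping the first occurrence is deliberate: the title and authors get read once, and every later repeat is skipped.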
It starts like this:
The way this uses different OpenAI TTS voices for the different roles is really neat!