Show HN: PDF to Podcast – Convert Any PDF into a Podcast Episode

simonw

I always go straight for the prompt with this kind of thing - it's here: https://github.com/knowsuchagency/pdf-to-podcast/blob/512bfb...

It starts like this:

    Your task is to take the input text provided and turn it into
    an engaging, informative podcast dialogue. The input text may
    be messy or unstructured, as it could come from a variety of
    sources like PDFs or web pages. Don't worry about the
    formatting issues or any irrelevant information; your goal is
    to extract the key points and interesting facts that could be
    discussed in a podcast.

The way this uses different OpenAI TTS voices for the different roles is really neat!

cush

It might be a good idea to toss some kind of audio disclaimer at the beginning of the podcast that cites the source and states that the audio is completely fabricated. Reason being, the "Attention is All You Need" example on your site has Anya Sharma (an actual AI researcher who is unrelated to the Attention paper) on as a guest. Not sure if this is intentional or a hallucination, but it seems like a huge liability.

wenbin

Awesome project!

However, I find that when I realize a podcast is generated using AI and synthetic audio, I immediately lose interest. For me, the value of podcasts lies in authentic human conversations, and AI-generated content just doesn’t have the same appeal.

Probably it's just me being obsessed with old-school podcasts, though. I do believe there are listeners (not sure if many or few) who don't mind if a podcast is AI-generated.

leobg

I tried the same thing for my kids:

Take some article or book written for adults. Maybe some archaeological discovery, interesting stuff from HN. Or science books from the 1960s.

Then have it turned into a conversation between a father and his curious seven-year-old daughter. And convert it to audio with two different speakers.

While it’s been fun to build this, I never ended up letting my kids use it. It just feels wrong. The educational equivalent of Harlow’s Monkeys.

david1542

Looks good. As other people said, it's risky to give you my OpenAI key, so I'd make the app run locally with React maybe. Moreover, it'd be good to give an approximation of the price. It's kinda scary to click "Submit" and later on see that I was charged $3 by OpenAI.

edward-ca

Looks like a fun project!

Do you have any samples of the audio? It would be great to hear what it's like before trying it out.

Also, have you considered doing this all in client-side JS? Would be a good way to protect the API key (at least in this demo case).

eggbrain

I think it would probably help to take the PDF up front, do a combination of checking the DPI and page count to get an estimated word count (as OCRing to get an exact word count might be costly on your end), and then return back a “price preview” at which point the customer just pays the price to get their podcast.

Like others have mentioned, I’d be scared to accidentally upload a 100 page PDF only for it to cost me $100 without me really knowing up front.
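A rough sketch of what such a price preview could look like, going from page count to an estimated character count to a dollar figure. Every constant here is an assumption for illustration (words per page, characters per word, and the per-character rate are not taken from the project or from any published price list), and it only covers the text-to-speech step, using the source length as a stand-in for the length of the generated script.

```python
# All constants below are illustrative assumptions, not measured values.
WORDS_PER_PAGE = 500               # assumed average for a text-heavy page
CHARS_PER_WORD = 6                 # assumed, including the trailing space
TTS_USD_PER_MILLION_CHARS = 15.0   # hypothetical rate; check current pricing


def estimate_price_usd(page_count: int,
                       words_per_page: int = WORDS_PER_PAGE,
                       chars_per_word: int = CHARS_PER_WORD,
                       usd_per_million_chars: float = TTS_USD_PER_MILLION_CHARS) -> float:
    """Return a rough TTS cost estimate in USD, rounded to the cent."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    # Pages -> words -> characters, then apply the per-character rate.
    chars = page_count * words_per_page * chars_per_word
    usd = chars * usd_per_million_chars / 1_000_000
    return round(usd, 2)


if __name__ == "__main__":
    pages = 100
    print(f"{pages} pages: about ${estimate_price_usd(pages):.2f}")
```

Showing this number before the "Submit" step, and letting the user cancel, would address the surprise-bill worry without needing OCR up front.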

unraveller

Sounds exactly like the way the Simply News podcast is put together. That is 100% AI for each topic (AI, tech, business, science, etc.) and combines multiple recent papers/stories for hundreds of daily podcasts.

https://simply-ai.podbean.com

https://www.simplynews.ai

InfiniteLoup

Love the idea, as I never find enough time to sit down and read but could listen to it while running or commuting. However, I'm hesitant to hand over my OpenAI API key to a website that's not under my control. No idea, though, how the trust problem can be solved.

mikae1

Cool! But the really cool thing would be a service that converts the contents of a text RSS/Atom feed to a podcast with a podcast feed. Imagine your favorite blogs being podcasts that you could listen to on the go.
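A minimal sketch of the feed-to-feed part of that idea, assuming an RSS 2.0 source (Atom would need different element names). The `synthesize` callback is hypothetical: it stands in for whatever generates and hosts the audio, and is expected to return the audio URL and file size. Nothing here comes from the project itself.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Tuple


def text_feed_to_podcast_feed(rss_xml: str,
                              synthesize: Callable[[str, str], Tuple[str, int]]) -> str:
    """Turn a text RSS 2.0 feed into a podcast-style RSS 2.0 feed.

    `synthesize(title, text)` is a hypothetical hook that produces the audio
    somewhere and returns (audio_url, size_in_bytes).
    """
    src_channel = ET.fromstring(rss_xml).find("channel")
    if src_channel is None:
        raise ValueError("not an RSS 2.0 feed: missing <channel>")

    out_rss = ET.Element("rss", version="2.0")
    out_channel = ET.SubElement(out_rss, "channel")
    ET.SubElement(out_channel, "title").text = (
        src_channel.findtext("title", default="Untitled") + " (audio)"
    )
    # Carry over the channel link and description when the source has them.
    for tag in ("link", "description"):
        value = src_channel.findtext(tag)
        if value:
            ET.SubElement(out_channel, tag).text = value

    for item in src_channel.findall("item"):
        title = item.findtext("title", default="Untitled")
        text = item.findtext("description", default="")
        audio_url, size = synthesize(title, text)

        out_item = ET.SubElement(out_channel, "item")
        ET.SubElement(out_item, "title").text = title
        # Podcast clients locate the audio through the <enclosure> element.
        ET.SubElement(out_item, "enclosure",
                      url=audio_url, length=str(size), type="audio/mpeg")

    return ET.tostring(out_rss, encoding="unicode")
```

A real service would also need to poll the source feed, remember which items were already converted, and keep stable GUIDs so podcast apps don't re-download episodes.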

rubiquity

How do you tell it where to put the Athletic Greens advertisements?

archsurface

I never listen to a podcast on less than 1.5x because there is already too much crap conversation, and I only want the nuggets of value; so I would only use tts for listening to text.

m4rc3lv

This is cool! It works nicely. Too bad the audio is only in English, even if you submit a PDF in another language.

WalterBright

Should add an option to get Swedish Chef output, Bork Bork!

elphinstone

Can I download as an mp3 for later playback or archiving?

cchance

Can you imagine this, with an RVC pass to do voice transfer... what a time to be alive.

Just wondering why the choice of OpenAI TTS instead of elevenlabs?

iJohnDoe

Congrats on launch! Brilliant.

spaceship__sun

Nice work but I gotta provide my own OAI key? Why not just run one of the API demos at this point.

davidw

Can someone make something going the other way?

I don't like podcasts. I tune out after about 30 seconds of chit chat and intros and blah blah blah and end up missing stuff and can't search for it or copy and paste it.

WhackyIdeas

[flagged]

barfbagginus

I don't like podcasters because they usually muddle through stuff and approach things in a kind of non-productive superficial way that drives easy engagement rather than hard work results.

That said, if it's a topic that I'm really, really ignorant about, a little podcast/YouTube can be helpful. For example, Yannic Kilcher's YouTube videos, especially how he annotates and breaks down the math equations, can be very useful if the paper's domain is new to me.

I think about it as pre-reading the paper.

A more focused first and second reading mode, may I propose, would add even more value. In these modes, the paper would be read more faithfully.

A problem that text to speech has when you feed it a regular PDF is that it will choke on titles, headings, footers, inline citations, page numbers, acronyms, abbreviations, numerical tables, charts, and diagrams.

So I would like to build or see something that conversationally reads the PDF as if it were a peer reading to me, unpacking abbreviations, mentioning titles and authors and years of citations (when I want that), describing charts, and perhaps even letting me interrupt to discuss specific misunderstandings I'm having.

There's obviously a challenge that reading a paper is an active engagement depending on your own knowledge state. We might gloss over formulas, footnotes, and citations on a first read, for example.

Still, a low hanging fruit would be a converter mode that accurately strips out page numbers and headers. There is little in this world more aggravating than listening to a 30 page paper, and having to hear that paper title and authors repeated an additional 15 times because it's reading the header.
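A rough sketch of that converter mode, assuming the PDF text has already been extracted into one string per page (the extraction step is not shown). It is a heuristic, not a definitive implementation: lines that appear near the top or bottom of many pages are treated as running headers or footers, and lines that look like bare page numbers are dropped. The function name and the threshold are illustrative choices.

```python
import re
from collections import Counter
from typing import List

# Matches lines such as "7", "Page 7", "7 of 30", or "7/30".
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s*(of|/)\s*\d+)?\s*$", re.IGNORECASE)


def strip_running_headers(pages: List[str], min_fraction: float = 0.5) -> List[str]:
    """Drop page numbers and lines that repeat at the edges of many pages.

    Blank lines are discarded as a side effect, which is fine for TTS input.
    """
    split_pages = [[ln for ln in p.splitlines() if ln.strip()] for p in pages]

    # Count, per page, the distinct lines found in the first/last two lines.
    edge_counts = Counter()
    for lines in split_pages:
        edges = {ln.strip() for ln in lines[:2] + lines[-2:]}
        edge_counts.update(edges)

    # A line is a header/footer if it sits at the edge of enough pages.
    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {ln for ln, n in edge_counts.items() if n >= threshold}

    cleaned = []
    for lines in split_pages:
        # Note: a repeated line is removed wherever it occurs on the page.
        kept = [ln for ln in lines
                if ln.strip() not in repeated and not PAGE_NUMBER.match(ln)]
        cleaned.append("\n".join(kept))
    return cleaned
```

The obvious failure modes are body lines that consist only of a number (a table cell, say) and headers that vary slightly per page, such as alternating author and title lines on odd and even pages with a page number fused in.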