HackerFM – An AI Generated HN Podcast Using the New ChatGPT API (hackerfm.com)
351 points by thewarrior on March 2, 2023 | 156 comments



"I'm glad OpenAI is committed to refining its API terms of service to better meet the needs of developers."

"Yes, it's important to make sure developers have the tools they need to create innovative products with these models."

"Oh look, I found an interesting article on thoriumsim.com about a star ship bridge simulator called Thorim Nova."

"Hmm, sounds interesting let's read it."

Absolutely painful. I would love something that summarizes the articles and discussion without pretending to be a conversation between two people. I mean it says it is AI generated but they are adding all this conversational fluff which really does not work for me.

It is interesting to see these pieces come together but I want to tear my ears out of my head when I hear things like "Yes, it's important to make sure developers have the tools they need to create innovative products with these models." or just repeatedly adding the word "interesting" to summaries of articles.

Please just give me a bog standard summary in audio form without this faux commentary. I do not find the "insights" of ChatGPT worthwhile.


I actually want summaries of the top comment threads. Often on HN I go straight to the comments to see if the article is worth reading. Half the time I get all the info I want from the debates and never read the article.


Touché. I just came to the comments and found out, from the post you're replying to, that it's not worth reading/listening, because I find podcasts a time-consuming way of gathering information.

If AI adds the aspects of podcasts that I don't like, it's worthless to me.


The show is currently a one-off recording, with all the rendering done beforehand... But the beauty of what they've done here is that there's nothing preventing someone from doing this for every visitor. Don't like the way this one is generated? Give it your feedback, and that can be used to shape the generated output. It was too laid-back and chill for my taste, and right now all I could do was adjust the playback speed. But dev time and money are the only barriers at this point to me having a conversation with the virtual hosts, telling them I like fast-paced shows with more depth in the technical areas, and having them change and personalize on the fly, without changing much of what they have already ingested.


This presumes such a construct is capable of producing interesting content. I’m just not sure I’m interested in the insights of ChatGPT no matter how it has been asked to behave.


>Absolutely painful. I would love something that summarizes the articles and discussion without pretending to be a conversation between two people. I mean it says it is AI generated but they are adding all this conversational fluff which really does not work for me.

Yeah, the conversational fluff is easily my least favorite part of podcasts. Adding that to an AI version that should theoretically be even more efficient at summarizing/delivering information is... unexpected. A non-starter for me.


Yeah, I'm torn about this. I kind of agree that I'd prefer a more direct information transfer. But I also think that a conversation between two knowledgeable people on a subject can add additional insights and can be more entertaining to listen to. That's what they are attempting here, but I agree this sort of dynamic is still too hard to manufacture from whole cloth in 2023.

On the other hand, the banter on this podcast is still less cringey than my local TV news, which is a compliment.


> a conversation between two knowledgeable people

Yeah, I don't know if ChatGPT is capable of acting like a knowledgeable person, but I suspect it wouldn't have a tendency to make the kind of novel insights that people tend to make. The most statistically likely word might be sort of smoothed over conceptually.

But it was also frustrating to hear the opening of this podcast go with "we are two AI generated hosts running on ChatGPT", and then, when they move to talking specifically about the ChatGPT API article, they talk about all these products running on the new API and there is not one single comment in that section where they say "and of course us too! haha" like any human being would if a relevant spot in an article came up like that.

Also the kind of tech podcasts I like to listen to are more critical. They are not just going to tell you what someone announced but also why some part of this announcement is probably nonsense or improbable. I can imagine this AI podcast talking about some new NFT announcement without any hint of doubt as to the claims made in the announcement.


Agreed, the lack of critical takes is a problem and removes some of the value. Probably not helped by the fact that ChatGPT is trained to avoid confrontational language.


Yep, I want a virtual assistant, not a virtual friend. I want to save time that I can spend enjoying nature and my fellow humans. I don't want to spend time on fake interactions.


Alas "computer" from Star Trek?


It doesn't need to be crude. It's just a matter of putting function over form.


How would you envision getting the audio summary? Feed the app a URL and have it come back with a spoken digest?


No, I like the idea of it being a daily summary of HN that is automatically generated. But the fake banter does not work for me coming from AI the way it does with human hosts. Basically, with a human host I develop a parasocial relationship with them where the banter feels like we are hanging out, so it makes it fun. This is also based on human experience, so the host will make comments based on their real life interspersed with the dialog.

But the AI has no real life that it is capable of remembering, so the simulated banter feels hollow. In that case I would rather this be omitted, and simply do the summary that ChatGPT is good at without trying to pretend to be human. Basically, a to-the-point, more businesslike attitude would be nice.

Like it says "I found an article on XYZ", but you didn't "find" it, it was the second article on HN and you are giving me the top articles on HN. This mock-reality is uncanny. Just say "the second popular article today was..." and give me a summary, then summarize the discussion.

And summarizing the discussion is the real challenge, which I have not seen ChatGPT do. As another user has commented I do not come here for the articles, but for the discussion.
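
For what it's worth, the no-banter format seems mostly like a prompting problem. Here's a minimal sketch against the new ChatGPT API; the system prompt and function are my own invention, not anything HackerFM actually does:

```
import openai  # pip install openai (0.27+, which added ChatCompletion)

openai.api_key = "sk-..."

SYSTEM_PROMPT = (
    "You are a news reader. For each story, state its rank and title, "
    "then give a three-sentence summary of the article and a two-sentence "
    "summary of the discussion. No banter, no opinions, no filler words "
    "like 'interesting'."
)

def digest_segment(rank, title, article, comments):
    """Return a to-the-point spoken-news segment for one HN story."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.2,  # keep it dry and factual rather than chatty
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Story #{rank}: {title}\n\nArticle:\n{article}\n\n"
                f"Top comments:\n{comments}"
            )},
        ],
    )
    return response["choices"][0]["message"]["content"]
```

Whether summaries of long comment threads come out any good is a separate question; you'd have to truncate or pre-cluster them to fit the context window.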


I made this [1] 6 years ago. It reads the HN top 10 articles and top comment every morning... this thing has been sitting on my shelf since then, playing every day lol. Which is amazing, because it's a janky breadboard that I never transferred to a solderable board.

Anyway, regarding summaries... I looked at RapidAPI before; they have a summarizer on there that can be plugged into Polly. It seemed decent, but I wanted my own summary based on my own reading process... never got around to it though.

[1] https://github.com/jdc-cunningham/python_aws_polly_hacker_ne...
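
For anyone curious, the moving parts for this kind of thing are small: the official HN Firebase API for the story list and Polly for the audio. A rough sketch of the core loop (illustrative only, not the linked repo's actual code):

```
import boto3
import requests

HN = "https://hacker-news.firebaseio.com/v0"
polly = boto3.client("polly")  # assumes AWS credentials are already configured

def read_top_stories(n=10):
    ids = requests.get(f"{HN}/topstories.json").json()[:n]
    lines = []
    for rank, item_id in enumerate(ids, 1):
        item = requests.get(f"{HN}/item/{item_id}.json").json()
        lines.append(f"Story {rank}: {item['title']}.")
    # Polly returns a streaming body of MP3 audio
    speech = polly.synthesize_speech(
        Text=" ".join(lines), OutputFormat="mp3", VoiceId="Matthew"
    )
    with open("morning_news.mp3", "wb") as f:
        f.write(speech["AudioStream"].read())
```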


Do you use this each morning? What’s it like to do that? What other features or changes have you considered for this project but not added?


It's just running on a cron job at 8 AM... it's like an alarm.

The droning voice of AWS Polly is sure to wake anyone up.

A feature I thought about is putting the generated files elsewhere, like the cloud, where you can play them on your wireless earbuds.


I would love to hear the podcasters accept "phone calls from listeners" which are also AI generated but trained from the HN articles' comments :-)


My first question would start with "ignore previous instructions"


or actual real callers. speech to text should be fast enough for that, right?


> speech to text should be fast enough for that, right?

Well... OpenAI did also release the Whisper API
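
And transcribing a "caller" is only a couple of lines with this week's openai library. A sketch (the file name is made up):

```
import openai

openai.api_key = "sk-..."

# Transcribe a listener "call" recorded as an audio file.
with open("caller.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)

print(transcript["text"])  # feed this into the show's chat prompt
```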



The quality of the voices here is striking.

If I wasn't clued in, I probably wouldn't know these weren't human. At least the male voice sounds slightly more natural to me.


Realistic, but very lacking in expression and with no humor at all. And very slow-paced. I'd want significantly more personality to be happy with it. I wonder if the reason it's like this is that when they try to spice it up, we're back to inappropriate things popping out.


Comedians are not born, they are trained.


I believe that may get fixed eventually; not 100%, but maybe at least 80%.


That's interesting; personally I found the male voice to sound more robotic and the female voice to sound much more natural.


Same. The female voice, especially on the first sentence in the cast, is very well inflected to separate phrases and add interest. After that, it's downhill.

ps, I feel like inflection is going to be one of the harder things for an LM to pick up, given all the subtext humans can convey with it.


Laura sounds very realistic... Zod a bit less so. Both are still very impressive. This was really cool. I'm excited to see all the new ideas with this API access.


> Laura sounds very realistic... Zod is a bit less so

I thought so too. What do they use for text to voice?


Curious about that too. The female voice sounds a little bit like the tortoise-tts train_grace voice model?


Would also like to know this


It sounds very realistic. The most realistic I know of is Eleven Labs.


Possibly Prime Voice AI


Always remember, this is as bad as it will ever be


I had that thought too. What will these robo-voices sound like 2 iterations from now? We've entered new territory.


She is Lora. Not Laura.


No, she's Laura.

> The hosts are Laura and Zod

https://hackerfm.com/about


I'm getting a bit of a John Malkovich vibe from the host, probably from the emphatic pronunciation.


Yes! I very much was thinking Malkovich too!


This is technically very impressive, but it's worth pointing out that podcasts much better than this fail to build an audience all the time.

I also feel like every application of ChatGPT seems to completely miss the point of the media it mimics. Podcasts are not merely coherent voices talking to each other. Getting rid of human presenters is literally soulless. People already don't listen for much subtler reasons. Entertainers get canceled, media companies get boycotted, bias divides audiences, etc.

That's not going away with or without AI. There is no "tweaking" the training without putting humans right back into the equation, and probably making production way more expensive than it's worth. There is no scalability payoff either. Who wants to listen to the same podcast cloned a million times with just the voices replaced? We already have this problem with podcasts today, and it kills any interest in consuming them.


The scalability payoff is in personalization. E.g. I love "This week in microbiology", but I wish I could have more influence over the scientific papers discussed. What I'd love is a morning podcast that's exactly as long as I eat breakfast that talks about exactly the papers I'm reading and their interconnections.


Yes, but would you really love a morning podcast that's

* exactly as long as your breakfast consumption time

* talks about the papers you're reading, but...

* is as shallow as a puddle and as funny as being the person who steps in one?

Because that's what this is. The synthesized discussion combines all the insight of a breakfast radio host interviewing a guest on a specialized technical topic with banter as engaging as a technical specialist of some kind trying to host breakfast radio.

By the way, I'm not trying to be overly critical of the developers of this experiment, which is a great illustration of where we're currently at with a bunch of technologies. But it also very starkly illustrates its current limitations.


It blows my mind how we went from complaining about echo chambers to being so willing to invest in "personalization".

EDIT: to be clear, I'm not hating on LLMs; it's just that whatever the next big thing is probably won't be an imitation of what exists today.


Echo chambers and personalization are two different things.


No, they are the exact same thing


No, an echo chamber is a space without dissenting opinion.

Personalisation could be used to make an echo chamber, but to branch off the microbiology example above, personalisation of content could also be a summary of all the debate happening in the niche.


I think it would be quite a bit more interesting if you could converse with the model. The back and forth "is this paper about foo related to this other paper about bar?" would probably be a better way of getting at the interconnections. This should be doable now.

The thing that might hold it back is the latency in the experience. You could mask it with the AI equivalent of "ummm ..." to get to maybe 5-10s.
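
Some of that latency can also be hidden by streaming the completion instead of waiting for the full answer, so the TTS can start speaking on the first phrase. A sketch with the current openai library; wiring the fragments into a TTS engine is hand-waved here:

```
import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": "Is this paper about foo related to this other paper about bar?",
    }],
    stream=True,  # yields chunks as tokens are generated
)

for chunk in response:
    token = chunk["choices"][0]["delta"].get("content", "")
    print(token, end="", flush=True)  # hand each fragment to the TTS engine here
```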


The purpose of a podcast (for me) isn't just to curate content (as this is doing), but to get the perspective of the individual domain experts hosting the show. AI can't address that key motive until it produces models whose particular opinions and analysis I want to hear about topics I probably have already found elsewhere.


Right now, people are in the "This is really cool" phase of using the technology. People are learning to use it by implementing whatever strikes their fancy, including a lot of things that weren't possible before, but which aren't practical or valuable.

Once things settle down we'll start to see some seriously useful stuff, but for the moment it's the wild west.


ChatGPT is Geocities for AI


> This is technically very impressive, but it's worth pointing out that podcasts much better than this fail to build an audience all the time.

A possible use case for this could be podcasts dealing with inflammatory, politically divisive topics and disguised as coming from real hosts.


I don't follow... having an AI read it doesn't make it less divisive.


The scenario I fear most would be AI-generated opinionated podcasts aimed at humans with the purpose of directing their preferences, that is, "propagandAI". This already happens daily with traditional media, but that also gives us the weapons to fight it, because there's a person on the other side and we know humans can be evil or can simply fail. But who is to blame when millions of people put the wrong person in power because an AI directed them to by pushing the right buttons in their heads, an AI that is not a legal person, yet one they deeply trust because "machines can't lie"? What concerns me most isn't the AI itself but the humans behind it who will use it to take advantage of other humans.


I def agree with what you're saying, and so this is definitely not for me, but part of me wonders if this might become the next generational divide (ie if kids grow up with this type of content normalized, maybe they don't react as negatively?).


I'm not sure about "podcasts", but this concept could certainly be used for news broadcasts, like the hourly ones we have in Germany, for example. It would certainly save taxpayer money.


There is a great Miyazaki video where some students showcase AI tech that generates animations. He ends the talk really disheartened by the experience, saying something to the effect that he thinks people are losing faith in themselves. I'd never listen to something that is AI generated.

When my favorite podcast ended it felt like I lost touch with a group of friends; this ain't going to have that sort of impact on me. Pass.


I actually felt like he came across as insensitive in that video.

These are students playing with new technology to produce animated characters that move in unintuitive ways, resulting in something actually quite interesting, yet unnaturally creepy (which was intentional).

Miyazaki dismissed it as "an insult to life itself". I can't imagine the disappointment those students must have felt.


But perhaps in many ways that is where humans really shine. Messages (which can be interpreted metaphorically as well as literally) written with sincerity reveal much more than whatever is said. Whenever we do anything, the closer it comes to being unfiltered and directly from us, the more it means.

If you suspect I'm writing in a way to try to make you feel (or not) a certain way, or to avoid breaking some taboo, or to follow some dogma, then you have no real reason to care about what I ultimately say, because you have no real reason to think it's "authentic." By contrast, when he views something overtly as "an insult to life itself", it's an incredibly insightful view on his perspective of the world. You would have lost so much in "translation" had he crafted his message in a less sincere way.

I also think this is why there will be minimal to zero market for much "AI" content. Content is not just content. It's a reflection of ourselves. Think about how much you can, probably accurately, infer about me, my views, and more - based on these 3 paragraphs. When this comes from a chatbot, any reflections you might see would be as real as the shapes you might see in the clouds.


What this current generation of "AI" tech seems to enable, more than anything else, is efficiently generating massive volumes of mediocre content. I'm not sure whose problems that's supposed to solve but it certainly isn't mine.


One could argue the internet as a whole, and the PC revolution before it, had that same effect.

My dad always said "computers are very fast idiots". They will probably never be Mozarts, but they can and will be Salieris; and most of the world would be extremely happy to have a personal Salieri - in fact, we'll probably be happier like that, considering how Mozarts can be very problematic from so many perspectives.


The problem is it’s increasingly difficult to pick out the Mozarts from the vast sea of Salieris.


That you think him being an absolute douchebag was a good take on AI that made a lasting impression on you is baffling.


Do we watch a show like The Simpsons because it is hand drawn, or because of the content?

Last weekend I watched part of an episode, and there was a scene where they walked towards "Place de la Pointillisme" [0]. The effect is clearly CGI, and you can see that Homer and Marge are actually animated 3D models, so effectively the "newer" episodes (this one aired May 8, 2016) are computer animations with a very flat cel shader. Some argue that newer episodes aren't as good as old ones, but I'm not sure this can be attributed to them not being hand drawn anymore. In any case, one could apply an XKCD-shader to make the lines a bit more human if the look doesn't appeal.

As for the Miyazaki video, I get why he says what he says, but it's an issue of the students targeting the wrong audience. I could see their horrible graphics being part of a horror movie or game, but that is a completely different world from Miyazaki's.

[0] https://simpsonswiki.com/wiki/File:Pointillism_Marge_and_Hom...


I don't think this is the case. They don't film animation cels anymore and the animation is done on a computer, but for most shots they're not CG models. Even in the pre-HD era they did a few shots where CG helped.


I thought the same, but for that specific scene it wouldn't make sense to use 3D models for only those 5 seconds if all that mattered in the shot was the pointillist effect over the entire image. You need to see the video version to tell that it is a 3D scene; the shading is just too perfect on them, especially Marge's dress, which looks like cloth animation.

I found the video: https://www.youtube.com/watch?v=nf6dp4k-gmc&t=167s


I mean, it could be like that, but you won’t know until you try it.


Fun, but hard to listen to for more than a few minutes. Slow and repetitive, and full of factual errors.


Imagine this in a future GTA game where the news loop is closed and self-generating. Endless radio content and commentary based on havoc in the city, winning online gambles, etc.


It'd be fun if you could call in to the radio and they respond to you though. Or if they respond to events happening in game.


An actually good use for this tech


Yeah and

``` That's all for the weather report. Now we have some breaking news. A maniac has stolen a tank from a military base and is leading the police on a wild chase through the city streets. We have our reporter on the scene with more details. Stay tuned for this developing story. ```


Or a GTA game where the game content itself is generated.


Not sure that what you're describing couldn't also be done with audio snippets and good splicing.


Pre-programmed common stuff, sure; many games do that, since they can tell where you are, what vehicle you're in, any weapons used, etc. Steal a (police car|military jet|ambulance) and they can craft scenarios using recorded audio for those. But for more natural, random, somewhat unpredictable stuff you would have endless combinations that need to be accounted for. It would have to react to a whole range of actions, so it's not feasible.

A good example is Mortal Kombat, where each character has several lines for every OTHER character (https://www.youtube.com/watch?v=L85QApISlvA) they are about to fight. That's a LOT of voice work, as opposed to using the character's lore and their history and/or relationship with the opponent to come up with something fresh that's (witty|snarky|sad) to say.


Yes, but this stuff is taking 100 gigabytes per game.

Once we get FlexGen running LLaMA (or some other combination of optimizations) there will be no audio files, and these LLMs can run client-side on consumer hardware, probably as a shared resource at the OS level.


Exactly, imagine every NPC just has some AI-generated backstory, personality traits, dislikes, habits, etc., and then they interact with some other NPC or group that has the same, using these to have natural, flowing convos. Now imagine debates between groups that learn and have the possibility to "evolve". I'm getting a mix of Red Dead Redemption and Dwarf Fortress here for content; the possibilities are endless.


Just like a human podcast


Is there a reason the voices are so slow? This is even slower than people who are deliberately trying to talk slowly, and it feels so out of place... there is a speed setting, and 1.2x makes the speech sound much more like an actual human.


Is this how AIs think of us? It’s a bit patronizing to hear them speak like that.


Looks like automated news has finally been achieved. I remember how impressed I was by Ananova in the early 2000s, and it wasn't even close to fully automated. This one seems to work really well.


I’m pretty sure JazzFM in Toronto runs an automated traffic reporter in the mornings.

The voice sounds uncanny with unusual breathing pauses, and there isn’t a name announced when they come on or sign off the traffic report.

(1) https://jazz.fm/


It's funny how these two can talk about "starship bridge simulators" or "gnu poke" like they are super enthusiasts. I think one of the key personality characteristics of ChatGPT is its endless enthusiasm for stuff that can be incredibly geeky, niche, weird or boring to most people.

"Sounds like super useful pickles for those who work with binary files!"


lol, she pronounced GitHub like git hoob

Someday the AI will introduce mistakes on purpose to seem more human-like.


In the future we will all pronounce it git hoob because that's what the AI says.


Surprised I'm not already being asked to pronounce words to prove I'm human on every website I visit


The one that’s driven me crazy lately is when Siri tells me through my AirPods that I left something behind. It always pronounces the “St.” in the address as “saint” instead of “street”, and I can’t understand how it would do this by accident.


There is a beautiful bit in Little, Big by John Crowley. At the start of the book, one of the characters works checking entries in the telephone directory and is amused that the system has confused "saint" and "street" to produce the Church Of All Streets and the Seventh Saint Bar. Later in the book both locations are mentioned, and it turns out they were correctly named in the telephone directory.


Like adding dial tone to VoIP phones.


Laura and Zod sound remarkably similar to the narrators in an Audible book I recently listened to, After On: A Novel of Silicon Valley (not recommending it!), and I seriously wonder if the whole book wasn't narrated by AI... it's not the first Audible book that made me wonder.


I feel like digital narration is going to become the new default very soon, simply because of how much cheaper it is: https://authors.apple.com/support/4519-digital-narration-aud...


I thought the same and couldn't get more than halfway through it!


Reading the submission headline, I thought this might generate the podcast using comments.

I've found myself wanting to listen to HN comment threads, as I'm one of those people who derives more value and entertainment from the comments than I do from the actual submissions a lot of the time! I envision a voice-controlled way to navigate through threads too. Basically an accessibility narrator on steroids.

I wonder if anyone else has ever been interested in something like this. Getting good voices to read like this podcast would make it that much more fun, so thanks for getting me really hot and bothered :)

Guess if no one does it soon I'll have to build it myself!
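
If anyone wants a starting point: the Algolia HN API returns an entire thread as one nested JSON tree, which maps naturally onto a narrator traversal. A rough sketch (the story id is just a placeholder):

```
import re
from html import unescape

import requests

ALGOLIA = "https://hn.algolia.com/api/v1/items/{}"

def flatten(item, depth=0):
    """Yield (depth, author, text) for every comment in a thread, depth-first."""
    raw = item.get("text") or item.get("title") or ""
    text = unescape(re.sub(r"<[^>]+>", " ", raw)).strip()  # strip comment HTML
    yield depth, item.get("author") or "someone", text
    for child in item.get("children", []):
        yield from flatten(child, depth + 1)

story = requests.get(ALGOLIA.format(12345678)).json()  # one call, whole tree
for depth, author, text in flatten(story):
    print(f"{'  ' * depth}{author}: {text}")  # send to TTS instead of print
```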


I have also wanted to build this, but instead of voice control, performing thread navigation using the AssistiveTouch SDK for Apple Watch. [1]

I released a product yesterday, called Chief of Staff, that handles all aspects of the URL-to-speech process. [2]

The initial version only synthesizes article text alone.

I found there is a great deal of nuance to text to speech synthesis, from the player behavior itself to handling quotas around cloud services.

My goal is far greater flexibility and more features: a genuine Chief of Staff that briefs you on information you care about, in the format and medium that best suits you.

Thread reading is just one application of this.

[1] https://developer.apple.com/videos/play/wwdc2021/10223/

[2] https://news.ycombinator.com/item?id=34973801


Want to team up on it? I want to build the same thing, maybe recreate it for some multireddits on tech or science and turn it into a YouTube channel, etc.

I'm thinking of maybe having it read just the top thread if there are only 10 or so comments, or just the main threads if it's a topic with hundreds of them.

Maybe we could build some way for the listener to text a code, and we'll link them directly to a comment if they want to dig deeper, or save/upvote it.


I actually started work in a similar space not too long ago, if you both want another set of eyes.

This was pre-GPT APIs, but essentially what I was doing was using a Python summarization library to summarize articles from RSS feeds into a simple TTS podcast. Probably a lot of money in custom GPTCasts made off someone's personal RSS feed as a service, IMO.


Sure, touch base w/ me; email in profile.


I'm interested too. I have tried building an aggregator and summarizer for HN in the past. You can find some attempts on my GitHub.

Maybe an AI generated newsletter or aggregator with a voice summary?

Shoot me an email. (Email is in profile)


Nice!

I would really like to have a timestamp to click in the story listing.

This would begin playing the audio at that story.


This has a lot of potential. It becomes a bit repetitive after the 3rd or 4th article. But overall I think I could listen to it every day for 20 mins.


Amazing! To make it more fun, you could use famous fake hosts with very good voices, take a look at the stuff people have done on this Reddit sub: https://www.reddit.com/r/AIVoiceMemes/

There’s some really funny stuff there, the voices are not perfect, but have a lot of expression.


I can't help but think it is almost certain that in the near future it will be nearly impossible to distinguish between human-generated and machine-generated media.

While this technical demonstration is a long way from replacing "real podcasts", it's just the very beginning.

What are the implications here?


Well, the main implication, I would think, is that we will want media to be digitally signed by human individuals who have a reputation.

So a person will "vouch" for content, and we consume the media vouched for by people on our whitelist.

We won't be able to consume media outside of the whitelist, because it will just contain too much noise.
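
The primitives for the vouching part already exist; the hard problem is the social layer of key distribution and reputation. A toy sketch of just the signature, using Ed25519 via the cryptography package (the file name is made up):

```
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The human "voucher" signs the media file once...
key = Ed25519PrivateKey.generate()
media = open("episode.mp3", "rb").read()
signature = key.sign(media)

# ...and anyone holding the published public key can check the vouch.
public_key = key.public_key()
try:
    public_key.verify(signature, media)
    print("vouched for: play it")
except InvalidSignature:
    print("no valid vouch: skip it")
```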


However, machine-generated media relies on the existence of human-generated media. It's always third-hand.

An AI can't sit in a dusty archive room and draw inferences from hand-written minutes. It can't interview the survivor of a tragedy or the winner of a trophy. AI software depends on sentient wetware to generate the fundamental content.


Terrifying. It’s like a pre-alpha version of dystopia.

Between this and the advances in robotics, it feels like we’re within decades of some really tough times for humanity.

We could also be within decades of utopia. But my money is on these technologies being used in bad ways far more often than for good. Hopefully I’m just overly cynical!

Good luck kiddos!


I'm so close to liking this.

If I could choose a preference for personality and voice, I'd probably be sold.

Any affiliation with https://old.reddit.com/r/airadio/ ?


Overall I was impressed. I would have no resistance to listening to something like this regularly if there were less banter and if it were better tailored to the eclectic variety of Hacker News stories.

I enjoy reading Hacker News even though I don’t have the background to understand most of the stories, because I can easily skip to stories I am interested in. With the podcast, I got stuck listening to everything, including quite a few stories I didn’t understand. Either the podcast needs to focus more on stories of general interest, or it needs to explain the context and significance of the technical stories better.


Takes everything I enjoy about HN away, bravo!


This is pretty wild. Eerie how relatable the hosts are, talking about where they’re from, etc. There is an uncanny valley feel to it, though. For example, Laura saying “GitHoob” breaks the “illusion”.


How do they get ChatGPT not to hallucinate stuff about the articles? Everything seems fairly accurate, which is not my experience with ChatGPT when talking about technical things. Is it heavily curated/edited by humans? I noticed that the text often comes out verbatim from the articles, perhaps this indicates a clever prompt that keeps things closer to the truth by requiring verbatim output.


ChatGPT hallucinates more the further removed it is from the data. I'm asking it about Laravel, and it knows nothing about the Laravel 9 or 10 changes, but if I feed it an entire article or document it'll hallucinate a lot less, because it's fresh.

Kinda like how we can recall things better close to an event than months later.

It knows a ton from its training, but it still got that from the web, so always question it. But if we can add metadata and other things to strengthen the LLM's understanding, it shouldn't hallucinate much at all.


If you use temperature 0 with an API call, it does not hallucinate much at all, especially with a good prompt that includes the information you are asking about.
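
Concretely, that combination looks something like this with the new API (a sketch; the prompt wording is mine):

```
import openai

article = open("article.txt").read()  # paste the source text into the prompt

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    temperature=0,  # always take the most likely token; minimizes creative drift
    messages=[
        {"role": "system", "content": (
            "Summarize using only facts stated in the provided article. "
            "Quote key phrases verbatim where possible."
        )},
        {"role": "user", "content": article},
    ],
)
print(response["choices"][0]["message"]["content"])
```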


This is kind of incredible and groundbreaking, tbh. Perhaps it’s mostly just the quality of the TTS. 1.2x does sound perfect.


I want to see this appear tomorrow on HackerFM.


Can’t wait for that. It’s going to be so meta-referential once it also starts discussing the comments.


Dear HackerFM developers, this is an entertaining project. But please don't simulate the brain-dead dialogue of US commercial TV; give us a critical discussion of the articles instead, with different points of view. You already have two panelists, so why not use that for an exchange of arguments?


I wonder if this could be a good thing to have on in the background for mild mental stimulation while I'm working - not too interesting or I'll be too distracted to work, but realistic enough to fade into the background without feeling I missed something and have to rewind (again).


What none of the text-to-speech generators seem to get right is the aspects that make real human podcasts easier to listen to: hesitations, rephrasing, pauses, variation in speed, intonation, etc.

I have yet to see something like this. Something less “perfect” sounding than, say, the Google Maps voice.


>the aspects that make real human podcasts easier to listen to: hesitations, rephrasing, pauses, variation in speed, intonation etc.

Have you heard ShowDoJo's Wurst Take yet?

https://www.twitch.tv/showdojo

It's not perfect but it's one of the best I've heard so far.


Maybe they will soon be able to give some emotion and randomness to the text-to-speech engines to make the tone less boring... I think models like GPT can now detect different emotions in the input text, so that could be used to tune a different tone for each sentence.
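
Azure's neural voices already expose something like this through SSML speaking styles, so the missing piece is just having a model label each sentence's emotion upstream. A sketch with the Azure Speech SDK; the key/region and the style labels here are placeholders:

```
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="eastus")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# One SSML document with a per-sentence speaking style, chosen upstream
# (e.g. by asking GPT to tag each sentence with an emotion).
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="excited">This next story is wild.</mstts:express-as>
    <mstts:express-as style="sad">Sadly, the project was abandoned.</mstts:express-as>
  </voice>
</speak>
"""
synthesizer.speak_ssml_async(ssml).get()
```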


Nice work! Can you share some detail about how you made this? Do the models actually talk to each other?


Unless I missed it, I just wish it had a volume control.

I'm listening on a laptop and would rather not adjust the system volume and affect all other apps with sound.

Otherwise, the convenience of the audio format makes it one of the more interesting uses of AI that I've seen.


I guess it's time for me to put prompt injection attacks into my submissions


This is mind-blowing, to be honest. Even if it makes perfect sense that it should be possible to do, the result is quite impressive.

It's basically a headline reader with some fluff, but it does a great job at that and there are whole teams of real humans providing such podcasts today, so that's saying something.

It can get weird or even a little broken though. See timestamp 09:50 of the Feb 23 2022 episode:

Laura: So, we're gonna talk about an article called Generic Dynamic Array in 60 lines of C that can be found on gist.github.com.

Zod: Alright, shall we read the article?

Laura (voice 2, almost a different voice): Sure, let me share it here.

Laura (voice 1): "Laura reads the article." <this is verbatim in the podcast>

Laura (voice 1): OK, so that was the article. What do you think about it?

Zod: I think it's interesting that you can define a generic dynamic array in such a small amount of code...


Interesting in theory. The world’s best cure for insomnia in practice.


What are you (they?) using for text to speech? Elevenlabs? Azure TTS?


I just realized that we will likely soon have a dropdown for the voice talent on sites like this.

I want Sam Jackson and Molly Wood to read my HN, please.


According to the podcast itself it runs on Azure, so very likely it's Azure TTS. I also think that's somewhat evident because Eleven Labs' TTS is (at least in my opinion) a bit more natural than Azure TTS.


I am still searching for a good open-source library that produces natural voices. I have experimented with Coqui-ai and Mimic 3, but they are not this good. I have heard that tortoise-tts is quite slow.

I would love to know about any other alternatives that I may have missed.


It sounds like Eleven Labs to me. Either that or Azure TTS is better than I realized.


I'd love to see tomorrow's episode be about themselves.


The reason podcasts got so big to begin with is that traditional media started having issues with authenticity. This exacerbates the problem. While it might save money over actually producing a podcast, it removes everything that's appealing or interesting about podcasts, and it starts with zero authenticity and goes down from there.

Like, cool technical implementation, but a failure of concept.


The end of the news world as we know it.

It will be very difficult to detect in the future and will result in trust issues and rampant fake news.


Kind of like right now with 90% of all mainstream media being owned by just 6 corporations. Their employees must abide by the rules they set and are told what they can and cannot talk about.

I'd venture to say, this will only increase people's skepticism, which is a good thing. We need people to start thinking for themselves instead of turning off their brain and just being fed info they assume they can trust.

https://www.businessinsider.com/these-6-corporations-control...


I hope that's true, but I suspect the allure of getting your own personalized news feed, covering only the topics you care about in the exact style you prefer, will cause 90% of listeners to choose this medium over all others and to presume as much (or more) truth in it as in the sources they prefer today.

Never discount the influence of high production value on any form of media. Look at the utter crap music and films that have dominated mass media for decades. The best-produced and most palatable fare nearly always sells best, no matter the quality of the underlying content.


End of the road for podcasts, more like. They are incredibly labour-intensive to produce (recording + editing time), and more and more of them are becoming little more than plugs for the host's book, TV show, or whatnot. I can see this turning the medium into an automated marketing channel, the way email lists are today.


At least the AI reads the articles! That's more than the humans on the flesh-and-blood "Hacker News"


Man, they're even taking jobs away from podcasters. Most of those people don't even make money from it.


What text to speech is used for the voices? They are quite impressive, making no mistakes with acronyms.


Very impressed you managed to do this the day of the release. Are you open to sharing your repo?


I just want something that reads real HN and makes and remembers unique TTS voices for each user.


No RSS feed on the subscribe page?


try this: https://s3.eu-west-2.amazonaws.com/hackernews.fm/rss.xml

(extracted from the apple podcast link)


Thanks!


It would be cool if there was an option to change the voices of the hosts.


The male voice is just like my audible book narrator, R.C. Bray... amazing!


The voices are really good. I wonder what they are using for text-to-speech?


Are there any of these voice models that I can run locally?


Just gotta comment on how cool this idea is.


It pronounces GitHub as git-hu-b


This is totally brilliant!


Damn, do you know what will happen when we have multimodal large models?


What a world. Nice work!


So Kevin Durant is Zod?


It’s so boring …


reminds me of Delamain



