"I'm glad OpenAI is committed to refining its API terms of service to better meet the needs of developers."
"Yes, it's important to make sure developers have the tools they need to create innovative products with these models."
"Oh look, I found an interesting article on thoriumsim.com about a star ship bridge simulator called Thorim Nova."
"Hmm, sounds interesting let's read it."
Absolutely painful. I would love something that summarizes the articles and discussion without pretending to be a conversation between two people. I mean it says it is AI generated but they are adding all this conversational fluff which really does not work for me.
It is interesting to see these pieces come together but I want to tear my ears out of my head when I hear things like "Yes, it's important to make sure developers have the tools they need to create innovative products with these models." or just repeatedly adding the word "interesting" to summaries of articles.
Please just give me a bog standard summary in audio form without this faux commentary. I do not find the "insights" of ChatGPT worthwhile.
I actually want summaries of the top comment threads. often on HN I go straight to the comments to see if the articles is worth reading. half the time I get all the info I want from the debates that I don't read the article
Touché. I just come to comments and find out, from post you replying on, that it is not worth reading/listening because I find out podcast as timeconsuming way for gathering informations.
If AI adds aspects that I dont like on podcasts, it's worthless for me.
The show is currently a one-off recording doing all the rendering beforehand... But the beauty of what they've done here is that there's nothing preventing someone from doing this for every visitor. Don't like the way this one is generated? Give it your feedback and that can be used to shape the generated output. It was too laid-back chill for my taste, and right now, all i could do is adjust the playback speed. But dev time and money is the only barrier at this point to me having a conversation with the virtual hosts, telling them i like fast paced shows with more depth in the technical areas, and having them change and personalize on the fly -without changing much of what they have already ingested.
This presumes such a construct is capable of producing interesting content. I’m just not sure I’m interested in the insights of ChatGPT no matter how it has been asked to behave.
>Absolutely painful. I would love something that summarizes the articles and discussion without pretending to be a conversation between two people. I mean it says it is AI generated but they are adding all this conversational fluff which really does not work for me.
Yeah, the conversational fluff is easily my least favorite part of podcasts. Adding that to an AI version that should theoretically be even more efficient at summarizing/delivering information is... unexpected. A non-starter for me.
Yeah I'm torn about this- I kind of agree I'd prefer this to be a more direct information transfer. But I also think that a conversation between two knowledgeable people on a subject can add additional insights and can be more entertaining to listen to: That's what they are attempting here, but I agree this sort of dynamic is still too hard to manufacture from whole cloth in 2023.
On the other hand, the banter on this podcast is still less cringey than my local TV news, which is a compliment.
Yeah I don't know if ChatGPT is capable of acting like a knowledgeable person but I suspect it would have a tendency to make the kind of novel insights that people tend to make. The most statistically likely word might be sort of smoothed over conceptually.
But also it was frustrating to hear the opening of this podcast go with "we are two AI generated hosts running on ChatGPT" and then when they move to talking specifically about the ChatGPT API article they talk about all these products running on the new API and there is not one single comment in that section where they say "and of course us too! haha" like any human being would if some relevant spot in an article came up like that.
Also the kind of tech podcasts I like to listen to are more critical. They are not just going to tell you what someone announced but also why some part of this announcement is probably nonsense or improbable. I can imagine this AI podcast talking about some new NFT announcement without any hint of doubt as to the claims made in the announcement.
Agreed, the lack of critical takes is a problem and removes some of the value- Probably not helped by the fact that chatgpt is trained to avoid confrontational language
Yep, I want a virtual assistant not a virtual friend. I want to save time that I can expend enjoying nature and my fellow humans. I don wan't to spend time with fake interactions.
No, I like the idea of it being a daily summary of HN that is automatically generated. But the fake banter does not work for me coming from AI the way it does with human hosts. Basically with a human host I develop a parasocial relationship with them where the banter feels like we are hanging out, so it makes it fun. This is also based on human experience so the host will make comments based on their real life interspersed with the dialog. But the AI has no real life that it is capable of remembering, so the simulated banter feels hollow. In that case I would rather this be omitted, and simply do the summary that ChatGPT is good at without trying to pretend to be human. Basically a to-the-point, more businesslike attitude would be nice. Like it says "I found an article on XYZ" but you didn't "find" it, it was the second article on HN and you are giving me the top articles on HN. This mock-reality is uncanny. Just say "the second popular article today was..." and give me a summary, then summarize the discussion.
And summarizing the discussion is the real challenge, which I have not seen ChatGPT do. As another user has commented I do not come here for the articles, but for the discussion.
I made this [1] 6 years ago reads HN top 10 articles, top comment every morning... this thing has been sitting on my shelf since then playing everyday lol. Which is amazing because it's a janky breadboard that I never put on a solderable board.
Anyway regarding summary... I looked at rapid api before they have a sumarizer on there, can plug it into polly. It seemed decent but I wanted my own summary based on my own reading process... but never got around to it.
Realistic but very lacking in expression and no humor at all. And very slow paced. I'd want significantly more personality to be happy with it - I wonder if the reason its like this is because when they try to spice it up we are back to inappropriate things popping out.
Same. The female voice, especially on the first sentence in the cast, is very well inflected to separate phrases and add interest. After that, it's downhill.
ps, I feel like inflection is going to be one of the harder things for an LM to pick up, given all the subtext humans can convey with it.
Laura sounds very realistic... Zod is a bit less so . Both still very impressive. This was really cool. I'm excited to see all the new ideas with this api access.
This is technically very impressive, but it's worth pointing out that podcasts much better than this fail to build an audience all the time.
I also feel like every application of ChatGPT seems to completely miss the point of the media it mimics. Podcasts are not merely coherent voices talking to each other. Getting rid of human presenters is literally soulless. People already don't listen for much subtler reasons. Entertainers get canceled, media companies get boycotted, bias divides audiences, etc.
That's not going away with or without AI. There is no "tweaking" the training without putting humans right back into the equation and probably making production way more expensive than it's worth. There is no scalability payoff either. Who wants to listen to the same podcast cloned a million times with just replaced voices? We already have this problem with podcasts today and it kills any interest to consume it.
The scalability payoff is in personalization. E.g. I love "This week in microbiology", but I wish I could have more influence over the scientific papers discussed. What I'd love is a morning podcast that's exactly as long as I eat breakfast that talks about exactly the papers I'm reading and their interconnections.
Yes, but would you really love a morning podcast that's
* exactly as long as your breakfast consumption time
* talks about the papers you're reading, but...
* is as shallow as a puddle and as funny as being the person who steps in one?
Because that's what this is. The synthesized discussion combines all the insight of a breakfast radio host interviewing a guest on a specialized technical topic, and the banter as engaging as a technical specialist of some kind trying to host breakfast radio.
By the way, I'm not trying to be overly critical of the developers of this experiment, which is a great illustration of where we're currently at with a bunch of technologies. But it also very starkly illustrates its current limitations.
No, an echo chamber is a space without dissenting opinion.
Personalisation could be used to make an echo chamber, but to branch off the microbiology example above, personalisation of content could also be a summary of all the debate happening in the niche.
I think it would be quite a bit more interesting if you could converse with the model. The back and forth "is this paper about foo related to this other paper about bar?" would probably be a better way of getting at the interconnections. This should be doable now.
The thing that might hold it back is the latency in the experience. You could mask it with the AI equivalent of "ummm ..." to get to maybe 5-10s.
The purpose of a podcast (for me) isn't just to curate content (as this is doing), but to get the perspective of the individual domain experts hosting the show. AI can't address that key motive until it produces models whose particular opinions and analysis I want to hear about topics I probably have already found elsewhere.
Right now, people are in the "This is really cool" phase of using the technology. People are learning to use it by implementing whatever strikes their fancy, including a lot of things that weren't possible before, but which aren't practical or valuable.
Once things settle down we'll start to see some seriously useful stuff, but for the moment it's the wild west.
The scenario I fear most would be AI generated opinionated podcasts aimed at humans, with the purpose of directing their preference, that is, "propagandAI". This already happens daily with traditional media, but that also gives us the weapons to fight it because there's a person on the other side and we know humans can be evil or just fail. But who is to blame when million of people put in power the wrong person because of what an AI that is not a legal person, still they deeply trust because "machines can't lie", directed them to by pushing the right buttons in their heads?
What concerns me the most isn't the AI itself but rather the humans behind it that will use it to take advantage of other humans.
I def agree with what you're saying, and so this is definitely not for me, but part of me wonders if this might become the next generational divide (ie if kids grow up with this type of content normalized, maybe they don't react as negatively?).
I'm not sure about "podcasts" but this concept could be for sure used in news channels, as we have for example in Germany, hourly. It would for sure save money from our taxpayers.
There is a great Miyazaki video where some students showcase some AI tech that generates animations. He ends the talk really disheartened by the experience -- saying something to the effect that he thinks people are losing faith in themselves. I'd never listen to something that is AI generated.
When my favorite podcast ended it felt like I lost touch with a group of friends, this ain't going to have that sort of impact on me. Pass.
I actually felt like he came across as insensitive in that video.
These are students playing with new technology to produce animated characters that move in unintuitive ways, resulting in something actually quite interesting, yet unnaturally creepy (which was intentional).
Miyazaki dismissed it 'an insult to life itself'. I can't imagine the disappointment those students must have felt.
But perhaps in many ways that is where humans really shine. Messages (which can be interpreted metaphorically as well as literally) written with sincerity reveal much more than whatever is said. Whenever we do anything, the closer it comes to being unfiltered and directly from us, the more it means.
If you suspect I'm writing in a way to try to make you feel (or not) a certain way, or to avoid breaking some taboo, or to follow some dogma, then you have no real reason to care about what I ultimately say, because you have no real reason to think its "authentic." By contrast when he views something overtly as "an insult to life itself" it's an incredibly insightful view on his perspective of the world. You would have lost so much in "translation" had he crafted his message in a less sincere way.
I also think this is why there will be minimal to zero market for much "AI" content. Content is not just content. It's a reflection of ourselves. Think about how much you can, probably accurately, infer about me, my views, and more - based on these 3 paragraphs. When this comes from a chatbot, any reflections you might see would be as real as the shapes you might see in the clouds.
What this current generation of "AI" tech seems to enable, more than anything else, is efficiently generating massive volumes of mediocre content. I'm not sure whose problems that's supposed to solve but it certainly isn't mine.
One could argue the internet as a whole, and arguably the PC revolution as a whole, had that same effect.
My dad always said "computers are very fast idiots". They will probably never be Mozarts, but they can and will be Salieris; and most of the world would be extremely happy to have a personal Salieri - in fact, we'll probably be happier like that, considering how Mozarts can be very problematic from so many perspectives.
Do we watch a show like The Simpsons because it is hand drawn, or because of the content?
Last weekend I watched part of an episode and there was a scene where they walked towards "Place de la Pointillisme" [0]. The effect is clearly CGI and you can see how Homer and Marge are actually animated 3D models, so effectively all the "newer" shows (it was aired May 8, 2016) are computer animations with a very flat cel shader. Some argue that newer episodes aren't as good as old ones, but I'm not sure if this could be attributed to them not being hand drawn anymore. In any case, one could apply an XKCD-shader to make the lines a bit more human if the look doesn't appeal.
The Miyazaki video, I get it why he says what he says, but it's an issue with the students targeting the wrong audience. I could see their horrible graphics being a part of a horror movie or game, but that is a completely different world than Miyazaki's.
I don't think this is the case. They don't film animation cells anymore and the animation is done on a computer but for most shots they're not CG models. Even in the pre-HD era they've done a few shots where CG helped.
I thought the same, but that specific scene, it wouldn't make sense to use 3D models only for those 5 seconds if all that was of importance in that shot was the point-like effect of the entire image. You need to see the video version of this to see that it is a 3D scene, the shading is just too perfect on them, specially Marge's dress, it looks like cloth animation.
Imagine this in a future GTA game where the news loop is closed and self generating. Endless radio content and commentary based on havoc in the city, winning online gambles etc.
```
That's all for the weather report. Now we have some breaking news. A maniac has stolen a tank from a military base and is leading the police on a wild chase through the city streets. We have our reporter on the scene with more details. Stay tuned for this developing story.
```
Pre-programmed common stuff sure which many games do as they can tell where you are, vehicle in and any weapons used etc, steal a (police car|military jet|ambulance) etc and they can craft scenarios using audio for those but for more natural random somewhat unpredictable stuff you would have endless combinations that need to be accounted for. It would have to act in response to a bunch of actions so not feasible.
A good example is Mortal Kombat where each character has several lines for every OTHER character (https://www.youtube.com/watch?v=L85QApISlvA) they are about to fight, that's a LOT of voice work as opposed to using the character lore and their history and or relationship with the opponent to come up with something fresh that's (witty|snarky|sad) to say etc.
Yes but this stuff is taking 100 gigabytes per game
Once we get flexgen running Llama (or some other combination of optimizations) there will be no audio files and these LLM’s can be run client side on consumer hardware
Exactly, imagine every NPC just has some ai generated backstory, personality traits, dislikes, habits etc and then these interact with some other NPC or group with same and they use these to have natural flowing convos. Now imagine debates with groups that learn and have the possibility to "evolve", I'm getting a mix of Red Dead Redemption and Dwarf Fortress here for content, the possibilities are endless.
Is there a reason the voices are so slow? This is even slower than people who are trying to talk slow, and it feels so out of place... there is the speed setting, and 1.2x makes the speech sound way more like an actual human.
Looks like automated news is finally achieved. I remember in the early 2000's how I became impressed by Ananova and it wasn't even close to fully automated. This one seems to work really well.
It's funny how these two can talk about "starship bridge simulators" or "gnu poke" like they are super enthusiasts. I think one of the key personality characteristics of ChatGPT is its endless enthusiasm for stuff that can be incredibly geeky, niche, weird or boring to most people.
"Sounds like super useful pickles for those who work with binary files!"
The one that’s driven me crazy lately is when Siri tells me through my AirPods that I left something beind. It always pronounces the “St.” in the address as “saint” instead of “street”, and I can’t understand how it would do this by accident.
There is a beautiful bit in Little, Big by John Crowley. At the start of the book one of the characters is working checking entries in the telephone directory and is amused that the system has confused saint and street to produce Church Of All Streets and the Seventh Saint Bar. Later in the book both locations are mentioned, and it turns out were correctly named in the telephone directory.
Laura and Zod sound remarkably similar to the narrators in this audible I recently listened to called After On: A Novel of Silicon Valley (not recommending it!) and I seriously wonder if the whole book wasn't narrated by AI.. it's not the first audible that made me wonder.
Reading the submission headline, I thought this might generate the podcast using comments.
I've found myself wanting to listen to HN comment threads, as I'm one of those people who derives more value and entertainment from the comments than I do from the actual submissions a lot of the time! I envision a voice-controlled way to navigate through threads too. Basically an accessibility narrator on steroids.
I wonder if anyone else has ever been interested in something like this. Getting good voices to read like this podcast would make it that much more fun, so thanks for getting me really hot and bothered :)
Guess if no one does it soon I'll have to build it myself!
I also have wanted to build this, but instead of voice controlled, perform thread navigation using the AssistiveTouch SDK for Apple Watch. [1]
I released a product yesterday that handles all aspects of the URL to speech process called Chief of Staff. [2]
The initial version only synthesizes article text alone.
I found there is a great deal of nuance to text to speech synthesis, from the player behavior itself to handling quotas around cloud services.
My goal is far greater flexibility and features—-a genuine Chief of Staff that briefs you on information you care about in a format and medium that beats suits your.
want to team up on it? I'm wanting to build the same, maybe recreate it for some multireddits on tech or science and turn it into YouTube channel etc.
I'm thinking maybe having it read just the top thread if there's only 10 or so comments or just the main threads if it's a topic with hundreds of main threads.
maybe we could make some way where the reader can text a code and we'll link them directly to a comment if they want to dig deeper, or save/upvote it.
I actually stsrted work in a similar space not too long ago if you both want another set of eyes.
This was pre GPT apis but essentially what i was doing was using a python summarization library to sumarize articles from rss feeds into a simple tts podcast. Probably a lot of money in custom GPTCasts made off someone personal rss feed as a service IMO.
Amazing! To make it more fun, you could use famous fake hosts with very good voices, take a look at the stuff people have done on this Reddit sub: https://www.reddit.com/r/AIVoiceMemes/
There’s some really funny stuff there, the voices are not perfect, but have a lot of expression.
I can't help but think that there will be almost certainty that in the near future it will be near impossible to distinguish the difference between human generated and machine generated media.
While this technical demonstration is a long way from replacing "real podcasts", it's just the very beginning.
However, machine-generated media relies on the existence of human-generated media. It's always third-hand.
An AI can't sit in a dusty archive room and draw inferences from hand-written minutes. It can't interview the survivor of a tragedy or the winner of a trophy. AI software depends on sentient wetwear to generate the fundamental content.
Terrifying. It’s like a pre-alpha version of dystopia.
Between this and the advances in robotics, it feels like we’re within decades of some really tough times for humanity.
We could also be within decades of utopia. But my money is on these technologies being used in bad ways far more often than for good. Hopefully I’m just overly cynical!
Overall I was impressed. I would have no resistance to listening to something like this regularly if there were less banter and if it were better tailored to the eclectic variety of Hacker News stories.
I enjoy reading Hacker News even though I don’t have the background to understand most of the stories, because I can easily skip to stories I am interested in. With the podcast, I got stuck listening to everything, including quite a few stories I didn’t understand. Either the podcast needs to focus more on stories of general interest, or it needs to explain the context and significance of the technical stories better.
This is pretty wild. Eerie how relatable the hosts are, talking about where they’re from, etc. There is an uncanny valley feel to it though. For example, Laura said GitHoob breaking the “illusion”.
How do they get ChatGPT not to hallucinate stuff about the articles? Everything seems fairly accurate, which is not my experience with ChatGPT when talking about technical things. Is it heavily curated/edited by humans?
I noticed that the text often comes out verbatim from the articles, perhaps this indicates a clever prompt that keeps things closer to the truth by requiring verbatim output.
chatGPT hallucinates more the further removed it is from the data. I'm asking it about laravel, and it knows nothing about laravel 9 or 10 changes, but if I feed it an entire article or document it'll hallucinate a lot less because it's fresh.
kinda like how we can recall things closer to the event than months later.
it knows a ton from it's training but it still got it from the web so always question it, but if we can add meta data and other things to strengthen the llms understanding it shouldn't hallucinate much at all.
If you use temperature 0 with an API call it does not hallucinate much at all especially with a good prompt including the information you are asking about.
Dear HackerFM developers,
this is an entertaining project. But please don't simulate brain-dead dialogues from US commercial TV, but a critical discussion of the articles. With different points of view. You already have two panelists, why don't you use that for an exchange of arguments?
I wonder if this could be a good thing to have on in the background for mild mental stimulation while I'm working - not too interesting or I'll be too distracted to work, but realistic enough to fade into the background without feeling I missed something and have to rewind (again).
What none of the text to speech generators seem to get right is — the aspects that make real human podcasts easier to listen to: hesitations, rephrasing, pauses, variation in speed, intonation etc.
I have yet to see something like this. Something less “perfect” sounding than say the google maps voice.
Maybe they will soon be able to give some emotion and randomness to the text-to-speech engines to make the tone less boring... I think models like GPT can now detect different emotions in the input text, so it might be used to tune different tone for each sentence.
This is mindblowing, to be honest, even if it makes perfect sense that it should be possible to do, the result is quite impressive.
It's basically a headline reader with some fluff, but it does a great job at that and there are whole teams of real humans providing such podcasts today, so that's saying something.
It can get weird or even a little broken though. See timestamp 09:50 of the Feb 23 2022 episode:
Laura: So, we're gonna talk about an article called Generic Dynamic Array in 60 lines of C that can be found on gist.github.com.
Zod: Alright, shall we read the article?
Laura (voice 2, almost a different voice): Sure, let me share it here.
Laura (voice 1): "Laura reads the article." <this is verbatim in the podcast>
Laura (voice 1): OK, so that was the article. What do you think about it?
Zod: I think it's interesting that you can define a generic dynamic array in such a small amount of code...
According to the podcast itself it runs on Azure, so very likely it's Azure TTS. I also think that's somewhat evident because Elevenlabs TTS is (at least in my opinion) a bit more natural than Azure TSS.
I am still searching for a good open-source library that produces natural voices. I have experimented with Coqui-ai and Mimic 3, but they are not this good. I have heard that Tortoises-tts is quite slow.
I would love to know about any other alternatives that I may have missed.
the reason podcasts got so big to begin with is because traditional media have started having issues with authenticity. This exacerbates the problem. While it might save money over actually having a podcast, it removes everything thats appealing or interesting about podcasts, and starts with zero authenticity and goes down from there.
Like, cool technical implementation, but a failure from concept.
Kind of like right now with 90% of all mainstream media being owned by just 6 corporations. Their employees must abide by the rules they set and are told what they can and cannot talk about.
I'd venture to say, this will only increase people's skepticism, which is a good thing. We need people to start thinking for themselves instead of turning off their brain and just being fed info they assume they can trust.
I hope that's true, but I suspect the allure of getting your own personalized news feed on only the topics you care about, in the exact style you prefer will cause 90% of the listeners to choose this medium over all others and presume as much (or more) truth in this than the source they prefer today.
Never discount the influence of high production value on any form of media. Look at the utter crap music and films that have dominated mass media for decades. The best produced and most palatable fare nearly always sells best, no matter what the quality of the underlying content.
End of the road for podcasts more like. They are incredibly labour intensive to produce (recording + editing time), and more and more of them are becoming not much more than plugs for their book, TV show or what not. I can see them turning the medium into an automated marketing channel, the way email lists are today.
"Yes, it's important to make sure developers have the tools they need to create innovative products with these models."
"Oh look, I found an interesting article on thoriumsim.com about a star ship bridge simulator called Thorim Nova."
"Hmm, sounds interesting let's read it."
Absolutely painful. I would love something that summarizes the articles and discussion without pretending to be a conversation between two people. I mean it says it is AI generated but they are adding all this conversational fluff which really does not work for me.
It is interesting to see these pieces come together but I want to tear my ears out of my head when I hear things like "Yes, it's important to make sure developers have the tools they need to create innovative products with these models." or just repeatedly adding the word "interesting" to summaries of articles.
Please just give me a bog standard summary in audio form without this faux commentary. I do not find the "insights" of ChatGPT worthwhile.