
Oh yeah. The exciting thing is that this is pretty low-hanging fruit (at least for the common modalities).

What would a Large Language Model that can manipulate audio-visual data as expertly as it can manipulate text look like? This goes beyond just text-to-speech, captioning, or image Q&A. I think we'll find out very soon.




Yeah, I think this is now in the "we know it can be done, and how to do it" stage. And also som

I think we might have audio2audio editing on the level of Stable Diffusion within a year or two, based on recent progress with AudioLM etc.

For short clips, we will probably get video2video editing in the same timeframe. While it is computationally more challenging, it might end up arriving before audio editing, because video is so popular in today's social media and marketing landscape.

Usable joint video and audio editing (that is coherent) will probably take longer.



