Show HN: Making videos searchable with LLMs (to get leads)
3 points by yuvalkarmi on Aug 11, 2023
Hey HN friends! Yuval here, I'm the maker of Conversational Demos:

https://www.producthunt.com/posts/conversational-demos?utm_source=hn

This is a product with a bit of an interesting history: it evolved from my fascination with what we can do with LLMs and video.

My original (engineering) thinking was "how cool would it be if we let someone search through video with natural language?"

I built the original proof of concept for this over the weekend, taped together with passion, code glue, and sheer will to look away from the really shitty code I originally wrote :)

When I showed it to users, I discovered that nobody was really willing to pay for video search - everyone thought it was cool, but there was no hook from a commercial standpoint. And so, as these things happen, I ended up turning it into a marketing / sales tool. That's all I'll say about the commercial aspect of it, though I'm happy to expand on it if it's interesting to the community here.

Technical stuff: ---

Behind the scenes, when you upload a video (or record one - it's integrated with the not-yet-officially-released Loom SDK), the app sends it to AWS Transcribe, which works with nearly 40 languages. At the same time, it converts the video to a small MP4 that's viewable across devices (I do this with AWS Media Transcoder).

Next up, I break the transcription down into chunks of around 30 seconds each and send them to OpenAI's embeddings API.
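A minimal sketch of the chunking step, assuming the transcript is already available as (start, end, text) segments in seconds. The `Chunk` class, the `chunk_segments` name, and the 30-second `window` default are illustrative, not the actual code behind the product. Each resulting chunk's text would then be sent off to be embedded.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # seconds from the start of the video
    end: float
    text: str


def chunk_segments(segments, window=30.0):
    """Group (start, end, text) transcript segments into ~window-second chunks.

    Hypothetical helper: a segment joins the current chunk as long as the
    chunk would still span at most `window` seconds.
    """
    chunks = []
    current = None
    for start, end, text in segments:
        if current is None:
            current = Chunk(start, end, text)
        elif end - current.start <= window:
            # Still inside the window: extend the current chunk.
            current.end = end
            current.text += " " + text
        else:
            # Window exceeded: close the chunk and start a new one.
            chunks.append(current)
            current = Chunk(start, end, text)
    if current is not None:
        chunks.append(current)
    return chunks
```

Keeping the start and end times on each chunk is what lets a search hit jump straight to the right moment in the video.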

Through TONS of trial and error, I learned that to get good latency on this thing, you can't really use a (cheap) vector database. So I'm doing something pretty hacky: storing the index on an always-on Docker container (deployed to fly.io) rather than in a serverless function, which I really tried to use until I realized it's not fit for purpose.
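To illustrate the idea of keeping the index in a long-running process instead of a vector database, here is a rough sketch of a brute-force, in-memory similarity search. It is an assumption about the general approach, not the actual implementation: `InMemoryIndex`, `cosine`, and the toy 2-dimensional vectors are all made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Hypothetical index that lives in the memory of an always-on process."""

    def __init__(self):
        self.items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self.items.append((vector, payload))

    def search(self, query, k=3):
        # Score every stored vector against the query and keep the top k.
        scored = [(cosine(query, vec), payload) for vec, payload in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


index = InMemoryIndex()
index.add([1.0, 0.0], "intro")
index.add([0.0, 1.0], "pricing")
index.add([0.7, 0.7], "demo")
results = index.search([1.0, 0.1], k=2)
```

For a single video's worth of 30-second chunks, a linear scan like this involves no network hop to a separate database, which is one plausible reason it helps latency.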

Next up, I upload the embedding index to S3 for later use, as well as store it locally on the machine. That's because the machine has ephemeral storage: when I re-deploy, the local copy of the index goes away, so I just re-download it from S3 if it's not there.
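The "use the local copy, fall back to S3" logic could look roughly like the sketch below. It assumes the index is stored as a JSON file, which the post does not state, and `load_index` and its `download` parameter are hypothetical names. In a real deployment `download` might wrap boto3's `download_file(bucket, key, filename)` on an S3 client.

```python
import json
import os


def load_index(local_path, download):
    """Return the index from local disk, restoring it via `download` if missing.

    `download` is any callable that writes the index file to `local_path`,
    for example a thin wrapper around an S3 download (an assumption here).
    """
    if not os.path.exists(local_path):
        # Typically the case right after a redeploy wiped local storage.
        download(local_path)
    with open(local_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Passing the downloader in as a function keeps the fallback easy to exercise without touching S3.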

I have no idea if this project will commercially succeed. It's live on Product Hunt now, and we'll see how people respond.

To be honest, I find building the most fun, and marketing the least fun. So on that front, I did learn several important things about LLMs and building chat-bots along the way:

1. Use the simplest solution if you're optimizing for latency

2. You can't just feed a VTT transcript into an LLM and expect it to figure out the context - you have to do your own chunking

3. There are lots of open source projects out there for using LLMs (langchain, llamaindex, auto-gpt, etc), but just using OpenAI's APIs directly yielded the best results at the lowest latency and with the least complexity.
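On point 2, "doing your own chunking" starts with turning the raw VTT into timed cues instead of handing the file to the model as-is. The parser below is a simplified sketch under that assumption: `parse_vtt` is a hypothetical helper that handles plain cue timings and ignores WebVTT features such as cue settings, styling, and notes.

```python
import re

# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt", with the hours part optional.
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return a list of (start_seconds, end_seconds, cue_text) tuples."""
    cues = []
    # Cues are separated by blank lines; the WEBVTT header block has no timing.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the spoken text.
                cue_text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), cue_text))
                break
    return cues
```

The resulting tuples carry the timing information that a time-window chunker needs, so the model only ever sees coherent, timestamped pieces of the transcript.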

I'm curious to get the feedback of the HN community - less so commercially, and more so on the technical front, UX, etc.

What suggestions do you have, and what would you love to see?

Thanks! Yuval