I wrote a simple library to reduce latency in voice generation from LLM chat completion streams.
It lets you generate speech from streaming text produced by local LLMs such as Ollama, using local TTS clients such as Apple Say or external ones such as Google Text-to-Speech, with latency comparable to proprietary assistants such as OpenAI's.
As each sentence boundary is detected, the library runs TTS on that sentence and plays it aloud while the rest of the completion is still being generated in the background.
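Conceptually, the core loop buffers the streamed completion, splits off any complete sentences, and hands each one to a TTS backend as soon as it is ready. Here is a minimal sketch of that idea, assuming the Ollama Python client and macOS's `say` command as stand-ins for whichever LLM and TTS backends you actually use:

```python
# Minimal sketch: buffer a streamed chat completion, detect sentence
# boundaries, and speak each sentence as soon as it is complete.
# The model name, Ollama client, and `say` command are assumptions.
import re
import subprocess

import ollama  # assumed local Ollama client

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def speak(sentence: str) -> None:
    # macOS `say` as a stand-in for any local or remote TTS client
    subprocess.run(["say", sentence])

def stream_and_speak(prompt: str, model: str = "llama3") -> None:
    buffer = ""
    for chunk in ollama.chat(model=model,
                             messages=[{"role": "user", "content": prompt}],
                             stream=True):
        buffer += chunk["message"]["content"]
        # Split off complete sentences and speak them immediately;
        # the server keeps generating while audio plays.
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:
            speak(sentence.strip())
        buffer = parts[-1]
    if buffer.strip():
        speak(buffer.strip())

if __name__ == "__main__":
    stream_and_speak("Tell me a short story about a lighthouse keeper.")
```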
Developed an open-source voice assistant that combines OpenAI's Whisper, Chat Completion, and Voice Generation APIs into a complete spoken assistant.
Some potential extensions include integrating it into custom hardware or adding function calling to expand the default capabilities.
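For reference, a single turn of the assistant (listen, think, speak) maps onto the three APIs roughly like this. This is a minimal sketch assuming the OpenAI Python SDK (v1) and a pre-recorded audio clip; the model names, output file, and `afplay` playback call are assumptions, not the project's exact code:

```python
# One assistant turn: transcribe a question, get a chat reply, speak it.
import subprocess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def assistant_turn(audio_path: str) -> None:
    # 1. Speech to text with Whisper
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Chat completion on the transcribed question
    chat = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user", "content": transcript.text}],
    )
    reply = chat.choices[0].message.content

    # 3. Text to speech on the reply
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
    with open("reply.mp3", "wb") as out:
        out.write(speech.content)

    # Play the reply (afplay on macOS; swap in your platform's player)
    subprocess.run(["afplay", "reply.mp3"])

if __name__ == "__main__":
    assistant_turn("question.wav")
```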
Simple experiment in question answering over YouTube videos using embeddings and the transcripts of the top-n YouTube search results.
It takes a question and, optionally, a YouTube search query (otherwise an LLM auto-generates one), compiles the transcript of each video result, builds an embedding index from those transcripts, and then answers the question using the most relevant embeddings.
It returns both a string response and a list of the sources used for the answer.
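A rough sketch of how that pipeline can be wired together, assuming OpenAI embeddings, the youtube-transcript-api package, and a list of video IDs already obtained from the YouTube search; the search step and chunking are simplified here:

```python
# Retrieval-augmented answering over YouTube transcripts (simplified).
import numpy as np
from openai import OpenAI
from youtube_transcript_api import YouTubeTranscriptApi

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def answer(question: str, video_ids: list[str], top_k: int = 5) -> tuple[str, list[str]]:
    # Build (chunk, source) pairs from each video's transcript
    chunks, sources = [], []
    for vid in video_ids:
        # Classic API; newer youtube-transcript-api versions use
        # YouTubeTranscriptApi().fetch(vid) instead.
        segments = YouTubeTranscriptApi.get_transcript(vid)
        text = " ".join(seg["text"] for seg in segments)
        for i in range(0, len(text), 1000):
            chunks.append(text[i:i + 1000])
            sources.append(f"https://youtube.com/watch?v={vid}")

    # Rank chunks by cosine similarity to the question
    chunk_vecs, q_vec = embed(chunks), embed([question])[0]
    sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
    top = np.argsort(sims)[::-1][:top_k]

    # Answer from the retrieved context only
    context = "\n\n".join(chunks[i] for i in top)
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return reply.choices[0].message.content, sorted(set(sources[i] for i in top))
```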
Introducing our Video Summarization API — a game-changing tool that leverages advanced language models to summarize any YouTube video, no matter the length. Similar in technology to OpenAI's ChatGPT, our API distills key points and themes from videos, offering a quick way to grasp content without watching it in entirety. Ideal for content creators, researchers, and anyone who wants to consume video content more efficiently.
Can be easily integrated into Apple Shortcuts to summarize YouTube videos on the go (will publish an example soon).
The API is currently in beta; feel free to leave comments and feedback so I can improve it.
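To give a rough idea of how calling such an API could look, here is a hypothetical request; the endpoint URL, parameter name, and response field below are placeholders, not the published interface:

```python
# Hypothetical usage sketch; everything below is a placeholder.
import requests

API_URL = "https://example.com/summarize"  # placeholder endpoint

resp = requests.get(
    API_URL,
    params={"url": "https://www.youtube.com/watch?v=VIDEO_ID"},  # placeholder video
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["summary"])  # assumed response field
```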
OpenAI's GPT-3 seems to be better than Google and other smart home assistants, so I wanted to make my own by wrapping the GPT-3 API in voice recognition and text-to-speech.
I wrote a short script that recognizes voice input from the computer's microphone, sends the transcribed text to OpenAI's GPT-3, and speaks the response over your speakers.
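A minimal sketch of that loop, assuming the SpeechRecognition and pyttsx3 packages plus the OpenAI SDK; the exact model and TTS engine in the original script may differ, and a current completion model stands in for GPT-3 here:

```python
# Listen on the microphone, send the text to a completion model, speak the reply.
import pyttsx3
import speech_recognition as sr
from openai import OpenAI

client = OpenAI()
recognizer = sr.Recognizer()
tts = pyttsx3.init()

def listen_and_reply() -> None:
    # Capture a spoken question from the default microphone
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    question = recognizer.recognize_google(audio)

    # Send the transcribed text to a completion model
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",  # stand-in for the original GPT-3 model
        prompt=question,
        max_tokens=200,
    )

    # Speak the answer over the speakers
    tts.say(resp.choices[0].text.strip())
    tts.runAndWait()

if __name__ == "__main__":
    listen_and_reply()
```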