Hey HN, I am proud to show you guys an open source alternative to Azure OpenAI Service that I have built.
Azure OpenAI Service was born out of companies needing enhanced security and access control when using different GPT models. I wanted to build an OSS version of it that people can self-host in their own infrastructure.
"How can I track LLM spend per API key?"
"Can I create a development OpenAI API key with limited access for Bob?"
"Can I see my LLM spend breakdown by models and endpoints?"
"Can I create 100 OpenAI API keys that my students could use in a classroom setting?"
These are questions that BricksLLM helps you answer.
BricksLLM is an API gateway that lets you create API keys with rate limits, cost controls, and TTLs. Those keys can be used to access all OpenAI and Anthropic endpoints, and you get out-of-the-box analytics.
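To make that concrete, here is a rough Go sketch of what creating a scoped key through the gateway's admin API could look like. The port, path, and field names (costLimitInUsd, rateLimitOverTime, ttl, and so on) are illustrative assumptions rather than the exact BricksLLM schema; check the README for the real contract.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical key settings: $10 spend cap, 100 requests per minute, 7-day TTL.
	body, _ := json.Marshal(map[string]any{
		"name":              "bob-dev-key",
		"tags":              []string{"development"},
		"costLimitInUsd":    10.0,
		"rateLimitOverTime": 100,
		"rateLimitUnit":     "m",
		"ttl":               "168h",
	})

	// Assumed admin endpoint; adjust host, port, and path to your deployment.
	req, _ := http.NewRequest(http.MethodPut,
		"http://localhost:8001/api/key-management/keys", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

The returned key is then what you hand to Bob or to a classroom of students, instead of your raw provider key.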
When I first started building with OpenAI APIs, I was constantly worried about API keys being compromised, since a vanilla OpenAI API key grants unlimited access to all of their models. There are stories of people losing thousands of dollars, and a black market for stolen OpenAI API keys exists.
This is why I started building a proxy for ourselves that allows for the creation of API keys with rate limits and cost controls. I built BricksLLM in Go, since that was the language I used to build performant ad exchanges that scaled to thousands of requests per second at my previous job. A lot of developer tools in LLMOps are built in Python, which I believe can be suboptimal in terms of performance and compute resource efficiency.
One of the challenges of building this platform is getting accurate token counts for different OpenAI and Anthropic models. LLM providers are not exactly transparent about how they count prompt and completion tokens. In addition to user input, OpenAI and Anthropic pad prompts with additional instructions or phrases that contribute to the final token counts. For example, Anthropic's actual completion token consumption is consistently 4 more than the token count of the completion output.
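Here is a minimal sketch of how a gateway might adjust its own tokenizer output to match what the provider actually bills. The padding constant reflects the observation above; the per-token price and function names are placeholders for illustration, not BricksLLM's actual code.

```go
package main

import "fmt"

// anthropicCompletionPadding reflects the observed gap: Anthropic's billed
// completion tokens come out consistently 4 higher than the tokenized output.
const anthropicCompletionPadding = 4

// estimateCompletionCost converts a locally computed token count into an
// estimated spend. pricePerToken is a placeholder; real pricing varies by model.
func estimateCompletionCost(localTokenCount int, pricePerToken float64) float64 {
	billedTokens := localTokenCount + anthropicCompletionPadding
	return float64(billedTokens) * pricePerToken
}

func main() {
	// e.g. a completion our tokenizer measures at 250 tokens
	fmt.Printf("estimated cost: $%.6f\n", estimateCompletionCost(250, 0.000008))
}
```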
The latency of the gateway hovers around 50ms, and half of that comes from the tokenizer. If I start utilizing goroutines, I might be able to lower the gateway latency to around 30ms.
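The idea, roughly, is to overlap tokenization with the upstream call instead of running the two steps back to back. Below is a toy sketch of that pattern with goroutines; the helper functions and timings are made up to illustrate the overlap, not taken from the actual gateway.

```go
package main

import (
	"fmt"
	"time"
)

// countTokens stands in for the real tokenizer, which dominates the latency budget.
func countTokens(prompt string) int {
	time.Sleep(25 * time.Millisecond) // simulated tokenizer work
	return len(prompt) / 4
}

// forwardRequest stands in for proxying the call to the upstream provider.
func forwardRequest(prompt string) string {
	time.Sleep(25 * time.Millisecond) // simulated network round trip
	return "completion for: " + prompt
}

func main() {
	prompt := "Explain rate limiting in one sentence."

	// Tokenize concurrently with forwarding so the tokenizer's cost
	// overlaps with the upstream call instead of adding to it.
	tokenCh := make(chan int, 1)
	go func() { tokenCh <- countTokens(prompt) }()

	resp := forwardRequest(prompt)
	promptTokens := <-tokenCh // collect the count for spend tracking

	fmt.Println(resp, "| prompt tokens:", promptTokens)
}
```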
BricksLLM is not an observability platform, but we do provide a Datadog integration so you can get more insight into what is going on inside the proxy. Compared to other tools in the LLMOps space, I believe BricksLLM has the most comprehensive features when it comes to access control.
Let me know what you guys think.