Would you mind explaining a bit more about why this has value over and above a Google Group in collaborative inbox mode?

Anecdotally, I think there are a lot of good problems for a new vendor to solve with a product in this category, but a collaborative inbox is really just the baseline of a solution. Personally, the main issue my team has with collaborative inboxes isn't handling who replies to each message, it's spam. Would love to have a vendor build a solution powerful enough to solve these specific problems:

  1. Filtering out automated beg-bounty outreach from actual security issues via some form of LLM responder: ideally with a bit of semi-automated back-and-forth (e.g. approved via a rich Slack button) to help determine whether someone is serious (after two years of operating, 100% of messages to security@example.com, at 1-2 messages per month per company, have been spam; I suspect over the mid-term it'll still be 98%+). 
  2. Filtering out spam where people are accidentally reaching out to the wrong company. 
  3. Filtering out spam where people are trying to sell us products we're not interested in. E.g. we attend conferences; for every actual conference email, we get maybe 5 or 6 trying to sell us attendee email lists. 
(Would be happy to chat more if you want to interview a potential customer; if you could really solve the problems above, I'd pay you way more than the highest monthly rate on your pricing tier in a heartbeat, ideally scaling per email inbox rather than per seat, which would likely be more lucrative for you and more predictable for me.)


I believe if you want a Google Group Collaborative Inbox for an email address at a domain you own, then you need to be paying for a Google Workspace, which is currently something like $6/user/month.

Beyond that, Jelly has better design (IMHO!), can be used without needing a Google account, lets you discuss conversations inline, gives you an activity view for quickly seeing everything that's happened... basically, GGCI is fine, but we are laser-focussed on making Jelly a _great_ shared inbox for teams.

We'd love to chat more about your ideas though -- send us an email! You can find the contact details on https://letsjelly.com ;-)


We've been a customer for the past year (https://speakeasy.com/docs). I was honestly highly skeptical about putting a RAG powered search in front of our documentation site instead of what we were using (FlexSearch / Nextra). Have been delighted to be proved wrong.

The learning I've had is that whilst the majority of queries go through standard search patterns (i.e. users search for something that's covered by documentation), a subset of queries aren't answerable by our documentation, only implied by it. I have direct experience that Inkeep is serving a large part of that user segment and reducing our support burden.

As a very recent/specific example from last week, we had a community user generating a terraform provider for an internal use-case. By putting error messages from our CLI tooling into Inkeep's "Ask AI" feature, they discovered a nuance in "x-speakeasy-match" (the error message implied it created a circular reference, but didn't spell that out) and self-served a solution.

Inkeep effectively turned our documentation into a guided tutorial on our product, specific to the customer. Pretty strong ROI.


Best way to frame customer-facing AI: on-demand guided tutorials that can translate between user terminology and product lingo.


Strong agree. I increasingly feel like this is one of the major benefits of AI.


I’ve built conviction that code generation only gets useful in the long term when it is entirely deterministic, or filtered through humans. Otherwise it is almost always technical debt. Hence LLM code generation products are a cool toy, but no sensible team will use them without an amazing “Day 2” workflow.

As an example, in my day job (https://speakeasyapi.dev), we sell code generation products using the OpenAPI specification to generate downstream artefacts (language SDKs, terraform providers, markdown documentation). The determinism makes it useful — API updates propagate continuously from server code, to specifications, then to the SDKs / providers / docs site. There are no breaking changes because the pipeline is deterministic and humans are in control of the API at the start. The code generation itself is just a means to an end: removing boilerplate effort and language differences by driving it from a source of truth (server api routes/types). Continuously generated, it is not debt.

We’ve put a lot of effort into trying to make an LLM agent useful in this context. However, giving one direct control of generated code makes it hard to keep the “no breaking changes” and “consistency” restrictions that are needed to make code generation useful.

The trick we’ve landed on to get utility out of an LLM in a code generation task is to restrict it to manipulating a strictly typed interface document, such that it can only do non-breaking things to the code (e.g. adjust comments / descriptions / examples) by making changes through this interface.
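
Roughly, the shape of that restriction (an illustrative TypeScript sketch with hypothetical names, not our actual implementation):

  // Hypothetical sketch: the interface document the LLM manipulates exposes
  // only non-breaking, human-facing properties per operation/field.
  interface InterfaceEntry {
    description?: string;
    comment?: string;
    example?: string;
  }
  type InterfaceDocument = Record<string, InterfaceEntry>;

  // Structured edits are the only thing the LLM is allowed to emit;
  // anything else is rejected before it can reach generated code.
  interface Edit {
    target: string;               // e.g. "paths./pets.get"
    field: keyof InterfaceEntry;  // "description" | "comment" | "example"
    value: string;
  }

  // Deterministic code applies the edits; the LLM never writes generated code directly.
  function applyEdits(doc: InterfaceDocument, edits: Edit[]): InterfaceDocument {
    const next: InterfaceDocument = { ...doc };
    for (const { target, field, value } of edits) {
      next[target] = { ...next[target], [field]: value };
    }
    return next;
  }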


+1, I work at a similar company (https://stainlessapi.com) and have reached exactly the same conclusions. We use LLMs in similar ways.

Well said @ThomasRooney.

There may be other contexts where pure LLM codegen could work well, but I haven't really encountered them personally yet.


We've reached a similar conclusion for refactoring.

The first version of our product (https://grit.io) was entirely LLM-powered. It was very easy to get started with, but reliability was low on enterprise-scale codebases.

Since then, we've switched to a similar approach: using LLMs to manipulate a verifiable interface, but making actual changes through deterministic code.


> This sounds pretty cool, but I'd love to know a bit more about how you handle the impedance mismatch between OpenAPI and an IaaC provider. So far I've only dabbled in making small changes to existing providers but to my somewhat uninformed eyes it seems like a "draw the rest of the owl" situation.

So the impedance mismatch is tackled in a few different ways:

CRUD:

The "operation" that interacts with the entity looks in an OpenAPI spec like "POST /entity", "GET /entity/{id}", "DELETE /entity". CRUD semantics aren't visible at this layer without making a bunch of heuristic guesses about the ways that people use (and mis-use) REST semantics.

Rather than guess, we add OpenAPI extensions to operations that guide these semantics. I.e. if `POST /entity` is used to create the entity, it is annotated with `x-speakeasy-entity-operation: MyEntity#create`. Similarly `#update`, `#read`, and `#delete` for all other operations that interact with resources in terraform.

Entity Attributes:

The "entity" doesn't usually look the same across all CRUD request/responses. I.e. more attributes (e.g. `id`) are often returned in a response, that aren't in the request.

To tackle this, we annotate every JSON Schema in a request/response that applies to an interaction with a terraform entity with `x-speakeasy-entity: MyEntity`. Some versions of this might be bigger/smaller depending on API semantics. To build the terraform schema entry, we merge all of these together, applying inference logic that works out how each attribute is interacted with across the CRUD requests to determine the terraform properties.

E.g. if an attribute is returned in a CREATE API response but isn't in the CREATE API request, it's marked as `Computed`. If one attribute is in the UPDATE API request but another isn't, the one missing from UPDATE is marked as `ForceNew` via a terraform plan extension (i.e. only modifiable with a full delete/create cycle) whereas the other is left alone.

The `type` / `format` values that are visible in JSON Schemas make their way into runtime validations, e.g. ensuring that `format: date` fields are validated at runtime as `YYYY-MM-DD`.

In total there are around 35 different inference rules so far covering how different bits of the JSON Schema / OpenAPI specification map into terraform state.
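
Two of those rules, sketched roughly in TypeScript (illustrative names only, not the actual generator code):

  // Hypothetical sketch of CRUD-based attribute inference.
  interface AttributeUsage {
    inCreateRequest: boolean;
    inCreateResponse: boolean;
    inUpdateRequest: boolean;
  }

  interface TerraformAttribute {
    computed?: boolean;  // set by the API, not the user
    forceNew?: boolean;  // changing it requires a delete/create cycle
  }

  function inferTerraformAttribute(usage: AttributeUsage): TerraformAttribute {
    const attr: TerraformAttribute = {};
    // Returned by the CREATE response but absent from the CREATE request => Computed.
    if (usage.inCreateResponse && !usage.inCreateRequest) {
      attr.computed = true;
    }
    // Settable at CREATE time but not modifiable via UPDATE => ForceNew.
    if (usage.inCreateRequest && !usage.inUpdateRequest) {
      attr.forceNew = true;
    }
    return attr;
  }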

Hoisting:

The "entity" being managed isn't always at the root level of a request/response body, but might be hidden at some deeper level.

E.g. 1: if a response body looks like `{data: {the-entity}}`, we annotate the JSON Schema at the `the-entity` level and "hoist" it to the top level. Anything at a "higher" level is inlined into the resource.

E.g. 2: a parameter in a CREATE request will automatically be inlined into `the-entity`, even if it isn't defined in the JSON Schema of the request body. E.g. `POST /{workspace_id}/entity` marked as `x-speakeasy-entity-operation: MyEntity#create` will not only include whatever request body is defined in the terraform type, but will also gain a `workspace_id` attribute marked as `ForceNew` at the root of the state.

> Writing some terraform providers is very likely in our not-to-distant future, so I created an account and imported our spec. It failed validation and wouldn't let me proceed, and the errors provided weren't helpful. When I have a bit more time I'll throw it at another OpenAPI validator to try to work out where the problems are, but we're using this spec for code generation for several clients already and haven't had any issues. It's dynamically generated from our backend.

Thanks for letting us know. There are definitely gaps in our validation library: we operate with a relatively high level of strictness to minimize the complexity of code generation, but we're constantly trying to loosen it.

> One more thing, I noticed you have logos for stripe, twilio and plaid on your site, above a "Learn how SDKs help" button, which 404's. If those are your customers that's pretty cool, but otherwise using other companies' logos is a bit shady.

We'll remove them. It's meant to be an illustration of how improving developer experience directly impacts revenue for a company, linking out to some sources. However, we definitely don't want to appear shady. Some of our more recognizable customers are listed in the "Trusted By" section in the header.


Thanks for the explanation. It'll be interesting to see how this strategy performs with our current API - it has a few rough edges that stem from being designed for consumption by internal clients only at this stage.


Hey HN, after a fair few months of iteration, we're excited to share our latest offering in Speakeasy: auto-generation of Terraform providers using OpenAPI specifications.

The Problem: Building a Terraform Provider to expose an API via Infrastructure As Code (e.g. HCL, Pulumi, CDK) is expensive, error-prone, and highly repetitive.

However, if you don't have a mature terraform provider, your product won't even be considered by mature organisations with a mandate to automate-all-the-things.

Our Solution:

1. Deep Integration with OpenAPI: Just point Speakeasy to your OpenAPI spec. Every change, every tweak, every evolution of your API is monitored, and we adapt in real-time.

2. Automated Terraform Provider Generation: Instead of manually coding a Terraform provider, Speakeasy synthesizes one for you, ensuring it remains in sync with your API's latest version.

3. Smart Schema Semantics: Based on CRUD operations, Speakeasy can smartly deduce and apply Terraform schema attributes like Computed, Optional, and Force Replace.

4. Continuous GitHub PRs: With each OpenAPI spec alteration, PRs are raised automatically against your Terraform provider repository, ensuring seamless and continuous integration.

Why Speakeasy for Terraform?

While several tools play around the fringes, none offer 100% automation via code synthesis. We've built and extensively tested our Terraform Provider Generation engine from OpenAPI, and have been in production with real customers for the last 6 months.

* It is possible to generate an OpenAPI specification for almost any server side framework (even things like ProtoBuf via REST Gateways) entirely automatically.

* Once your OpenAPI specification is automatically generated from your codebase, Speakeasy enables subsequent integration artifacts, like SDKs and Terraform Providers, to be automatically maintained with close-to-zero engineering effort.

* Once an API is exposed via a Terraform Provider, it becomes usable by the entire IaC ecosystem like Terraform, CDK, and Pulumi through the use of bridging tools.

* Speakeasy will also generate documentation and usage examples, support/guide you through the launch, and automatically upgrade the provider as the ecosystem matures.

Dive Deeper:

- Explore our product: https://www.speakeasyapi.dev

- Explore our CLI: https://github.com/speakeasy-api/speakeasy

- Explore our largest terraform provider yet: https://github.com/airbytehq/terraform-provider-airbyte

- Explore exposing a terraform provider via Pulumi: https://www.speakeasyapi.dev/post/pulumi-terraform-provider

- Explore a toy example: https://github.com/speakeasy-sdks/terraform-provider-hashicu...

A massive thanks to our early adopters and the vibrant Terraform community for guiding our journey.

HN, we’re eager for your insights. Whether it’s rigorous feedback, burning queries, or just wanting to geek out over Terraform and APIs, hit us up!


The registry is more similar to https://sum.golang.org/ than the Chrome Web Store. It pretty much just stores a checksum database, a list of links to github (which actually hosts the cross-compiled binaries), a channel [Official, Partner, Community], some ownership metadata, and some static markdown per provider/module version for documentation.

E.g. back-of-envelope for terraform providers this is:

  Metadata: 4KB JSON [0] * ~15 OS/arch combinations * ~50 versions * ~3000 providers = ~10GB in total

  Docs: ~700KB [1] * ~50 versions * ~3000 providers = ~100GB in total

In my mind the analogous behaviour would be if the golang checksum database added license terms stating "you need to abide by a BSL to use data from this service". What that actually would mean is so nebulous that it feels threatening.

[0] Source: https://registry.terraform.io/v1/providers/airbytehq/airbyte...

[1] Source: https://github.com/airbytehq/terraform-provider-airbyte/tree... gzipped: ~300 resources, ~300 data sources

(NB: in airbyte's case the TF Provider was generated from a ~150KB OpenAPI spec via https://speakeasyapi.dev, implying docs could be compressed even further)


Anecdotally, I ran an experiment with Envoy to see how far the number of signing keys could scale. This was for a B2B “API Key” auth solution; we wanted user keys to be self-revocable, but still just a relatively standard JWT format for maintainability. The hypothesis was that, rather than running a whitelist or blacklist, we could improve the security posture by having one signing key for each JWT.

When we ran some stress tests, it turned out Envoy could happily run with ~300K signing keys in its JWK Set before noticeable service degradation occurred. Even beyond that, after bumping up the memory on the validation servers, the sacrifice was only a few ms per extra 100K keys.

This makes me fully agree that, for many applications, there's probably an opportunity to vastly improve the security posture by bumping up the number of signing keys dramatically.

As long as both the JWTs and the signing keys define a KID, key verification is prefaced only by a hash table lookup (or a tight loop through the keyset) to find the appropriate signing key, before the slower verification procedure.
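
Envoy handles this internally, but the same kid-based lookup is easy to sketch in TypeScript with the jose library (illustrative only; the key material and issuer are placeholders):

  import { createLocalJWKSet, jwtVerify } from "jose";

  // A JWK Set can hold a very large number of public keys; each JWT's
  // header `kid` selects the matching one.
  const jwks = createLocalJWKSet({
    keys: [
      // ...potentially hundreds of thousands of public JWKs, each with a unique `kid`
    ],
  });

  async function verify(token: string) {
    // Key selection by `kid`/`alg` is a cheap lookup; only then does the
    // comparatively expensive signature verification run.
    const { payload } = await jwtVerify(token, jwks, { issuer: "https://auth.example.com" });
    return payload;
  }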


I guess 640K keys ought to be enough for anybody (TM)(r) (c)

More seriously, though, I wonder how AzureAD is implemented and how hard it would be to scope keys per tenant, if not per application. If I'm not mistaken, SAML certificates are per application.


If you want 1 signing key per JWT, you would need to generate a new key pair for each JWT; wouldn’t that be too expensive? Or was the generation included in your tests?


ELI5, sorry:

Take your POC just a bit further and you've got the basis for zero trust networking, right?

That's the Future Perfect Correct Answer™, right?


zero trust, or zero stability?



This is pretty neat! Is the intent to serve small consumers (e.g. people who want to incorporate IoT devices into a smart home), or larger companies (e.g. the IoT manufacturers themselves)?

Would love to get to a place where I could "terraform apply" my IoT devices configuration. Any plans to build this? It's only a small jump from a well-documented API in an OpenAPI spec to a terraform provider.


Most of our customers today are startups and a handful of large enterprise customers. Generally they're trying to connect and control their app users' devices. A common use case is smartlock & thermostat control for Airbnb reservation software. But some of the customers are actually controlling their own devices. This is usually the case for the large ones. Think real-estate group spanning multiple states with hundreds of buildings. They have fragmented fleets of devices and can't integrate them all.

Ultimately you can definitely use Seam as a hobbyist. A few people do. I have it running in my house alongside Home Assistant. But we're not ever planning on monetizing that segment.

The terraform idea is pretty interesting. If you could drop me a note at sy@seam.co with an example configuration, I'd like to discuss it internally.


This is a really good point. The generated SDK is configured to allow a super-set of the API request body to hit the backend, which could expose unnecessary data fields if passed in by client code.

This happens because we're a bit pragmatic: if a user doesn't specify their full data structure in their OpenAPI specification, and we can't generate a strict type, we allow an arbitrary structure to reach the backend. In our experience the hard bit isn't really writing the SDK, it's making and maintaining a good OpenAPI spec (hence part of the commercial product we're moving towards is the ability to generate a strict OpenAPI specification directly from handler code / traffic analysis in a backend server). Your concern is totally valid, and this is something we will make configurable.
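
For illustration, the difference looks roughly like this (hypothetical generated types, not our actual output):

  // With a fully specified schema, the generated request type is strict:
  interface CreatePetRequestStrict {
    name: string;
    tag?: string;
  }

  // With an under-specified schema (e.g. missing properties / additionalProperties),
  // the generated type degrades to "any object", so extra fields can reach the backend:
  type CreatePetRequestLoose = Record<string, unknown>;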


Hey all! I worked on this with Sagar a couple weeks back. The tl;dr is that it generates a (in my heavily biased opinion) relatively clean SDK from an OpenAPI schema, similar to what a human would write. We’d used several other OpenAPI SDK generators but found their output to be a bit too big (and not tree-shakable), so we spent a bit of effort working out a way to compile the OpenAPI spec into a thin (but statically typed) wrapper around axios — very similar to an SDK coded manually.

Here's a few examples:

1. The Petstore API (a tiny example): https://easysdk.xyz/sdk/petstore.json-7bb7c53e017c0f7432f7bd...

2. Our own API: https://easysdk.xyz/sdk/openapi.yaml-ee89154ee9cf9a77f9fb07d...

3. The LOTR API: http://easysdk.xyz/sdk/lotr.yaml-f1ec4cde1ca7839dca2685e283e...

The generator works by:

1. Dereferencing an OpenAPI specification into something with inline types. (Ideally we'd handle type references rather than inlining them, but haven't got there yet)

2. Walking the type-graph, and mapping it to Operations (a combination of Path and Method).

3. Using the TypeScript compiler API, generating the SDK by creating AST nodes whilst walking the type graph.

4. Trying to compile in:

    1. Path Parameters as ES6 Template strings (e.g. `"/v1/apis/{apiID}/api_endpoints"` => `/v1/apis/${props.apiID}/api_endpoints`)

    2. Query params into axios parameters

    3. Body params as an additional argument to the SDK

It's not perfect, but we've used this to help run our own unit tests (and have a few customers trying it out too)! Happy to answer any questions.
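
For a flavour of the output, a generated method ends up looking roughly like this (illustrative and hand-written to mirror the generator's style; the base URL is a placeholder):

  import axios, { AxiosInstance, AxiosResponse } from "axios";

  export interface GetApiEndpointsRequest {
    apiID: string;   // path parameter
    limit?: number;  // query parameter
  }

  export class SDK {
    constructor(
      private client: AxiosInstance = axios.create({ baseURL: "https://api.example.com" })
    ) {}

    // Path params are compiled into an ES6 template string; query params map
    // onto axios `params`; a request body would be a separate argument.
    async getApiEndpoints(props: GetApiEndpointsRequest): Promise<AxiosResponse> {
      return this.client.get(`/v1/apis/${props.apiID}/api_endpoints`, {
        params: { limit: props.limit },
      });
    }
  }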

