Hacker News
Querying AWS at scale across APIs, regions, and accounts (amazon.com)
84 points by nathanwallace on Nov 11, 2021 | 12 comments



I've been playing with Steampipe a bit and it's really great, but unfortunately the structure of the SQL language itself lets it down.

To explain... Steampipe has this amazing autocomplete functionality, so you don't even have to know the exact table name you want - you can just start typing it and see a list of possible matches. But because you have to type the column names before the table name, Steampipe is unable to offer any autocomplete for the available attributes.

For example, instead of typing: `SELECT name, create_date FROM aws_iam_role` it would be much better if you could type: `FROM aws_iam_role SELECT name, create_date`. That way, Steampipe would be able to autocomplete the attributes available for each object.


Yes. LINQ (C#'s DSL for SQL-like queries) puts the equivalent of the FROM clause ahead of the equivalent of the SELECT clause for the same reason; IDEs can autocomplete the SELECT clause that way.
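The FROM-first idea can be sketched as a toy fluent builder (purely hypothetical, not Steampipe's or LINQ's actual interface): once the table is named first, a completer knows which columns are valid before any SELECT is typed.

```python
# Hypothetical sketch: a FROM-first query builder. The schema registry,
# class, and method names are all illustrative, not a real API.

TABLES = {  # toy schema registry
    "aws_iam_role": ["name", "create_date", "arn"],
}

class Query:
    def __init__(self, table):
        self.table = table
        self.columns = []

    def completions(self):
        # The table is already known here, so attribute
        # completion is possible before SELECT is written.
        return TABLES[self.table]

    def select(self, *cols):
        self.columns = list(cols)
        return self

    def to_sql(self):
        return f"SELECT {', '.join(self.columns)} FROM {self.table}"

q = Query("aws_iam_role")
print(q.completions())  # columns known before SELECT is typed
print(q.select("name", "create_date").to_sql())
```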


Interesting! Feels similar to CloudQuery[1] which is also open source.

[1] https://www.cloudquery.io/


Wonder how it handles the API rate limits


Hi - I'm a lead on the Steampipe (https://steampipe.io) project. Rate limits are definitely an interesting challenge, but actually less of an issue for AWS than many of our other plugins.

Because rate limits are separate across accounts, regions, and services, parallel queries work really well across those boundaries.
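The fan-out can be illustrated with a minimal sketch: since each (account, region, service) pair has its own throttle bucket, queries can run in parallel without competing for the same limit. `fetch_buckets` here is a stand-in for a real per-region list call, not Steampipe's actual code.

```python
# Illustrative sketch of per-region parallel fan-out.
# `fetch_buckets` is a placeholder for a real AWS list call
# scoped to one region; names are assumptions.
from concurrent.futures import ThreadPoolExecutor

REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-2"]

def fetch_buckets(region):
    # Stand-in for one region-scoped API call.
    return [f"{region}-bucket-{i}" for i in range(2)]

def query_all_regions(regions):
    # Each region hits its own rate limit, so parallelism is safe.
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        results = list(pool.map(fetch_buckets, regions))
    return [bucket for region_result in results for bucket in region_result]

print(query_all_regions(REGIONS))
```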

We do sub-API queries for tables, but only when the column is requested. So, "select name from aws_s3_bucket" just does a list call, while "select * from aws_s3_bucket" does multiple API calls per row. These sub-API calls are the main potential source of rate limiting, since they hit the same API [1]. BTW, Cloud Control from AWS is actually much more susceptible to this problem! [2].
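That selective-hydration pattern can be sketched roughly like this (a simplified illustration, not the plugin's actual implementation; the function and column names are invented): a cheap list call fills the base columns, and the expensive per-row call runs only when a requested column needs it.

```python
# Rough sketch of selective per-column hydration. All names here
# (list_buckets, get_bucket_tags, HYDRATORS) are illustrative.

def list_buckets():
    # One cheap list call returning the base columns.
    return [{"name": "logs"}, {"name": "assets"}]

def get_bucket_tags(name):
    # The expensive per-row sub-API call.
    return {"team": "platform"}

HYDRATORS = {"tags": get_bucket_tags}

def query(columns):
    rows = list_buckets()            # always one list call
    for row in rows:
        for col in columns:
            if col in HYDRATORS and col not in row:
                # Sub-API call runs only when the column was requested.
                row[col] = HYDRATORS[col](row["name"])
    return [{c: r.get(c) for c in columns} for r in rows]

print(query(["name"]))          # list call only
print(query(["name", "tags"]))  # list call + one hydrate per row
```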

We also use a custom backoff algorithm that is fast then slow, giving good speed in the usual case while still ensuring results if throttling ramps up [3].
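A generic two-phase backoff can be sketched as follows (the linked plugin code is the real implementation; the parameter values here are made up): retry quickly a few times first, then fall back to exponentially longer, capped waits if throttling persists.

```python
# Generic "fast then slow" backoff sketch; constants are illustrative,
# not the values Steampipe actually uses.
def backoff_delays(max_retries=8, fast_delay=0.05, fast_retries=3,
                   slow_base=1.0, cap=30.0):
    delays = []
    for attempt in range(max_retries):
        if attempt < fast_retries:
            delays.append(fast_delay)          # fast phase: short fixed waits
        else:
            slow = slow_base * 2 ** (attempt - fast_retries)
            delays.append(min(slow, cap))      # slow phase: capped exponential
    return delays

print(backoff_delays())
```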

Finally, we automatically cache results in memory between requests and can always save results into materialized views or similar to avoid repeated calls in larger cases.
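The in-memory caching side of that can be sketched with a minimal TTL cache keyed by query text (an illustration of the idea only; the class and names are invented, and the materialized-view path is just regular SQL on top):

```python
# Minimal TTL cache sketch, keyed by query text. Names are illustrative.
import time

class QueryCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.entries = {}            # query text -> (timestamp, rows)

    def get(self, query, fetch):
        now = time.monotonic()
        hit = self.entries.get(query)
        if hit and now - hit[0] < self.ttl:
            return hit[1]            # served from cache, no API call
        rows = fetch()               # cache miss: hit the API once
        self.entries[query] = (now, rows)
        return rows

calls = 0
def fetch():
    global calls
    calls += 1                       # counts actual API round-trips
    return ["bucket-a", "bucket-b"]

cache = QueryCache()
print(cache.get("select name from aws_s3_bucket", fetch))
print(cache.get("select name from aws_s3_bucket", fetch))  # cached
```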

TBH, live querying of APIs has proven so much more effective than we even hoped when we started!

1 - https://steampipe.io/blog/selective-select 2 - https://steampipe.io/blog/aws-cloud-control 3 - https://github.com/turbot/steampipe-plugin-aws/blob/4cbd8813...


A few years ago I built something similar to this, in that it used the JSON API definitions to automatically scrape the entire AWS API surface area.

But I was doing it every 5 minutes and storing the result in postgres instead of it being on demand. I very quickly received a call from our account rep asking for it to not query particular services as frequently.

Apparently some of the newer services I was scraping at the time weren't as optimized for response caching or metadata queries as they should have been.

(Note: apparently it wasn't the overall frequency that was the problem; it was a few services that returned paginated results, and fetching every page every 5 minutes was causing issues.)


Most of the AWS SDKs have built-in support to retry SDK calls using exponential backoff + jitter. However, it's not obvious this is happening unless you instrument logging to see the individual attempts, nor is it obvious how to tweak the number of retries (default 3 for everything except DynamoDB) or the backoff parameters. Java, Go, and Node.js have it for sure; PHP does not.


Seems to be catching up to Azure’s Resource Graph…


Boggles my mind that AWS still doesn't support something like this.


AWS and Azure have different philosophies:

Azure uses a unified resource manager that has everything in "one place".

AWS uses a "federation of systems" with almost nothing in common, except for authentication.

Azure's approach is more consistent and makes things like resource graph relatively easy to implement.

AWS's approach is more robust against failure, because the various service offerings are more independent and have fewer common failure points.

This isn't just theory; it matches my experience pretty well. I hate how in AWS I have to switch portals to go from service to service, whereas in Azure the "Resource Groups" contain all related resources in one place, even if it's a random mix of IaaS and PaaS. Conversely, I've been bitten by several widespread Azure outages. Repeatedly.

It all depends on your business requirements. For me, the consistency is more important. For other people, availability is far more critical.


Agree on your points. I just think they could define a standard service discovery/enumeration interface and leave the implementation to each product team. Basically AWS Config.


AWS is trying to do this with Cloud Control [1]. It's a common interface for resource inventory and provisioning and seems to come out of internal APIs for the CloudFormation service. It currently seems better suited to provisioning use cases and is still fairly limited as an inventory interface [2].

1 - https://aws.amazon.com/cloudcontrolapi/ 2 - https://steampipe.io/blog/aws-cloud-control



