If you are interested in Postgres FDWs, please check out Steampipe (https://steampipe.io). It's an open source project with a Go-based plugin model (similar to Terraform) for instantly querying cloud services (AWS, GCP, GitHub, Slack, etc.) using SQL. TBH, we've been shocked at how far we can push this model for real-time queries against APIs.
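For a taste, here's a minimal sketch (it assumes the AWS plugin is installed and credentials are configured; column names are from memory, so treat them as illustrative):

    -- List EC2 instances; Steampipe turns this SQL into
    -- live calls against the AWS APIs at query time.
    select
      instance_id,
      instance_type,
      region
    from
      aws_ec2_instance;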
We're bullish on the ELT use case and think of virtual tables as "Data Rainbows" - structured, ephemeral access to cloud data. I spoke about this concept at YOW Data (https://youtu.be/2BNzIU5SFaw?t=183). Still structuring our thoughts here, so feedback and ideas would be greatly appreciated.
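As a concrete ELT sketch (using the Hacker News plugin mentioned elsewhere in the thread; table and column names are from memory, so treat them as illustrative), you can materialize a virtual table into a plain Postgres table and transform downstream:

    -- Snapshot the ephemeral API-backed table into a regular
    -- Postgres table for downstream transforms (e.g. with dbt).
    create table hn_top_snapshot as
    select id, title, score from hackernews_top;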
Disclaimers: Steampipe is open source. I'm a lead on the project. I can't stand listening to my own talk.
A completely fair point ... and one we debated at length.
Ultimately we decided that, for users and documentation, the convenience of a single location outweighed the benefits of following the basedir spec, particularly when deploying across operating systems. We looked at other tools, including terraform [1], in reaching a final decision. FWIW, we do let you customize the location of the "install dir" [2].
Hopefully the benefits of Steampipe outweigh the "peeve" factor in this case <grin>
Sadly I read that as "Other tools do their own thing so we decided to do our own thing too" :(
Though admittedly, "convenience for users" is a rather tricky thing to define, as you really need to define "which users". I imagine ~/.projectname is easier when starting out, but long-term management is why we (try to) have standard locations for things.
Agreed. Unfortunately, requirements for long-term management / large-scale deployment aren't a priority until a tool is widely adopted, which is best achieved by keeping it simple. An interesting trade-off...
I can really see how Steampipe rounds out a great DIY data pipeline. I've used Segment, Fivetran, Stitch Data, and Airbyte to shovel data into local storage, from RDBMS to Kafka, but this is definitely the most developer-friendly experience I've seen so far.
Already exploring using plugin metadata to do useful things in dbt data pipelines.
I really appreciate that; we're working hard to make it as developer-friendly as possible. If you (or anyone else here) can spare a few mins to discuss use cases for Steampipe in data pipelines, please drop me a note (email in bio)!
We currently have 37 open source plugins [1] like AWS, GCP, GitHub, Kubernetes, Hacker News, etc. More are in development by Turbot and community members [2]. Each plugin has many tables, e.g. 223 for AWS [3], 23 for GitHub [4].
The 200+ on the home page refers to tables. Most tables have a single API source behind them, but others, like aws_s3_bucket [5], make ~10 API calls per row to collect related data like tags, versioning, etc.
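To make that concrete, a query like this (a sketch; column names from memory) triggers those extra per-bucket calls behind the scenes:

    -- Columns like tags and versioning_enabled are filled in
    -- by additional per-bucket API calls at query time.
    select
      name,
      region,
      tags,
      versioning_enabled
    from
      aws_s3_bucket;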
(We're excited about the rapid growth of our plugins / tables, but can see that if you read it as plugin == data source then the 200+ would be wildly impressive.)
Steampipe is a new project we launched in Jan 2021. Turbot is leading the build and has been working with cloud APIs since 2014. We're bootstrapped, with far fewer than 100 people, but are hiring (engineering, marketing, tech writing) if anyone would like to get involved :-)
I noticed that, but it lists only 37 plugins, not the 200+ advertised here: https://steampipe.io/. I figured the remaining 163+ might be built in, instead of being supplied as plugins.