PgOSQuery: Expose the operating system as a Postgres database

JonnieCache · on Oct 30, 2014

https://github.com/shish/pgosquery/blob/master/pgosquery/__i...

Wow. That does look easy.

Shish2k · on Oct 30, 2014

Indeed it is; TBH, having seen how easy this was to make, I wonder if there would be any merit to making a stripped-down postgres package which is just daemon + SQL parser + FDW interface (so no built-in storage engine) -- the FDW interface is amazing for building SQL interfaces for arbitrary data sets, but the whole postgres bundle seems a little heavy-weight and complex to install if that's the only thing you want it for...

Or I guess you could start from the other direction, and make a FDW/Multicorn-compatible API for SQLite? (Facebook's OSQuery is SQLite-based, but it looks like the virtual table interface over there is way more complex)

johnteslade · on Oct 30, 2014

I tried this with sqlite and the apsw python library. A quick example is at https://github.com/johnteslade/pysqliteosquery

The interface is not quite as nice as Multicorn's but certainly nicer than the C/C++ SQLlite interface. But you could probably create a wrapper around apsw so it looked similar to Multicorn.

Erwin · on Oct 30, 2014

FYI, you can use ":memory:" as the sqlite data source, instead of having to conditionally drop the tables (unless you actually do want to store some persistent data on disk).

johnteslade · on Oct 31, 2014

Good spot. The in memory table also runs 3 times quicker.

crxgames · on Oct 30, 2014

That would remove some of my hesitancy in letting this run as root for full process information. At least it limits what code could be exploited in the daemon process to some extent.

mamcx · on Oct 30, 2014

And the opposite? Have the sqlite storage but no the sql parser?

rakoo · on Oct 30, 2014

What would be the use case ? Data already exists on the OS, no need to copy it on another storage and risk having stale values.

batbomb · on Oct 30, 2014

Don't forget about Presto

hippowithgas · on Oct 30, 2014

I can see we are going to get one of these DB / OS Adapters each day now. Looking forward to BerkDBOSQuery - because, well, the Berkeley DB needs some love too.

orf · on Oct 30, 2014

Yeah, heres a link to the awesome psutil library it uses: https://github.com/giampaolo/psutil

It could be made a bit more efficient by filtering the processes[1], but the code would not look nearly as lovely.

1. http://multicorn.org/implementing-an-fdw/#idoptimizations

sehrope · on Oct 30, 2014

Now this is cool! (and I'm not saying that just because I was thinking about it yesterday[1]).

It's amazing to see how with the multicorn[2] FDW it takes less than 40 lines of python to put this together.

[1]: https://news.ycombinator.com/item?id=8529044

[2]: http://multicorn.org/

oooooomg · on Oct 30, 2014

Give me a break with all these "Oh, I commented about a similar thing yesterday therefore I am a prophet" comments.

vezzy-fnord · on Oct 30, 2014

Nice. Now if only we could find a way to expose the operating system as a hierarchical file system.

dragonwriter · on Oct 30, 2014

> Now if only we could find a way to expose the operating system as a hierarchical file system.

Step 1: The OS-as-Postgres-DB system in this thread.

Step 2: FUSE filesystem wrapping the DB, as in https://github.com/petere/postgresqlfs

Q.E.D.

SixSigma · on Oct 30, 2014

I did write a postgres database that served 9p as a hierarchical file system, pretty pointless tbh but it was a fun project

zedr · on Nov 1, 2014

We would reinvent UNIX, poorly.

Beltiras · on Oct 30, 2014

Run postgresql as root, make an interface for the filesystem, hook up to a Django ORM and you have a webpage with the power of bash? (yes, bad idea on so many levels but still, potential is .... staggering.)

rosser · on Oct 30, 2014

Postgres can't be run as root. From src/backend/main.c:

    if (geteuid() == 0)
    {
        write_stderr("\"root\" execution of the PostgreSQL server is not permitted.\n"
                     "The server must be started under an unprivileged user ID to prevent\n"
          "possible system security compromise.  See the documentation for\n"
                  "more information on how to properly start the server.\n");
        exit(1);
    }

Morgawr · on Oct 31, 2014

Comment it out, for science!

Erwin · on Oct 30, 2014

Cool, I got inspired by this system too -- the aspw library and examples made it extremely easy to get started. I happen to have a lot of non-relational yet tabular data sitting around and the interface to read it is Python. It took a short time before I could join up several of my proprietary databases to do queries on them.

One thing I am planning on doing is to let the user mix in just about any data source: any list of objects with properties/values can be used really. So you can take your database (in proprietary format) and join it up with a text file (quite a useful thing for me). But you could also mix in an API call (again done using your identity). I've been thinking about how many REST APIs suport a subset of SQL via letting you seletct what fields to output, how to order things etc. -- if you instead allow the user to process it via SQL/sqlite you will allow the user quite powerful remixing/aggregation facilities.

shawn-butler · on Oct 30, 2014

What benefit does this afford over existing SNMP tools?

http://www.oid-info.com/get/1.3.6.1.2.1.25.4.2.1

or

http://www.oid-info.com/get/1.3.6.1.2.1.25.5.1.1

wtbob · on Oct 31, 2014

Well, not requiring OIDs feels like a benefit grin

(I actually understand completely why SNMP uses OIDs, and why they actually have advantages, but man is it tough to get into SNMP)

kbenson · on Oct 30, 2014

Complex joining of multiple data sets with filters and conditions utilizing a query language that has been utilized to great effect for longer than SNMP has existed.

shawn-butler · on Oct 30, 2014

The output of snmptable is already hierarchal and there is already well-defined MIB for thousands of data sources.

I guess using SQL instead of regex is the benefit you mean? Certainly more readable.

Not that it really matters much to conversation at hand but standard SQL is not really that much older than SNMP. Both were late 80s if memory serves.

kbenson · on Oct 31, 2014

Truthfully, I'm not entirely sure the capabilities of snmptable, but from your statement I assume it allows joining of different sets of data in ways to show specific correlated data (I'm not seeing that when searching, but I'll assume my search skills are to blame here). In that case, the real benefit of this is SQL the language, which while it would be overkill if it was designed for just this application, is well understood and utilized and a has a large existing userbase.

I'm not sure how to use snmptable, as I've never had the need, and while I suspect in the simple cases it may be easier to learn and use than SQL, I'm highly skeptical the medium complexity cases that SQL handles are even possible in it (unless it has a fairly well fleshed out query language of it's own, as opposed to specifying parameters on a command line).

odysseas · on Oct 30, 2014

About 15 years apart to be exact: 1974 for SQL http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.... and 1988 for SNMP http://tools.ietf.org/html/rfc1065 .

dragonwriter · on Oct 30, 2014

shawn-butler is technically correct -- he refers to standard SQL (first standardized in 1986) and SNMP being about the same time, and both in the late 1980s, and this is correct.

Of course, SQL was around quite a while before being standardized.

kbenson · on Oct 31, 2014

I looked it up before posting, and saw the same thing, that SQL was standardized the year before the SNMP RFCs were put out, but that it was in use since the 70s. That made the statement correct by either a little or a lot, depending on how you wanted to look at it, so I felt safe in making it.

crxgames · on Oct 30, 2014

Never worked with FDWs before, but man that looks simple to work with.

DrJ · on Oct 30, 2014

am I understanding this requires one to run Pg on every server that you want to run the query on?

johnmudd · on Oct 30, 2014

Not necessarily. You could put a Python (with psutil module) server on machines and connect from a single Postgres/FDW server.

0xbadcafebee · on Oct 30, 2014

While this looks nice, it still depends on Postgres. Why not just import a set of key/values into any SQL database? Really dumb 5-minute example below.

  ~/$ ps af | perl -lne 's/(^\s+|\s+$)//g;next if(!@h&&(@h=split(/\s+/,$_)));@_=map{s/"/\\"/g;$_}split(/\s+/,$_,@h);print "INSERT INTO process_table (".join(",",@h).") VALUES (".join(",",map{"\"$_\""}@_).");"'
  INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("19789","pts/8","Ss+","0:00","/bin/bash");
  INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("30616","pts/23","Ss+","0:00","/bin/bash");
  INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("29661","pts/22","Ss+","0:00","/bin/bash");

ddevault · on Oct 30, 2014

The reason is obvious - the postgres-driven version lets you query the system in realtime, rather than repeatedly updating the database through a manual process every so often.

0xbadcafebee · on Oct 30, 2014

Ohhhh, I didn't realize that was the end goal. In that case yeah, this is useful, if you want to run Postgres on every node in your network (which, yikes). Would still rather separate the query engine from the data feed, though. My ideal would be a database application that sends out a request to update the database and returns new records as they come in (or sorted/ordered if preferred)

hyperliner · on Oct 31, 2014

Not wishing to start a battle of civilizations, but this is also possible in SQL Server using User-Defined types.

http://msdn.microsoft.com/en-us/library/ms131120.aspx

gjvc · on Oct 30, 2014

Am I alone in that whenever I see a TL;DR at the top of something, I immediately lose interest because it appears to want to appeal to the idiocracy.

Ironic, I know.

Shish2k · on Oct 30, 2014

In the 30 minutes where the example was at the bottom of the README: 7 upvotes.

In the 30 minutes where the example was moved to the top, so you could see it without scrolling: 30 upvotes.

Given the fast-paced nature of HN, getting to the point ASAP seems to have significant merit. All the other projects that I've announced on HN, with weeks of work put into them, and websites which expect the user to spend 5 minutes reading before they get the point, have had no upvotes or comments at all :-P

empthought · on Oct 30, 2014

I was tl;dring before tl;dring was cool. In fact, it was called "Executive Summary" and was greatly appreciated by people who make more in a year than I will make in my life.

dragonwriter · on Oct 30, 2014

A first-party TL;DR is just another name for what in more formal contexts is called an "abstract" or "executive summary".

I'm not particularly fond myself of that name having caught on for that use, but I can't manage to be as put-off by it as you say you are.

swasheck · on Oct 30, 2014

Yeah, I think I'd rather see a format of:

Abstract:

Content:

dspillett · on Oct 30, 2014

It is useful to give you the gist of what you are about to read, and why you might want to or not if you were not sure.

It is a more informal version of papers having an abstract on the front page.