Show HN: Graph Processing with Postgres and GraphBLAS

sg0 · on March 13, 2019

I am following your discussions on the GraphBLAS email list. Are you interested in the distributed-memory API of GraphBLAS (for clusters, typically over MPI) or can you make use of it if it were available (you may know, CombBLAS already has a distributed-memory implementation)?

michelpp · on March 13, 2019

Yes, I have only reviewed ComBLAS's documentation, but would eventually like to dig into it deeper. A thought I had for distributed graphs would be something like support on top of a distributed store like citus, where shards hold subgraphs. This may be an interesting way to sidestep Postgres' 1GB limit on varlena objects as well.

Edit: I'd forgot to ask, do you have some ideas on the subject? I'm all ears.

sg0 · on March 13, 2019

Thanks for the clarification. I basically wanted to know if database systems could make use of existing distributed-memory HPC programming models, and it seems that there is a possibility, however engineering efforts might be significant. I really have no experience with database engineering, and I am not sure if MPI and Citus will play out well. I hope you are considering submitting a paper/poster at HPEC'19 for your current work (http://www.ieee-hpec.org/), I think that's a good venue to engage with the GraphBLAS community in person.

michelpp · on March 13, 2019

Good idea! Hopefully I can put something together.

codekilla · on March 13, 2019

What about something like kdb+/q? I realize it is proprietary software, but it is an array language with super fast vector operations designed to operate on distributed data. I was actually thinking about how well graphBLAS might map to this system. I work in academia, so the non-commercial license is applicable, but I see how it might be an issue for some.

methyl · on March 13, 2019

Since you are mentioning Redis Graph and I'm not really into academic side of graphs, I want to ask - how do you use it from the user perspective, who wants to do some simple graph traversals using Cypher or some equivalent? For Redis Graph you have some nice higher-level API, is something similar planned for pggraphblas or outside of scope for this project?

michelpp · on March 13, 2019

There is work being done by a standards committee on a common high level Cypher like language:

https://www.gqlstandards.org

Long term that's maybe a promising target. That's out of scope of pggraphblas now, but like RedisGraph uses GraphBLAS internally, one could implement a Cypher or GQL look alike using the foundational bits of pggraphblas.

Edit: and to answer your actual question, if Cypher, or SQL/GQL for that matter, is "high-level" user experience then the GraphBLAS API experience is sort of aiming for the "mid-level". Still quite technical, but more powerful tools than for "low-level" which would be writing graph algorithms by hand in some procedural language.

One of the key goals of GraphBLAS is this sort of meeting of the minds between the hardware experts and the graph problem solvers. This one page poster kind of sums that up nicely:

https://resources.sei.cmu.edu/asset_files/Poster/2016_020_00...

sandGorgon · on March 13, 2019

Has anyone done a similar thing with Spark and Postgresql for large graphs ? I'm curious to know how you write and structure your algorithms, given that graph partitioning is quite tricky (and not possible for a lot of algorithms).

I'm guessing Google uses a variant of this at scale - but not sure if their label propagation algorithms can be implemented on something like Spark and Postgres.

sntran · on March 14, 2019

A slightly off topic question regarding the mxm (matrix times matrix), mxv (matrix times vector), and vxm (vector times matrix) functions. AFAIK, Postgres supports function overloading. With the new types of matrix and vector, would it be easier to have a single function for multiplication, than three separate ones?

Also, can we overload operators in Postgres?

michelpp · on March 14, 2019

The * operator can be overloaded, and it is already done s so for the mxm, mxv and vxm combinations. The catch is, all these AxB functions have more arguments than just their operands (mask, accumulator, semiring, descriptors). So you can `a_matrix * b_vector` and get "classical" matrix multiplication, but if you want accumulation, masking, a non-default semiring etc, you have to use the functional notation.

jarym · on March 13, 2019

I found this project on Github a few weeks ago and have been quietly following it.

It looks really cool but I didn’t get from the readme how I would best take my relational tables and make a set of matrices from them.

I’ll take another look this weekend

michelpp · on March 14, 2019

Matrices are constructed from arrays of rows, columns, and values. The simplest way to pivot a table into a matrix is to use array_agg()

  select matrix(array_agg(row), array_agg(col), array_agg(val))) from table;

For the inverse operation there are set returning functions called 'matrix_elements_<type>(m)' which return the graph elements as row tuples.

From a table management standpoint, matrices are varlena objects much like arrays. Triggers can be used to call set_element() to keep matrices and tables sync'ed, although like arrays, this has a high re-serialization cost for large arrays. Because of this it's best to modify matrices in bulk operations.