Show HN: Little Ball of Fur – A NetworkX extension library for graph sampling

klmadfejno · on May 12, 2020

Can you give some examples of when this would be useful? What's distinctive about sampling methods beyond, say, picking a random node and all of its neighbors? What problem does that solve?

benitorosenberg · on May 12, 2020

The reason for not doing that is the bias that such sampling introduces.
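A small illustration of one such bias, as a sketch only (the comment does not say which bias is meant, so degree bias is an assumption here). On a star graph, picking a random node plus all of its neighbors almost always pulls in the hub, so the sampled nodes look far better connected than the graph really is. The helper names `star_graph`, `ego_sample`, and `mean_degree` are made up for this example and are not part of Little Ball of Fur or NetworkX.

```python
from statistics import mean

def star_graph(n_leaves):
    """Adjacency dict for a hub (node 0) connected to n_leaves leaf nodes."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj

def ego_sample(adj, seed):
    """The naive sample: one seed node plus all of its neighbors."""
    return {seed} | adj[seed]

def mean_degree(adj, nodes):
    """Mean degree (measured in the full graph) over a set of nodes."""
    return mean(len(adj[v]) for v in nodes)

adj = star_graph(9)

# Ground truth over the whole graph: (9 + 9 * 1) / 10 = 1.8
true_mean = mean_degree(adj, adj)

# Average the estimate over every possible seed instead of drawing at random,
# so the comparison is exact rather than noisy.
ego_means = [mean_degree(adj, ego_sample(adj, seed)) for seed in adj]
expected_ego_mean = mean(ego_means)

print(f"true mean degree:        {true_mean:.2f}")
print(f"expected ego-sample est.: {expected_ego_mean:.2f}")
```

Nine of the ten possible seeds are leaves, and each of those samples is just a leaf plus the hub, so the estimate is dominated by the hub's degree.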

We are writing a paper on this, but the main point is that you can achieve these two things with minimal degradation in classification performance:

1. Speeding up node embedding and classification.
2. Speeding up whole graph embedding and classification.

klmadfejno · on May 12, 2020

Can you speak a little more about how those work? I understand word embeddings conceptually. And I can imagine using a similar process to embed the arbitrary data stored in a graph. Embedding an entire graph makes less sense to me, unless 'entire graph' means a subgraph of the general population.

I do social network stuff occasionally. If I could hypothetically create an embedding representation of everyone, I can imagine it being useful to, say, run t-SNE on it all as opposed to using a force layout for viz. Or maybe use it as a pretty black-box prediction input? Wondering if I'm missing something more obvious here.

benitorosenberg · on May 12, 2020

Entire graph embedding means that you have a lot of smaller graphs (e.g. molecules, transactions, threads) and you want to classify them. We created this package which covers these methods:

https://github.com/benedekrozemberczki/karateclub
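To make "whole graph embedding" concrete, here is a minimal sketch of the idea: each small graph is mapped to one fixed-length vector, and an ordinary classifier then works on those vectors. The embedding below is a plain degree histogram and the classifier is nearest-centroid; both are stand-ins chosen for brevity, not the methods implemented in Karate Club, and every function name is hypothetical.

```python
def degree_histogram(edges, max_degree=4):
    """Embed a graph as the fraction of nodes at each degree (capped at max_degree).

    Assumes every node appears in at least one edge (no isolated nodes).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    hist = [0.0] * (max_degree + 1)
    for d in degree.values():
        hist[min(d, max_degree)] += 1.0 / len(degree)
    return hist

def path(n):
    """Edge list of a chain with n nodes."""
    return [(i, i + 1) for i in range(n - 1)]

def star(n):
    """Edge list of a star with n nodes (node 0 is the hub)."""
    return [(0, i) for i in range(1, n)]

# Toy training set: many small graphs, each with a label.
train = [(path(5), "chain"), (path(6), "chain"),
         (star(5), "star"), (star(6), "star")]

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

centroids = {
    label: centroid([degree_histogram(g) for g, l in train if l == label])
    for label in ("chain", "star")
}

def classify(edges):
    """Assign the label whose centroid is closest to the graph's embedding."""
    vec = degree_histogram(edges)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(vec, centroids[label])),
    )

print(classify(path(7)))
print(classify(star(7)))
```

The unseen 7-node graphs have a different size from anything in the training set, but because the embedding has a fixed length regardless of graph size, they can still be compared against the class centroids.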