Hacker News

He most likely means that reasoners, and databases that provide reasoning abilities, do not scale. This makes sense, especially for OWL ontologies. If you feed most OWL reasoners an ontology together with a large set of instance data (class instances connected by edges labeled with properties defined in said ontology), they will likely take far more time than you would like to produce results (if they produce anything at all).

The reason for that is twofold:

1. Many of the tools created for reasoning are research-first tools. Papers were published about the tool, and at the time it really was a better and more scalable tool than anything before it. But every PhD student eventually graduates and needs to find a job or move on to the next hyped research area, so the tool ends up unmaintained.

2. Tools are designed under the assumption that the whole ontology, all the instance data, and all results fit in main memory (RAM). This assumption is de facto necessary for the more powerful entailment regimes of OWL.
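To see why reason 2 bites, here is a minimal sketch of what in-memory materialization looks like: a toy triple set and a single RDFS-style transitivity rule, forward-chained to a fixpoint. The data and names are illustrative; real reasoners handle many rules, but the shape is the same, and every input and derived triple lives in RAM.

```python
# Toy in-memory materialization: forward-chain rdfs:subClassOf transitivity
# until no new triples appear. All facts, original and derived, sit in a
# Python set, i.e. in main memory.
SUBCLASS = "rdfs:subClassOf"

def materialize(triples):
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # (a subClassOf b) and (b subClassOf c) => (a subClassOf c)
        for (a, p1, b) in list(facts):
            if p1 != SUBCLASS:
                continue
            for (b2, p2, c) in list(facts):
                if p2 == SUBCLASS and b2 == b and (a, SUBCLASS, c) not in facts:
                    facts.add((a, SUBCLASS, c))
                    changed = True
    return facts

data = {
    (":Dog", SUBCLASS, ":Mammal"),
    (":Mammal", SUBCLASS, ":Animal"),
}
closed = materialize(data)
# (":Dog", SUBCLASS, ":Animal") is now entailed
```

With millions of instances, that `facts` set (plus every derived triple) must fit in RAM, which is exactly the assumption that stops scaling.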

Reason 2 has a secondary sub-reason: OWL ontologies use URIs (actually IRIs) as identifiers, which are really inefficient compared to 32/64-bit integers. HDT is a format that fixes this inefficiency for RDF (and is thus applicable to ontologies), but by the time it came about nearly all reasoners had already been abandoned, as per reason #1 above.
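The integer-ID trick can be sketched as a dictionary encoding: assign each distinct IRI a small integer once, then store and compare triples as machine words instead of long strings. This is a simplified illustration, not HDT's actual layout (HDT additionally uses sorted, compressed dictionaries and bitmap indexes).

```python
# Dictionary-encode IRIs as integers: each distinct IRI gets a small int,
# and triples are stored as (int, int, int). Joins and lookups then compare
# machine words instead of long strings.
class Dictionary:
    def __init__(self):
        self.iri_to_id = {}
        self.id_to_iri = []

    def encode(self, iri):
        if iri not in self.iri_to_id:
            self.iri_to_id[iri] = len(self.id_to_iri)
            self.id_to_iri.append(iri)
        return self.iri_to_id[iri]

    def decode(self, ident):
        return self.id_to_iri[ident]

d = Dictionary()
triple = ("http://example.org/Dog",
          "http://www.w3.org/2000/01/rdf-schema#subClassOf",
          "http://example.org/Mammal")
encoded = tuple(d.encode(t) for t in triple)   # first three IRIs -> (0, 1, 2)
decoded = tuple(d.decode(i) for i in encoded)  # round-trips to the IRIs
```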

Newer reasoners that actually scale quite a bit are RDFox [1] and VLog [2]. They use compact representations and try to be friendly to the CPU cache and pipeline. However, they are limited to a single shared-memory machine (even if NUMA).

There are a lot of mostly academic distributed reasoners designed to scale horizontally instead of vertically. These systems technically scale, but vertically scaling the centralized systems mentioned above will be more efficient. The intrinsic problem with distribution is that (i) it is hard to partition the input so that work is fairly distributed, and (ii) facts inferred at one node are often evidence that multiple other nodes need to know about. Computing all inferred edges of a knowledge graph therefore involves a great deal of communication, and distributed reasoners tend to lose to modern single-node systems.
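The communication problem shows up even in a toy setting: hash-partition triples by subject across two "nodes", and a transitivity inference immediately needs a triple owned by the other partition. This is purely illustrative (the hash function and data are made up; real systems batch such exchanges), but the crossing is where the network traffic comes from.

```python
# Hash-partition triples by subject across two "nodes". To apply the rule
# (a subClassOf b), (b subClassOf c) => (a subClassOf c), the node owning
# subject `a` needs the triple whose subject is `b` -- which routinely lives
# on the other node.
def node_of(subject, n_nodes=2):
    # Deterministic toy hash (Python's built-in hash of str is salted per run).
    return sum(map(ord, subject)) % n_nodes

triples = [
    (":Dog", ":subClassOf", ":Mammal"),
    (":Mammal", ":subClassOf", ":Animal"),
]
partitions = {0: [], 1: []}
for t in triples:
    partitions[node_of(t[0])].append(t)

# The join on :Mammal crosses partitions whenever the two subjects hash to
# different nodes -- and each such crossing is a network message.
crosses = node_of(":Dog") != node_of(":Mammal")
```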

[1]: https://www.oxfordsemantic.tech/product
[2]: https://github.com/karmaresearch/vlog/



