
>The work on the semantic web in the 2000s added a lot of valuable ideas and I see a tendency to downplay these ideas in favor of new technologies. Storing relationships as RDF triples and building indices that are well-suited to your queries will be good enough for the majority of people who need an ontology.

Is this basically the idea of using RDF to create explicit joins between datasets in other systems? E.g. each tuple has a foreign key reference to a row/doc that represents that node in the graph?

>That turned into a bit of a ramble, but damn ontologies are cool. :)

This is a major hurdle in infosec as enterprises adopt cloud products. The traditional ontologies around the risk profiles and state of operating systems and networks have been quite stable over the past 10-20 years and leverage things like DMTF's CIM and IETF's SNMP. However, what I'm finding is that 'cloud' products and services surface entirely new namespaces and concepts in nearly every instance. Sometimes, as in the case of AWS EC2, it's pretty simple to map these back to an existing schema, but in other cases, like AWS Chime or Glue, it gets really tricky. It feels like I'm looking for a system that can rapidly ingest a service's native information model, then adapt it with overlays that transform or aggregate data across services in a way that allows consistent data modelling.

Any suggestions there? :)




re: RDF for joins. That's exactly right. You can keep things as light or as heavy as you need them to be regarding when/how/whether you validate foreign key constraints, whether you use a schema registry, etc. The simplicity of the format makes it really easy to build around.
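To make the RDF-for-joins idea concrete, here is a minimal sketch in plain Python. The store names and IRI scheme (`urn:<store>:<kind>:<key>`) are invented for illustration; a real system would use rdflib or a triple store, but the foreign-key interpretation is the same: the IRI itself tells you which external system to join into.

```python
# Each triple is (subject, predicate, object). Subjects/objects are
# IRIs whose tail doubles as a foreign key into another system:
# the middle segment names the store, the last is the row/doc id.
# Store names ("users_db", "orders_es", "crm") are hypothetical.
triples = [
    ("urn:users_db:row:42",   "placed",    "urn:orders_es:doc:a17"),
    ("urn:users_db:row:42",   "memberOf",  "urn:crm:account:acme"),
    ("urn:orders_es:doc:a17", "shippedTo", "urn:crm:account:acme"),
]

def outgoing(node, triples):
    """All (predicate, object) edges leaving `node`."""
    return [(p, o) for (s, p, o) in triples if s == node]

def parse_ref(iri):
    """Split an IRI back into (store, kind, key) for the join."""
    _, store, kind, key = iri.split(":")
    return store, kind, key

# Follow edges from a user row out into the other systems.
for pred, obj in outgoing("urn:users_db:row:42", triples):
    store, kind, key = parse_ref(obj)
    print(pred, "->", store, kind, key)
```

Validation of the foreign keys (does `doc:a17` actually exist in the orders store?) can be as strict or as lazy as you like, which is the "light or heavy" dial mentioned above.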

My only suggestion re: infosec ontologies is to not push too much functionality into the ontology all at once. I have had ontology projects end in spectacular failure, and all those failures have resulted from putting too much responsibility on the ontology from the very beginning.

The ontology should be as dumb as possible to start with. In your case, the approach I would suggest is:

1. The ontology stores mappings from cloud provider resource schemas to your internal schema. These mappings can be represented in multiple ways - as a URL to a service which implements the mapping, as a JSON object in some DSL which defines the transformations, or even as the code/binary that should be run to perform the transformation.

2. From the ontology, it should be easy to query/update the following things: the current cloud provider schema for a given resource, the internal schema corresponding to a cloud resource, the current mapping between the two, and the number of failed transformations from cloud schema to internal schema in the past {1 hour | 1 day | 1 week | ...}.

3. At first, you build the transformations from the cloud provider schemas to internal schema by hand.

4. Build a worker that alerts you when a cloud provider schema changes (this should be easy on top of your ontology after step 2: check the count of failed transformations).

5. Now write a worker which statically analyzes the Terraform codebase to see when the code related to a particular cloud API you are working with changed. Represent this in the ontology (e.g. "AWS Chime" - "terraform:vblah.blah.blah" -> "<json resource definition>").

6. Write a worker which monitors updates to "terraform:*" edges and alerts you to update the mapping given the new JSON resource definition.

7. For some services, it will be easy to automate the transformation update. For these things, create a new ontology worker.
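Steps 1, 2, and 4 above can be sketched as a toy in-memory ontology. Everything here is illustrative - the resource key, the internal field names, and storing the mapping as a lambda (per step 1 it could just as well be a URL or a DSL blob); the point is only the shape of the queries the ontology should answer.

```python
import time
from collections import defaultdict

# Ontology: (provider, resource) -> schemas plus the current mapping.
ontology = {
    ("aws", "ec2:instance"): {
        "provider_schema": {"InstanceId": "str", "State": {"Name": "str"}},
        "internal_schema": {"asset_id": "str", "status": "str"},
        # Hand-written transform (step 3), stored as code here.
        "transform": lambda doc: {
            "asset_id": doc["InstanceId"],
            "status": doc["State"]["Name"],
        },
    },
}

# (provider, resource) -> timestamps of failed transformations (step 2/4).
failures = defaultdict(list)

def ingest(provider, resource, doc):
    """Apply the current mapping; record a failure instead of crashing."""
    entry = ontology[(provider, resource)]
    try:
        return entry["transform"](doc)
    except (KeyError, TypeError):
        # A burst of failures usually means the provider schema changed.
        failures[(provider, resource)].append(time.time())
        return None

def failure_count(provider, resource, window_secs):
    """How many transformations failed in the past window (step 4's signal)."""
    cutoff = time.time() - window_secs
    return sum(1 for t in failures[(provider, resource)] if t >= cutoff)
```

A well-formed document transforms cleanly; a document in a changed schema returns None and bumps the failure counter that the step-4 worker polls.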

I have found that it's better to let workers run idempotently on cronjobs/systemd timers than to implement event-driven workers.
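A minimal sketch of what "idempotent worker" means here, with hypothetical fetcher/alert callables: each run recomputes the full desired state from scratch and compares it to what the ontology knows, so a missed cron tick or an accidental double run is harmless.

```python
def sync_mappings(get_provider_schemas, get_known_schemas, alert):
    """Compare current provider schemas against the ontology's last-known
    copies and alert on drift. Safe to re-run at any time: the drift set
    is derived from current state, not from consumed events."""
    current = get_provider_schemas()   # {resource: schema}, fetched fresh
    known = get_known_schemas()        # {resource: schema}, from the ontology
    drift = sorted(r for r, s in current.items() if known.get(r) != s)
    for resource in drift:
        alert(resource)                # e.g. open a ticket, deduped by resource
    return drift
```

Compare this with an event-driven version, which has to worry about missed, duplicated, or out-of-order events; the cron-driven recompute sidesteps all three at the cost of polling latency.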

The whole thing will probably take a few months, but you will have production-ready code at the end of each step and start benefitting from the existence of the ontology right away.

(Sounds like you are already at step 3 or 4?)


This is amazing, thank you! Most of this is a mental experiment at this point, but I might take some of this to start building it. Appreciate the note!



