
>The work on the semantic web in the 2000s added a lot of valuable ideas and I see a tendency to downplay these ideas in favor of new technologies. Storing relationships as RDF triples and building indices that are well-suited to your queries will be good enough for the majority of people who need an ontology.

Is this basically the idea of using RDF to create explicit joins between datasets in other systems? E.g. each tuple has a foreign key reference to a row/doc that represents that node in the graph?

>That turned into a bit of a ramble, but damn ontologies are cool. :)

This is a major hurdle in infosec as enterprises adopt cloud products. The traditional ontologies around the risk profiles and state of operating systems and networks have been quite stable over the past 10-20 years and leverage things like DMTF's CIM and IETF's SNMP. However, what I'm finding is that 'cloud' products and services surface entirely new namespaces and concepts in nearly every instance. Sometimes, as in the case of AWS EC2, it's pretty simple to map these back to an existing schema, but in other cases, like AWS Chime or Glue, it gets really tricky. It feels like I'm looking for a system that can rapidly ingest a service's native information model, then adapt it with overlays that transform or aggregate data across services in a way that allows consistent data modelling.

Any suggestions there? :)




re: RDF for joins. That's exactly right. You can keep things as light or as heavy as you need them to be regarding when/how/whether you validate foreign key constraints, whether you use a schema registry, etc. The simplicity of the format makes it really easy to build around.
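To make the RDF-for-joins idea concrete, here is a minimal sketch in plain Python. The store names and IRI scheme (`urn:<store>:<kind>:<key>`) are invented for illustration; a real system would use rdflib or a triple store, but the foreign-key interpretation is the same: the IRI itself tells you which external system to join into.

```python
# Each triple is (subject, predicate, object). Subjects/objects are
# IRIs whose tail doubles as a foreign key into another system:
# the middle segment names the store, the last is the row/doc id.
# Store names ("users_db", "orders_es", "crm") are hypothetical.
triples = [
    ("urn:users_db:row:42",   "placed",    "urn:orders_es:doc:a17"),
    ("urn:users_db:row:42",   "memberOf",  "urn:crm:account:acme"),
    ("urn:orders_es:doc:a17", "shippedTo", "urn:crm:account:acme"),
]

def outgoing(node, triples):
    """All (predicate, object) edges leaving `node`."""
    return [(p, o) for (s, p, o) in triples if s == node]

def parse_ref(iri):
    """Split an IRI back into (store, kind, key) for the join."""
    _, store, kind, key = iri.split(":")
    return store, kind, key

# Follow edges from a user row out into the other systems.
for pred, obj in outgoing("urn:users_db:row:42", triples):
    store, kind, key = parse_ref(obj)
    print(pred, "->", store, kind, key)
```

Validation of the foreign keys (does `doc:a17` actually exist in the orders store?) can be as strict or as lazy as you like, which is the "light or heavy" dial mentioned above.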

My only suggestion re: infosec ontologies is to not push too much functionality into the ontology all at once. I have had ontology projects end in spectacular failure, and all those failures have resulted from putting too much responsibility on the ontology from the very beginning.

The ontology should be as dumb as possible to start with. In your case, the approach I would suggest is:

1. The ontology stores mappings from cloud provider resource schemas to your internal schema. These mappings can be represented in multiple ways - as a URL to a service which implements the mapping, as a JSON object in some DSL which defines the transformations, or even as the code/binary that should be run to perform the transformation.

2. From the ontology, it should be easy to query/update the following things: the current cloud provider schema for a given resource, the internal schema corresponding to a cloud resource, the current mapping between the two, and the number of failed transformations from cloud schema to internal schema in the past {1 hour | 1 day | 1 week | ...}.

3. At first, you build the transformations from the cloud provider schemas to internal schema by hand.

4. Build a worker that alerts you when a cloud provider schema changes (this should be easy on top of your ontology after step 2: check the count of failed transformations).

5. Now write a worker which statically analyzes the Terraform codebase to see when the code related to a particular cloud API you are working with changed. Represent this in the ontology (e.g. "AWS Chime" - "terraform:vblah.blah.blah" -> "<json resource definition>").

6. Write a worker which monitors updates to "terraform:*" edges and alerts you to update the mapping given the new JSON resource definition.

7. For some services, it will be easy to automate the transformation update. For these things, create a new ontology worker.
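Steps 1, 2, and 4 above can be sketched as a toy in-memory ontology. Everything here is illustrative - the resource key, the internal field names, and storing the mapping as a lambda (per step 1 it could just as well be a URL or a DSL blob); the point is only the shape of the queries the ontology should answer.

```python
import time
from collections import defaultdict

# Ontology: (provider, resource) -> schemas plus the current mapping.
ontology = {
    ("aws", "ec2:instance"): {
        "provider_schema": {"InstanceId": "str", "State": {"Name": "str"}},
        "internal_schema": {"asset_id": "str", "status": "str"},
        # Hand-written transform (step 3), stored as code here.
        "transform": lambda doc: {
            "asset_id": doc["InstanceId"],
            "status": doc["State"]["Name"],
        },
    },
}

# (provider, resource) -> timestamps of failed transformations (step 2/4).
failures = defaultdict(list)

def ingest(provider, resource, doc):
    """Apply the current mapping; record a failure instead of crashing."""
    entry = ontology[(provider, resource)]
    try:
        return entry["transform"](doc)
    except (KeyError, TypeError):
        # A burst of failures usually means the provider schema changed.
        failures[(provider, resource)].append(time.time())
        return None

def failure_count(provider, resource, window_secs):
    """How many transformations failed in the past window (step 4's signal)."""
    cutoff = time.time() - window_secs
    return sum(1 for t in failures[(provider, resource)] if t >= cutoff)
```

A well-formed document transforms cleanly; a document in a changed schema returns None and bumps the failure counter that the step-4 worker polls.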

I have found that it's better to let workers run idempotently on cronjobs/systemd timers than to implement event-driven workers.
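A minimal sketch of what "idempotent worker" means here, with hypothetical fetcher/alert callables: each run recomputes the full desired state from scratch and compares it to what the ontology knows, so a missed cron tick or an accidental double run is harmless.

```python
def sync_mappings(get_provider_schemas, get_known_schemas, alert):
    """Compare current provider schemas against the ontology's last-known
    copies and alert on drift. Safe to re-run at any time: the drift set
    is derived from current state, not from consumed events."""
    current = get_provider_schemas()   # {resource: schema}, fetched fresh
    known = get_known_schemas()        # {resource: schema}, from the ontology
    drift = sorted(r for r, s in current.items() if known.get(r) != s)
    for resource in drift:
        alert(resource)                # e.g. open a ticket, deduped by resource
    return drift
```

Compare this with an event-driven version, which has to worry about missed, duplicated, or out-of-order events; the cron-driven recompute sidesteps all three at the cost of polling latency.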

The whole thing will probably take a few months, but you will have production-ready code at the end of each step and start benefitting from the existence of the ontology right away.

(Sounds like you are already at step 3 or 4?)


This is amazing, thank you! Most of this is a mental experiment at this point, but I might take some of this to start building it. Appreciate the note!



