
Those are all interesting ideas.

For the index structure itself you can use succinct dynamic data structures to reduce the size. The bulk-loading approach is particularly amenable to succinct data structures: since the node vectors in an HNSW are monotonic, they can use an Elias-Fano encoding. The neighborhoods can use log-arrays.
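To make the Elias-Fano idea concrete, here's a rough sketch (names and structure are mine, not from any particular library): a monotone sequence of n values from a universe of size u is split into low bits, stored verbatim, and high bits, stored as unary gaps, for roughly n * (2 + log2(u/n)) bits total.

```python
import math

def elias_fano_encode(values, universe):
    """Encode a monotone non-decreasing integer sequence (Elias-Fano sketch).

    Returns (low_bits, high_bits, l): low_bits holds the l low bits of
    each value; high_bits is a unary-gap bit list of the high parts.
    Space is roughly n * (2 + log2(universe / n)) bits.
    """
    n = len(values)
    l = max(0, int(math.floor(math.log2(universe / n))))  # low-bit width
    low_bits = [v & ((1 << l) - 1) for v in values]
    high_bits = []
    prev_high = 0
    for v in values:
        high = v >> l
        high_bits.extend([0] * (high - prev_high))  # unary gap in high parts
        high_bits.append(1)                         # one terminator per value
        prev_high = high
    return low_bits, high_bits, l

def elias_fano_access(i, low_bits, high_bits, l):
    """Decode the i-th value by counting zeros before the (i+1)-th one."""
    ones = -1
    high = 0
    for bit in high_bits:
        if bit == 1:
            ones += 1
            if ones == i:
                break
        else:
            high += 1
    return (high << l) | low_bits[i]
```

A real implementation would back the high bits with a bitvector supporting constant-time select, which is what makes random access fast; the linear scan here is just to show the layout.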

Creating a multi-index also seems possible, where you use the triangle inequality to prune the candidate vectors. To be time-efficient at query time this would probably require storing the neighbor distances in the bottom layer, so the index would be slightly bigger. I haven't tried it yet, but I intend to.
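Roughly what I have in mind (a sketch with made-up names, not working code from the index): for a metric d, the triangle inequality gives |d(q, p) - d(p, x)| <= d(q, x), so if that lower bound already exceeds the current search radius, the candidate x can be skipped without ever computing d(q, x) — which is why the precomputed d(p, x) values would need to be stored.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prune_candidates(query, pivot, neighbor_dists, radius, vectors):
    """Keep only neighbors that survive the triangle-inequality bound.

    neighbor_dists maps neighbor id -> precomputed d(pivot, neighbor),
    i.e. the distances that would be stored in the bottom layer.
    Since |d(q,p) - d(p,x)| <= d(q,x), a lower bound above the radius
    means x cannot be in range, so its distance is never computed.
    """
    d_qp = euclidean(query, pivot)
    survivors = []
    for nid, d_pn in neighbor_dists.items():
        if abs(d_qp - d_pn) > radius:
            continue  # pruned with no distance computation
        if euclidean(query, vectors[nid]) <= radius:
            survivors.append(nid)
    return survivors
```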

There is also the possibility of using random projection for dimensionality reduction. I haven't tried this yet either, but will give it a go soon. We haven't folded the parallel HNSW into our open-source VectorLink yet, but we'll be doing it in the near future; we wanted a bit of stability in the approach first.
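For the random-projection part, the basic Johnson-Lindenstrauss-style construction is simple enough to sketch in a few lines (again, my own illustrative names; a real system would use a library routine): multiply each vector by a fixed Gaussian matrix scaled so norms are approximately preserved in expectation.

```python
import random
import math

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian random projection matrix (Johnson-Lindenstrauss style).

    Entries are N(0, 1) scaled by 1/sqrt(d_out), so squared norms are
    approximately preserved in expectation after projection.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)]
            for _ in range(d_out)]

def project(matrix, vec):
    """Map a d_in-dimensional vector down to d_out dimensions."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]
```

Because the map is linear and the matrix is fixed, you can project the whole corpus once at load time and build the HNSW over the reduced vectors.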




I'm going to need to research to understand most of what you said here before being able to give a coherent response. Thanks for giving me an opportunity to learn!



