Documenting architecture: Wireshark, PlantUML and a REPL to glue them all (2017)

crispyambulance · on March 18, 2019

I think it's fine work and really cool.

However, this makes me wonder about the premise of this tool.

Have systems gotten so messy and complicated that we build them without knowing what they're actually doing, until after they're built?

In other words, UML diagrams (including sequence diagrams) were intended as ways to clarify and reason about systems at design time. This project seems like it could be a reverse engineering tool, but it isn't presented that way.

stult · on March 18, 2019

I can think of a few use cases that aren't really reverse engineering. Documenting legacy systems that weren't properly documented up front. Documenting/identifying divergence from design time documentation. Documenting a microservice stack where each service is individually well documented but the totality is not. Generating documentation from a prototype to use as a starting point for a more structured design process.

Generally, I've seen very few projects that have architecture documentation which is kept up to date past the first release. So it's not an insane idea given the on-the-ground reality, but yes, it does seem a bit like shutting the barn door after the horse is already out.

wwweston · on March 18, 2019

> Have systems gotten so messy and complicated that we build them without knowing what they're actually doing, until after they're built?

Oh, hey, you've just practically described one popular implicit formulation of Agile. :)

Detailed and coherent design documentation is rare in my experience, even at organizations that are staffed at a level including dedicated staff ostensibly for this purpose. I can think of precisely one place I've worked for that had a design process that produced documentation effective for specifying the project ahead of time and as a reference from then on out. A culture committed to this plus dedicated technical writers who regularly met with a team of a domain expert, tech lead, and UX/UI folks for the purpose of producing project documentation helped a lot.

Other organizations may have had individuals producing documentation but it was a lossy process, magnified when they moved on from the project.

yaleman · on March 18, 2019

Working in secops, we aren't always the ones building the tools, and the products we're required to support never come with this kind of documentation.

th0ma5 · on March 18, 2019

Jeez Louise, I worked with the Oracle BI stuff once and it was so poorly documented that using Wireshark was the only way to figure out which pieces did what and how it all worked. I'd imagine other such enterprisey stuff that's been cobbled together over the years through acquisitions may be similar, and these companies are always rather terse or ask you to put in a ticket and wait a month to find out.

kjs3 · on March 27, 2019

In 30 years of doing consulting on 'systems' work, it's astoundingly rare to see documentation of any sort that accurately & completely represents the current running state of the system. Sometimes it's close, but more often than not it's whatever was presented to get budget before any development has started.

Occasionally, the folks involved actually believe their doco does represent reality. They are invariably wrong, sometimes in small details, usually in large ones. Tools like this are an invaluable check on reality.

> Have systems gotten so messy and complicated that we build them without knowing what they're actually doing, until after they're built?

Yes, in general, at least in my world (medium size to enterprise). Agile has made this in practice much much worse.

rooam-dev · on March 18, 2019

I think it's more about changing architecture and having up to date documentation.

yingw787 · on March 18, 2019

I really like this! I like the notion of automatically generated documentation, because it's always in line with what the source code is doing (or should be). Thanks for making this!

toyg · on March 18, 2019

The linked PlantUML tool looks pretty cool. Recently I’ve used Sequence Diagram (https://itunes.apple.com/gb/app/sequence-diagram/id119542670...), which is very similar but limited to (eh) sequence diagrams, and I’ve loved every minute.

I wish I had known of these tools 10 years ago, when I had to spend hours in Visio...

commandlinefan · on March 18, 2019

It is cool (compared to Visio or even Rational Rose), but it makes me wonder about the value of the diagram itself. The textual input is just as readable and takes up less space on a screen (or, God forbid, a printed piece of paper) - other than giving managers a warm and fuzzy feeling that there's a pretty picture there, why bother generating the diagram at all?

greggyb · on March 18, 2019

Depending on how you organize your relationships, the diagram can make shared dependencies more clear.

If you have multiple levels of inputs and outputs, you might define a set of relationships together that are all related to a single output. In this case shared inputs are not trivially realizable in the text.

E.g.

    @startuml

    database DB1
    database DB2

    ' ... lots of other artifacts ...

    artifact Artifact1
    artifact ArtifactN

    DB1 --> Artifact1
    ' ... lots of other input to Artifact1 ...

    ' ... other relationships grouped based on output - more than a screenful - only some of which have DB1 as input ...

    DB1 --> ArtifactN
    @enduml

In the above, it might not be clear how often DB1 is an input to an artifact. Whereas the diagram will show the size of the tree pretty clearly.

It is unlikely that such a diagram would be very illustrative to its author, but the visual representation may be a much better communication tool to others who are not as intimately familiar with the system in question.
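To put a number on "how often DB1 is an input", here is a minimal Python sketch that scans PlantUML text for simple `A --> B` edges and counts how many distinct targets each source feeds. This is only an illustration: the `fan_out` helper and the sample text are made up for this example, it is not part of PlantUML, and the regex deliberately ignores labels and other arrow styles.

```python
import re
from collections import Counter

# Matches only plain edges such as "DB1 --> Artifact1".
# Labels, reversed arrows and other PlantUML arrow styles are ignored (assumption).
EDGE = re.compile(r"^\s*(\w+)\s*-+>\s*(\w+)\s*$")


def fan_out(plantuml_text):
    """Count how many distinct targets each source feeds."""
    targets = {}
    for line in plantuml_text.splitlines():
        match = EDGE.match(line)
        if match:
            source, target = match.groups()
            targets.setdefault(source, set()).add(target)
    return Counter({source: len(dsts) for source, dsts in targets.items()})


# Hypothetical, shortened version of the diagram source above.
SAMPLE = """
@startuml
database DB1
database DB2
artifact Artifact1
artifact ArtifactN
DB1 --> Artifact1
DB2 --> Artifact1
DB1 --> ArtifactN
@enduml
"""

counts = fan_out(SAMPLE)
print(counts.most_common())  # sources ordered by number of targets
```

A count like this answers the narrow question, but it still doesn't show the shape of the tree the way the rendered diagram does.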

mark_l_watson · on March 18, 2019

A really nice use of UML sequence diagrams!

A little off topic, but even though I wrote a UML book a long while ago, I have more or less stopped using hand written UML except for sequence diagrams. Any class diagrams I put in documentation are auto-generated from source code.

beardedwizard · on March 18, 2019

Wireshark creates TCP sequence diagrams out of the box, they might even be exportable...

bjconlan · on March 18, 2019

This is the greatest idea! I've been writing soapui test cases that show the communication flow between microservices and use cases to give new developers (and product owners) some insight into how the application hangs together. Something like this is the logical next step and exactly what I was hoping I could derive from new-relic/open tracing/logs (but these are only as good as the technical details).

Nice work, I look forward to delving deeper.

drieddust · on March 18, 2019

This is a cool project but how does it take care of load balancers in the flow?

I think most LBs terminate TCP and then create a new connection to the backend.

bzbz · on March 19, 2019

I haven’t read the source code, but I assume that it just looks at the timings of the request. After all, it’s just one user, no need to worry about 50 concurrent requests.
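A rough Python sketch of what timing-based matching could look like, purely as a guess: this is not taken from the tool's source, and the `Exchange` type, the `nest_by_time` helper and the host names are all hypothetical. It treats a request as caused by another when it starts at the other's destination and fits entirely inside the other's request/response window.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    src: str
    dst: str
    start: float  # seconds: when the request was seen
    end: float    # seconds: when the response was seen


def nest_by_time(exchanges):
    """Return (outer, inner) pairs where inner originates from outer's
    destination and happens entirely within outer's time window."""
    pairs = []
    for outer in exchanges:
        for inner in exchanges:
            if inner is outer:
                continue
            if (inner.src == outer.dst
                    and outer.start <= inner.start
                    and inner.end <= outer.end):
                pairs.append((outer, inner))
    return pairs


# Hypothetical capture: one user, a TCP-terminating load balancer in the middle.
capture = [
    Exchange("browser", "lb", 0.00, 0.30),
    Exchange("lb", "backend", 0.05, 0.25),
    Exchange("backend", "keycloak", 0.10, 0.20),
]

pairs = nest_by_time(capture)
for outer, inner in pairs:
    print(f"{outer.src}->{outer.dst} triggers {inner.src}->{inner.dst}")
```

With several overlapping requests the windows become ambiguous, which is why this only holds up for the single-user case.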

clagio · on March 18, 2019

It's really cool, but from my understanding you only get the traffic exchanged with your browser and not the traffic between the servers. I wonder if something similar could be done at the server level: aggregating traffic from all the servers, merging it together and displaying it.

bigiain · on March 18, 2019

The bottom half of the example diagram shows traffic between the "backend" and "keycloak" that clearly isn't browser to nginx traffic.

You'd need to be running Wireshark on a network interface that could see all that traffic, so yeah - sort of "at the server level" - in the sense that you'd need to be sitting on the server's network(s) to sniff that.
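If you can't sniff from one vantage point, the aggregation idea could be sketched roughly like this in Python. Everything here is an assumption for illustration: events are `(timestamp, src, dst, label)` tuples already extracted from each host's capture, each list is sorted by time, host clocks are in sync, and no event shows up in two captures. The helper names, hosts and request labels are made up.

```python
import heapq


def merge_captures(*captures):
    """Merge per-host event lists (each already sorted by timestamp)
    into a single timeline ordered by timestamp."""
    return list(heapq.merge(*captures, key=lambda event: event[0]))


def to_plantuml(events):
    """Render a timeline as PlantUML sequence diagram source."""
    lines = ["@startuml"]
    lines += [f"{src} -> {dst}: {label}" for _, src, dst, label in events]
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical events captured on two different hosts.
nginx_host = [
    (0.00, "browser", "nginx", "GET /"),
    (0.01, "nginx", "backend", "GET /"),
]
backend_host = [
    (0.02, "backend", "keycloak", "POST /token"),
]

diagram = to_plantuml(merge_captures(nginx_host, backend_host))
print(diagram)
```

In practice clock skew between hosts and packets seen on both ends would need handling before the merged order could be trusted.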

bauerd · on March 18, 2019

Weave Scope can do that: https://www.weave.works/oss/scope

Edit: Not affiliated

dang · on March 18, 2019

Discussed at the time: https://news.ycombinator.com/item?id=15325649

linuxdude314 · on March 18, 2019

Mermaid works really well for this as well:

https://mermaidjs.github.io/

jcims · on March 18, 2019

Would be cool to add osquery or similar to the mix and get process information to pair up with the network traffic.

chrisweekly · on March 18, 2019

This looks very useful. Thanks for sharing! And @danlebrero, thanks for making this!