Documenting architecture: Wireshark, PlantUML and a REPL to glue them all (2017) (danlebrero.com)
141 points by walterbell on March 18, 2019 | 23 comments



I think it's fine work and really cool.

However, this makes me wonder about the premise of this tool.

Have systems gotten so messy and complicated that we build them without knowing what they're actually doing until after they're built?

In other words, UML diagrams (including sequence diagrams) were intended as ways to clarify and reason about systems at design time. This project seems like it could be a reverse engineering tool, but it isn't presented that way.


I can think of a few use cases that aren't really reverse engineering. Documenting legacy systems that weren't properly documented up front. Documenting/identifying divergence from design-time documentation. Documenting a microservice stack where each service is individually well documented but the totality is not. Generating documentation from a prototype to use as a starting point for a more structured design process.

Generally, I've seen very few projects whose architecture documentation is kept up to date past the first release. So it's not an insane idea given the on-the-ground reality, but yes, it does seem a bit like shutting the barn door after the horse is already out.
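
The divergence case can be approximated by diffing two sets of caller/callee pairs. A minimal sketch, assuming the designed and observed interactions have already been reduced to (caller, callee) tuples; the `diff_edges` helper and the sample data are hypothetical, not taken from the article:

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (caller, callee)


def diff_edges(designed: Iterable[Edge], observed: Iterable[Edge]) -> Dict[str, List[Edge]]:
    """Compare documented interactions with ones seen on the wire."""
    designed_set: Set[Edge] = set(designed)
    observed_set: Set[Edge] = set(observed)
    return {
        # Seen in the capture but missing from the design docs.
        "undocumented": sorted(observed_set - designed_set),
        # Promised by the design docs but never observed.
        "unobserved": sorted(designed_set - observed_set),
    }


# Hypothetical example data.
designed = [("browser", "nginx"), ("nginx", "backend")]
observed = [("browser", "nginx"), ("nginx", "backend"), ("backend", "keycloak")]
report = diff_edges(designed, observed)
```

An empty "unobserved" list only means the capture exercised every documented path, not that the documentation is complete.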


> Have systems gotten so messy and complicated that we build them without knowing what they're actually doing until after they're built?

Oh, hey, you've just practically described one popular implicit formulation of Agile. :)

Detailed and coherent design documentation is rare in my experience, even at organizations that are staffed at a level including dedicated staff ostensibly for this purpose. I can think of precisely one place I've worked for that had a design process that produced documentation effective for specifying the project ahead of time and as a reference from then on out. A culture committed to this plus dedicated technical writers who regularly met with a team of a domain expert, tech lead, and UX/UI folks for the purpose of producing project documentation helped a lot.

Other organizations may have had individuals producing documentation but it was a lossy process, magnified when they moved on from the project.


Working in secops, we aren't always the ones building the tools, and the products we're required to support never come with this kind of documentation.


Jeez Louise, I worked with the Oracle BI stuff once and it was so poorly documented that using Wireshark was the only way to figure out which pieces did what and how it all worked. I'd imagine other such enterprisey stuff that's been cobbled together over the years through acquisitions may be similar, and these companies are always rather terse or ask you to put in a ticket and wait a month to find out.


In 30 years of doing consulting on 'systems' work, it's astoundingly rare to see documentation of any sort that accurately & completely represents the current running state of the system. Sometimes it's close, but more often than not it's whatever was presented to get budget before any development has started.

Occasionally, the folks involved actually believe their doco does represent reality. They are invariably wrong, sometimes in small details, usually in large ones. Tools like this are an invaluable check on reality.

> Have systems gotten so messy and complicated that we build them without knowing what they're actually doing until after they're built?

Yes, in general, at least in my world (medium size to enterprise). Agile has made this in practice much much worse.


I think it's more about changing architecture and having up to date documentation.


I really like this! I like the notion of automatically generated documentation, because it's always in line with what the source code is doing (or should be). Thanks for making this!


The linked PlantUML tool looks pretty cool. Recently I’ve used Sequence Diagram (https://itunes.apple.com/gb/app/sequence-diagram/id119542670...), which is very similar but limited to (eh) sequence diagrams, and I’ve loved every minute.

I wish I had known of these tools 10 years ago, when I had to spend hours in Visio...


It is cool (compared to Visio or even Rational Rose), but it makes me wonder about the value of the diagram itself. The textual input is just as readable and takes up less space on a screen (or, God forbid, a printed piece of paper) - other than giving managers a warm and fuzzy feeling that there's a pretty picture there, why bother generating the diagram at all?


Depending on how you organize your relationships, the diagram can make shared dependencies more clear.

If you have multiple levels of inputs and outputs, you might define a set of relationships together that are all related to a single output. In this case shared inputs are not trivially recognizable in the text.

E.g.

    @startuml

    database DB1
    database DB2

    ' ... lots of other artifacts ...

    artifact ArtifactN

    DB1 --> Artifact1
    ' ... lots of other input to Artifact1 ...

    ' ... other relationships grouped based on output - more than a screenful - only some of which have DB1 as input ...

    DB1 --> ArtifactN
    @enduml

In the above, it might not be clear how often DB1 is an input to an artifact. Whereas the diagram will show the size of the tree pretty clearly.

It is unlikely that such a diagram would be very illustrative to its author, but the visual representation may be a much better communication tool to others who are not as intimately familiar with the system in question.
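
For what it's worth, the "how often is DB1 an input" question can also be answered from the text with a few lines of script. A rough sketch, assuming only plain `A --> B` relationships like the ones in the example; the `count_outputs` helper is hypothetical and ignores every other PlantUML arrow style:

```python
import re
from collections import Counter

# Matches simple "A --> B" relationships only.
ARROW = re.compile(r"^\s*(\w+)\s*-->\s*(\w+)\s*$")


def count_outputs(plantuml_text: str) -> Counter:
    """Count how many relationships each element is the source of."""
    counts: Counter = Counter()
    for line in plantuml_text.splitlines():
        match = ARROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened version of the example above, with the elided parts left out.
sample = """
@startuml
database DB1
database DB2
DB1 --> Artifact1
DB2 --> Artifact1
DB1 --> ArtifactN
@enduml
"""
fan_out = count_outputs(sample)
```

That gives a number, but it still won't communicate the shape of the tree to someone unfamiliar with the system the way the rendered diagram does.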


A really nice use of UML sequence diagrams!

A little off topic, but even though I wrote a UML book a long while ago, I have more or less stopped using hand written UML except for sequence diagrams. Any class diagrams I put in documentation are auto-generated from source code.
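
To illustrate what auto-generating a class diagram can look like, here is a minimal sketch in Python that emits PlantUML text from live classes. This is not the tooling I use, just an assumption-laden example; `class_diagram`, `Shape` and `Circle` are made-up names:

```python
import inspect


def class_diagram(classes) -> str:
    """Render classes and their direct base classes as PlantUML text."""
    lines = ["@startuml"]
    for cls in classes:
        lines.append(f"class {cls.__name__} {{")
        # List only public functions defined directly on the class.
        for name, member in vars(cls).items():
            if inspect.isfunction(member) and not name.startswith("_"):
                lines.append(f"  +{name}()")
        lines.append("}")
        # Emit an inheritance arrow for each direct base other than object.
        for base in cls.__bases__:
            if base is not object:
                lines.append(f"{base.__name__} <|-- {cls.__name__}")
    lines.append("@enduml")
    return "\n".join(lines)


class Shape:
    def area(self): ...


class Circle(Shape):
    def radius(self): ...


diagram = class_diagram([Shape, Circle])
```

Because the diagram is derived from the code on every run, it cannot drift the way a hand-drawn one does.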


Wireshark creates TCP sequence diagrams out of the box, they might even be exportable...


This is the greatest idea! I've been writing SoapUI test cases that show the communication flow between microservices and use cases, to give new developers (and product owners) some insight into how the application hangs together. Something like this is the logical next step and exactly what I was hoping I could derive from New Relic/OpenTracing/logs (but these are only as good as the technical details).

Nice work, I look forward to delving deeper.


This is a cool project but how does it take care of load balancers in the flow?

I think most LBs terminate TCP and then create a new connection to the backend.


I haven’t read the source code, but I assume that it just looks at the timings of the request. After all, it’s just one user, no need to worry about 50 concurrent requests.
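
To make that guess concrete, here is a sketch of timing-based matching across a load balancer. It is my assumption of how such a correlation could work, not what the tool does; the `Exchange` type, `match_backend` and the sample timings are all hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exchange:
    source: str
    target: str
    start: float  # seconds, request first seen
    end: float    # seconds, response last seen


def match_backend(front: Exchange, candidates: List[Exchange]) -> Optional[Exchange]:
    """Pick the backend exchange nested inside the front-end exchange's time window."""
    nested = [
        c for c in candidates
        if c.source == front.target and front.start <= c.start and c.end <= front.end
    ]
    # With a single user there should be at most one; take the earliest otherwise.
    return min(nested, key=lambda c: c.start, default=None)


# Hypothetical capture: one browser request and two LB-to-backend exchanges.
front = Exchange("browser", "lb", 0.00, 0.50)
candidates = [
    Exchange("lb", "backend", 0.05, 0.45),
    Exchange("lb", "backend", 0.60, 0.90),
]
matched = match_backend(front, candidates)
```

This only holds up while requests don't overlap; with concurrent traffic you would need something like a propagated request ID instead.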


It's really cool, but from my understanding you only get the traffic exchanged with your browser and not the traffic between the servers. I wonder if something similar could be done at the server level: aggregating traffic from all the servers, merging it together and displaying it.


The bottom half of the example diagram shows traffic between the "backend" and "keycloak" that clearly isn't browser to nginx traffic.

You'd need to be running Wireshark on a network interface that could see all that traffic, so yeah - sort of "at the server level" - in the sense that you'd need to be sitting on the server's network(s) to sniff that.
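
If you did capture on several hosts, the merge step itself is simple. A minimal sketch, assuming each capture has already been reduced to time-sorted (timestamp, source, target, label) tuples and that the hosts' clocks are in sync; the helper names and sample messages are hypothetical:

```python
import heapq
from typing import Iterable, List, Tuple

Message = Tuple[float, str, str, str]  # (timestamp, source, target, label)


def merge_captures(*captures: Iterable[Message]) -> List[Message]:
    """Merge per-host message lists (each already time-sorted) into one timeline."""
    return list(heapq.merge(*captures))


def to_plantuml(messages: Iterable[Message]) -> str:
    """Render the merged timeline as a PlantUML sequence diagram."""
    lines = ["@startuml"]
    for _, source, target, label in messages:
        lines.append(f"{source} -> {target}: {label}")
    lines.append("@enduml")
    return "\n".join(lines)


# Hypothetical captures from two vantage points.
edge = [(0.00, "browser", "nginx", "request"), (0.30, "nginx", "browser", "response")]
internal = [(0.10, "backend", "keycloak", "request"), (0.20, "keycloak", "backend", "response")]
timeline = merge_captures(edge, internal)
diagram = to_plantuml(timeline)
```

A real version would also have to drop duplicates, since the same packet can show up in more than one capture.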


Weave Scope can do that: https://www.weave.works/oss/scope

Edit: Not affiliated



Mermaid works really well for this as well:

https://mermaidjs.github.io/
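
For example, emitting a Mermaid sequence diagram from a list of observed messages takes only a few lines. A small sketch; `to_mermaid` and the sample messages are hypothetical:

```python
from typing import Iterable, Tuple


def to_mermaid(messages: Iterable[Tuple[str, str, str]]) -> str:
    """Render (source, target, label) messages as Mermaid sequence diagram text."""
    lines = ["sequenceDiagram"]
    for source, target, label in messages:
        # "->>" draws a solid line with an arrowhead in Mermaid.
        lines.append(f"    {source}->>{target}: {label}")
    return "\n".join(lines)


# Hypothetical messages.
mermaid_text = to_mermaid([
    ("browser", "nginx", "request"),
    ("nginx", "backend", "request"),
])
```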


Would be cool to add osquery or similar to the mix and get process information to pair up with the network traffic.
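
Roughly what I have in mind: take socket rows shaped like osquery's `process_open_sockets` table joined to `processes` (so each row carries a process name, local address and local port) and use them to label the endpoints of captured flows. A sketch under those assumptions; `label_flows` and the sample rows are hypothetical, and running the actual osquery query is left out:

```python
from typing import Dict, Iterable, List, Tuple

Endpoint = Tuple[str, int]  # (address, port)


def label_flows(
    flows: Iterable[Tuple[Endpoint, Endpoint]],
    socket_rows: Iterable[Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Replace (address, port) endpoints with process names where known."""
    # JSON output from osquery carries values as strings, hence the int().
    owners = {
        (row["local_address"], int(row["local_port"])): row["name"]
        for row in socket_rows
    }

    def name(endpoint: Endpoint) -> str:
        return owners.get(endpoint, f"{endpoint[0]}:{endpoint[1]}")

    return [(name(src), name(dst)) for src, dst in flows]


# Hypothetical data.
socket_rows = [{"name": "nginx", "local_address": "10.0.0.2", "local_port": "80"}]
flows = [(("10.0.0.9", 51000), ("10.0.0.2", 80))]
labelled = label_flows(flows, socket_rows)
```

Listening sockets bound to a wildcard address won't match a concrete destination IP this way, so a real version would need a fallback on port alone.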


This looks very useful. Thanks for sharing! And @danlebrero, thanks for making this!



