Tests drive design in two ways, and I rarely see them distinguished (including here):
1) By making you think about the interface up front - e.g. if you are designing a REST API, writing a test that calls an endpoint which does not yet exist. This is almost always a good thing (see the sketch after this list).
2) By making it painfully difficult to write a test - e.g. because you need to mock all over the place. The most annoying thing about this form of feedback is that it does help a little, so it's not completely useless, but it's very expensive and usually has a low, if not negative, ROI.
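As a minimal sketch of 1) in pytest - the endpoint, payload shape and local port are invented for illustration; the point is only that the test is written against an interface that doesn't exist yet:

```python
# Hypothetical sketch: the endpoint, payload and port are invented.
# The test exists before POST /api/v1/orders does, which forces a decision
# about what the interface should look like before any implementation.
import requests

BASE_URL = "http://localhost:8000"  # assumed local dev server


def test_create_order_returns_id_and_echoes_items():
    payload = {"items": [{"sku": "ABC-123", "quantity": 2}]}
    response = requests.post(f"{BASE_URL}/api/v1/orders", json=payload, timeout=5)

    assert response.status_code == 201
    body = response.json()
    assert "order_id" in body
    assert body["items"] == payload["items"]
```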
I don't find that thought pieces on "test driven design" ever distinguish these two modes, and they're usually awfully vague about whether they're talking about high level tests matching business logic or low level tests on implementation details, even though the two have very different payoffs.
High level TDD where the tests match the behavior and business logic will often NOT provide design feedback on lower levels. I think this is actually a good thing. It simply provides a safety net to experiment with your designs.
My e2e tests automatically dump a multitude of debugging information and throw open a console from which I can quickly fire up code debuggers, logs, screenshots, network traces and browser traces in a few seconds.
Building test tooling is hard I'll grant you. It requires engineering chops and being up to date on the latest tooling. But not luck.
I'm fully aware of the idea that TDD is a "design practice" but I find it to be completely wrongheaded.
The principle that tests that couple to low level code give you feedback about tightly coupled code is true but it does that because low level/unit tests couple too tightly to your code - I.e. because they too are bad code!
Have you ever refactored working code into working code and had a slew of tests fail anyway? That's the child of test driven design.
High level/integration TDD doesn't give "feedback" on your design; it just tells you whether your code matches the spec. This is actually more useful. It then lets you refactor bad code with a safety harness, and gives failures that actually mean failure rather than "changed code".
I keep wishing for the idea of test driven design to die. Writing tests which break on working code is an inordinately uneconomical way to detect design issues compared to developing an eye for it and fixing it under a test harness that has no opinion on your design.
So, yes this - high level test driven development - is TDD and moreover it's got a better cost/benefit trade off than test driven design.
I think many people realise this, thus the spike and stabilise pattern. But yes, integration and functional tests are both higher value in and of themselves, and lower risk in terms of rework, so ought to be a priority. For pieces of logic with many edge cases and iterations, mix in some targeted property-based testing and you’re usually in a good place.
Part of test-driven design is using the tests to drive out a sensible and easy to use interface for the system under test, and to make it testable from the get-go (not too much non-determinism, threading issues, whatever it is). It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation! But the best and quickest way to get to having high quality _behaviour_ tests is to start by using "implementation tests" to make sure you have an easily testable system, and then go from there.
>It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!
Building tests only to throw them away is the design equivalent of burning stacks of $10 notes to stay warm.
As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.
It mystifies me that doubling the SLOC of your code by adding low level tests only to trash them later became seen as a best practice. It's so incredibly wasteful.
> As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.
I think this “2x easier” only applies to developers who deeply understand how to design software. A very poorly designed implementation can still pass the high level tests while also being hard to reason about (typically poor data structures) and debug, having excessive requirements for test setup and teardown due to lots of assumed state, being hard to change, and possibly having no modularity at all, meaning that the tests cover tens of thousands of lines (but only the happy path, really).
Code like this can still be valuable of course, since it satisfies the requirements and produces business value, however I’d say that it runs a high risk of being marked for a complete rewrite, likely by someone who also doesn’t really know how to design software. (Organizations that don’t know what well designed software looks like tend not to hire people who are good at it.)
"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.
I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.
So no, this isn't about skill.
"Test driven design" doesnt provide effective safety rails to prevent bad design from happening. It just causes more pain to those who use it as such. Experience is what is supposed to tell you how to react to that pain.
In the hands of junior developers test driven design is more like test driven self flagellation in that respect: an exercise in unnecessary shame and humiliation.
Moreover since it prevents those tests with a clusterfuck of mocks from operating as a reliable safety harness (because they fail when implementation code changes, not in the presence of bugs), it actively inhibits iterative exploration towards good design.
These tests have the effect of locking in bad design because keeping tightly coupled low level tests green and refactoring is twice as much work as just refactoring without this type of test.
> I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.
Mocks are an anti-pattern. They are a tool that either by design or unfortunate happenstance allows and encourages poor separation of concerns, thereby eliminating the single largest benefit of TDD: clean designs.
> … TDD is a "design practice" but I find it to be completely wrongheaded.
> The principle that tests that couple to low level code give you feedback about tightly coupled code is true but it does that because low level/unit tests couple too tightly to your code - I.e. because they too are bad code!
But now you’re asserting:
> "Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.
Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests. In other words, for there to be a “unit test” there must be a boundary around the “unit”, and if the code created by following TDD doesn’t even have module-sized units, then is that really TDD anymore?
Edit: Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do? If so, then what does it direct us to do?
>"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.
>Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests.
No, it doesn't contradict that at all. Test driven design, whether done optimally or suboptimally, produces low level unit tests.
Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.
Either way I do not consider it a good practice. The person I was replying to was suggesting that it was a practice more suited to people with a lack of experience. I don't think that is true.
>Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do?
I'm saying that test driven design provides weak direction about design and it is not uncommon for test driven design to still produce bad designs because that weak direction is not followed by people with less experience.
Thus I don't think it's a practice whose effectiveness is moderated by experience level. It's just a bad idea either way.
> Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.
Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall, leading to code that requires less set up to test, fewer dependencies to mock, etc.
I also agree that a lot of devs ignore that feedback; just telling someone to “do TDD” without first making sure that they know they need to strive for little to no test setup and few or no mocks, etc., makes the advice pointless.
Overall I get the sense that a sizable number of programmers accept a mentality of “I’m told programming is hard, this feels hard so I must be doing it right”. It’s a mentality of helplessness, of lack of agency, as if there is nothing more they can do to make things easier. Thus they churn out overly complex, difficult code.
>Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall
Yes and that is precisely what I was arguing against throughout this thread.
For me, (integration) test driven development is about creating:
* A signal to let me know if my feature is working and easy access to debugging information if it is not.
* A body of high quality tests.
It is 0% about design, except insofar as the tests give me a safety harness for refactoring or experimenting with design changes.
Don't agree, though I think it's more subtle than "throw away the tests" - more "evolve them to a larger scope".
I find this particularly with web services, especially when the services are some form of stateless calculators. I'll usually start with tests that focus on the function at the native programming language level. Those help me get the function(s) working correctly. The code and tests co-evolve.
Once I get the logic working, I'll add on the HTTP handling. There's no domain logic in there, but there is still logic (e.g. mapping from json to native types, authentication, ...). Things can go wrong there too. At this point I'll migrate the original tests to use the web service. Doing so means I get more reassurance for each test run: not only that the domain logic works, but that the translation in & out works correctly too.
At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.
I'm therefore with TFA in leaning towards E2E testing because I get more bang for the buck. There are still places where I'll keep native language tests, for example if there's particularly gnarly logic that I want extra reassurance on, or E2E testing is too slow. But they tend to be the exception, not the rule.
> At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.
They give you feedback when something fails, by better localising where it failed. I agree that E2E tests provide better assurance, but tests are not only there to provide assurance, they are also there to assist you in development.
Starting low level and evolving to a larger scope is still unnecessary work.
It's still cheaper starting off building a playwright/calls-a-rest-api test against your web app than building a low level unit test and "evolving" it into a playwright test.
I agree that low level unit tests are faster and more appropriate if you are surrounding complex logic with a simple and stable API (e.g. testing a parser), but it's better to work your way down to that level when it makes sense, not start there and work your way up.
That’s not my experience. In the early stages, it’s often not clear what the interface or logic should be - even at the external behaviour level. Hence the reason tests and code evolve together. Doing that at native code level means I can focus on one thing: the domain logic. I use FastAPI plus pytest for most of these projects. The net cost of migrating a domain-only test to use the web API is small. Doing that once the underlying api has stabilised is less effort than starting with a web test.
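As a rough sketch of that migration (module, function and route names are invented), the same expectation moves from calling the domain function directly to going through the FastAPI app via TestClient:

```python
# Sketch only: quote_premium, app and the /premium route are invented names.
from fastapi.testclient import TestClient

from myservice.domain import quote_premium   # hypothetical pure domain function
from myservice.api import app                # hypothetical FastAPI app wrapping it


# Stage 1: domain-level test, written while the logic is still in flux.
def test_quote_premium_applies_age_loading():
    assert quote_premium(age=70, base=100.0) == 130.0


# Stage 2: the same expectation, migrated to exercise the HTTP layer as well
# (JSON mapping, validation, auth checks can be layered on here too).
def test_quote_premium_over_http():
    client = TestClient(app)
    response = client.post("/premium", json={"age": 70, "base": 100.0})
    assert response.status_code == 200
    assert response.json()["premium"] == 130.0
```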
I don't think I've ever worked on any project where they hadn't yet decided whether they wanted a command line app or a website or an android app before I started. That part is usually fixed in stone.
Sometimes lower level requirements are decided before higher level requirements.
I find that this often causes pretty bad requirements churn - when you actually get the customer to think about the UI or get them to look at one then inevitably the domain model gets adjusted in response. This is the essence of why BDD/example driven specification works.
What exactly is it wasting? Is your screen going to run out of ink? Even in the physical construction world, people often build as much or more scaffolding as the thing they're actually building, and that takes time and effort to put up and take down, but it's worthwhile.
Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone. You've got a computer there, you might as well use it; "thinking aloud" by writing out your possible API designs and playing with them in code tends to be quicker and more effective.
Time. Writing and maintaining low level unit tests takes time. That time is an investment. That investment does not pay off.
Doing test driven development with high level integration tests also takes time. That investment pays dividends though. Those tests provide safety.
>Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone.
It's actually much quicker and safer if you can change designs under the hood and you don't have to change any of the tests because they validate all the behavior.
Quicker and safer = you can do more iterations on the design in the available time = a better design in the end.
The refactoring step of red, green, refactor is where the design magic happens. If the refactoring turns tests red again that inhibits refactoring.
> It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!
Is it? I don't think I've ever seen that mentioned.
I think there can be some value to using TDD in some situations but as soon as people get dogmatic about it, the value is lost.
The economic arguments are hard to make. Sure, writing the code initially might cost $X and writing tests might cost $1.5X but how can we conclude that the net present value (NPV) of writing the tests is necessarily negative - this plainly depends on the context.
I don't even like TDD much, but I think that this missed the point:
> Have you ever refactored working code into working code and had a slew of tests fail anyway?
Yes - and that is intended. The "refactor of working code into working code" often changes some assumptions that were made during implementation.
Those tests are not there to give "feedback on your design", they are there to ensure that the implementation does what you thought it should do when you wrote your code. Yes, that means that when you refactor your code, quite a few tests will have to be changed to match the new code.
But the amount of times I had this happen and it highlighted issues on the refactor is definitely not negligible. The cost of not having these tests (which would translate into bugs) would certainly have surpassed the costs of keeping those tests around.
If we’re talking “what you thought it should do” and not “how you thought it should do it” this is all fine. If requirements change tests should change. I think the objection is more to changing implementation details and having to rewrite twice as much code, when your functional tests (which test things that actually make you money) never changed.
Maybe, but I think the point is that it's probably very easy to get into this situation, and not many people talk about it or point out how to avoid it.
I’m still not following what the issue is. If you refactor some code, the behaviour of the code changes, and a test of the expected behaviour fails, then you have one of two problems:
1. You had a bug you didn’t know about and your test was invalid (in which case the test is useless! Fix the issue then you fix the test…)
or
2. You had no bug and you just introduced a new one, in which case the test has done its job and alerted you to the problem so you can fix your mistake.
What is the exact problem?
Now if this is an issue with changing the behaviour of the system, that’s not a refactor. In that case, your tests are testing old behaviour, and yes, they are going to have to be changed.
The point is that you're not changing the interface to the system, but you're changing implementation details that don't affect the interface semantics. TDD does lead you to a sort of coupling to implementation details, which results in breaking a lot of unit tests if you change those implementation details. What this yields is hesitancy to undertake positive refactorings, because you have to either update all of those tests or just delete them altogether - so were those tests really useful to begin with? The point is that it's apparently wasted work and possibly an active impediment to positive change, and I haven't seen much discussion around avoiding this outcome, or what to do about it.
There has been discussion about this more than a decade ago by people like Dan North and Liz Keogh. I think it’s widely accepted that strict TDD can reduce agility when projects face a lot of uncertainty and flux (both at the requirements and implementation levels). I will maintain that functional and integration tests are more effective than low-level unit tests in most cases, because they’re more likely to test things customers care about directly, and are less volatile than implementation-level specifics. But there’s no free lunch, all we’re ever trying to do is get value for our investment of time and reduce what risks we can. Sometimes you’ll work on projects where you build low level capabilities that are very valuable, and the actual requirements vary wildly as stakeholders navigate uncertainty. In those cases you’re glad to have solid foundations even if everything above is quite wobbly. Time, change and uncertainty are part of your domain and you have to reason about them the same as everything else.
> I will maintain that functional and integration tests are more effective than low-level unit tests in most cases
Right, that's pretty much the only advice I've seen that makes sense. The only possible issue is that these tests may have a broader state space so you may not be able to exhaustively test all cases.
Absolutely right. If you’re lucky, those are areas where you can capture the complexity in some sort of policy or calculator class and use property based testing to cover as much as possible - that’s a level of unit testing I’m definitely on board with. Sometimes it’s enough to just trust that your functional tests react appropriately to different _types_ of output from those classes (mocked) without having to drive every possible case (as you might have seen done in tabular test cases). For example I have an app that tests various ways of fetching and visualising data, and one output is via k-means clustering. I test that the right number of clusters gets displayed but I would never test the correctness of the actual clustering at that level. Treat complexity the same way you treat external dependencies, as something to be contained carefully.
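For what that targeted property-based testing can look like (the pricing function here is invented), a hypothesis sketch:

```python
# Sketch: apply_discount is an invented "calculator class"-style function; the point
# is that invariants are asserted over many generated inputs rather than a handful
# of hand-picked cases.
from hypothesis import given, strategies as st


def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical pure pricing function under test."""
    return price_cents - (price_cents * percent) // 100


@given(
    price_cents=st.integers(min_value=0, max_value=10_000_000),
    percent=st.integers(min_value=0, max_value=100),
)
def test_discount_never_increases_price_or_goes_negative(price_cents, percent):
    discounted = apply_discount(price_cents, percent)
    assert 0 <= discounted <= price_cents
```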
Why does testing behavior matter? I don’t care if my tests exhaustively test each if branch of the code to make sure that they call the correct function when entering that if branch. That’s inane.
I care about whether the code is correct. A more concrete example; say I’m testing a float to string function, I don’t care how it converts the floating point binary value 1.23 into the string representation of “1.23”. All I care about, is the fact that it correctly turns that binary value into the correct string. I also care about the edge cases. Does 0.1E-20 correctly use scientific notation? What about rounding behavior? Is this converter intended to represent binary numbers in a perfect precision or is precision loss ok?
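In test form that looks something like this - float_to_str is a hypothetical function, and only its observable output is asserted, never which helpers it calls internally:

```python
# Sketch: float_to_str is an invented function name; these tests pin down observable
# behaviour and edge cases, not the conversion algorithm used internally.
import pytest

from mylib.formatting import float_to_str  # hypothetical module under test


def test_simple_value_converts_to_expected_text():
    assert float_to_str(1.23) == "1.23"


def test_tiny_values_use_scientific_notation():
    assert float_to_str(1e-21) == "1e-21"


@pytest.mark.parametrize("value, expected", [
    (0.125, "0.125"),   # exactly representable in binary
    (2.675, "2.675"),   # a classic trap if rounding is handled naively
])
def test_rounding_edge_cases(value, expected):
    assert float_to_str(value) == expected
```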
If your tests simply check that you call the log function and the power function x times, your tests are crap. And this is what I believe the parent commenter was talking about. All too often, tests are written to fulfill arbitrary code coverage requirements or to obsequiously adhere to a paradigm like TDD. These are bad tests, because they’ll break when you refactor code.
One last example, I recently wrote a code syntax highlighter. I had dozens of test cases that essentially tested the system end to end and made sure if I parsed a code block, I ended up with a tree of styles that looked a certain way. I recently had to refactor it to accommodate some new rules, and it was painless and easy. I could try stuff out, run my tests, and very quickly validate that my changes did not break prior correct behavior. This is probably the best value of testing that I’ve ever received so far in my coding career.
"Have you ever reconsidered your path up the cliff face and had to reposition a slew of pitons? This means your pitons are too tightly coupled to the route!"
> Have you ever refactored working code into working code and had a slew of tests fail anyway? That's the child of test driven design.
I had this problem, when either testing too much implementation, or relying too much on implementation to write tests. If, on the other hand, I test only the required assumptions, I'd get lower line/branch coverage, but my tests wouldn't break while changing implementation.
My take on this - TDD works well when you fully control the model, and when you don't test for implementation, but the minimal required assumptions.
I don't think that's TDD's fault, that's writing a crappy test's fault.
If you keep it small and focussed, don't include setup that isn't necessary and relevant, only exercise the thing which is actually under test, only make an assertion about the thing you actually care about (e.g. there is the key 'total_amount' with the value '123' in the response, not that the entire response body is x); that's much less likely to happen.
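Something along these lines (endpoint, fixture and response fields are invented) - the focused assertion survives unrelated changes to the payload, the pinned-body style doesn't:

```python
# Sketch: /orders/42, the `client` fixture and the response fields are invented.
def test_order_total_is_returned(client):
    body = client.get("/orders/42").json()

    # Focused: assert only on the field this test is actually about.
    assert body["total_amount"] == 123

    # Brittle alternative (avoid): pinning the entire payload means adding any
    # unrelated field elsewhere in the response breaks this test for no good reason.
    # assert body == {"id": 42, "total_amount": 123, "currency": "EUR", ...}
```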
Not sure why I’m getting downvoted so badly, because by its very nature refactoring shouldn't change the functionality of the system. If you have functional unit tests that are failing, then something has changed and your refactor has changed the behaviour of the system!
It is very common for unit tests to be white-box testing, and thus to depend significantly on internal details of a class.
Say, when unit testing a list class, a test might call the add function and then assert that the length field has changed appropriately.
Then, if you change the list to calculate length on demand instead of keeping a length field, your test will now fail even thought the behavior has not actually changed.
This is a somewhat silly example, but it is very common for unit tests to depend on implementation details. And note that this is not about private VS public methods/fields. The line between implementation details and public API is fuzzy and depends on the larger use of the unit within the system.
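Roughly the difference between these two tests, with MyList standing in for the custom list class above:

```python
# Sketch: MyList is a stand-in for the custom list class in the example above.
from my_collections import MyList  # hypothetical


# White-box: breaks as soon as `length` stops being a stored field,
# even though the observable behaviour is unchanged.
def test_add_updates_length_field():
    lst = MyList()
    lst.add("a")
    assert lst.length == 1


# Behavioural: only relies on the public contract "adding an item grows the
# list by one", so it survives the switch to computing length on demand.
def test_add_grows_the_list():
    lst = MyList()
    lst.add("a")
    assert len(lst) == 1
```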
Checking length is now a function call and not a cached variable — a change in call signature and runtime performance.
Consumers of your list class are going to have to update their code (eg, that checks the list length) and your test successfully notified you of that breaking API change.
Then any code change is a breaking API change and the term API is meaningless. If the compiler replaces a conditional jump + a move with a conditional move, it has now changed the total length of my code and affected its performance, and now users will have to adjust their code accordingly.
The API of a piece of code is a convention, sometimes compiler enforced, typically not entirely. If that convention is broken, it's good that tests fail. If changes outside that convention break tests, then it's pure overhead to repair those tests.
As a side note, the length check is not necessarily no longer cached just because the variable is no longer visible to that test. Perhaps the custom list implementation was replaced with a wrapper around java.ArrayList, so the length field is no longer accessible.
If I have the tooling all set up (e.g. playwright, database fixtures, mitmproxy) and the integration test closely resembles the requirement then I'm about as productive doing TDD as not doing TDD except I get tests as a side effect.
If I do snapshot test driven development (e.g. actual rest API responses are written into the "expected" portion of the test by the test) then I'm sometimes a little bit more productive.
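A minimal hand-rolled version of that snapshot flow (paths, endpoint and fixture are invented): on the first run the actual response is written out as the expected snapshot, after that it's compared against it:

```python
# Sketch: a hand-rolled snapshot check; the endpoint, fixture and paths are invented.
import json
from pathlib import Path

SNAPSHOT_DIR = Path(__file__).parent / "snapshots"


def assert_matches_snapshot(name: str, actual: dict) -> None:
    """Write the snapshot on the first run; compare against it on later runs."""
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    snapshot_file = SNAPSHOT_DIR / f"{name}.json"
    if not snapshot_file.exists():
        snapshot_file.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return
    expected = json.loads(snapshot_file.read_text())
    assert actual == expected


def test_list_orders_response(client):  # `client` is a hypothetical API fixture
    assert_matches_snapshot("list_orders", client.get("/orders").json())
```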
There's a definite benefit to fixing the requirement rather than letting it evaporate into the ether.
Uncle Bob style unit test driven development, on the other hand, is something more akin to a ritual from a cult. Unit test driven development on integration code (e.g. code that handles APIs, databases, UIs) is singularly useless. It only really works well on algorithmic or logical code - parsers, pricing engines, etc. - where the requirement can be well represented.
You could spend your entire life improving test infrastructure. There's clearly a cut off point where the investment stops making sense but it's hard to know when.
The investment calculation is quite complex and many of the variables require guesses. A lot of returns on automation work are not positive.
You can get quite close to reality with really good integration tests. There's all sorts of real life scenarios ive hit like flaky networks, email appearance in various clients, etc. that can be integration tested with a bit of creativity but most people wouldnt even think to do it.
The investment in this stuff can be quite high...unless you've got premade tooling for all of this that you can just drop in.
Yes, unless you are doing a clean room implementation of paxos or the raft protocol it probably isn’t worth the effort to create harnesses to simulate packet loss, thundering herds, split brain, out of order responses, etc. Even then, if you are writing some distributed synchronization primitives you might be better off with formal proofs than some sort of test harness.
>Sometimes, I just CBF. I spent months (part time, of course) building out an end-to-end test system for my open source project Lazygit, and every day I think of all the regressions that system has prevented and how much harder it would be to add that system now given how many tests have been written since. I know for a fact it was worth the effort. So why haven’t I added end-to-end tests to my other project, Lazydocker? Because I CBF.
Building end to end tests is hellishly annoying and a huge amount of work if the tooling to do it just isn't there. There needs to be better drop in default frameworks for building them.
You basically need to allocate a whole developer’s worth of time to maintain end to end tests. If you’re lucky, they will also have time for your integration tests.
Frankly I think even decoupling for the purposes of unit testing is another example of cargo culting. It emerged as a best practice when CPUs were much slower, containerization didn't exist and mock service tooling was far inferior - writing integration tests was an exercise in flaky futility in 2001. The technology has since moved on but the culture of best practices hasn't.
It still can make sense if you've got a very complex bundle of isolated stateless business logic but in practice I find that most dependency inversion I see these days isn't for that. It increases the SLOC by 30% and reduces code coherence just so that a test can run in 20 milliseconds instead of 2.1 seconds.
In many cases those unit tests check to see if you put a number in one end of a class that the same number comes out the other end. What bug is that going to catch?
Meanwhile all of the actual bugs are probably hiding in the database queries or the interaction layers created as a sacrifice to bob the bearded god of unit testing.
The most fascinating instantiation of this idea is, I think, the notion that because unit tests are horrible to work with, unit tests can therefore "drive" design, because it'll make you fight to make the unit test less horrible. It's like advocating an abusive relationship to help fix your life because the abuser won't be afraid to point out all of your flaws, which you can then fix. The two corollaries that never seem to permeate are A) maybe when unit tests become horrible to work with, it means they are also bad code that coupled too tightly? and B) maybe it's cheaper to spot bad design by learning what to look for rather than building a unit test that will punch you in the face for it?
> It increases the SLOC by 30% and reduces code coherence just so that a test can run in 20 milliseconds instead of 2.1 seconds.
There are people who cargo cult and people who do things for a reason. It may not always be obvious which category a developer was in just by talking to them or looking at their work.
Imagine the following scenario: you have two projects each having the same functionality and the same level of test coverage. Project A is 30% longer than project B but project A's test suite runs in 1 second while project B's runs in 2 minutes. Personally, I am taking project A every time (but I'm also probably trying to figure out how to get that 30% down to 0.1-2%).
The path to project B is paved with 50 decisions to do the thing that took 2.1 seconds rather than 20ms.
>Imagine the following scenario: you have two projects each having the same functionality and the same level of test coverage. Project A is 30% longer than project B but project A's test suite runs in 1 second while project B's runs in 2 minutes. Personally, I am taking project A every time.
Project A's tests will usually be sacrificing realism in favor of speed. The practical upshot of that is entire classes of bug won't get caught.
To take some recent practical examples off the top of my head: I have some tests that do 404 checks on all web pages. I have another that checks if sent emails are outlook compliant. These checks necessitate that the tests be slower but somebody still needs to do this.
I know there are plenty of people out there who think like you do. That was kind of my point. Realism in testing is an underappreciated quality.
Clock decoupling is one of the few patterns that's actually required for reproducible tests. Using the real clock makes test run in realtime (slow) and puts them at the mercy of the OS timing and other exterior factors.
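For reference, the usual shape of that decoupling is just passing the clock in (names below are illustrative):

```python
# Sketch: inject a clock so tests control time instead of sleeping through it.
from datetime import datetime, timedelta
from typing import Callable


def make_session_checker(now: Callable[[], datetime], ttl: timedelta):
    """Return a function that says whether a session started at `started_at` is live."""
    def is_session_live(started_at: datetime) -> bool:
        return now() - started_at < ttl
    return is_session_live


def test_session_expires_without_waiting():
    fake_now = datetime(2024, 1, 1, 12, 0, 0)
    checker = make_session_checker(now=lambda: fake_now, ttl=timedelta(minutes=30))
    assert checker(datetime(2024, 1, 1, 11, 45))       # 15 minutes old: still live
    assert not checker(datetime(2024, 1, 1, 11, 15))   # 45 minutes old: expired
```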
Not necessarily. This is exactly the kind of thing I mean, in fact.
I once had a bug scenario I wanted to reproduce with a test where the time was grabbed from datetime.now() in the first half of the scenario and indirectly via a postgres query in the second half (with a time delimited query).
With libfaketime I injected a new running service time and postgres service via the test runner. It let me do TDD to create a reproducible test - what you thought was impossible.
I could have jumped straight to dependency injection on that mess of a code base but that would have meant a risky architectural change that I would have to have done without tests. That is bad practice.
This sort of thing was not really possible in 2001 but it's possible now.
So is JSON, and all mainstream config formats actually.
However, that stopped being a problem with JSON Schema, which, despite the name, works on YAML too.
Blink. I beg your pardon, JSON is strongly typed per the RFC 8259 standard:
> JSON can represent four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays).
The type system does not align well with any other type system out there (float/int ambiguity, no timestamps, etc.) but it's still better than any coercion.
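On the JSON Schema point above: it does apply unchanged to parsed YAML, e.g. (toy schema and config, using PyYAML and the jsonschema package):

```python
# Sketch: validating YAML with a JSON Schema; the schema and config are toy examples.
import yaml                      # PyYAML
from jsonschema import validate  # jsonschema package

schema = {
    "type": "object",
    "required": ["name", "replicas"],
    "properties": {
        "name": {"type": "string"},
        "replicas": {"type": "integer", "minimum": 1},
    },
}

config = yaml.safe_load("""
name: web
replicas: 3
""")

validate(instance=config, schema=schema)  # raises ValidationError if it doesn't conform
```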
There are two types of github actions workflows you can build.
1) Program with github actions. Google "how can I send an email with github actions?" and then plug in some marketplace tool to do it. Your workflows grow to 500-1000 lines and start having all sorts of nonsense like conditionals and the YAML becomes disgusting and hard to understand. Github actions becomes a nightmare and you've invited vendor lock in.
2) Configure with github actions. Always ask yourself "can I push this YAML complexity into a script?" and do it if you can. Send an email? Yes, that can go in a script. Your workflow ends up being about 50-60 lines as a result and very rarely needs to be changed once you've set up. Github actions is suddenly fine and you rarely have to do that stupid push-debug-commit loop because you can debug the script locally.
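As a sketch of what 2) looks like in practice (the script below is a made-up example): the workflow step reduces to something like `run: python ci/notify.py`, and all the logic lives in a file you can run and debug locally.

```python
# ci/notify.py - hypothetical example of pushing "send an email" logic out of YAML.
# The workflow only does: `run: python ci/notify.py "$GITHUB_REF_NAME"`.
import os
import smtplib
import sys
from email.message import EmailMessage


def send_build_email(branch: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = f"Build finished on {branch}"
    msg["From"] = os.environ["CI_MAIL_FROM"]      # assumed to be provided as CI secrets
    msg["To"] = os.environ["CI_MAIL_TO"]
    msg.set_content(f"The pipeline for {branch} has completed.")
    with smtplib.SMTP(os.environ["CI_SMTP_HOST"]) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Runs identically on a laptop and in the runner, so it can be debugged locally.
    send_build_email(sys.argv[1] if len(sys.argv) > 1 else "local")
```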
Every time I join a new team I tell them that 1 is the way to madness and 2 is the sensible approach and they always tepidly agree with me and yet about half of the time they still do 1.
The thing is, the lack of debugging tools provided by Microsoft is also really not much of a problem if you do 2, vendor lock in is lower if you do 2, debugging is easier if you do 2 but still nobody does 2.
This is a great perspective, and one I agree with -- many of the woes associated with GitHub Actions can be eliminated by treating it just as a task substrate, and not trying to program in YAML.
At the same time, I've found that it often isn't sufficient to push everything into a proper programming language: I do sometimes (even frequently) need to use vendor-specific functionality in GHA, mark dependencies between jobs, invoke REST APIs that are already well abstracted as actions, etc. Re-implementing those things in a programming language of my choice is possible, but doesn't break the vendor dependency and is (IME) still brittle.
Essentially: the vendor lock-in value proposition for GHA is very, very strong. Convincing people that they should take option (2) means making a stronger value proposition, which is pretty hard!
No, you're right it's not necessarily a good idea to be anal about this rule. E.g. If an action is simple to use and already built I use it - I won't necessarily try to reimplement e.g. upload artifacts step in code.
Another thing I noticed is that if you do 1, sophisticated features like build caching and parallelization often become completely impractical, whereas if you default to 2 you can probably do them with only a moderate amount of commit-push-debug.
Option 2 also makes it easier for developers to run their builds locally, so you're essentially using the same build chain for local debugging as you do for your Test/Staging/Prod environments, instead of maintaining two different build processes.
It's not just true for GHA, but for any build server really: The build server should be a script runner that adds history, artifact management, and permissions/auditing, but should delegate the actual build process to the repository it's building.
Good perspective. Unfortunately (1) is unavoidable when you're trying to automate GH itself (role assignments, tagging, etc.). But at this point, I would rather handle a lot of that manually than deal with GHA's awful debug loop.
FWIW, there's nektos/act[^1], which aims to duplicate GHA behavior locally, but I haven't tried it yet.
> Unfortunately (1) is unavoidable when you're trying to automate GH itself (role assignments, tagging, etc.)
Can't you just use the Github API for that? The script would be triggered by the YAML, but all logic is inside the script.
But `act` is cool, I've used it for local debugging. Thing is its output is impossibly verbose, and they don't aim to support everything an action does (which is fine if you stick to (2)).
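To make the "just use the GitHub API" point concrete (repo name and issue number are placeholders; GITHUB_TOKEN is the token Actions provides by default), a labelling step can be a short script:

```python
# Sketch: add a label to an issue via the GitHub REST API instead of a marketplace action.
import os
import requests

OWNER_REPO = "my-org/my-repo"   # placeholder
ISSUE = 123                     # placeholder

response = requests.post(
    f"https://api.github.com/repos/{OWNER_REPO}/issues/{ISSUE}/labels",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={"labels": ["needs-triage"]},
    timeout=10,
)
response.raise_for_status()
```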
Yeah, I've done quite a bit of Github scripting via octokit and it's pretty simple. Using GHA's built-in functionality might turn a five line script into a one-liner, but I think being able to run the script directly is well worth the tradeoff.
The main thing that you can't decouple from GHA is pushing and pulling intermediate artifacts, which for some build pipelines is going to be a pretty big chunk of the logic.
How DO you debug your actions? I spend so long in the commit-action-debug-change loop it’s absurd. I agree with your point re: 2 wholeheartedly though, it makes debugging scripts so much easier too. CI should be runnable locally and GitHub actions, while supported with some tooling, still isn’t very easy to work with like that.
We may be splitting hairs given what this thread is going on about, but I strongly advocate for `--force-with-lease` as a sane default versus `-f` so that one does not blow away unexpectedly newer commits to the branch
The devil's in the details, etc, etc, but I think it's a _much_ more sane default, even for single-user setups/branches because accidents can happen and git DGAF
Act works pretty well to debug actions locally. It isn't perfect, but I find it handles about 90% of the write-test-repeat loop and therefore saves my teammates from dozens of tiny test PRs.
May have misread this but you know you can push to one branch and then run the action against it? Would reduce PRs if you're doing that to then check the action in master. You have to add a workflow_dispatch to the action: https://docs.github.com/en/actions/using-workflows/manually-...
Yeah, most of the time that is a good way to test. There are some specific actions that aren't easily tested outside of the regular spot though. Mostly deployment related pieces, due to the way our infrastructure is set up.
The main reason I aim for (2) is that I want to be able to drive my build locally if and when GitHub is down, and I want to be able to migrate away easily if I ever need to.
I think of it like this:
I write scripts (as portable as possible) to be able to build/test/sign/deploy/etc
They should work locally always.
GitHub is for automating me setting up the environments where I can run those scripts and then actually running them.
Totally get what you're saying. I once switched our workflow to trigger on PRs to make testing easier. Now, I'm all about using scripts — they're just simpler to test and fix.
I recommend making these scripts cross-platform for flexibility. Use matrix: and env: to handle it. Go for Perl, JavaScript, or Python over OS shells and put file tasks in scripts to dodge path issues.
I've tried boxing these scripts into steps, but unless they're super generic for everyone, it doesn't seem worth it.
They don't seem to grasp how bad their setup is, and consequently are willing to put up with awful programming conditions. Even punch cards were better, as those people had the advantage of working with a real programming language with defined behaviour. "When exactly is this string interpolation step executed? In the anchor or when referenced? (Well, it depends.)" No, it's black box tinkering
(you might as well be prompt engineering)
the C in IaC is supposed to stand for code. Well, if you're supposed to code something you need to
- be able to assert correctness before you commit,
- be able to step through the code
If the setup they give you doesn't even have these minimal requirements you're going to be in trouble regardless of how brilliant an engineer you are.
I agree overall, but you oversimplify the issue a bit.
> can I push this YAML complexity into a script?
- what language is the script written in?
- will developers use the same language for all those scripts?
- does it need dependencies?
- where are we going to host scripts used by multiple github actions?
- if we ended up putting those scripts in repositories, how do we update the actions once we release a new version of the scripts?
- how do you track those versions?
- how much does it cost to write a separate script and maintain it versus locking us in with an external github action?
These are just the first questions that pop in my mind, but there is more. And some answers may not be that difficult, yet is still something to think about.
And I agree with the core idea (move logic outside pipeline configuration), but I can understand the tepid reaction you may get. It's not free and you compromise on some things.
I think they framed it accurately and you are instead over complicating. Language for scripts is a decision that virtually every team ends up making regardless. The other questions are basically all irrelevant since the scripts and actions are both stored in repo, and therefore released together and versioned together.
I think the point about maintenance cost is valid, but the thesis of the comment that you are responding to is that the prebuilt actions are a complexity trap.
I think you are still envisioning a fundamentally incorrect approach. Build scripts for a project are part of that project, not some external thing. The scripts are stored in the repository, and pulled from the branch being built. Dependencies for your build scripts aren't any different from any other build-time dependencies for your project.
I have a few open source projects that have lasted for 10+ years, and I can’t agree more with approach #2.
Ideally you want your scripting to handle all of the weird gotchas of different versions of host OSes, etc. Granted, my work is cross-platform so it is compounded.
So far I’ve found relying on extensive custom tooling has allowed me to handle transitions from local, to Travis, to AppVeyor, to CircleCI and now also GitHub Actions.
You really want your CI config to specify the host platform and possibly set some env vars. Then it should invoke a single CI wrapper script. Ideally this can also be run locally.
There’s a curve. Stringy, declarative DSLs have high utility when used in linear, unconditional, stateless programming contexts.
Adding state?
Adding conditionals?
Adding (more than a couple) procedure calls?
These concepts perform poorly without common programming tools: testing (via compilation or development runtime), static analysis, intellisense, etc etc
Imagine the curve:
X axis is (vaguely) LinesOfYaml (lines of dsl, really)
Y axis is tool selection. Positive region of axis is “use a DSL”, lower region is “use a GeneralPurposeProgrammingLanguage”
The line starts at the origin, has a SMALL positive bump, then plummets downwards near vertically.
Gets it right? Tools like ocurrent (contrasted against GH actions) [1], cdk (contrasted against TF yaml) [2]
Gets it wrong? Well, see parent post. This made me so crazy at work (where seemingly everyone has been drinking the yaml dsl kool-aid) that I built a local product simulator and yaml generator for their systems because "coding" against the product was so untenable.
Your advice is sane and I can tell speaks from experience. Unfortunately, now that Github Actions are being exposed through Visual Studio, I fear that we are going to see an explosion of number 1, just because the process is going to be more disconnected from Github itself (no documentation or Github UI visible while working within Visual Studio).
I try to do (2), but I still run into annoyances. Like I'll write a script to do some part of my release process. But then I start a new project, and realize I need that script, so I copy it into the new repo. Then I fix a bug in that script, or add some new functionality, and I need to go and update the script in the other repo too.
Maybe this means I should encapsulate this into an action, and check it in somewhere else. But I don't really feel like that; an action is a lot of overhead for a 20-line bash script. Not to mention that it erases the freedom from lock-in that the script alone gives me.
I guess I could check the script into a separate utility repo, and pull it into my other repos via git submodules? That's probably the least-bad solution. I'd still have to update the submodule refs when I make changes, but that's better than copy-pasting the scripts everywhere.
I agree, but of course all CI vendors build all their documentation and tutorials and 'best practices' 100% on the first option for lock-in and to get you to use more of their ecosystem, like expensive caching and parallel runners. Many github actions and circleci orbs could be replaced by few lines of shell script.
Independent tutorials unfortunately fall into the same bucket, as they first look at official documentation to try to follow so-called best practices or just to get things working, and I would say also because shell scripts seem more hacky to many people - unfairly so.
That's true for all CI services, do as little as possible in yaml, mostly just use it to start your own scripts, for the scripts use something like python or deno to cover Linux, Mac and Windows environments with the same code.
When GitHub actions came out, I felt bad about myself because I had no desire to learn their new programming language of breaking everything down into multiple small GitHub actions.
I think you explained quite well what I couldn't put my finger on last time:
Building every simple workflow out of a pile of 3rd party apps creates a lot of unnecessary complexity.
Since then, I have used GitHub actions for a few projects, but mostly stayed away from re-using and combining actions (except for the obvious use cases of "check out this branch").
YAML is perfect for simple scenarios. But users produce really complex use cases with it.
Is it possible to write a Python package that, based on a YAML specification, produces a Python API? The user would code in Python and YAML would be the output.
I was working on a YAML syntax for creating UIs. I converted it to a Python API and I'm happy. For example, dynamic widgets in YAML were hard; in Python they are straightforward.
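That inversion is straightforward to sketch: build the structure as plain Python objects and serialize at the end (the workflow content below is a toy example):

```python
# Sketch: define the workflow in Python and emit YAML as the build artifact.
import yaml  # PyYAML


def job(name: str, steps: list[dict]) -> dict:
    return {name: {"runs-on": "ubuntu-latest", "steps": steps}}


workflow = {
    "name": "ci",
    "on": {"push": {"branches": ["main"]}},
    "jobs": job("build", [
        {"uses": "actions/checkout@v4"},
        {"run": "python ci/build.py"},   # logic stays in a script, per the thread above
    ]),
}

print(yaml.safe_dump(workflow, sort_keys=False))
```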
Absolutely agreed. Well said and I'll be stealing this explanation going forward. Hell, just local running with simplicity and ability to test is a massive win of #2, aside from just not dealing with complex YAML.
It can be any scripting language; Python or TypeScript via Deno are good choices because they have batteries-included cross-platform standard libs and are trivial to set up.
Python is actually preinstalled on Github CI runners.
Exactly, I showed here how we just write plain shell scripts. It gives you "PHP-like productivity", iterating 50 times a minute. Not one iteration every 5 minutes or 50 minutes.
I appreciate this perspective, however, after spending 6mo on a project that went (2) all the way, never again. CI/CD SHOULD NOT be using the same scripts you build with locally. Now, we have a commit that every dev must apply to the makefile to build locally, and if you accidentally push it, CI/CD will blow up (requiring an interactive rebase before every push). However, you can’t build locally without that commit.
I won’t go into the details on why it’s this way (build chain madness). It’s stupid and necessary.
This comment is hard to address without understanding the details of your project, but I will at least say that it doesn't mirror my experience.
Generally, I would use the same tools (e.g. ./gradlew build or docker build) to build stuff locally as on CI, and config params are typically enough to distinguish what needs to be different.
My CI scripts still tend to be more complicated than I'd like (due to things like caching, artifacts, code insights, triggers, etc.), but the main build logic at least is extracted.