How a Jenkins Job Broke our Jenkins UI (slack.engineering)
135 points by FraBle90 on June 3, 2021 | 149 comments



Since everybody seems to be hating on jenkins so much, I'll speak up. IMO jenkins is one of the most valuable tools at any startup and every good engineer uses either jenkins or something very similar.

There's no comparison between managed/logged/permissioned/distributed jobs that jenkins provides for free vs building an overwrought service or an insufficient crontab. However it's a power-tool and I think a lot of people go in expecting something dead-simple and pretty and are put off by something that they need to invest in learning.

Just for example, in 2 days I built an uptime checker that ran every 60 seconds via jenkins and triggered slack alarms/sms on problems, and it just worked (< 2 days total maintenance), for years. An equivalent service (Pingdom) quoted 10k a year.
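For the curious, a job like that is only a handful of lines of pipeline. This isn't my actual job, just a sketch of its shape: the endpoint and Slack channel are placeholders, and the slackSend step assumes the Slack Notification plugin is installed.

    pipeline {
        agent any
        triggers { cron('* * * * *') }  // every minute; Jenkins cron granularity is one minute
        stages {
            stage('Probe') {
                steps {
                    // a non-2xx response or a 10s timeout makes curl exit non-zero, failing the build
                    sh 'curl --fail --silent --show-error --max-time 10 https://example.com/healthz'
                }
            }
        }
        post {
            failure {
                slackSend channel: '#alerts', color: 'danger', message: "Uptime check failed: ${env.BUILD_URL}"
            }
        }
    }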


I've been using jenkins for over 10 years.

There is no finer system for turning quick CI hacks into entrenched technical debt.

Jenkins gives you the freedom to do things without the burden of asking whether you should do those things. Jenkins permits you to munge a whole lot of separate domains into a single, creaky system.

There are better tools available for most aspects of CICD and automated testing these days.


> There are better tools available for most aspects of CICD and automated testing these days.

It's easy to say this and not cite examples.

I rock perforce and windows agents (requirements of the environment), and we struggled very hard to find something that beats Jenkins.

One guy even bought an Octopus Deploy license for a year (to the tune of a few thousand dollary-doos) and it did everything worse than our Jenkins setup.

I actually dislike Jenkins, I'm open to suggestions here.


Almost 7 years ago, we set up our first build pipeline and I was a bit anxious about Jenkins because I simply hated it and it would have been the only piece of Java technology in our stack. Our infra guy basically told me "look, there’s nothing more powerful and widely known out there" (which was pretty much true at the time, at least in open source) and had already set up the VM when Gitlab came out with their CI/CD.

Since we had just set Gitlab up as our source control platform, this was the logical choice and we immediately tore down the Jenkins server. Since we were still starting out, integration and rewiring was done in just a day.

I've never been so happy playing the early adopter guinea pig. Actually we never had any real issues with it and aren't looking back.

For me, powerful equals "footgun included". CI should do one job and do it well imho. Huge bonus points for tight coupling into the source control workflow.

Your mileage may vary.


Jenkins to me is great for prototyping certain process activities with a very arbitrary set of conventions, but it provides a lesson as to why the Java world moved on from tools like Apache Ant (XML-based Make) to Maven (convention over configuration as a tooling value). Ten years ago such processes were not really standardized; nowadays people are used to conventions like GitOps, with well-documented processes for how artifacts are built, deployed, and operated with container primitives. We've spent a _lot_ of time by now trying to manage Jenkins installations that become incredibly brittle over time and turn into more of a liability than a force multiplier for the organization. So people are adopting more specialized tooling instead of trying to have Jenkins do everything under the sun, from building artifacts to deploying them to deploying infrastructure with CD conventions.

Things I've found better than Jenkins once an organization lets go of over-engineered processes:

- TeamCity

- Concourse CI

- Circle CI

- ArgoCD

- Tekton

For Windows build agents I've met plenty of people successful with Octopus Deploy that have hated Jenkins and your data point is the first I've seen that went back to Jenkins.

Perhaps it's worth looking at Team Foundation Server / Azure DevOps?


Octopus Deploy and TeamCity are horrific. Jenkins is hilariously better despite all of its flaws. Concourse CI has a higher learning curve than Jenkins (at least for users, not necessarily admins), but once you get it, it's definitely a good contender. CircleCI is great!


I've used Jenkins for a long time, long enough to have used Hudson (Jenkins is a fork of Hudson due to Oracle drama).

The alternatives listed in reply to your thread are all much better than Jenkins, from system design, to usability, to cost of ownership of the personnel needed and tech debt incurred. Here's my own opinionated list, having used or evaluated them before:

1. GitlabCI

2. Circle CI

3. Travis CI

4. Github Actions

5. TeamCity

6. Concourse CI

Jenkins just happens to have been around a long time. It's like Jira -- everyone uses it, but nobody likes it.


Jenkins has an important niche: "overgrown or enterprise projects that were initially created without ci in mind and require an ugly, hacky pipeline to build"

I think it still has no rival in that area and it's also why it's getting a bad rep.


Buildkite might be an option in a regulated environment.

They don't get access to your code (BYO runners, they manage pipeline config / logs / notifications). You run their (open source, easy to build yourself) agent on infrastructure you control.

It's really nicely put together.


Buildkite's problem is they only support git, unfortunately.


https://buildkite.com/docs/agent/v3/hooks indicates you can override the builtin git support with your own checkout system. Sucks having to write a plug-in but it should be short.



I've looked into this plugin before (we also use perforce). This does technically let you build from perforce, in the same way making `p4 sync` your first build step does.


What are your requirements? The most recent CI/CD system I've used was Concourse, and using it was great - it's very flexible, if a bit barebones. (I didn't have a hand in the initial installation, so this is only about using it)

https://concourse-ci.org/


Gitlab CI.


I think you missed that I'm using Perforce; gitlabCI would be a super hack job.


I think Jenkins is a very powerful tool but I would strongly disagree that it's quite valuable in a startup environment. Jenkins has struck me as a philosophical sibling of C++ - it offers a plethora of footguns to the user that can also be used to accomplish rather good things. I would advise Jenkins to companies with projects that don't require agile responses to user requirements, where the functionality desired at the end is very clear from the beginning - since the times I see Jenkins fail the most are when the configuration of jobs on it is being updated and reconfigured rapidly.

Jenkins tends to suffer from extremely poor state management, and experimenting with things can cause a permanent loss of value to companies if backups are not properly configured - compared with a system like ansible, where recipes will grow and be regularly committed to a repository, Jenkins doesn't have a native version control system and as a result strikes me as extremely brittle.

I almost wish the UI for jenkins simply didn't exist and it actually was just a whole mess of config files and shell scripts so that it could be locked down to a much better degree.


I don't care to unpack most of that. But what I will say is that I recommend automatically backing up jenkins nightly; I accomplished this simply by committing jenkins_home to a git repo and pushing every night.
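Roughly like this, as a sketch - it assumes the job runs on the controller itself (so $JENKINS_HOME is a local path) and that jenkins_home is already a git repo with a remote configured:

    pipeline {
        agent { label 'built-in' }       // run on the controller so $JENKINS_HOME is local
        triggers { cron('H 2 * * *') }   // nightly, hash-spread around 02:00
        stages {
            stage('Backup') {
                steps {
                    sh '''
                        cd "$JENKINS_HOME"
                        git add -A
                        git commit -m "nightly backup $(date -u +%F)" || true   # nothing new to commit is fine
                        git push origin master
                    '''
                }
            }
        }
    }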


These days you can also use the Configuration as Code plugin and you can automate the setup, version control it, document it, etc: https://www.jenkins.io/projects/jcasc/


But did you try to restore from backup? I did and it was flawless and smooth


As someone who runs a large build cluster at a major corp, Jenkins is clunky and outdated. It's a pain to deploy and keep maintained. I'd rather stub my toe than figure out why some event isn't being properly processed in the bowels of Jenkins.

However, when coupled with Kubernetes, Jenkins is more powerful than almost any other CI tool.

You can orchestrate any iteration of build types or actions while easily providing the underlying resources transparently to your users.

Notice that almost every other CI tool mentioned here is either 3rd party hosted or severely limited in some fashion when compared to Jenkins.


If you are already running in kubernetes, how is any other tool severely hampered if it also runs in kubernetes?


It's partially because I'm not as familiar with other tools but they don't expose the ability to interact with an entire pod within the build.

Gitlab Action Services for instance is closer to Docker Compose.


Seems to me people hating on Jenkins are not hating on the idea of a central automation service/task runner, they hate on Jenkins itself.


You could use a tool designed for it and probably be better off, since upgrading jenkins and plugins, as everyone knows... is not always exactly pain-free.

Regarding your batch scheduling solution - take hashicorp nomad for a spin: a single go binary and your scheduled job can be declared in 10 lines of yaml. Won’t miss a beat.


This isn't a batch scheduling problem. It's a monitoring and alerting problem. OP needs a monitoring and alerting solution.

Stop ~~~ engineering ~~~ things and choose the right tool for the job.


"Solution" and "Job" may be overstating things here. If all op needs is monitoring and alerting of a couple of machines, then Datadog/New Relic/Ops Genie/Pager Duty/Server Density/Pingdom/AppDynamics/Loupe/Sysdig/Dynatrace (just to name a few in the crowded space) are all likely overkill and not worth the cost.

A large portion of the cost of many of these tools is spent "choosing the right tool for the job": figuring out what they do, what they do well, where they overlap, where the company that makes the tool is headed, and how hard it's going to be to swap that tool out for a better one (or a cheaper one) - that's a lot of expense in labor and training.


A script scheduled to test something and notify? Hardly "engineered".

I claim using pretty much any tool other than jenkins is easier to maintain over time (as illustrated in this very article).


Don't abuse a CI/CD system as a workflow automation tool. There are better and much easier alternatives.


What do you have in mind? We have a number of workflow automation tasks on Jenkins and while it works, it requires constant upkeep.

The ability to share a machine pool with the CI system, or having an integrated CI, is a plus.


Someone already mentioned the Argo tool family. I have had very good experience coupling argocd and argo workflows. It's not as advanced (on the workflow side) as Airflow, but it works pretty well. There is always the option to fall back to Rundeck for folks from older times


Such as?


(crickets)

We don't use Jenkins for CI/CD. It's probably overkill for those basic tasks.

But as the middle ground between crontabs and something like Airflow it fixed all our problems without creating new ones.

Are there really any alternatives out there? (And no, we don't need Kubernetes, thanks.)


I think the question itself begets more questions. For example, I've seen Jenkins jobs replaced with an Apache Airflow DAG or two, but obviously that's not the right call for continuously deploying applications to K8S clusters, which is where I'd use Tekton, Harness, or Argo CD. If we go way back, companies used to churn out Perl and Ruby scripts and eventually they were replaced with anything from Python to Go to Rust. It always goes back to _why_ someone finds deficiencies in what Jenkins jobs provide. Ask 10 people that hate Jenkins and one is likely to find 10 different products / solutions appropriate for them.

Most folks I've seen replacing Jenkins for a job runner wind up using Ansible, Salt, or Rundeck if they're not in a giant enterprise (there's also the old HP Operations Orchestration system but that's buried now unfortunately). All of these have their own holy wars brewing and deficiencies, sure, but I personally prefer those warts over adding more maintenance issues by adding in Jenkins. I'd also suggest StackStorm for a more modern, async approach to orchestration workflows. The workflow software ecosystem is built around business domain specialization and it's kind of silly to try to go against Jenkins toe to toe as a company, so this is what we've got sadly. After all, Jetbrains makes TeamCity, which is essentially a better-architected Jenkins, but it's definitely not very popular either.


For us, Jenkins is:

- Website with access control which supports identity provider.

- Properly authorized user can start a job from their browser, and can specify arguments as well (using checkboxes, input boxes, multi-choice boxes etc...).

- Job works by allocating a few machines from the pool and running some shell commands on them. The number of machines, type of machines, and commands to run are all customizable and can vary based on the parameters the user has entered.

- Once the job is running, you can view text logs (in real time) or hit the "cancel" button, and it'll properly cancel the job and release all the resources. If the job fails for any reason, it sends notifications.

- From web interface, you can view the list of all past jobs, their status, logs, outputs, test results, etc...

- The machines can be physical on-prem ones, or allocated from AWS. For physical machines, all you need is plain linux with ssh access. For AWS machines, machines are created/destroyed in response to load.

- The whole thing needs no infra other than a single master machine which is hosted on premises. As long as you back up the master regularly, you can recover from a complete meltdown; the only problem will be that some jobs will get cancelled.

------

Jenkins is actually pretty bad at this task, and we need a fair amount of Groovy code (ugh...) to get it to do what we want.

But there seem to be very few alternatives which can do all that. For example, Apache Airflow is missing authentication for the web UI. Tekton seems to require Kubernetes and (based on reading the docs) has no UI controls. Etc...
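To give a feel for the scale of it, the parameter/pool part of that list is roughly this shape as a declarative Jenkinsfile - just a sketch with made-up labels, script and addresses; the dynamic machine-count and AWS bits are exactly where the extra Groovy comes in:

    pipeline {
        agent none
        parameters {
            choice(name: 'POOL', choices: ['onprem-linux', 'aws-linux'], description: 'machine pool to allocate from')
            string(name: 'TARGET', defaultValue: 'staging', description: 'environment to run against')
            booleanParam(name: 'DRY_RUN', defaultValue: true, description: 'only print what would be done')
        }
        stages {
            stage('Run') {
                agent { label "${params.POOL}" }   // the label expression picks nodes out of the chosen pool
                steps {
                    sh "./run.sh --target ${params.TARGET} ${params.DRY_RUN ? '--dry-run' : ''}"
                }
            }
        }
        post {
            failure {
                mail to: 'team@example.com', subject: "FAILED: ${env.JOB_NAME} #${env.BUILD_NUMBER}", body: env.BUILD_URL
            }
        }
    }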


Jenkins can then be relegated, for your needs, to simply launching tasks in other systems and keeping almost no state or configuration in itself. One can lock down Airflow access to Jenkins agents only, or set up an nginx proxy in front of it to protect it from prying eyes.

Jenkins is architecturally an artifact of enterprise software kitchen sinks for the 2000s and the way around it is to cut away at anything one doesn’t need out of Jenkins and to instill discipline in engineers to think about requirements carefully and to avoid hacking more and more into Groovy Jenkins libs than one would spend time actually doing work.

For a lot of what you’re describing Rundeck was built for this common case as essentially a central job portal, and at the least it supports more integration options than Jenkins. It’s difficult to wholeheartedly recommend it still given its surprisingly low rate of adoption. The market now is that for an OSS product a decent web UI with authentication is basically something to charge like it’s an enterprise product, and so lots of start-ups even today will begrudgingly fire up Jenkins in 2021.

I’m of the opinion that the time fighting Jenkins jobs and writing Groovy for common, trivial tasks in most CI systems is not acceptable in a start-up situation where strong focus and minimizing distractions / non-core work is so important for success. The kind of code to define a build doesn’t make sense for orchestration (it’s not just a digraph but about how to react to events and edge transitions) but it’s exactly how Jenkins DSLs work out.


I'd also like to know. Many of them are clunkier and have smaller communities, less active development, etc. Far from a clear win.


Rundeck comes to mind


Jenkins is endlessly flexible, but at the end of the day its complexity adds such an upfront cost that less flexible tools win outright when the metric you care about is the “ability to get shit done without giving a fuck about groovy sandboxing”.

A system like Gitlab-CI is less flexible than Jenkins but because it makes 99% of the use cases you have for a CI system take 99% less effort it wins hands down.

Combine it with a Kubernetes-based executor and you have a scalable, isolated, reproducible and flexible CI system that requires basically no maintenance and most importantly is about as approachable and understandable as you can possibly get for a CI system. It’s simple shell commands vs AbstractProxyCSPFactoryGroovyBean classes.


After a while it got easier for us to write our own code to automate things than to try and piece together a wordpress-like component system.

We still have Jenkins running somewhere, but I can't remember the last time I needed to run a job on it.


If you haven't looked in a while, it's probably not running, or if it is, it's seven incompatible plugin upgrades away from actually working.


> Combine it with a Kubernetes-based executor and you have a scalable, isolated, reproducible and flexible CI system that requires basically no maintenance and most importantly is about as approachable and understandable as you can possibly get for a CI system

You can also set up a docker image with all the Jenkins related bits installed (including any custom/specific setup) and then that can be integrated with whatever Kubernetes or Docker setup you need. Jenkins also has the concept of slave executors, and they can be deployed to any build nodes (fixed or on-demand).

> It’s simple shell commands vs AbstractProxyCSPFactoryGroovyBean classes.

No idea what this is about - you don't need to write Java or Groovy classes to run Jenkins jobs. One of the most common uses of Jenkins is to run steps via scripts (Shell/Python/Perl/whatever).
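For example, a typical Jenkinsfile is nothing but shell steps, optionally pinned to a container image - a sketch, with a placeholder image and commands; the docker agent needs the Docker Pipeline plugin:

    pipeline {
        agent { docker { image 'node:18-bullseye' } }   // every step below runs inside this image
        stages {
            stage('Test')  { steps { sh 'npm ci && npm test' } }
            stage('Build') { steps { sh 'npm run build' } }
        }
    }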

Sure the UI looks a bit dated but from what I have seen, it hardly needs much maintenance either.


we've had great luck with gitlab-ci. Jenkins is just a nightmare.


I see why Jenkins is complex, but a nightmare? I use and manage a rather complex setup and hardly ever had problems related to Jenkins itself.

I do advise limiting users' access to the internals of the framework (not the jobs, but rather the core or plugin installations) and leaving that to a small number of people


The thing is, Gitlab & co. almost all mandate Docker.

If you don't use Docker, the list of options narrows down dramatically.

If you need to support non-Linux OSes, same thing.

If your needs are limited, yeah, something other than Jenkins is better. If they're not... I don't think there's a better alternative to Jenkins.


Gitlab CI doesn’t mandate anything - you can run it on a “bare metal” server. However if you use a kubernetes executor it mandates that your builds run inside a container image, which is fantastic for reproducibility and visibility. Everything is versioned and can be customised at the repository or branch level with very little effort.

Of course this limits some more esoteric builds that might need dedicated hardware or for whatever reason cannot run inside a container, but you can mix executor types if you so please.


Do they support Windows Docker containers?


Yes, GitLab CI supports Windows Docker containers, see https://docs.gitlab.com/runner/executors/docker.html#using-w...


Gitlab CI does not mandate docker. The gitlab CI agent will happily execute directly on the host without starting docker containers.


Gitlab doesn't mandate anything. Also, even if your CI pipeline does use Docker, it doesn't matter if "you don't use Docker" _as a team_. Your pipeline may name an image, and that's it. That's way different than maintaining your own Dockerfiles and images.


Jenkins has always rubbed me the wrong way. The quality of plugins, the dated UI, Groovy. It just feels out of place and bloated every time I use it. Being a Java app doesn't help either. It reminds me of Jira, where you have to hack it to death to make it "fit in" to common workflows. I'll take any of the alternatives over it.


Despite everything you said being true, it's still one of the few open-source, free solutions in the mobile build space.

I'm wondering if Buildkite or something newer is comparable today, but for a long time, Jenkins was one of the only non-custom ways of building an in-house iOS/Android build system.


At Yahoo (Verizon Media), the CI/CD team started with Jenkins, then built a UI to manage all the instances, then wrote their own engine that replaced Jenkins entirely. It’s a good product:

http://screwdriver.cd/


The problem is that the website presents me with almost zero information about the product.


Is there something about mobile builds that makes the standard Jenkins competitors like GitLab, Drone etc. unsuitable?

Even hosted Gitlab (gitlab.com) lets you bring your own runner to use whatever arch you need to run the jobs on.

I’m sure I’m missing something here though.


Large cross-platform native builds generally, not just mobile, kinda suck on the usual cloud solutions. The cloud stuff's amazing, performs well-enough, and is easy to configure if your workflow is... containers, and that's all. It gets rough in a hurry once you're much outside that, and especially with large (=long build time) apps. Many can accommodate native builds, but for anything non-tiny you'll experience slow builds or go over plan limits (and see spending skyrocket) in a hurry. Anything that requires large assets at build-time, or anything else for which it's nice to have persistent state between builds, is right out. You'll end up spending a bunch of time (=money) re-configuring your build to fit the tools, so performance is merely bad and not astonishingly, uselessly bad.

You're also more likely, with native builds on cloud CI, to run into "yeah, we support that! (I mean... we shipped it, anyway, pay no attention to all the bugs and rough edges)".


The thing is, none of that is true. Even for the cloud hosted providers (Gitlab at least) you can install your own native runner and connect it to your CI system. Then run any weird combination of commands you want.


I've not used buildkite (we don't use git), but we're moderately happy with TeamCity. It's definitely not perfect, but it's orders of magnitude better than Jenkins.


TeamCity is expensive and when we evaluated it we hit a significant bug that they couldn't fix (if you rebuild a snapshot build from early in a long chain of snapshot dependencies, it would run a quadratic number of rebuilds rather than rebuilding everything once in topological order as Jenkins does).


I've found TeamCity's pricing to be pretty reasonable; It's free to start, $300 per extra agent after 3 agents, or $2500/year for support with 5 build agents (which is where we're at right now). It's also 50% off for startups. Compared to any other bring-your-own-agent solution, it's very reasonable.

The dependency bug sounds rough; have you got a youtrack link to it by any chance?


> I've found TeamCity's pricing to be pretty reasonable

Well, compared to Jenkins it's very expensive :). I guess the cheaper options may be newer, or maybe that employer ran the evaluation in a dysfunctional way (wouldn't be the first time). All I can say is that was a consideration for us.

> The dependency bug sounds rough; have you got a youtrack link to it by any chance?

No, this was three jobs ago and was all handled verbally with their representative (who gave off a pretty unprofessional vibe, frankly, so may not have recorded it in youtrack even if that's their policy). If you want to test it out, set up something like 10 maven projects with SNAPSHOT versions all depending on each other, and set to rebuild on commit or when their dependencies change: B depends on A, C depends on B (and so implicitly on A), D depends on C (and so implicitly A and B), etc.. And then commit a change to A and see what happens. Whether you think this is a good way to set up your projects or not, it's what we were doing at that place, and Jenkins handles it fine (it will rebuild A, B, C, ... in order) whereas TeamCity (at least when we were evaluating it) will rebuild A, then rebuild B-L, then rebuild C-L because B has changed, and so on.


It's even better than that, because you can pay just once and stay on the same version forever, as long as you don't need new features and support. And JetBrains gives 50% discount for license renewals, which makes it not $2500 but $1250/year – I believe this applies to all JetBrains products, not only TeamCity.


I could be wrong but isn't Jenkins at its core just "wait for a job trigger (git push/commit to remote origin), then run some shell scripts"?


Not really. Jenkins allows very fine control over job and step orchestration. I realize it might appear as a dated, glorified cron, but it's really not. There's nothing you cannot do with it, which is more than I can say about any other CI system I used.
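For instance, fan-out/fan-in and triggering another job with parameters are both first-class - a sketch, with a made-up downstream job name, make targets and parameter:

    pipeline {
        agent any
        stages {
            stage('Tests') {
                parallel {
                    stage('unit') { steps { sh 'make unit' } }
                    stage('lint') { steps { sh 'make lint' } }
                }
            }
            stage('Deploy') {
                steps {
                    // trigger a downstream Jenkins job and wait for its result
                    build job: 'deploy-staging', wait: true, parameters: [string(name: 'VERSION', value: env.GIT_COMMIT ?: 'unknown')]
                }
            }
        }
    }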


To throw my hat into the ring: the best "runnable jobs" / "digraph of jobs" system I've ever used is Gitlab's built-in CI system. The UI is nice, and it's got tight integration into our existing code workflows, meaning there's a close-but-not-too-close association between code and jobs: it's easy to say that "the code and the jobs running that code live 'in the same place'" if you want that. And it still provides all the pipeline features you'd want if you want to use your CI/jobs as a cornerstone of your infra; features like the ability to schedule pipelines, to trigger pipelines via other pipelines, remote triggering of pipelines via cURL request, pipeline DAGs for a 'make-like' build experience, etc.

I've absolutely loved using Gitlab CI for the last several years, and I highly recommend it.


My experience is the opposite. Gitlab has a lot of features but they are all half-baked. Every time I try out a feature, I feel like I get bit for doing so.

A couple of cases:

- At first child pipelines couldn't access artifacts from the parent job. Now you can opt-in to it but nothing enforces sequencing

- Some features only work with "needs" (getting parent pipeline artifacts) while others only work without it (depending on an entire stage)

- When a job you "needs" doesn't exist, instead of having a sane behavior (e.g. skip yourself), the pipeline errors. Correctly propagating and maintaining all of the rules in a moderately complex pipeline is ... difficult.


> At first child pipelines couldn't access artifacts from the parent job

Why would you use the tool like this? Make jobs / blocks that generate artifacts, and then chain them together into a workflow so there aren't hierarchical issues.

Now, maybe I am working in a simplistic world, but: the only time I have parent / child relationships is when a repo or service is built and another pipeline is invoked to run integration tests or deployments. What situations have led you to needing to pass artifacts between pipelines and not job stages?

> When a job you "needs" doesn't exist, instead of having a sane behavior ... the pipeline errors

You and I may work in very different worlds, but I personally would be quite offended if a dependency graph workflow executor saw a node with an explicitly specified dependency missing and proceeded to NOT error.


I think there are missing concepts and features in gitlab. And the advertised features don’t really match reality. But it’s still way better than Jenkins, as you can work around any issues and it’s clear how it works. It’s just a yaml file.


We (Cronitor) are seriously considering diving in here. People already install Cronitor to remotely monitor their background jobs, we have a lot of the UI and data platform challenges solved, and we are building a control plane to allow you to securely invoke your jobs remotely (via polling). I'm not totally convinced we should go this path vs better and deeper monitoring and metrics capabilities, but we're contemplating it because, honestly, Jenkins needs to be replaced.

(Readers, if that sounds interesting, and you have some Kubernetes and Golang experience, we are hiring!)


I hate jenkins groovy so much


Yeah, Jenkins can be really unpleasant. Groovy as a language isn't so bad - I just don't use it anywhere else. So... like a few other languages, I have to look things up every time I need to mess with it.


What are good alternatives?


GitHub Actions and GitLab.

I personally found both of those to be paradigm shifts (I went from Hudson to Jenkins to Travis CI to GitLab to GitHub Actions). Travis CI introduced self-serve test infrastructure configuration-as-code, GitLab introduced composability and API integration, GHA introduced many more API features (like various kinds of event sources) and better composability. Each transition also took more hosting headaches and compatibility issues off my hands.


+1 to this. Travis, Circle and Jenkins are entirely in the rear view in this area for me now. My only concern about Actions has become that I use it too much.


I have very little experience with GitHub actions but it seems much more limited and, even worse, it ties you to a single vendor


We run everything through TeamCity. Not going to claim it's the best tool in the world but it's certainly the best I've used, and it's fairly straightforward to work with!


We use TeamCity here as well. Pretty limited in the options we have since we have most of our stuff in Perforce.

I definitely prefer TC over Jenkins though. The Jetbrains support for TC is also really good and they have been helpful whenever we contacted them or opened an issue on their public tracker.


TeamCity works well but can be a bit expensive.

It has a Kotlin DSL that is quite powerful and flexible, but of course you need to know your way around Java to make the most of it.

Custom plugins is also an option if you need something special that is not provided out of the box.

Even though I have moved on to Azure DevOps Services, it's still good to see that JetBrains have fixed a 10 year old feature request.

https://youtrack.jetbrains.com/issue/TW-17939


I would rather use Jenkins for all of eternity than use TeamCity again. TeamCity is probably the worst CI/CD product out there.


Can you give some examples?


It's been a while, but basically horrendous UI/UX, an unintuitive way to create jobs and manage variables, things like that. I don't know if it was completely rewritten in the last 2 years but it was dogshit awful before 2019.


Concourse: https://concourse-ci.org/

It takes some getting used to, but I really got to love its concepts for resources/inputs/outputs and how they work together to get you actually reproducible builds.


I must admit that I got put off by the lack of abstractions for the common use cases. The resources/input/output model is so primitive that it almost feels like programming a Turing machine.

Edit: Took a look again: The situation might have gotten better since I last played with concourse


It doesn't parse tests. It doesn't track test success/failure/runtime. Because it doesn't parse tests, it can't show you the output from only the failing tests. The UI is horrid. It's so far inferior to almost all its competition, I have no idea why anyone would use it.


Can you explain a bit more what you mean by "it doesn't parse tests"?


Concourse explicitly wants to not know anything about the outputs of your jobs. Most of the other ci products will have _some_ method (integration with common test runners, markers in output) to figure out how many tests you've run and display them as a collapsible outline, or something similar.

Because the ci system is aware of your tests, and perhaps a filename/test description, it can keep stats on them over time.

This is an ancient version of teamcity [0], but it shows how it knows how many tests were run, how many passed, and that there's one "new" failure in this run vs previous runs.

This lets it tell you which tests are slow, which are flaky, all sorts of history of those wrt the state of the associated repo, etc.

Not having these sorts of things once you have them is like flying blind.

[0] https://i.imgur.com/ywB7u11.png


Really love the expressivity of the config syntax and the domain model in Concourse.

Seems to be a bit harder to integrate with other systems (like GH check results), and alas the community is pretty small (which probably factors into the first point).


Zuul CI. It works great at small scale and scales up very easily. It's the CI developed for openstack but it's also used by BMW and Volvo Cars, for example. https://zuul-ci.org/


Bamboo from Atlassian is really good IMO but obviously not free.


If you are an Atlassian shop, then yes. Otherwise, don't even bother


I like Drone, and run it on my own systems. It integrates nicely with Gitea.



+1 for drone.

Note that it's intended for dockerized deploys, so it won't fit if you're not deploying containers.


That’s not true. It runs whatever you want, inside a container with your code. You can deploy from within that container however you like. You can run tests that just pass/fail, and throw away the container result. You can ssh or rsync things from within the container to other remote hosts, and throw away the CI container. It uses containers to ensure build consistency, but you don’t have to use them to deploy containers just because your jobs run in them.


Oh thanks for the correction!


Drone Enterprise is fantastic. Rough around the edges and the docs are really lacking, but it's just so god damn simple and effective.


Gitlab.


Both Gitlab and Bitbucket Pipelines fall under the "sufficient" category in my book. I've used both in large production deploys and they, for the most part, get the job done. Gitlab has moments where it doesn't work - and one needs to go check the status page... and at times report an outage. But honestly, I still place it in higher regard than Jenkins.


Yes, gitlab is so, so much nicer to run than Jenkins.

Sure there is something missing for more elaborate build pipelines, but not enough missing to block you.

The best thing about gitlab vs Jenkins is that there are no plugins. So it's very clean what is gitlab and what is your config of it.

Whereas Jenkins has so many plugins and config for the plugins that it’s hard to say where Jenkins ends and your build pipelines begin.



Oh no, my username is an actual product :/

Though this looks pretty awesome!


Buildbot as a project has been around since 2003... with a username like that I am surprised you hadn't heard of it before!

Buildbot was the first CI I ever deployed at a company a long time ago. It came before Jenkins!


Jenkins (when it was still just Hudson) was the forerunner for CI. It made a lot of mistakes but only because nothing else was there at the time to learn from, in my humble opinion. The only alternative in the early days was CruiseControl (which became GoCD) and of the two, Jenkins was far better and more advanced.

I made extensive use of Hudson/Jenkins from 2007 to about 2015 and it has flaws, sure, but I know it made some pretty difficult tasks sane for me, and was pretty straightforward building CD pipelines.

I didn't (and still don't) like the decision to adopt groovy but it is better than configuring via UI and I like that it is imperative, at least.

Some of the jobs I built using Jenkins pretty rapidly include full CD from svn push/git commit to production with all the bells, whistles, gates, and stages in-between, to managing failovers, and even my early foray into IAC with rudimentary remote exec scripts and the likes of chef.

I think it became a victim of its own success. It was _flooded_ with contributions that yanked Jenkins in all kinds of directions with no clear owner for direction and maintenance, which led to frankly some horrific (but "working") architecture within, and a jumbled mess of extensions vs plugins vs patches and all kinds of horrific UI changes, often all conflicting.

I believe it was Gojko Adzic that wrote up a blog article (that I can't find) about 10 years ago listing some of the truly horrendous code in Jenkins source. Stuff like abstract classes typecasting themselves to derivatives to access members.

Looking back Jenkins was and is clunky, messy, uncertain of its purpose in life. But so was just about everything to do with CI at the time. Would I use Jenkins again in 2021 and beyond? Probably not but it definitely added value to the build technosphere.


I think you're talking about this blog post: https://web.archive.org/web/20110410011410/https://gojko.net...

I took a peek at the code and it looks like nothing has improved since then. For instance, the Hudson class is now deprecated, and became an empty shell inheriting from Jenkins - which is still a singleton with a public constructor, only now you have to know that the instances of it created must actually be Hudson instances since they're being downcast to that all over the place... Ouch.


Good gravy, this is a lot to unpack. It's an alarming story from the very beginning, and a cautionary tale of how tempting it is to do everything with Jenkins, even though it's an appropriate tool for absolutely nothing in the Year of our Lord 2021.

> As part of our automation setup, we continuously run integrity jobs to inspect our Jenkins nodes.

Why on earth would you self-host this in Jenkins? This is a monitoring and alerting problem.

> These jobs check system configurations and properties and look to see if any node is failing those checks.

What year is it? We've solved this with immutable infrastructure or system integrity monitoring. Or both.

> The checks automatically mark Jenkins nodes as offline when any of those checks fail and notifies our Mobile Build & Release team via a Slack message.

"Mark" offline? Why not just terminate it? And why do we care if build nodes come and go? These should be cattle, not pets. If they all die at once, that's bad. If they're cycling in and out, that's business as usual.

> When our Jenkins UI stopped working, we noticed two things:

> 1. We had recently upgraded Jenkins and all its plugins to a newer version

Did they just now learn what an awful idea this is? All of this at once, really?

This isn't so much a Jenkins problem (though let's be clear, Jenkins is a problem) as it is a remedial engineering problem. The top takeaways should be "choose appropriate tools for the task at hand" and "don't make reckless decisions with brittle systems".


> "Mark" offline? Why not just terminate it? And why do we care if build nodes come and go? These should be cattle, not pets. If they all die at once, that's bad. If they're cycling in and out, that's business as usual.

Given that they are for mobile builds, there might be some macOS nodes in there for iOS builds. These might be in-house machines they maintain -- or, if they use a cloud provider, there might be costs to just killing and spinning up nodes. For example, for EC2 Mac instances:

> EC2 Mac instances are available for purchase as Dedicated Hosts through On Demand and Savings Plans pricing models. Billing for EC2 Mac instances is per second with a 24-hour minimum allocation period to comply with the Apple macOS Software License Agreement.


if that's the case, just restart the failing nodes

and of course it's not that simple, they still have to customize the workflow


I think it's a frog boiling problem.

I start with building my code, then deploying it, then verifying the deployment, a few smoke tests, regression tests, pretty soon all of those concepts are crowding in on the brainspace of monitoring.

It's just one more thing, why slow down to learn a new tool and convince people to use it?

These days it's getting easier for me to requisition a machine to run a dev tool on. That hasn't always been the case, and I'm sure it's not the case everywhere.


It's horrifying that Jenkins is still the industry standard. The whole thing is poorly documented and full of cruft and vulnerabilities. But there is nothing out there as flexible to my knowledge.


Most of the flexibility comes from the plugins which are a security risk. Either they are just abandoned, have very old dependencies or just don't sanitize inputs.

You'd think Cloudbees would take over abandoned plugins, integrate into the main code or just remove them from the repository for safety but they just let them rot.

We had one of the plugins we use break after an upgrade because of the dependency hell in Jenkins, so we ended up contributing to the plugin to remove the dependency. Thankfully the maintainer was still around to verify our fix and update the plugin repository (we obviously built it locally and tested).

To think that anyone could adopt an abandoned plugin (which could have 1M installs) and just insert some malicious code, with minimal or no oversight, is really scary.


It's the Nagios of CI tools ;-)


I have to estimate time and materials for a lot of DevOps contracts. I estimate twice the hours, or more for Jenkins CI work vs GitLab CI even if my engineer is an expert in groovy. The complexity of Jenkins adds a huge amount of risk.


Circa 2009, back when it was Hudson (I think?) I once had an idea to rename MyJenkinsProject to !MyJenkinsProject in order to move it to the top of the list alphabetically. When I hit save, the UI explained that this wasn't possible _and that I shouldn't be putting dangerous characters in my project names_. Not to be pushed over so easy, I tried again with a skull and crossbones unicode symbol () in the name. The UI immediately became unresponsive and wouldn't start again until the project was removed.

edit: Interesting HN also stripped out the character: https://www.fontspace.com/unicode/char/2620-skull-and-crossb...


Lexicographic ordering puts non-ascii unicode characters after ascii characters


I ran Jenkins at 2 companies for probably a total of 13 years.

I will do everything in my power to never use it again as I feel that I wasted so much time fighting it.

Plugins, dependency hell, slow UI and so many other terrible things about it. Having to back the entire thing up on every change because you may never get it back into a usable state if you change something. That even happens if you have scripts in git to set the whole thing up. What a waste of time.

In contrast, I had just as good of a system running in Gitlab in a week and more importantly the other developers are able to pick it up and extend as they wish.


We moved to drone after many years of a love/hate, abusive relationship with Jenkins (junkins was the common name).

Drone has this awesome feature where you can have it hook out and receive a pipeline on the fly. We now generate our pipelines in an api and this way we can write the logic in "something other than groovy" - typescript in our case.

No pipelines required in repo and everything as code without ugly hacks.

Never looked back even once.


I would simply stay away from Jenkins if you are getting started from scratch. I have used it for months, and the amount of effort you have to pour into it, only to still get scaling and outage issues, is not really acceptable.

In contrast when using Buildkite [0] you get essentially all the power and flexibility of Jenkins, but without the crushing technical debt, inherent flaws and complexity. Benefits I have seen of Buildkite over Jenkins:

- Never have to worry about scaling a Jenkins master again

- Build history lasts forever, so no need to setup a system to save logs

- Everything can be in your repo, so you are always confident of changes and testing is easy

- The UI is easy to understand and you can link directly to failed log lines

- Control your own workers, and easily setup autoscaling without any drama

- Flexible plugin and annotation system allows for extensibility

0 - https://buildkite.com/features


> It gets executed in a special Groovy sandbox to increase the security posture

Not a jenkins user but I am really curious about what the perceived security issue is here? Why are there all these layers of protection placed on what is presumably internal infrastructure? I can't think of similar protections being applied to other CI systems where you can run arbitrary bash commands and containers (aka: do anything you like). It seems to be one of the most common pain points, but I can't quite understand why it's there in the first place.


Because you can embed credentials in a Jenkins installation (and when you use Jenkins, we do.)

It inevitably gets connected to resources which can cost money, and since it's an internal infrastructure system, it will inevitably be connected with resources which contain replicas of private information. "Because why not, it's an internal system which means it's perfectly safe, and we should really be testing with real user data, you know, for realism."

And once you cross that bridge, you have wholly and truly gone completely radioactive. This is the intersection of financial risk/attack surface and private customer data, and execution of unproven code. If we send new Groovy scripts through an approval process, we can at least narrow the risk of accident or intentional disclosure by manually vetting scripts before they are run for the first time after changes, (or effective sandboxing so they cannot escape and access for example, any secrets that were not intended for them.) But, roundly, one can argue there's not much to be done, as it is an internal system, and each control we place in the way can easily be seen as an obstacle to getting the job done straightforwardly.

Then think of this also: maintaining Jenkins is known to be an operational burden, to say the least. If you have different departments with their own independent need for Jenkins or something like it, neither of these departments is going to want to own the Jenkins instance if it means being generally responsible for uptime and upkeep. You can bet that someone higher up is going to see this as a tremendous opportunity to consolidate and save money. They're going to wind up running on the same environment together, operated by someone who has no idea what either of these departments is really up to.

If you're lucky it's only two tenants, and that shared-service admin department responsible for the Jenkins instance is going to actively pursue the common interest of keeping things secure on behalf of everyone. What's more likely is, shared services platform team only ever hears from their Jenkins customers when something has ceased to function and their Admin access is needed, and for those people it's basically hair-on-fire can't-work-until the admin guy can be reached. After ten or fifteen times and some political pressure, their boss says this is taking too much of our department's time and we're not adding value, so they decide to shirk the admin duties, and give admin credentials to a "designee" from each team.

Now through diffusion of responsibility, nobody is in charge, and everyone is too afraid of breaking anyone else's stuff to ever upgrade even the most basic stuff.

That's right, now you have potentially private customer data, from multiple departments, with access to secrets that may control resources which can be scaled up to cost money, in an environment where code runs for the first time (potentially even code from third party contributors, but not likely if it's an internal instance... right? ...) code running before it has been tested by anyone, in a multi-tenant environment that nobody wants to pay or spend time to maintain, with competing czars that don't really have much incentive to talk to each other, where nobody has enough visibility to safely upgrade anything for fear it will break something and step on somebody else's toes, and if it goes down or stops functioning for basically any possible reason because of any of those teams, the maintenance of business is completely halted until we take care of it; really it's basically even worse than having cats and dogs living in the same house.

Sure, you can implement policy to solve any one of these things individually, but if you write enough policy to make it truly safe, then people will hate you for all the red tape required to work with it, "why can't we do the easy thing which is technically possible" and as you can see, there is so much wrong possible that even the best policy is going to have to concede some things, to make this tremendous expense worthwhile "otherwise why are we even using (footgun) if I can't easily shoot myself in the foot whenever I need to?"


so then somewhat reading between the lines of your answer, the core of this issue arises from the fact that the groovy code is executing "in-process" with the rest of the jenkins server whereas other CI/CD solutions execute whatever customisations you are making outside of the actual server process itself, so there isn't a possibility that the code can "see" into parts of the code it shouldn't be able to.

I am guessing that this enables Jenkins plugins to do a lot more than they could from outside the server process (literally see and manipulate everything that goes on inside) but it does make me wonder if indeed the tradeoff there is worth it and whether for the more routine customisations it wouldn't be better if it was running these scripts outside of Jenkins itself.


I did a technical review of our Jenkins solution with a friendly neighborhood InfoSec agent once. He was interested in a few things:

1) how is the stored credential being encrypted (is it encrypted at rest),

2) is it real encryption (like, does it use an actual encryption algorithm or is someone confusing base64 or rot13 for encryption)

3) where is the encrypted data stored, where are the keys stored, is it properly on a machine that is not situated directly on the internet (users must access through VPN)

4) can I review the code that decrypts the secret (yes you can, there are plenty of walkthroughs and gists published by many different Jenkins fans and users that show where Jenkins stores its encryption keys, and how you can borrow them with POC code you can use to decrypt individual creds or even the whole store, assuming you have admin access to Jenkins)

He didn't really ask or care about sandboxing at all. There are mitigations you can apply for any threat model, but the crux of the matter is that even InfoSec assumed that any Jenkins users would probably have admin access anyway, and that they should all be allowed to decrypt those secrets.

Which happens to be exactly what they tell you in articles about how to manage your Jenkins server anyway:

https://www.codurance.com/publications/2019/05/30/accessing-...

> 7. Treat all credentials stored in Jenkins as plain text

> Treat everyone with access to Jenkins as an admin user, and you will be fine. Once you give someone access, even read-only, to a Jenkins it's game over. All developers on a project should know all secrets anyway.
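(To make that concrete: with Script Console access, which is effectively what admin access gives you, reading a stored secret back out is a couple of lines of Groovy - the same approach those published walkthroughs use. The encrypted blob below is a placeholder.)

    // Manage Jenkins -> Script Console, assuming admin access
    // decrypt a blob copied out of credentials.xml (placeholder value):
    println hudson.util.Secret.decrypt('{AQAAABAAAA...}')?.getPlainText()

    // or just enumerate what is sitting in the global credential store:
    def store = com.cloudbees.plugins.credentials.SystemCredentialsProvider.getInstance()
    store.getCredentials().each { c -> println c }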

The sandbox is just a mitigation that has been built in so that people who run untrusted code can do so with some degree of safety. Jenkins servers that may be exposed to untrusted contributors are a completely different animal than internal Jenkins servers. Different use modes experience different threat models differently.

The fact that you seem to believe there is a technical mitigation possible which would save this is potentially part of the problem, too. You're obviously a well-educated person. Organizations of sufficiently large size that settle on Jenkins can fight an uphill battle here, since many mitigations can only reduce risks, but cannot eliminate them. Leadership roles may not be inclined to understand the threat model issue, there is a dichotomy experienced when Jenkins is used to build software in the open vs closed developer models. Running builds for any PR from a random fork of a public repo is not the same threat model as running builds for any branch or tag in a private repo, from an IC with write access. And yet the mechanical actions are quite the same, and there are only about two checkboxes in a typical configuration separating these models from each other.

I hope we are rapidly reaching a place where companies can participate in both Open and Closed style of development, but if they are supported by Jenkins, I hope it's at least two separate Jenkins servers, because the threat models and risk mitigation that are needed for closed-access teams and open-source style public benefit repos are both completely different, even if their activities are not different.

Maybe I am overstating the risks, but I have been a Jenkins admin before, when it was not in my job description, and it was not supported by leadership, who did not believe in the benefits of open development, and made their posture closed by default (as companies are often wont to do.)

And as a result, I'm afraid, before I left that job behind, we dug the hole so deep in the wrong direction... if I could go back and do it again, I would certainly draw some bright lines in different places than we did, and I would not allow customer data in the dev instances. (You can do that kind of testing in a staging environment, where similar controls as production can be applied and reliably maintained. But you should do this as far away from Jenkins as humanly possible!)


Thanks for all the thoughts - great to have insights from someone with so much experience!


Thanks for coming to my TED Talk

(Seriously though, I hope it will get accepted and if so come to my talk on this at Kubecon?)


Jenkins is great because you install it (easy), start it (easy), and anyone (junior to senior) can have it doing useful work within the hour. It is a single place where multiple people can collaborate and self-service. There are special purpose tools for going deep into single aspects of it (scheduled jobs, CI/CD, deployment) but as an all-rounder it's really terrific and the price is right.


I use Jenkins as a glorified cron runner in some contexts. One of the things I hate the most is that it's difficult to define jobs as code (the 'Job DSL' plugin works in most cases, but if you're using certain plugins it's hard or impossible to configure them).
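For the glorified-cron case Job DSL does cover it; a seed script is on the order of this (a sketch - the job name, script path and recipients are made up, and the mailer publisher assumes the Mailer plugin):

    // Job DSL seed script: a freestyle job that runs a script on a schedule
    job('nightly-report') {
        triggers {
            cron('H 3 * * *')
        }
        steps {
            shell('./scripts/nightly_report.sh')
        }
        publishers {
            mailer('team@example.com', false, true)   // recipients, dontNotifyEveryUnstableBuild, sendToIndividuals
        }
    }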

What does everyone else use if self-hosting is a requirement but you don't have an enterprise budget?


Gitlab? Hasn’t done me wrong yet, even if it feels kind of overkill if you need only CI/CD.


For the cron-like part, Rundeck. My builds are still in Jenkins but it's much easier to maintain when it's just doing builds, not workflow orchestration.


I feel so bad for them. Jenkins is really a technology anti-pattern. I keep saying I'll do this, but I really need to write a series of blog posts to elucidate all the ways in which Jenkins is just bad for your business. If you can use any alternative, do it, and for Bob's sake, pay for a solution. Stop trying to cobble together some crap with shitty free tools when CI/CD is critical to the velocity, quality, and reliability of your products.


As a counter point, we had major issues with just paying for it via CircleCI. Excessive downtimes, a UI so slow it was nearly unusable, etc. We decided at the time we couldn't bet our company on it and we moved to Jenkins. We have far more control, build time and build costs are lower, uptime is better (All of this at the expense of initial dev time and maintenance of course). Generally we are a SaaS company, but paying for a solution just doesn't always work when you get larger and when there are limited options out there.


Try gitlab. You can run it yourself. Great uptime.

And best of all it’s much easier to run and upgrade than Jenkins.

Jenkins and its plugins can get quite brittle, and it's quite an art to automate the installation and config of all those plugins.


I'm interested in trying Gitlab, but this does not look easy to run (from their docs): https://plantuml.gitlab-static.net/png/U9oDLbrlsZ0KVVUli9gNB...


Unless you’re a very large organization, you’re not going to be managing each of those components individually.

The “Omnibus installation” as they call it is pretty easy to use. Just bring a Postgres DB and a redis instance.


You don't even need to BYO databases - they have an omnibus docker compose setup that runs everything for you.

https://docs.gitlab.com/omnibus/docker/#install-gitlab-using...


The discussion chain is everything that's wrong with modern engineering. A complex topic progressively simplified and abstracted to just "Run this container", ensuring no one in your org actually understands how it works, how to maintain it and how to debug it.


The GP was interested in "trying" Gitlab. If you want to try it out, but are wary of the time involved, isn't a docker-compose installation a great solution?


Gitlab is a complex system. I wouldn’t expect anyone to suddenly understand my company’s product either.


Modern Jenkins using Jenkinsfiles and cloud provisioning can work pretty well.
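For example, a minimal declarative Jenkinsfile is just a handful of stages (a sketch only; the agent image and shell commands are hypothetical, not taken from the thread):

    // Declarative pipeline sketch; assumes the Docker Pipeline plugin
    pipeline {
        agent {
            docker { image 'node:18' }           // hypothetical build image
        }
        stages {
            stage('Build') {
                steps { sh 'npm ci' }
            }
            stage('Test') {
                steps { sh 'npm test' }
            }
        }
        post {
            failure { echo 'build failed' }      // hook for notifications
        }
    }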

All things being equal, it's my preference to not use Jenkins for new projects, but anti-pattern is IMO a significant overstatement.


Can you expand more on how it would qualify as an "anti-pattern"? I agree it is slow, has issues with its built-in coverage and capabilities, and has an old-school UI; but at its fundamental core it is a pipeline runner. It is even a decent pipeline runner, which, when it comes down to it, is the core of every other CI product [that I've seen].

So to hear it described as an "anti-pattern", when realistically it seems to BE the pattern - just poorly executed - is a bit unintuitive to me.


Probably not possible to describe. Jenkins is just a tool; used wrong, it will bite its operators. The problem with Jenkins is that it takes time to set up the workflow: you configure a plugin-install tool, JCasC, Job DSL, shared libraries, some credential store, etc. But once it's up and running, oh boy, it's a factory. I have run a Jenkins with 5k+ jobs, all auto-generated, with no manual intervention. GitLab CI (same as GitHub Actions) I like for its opinionated approach: it makes things easier if the setup is simple, but when you need exceptions or special cases, the hacks begin.

But, yes, I have seen so many badly implemented Jenkinses.


(not the person you replied to, but allow me to give a couple of personal annoyances - keeping in mind that they're a couple of years old and things may have improved since)

My main problem with Jenkins was that its architecture made it extremely difficult to automate its provisioning without having to click through the UI at all.

This led to the second big problem, which is that updating either Jenkins or one of its plugins whenever they got a new CVE (which was every other day) was quite stressful, because you could never be sure if something would break - especially for plugins that depended on other plugins (case in point: this post).

I have since moved to Concourse, which has a much saner architecture - at least for these things.


1. it isn't designed as a cloud-native configuration-as-code immutable service. The way it stores and loads configs, jobs, logs, build workspaces, etc is all 1990s tech. Every modern replacement does these things much better. These inherent design flaws set up all the later problems.

2. configuration as code is an afterthought, so it doesn't work very well.

3. the only way to manage Jenkins as described initially requires learning four different DSLs, although developers who just write jobs only need to learn three (JobDSL to load jobs from JCasC, Jenkinsfile for simple pipelines, Groovy for complex ones). This is ridiculous.

4. the plugins are atrocious: there are too many of them, they often don't have good enough features, and managing and upgrading them is always a pain.

5. CloudBees doesn't even maintain the core stuff correctly. The current Jenkins container comes with a new plugin manager which, by default, does not respect pinned plugin versions. That's literally the most basic thing you can do for operational stability. I filed a bug in January, and they didn't feel like fixing it, so I got them to merge a note at the bottom of their README instead mentioning the bug.

6. pipeline libraries are a costly maintenance and development pain. Having to write Groovy code just to write pipelines is horrible (a minimal sketch of what such a library looks like follows at the end of this comment). Jenkinsfiles are, although much better than Groovy, still an over-complicated, unintuitive mess.

7. there's no simple way to deploy, maintain, test, and upgrade a Jenkins cluster. You have to maintain multiple clusters, increasing cost and complexity.

8. since most people don't set it up right (because it is so overcomplicated), the jobs, server configurations, and build history are not backed up, and there's no version control. So when something goes wrong, the whole thing is hosed. Unless a Jenkins expert took 6 weeks to set it up perfectly.

9. due to all the above problems, you end up with a million different Jenkinses, all in various states of insecurity, brokenness, and wildly different configuration, making them incompatible with each other. This makes for a gigantic maintenance cost that never ends.

10. literally all of it is completely proprietary to Jenkins. Unless you build it and the jobs in a very particular way (which makes it impractical to use) none of it can be re-used in a different system.

That's off the top of my head. There's more reasons.

The point is, organizations will invest literally thousands of man-hours in making Jenkins work, slowing down their product development, forcing everyone to use this old-ass over-complicated piece of junk. If they took the same amount of cash they could buy literally any proprietary CI/CD system and do everything much faster and better. But the organization doesn't see the hidden costs until it's too late and they desperately want to replace it. Jenkins is not just bad, it actively holds back your organization.
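For readers who haven't seen one, a shared pipeline library step is literally one Groovy file per step, plus a Jenkinsfile that imports the library. A minimal sketch (the library name, step name, and deploy script are hypothetical):

    // vars/deployService.groovy in the shared library: each global step
    // is a Groovy file exposing a call() method
    def call(Map args = [:]) {
        def target = args.get('environment', 'staging')
        sh "./deploy.sh ${target}"               // hypothetical deploy script
    }

    // Jenkinsfile consuming the library (library name is an assumption)
    @Library('ci-lib') _
    pipeline {
        agent any
        stages {
            stage('Deploy') {
                steps {
                    deployService(environment: 'production')
                }
            }
        }
    }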


I struggle to see how this justifies the claim that "Jenkins is really a technology anti-pattern".

You are describing bad maintenance, bad architecture, and bad execution, and I fully agree with this. Jenkins is clearly old and it has old approaches to complexity.

But an "anti-pattern" implies that using it moves you further from your goal. When I was a noob with 2y of software development, primarily writing React & NodeJS APIs, I stood up a Jenkins VM and was able to correctly set up a CI/CD system building, testing, and deploying containerized microservice-based architectures via Docker and Jenkinsfile alone. I encountered extremely few issues with the core of Jenkins, because it is a pipeline runner and it has a lot of ways to run pipelines.

So to me, it looks like you use the word "anti-pattern" too liberally, since I don't think there are any other free open-source pipeline runners that would meaningfully integrate git webhooks and clone source code as easily for me. But perhaps you disagree because of your final opinion, which is that standing up production-grade CI/CD for bigger workloads would be faster without Jenkins.

Still, not sure I even agree with that claim. I have seen it used to great success in many contexts. Does it have lots of problems? Yes. Anti-pattern? Tough sell.


> The way it stores and loads configs, jobs, logs, build workspaces, etc is all 1990s tech.

I agree with a good chunk of what you said, but.

It's files, my friend, files. Tech from the 1970s.

Nothing wrong with that, inherently: they are easy to inspect and repair if needed, you can use standard tools on them, etc.

The design of the file structure is maybe the issue, because it makes high availability complicated, but just using files is not necessarily a bad idea.


Correct, in some circumstances, files are great. They suck for Jenkins.

What are they? Lots of different things: build logs, job configurations, server configuration, secrets, cached unpacked plugins, build workspaces, etc. Some of those you want in S3, some you want in a database, some you want on fast ephemeral storage, some you want in a credential store. Good luck with that; only the secrets are doable with plugins.

Where are they? Sitting on some EC2 instance's ephemeral or EBS storage. But you don't want them there, so now you have to throw a bunch of crappy wrappers in to occasionally move them if you want them somewhere else. (Even if you do JCasC/JobDSL/Jenkinsfiles for version-controlled configuration and secrets, you may still want to back up your build artifacts and logs)

And because they're files, it doesn't scale. Using EBS? Only one host can mount it (unless Nitro), so good luck scaling one box's workspace filesystem past one gigantic EBS volume, or doing master-master. And you have to clean up the filesystem every time a plugin or core version changes, or the cached version on the filesystem will override your container/host's newer version. Using EFS? Network filesystems suck (scalability + reliability + security + performance woes).


I'll note that CloudBees offers commercial support for Jenkins. Having said that, I had a former employer that was a CloudBees customer, and the support we got was typically "Have you tried turning it off and back on?" That drove us back to OSS Jenkins. Although, AIUI, said employer has now moved to Harness.


So just to clarify: they rolled out the latest version to production and it broke? What's the staging environment for, then?


Always surprised when I see people still using Jenkins. It must be because of history? You wouldn't choose it today...?


It could just be my bias against this company, but I find their "engineering blogs" about fixing their poor product/process rather lame. I get this vibe of "How we fixed our inconsistent metric generation problem by using AtomicInteger instead of Integer".


I have never heard the word "runbook" but I am going to borrow it. I can confirm that uncontrolled updates are the number one cause of instability in our Jenkins environment, where plugins seem to be the most vulnerable - to Jenkins core updates, to other plugins, and to their own updates.



