
How is this equivalent?!? In fact, stronger seller identification is what makes the site safer for everyone. I think you're really missing the point of stronger identification on commercial sites. It should be done for sellers AND buyers alike. It would drastically decrease the number of bad transactions if people knew they were no longer shielded by anonymity.


Not worth it to me to sell an occasional ~fifty dollar item. Would rather leave on the sidewalk or throw in the trash if need be.


Taking another slant at the discussion: why Kubernetes?

Thank you for sharing your experience. I also have my 3 personal servers with Hetzner, plus a couple of VM instances at Scaleway (a French outfit).

Disclaimer: I'm a Googler, was an SRE for ~10 years for Gmail, identity, social, apps (G Suite nowadays) and more, managed hundreds of jobs in Borg, was one of the 3 founders of the current dev+devops internal platform (I focused on the releases, prod and capacity side of it), and dabbled in K8s in my personal time. My opinions, not Google's.

So, my question is: given the significant complexity that K8s brings (I don't think anyone disputes this), why are people using it outside medium-to-large environments? There are simpler yet flexible and effective job schedulers that are way easier to manage; Nomad is one example.

Unless you have a LOT of machines and many jobs to manage (I'd say 250+), K8s' complexity, brittleness and overhead are not justifiable, IMO.

The emergence of tools like Terraform and the many other management layers on top of K8s that try to make it easier, but just introduce more complexity and abstractions of their own, is in itself a sign of that inherent complexity.

I would say that only a few companies in the world need that level of complexity - and then they really do need it, for sure. But for most it's like buying a Formula 1 car to commute in a city.

One other aspect I noticed is that technical teams tend to carry over the mess they had in their previous “legacy” environment and just replicate it in K8s, instead of trying to do an architectural design of what the whole system needs. And the K8s model enables that kind of mess: a “bucket of things”.

Those two things combined mean that nowadays every company has soaring cloud costs and is running things it knows nothing about but is afraid to touch for fear of breaking something. And an outage is more career-harming than a high bill that Finance will deal with later, so why risk it, right? A whole new IT discipline has now been coined to deal with this: FinOps :facepalm:

I’m just puzzled by the whole situation, tbh.


I too used to run a large clustered environment (VFX) and now work at a FAANG which has a "borg-like" scheduler.

K8s has a whole kit of parts which sounds really grand when you are starting out on a new platform, but quickly becomes a pain when you actually start to implement it. I think that's the biggest problem: by the time you've realised that you don't actually need k8s, you've invested so much time into learning the sodding thing that it's difficult to back out.

The other seductive thing is that helm provides "AWS-like" features (i.e. fancy load balancing rules) that are hard to figure out unless you've dabbled with the underlying tech before (varnish/nginx/etc. are daunting, and so are storage and networking).

This tends to lead to utterly fucking stupid networking setups because, unless you know better, that looks normal.


I'll put it this way:

Every time I try to use Nomad, or any of the other "simpler" solutions, I hit a wall - there turns out to be a critical feature that is not available, and which, if I want to retrofit it into them, will be a hacky one-off that is badly integrated into the API.

Additionally, I don't get US-style budgets or wages - this means that cloud prices which target such budgets are horrifyingly expensive to me, to the point that kubernetes pays for itself at the scale of a single server.

Yes, single server. The more I make it fit the proper kubernetes mold, the cheaper it gets, even. If I need to extend something, the CustomResourceDefinition system makes it easy to use a sensible common API.

Was there a cost to learning it? Yes, but honestly not so bad. And with things like k3s, deploying small clusters on bare metal has become trivial.

And I can easily wrap the kubernetes API into something simpler for developers to use - create paved paths that reduce how much they have to know and provide, and that enforce certain deployment standards. At the lowest cost I have encountered in my life, funnily enough.
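
To give a rough idea of what such a paved path can look like - this is a made-up sketch, not my actual helpers; the webService.new name, the defaults and the image are all invented for illustration:

  // paved-path.libsonnet (hypothetical): developers supply a name, image and port,
  // and get a conventional Deployment + Service pair with the team's defaults baked in.
  {
    webService:: {
      new(name, image, port=8080):: {
        deployment: {
          apiVersion: "apps/v1",
          kind: "Deployment",
          metadata: { name: name, labels: { app: name } },
          spec: {
            replicas: 2,
            selector: { matchLabels: { app: name } },
            template: {
              metadata: { labels: { app: name } },
              spec: {
                containers: [
                  { name: name, image: image, ports: [{ containerPort: port }] },
                ],
              },
            },
          },
        },
        service: {
          apiVersion: "v1",
          kind: "Service",
          metadata: { name: name },
          spec: { selector: { app: name }, ports: [{ port: 80, targetPort: port }] },
        },
      },
    },
  }
A developer then writes something like (import "paved-path.libsonnet").webService.new("shop", "registry.example/shop:1.2") and never has to think about selectors or labels unless they need to deviate from the defaults.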


> Every time I try to use Nomad, or any of the other "simpler" solutions, I hit a wall - there turns out to be a critical feature that is not available

Maybe you could give an example of such a feature, in Nomad's case?


I'll give examples of just a few things that have literally bought me lots and lots of savings in working hours, all of which are in use on a "single server cluster":

1. Ingress and Service objects vs. Nomad/Consul Service Discovery + Templating

This one is big, as in a really big deal. The Ingress and Service APIs let me easily and declaratively connect things with multiple implementations involved, and it's all handled cleanly through a type-safe API.

For comparison, Nomad's own documentation mostly tells you how to use text templating to generate configuration files for whatever load balancer you decide to use, or to use one of the two they point to that have specific nomad/consul integration. And even for those, configuring a specific application's connectivity happens through cumbersome K/V tags for apparently everything except the port name itself.

You might consider it silly, but the Ingress API - with its easy way to route different path prefixes to different services, or to specify multiple external hosts and TLS, especially given how easily that integrates (regardless of the load balancer used) with LetsEncrypt and other automated solutions - is an ability you're going to have to pry from my cold dead hands.

Similarly, the more pluggable nature of Service objects turns out to be critical when redirecting traffic to the appropriate proxy, or when doing things like exposing some services through one subsystem and others through another (example: servicelb + tailscale).

In comparison, Nomad is like going back to Kubernetes 1.2, if not worse. Sure, I can use service discovery - but it's very primitive service discovery where I have to guide the system by hand with custom glue logic. Meanwhile, the very first production kubernetes I set up had something like 60 Ingress objects covering 250 domains, which totaled about 1000 host/path -> service rules. And it was a puny two-node cluster.
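
To give a flavour for anyone who hasn't used it - a minimal made-up sketch, not one of my real manifests; the hostnames, service names and the letsencrypt-prod issuer are placeholders, and the annotation assumes cert-manager is installed:

  // ingress.jsonnet (sketch): one host, two path prefixes routed to two Services,
  // with TLS handled automatically via a cert-manager ClusterIssuer.
  {
    apiVersion: "networking.k8s.io/v1",
    kind: "Ingress",
    metadata: {
      name: "example-ingress",
      namespace: "foo",
      annotations: {
        "cert-manager.io/cluster-issuer": "letsencrypt-prod",  // assumed issuer name
      },
    },
    spec: {
      tls: [{ hosts: ["app.example.com"], secretName: "app-example-com-tls" }],
      rules: [{
        host: "app.example.com",
        http: {
          paths: [
            { path: "/", pathType: "Prefix",
              backend: { service: { name: "frontend", port: { number: 80 } } } },
            { path: "/api", pathType: "Prefix",
              backend: { service: { name: "api", port: { number: 8080 } } } },
          ],
        },
      }],
    },
  }
Whichever ingress controller is installed (traefik, nginx, ...) picks this object up and does the actual routing; the object itself doesn't change.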

2. Persistent Storage handling

As far as I could figure out from the Nomad docs, you can at best reuse CSI drivers to mount existing volumes into docker containers - you can't automate storage handling within Nomad; more or less, you're being told to manually create the necessary storage, maybe using terraform, and then register it with Nomad.

Compared to this, Kubernetes' PersistentVolumeClaim system is a breeze - I specify what kinds of storage I provide through StorageClasses, then just throw a PVC into the definitions of whatever I am actually deploying. Setting up a new workload with persistent storage is reduced to saying "I want 50G of generic file storage and 10G of database-oriented storage" (two different storage classes, with a real impact on performance per buck for both).

Could I just point to a directory? Sure, but then I'd have to keep track of those directories. OpenEBS-ZFS handles it for me and I can spend time on other tasks.
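
To make that concrete - again a made-up sketch rather than my real manifests, and openebs-zfs-generic is just an assumed StorageClass name - the "50G of generic file storage" request boils down to a single PersistentVolumeClaim:

  // pvc.jsonnet (sketch): the provisioner behind the StorageClass (e.g. OpenEBS-ZFS)
  // creates and tracks the underlying volume; the workload just references the claim.
  {
    apiVersion: "v1",
    kind: "PersistentVolumeClaim",
    metadata: { name: "app-data", namespace: "foo" },
    spec: {
      storageClassName: "openebs-zfs-generic",  // assumed name; use whatever the cluster defines
      accessModes: ["ReadWriteOnce"],
      resources: { requests: { storage: "50Gi" } },
    },
  }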

3. Extensibility, the dark horse of kubernetes.

As far as I know, none of the "simpler" alternatives has anything like CustomResourceDefinition, or the very simple API model of Kubernetes that makes it so easy to extend. As far as I understand, Nomad's plugins are nowhere close to the same level of capability.

The smallest cluster I currently have uses the following "operators" or other components using CRDs: openebs-zfs (storage provisioning), traefik (easily trackable middleware configuration, beyond the unreadable-tags approach), tailscale (also provides alternative Ingress and Service implementations), CloudNativePG (automated Postgres setup with backups, restores, easy access with psql, etc.), cert-manager (LetsEncrypt et al., in more flexible ways than what's embedded in traefik), external-dns (lets me integrate global DNS updates with my service definitions), and k3s' helm controller (sometimes makes life easier when loading external software).

There's more, but I kept to the things I interact with directly rather than all the CRDs currently deployed. All of them significantly reduce my workload, and all of them have either no alternative under Nomad or only very annoying ones (stuffing traefik configuration inside service tags).
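
To show what "extending the API" actually means - a minimal, made-up sketch, not one of the operators above - registering a brand-new resource kind is itself just another object. Once applied, a hypothetical Backup kind can be created, listed and watched with the same tooling as built-in resources:

  // backup-crd.jsonnet (hypothetical kind): after applying it, "kubectl get backups"
  // and the list/watch APIs work for it just like for Deployments or Services.
  {
    apiVersion: "apiextensions.k8s.io/v1",
    kind: "CustomResourceDefinition",
    metadata: { name: "backups.example.com" },  // must be <plural>.<group>
    spec: {
      group: "example.com",
      scope: "Namespaced",
      names: { kind: "Backup", singular: "backup", plural: "backups" },
      versions: [{
        name: "v1",
        served: true,
        storage: true,
        schema: {
          openAPIV3Schema: {
            type: "object",
            properties: {
              spec: {
                type: "object",
                properties: {
                  source: { type: "string" },     // what to back up
                  schedule: { type: "string" },   // cron-style schedule
                  retention: { type: "string" },  // e.g. "30d"
                },
              },
            },
          },
        },
      }],
    },
  }
An operator is then just a controller that watches objects of that kind and reconciles them - which is the pattern all the components listed above follow.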

And last, some stats from my cluster:

  4, soon to be 5 or 6, "tenants" (separate namespaces), not counting system ones or ones that provide services like OpenEBS.
  Runs 2 VPN services with headscale, 3 SSOs, one big Java issue tracker, 1 Git forge (gitea, soon to be joined by gerrit), one nextcloud instance, and one dumb webserver (using Caddy).
  Additionally runs 7 separate postgres instances providing SQL databases for the aforementioned services, postfix relays connecting cluster services with sendgrid, one VPN relay connecting gitea with the VPN, some dashboards, etc.

And because it's kubernetes, my configuration to set up, for example, a new Postgres looks like this:

  local k = import "kube.libsonnet";
  local pg = import "postgres.libsonnet";
  local secret = k.core.v1.secret;
  {
    local app = self,
    local cfg = app.cfg,
    local labels = app.labels,
    labels:: {
      "app.kubernetes.io/name": "gitea-db",
      "app.kubernetes.io/instance": "gitea-db",
      "app.kubernetes.io/component": "gitea"
    },
    dbCluster: pg.cluster.new("gitea-db", storage="20Gi") +
      pg.cluster.metadata.withNamespace("foo") +
      pg.cluster.metadata.withLabels(app.labels) +
      pg.cluster.withInitDb("gitea", "gitea-db") +
      pg.cluster.withBackupBucket("gs://foo-backups/databases/gitea", "gitea-db") +
      pg.cluster.withBackupRetention("30d"),
    secret: secret.new("gitea-db", null) +
      secret.metadata.withNamespace("foo") +
      secret.withStringData({
        username: "gitea",
        password: "FooBarBazQuux",
        "credentials.json": importstr "foo-backup-gcp-key.json"
      })
  }
And this is an older version that I haven't updated (because it still works) - if I were to set up the specific instance it's taken from today, it would involve even less writing.


> Unless you have a LOT of machines and many jobs to manage (I'd say 250+), K8s' complexity, brittleness and overhead are not justifiable, IMO.

Because it looks amazing on my CV and in my promo pack.


Same reason they'll make 10 different microservices for a single product that isn't even 5K LoC. People chase trends because they don't know any better. K8s is a really big trend.


The key to this is machine AND human readable formats. That is why the text format of protobufs is such a good approach:

- structured formats
- declared schema
- the parser to read the file is also a validity checker (at least for data types)

There are tools that do something similar with JSON, but none are as simple and unobtrusive as the protobuf tool chain.


When SOX compliance became a thing (remember Sarbanes-Oxley?), many moons ago, I was able to send the EY consultants away during their audit of Unix controls just by showing them that all our (the Autodesk Unix team's) configs were in SVN, logged, with authors and full change control. The Windows team had to spend months figuring out something that was not even remotely close to real control or auditability. Frankly, I just assumed that this was standard practice today, and I am surprised people are still not doing it.


I think the takeaway from your comment is that you really do not understand the sheer volume of changes made by 100k+ engineers, their heavy reliance on tools that do cluster-based testing (not on a local dev machine), and the tens of thousands of tools that perform auto-commits.

It’s a lot easier to shit on a big corp. Intellectually lazy, too.


Are you saying that GitHub and GitLab.com don't have that kind of volume?

(edit: the point being that at least one of those tools existed when they started Piper, if I have the timeline correct).


> Are you saying that GitHub and GitLab.com don't have that kind of volume?

On a single project? Very much doubt either of them does.


Yeah, that's moving the goalposts.

The point is that either product should theoretically have the ability to handle Google's load. They were both built for scale. If I have the timeline correct, they existed at that point. I would expect those organizations to have positively salivated at the possibility of having Google as a customer; they would have bent over backwards to make their products support Google's workflows.

The ROI of such a choice vs going custom is the interesting question.

Of course, if I have the timeline wrong, then the discussion is moot.


There is a big difference between one VCS handling 1,000 QPS and 1,000,000 VCS instances each handling 0.001 QPS. A system built for one is not necessarily suitable for the other.


See my response to bananapub below.


It's not moving the goalpost because you didn't understand where the goalpost was in the first place.

Lots of tiny repos like GitHub? Easy sharding problem. They don't even need Paxos. One large monorepo? Everything changes. Let us not forget that GitHub could not even handle the Homebrew repo being cloned and updated by Mac users, and asked them to switch to a CDN.


It's not moving the goalpost; it's part of Google's requirements that make their case unique (or at least more so than average).

