Hacker News
Crowdsec: A Fail2Ban alternative written in Go (github.com/crowdsecurity)
170 points by jaden on Oct 19, 2020 | 93 comments



I like the idea of a Fail2Ban alternative, but I'm not sold on the idea of sending information to a third party. It may be optional to upload/download the malicious IP addresses, but the text on this project's GitHub page doesn't indicate that it is optional.

My immediate thought is "Fail2ban works. I don't need or want communication with a 3rd party right now."


I think that's fair, and I had the same initial reaction. However, people are generally fine with DMARC or other crowdsourced spam signals, so it's not clearly bad.

I haven't looked at the source, but it should be easy to turn the crowd filtering off, or to self-host if you have your own larger enterprise, no?

It also depends on what exactly is transmitted. Transmitting all IP addresses in the clear might be bad, sending out hashes might be better?

I think it's actually a good idea to try to detect those IPs at scale. On your own individual server, you just see a million login attempts from a million different nodes on a botnet. But if you have millions of servers report the bad IPs and aggregate, you can see patterns and hopefully catch whole botnets.


> However, people are generally fine with DMARC or other crowdsourced spam signals, so it's not clearly bad.

DMARC isn't a crowdsourced spam signal. And from what I remember of chatting with people who run mail servers, crowdsourced spam filtering is fairly controversial.


Apologies, I was thinking of the Spamhaus list, which I'm now also not sure is crowdsourced in the sense most people would use.

But most email providers build spam filters using 'crowdsourced' techniques between their users (if enough users mark a sender as spam, it might be marked as spam for all users).

And while that may be controversial, it's also highly effective.


Speaking from personal experience working with these reputation warehouses and handling a good deal of incoming and outgoing email volume: they aren't any more effective than other IP-based spam heuristics, and if everyone configured their DKIM correctly these warehouses would be largely obsolete. I don't see a problem with crowdsourcing heuristic data per se, but email is a much narrower field than internet traffic at large, and I can see a lot of ways that poorly configured fail2ban-type software could accidentally ban a lot of valid traffic.


> Transmitting all IP addresses in the clear might be bad, sending out hashes might be better

How many IP addresses are valid? Can't you hash them all and compare them with the hash you receive?

There are 2^32 (about 4.3 billion) IPv4 addresses, so I think the space needed for a full lookup table is 2^32 * 4 bytes (one IP each) + 2^32 * the size of the hash.
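
To make the enumeration argument concrete, here is a minimal sketch. It assumes the reports would carry an unsalted SHA-256 of the dotted-quad string; that scheme, the function names, and the example address are illustrative assumptions, not anything CrowdSec is known to do. The demo only walks a /16 so that it finishes instantly; the same loop over 0.0.0.0/0 would cover every IPv4 address.

```python
import hashlib
import ipaddress


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted-quad string (an assumed scheme)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hash(target: str, network: str):
    """Try every address in `network` until one hashes to `target`."""
    for addr in ipaddress.ip_network(network):
        if hash_ip(str(addr)) == target:
            return str(addr)
    return None


# Demo on a /16 (65,536 candidates); 203.0.113.77 is a documentation address.
leaked = hash_ip("203.0.113.77")
recovered = reverse_hash(leaked, "203.0.0.0/16")

# Size of a full precomputed table: 4-byte address + 32-byte SHA-256 digest.
table_bytes = 2**32 * (4 + 32)
table_gib = table_bytes / 2**30
```

With a 32-byte digest the full table comes to 36 bytes per address, which is large for a laptop but unremarkable for a server, so a plain hash of an IPv4 address offers little privacy.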


1) That is not what DMARC is.

2) I won't be using any "security" tool that automatically shares details on active mitigation efforts.

> I think it's actually a good idea to try to detect those IPs at scale.

It is! But not with my machines.


Sorry, we should have made it clear: it is totally optional. You just don't get the IP rep DB part of the software if you don't share, but the behavior engine is still 100% functional.


This sounds terrible. I’ve locked myself out of a machine once because I remembered the password wrong. That was annoying. I might have done it without noticing when accidentally trying to log into the wrong machine more than once. Locking myself out of all machines would be way worse.


I mean you could also imagine someone somehow seeding your ip address into the crowdsourced fail2ban to maliciously deny you access to your machine for maintenance, or to trigger some sort of opsec regression to get access.


Spoofing an IP is not exactly hard, so ... yeah, I'm a bit skeptical of the idea.

Yet at the same time I think setting up a long-running, reputation-weighted clearing house would be amazing. (And if nothing else, at least something like this could emerge from all the crypto-staking-reputation research. See Polkadot and Ethereum's plans about keeping folks honest with bounties (reverse staking?) for example.)


Well, actually, spoofing on a private network is trivial, but for TCP over a public network it's another story entirely, and not simple at all. UDP can be easily spoofed, though, hence we do not treat reports in the same way so far. Beyond this, there is eventually BGP spoofing, but funnily enough, CrowdSec could detect it, provided you have logs. It should be fairly easy to track in terms of behavior.


You wouldn't have to spoof. I said seed, not spoof - you could just spin up a bunch of servers, connect it to the crowdsourced security networks, and issue false claims that IP address X has been attempting to break into your network, and suddenly block X from their own servers.


Well, we have a consensus system that's quite advanced, to avoid poisoning and false positives. To put it short, all members have a trust rank (TR). Only TR1 can publish an IP without counter-verification, and only if it doesn't shoot a canary from our whitelist of IPs. TR1 means perfectly accurate reports for 1+ year. All other TR levels can partake but need counter-verification from either our own honeypot network or other TR1 peers before being integrated. There is also an AI that will be trained soon to confirm false negatives and extract more complex patterns.


Thanks for the details!

So basically anyone joining the network for the next year sits in limbo, the network is not capable of catching more "bad IPs" for that year, because any report by new members requires cross-verification by the original nodes/honeypots.

This seems pleasantly conservative. Also, is there a way for nodes to lose trust rank? (How will the network find out if a TR1 node is reporting false negatives?)


This is accounted for. By default you have a whitelist containing local LAN IPs ;)


Yeah I have 10+ machines with fail2ban configured and 0 of them on my LAN though.


Well, just whitelist your public IPs or use a combo of ipset & knockd (port knocking). Works fine for me for variable IPs.
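
For what the whitelist suggestion amounts to, a rough sketch follows. The list contents are hypothetical (the RFC 1918 LAN ranges plus one made-up admin address), and `should_ban` with its threshold is an illustrative helper, not fail2ban's or CrowdSec's actual logic.

```python
import ipaddress

# Hypothetical whitelist: private LAN ranges plus one admin's public address.
WHITELIST = [
    ipaddress.ip_network(net)
    for net in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "198.51.100.23/32",  # assumed static public IP of the admin
    )
]


def is_whitelisted(ip: str) -> bool:
    """True if `ip` falls inside any whitelisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)


def should_ban(ip: str, failures: int, max_failures: int = 5) -> bool:
    """Ban only non-whitelisted sources that reached the failure threshold."""
    return failures >= max_failures and not is_whitelisted(ip)
```

The catch raised in the thread still applies: this only helps if the address you manage your machines from is stable enough to be listed.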


LAN addresses, eh? People do still use internet addressing on our networks despite the consumer CPE vendors increasingly trying to sell you NAT stockholm syndrome :)


The machines I'm running fail2ban on are on public networks, not my LAN.


You don't use static IPs for managing systems?


There is some discussion in the Disqus comments about them providing a server you can run yourself. So if you have a fleet of servers you don't have to share with CrowdSec but instead can share only with yourself (my read, perhaps a hybrid).


Yep, indeed, we call it private sharding or private consensus. It's far out on the roadmap (4 months), but nevertheless the team is thinking about it. You could also include or exclude some geographies, for example if you don't trust a country, or have a private consensus between only your own machines. If you are, say, Morgan Stanley, some machines may attack only you and the crowd wouldn't know. But all your servers teaming together will see it.


To me the main value proposition would be blocking anyone who was observed attacking one of my servers or even a honeypot. I'm not opposed to sharing that data as far as GDPR allows, but receiving block lists is something I would be a lot more careful about.


I have the opposite feeling.

I don’t see why we need to replace fail2ban, but I find the idea of a global repository of bad IPs and reputation very interesting.



Fail2ban is a bit shaky when it transitions to Python3.

It doesn't pass pylint, futurize, or 2to3, and not by a small margin.

But if the maintainers are willing, I can do those things.


Not sure what you mean by shaky, fail2ban has been working under Python 3 for quite a while. (See for instance fail2ban on Ubuntu 16.04 depending on python3[0].) People definitely would have noticed.

I saw your issue[1] on GitHub, and that's just a misunderstanding of how fail2ban was ported to Python 3. fail2ban uses the use_2to3 feature of setuptools.setup to do automatic translation upon installation.[2] While I do wish the code ran under Python 3 as is, what users actually run is totally working code.

[0] https://packages.ubuntu.com/xenial/amd64/fail2ban

[1] https://github.com/fail2ban/fail2ban/issues/2853

[2] https://github.com/fail2ban/fail2ban/blob/960e30cfcdae7e2c81...


Yes. It may be totally working code.

But it wasn’t written cleanly as many tools would attest.


Those tools tell you if you comply with some person's style choices, and maybe catch a few potential problems. That says close to nothing about cleanliness or correctness. They're for internal use, not for making external measurements of code.


Plenty of people have asked for fail2ban support for Caddy, but since Caddy v2 has transitioned to structured logging, this hasn't been very easy.

I'd love to see integration with Caddy here, I'm sure many people would appreciate a Caddy plugin that can do what they'd typically use fail2ban for.
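
Structured logs are awkward for regex-based filters but simple to consume from a small script. A minimal sketch, assuming a JSON-lines access log where each entry has a top-level `status` and a nested `request` object with a `remote_ip` field; those field names and the sample lines are assumptions for illustration and should be checked against what your Caddy version actually writes.

```python
import json
from collections import Counter


def failed_auth_counts(lines, ip_field="remote_ip", status_field="status"):
    """Count 401/403 responses per client IP in JSON-lines log entries."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request")
        if not isinstance(request, dict):
            continue
        ip = request.get(ip_field)
        if ip and entry.get(status_field) in (401, 403):
            counts[ip] += 1
    return counts


# Hypothetical sample entries, not real Caddy output.
sample = [
    '{"status": 401, "request": {"remote_ip": "203.0.113.5", "uri": "/admin"}}',
    '{"status": 200, "request": {"remote_ip": "198.51.100.7", "uri": "/"}}',
    '{"status": 403, "request": {"remote_ip": "203.0.113.5", "uri": "/admin"}}',
    "not json at all",
]
counts = failed_auth_counts(sample)
```

The resulting counts could then feed whatever banning mechanism you already use.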


Hi guys, thanks for all your feedback. (I'm part of the CS team.) I'll try to address a few questions.

1/ You don't have to communicate. If you don't, you get a modern, fast, decoupled fail2ban with a variety of remediations (instead of just drop) and observability. What you don't get, though, are the IPs spotted by the crowd and curated by us. You don't contribute, you don't get them, fair. If you contribute, only offending IP / timestamp / scenario triggered are sent back to us to establish what we call a consensus (to avoid false positives and poisoning).

2/ We are super vigilant and sensitive about privacy. We made the architecture and many other crucial points compatible with GDPR (EU Law framework regarding private data handling)

3/ IP sent: We could hash it, but it's very easy to reverse. Maybe have a public/private key encryption, quite a good point, I'll tell the team, thx.

4/ You can contribute scenarios in YAML or data source connectors in Grok. We are not hardcore for or against any language, but Go allows portability (we'll release Windows & macOS binaries) and is container friendly, plus super fast, easy to read and scalable. Ever since we released, tons of proposals were made to port it to a 'real' language. Sorry, we are fine with that choice, no intent to change, no intent to convert anyone either ;)

5/ Herd immunity is what we want to create indeed. We tried to explain the combination of Behavior + Reputation by using an analogy with Waze. It worked but is less accurate. I prefer the one with Immune system.

We are available for direct dialog on Gitter (https://gitter.im/crowdsec-project/community). Just allow for some delay depending on your time zone: we are based in France, so CEST. We answer in French & English.

Try it, it's free, MIT licensed and stable: https://github.com/crowdsecurity/crowdsec

Thanks,

Philippe.


Is it possible to shed some light on your "curation platform" (is that part open source or documented somewhere)? How does it defend against, say, poisoning attacks from someone who controls a large number of seemingly reputable nodes that don't fall into a single IP block?


Sure. To put it very short, we give every user a trust rank. It varies over time. If you consistently, and for a long while, reported attacks that could be correlated by others and our own botnet, you progress, until you reach trust rank 1. Others are listened to, but need double verification from TR1 and/or our own botnet. We also have a canary list, which contains IPs not to shoot (Googlebot, Microsoft update, DNS, etc.) and, last but not least, an AI mashing logs to extract larger patterns. With this "consensus" chamber, a weighted vote is cast and the IP is then included in the DB. (We are always on the more conservative side if in doubt.) If you ever feed bad intel, your TR regresses instantly and your voice weighs less in the consensus (actually nothing, in fact). If you shoot a canary, we'll presume either that you are trying to poison or that your scenario is too twitchy. IPs are fresh: they were seen doing crap at most 72h before. Beyond this threshold, we consider them not relevant anymore and wait for a refresh. So if you are only doing "headshots", 0-day style, the system would have a hard time catching you. But if you port / web scan, bruteforce, do credential or CC stuffing, or whatever else, the system catches you quite quickly.
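
Read as rules, the description above can be modeled roughly as follows. This is a toy model built only from the comment, not CrowdSec's actual consensus code: the `Report` type, the function names, the rank numbers, and the sample data are all hypothetical, and the weighted vote and the AI component are left out.

```python
from dataclasses import dataclass

FRESHNESS_SECONDS = 72 * 3600  # reports older than 72h are ignored


@dataclass
class Report:
    reporter: str
    ip: str
    timestamp: float  # seconds since epoch


def accepted_ips(reports, trust_rank, canaries, honeypot_ips, now):
    """Return the IPs that would enter the blocklist under the toy rules.

    trust_rank maps reporter -> rank, where 1 is the most trusted.
    """
    fresh = [r for r in reports if now - r.timestamp <= FRESHNESS_SECONDS]
    # IPs vouched for by a TR1 reporter or seen by the operator's own sensors.
    trusted = {r.ip for r in fresh if trust_rank.get(r.reporter) == 1}
    trusted |= set(honeypot_ips)
    accepted = set()
    for r in fresh:
        if r.ip in canaries:
            continue  # never ban a canary, whoever reports it
        if r.ip in trusted:
            accepted.add(r.ip)
    return accepted


def penalized_reporters(reports, canaries):
    """Reporters who shot a canary and would lose trust rank."""
    return {r.reporter for r in reports if r.ip in canaries}


now = 1_000_000.0
reports = [
    Report("veteran", "203.0.113.1", now - 3600),       # TR1, accepted
    Report("newbie", "203.0.113.2", now - 3600),        # unconfirmed
    Report("newbie", "203.0.113.3", now - 3600),        # confirmed by honeypot
    Report("veteran", "203.0.113.4", now - 80 * 3600),  # stale, older than 72h
    Report("troll", "8.8.8.8", now - 60),               # canary
]
ranks = {"veteran": 1, "newbie": 3, "troll": 3}
accepted = accepted_ips(reports, ranks, {"8.8.8.8"}, {"203.0.113.3"}, now)
penalized = penalized_reporters(reports, {"8.8.8.8"})
```

Under these rules a flood of new nodes cannot add an IP on its own, which is the property the poisoning question was probing; what it cannot show is how a TR1 node that turns malicious would be caught.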


How is reversing the hash of an IP easy? Are you saying that because there are only 4 billion IPv4 addresses? A hash should still be fine for IPv6, right?


Well hashing is (usually) a symmetric function and we are open source... Meaning you could recover the key in the code (or intercept it during transfer). I think Private/Public key is a simpler approach, reusable elsewhere in the code and it's known to be safe. But I'm not the CTO either, I could be mistaken.


Hashes aren't symmetric and don't use a key.
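
A short illustration of that distinction, using Python's standard `hashlib` and `hmac` modules. The keys and the address are made up for the example, and HMAC is shown only as one keyed construction, not as what CrowdSec uses.

```python
import hashlib
import hmac

ip = b"203.0.113.77"

# A plain hash takes only the message: there is no key and no inverse function.
plain = hashlib.sha256(ip).hexdigest()

# Anyone can recompute it, which is what makes guessing small inputs feasible.
same_again = hashlib.sha256(ip).hexdigest()

# A keyed construction (HMAC) does need a secret. Without the key, an outsider
# cannot recompute the digest for a candidate address.
key_a = b"hypothetical-secret-a"
key_b = b"hypothetical-secret-b"
tag_a = hmac.new(key_a, ip, hashlib.sha256).hexdigest()
tag_b = hmac.new(key_b, ip, hashlib.sha256).hexdigest()
```

So the weakness of hashing IPs is the small input space, not a recoverable key.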


Here, I found this really useful to understand hashes: https://crackstation.net/hashing-security.htm


(but I think they already send it through HTTPS)


But would you want to only block single IPv6 addresses? Usually whole blocks are assigned as far as I know? So just hashing a single ipv6 would probably not work very well.


Can you shed some light on what the premium offers are going to be?


Sure. People choose whether or not to share what they spot. If they do, no money is asked of them for benefiting from the global IP rep DB. Those willing to use it without contributing will be able to do so, through API calls, but at a (moderate) cost. We'll have 2 plans, premium & enterprise. They will provide support, tools for fleet management (like deploying a policy on X/Y/Z servers from a central location), AI (to spot larger trends), cold log analysis (forensics, but harder because of GDPR), tailor-made bouncer responses, bounce back to us, and self IP monitoring (to see if they are caught in a consensus, hence have been hacked). Also, bouncers, the components blocking ingress IPs, are able to work without the Go daemon, by just using the IP rep DB. Think for example of someone wanting to protect a group of IoT machines with low CPU and low memory: the API approach is close to costless in terms of resources and allows those machines to be protected without running the daemon.


Hi - this looks great, but can you share what your plans are for the premium features in the future? It feels like the big value here for you is the crowd-sourced info that you can aggregate from all users. What concerns me is investing time in a tool that builds value for you but then the features get stripped back when the premium offering lands. Cheers.


No risk here. The tool is MIT, so if the community doesn't like our approach, you fork it. So we'll be faithful to our commitments, and this licensing model is the best insurance for it. Now, I can also tell you that people using the free software and contributing IPs will get back, for free, the IPs dangerous for their technology footprint. (For example, if you use the WordPress / SSH & Nginx scenarios, you'll get the IPs attacking those.) Free. Period.

We monetize the aggregated, curated data and the features we offer that cost us infrastructure to run.


I love reading these Go projects. I don't use Go professionally and have spent minimal time with it on side projects. Apparently that was enough for the language, because I can hop right into the codebase and basically understand everything in one shot. A big language benefit, even if maybe it was frustrating at times for the author.


Already replaced fail2ban with sshguard [1], which I like better. But I'll be testing this even though I'm not a fan of crowd sex :-)

[1] https://www.sshguard.net/


CrowdSec is for all protocols / systems generating logs (can be CloudTrail, syslog, Kafka, etc.) and can ban at an applicative, user or IP level.


From the sshguard site: "Started for SSH, now protects a wide range of services out of the box"


I should maybe also have told you that team members come from pentesting and high-security hosting backgrounds. We have also created some other OSS components before, like NAXSI (WAF over Nginx), Snuffleupagus, PHP malware finder, etc. So we faced the hurdles of assembling, deploying, configuring, handling and maintaining security tools in our DevOps & SecOps environments, and we designed this tool with our years of experience in mind.


Is the fact that it is "written in Go" a selling point?


Yes! For those who know Go, it means they'll be able to hack/improve/fix the software themselves.

This question gets posted on every single "X written in Y", and I can't help but think it's an effortless way to broadcast some strange form of superiority (namely: by showing my exasperation with Go enthusiasts, I place myself in the category of people unimpressed by Go. Bonus points for mentioning Rust or Haskell.)

This feeling is at odds with open source culture, where the ability to understand the code you're running is absolutely central. If you value Open Source, it should be pretty easy to understand how "written in language X" is a valuable piece of information.


Correct me if I am wrong but doesn't Go yield better performance than Python?


Yes, by quite a decent margin.


The title doesn't say that it's open source, though. (nor its license) It's specifically advertising itself based on its language, not its qualities.


GitHub wasn't a dead giveaway? How about the MIT License badge on the page?

Come on, now...

I think it's safe to say that one should check these (obvious) things before posting a snarky comment. It seems to me that this is part of the HN community ethos.


> GitHub wasn't a dead giveaway?

There have been numerous source-available but proprietary github projects posted to HN.


Good thing you can check the license!


Some people prefer using a single binary rather than requiring a python installation.


Yup. Perhaps the language shouldn't matter as much, ideally maybe we'd talk about "runtime features" - which matter to the readers. "Single Binary, static link, No GC" etc. But the language serves to me as a small proxy for most of those attributes.

Tell me a game engine is written in Python or Go and I can infer a lot about the intended audience or runtime performance characteristics.

The HN crowd seems to be so annoyed by language recently, but to me they just scream of missing the point entirely. /shrug


For me, definitely! I have a tiny VPS running ARM that I use as a build server. I get so much SSH spam that it noticeably slows down the VPS (both with and without Fail2Ban).

"Written in Go" signifies to me that it can be faster and more resource-efficient than Fail2Ban: a good reason to check it out.


Is this really much of a problem? I have a VPS that's been online for years, serving port 22. I average about 200k attempts per year. I have it set to pubkey only, root can't login at all. If you connect without sending a pubkey, it pretty much instantly tells you to go away. I don't bother with fail2ban.

Maybe I should start logging attempted pubkeys as a side project just to see what pops up.


You should consider changing the default SSH port. It helps a lot with the spam.



and/or add an iptables rule that limits the rate, set high enough such that you'll never hit it

1 line in your iptables config
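
The iptables rule itself isn't given above, so here is the same idea as a Python sketch rather than a firewall config: cap new connection attempts per source within a time window, with the cap high enough that legitimate use never reaches it. The class name and the numbers are illustrative assumptions.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` new connections per source within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._seen = defaultdict(deque)  # source IP -> timestamps of accepted attempts

    def allow(self, ip: str, now: float) -> bool:
        q = self._seen[ip]
        # Forget attempts that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: drop this attempt
        q.append(now)
        return True


limiter = RateLimiter(limit=3, window=60)
burst = [limiter.allow("203.0.113.5", t) for t in (0, 1, 2, 3)]
```

Here the fourth attempt inside the window is refused, while other sources and later attempts are unaffected.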


Only if you had issues with fail2ban's prior performance or ease of installation. Otherwise we start getting into partisan point scoring competitions over what critical services are written in what languages, which is incredibly tedious.


Yes, but it's just a trend. Could as well have been written in React.

Joking aside, the trend to rewrite every single thing in Go/Rust is scary: why take something that is working and standard, and make it new? People have tried this hundreds of times, as per Hacker News history, and it's mostly not worth it.

However, it is good training, a tutorial for Go.


> the trend to rewrite any single thing in Go/Rust is scary

Many useful projects get abandoned just because someone made a more popular alternative.

In 2-3 years the rewrite in go/rust fad will fade and both the new and the old projects end up abandoned.

I'll be downvoted to hell for this: jumping on fads harms the FLOSS ecosystem.

Addition: also, static linking and embedding many dependencies harms Linux distributions.


Of course not. It's the same as with the Docker hype a few years ago. I don't get it. Still today everyone seems to think an app is better when it's "dockerized". Now with Go everyone can write buggy, memory-leaking programs. I prefer apps written in C whenever I can.


Not sure why on earth you think something would be non-buggy or non-leaky just because it's written in C. I avoid C for almost any user-mode work, save a few things.


I'd expect the C to be buggier, personally


Garbage collection should make apps written in Go less memory-leaking and less buggy than those written in C, no?


we need a fork of HN


I'd rather use OSSEC http://ossec.net/


different approach, but I'm sure at some point we'll get close to one another.


That would be nice, it's good to have alternatives.


Or Wazuh (an OSSEC fork that integrates with the ELK stack)


Beware:

> Crowdsec is in BETA version. It shouldn't, and didn't crash any production so far we know, but some features might be missing or undergo evolutions. IP Blocklists are limited to very-safe-to-ban IPs only (~5% of the global database so far, will grow soon)


Absolutely. We owe that transparency to our users. The 1.0 should be out a month from now, and it will include a Local API, an abstraction layer between the core and the bouncers & data sources. This will help the community to develop their own scenarios, bouncers & data source connectors. But at this stage, we have had no report of the CrowdSec daemon bugging a server, crashing it or over-consuming resources. It's even used by some hosting companies to process their reverse proxy logs, without any meaningful perf impact. That being said, it's still beta because some features or architectural points could vary a lot at this early stage. We only distribute 5 to 10% of the IP rep DB because we are overcautious and don't want any false positive to happen. Our consensus algorithm is getting better by the week, but we are cautious by nature (and experience).


For seven years, I've been using a home-grown Fail2Ban alternative called txrban:

http://www.kylheku.com/cgit/txrban/tree/


looks really interesting! thanks for sharing.

How does it fare in terms of performance compared to fail2ban? As far as I can see, fail2ban can churn through quite a bit of CPU digesting logs. Is CrowdSec faster/lighter?

another question regarding backwards compatibility... eg porting fail2ban configs, action scripts, jails etc. I guess it’s not going to be a drop in replacement, but are there any porting efforts, recipes, scripts, docs to help with the transition?

sorry for the crappy formatting, but using my mobile atm and couldn’t wait to ask :)


Perf-wise, we have a user who previously used fail2ban to block some HTTP botnets. He crunches 7000 IPs' worth of logs in 50 minutes with F2B, and in under a minute with CrowdSec. Another blocks 3000 IPs doing credit card stuffing directly at the payment page, very quickly as well.


I don't like fail2ban because with IPv6 it becomes useless.


It's still relevant even if you don't switch to network blocking. I haven't heard of any bots brute forcing over IPv6 yet, which will be much harder due to the size of the address space but those two aside...

A bot is unlikely to reconfigure the host's network stack to grab or rotate additional IPv6 addresses. That type of behaviour would be very easy to detect by endpoint protection systems and shut down.

When scanning, scraping, and/or brute forcing service passwords they're likely to remain using the same IPv6 address either permanently or on a daily rotation, most likely this will be mostly impacted by OS defaults on privacy addresses as I don't actually expect many normal users to know and/or care about them.

So if you're attacked on IPv6, you'll likely be equally protected by fail2ban as you are on IPv4.


That should be just a matter of making firewall blocks at least a /64, and considering scans/source-ips also as a netblock instead of individual ip's.
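
A small sketch of that netblock idea using Python's standard `ipaddress` module: IPv6 sources are collapsed to their /64 before counting failures, so a host rotating addresses within its prefix still trips the threshold. The function names, the threshold, and the sample addresses (documentation ranges) are illustrative assumptions.

```python
import ipaddress
from collections import Counter


def block_key(ip: str, v6_prefix: int = 64) -> str:
    """Map an IPv6 address to its enclosing /64; leave IPv4 addresses as-is."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 6:
        return str(ipaddress.ip_network(f"{addr}/{v6_prefix}", strict=False))
    return str(addr)


def blocks_to_ban(failed_ips, max_failures: int = 3):
    """Return the keys (single IPv4s or IPv6 /64s) that reached the threshold."""
    counts = Counter(block_key(ip) for ip in failed_ips)
    return sorted(key for key, n in counts.items() if n >= max_failures)


failed = [
    "2001:db8:1:2::1",
    "2001:db8:1:2:ffff::2",
    "2001:db8:1:2:abcd:ef01:2345:6789",
    "2001:db8:9:9::1",
    "192.0.2.9",
]
to_ban = blocks_to_ban(failed)
```

Three different addresses from the same /64 add up to one bannable block, whereas counting them individually would never reach the threshold.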


FYI, I looked into where the attacks came from. 99.5% China, no joke. I blocked whole B networks from China, and wouldn’t you know it: fewer break-in attempts, fewer vulnerability scans, less SMTP spam. By orders of magnitude, a night and day difference.


Not enough people do this. Using country-level block-lists dropped the number of IP/Port scans we have received down to sub 1% of totals. It may not be an elegant approach to the problem but it is VERY effective.


We'll (soon) provide a back office, where you can choose which IPs you decide to ban (based on their activities, like bot scraping, bruteforcing, etc.) but also add some 3rd-party blacklists, block some ASes or ranges, Tor exit nodes or VPNs. This is all being built right now, and should be available in a couple of months.


Changing the default port takes a lot less effort than finding and using a country-level block list, and is just as effective in cutting down intrusion attempts.


Sshguard is another fail2ban alternative that is worth a look.


CrowdSec is not designed specifically for SSH. It can ingest any type of logs and answer with a bouncer at pretty much any level: IP/session/user/software stack. E.g., we are working on Magento to parse all logs (Apache, Magento's logs, etc.) and provide a bouncer that is user-aware, at an applicative level. Some people are experimenting with it to parse logs from airplane communications, to see if pilots' behavior is close to a standard or deviates. We have experiments on the BGP protocol, etc.


Neither is sshguard, jftr


Marketing riding on the COVID19 wave is distasteful (Let’s achieve a “digital herd immunity")


The concept of "herd immunity" far predates COVID-19 and is an accurate enough analogy for what they're trying to accomplish.



