Hacker News new | past | comments | ask | show | jobs | submit login

GitLab employee here.

We have internal tooling that we're working on incorporating into GitLab itself to help with this: https://about.gitlab.com/blog/2021/08/19/introducing-spamche....




Hello, do you have any plans/desire to support ActivityPub federation on Gitlab? It's a killer-feature-to-come for Gitea and certainly would help dealing with spam, as admins could allowlist trustworthy instances on an opt-in basis, enabling easy cooperation across related communities.


I don't think it's currently scheduled: https://gitlab.com/gitlab-org/gitlab/-/issues/30672


Yeah i saw that issue two years ago. It's sad nothing has moved on here, whereas the forgefriends project (ex-fedeproxy, not directly related to forgefed) has been super active in the past year (checkout their monthly reports) in this area of forging interop.

EDIT: someone on that issue summarized the issue pretty well:

> Its really annoying how fragmented gitlab is rightnow. I have a dozen accounts on a dozen instances. This feature combined with oauth login to other instances, would make it like there is one big gitlab we all use!


Sorry, I'm not sure I understand. How does that help "dealing with spam?"


Because once you have federation you can either use an operator/domain web-of-trust, or you can use allow/denylists on your instance. That's how email or XMPP is kept mostly spam-free (on a selfhosted server most spam - if not all - i receive is from gmail addresses, not from selfhosted servers who are easily denylisted if they start sending spam).

In particular, if an instance or specific repository concerns only people from specific projects/instances, it would be easy to allowlist those specific instances and not have to deal with spam at all.


> on a selfhosted server most spam - if not all - i receive is from gmail addresses, not from selfhosted servers who are easily denylisted if they start sending spam

And most - if not all - potential GSoC contributors are from gmail addresses. So again, I don't understand how this could be a general solution to spam.


I think you don't get my point. I'm not advocating for denylisting gmail.com because it produces spam (although this has tempted me on more than one occasion), i'm saying fighting spam in federated environments has decades of experience of various techniques that work well. Open nodes (eg. remailers) have terrible reputation and are denylisted pretty much everywhere, but specific communities/servers can maintain a decent reputation as long as they have some form of moderation/cooptation. By opting into the federation, Gitlab could support various advanced workflows depending on your threat model:

- a new organization using your project? maybe grant their whole gitlab instance "issues" read/write access to the project

- publishing FLOSS in a "community" setting where random people submitting contribution is not expected? maybe we can check the PGP WoT before deciding whether to accept that PR

- running a federation of organizations, some of whom may run their own instance? allowlist all the instances so they can interact across instances

- running a public forge like gitlab.com, codeberg.org, or chapril.org? maybe maintain an allowlist of servers who ask for it and pledge to fight spam

- feel adventurous? setup an entirely public instance and help catch spam and reporting it to denylists

All this is already possible on email level, but pointless as you pointed out as trustworthiness of the mail server is not correlated to trustworthiness of the forge.


Sounds like a potential solution.

When will it ship?


Looks like it shipped in 14.8 (4 days ago)

https://docs.gitlab.com/ee/user/admin_area/reporting/spamche...


Wait a sec... this is from the feature request[1]:

> Just because I don't think I said it explicitly anywhere above: Because we are using an obfuscated, non-free component (the preprocessor), we can't include spamcheck in CE (users of CE expect no proprietary code to be included in the pacakge), but only in EE.

So... is it available in the current version of gitlab-ce or not? I don't want to waste time trying to get it running only to find out you've only made it available for enterprise editions and gitlab.com.

1: https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6259


Non-free obfuscated code cannot be included in the community edition unfortunately. https://gitlab.com/gitlab-org/gitlab-foss/-/blob/master/LICE... The architecture in https://gitlab.com/gitlab-org/spamcheck#architecture-diagram shows the spam detection, where the ML training models remain obfuscated to not give spammers an advantage.

You can run EE without license, it provides the same features as CE. Maybe that is an option for you: https://docs.gitlab.com/ee/update/package/convert_to_ee.html I've created an MR to help clarify the docs: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/81751


If I didn't care about the open source license I'd simply use github. (Which, unfortunately, may be the only solution that doesn't continue eating more and more of my time.)

Anyhow, this sounds like a death knell for gitlab-ce. My GSoC use case isn't fringe (there are 100s of GSoC orgs), and Gitlab wouldn't have spent money on the ML approach for EE if it weren't generally important.


Oh wow, thanks!

I'll have a look.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: