I run a Gitlab instance. It wouldn't be a pain except that user spam is nonstop....

dnsmichi · on Feb 26, 2022

GitLab employee here.

You can disable signups, or require all users to be approved by an administrator, if that works for your instance. https://docs.gitlab.com/ee/user/admin_area/settings/sign_up_... There are more options, such as restricting signups to specific email domains.

Future spam detection ideas shared in https://news.ycombinator.com/item?id=30479511
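
For readers who prefer scripting over the admin UI, here is a minimal sketch of how the sign-up restrictions mentioned above might be applied through the REST application settings endpoint. It is an illustration under assumptions, not the documented procedure: the host `gitlab.example.org`, the token placeholder, and the helper name `build_signup_settings_request` are hypothetical, and the setting names (`signup_enabled`, `require_admin_approval_after_user_signup`, `domain_allowlist`) should be checked against the API docs for your GitLab version. The request is only built here, not sent.

```python
import json
import urllib.request


def build_signup_settings_request(base_url, token, allowed_domains):
    """Build (but do not send) a PUT request that tightens sign-up settings.

    Hypothetical helper: setting names are assumed from the GitLab
    application settings API and may differ between versions.
    """
    payload = {
        "signup_enabled": True,  # keep sign-ups open
        "require_admin_approval_after_user_signup": True,  # admin approves new users
        "domain_allowlist": list(allowed_domains),  # only these email domains may sign up
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v4/application/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
        method="PUT",
    )


# Example: build the request for a placeholder instance and inspect it.
req = build_signup_settings_request(
    "https://gitlab.example.org", "<admin-token>", ["example.edu"]
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a real administrator token.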

jancsika · on Feb 26, 2022

Thanks.

> You can disable signups,

I want potential GSoC participants to be able to sign up.

> or require all users to be approved by an administrator, if that works for your instance.

I don't have time to hand separate the Indonesian casino spammers from the potential GSoC participants. And I do mean "by hand"-- the Gitlab UI requires me to click a button to open up a secondary menu, then choose add or delete, then wait for the user screen to reload.

At least when sifting through an email spam folder back in the 90s I could press the delete button multiple times in a row. Even that would be a relatively usable solution.

dnsmichi · on Feb 26, 2022

Thanks for the additional context. Agreed, manually approving and filtering is not efficient here. Spamcheck suggested in https://news.ycombinator.com/item?id=30480296 should be the path.

mathstuf · on Feb 26, 2022

The API is good enough to write some Python code that does this way faster. Some autoclassification based on keywords helps a lot too.

I made some scripts to do this, but would have to extract them from beside the user data in the repo.
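
These are not mathstuf's scripts, which were not shared. As a rough sketch of the idea, assuming user records shaped like the JSON returned by the GitLab users API (fields such as `username`, `name`, `bio`, `website_url`), keyword-based triage could look like the following. The keyword list, the threshold, and the helper names are all hypothetical; `block_user` assumes the `POST /api/v4/users/:id/block` endpoint and an administrator token.

```python
import urllib.request

# Hypothetical keyword list; tune it to the spam your instance actually sees.
SPAM_KEYWORDS = ("casino", "slot", "poker", "betting")

# Fields assumed to be present on user records from the users API.
TEXT_FIELDS = ("username", "name", "bio", "website_url")


def spam_score(user, keywords=SPAM_KEYWORDS):
    """Count how many keywords occur (as substrings) in the user's text fields."""
    text = " ".join(str(user.get(field) or "") for field in TEXT_FIELDS).lower()
    return sum(1 for keyword in keywords if keyword in text)


def triage(users, threshold=1):
    """Split users into (likely_spam, keep) based on the keyword score."""
    likely_spam, keep = [], []
    for user in users:
        (likely_spam if spam_score(user) >= threshold else keep).append(user)
    return likely_spam, keep


def block_user(base_url, token, user_id):
    """Block one user over the REST API; returns the HTTP status code."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v4/users/{user_id}/block",
        headers={"PRIVATE-TOKEN": token},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example with made-up records; no network access is needed for triage.
sample = [
    {"id": 1, "username": "alice", "bio": "GSoC applicant"},
    {"id": 2, "username": "casino88", "bio": "best slot games"},
]
likely_spam, keep = triage(sample)
print([u["id"] for u in likely_spam], [u["id"] for u in keep])
```

Substring matching is deliberately crude and will produce false positives, so reviewing the `likely_spam` list before calling `block_user` on each entry seems prudent.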

boleary-gl · on Feb 26, 2022

GitLab employee here.

We have internal tooling that we're working on incorporating into GitLab itself to help with this: https://about.gitlab.com/blog/2021/08/19/introducing-spamche....

southerntofu · on Feb 26, 2022

Hello, do you have any plans/desire to support ActivityPub federation on Gitlab? It's a killer-feature-to-come for Gitea and certainly would help dealing with spam, as admins could allowlist trustworthy instances on an opt-in basis, enabling easy cooperation across related communities.

boleary-gl · on Feb 26, 2022

I don't think it's currently scheduled: https://gitlab.com/gitlab-org/gitlab/-/issues/30672

southerntofu · on Feb 26, 2022

Yeah, I saw that issue two years ago. It's sad nothing has moved there, whereas the forgefriends project (ex-fedeproxy, not directly related to forgefed) has been super active over the past year in this area of forge interop (check out their monthly reports).

EDIT: someone on that issue summarized the issue pretty well:

> Its really annoying how fragmented gitlab is rightnow. I have a dozen accounts on a dozen instances. This feature combined with oauth login to other instances, would make it like there is one big gitlab we all use!

jancsika · on Feb 26, 2022

Sorry, I'm not sure I understand. How does that help "dealing with spam?"

southerntofu · on Feb 26, 2022

Because once you have federation you can either use an operator/domain web-of-trust, or you can use allow/denylists on your instance. That's how email and XMPP are kept mostly spam-free (on a self-hosted server most spam, if not all, that I receive is from gmail addresses, not from self-hosted servers, which are easily denylisted if they start sending spam).

In particular, if an instance or specific repository concerns only people from specific projects/instances, it would be easy to allowlist those specific instances and not have to deal with spam at all.
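
To make the allow/denylist idea concrete, here is a minimal sketch of the per-instance decision it implies. GitLab has no such federation feature, so everything here is hypothetical: the `user@instance` actor format, the function names, and the example hosts are invented for illustration.

```python
def instance_of(actor):
    """Return the instance host from a hypothetical 'user@instance' actor ID."""
    return actor.rsplit("@", 1)[-1].lower()


def is_allowed(actor, allowlist=None, denylist=()):
    """Decide whether a remote actor may interact with this instance.

    - A denylisted instance is always refused.
    - If an allowlist is given, only listed instances are accepted.
    - With no allowlist, everything not denylisted is accepted.
    """
    host = instance_of(actor)
    if host in denylist:
        return False
    if allowlist is not None:
        return host in allowlist
    return True


# Example: a project that only accepts two related communities.
trusted = {"forge.project-a.example", "git.project-b.example"}
print(is_allowed("alice@forge.project-a.example", allowlist=trusted))
print(is_allowed("spammer@open-node.example", allowlist=trusted))
```

Checking the denylist first means a host that appears on both lists is refused, which is the more conservative choice.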

jancsika · on Feb 26, 2022

> on a self-hosted server most spam, if not all, that I receive is from gmail addresses, not from self-hosted servers, which are easily denylisted if they start sending spam

And most - if not all - potential GSoC contributors are from gmail addresses. So again, I don't understand how this could be a general solution to spam.

southerntofu · on Feb 27, 2022

I think you don't get my point. I'm not advocating for denylisting gmail.com because it produces spam (although this has tempted me on more than one occasion). I'm saying that fighting spam in federated environments draws on decades of experience with techniques that work well. Open nodes (e.g. remailers) have a terrible reputation and are denylisted pretty much everywhere, but specific communities/servers can maintain a decent reputation as long as they have some form of moderation/cooptation. By opting into the federation, Gitlab could support various advanced workflows depending on your threat model:

- a new organization using your project? maybe grant their whole gitlab instance "issues" read/write access to the project

- publishing FLOSS in a "community" setting where contributions from random people are not expected? maybe we can check the PGP WoT before deciding whether to accept that PR

- running a federation of organizations, some of whom may run their own instance? allowlist all the instances so they can interact across instances

- running a public forge like gitlab.com, codeberg.org, or chapril.org? maybe maintain an allowlist of servers who ask for it and pledge to fight spam

- feeling adventurous? set up an entirely public instance and help catch spam and report it to denylists

All of this is already possible at the email level, but it is pointless, as you pointed out, because the trustworthiness of the mail server is not correlated with the trustworthiness of the forge.

jancsika · on Feb 26, 2022

Sounds like a potential solution.

When will it ship?

boleary-gl · on Feb 26, 2022

Looks like it shipped in 14.8 (4 days ago)

https://docs.gitlab.com/ee/user/admin_area/reporting/spamche...

jancsika · on Feb 26, 2022

Wait a sec... this is from the feature request[1]:

> Just because I don't think I said it explicitly anywhere above: Because we are using an obfuscated, non-free component (the preprocessor), we can't include spamcheck in CE (users of CE expect no proprietary code to be included in the package), but only in EE.

So... is it available in the current version of gitlab-ce or not? I don't want to waste time trying to get it running only to find out you've only made it available for enterprise editions and gitlab.com.

1: https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/6259

dnsmichi · on Feb 27, 2022

Non-free obfuscated code cannot be included in the community edition unfortunately. https://gitlab.com/gitlab-org/gitlab-foss/-/blob/master/LICE... The architecture in https://gitlab.com/gitlab-org/spamcheck#architecture-diagram shows the spam detection, where the ML training models remain obfuscated to not give spammers an advantage.

You can run EE without a license; it then provides the same features as CE. Maybe that is an option for you: https://docs.gitlab.com/ee/update/package/convert_to_ee.html I've created an MR to help clarify the docs: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/81751

jancsika · on Feb 27, 2022

If I didn't care about the open source license I'd simply use github. (Which, unfortunately, may be the only solution that doesn't continue eating more and more of my time.)

Anyhow, this sounds like a death knell for gitlab-ce. My GSoC use case isn't fringe (there are 100s of GSoC orgs), and Gitlab wouldn't have spent money on the ML approach for EE if it weren't generally important.

jancsika · on Feb 26, 2022

Oh wow, thanks!

I'll have a look.