Also, it's already possible to have your own comntr server: you just need to git clone the comntr/http-server repo, npm install & npm start it, and tell the iframe to use your server with ?srv=https://foobar.com:42751. This would be an "on-prem" solution.
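For anyone wiring this up, here is a minimal sketch of building that iframe URL. Only the `srv` parameter and the comntr.github.io page come from the comment above; the assumption that the option is passed as a plain query string on that page, and the helper name `iframe_src`, are mine.

```python
from urllib.parse import urlencode

# Assumption: the iframe points at the comntr.github.io page and reads its
# options from ordinary query parameters.
COMNTR_PAGE = "https://comntr.github.io/"


def iframe_src(server_url: str) -> str:
    """Return an iframe src that asks the page to use a self-hosted server."""
    # safe=":/" leaves the server URL readable, matching the ?srv=https://... form.
    return COMNTR_PAGE + "?" + urlencode({"srv": server_url}, safe=":/")


print(iframe_src("https://foobar.com:42751"))
```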
My first post about the web extension idea got some interest (and almost 150 stars on github!), so I've made the next logical step: an <iframe> that renders the comntr.github.io page and effectively adds comments to your page. The spam problem is partially addressed by the filters: the iframe.src can have a special ?filter=[...] param that can hide some of the comments you don't like. Although it's unlikely one can bypass the security model around those filters (but you are welcome to try!), advanced spammers can still post a lot of garbage to those comments as they can easily generate new ed25519 keys.
That's a great collection of libraries for a common problem that many people run into. If you're interested, I made a library [1] for Hacker News comments on static sites. Let me know what you think.
to just get a few comments (not quick, unmoderated discussions or in the thousands), a radical strip-down may suffice – neither server-code nor JavaScript, let alone a 3rd party: Pure atom xml rendered to html client-side via xslt in an iframe. No comment markup. Feel free to express yourself in unicode.
The hard problem in this space is spam prevention. It's not immediately clear to me if this even addresses it, other than by giving ban/filter controls to a human.
I think most of the spammers can be deterred by a simple puzzle, like 23+47. If we want to raise the bar, we make the puzzle more and more complex. Obviously, the puzzle is returned as an svg picture where the letters are "rendered" with little squares. My point is that 99% of the spammers out there are lazy and won't be able to pass this little test vs someone who's written a thoughtful comment and can definitely add 23 to 47.
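A rough sketch of what such a puzzle could look like, assuming a made-up 3x5 bitmap font. `FONT`, `make_puzzle` and `render_svg` are hypothetical names, not anything from comntr. The question text never appears in the SVG, only small squares.

```python
import random

# Hypothetical 3x5 bitmap font: five rows of three cells per glyph, "#" = filled.
FONT = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
    "+": ["...", ".#.", "###", ".#.", "..."],
}


def make_puzzle(rng: random.Random) -> tuple[str, int]:
    """Return a question such as "23+47" and its numeric answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    return f"{a}+{b}", a + b


def render_svg(text: str, cell: int = 4) -> str:
    """Draw each character as a grid of small squares (one <rect> per filled cell)."""
    rects = []
    for i, ch in enumerate(text):
        for row, line in enumerate(FONT[ch]):
            for col, mark in enumerate(line):
                if mark == "#":
                    x = (i * 4 + col) * cell  # 3 columns per glyph plus 1 column gap
                    y = row * cell
                    rects.append(
                        f'<rect x="{x}" y="{y}" width="{cell}" height="{cell}"/>'
                    )
    width, height = len(text) * 4 * cell, 5 * cell
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(rects)
        + "</svg>"
    )


question, answer = make_puzzle(random.Random())
svg = render_svg(question)  # send this to the client, keep `answer` on the server
```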
Em.. "off-the-shelf OCR" sounds neat, but anyone who knows such words isn't an average spammer. The goal of basic SVG puzzles is to block the 99% of spammers who just type dumb comments on keyboards. The remaining 1% can be taken care of by human mods.
TBH, I don't like the reCAPTCHA-like solutions. They are just annoying from my personal experience and if they rely on any 3rd party service, I'll give them a hard pass for this reason alone. My approach is to use trivial SVG-style captchas with adjustable complexity, e.g. instead of asking "23+34", we can ask "log(32)/log(2)" and effectively filter out everyone except people familiar with math, or "md5(2615), first 7 hex digits" and let in only people familiar with cryptography. Forcing users to detect birds and crosswalks will just make them upset, IMHO.
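To make the "adjustable complexity" idea concrete, here is a small sketch of how a server might compute the expected answer for each of the three puzzle kinds mentioned. The function names are hypothetical, and I'm assuming "md5(2615)" means the MD5 of the decimal string "2615", which the comment doesn't actually specify.

```python
import hashlib
import math


def arithmetic_answer(a: int, b: int) -> str:
    """Easy tier, e.g. "23+34"."""
    return str(a + b)


def log_answer(x: float, base: float) -> str:
    """Math tier, e.g. "log(32)/log(2)"; round() absorbs floating point error."""
    return str(round(math.log(x) / math.log(base)))


def md5_prefix_answer(n: int, digits: int = 7) -> str:
    """Crypto tier, e.g. "md5(2615), first 7 hex digits".

    Assumption: the number is hashed as its ASCII decimal string.
    """
    return hashlib.md5(str(n).encode("ascii")).hexdigest()[:digits]


def check(expected: str, submitted: str) -> bool:
    """Compare ignoring surrounding whitespace and letter case."""
    return submitted.strip().lower() == expected.lower()


print(arithmetic_answer(23, 34), log_answer(32, 2), md5_prefix_answer(2615))
```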
It's completely dependent on the traffic of the site if a spammer takes the time to break a custom captcha.
I work on a site with 10 million monthly pageviews and spammers register on a form that has recaptcha and email verification... and we tried hidden input fields and other tricks, but each day we have consistently had 5 new spam accounts. With SVG they can just take a screenshot of what a user sees and send that to OCR. Complex math will turn away as many legitimate users as spammers.
The only real way to stop spam is to use a 3rd party API to detect it, or use something like a karma system that builds up over time. I think we're at the point where simple solutions won't work well unless you have a small site.
That's true when we talk about 10M monthly pageviews, but I doubt that this little extension will reach such popularity levels. If this somehow happens, by that time there will be a way to enable 3rd party captchas for any page.
The catch is that the text will be represented as small geometric svg shapes, so the spammer will need to first render the svg to png and then run text recognition tools. But in that svg we can easily add some css animations that make sure the entire image is never rendered, so spammers will need to run the entire browser to take screenshots and will need to assemble the image from multiple frames.
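One possible way to do the animation part, as a sketch only: split the squares into groups and give each group a CSS animation with a different negative delay, so that at any instant only one group should be visible. The class names, the `peek` keyframes and both helper functions are made up for illustration.

```python
def animated_style(groups: int, period_s: float = 2.0) -> str:
    """Build a <style> block in which each group is shown for 1/groups of the cycle."""
    share = 100.0 / groups
    rules = [
        # With step-end timing the opacity stays at 1 until `share`%, then drops to 0.
        f"@keyframes peek {{ 0% {{ opacity: 1 }} {share:g}% {{ opacity: 0 }} "
        f"100% {{ opacity: 0 }} }}"
    ]
    for i in range(groups):
        offset = i * period_s / groups
        # A negative delay starts the animation part-way through its cycle.
        delay = f"-{offset:g}s" if i else "0s"
        rules.append(
            f".g{i} {{ animation: peek {period_s:g}s step-end {delay} infinite }}"
        )
    return "<style>" + " ".join(rules) + "</style>"


def assign_groups(rects: list[str], groups: int) -> list[str]:
    """Tag each <rect ...> element with a group class, round robin."""
    return [
        r.replace("<rect ", f'<rect class="g{i % groups}" ', 1)
        for i, r in enumerate(rects)
    ]


print(animated_style(3))
```

A single static render of such an SVG would then capture only part of the squares, which is the property the comment is after. A headless browser taking several screenshots would still get around it.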
Won’t it have to be converted to a raster image before it can be OCRd?
Granted all you need to do is render it to a canvas but that’s an extra step on top of everything you need for a raster image, I’m not sure it’s easier.
And just rendering to canvas may be very tricky if the captcha is animated with css, i.e. it moves a bit and different parts of it appear at different times.
It can in theory be solved, but the more important question is: will it be solved?
I've used a bunch of one-word-answer questions for over a decade now for successful spam prevention. They are trivial to circumvent for a determined attacker with the time and resources (and similarly trivial for me to replace with something else).
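For illustration, a one-word-answer check can be as small as the sketch below. The questions and the `is_human` helper are invented examples, not the ones actually used on that site.

```python
# Hypothetical question bank: each question maps to its accepted one-word answers.
QUESTIONS = {
    "What colour is the sky on a clear day?": {"blue"},
    "How many legs does a cat have?": {"four", "4"},
}


def is_human(question: str, reply: str) -> bool:
    """Accept the reply if it matches any allowed answer, ignoring case and spaces."""
    return reply.strip().lower() in QUESTIONS.get(question, set())


print(is_human("How many legs does a cat have?", " Four "))
```

Swapping the question bank is a one-line change, which is what makes this cheap to rotate once bots catch on.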
This also means for a decade I didn’t ship my user data to google.
Unless you are a really juicy target fending off the bots is enough.
> Commento is a proud recipient of the Mozilla Open Source Support award. The $19,200 grant was given in recognition of Commento's contributions to make the internet more privacy-friendly.
That said, it uses Akismet for spam control, and I'm not sure how trustworthy that is... But definitely better than nothing!
Happy or not, I don't think many people appreciate the data leakage that happens through WordPress and its affiliated products (such as Akismet). Considering how negatively people comment about Google Analytics, I'm always surprised that WordPress & co constantly manage to slip under the radar. From https://automattic.com/privacy-notice/ (the owners of Akismet):
> Site Comments: When a visitor leaves a comment on a Site, we collect that comment, and other information that the visitor provides along with the comment, such as the visitor’s name and email address.
> Technical Data from a Visitor’s Computer and Etcetera: We collect the information that web browsers, mobile devices, and servers typically make available about visitors to a Site, such as the IP address, browser type, unique device identifiers, language preference, referring site, the date and time of access, operating system, and mobile network information.
> We may determine the approximate location of a visitor’s device from the IP address. We collect and use this information to, for example, tally for our Users how many people visit their Sites from certain geographic regions. If you’d like, you can read more about our Site Stats feature for WordPress.com sites and Jetpack sites.
> Akismet Commenter Information: We collect information about visitors who comment on Sites that use our Akismet anti-spam service. The information we collect depends on how the User sets up Akismet for the Site, but typically includes the commenter’s IP address, user agent, referrer, and Site URL (along with other information directly provided by the commenter such as their name, username, email address…oh, and the comment itself, of course).
> A cookie is a string of information that a Site stores on a visitor’s computer, and that the visitor’s browser provides to the Site each time the visitor returns. Pixel tags (also called web beacons) are small blocks of code placed on Sites. Automattic uses cookies and other technologies like pixel tags to help identify and track visitors and Site usage, and to deliver targeted ads
> We also collect any other information that our Users provide to us about visitors to their Sites. For example, a User may upload a directory or other information about Site visitors and customers to the “backend” administrative platform for managing the Site.
> How We Use Visitor Information
> We use information about Site visitors in order to provide our Services to our Users and their Sites. Our users may use our Services to, for example, create and manage their Site, sell products and services on their Site, flag and fight comments from spammers, and collect information through polls, quizzes and other surveys.
> We may also use and share information that has been aggregated or reasonably de-identified, so that the information could not reasonably be used to identify any individual. For instance, we may publish aggregate statistics about the use of our services.
Then there's Gravatar, which is another method of user tracking and I'm sure there's a slew of other ways.
Talkyard deals with spam by 1) letting you configure manual review of the first N comments, say first 3 comments, by a new user. If someone posts 3 on-topic comments, then likely s/he is not a spammer and no need to review, thereafter, unless flagged. And 2) Akismet + Google Safe Browsing, optionally. (I'm developing Talkyard, https://www.talkyard.io/blog-comments )
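The "review the first N comments" rule is easy to sketch. To be clear, this is not Talkyard's code, just a minimal illustration with hypothetical names (`Author`, `needs_review`) of the policy described above.

```python
from dataclasses import dataclass


@dataclass
class Author:
    approved_comments: int = 0  # comments by this user that a moderator approved


def needs_review(author: Author, flagged: bool = False, first_n: int = 3) -> bool:
    """Hold a comment for manual review if the author is new or the comment is flagged."""
    return flagged or author.approved_comments < first_n


print(needs_review(Author(approved_comments=0)))  # new user: hold for review
print(needs_review(Author(approved_comments=3)))  # trusted after 3 approved comments
```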
For high-traffic websites, just reviewing new comments by new users can be too much work, so it needs to be combined with something automatic like Akismet. That's based on what I've heard.
Or rather time-delay moderation, as it's easier to implement. Comments are added to the server as usual, but the web client shows them only after 1 hour.
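A minimal sketch of that client-side rule, assuming each comment carries a `posted_at` timestamp (a field name I made up): the server stores everything, and the client simply hides anything younger than the delay.

```python
from datetime import datetime, timedelta, timezone


def visible_comments(comments: list[dict], now: datetime,
                     delay: timedelta = timedelta(hours=1)) -> list[dict]:
    """Return only the comments that are at least `delay` old."""
    return [c for c in comments if now - c["posted_at"] >= delay]


now = datetime.now(timezone.utc)
comments = [
    {"text": "old enough", "posted_at": now - timedelta(hours=2)},
    {"text": "too fresh", "posted_at": now - timedelta(minutes=5)},
]
print([c["text"] for c in visible_comments(comments, now)])
```

The delay gives moderators a window to remove spam before anyone sees it, at the cost of slower conversations.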
That's right. My first attempt was to use IPFS or DAT. I figured out it's not quite possible, but we can get very close to that, in theory. Imagine the extension or the iframe could run an ipfs.js or dat.js that would discover all the http servers with comments via DHT: servers that want to participate publish a unique key to the DHT, and the web clients discover this key and then the IP addresses of the servers. In practice, this doesn't quite work because DHTs are based on the assumption that any node can quickly ping (with a UDP packet) any other node and thus perform the DHT discovery using the Kademlia algorithm in log(N) steps. But on the web, the only way to "ping" someone is to set up a p2p connection with WebRTC: this not only needs a signaling relay, but also implies a multi-step exchange with SDPs and has other costly overhead. And I haven't even approached the Symmetric NAT problem. This is why ipfs.js hogs CPU, allocates 1 GB and keeps 4-8 sockets always open (they aren't even p2p now, but rather web sockets to some relay, for perf reasons).
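For readers who haven't seen Kademlia: the log(N) lookup relies on an XOR distance between node IDs, and every hop of the lookup requires contacting the closest nodes known so far. A toy sketch of the metric with small integer IDs (real IDs are much longer, and the function names here are my own):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs is their bitwise XOR."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Index of the routing bucket for other_id: position of the highest differing bit."""
    d = self_id ^ other_id
    if d == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return d.bit_length() - 1


def closest(target: int, known_ids: list[int], k: int = 3) -> list[int]:
    """Pick the k known nodes nearest to the target; a lookup queries these next."""
    return sorted(known_ids, key=lambda n: n ^ target)[:k]


print(closest(5, [1, 4, 7, 12], k=2))
```

Each round of `closest` implies a round of pings to those nodes, which is cheap over UDP and expensive when every "ping" is a WebRTC handshake.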
It is technically centralized but with the option to self host the comments server. Given that the source code is out there for a minimalist comments server, that saves a ton of work for anyone who just wants to make a blog/social site and doesn't necessarily want to build a full fledged REST service for it.
https://github.com/comntr/http-server/blob/master/src/handle...