We have two perspectives: the normal user and the scraper. The normal user acquires either a /64, a /56 or a /48, depending on the whim of their ISP. There is typically no cost difference between these options, which means that the scraper (or their upstream proxy) always chooses a provider which offers a /48.
Thus, the default unit of IPv6 blocking must be a /48. This situation will persist for as long as /48s are readily available at the same price as /64s.
Perversely, the reason we don't have this issue in IPv4 space is because the address space is of the same order of magnitude as the number of potential users. That artificial scarcity means that a routable /24 is 256 times the cost of an end-user /32, so the unit of blocking can be a /32.
> Perversely, the reason we don't have this issue in IPv4 space is because the address space is of the same order of magnitude as the number of potential users.
Do we not? And is it really?
There are some /32 IPv4 addresses hosting many users, e.g. with CG-NAT, and it's already an issue with regard to blocking/rate-limiting.
Just like there are single-user /48s and multi-user /64s, there can be single-user /32s and /32s hosting tons of users behind a CG-NAT.
Sure, but that's the same argument I'm making: the unit of blocking will be the largest unit that is routinely allocated to a single user. In IPv4 space that's a /32, so people block by /32. In IPv6 space that's a /48, so people block by /48. Check out Let's Encrypt's rate limit policy, for example.
The difference I'm pointing out between IPv4 and IPv6 is that nobody is giving single IPv4 users /24s for their own use. But IPv6 /48s (which are theoretically somewhat equivalent to IPv4 /24s) are freely available. This is a problem because it makes over-blocking even more likely than it already is. And as you point out elsewhere, over-blocking is already an issue in IPv4 space.
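To put rough numbers on the /24 versus /48 comparison, here is a small sketch using Python's standard `ipaddress` module. The two prefixes are documentation ranges picked purely for illustration, and the variable names are my own.

```python
import ipaddress

# Documentation prefixes, chosen only as examples.
v4_block = ipaddress.ip_network("192.0.2.0/24")
v6_block = ipaddress.ip_network("2001:db8::/48")

# A /24 covers 256 individual /32 addresses.
v4_hosts = v4_block.num_addresses

# A /48 covers 2**16 separate /64 LANs and 2**80 individual addresses.
v6_lans = 2 ** (64 - v6_block.prefixlen)
v6_addresses = v6_block.num_addresses

print(v4_hosts, v6_lans, v6_addresses)
```

So blocking one /48 sweeps up to 65,536 /64 networks at once, which is why over-blocking matters so much when some providers put many users inside a single /48.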
If you have two addresses in the same /64, you know almost certainly they are on the same LAN.
If you have two addresses in neighboring /64s (same initial 63 bits), or in general within the same /48 (same initial 48 bits), you know almost certainly that they are somehow within the same organization. They could be within the same building, or in the same company, or using the same ISP or cloud provider; you don't know, but they are somehow related. How do you know? Well, since prefixes longer than a /48 are generally filtered in BGP and so a /49 isn't individually routable in practice, they have to _somehow_ originate at the same upstream network. There has to be some sort of cooperation between them (possibly through an ISP as a middleman).
But if they are in _neighboring_ /48s, you don't have this kind of guarantee. They could be from completely different organizations. Most likely, they are on the same continent (since they were given out by the same regional internet registry, or RIR), but even that is not really guaranteed.
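The three cases above (same /64, same /48, different /48s) come down to counting how many leading bits two addresses share. A minimal sketch, assuming Python's standard `ipaddress` module; the function names and the returned labels are hypothetical, and the labels only restate the heuristic, they are not guarantees.

```python
import ipaddress

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv6 addresses (0 to 128)."""
    x = int(ipaddress.IPv6Address(a))
    y = int(ipaddress.IPv6Address(b))
    # The highest set bit of the XOR marks the first position where they differ.
    return 128 - (x ^ y).bit_length()

def likely_relationship(a: str, b: str) -> str:
    """Apply the /64 and /48 rules of thumb described above."""
    shared = common_prefix_len(a, b)
    if shared >= 64:
        return "same /64"    # almost certainly the same LAN
    if shared >= 48:
        return "same /48"    # probably the same organization or upstream
    return "different /48"   # no relationship can be assumed

print(likely_relationship("2001:db8:1:2::1", "2001:db8:1:3::1"))
```

The two example addresses sit in neighboring /64s and share 63 leading bits, so they land in the "same /48" case.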
So when you are bucketing addresses for rate limiting purposes, a /48 is a reasonable place to start doing that, just like /24 is for IPv4. Of course, you may need to get smarter than that (e.g. an attacker could have access to a /32), but it's a reasonable starting spot.
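As a starting point, that kind of bucketing might look like the sketch below. It assumes Python's standard `ipaddress` module, uses the /24 and /48 defaults from the comment above, and leaves out time windows and expiry entirely; `BUCKET_PREFIX`, `bucket_key` and `PrefixRateLimiter` are names made up for this example.

```python
import ipaddress
from collections import Counter

# Assumed starting prefixes: /24 for IPv4, /48 for IPv6.
BUCKET_PREFIX = {4: 24, 6: 48}

def bucket_key(addr: str) -> str:
    """Map an address to the network used as its rate-limit bucket."""
    ip = ipaddress.ip_address(addr)
    prefix = BUCKET_PREFIX[ip.version]
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))

class PrefixRateLimiter:
    """Counts requests per bucket; a real limiter would also expire counts."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts = Counter()

    def allow(self, addr: str) -> bool:
        key = bucket_key(addr)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = PrefixRateLimiter(limit=2)
results = [limiter.allow(a) for a in
           ("2001:db8:1:1::1", "2001:db8:1:2::1", "2001:db8:1:3::1", "2001:db8:2::1")]
print(results)
```

The first three addresses fall in the same /48, so the third request is refused, while the fourth comes from a different /48 and gets its own bucket. Getting "smarter than that" would mean adjusting `BUCKET_PREFIX` or adding wider buckets on top.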
> So when you are bucketing addresses for rate limiting purposes, a /48 is a reasonable place to start doing that, just like /24 is for IPv4.
I've encountered assumptions such as this one as a user, and they're really frustrating.
More than once I've found myself banned from logging in, viewing a site etc. because of the bad behavior of somebody else I temporarily share a CG-NAT or large public Wi-Fi network with, or more likely because somebody topologically close to me got hacked.
Meanwhile, actual attackers are using pretty much the entire IPv4 space worth of compromised embedded devices spread across the globe...
gnfargbl didn't say the _address_ was unroutable, but that effectively, routing policies mean that /48 is a common minimum unit for administrative purposes (similar to how /24 has a special “minimum size” meaning for IPv4).