IPv6 has a different representation format, where one (and only one) run of consecutive all-zero fields may be replaced by a single '::', turning '2001:0:0:0:0:1:0:2' into '2001::1:0:2'.
I never quite understood why they chose to reuse the colon as a separator character, because this is problematic when you want to append a port number. '2001::1:0:2:8080' could either be '2001:0:0:0:0:1:0:2 on port 8080', or '2001:0:0:0:1:0:2:8080 on the default port'. The following (ugly) syntax is used to resolve that ambiguity: [2001::1:0:2]:8080.
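For anyone who wants to poke at this, here is a minimal sketch using Ruby's standard ipaddr and uri libraries. The helper names expand_ipv6 and host_and_port are made up for illustration; the point is only that the brackets let a parser tell where the address stops and the port starts.

```ruby
require "ipaddr"
require "uri"

# Expand a compressed IPv6 address to all eight zero-padded groups.
# (expand_ipv6 is a hypothetical helper name.)
def expand_ipv6(text)
  IPAddr.new(text).to_string
end

# Split a URL into [address, port]. The brackets mark where the
# address ends, so the trailing ":8080" is unambiguous.
# (host_and_port is a hypothetical helper name.)
def host_and_port(url)
  uri = URI.parse(url)
  [uri.hostname, uri.port]
end

puts expand_ipv6("2001::1:0:2")
p host_and_port("http://[2001::1:0:2]:8080/")
```

Without the brackets there is no way for the parser to know whether the last group belongs to the address or is a port.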
More IPv6 hackery: when using a link-local address (every interface has a link-local IPv6 address), you can append a zone index to the address inside the brackets to say which interface the link-local request should go out on. On Windows it's the interface number; on Unix-like systems it's the interface name.
To test this out, first find a link-local address on your network. To find all link-local addresses on eth0, use this command to ping the all-nodes link-local multicast group:
ping6 -I eth0 -c 2 ff02::1
Now append the percent sign and zone index to the address and connect to the host on a port that is probably open.
This is not true. 0.0.0.0 may end up at localhost for you, but it's not localhost. I just get:
PING 0.0.0.0 (0.0.0.0): 56 data bytes
ping: sendto: No route to host
From what I can tell, the use of '0' or '0.0.0.0' as a destination address is not official. It should only be used as a source address. From RFC 5735 - Special Use IPv4 Addresses:
0.0.0.0/8 - Addresses in this block refer to source hosts on "this"
network. Address 0.0.0.0/32 may be used as a source address for this
host on this network; other addresses within 0.0.0.0/8 may be used to
refer to specified hosts on this network.
Support for it looks to be flaky. 'ping 0' and 'ping 0.0.0.0' work for me on various Linuxes ("64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.026 ms"), but not on Windows 7 ("sendto: Cannot assign requested address") or Windows Server 2003 ("Destination specified is invalid").
My understanding is that 0.0.0.0 is the opposite of localhost as binding a daemon to 0 would bind that service to every interface and on any available IP. This means if you have MySQL (for example) listening on 0, it's not just listening on your localhost address but also on any WAN / LAN IPs you have as well.
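A quick way to see the difference is to bind a throwaway listener to each address and ask the socket where it ended up. This is only a sketch: listening_address is a hypothetical helper, port 0 just asks the OS for any free port, and it assumes your environment lets you open sockets.

```ruby
require "socket"

# Bind a temporary TCP listener and report the address it listens on.
# (listening_address is a hypothetical helper name.)
def listening_address(bind_host)
  server = TCPServer.new(bind_host, 0) # port 0: let the OS pick a free port
  server.local_address.ip_address
ensure
  server&.close
end

puts listening_address("127.0.0.1") # loopback only
puts listening_address("0.0.0.0")   # wildcard: every local IPv4 address
```

A service bound to the wildcard address accepts connections arriving on any of the machine's IPv4 addresses, which is why it matters for something like a database daemon.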
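#!/usr/bin/ruby
# Pass a dotted-decimal IPv4 address as the first argument.
# Prints it as single-number decimal, hex, and octal URLs.
a, b, c, d = ARGV[0].split('.').map(&:to_i)
# Pack the four octets into one 32-bit integer, most significant first.
i = a * 256**3 + b * 256**2 + c * 256 + d
puts "dec: http://#{i}"
puts "hex: http://0x#{i.to_s(16)}"
puts "oct: http://0#{i.to_s(8)}"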
Looks like Internet Explorer recognizes them all as the same address (when I hover over the link, it shows the link target in dotted decimal). Chrome does the same thing for 1, 3, and 4 but doesn't like the number that overflows 32 bits. Firefox thinks they're all different.
I recognize the other formats, but what format is "http://7393249866/"? It opens about:blank in chrome. How is it distinguished from the format above it?
7393249866 mod 2^32 is 3098282570. I'm not convinced you can do that - although it works in Firefox, telnet and ping don't like it.
Edit: I highly doubt that this is standard behavior. I suspect that implementations that permit this are storing the number in a 32 bit integer and not checking to see that it was truncated.
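To make the arithmetic concrete, here is a small sketch of what an unchecked 32-bit store would do to that number. The helper names are mine, and this only illustrates the suspected truncation; it is not a claim about how any particular browser is implemented.

```ruby
# Keep only the low 32 bits, as an unchecked 32-bit store would.
# (truncate_to_32_bits is a hypothetical helper name.)
def truncate_to_32_bits(n)
  n & 0xFFFFFFFF
end

# Render a 32-bit integer as a dotted quad, most significant octet first.
# (to_dotted_quad is a hypothetical helper name.)
def to_dotted_quad(n)
  [24, 16, 8, 0].map { |shift| (n >> shift) & 0xFF }.join(".")
end

wrapped = truncate_to_32_bits(7393249866)
puts wrapped                 # 3098282570
puts to_dotted_quad(wrapped) # 184.172.10.74
```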
http://7393249866/ doesn't seem to work in Chrome, as it doesn't believe it's a link. Entering it directly in the URL bar doesn't work either.
All of the others are shown as `184.172.10.74` on mouse-over.
Went to RSnake's website and encoded the address of Facebook to post on my wall. Facebook does not let me post it.
After 3 or 4 tries, it blocked my account.
That article is focused on ping and glibc but the reason why this works is fundamental to IPv4 itself.
All IPv4 addresses are 32-bit addresses. Each of the four numbers in the notation people are familiar with is called an octet because it represents 8 bits of the 32-bit address. So if you have 127.0.0.1, then the first octet is 127, the second is 0 and so on.
Any representation of this 32 bit number (including the most familiar A.B.C.D format) is a kind of short-hand for representing that 32 bit number.
I tried finding a link to a particular site that was phenomenal for learning the entire TCP/IP stack but I haven't seen that site since at least 1998.
But basically, if you want to spend time learning about IPv4 right before IPv6 becomes the dominant standard, then look into how hosts use the combination of IP address and subnet mask to determine if something is on the local network, how hosts use ARP to translate a 32 bit IP address into a MAC address and all the other low level protocols.
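The subnet check in particular is just bit masking on those 32-bit numbers. A minimal sketch, assuming IPv4 only and a mask given in dotted form (same_subnet? is a hypothetical helper name):

```ruby
require "ipaddr"

# Two hosts are on the same subnet when their addresses agree on
# every bit that the mask covers.
# (same_subnet? is a hypothetical helper name.)
def same_subnet?(local, remote, mask)
  l, r, m = [local, remote, mask].map { |text| IPAddr.new(text).to_i }
  (l & m) == (r & m)
end

puts same_subnet?("192.168.1.10", "192.168.1.200", "255.255.255.0") # true
puts same_subnet?("192.168.1.10", "192.168.2.1", "255.255.255.0")   # false
```

If the check passes, the host resolves the destination's MAC address directly; if not, it sends the packet to its gateway instead.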
No, it works because the tool allows it by virtue of how it's implemented. Find a different ping or inet_aton implementation and it won't necessarily work.
Interesting, I went searching and learned something new. You are right. Apparently the textual representation of IP addresses was not properly specified in any RFC, so this is in fact a quirk of the implementation.
"Find a different ping or inet_aton implementation and it won't necessarily work."
At best, you seem to be picking nits. More likely though I think you just have a very surface level understanding of what is going on here.
The bottom line is that, as I said already, an IPv4 address is a 32 bit number. It's also true that various tools that communicate with IP addresses utilize different methods to arrive at the underlying 32 bit number. But it's not right to say that "this only applies to systems using ping from iptools and glibc" as the article you linked says. Many of these "tricks" worked in all manner of operating systems and applications. I was discussing many of them in IRC using Windows 3.1 and Trumpet Winsock before 1994.
Again, the bottom line is that an IPv4 IP address is a 32 bit number and there are dozens of ways that various applications allow you to arrive at that number. It's silly for an article or someone quoting it to attempt to say it only applies to one implementation.
I rather think they're talking about different things.
Rachel analyzed things from the bottom up, seeing the scope of how much worked from the implementation. But the flexibility of an implementation doesn't tell you much about the specification, as implementations are usually more lenient and broad than the specification.
300bps is talking about how in_addr is a 32-bit structure and indeed contains s_addr; IPv4 addresses are fundamentally 32-bit unsigned integers.
To actually shed some light on this one would need to look into the specified encoding of in_addr in URLs.
Although the URI syntax for IPv4address only allows the common
dotted-decimal form of IPv4 address literal, many implementations
that process URIs make use of platform-dependent system routines,
such as gethostbyname() and inet_aton(), to translate the string
literal to an actual IP address. Unfortunately, such system routines
often allow and process a much larger set of formats than those
described in Section 3.2.2.
For example, many implementations allow dotted forms of three
numbers, wherein the last part is interpreted as a 16-bit quantity
and placed in the right-most two bytes of the network address (e.g.,
a Class B network). Likewise, a dotted form of two numbers means
that the last part is interpreted as a 24-bit quantity and placed in
the right-most three bytes of the network address (Class A), and a
single number (without dots) is interpreted as a 32-bit quantity and
stored directly in the network address. Adding further to the
confusion, some implementations allow each dotted part to be
interpreted as decimal, octal, or hexadecimal, as specified in the C
language (i.e., a leading 0x or 0X implies hexadecimal; a leading 0
implies octal; otherwise, the number is interpreted as decimal).
These additional IP address formats are not allowed in the URI syntax
due to differences between platform implementations.
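The rules in that passage are easy to sketch by hand. The following is my own illustrative parser written from the description above, not the code of any real inet_aton(); lenient_ipv4 and parse_c_number are hypothetical names, and real implementations differ in what they accept and in how they handle out-of-range input.

```ruby
# Parse one part using C integer-literal rules:
# 0x/0X prefix means hex, a leading 0 means octal, otherwise decimal.
def parse_c_number(text)
  case text
  when /\A0[xX]\h+\z/ then text.hex
  when /\A0[0-7]*\z/  then text.oct
  when /\A[1-9]\d*\z/ then text.to_i
  else raise ArgumentError, "not a number: #{text.inspect}"
  end
end

# Lenient parser in the style the RFC describes; returns the 32-bit
# address as an Integer. This sketch rejects overflow, not truncates.
def lenient_ipv4(text)
  parts = text.split(".", -1).map { |part| parse_c_number(part) }
  raise ArgumentError, "expected 1 to 4 parts" unless (1..4).cover?(parts.size)

  *leading, last = parts
  # Each leading part is one byte; the last part fills the bytes that remain.
  raise ArgumentError, "leading part out of range" if leading.any? { |p| p > 0xFF }
  raise ArgumentError, "last part out of range" if last >= 256**(4 - leading.size)

  leading.each_with_index.sum { |p, i| p << (24 - 8 * i) } + last
end

puts lenient_ipv4("127.1")               # 2130706433, i.e. 127.0.0.1
puts lenient_ipv4("0xb8.0xac.0x0a.0x4a") # 3098282570
```

Note that this version raises on a value too large for the remaining bytes, so it would refuse 7393249866 rather than wrap it around.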
A host identified by an IPv4 literal address is represented in
dotted-decimal notation (a sequence of four decimal numbers in the
range 0 to 255, separated by "."), as described in [RFC1123] by
reference to [RFC0952]. Note that other forms of dotted notation may
be interpreted on some platforms, as described in Section 7.4, but
only the dotted-decimal form of four octets is allowed by this
grammar.
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

dec-octet   = DIGIT                 ; 0-9
            / %x31-39 DIGIT         ; 10-99
            / "1" 2DIGIT            ; 100-199
            / "2" %x30-34 DIGIT     ; 200-249
            / "25" %x30-35          ; 250-255
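That grammar translates almost directly into a regular expression. A sketch, with constant and method names of my own choosing:

```ruby
# One decimal octet, 0-255, with no leading zeros (mirrors dec-octet).
DEC_OCTET = /(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)/

# Exactly four octets separated by dots (mirrors IPv4address).
IPV4_ADDRESS = /\A#{DEC_OCTET}(?:\.#{DEC_OCTET}){3}\z/

# True only for the strict dotted-decimal form the URI grammar allows.
# (strict_ipv4? is a hypothetical helper name.)
def strict_ipv4?(text)
  IPV4_ADDRESS.match?(text)
end

puts strict_ipv4?("184.172.10.74")       # true
puts strict_ipv4?("3098282570")          # false
puts strict_ipv4?("0270.0254.0012.0112") # false
```

Anything that fails this check is, by the grammar, not an IPv4 literal in a URI, whatever the platform's resolver may make of it.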
Whenever a user inputs the identity of an Internet host, it SHOULD
be possible to enter either (1) a host domain name or (2) an IP
address in dotted-decimal ("#.#.#.#") form. The host SHOULD check
the string syntactically for a dotted-decimal number before
looking it up in the Domain Name System.
So in this light, the headline on this page is both right and wrong. 3098282570 is indeed a valid IPv4 address; it is within the range of a 32-bit unsigned integer. It's not required or recommended to be recognized as such in host name contexts, though, and http://3098282570/ is not a valid URI.
I don't care who someone is because I evaluate individual statements on their own merits. In Rachel's own statement, she just learned "why this worked" "the last time it came up on HN". I got my first modem in 1985 (have you noticed my username?) and have been in IT for longer than most HN users have been alive. But feel free to tell me how I don't have a right to discuss something because I'm correcting the Rachel Kroll.
Agreed. I read that statement with the same disregard. Unless she wrote the entirety of the ip stack specification, the argument is moot. Not one single person does or can know every piece of the technology puzzle. This amounts to calling me stupid for using gnome because Linus likes KDE, or more recently vice versa.
rachelbythebay didn't say it only applies to one implementation. She said it was implementation specific. A subtle distinction, but she's correct. Just because something is fundamentally a 32-bit number, that doesn't automatically mean that all possible representations of that number will be converted to it. They have to be implemented to do so.
"if you want to spend time learning about IPv4 right before IPv6 becomes the dominant standard"
Not as much is changing as you think, at least for basic concepts and intro level stuff. Simpler in some ways.
To kick over a real beehive: as a guy who's used IPv6 for, I dunno, a decade or so now, the main issues are these. NAT6: love it (noobs) or hate it (folks who know what they're doing); believe it or not, it's easier to make a stateful firewall than to do NAT. Some folks have bizarre ideas about IPv6 netmasks and what other people should be permitted to have. The concept of RFC 1918 in IPv6 is more complicated. Router advertisements (RA): love them or filter them. You probably need DHCPv6, or maybe not. And some firewall nuts filter DNS traffic down to packets too small to hold AAAA records (but large enough to hold A records). Other than that, IPv6 is just obese IPv4: the concepts of subnet and mask aren't changing, and you still resolve addresses to MACs, just with Neighbor Discovery over ICMPv6 instead of ARP...
(edited to add my favorite topic, hand editing text zone files is probably not the most scalable way to implement reverse (or forward!) DNS for ipv6 from pure PITA perspective. Then again doing things by hand instead of automating is a PITA for ipv4 not just ipv6.)
The submission is a little misleading; it's traditional on HN to link to an article discussing the principle, rather than simply a demonstration. However, the OP highlights an interesting point. There are other, analogous, unexpected behaviours like this one, such as numeric constants with leading zeros being interpreted as octal, that arise from the ubiquity of the C standard library. You see them in Java all over the place.
Edited to Add:
Don't think good penetration testers (and malware writers, alas) don't already know about this trick. But, as http://xkcd.com/1053/ pointed out, it's always great to teach new people old things.
Yeah, some years back there was also some C code in an issue of 2600 (or maybe a bash script? I can't remember) that obfuscated URLs in this manner. The article that the script was attached to actually used school URL filtering as the example use case.
Yeah I've made a similar post here before and got a few upvotes. https://news.ycombinator.com/item?id=5763739 It's worth noting that the address isn't converted by the browser, but by the network stack.
The downvoters probably found a lot of pr0n and warez and (or?) unsavory content at that address. Imagine how painful it would be if you clicked on that and saw the legendary goatse, or tubgirl.
An IPv4 address is a 32 bit string of bits. That 32 bit value can be represented in many different ways, but none of those ways are IPv4 addresses. The actual address is that string of 32 bits, and that is what is used in the TCP stack inside your OS or your router.
The inverse of this actually bit me when I was porting my company's app from iOS to Windows 8. We have a custom URI scheme to open stories in our app that looks like foo://<int>. On iOS, we just take the bit after the scheme and use it directly. Windows, however, was giving it to us as a dotted-decimal string, so we had to convert back from that to the integer representation before using it. That was the day I learned that these different kinds of representations could be used and to not assume anything about URIs.
I first discovered these 10-digit decimal-formatted IPv4 addresses about a year ago when testing/analyzing Android apps. I created a couple of quick Python scripts to convert back and forth.
This is why it's a good idea to use getaddrinfo and getnameinfo for IP address validation and normalization. While we're at it, use sockaddrs to represent addresses. in6_addr doesn't hold the IPv6 address scope/zone ID; sockaddr_in6 carries it in sin6_scope_id.
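In Ruby the same idea looks roughly like this. It is a sketch: normalize_address is a hypothetical helper, and which non-standard spellings getaddrinfo accepts depends on the platform's resolver, so only plain literals are shown.

```ruby
require "socket"

# Normalize an address literal without doing any DNS lookup.
# AI_NUMERICHOST makes getaddrinfo fail instead of resolving a hostname.
# Returns nil when the text is not a numeric address.
# (normalize_address is a hypothetical helper name.)
def normalize_address(text)
  info = Addrinfo.getaddrinfo(text, nil, nil, :STREAM, nil, Socket::AI_NUMERICHOST).first
  info.ip_address
rescue SocketError
  nil
end

p normalize_address("127.0.0.1")
p normalize_address("2001:0:0:0:0:1:0:2")
p normalize_address("not an address")
```

Comparing the normalized strings (or the Addrinfo objects themselves) avoids treating two spellings of one address as different hosts.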
The fact that this is 'news' or a 'trick' to some people says a lot about the current state of HN. IP addresses have integer representations by design! Are the commenters here really so removed from actually doing work that they've never seen a database that stores IP addresses as integers? (Hint: ALL of the good ones do). That's to say nothing of the fact that there are clearly many people here who don't understand what binary is.
At least it's good to know that the radical, ridiculous, Reddity mentality that's been plaguing HN as of late is coming from a clearly different group of people than those who used to comment. Looks like it's time to move on to greener pastures.
I haven't been on Hacker News long enough to speak definitively about the current state of HN (a bit less than two years, but I spent the first one lurking), but I would be more concerned about comments that complain about 'HN changing' (as someone who was on Reddit to hear complaints about all the Diggers... then all the 4chaners.. then all the high schoolers..) than content considered 'elementary' to smarter programmers than I.
I notice that you haven't submitted any links. Is there any particular reason why? If you're not pleased with the content on the site, why don't you submit better content?
(For what it's worth: this content is news for me. I never spent much time thinking about IP addresses before clicking the link and getting confused for a solid minute. I understand what binary is, though, I promise!)
Almost all the other comments in this thread are people spreading knowledge about technology. Sure, maybe you already knew it. Maybe things were better "back in the day". People having a genuine interest in stuff like this just really doesn't seem worth bemoaning. Isn't it better to be happy that someone is learning something you're familiar with rather than being upset that they didn't already know it?
Even though I know that IP addresses are just 32 bits of data, I find it very interesting that browsers support these representations out of the box. There is no reason for them to, and from a security point of view it is even detrimental.
Here are some other encodings you can use for the same address; they're all valid and recognised by most browsers and other software:
http://3098282570/
http://7393249866/
http://0xb8.0xac.0x0a.0x4a/
http://0270.0254.0012.0112/
The source for this comment is an XSS filtering bypass tool by RSnake — http://ha.ckers.org/xsscalc.html
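If you want to produce the same four spellings for another address, something like this works. It is my own sketch, not the code behind the linked tool, and alternate_forms is a hypothetical name.

```ruby
# List alternative spellings of a dotted-decimal IPv4 address.
# (alternate_forms is a hypothetical helper name.)
def alternate_forms(dotted)
  octets = dotted.split(".").map { |part| Integer(part, 10) }
  value = octets.inject(0) { |acc, octet| (acc << 8) | octet }
  [
    value.to_s,                                        # single 32-bit decimal
    (value + 2**32).to_s,                              # overflowed; relies on truncation
    octets.map { |o| format("0x%02x", o) }.join("."),  # dotted hex
    octets.map { |o| format("0%03o", o) }.join(".")    # dotted octal
  ]
end

alternate_forms("184.172.10.74").each { |form| puts "http://#{form}/" }
```

The second form only works where the parser silently drops the bits above 32, so expect it to fail more often than the others.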