Hacker News
Apple might be running a web crawler written in Go (moesen.nu)
169 points by beltex on Nov 6, 2014 | 83 comments



Not at all sure why it's relevant which programming languages they use at Apple.

Is this considered interesting because it somehow proves the mainstream adoption of Go? Or because Apple, a competitor of Google's in cell phones, is using one of Google's technologies? Or what?

The takeaway for me was that Apple has a whole /8 subnet to themselves. That's just ... immense, for a single company. Gaah.

EDIT: Mis-typed the netmask, I meant /8 but typed /24. Fixed.


Back around 1990 I had a class /16 and I was just a 14-year-old. Before the web explosion it was a different time. Domains used to be free back then too!


I was still playing with toy cars back then, but how did people lose those IPs? Were they revoked?


I still have a /24 from the era.


Eeek, the "C" in CIDR stands for classless, so it is either a Class B or a /16, not a "class /16".

Also, it is highly unlikely that you had a /16 as a PI allocation even in 1990.


Your nick is deliberately obscene. I'm not a prude, but it doesn't fit the tone of the site.


There are no words that are obscene.


Class Bs for persons still existed in 1998.


>The takeaway for me was that Apple has a whole /24 subnet to themselves. That's just ... immense, for a single company. Gaah

(I think you mean /8.) Brace yourself: http://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_addres...


People are curious simply because Apple is so damn secretive. I'm not sure I'd ever want to work there because it seems I would be unable to talk to anyone about my projects, ever.


A /24 for a company of their size is nothing - they actually have the entire 17.0.0.0/8 to themselves: http://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_addres...


Less about 'of their size' than about how long they've been around, since back in the days when a large company could get a /8, and just about anyone could get a /16.


Interesting. I know Google uses the IP 8.8.8.8 (and 8.8.4.4) for their public DNS servers. So it seems Level3 doesn't own the entire 8.0.0.0/8 block then.


They do, but they reassign some of them to other parties: http://whois.arin.net/rest/net/NET-8-0-0-0-1/children
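A quick way to check which block an address falls in is Go's net/netip package (Go 1.18+). This is only a sketch: the 8.8.8.0/24 child block below is an assumption for illustration, not a claim about the exact reassignment listed in ARIN's data, and `within` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/netip"
)

// within reports whether addr falls inside the CIDR block prefix.
func within(prefix, addr string) bool {
	return netip.MustParsePrefix(prefix).Contains(netip.MustParseAddr(addr))
}

func main() {
	fmt.Println(within("8.0.0.0/8", "8.8.8.8"))  // true: inside Level3's /8
	fmt.Println(within("8.8.8.0/24", "8.8.8.8")) // true: inside the (assumed) reassigned child block
	fmt.Println(within("8.8.8.0/24", "8.8.4.4")) // false: 8.8.4.4 sits in a different /24
}
```

Both answers can be true at once: a more specific child block is simply carved out of the parent /8.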


Yeah, that was what I meant, brain error. Thanks, I edited my comment.


Go is not really a "Google" technology. It's just a programming language like any other. There's no reason Apple shouldn't use it, if it's the right tool for the job.


Well, it'd be pretty weird if Go-ogle started developing their applications in Swift.


Actually, Google does a lot of iOS development and has dozens of shipping iOS apps (https://itunes.apple.com/us/artist/google-inc./id281956209), and thus has a pretty sizable community of Objective-C programmers. So Google will almost certainly be shipping apps containing Swift code as soon as the tools stabilize enough for them to do so.


I think they're mechanically translating their Java code to Objective C:

https://github.com/google/j2objc



I don't think it would be weird if they developed OSX/iOS applications in Swift.


I'd be surprised if they didn't use Swift for their iOS apps.


They're not using Swift at the moment. But they might at some point in the future. Source: friend who works at Google.


Apple simply doesn't have its own solutions for server-side development. Google has its own languages for both client (Dart) and server (Go).


The software tech Apple uses generally doesn't seem to get as much publicity as that of most large tech companies. For example, I know secondhand that Apple uses Angular.js, but I have never seen anything in the wild about it.

One thing to keep in mind though is that Apple is much more a hardware company than software.


For a hardware company, they write a crapload of software... like Mac OS, iOS, and all the built-in applications. I'd say they probably do a lot more on software than hardware, actually. They only release a few pieces of hardware a year.


It's not difficult to argue that reliable hardware is only a delivery mechanism to enable the usage of Apple's software, however I doubt users would purchase a "Samsung iPhone" or "LG iPhone" in the same manner that they do for Android devices.

The Apple hardware brand, the hardware itself, and the software all together sustain the user demographic.


Oh sure, the product they sell is the integration between hardware and software. But saying they are "much more" a hardware company than a software company is selling them pretty short.


"Apple views itself as a software company."

- Steve Jobs

https://www.youtube.com/watch?v=dEeyaAUCyZs


Certainly doesn't come across as one. UX/hardware comes to mind first.


A /24 is only 255 addresses...


You might find this guide to calculating subnets helpful. ;)

http://www.dslreports.com/faq/cisco/30.1_Quick_subnet_calcul...


Maybe the poster expected Apple to eat their own dogfood? They have ObjC and Swift, and Google is one of their major competitors, in the mobile space, yes, but also in the pop-language space.

A /24 is also only 256 addresses, so really unremarkable. I think you are reading it backwards. MIT has an /8, that's the large one.
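The sizes follow directly from the prefix length: a /n IPv4 block holds 2^(32-n) addresses. A tiny Go sketch of that arithmetic (`hostCount` is just an illustrative name):

```go
package main

import "fmt"

// hostCount returns how many IPv4 addresses a /bits prefix covers: 2^(32-bits).
func hostCount(bits int) uint64 {
	return 1 << uint(32-bits)
}

func main() {
	for _, bits := range []int{24, 16, 8} {
		fmt.Printf("/%d -> %d addresses\n", bits, hostCount(bits))
	}
	// Prints 256, 65536 and 16777216 addresses respectively.
}
```

So a /24 is 256 addresses in total (254 usable hosts in a traditional subnet once the network and broadcast addresses are set aside), while a /8 is 16,777,216 addresses, or 1/256 of the entire IPv4 space.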


Mea culpa, I see that Apple does indeed have an /8.


HP has two /8 subnets. What's even weirder is that Eli Lilly (the pharmaceutical company) and Halliburton (the oil company) also both have one.

(as others have said, you mean /8)


It seems like organisations that were early adopters of the internet got one.


Relevant XKCD comic "Map of the internet": http://xkcd.com/195/


I agree. "Breaking news: Apple uses Google products"


I noticed this as well a couple of weeks ago.

They're still actively visiting every page on the websites that are associated with our iOS apps.

Today alone (starting shortly before 8am CET) they've crawled over 8000 pages on https://trails.io and http://offmaps.com — without sending conditional caching headers (our pages don't change too often).

    $ grep ^17\\. /var/log/nginx/*.log | grep -E 'Go|Fetcher' | wc -l
    8254

Note that they don't appear to be scraping any URLs available from within the apps themselves, but rather the company/support websites linked in our App Store listings.

I guess they're automatically scanning for objectionable content, since these websites are linked from the App Store and the iTunes website?


Ooh, that's interesting. They already increased the requirements for online privacy policies etc. for iOS 8 apps, so perhaps we'll see a crackdown on App Store apps that are missing them.


The Go http client definitely has a bug that doesn't maintain the User-Agent across redirected requests.

https://code.google.com/p/go/source/browse/src/net/http/clie...


Looks like issue #4800 is the place where it's tracked:

https://code.google.com/p/go/issues/detail?id=4800


Go is a fantastic language for writing a crawler. I wrote the Showyou crawler in Go, and it's both one of our highest-load processes (and it's very efficient, trust me) and one of our most stable.


I've seen golang mentioned in Apple job listings. I didn't think this was a secret.


I've got several thousand requests for this in my logs with an IP address in the same /24.

Two weeks ago it was using a different user agent:

    Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5376e Safari/8536.25 (compatible; Fetcher/0.1)
This string mostly matches iOS 6.0 beta 4, but with the same suffix the author discovered.


It looks more likely that someone's learning Go and going through the crawler example in the tutorials. Of course I may be wrong here. The trigger for me was the term "Fetcher" here: http://tour.golang.org/#73

Fun exercise though.


As a former employee, I had the same first thought when I read the post. Every once in a while, I would stay late and work on a little personal project for fun. Sometimes one of my friends would be working late, and I'd just keep him/her company. After all, if I'm just going to do the same thing at home, I might as well take advantage of coding on my big monitor and comfy chair...


IMHO, a big couch is better than a comfy chair.


If Apple coders are reading, I noticed exactly the same issue of User-Agent header not being used on redirects.

You can solve this by (ab?)using the CheckRedirect functionality of http.Client to set the User-Agent again.

Here's an example: https://github.com/lifeforms/httpcheck/commit/07440d952d1660... (The program is nothing special, just a little thing I use internally for monitoring)


So they are "apple-maps'ing" search as well, hilariously using one of Google's own tools in the process.

Would be interesting to know if/when any search deals with Bing/Google/DuckDuckGo are expiring, like the Google Maps deal was. That's probably when we will see an Apple search engine.

Makes sense for them to fully control another part of the backend for Spotlight and Siri.


> So far, I have seen requests from two IPs: 17.147.18.33 (7 on 2014-10-15) and 17.147.18.33 (7 on 2014-10-15)

One of these should be .35, judging by the logs that follow ;)


Perhaps a proxy designed for iOS testing?


I don't think so, it's only requesting HTML.


This just in: Someone at Apple was doing stuff


I've been seeing this as well.


Way to get someone sacked at apple.



Why shouldn't someone at Apple use Go?

Servers/scripts/etc. are not their usual focus, and they don't really have an internal language that fits better. Go is pretty good at this kind of thing.


I like Go, but fail to see how Swift and C++14 don't fit the bill as well.

EDIT: changed to a more constructive comment.


Any Turing Complete language would "fit the bill", obviously.

What's wrong with a company not just kicking the tires on another language, but using it on a big project? That's how you design better languages of your own.

"Go was great, except for X and Y" leads to improvements on your own stuff.


"I like X, but i fail to see why Y and Z don't fit the bill as well."

Try repeating your point for any order of X, Y and Z, and you might realize it isn't really a valid reason for them to have used Swift or C++ instead.


Apple owns C++ and Swift compilers.


Your original argument was that C++ and Swift fit the bill as well. Now, as far as I can tell, you seem to be saying that C++ and Swift are better choices because Apple owns compilers for them. That is a different argument, which is in direct conflict with the one I originally responded to. My response was based on the assumption that you meant what you wrote.

Anyway, on to your current stance. That Apple don't use Swift or C++ for their spider seems to indicate that the benefit of using Go outweighs the benefit of using a language they own the compiler for. Go has a BSD-style license, so as far as Apple implementing their own changes and doing whatever they want with the source code and their own binaries goes, they might as well own it. Go also seems like a no-brainer choice for a web spider, since it models concurrency in such a straightforward and cruft-free way.


lol, I'm not sure why anybody would be sacked for using a "Google product" ;) However, it is a fair question why they didn't use Swift.


The most likely explanation for not using Swift is the employees who worked on the crawler simply didn't know it existed. From what I've heard, only a select few at the company knew it was in development and a number of Apple employees were noticeably shocked during the WWDC keynote when it was announced.

Apple actually uses a bunch of 3rd party developer tech on the server-side and on the web. They use Rails quite a bit I believe and are starting to use Angular.js too.


It's not as mature. It doesn't have the library and third-party support. Does Apple use Mac OS for all its servers? They use Azure for some servers? That would imply they need the language to be cross-platform.


I can vouch for this: I've been trying to import CommonCrypto, and instead of just typing "import CommonCrypto" I have to do some sort of "bridging header" stuff with a .h file that just seems totally backwards to me (I don't have Cocoa or Obj-C experience). It feels not quite ready for prime time.


Given how it interoperates with the Objective-C runtime, I would say it already has lots of libraries to choose from.


Swift was designed for coding user-facing applications, not internet utilities.


Swift was designed to do everything - one language to rule them all. To quote from Apple's Swift page: "Swift is a successor to the C and Objective-C languages."

My guess would be that the webcrawler was started before Swift was allowed to escape from the Developer Tools enclave.


Doing everything is not the same as doing everything well.

Surely you agree that languages designed with one purpose in mind (a la Go and networking/server development) can reasonably do better at that purpose than an explicitly general-purpose language?

Just to be clear, I prefer Swift to Go in general. However, between the two, I would personally prefer to use Go for the development of network utilities.


I'm not saying it's a good idea, I was just responding to your assertion that Swift was designed only for writing user-facing applications. That is not the case: this is Apple's planned replacement for low-level code as well as apps. It is this design requirement that drove Swift to abandon Objective-C's dynamic binding etc.


> Surely you agree that languages designed with one purpose in mind (a la Go and networking/server development) can reasonably do better at that purpose than an explicitly general-purpose language?

I don't think that's a given. Just because a language is more special-purpose doesn't mean that a more general-purpose language can't do the same things as seamlessly (maybe it would have to define an embedded DSL, but that might be simple). Maybe the special-purpose language decided to commit to certain things in its domain, like certain concurrency features, and make them "first class", since that was its intended domain. But then maybe that concurrency feature wasn't that useful, and a more general-purpose language, which had to keep its options open, could express alternative features more easily.


Can you explain what that means? Which language features is Swift missing?


It's a perfectly reasonable general purpose language. However, it doesn't have a low-overhead green thread mechanism, and the standard library has no particular emphasis on network tools.

Even as an outspoken critic of Go, I recognize that it has quite a strong networking toolset.


Swift is a general purpose language. I fail to see what it lacks.


So are Python and Ruby, but most people will agree that those are unsuitable for certain types of highly concurrent and performance sensitive network applications.

Using the right language for a particular domain can save you a lot of headache. Go is open source and a relatively proven choice, compared to Swift.


Python and Ruby aren't a suitable comparison, because they lack AOT compilers to native code as part of their canonical implementation, which is not the case for Swift.

As for access to the language implementation, Swift is Apple's creation.


The argument you made was that "Swift is a general purpose language.". Since both Python and Ruby are generally considered general purpose languages, they _are_ suitable for comparison to Swift in the exact sense that you chose to describe Swift.

I'm not trying to tell you that you are wrong. I'm just saying that your reasoning is flawed, and your arguments don't really support any particular conclusion. "Swift is a general purpose language." - OK? "I fail to see what it lacks." - Noted.

Let's only compare with native, canonically AOT compiled, general purpose languages, though. All of a sudden there are a lot of questions. Why didn't they write it in C? Why didn't they write it in Delphi? Fortran? Forth? The answer is of course that beyond being a native, canonically AOT compiled, general purpose language, different languages have different constructs and models of particular patterns that make them more or less useful for different kinds of tasks and development models.

Apple seemingly recognized that Swift was less useful than Go for the particular task at hand, and while we can't really know for sure if this was a matter of performance, development speed, personal preference of the developers etc, we can probably assume that the choice was based on taking several such quality dimensions into consideration.


I would expect Swift to have comparable performance to Go since it's a C-like compiled language. Where it doesn't, Apple should probably dogfood it so they can find any deficiencies.


Network latency and distribution notwithstanding, one of Go's core goals is low-overhead concurrency. I'm not sure how GCD compares in Swift because I haven't spent a ton of time in it yet, but that could be a point of contention, particularly in a web scraper.


Why? This is exactly the situation where the tool is a great match for the job.


I am doubtful Apple wants to compete with Google in search.

Or maybe it's a way for Steve Jobs to get a revenge post-mortem on Google for Android!




