This actually seems very low to me. The archive at DigitalGlobe is over 80PB of data [1] and was growing at over 2PB a year [2] before the latest satellite went up. The imagery has more bands than just RGB, which makes it larger still.
But somewhere Google is storing the original-resolution data. Aerial imagery is at an even higher resolution, so it's even larger per square kilometer. They also hold data at different resolutions for different zoom levels: they don't resample aerial imagery past certain zoom levels, they use a different source.
At approximately 3225x3225 pixels per square kilometer (31cm pixels on WorldView-3), that is approximately a 10-megapixel TIFF per square kilometer, which I believe is roughly 35MB. With about 496 million square kilometers of surface on Earth, not including Antarctica, that brings this to over 17PB of raw data. And that is not including any historical data, or the cached pyramids of PNGs at different zoom levels that are actually served out.
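A quick sanity check of that arithmetic, as a back-of-the-envelope sketch (the bytes-per-pixel value is backed out of the 35MB figure above, not a known Google number):

    # Back-of-envelope check of the numbers above, not Google's actual figures
    pixels_per_km2 = 3225 * 3225        # 31cm pixels: 1000m / 0.31m ~= 3226 px per side
    bytes_per_pixel = 3.4               # assumed: RGB plus some extra band data (~35MB / 10.4MP)
    mb_per_km2 = pixels_per_km2 * bytes_per_pixel / 1e6
    total_pb = 496e6 * mb_per_km2 / 1e9  # Earth's surface minus Antarctica
    print(round(mb_per_km2), "MB/km^2 ->", round(total_pb, 1), "PB")  # 35 MB/km^2 -> 17.5 PB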
I know not all of the world is covered at that high a resolution, but there are spots with even higher resolution to make up for it, plus all the historical data. And even if the open ocean in that figure is stored at lower resolution, I highly doubt they are holding anything less than 30PB.
> a 10-megapixel TIFF per square kilometer, which I believe is roughly 35MB.
I'm sure they are using a storage format (with a second level of compression) that provides better space efficiency than that. TIFF supports both lossy and lossless compression through extensions, but for storing source data that is read relatively infrequently ("relatively" being the key term here), it might make sense to use a compression scheme geared very heavily toward space efficiency. It's not like Google lacks the computational power or the ability to process it in parallel efficiently.
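For illustration only (Google's actual storage format isn't public), even the generic lossless modes TIFF already supports make a difference. A Pillow sketch; note that a blank synthetic tile compresses unrealistically well, so real imagery gains far less:

    # Illustrative sketch: comparing TIFF compression modes with Pillow.
    # Compressed writes need Pillow's libtiff support (standard in the wheels).
    import os
    from PIL import Image

    img = Image.new("RGB", (3225, 3225))  # stand-in for one km^2 tile
    for comp in ("raw", "tiff_lzw", "tiff_adobe_deflate"):
        img.save("tile.tif", compression=comp)
        print(comp, round(os.path.getsize("tile.tif") / 1e6, 1), "MB")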
I think a better question is "how" all of that data is stored. One large NetApp? JBODs all over the world? And how does that data make it into our browsers so quickly?
Google developed technologies like BigTable for a reason, you know!
You can serve data straight out of a database like BigTable as long as it's done carefully (to keep latency low). And the dataset can be easily replicated around the world in different datacenters to reduce round trip times.
I used to work on Google Earth. It's a really cool project. The serving infrastructure is pretty simple though. It's basically just a simple frontend server in front of BigTables (or was). The real magic happens in the data processing pipeline that produces the datasets, and in the client side rendering code. It's not like web search where the serving infrastructure is hugely complex and very much secret sauce.
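Google's row-key scheme isn't public, but a quadtree tile key of the sort map services commonly use (this is the Bing-style quadkey algorithm, shown purely as a hypothetical) gives the flavor of how tiles map cleanly onto an ordered key space like BigTable's:

    # Hypothetical: Bing-style quadkey for the tile at (x, y, zoom).
    # Prefix-ordered keys keep a tile's children next to it, which suits
    # range scans in an ordered store such as BigTable.
    def quadkey(x: int, y: int, zoom: int) -> str:
        key = ""
        for z in range(zoom, 0, -1):
            digit = 0
            mask = 1 << (z - 1)
            if x & mask:
                digit += 1
            if y & mask:
                digit += 2
            key += str(digit)
        return key

    print(quadkey(3, 5, 3))  # '213'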
The thing that impresses me the most about Earth in recent times is how the Maps team were able to write an Earth client in JavaScript that starts way faster than the desktop client does. The traditional client is a large C++ app that has always struggled with startup time, I think due to a combination of code size and lots of static constructors sitting on the critical path. And the bulk of the app was written when Keyhole was an independent company selling into government and industry, so startup time didn't matter much.
Also, the image processing pipeline is really amazing, because it blends together lots of disparate imagery sets from many different providers and cameras, with different lighting and colour calibrations, whilst still producing something usable. In the early days of Earth you could easily see the seams where different types of images had been stitched together, and if you zoomed in you could clearly see the thin strips from the satellite passes. These days you can still see them if you look around, but there is a lot of very clever colour balancing and reblending being done to make the whole thing a lot more seamless and beautiful.
EDIT: Really? Already at -2 after 15 minutes? This is directly on-topic (it directly criticizes the quality of the process that was praised before), it is directly relevant (it provides examples for a counterargument), it is not a personal attack or anything, and it provides sources.
Before you downvote, please comment – that’s a lot more helpful to discussion
________________________
> In the early days of Earth you could easily see the seams where different types of images had been stitched together, and if you zoomed in you could clearly see the thin strips from the satellite passes. These days you can still see them if you look around, but there is a lot of very clever colour balancing and reblending being done to make the whole thing a lot more seamless and beautiful.
Wherever I look around my home town, it’s blurry, stitched together, with bad color balancing.
Try taking a look at the areas where it touches the open sea, it’s even worse.
In addition to the imagery taking half an hour to load anything that’s not totally blurry (which can’t be my connection, as all competitors are instant), and the missing 3D in a city of 300k, the imagery (a) hasn’t been updated since 2005, (b) has constant edges and stripes everywhere, and (c) is – due to not being updated – already missing several districts.
I’ve complained about this on several occasions before to Google employees who promised to fix it, for example here [1].
I have to be honest, I’m more than disappointed with Google Maps and Earth.
Even HERE Maps imagery (linked below for comparison) is more up to date, has higher resolution, and is blended a lot better.
Google, as always, seems to only (or mostly) care about the US. There, even most small towns have perfectly clear 3D imagery from the current year.
It’s just another punch in the face in a long row of "US-only!". The Internet. Connecting the ~~world~~ US.
Your tone is critical, aggressive, and makes no attempt to do anything other than express your personal experience. I believe this is why you receive downvotes -- your comment only serves to further your personal agenda and does not add to the conversation.
> Your tone is critical, aggressive, and makes no attempt to do anything other than express your personal experience.
(a) it’s not just my experience – it’s an issue that affects many people – most places outside of the US or large population centers look like this
(b) after complaining along with hundreds of other people, some of whom have complained for a literal decade trying to improve the situation, could you stay calm?
Especially when people then say "look how awesome this is", and you yourself know how bad it really is?
If you're really interested, I highly recommend reading the SRE book Google has released. It's got a ton of insight into both their business practices and technical infrastructure.
I can speak to how Bing Maps was storing their data before parts of it were bought by Uber. They had a single massive bespoke compute site for batch processing -- I want to say 64k cores and multiple PBs. There was a flood in Colorado one summer that knocked the imagery site out for a day or two. Uber purchased that site along with ~100 engineers. I had left about 6 months before that purchase happened.
We've already hit a snag with Moore's law. And storage hasn't been progressing that fast either, for even longer.
The thing is, we've lived through the "low hanging fruit" era of the progress curve in IT, and now we're in the diminishing-returns stage: increasingly difficult effort (e.g. the problems of shrinking CPU process nodes) for ever less interesting improvements.
Of course, anybody who lived through the best and most rapidly improving part of the curve (roughly 1950 to 2005 or so) would think this will go on forever, or that it's some "inevitable", "natural" characteristic of technology to always improve rapidly (which mostly means they're either too optimistic or bad at math).
For some areas of tech it's even worse: not only have we reached increasing difficulty in creating the machines, factories, etc. to churn it out, we've also reached the limits of the physical laws involved. So we either need to find totally different processes, or prove Einstein and/or Heisenberg wrong (and sure, the former is easier than the latter, but still nowhere near as easy as simply creating a new generation of the same process).
There's also a fundamental limit to the bandwidth a given consumer can realistically ingest. The richest input we have -- sight -- has been estimated [1] at about 8.75 Mbps. The real driver for bandwidth (and thus local storage) these days is input latency.
We're only just beginning to be able to react quickly enough to requests for different input (VR), so the historical workaround has been to increase the amount of data instantaneously available through things like HDTV. You're never fully consuming the bandwidth provided at any instant, but you can increase your awareness by constantly looking at different parts of the screen.
So again we're fundamentally limited by the speed of light and how quickly data can be retrieved. If we can get better at local caching and query prediction (streaming in data right before it's requested), then we could theoretically massively reduce the amount of site-local storage required.
So for supply/demand reasons, I think software combined with HID improvements (better prediction through better sensors) has a significant potential to essentially put the storage industry into maintenance mode.
Not disagreeing with you, but you have to admit the advances we've seen in storage have thus far been almost beyond belief.
I remember my dad bringing home our family's first modern PC, an original iMac. It replaced some ancient thing with a monochrome display that we kept in the basement. (I wish they still had it, whatever it was -- it would be fun to play around with.) We thought it was like something out of science fiction: you could use Netscape to do a Lycos search! It played DVDs! It came with two: a demo disc and A Bug's Life. The DVDs (4.7 GB) had a larger capacity than the hard disk (4 GB)!
Today, my MacBook Pro has 512 GB of storage. That's 128x the iMac's 4 GB: 7 doublings in 16 years (1998-2014), and that's with a change in form factor from desktop to laptop and in storage medium from hard disk to SSD.
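The quick arithmetic:

    # 4 GB (1998 iMac) to 512 GB (2014 MacBook Pro)
    from math import log2
    print(log2(512 / 4))   # 7.0 doublings in 16 years -> one every ~2.3 years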
I'm not entirely sure that will happen. My anecdotal info is that I actually have less data stored locally than I did a couple of years ago. Most of my data is now held online and retrieved on demand.
Most consumers won't be the drivers of this. However, gaming, workstation, and enthusiast needs will create enough demand to keep pushing storage sizes up.
Modern AAA games are over 30 GB installed. Applications in general are taking up more and more space locally.
An hour of uncompressed 4K video is approximately 1.72 TB. Photos have also grown in size as resolution has gone up.
I'm sure there are other industries as well. I know personally between my various hobbies, my SSDs are always full and I have several 4TB drives for storage.
Uncompressed video is typically stored in a YUV colour space. A typical pixel format, YV16, stores one luma sample for each pixel and one Cb/Cr pair for every 2 pixels, which does account for the two bytes per pixel you got :)
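Working that through (assuming UHD 3840x2160 at 30fps, a guess at the parent's parameters):

    # 4:2:2 averages 2 bytes/pixel: per 2 pixels, 2 Y + 1 Cb + 1 Cr samples
    width, height, fps = 3840, 2160, 30            # assumed UHD at 30fps
    bytes_per_hour = width * height * 2 * fps * 3600
    print(round(bytes_per_hour / 1e12, 2), "TB")   # ~1.79 TB, close to the 1.72 TB above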
They're probably using some kind of fast lossless or nearly lossless compression that cuts out the really low-hanging fruit but doesn't discard quality they care about.
HuffYUV, for example, if you're going to be targeting a 4:2:0 output stream; if you're going for 4:2:2 or even 4:4:4 for some super HD stuff you might use different options.
But for instance, I own several AAA games on my steam account, but I have only some installed at any given time. If I feel the urge to play one that I haven't in a long time, it doesn't take that much to download it again. It's even better that I don't need to worry about save files anymore, they are backed up automatically.
This is one of the most baffling things to me about modern mobile phones. The device now in my pocket is capable of producing 4K video at hundreds of MB per minute, yet it has paltry internal storage and, unlike the previous generation, dropped the ability to stick a 128GB microSD card in it. I honestly despair :(
How would this be more useful than a machine with 4GB of RAM and 4 cores? Most of the time the hungriest process on a personal machine is the browser, and most of the human time it uses is wire(less)-bound, so even if one had the machine you describe, they'd hardly notice.
>How would this be more useful than a machine with 4GB of RAM and 4 cores? Most of the time the hungriest process on a personal machine is the browser, and most of the human time it uses is wire(less)-bound, so even if one had the machine you describe, they'd hardly notice.
Speak for yourself. Anybody doing 3D, audio work (64 channels all loaded with 2-3 UAD plugins running native), and lots of other things will notice.
Even consumers. Advanced 3D gaming of 2030, virtual reality interactive porn, etc. Even something as humble as local speech recognition and AI processing for a real-time assistant.
The thing is whether we can make those CPUs/tech -- not whether we can use their power when we make them.
> The thing is whether we can make those CPUs/tech -- not whether we can use their power when we make them.
We use up so many natural resources in such a process that it's not worth it if it is not practically useful.
> Speak for yourself.
I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
> Anybody doing 3D, audio work (64 channels all loaded with 2-3 UAD plugins running native), and lots of other things will notice.
I said "a personal machine" in my comment: a machine used to browse the web, take notes, watch movies, write documents, etc. 3D and high-performance audio are not everyday personal-computer use; they're a specialised case.
> Advanced 3D gaming [...] virtual reality interactive porn [...] local speech recognition and AI processing [...]
> I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
You're projecting your needs on those of a complete stranger. That's going to get some folks hot under the T-shirt.
> I recall saying useful.
You know what "advanced 3D gaming" also means? It means being able to broaden one's horizons via virtual tourism (and if you think I'm reaching, go use a Vive) without paying tens of thousands of dollars to go be there in person. You know what local speech recognition and AI processing means? It means being able to do all that cool shit Google does without exposing yourself to risk.
These might not be things that are useful to you. They are useful to others. You're projecting, and you're kind of being a jerk about it, and that's why you got the response you did.
Oh, and while we're at it: 3D unambiguously and without question is "every-day personal computer use," and higher-end audio would be if people actually had the hardware for it. Everyone plays video games now. You might blurf from your very, very high horse that that's not useful, but literally tens of millions of people derive value from that right now and that number's only going to go up.
I really didn't want to answer you, but I will -- not to respond to you, but to justify my point for the public.
Consumption of 3D and high-quality audio usually requires a fraction of the computing power needed for their production. Their consumption is in the scope of personal use, while their production is in the scope of specialised use.
>Consumption of 3D and high-quality audio usually requires a fraction of the computing power needed for their production. Their consumption is in the scope of personal use, while their production is in the scope of specialised use.
You have 3D movies in mind. We're talking about 3D content -- and have been speaking about things like 3D games, virtual reality (and VR porn), virtual tourism, interactive stuff, etc.
This kind of 3D content is rendered on the client, in real time, with high CPU (well, GPU) demands -- ever more demanding the more realistic you make it.
>We use up so many natural resources in such a process that it's not worth it if it is not practically useful.
We already have 128-core CPUs -- they're not something out of science fiction. It's just that we don't have them for the average consumer.
Besides, one could say the same about today's 4 and 8 cores back in 2000.
>I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
And "speak for yourself" is offensive how? The message was: while this might hold true for you, it's not the general case.
>I said "a personal machine" in my comment: a machine used to browse the web, take notes, watch movies, write documents, etc. 3D and high-performance audio are not everyday personal-computer use; they're a specialised case.
And yet it's done on general purpose, personal machines.
And not just by high-end studio engineers either (although most recording studios I've been to also use consumer PCs, nothing special there, usually just more expensive cases and fans to keep them silent): millions of people recording themselves and their bands do DAW/audio work that taxes even high-end CPUs and can leverage more power.
Even more millions do video editing, and soon every consumer camera will have 4K (the most popular phones already do).
This is not the era of Boomers and Gen X, where such things were uncommon. Video, Photography and Music as hobbies have exploded in an era when everybody can self-publish on YouTube, Bandcamp etc, and younger kids are growing up with those things all around them.
And all those people do it on their personal machines. Not some special workstation, and not at work.
So defining computer use as "web, take notes, watch movies, write documents" is too 1999.
And while the more CPU/GPU-intensive stuff (video and audio work) is not as widespread as passive consumption -- I'll give you that (though nowhere near a fringe activity) -- the argument breaks down with things like 3D games, which a large majority of people under 30 play regularly, and a huge chunk of those under 20 religiously update their CPUs and game consoles to get the latest and greatest.
>> Advanced 3D gaming [...] virtual reality interactive porn [...] local speech recognition and AI processing [...]
>I recall saying useful
That one has no use for something doesn't make it useless. I mentioned some very popular stuff -- speech recognition and AI assistants like Cortana and co. are used by hundreds of millions, and 3D gaming is done by billions.
Your general response is close to the "No True Scotsman": when someone replies with things we could use that CPU power for, some are "not really personal", the others are "not really useful", etc.
I didn't say that. I just responded to the comment about having PB of data as the norm. You cannot handle such scale without adequate processing power and intermediate RAM.
By the way, I'm writing this on a Core 2 Duo, which is my main computer. So trust me, having 4 cores already looks like overkill to me :)
Agreed. Actually I sport a Core 2 Duo too (Asus X51RL, "15.4 inches Widescreen Gateway to the World of Entertainment" https://www.asus.com/Notebooks/X51RL/) :)
4/8 cores is useful if one does a lot of things with VMs, but above that it's mostly either a server or an auction.
I'm not sure that's a bad problem, as long as your OS (and your browser) reallocate memory efficiently when required. There's no need to treat RAM like your hard drive, and keep a certain amount free all of the time.
Also, your browser won't use 4GB of RAM if that's all you have.
There are already some games that (potentially) use upwards of 10GB of RAM. More and more games are shipping either with 64-bit strongly preferred or even as 64-bit only.
It will take quite a while; 1TB ought to be enough for everyone.
Kidding aside, you can assume that in 10 years storage will be about 1000 times cheaper than it is now, so 3PB of storage would cost about what a 3TB drive does today, around $100.
Data storage today certainly isn't 1000 times cheaper than it was a decade ago, I'm not sure it's even 100 times cheaper.
For some reason you seem to believe that storage prices will remain the same while we double capacity every 12 months; that isn't even close to reality.
(Even if you've misapplied Moore's Law to flash storage, that predicts doubling every 2 years not every year.)
I've looked at the prices between 1980 and 2010; storage seemed to get about 1000 times cheaper every 10 years.
And SSD pricing has gone down about 40% every year for the last 4 years. Compounded, that comes to a smaller reduction per decade, but some breakthrough is bound to be around the corner.
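Compounding the two claimed rates side by side:

    # 1000x per decade vs. 40%-per-year SSD price decline
    yearly_from_1000x = 1000 ** (1 / 10)      # ~2.0x cheaper per year
    decade_from_40pct = 1 / (0.6 ** 10)       # ~165x cheaper per decade
    print(round(yearly_from_1000x, 2), round(decade_from_40pct))  # 2.0 165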
Consider for comparison that Gmail alone probably uses close to 100,000PB (based on my back-of-the-napkin math and a reference I found that there are ~900M users -- that works out to over 100GB per user).
>Timothy has been using Google Earth since 2004 when it was still called Keyhole before it was renamed Google Earth in 2005 and has been a huge fan ever since. He is a programmer working for Red Wing Aerobatx and lives in Cape Town, South Africa.
This is the imagery database, but Google knows a lot more than that.
There are also vectors for the roads, with a link between that data and Street View. You can click on the map and Google will tell you that you're at 24 Burgess Street, and show you a photo, so it has a link between all that information.
I suspect they also fuse this with information from phones. Do the phones contribute to live traffic stats? Or to the information about when a shop is busiest (it will show you a bar chart of when a gym or store is busy)? Is that done based on knowing when Android users are at that location? What about all the WiFi access points it knows the locations of?
In some ways, this is much harder to deal with than the image data: although it's smaller in size, it's denser in information, and the links between items are more complex.
> There are also vectors for the roads, with a link between that data and Street View. You can click on the map and Google will tell you that you're at 24 Burgess Street, and show you a photo, so it has a link between all that information.
I mean, you can now download your local area's data to your phone from Google Maps. I suspect its size is negligible compared to the raw 2D/3D/historical data. It would have been nice to include it in their analysis, though.
You should probably download Waze if you want the best experience. That said, using Google Maps you'll see traffic updates and maintenance/cops/red-light cameras with a note saying "Information from Waze" or something akin to that.
You can't contribute information without Waze, and you don't get some notifications like the presence of speed traps. But the Google Maps app has the key traffic information after coarse-graining and filtering.
Waze almost gamifies driving and tries to get users to actively report on traffic, including accidents and similar things.
Google Maps is a traditional navigation system with a live traffic monitor, but ultimately a passive experience.
AFAIK no feature of Waze has been, or is planned to be, merged into Google Maps; the live traffic monitor in Google Maps was available well before the Waze acquisition.
I understand that, compared to raster images, vector data for roads is negligible. A dinky Garmin car nav system from 2011 that I had held the continental US road system in only a few gigs.
OpenStreetMap's vector data for roads (and buildings, landuse, etc.) for the whole world is currently 31GB, in OSM protobuf format: http://planet.osm.org
> This blog is not officially affiliated with Google. Google Earth Blog is dedicated to sharing the best news, interesting sights, technology, and happenings for Google Earth.
They seem to have forgotten that the imagery database is computed at multiple zoom levels. If a country has high-res imagery at the max zoom, it also has scaled-down versions of that imagery for all the higher-altitude image sets.
So that would expand things quite a bit.
There are also height maps, which were not counted, and metadata.
If the full-detail zoom level took 1 MB, the same area zoomed out by 2x would take 256KB (a quarter of the pixels), zoomed out again 64KB, then 16KB... I don't know all the zoom levels, but as a rough guess the fully zoomed-in level should be pretty close to the total (within 30% or so).
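The pyramid overhead is a quick geometric series:

    # Each 2x zoom-out level holds 1/4 the pixels of the level below it
    total = sum(0.25 ** level for level in range(20))
    print(round(total, 3))   # 1.333: the pyramid adds ~33% on top of the max-zoom level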
Minor point: Can people stop using 3D pie charts? The "satellite imagery" slice in the chart they show is slightly less than a third of the "historical imagery (satellite)" slice, but it looks like it's about half because of how the chart is rotated.
"3D imagery" is supposed to be slightly larger than the "Historical imagery (aerial)" slice, but instead, it looks smaller!
There's just no reason to use this type of visualization.
Valuable as in cost? Aerial imagery in general is very expensive, especially from satellites.
That's why the Earth/Maps team doesn't have an open protocol (or it's one reason). They aren't allowed to just give the imagery away for free: it's licensed specifically for that application. If you try to download the entire dataset, there are systems in place that will fight you, to defend the rights of the imagery companies.
I would tend to think it's the fine Maps detail on every road in every country where Navigation works: all the details of every single intersection, how many lanes, what turns you can make from where, etc. The maps and images are expensive, but they already exist and can be bought if you're willing to pay the price. That detail info doesn't even exist anywhere, at least not in any kind of consistent and reliable way. The only way to get it is to employ a virtual army of drivers to drive every single road in every country, record a ton of info about them all, and figure out how to store it all in a searchable format. And then keep doing it forever, because those roads all change periodically, and often nobody ever documents how or where or when.
Wow, I had no idea how creepily awesome Google's 3D imagery had gotten. It used to only be skyscrapers in main cities. Now I can basically see in my own window.
I get part of the "creep" factor, but anyone walking down the street has a much higher resolution image of your house - and with real-time "video", to boot! Maybe people just don't think about that very often?
[1] https://www.digitalglobe.com/platforms/gbdx
[2] http://www.networkcomputing.com/big-data/how-digitalglobe-ha...