This actually seems very low to me. The archive at DigitalGlobe is over 80PB of data [1] and was growing at over 2PB a year [2] before the latest satellite went up. The imagery has more bands than just RGB, which makes it larger still.
But somewhere Google is storing the original-resolution data. Aerial imagery is at an even higher resolution, so it's even larger per square kilometer. They also hold data at different resolutions for different zoom levels: they don't resample aerial imagery past certain zoom levels, they use a different source.
At approximately 3225x3225 pixels per square kilometer (31cm pixels on WorldView-3), that is approximately a 10-megapixel TIFF per square kilometer, which I believe is roughly 35MB. With about 496 million square kilometers of surface on Earth, not including Antarctica, that brings this to over 17PB of raw data. And that is not including any historical data, or the cached pyramids of PNGs at different zoom levels that are actually served out.
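A quick sanity check of that arithmetic, as a back-of-the-envelope sketch (the bytes-per-pixel value is backed out of the 35MB figure above, not a known Google number):

    # Back-of-envelope check of the numbers above, not Google's actual figures
    pixels_per_km2 = 3225 * 3225        # 31cm pixels: 1000m / 0.31m ~= 3226 px per side
    bytes_per_pixel = 3.4               # assumed: RGB plus some extra band data (~35MB / 10.4MP)
    mb_per_km2 = pixels_per_km2 * bytes_per_pixel / 1e6
    total_pb = 496e6 * mb_per_km2 / 1e9  # Earth's surface minus Antarctica
    print(round(mb_per_km2), "MB/km^2 ->", round(total_pb, 1), "PB")  # 35 MB/km^2 -> 17.5 PB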
I know not all of the world is covered at that high a resolution, but there are spots with even higher resolution to make up for it, plus all the historical data. And even if the open ocean in that figure is stored at lower resolution, I highly doubt they are holding anything less than 30PB.
> a 10-megapixel TIFF per square kilometer, which I believe is roughly 35MB.
I'm sure they are using a storage format (with a second level of compression) that provides better space efficiency than that. TIFF supports both lossy and lossless compression through extensions, but for storing source data that is read relatively infrequently ("relatively" being the key term here), it might make sense to use a compression scheme geared very heavily toward space efficiency. It's not like Google lacks the computational power or the ability to process it in parallel efficiently.
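For illustration only (Google's actual storage format isn't public), even the generic lossless modes TIFF already supports make a difference. A Pillow sketch; note that a blank synthetic tile compresses unrealistically well, so real imagery gains far less:

    # Illustrative sketch: comparing TIFF compression modes with Pillow.
    # Compressed writes need Pillow's libtiff support (standard in the wheels).
    import os
    from PIL import Image

    img = Image.new("RGB", (3225, 3225))  # stand-in for one km^2 tile
    for comp in ("raw", "tiff_lzw", "tiff_adobe_deflate"):
        img.save("tile.tif", compression=comp)
        print(comp, round(os.path.getsize("tile.tif") / 1e6, 1), "MB")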
I think a better question is "how" all of that data is stored. One large NetApp? JBODs all over the world? And how does that data make it into our browsers so quickly?
Google developed technologies like BigTable for a reason, you know!
You can serve data straight out of a database like BigTable as long as it's done carefully (to keep latency low). And the dataset can be easily replicated around the world in different datacenters to reduce round trip times.
I used to work on Google Earth. It's a really cool project. The serving infrastructure is pretty simple though. It's basically just a simple frontend server in front of BigTables (or was). The real magic happens in the data processing pipeline that produces the datasets, and in the client side rendering code. It's not like web search where the serving infrastructure is hugely complex and very much secret sauce.
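Google's row-key scheme isn't public, but a quadtree tile key of the sort map services commonly use (this is the Bing-style quadkey algorithm, shown purely as a hypothetical) gives the flavor of how tiles map cleanly onto an ordered key space like BigTable's:

    # Hypothetical: Bing-style quadkey for the tile at (x, y, zoom).
    # Prefix-ordered keys keep a tile's children next to it, which suits
    # range scans in an ordered store such as BigTable.
    def quadkey(x: int, y: int, zoom: int) -> str:
        key = ""
        for z in range(zoom, 0, -1):
            digit = 0
            mask = 1 << (z - 1)
            if x & mask:
                digit += 1
            if y & mask:
                digit += 2
            key += str(digit)
        return key

    print(quadkey(3, 5, 3))  # '213'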
The thing that impresses me the most about Earth in recent times is how the Maps team were able to write an Earth client in JavaScript that starts way faster than the desktop client does. The traditional client is a large C++ app that has always struggled with startup time, I think due to a combination of code size and lots of static constructors sitting on the critical path. And the bulk of the app was written when Keyhole was an independent company selling into government and industry, so startup time didn't matter much.
Also, the image processing pipeline is really amazing, because it blends together lots of disparate imagery sets from many different providers and cameras, with different lighting and colour calibrations, whilst still producing something usable. In the early days of Earth you could easily see the seams where different types of images had been stitched together, and if you zoomed in you could clearly see the thin strips from the satellite passes. These days you can still see them if you look around, but there is a lot of very clever colour balancing and reblending being done to make the whole thing a lot more seamless and beautiful.
EDIT: Really? Already at -2 after 15 minutes? This is directly on-topic (it directly criticizes the quality of the process that was praised before), it is directly relevant (it provides examples for a counterargument), it is not a personal attack or anything, and it provides sources.
Before you downvote, please comment – that’s a lot more helpful to discussion
________________________
> In the early days of Earth you could easily see the seams where different types of images had been stitched together, and if you zoomed in you could clearly see the thin strips from the satellite passes. These days you can still see them if you look around, but there is a lot of very clever colour balancing and reblending being done to make the whole thing a lot more seamless and beautiful.
Wherever I look around my home town, it’s blurry, stitched together, with bad color balancing.
Try taking a look at the areas where it touches the open sea, it’s even worse.
In addition to the imagery taking half an hour to load anything that’s not totally blurry (which can’t be my connection, as all competitors are instant), and the missing 3D in a city of 300k, the imagery (a) hasn’t been updated since 2005, (b) has constant edges and stripes everywhere, and (c) is – due to not being updated – already missing several districts.
I’ve complained about this on several occasions before to Google employees who promised to fix it, for example here [1].
I have to be honest, I’m more than disappointed with Google Maps and Earth.
Even HERE Maps imagery (linked below for comparison) is more up to date, has higher resolution, and is blended a lot better.
Google, as always, seems to only (or mostly) care about the US. There, even most small towns have perfectly clear 3D imagery from the current year.
It’s just another punch in the face in a long row of "US-only!". The Internet. Connecting the ~~world~~ US.
Your tone is critical, aggressive, and makes no attempt to do anything other than express your personal experience. I believe this is why you receive downvotes -- your comment only serves to further your personal agenda and does not add to the conversation.
> Your tone is critical, aggressive, and makes no attempt to do anything other than express your personal experience.
(a) it’s not just my experience – it’s an issue that affects many people – most places outside of the US or large population centers look like this
(b) after complaining along with hundreds of other people, some of whom have complained for a literal decade trying to improve the situation, could you stay calm?
Especially when people then say "look how awesome this is", and you yourself know how bad it really is?
If you're really interested, I highly recommend reading the SRE book Google has released. It's got a ton of insight into both their business practices and technical infrastructure.
I can speak to how Bing Maps was storing their data before parts of it were bought by Uber. They had a single massive bespoke compute site for batch processing -- I want to say 64k cores and multiple PBs. There was a flood in Colorado one summer that knocked the imagery site out for a day or two. Uber purchased that site along with ~100 engineers. I had left about 6 months before that purchase happened.
We've already hit a snag with Moore's law. And storage hasn't been progressing that fast either, for even longer.
The thing is, we've lived through the "low hanging fruit" era of the progress curve in IT, and now we're in the diminishing-returns stage: increasingly difficult effort (e.g. the problems of shrinking CPU process nodes) for ever less interesting improvements.
Of course, anybody who lived through the best and most rapidly improving part of the curve (roughly 1950 to 2005 or so) would think this will go on forever, or that it's some "inevitable", "natural" characteristic of technology to always improve rapidly (which mostly means they're either too optimistic or bad at math).
For some areas of tech it's even worse: not only have we reached increasing difficulty in creating the machines, factories, etc. to churn it out, we've also reached the limits of the physical laws involved. So we either need to find totally different processes, or prove Einstein and/or Heisenberg wrong (and sure, the former is easier than the latter, but still nowhere near as easy as simply creating a new generation of the same process).
There's also a fundamental limit to the bandwidth a given consumer can realistically ingest. The richest input we have -- sight -- has been estimated [1] at about 8.75 Mbps. The real driver for bandwidth (and thus local storage) these days is input latency.
We're only just beginning to be able to react quickly enough to requests for different input (VR), so the historical workaround has been to increase the amount of data instantaneously available through things like HDTV. You're never fully consuming the bandwidth provided at any instant, but you can increase your awareness by constantly looking at different parts of the screen.
So again we're fundamentally limited by the speed of light and how quickly data can be retrieved. If we can get better at local caching and query prediction (streaming in data right before it's requested), then we could theoretically massively reduce the amount of site-local storage required.
So for supply/demand reasons, I think software combined with HID improvements (better prediction through better sensors) has a significant potential to essentially put the storage industry into maintenance mode.
Not disagreeing with you, but you have to admit the advances we've seen in storage have thus far been almost beyond belief.
I remember my dad bringing home our family's first modern PC, an original iMac. It replaced some ancient thing with a monochrome display that we kept in the basement. (I wish they still had it, whatever it was -- it would be fun to play around with.) We thought it was like something out of science fiction: you could use Netscape to do a Lycos search! It played DVDs! It came with two: a demo disc and A Bug's Life. The DVDs (4.7 GB) had a larger capacity than the hard disk (4 GB)!
Today, my MacBook Pro has 512 GB of storage. That's 128x the iMac's 4 GB: 7 doublings in 16 years (1998-2014), and that's with a change in form factor from desktop to laptop and in storage medium from hard disk to SSD.
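The quick arithmetic:

    # 4 GB (1998 iMac) to 512 GB (2014 MacBook Pro)
    from math import log2
    print(log2(512 / 4))   # 7.0 doublings in 16 years -> one every ~2.3 years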
I'm not entirely sure that will happen. My anecdotal info is that I actually have less data stored locally than I did a couple of years ago. Most of my data is now held online and retrieved on demand.
Most consumers won't be the drivers of this. However, gaming, workstation, and enthusiast needs will create enough demand to keep pushing storage sizes up.
Modern AAA games are over 30 GB installed. Applications in general are taking up more and more space locally.
An hour of uncompressed 4K video is approximately 1.72 TB. Photos have also grown in size as resolution has gone up.
I'm sure there are other industries as well. I know personally between my various hobbies, my SSDs are always full and I have several 4TB drives for storage.
Uncompressed video is typically stored in a YUV colour space. A typical pixel format, YV16, stores one luma sample for each pixel and one Cb/Cr pair for every 2 pixels, which does account for the two bytes per pixel you got :)
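Working that through (assuming UHD 3840x2160 at 30fps, a guess at the parent's parameters):

    # 4:2:2 averages 2 bytes/pixel: per 2 pixels, 2 Y + 1 Cb + 1 Cr samples
    width, height, fps = 3840, 2160, 30            # assumed UHD at 30fps
    bytes_per_hour = width * height * 2 * fps * 3600
    print(round(bytes_per_hour / 1e12, 2), "TB")   # ~1.79 TB, close to the 1.72 TB above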
They're probably using some kind of fast lossless or nearly lossless compression that cuts out the really low-hanging fruit but doesn't discard quality they care about.
HuffYUV, for example, if you're going to be targeting a 4:2:0 output stream; if you're going for 4:2:2 or even 4:4:4 for some super HD stuff you might use different options.
But for instance, I own several AAA games on my steam account, but I have only some installed at any given time. If I feel the urge to play one that I haven't in a long time, it doesn't take that much to download it again. It's even better that I don't need to worry about save files anymore, they are backed up automatically.
This is one of the most baffling things to me about modern mobile phones. The device now in my pocket is capable of producing 4K video at hundreds of MB per minute, yet it has paltry internal storage and, unlike the previous generation, dropped the ability to stick a 128GB microSD card in it. I honestly despair :(
How would this be more useful than a machine with 4GB of RAM and 4 cores? Most of the time the hungriest process on a personal machine is the browser, and most of the human time it uses is wire(less)-bound, so even if one had the machine you describe, they'd hardly notice.
>How would this be more useful than a machine with 4GB of RAM and 4 cores? Most of the time the hungriest process on a personal machine is the browser, and most of the human time it uses is wire(less)-bound, so even if one had the machine you describe, they'd hardly notice.
Speak for yourself. Anybody doing 3D, audio work (64 channels all loaded with 2-3 UAD plugins running native), and lots of other things will notice.
Even consumers. Advanced 3D gaming of 2030, virtual reality interactive porn, etc. Even something as humble as local speech recognition and AI processing for a real-time assistant.
The thing is whether we can make those CPUs/tech -- not whether we can use their power when we make them.
> The thing is whether we can make those CPUs/tech -- not whether we can use their power when we make them.
We use up so many natural resources in such a process that it's not worth it if it is not practically useful.
> Speak for yourself.
I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
> Anybody doing 3D, audio work (64 channels all loaded with 2-3 UAD plugins running native), and lots of other things will notice.
I said "a personal machine" in my comment: a machine used to browse the web, take notes, watch movies, write documents, etc. 3D and high-performance audio are not everyday personal-computer use; they're a specialised case.
> Advanced 3D gaming [...] virtual reality interactive porn [...] local speech recognition and AI processing [...]
> I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
You're projecting your needs on those of a complete stranger. That's going to get some folks hot under the T-shirt.
> I recall saying useful.
You know what "advanced 3D gaming" also means? It means being able to broaden one's horizons via virtual tourism (and if you think I'm reaching, go use a Vive) without paying tens of thousands of dollars to go be there in person. You know what local speech recognition and AI processing means? It means being able to do all that cool shit Google does without exposing yourself to risk.
These might not be things that are useful to you. They are useful to others. You're projecting, and you're kind of being a jerk about it, and that's why you got the response you did.
Oh, and while we're at it: 3D unambiguously and without question is "every-day personal computer use," and higher-end audio would be if people actually had the hardware for it. Everyone plays video games now. You might blurf from your very, very high horse that that's not useful, but literally tens of millions of people derive value from that right now and that number's only going to go up.
I really didn't want to answer you, but I will -- not to respond to you, but to justify my point for the public.
Consumption of 3D and high-quality audio usually requires a fraction of the computing power needed for their production. Their consumption is in the scope of personal use, while their production is in the scope of specialised use.
>Consumption of 3D and high-quality audio usually requires a fraction of the computing power needed for their production. Their consumption is in the scope of personal use, while their production is in the scope of specialised use.
You have 3D movies in mind. We're talking about 3D content -- and have been speaking about things like 3D games, virtual reality (and VR porn), virtual tourism, interactive stuff, etc.
This kind of 3D content is rendered on the client, in real time, with high CPU (well, GPU) demands -- ever more demanding the more realistic you make it.
>We use up so many natural resources in such a process that it's not worth it if it is not practically useful.
We already have 128-core CPUs -- they're not something out of science fiction. It's just that we don't have them for the average consumer.
Besides, one could say the same about today's 4 and 8 cores back in 2000.
>I really don't like this imperative style. You're responding to a stranger who said nothing offensive in his comment.
And "speak for yourself" is offensive how? The message was: while this might hold true for you, it's not the general case.
>I said "a personal machine" in my comment: a machine used to browse the web, take notes, watch movies, write documents, etc. 3D and high-performance audio are not everyday personal-computer use; they're a specialised case.
And yet it's done on general purpose, personal machines.
And not just by high-end studio engineers either (although most recording studios I've been to also use consumer PCs, nothing special there, usually just more expensive cases and fans to keep them silent): millions of people recording themselves and their bands do DAW/audio work that taxes even high-end CPUs and can leverage more power.
Even more millions do video editing, and soon every consumer camera will have 4K (the most popular phones already do).
This is not the era of Boomers and Gen X, where such things were uncommon. Video, Photography and Music as hobbies have exploded in an era when everybody can self-publish on YouTube, Bandcamp etc, and younger kids are growing up with those things all around them.
And all those people do it on their personal machines. Not some special workstation, and not at work.
So defining computer use as "web, take notes, watch movies, write documents" is too 1999.
And while the more CPU/GPU-intensive stuff (video and audio work) is not as widespread as passive consumption -- I'll give you that (though nowhere near a fringe activity) -- the argument breaks down with things like 3D games, which a large majority of people under 30 play regularly, and a huge chunk of those under 20 religiously update their CPUs and game consoles to get the latest and greatest.
>> Advanced 3D gaming [...] virtual reality interactive porn [...] local speech recognition and AI processing [...]
>I recall saying useful
That one has no use for something doesn't make it useless. I mentioned some very popular stuff -- speech recognition and AI assistants like Cortana and co. are used by hundreds of millions, and 3D gaming is done by billions.
Your general response is close to the "No True Scotsman": when someone replies with things we could use that CPU power for, some are "not really personal", the others are "not really useful", etc.
I didn't say that. I just responded to the comment about having PB of data as the norm. You cannot handle such scale without adequate processing power and intermediate RAM.
By the way, I'm writing this on a Core 2 Duo, which is my main computer. So trust me, having 4 cores already looks like overkill to me :)
Agreed. Actually I sport a Core 2 Duo too (Asus X51RL, "15.4 inches Widescreen Gateway to the World of Entertainment" https://www.asus.com/Notebooks/X51RL/) :)
4/8 cores is useful if one does a lot of things with VMs, but above that it's mostly either a server or an auction.
I'm not sure that's a bad problem, as long as your OS (and your browser) reallocate memory efficiently when required. There's no need to treat RAM like your hard drive, and keep a certain amount free all of the time.
Also, your browser won't use 4GB of RAM if that's all you have.
There are already some games that (potentially) use upwards of 10GB of RAM. More and more games are shipping either with 64-bit strongly preferred or even as 64-bit only.
It will take quite a while; 1TB ought to be enough for everyone.
Kidding aside, you can assume that in 10 years storage will be about 1000 times cheaper than it is now, so 3PB of storage would cost about what a 3TB drive does today, around $100.
Data storage today certainly isn't 1000 times cheaper than it was a decade ago, I'm not sure it's even 100 times cheaper.
For some reason you seem to believe that storage prices will remain the same while we double capacity every 12 months; that isn't even close to reality.
(Even if you've misapplied Moore's Law to flash storage, that predicts doubling every 2 years not every year.)
I've looked at the prices between 1980 and 2010; storage seemed to get about 1000 times cheaper every 10 years.
And SSD pricing has gone down about 40% every year for the last 4 years. Compounded, that comes to a smaller reduction per decade, but some breakthrough is bound to be around the corner.
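Compounding the two claimed rates side by side:

    # 1000x per decade vs. 40%-per-year SSD price decline
    yearly_from_1000x = 1000 ** (1 / 10)      # ~2.0x cheaper per year
    decade_from_40pct = 1 / (0.6 ** 10)       # ~165x cheaper per decade
    print(round(yearly_from_1000x, 2), round(decade_from_40pct))  # 2.0 165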
Consider for comparison that Gmail alone probably uses close to 100,000PB (based on my back-of-the-napkin math and a reference I found that there are ~900M users -- that works out to over 100GB per user).
>Timothy has been using Google Earth since 2004 when it was still called Keyhole before it was renamed Google Earth in 2005 and has been a huge fan ever since. He is a programmer working for Red Wing Aerobatx and lives in Cape Town, South Africa.
This is the imagery database, but Google knows a lot more than that.
There are also vectors for the roads, with a link between that data and Street View. You can click on the map and Google will tell you that you're at 24 Burgess Street, and show you a photo, so it has a link between all that information.
I suspect they also fuse this with information from phones. Do the phones contribute to live traffic stats? Or to the information about when a shop is busiest (it will show you a bar chart of when a gym or store is busy)? Is that done based on knowing when Android users are at that location? What about all the WiFi access points it knows the locations of?
In some ways, this is much harder to deal with than the image data: although it's smaller in size, it's denser in information, and the links between items are more complex.
> There are also vectors for the roads, with a link between that data and Street View. You can click on the map and Google will tell you that you're at 24 Burgess Street, and show you a photo, so it has a link between all that information.
I mean, you can now download your local area's data to your phone from Google Maps. I suspect its size is negligible compared to the raw 2D/3D/historical data. It would have been nice to include it in their analysis, though.
You should probably download Waze if you want the best experience. That said, using Google Maps you'll see traffic updates and maintenance/cops/red-light cameras with a note saying "Information from Waze" or something akin to that.
You can't contribute information without Waze, and you don't get some notifications like the presence of speed traps. But the Google Maps app has the key traffic information after coarse-graining and filtering.
Waze almost gamifies driving and tries to get users to actively report on traffic, including accidents and similar things.
Google Maps is a traditional navigation system with a live traffic monitor, but ultimately a passive experience.
AFAIK no feature of Waze has been, or is planned to be, merged into Google Maps; the live traffic monitor in Google Maps was available well before the Waze acquisition.
I understand that, compared to raster images, vector data for roads is negligible. A dinky Garmin car nav system from 2011 that I had held the continental US road system in only a few gigs.
OpenStreetMap's vector data for roads (and buildings, landuse, etc.) for the whole world is currently 31GB, in OSM protobuf format: http://planet.osm.org
> This blog is not officially affiliated with Google. Google Earth Blog is dedicated to sharing the best news, interesting sights, technology, and happenings for Google Earth.
They seem to have forgotten that the imagery database is computed at multiple zoom levels. If a country has high-res imagery at the max zoom, it also has scaled-down versions of that imagery for all the higher-altitude image sets.
So that would expand things quite a bit.
There are also height maps, which were not counted, and metadata.
If the full-detail zoom level took 1 MB, the same area zoomed out by 2x would take 256KB (a quarter of the pixels), zoomed out again 64KB, then 16KB... I don't know all the zoom levels, but as a rough guess the fully zoomed-in level should be pretty close to the total (within 30% or so).
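The pyramid overhead is a quick geometric series:

    # Each 2x zoom-out level holds 1/4 the pixels of the level below it
    total = sum(0.25 ** level for level in range(20))
    print(round(total, 3))   # 1.333: the pyramid adds ~33% on top of the max-zoom level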
Minor point: Can people stop using 3D pie charts? The "satellite imagery" slice in the chart they show is slightly less than a third of the "historical imagery (satellite)" slice, but it looks like it's about half because of how the chart is rotated.
"3D imagery" is supposed to be slightly larger than the "Historical imagery (aerial)" slice, but instead, it looks smaller!
There's just no reason to use this type of visualization.
Valuable as in cost? Aerial imagery in general is very expensive, especially from satellites.
That's why the Earth/Maps team doesn't have an open protocol (or it's one reason). They aren't allowed to just give the imagery away for free: it's licensed specifically for that application. If you try to download the entire dataset, there are systems in place that will fight you, to defend the rights of the imagery companies.
I would tend to think it's the fine Maps detail on every road in every country where Navigation works: all the details of every single intersection, how many lanes, what turns you can make from where, etc. The maps and images are expensive, but they already exist and can be bought if you're willing to pay the price. That detail info doesn't even exist anywhere, at least not in any kind of consistent and reliable way. The only way to get it is to employ a virtual army of drivers to drive every single road in every country, record a ton of info about them all, and figure out how to store it all in a searchable format. And then keep doing it forever, because those roads all change periodically, and often nobody ever documents how or where or when.
Wow, I had no idea how creepily awesome Google's 3D imagery had gotten. It used to only be skyscrapers in main cities. Now I can basically see in my own window.
I get part of the "creep" factor, but anyone walking down the street has a much higher resolution image of your house - and with real-time "video", to boot! Maybe people just don't think about that very often?
[1] https://www.digitalglobe.com/platforms/gbdx
[2] http://www.networkcomputing.com/big-data/how-digitalglobe-ha...