> First, we can combine sensor readings (if light reading is sub x then phone is not outdoors, for instance) and second, given appropriate volume we can arrive at valid averages – an answer that gets to the heart of what Big Data really means.
Even with volume, I think this is a bigger problem than the article makes it out to be. The first thing that came to mind was dense city areas; in fact, density aside, I think a significant number of people are indoors (house, car, bus, train) at any given point in time. This data could be very helpful during the winter and summer for getting a better idea of the RealFeel temperature outside, but during the winter people's phones are usually in their coats. Maybe there is some magic Big Data sauce that can be used alongside officially reported data, but I'm a bit skeptical and curious at the same time.
As far as noisy data goes, I'd venture to classify this as very noisy.
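To make the quoted idea concrete, here is a minimal sketch of combining readings: drop reports whose light level suggests the phone is indoors or in a pocket, then take the median of what is left. The field names (`lux`, `temp_c`), the sample values, and the 1000 lux cutoff are all assumptions for illustration, not anything from the article, and a light threshold like this would only work in daytime.

```python
from statistics import median

def outdoor_estimate(readings, min_lux=1000.0):
    """Median temperature of readings that look like they were taken outdoors.

    `readings` is a list of dicts with hypothetical keys "lux" and "temp_c".
    Returns None when no reading passes the light filter.
    """
    outdoor = [r["temp_c"] for r in readings if r["lux"] >= min_lux]
    return median(outdoor) if outdoor else None

# Made-up sample data: three plausible outdoor reports and two that are not.
readings = [
    {"lux": 20000.0, "temp_c": 3.0},   # phone held outside
    {"lux": 15000.0, "temp_c": 4.0},   # phone held outside
    {"lux": 12000.0, "temp_c": 5.0},   # phone held outside
    {"lux": 5.0, "temp_c": 28.0},      # coat pocket
    {"lux": 300.0, "temp_c": 21.0},    # indoors
]
print(outdoor_estimate(readings))
```

The median is used instead of the mean so that a few wild readings do not drag the estimate. It does nothing about systematic bias, though: if most phones that pass the filter are still warmed by a hand, the result is biased no matter how many reports there are.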
Weather data is the most exciting thing in the world right now. There are hundreds of millions of internet-connected barometers all over the planet that did not exist just 2 years ago!
Non-linear systems are inherently non-deterministic. Even if you had one sensor for every cubic foot of the troposphere, nonlinearity would quickly swamp your model forecast.
> Non-linear systems are inherently non-deterministic
Wait, what? That's not correct. The problem is that they are very much deterministic, so you get the classic situation of small variations in initial conditions having wildly different effects on the outcome, and we can never hope to measure all the things that can affect the weather a few weeks out.
However, we don't have to eliminate error; we just have to push the error below some threshold (± a few degrees, within some margin of the actual distribution of rain over an area, etc.) for some acceptable amount of time into the future. More data is definitely one part of the solution to the problem.
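A toy illustration of "deterministic but sensitive", using the logistic map rather than anything resembling a real weather model (the map, the starting value 0.3, and the 1e-10 perturbation are my choices for the sketch):

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 60)
b = logistic_trajectory(0.3, 60)          # identical start
c = logistic_trajectory(0.3 + 1e-10, 60)  # start perturbed by 1e-10

# Deterministic: the same initial condition gives exactly the same trajectory.
print(a == b)
# Sensitive: the perturbed run stays close at first...
print(abs(a[5] - c[5]))
# ...but the gap eventually grows to the size of the signal itself.
print(max(abs(x - y) for x, y in zip(a, c)))
```

There is no randomness anywhere in this code, yet a measurement error in the tenth decimal place is enough to make the two runs unrelated after a few dozen steps.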
True, I worded that badly. Nonlinear systems are not random. The present determines the future, but the approximate present does not approximately determine the future.
> but the approximate present does not approximately determine the future
Again, though, that statement contains huge assumptions about the nature of the dynamical system and how approximate a future prediction we need to call it "accurately predicted". Errors accumulate, but we can get arbitrarily close by more accurately measuring where we are in phase space (which is what the OP was talking about) and by more accurately modeling the system. There will always be errors -- dynamics are hard -- but we can certainly reduce them for some distance into the future. That's why we can successfully put things into orbit around Mars, for instance, in spite of the many interacting bodies in our solar system.
Assuming an increase in accuracy even somewhat proportional to the increase in sensors will almost definitely turn out to be wrong, though, which may be more the point you're trying to make and I'm missing.
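A rough way to see both halves of this (better measurement does extend the useful forecast, but far from proportionally) is to count how many steps a toy chaotic system stays within tolerance for different initial errors. This is a sketch on the logistic map, not a weather model; the tolerance of 0.1 and the error sizes are arbitrary assumptions.

```python
def logistic(x, r=4.0):
    """One step of the logistic map."""
    return r * x * (1.0 - x)

def horizon(initial_error, x0=0.3, tolerance=0.1, max_steps=1000):
    """Number of steps before the forecast drifts more than `tolerance` from the truth.

    The "truth" starts at x0 and the "forecast" starts at x0 + initial_error.
    Returns max_steps if the tolerance is never exceeded.
    """
    truth, guess = x0, x0 + initial_error
    for step in range(max_steps):
        if abs(truth - guess) > tolerance:
            return step
        truth, guess = logistic(truth), logistic(guess)
    return max_steps

# Each row measures the initial state 1000x more accurately than the last.
for err in (1e-4, 1e-7, 1e-10):
    print(err, horizon(err))
```

In a system like this the error grows roughly exponentially, so each thousandfold improvement in measurement should buy only a roughly constant number of extra steps rather than a thousandfold longer forecast.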
Given the wide variability of where my smartphone is at any particular moment, I really wonder how good any data it produces can get. I think forecast.io / Dark Sky handle this problem well: they use standard data sources but combine them with user data to make things more accurate.
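For what "combine them" could look like in the simplest case, here is a generic inverse-variance weighted average of a station reading and a crowd-sourced reading. This is not a description of how forecast.io or Dark Sky actually work; the function name, the numbers, and the assumption that both sources have independent, unbiased errors with known variance are mine.

```python
def fuse(station_value, station_var, crowd_value, crowd_var):
    """Inverse-variance weighted mean of two estimates of the same quantity.

    Assumes independent, unbiased errors with known variances.
    Returns (fused_value, fused_variance).
    """
    w_station = 1.0 / station_var
    w_crowd = 1.0 / crowd_var
    value = (w_station * station_value + w_crowd * crowd_value) / (w_station + w_crowd)
    variance = 1.0 / (w_station + w_crowd)
    return value, variance

# Hypothetical numbers: a precise station reading and a noisier crowd estimate.
value, variance = fuse(20.0, 1.0, 24.0, 4.0)
print(value, variance)
```

The noisier source gets less weight, and the fused variance is lower than either input. The catch is the "unbiased" assumption: phones sitting in pockets are biased, not just noisy, and weighting alone does not fix that.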
This may raise the anthropogenic climate change hysteria by another order of magnitude, by distorting the data with billions of sweaty palms, body-heated pockets, proximity to humid junk, and general indoor climate control.
Eventually it might even lead to a personal carbon tax calculated from your phone data.