
     already out of date and is of very limited use
That depends on the type of information. For news, yes; for information about biology, geographic information, or encyclopedic knowledge, not so fast. On the other hand, we now have access to real-time data.

     Weather scientists have been collection data since the beginning of the century and they are still no better at predicting the weather than they were at the beginning of the century. 
Predicting Weather is near impossible in theory forget the practice. It's a chaotic system. Generally, in data science we are trying to predict things that we know can be predicted, such as determining whether an article is relevant to a topic. Humans can do that very well, but for machines to do it you need more data.

     They have tons of data on markets but they still can't figure out what makes markets tick. 
Again, markets are chaotic and are affected by things like low-probability events. You can predict some patterns using information asymmetry; that's what all those traders at Goldman Sachs and other firms do.

    We need generative models that explain how the data is being generated and why it is being generated in a certain way.
The systems that we are interested in have complex models, and generating them from first principles isn't always easy. Additionally, even if you do generate them, you need to test them against real-world data. Rather than using the normal hypothesis-experiment cycle, it makes more sense to look for predictable patterns in data.

    I don't think the results have been that impressive.
That's because you are out of touch with the field. Look at Google's statistical translation results, for example: they beat every generative model in town. Or the Netflix Prize for recommendation systems.

As someone interested in data science who has been quite involved with it for a few years as a student and an intern, let me try to explain why data science is now becoming an important area:

1. We now have a lot of data in a machine readable format:

Unlike a few years ago, we now have huge datasets publicly available. They range from topic-specific datasets such as GeoNames and LinkedGeoData to much wider encyclopedic datasets such as the Wikipedia data dump, Freebase, OpenCyc, and DBpedia.

We also have a huge amount of user-generated data. E.g., I have on my computer right now a huge chunk of Twitter's follower network consisting of 35 million users (I am writing an open source whom-to-follow system). Additionally, I also have 100 million tweets.

2. Not just data, we now have access to real-time data:

You can access the public Twitter timeline using their streaming API, and there are quite a few PubSubHubbub systems out there which combine information from disparate sources and provide you with a unified source.

3. Moreover, we have tools to handle the deluge [well, sort of]:

Thanks to Google, Apache, Yahoo, and Facebook we now have Hadoop, MapReduce, Pig, and other tools which make the job of parallelizing data processing easier.

4. We have a scalable on-demand infrastructure in place:

Using AWS we can buy processing power as needed. I recently bought a high-memory instance with 17 GB of RAM for 50 cents an hour for 10 hours to run some jobs. It would have been impossible a few years ago. We can now also deploy web apps very easily using Google App Engine without paying a penny, which enables us to create nice interfaces for visualizing and querying the data at a low initial cost.

Finally, we are slowly building infrastructure to sell datasets or custom apps, e.g. Amazon DevPay or Infochimps.

The name data science can be misleading, though. If you are a student, it would mean taking courses in machine learning, data mining, information retrieval, statistics, distributed computing, and databases.
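
To make the point about MapReduce-style tools a bit more concrete, here is a minimal single-machine sketch of the map / shuffle / reduce pattern that such tools spread across many machines. It is only an illustration of the programming model: it does not use Hadoop or any cluster API, the function names are hypothetical, and the sample tweets are made up.

```python
# Sketch of the map / shuffle / reduce pattern in plain Python.
# This only imitates the programming model on one machine; it does not
# use Hadoop or any real cluster API. The sample tweets are made up.
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


tweets = ["big data is big", "data science needs data"]
counts = word_count(tweets)
print(counts)
```

The map and reduce steps never share state, which is what lets a real framework run them on different machines; here they simply run one after another.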




Predicting Weather is near impossible in theory forget the practice.

Are you really saying that everyone from pre-historic hunter-gatherer societies tracking the seasons to modern meteorologists with their supercomputers and satellites have been engaging in something theoretically and practically impossible? That would surely be staggering news to everyone involved.


The details of which days are going to be sunny a month out, which will be rainy, and when storms will arrive are chaotic and impossible even in theory to predict in detail more than a few weeks away. Practice is worse: there we manage no more than a few days, and this has been true for several decades.

That said, there are larger trends that can be predicted. For instance the seasons which come from astronomical facts. Or the several year El Niño/La Niña oscillation. Not to mention relatively slow moving Rossby waves in the jet stream. (One of which is bringing hot weather to Russia and monsoons to Pakistan right now.) These give useful information about what is likely to happen and keep happening over periods of weeks, months, and to some extent years.

But none of this brings us any closer to the idea of being able to give an exact weather forecast for a day 6 months in the future. That goal is impossible. And has been known to be impossible for several decades. Furthermore I assure you that this fact is well-known to every competent modern meteorologist. (The word "competent" does not necessarily cover people chosen primarily for their appearance to deliver the weather report for local TV stations.)


According to this ECMWF planning document (page 14, figure 4):

http://www.ecmwf.int/about/programmatic/strategy/strategy.pd...

the forecast skill for ECMWF and NOAA has been improving pretty steadily over the last 15 years. Basically, we're seeing two days farther into the future now than 20 years ago.

I agree that chaotic dynamics and various noise sources limit the time horizon for weather predictions to perhaps 2 weeks.


I know what 'chaotic' means. Or where the seasons come from. The statement I was taking issue with is still complete nonsense. We can predict the weather, both in theory and in practice. He was saying that we can't and this is obviously untrue.


http://en.wikipedia.org/wiki/Chaos_theory

     To his surprise the weather that the machine began to predict was completely different from the weather calculated before. Lorenz tracked this down to the computer printout. The computer worked with 6-digit precision, but the printout rounded variables off to a 3-digit number, so a value like 0.506127 was printed as 0.506. This difference is tiny and the consensus at the time would have been that it should have had practically no effect. However Lorenz had discovered that small changes in initial conditions produced large changes in the long-term outcome.[43] Lorenz's discovery, which gave its name to Lorenz attractors, showed that even detailed atmospheric modelling cannot in general make long-term weather predictions. Weather is usually predictable only about a week ahead.[25]
Note this is different from predicting global warming. Global warming is a long-term trend prediction, not a prediction of what the temperature will be on a certain day at a certain place.
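
The rounding effect described in the quoted passage can be imitated with a small experiment. The sketch below is not the model from the anecdote: it assumes the three-variable Lorenz system with the textbook parameters (sigma = 10, rho = 28, beta = 8/3), a fixed-step Runge-Kutta integrator, and an arbitrary starting point. Only the two initial values, 0.506127 and 0.506, are borrowed from the quote.

```python
# Sketch: sensitivity to initial conditions in the three-variable Lorenz system.
# Assumptions (not from the thread): textbook parameters, a fixed-step RK4
# integrator, and an arbitrary starting point for y and z.
import math

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0


def lorenz(state):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(state, k1, dt / 2))
    k3 = lorenz(shifted(state, k2, dt / 2))
    k4 = lorenz(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(x_full, x_rounded, dt=0.01, steps=5000):
    """Distance between two runs that differ only in the initial x value."""
    a = (x_full, 1.0, 1.0)
    b = (x_rounded, 1.0, 1.0)
    history = []
    for _ in range(steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        history.append(math.dist(a, b))
    return history


history = separation_history(0.506127, 0.506)
for step in (100, 1000, 2500, 5000):
    print(f"t = {step * 0.01:5.1f}  separation = {history[step - 1]:.6f}")
```

The two runs should track each other closely at first and then end up on unrelated parts of the attractor. That does not say short-range forecasts are useless, only that the useful horizon is limited by how precisely the starting state is known.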


You should also read the wikipedia page on 'Weather Forecasting' while you're there. Or consider if you can predict whether the upcoming December in the Northern hemisphere might be colder than this last July.


Read before you type. From my comment above:

     Weather is usually predictable only about a week ahead.
and about

     December in the Northern hemisphere might be colder than this last July.
That's not exactly a prediction. It is a seasonal variation due to the inclination of the earth w.r.t. the sun. That's the same as saying some product will have higher sales before Christmas, or there will be more traffic on the roads before Thanksgiving.


Well, I think you said two things.

(1) "Predicting Weather is near impossible in theory forget the practice."

and

(2) Quoting wikipedia, "Weather is usually predictable only about a week ahead."

These two remarks are not quite opposites, but almost.


You are getting confused by the semantics of the word "prediction".

Weather in theory is a chaotic system, thus there is no simple laws such as F = Ma to predict it.

Time frames of weeks, seconds, and years are meaningless. You can always argue that you can predict weather for next milli or micro or femto seconds, why stop at weeks? But that does not prove that chaos theory is wrong or that there is some generative law which can be used to predict weather without relying on data.

Also, from a utilitarian perspective, a long-term accurate weather forecast would be much more useful than just a week-long range [assuming that is accurate right now]. But you don't find an accurate prediction for the monsoon in January, or even of, say, a hurricane a fortnight before.


I can barely understand what you're saying here, so I won't comment beyond this reply.

The time scale is critical. If you look at any review of numerical weather prediction accuracy, such as the one I linked to above, you will see it's a key parameter. It's why 3-day forecasts are excellent and 7-day forecasts are not very good. Good 3-day predictability does not imply good 7-day predictability.

This is no different than the pictures of the state trajectories of the Lorenz system (http://en.wikipedia.org/wiki/Lorenz_attractor) starting from two nearby states, which stay together for a time, and then diverge suddenly.

It's true, the relevant laws are the Navier-Stokes differential equations, not simple laws like F = m a. But, we can observe the boundary conditions and propagate the system state forward in time. These equations do constrain the future dynamics.


You can always argue that you can predict weather for next milli or micro or femto seconds, why stop at weeks?

The reason that I've heard for stopping at weeks is that that is the time frame for perturbations to work their way up from the quantum scale to the macro scale. We cannot, even in principle, measure what is happening everywhere on the quantum scale.

Of course as soon as you merge quantum mechanics and chaos theory, life gets very, very weird. See http://www.iqc.ca/publications/tutorials/chaos.pdf for more.


Sometimes these smaller perturbations are called "sub-grid phenomena". I think practitioners think of them as, say, pressure waves or flows which average to zero when observed on a 10km x 10km grid. But it is possible that their ultimate origin is on a much finer grid ;-)

Another problem, separate from these effects, is getting closure on the variables in the model. Things like evaporation from soils and vegetation, for example, which ECMWF is trying to include in their models. You put them in the weather model to improve accuracy, and all of a sudden you need a time-dependent model for soil and vegetation water content. And also, sensors to satisfy the boundary conditions for your model (e.g., to estimate plant vegetation type).


thus there is no simple laws such as F = Ma

There are plenty of simple mechanical systems governed by mechanical laws that exhibit chaotic behaviour. You said a silly thing about weather prediction, it happens. There's really no need to dig yourself into some deeper hole of gibberish.
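
One example of such a system, offered as an illustration rather than something named in the thread, is the kicked rotor (the Chirikov standard map): a rotor that turns freely under ordinary Newtonian mechanics and receives a periodic kick whose strength depends on its angle. The sketch below assumes a kick strength K = 10 and arbitrary starting values, and follows two rotors whose starting angles differ by one part in a billion.

```python
# Sketch: the kicked rotor (Chirikov standard map), a rotor that receives a
# periodic kick depending on its angle and rotates freely between kicks.
# The kick strength K and the starting values are assumptions for illustration.
import math

K = 10.0  # strong kicks; for small K the motion is mostly regular


def kick(theta, p):
    """Apply one kick, then let the rotor turn freely until the next kick."""
    p = p + K * math.sin(theta)
    theta = theta + p
    return theta, p


def separations(theta0, p0, delta, steps=200):
    """Track two rotors whose starting angles differ by delta."""
    a = (theta0, p0)
    b = (theta0 + delta, p0)
    out = []
    for _ in range(steps):
        a = kick(*a)
        b = kick(*b)
        out.append(math.hypot(a[0] - b[0], a[1] - b[1]))
    return out


gaps = separations(theta0=1.0, p0=0.5, delta=1e-9)
for n in (1, 10, 20, 50, 200):
    print(f"kick {n:3d}: separation = {gaps[n - 1]:.3e}")
```

The update rule is two lines of elementary mechanics, yet the gap between the two rotors should grow from negligible to order one within a few dozen kicks. Simple governing laws and chaotic behaviour are not in conflict.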



