For those looking for a larger dataset (~500GB), the European Space Agency released a similar dataset last September with 1B star positions and magnitudes, including 2M with velocity information.[1] One nice graph is an impressive sky map generated simply as a density plot of the object positions.[2]
This is science at its very best; open, a rising tide that raises all boats. Most of all though, this kind of thing is going to be very exciting for a particular kind of young person who should be encouraged at every turn.
If anyone's worried that this "huge dataset" might be too much to handle on a home computer, it looks like it's around 5 MB if I clicked the right links.
That's like the Large Synoptic Survey Telescope (LSST): it takes pictures of the entire southern sky every few nights using a 3 gigapixel camera. It will do that for ten years and at the end of that will have gathered only 15TB of data. If you do the math it's more like 46TB, but even so that seems low.
I assumed this was going to be thousands of images and 10s-100s of GB, so I started the download just to see how much it would be. Was pleasantly surprised when it was done as soon as it started.
There was a recent "The Sky at Night" episode http://www.bbc.co.uk/programmes/b088d1pv which shows, to the casual observer, how far we've come in cataloguing stars, with a particular focus on the Milky Way.
You're getting exactly what it says on the tin: 64,480 velocity measurements, collected from a total of 1,699 stars. They've thrown in the observation date, a measure of uncertainty, and a few other values.
It's not "Big Data" in the sense of terabytes of ad clicking data. It is a "big data set" if you consider that each and every entry in the data set took a fair amount of time to acquire and process, not to mention the ungodly-expensive equipment and non-trivial work to set it up correctly.
Ah, the downloaded file didn't have a .tar extension, and looking at the file with the head utility made it seem like it was just a single TSV file. It is in fact a TAR file that expands into 1,699 .vels files.
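For anyone who hits the same thing, here's a minimal Python sketch for checking whether an extensionless download is really a TAR archive and listing the .vels files inside. The function name and the "download" path in the usage comment are placeholders of mine, and nothing here assumes anything about the layout inside the .vels files themselves.

```python
import tarfile


def list_archive_members(path, suffix=".vels"):
    """Return names of regular files in a TAR archive that end with `suffix`.

    Raises ValueError if `path` is not a TAR archive at all.
    """
    path = str(path)
    if not tarfile.is_tarfile(path):
        raise ValueError(f"{path} does not look like a TAR archive")
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression transparently
    with tarfile.open(path, mode="r:*") as archive:
        return [member.name for member in archive.getmembers()
                if member.isfile() and member.name.endswith(suffix)]


# Usage (hypothetical file name, substitute whatever your browser saved):
# names = list_archive_members("download")
# print(len(names), ".vels files found")
```

On the command line, `tar -tf <file>` gives the same listing without caring about the missing extension.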
Kind of funny they say they're making the data public... 61,000 raw Keck observations should be... ~600 GB?
I kind of wonder if they were forced. Most proprietary observations with Keck or Hubble or whatever come with a little caveat that you only get the data to yourself for 1 or 2 years before it gets made public. I'm guessing that running their code on their released data set _after_ they've published is gonna come up with a big fat nothing new. But maybe mixing with dupe detections from RAVE / LAMOST / SDSS / APOGEE (massive spectroscopic surveys free to the public after proprietary times) and running their code might turn up one... highly doubtful though.
Anyways, there's loads of huge data sets out there that are free. TGAS-Gaia was free to the public as soon as it was made free to the community, so that's where all the big boys are playing right now (the _really_ big fish are fighting it out with the Gaia-raw, minus the Tycho supplement that makes TGAS). This summer Gaia2 is coming out to the public at the same time as the community again -- people are already setting up war rooms around the world to hack out papers in week-long sprints the day after release.
If you're super into planets for some reason, Kepler's been free for years. Google "NASA MAST" for all NASA data. Google "Vizier CDS" for any European catalog. (Gaia has light curves I guess, but they're not good enough yet, I'm pretty sure.) If there's a specific telescope you like, CFHT for example, just google it and they've got free data.
I always find this kinda shit happening: something amazing happens, like drones, people get desensitised to how actually incredible it is, and soon enough everyone's complaining about it.
Like can you just step back for a second? This is actually a dataset of about 1,700 stars. Free. For anyone to look at. Stars. I can't understand what your problem is.
So basically if the average person wants to try to discover something, they're going to have to use the Gaia-raw or soon-to-be-released Gaia2 datasets? Do these teams use any sort of deep learning to work on the data?
Behind the cynicism seems to be useful information. While negative attitudes are not helpful, they can often motivate people to speak up about things that normally would not be widely known. For instance, the gist of what this person is saying is that the released information probably has already been combed over. And that if you want to spend your time actually looking for something, there are better sets out there. But also be prepared to get behind entire teams doing the same thing. Not that you wouldn't find anything, but otherwise you get the impression that there's only a handful of researchers looking through this stuff. Therefore, I actually found the comment useful.
I'm sitting here wondering if any deep learning methods could be used on this (and similar) datasets? I don't know enough about what exactly this data is, or what kind of questions could be "asked" about it, to build a model for such a purpose, but maybe someone else would...?
I am curious as to what sorts of image processing and algorithms are used to analyze this data set. Anomaly detection in images? Classification of images based on known extrasolar planets?
So first off, these are not images of stars, but spectra.
From these spectra you can extract quite a number of observables, such as, in this case, radial velocity (how fast the star is moving toward or away from us).
Since planets orbiting stars are not massless, both the star and the planets orbit a common center of mass, causing the star to periodically move toward/away from us (much more complicated of course with multiple planets, plane of the planets, oscillations in the star itself, etc.) - which we can measure.
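To put rough numbers on that wobble, here's a small Python sketch of the standard radial-velocity semi-amplitude formula for a single planet, K = (2*pi*G/P)^(1/3) * m_p*sin(i) / ((M_star + m_p)^(2/3) * sqrt(1 - e^2)). The constants are rounded textbook values I've plugged in myself, the function name is just for illustration, and it ignores all the multi-planet and stellar-oscillation complications mentioned above.

```python
import math

G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30       # kg (rounded)
M_JUPITER = 1.898e27   # kg (rounded)
M_EARTH = 5.972e24     # kg (rounded)
DAY = 86400.0          # seconds per day


def rv_semi_amplitude(period_days, m_planet, m_star, inclination_deg=90.0, ecc=0.0):
    """Stellar radial-velocity semi-amplitude K in m/s for a single planet."""
    period = period_days * DAY
    sin_i = math.sin(math.radians(inclination_deg))
    return ((2.0 * math.pi * G / period) ** (1.0 / 3.0)
            * m_planet * sin_i
            / (m_star + m_planet) ** (2.0 / 3.0)
            / math.sqrt(1.0 - ecc ** 2))


# Edge-on, circular orbits around a Sun-like star
k_jupiter = rv_semi_amplitude(4332.6, M_JUPITER, M_SUN)
k_earth = rv_semi_amplitude(365.25, M_EARTH, M_SUN)
print(f"Jupiter-like planet: {k_jupiter:.1f} m/s")
print(f"Earth-like planet:   {k_earth:.2f} m/s")
```

That comes out to roughly 12 m/s for a Jupiter and about 0.09 m/s for an Earth, which is why instrument stability at the m/s level matters so much for the smaller planets.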
To find exoplanets using this method you need spectrographs that are very stable in the long term (especially if you are interested in lower-mass planets) and can measure radial velocities at the level of 1 m/s, such as HARPS [1].
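On the "what algorithms" question: with a table of velocities the usual first step is a period search rather than image processing. Here's a minimal sketch, assuming you've already parsed times, velocities, and uncertainties into arrays. It's a plain weighted least-squares sinusoid fit over a grid of trial periods (a simplified cousin of the Lomb-Scargle periodogram), not the method used by the team that released the data, and the synthetic numbers below are made up.

```python
import numpy as np


def sinusoid_power(t, rv, sigma, periods):
    """Fractional chi-square reduction from fitting a sinusoid at each trial period.

    Model: rv = a*sin(2*pi*t/P) + b*cos(2*pi*t/P) + c, weighted by 1/sigma.
    Values lie between 0 and 1; higher means the sinusoid fits better
    than a constant velocity.
    """
    t, rv, sigma = (np.asarray(x, dtype=float) for x in (t, rv, sigma))
    w = 1.0 / sigma
    # Baseline: chi-square of the best constant (weighted mean) model
    mean = np.sum(rv * w**2) / np.sum(w**2)
    chi2_const = np.sum(((rv - mean) * w) ** 2)
    power = np.empty(len(periods))
    for i, period in enumerate(periods):
        phase = 2.0 * np.pi * t / period
        design = np.column_stack([np.sin(phase), np.cos(phase), np.ones_like(t)])
        coeffs, *_ = np.linalg.lstsq(design * w[:, None], rv * w, rcond=None)
        resid = (rv - design @ coeffs) * w
        power[i] = 1.0 - np.sum(resid**2) / chi2_const
    return power


# Synthetic example (made-up numbers): 10 m/s signal, 37-day period, 2 m/s noise
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1000.0, 80))    # irregular observation times, days
sigma = np.full(t.size, 2.0)                 # per-point uncertainty, m/s
rv = 10.0 * np.sin(2 * np.pi * t / 37.0) + rng.normal(0.0, sigma)
periods = np.linspace(10.0, 100.0, 5000)     # trial periods, days
power = sinusoid_power(t, rv, sigma, periods)
best = periods[np.argmax(power)]
print(f"best trial period: {best:.2f} days")
```

Real searches also have to deal with aliases from the observing cadence, eccentric orbits, multiple planets, and stellar activity, so treat a peak from something like this as a candidate to check, not a detection.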
[1] https://www.cosmos.esa.int/web/gaia/dr1 [2] http://sci.esa.int/science-e-media/img/61/Gaia_GDR1_Sky_Map_...