Google Squared

robotrout · on May 13, 2009

There are those who say that Google is a parasite that profits off the rest of the internet, displaying copyrighted content of others and managing to sell ads for the privilege. Of course, the people who say these things are quickly slapped down, by folks citing how, without Google, nobody would find those web pages anyway, and Google is doing them a great service.

I have to note that this sort of data scraping shown here, by Google, with no incentive left for anybody to actually travel to the original source web pages, seems to make the first groups case much, much, stronger.

swombat · on May 13, 2009

I'd like to take the opposite side:

I believe that once you publish data (particularly data, but even other kinds of information) on a public, indexed web page, you automatically relinquish control over how it will be used.

That's just reality, and it's also the most profitable way to view online published data, from a global perspective. It is better for all of us if the act of publishing data on the web grants an automatic licence to the downloader to mash it up any way he sees fit. The alternative scenario, where you have to ask for permission for every bit of data, is frightening.

Just because it's "big Google" who is doing the mash-up doesn't make it less ethical than if it was some start-up coming out with a new product (or, say, Wolfram Alpha).

robotrout · on May 13, 2009

I would submit that just because it's 'big Google' who is doing the mash-up doesn't make it MORE ethical, either.

Mash-ups are ethical only as long as they provide more money in the pockets of the people you are stealing content from. If they do that, fine. If they aren't, your mash-ups are not ethical, and neither are Google's..

swombat · on May 13, 2009

No, you're missing my point completely. I'm saying that it doesn't matter whether or not you line the pockets of the people who produced and published the initial data on the web. If it's published on the web on a public site, it is available for anyone to use.

Particularly when it's factual data, rather than, say, an article. You might recall that the copyright acts do not protect factual data.

robotrout · on May 13, 2009

Sure. That's true. Copyright doesn't protect factual data.

But there is what's legal, and there is what's ethical. If I traveled around the country, measuring the heights of roller coasters for my website rollercoasterheights.com, I did it to get people to come to my site. If that information is harvested from my site and displayed elsewhere, I've done a lot of work for nothing.

swolchok · on May 14, 2009

Perhaps next time you'll think harder about the likely reward for your work.

zackattack · on May 13, 2009

Can't you de-list yourself from Google? Or add a robots.txt file?

robotrout · on May 13, 2009

I suspect that instead of those, there might be more creative solutions.

I'm envisioning consultants, like SEO folks, but these are WDO folks. (Web Data Obfuscation). Web Data Obfuscation is the term I just made up for structuring your web page so Google couldn't scrape it easily. The page would still appear in the google index, as that is a benefit to you, so you would make available enough data to be indexed, but not enough to be usefully scraped.

calambrac · on May 13, 2009

Turning the Web into a giant database will crush any attempt to segregate the “best” information into a separate database so that it can be processed and searched more deeply.

This is pure speculation, and I personally doubt its true. First, much of the content of on the web amounts to echoing, filtering, and distorting high-quality source data. If you can tap that source data, are the gains you realize from what's been derived from it worth the extra cost of going out and doing the work to acquire it and then separate out the noise?

Second, there is a lot of real or perceived value in knowing where your data is coming from. Tons of companies pay tons of money to get data and statistics from sources that they can trust. Until Google's willing to vouch for these results beyond "we tried", the bespoke curatorial approach is going to keep capturing these dollars.

robryan · on May 13, 2009

I think it can rival a structured dataset if it is done correctly. The big advantage that google has, and I know there was a paper they put out about it a couple of months ago, is that they have access to such a large source of data that they can use a great deal of sources to weed out the noise by only using the stuff that appears over many sources.

Then you could also assign sources a reliability rating based on how acurite the information they provide is compared with other reliable sources.

Kind of like a new pagerank but for data integrety.

calambrac · on May 13, 2009

Don't get me wrong, I think it's very cool that they're doing this kind of work, and I hope it takes off. What I was really responding to was the idea that this approach would "crush" a more traditionally curated database. At least in the short term, I don't see that happening, because the number of really high-quality data sources is still pretty tractable, because a lot of what's on the web is just going to be echoing/distorting those primary sources anyways, and because trust is such a huge part of that market.

ErrantX · on May 13, 2009

The issue is though that Wolfram is not going to provide the mudane sort of data that people want in the instant way Google are going to claim to do it.

Search camera on Wolfram and I expect you get lost of data on the history of camera's and other such stuff. Google seem to be offering to provide a list of camera models with some pertinent data for each... I suspect the latter will be more "useful" (especially as Wikipedia would probably be fairly reliable for the camera background...)

I think your point is valid: a curated database will be the #1 source for info on camera history and facts. Unfortunately that means Wolfram is competing with Wikipedia not Google... and that is probably even worse for them :(

calambrac · on May 13, 2009

You make a really good point. For more consumer-oriented data, a more casual level of accuracy is probably acceptable, at least acceptable enough to hang ads off of. I had my head stuck in the world of data that people are actually willing to pay for directly.

martythemaniak · on May 13, 2009

Seems Eric has kinda missed the point. From what I've read, Wolfram Alpha is:

- get a bunch of structured, verified, curated data - use mathematica to understand and reason about the data - use NLP to expose mathematica to the web surfer.

What google wants to do is get a bunch of structured data from unstructured data. Great, but competition for Wolfram Alpha? Doesn't look like it right now.

But I guess we'll have a better view of this in a few weeks.

smhinsey · on May 13, 2009

he didn't miss the point, that is contemporary "journalism." in other words, the story is skewed towards the most controversial possible angle.

it beggars belief that, after dozens of stories on the topic, each with its own universe of commentary, most of which pointing out basically what you've said, that they are unaware that their premise is flawed.

snprbob86 · on May 13, 2009

1) This looks awesome. I can't wait to try it!

2) The interviewers are surprisingly immature and unprofessional. They are downright rude to the demonstrator and are clearly ignorant about the amazingly cool technology that they are privileged to be seeing.

skorgu · on May 13, 2009

They seemed more concerned about user interface decisions and whether it would be launching at IO than the actual technology behind it. I suppose that's a valid angle, it's just not one that appeals to me. And yes, they were quite rude.

JacobAldridge · on May 13, 2009

At a first glance, this seems to help take 'Google-fu' to the unwashed masses.

If I ever do a search as generic as 'camera', it's very closely followed by a more detailed search (eg, camera olympus "flash time"). Usually, however, my first search is the most detailed, and I become more vague if I require more results.

As I'm sure most of us are aware, this helps quickly deliver the outcome / answer we seek from searching, but I'm continually surprised at how many people bang a series of vague search phrases into a Search Engine and spend time sorting through the chaff.

Google Squared seems to prompt the sort of thinking I either do before hand, or as an immediate result of seeing 8.6M results. Google-fu enhanced.

(My favourite evidence of this is from Allyn Gibson's blog, which is routinely located by people Googling the phrase "things that happen on my birthday". Think about it, or read #5 on this list http://www.allyngibson.net/?p=1686 )

huhtenberg · on May 13, 2009

I am getting a strong feeling that Google is genuinely concerned about Wolfram Alpha, which makes latter that much more interesting. In the end Google's hyperventilation may create an exact opposite of the effect they are going for.

snprbob86 · on May 13, 2009

I get the strong feeling that this Google vs Alpha silliness is a manifestation of the internet media.

Google is likely carefully studying their competitors continuously and taking prudent actions as necessary. This product doesn't seem like an answer to Alpha as much as it does an interesting 20% project growing up.

huhtenberg · on May 13, 2009

I was not referring to the media coverage. I was referring to Google's practice of piggybacking on others' PR efforts - http://www.marksonland.com/2009/04/google_likes_to_steal_oth.... In Wolfram's case they did it twice in rapid succession. That is what's interesting.

ErrantX · on May 13, 2009

Perhaps that is not a case of worrying about WA: but using clever tactics to get their new products into media focus.

It matters not whether the "New Stuff" from Google is a competitor or reaction to Wolfram or completely unrelated. What matters is the media thinks it is competition and generates all this Google vs. Wolfram media hype :) Wolfram get some sympathy as the underdogs but ultimately it just pushes Google brand (they still just hold the public opinion of not being evil: so if you see a Google vs X discussion you imagine it is Good vs. Good battle and that both products are OK).

It's a tactic Google have always used - often to superb effect :)

anigbrowl · on May 13, 2009

Quite. This is very different from Alpha (and somethng I've been waiting for from Google for a while). Alpha's not trying to be Google, and this doesn't look like google trying to be Alpha.

You know what Alpha reminds me of most (as a tester - try it out yourself on 5.18)? 'Insert Field' in MS word, where you can insert the date & time, or a reference to a cell in an Excel spreadsheet, or some other piece of external data, and have it automatically update every time the document is loaded. It's sort of like having widgets but being able to call them up with a simplistic natural language interface.

Not all of Earth's <Alpha: world population> people will appreciate this, but it will probably be popular in <Alpha: countries with most universities per capita>. (see?)

alain94040 · on May 13, 2009

What's interesting to watch is the flip by a lot of Google supporters that, one week ago, were saying that unstructure search was just fine - thank you, and this Wolfram stuff had no merit.

Then today they flip and write everywhere that structured is the next big thing.

Ironic, that's all I'm pointing out. I don't know yet which technologie(s) will have an impact (note that it's not an either/or choice, both WA and Squarred can succeed).

paraschopra · on May 13, 2009

They would have started working on the product months before the world came to know about Alpha

aston · on May 13, 2009

The video demostrates some decidedly non-Googly slowness. If it takes 10+ seconds to load, I don't think very many people are going to be using it...

derefr · on May 13, 2009

Hypothesizing: it's doing things which require decidedly non-BigTable-like architecture. Most other Google apps in their alpha stages can just take advantage of running on, basically, App Engine; this seems like one of the ones that can't, so it has to run on a separate architecture that hasn't been scaled yet. That would be alright for some other company's demo—if they were anyone else, they'd just use limited, cached, pre-defined data sets. But they're Google, they have the raw data, and they wanted to show off the possibilities of working "from scratch" on anything they like in realtime. Thus, it was similar to an app without any scaling consideration as-of-yet (consider a normal startup webapp) drinking from the entire search index firehose on each query. Ten seconds is impressive.

Once it's deployed across Google's entire cloud it'll speed up nicely.

kf · on May 13, 2009

Like Google isn't going to buy/license Wolfram Alpha...

ErrantX · on May 13, 2009

Firstly I doubt wolfram would sell (but maybe license). Secondly Google are already facing grumblings about being a monopoly - acquiring or licensing Wolfram could backfire. And thirdly there is probably no need: Wolfram dont bring anything much to the mix (as far as I can tell) except a stack of "expert" curators - which could be hired easy enough :)

dbul · on May 13, 2009

You would have thunk Google would have bought loopt, too.

kf · on May 13, 2009

They bought a startup in that area a while ago, Dodgeball.

dbul · on May 13, 2009

What's your point? That they must buy one startup in each area in which they are interested? They don't have to buy anything.

zackattack · on May 13, 2009

Why would have Google bought Loopt?

dbul · on May 13, 2009

Exactly. Why would they buy Alpha?

Here you have two powerful companies, Wolfram and loopt. Google has interests in technologies similar to what these two companies offer. They didn't buy/license loopt's technology, so why should they buy/license Alpha? There is no basis for what rms said.

zackattack · on May 13, 2009

No, no, no.

1. Google is coming out with a new product called Google Squared. It's related to searching data sets.

2. Wolfram is coming out with a new product called Wolfram Alpha. It's related to searching data sets.

3. rms observed that #1->product and #2->product are related, and if their technology doesn't explicitly overlap, WA might be a good candidate for acquisition in order to broaden Google's hold on searching data sets.

4. You issued a non-sequitur. It's not related to the thread at hand, and I'm not even sure why Google would even want Loopt. (Please explain your rationale why Google would want Loopt, cuz I don't get it, but know that explaining it won't strengthen your argument against rms).

5. Sparknotes Version-- Your argument is this: Google didn't license Y's technology. Y's technology is from a powerful company *Therefore, Google won't license X's technology, since X is a powerful company.

6. What you're trying to say is that Google won't necessarily go around trying to license technology from every company that has intersecting interests.

dbul · on May 13, 2009

I'm not making such an argument as you are claiming. My argument is that it is absurd to make an assertion without a basis. The assertion about loopt was such an assertion.

(To satisfy your model:

1. Google is coming out with a new product called latitude. It is related to finding people around you.

2. loopt has a product called loopt. It is related to finding people around you.

3. rms observed that #1->product and #2->product are related, and if their technology doesn't explicitly overlap, loopt might be a good candidate for acquisition in order to broaden Google's hold on finding people around you.)

Not everything on Hacker News must be

  P1
  P2
  P3
  --
  Q

kf · on May 13, 2009

What's wrong with being absurd?

dbul · on May 13, 2009

First, the emphasis was on the lack of a basis for the comment. But to answer your question, it depends on whether you think something that may be borderline absurdity adds some kind of value to others:

http://news.ycombinator.com/item?id=567298

I knew when I wrote that it would probably be received poorly; indeed, it was volatile. But it was creative, and there was some truth to it.

HN is too simplistic. I would venture to say it ultimately doesn't work, with evidence in responses like "Pot, socially" being #2 in the top comments under lists. HN is fun to experiment with, though.

zackattack · on May 13, 2009

The Loopt bit was a non sequitur and is not related to the conversation.

dbul · on May 13, 2009

It was an analogy.

eggnet · on May 13, 2009

I wonder when google will publish the energy requirements of a google squared search.