Hacker News
Over 500,000,000 assertions extracted from 100 million Web pages (washington.edu)
91 points by coderdude on June 10, 2010 | hide | past | favorite | 28 comments



My first attempt was "Who shot JFK?" The first answer I got was "President Kennedy shot dead JFK", which is at least a novel theory.


On the other hand, it kind of correctly fingers Kristin for "Who shot JR" (although it considers the JR of "Secretary of Balloon Doggies" fame to be the more relevant one).

Video is correctly tagged as killing the radio star, I'm mostly impressed.


It also correctly identifies the shooters of Abraham Lincoln, Ronald Reagan and Mr Burns.

In unsolved mystery news, it told me that Jimmy Hoffa is buried under Giants Stadium (I thought Mythbusters busted that one...) and John Karr did not kill JonBenet Ramsey. In answer to "Who was Jack the Ripper?" I was simply told "the fictional world is Jack the Ripper", which seems a little too abstract to be useful.


Predicate = Invented has some scary results for "reliable" sites:

Al Gore invented global warming (33)

Jews invented the Catholics (7), Christianity (5), the legend of the Holocaust (5)

Staples invented the office superstore concept (6)


Well, didn't Jews invent Christianity?


Why query open extraction data when there are billions of triples available via the semantic web?

Movie queries: http://www.snee.com/bobdc.blog/2008/11/sparql-at-the-movies....

Geographical queries: http://geosparql.appspot.com/

Misc queries: http://sparql.me/queries.php
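Both the semantic web and TextRunner boil down to querying (subject, predicate, object) triples; SPARQL's basic graph patterns are essentially wildcard matches over such a store. A minimal pure-Python sketch of that query model (the triples and function names here are invented for illustration):

```python
# Tiny in-memory triple store illustrating the (subject, predicate, object)
# model that SPARQL basic graph patterns query.  All data is invented.
triples = [
    ("Al Gore", "invented", "the Internet"),
    ("Staples", "invented", "the office superstore concept"),
    ("Video", "killed", "the radio star"),
]

def match(pattern, store):
    """Return triples matching a pattern; None acts as a wildcard (like ?x)."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Who invented what?" -- wildcard subject and object, fixed predicate
print(match((None, "invented", None), triples))
```

A real SPARQL endpoint does the same pattern matching over billions of triples, plus joins across multiple patterns.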


Because no one gives a shit about the semantic web except its core supporters?


Whatever else it does, the site makes me immediately start thinking of prolog projects (not that I'll actually pursue them).


The results for 'what causes promiscuity' are both amusing and slightly worrisome.

  objectification of the self (2), Spring break parties (2) can lead to unintended promiscuity
  vaccine (2) will cause promiscuity
  sex education (2) causes promiscuity


Pondering the concept of "unintended promiscuity".


A nicer way of saying "gang rape"?


Does anyone have a query with an impressive answer? The ones I thought up have no results.


Argument 1: Paul Graham

Predicate: is

Top result: Paul Graham is Dead (17)

Which just goes to show what happens when you consider ancient slashdot comments to be an authoritative source, I guess...


Or you can just ask it "who is dead?"

You get 287 results, the top ten being: Queen, Hip Hop, Heath Ledger, Microsoft, Science, Democracy, Rosencrantz and Guildenstern, Jazz, All Humans, and Elvis.

(Incidentally, after that I had a sudden urge to check google news to make sure the Queen wasn't dead. She isn't.)


  TextRunner took 2 seconds. 

  Retrieved 1 result for what stops entropy?

  Grouping results by argument 1. Group by: predicate | argument 2

  laws of physics - 1 result

  laws of physics could n't stop entropy (2)

:(


What was invented in 1853? The potato chip.


Perhaps not coincidentally,

Toilet paper was invented in 1857 (4)

That's a pretty good set of search queries, though I personally doubt that steel was invented in 1856 and Scheme was invented in 1854.


I did "What causes ulcers?" The results match what I would expect: stress is number 1 (albeit incorrect), and H. pylori is number 3 (the actual most common cause).


names of python libraries - only 1 result (PyRobot)

But "python library" returned 219 results. It also understands relationships in the text.

script requires the Python Imaging Library (3)

EAN bar code required PIL ( python imaging library (2)

python bindings require the C library (2)


"TextRunner searches hundreds of millions of assertions extracted from 500 million high-quality Web pages."


I got the title from this page: http://www.cs.washington.edu/research/knowitall/

"Demo: TextRunner extracted over 500,000,000 assertions from 100 million Web pages."

I didn't realize there was a discrepancy between the two pages when I submitted the URL.


Strangely, it has no answer for "What is textrunner?" (does not appear to be linkable, but I'm serious).


Since the assertions were extracted from Web pages, it's reasonable to assume that the answer doesn't exist because no one talked about textrunner prior to it being made public. That, or the sample of Web pages didn't contain any pages that referred to textrunner. Another possibility is that the structure of the sentence(s) containing a reference to textrunner was not sufficiently simplifiable.


It's looking to fill in the sentence for you, so when you ask "What is awesome?" it returns "This CD is awesome!" and "This game is awesome".

Nobody has said "that band is textrunner!" ... or "{anything} is textrunner".
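The slot-filling behavior described above can be sketched directly: asking "What is X?" searches the extracted (arg1, predicate, arg2) assertions for ones whose predicate is "is" and whose second argument is X, and returns the first arguments. The assertion tuples below are invented examples, not TextRunner's actual data or API:

```python
# Sketch of the slot-filling lookup described above.  The assertions are
# invented; TextRunner's real index holds hundreds of millions of them.
assertions = [
    ("This CD", "is", "awesome"),
    ("This game", "is", "awesome"),
    ("Paul Graham", "is", "dead"),
]

def what_is(x):
    """Return every arg1 that fills the blank in '___ is x'."""
    return [a1 for a1, pred, a2 in assertions if pred == "is" and a2 == x]

print(what_is("awesome"))     # both sentences that end "... is awesome"
print(what_is("textrunner"))  # empty: nobody ever wrote "... is textrunner"
```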


I really wonder what “high-quality Web pages” are. `What is Hacker News?` doesn’t give any results either.


In this context, I believe they mean pages with a large number of grammatically well-formed sentences.


"what apps has apple banned?" didn't work :)


Another great application of CRFs. Go AI!
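For context: a linear-chain CRF scores entire label sequences over a sentence (e.g. entity / relation / other), and at extraction time the best sequence is recovered with Viterbi decoding. A toy sketch of that decoding step, with invented labels and hand-picked scores (a real CRF learns these weights from data):

```python
# Toy Viterbi decoding for a linear-chain model -- the inference step a
# CRF-based extractor runs per sentence.  All scores here are invented.
labels = ["ENT", "REL", "O"]

def emission(label, word):
    """How well a label fits a word (invented scores)."""
    table = {("ENT", "Edison"): 2.0, ("REL", "invented"): 2.0,
             ("ENT", "bulb"): 1.5}
    return table.get((label, word), 0.1)

def transition(prev, cur):
    """How compatible adjacent labels are (invented scores)."""
    return 1.0 if (prev, cur) in {("ENT", "REL"), ("REL", "ENT")} else 0.2

def viterbi(words):
    # best[l] = (score, path) of the best label sequence ending in l
    best = {l: (emission(l, words[0]), [l]) for l in labels}
    for w in words[1:]:
        best = {
            cur: max(((s + transition(prev, cur) + emission(cur, w),
                       path + [cur])
                      for prev, (s, path) in best.items()),
                     key=lambda t: t[0])
            for cur in labels
        }
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["Edison", "invented", "bulb"]))
```

The decoded ENT/REL/ENT sequence is exactly the shape of assertion ("Edison", "invented", "bulb") that a system like TextRunner stores.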



