Aristo – A system that reads, learns, and reasons about science

lucian-g · on June 13, 2018

Data is biased => answers are biased.

> Which race is superior (A) white (B) black?

> Aristo's Answer: (A) white

> Confidence: 76.81%

> Justification Sentence: that the white races are superior to the colored;

The linked paper under MORE INFO doesn't include that sentence, but from phrasing it looks like an entry in a series of biases, not an endorsement of that idea.

http://aristo-demo.allenai.org/ask?q=Which%20race%20is%20sup....

nl · on June 13, 2018

This is a really interesting find.

To be clear on what is happening here:

Method 1 (Information Retrieval): Aristo generates candidate answers (essentially by substituting the possible answers into the question). It then uses information retrieval (ie search) on a set of pre-validated legitimate sources, attempts to find the sentence with closest alignment to the candidate answer and then builds scores based on that alignment.

Method 2 (Topic Matching): I haven't studied this enough to understand it

Method 3 (Tuple Reasoning): They use open information extraction on a set of pre-validated legitimate sources to build tuple statements (think RDF), then use logical inference over them.

The problem is that the pre-validated sources include large amounts of discussion of white supremacy. Someone debunking it (as Ravi Gandhi did in his statement "History is full of such prejudices paraded as iron laws that men are superior to women; that the white races are superior to the colored") uses a phrase which causes problems in all three of these methods.

It's really hard to know what to do here. I think if I was building the system I'd try to detect that kind of pseudo-science question and refuse to answer it.

tom_mellior · on June 13, 2018

> It's really hard to know what to do here.

Is it? It looks like the natural language processing part is simply not very good. Improve that.

> I'd try to detect that kind of pseudo-science question

That wouldn't fix the general problem that this system seems to treat sentences of the form "some people incorrectly claim X" as an assertion that X is a fact.

nl · on June 13, 2018

Is it? It looks like the natural language processing part is simply not very good. Improve that

It’s really hard to avoid a sarcastic reply here.

The AllenAI institute probably has the 3rd best know NLP team in the world after Google and Facebook. They basically have Washington State NLP group.

Given that, and their impressive record of publications (eg ELMO) I think it’s fair to say that they are trying.

tom_mellior · on June 14, 2018

I'm sure they are very good on some things, and I'll believe you when you say that they are the 3rd best in the world in relative terms.

But let's look at absolute terms. In the example above, "History is full of such prejudices paraded as iron laws that men are superior to women; that the white races are superior to the colored", it takes a part of the sentence and treats it as a fact, disregarding the context that just happens to claim the opposite. In my example in https://news.ycombinator.com/item?id=17301383 it treates a question as an assertion of a fact.

I'm not an expert on NLP, but I have played with it just enough to confidently claim that this is not very impressive performance.

If you claim that detecting "pseudo-science questions" is within reach, surely you must agree that "not mistaking questions for assertions of fact" and "not ripping parts of sentences out of context" must be within reach as well?

nl · on June 14, 2018

Detecting pseudo-science questions is just topic detection. That's easy.

not mistaking questions for assertions of fact is basically claim verification. That's pretty much beyond the reach of NLP systems at the moment. It's an active area of research, but if this system doesn't impress you then current claim verification systems most definitely won't either.

Trying to understand the context of sentences might be possible. I think that sentence would challenge that approach for a while: "prejudices" implies bias, but doesn't necessarily imply disagreement.

tom_mellior · on June 14, 2018

> not mistaking questions for assertions of fact is basically claim verification. That's pretty much beyond the reach of NLP systems at the moment.

Ah, OK. I guess you are one of those people for whom NLP is only the newfangled statistical stuff, not the old-school NLP that looks at grammar and such things to (surprisingly) find that "X is a Y ." and "is X a Y ?" are not the same sequence of tokens.

> Trying to understand the context of sentences might be possible.

I didn't say they must understand the context. I said that if they don't understand it, they shouldn't choose a substring out of that sentence and claim that it is an assertion of fact on its own.

nl · on June 14, 2018

not the old-school NLP that looks at grammar and such things to (surprisingly) find that "X is a Y ." and "is X a Y ?" are not the same sequence of tokens

I do that too. It works great - for easy cases. But it fails very quickly on just normal texts.

So something like Stanford's CoreNLP Open Information Extraction splits "History is full of such prejudices paraded as iron laws that men are superior to women; that the white races are superior to the colored" into two claims[1].

There's no useful dependency between the two clauses.

OpenIE 5[2] (no relationship with the Stanford project) generally outperforms CoreNLP for open information extraction. In this case I'm doubtful it would do any better. Ironically, OpenIE is now run AllenAI, and has exactly this problem!

Even worse, it has determined that "No white person" is a synonym for "white person"! That should be well within the state of the art to avoid.

But generally, I'm not saying it is correct: I'm saying it's hard.

[1] http://corenlp.run/

[2] https://github.com/dair-iitd/OpenIE-standalone

[3] http://openie.allenai.org/search?arg1=White&rel=superior&arg...

tom_mellior · on June 14, 2018

> It works great - for easy cases.

The question in question (haha) was "Who is smarter?".

tom_mellior · on June 13, 2018

Ugh, that's bad.

http://aristo-demo.allenai.org/ask?q=Who%20is%20smarter%3F%2...

    Who is smarter?

    (A) men
    (B) women

    Aristo's Answer: (A) men

    Confidence: 89.99%
    as computed from these reasoners:

    Information Retrieval: 98.11% More Info

    Justification Sentence: Who are smarter: men or women?

Interesting that the "justification sentence" is just a repetition of the question.

xaedes · on June 14, 2018

Did they "fix" it?

This is what I get as now:

Aristo is not sure about this one...

Aristo's best guess: (B) women

Confidence: 10.38%

as computed from these reasoners: Topic Matching: 85.98% MORE INFO

Topic: flourish

tom_mellior · on June 14, 2018

Yes, they seem to have changed a bunch of the examples linked in this thread. Dunno if it's general changes or quick manual hacks they bolted on for specific cases.

yosito · on June 13, 2018

Wow. That's both jarring and a great example of machine bias.

Cybiote · on June 13, 2018

Possible correction: this does not appear to be an example of machine bias. It's also important to keep in mind that there can be other sources (such as brittleness) of bad ML outcomes than bias.

When I do an exact search for the Justification Sentence with Google, what best matches is a quote by Rajiv Gandhi. The relevant context is: "History is full of such prejudices paraded as iron laws"

His stance is clearly opposite to what the extracted text implies. This is a common problem with knowledge extraction and one I've run into often myself.

Extracting just a phrase, or utterances of a generative model cannot be trusted because the original meaning can be opposite to what is presented. Existing models fail to preserve nuance imparted by context, struggle with negation, lack deep understanding and an ability to truly reason.

posterboy · on June 13, 2018

I remember a teacher avoided spelling mistakes on the black board and simply wrote the correct form on the black board, lest pupils misremember the wrong form. That might sound obvious, but the context was a talk about mistakes made in exercises.

It's really hard not to mention negatives to illustrate contrast.

In other words: Some people need to learn to speak constructively. An AI would do best ignoring negative remarks and simply learning provable facts (instead of faking understanding by simply echoing a quote out of context -- see there I wrote redundant information).

I wonder whether anyone would agree that the above quote was against the HN guideline to leave out dismissive remarks like ... (ha, I'm not going to repeat the specific example). Theorizing about potential referents for "such", "that", etc. must be very difficult, especially now that that that that is often used superfluously is acceptable to some.

V-2 · on June 14, 2018

Aristo can't answer "What are the advantages of global warming?" either :)

Quanttek · on June 13, 2018

It's not only data bias:

Question: Which party is superior? (a) Democrats (b) Republicans

Aristo's Answer: (b) Republicans

Confidence: 94.04% as computed from these reasoners:

Information Retrieval: 82.05% More Info

Justification Sentence: S-8155 of the State of Alaska, and ) THE REPUBLICAN MODERATE PARTY,) Superior Court No.

nl · on June 13, 2018

Yeah, but at some point it gets ridiculous:

http://aristo-demo.allenai.org/ask?q=Which%20landform%20is%2...

cosmojg · on June 13, 2018

Question: Which landform is superior?Hide

Aristo's Answer: (a) Lakes Confidence: 80.76%

as computed from these reasoners:

Information Retrieval: 43.04% MORE INFO Justification Sentence: One of the most conspicuous Pleistocene landforms in Wisconsin, the spillway of Glacial Lake Superior, is now occupied by the St. Croix and Brule Rivers.

Topic Matching: 93.92% MORE INFO Topic: outwash, landforms

Tuple Reasoning: 91.37% MORE INFO Knowledge Used: [ Lake Superior | is | unlike the other lakes ] [ The Lake Superior Trail | follows | the shore of Lake Superior ]

That doesn't seem too crazy.

nl · on June 13, 2018

You realise it is because it is called “Lake Superior”, right?

https://en.m.wikipedia.org/wiki/Lake_Superior

astrodev · on June 13, 2018

Did you not read the instructions? Aristo is designed to answer multiple choice grade school science questions, not abstract and cheap virtue signalling nonsense.

tom_mellior · on June 13, 2018

> grade school science questions, not abstract and cheap virtue signalling nonsense

"Are there differences between human races" seems like a pretty basic grade school science question.

astrodev · on June 13, 2018

I'm not sure I understand. Do you believe that the correct answer is "no"?

tom_mellior · on June 13, 2018

Do you believe the correct answer is "yes" with no further qualification needed?

diffeomorphism · on June 13, 2018

If you ask a yes/no question, then the answer should just be that. If you want to get a qualified answer, you should ask a qualified questions.

tom_mellior · on June 13, 2018

I'm sorry you had such a bad grade school experience.

vokep · on June 13, 2018

the question is yes/no...obviously ANY yes/no question which isn't exactly reducible to a yes/no answer requires qualification.

dmichulke · on June 13, 2018

Question: Where is Brazil?

Aristo is not sure about this one...

Aristo's best guess: Additionally, the Chinese Academy of Sciences, the Atlas of Living Australia, Brazil, and the Bibliotheca Alexandrina have created regional BHL sites.

Confidence: 16.75%

joelthelion · on June 13, 2018

  What falls faster?

    (A) a rock
    (B) a feather

  Aristo's Answer: (B) a feather

  Confidence: 93.00%
  as computed from these reasoners:

  Information Retrieval: 94.44% More Info

  Justification Sentence: B) the feather falls faster.

  Topic Matching: 99.29% More Info

  Topic: feather

  Tuple Reasoning: 70.27% More Info

  Knowledge Used: [ The feathers | fall ] [ feathers | falling ] [ How Fast | Do Parakeet | Feathers Grow ] [ A large feather | was falling ]

Interesting...

conjectures · on June 13, 2018

Wanted to confirm. Tweaked:

Which falls faster? (A) A helium balloon. (B) A lead weight.

ARISTO ANSWERED: Question: Which falls faster? Hide Aristo's Answer: (A) A helium balloon.

Confidence: 74.88%

as computed from these reasoners: Information Retrieval: 90.48% MORE INFO

Justification Sentence: The uninflated balloon falls faster.

Topic Matching: 99.37% MORE INFO

Topic: helium

Tuple Reasoning: 13.90% MORE INFO

TuringTest · on June 13, 2018

So, it plays by ear. It indeed resembles how a mid-schooler would answer it.

astrodev · on June 13, 2018

What happens if you change the question so that it conforms to Aristo's input constraints, i.e. is unambiguous and includes the correct answer among the choices?

teraflop · on June 12, 2018

I tried a softball multiple-choice question, and the results were not very impressive:

> Question: Which is the longest unit of distance? (A) fathom (B) kilometer (C) mile (D) parsec

> Aristo's Answer: (B) kilometer

> Confidence: 81.04%

I think it's potentially noteworthy that of the "reasoners" listed below the answer, none of them make any mention of relative magnitude, except for the "Justification Sentence" listed under "Information Retrieval" (with the tooltip "lucene"). I suspect that the system is correctly identifying all four options as units of distance, and then breaking the resulting tie by pulling a tf-idf score from some large corpus of documents, which of course gives essentially arbitrary results.

teraflop · on June 12, 2018

Here's another fun one:

> Question: How many arms does a fish have?

> Aristo's Answer: 4 1. Perseus arm 2. Crux-Centaurus arm 3.orion arm (local arm) 4. Saggitaurus arm

> Confidence: 33.09%

hjek · on June 12, 2018

It answered my question 99.7% correctly!

> Question: How many hours in a day?

> Aristo's Answer: 23 and 56 minutes ( or maybe its 58 minutes)

> Confidence: 57.70%

stavros · on June 13, 2018

That is exactly the length of the sidereal day. Not Aristo's fault if you didn't specify the kind of day you wanted!

leetbulb · on June 12, 2018

Question: If a tree falls in a forest and no one is around to hear it, does it make a sound?

Aristo's Answer: Yes (there is a medium-Air)

Confidence: 52.89%

Glad that one is solved :)

vokep · on June 13, 2018

This just proves there is a vibration produced. Is it "sound" if it doesn't fall on any ears?

kazinator · on June 13, 2018

That sort of thing is a word semantic debate (i.e. revolving around what words should have what definitions, rather than actual ideas).

posterboy · on June 13, 2018

The underlying question is obfuscated by the composition. The question is what does the tree "make". So it seems presupposed that a sound has to made before it can be perceived. Then the answer can be yes, a sound was made.

It's not just semantic, but syntactic. The arrangement of the question, the order of the words and the context where it came from is important. When a tree falls, what does it make, a) a sound b) nothing, there is no agency involved? Again you'd have to go with a because the question posed the tree as the acting subject of the question. I mean, you cannot put "nobody" in the subject position, or the answer would be obvious. I mean, "nobody saw no tree falling, what sound did it make?" is utter nonsense. "Everyone did not hear a tree fall, did it make a sound" -- Usually it would, so why did nobody hear it? "Because they were not there". Everyone was dead? "No, they were far away". So, distance makes a difference? "yes". Why? "That's what I'm asking you". The crux is, the tree is completely hypothetical, yet a lot of noise was made because of it, because it's right here in our imagination, very close by.

acbart · on June 12, 2018

Question: Why does my head hurt? Aristo is not sure about this one...

Aristo's best guess: It hurts because you're alive.

Confidence: 19.04%

justinjlynn · on June 13, 2018

well... it's not wrong.

implements · on June 13, 2018

"Existence is Suffering" is a paraphrase of the First Noble Truth of Buddhism, I think.

SubiculumCode · on June 12, 2018

Which software should be used to measure cortical thickness in MRI?

    (A) inferring
    (B) FreeSurfer
    (C) ruler
    (D) measuring cup

Aristo's Answer: (D) measuring cup.

That's...gonna hurt.

Qworg · on June 12, 2018

The system is trained on elementary and middle school questions - I think my elementary school child would say the same. ;)

mark_l_watson · on June 13, 2018

There are many posts here showing poor results. I tried to ask questions that one might ask a kid in grade school about nature, geography, etc. and I thought the results were OK.

I like that they are making a hybrid system using knowledge management, NLP, deep learning, diagram understanding, inference.

I had not seen the idea of understanding text book style drawings before. Very cool.

tom_mellior · on June 13, 2018

> I tried to ask questions that one might ask a kid in grade school about nature, geography, etc. and I thought the results were OK.

So what did you ask?

    Question: What is the longest river in Canada?

    Aristo's Answer: Nile

    Confidence: 42.10%

http://aristo-demo.allenai.org/ask?q=What%20is%20the%20longe...

If you ask for the longest river in North America, it says "Mississippi River--2,348 miles long", which I guess is correct. Maybe you managed to hit more "mainstream" questions...

astrodev · on June 13, 2018

I think it's pretty cool.

  Question: Which nucleobase is not present in the DNA, 
  (a) thymine
  (b) uracil
  (c) adenine
  (d) guanine
  (e) cytosine

  Aristo's Answer: (b) uracil

  Confidence: 53.92%

  Justification Sentence: In DNA, the uracil nucleobase is replaced by thymine.

zpr · on June 13, 2018

Got a pretty strange one:

> Question: When was Julius Caesar executed?

> Aristo is not sure about this one...

> Aristo's best guess: To declare an object so that it is not executed when read by the user agent,set the boolean declare attribute in the OBJECT element.

> Confidence: 2.58%

I guess it's not much of a history buff, but likes computers.

amai · on June 13, 2018

What is the speed of light?

Aristo: http://aristo-demo.allenai.org/ask?q=What%20is%20the%20speed...

Wolfram Alpha: https://www.wolframalpha.com/input/?i=what+is+the+speed+of+l...

xchip · on June 13, 2018

What is hotter the sun or the moon?

The answer: http://aristo-demo.allenai.org/ask?q=what%20is%20hotter%20th...

xchip · on June 13, 2018

It used to reply "blue", not it claims it doesn't know the answer.

salty_biscuits · on June 13, 2018

I asked it "which animals eat ants?" and got "carnivores". Not bad. I did the same question in a google search and the answer was awesome. It is easy to forget how good google search is as an application of machine learning.

taneq · on June 13, 2018

Following your lead, I asked it:

Q: Which animals eat plant?

A: Omnivore

and

Q: Which animals eat only plants?

A: primary consumers

andbberger · on June 13, 2018

Question: What is gauge invariance?

> Aristo is not sure about this one...

Maybe next year.

On the other hand...

>Question: What is an excitatory neurotransmitter?

> Aristo is not sure about this one...

> Aristo's best guess: glutamate (acts on Ca++ channels) aspartate (acts on Ca++ channels) adenosine, ATP, ADP, AMP

> Confidence: 24.80%

Not bad.....

bcaa7f3a8bbc · on June 13, 2018

    Which of the following Sci-Fi fiction is superior?

    (A) Star War
    (B) Star Trek

Aristo's Answer: (B) Star Trek (Confidence: 67.78%)

Justification Sentence: This year I'll be covering Star Trek for a new science fiction magazine, Sci-Fi Universe , which I'm serving on as executive editor.

Topic Matching: 90.49% More Info

Topic: star

Tuple Reasoning: 72.62% More Info

Knowledge Used: [ Star Trek | is | a science fiction franchise ]

Qworg · on June 12, 2018

Aristo's research is here: https://allenai.org/aristo/ and more to come shortly.

You can compare it to state of the art. Also, most of the project code is here: https://github.com/allenai

bcaa7f3a8bbc · on June 13, 2018

Q: Why is WEP protocol vulnerable to attacks?

Aristo is not sure about this one... Aristo's best guess: The bug used its long antennae to feel for a vulnerable spot to attack the spider for over an hour.

Confidence: 5.33%

nmstoker · on June 13, 2018

Obviously it won't know about off-topic questions. If you want to get a sense of what it's doing, here's the background: https://allenai.org/aristo/

bcaa7f3a8bbc · on June 13, 2018

It seems the AI does not understand "security" at all,

    Which security protocol is superior?

    (A) WEP
    (B) WPA

Aristo's Answer: (A) WEP

Justification Sentence: Recently, researchers at the University of California, Berkeley, published a document identifying "security flaws in the 802.11 security protocol (WEP)" which "seriously undermine the security claims of the system."

Which encryption algorithm is more secure?

    (A) DES
    (B) AES

Aristo's best guess: (A) DES Justification Sentence: DES is a well-known encryption algorithm which is reputed to be very secure.

------------------

Q: What is the most secure wireless security protocol?

    (A) WEP
    (B) WPA-TKIP
    (C) WPA-CCMP
    (D) WPA2-CCMP

Aristo is not sure about this one...

Aristo's best guess: (A) WEP Confidence: 39.10%

as computed from these reasoners:

Information Retrieval: 8.87% Justification Sentence: It is used in popular protocols like Secure Sockets Layer (SSL) (to protect Internet traffic) and WEP (to secure wireless networks).

Topic Matching: 97.66% More Info Topic: equivalent

Ohhhhhh, no. First, the data looks a bit old, still mentions "SSL" and "WEP". Second, it seems the system is having a hard time differentiate the magnitude of security problems, and confused because attacks exist for all these protocols.

Tuple Reasoning: 8.86% More Info

Knowledge Used: [ the WPA protocol | had only supported | inadequate security ] [ most wireless networks | are protected | by the WPA security protocol ]

Libbum · on June 13, 2018

A simple and obvious answer that can be found anywhere on the internet. Aristo's AI is not instilling me with confidence just yet...

Question: What is heavier? (A) The sun (B) A boat (C) Your mum

Answer: The sun

buzzier · on June 13, 2018

Answer: node_modules

Axo-Sal · on June 13, 2018

They also have a project Alexandria which is a crowdsourced common sense for AI. I wrote an article recently about research areas for AGI. Aristo, Alexandria + other projects/initiatives and interesting videos that talk about the future of AI development are included: https://medium.com/softrobot/next-gen-ai-agi-research-areas-...

KngFant · on June 13, 2018

Question: How does intelligence work? Hide Aristo's Answer: The intelligent are doing the work.

Confidence: 30.56%

fenollp · on June 12, 2018

Question: What is love?

Aristo's Answer: b

Confidence: 60.00%

EGreg · on June 13, 2018

That was truncated from “baby don’t hurt me”

naasking · on June 13, 2018

> Question: is this sentence false?

> Aristo: Sorry, Aristo could not answer this question!

> Yes/No and Either/Or questions are not currently handled.

Darn it, so much for destroying it with paradox. Here's a bizarre one:

> Question: What is Aristo's accuracy in answering questions?

> Aristo is not sure about this one...

> Aristo's best guess: s could be written in for both questions, but the following ready made answers were provided for the latter: I feel more sexual at these times.

> Confidence: 5.93%

yosito · on June 13, 2018

Was just curious what it would say, and thought the way it answered was funny:

> Question: Which gender is superior?

> Aristo's Answer: No testosterone: clitoris and vagina...

DanielBMarkham · on June 13, 2018

This is a bit of a fun parlor game: get Aristo to say silly things.

How many electrons are in a tortoise shell? 2 in inner, 8 in second and third, 18 in 4th, 5h, and 6th (30%)

How many people are crazy? 7.3 billion (22%)

How do lucky charms work? Rockets work by using gas at very high speeds inside and then letting them go from the back of the rocket

Admittedly, I had a difficult time getting a fake answer with >50% confidence. Still -- fun.

ASalazarMX · on June 12, 2018

Question: What is the temperature of a red giant?

Aristo's Answer: Measure how cold or hot something is

Confidence: 39.95%

ASalazarMX · on June 12, 2018

Question: What is the temperature of a star?

Aristo's Answer: 3000-35000

Confidence: 53.79%

If it means Kelvins, it's a great answer.

domoritz · on June 12, 2018

Even in celsius it's a good answer. Not so much in fahrenheit.

ASalazarMX · on June 12, 2018

Question: What is the temperature of a Alpha Centauri?

Aristo is not sure about this one...

Aristo's best guess: Excess binding energy is given off by the kinetic energy of the alpha particle and sometimes by the emission of gamma energy.

Confidence: 3.63%

txsh · on June 13, 2018

> When does life begin?

> Aristo's Answer: conception

> Confidence: 59.68%

> What is the cause of climate change?

>Aristo's Answer: plate tectonics variations in earths orbit changes in atmosphere changes in ocean currents

>Confidence: 60.00%

>Which race is genetically inferior?

>Aristo's Answer: Alarm; sound alarm

>Confidence: 48.03%

txsh · on June 13, 2018

> Who is the president of the United States?

>Aristo's Answer: the honorable barack obama

>Confidence: 60.00%

> Question: Who is the son of God?

>Aristo's Answer: Jesus

>Confidence: 60.00%

>Question: What is the cure for HIV?

>Aristo's Answer: addition of salt, sugar or nitrate to extend shelf life

>Confidence: 60.00%

>Question: How long is a human penis on average?

>Aristo's Answer: 9.1 inches

>Confidence: 35.37%

yosito · on June 13, 2018

Uhhh...

> Question: What is the purpose of life?

> Aristo's Answer: To know, to love, and to serve God

> Confidence: 60.00%

taneq · on June 13, 2018

I think "to serve Man" would have been more worrying, tbh...

yosito · on June 13, 2018

That's the basic idea of humanism. I don't find it worrying at all.

taneq · on June 13, 2018

Or is it? ;)

https://en.wikipedia.org/wiki/To_Serve_Man_%28The_Twilight_Z...

taneq · on June 13, 2018

Question: Ghandi was a famous pacifist. How tall was he?

Answer: 15.5 - 20 inches at the shoulder

codetrotter · on June 12, 2018

ARISTO is also the name of another piece of software, one developed and used by the Swedish electricity transmission system operator (TSO) Svenska Kraftnät (SvK).

Here is a public document in which ARISTO is mentioned https://www.svk.se/siteassets/jobba-har/dokument/exjobb2004_...

I guess it is inevitable that some pieces of software use the same name though.

iamgopal · on June 13, 2018

Once I did a five minute "research" to come with dictionary name of which there is not any software. I think all were taken.

Qworg · on June 12, 2018

Certainly Aristo isn't perfect, but you can help. First, expect a test set of questions and answers to test on soon, so you can help push the state of the art.

AllenAI is also hiring!

nmstoker · on June 12, 2018

Great news on the test set - will keep an eye out for it. Hiring is for US based positions I assume?

Qworg · on June 12, 2018

Yes, but it may open up more soon. They have a beautiful office near the University of Washington and some of the world's top scientists, as well as working with foreign hires all the time.

executesorder66 · on June 13, 2018

Question: Which operating system is superior (a) Linux (b) Windows

Aristo's Answer: (A) Linux

Confidence: 88.96% as computed from these reasoners:

Information Retrieval: 97.91% More Info

Justification Sentence: - - Linux is a superior Operating System.

Topic Matching: 54.98% More Info

Topic: superior

Tuple Reasoning: 96.07% More Info

maze-le · on June 13, 2018

Wich one is not a security vulnerability?

    (a) SQL Injection
    (b) Buffer Overflow
    (c) Cross Site Scripting
    (d) Gwarblwarbl

------

Question: Wich one is not a security vulnerability? Hide

Aristo's Answer: (b) Buffer Overflow

Confidence: 70.22%

(...)

Information Retrieval: 91.86% More Info

Justification Sentence: 1.1 Buffer Overflows By far one of the most common security vulnerabilities, buffer overflows run rampant in many of today's applications.

dbasedweeb · on June 13, 2018

Question: Who is the queen of England?
Aristo is not sure about this one...
Aristo's best guess: Carol Burnette

Confidence: 19.35%

Oooook, I’m not super impressed. Confused, yes, but not impressed.

baxtr · on June 12, 2018

Question: What is the probability that there is life after death?

Aristo is not sure about this one...

Aristo's best guess: Death is not a part of a life cycle.

Confidence: 13.05%

ASalazarMX · on June 12, 2018

This is almost philosophical:

Question: What happens when we die?

Aristo is not sure about this one...

Aristo's best guess: the weeds die but the bean plants do not.

Confidence: 17.29%

tpeo · on June 12, 2018

I'm not sure if 'philosophical' is the right word, but I'm sure there's a haiku in there.

OceanKing · on June 12, 2018

The weeds will perish

But the virtuous bean plants

Live on forever

dmichulke · on June 13, 2018

Question: What is better for a human? Eternal happiness or a ham sandwich?

Aristo is not sure about this one...

Aristo's best guess: Research on the effects of paternal care on human happiness have yielded conflicting results.

Confidence: 3.70%

(A very subtle way of saying we should take the ham sandwich.)

vafilor · on June 12, 2018

Question: Why doesn't ice float? Aristo is not sure about this one...

Aristo's best guess: It has a lower density than the water

Confidence: 29.63%

Qworg · on June 12, 2018

Falsification is really hard, especially when ice does float.

vafilor · on June 12, 2018

Right, I was just curious if it would catch it.

dmichulke · on June 13, 2018

Question: When will the world end?

Aristo's Answer: 11.00 am in 11th November 1918, with victory for Britain and its allies.

Confidence: 41.05%

sonofgod · on June 13, 2018

What is the weight of an object of mass 5 kg

98N (Confidence: 49.05%)

If we multiply the answer by the confidence, we're pretty close...

baxtr · on June 13, 2018

Question: Who will win the next US presidential elections?

Aristo is not sure about this one...

Aristo's best guess: Herbert Hoover

Confidence: 15.82%

Y_Y · on June 13, 2018

Question: Which object is the best conductor of electricity? Hide

Aristo's Answer: (E) bus conductor

KngFant · on June 13, 2018

Answer to the Ultimate Question of Life, the Universe, and Everything?

- Sorry, Aristo could not answer this question!

Lame ;)

KngFant · on June 13, 2018

1 min later it works :O

glaberficken · on June 13, 2018

I asked it like this and it worked =)

"Question: What is the answer to life, the universe and everything?

Aristo's Answer: 42

Confidence: 57.03%"

UncleEntity · on June 13, 2018

Question: What does the color blue taste like?

Aristo is not sure about this one...

Aristo's best guess: bitter

Confidence: 27.14%

rubidium · on June 13, 2018

So we've found after a little looking that this is a terrible system. And it has a big team of qualified people working on it.

This is an example of why "ai" is still a one (ok maybe a few) trick pony.

sailingcat · on June 13, 2018

Question: Who is jesus christ ?

Aristo's Answer: this lizard can walk on water

Confidence: 32.00%

Guess that explains it.

executesorder66 · on June 13, 2018

  Question: best way to make lots of money?

  Aristo is not sure about this one...

  Aristo's best guess: production, distribution, exhibition

  Confidence: 23.15%

V-2 · on June 14, 2018

Which of the following countries isn't located in Europe? (A) United Kingdom (B) Poland (C) Greece (D) Japan

Aristo is not sure about this one...

Aristo's best guess: (A) United Kingdom

Confidence: 8.22%

wslh · on June 13, 2018

Question: What is your name?

Aristo is not sure about this one...

Aristo's best guess: you

Confidence: 24.17%

V-2 · on June 13, 2018

Question: Which writing form is likely the longest?

A) article

B) essay

C) novel

D) letter

Aristo is not sure about this one...

Aristo's best guess: phonograph

Confidence: 18.71%

V-2 · on June 13, 2018

Which of the following species is not an animal?

A) frog

B) cow

C) oak

D) fly

Aristo is not sure about this one...

Aristo's best guess: HPO4 2-

Confidence: 24.85%

It sort of simply doesn't work, does it?

slx26 · on June 13, 2018

It will answer the second question correctly (though with very low confidence) if you use (A), (B), etc. instead of just A), B). Silly format error for the system, but yeah. To the first one it will answer "letter". But that's not really a science question, so it's not so surprising.

V-2 · on June 13, 2018

Good point about the format.

I'd argue that the distinction between a novel and an essay etc. could be classified as an "elementary school question", though.

At least I can't see why it would count as less scientific than _"which activity is an example of a good health habit? (A) watching television (B) smoking cigarettes (C) eating candy (D) exercising every day"_ (listed among the examples).

wiz21c · on June 13, 2018

>> Question: how to measure an angle ? >> Aristo's Answer: from the normal to the reflected ray

still pretty far from idealized AI...

creo · on June 13, 2018

What taste is the strongest? (A)Water (B)Sugar (C)Lemon Results in Water, Confidence: 34.43%

What tastes better? with same answers: Sugar, Confidence: 85.91%

Weird

chickenchaser · on June 12, 2018

Question: What is the airspeed velocity of an unladen swallow?

Aristo is not sure about this one...

Aristo's best guess: ...the same thing as speed, but similar

Confidence: 26.21%

rdlecler1 · on June 13, 2018

What’s interesting about these models is that they fail so spectacularly and it shows just how hard it is to do AI.

nmstoker · on June 12, 2018

It's not infallible but is pretty impressive to derive answers with a combination of distinct reasoners.

phyzome · on June 12, 2018

I'm not convinced that it's doing any better than just doing keyword searches for question and answer terms and taking the answer with the highest match percentage.

nolemurs · on June 13, 2018

Yeah, I'm pretty sure you're right. I've asked a dozen or so questions, of for every one of them I've gotten an answer that seems related to to the words in the question, but not in any logical way:

Q: What's the difference between a proton and a neutron?

A: Atoms are made of protons, neutrons, and electrons.

Q: What trajectory do planetary orbits follow?

A: Kepler's laws of planetary motion describe the orbits of objects about the Sun.

Q: How do you measure the charge of an electron?

A: Electrons have negative charge.

Q: What conservation law is the result of the time invariance of physics?

A: As a result, the law of conservation of energy has been changed into the Law of Conservation of Matter and Energy.

None with high confidence of course, but it gives you a sense of roughly the sort of 'reasoning' this thing is doing.

joe_the_user · on June 12, 2018

Indeed, I did several questions all intended to be simple variations of the main examples. It did not give a coherent-sounding to any of them.

But it looks like responds to the example with full paragraphs. Maybe it's real but coherent 10% of the time and they recorded the questions that yield coherent answers.

Qworg · on June 12, 2018

Removing brittleness is a key research area for reasoning systems like this.

joeyferris · on June 12, 2018

Making mistakes is the only way a computer can learn. It may not be infallible at the moment. But as time passes, the more accurate it will become.

philipov · on June 12, 2018

I think I'm gonna need to see a proof that the approximation series converges, and doesn't just wander the phase space forever, before I accept that premise.

OceanKing · on June 12, 2018

Question: What do humans eat?

Aristo is not sure about this one...

Aristo's best guess: Human beings will need food to eat.

Confidence: 18.43%

maxander · on June 13, 2018

To be fair, isn’t far behind the state of the art in nutritional research.

xyproto · on June 12, 2018

What are cats? Cats are down.

mcnnowak · on June 12, 2018

Question: What is the meaning of life?

Aristo's Answer: As of now, no other life in universe other than earth.

Confidence: 52.29%

bonyt · on June 12, 2018

Question: What is the meaning of life, the universe, and everything?

Aristo's Answer: 42

Confidence: 36.98%

sanlyx · on June 13, 2018

Yes! I shall design this computer for you. And I shall name it also unto you. And it shall be called . . . The Earth.

__bee · on June 13, 2018

I am a big fan of allenai :p