The important thing to note about SymPy Gamma is that it does only the mathematics part of WolframAlpha. It's also relatively new. There is no natural language input. There are no non-mathematical capabilities. The syntax should match Python syntax for the most part, though there are extensions to allow things like "sin x" or "x^2" or "2 x". All this will hopefully improve in the future (and pull requests are welcome!).
Most of the code was written by David Li (who is actually a high school student). You can watch a presentation about it here: http://conference.scipy.org/scipy2013/presentation_detail.ph.... It started out as a "because we can" toy, and it's gotten much better.
The real benefit of SymPy Gamma over WolframAlpha is that there are no barriers around it, since it's entirely (BSD) open source. For example, if you start computing something interesting and want to try more, you can move to SymPy Live (http://live.sympy.org/) and compute in a more session-like environment. Or you can use SymPy locally on your own computer.
Regarding the comments that Wolfram is mostly used for play, I'm not so sure about that. Wolfram is invaluable to students as a calculator. Sure, Google can compute 100 * pi, but it falls apart when you try to compute integrate(sin(x) * x, x). When I was in college (which was last year), I saw people use it all the time. It's been very successful in making computer algebra accessible to virtually everyone.
By the way, probably the best feature of SymPy Gamma right now is the integration steps. See for instance the "integral steps" section of http://www.sympygamma.com/input/?i=integrate%28sin%28x%29*x%.... This is a feature that used to be free at WolframAlpha, and it's extremely useful if you are learning integration in calculus. It doesn't work for all integrals, because not all integrals are computed the way you would by hand.
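For reference, that same integral runs locally with plain SymPy (the library behind SymPy Gamma); here is a minimal sketch of a session:

    from sympy import symbols, sin, integrate

    x = symbols('x')
    print(integrate(sin(x) * x, x))  # -x*cos(x) + sin(x)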
I used to work at Wolfram Research on Wolfram Alpha's backend, and one of the most challenging technical problems we faced was free-form input. Although W|A has by no means perfected this, this is hardly an alternative - queries like "factor the number 100" fail because there are only the beginnings of a free-form transformer. Obviously, the usefulness of being able to answer "factor one hundred" is questionable, but W|A solved it out of Stephen Wolfram's aspiration to be able to "compute everything". Right now, this is just a programming language you can run on the web.
The other thing is that W|A has trillions of data points and what I am only allowed to describe as the beginnings of a semantic network for inferring relations between them. It was a vastly overcomplicated system that was difficult to work with, so I am quite confident that some day there will be an open-source alternative that anyone can contribute to.
The other thing to take into account is how this affects the future. Wolfram Research thought they were going to "disrupt the calculator" (I heard this ridiculous statement once at a meeting). In reality, Wolfram Alpha queries are more often for the sake of fun than for the sake of discovery (I know this because there was a big TV in the break room that would keep displaying things that people searched on Wolfram Alpha). Is it really that useful to be able to have a computer give you an answer to "I have two apples, Jill has three apples. How many apples do we both have?"
Or is it more useful to make something that can take in symptoms of your current ailment and tell you which disease you are most likely to have? Wolfram Alpha does this as well.
The results are difficult to interpret, though. In my time at Wolfram Research, I was certainly convinced by the idea of knowledge engines and their ultimate emergence, but I think the way this will be accomplished is more in a Google-esque fashion, where their knowledge engine results are displayed alongside a real search algorithm. Best of luck to the people on this project; I hope you make the first step into creating an open source knowledge engine.
Doesn't work, even though Wolfram Alpha knows enough about mangoes to give me the nutritional value of three of them as its response instead.
'Oranges' as the noun works quickly. 'Pens' doesn't work at all, so it's hardly surprising that 'doowats' also fails. Surprisingly, even 'pears' is a fruit too far.
On the other hand, "I have two apples and one orange, Jill has three oranges. How many oranges do we both have?" works very well.
So I have to ask, is this just a trick? I mean, did you program it to handle apples and oranges specifically, without attempting to do any sort of semantic comprehension? Because this is a big failing of Wolfram Alpha for me. It 'knows' things but it doesn't know them. It knows that a mango has 84 calories but not that it's countable. Or perhaps it does know that and that knowledge just doesn't propagate to more complex queries.
Yeah, it doesn't appear to be very advanced. I tried:
I have two mangoes, Jill has three mangoes. Alan has three oranges, and I have two apples and a carrot. How many mangoes do we have? How many fruits do we all have? How long can we survive?
One would think it did some basic parsing and managed to file away the three oranges and two apples in a parse tree of some kind -- and if those were associated with "fruits", it should be able to answer?
And if it were able to answer, one might be able to have it hand out advice on diets ("Give me 30 examples of a 3000 calorie diet featuring no red meat") and a lot of other things that would be "easy" to answer based on ingesting some pretty standard databases.
I see a couple of issues with handling your "30 examples of a 3000 calorie diet with no red meat" query. One is the complexity of finding a subset of all the food items in the Wolfram Alpha database that adds up to around 3,000 calories; you could certainly make an algorithm to do this, but all queries time out after a few seconds, so it is likely to fail. The other is that it would have to understand which foods constitute red meat, which it doesn't (try searching "red meat food"; the only red meat it knows is a movie by that name). Even if it did, you would need to categorize all foods so the query worked with "3000 calorie diet with no vegetables" and so on. Obviously, if it can solve word problems about apples but can't solve the same problems about mangoes, computation like this is far beyond what they will accomplish for many years.
Oh, yes, that might have been the "long way 'round" of doing it. I was alluding to crawling a recipe database (with calories per meal), filtering out those with red meat, and then doing a random selection. Not very hard at all.
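Something like the following toy sketch, say (the recipe data, calorie figures, and thresholds are entirely made up; a real version would crawl an actual recipe database):

    import random

    # Stand-in for a crawled recipe database, with calories per meal (made-up data).
    recipes = [
        {"name": "lentil curry", "calories": 650, "ingredients": {"lentils", "rice"}},
        {"name": "beef stew",    "calories": 800, "ingredients": {"beef", "potato"}},
        {"name": "veggie pasta", "calories": 700, "ingredients": {"pasta", "tomato"}},
        {"name": "salmon bowl",  "calories": 550, "ingredients": {"salmon", "rice"}},
    ]
    RED_MEAT = {"beef", "lamb", "pork"}

    def sample_diets(n_examples=30, meals_per_day=4, target=3000, tolerance=300):
        """Randomly sample daily meal plans near the calorie target, with no red meat."""
        allowed = [r for r in recipes if not (r["ingredients"] & RED_MEAT)]
        diets = []
        for _ in range(100_000):  # bounded number of attempts; this is not a real search
            if len(diets) >= n_examples:
                break
            day = random.choices(allowed, k=meals_per_day)
            total = sum(r["calories"] for r in day)
            if abs(total - target) <= tolerance:
                diets.append((total, sorted(r["name"] for r in day)))
        return diets

    for total, meals in sample_diets(n_examples=3):
        print(total, meals)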
1. Word problem solving is shoddy at best. There is a lot of functionality in Wolfram Alpha that exists for that minuscule chance that somebody will actually query it, so queries matching "[person has object]*, what is the total" are easily understood, but they fail when the database doesn't have "mango" or the plural "mangoes" tagged as an object. This is largely due to poor database design decisions made in the past, but I know that this will improve in the future based on the project I worked on to improve the standard data format.
2. Some functionality is literally only for the sake of demonstration. I remember one time when Stephen was demonstrating a variety of cool queries into Wolfram Alpha during the yearly all-staff meeting, and when I got back to my desk I tried all the same queries with slight variations and almost all of them failed.
3. Mathematica has powerful string manipulation and regex matching functionality, to the point where lazy engineers are easily tempted to join the dark side. So it is very possible that this particular word problem only works because of a regex match. I know it sounds crazy, but I honestly wouldn't be surprised if some lazy engineer added in a literal match for "([subject] (have|has) [number] (apple|apples|orange|pear|peach|peaches))+", fed it into a simple extraction function, and output the result.
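To be clear, this is pure speculation on my part, but the shortcut would look something like the following made-up illustration (my own Python, not actual Wolfram code):

    import re

    # A hard-coded fruit list and number words: no semantics, just pattern matching.
    NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
    PATTERN = re.compile(
        r"(?:\bI\b|\b[A-Z][a-z]+\b)\s+(?:have|has)\s+(\w+)\s+"
        r"(?:apples?|oranges?|pears?|peach(?:es)?)"
    )

    def total_fruit(query):
        return sum(NUMBER_WORDS.get(w, 0) for w in PATTERN.findall(query))

    print(total_fruit("I have two apples, Jill has three apples."))    # 5
    print(total_fruit("I have two mangoes, Jill has three mangoes."))  # 0 -- mango isn't in the list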
"In reality, Wolfram Alpha queries are more often for the sake of fun than for the sake of discovery"
In advanced math classes (upper undergraduate or graduate) it is almost impossible to check results or do a complex operation on a simple calculator.
In many cases I have to turn to a tool like Mathematica/Wolfram Alpha. I have a W|A Pro account and it has worked wonders for me. For example, entering "integral from 0 to infinity of (ye^(-y)((-1/y)(e^(-t))+1/y)) with respect to y" into Wolfram Alpha is so much easier than doing the same with a TI-89. I can copy/paste and adjust very easily and the software produces multiple interpretations/representations which is very useful.
Your comment reminded me of a very interesting observation I had when I was looking at Wolfram Alpha analytics - for some reason, the number of queries drops dramatically around mid-December, and only returns to the original volume around mid-January. I was puzzled by how this phenomenon repeated itself every year.
It took me a minute to realize that during that time, every college student is on vacation.
I use IPython/SymPy for the same purpose. It is nice to be able to do these calculations on my local machine, instead of relying on an internet connection. Plus I get the power of Python to back me up if I want to do anything more complicated.
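For what it's worth, the integral mentioned above also goes through in a local SymPy session; a quick sketch (the positivity assumptions on y and t are mine, added so the improper integral evaluates cleanly):

    from sympy import symbols, integrate, exp, oo, simplify

    y, t = symbols('y t', positive=True)
    expr = y * exp(-y) * ((-1/y) * exp(-t) + 1/y)
    print(simplify(integrate(expr, (y, 0, oo))))  # 1 - exp(-t)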
It's not nothing, however, that Wolfram Alpha works spectacularly well on a phone. The mobile app even has a very useful keyboard with common math symbols. With mobile data, at any given time, I'm much more likely to have a computing device with an internet connection than I am to have one with a usable programming keyboard.
Not that I'm against the progression of these tools, far from it; I just think that Wolfram Alpha does a better job at math-homework-type problems for most people most of the time. If you have more complicated modeling/statistical/etc. work, then bring out the bigger guns.
I used the Wolfram Alpha iPad app in real-time during mathematics / statistics lectures to check the notes on the blackboard and to try out alternatives. Very useful in the classroom.
I see that some professors (not in my school, though) publish their lecture notes as IPython notebooks. That is super clever and allows interactivity with the lecture material. I don't know about the mobile/tablet support of the IPython notebook, however.
> the first step into creating an open source knowledge engine
I'm working on this too, but I'm tackling it from the non-math side (ie, NLP+Knowledge Graph).
> one of the most challenging technical problems we faced was free-form input
Yes, it's a horrible problem(!) I'm using Quepy[1] (which in turn uses NLTK), and it does a decent job. It's still not automatically general purpose (you need to write code to map classes of queries), but it can handle questions like "Who directed The Social Network"[2].
> The other thing is that W|A has trillions of data points and what I am only allowed to describe as the beginnings of a semantic network for inferring relations between them. It was a vastly overcomplicated system that was difficult to work with, so I am quite confident that some day there will be an open-source alternative that anyone can contribute to.
This is interesting to me (for obvious reasons).
Can you expand on what made it so complicated (I assume beyond the standard RDF-style inference engines)?
I've also been working with Quepy a bit lately. Very cool stuff. Are you able to comment at all on what you're working on, or is it "super secret stealth mode" stuff?
For us, we already do semantic concept extraction using Apache Stanbol, against content that flows into our enterprise social network product, and then store the associated triples in an RDF triplestore. We have a primitive search feature exposed, which lets you query using SPARQL, but realistically, we know "normals" will never, ever, ever, ever write SPARQL queries, so the big push is to do automated translation from natural language (even if it's a slightly restricted natural language) into SPARQL so users don't have to think about triples and what-not.
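To give a flavour of why we don't expect normal users to write this stuff by hand, here is roughly what a simple question ("Who directed The Social Network?") looks like as SPARQL; a sketch using rdflib against a local DBpedia-style graph, where the file name and exact predicates are illustrative:

    from rdflib import Graph

    g = Graph()
    g.parse("dbpedia_subset.ttl", format="turtle")  # hypothetical local dump

    results = g.query("""
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?name WHERE {
            ?film rdfs:label "The Social Network"@en ;
                  dbo:director ?director .
            ?director rdfs:label ?name .
            FILTER (lang(?name) = "en")
        }
    """)
    for row in results:
        print(row.name)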
If you're not in super-secret stealth mode and ever want to compare notes to talk about this stuff offline, feel free to shoot me an email.
Not especially super-stealth, but it's not really ready for public consumption.
Basically I've been gluing lots of pieces of pre-existing software together, sticking web service front ends on them, and making them work together. It's all Dockerized so things can be run separately.
> For us, we already do semantic concept extraction using Apache Stanbol, against content that flows into our enterprise social network product, and then store the associated triples in an RDF triplestore. We have a primitive search feature exposed, which lets you query using SPARQL, but realistically, we know "normals" will never, ever, ever, ever write SPARQL queries, so the big push is to do automated translation from natural language (even if it's a slightly restricted natural language) into SPARQL so users don't have to think about triples and what-not.
Very similar here.
I'm (currently) using DBPedia dumps, loaded into Jena. I'm experimenting with content extraction (e.g. the CIA Factbook).
Nice, sounds like we're using a very similar stack. We are using Jena as our triplestore, but we don't touch the DBPedia triples directly; we rely on Stanbol to do the entity extraction processing for us. And we're also starting down the path of using Quepy.
Jena here too. 14G of data total. Total Triples: 124,294,115 (SELECT (COUNT(*) AS ?no) { ?s ?p ?o })
I believe that's actually pretty big for a triplestore. Seems to work ok, but loading is pretty slow.
I'm contemplating switching to YAGO2[1] or Freebase, but I think I'd be better served doing entity extraction myself (DBPedia & YAGO tend to be out of date).
One of the main reasons the system was so complex is that it had literally been built from scratch in a weakly typed language that has no support for data structures or object-oriented programming, and had absolutely horrendous error handling (any Mathematica user can attest to this). I do not believe Mathematica was the appropriate tool for a project as large as Wolfram Alpha, and obviously the performance hit from an interpreted language like Mathematica is very significant when writing computation at the scale of a search engine (or "knowledge engine", as the engineers around me were quick to correct).
One great decision the Wolfram Alpha people made was to put together an excellent set of internal documentation on how to add new parsing capabilities to the language. So suppose you were tasked with adding queries about something like pregnancy data. You would just write a fairly straightforward module that would capture queries like "I am 6 months pregnant" and return a list of pods (a pod is the computed interpretation of your query; most Wolfram Alpha queries will return at least 5 of them). For pregnancy data, there is a pod that shows you how big the fetus should be, another for how much amniotic fluid there is, and so on. You would then write some Mathematica code to either scrape a website with pregnancy data or integrate with some data set that was curated by a data curator. This is not difficult to do, and I know of several WA-like projects that have accomplished this already. The problem is that data gets siloed, and data curators have a weak standard for how data and its relations should be expressed.
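As a rough illustration of that module pattern (my own sketch in Python, not actual Wolfram code or its real API):

    import re
    from dataclasses import dataclass

    @dataclass
    class Pod:
        title: str
        content: str

    def pregnancy_module(query):
        """Handle queries like 'I am 6 months pregnant' and return a list of pods."""
        m = re.search(r"(\d+)\s+months?\s+pregnant", query, re.IGNORECASE)
        if not m:
            return None  # this module doesn't recognize the query; another one might
        months = int(m.group(1))
        # In the real system each pod would be computed from a curated data set.
        return [
            Pod("Input interpretation", f"pregnancy, month {months}"),
            Pod("Fetal size", "looked up from curated data"),
            Pod("Amniotic fluid volume", "looked up from curated data"),
        ]

    print(pregnancy_module("I am 6 months pregnant"))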
This leads to difficulty when you are tasked with handling a complex query like "Which country has the greatest ratio of population to GDP?" Now you're talking about interoperability between two data sets, and although it can be done quite easily using Mathematica's CountryData function:
... it is nearly impossible to handle these kinds of situations for general queries that could ask about ratios of anything. For some time, a possible solution was to make "ratio of population to GDP" a column in the database table, but obviously this leads to a combinatorial explosion of columns if you are trying to answer general queries.
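The hard-coded case really is trivial; here's a toy version of the population-to-GDP ratio in Python with made-up countries and figures (not the CountryData snippet referred to above), which is exactly why precomputed columns look so tempting:

    # Two separately curated data sets (names and figures invented for illustration).
    population = {"Aland": 50_000_000, "Borduria": 8_000_000, "Cascadia": 120_000_000}
    gdp        = {"Aland": 2.0e12,     "Borduria": 0.1e12,    "Cascadia": 5.0e12}

    # Easy for this one hard-coded pair of properties...
    best = max(population.keys() & gdp.keys(), key=lambda c: population[c] / gdp[c])
    print(best)  # Borduria

    # ...but doing this for arbitrary ratios of arbitrary properties, phrased in
    # free-form English, is the part that doesn't generalize.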
By the time I had joined the Alpha team (after working for 2 years on Mathematica) they were already moving some of their most poorly designed data sets into a much better system that used a more rigid set of standards for describing things, places, concepts, relations, etc. I wish I could elaborate more on this because it was really very cool technology running in the background, but Wolfram Research has a real track record of suing people who violate their NDA (Matthew Cook). What I can say is that it fixed some absolutely ridiculous database design decisions - for example, in one table storing athlete performance, there were multiple rows for athletes who played multiple years where the name would be BabeRuth1942, BabeRuth1943, BabeRuth1944, and so on. It was then up to the developer to know that the name and year need to be separated, and that their code needs to handle athletes who play one year and multiple years separately.
tl;dr: Don't over-glorify Wolfram Alpha - it gets things done, but the poor performance and unpredictable results are caused by bad planning and poor organization within. If I were going to make my own knowledge engine, I would spend a long time drawing up an incredibly detailed schema of how every single thing would be represented and how a developer would write a new module for it before writing a single line of code. These are the lessons gleaned from spending two and a half years wallowing in a Big Ball of Mud (http://laputan.org/mud/).
So it used a traditional database for storage? I've gone down the triplestore route, with some trepidation. Working out okish so far, although I wish there were better resources around on SPARQL.
>In reality, Wolfram Alpha queries are more often for the sake of fun than for the sake of discovery
I would expect the significance of queries on any free engine to adhere to a steep Pareto distribution. For what it's worth I, as an algebraic thinker, find Wolfram Alpha's symbolic manipulation and easy syntax refreshingly useful.
The symptom-to-ailment mapping is actually a quite complex problem that neither Google nor W|A has thoroughly solved, but Google provides results in a much easier-to-read format.
What language does WA use? Is the natural language interpretation run completely by Mathematica? Also, would you recommend someone interested in making a natural language project look into using Mathematica for that?
A large part of the natural language processing is indeed done by Mathematica, and last time I looked I believe there were about 15 million lines of Mathematica code in the main repository. Note that this massive number is largely the result of the multitude of Mathematica scripts used to insert raw data and relations into the database. Just based on glancing at folder sizes, I'd estimate that around 40% of the repository is code for parsing, so around 6 million lines of Mathematica code.
Note that lines of Mathematica code tend to do a lot of processing, so this would be the equivalent of many times more lines in another language. It is quite an interesting process hooking a new feature into Wolfram Alpha, and some developers described it as the "mud bowl": when you broke things, you just had to throw more mud at it.
I'm not allowed to disclose details about the technology stack, but I can say that the database querying functionality was kept separate from the actual parsing and semantic analysis, and was implemented in a different language.
If you're interested in NLP, which by the way is a wonderful and exciting field with mysteries abounding, Mathematica is indeed a great way to get started quickly. Although I recommend that everyone do their work in an open-source-able way with a popular language like Python or Java, I built a Swahili translator during my freshman year of college with Mathematica. Here it is on GitHub:
> I'm not allowed to disclose details about the technology stack, but I can say that the database querying functionality was kept separate from the actual parsing and semantic analysis, and was implemented in a different language.
I'm definitely curious about this. Are you allowed to disclose which language?
" In reality, Wolfram Alpha queries are more often for the sake of fun than for the sake of discovery"
For me, this is because WA starts asking me for money when I try to use it for useful things. I suspect an open-source version will have different numbers.
When I use Wolfram Alpha, I am always terribly frustrated by the natural language input. This "natural language" only works when the exact query you put in was anticipated by an engineer at Wolfram. I'd rather have autocomplete where I can type "least common multiple" and it shows me that the function is lcm(n,k).
> but I think the way this will be accomplished is more in a Google-esque fashion, where their knowledge engine results are displayed alongside a real search algorithm.
http://www.goofram.com/ does exactly that, albeit somewhat crudely.
You can get some unexpectedly interesting stuff on the Wolfram Alpha side of things, though usually it's irrelevant unless you're "thinking" in Wolfram query mode.
Thank you for sharing. It's interesting (and perhaps obvious in hindsight) that Wolfram has great math in the background, but struggles with the freeform interface.
I'm not entirely sure how this works as an alternative to Wolfram Alpha. Much of the value I see in Wolfram Alpha comes from its highly curated data set and its ability to parse natural language into a useful mathematical representation. It brings curated constants and mathematical equations, graphs, and simulations to the general public. On the other hand, this really seems like the SymPy interpreter in a browser.
I think this would do much better without the comparison to Wolfram Alpha.
Many, many, many years ago, I wanted to use Mathematica (just the algebra system, plotting, etc.), but I didn't like the language.
I wrote a Python bridge, which was actually pretty cool. It's probably the neatest, cleanest, most CS-y code I've written (it converted Python objects to Mathematica objects over MathLink), and it integrated with Numeric Python.
I typed "12 c in f", a very simple example of the sort of thing I use wolfram alpha (usually by way of duck duck go) for most often. It choked. I bailed.
I'm a Linux user. I have bc and units installed. I even have some shell script wrappers to make those utilities actually helpful for casual use. I can open a terminal and calculate expressions and convert units...so long as I ask nicely. The big win for W|A is that it doesn't require me to ask nicely. This is helpful for quick 'n' dirty queries as well as for queries where the work required isn't in doing the calculation so much as reducing the query into a simple expression in the first place.
In other words, SymPy Gamma solves a problem that by and large doesn't exist.
I wonder if any of the stuff developed for Gamma will trickle down (/up?) to IPython. Frankly, it is a bit surprising that they seem to have developed a new web-interactive system for (augmented) Python instead of leveraging the IPython framework.
Wolfram|Alpha does take boolean expressions, but in the notation that Mathematica likes (&&, ||, !, etc.). You can also write them out in plain English (NOT A OR B, etc.).
I used it to simplify some K-maps for class the other day and it's very nice: it prints out a truth table and various types of minimal forms.
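Since the thread is about SymPy as the open alternative, it's worth noting that SymPy's logic module can do the same sort of minimization locally. A quick sketch of my own (minterms chosen arbitrarily):

    from sympy.abc import A, B, C
    from sympy.logic import SOPform, simplify_logic

    # Simplify an expression directly...
    print(simplify_logic((A & ~B) | (A & B) | (A & C)))  # A

    # ...or get a minimal sum-of-products from a K-map-style list of minterms.
    print(SOPform([A, B, C], minterms=[[0, 0, 1], [0, 1, 1], [1, 1, 1]]))  # e.g. (B & C) | (C & ~A)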
"Conventional" boolean operators work as well. Again, I suspect that it depends on whether a developer has mapped something ToExpression can consume, either via forms, boxes, or unlocking the function itself. (This would be consistent with the "throw more mud at it" comment.)
As a Mathematica and W|A Pro user, I find most of the utility in not having to import random datasets myself. Like every researcher, I have a disgusting library of scripts that often involve curl, groovy, awk, sed, etc., to pull info into Mathematica. It's nice when that becomes SEP[1].
On Chrome 33.0.1750.46, the up-down caret next to the topic headings doesn't respond to a click, even though the cursor indicates that it should. Clicking on the topic heading itself or even around the caret works, but the caret seems the most natural target to me.