Tableau acquires ClearGraph, a data analysis startup using natural language (techcrunch.com)
112 points by dgudkov on Aug 10, 2017 | hide | past | favorite | 31 comments



I've been working on a (soon to be) open-source version of this; I didn't realize there was a real business version out there. So far it works great alongside Salesforce, e.g. "Find me appointments in San Mateo today set by Jason Jones".


Nice, do you have any details already? Library/language? Can't wait to see some actual open-source efforts in this space.


It's based on seq2seq translation models, where instead of translating from one human language to another, it translates a human language to a command language: "Find appointments in San Mateo" -> "find($type=appointments, $location=San Mateo)". The code that does that parsing (PyTorch based) is at https://github.com/spro/RARNN
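
For anyone curious what that looks like in code, here's a minimal sketch of the encoder/decoder idea in PyTorch (mine, not the RARNN code itself); the command-language tokens below are just illustrative:

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, vocab_size, hidden_size):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

        def forward(self, src_ids):              # src_ids: (batch, src_len)
            _, hidden = self.gru(self.embed(src_ids))
            return hidden                        # summary of the input sentence

    class Decoder(nn.Module):
        def __init__(self, vocab_size, hidden_size):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, vocab_size)

        def forward(self, tgt_ids, hidden):      # teacher forcing during training
            output, hidden = self.gru(self.embed(tgt_ids), hidden)
            return self.out(output), hidden      # logits over command-language tokens

    # Source tokens:  find appointments in san mateo
    # Target tokens:  find ( $type = appointments , $location = san_mateo )
    # Train with cross-entropy on the decoder logits; at inference, feed the
    # decoder its own predictions until it emits an end-of-sequence token.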

The missing piece is generalizing the training process to fit other people's schemas; generating training data based on models, attributes, and relationships.
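
To make that concrete, a (completely hypothetical) version of the generation step might expand templates against a schema description; the schema and templates below are made up for illustration:

    import itertools

    # Hypothetical schema: model -> attribute -> example values
    schema = {
        "appointments": {
            "location": ["San Mateo", "Oakland"],
            "set_by": ["Jason Jones", "Ana Ruiz"],
        },
    }

    # (natural-language template, command-language template) pairs
    templates = [
        ("find {model} in {location}",
         "find($type={model}, $location={location})"),
        ("find {model} in {location} set by {set_by}",
         "find($type={model}, $location={location}, $set_by={set_by})"),
    ]

    pairs = []
    for model, attrs in schema.items():
        for nl, cmd in templates:
            keys = [k for k in attrs if "{" + k + "}" in nl]
            for combo in itertools.product(*(attrs[k] for k in keys)):
                values = dict(zip(keys, combo), model=model)
                pairs.append((nl.format(**values), cmd.format(**values)))

    # e.g. ("find appointments in San Mateo",
    #       "find($type=appointments, $location=San Mateo)")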


This is very interesting! How far does the 'missing generalization' impact go? I.e. your Nalgene grammar file (nice reference, didn't know about Nalgene) is used to generate both flat input strings and nested desired outputs, which you then train your network on (if I understand correctly). This file seems to contain quite a lot of hand-written varieties like "please/plz/plox/...". After training, I assume the network can also handle inputs not seen in the training data? Like someone writing "please?!11"? If not, I don't really understand why you'd train a network in the first place: you've put in the effort to create a grammar, so you might as well use that to do the actual conversion to a tree, no?

Totally not trying to be negative here, just trying to understand your workflow a bit better.


The words are turned into GloVe word vectors on the input side, so it is able to handle a decent amount of variation in spelling and synonym use. Having synonyms defined in the Nalgene file helps the network accept vectors in a general region rather than an exact point in space. It also encourages the network to learn the grammar of the input rather than specific words, so it can handle words it hasn't seen before (good for names and places).
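
If it helps, the vector side is easy to poke at directly; a small sketch assuming you've downloaded a GloVe text file (e.g. glove.6B.100d.txt from the Stanford page):

    import numpy as np

    def load_glove(path):
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    glove = load_glove("glove.6B.100d.txt")
    # Variants/synonyms land near each other, so the network sees similar
    # inputs for "please", "kindly", etc., while unrelated words score lower.
    print(cosine(glove["please"], glove["kindly"]))
    print(cosine(glove["please"], glove["appointments"]))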


Gotcha! Thanks for the explanation!


Saw a pretty sweet demo of Tableau's home-grown prototype at their annual conference last November. It was surprisingly useful to be able to just speak "show me all of the 3-bedroom homes in the downtown Seattle area less than $400,000".

It was slow, but effective. I kept feeling myself wanting to click around for the first few minutes but quickly realized I didn't need to.

I did have to speak in a way that the NLP engine could understand (i.e. "four-hundred thousand dollars" instead of "four-hundred k"), so it still feels like I'm building a SQL query with my voice instead of just speaking an idea and having the software figure out what I mean (hard problem to solve, I know!)


> "show me all of the 3-bedroom homes in the downtown Seattle area less than $400,000"

Ha, that's an easy one. Empty result set!


Is there an open source NLP engine out there? I've been trying to learn this area and there are so many potholes and wrong paths ... I've looked at OWL/SPARQL, graph DBs, logic programming, rule-based systems. I feel like I'm dancing around the real topic and I don't know what "it" is :'(


I was playing around with this for a weekend or two, so my knowledge of the matter is not exhaustive, but it all boiled down to having a good OLAP-ish data source in the first place.

- You can do named entity tagging based on the categorical data: text/string columns with low-ish relative cardinality make good candidates, which also filters out free-text fields such as email addresses (which shouldn't be treated as categoricals in a DWH in the first place).

- Floats/decimals/integers would be good candidates for the values somebody looks for (and the name of the column would be the 'trigger' of the query).

All in all, with a bit of logic, good OLAP design, and a lot of up-front configuration, I got it to answer basic questions like 'revenue in the US in 2016' in a weekend's time using NLTK back in the day. Today I would probably give spaCy a try as the NLP engine.
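
Roughly, the tagging step looks like this with spaCy (sketch only; the measure/dimension names are made up and you'd drive them from your actual schema):

    import spacy

    nlp = spacy.load("en_core_web_sm")   # python -m spacy download en_core_web_sm
    doc = nlp("revenue in the US in 2016")

    measures = {"revenue", "sales", "profit"}   # numeric fact-table columns
    measure, filters = None, {}

    for token in doc:
        if token.lemma_.lower() in measures:
            measure = token.lemma_.lower()

    for ent in doc.ents:
        if ent.label_ == "GPE":      # geopolitical entity -> country dimension
            filters["country"] = ent.text
        elif ent.label_ == "DATE":   # date -> year dimension
            filters["year"] = ent.text

    print(measure, filters)          # e.g. revenue {'country': 'US', 'year': '2016'}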


NLTK in Python - it even has a basic example showing natural language to SQL translation that is pretty cool. Simplistic, but a good starting point for learning!

http://www.nltk.org/book/ch10.html
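
From memory, the core of that chapter's example is roughly this (it ships a small feature grammar that maps a question straight to SQL, so check the page for the exact, current form):

    from nltk import load_parser          # needs the nltk_data book grammars

    cp = load_parser('grammars/book_grammars/sql0.fcfg')
    query = 'What cities are located in China'
    trees = list(cp.parse(query.split()))

    answer = trees[0].label()['SEM']      # SQL fragments attached to the parse
    q = ' '.join(s for s in answer if s)
    print(q)   # SELECT City FROM city_table WHERE Country="china"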


.NET has some speech synthesis and listener libraries. They work pretty well; I built a modestly-functional chat bot once with them. Not sure about the overall .NET licensing arrangement, but I heard it was moving towards open source.

They feel abandoned, though; there hasn't been much recent activity around them. Microsoft probably has all its speech engineers working on Cortana instead. (Though I'd be surprised if she's not using .NET at some level.)


> Microsoft probably has all speech engineers working on Cortana instead

Microsoft Cognitive Services


The closest I've gotten to something useful is NLTK. It's really great and really powerful, and there's plenty of documentation and how-to guides.


If you are playing in the SPARQL space take a look at http://quepy.machinalis.com/

But NLP-based question answering is an unsolved problem, and the best way to approach it is with ensemble methods.


There's a lot of movement in this space at the moment. I'm aware of the following players:

- Veezoo www.veezoo.com

- Wizdee www.wizdee.com

- Kueri www.kueri.me


Pokémon or data company?


This has been in production with Power BI + Cortana for about a year now.


Is it any good?


My experience with this is that it's more of a gimmick. It's cool when it works, but most of the time I've found it simpler and more accurate to just select the data I want.


Mostly agree with Baconner.

It does help with discovery if configured well. 'Configured well' is a fairly high cost, so only worthwhile for a fairly simple data model that will have a large number of consumers.

Typically, there's not much of a population of "people who are unfamiliar with the dataset, but need to ask questions of it".


I do agree. It can work, and when you've put the time into designing it to work it can feel fairly magical, but there's a point at which you feel like you're almost pre-creating all the queries for users.


This is a great combination IMO. I spent some time researching various BI / data-analytics services and was impressed with some of the newer ones like Tableau. This seems like essential tech for any medium-to-large company.

Dashboards and visualizations that can be easily composed with a natural language interface... it makes a lot of sense. Especially when combined with alerting services and/or chatbot-esque interfaces for automating workflows.


Hmm - I hadn't thought about the dashboard / data viz creation side of this. Not sure that I'd want to be sitting in my cube creating dashboards "out loud".

"Okay, let's bring in 'Sales' to the rows card.

"Nah, I don't like that. Move 'Sales' to the columns card.

"Hmm, that doesn't work either. Put 'Sales' back to the rows card but add 'Profitability Indicator" to the details section.

"Crap. Still not working. Let's start over."

Imagine 10 analysts in a room all talking like this :)


Well, anything you can say you can also type out.


This. I've spent the last couple of years working on a similar data-discovery-style product, and after a lot of playing around with concepts I think semi-natural-language descriptions, both typed and generated from your manual data selection, can be really useful.

If I'm speaking, I have to finish the whole thought and deal with excluding all my "uuhms" and half-thoughts. If I'm typing, I can get IntelliSense-style prompts for relevant things: Correlate Sales with _[Discounts, ...]. I think terse natural-language descriptions of data views are really useful, voice aside.

Incidentally, there's nothing like playing around with this stuff to make you super self-conscious about, uuh, how you speak.


True enough. I had the speak-out-loud, natural-language demo in mind, I guess! :)


Curious to see how the industry will play out.

Seems like more analytics and BI companies will have to incorporate NLP if they want to compete.

Currently working on a similar project

- www.askned.com


You mean another player like the company you're VP of marketing at?


Absolutely. It's a unique opportunity in the space; it feels like everyone is racing to market. The next 6 months will be interesting to watch.


Nice edit to your comment.

Originally, claudfuen wrote something like:

"Wonder if we'll see more acquisitions this space, like askned.com."



