Ask HN: Review my project, Factolex - the fact lexicon

matt1 · on Jan 27, 2009

Some thoughts:

Maybe write a script that goes through the web, starting with Wikipedia, and intelligently extracts facts from sentences. From there, pass it along to Mechanical Turk folks to gauge whether or not it makes sense. You could cheaply populate your database with all sorts of relevant information this way.

Maybe focus on a particular field at first, such as the tech industry or the stock market or... whatever. Once you've got a grip on that, start expanding to others. I'm not suggesting you stop gathering other types of facts, but I think right now it's better to focus than to spread yourself too thin. I think your visitors would rather have your site tell them a lot about one topic than a little about a lot.

It's a small detail, but capitalizing the first letter of the descriptions would make it look sharper. Also, on my browser, IE7, the "More information" link drops down below the "Welcome" link. I think you could do better with a different, unique color scheme which people would associate with Factolex. (At first I thought HN was ugly, but now I wouldn't have it any other way.)

Overall, very well done. I could see a lot of search engine traffic getting directed to Factolex someday down the road.

akirk · on Jan 27, 2009

Factolex.com is a fact lexicon. We split up knowledge into small sentences: facts.

You can use the checkboxes to remember the facts that you find relevant and by doing so, we create your personal lexicon (which can be exported using the API or a widget). Also, the more people select a fact as relevant to them, the higher it is ranked within the term.
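The ranking rule described above (the more people select a fact, the higher it ranks within its term) could be sketched roughly as follows. This is only an illustration, not Factolex's actual code: the function name `rank_facts`, the `(user_id, fact_id)` pair format, and the tie-breaking by fact id are all assumptions.

```python
from collections import Counter


def rank_facts(selections):
    """Rank the facts of one term by how many distinct users selected them.

    `selections` is an iterable of (user_id, fact_id) pairs. This data
    shape is hypothetical and chosen only for the example.
    """
    # Count each user at most once per fact, so repeated clicks do not
    # inflate a fact's rank.
    unique_pairs = set(selections)
    counts = Counter(fact_id for _user_id, fact_id in unique_pairs)
    # Most-selected facts first; ties are broken by fact id so the
    # ordering is deterministic.
    return sorted(counts, key=lambda fact_id: (-counts[fact_id], fact_id))


example = [
    ("u1", "a"), ("u2", "a"), ("u1", "a"),  # u1 selected "a" twice
    ("u1", "b"),
    ("u2", "c"), ("u3", "c"), ("u4", "c"),
]
ranking = rank_facts(example)
```

Counting distinct users rather than raw clicks is one simple design choice here; a real system would likely need more than that to resist deliberate gaming.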

I'd love to get some feedback from the HN community. Thanks for checking it out.

glazz · on Jan 27, 2009

very interesting. Could you tell me how it's made?

akirk · on Jan 27, 2009

what do you mean? like what it is built with (i.e. programming language, server etc)?

glazz · on Jan 27, 2009

like the idea of the project. how were the definitions obtained?

akirk · on Jan 27, 2009

The idea of the project is a new approach to collaborating on knowledge: split the knowledge into smaller parts, which are then easier to handle.

Most of the definitions come from Wikipedia. We started off trying to have people enter facts manually, but then I figured that we should leverage all the knowledge of Wikipedia that is freely available.

So I wrote a bot that is not particularly clever, but most of the time it is good enough for fetching facts from Wikipedia.

By the way, it can also extract facts from the German Wikipedia (there is a German version of Factolex as well: http://de.factolex.com/ as my mother tongue is actually German).
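How the bot actually works is not described in this thread. As a purely hypothetical sketch of a "not particularly clever" approach, one could split an article's introductory paragraph into sentences and keep each reasonably long sentence as a candidate fact. The function name `extract_facts`, the length threshold, and the regex-based splitting are assumptions made for illustration.

```python
import re


def extract_facts(paragraph, min_length=20, max_facts=3):
    """Split a paragraph into candidate one-sentence facts.

    Naive heuristic, assumed for this example: a sentence ends at '.',
    '!' or '?' followed by whitespace and an uppercase letter. This will
    mis-split abbreviations such as "Dr. Smith".
    """
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph.strip())
    facts = []
    for sentence in sentences:
        sentence = sentence.strip()
        # Skip fragments too short to carry a standalone fact.
        if len(sentence) >= min_length:
            facts.append(sentence)
    return facts[:max_facts]


intro = (
    "Gdańsk is a city on the Baltic coast of northern Poland. "
    "It is the capital of the Pomeranian Voivodeship. Short."
)
facts = extract_facts(intro)
```

Candidates produced this way would still need review, since sentences starting with pronouns such as "It" lose their meaning once separated from the term.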

avinashv · on Jan 27, 2009

Very clever--it's a nice way to integrate fact-searching across many different websites. The interface is effective and doesn't get in the way. I like the branding you've created for yourself as well. The avatar of the Factobot incorporates the logo and that awesome chimpy-looking bot really well.

I like how you let me add to my personal lexicon without having to register, but will that be saved, and be carried over to any account I create if and when I do?

A few nags:

On the tab bar, if "Home" has a bar on the very left, shouldn't FAQ have one on the very right?

On the "ongoing votings" page, it's not immediately clear what I am supposed to do, and the yellow hint box towards the right says the same thing it does when I search for something--hit a checkbox. There aren't any checkboxes. This could be confusing for some.

Great job overall, I see myself using this.

akirk · on Jan 27, 2009

Thanks for the kind words.

Your personal lexicon survives your registration (of course :) For the navigation bar: we tried it, but settled for leaving the bar out (as the menu actually continues on the far right). A matter of taste, I suppose. The ongoing votings section is still sort of under construction, thanks for bringing that up. There is quite some room for improvement there ;)

zain · on Jan 27, 2009

What's the difference between this and searching for "define: word" on google?

http://www.google.com/search?hl=en&safe=off&q=define...

gsmaverick · on Jan 27, 2009

Personalization!

Jakob · on Jan 27, 2009

Excellent, should be part of Google! (e.g. "fact:Hackernews")

On index pages no inline links are visible. e.g. "Illusion a Polish band founded in 1992 in Gdańsk, Poland and defunct since 1999"

You can only click "Gdańsk" if you’re on the single fact page.

akirk · on Jan 27, 2009

As a matter of fact, there is a Greasemonkey extension for Firefox that achieves integration into Google search: http://userscripts.org/scripts/show/32352

kbrower · on Jan 27, 2009

Does your grabber get things from urban dictionary?

akirk · on Jan 27, 2009

Not yet, but I am not actually sure what to use from there. There are just so many definitions that have weird up vs. down ratios that I wouldn't dare to let a program decide what is legitimate or not.

rw · on Jan 27, 2009

Suggestion: include the Cyc database.

akirk · on Jan 27, 2009

Thanks for the idea. I am not sure if I can extract useful facts from it, but I can definitely see this helping for "see also" purposes.

Integrating with the semantic web is a natural further step for Factolex.

gojomo · on Jan 27, 2009

I've looked at the OpenCyc dataset for a similar purpose. It does not easily yield text that is reader-friendly.

jgilliam · on Jan 27, 2009

one simple idea: put a big number in the header or on the homepage with how many terms you have.

gojomo · on Jan 29, 2009

This is somewhat like an idea I've been considering. So I think it's a great concept... and have strong opinions on possible directions.

If your primary model is a ranked listing of 'facts' by a major 'term' key, it will be hard to outperform Wikipedia (or even Mahalo). When those sites' single-topic articles/pages are well-written, they already lead with the core facts, and then proceed through the rest in a well-organized fashion. Even the Google 'snippets' in natural search hits then turn out to be pretty strong for answering people's questions/queries. So I think to differentiate you need to break out of that linear model somehow.

The multiple licenses situation is confusing and may prevent you getting proper credit for openness.

You may hope voting and reputation will be mechanisms for gradual quality improvements, but they often backfire. Any benefit from people seeking to 'climb' for the right reasons can be offset by gaming.

It's unavoidable that you'd need some bulk bootstrapping (like automated scraping of Wikipedia, or hired contributors), but that may then undermine the organic growth of 'community spirit' and norms that will be essential for the long term. The right balance will be hard to find.

Good luck!