My Django hosting service. I looked at Heroku and thought, "I want that for Python." Yesterday, I got most of the HTTP request path finished. There is some Node.js work left, and I still need to do some more work on Varnish. I'm hoping to get to the PostgreSQL stuff next week, then on to the website and API...
I'm working on a web-based pixel perfect mockup tool called jMockups, which is built on top of HTML5's canvas element. Most web designers use Photoshop to do this right now, but Photoshop makes it much harder than it should be (the UX, lack of common HTML elements, difficult to share, etc).
An early alpha version will be available in a week or two. If you're interested in helping test it, shoot me an email: matthew.h.mazur@gmail.com or leave a comment below.
My ongoing quest to make money predicting horse races. So far I've not made a dime, and don't really expect to, but the faint hope of future monetary rewards keeps me going.
The real payoff has been that in the process I've learned all sorts of things: about Python, data mining, working with large datasets, machine learning...
I didn't implement the machine learning algorithms for myself, because there are some really good packages out there and I know I don't have the smarts to better them.
Keep in mind that I didn't really have any success:
There seem to be two main ML packages, Weka and Orange. I personally preferred Orange: it has a nice graph-based UI for linking various components together, and once you've figured that out you can script it in Python. Orange also makes it easy to test your data set against various learning systems and compare their performance. Standard testing procedures like n-fold cross-validation are built in and really simple to use.
Also you need data. I'm pretty sure more is always better. I actually started with greyhounds* and skimmed mine (in Python use BeautifulSoup) from a website. I tried to come up with various statistics about the recent performance of the dogs. Unfortunately nothing I tried made the ML algorithms predict better than a random choice. A friend who's into gambling suggested greyhound racing was quite random by nature, so I've switched to horses recently. I'm still building that dataset, now trying out MongoDB just for fun.
I think the trouble is that you can have as much raw data as you like, but generating the predictive statistics requires a lot of knowledge of the problem domain. I'm not actually into gambling at all so I don't know if the track conditions are important, how much breeding or the age of the animal really matters etc... This made it hard to pick likely stats (and rebuilding datasets and retraining learners can take some time).
For horses there's a lot more information in forums and racing guides etc, so I'd start with horses. Just make sure you've tested your predictions with pretend bets before you commit any real money :)
Good luck!
*I began with greyhounds because of a dissertation posted on reddit where the authors suggested they'd had some success with a neural network and gave quite a lot of detail. That piqued my curiosity, and my initial version just re-implemented their work.
Yeah I hated using weka at uni. I'll look into Orange.
"I don't know if the track conditions are important, how much breeding or the age of the animal really matters etc."
Yeah, feature selection is a tough one. I'd thought that the system would pick up on good indicators by itself, but it might well be that that has to be a manual decision.
"Just make sure you've tested your predictions with pretend bets before you commit any real money"
haha, yeah absolutely. My plan was to train/test until the accuracy seemed good enough (using monte carlo) and then run the system on live data with pretend money for a few months to see what the actual performance is like, before actually investing real cash.
Do you have a link to the greyhound topic? I searched on google but couldn't find it.
I don't have a problem so much with having myriad statistics and picking the right ones, but not knowing which stats to generate in the first place from my database of results.
For example, I assume that a dog's past performance must be some indicator of its chances in the next race, but how do I account for the chances of dogs that didn't complete their last race? How much weight should the last race carry compared to the ones before it (perhaps the dog had one bad race, but on the whole is running well)?
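One way to make the weighting question concrete is a recency-weighted form score. This is only a sketch, not something from the thread: the function name `form_score`, the decay factor of 0.7, the six-runner field, and the choice to treat a non-finish as a last-place finish are all assumptions made for illustration.

```python
def form_score(finishes, field_size=6, decay=0.7, dnf_position=None):
    """Recency-weighted form score in [0, 1]; higher is better.

    finishes: finishing positions, most recent race first.
              None means the dog did not finish that race.
    decay:    each older race counts `decay` times as much as the next newer one.
    """
    if dnf_position is None:
        dnf_position = field_size  # assumption: a DNF counts like finishing last
    total = 0.0
    weight_sum = 0.0
    for races_ago, position in enumerate(finishes):
        weight = decay ** races_ago
        if position is None:
            position = dnf_position
        # Map position 1..field_size onto a score of 1.0..0.0.
        total += weight * (field_size - position) / (field_size - 1)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

With these assumptions a dog whose only bad race was its latest one scores lower than a dog whose bad race was three starts ago, and `decay` and `dnf_position` become two more parameters that could be tuned rather than guessed.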
I just don't know how to optimise for those sorts of things. I have a rough idea that some combination of genetic programming and GA could help - it would be an interesting challenge to build software that knew how to apply a selection of mathematical functions to my data, and then breed the results like a GA. But it's tricky, I'd have thought.
I've been treating the ML classifiers and learners as something of a black box, perhaps a more rigorous approach is required.
"software that knew how to apply a selection of mathematical functions to my data, and then breed the results like a GA"
yeah, I'd envisaged using the accuracy of the neural net as the fitness function for a GA that mutates input parameters. It's another layer of complexity, and I've no clue how you'd start, but it seems like it would work.
In other words - use a GA to select features, using how well the NN trained on that set of features performs as the fitness function.
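A minimal sketch of that idea, under loud assumptions: each individual is a bitmask over the features, and `fitness` is whatever function scores a mask. In the real setup it would train the neural net on the selected features and return its validation accuracy; here `toy_fitness` and the `USEFUL` set are hypothetical stand-ins so the example runs on its own.

```python
import random

def ga_select_features(n_features, fitness, pop_size=20, generations=30,
                       mutation_rate=0.05, rng=None):
    """Evolve a True/False mask over features; returns the best mask found."""
    rng = rng or random.Random()
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [ranked[0][:]]             # elitism: best mask survives unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)            # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Flip each bit with probability mutation_rate.
            children.append([(not bit) if rng.random() < mutation_rate else bit
                             for bit in child])
        population = children
    return max(population, key=fitness)

# Hypothetical stand-in for "train the NN on these features and score it".
USEFUL = {0, 3, 5}

def toy_fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    # Reward useful features, penalise noise features.
    return len(chosen & USEFUL) - 0.5 * len(chosen - USEFUL)

best_mask = ga_select_features(8, toy_fitness, rng=random.Random(1))
```

Because the fitness function is called many times per generation, a real version would probably want to cache scores per mask, since each call means retraining the learner.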
Why did re-implementing the dissertation for greyhounds not work? Was the dissertation flawed?
If you're putting this much effort into it, why not stop by a horse racing track a few times and pick up some domain knowledge? Maybe you could even talk to race horse owners, jockeys, breeders?
I wonder if you could turn this into a product for breeders? Or maybe for people buying/selling race horses? Or people hiring Jockeys, or even marketing an offshoot of this to the gamblers? Just some wild thoughts.
I've heard that the easiest way of predicting greyhound racing is to ignore the form book and monitor the odds changes following bets being placed at the very last minute by those with insider information...
Here's a random greyhound racing tip which a man in a bar told me, so it must be true: before the race, leave it as long as possible to bet and watch the dogs. The one that is quivering and dancing and looks most wound up generally wins. Wait until you get a race where only one dog looks that way.
You're welcome. One of the things I'm looking at now is adapting ranking systems from other sports or competitions. For example I know that the Elo system from chess has been applied to other sports (I don't know the details, though, or what success they had)
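For what it's worth, one common way to stretch Elo from two players to a whole field is to treat a race as a set of pairwise results: every runner "beats" everyone who finished behind it. The sketch below assumes that approach; the K-factor of 16 and the 1500 starting rating are arbitrary choices, and the function names are made up for illustration.

```python
from itertools import combinations

def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_after_race(ratings, finish_order, k=16.0, default=1500.0):
    """Return a new ratings dict after one race.

    finish_order: runner names, winner first. Unrated runners start at `default`.
    """
    deltas = {name: 0.0 for name in finish_order}
    # combinations() keeps input order, so `ahead` always finished before `behind`.
    for ahead, behind in combinations(finish_order, 2):
        r_ahead = ratings.get(ahead, default)
        r_behind = ratings.get(behind, default)
        gain = k * (1.0 - expected_score(r_ahead, r_behind))
        deltas[ahead] += gain
        deltas[behind] -= gain
    updated = dict(ratings)
    for name, delta in deltas.items():
        updated[name] = ratings.get(name, default) + delta
    return updated
```

All deltas are computed from the pre-race ratings and applied at the end, so the order in which pairs are visited does not change the result, and the total rating in the pool stays constant.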
I met a guy (I can give more details offline) who worked for an international gambling syndicate based in HK. Software written by teams in Shenzhen running on servers in HK fed by data from handicappers in Australia directed bettors to place bets at tracks all over the world. They were making something like 15-20% returns. As you can imagine, he had some crazy stories. At the time I was thinking about something similar (not horses) but eventually moved on to more respectable things (grad school.)
Journalism, with a focus on early-stage technology companies. Entrepreneurs are doing amazing things with small teams and relatively little capital, but the stories and lessons seem mostly lost in the stream of tech news about iPhones, the oldest Twitter user, or Facebook's privacy policies. I think there's a small but significant market for passionate, well-researched, educational content like this. I'll find out soon enough.
A simple version 1 comes out Sunday. I'll post it to HN then. The focus is on companies no older than two years, and generally on those with consumer-oriented products. But beyond that, they could be funded or bootstrapped, located in Silicon Valley or Pittsburgh or anywhere else, and led by startup veterans or total newcomers. Amazing stuff is happening everywhere; you just need to look for it.
I did something similar to that once -- a program that compiled GUI descriptions into a runnable Python class, which could then be subclassed to provide functionality. The syntax looked like this:
Laying out GUI components was done with the rowLayout, colLayout and table containers. The subclass would contain code to be executed for @New, @Open, @Save, etc.
There's some similar stuff for wx in Perl and Python, too - I've just been taking the time to (try to) be systematic about making things fast and easy to specify. I really, really get tired of coding all that stuff by hand every damn time - it's one of the major things that made me get out of GUI work in the first place, back in the Stone Age.
I've got big plans, and since it's not expected to pay any bills for a while, I can afford to think things through sufficiently. I just hope I won't drop it entirely.
> I really, really get tired of coding all that stuff by hand every damn time - it's one of the major things that made me get out of GUI work in the first place
Absolutely. Having sensible and configurable defaults for everything is the way to go.
Well, actually it's kind of like an unholy Python/Tcl mutant thing, since the one-word tag determines the parsing of its line and children - but yes, significant indentation using a Perl filter.
A programming language where syntax and semantics are manipulable at run-time as well as compile-time and where you can define grammars in-line and use them immediately. I also intend to integrate the concepts of pattern calculus - http://lambda-the-ultimate.org/node/3695 - to permit extreme levels of flexibility in the language.
It's a huge project and I'm right at the start of it. But no matter how hard I try I can't get away from the desire to work on language design and compiler development. It's just my thing, and the wonderful thing about hacking is you can just do it :-)
I just found namespaces in Racket too painful (see [this blog post][1]), I spent hours and hours trying to do something I felt ought to be very simple, i.e. sharing a namespace between different files, yet Racket just utterly refused to do it. I RTFM but found it utterly confusing and nothing I tried, including their examples, worked. I asked on the freenode IRC channel, and even then nobody could help me.
After a while I gave up, maybe I am simply not a good enough coder to understand how Racket namespaces work, but either way I worried that if this one aspect of the language is extremely difficult, what else am I going to uncover in the course of the project? On that basis I decided it'd be wise to switch.
Initially I was going to switch to C for portability as I am also working on a parser generator, singular[2], which I thought could be useful to people even before I write it in Terse (I intend to self-host and bring singular into that too), however I worried that the many pitfalls that C brings to the party, e.g. the ease of segfaulting, null pointers, etc. and its lack of abstractions would overly slow me down, so I thought Go would be a better option, especially as it seemed tastefully designed.
My experience of Go so far is one of great admiration and enjoyment, it really is a lovely language, nicely low-level and low-key yet still providing many useful abstractions including proper interfaces, i.e. by implementing the methods of an interface you can treat it as that interface without having to explicitly inherit from it.
To be honest my decision to switch to Go is probably not that defensible as not many people are using it so the initial reason (portability) for switching to a lower-level language is less of an excuse now, so if I'm being honest I have to admit that I wanted a fast language that played nice with Linux (not that Racket wasn't either of these), and I wanted to play with Go, which kinda overrode other considerations.
Most recently I've been very interested in implementing [pattern calculus][3] in the language somehow, as it provides enormous flexibility and offers a formal underpinning to a more fundamental means of expressing abstractions than oo, functional, etc. - in fact my ambition is to have an abstraction which can encompass these paradigms in itself if you want, i.e. you can implement oo or functional or whatever you want. Obviously I am very inspired by lisp in this and many other regards.
The main thing is getting stuff done, this idea has been floating around in my mind for at least a couple years and I've changed my mind about things many times (and will carry on of course when necessary) causing me to throw away work more than once, so obviously I am somewhat focused on actually writing code and getting closer to actually having something rather than just the idea.
Luckily I am pretty damn certain about the core ideas in the language (flexible syntax, the use of pattern calculus, etc.) so that looks to be quite likely.
Anyway, it's really early days, but I am utterly committed to getting this done as I want the language for myself, want it to not be a toy language, and want it to actually do these things I think would be awesome, even if (as is most likely as with any personal language project) no one else uses it :-)
I know I'm digressing from your question, but have to say that I really think one of the most wonderful aspects of programming is the ability to just hack on stuff, no matter how crazy, with just a cheap computer, some coffee and a willingness to put in the time. So glad I was born in a time where that was possible.
I'm working on text classification. I have a decent classifier that's especially suited to author identification. I can think of a few good uses for it; the first one I'm trying to commercialize is academic anti-cheating.
You are probably aware of it already, but a lot of universities (mine included) use MOSS (Measure Of Software Similarity) to detect plagiarism in CS classes. Link: http://theory.stanford.edu/~aiken/moss/
I am. There are also a large number of services that detect plagiarism in essays, but most (all?) only detect direct copying from published sources and sometimes re-use of an essay previously turned in by another student.
I'm targeting the custom essay - services like http://essaymill.com ("our writing, your success"), as well as students paying other students to write their papers.
Detecting and punishing cheating in those circumstances sounds like a Hard Problem. In particular, when your software says 2 essays were probably written by the same student, but both students deny it, how can they reasonably be punished, since there is no proof?
Questions like "why did you say X?" usually reveal whether a person is actually familiar with what they claim to have written. It's imperfect, certainly, and I would never recommend punishing a student based entirely on an algorithm's result, but I think I can provide a tool to drastically cut down on this sort of academic fraud.
I intend to make it very clear to customers that they should not punish students based only on information provided by my software.
I ran a site called that crawled Gnutella/Limewire for student papers. That's something you could consider adding to your database, and it's quite easy since the Limewire code and RFC are open source. You could write your own client or modify Limewire.
I experimented with existing text classification algorithms for an author identification project I was doing for fun. What I'm currently using is somewhere between KNN and SVM, but I'm not done tweaking it yet. I'm also working on boosting results using different feature sets.
you might try looking at the BLEU metric. It's designed to test similarity between a machine-translated text and a human-translated one, but it could be a good starting point for detecting plagiarism too.
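The core of BLEU is a clipped n-gram precision: what fraction of the candidate's n-grams also appear in the reference, counting each reference n-gram at most as often as it occurs there. The sketch below shows only that piece, with naive whitespace tokenisation as an assumption; full BLEU also combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams found in the reference, with clipping."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip: a repeated n-gram only counts as often as it appears in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total
```

For plagiarism detection a high score between a submitted essay and a known source would be a reason to look closer, not a verdict on its own.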
Great looking site and very easy to use. Obviously the report is a lot nicer than that Google one that you can get emailed in PDF format, but what other advantages are there?
To me the "best" setup would be to avoid the PDF and get the report directly in my mailbox. I know that would require unique graphics for each email, but is that the only barrier from going with that approach?
I agree. I recently unsubscribed from Metric Mail because I didn't want to look at PDFs of my analytics. I want them right in the email. Once you push that feature I'll likely resubscribe!
http://soundkey.com : Will ideally become something like the Wikipedia of sounds (i.e. a central repository/global reference about anything that has to do with sounds)
Really cool idea. I have some feedback after visiting your site. "wikipedia of sounds" is not at all how I would describe it. It's more like "twitter for sounds".
When I arrived at your site I was really quite confused as to what the hell soundkey did. The main part of the site shows a big list of social networks, and says "use soundkey here" ... Ok... but what exactly does soundkey do?
I would focus more on the aspect that you can record sounds and then link to them (or embed them). It's really that simple, but you've managed to overcomplicate it and it took me way longer than it should have to figure out the following: Soundkey lets me record a sound, then link to it.
My suggestion: put the record tool right smack in the middle of the front page. Make it the primary purpose of the site. That is, I should be able to go to "soundkey.com" and conveniently record and share a sound, rather than seeing a splash page. You don't visit bit.ly and see a whole page explaining the benefits of short links, and where you can use shorter links. You see a textbox that you can immediately use.
At any rate, this all might sound very critical, but I love your idea. Put that recording thing front and center, emphasize "record sounds and share them" [twitter for sounds], and your users will figure out the rest.
We are currently working with a UX expert to help us re-design the look & feel and the functionality of the website, and the issues you bring up have been brought to our attention and will be addressed in the next iteration of the website.
(We currently do have a "How this Works" section on the front page that says "Record Sound, Get SoundKey, Use Anywhere" with an explanation of what that means, but I guess it's not clear enough because lots of people complain about it. Hopefully the updated website will make things much more clear in much less time)
madlibber.com is such an awesome idea. You need to take some time to populate (or copy) good mad libs and artificially vote them up so that it's not a ghost town when people arrive.
Update: I just created an account with inqueryapp -- I cannot add a category. The AJAX response is a 500, you might want to check that out. Up to that, the experience was rather enjoyable and I was really looking forward to creating a FAQ page using your service.
That's a good idea, I'll throw a bunch of samples up there. Now, I need to figure out how I should display a tag list or cloud on the homepage to navigate. Probably should make a little "syntax help" link when creating madlib stories too.
Weird, I can't reproduce that inquiry error and Hoptoad didn't catch anything. Let me know if you run into that again.
a small note site. Pages are written in markdown, and then displayed in HTML. It exports to plain text, and syncs with Simplenote. The goal is a super low barrier to entry. I want it to be a middle ground between my thoughts and my hard drive.
I used to not want to leave my desk for lunch because I'd need to come back if the market moved. I wrote a script that would tell me if the market moved past a certain threshold and would text me to come back.
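The original script isn't shown, so this is only a guess at its shape: compare each new price to a reference price and fire a notification once the move exceeds a percentage threshold. The price feed and the `notify` callback (which would send the text message) are placeholders for whatever data source and SMS gateway are actually used.

```python
def moved_past_threshold(reference, current, threshold_pct):
    """Return (tripped, change_pct) for a move from reference to current."""
    change_pct = (current - reference) / reference * 100.0
    return abs(change_pct) >= threshold_pct, change_pct

def watch(prices, reference, threshold_pct, notify):
    """Scan an iterable of prices; notify once on the first big move.

    Returns the price that tripped the alert, or None if none did.
    """
    for price in prices:
        tripped, change = moved_past_threshold(reference, price, threshold_pct)
        if tripped:
            notify(f"Market moved {change:+.2f}% to {price}")
            return price
    return None

# Usage with a list standing in for the text-message gateway.
sent = []
watch([100.2, 99.9, 101.5, 103.0], reference=100.0, threshold_pct=1.0,
      notify=sent.append)
```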
I was sort of shocked that it didn't exist in the wild (or at least wasn't easy to find) so I decided to see if I could make a web app of it.
Hoping to "launch" in the next 2 weeks. Would love testers.
The main idea behind ShowMe is "viewability", by which I mean that every object in a running ShowMe program can be navigated and displayed (and potentially altered) in a Viewer. By object I mean every entity within the system; ShowMe will be a pure object-oriented language.
There will be multiple Views, so the user can view the same data in multiple ways (for example, a table of numbers could be displayed as ASCII text, as an HTML table, or as a graph). One of the views will be a low-level ASCII string, from which the object can be re-created; this format can therefore be used for serialisation. User-defined classes will be able to define their own views, or re-implement existing views for the new class.
Like Clojure, a lot of data structures in ShowMe will be immutable.
ShowMe will not be a pure functional language, but it will be possible to program in it in a functional style.
There will be 2 syntaxes for writing ShowMe programs: one based on Lisp, the other similar to C. The C-like syntax will be compiled into the Lisp-like syntax.
I’m working on building a better business search service with a strong emphasis on mailing lists.
The idea is to give small to mid-size businesses a tool to generate well-targeted mailing lists, by geography and/or category, for their marketing campaigns.
Mailing lists are generally huge files, pretty much unusable by a small business owner with a Constant Contact account; we are looking to change that.
A friend of mine runs http://www.doorknobads.com/, selling physical adverts by neighborhood. Your geographically targeted email marketing service sounds very promising. I'm looking forward to seeing what you do with it.
At work...a sinatra api for an existing app that is currently a horrible mess of java and xml configuration files (more "code" is in xml than java, ugh!)
After hours...working on features toward the launch of http://www.wanderphiles.com (teaser site...sign up!). So much to do and only a couple hours a day to work on it.
Reporting for SharePoint... trying to federate multiple instances and make sense of the fact that every site/list has a different schema of potentially the same data. So it's a foray into reporting against federated semi-structured data.
Eventually this will lead to integrating pyjamas as an alternative to templating for pure-AJAX applications. This will mean adding a setuptools plugin for compiling the pyjamas client before packaging the application, creating a paster template for generating the project structure, and extending the test framework to support functional tests.
It might mean creating yet another framework built on pylons, but I'm hoping it will get folded into Pylons proper so that I don't have to maintain a separate project.
I'm working with a friend on a text based bulletin board system written in C# that uses SSH rather than Telnet or a modem and supports many of the features of an early 90's -ish DOS based Dial-up BBS.
... I and a few of my friends miss the old days so we'll probably be the only ones to make reasonable use of it, but it's a fun hack project for me and I'm learning a lot about SSH in the process. It's not OpenTG and I'm not that developer (he's doing one in Ruby so his project is probably more interesting to folks around here). Still in the very early stages so nothing works yet and I don't have a code repo setup.
Noted (http://notedwiki.com), a wiki that will be easy to use, attractively designed, simple to theme, and have functionality focused on small businesses and freelancers.
It will be a software product sold for a one-time fee and will be compatible with pretty much any environment that has PHP. The syntax will be based on the cross-wiki WikiCreole standard, but it will also support WYSIWYG. We have some great ideas and are gathering feedback from people who are interested in wikis or an easy way to store their information.
I'm working on a program that uses the Advanced Configuration and Power Interface (ACPI) to stop my netbook battery from charging to 100%, which shortens its life. Instead, it will start charging at 40% and stop at 60%. There will be an option to charge to 100% in case I need a full charge to work off the grid. There will also be an option to turn off charging altogether, as sometimes the dual load of charging the battery and running the computer trips the circuit breaker of the power supply at an airplane seat.
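The start-at-40%, stop-at-60% rule is a hysteresis band, and the decision logic can be sketched separately from the hardware side. This is an illustration only: the function name and parameters are made up, and how the result is actually pushed to the battery through ACPI is deliberately left out.

```python
def should_charge(level_pct, charging_now, low=40, high=60,
                  full_charge=False, charging_disabled=False):
    """Decide whether the charger should be on.

    level_pct:         current battery level, 0-100
    charging_now:      whether the charger is currently on
    full_charge:       override to charge all the way to 100%
    charging_disabled: override to never charge (e.g. on a weak airplane outlet)
    """
    if charging_disabled:
        return False
    if full_charge:
        return level_pct < 100
    if level_pct <= low:
        return True
    if level_pct >= high:
        return False
    # Inside the band: keep doing whatever we were already doing.
    return charging_now
```

Keeping the current state inside the band is what stops the charger from flickering on and off around a single threshold.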
Just a small extension of a little iPhone app I made for myself. Realized that most of my 'todo lists' are more like checklists... for launching a new site, compiling a distribution build of an iPhone app, etc. So I have a little app for myself that lets me quickly re-use these checklists.
So now I'm building in a web based back end to allow everyone to share their checklists for other people to use.
It would be a good place to store, and use, the hundreds of different checklists I've seen on blogs, Hacker News, etc.
I have been working on it by myself for a few months (so have a working proof of concept).
The idea is a data aggregation service that helps small to medium online businesses analyze common sources of ecommerce data.
Most small businesses do not have the time or manpower to implement complex business intelligence solutions such as SAP, Cognos, or Actuate (the big players in this space).
My product serves to bridge this gap by providing an easy way to gather both traditional and non-traditional metrics. By traditional I mean gross sales, volume, margins, site traffic etc. Non traditional would be a lot of the "Web 2.0" metrics (tweets, buzz, etc).
I'm at a point right now where a proof of concept is built but I need a lot of help in the marketing/business development department.
Currently working on setting up a high-quality content providing company. The business plan is just about complete but this is really just a work of love and the result of my general hate of content farms (yes I have a day job).
Other than that I've been slowly putting together a site about bad dates for a friend of mine, maybe someone here will enjoy it enough to add a story ;)
http://www.runawayscreaming.com
In light of its recent meltdown I'm hacking the real estate industry to help stabilize the market, re-establish lost value, and avoid future foreclosures.
I'm drastically simplifying the setup process for OurDoings Dropbox integration. If you use Dropbox, just share a folder with box@ourdoings.com to try it.
Yeah, I'm using gomoku extensively. But there are some questions that are more about the language -- for example, it took me a while to figure out you can't have a default value for an optional argument.
http://hnrecap.com - Daily, Weekly and Monthly HN Summaries (Has: Its own point system, Instapaper support, Treemap visualisation and Archives). Thinking of maybe starting a weekly podcast!
Also a project in the early stages which aims to make it easy to find great available domain names for projects/startups. A lot of hackery going on here. :)
Runs on IIS >= 5, and Mono on Apache. It does most of what I have needed for daily tasks, so it has been some time since I have worked on it; should probably get back to it in the near future . . .
I'm working on Plexibase, a database platform that makes it easy to create a knowledge base. Please try it out at http://plexibase.com. It's a work in progress; I welcome your thoughts. Smaller companies could use this to make data available to customers quickly without having to develop a web application or CMS.
I'm working on an app that allows you to (smoothly, quickly) see full-resolution images on your iPhone by reading PNG tiles of the original image. The tiles are created on your PC using a Java app that stores them in a SQLite db; it can also rasterize PDFs to images using the java.net pdf-renderer.
Does anyone know a better PDF renderer that works well with Java?
It's Python and all command-line. I'm trying to make something that's both better than iTunes for managing music (not that hard) and smarter than MusicBrainz Picard for correcting tags (a little harder).
Harvesting as much sheet music as possible from free sources (Icking, Petrucci, etc.), converting it to MusicXML and LilyPond, and then doing... something with it. I have a few ideas, but harvesting it all is a start. I'd like to put it online wiki-style since it will most likely need editing after being run through OMR.
Something like the Django admin feature but for MS Sql Server and more generic; to sell to companies. I figure a lot of companies have databases for internal purposes but don't have programmers to make CRUD apps for them.
I'm still not sure how to locate potential customers or how much to charge. Any advice on that?
I'm working hard to get the website and api ready for Private Beta. Tomorrow I'm coding the final api feature and more functional test cases to iron out problems with unicode filenames.
The past week I have been working on a back end tool to fully automate the creation of the newsletter (the selection of content is done by me). It has been fun working with the MailChimp API.
Recipe Website that emails you healthy recipes based on food you like to eat. Love burgers, pasta, and string beans, but hate hot peppers and anchovies? We'll give you personalized recipes in your inbox to keep you fit.
I'm working on my social networking site, where users can share music playlists and photos, exchange messages, chat in real time à la Facebook, and schedule their daily tasks. http://www.jamafriend.com
A company in Berkeley is doing this: www.iqengines.com (see Developer API), and demonstration app www.omoby.com. HTML Post image and JSON/XML return label (also face, barcode, ocr, etc).
So just colours and faces at the moment. You're right that arbitrary images are ridiculously complicated, I'm hoping to start off on a smaller domain and build up :)
(edit: obviously I can return the coords of the face too, as well as coords of empty parts of the image etc, but tagging is really what I'm focusing on at the moment)
Working on an iPhone/iPad application that uses data from NASA's SDO/SOHO satellites to display hi-res images/movies of the sun based on 'solar events.'
Sadly, no preview available yet, but the app should be out in a few weeks.
Helioviewer. I'm currently working with the backend team to get stuff pushed out on 8/20 and should be able to submit the app to Apple after that. This is my first app, so I suspect it may take me a little while to get it up there. If you do download it, please review :D
I've been working on some Chicken Scheme stuff lately. Also playing with epoll and libevent some more. I'd like to eventually get scheme and mongrel2 working together rather than writing my own web server.
I'm messing around with a Javascript text editor. Mostly it is to improve my chops with JS. By day, I am a Java developer, and don't get many chances to mess with other things, except during my free time.
Masters project on using bandit algorithms for optimising CTRs on website content. Also involves some search engine / text mining / dimensionality reduction stuff.
A little bit of messing around with Android SDK too...
Hopefully! It is great to work with some real world customer click data...
If anyone is interested in a technical intro to the setting there is a set of slides from John Langford at Yahoo Research (many good and standard reference papers cited in it): http://hunch.net/~exploration_learning/
A/B testing could be thought of as a sort of epsilon-Greedy strategy (particularly if such testing is carried out at regular intervals initially). While not enjoying the optimality characteristics of other algorithms, such an approach can in fact outperform in many practical cases :)
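For anyone who hasn't met it, epsilon-greedy fits in a few lines: with probability epsilon show a random variant (explore), otherwise show the variant with the best observed click-through rate so far (exploit). The class below is a bare sketch with made-up names, not code from the project.

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over n content variants ("arms")."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.shows = [0] * n_arms    # times each arm was displayed
        self.clicks = [0] * n_arms   # clicks each arm received

    def ctr(self, arm):
        """Observed click-through rate; 0.0 for an arm never shown."""
        return self.clicks[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))     # explore
        return max(range(len(self.shows)), key=self.ctr)   # exploit

    def update(self, arm, clicked):
        self.shows[arm] += 1
        self.clicks[arm] += int(clicked)
```

A classic A/B test corresponds roughly to running with epsilon at 1.0 for a fixed period and then dropping it to 0.0 once a winner is declared.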
Simple online selling platform, Hosting Provider www.fusionservers.co.uk - coding the backend stuff, trying to create an app for the iPhone (don't own an iPhone) and finding it very tricky.
I'm working on a nano-blogging platform that reduces posts to short, one-at-a-time messages. Each user gets only one visible post at a time. Think twitter minus all the clutter.
Getting a stable version of Noostr ready for release. Added multiple database options, plugin support, better theming and pagination (big one) this release :)
I'm working on http://coloringout.com intermittently.
It's a bit rough around the edges at the moment, and I'm in the middle of porting it to App Engine just for the hell of it :)
A CodeIgniter based Control Panel for ordering and managing SEO services. Has mailing list management, task list generation, status updates. Pretty sweet.