Introducing Tapir: Simple search for static sites (jeffkreeftmeijer.com)
90 points by jkreeftmeijer on May 9, 2011 | 25 comments



As for "The problem I ran into was that it takes you off the website you were searching on and takes you to a Google results page (complete with slightly irrelevant ads)."

You can set it up to load the ads on the same page. For example, running on my jekyll blog: http://paulstamatiou.com/search


Err I meant "load the results on the same page", not ads.


Awesome share, will be using this.


Look, this is cool and all from a technical perspective, but why are people strapping themselves into static site generator straitjackets like this? At some point, doesn't it become much simpler to just use a server-side framework?


They're significantly faster, they never break, and they're way more secure.

Pretty much everyone already agrees they're less flexible.


My personal site is simple enough that there aren't any database-requiring features I actually want on it, and benefits of Jekyll include:

  - Just feels nice. I actually enjoy playing with it.
  - Cheap. I have it on AWS CloudFront (a CDN) and it costs $0.10/month
  for a few thousand pageviews.
  - Fast. With so few page views even WordPress would be fast enough
  for me, but I like that it's REALLY fast.

The price benefit isn't really a consideration for me, since I still have the server I used to run it on (in fact I don't have a Unix machine at home, so I edit the site on my server), but it is nice that it can be hosted on a really good CDN (which a database-driven website couldn't be).


I use Armin Ronacher's rstblog site generator. I'd say the inverse of what you said is true, at least in my experience. At some point it is much simpler to just use a static site generator. That point is when you nail down your workflow. I wrote about this very topic this morning:

http://mattdeboard.net/2011/05/09/more-tips-on-rstblog/


It's cheaper, there's no app server to crash, the best code is no code, etc.


Static sites can easily cope with a massive spike in traffic caused by things like being posted to Hacker News.


Frameworks like Jekyll are a good choice if you want to use github pages.


Do you have any plans to monetise this ever, and/or might you ever release the source so people can run it locally themselves?

I have a small enough amount of content on my Jekyll site that I don't want a search function right now, but if I ever were to want one, the missing feature I would like is the ability to add pages that don't come from an RSS feed, i.e. standard non-blog pages such as /index.html and /about.html (or whatever). Obviously I could work around that by adding them to an RSS feed... but that's a tiny bit messy.

edit: Assuming I'm right in thinking that Jekyll is one of the best-suited systems for Tapir, if not the best, you could perhaps mention it on the actual Tapir site so that search engines can pick it up. (Not that I'm in any way an SEO expert, but in my experience with niche areas like this, very little or no attention to SEO can still get you ranking highly for something like "jekyll search").
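
For what it's worth, the "add them into an RSS feed" workaround could look roughly like the sketch below. It only builds a minimal RSS 2.0 document from a hand-written list of pages using Python's standard library; the function name `build_feed` and the example.com URLs are made up for illustration, and whether Tapir would accept a feed shaped exactly like this is an assumption.

```python
import xml.etree.ElementTree as ET


def build_feed(site_title, site_url, pages):
    """Build a minimal RSS 2.0 document listing the given static pages.

    `pages` is a list of (title, url, description) tuples. All names here
    are illustrative; nothing in this sketch is specific to Tapir.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Static pages of " + site_title
    for title, url, description in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = description
    # Return the feed as a text string rather than bytes.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical non-blog pages of a small static site.
    print(build_feed(
        "My site",
        "http://example.com/",
        [
            ("Home", "http://example.com/index.html", "Welcome page"),
            ("About", "http://example.com/about.html", "About me"),
        ],
    ))
```

The output would be written to a file next to the generated site, which is the "tiny bit messy" part: the page list has to be kept in sync by hand or by the build script.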


You can actually just push any content with a link through the API (see http://tapirgo.com/#docs > Push API). So with a deploy script that takes the content and pushes it, you can already do this right now!
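
A rough sketch of the "takes the content" half of such a deploy script, using only Python's standard library. It walks a generated site directory and turns each HTML file into a record with a title, a link and the visible text. The field names (`title`, `link`, `content`), the function names and the directory layout are assumptions for illustration, not Tapir's documented schema; the actual push request is left out because the endpoint and parameters aren't given in this thread, so check the Push API docs for those.

```python
import os
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_record(html_text, link):
    """Turn one HTML page into a dict (field names are hypothetical)."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.title.strip(),
        "link": link,
        "content": " ".join(parser.chunks),
    }


def collect_records(site_dir, base_url):
    """Build one record per .html file found under site_dir."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(site_dir):
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, site_dir).replace(os.sep, "/")
            with open(path, encoding="utf-8") as fh:
                link = base_url.rstrip("/") + "/" + rel
                records.append(page_record(fh.read(), link))
    return records
```

Something like `collect_records("_site", "http://example.com")` would then give the deploy script a list of records to send, one request per page or however the API expects them.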


Ideally it would skip the RSS feed. Most static sites are just plain HTML files; they have no RSS feeds.


Agreed. Right now this is technically "Search for static BLOG sites". If you had the ability to scan simple HTML files as well, it could work for any HTML site, even if it weren't a blog site (or another site type that has a time-ordered list of posts in an RSS feed).


We're definitely looking into this and hope to find a clean solution soon. We started by indexing RSS since it was what we needed and since it was _way_ simpler to implement for a first version. Stay tuned! :)


A sitemap.xml could be used to index a static site without the need for a feed.

http://www.sitemaps.org/

Since the sitemap is a /standard/ (sic!) document, there is no need to reinvent the wheel for truly static sites.
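
As a minimal sketch of the sitemap idea (not how Tapir works; the function name and the example.com URLs are made up for illustration), the page URLs can be pulled out of a sitemap.xml with nothing but Python's standard library. An indexer would then fetch each URL and index the page it points to.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap for a small static site.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/index.html</loc></url>
  <url><loc>http://example.com/about.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(EXAMPLE_SITEMAP):
        print(url)
```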


Never used Elastic Search, but it's not too hard to use Nutch + Solr for static sites. Nutch to spider your (truly static) site, and Solr to store and serve search requests.


Wouldn't it be simple enough to use Google's own AJAX search API? http://code.google.com/apis/customsearch/v1/overview.html


That would be the simpler approach, but Google wants you to put their logo on your results page and limits you to 100 requests per day, unless you get a paid plan. (http://www.google.com/cse/docs/tos.html)

If you're fine with that, Google Custom Search is a good option. If you're not, maybe Tapir can help. :)


> rate of $5 per 1000 queries

Damn, search is expensive
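
To put a rough number on that, here is a back-of-the-envelope sketch using the two figures quoted in this thread (100 free requests per day, $5 per 1000 queries). It assumes the free quota still applies on a paid plan and that billing is prorated per query rather than in blocks of 1000; both are assumptions, and the function name is made up.

```python
def monthly_cost(queries_per_day, free_per_day=100, usd_per_1000=5.0, days=30):
    """Estimate a monthly bill in USD under the assumptions stated above."""
    # Only queries beyond the daily free quota are billed.
    billable = max(0, queries_per_day - free_per_day) * days
    return billable * usd_per_1000 / 1000


if __name__ == "__main__":
    for qpd in (50, 1000, 10000):
        print(qpd, "queries/day ->", monthly_cost(qpd), "USD/month")
```

Under those assumptions, 1000 queries a day works out to 900 billable queries times 30 days, or $135 a month.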


Or you can try IndexTank. Free up to 100k documents, $49/mo up to 500k. Unlimited queries. (disclaimer: it's my company)


Thank you for making reddit search...usable.


Can it search atom feeds as well?


Yes, it can handle atom feeds too. If you have a feed Tapir can't read, let me know and I'll look into it. :)


I use Montezuma, a tiny Lucene clone in Common Lisp.



