Introducing Tapir: Simple search for static sites

PStamatiou · on May 9, 2011

As for "The problem I ran into was that it takes you off the website you were searching on and takes you to a Google results page (complete with slightly irrelevant ads)."

You can set it up to load the ads on the same page. For example, running on my jekyll blog: http://paulstamatiou.com/search

PStamatiou · on May 9, 2011

Err I meant "load the results on the same page", not ads.

thomasdavis · on May 9, 2011

Awesome share, will be using this.

helium · on May 9, 2011

Look, this is cool and all from a technical perspective, but why are people strapping themselves into static site generator straitjackets like this? At some point, doesn't it become much simpler to just use a server-side framework?

tptacek · on May 9, 2011

They're significantly faster, they never break, and they're way more secure.

Pretty much everyone already agrees they're less flexible.

swores · on May 9, 2011

My personal site is simple enough that there aren't any database-requiring features I actually want on it, and benefits of Jekyll include:

  - Just feels nice. I actually enjoy playing with it.
  - Cheap, I have it on AWS CF (a CDN) and it costs $0.10/month for
  a few thousand pageviews
  - Fast, with so few page views even WordPress would be fast enough
  for me, but I like that it's REALLY fast

The price benefit isn't really a consideration for me. I still have the server I used to run it on (in fact I don't have a Unix machine at home, so I edit the site on my server), but it is nice that it can be hosted on a really good CDN (which a database-driven website couldn't be).

mattdeboard · on May 9, 2011

I use Armin Ronacher's rstblog site generator. I'd say the inverse of what you said is true, at least in my experience. At some point it is much simpler to just use a static site generator. That point is when you nail down your workflow. I wrote about this very topic this morning:

http://mattdeboard.net/2011/05/09/more-tips-on-rstblog/

_pius · on May 9, 2011

It's cheaper, there's no app server to crash, the best code is no code, etc.

evangineer · on May 9, 2011

Static sites can easily cope with a massive spike in traffic caused by things like being posted to Hacker News.

robert-boehnke · on May 9, 2011

Frameworks like Jekyll are a good choice if you want to use github pages.

swores · on May 9, 2011

Do you have any plans to monetise this ever, and/or might you ever release the source so people can run it locally themselves?

I have a small enough amount of content on my Jekyll site that I don't want a search function right now, but if I ever were to want one, the missing feature I would like is the ability to add pages that aren't in an RSS feed, i.e. standard non-blog pages such as /index.html and /about.html (or whatever). Obviously I could work around this by adding them to an RSS feed... but that's a tiny bit messy.

edit: Assuming I'm right in thinking that Jekyll is one of the best-suited systems for Tapir, if not the best-suited, you could perhaps mention it on the actual Tapir site so that search engines can pick it up. (I'm not in any way an SEO expert, but in my experience with niche areas like this, even very little or no attention to SEO can get you ranking highly for something like "jekyll search".)

matsimitsu · on May 9, 2011

You can actually just push any content with a link through the API (see http://tapirgo.com/#docs > Push API). So with a deploy script that takes the content and pushes it, you can already do this right now!
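A deploy script along those lines might look roughly like this. It is a minimal sketch, not Tapir's client: the endpoint URL and the field names (`secret`, `link`, `title`, `content`) are hypothetical placeholders, and so is sending the page as a JSON body, so check the Push API docs for the real request format before using it. It uses only the standard library's `urllib.request` and sends one POST per page.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint and field names: replace them with the ones from
# the Push API documentation. Nothing here is taken from Tapir itself.
PUSH_ENDPOINT = "https://example.com/api/push"


def build_push_request(page, secret, endpoint=PUSH_ENDPOINT):
    """Build (but do not send) a POST request carrying one page as JSON."""
    payload = {
        "secret": secret,          # hypothetical auth field
        "link": page["link"],      # absolute URL of the page
        "title": page["title"],
        "content": page["content"],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_pages(pages, secret, endpoint=PUSH_ENDPOINT):
    """Send every page; call this from the deploy script after the build."""
    for page in pages:
        with urllib.request.urlopen(build_push_request(page, secret, endpoint)) as resp:
            resp.read()  # drain the response; raises HTTPError on 4xx/5xx
```

Keeping request construction separate from sending makes it easy to dry-run the script (print the payloads) before pointing it at the real service.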

pencilcode · on May 9, 2011

Ideally it would skip the RSS feed. Most static sites are just plain HTML files; they have no RSS feeds.

silverlight · on May 9, 2011

Agreed. Right now this is technically "Search for static BLOG sites". If you had the ability to scan simple HTML files as well, it could work for any HTML site, even if it weren't a blog site (or another site type that has a time-ordered list of posts in an RSS feed).
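Indexing plain HTML files instead of a feed mostly comes down to walking the generated site and pulling out each page's title and visible text. Here is a minimal sketch with the standard-library `HTMLParser`. It assumes the built site sits in a directory such as Jekyll's `_site/` and that `base_url` is the site's public root; both are hypothetical inputs, and this is not how Tapir works today.

```python
from html.parser import HTMLParser
from pathlib import Path


class PageExtractor(HTMLParser):
    """Collect the <title> and the visible body text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html):
    """Return (title, text) for one HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.title_parts).strip(), " ".join(parser.text_parts)


def scan_site(root, base_url):
    """Yield one record per .html file under root, ready to be indexed."""
    root = Path(root)
    for path in sorted(root.rglob("*.html")):
        title, text = extract_page(path.read_text(encoding="utf-8"))
        link = base_url.rstrip("/") + "/" + path.relative_to(root).as_posix()
        yield {"link": link, "title": title, "content": text}
```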

jkreeftmeijer · on May 9, 2011

We're definitely looking into this and hope to find a clean solution soon. We started by indexing RSS since it was what we needed and since it was _way_ simpler to implement for a first version. Stay tuned! :)

_pdeschen · on May 10, 2011

Given a sitemap.xml, that could be used to index a static site without the need for a feed.

http://www.sitemaps.org/

Since sitemap is a /standard/ (sic!) document, there's no need to reinvent the wheel for truly static sites.
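Pulling the page URLs out of a sitemap takes only a few lines, because the format is a `<urlset>` of `<url>` entries, each with a `<loc>` child, in the sitemaps.org namespace. The following sketch uses the standard library. It ignores sitemap index files, which list further sitemaps rather than pages, and optional tags such as `<lastmod>`.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs listed in a sitemap.xml document, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Each returned URL could then be fetched and fed through whatever title/text extraction the indexer already uses.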

techtalsky · on May 9, 2011

Never used Elastic Search, but it's not too hard to use Nutch + Solr for static sites. Nutch to spider your (truly static) site, and Solr to store and serve search requests.
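On the query side of that setup, Solr answers plain HTTP requests on its `select` handler. Here is a rough sketch. It assumes a local Solr on the default port with no named core in the path, and Nutch's usual `url` and `title` fields. Your base URL and schema may differ, and in some schemas `title` comes back as a list.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: default local Solr URL; multi-core setups add the core name
# to the path (e.g. /solr/<core>/select).
SOLR_SELECT = "http://localhost:8983/solr/select"


def solr_query_url(query, rows=10, base=SOLR_SELECT):
    """Build a select URL asking for JSON (wt=json) with only url/title fields."""
    params = {"q": query, "rows": rows, "fl": "url,title", "wt": "json"}
    return base + "?" + urlencode(params)


def parse_solr_response(body):
    """Extract (url, title) pairs from a Solr JSON response body."""
    docs = json.loads(body)["response"]["docs"]
    return [(d.get("url", ""), d.get("title", "")) for d in docs]
```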

mikemoka · on May 9, 2011

Wouldn't it be simple enough to use Google's own AJAX search API? http://code.google.com/apis/customsearch/v1/overview.html

jkreeftmeijer · on May 9, 2011

That would be the simpler approach, but Google wants you to put their logo on your results page and limits you to 100 requests per day, unless you get a paid plan. (http://www.google.com/cse/docs/tos.html)

If you're fine with that, Google Custom Search is a good option. If you're not, maybe Tapir can help. :)
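For comparison, a call to the Custom Search JSON API is a single GET request with `key` (your API key), `cx` (your search engine ID), and `q` parameters. This minimal sketch only builds the URL and parses a response body. `api_key` and `engine_id` are placeholders that you would get from Google, and each request counts against the free daily quota jkreeftmeijer mentions.

```python
import json
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_query_url(query, api_key, engine_id, start=1):
    """Build a request URL for one page of results (start is 1-based)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return CSE_ENDPOINT + "?" + urlencode(params)


def parse_cse_response(body):
    """Return (title, link, snippet) tuples from a JSON response body."""
    items = json.loads(body).get("items", [])  # "items" is absent when nothing matched
    return [(it.get("title", ""), it.get("link", ""), it.get("snippet", "")) for it in items]
```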

sebastianavina · on May 9, 2011

> rate of $5 per 1000 queries

Damn, search is expensive
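Here is a back-of-the-envelope estimate of that rate in practice. It assumes the first 100 queries each day are free (the limit jkreeftmeijer quotes), every query beyond that is billed pro rata at $5 per 1000, and a month has 30 days. Actual billing granularity and caps may differ.

```python
FREE_QUERIES_PER_DAY = 100   # free daily quota quoted in the thread
PRICE_PER_1000 = 5.00        # "$5 per 1000 queries"


def monthly_cost(queries_per_day, days=30):
    """Estimated monthly bill in dollars, assuming pro-rata billing above the free quota."""
    billable_per_day = max(0, queries_per_day - FREE_QUERIES_PER_DAY)
    return billable_per_day * days * PRICE_PER_1000 / 1000
```

Under these assumptions, 1,000 queries a day means 900 billable queries × 30 days = 27,000 queries, or $135 a month.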

diego · on May 9, 2011

Or you can try IndexTank. Free up to 100k documents, $49/mo up to 500k. Unlimited queries. (disclaimer: it's my company)

blhack · on May 9, 2011

Thank you for making reddit search...usable.

robert-boehnke · on May 9, 2011

Can it search atom feeds as well?

jkreeftmeijer · on May 9, 2011

Yes, it can handle atom feeds too. If you have a feed Tapir can't read, let me know and I'll look into it. :)
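Supporting both formats mostly means looking in different places for the same three fields. Here is a sketch with `xml.etree.ElementTree` that handles RSS 2.0 and Atom 1.0. It does not handle RSS 1.0/RDF or Atom `type="xhtml"` content, and it illustrates the idea rather than showing how Tapir actually parses feeds.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace, in ElementTree's {ns}tag form


def _atom_link(entry):
    """Pick the entry's alternate link; a missing rel attribute means "alternate"."""
    for link_el in entry.findall(ATOM + "link"):
        if link_el.get("rel", "alternate") == "alternate":
            return link_el.get("href", "")
    return ""


def _atom_body(entry):
    """Prefer <content>, fall back to <summary> (text or escaped-HTML bodies only)."""
    for tag in ("content", "summary"):
        el = entry.find(ATOM + tag)
        if el is not None and el.text:
            return el.text
    return ""


def parse_feed(xml_text):
    """Return [{'title', 'link', 'content'}] for an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM + "feed":
        return [
            {"title": e.findtext(ATOM + "title", default=""),
             "link": _atom_link(e),
             "content": _atom_body(e)}
            for e in root.findall(ATOM + "entry")
        ]
    if root.tag == "rss":
        return [
            {"title": item.findtext("title", default=""),
             "link": item.findtext("link", default=""),
             "content": item.findtext("description", default="")}
            for item in root.iter("item")
        ]
    raise ValueError(f"unsupported feed root element: {root.tag}")
```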

mahmud · on May 9, 2011

I use Montezuma, a tiny Lucene clone in CL.
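At its core, a Lucene-style library is built on an inverted index: a map from each term to the documents that contain it. Below is a toy version in Python for illustration only. It is not Montezuma's API, and it does plain AND matching with no relevance scoring, which real engines add on top.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

Usage: after `idx.add("a", "Jekyll search for static sites")` and `idx.add("b", "Static site generators are fast")`, `idx.search("static")` returns `["a", "b"]`.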