Falcon – a Chrome extension for full text browsing history search (github.com/lengstrom)
178 points by anishathalye on Sept 12, 2016 | hide | past | favorite | 52 comments



This (and TreeStyleTab (https://addons.mozilla.org/en-US/firefox/addon/tree-style-ta...)) is why I keep preferring Firefox over Chrome for leisure browsing. Every time I use Chrome, it's impossible to find previous pages by title or URL, while in Firefox it's super simple and works really well.

Which is kind of ironic since Google is all about search and data but can't handle a browser address bar...


Yeah, it's incredible how search in Chrome's URL bar doesn't find URLs (or page titles) I visited 15 minutes ago. It's utterly frustrating.


In my experience, Chrome seems to be highly biased toward matching the beginning of titles (and URLs). It really doesn't like suggesting based on a match deep in the title until you add something like the TLD or the site name, which often appears at the beginning of all titles.
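To make the distinction concrete, here is a toy comparison of prefix matching versus substring matching over page titles. It is only a sketch: it is not Chrome's actual omnibox ranking, and the sample titles are made up.

```python
from __future__ import annotations


def title_start_hits(query: str, titles: list[str]) -> list[str]:
    """Titles whose *beginning* matches the query (prefix-biased)."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


def anywhere_hits(query: str, titles: list[str]) -> list[str]:
    """Titles that contain the query anywhere (plain substring match)."""
    q = query.lower()
    return [t for t in titles if q in t.lower()]


# Hypothetical history titles.
history_titles = [
    "Hacker News",
    "Falcon: full text browsing history search",
]

print(title_start_hits("history", history_titles))  # []
print(anywhere_hits("history", history_titles))     # ['Falcon: full text browsing history search']
```

A prefix-biased matcher only surfaces the second page once the query starts with "Falcon", which is the behavior described above.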


Chrome's history search seems broken for the past 20+ versions.


I'm not sure why you're experiencing that, but Chrome's bar certainly does let you search by title. Just type "Hacker" or "news" and see.

At least, that's what it's supposed to do and is currently doing for me.

I just fired up Firefox, and neither actually seems to index the content of the page, which is what's so cool about this extension.


I'm on Ubuntu and just tried this in Chrome: I went manually to news.ycombinator.com, then opened a new tab and typed "Hacker", and HN was nowhere to be found...


Yeah, might work if you have a bookmark or something which would store a title locally though.


There may be a regression in Chrome's support for that OS. Linux is a very tiny share of the desktop market and the hardest of the three platform shells to maintain, so I'm not surprised.

I have confirmed it works fine on Windows 10 and OSX variants both in Canary and Stable.

You might be interested in reporting the bug. That's a major regression. I'd be annoyed if it were missing too. Not enough to go to Firefox, mind (the speed loss is too great once you have 20-40 tabs open), but it'd definitely lower the speed at which I work.


> the speed loss is too great once you have 20-40 tabs open

That leads me to my second point, TreeStyleTab. I commonly have 50+ tabs open, and they are nested nicely under each other so I can actually find them. In Chrome, there are a bunch of half-baked add-ons for easier finding, but nothing like TreeStyleTab for Firefox.


Have you tried Tab Outliner? [1]

Compared to TST it has two (major?) drawbacks upfront:

1. It sits on a separate window next to your other browser windows.

2. Some of its features are paid features.

I always thought that TST for Firefox had ruined all other browsers for me, but I actually like Tab Outliner more than TST. Here are my reasons:

1. It manages all my browser windows in a single tree view. Separate windows are children of the current session.

2. It has much better keyboard support, like rearranging tabs with the arrow keys while holding Ctrl, or indenting/unindenting a tab in the tree with the usual Tab / Shift+Tab shortcuts.

2.1. Even rearranging your tabs with the mouse seems much more deterministic than in TST. In TST I struggle with rearranging a tab as a new child vs. as a new sibling tab.

3. You can unload tabs or entire subtrees (collapse its children and then press the green unload button) to free your RAM.

4. A Google Drive backup feature syncs your entire tree to your other computers. You can restore your entire tree, or just a subtree, on another device via drag and drop.

I only use a subset of all Tab Outliner features but it is already giving me a much saner tab management experience than TST could do. The better keyboard support and syncing my tabs across devices while preserving the tree structure are essential features that I can't have with TST on Firefox. Personally I have more sympathy with the TST developer (always nice on GitHub) but Tab Outliner is still the better browser extension.

[1] https://chrome.google.com/webstore/detail/tabs-outliner/eggk...


That happens all the time on my Chrome in OS X as well.


Works fine for me on a Mac.


Chrome is designed to send you to Google search results where you can click on ads on the way to your intended destinations.

This is why Chrome is much slower than Firefox from a usability standpoint. Firefox sends you right where you want to go, where Google has an incentive to send you to a page full of ads on the way.


Chrome prefers to give search hints... which lead to a search page... which leads to monetisation. Annoying as hell.


Thanks for the tip - TreeStyleTab is great!


TreeStyleTab really just blew my mind. I think it's the last straw that gets me to go back to FF.

Seriously, tabs on the side in a tree view make SO MUCH SENSE!

I know many of you probably have a few dozen tabs open in Chrome, and you gotta guess what's what from the icon. We keep making monitors wider, yet webpages mostly need a max-width. I'd bet that in the next few years all browsers will switch to tree-view tabs like this.

And wow, the chrome extensions trying to do the same thing are fugly as all getout.


Hmm, your post seems to imply Firefox does searching of full text history page contents natively -- is this true? I'm trying (on nightly) and can't seem to trigger it.

If not natively, which add-on are you using?


Exactly. Firefox's Awesomebar is aptly named.


Google most likely wants you to just use google rather than search through your history.


> you can clone it on your local machine, read through our code to verify that it is not malicious, and then install it

I like that the authors share my concern about installing an extension that would by design record every page I visit. However, the repository contains several minified JavaScript files [1]. This somewhat contradicts their invitation to read through the code.

[1] https://github.com/lengstrom/falcon/tree/master/extension/js...


Switch to Firefox and only use fully reviewed/approved add-ons if you're serious about this. I just put a ported Chrome extension through the full Firefox add-on review process (thanks to WebExtensions they're easy to port now), and those guys rejected my extension twice because they couldn't reproduce the minified code from my dependencies to the exact byte.

Chrome web store doesn't care what I upload and push down to my users. I've had numerous requests from spammers looking to buy my extension based on the number of users and their geography. I guess once they buy an extension they push malware down to the users, so even if you can trust the extension developer or source now, you can't keep that trust up indefinitely.


I agree that the third-party JavaScript files should also be supplied in full and minified during the build process.

However, I've found the originals so you can still check if they contain 'contaminated' code.

chrono: https://www.npmjs.com/package/chrono-node - a natural language date parser for Node and Browserify

notie: https://www.npmjs.com/package/notie - a clean and simple notification, input, and selection suite for javascript, with no dependencies

readability: https://github.com/arrix/node-readability - Node implementation of Arc90's Readability (however, this code seems to have been slightly modified)

semantic: https://github.com/Semantic-Org/Semantic-UI - Semantic UI JS support

stopwords: list of stopwords for the English language
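For anyone wondering what a stopword list is for: search code typically drops such words before indexing or matching, so "the" and "of" don't match every page. A minimal sketch of that step, assuming a tiny hypothetical stopword set rather than the file actually bundled in the repo, and not necessarily how Falcon itself uses it:

```python
from __future__ import annotations

import re

# Hypothetical, abbreviated stopword set (the real list is much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


print(tokenize("The history of the Chrome address bar"))
# ['history', 'chrome', 'address', 'bar']
```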


It serves no purpose, reviewing third-party code that you don't even know is the same that is distributed. But anyways, since Chrome has autoupdate for addons, it doesn't matter if you're reviewing the addons you install or not, because it can change at any point.


If you clone it on your local machine you also won't receive any automatic updates.


Don't chrome extensions automatically update themselves? Assuming no permissions are changed I think they do. It would be nice if there was some sort of version pinning.

After some investigating, this extension doesn't have an update_url in the manifest.json, so I think that means it won't/can't auto-update.


Every extension in the Web Store auto-updates; it doesn't need update_url.


Why do you think it would be practical to identify malicious code even if it wasn't minified? See: http://www.underhanded-c.org/


I wrote an extension that did the same thing a couple years ago. http://lifehacker.com/deeper-history-searches-the-contents-o...

I voluntarily removed it from the web store after realizing it was caching lots of sensitive data. I eventually started encrypting the stored info but I realized that if the extension ever became very successful, it would become a target and I wasn't comfortable with that.

I hope the developer of this extension will invest more effort in their users' security than a simple blacklist.


Seems like this extension stores all the data locally, so it's probably much less of a problem.


As did mine. There are many well-known attacks that break locally stored data out of its sandbox. If attackers are sure there are bank account numbers, balances, email addresses, and other sensitive info stored in plaintext, they'll come after it.


Do you still have the code somewhere? I would like to take a look at it.


Really really useful extension, whoa. Searching the content of pages you've browsed. I need it legitimately multiple times a day lol.

Two caveats, though: 1) obviously it can't index pages you browsed before installing the extension, and 2) it's a bit unclear how to use it (in the search bar, press f then Tab).

I'm also interested to see info on storage usage after a long time using it.


Upon installation, if the user opted in, couldn't it crawl the history back to some specific date? Or does Chrome not allow extensions to access browser history?


My defunct extension did exactly that. So yes, Chrome extensions can access history.
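The selection step of such a backfill might look like the sketch below. In an extension the list would come from `chrome.history.search`, which accepts a `startTime` in milliseconds since the epoch; here, made-up dicts shaped like its `HistoryItem` results (`url`, `lastVisitTime`) stand in for that call. History only stores URLs and titles, so each selected page would still have to be re-fetched to get its text (not shown).

```python
from __future__ import annotations

from datetime import datetime, timezone


def urls_to_backfill(history_items: list[dict], since: datetime) -> list[str]:
    """URLs last visited on or after `since`, most recent first."""
    cutoff_ms = since.timestamp() * 1000  # Chrome history times are ms since the epoch
    recent = [h for h in history_items if h.get("lastVisitTime", 0) >= cutoff_ms]
    recent.sort(key=lambda h: h["lastVisitTime"], reverse=True)
    return [h["url"] for h in recent]


# Hypothetical entries shaped like HistoryItem results (times in UTC).
items = [
    {"url": "https://example.com/old", "lastVisitTime": 1470000000000},              # Jul 31, 2016
    {"url": "https://github.com/lengstrom/falcon", "lastVisitTime": 1473552000000},  # Sep 11, 2016
    {"url": "https://news.ycombinator.com/", "lastVisitTime": 1473638400000},        # Sep 12, 2016
]
since = datetime(2016, 9, 1, tzinfo=timezone.utc)
print(urls_to_backfill(items, since))
# ['https://news.ycombinator.com/', 'https://github.com/lengstrom/falcon']
```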


This is awesome! We develop a tool that does exactly this and found that getting the search right can be really tricky given the very large volume of data. Love the simplicity of making it a chrome extension. Excited to try it out!


GIFs are great for explaining things visually.

The gif on this page is really bad at it.

Slow it down. I have watched it loop 5 times in the last 30 seconds and I still cannot tell what it is without reading the text. I feel dizzy.


Thank you for this! Many times have I wished the browser's history search could provide this.

Now that school is starting again and I have some free time, I was thinking of working on a project that would let you search through the websites you've visited, the documents on your machine, and the photos and music on your machine (if you can run a program that generates descriptions for your photos and an MP3-to-lyrics program for the music), and all the same across many machines. I started looking at Elasticsearch, because that is what I found while researching the search tool I would need for this project.


Kippt used to do this. Unfortunately, I never got the email that their service was shutting down. And I lost all 500+ of my tagged bookmarks.


It would be great if this worked for bookmarks as well. On average I probably accumulate about 10 new bookmarks per day of notable content. Over the years that adds up.

Obviously searching pre-existing bookmarked content (and not just history) would entail far more complexity, probably requiring a back-end service.


If you haven't looked at it already, Pinboard (https://pinboard.in/) might be worth checking out.

The premium service caches the content of everything you bookmark using the service, with full-text search (in addition to the usual tagging).


This can be done entirely client side.


You're right. A cursory investigation suggests spinning up a bunch of client-side HTTP requests from a Chrome background page should do the trick.
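Once a bookmarked page has been fetched, its HTML still has to be reduced to plain text before it can be indexed. A minimal sketch of that step using Python's standard `html.parser` as a stand-in; an actual extension would do this in JavaScript (for example with the browser's `DOMParser`), and the fetching itself is omitted.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


print(html_to_text("<html><head><script>var x = 1;</script></head>"
                   "<body><h1>Falcon</h1><p>Full text search</p></body></html>"))
# Falcon Full text search
```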


This has been one of my favorite things in Opera since they introduced it in the late 2000s. It seemed weird Chrome wasn't better at this. I'm going to have to give this Falcon a try because it looks just like what I would want!


I built something like this as well!

I wonder how you solved the data storage and indexing. Does it scale to multi-month heavy usage? Does it deduplicate multiple visits?

Cool stuff, gotta put mine somewhere. Always planned to, but never got around to it.


Looking in the code, it loops over all the indexed text and does substring matching on tokens from the search query. Good enough for everyone so far, but...
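A minimal sketch of the approach described: a linear scan over stored page text, matching query tokens as substrings. The `index` layout (URL to lowercased text) and requiring *every* token to match, rather than any, are assumptions for illustration, not Falcon's actual code.

```python
from __future__ import annotations


def search(index: dict[str, str], query: str) -> list[str]:
    """URLs whose stored text contains every query token as a substring.

    Every query scans every stored page, so cost grows linearly with history size.
    """
    tokens = query.lower().split()
    return [url for url, text in index.items()
            if all(tok in text for tok in tokens)]


# Hypothetical stored pages: URL -> lowercased page text.
index = {
    "https://github.com/lengstrom/falcon": "falcon full text browsing history search",
    "https://news.ycombinator.com/": "hacker news new past comments",
}
print(search(index, "History Search"))  # ['https://github.com/lengstrom/falcon']
```

Substring matching is forgiving ("hack" finds "hacker"), which is part of why it's good enough at small scale.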


I wrote an extension that did this as well. I used IndexedDB. Before I implemented encryption on the DB, it scaled well. I had an algorithm that got the size of the largest pages down to ~2-3 KB, and I only stored diffs on subsequent visits to the same pages.
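The commenter's actual diff algorithm isn't described; as one possible way to store only diffs on revisits, here is a word-level delta built with Python's `difflib.SequenceMatcher`. Unchanged runs are stored as index ranges into the previous version, so a mostly unchanged page costs little extra space.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def make_delta(old: list[str], new: list[str]) -> list[tuple]:
    """Describe `new` relative to `old`: copied ranges of old plus literal new words."""
    delta = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes():
        if op == "equal":
            delta.append(("copy", i1, i2))     # reuse old[i1:i2]
        elif op in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))  # store only the new words
        # "delete": old[i1:i2] is simply not copied
    return delta


def apply_delta(old: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the newer version from the older one and its delta."""
    out: list[str] = []
    for step in delta:
        if step[0] == "copy":
            out.extend(old[step[1]:step[2]])
        else:
            out.extend(step[1])
    return out


v1 = "falcon indexes the text of pages you visit".split()
v2 = "falcon indexes the full text of every page you visit".split()
print(apply_delta(v1, make_delta(v1, v2)) == v2)  # True
```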

After encryption, things got unruly because the text had to be stored in fixed length chunks.
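To see why fixed-length chunks get unruly, here is a sketch of the framing such a scheme might use: length-prefix the text, zero-pad it to a multiple of the chunk size, and split it. The chunk size is a hypothetical value, and the per-chunk encryption step is deliberately left out. Note that even a one-word diff still occupies a whole chunk.

```python
from __future__ import annotations

CHUNK_SIZE = 256  # hypothetical chunk size in bytes


def to_chunks(text: str, size: int = CHUNK_SIZE) -> list[bytes]:
    """Length-prefix the UTF-8 text, zero-pad it, and split it into equal chunks."""
    data = text.encode("utf-8")
    framed = len(data).to_bytes(4, "big") + data
    padded_len = -(-len(framed) // size) * size  # round up to a multiple of size
    framed = framed.ljust(padded_len, b"\0")
    # Each chunk would then be encrypted separately (not shown).
    return [framed[i:i + size] for i in range(0, padded_len, size)]


def from_chunks(chunks: list[bytes]) -> str:
    """Join the (decrypted) chunks and strip the padding using the length prefix."""
    framed = b"".join(chunks)
    n = int.from_bytes(framed[:4], "big")
    return framed[4:4 + n].decode("utf-8")


chunks = to_chunks("a" * 300)
print(len(chunks), from_chunks(chunks) == "a" * 300)  # 2 True
```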


I made one a couple of years ago called All Seeing Eye... I haven't had time to update it since. It did screen capture too. It's on GitHub.


I've missed All Seeing Eye and haven't found another free solution until today. fetching.io was nice, but I really wanted a local (or self-hosted) solution.


fetching.io is self hosted on OSX at least


Very nice work, this will be useful in my day to day browsing for sure!


How similar is this to fetching.io?


I can't speak to the internals of this extension, but a few things jump to mind: this works with Chrome, while fetching.io works with Safari, Firefox and Chrome. This works on any OS that has Chrome; fetching.io is only self-hosted on OS X (otherwise there's a cloud version). This appears to implement its own indexing scheme; fetching.io uses Elasticsearch. fetching.io has its own search UI, tagging, notes, etc.; this is integrated directly into Chrome. Which is best for you probably depends on your needs. Oh, and fetching.io isn't open source ;)
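One way to picture the difference between a simple homegrown scheme and Elasticsearch is a linear scan versus an inverted index, the structure Elasticsearch (through Lucene) is built on. A minimal sketch of an inverted index; real engines add analyzers, relevance ranking, and on-disk segments, all omitted here, and the sample pages are made up.

```python
from __future__ import annotations

import re
from collections import defaultdict


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(url)
    return index


def lookup(index: dict[str, set[str]], query: str) -> set[str]:
    """URLs containing every query token, via set intersection (no full scan)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        result &= index.get(tok, set())
    return result


pages = {
    "https://github.com/lengstrom/falcon": "Falcon full text browsing history search",
    "https://news.ycombinator.com/": "Hacker News new past comments",
}
idx = build_inverted_index(pages)
print(lookup(idx, "history search"))  # {'https://github.com/lengstrom/falcon'}
```

The trade-off: lookups no longer touch every page, but only whole tokens match ("hack" won't find "hacker") unless extra machinery such as n-gram or prefix indexing is added.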



