Hi HN, a few months ago I started building a suite of knowledge management tools for my own needs. It's been a long iterative process of noticing patterns (and inefficiencies) in my workflow, building simple tools to improve it, and evolving the UX over time.
One of the tools I have been using daily is a web clipper that not only captures the current page, but can also automatically extract key information from it. You can also do a quick lookup of your existing notes regardless of which web page you are on.
Prior to this, I had been using web clipper extensions by Evernote, OneNote, and Notion, and all of them had something missing that would significantly slow me down. Wanted to share what I have built to address this. The code is integrated with the [Rumin](https://getrumin.com) backend (the other tools I built), but you can easily swap out the API calls to point to local storage or some other endpoint.
Check it out. Would love to hear feedback from the community :)
Great project! Rumin looks very interesting as well. I was a long-time Evernote Web Clipper user, but switched to Notion a few months ago. I'm much happier with Notion's web clipping workflow and table storage approach, but it's not perfect.
thanks neovive! Yeah, Notion's web clipping and table storage approach is quite elegant. It only gets a bit clumsy for the less common "power user" use cases.
Wouldn't it be possible to support any video by simply scanning for <video> tags and getting the current playback information from there? I'm not sure, but is the extension able to control the video playback in order to navigate to the right time?
great idea! yeah that would seem to be a better design.
Regarding the playback time, currently I'm adding the "t=[TIME]s" parameter to the captured url, so it works for YouTube. But there are definitely more elegant solutions as this scales to support more websites.
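To make the `<video>` idea concrete, something like this could work (just a quick sketch, not the extension's current code; the function name is mine):

```ts
// Sketch: read the playback position from the page's first <video>
// element and append it to the captured URL as a "t" parameter.
// The "123s" format matches YouTube; other sites may expect a
// different parameter, which a per-site recipe could handle.
function withTimestamp(pageUrl: string): string {
  const video = document.querySelector("video");
  if (!video) return pageUrl; // no video on the page, capture as-is

  const url = new URL(pageUrl);
  url.searchParams.set("t", `${Math.floor(video.currentTime)}s`);
  return url.toString();
}
```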
Good work!
For extracting meta information -- a set of community-maintained information scrapers (HTML, or intercepting AJAX) for different websites could be cool. It's hard to maintain all the sites on your own (especially the ones you don't use), and by sharing we could avoid redoing the same work.
thanks! Yeah, that's a very good point, and one of the main reasons why I'm open sourcing this.
Community-maintained information scrapers/extractors are definitely a direction I want to build towards, collaborating with any existing efforts. The exact form will take some iterations, though (e.g. a marketplace for scripts/"recipes", built-in scripts for common sites, allowing individual users to save their own scrapers, etc.)
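To sketch what a shared "recipe" registry might look like (all names and selectors here are hypothetical, and selectors will rot over time, which is exactly why shared maintenance helps):

```ts
// Sketch of a per-site recipe registry: one extractor per hostname,
// each pulling site-specific metadata out of the DOM.
type Extractor = (doc: Document) => Record<string, string>;

const recipes: Record<string, Extractor> = {
  // Selectors below are illustrative placeholders, not tested ones.
  "www.youtube.com": (doc) => ({
    channel: doc.querySelector("#channel-name a")?.textContent?.trim() ?? "",
  }),
  "github.com": (doc) => ({
    stars: doc.querySelector("#repo-stars-counter-star")?.textContent ?? "",
  }),
};

function extractMetadata(doc: Document): Record<string, string> {
  const recipe = recipes[location.hostname];
  return recipe ? recipe(doc) : {}; // fall back to generic capture
}
```

A marketplace or built-in set would then mostly be a way to publish and pull entries into a table like this.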
Years ago, I spent a couple of months building a simple Evernote clone in Clojure. The weakest part of my “for my own use only” project was a Firefox extension I wrote to capture selected web page data and send it to the backend of my system.
This Web Clipper project would have really helped me. I hope the author of this gets the satisfaction of wide adoption in many cool projects.
This looks great! I use Evernote Web Clipper but spend a lot of time adding context/information/screenshots manually; this would save me a ton of time. I requested access to Rumin and will definitely try swapping this into my workflow.
Looks pretty slick, though custom metadata for just 7 sites seems pretty low for launch. Perhaps the default metadata capture is good enough for sites like Wikipedia, Amazon, etc. that aren't covered?
thanks for checking it out! Yeah, the coverage for the metadata capture definitely needs to be improved. At this point, it just includes the top sites for my own use cases (and some early users).
I was hoping that by sharing it I could get a better sense of what sites other people would like to have supported, and keep adding to it :)
haha, for now... the main reason being it's just me working on it at the moment, and I'm fixing/cleaning things up before releasing more of the code base. The rest of the product is pretty clunky (with a beyond-shitty code base).
In the meantime, it should be easy to swap out the API hostname for something else (or even local storage).
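Roughly, the swap could look like this (the endpoint path and `Capture` shape are made up for illustration, not the real API):

```ts
// Sketch: persist a capture locally instead of POSTing to the backend.
interface Capture {
  url: string;
  title: string;
  metadata: Record<string, string>;
}

async function saveCapture(capture: Capture): Promise<void> {
  // Hosted version (endpoint path is hypothetical):
  // await fetch("https://getrumin.com/api/captures", {
  //   method: "POST",
  //   headers: { "Content-Type": "application/json" },
  //   body: JSON.stringify(capture),
  // });

  // Local alternative: key captures by URL in extension storage
  // (chrome.storage.local.set returns a promise in Manifest V3).
  await chrome.storage.local.set({ [capture.url]: capture });
}
```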
Please pick a license. By not doing so, you retain full ownership of the code, preventing other people from modifying it for their own needs. See here for more details: https://choosealicense.com/no-permission/
Thanks Lucas! I was a Notion web clipper user as well. It worked for the most basic use case of saving a page into a table, but these use cases kept coming up for me:
- An idea belongs to multiple collections, as opposed to a single "table"
- There are usually properties/metadata I want to save (e.g. YouTube channel information), which would take multiple copy-and-pastes back and forth each time
- Bi-directional linking of captured content
- I wanted full control over the captured data, for more advanced queries/filtering
And it's sad that web clippers tend to be one of those "table stakes" features that companies build a basic version of and then don't invest in further.
Quick answer for "Why should I use Rumin?" is: "Perhaps you shouldn't yet, but let's stay in touch and I'd love to hear about your use cases and other ideas."
The current version of Rumin is very rough, and there's an overwhelming list of improvements to make. This is one reason why I've closed it to sign-ups for now. But in the meantime, I feel there's a lot the community can do even with just the web capture component being open source.
Regarding your concern about data loss: I intend to open source more and more parts of the platform, and figure out a model to make the development sustainable.
Ah, web clipping. I haven't heard that phrase used since PDAs were running on the Mobitex network and had to use web clipping to usefully browse the internet at all.