Show HN: Archivy – Extensible Self Hosted Knowledge Base – v1 Release

etherio · on Feb 2, 2021

Hey HN! A few months ago in August I posted about Archivy [0] here, and got tons of useful feedback and really enjoyed the discussion. Since then, I've improved the whole project quite a lot and wanted to share and talk with you again.

I'm really excited about our improvements and happy with our progress in this v1 release. It feels good to be back here after a lot of work :)

[0]: https://news.ycombinator.com/item?id=24199419

eitland · on Feb 3, 2021

This is seriously cool and might do the things I wanted Perkeep to do[1] and more.

[1]: but I never could get it to, because the documentation seems to be close to nonexistent.

theshrike79 · on Feb 3, 2021

This was my first thought exactly.

I _love_ the idea and tech behind Perkeep. But holy shitballs the documentation and UX are ...imperfect.

This might be the midway between "bunch of markdown files" and Perkeep for me.

Arech · on Feb 3, 2021

Hmm... It seems that I should finally stop using the (good but too generic) DokuWiki engine in favour of this.

What interests me the most is the advertised ability to bridge a self-hosted wiki (did I get it right?) for taking any kind of notes with web bookmarking, plus the ability to perform a good-quality search over the _content_ of bookmarked pages. This is where I'm having real pain. There is so much great, useful stuff on the Internet, but it is so hard to find some very specific thing that you know exists and that you even bookmarked long ago... So if it solves that pain and also features a notes engine comparable to DokuWiki's... I think it'd be an absolutely great thing! I'll give it a try.

BTW: does it support importing existing bookmarks from Firefox?

mncharity · on Feb 2, 2021

Is there a privacy statement? Given open source PIMS that variously phone home (when started, or when visiting assorted websites, etc), it's now something I look for, right after self hosting.

etherio · on Feb 2, 2021

Oh, there's nothing of the sort with Archivy. No information is sent back, and there isn't even a server it could be sent to, really.

I haven't looked into writing a privacy statement, but it might be a good idea :)

geoelectric · on Feb 2, 2021

This is a great project!

I think you should put in a simple privacy statement that comes down to "Archivy won't make any external requests" if it's going to be as simple as that, and then stick to it like glue. If it's not that simple, be accurate to whatever room you'd like to leave yourself, and then stick to that.

Not having a statement like that leaves the door open for a future git pull to change the landscape. Technically the statement won't stop that, but it'll keep the project accountable.

FWIW it's way more important for this sort of thing because it's a stack you have to invest lots of time and notes into and buy into long term. Were I going to run a personal wiki, I'd be very interested in knowing I could likely use it for a long time.

vibrant_mclean · on Feb 3, 2021

It may not be Archivy directly. But if, for example, the web pages use static assets like Google Fonts or jQuery, they will send external requests, thus leaking the presence of the self-hosted server. Or maybe some JavaScript plugin phones home.
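
For anyone who wants to check a page for this, here is a rough Python sketch using only the standard library. It is not part of Archivy; the function name, the list of tag/attribute pairs, and the `localhost:5000` host are my own assumptions for illustration. It lists URLs in a page that a browser would fetch from some other host. It over-reports a little, since not every `<link href>` triggers a fetch.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tag/attribute pairs that usually make a browser fetch a resource on load.
# This list is an assumption for the sketch, not an exhaustive one.
FETCHING_ATTRS = {
    ("script", "src"),
    ("link", "href"),
    ("img", "src"),
    ("iframe", "src"),
}


class ExternalAssetFinder(HTMLParser):
    """Collects resource URLs that point at a host other than our own."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) not in FETCHING_ATTRS or not value:
                continue
            host = urlparse(value).netloc
            # Relative URLs have an empty netloc and stay on our own server.
            if host and host != self.own_host:
                self.external.append(value)


def find_external_assets(html, own_host):
    """Return the external resource URLs referenced by an HTML string."""
    finder = ExternalAssetFinder(own_host)
    finder.feed(html)
    finder.close()
    return finder.external


page = """
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto">
<script src="/static/app.js"></script>
<script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
<img src="logo.png">
"""
for url in find_external_assets(page, "localhost:5000"):
    print(url)
```

Anything it prints is a request that leaves your machine when the page is opened.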

bravura · on Feb 3, 2021

Any plans to allow references / citations?

citeproc or BibTeX-style reference processing would make this very useful for keeping a research/science log.

zmix · on Feb 3, 2021

Is there a specific technical reason why you don't use an XML database with, let's say, an XQuery frontend for storing and retrieving document data?

denysvitali · on Feb 3, 2021

> Extensible search with Elasticsearch and its Query DSL

Omg thanks. I'm so sick of Confluence's inability to search through some documents. I really wish I had all of my Confluence spaces on disk so that I could `grep` it all.
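
To give an idea of what a Query DSL search looks like, here is a minimal Python sketch that only builds the request body. The `content` field name and the `build_search_body` helper are assumptions of mine, not Archivy's actual index mapping or code. A body like this would be sent to an index's `_search` endpoint.

```python
import json


def build_search_body(text, field="content", size=10):
    """Build an Elasticsearch Query DSL body for a full-text search.

    `field` is a hypothetical field name; use whatever your index defines.
    """
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": text,
                    # Require every term to appear, closer to what grep users expect.
                    "operator": "and",
                }
            }
        },
        # Ask for matching snippets from the same field.
        "highlight": {"fields": {field: {}}},
    }


body = build_search_body("stale index")
print(json.dumps(body, indent=2))
```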

brazzledazzle · on Feb 3, 2021

Is it not searching/indexing pages or attachments? Ignoring the frustrating interface and query syntax, it should (in theory) share some characteristics with Elasticsearch, since it uses Lucene on the backend. In my experience it does an OK job of indexing pages and some file types. It would be nice if it just dumped the strings in binary files for indexing, but I can see how that wouldn’t be very user friendly.

But it can only do ok when the indexing is working as it should. It can be easy to miss a stale index for a while because search continues to work. Then you’re waiting for it to catch up or even for a full reindex. The problems seem to get considerably more frequent with their HA products since you’re now distributing the search/index.

My impression from running a few different pieces of software and maintaining some smaller Elasticsearch clusters is that search is pretty difficult to do well. Or even decently, for that matter. Development, UX and operational support have to come together just right, and when it’s just a feature of a product or a service instead of a core competency you get milquetoast and frustration.

vibrant_mclean · on Feb 3, 2021

It would be good to have a systemd service file

fao_ · on Feb 3, 2021

contribute one :)

M5x7wI3CmbEem10 · on Feb 3, 2021

How does this compare to Obsidian?

iFreilicht · on Feb 3, 2021

More basic, but open source. Obsidian has very interesting features for showing and creating connections between pages; Archivy seems to take a more standard structural approach. Links between pages are a planned feature, though.

One of the big features of Archivy is how bookmarks work: if you bookmark a webpage, it gets converted to Markdown and stored in your knowledge base to prevent link rot, which is an awesome feature. I'm not sure Obsidian lets you archive pages this easily.
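
To illustrate the general idea of turning a page into Markdown, here is a toy Python sketch built on the standard library. It is not Archivy's implementation (a real tool would use a dedicated conversion library); the class and function names are made up, and it only handles headings, paragraphs, links and emphasis.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML (h1-h3, p, a, em, strong) to Markdown."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append(self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")  # block elements end a Markdown paragraph
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # HTML treats runs of whitespace as a single space.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Return a Markdown rendering of the supported tags in `html`."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    blocks = [block.strip() for block in "".join(converter.parts).split("\n\n")]
    return "\n\n".join(block for block in blocks if block)


html = (
    "<h1>Link rot</h1>\n"
    '<p>Pages <em>vanish</em>; keep a <a href="https://example.com/a">copy</a>.</p>'
)
print(html_to_markdown(html))
```

Once the text is saved as a plain Markdown file, it stays readable and searchable even if the original page disappears.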

geoelectric · on Feb 3, 2021

Obsidian has a webpage to markdown paste/drag function, but IME it's hit or miss.