Tagsistant: A Semantic Filesystem for Linux (tagsistant.net)
105 points by goranmoomin on July 6, 2019 | 33 comments



Very nice project. I'm delighted to see new ideas for organizing files and personal data.

I feel that most people gave up on organizing data and just went with the concept of searching. This is a shame, because searching wastes everyone's time filtering out false positives, while the small effort of tagging new content goes a long way toward enabling discoverability. Web sites that let you filter content by desired and undesired tags give you an optimal way to recover information.

A very interesting feature in Tagsistant is tag relations. It enables tree hierarchies for tags ("anything starwars-related is also scifi-related"). Kind of ironic how they wanted to get away from the tree structure of files, and then they implemented a tree structure for tags. Perhaps a tagging system for organizing your tags would be better? :)

This meta-structure of data is fascinating. Is there a good resource that systematizes the area, with best practices and implementation tips?
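To make the tag-relation idea above concrete, here is a minimal Python sketch. It is not Tagsistant's actual implementation: it assumes relations are kept as a plain mapping from a tag to the tags it implies, and the names `IMPLIES`, `expand`, and `matches` are invented for illustration. It expands a file's tags transitively and then filters on desired and undesired tags.

```python
# Hypothetical tag relations: each tag implies the tags listed for it.
IMPLIES = {
    "starwars": {"scifi"},
    "scifi": {"fiction"},
}


def expand(tags, implies=IMPLIES):
    """Return tags plus everything they imply, following relations transitively."""
    seen = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue  # already visited; this also guards against cycles
        seen.add(tag)
        stack.extend(implies.get(tag, ()))
    return seen


def matches(file_tags, wanted=(), unwanted=(), implies=IMPLIES):
    """True if the expanded tags contain every wanted tag and no unwanted one."""
    full = expand(file_tags, implies)
    return set(wanted) <= full and not (set(unwanted) & full)
```

In this sketch a tag may imply several others, so the relations form a directed graph rather than a strict tree, which is arguably closer to "tags for your tags" than to categories.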


Tree hierarchies for tags are starting to sound a lot like categories. It's probably essential not to overdo it, because then you start to lose the power of tags and gain the detriments of categories. That said, I do think that introducing some limited relations between tags can be beneficial.


I was worried, because I'm currently working on a tagging system and they beat me to the release, punchy website included.

Then I realised that they built it on top of FUSE and SQL, and breathed a sigh of relief.

((EDIT: Perhaps this is a little harsh? I didn't mean to be harsh, just precise, but perhaps I went a little over the top -- I apologize to the authors if I did.))

I investigated the FUSE/db option a year or two ago, and personally I don't see it as an interesting or compelling solution to the file<->tag problem. Because users move and rename files, pathnames are potentially semantically meaningless to the tag system. The contents of files also change often and arbitrarily (consider MS Word's formats, which are literal memory dumps of what Word is doing, not to mention encrypted files, etc.), so file hashes are potentially semantically meaningless to the tag system as well.

In other words, basic file operations (reading/writing/renaming) will cause this system to break your tags unless significant work goes into keeping the file<->tag relation consistent. You can attempt to mitigate this problem through systems that keep track of files (inotify, etc.), but that introduces a runtime cost and has technical difficulties as well. It's 'designed' (albeit unintentionally) to break from the start, and the developer has to exert a large amount of effort to stop the system from breaking. To me the effort didn't seem worth it; the innate flaws were not worth surmounting. Unfortunately, to keep this from being a 'Debbie Downer' post, I'd have to talk about the alternative approach, which I don't really have the space (or the time, right now) to do here.


It sounds like this is a pretty easy problem to solve if files aren't identified by their names. If a file just has a name (and a parent, and a bunch of tags...) then tags are trivially stable when moves/renames/writes happen. No need to hash anything; ids are a fine way to track identity, and perfectly amenable to storing in an SQL database.

If you want to support hard links you can decide whether to associate tags with files or inodes, depending on whether you want all linked files to have the same set of tags.
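As a rough illustration of the id-based approach described in this comment, the following Python sketch uses the standard `sqlite3` module. The schema, table names, and helper functions are assumptions made up for this example, not how Tagsistant or any other tool stores its data. Tags hang off a numeric file id, so renaming a file only touches its `name` column.

```python
import sqlite3

# Hypothetical schema: identity is the integer id, never the name or path.
SCHEMA = """
CREATE TABLE files (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES files(id),
    name   TEXT NOT NULL
);
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE file_tags (
    file_id INTEGER NOT NULL REFERENCES files(id),
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (file_id, tag_id)
);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and install the schema."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_file(db, name, parent=None):
    """Insert a file row and return its id."""
    cur = db.execute("INSERT INTO files (name, parent) VALUES (?, ?)", (name, parent))
    return cur.lastrowid


def tag_file(db, file_id, tag):
    """Attach a tag to a file id, creating the tag if needed (idempotent)."""
    db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
    (tag_id,) = db.execute("SELECT id FROM tags WHERE name = ?", (tag,)).fetchone()
    db.execute("INSERT OR IGNORE INTO file_tags VALUES (?, ?)", (file_id, tag_id))


def rename(db, file_id, new_name):
    """Rename a file; the file_tags rows are untouched."""
    db.execute("UPDATE files SET name = ? WHERE id = ?", (new_name, file_id))


def files_with_tag(db, tag):
    """Return the current names of all files carrying the given tag."""
    rows = db.execute(
        "SELECT f.name FROM files f "
        "JOIN file_tags ft ON ft.file_id = f.id "
        "JOIN tags t ON t.id = ft.tag_id "
        "WHERE t.name = ? ORDER BY f.name",
        (tag,),
    )
    return [name for (name,) in rows]
```

For hard links, one option under this sketch would be to split `files` into a directory-entry table and an inode-like table, and point `file_tags` at whichever of the two should own the tags.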


> It sounds like this is a pretty easy problem to solve if files aren't identified by their names.

That would make things slightly better, but it's really not how things are supposed to look from userspace. You still have the problem of tags not being preserved across file copies, or across filesystem boundaries (the latter is an almost universal problem in this space, I guess).


User space shouldn't care about implementation details like storage primary keys. I don't think copies are difficult either, but I guess some other storage scheme could make them simpler.


> I was worried, because I'm currently working on a tagging system and they beat me to the release, punchy website included.

Tagsistant has been around for ages. Here's it being discussed on HN in 2011:

https://news.ycombinator.com/item?id=2573318


So when is your solution going to be available?


In short: when it's ready. I struggled with some health stuff last year which led to a delay (and also got side-tracked with some other projects that went nowhere). At the moment I'm satisfied the core library (libkoios -- koios being the Greek word for 'to query' c:) works but I don't really trust myself not to have forgotten something.

At the moment the program's interface has been worked through, the documentation written up, and I'm just plugging the user-level interface together, so maybe a few days to a week to run through everything. I'm still not comfortable with my application-level testing either so we'll see I guess.


Are you intending to sell this, or make it open source? If the former, good luck with that! If the latter, why wait?


It's going to be free software, yeah. I think it's nice to have something to work towards, and it allows me to be messy with the project or start over without having any ties. At the same time, I do generally believe that software should at some point be Finished (aside from compatibility updates and maybe one or two ease of life features), so I have the aim of crushing all or most of the bugs I encounter before releasing to the public, even though the public release is technically 0.0 :)

I guess a fun way of saying it would be: software is an artisanal craft, so out of respect for what I'm building and for the users, I don't really want them to see something that is obviously imperfect until I've smoothed the rough edges over.

The less fun way is that I don't want the responsibility of someone running it in production, then for things to go belly-up, haha


Please make a Show HN post or something similar when you feel your work is ready to share; I have a feeling that there are a ton of us around here who would be super interested in checking it out!


This is an interesting concept. But why do this at the file system level? Web and desktop applications like Google Photos and iTunes offer some cataloging and search capabilities that should cover most of the use cases.

Would this later be connected to something like Spotlight on OSX?


One advantage to a FS over a program is that it gets to be really interoperable for free; pretty much everything can work on raw files, and the unix ecosystem is very good at inter-operating with filesystems as an "API".


Because (select formats of) photos and music aren't the only things that people want tagged. If I want to tag RAW pictures, or TV shows, or Word documents, or assets for a game, or notes, or ...., neither Google Photos nor iTunes can help me.

Moreover, such a tagging system belongs in an OS-wide service; it's not something that should be implemented on its own by every application dealing with particular media types.


Isn't this like asking "why do this in a uniform way, when each application could do it in its own way?"

(Also, (1) even if "should cover most of the use cases" is correct, why not cover all of the use cases?; and (2) the mere presence of more powerful tools can encourage ingenuity to use them in ways that wouldn't have been imagined if we only had purpose-built, specialised tools.)


For me it's simply that I'd much rather not give up my photographs to Google.


Interesting timing. I came across TMSU [0] the other day, which seems to have similar objectives.

[0]: https://tmsu.org/


Yay! Thank you for the reference. TMSU ("tag my shit up") seems more straightforward to me after reading about both TMSU and Tagsistant. And both use SQLite!


Would also like to point out TMSU [0] which also presents a filesystem-y interface, alongside a quite powerful CLI.

[0]: https://tmsu.org/


Can it import from current file magic? ID3? CDDB? IMDB?

You see, it is a big effort to manually input all this data.

What is useful is reliable, public source entries.

You also want a documented data format for future compatibility. So you can migrate to the next format.

This furthermore gives the advantage that applications can use the metadata. Although I suppose the file system abstraction achieves the same?


Isn’t stuff like this what the ext4 “user_xattr” mount option is for?


Not quite. xattrs are not indexed, so if you want efficient retrieval by xattr tags you have to build/maintain your own inverted index. xattrs are good if you have a file that you want to store metadata about, but not so good if you want to use that metadata for discovery (at least in my experience, maybe I'm missing something, but I've implemented a remote file syncing system using xattrs and I never came across a tool in the unix arsenal beyond find + getfattr which is painfully slow).
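The "build/maintain your own inverted index" step mentioned above could look roughly like the Python sketch below. It relies on `os.getxattr`, which is only available on Linux, and it assumes tags are stored as a comma-separated string in an attribute named `user.tags`. That attribute name and the helpers `read_tags` and `build_index` are illustrative assumptions, not an established convention.

```python
import os
from collections import defaultdict

TAG_ATTR = "user.tags"  # assumed attribute name; not a standard


def read_tags(path):
    """Read comma-separated tags from an xattr (uses Linux-only os.getxattr)."""
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError:
        return set()  # attribute missing, file missing, or xattrs unsupported
    text = raw.decode("utf-8", errors="replace")
    return {t.strip() for t in text.split(",") if t.strip()}


def build_index(paths, reader=read_tags):
    """Map each tag to the set of paths carrying it (an inverted index)."""
    index = defaultdict(set)
    for path in paths:
        for tag in reader(path):
            index[tag].add(path)
    return dict(index)
```

Lookup by tag then becomes a dictionary access instead of a scan over every file, but the index has to be rebuilt or updated whenever an xattr changes, which is the maintenance burden described in the comment.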


I made heavy use of OpenMeta (a community effort on OSX, before Apple introduced something very similar), which placed tags in xattrs. It is great! Retrieving never caused any issues or slowdowns.


This is really cool, I've been thinking about making something like this myself for a while! Design seems to be relatively well thought out.

I might be a bit too paranoid of file loss to install it though...


Then just use it with symbolic links?


If you find this interesting, check out Bear with its nested tags, which are executed brilliantly.


Why not just make a macOS compatible system, based on xattrs?


I think tags are a dead end because few users will make the effort to tag their files properly, at all times. Better to use the Google approach, and let ML/NLP do the job.


Users will not give their files proper names or put them into folders or tidy their desks or do the laundry either.

That is, some users won't do that, but by adapting everything to these users we are making everything worse for everyone, including them.


The problem is that everyone is one of these users at some point. E.g. when about to leave the office and receiving an attachment. Or at 3am when solving a tough problem.

Instead of requiring every file to be properly tagged, why not take the Google-approach altogether?


Perfect example. However:

I'm not trying to argue against search. I'm arguing against everyone having to rely on search because UX designers decided to remove every other way of finding files, just because someone might someday forget to tag a file (or whatever the procedure is).


and of course there's the requisite tag cloud



