I love the emphasis on data export / analysis and ownership of your data. I also love the quantified self aspects of this.
However as I've gotten older, I've learned that it's incredibly freeing to at least partially let go of some of your data and not worry if it gets lost or erased.
Tracking and managing all of this sucks up a lot of time and mental effort that you could be using to progress in other areas of your life. I prefer to focus on the present instead of letting the past occupy so much of my mind.
On one hand, keeping backups of important files and automating that as much as possible is good. Tracking your top 2-10 most important habits (e.g. sleep, workouts) can also be very good.
But for social media sites (like Reddit, HackerNews, Twitter, YouTube, etc), it's just too much of a pain to manage. And they can kick you off in a heartbeat for any reason. I personally treat all content on those sites as potentially being erased at any time.
If I have anything important (e.g. a good post, comment, or image that I want to save), I make sure to save that locally, usually in a personal outliner/wiki that I use called zim-wiki (it's like Evernote/Notion/Obsidian, but it's all just flat wiki-formatted plain text files and folders that are easily greppable and scriptable). You can also post the best content on your own site. None of this requires regular social media exports, APIs, or any programming - you can just copy/paste/save this content as you create it into your own categorized section of your filesystem or PIM tool.
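To make the "greppable and scriptable" point concrete, here is a minimal sketch, assuming the notes live as .txt files somewhere under one folder. The function name and the folder layout are hypothetical, not something zim-wiki itself provides.

```python
from pathlib import Path
from typing import Iterator, Tuple


def search_notes(root: Path, needle: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for each line containing `needle` (case-insensitive)."""
    needle = needle.lower()
    for path in sorted(root.rglob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path, lineno, line.strip()
```

Something like `for hit in search_notes(Path.home() / "Notes", "sleep"): print(hit)` would then list every matching line; the path is only an example.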
> However as I've gotten older, I've learned that it's incredibly freeing to at least partially let go of some of your data and not worry if it gets lost or erased.
I'm ok with people opting to let it go.
What's uncool is how much effort it takes, how ridiculous it is to opt to do anything other than let it go. To marshal oneself & one's data should not be a huge sprawling saga, fought for tooth & nail every step of the way. Resisted at every step.
This is not an ok condition for humanity to try to live under. As Geohot said[1], we are coming to live in an eternal prison. Providing the advice to let it go, find happiness in these conditions, is what I hear again and again and again on this site, and I'm mad. I'm mad people advocate for giving up agency, will, control, for what feel like the most core genuine & true pieces of humanity. I'm ok with the option, absolutely, but only as a choice. And today there is no choice at all.
The problem that I have is that three years down the track I will suddenly think of a topic and remember just a few details of a really good blog post or comment thread that I would like to read again, but I can never find it.
I'm just not very good at knowing in advance which is the 0.001% of posts I will want to read again.
Storage is so cheap that in my ideal world, I would like to store everything I read and post. But that's difficult as you mentioned.
Hello author, hope you don't mind the negative comments since this isn't for everyone including myself. I found your dedication to organization very impressive and only shows that you'd do the same professionally.
I don't really see them as negative (or at least don't get offended). It seems that people are on a spectrum of organization from not even using a calendar and never taking notes ("if it's important I'll remember") to what I do (and maybe even more extreme??). And it's okay! Some people suffer from information overload, some people suffer from inefficiencies and forgetting things, and you can be productive/unproductive, or happy/unhappy either way -- there are many other factors at play too.
2) They are really well structured and start off with a hypothesis or premise
To elaborate:
1) On the spectrum you mentioned I'm more on the organized side, though not as extreme as you. A lot of my colleagues are really disorganized and they picked up some pointers from me and our collaboration has become much more efficient by them just taking a few more notes and sharing instead of me always starting at 0 (or 1) when we start a conversation. Everybody should find his own comfy spot on the spectrum, but people should try to not make others miserable by relying entirely on external knowledge.
2) Recently I read a few books which follow this structure (which is also the standard for scientific papers) and it's just so refreshing compared to the usual brain dump blog post. I do enjoy some of these brain dumps, especially if they are more for entertainment and less so for teaching, but sometimes it is exhausting if a blog post aims to answer a question, but it's not clear where it's going in the beginning and you just keep on reading.
Thank you! Glad you like the structure too -- sometimes I wish I could do more free-form writing too, but so far it was more of "have a problem -- convince yourself it's important to solve -- come up with solution -- document a narrative how you did it" :)
So I've been waiting patiently for something equivalent to Home Assistant but for QS and you pop up often. You're probably most expert to say what progress is being made for an integrated QS platform whereby you plop in a new integration and, even if there are a few manual steps (exporting, running a local software tool, etc.) you easily can import, slice on, and chart your data in a time series DB (similar to how with InfluxDB and Grafana you can do with Home Assistant). There were a few open source tools out there on GitHub but the ones I found were dated or didn't seem to have much love or enough of an audience to support them.
OH! I just saw you are the author of HPI which I have starred - it certainly was as close as I found but I pined for more when evaluating it about half a year ago. How is progress? What other wise words can you lend?
As for progress, I'm experimenting with automatic InfluxDB/Grafana integration right now, actually [0]! I think ideally HPI would be able to automatically create an InfluxDB mirror + 'reasonable' Grafana dashboards for each data source, by using type information.
And of course, it would also be possible for the user to create custom dashboards (from Grafana, or from python code which would automate the manual Grafana steps).
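A rough sketch of what "using type information" could look like, not how HPI actually does it: the record class and the mapping rules (strings become tags, numbers become fields, the datetime becomes the timestamp) are assumptions made for illustration. The output is InfluxDB line protocol text, so no client library is involved.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass
class SleepEntry:  # hypothetical record type, not an actual HPI class
    dt: datetime
    device: str
    hours: float
    awakenings: int


def _escape_tag(value: str) -> str:
    # Line protocol requires commas, equals signs and spaces in tag values to be escaped.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement: str, record) -> str:
    """Render one dataclass instance as a line protocol point, driven by field types."""
    tags, values, ts = [], [], None
    for f in fields(record):
        value = getattr(record, f.name)
        if isinstance(value, datetime):
            # Nanosecond timestamp, computed with integers to avoid float rounding.
            ts = int(value.replace(microsecond=0).timestamp()) * 10**9 + value.microsecond * 1000
        elif isinstance(value, str):
            tags.append(f"{f.name}={_escape_tag(value)}")
        elif isinstance(value, bool):  # checked before int, since bool is a subclass of int
            values.append(f"{f.name}={'true' if value else 'false'}")
        elif isinstance(value, int):
            values.append(f"{f.name}={value}i")
        elif isinstance(value, float):
            values.append(f"{f.name}={value}")
    if ts is None or not values:
        raise ValueError("record needs a datetime and at least one numeric field")
    head = ",".join([measurement, *tags])
    return f"{head} {','.join(values)} {ts}"
```

The appeal of this shape is that a new data source only has to declare its record type; the mirroring code stays generic.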
For something Memex-like, Grafana probably won't be enough. But I'd like to use existing open source tools to the maximum extent possible -- Influx/Grafana/Jupyter/etc. are all awesome, well optimized and powerful instruments, and I don't want to reinvent the wheel.
Thank you for sharing this! This genre of software tool is something that has been on my mind since at least 2010. Here* is a quick summary of my idea. I even applied to the first YC Fellowship in 2015 with the idea, but I didn’t make it past the application. I’ve come across many related ideas over years of reading HN (the general consensus is to just use org mode, so far I’ve been strictly a vim user xD), but this post is particularly inspirational!
I got my first job in software in 2017 and to this day I’m growing constantly. Before this first job I struggled to keep the motivation to practice and work on projects. I finally have the abilities to make projects come to life, and I’ll definitely be studying your project carefully as I get going on building one of my own.
*https://web.archive.org/web/20160806063444/http://mindengei....
What's "MindEngei"?
Engei (園芸) is the Japanese word for gardening. I plan to build applications that help us organize our thoughts, experiences, and knowledge in a pleasant way. Too much of what we learn and experience gets sucked up by the applications we used and maintaining that information is either a hassle or impossible. I want to make applications that log everything we want to remember in a simple and relaxing way, allowing the user to pull in data from services, take notes, and document knowledge and provide quick and pleasant ways of viewing and sharing their content.
I find it weird to tell people which exact tools to use.
I also really enjoy knowledge management and I think I would really love org mode once I figured it out, but I just never got around to it. I am a Sublime user and really enjoy it, so I found other tools for my use cases and so far I am happy with the results.
Could org mode provide me more value? Maybe. But it's not the point.
Somebody might be taking more notes than others, because they are used to it, or because they need it, and it's totally okay and people should do what works for them.
If your data was truly yours (data affiliation vs data ownership) this would be even less of a problem.
Hmm, it's hard to quantify it exactly, but overall there isn't that much maintenance required once it's all set up (except for the few manual bits). Let's say it could all run for months without needing any intervention. Maybe one or two data sources would stop exporting due to an expired token or some API weirdness, but the downstream services that use the data will carry on running against older data.
It's possible due to the design: some general 'architectural' decisions [0]: separating the data retrieval from data processing, avoiding databases where it's not absolutely necessary [1], 'defensive enough' error handling [2]. (but naturally I fix small things here and there when I notice just like with other chores)
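A minimal sketch of those three decisions together, assuming the exporter is a separate job that has already dropped JSON files into a folder. The function names and the folder layout are hypothetical, not taken from the linked posts.

```python
import json
from pathlib import Path
from typing import Iterator, Union

Res = Union[dict, Exception]


def read_exports(export_dir: Path) -> Iterator[Res]:
    """Processing step only: reads plain files on disk, never talks to the remote API."""
    for path in sorted(export_dir.glob("*.json")):
        try:
            yield json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as e:
            # Defensive: pass the error downstream instead of crashing the whole pipeline.
            yield RuntimeError(f"failed to parse {path.name}: {e}")


def good_items(results: Iterator[Res]) -> Iterator[dict]:
    """Consumers that don't care about errors can simply drop them."""
    return (r for r in results if not isinstance(r, Exception))
```

Because the consumer only ever sees files, an exporter that stops working just means the folder stops growing, and everything downstream keeps running against the older data.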
I do spend quite a lot of time on it, it's sort of a hobby at this point (e.g. to add/connect new things, experiment etc.). But, lately most time was spent documenting, releasing different parts, and writing about it to share with people -- this is much more time consuming (and certainly less fun than tinkering). I'd like to reach a certain milestone and get some rest from it, switch to something else :)
> if you were to start over with a new organizational methodology, what would you do differently knowing what you know now?
Frankly, I'm not really regretting doing anything the way I did it (at least for now) -- most things make sense to me and what I implemented is a 'dependency' for more sophisticated methodologies anyway, so hopefully no effort wasted here. I wish I had arrived at the principles I described in the links sooner; for a while it was a 'random walk' without a clear vision of how to compose and connect things.
This is awesome. Unfortunately, I'm very unlikely to be able or inclined to go this deep. However, I've been collecting both digital and physical assets that I have used to build up a museum for the family. Digital is easier, physical items are much harder.
It all started way back when I left my home-town for the bustling city of Bombay as the last century was ending. When I visited my home-town after almost a decade, I realized that my tapes, floppy discs, and other artifacts, including some hand-written mail from Bollywood actors/actresses, had been lost. Ever since, I've tried to hold on to interesting objects.
The floppy disc was particularly interesting as it had some of the first programs I wrote, including a stupid QBasic game that I wrote for my first girlfriend.
This is pretty amazing from an analytical/engineering standpoint.
OTOH, it serves to justify my natural inclination to limit where and how often I participate on sites and/or particular media (e.g. chat apps).
It's not just a matter of data/privacy, either. It's simply a matter of attention, available time, and the best use of that time.
I went from BBSs, to Usenet News, to (mostly) niche sites on the web. The tendency of all of these to focus conversations made them (and make them) more useful to me - better uses of my time and attention - than social media.
I pretty much only comment on this site, and on reddit, and only sporadically on either. I pretty much ignore almost all social media. Deleted facebook. Haven't regretted it for a moment.
I understand that social media provides real value - in certain situations - for people. I just don't need it, personally.
Big fan of yours. I think this is the second time I’ve come across your site on HN. I’ve been getting deep into developing my own Second Brain lately. I glanced briefly at your Exobrain Repo and it seems you’re using a custom system. Have you tried out using https://obsidian.md or the whole Wikilink/Zettelkasten Personal Knowledge Management methodology? Curious to hear your experiences.
Yep, exobrain is using org-mode export + some tweaks. I've settled on emacs + org-mode a few years ago (perhaps on the fifth attempt it finally clicked). Before it was gitit/Zim/sublime text/random scripts, whatnot. All the shiny apps like Obsidian/Roam/etc have appeared over the last couple of years (and it's awesome!). But I'm too hooked onto emacs now, maybe the only thing I should use more is org-roam [0]. For me the most useful org-mode features perhaps are tags, agenda, org-capture and org-refile. Search is very important, I have a whole post about it [1].
I started a draft describing my 'process' [2], but it's pretty incomplete.. there are some bits scattered across exobrain too.
Also lately I started playing with Logseq [3] to get a more interactive representation of exobrain [4] (warning, it's pretty heavy, needs some optimization and I might have messed up the physics). Logseq could be a great gateway for people who want to get into org-mode but aren't ready to go 100% Emacs, highly recommend trying it!
There are moments that I'm tempted to go down a similar organizational road because I'm curious about the possible lessons from the data aggregation, but I usually pass because the payoff didn't seem large enough.
So that makes me want to ask: Have you learned much from the data? or is it more about organization/keeping control of life?
The "quantified self" bit is what you could call learning, but ironically so far I've mostly failed to find any significant correlations in my personal data (which is also an interesting finding, in a way :) ). However I did learn a lot about nutrition/exercise/sleep while tinkering with data, so it was totally worth it so far.
Thank you so much for sharing your work. Finding this post was like finding something I didn't know I was looking for.
Have you considered monetizing this in some way? As a person of limited technical skills, I would definitely pay for a set of tools that would help me aggregate my data.
- I don't really see a good way (not that I'm entrepreneurial in this aspect though), maybe except for providing some infrastructure etc
- dealing with others' data is a completely different kind of headache with legal and security responsibilities
- I don't really have a product or anything -- more of a vision which I want to communicate, a bunch of patterns, techniques and repositories. Maybe know some esoteric gotchas.
- It's a very niche thing to think about and 'want' in the first place (although I am trying to change it through my communication :) ). Most 'regular' people don't do backups, let alone this!
But it would be very cool if someone comes up with a business model to provide such service.
Can I ask - how did you find working with graphviz? Was this your first time working with it, and how long did it take you to figure out how to place things and draw?
I am working with drawing graphs at the moment and evaluating what library to use. It's coincidentally nice to see what you built, and it also seems like it was thoughtfully laid out (fewer edge crossings etc). I am trying to do something more dynamic so it may not be as applicable for me.
I am looking at your code as well, thanks for providing that
Not the first time, but possibly the biggest thing I've drawn in it...
There definitely are some weird things when you try to plot complicated things, fighting with weird placement, clusters etc. But not sure if it's me or Graphviz to blame for this. But I don't really know a better tool. If I had known in advance how the diagram would look, I might have drawn it manually in Inkscape or something, but when I started I didn't know what I would end up with, so it needed to be an automatic tool :)
Working with graphs, I realized it's hard to generate dynamic ones and you inevitably end up with domain specific layouts. I guess that's the nature of graph drawing algorithms: they're usually specific to the problem.
Probably too messy to understand what's exactly happening -- but my main takeaway is that if you implement your own DSL (for Graphviz at least), implement in such a way that you can freely mix DSL and raw bits. That way it's very easy to experiment or tweak minor bits without rewriting half of the code.
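One way to read that takeaway, as a sketch rather than the code from the actual repository: if every DSL helper just returns a string of DOT, then raw DOT fragments are also just strings and can sit right next to the helpers. All names here are made up for illustration.

```python
def node(name: str, **attrs: str) -> str:
    """Render one node statement; keyword arguments become DOT attributes."""
    rendered = ", ".join(f'{k}="{v}"' for k, v in attrs.items())
    return f'"{name}" [{rendered}];' if rendered else f'"{name}";'


def edge(src: str, dst: str) -> str:
    """Render one directed edge statement."""
    return f'"{src}" -> "{dst}";'


def digraph(name: str, *items: str) -> str:
    """Wrap statements (DSL output or raw DOT, both plain strings) into a digraph."""
    body = "\n".join(f"  {item}" for item in items)
    return f"digraph {name} {{\n{body}\n}}"


dot = digraph(
    "exports",
    node("phone", shape="box"),
    edge("phone", "backup"),
    "rankdir=LR;",  # raw DOT mixed in freely, no DSL wrapper needed
)
```

Tweaking a layout detail then means adding or editing one raw line instead of extending the DSL first.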
If you need to be able to diff the code for graphs, graphviz (dot language) is great.
If you need auto-layout, it's hard to do much better than graphviz. The creators of graphviz have some good papers on what it takes to do that right. They'll discourage you from wanting to re-invent that particular wheel, if you read them. Tricky bit here is, depending on your exact graph and constraints, it's not exactly fully solvable in the general case, if you want to have no overlap or no crossing lines under any circumstances whatsoever. Still, you'll struggle to do better, and there aren't actually a ton of implementations out there other than Graphviz that are anywhere near as good (at least, as of a couple years ago).
Templating or building programmatically? Great, it's just text. Generating on demand? Decent, it's fairly fast.
If you need pretty, and I mean really pretty, and especially pretty and interactive... well it's less enjoyable for that, but then nothing that is enjoyable for that purpose is much good at the other points above.
Thanks for that, will try out graphviz. I was doing some custom implementation for minimizing edge crossings and it seems to work fine for me but that's mostly for 2-layered graphs. If I am to expand to multi-layer graphs, it's going to be a fair bit more complicated to figure out how to do that and I don't know if it will work well.
Graphviz seems to be great for getting you a mostly-useful auto layout the majority of the time.
I have a requirement to be able to mix fixed-positioned nodes along with dynamically changing nodes, and I haven't used graphviz enough to know if it could do that. A library like cytoscape works ok or maybe d3 (I am still evaluating these)
Honestly coming into this, I thought graph drawing would be fairly easy and I could look up examples online - but it seems to be quite the opposite, it's really complicated to get graphs that are both dynamically generated and keep nodes positioned at the same place in case there are similar nodes
I believe my app (launching soon) is close to solving your "what do I want" but currently wrapping up the API. Are there some other communities, links to other blogs or people that are doing what you are doing? (Tracking this much data, etc.)
For exporting and owning the data the keyword is "data liberation"
Otherwise, the keywords are perhaps "personal knowledge management" and "memex". Have some links here, although they all have different components in scope https://beepb00p.xyz/exobrain/memex.html#cmmnts
Which of these data-streams do you most often find yourself looking at & using? Are there any that you have been surprised to find yourself using more than you expected?
The most useful are perhaps the ones I'm using for search [0] -- chat logs, web annotations, book highlights, reddit/hackernews saves, tweets, etc. It's literally my external memory, I'm not exaggerating.
Everything in Promnesia [1] is super useful, makes my internet lurking much more ... efficient? This word has a somewhat bad rap, but it is what it is -- doesn't mean I don't have fun, just makes it easier to keep track and discover new things.
Not sure about surprising data sources -- I'm trying to integrate them all as far as possible and make sure they work for me passively, so I don't have to think about specifics :)
Yes, it's complicated and this is exactly one of the points of this map, to illustrate how ridiculous it is -- that I have to jump through all these hoops to do something useful with the data that's already somewhere in my phone/was in my browser, but hidden and siloed (from myself!) for various reasons.
Fair enough. The effort and quality of your work is apparent - well done.
I guess for me, I would have opted for greater inconvenience and simply let some of what you graphed slide, rather than deal with the mental overhead of actually tracking it all. But I would also not have enjoyed building that graph. :)
Thanks :) The thing is -- often it's very hard to know in advance how much time you would spend on something. You think "ah this would be kind of a nice feature to have/question to answer... hmmm it'll probably take only an hour", and then you spend weeks doing that, and something else related, etc. Sometimes you do wish you hadn't bothered in the first place, but then it's also sunk cost so it's tempting to finish.
You know what's crazy -- I started with simply trying to improve my sleep (so exporting data from a Jawbone bracelet) and wanting to be a bit more efficient at learning physics (and other things) in my spare time. I did learn some physics (and other things) thanks to all these systems, but certainly went down a massive rabbit hole too :)
Is it really that complicated? The map is pretty much straight lines from top to bottom. This is like saying that the subway system is too complicated. It is simply a series of tubes with stops. The map simply allows someone to view the entire system at once.
If you are doing everything by hand, perhaps. But if you are working at a higher level of abstraction using something like Apache Airflow, it will take care of making the map for you.
I mean, you can also just let the data go. "nothing, chaos" is pretty simple. I don't believe being able to search my chatlogs from 2005 would make my life simpler, or better.
Some people are natural archivists and love that second-brain stuff though, and this is a pretty sweet example :). One area I strongly relate to Karlicoss is how bad it is getting data out of my bank; as a regular non-institutional customer I don't get API access, so I have to download a PDF manually every month and run it through some script to parse out the line items.
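For what it's worth, the "run it through some script" step can be quite small. This is only a sketch: it assumes the PDF has already been converted to text (for example with pdftotext) and that each line item looks like `2021-03-01 COFFEE SHOP -3.50`, which is a made-up layout; a real statement would need its own pattern.

```python
import re
from datetime import date
from decimal import Decimal
from typing import Iterator, NamedTuple


class Item(NamedTuple):
    day: date
    description: str
    amount: Decimal


# Hypothetical line layout: ISO date, free-text description, signed amount with two decimals.
LINE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\s+(.+?)\s+(-?\d+\.\d{2})$")


def parse_statement(text: str) -> Iterator[Item]:
    """Yield one Item per line that looks like a transaction; ignore everything else."""
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m is None:
            continue  # headers, footers, page numbers and other non-item lines
        y, mo, d, desc, amount = m.groups()
        yield Item(date(int(y), int(mo), int(d)), desc, Decimal(amount))
```

Decimal is used instead of float so that summing the amounts doesn't pick up rounding errors.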
I have friends who have died and for whom I’ve lost our chat logs (some back from the AIM days). I also carelessly lost my home dir from college with my thesis and thesis code/data in it. (Tar, gzip, ftp in ascii mode, don’t check for months.)
Is my life meaningfully worse for those losses? No, not really. Have there been several times when it would have been better (even just temporary happiness/comfort) if I had access to the data? Definitely yes.
just because he has passion and conviction about a cause linked to leaving a legacy doesn't mean it follows any... cultishness.
I myself am a bit of an exobrainian but not as well organized or anywhere near as far down the path as Karl is. it's a noble cause to not want to dismiss what never was capturable before; conversations between friends and family that we can harvest, replay, or add as a chatbot to have shell versions of deceased loved ones.
I didn't mean "religious" in the bad sense. We all have some topic that incites that part of our brain and makes us hold long speeches about our ideals. On a more positive level, it also gives us the determination to actually follow through with those ideals. His "itch" seems to be gathering data.
If one gathers data for no other reason than it's useful, you don't randomly start talking about the difference between man and animal or how all turns to wind. Those are more biblical sentences.
still searching for my earthseed alike principles for sure but i understand the values, the balance at play. obviously based on highly negatively biased votes & comments, but something i feel more concretely, is that this obvious yet occluded truth still needs better means to be focused upon.
tbh weird times that there are so few religions that have sprung up and shut down in my lifetime. what days that we have an example, one[1]! tech is all about interlink, embiggening of human spirit & potential, which mirrors so closely the interlink of spirituality & religion & essences of man.
I am going to suggest that calling somebody a stupid animal over something so trivial is an excellent opportunity for you to log off and call a distant friend / family member, read a book, play with the dog, etc.
Even Karlicoss would agree that this stuff is pretty small potatoes compared to the vast complexities of human life (99.999% of which don’t exist on the internet, even in messy archival form).
What the chart describes strikes me more as a prison than transcendence. Not all complexity is order, and not all chaos isn't simple.
However, much of the wrangling could relate to contractual data preservation (like client communications), which means we don't have enough information to decide whether it's "too complex" for remuneration received. Doesn't mean the grandparent comment is wrong, though.
Having the capabilities, to me, implies nothing. What you do with gathered information is up to you. No one else has any capacity, capability, to gather, to understand, to begin to consider. We all live in eternal, scattered, not just unconnected, but unconnectable, un-understandable forms. The possibility should exist to try to harness our many forms. The digital should be a channel to explore, view, gather, from which understanding might possibly begin. But we have only ephemera, our thought & action owned inside other people's clouds.
I've exported my data from 35 services, most via GDPR/CCPA exports. I've written a couple scripts to automate some, but it's not painless. I hope to eventually have a local-first unified store of my data that's easy to analyze.
As a part of this, I've been working on a Go library [0] for processing local Firefox- and Chromium-based browser data, including history, bookmarks, and extension settings.
To me, it just seems like the author wants to hoard their data, not make it useful. Making it useful (the first part of which is making it queriable) involves making some kind of interface to access it, and some code to parse it. In my own case, I find that just dumping JSON docs around the place gives me an unmanageable mess of unstructured data that I never come back to. If all you want to do is to hoard your data, then I guess that achieves the purpose... But if you want to do something with it, at some point you have to make sense of it.
Doesn't mean you have to go and fully normalize everything and make a production-level database!
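A sketch of that middle ground, under the assumption that each exported JSON doc has (or can be given) a `source` and a `ts` key, both of which are hypothetical names: keep the raw document as-is and pull out just a couple of columns to query on.

```python
import json
import sqlite3
from typing import Iterable


def build_index(docs: Iterable[dict], db_path: str = ":memory:") -> sqlite3.Connection:
    """Store each doc untouched as JSON text, plus two extracted columns for querying."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS docs (source TEXT, ts TEXT, raw TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?)",
        ((d.get("source"), d.get("ts"), json.dumps(d)) for d in docs),
    )
    conn.commit()
    return conn
```

After that, something like `SELECT source, COUNT(*) FROM docs GROUP BY source` already answers "what do I even have?", and more columns can be extracted later as questions come up.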
Thanks a bunch for sharing this. I think the way we move forward from here is more posts like this. It's an uphill battle where companies are actively adding friction to people being in control of their own data, but together we can do it :)
...Personally I just decided to boycott services that are smartphone-only. My last mistake was Revolut. Even the (arguably non-existent) process of getting a yearly statement from them is a goddamn joke.
I really hope my next phone will be a real Linux phone like Pinephone/Librem (or apparently you can even install Linux on my Oneplus 6T), so I could run android-only stuff in a VM and have a real computing device in my pocket.
Camera integrated with glasses that you can wear for 8 hours without charging + home box that post-processes this into useful archive.
I have a Vuzix Blade (https://www.vuzix.com/products/blade-smart-glasses-upgraded) that could work for MVP.
A drone the size of a baseball with a camera that floats around you and records your actions (it doesn't record every action, but it can recognize certain ones: e.g., reading, running, sleeping). The recorded actions go to a sqlite db. It should be extendable in terms of recognizing actions in 3D (perhaps some ML is needed to achieve better results).
Yep. Even simpler (and less 'creepy'), e.g. OCR against your own screens, in theory, could provide quite a lot of data for free. But not very practical yet, unfortunately?
big "Technology without Industry"[1] resonance! big "Computer Liberation / Dream Machines"[2] energy. we can do & become anything. we can consider ourselves, gather ourselves, understand ourselves, review ourselves, try, strive, in our imperfect ("Everything is broken, and it's ok"[3]) forms to become more.
there are a lot of questions here. a lot of doubts. but few embrace the challenge. that to me is the resounding, deep silence of this era: so many great, lovely, wonderful people all about. but so few who face, head on, the moral quandary of so much of ourselves being owned by corporate data centers, dispersed, un-usable. that's not even the point, the core. rather than highlight the negative, ask: what faculties, what capabilities are we gathering for ourselves?
Tim O'Reilly said back in 2012, "create more value than you capture"[4]. few are focused, as Karlicoss is, so clearly, on creating value. on making it usable, on freeing it from its many, tight, narrow confines. genuine computing has so few advocates. i think wistfully of Tim's opening quote,
"the skill of writing is to create a context in which other people can think" - edwin schlossberg
to which tim elaborates:
"a lot of what i've done is to frame things in such a way that other people can see what's important about them"
and none of our digital systems enable this. they all have an egocentric perspective, a monopolistic/platform/mainframe bent, a desire to own, control, grow prowess over the (limited domain of) data that they contain within, and to not let it free. not let it connect. they do not try to open the doors of perception, they do not try to empower and enable. they offer us all a fixed, finite set of powers, and hope to grow by out-maneuvering every other monopoly of thought. no big player is out there, saying, let us organize all the information however the users want to see it. no big player is out there saying, we love you, how can we set you free.
the world direly needs the energetic, the hopeful, those who see it all, want to drink it all up, take in a much broader perspective of life & the world & data & online-ness & existence. the digital, alas, has sapped us of the connected hopeful, rather than enable that spirit of infinite endless creativity. mapping ourselves, gaining some mastery over who we have in fact already become, gaining insight into our digital footprints: it's such a logical basic simple first step, such an untapped starting place, that these endless digital regimes have never given us any glimpse of. no one is freer. we all have to free ourselves.