Hi all, I started experimenting with this new approach to finding valuable content online after I got frustrated with traditional recommendation systems leading to echo chambers and all sorts of other nasty failure modes listed in the write-up. Unlike traditional recommendation systems, the lexiscore uses a language model which has access to your written notes. It estimates how surprising different content items are for you based on the model's perplexity in reconstructing the texts. This way, you can quickly tell which articles are likely to be too boring or too challenging and narrow in on the sweet spot of balanced skill and challenge.
I'm really glad you like it! It should soon get its biggest update yet, related to sharing features / learning in public. More details at the end of this last article: https://paulbricman.com/reflections/thinking-in-public
This looks really interesting, but the landing page is dense. It's tough to quickly work out what this actually does. I wonder if there is an opportunity to offer a condensed introduction ahead of all the detail? The README is a little better but still not totally clear.
Side note — the first link in the README 404s on GitHub.
tldr: You add content in a self-hosted web app (e.g. as RSS, PDF, EPUB...), and it measures how interesting it is by trying to reconstruct it with a GPT-3-like model and seeing how predictable it is.
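To make the "how predictable is it" idea concrete, here is a minimal sketch. It is not the lexiscore's actual code: it swaps the GPT-like model for a toy add-one-smoothed unigram model fit on your notes, and the names `tokenize` and `unigram_perplexity` are made up for illustration. The point is only that text resembling your notes gets a lower perplexity than text that doesn't.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real language model uses subword tokens.
    return text.lower().split()


def unigram_perplexity(notes, article):
    """Perplexity of `article` under a unigram model fit on `notes`.

    Assumption: add-one (Laplace) smoothing over the combined vocabulary,
    so words missing from the notes still get a non-zero probability.
    """
    note_tokens = tokenize(notes)
    article_tokens = tokenize(article)
    if not article_tokens:
        raise ValueError("article is empty")

    counts = Counter(note_tokens)
    vocab = set(note_tokens) | set(article_tokens)
    total = len(note_tokens) + len(vocab)  # smoothed denominator

    # Average negative log-probability per token, then exponentiate.
    log_prob = sum(math.log((counts[tok] + 1) / total) for tok in article_tokens)
    return math.exp(-log_prob / len(article_tokens))


notes = "language models predict text"
familiar = unigram_perplexity(notes, "language models predict text")
unfamiliar = unigram_perplexity(notes, "quantum chromodynamics lattice gauge")
print(familiar < unfamiliar)
```

Lower perplexity means the content is more predictable given what you have already written, which is the signal the tool turns into a score.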
Indeed, the page is more of a write-up than a landing page. Also, broken link fixed, thanks!
Nice. Got it. A demo instance would also be helpful in this regard. I always appreciate self-hostable tools and for that use case a demo is super valuable for getting a quick understanding of “what it is” as well.
This is amazing, I can’t wait to try this out. It’s insane how much brain bandwidth is required to optimize the internet for a positive impact rather than whatever the heck we have now.
Thanks for the kind words! I totally agree, I feel we'll look back a few years from now at how we currently find online content and wonder how we could possibly have been okay with it.
> Additionally, if one was to ponder how nutritional guidelines for content consumption might sound like for the public at large, they for sure wouldn’t advocate for reduced memetic variability across society, but would encourage individuals to independently stray off course once in a while, avoiding reaching a local optimum on a societal level.
This doesn't feel correct to me. In the analogy of avoiding a local minimum you are increasing the variability slightly so you don't get trapped, but the reason you get trapped is that you're aiming "higher", i.e. the overarching goal is to move from semi-random starting points towards 'better' in terms of the current metric.
So this appears to be talking about a plan for doing one thing, while claiming to do the opposite.
Though the fact that it's personalized may actually provide this, while also defeating the other purpose (i.e. it could lead you deeper into a flat earth rabbit hole by finding you exactly the content best placed to meet your need for more challenging flat earth content at that point).
Not sure how it can achieve both goals at the same time.
Thanks for the thoughtful reply! If I understood correctly, you argue that the lexiscore might still lead you down a rabbit hole, even if through ever more challenging ideas, rather than helping you find nutritious content outside that echo chamber.
If you're really "skilled at" or fluent in the flat earth perspective, then content which presents a wildly different perspective on the same topic would get labeled as highly nutritious (or at least this was the intention). I understand your concern, and see how "advanced" content from the same ideological background might make it there, though I think there must be a point where you can't get more advanced at it, following which related content will fall in the boring sector, leaving the conflicting takes in the sweet spot (in theory).
It's also relevant to highlight something mentioned in the discussion towards the end: actively moving across the diagonal channel (i.e. consuming both challenging content in familiar topics and accessible content in unfamiliar topics) might be a further improvement, though this is not explicitly implemented in this initial version of the thing.
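For readers following the boring / sweet spot / too challenging split discussed in this reply, a rough sketch of the bucketing might look like the following. The function name and both thresholds are hypothetical, picked only for illustration and not taken from the lexiscore.

```python
def nutrition_zone(perplexity, boring_below=20.0, challenging_above=200.0):
    """Map a perplexity value to one of three zones.

    Assumption: both thresholds are illustrative placeholders; sensible
    values would depend on the model and on the reader's notes.
    """
    if boring_below >= challenging_above:
        raise ValueError("boring_below must be smaller than challenging_above")
    if perplexity < boring_below:
        return "boring"           # too predictable given your notes
    if perplexity > challenging_above:
        return "too challenging"  # too surprising to follow comfortably
    return "sweet spot"


print(nutrition_zone(5.0), nutrition_zone(50.0), nutrition_zone(500.0))
```

Under this picture, the argument above is that content from a perspective you have fully absorbed eventually drops below the lower threshold, while conflicting takes on the same topic stay between the two.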
It could work if the extension called on a server to do the heavy lifting of reconstructing the text with the language model. I have a feeling that trying to do it all in the browser today would essentially mean implementing non-trivial parts of HuggingFace's transformers in JavaScript + TensorFlow.js or trying to compile PyTorch to WASM or something. Not the most enjoyable tasks, eh..
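As a rough idea of what the server side could look like, here is a minimal sketch using only Python's standard library. Everything about it is an assumption for illustration: the JSON shape (`{"text": ...}` in, `{"score": ...}` out), the helper names, and the `score_fn` callback, which stands in for whatever language-model scoring the server would actually run.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(raw_body, score_fn):
    """Turn a raw JSON request body into (status_code, response_dict)."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'text' field"}
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "'text' must be a non-empty string"}
    # score_fn is a hypothetical hook: plug the model-based scoring in here.
    return 200, {"score": score_fn(text)}


def make_handler(score_fn):
    class ScoreHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, body = build_response(self.rfile.read(length), score_fn)
            data = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ScoreHandler


def make_server(score_fn, host="127.0.0.1", port=8000):
    # Returns the server without starting it; call .serve_forever() to run.
    return HTTPServer((host, port), make_handler(score_fn))
```

Starting it would be something like `make_server(my_score_fn).serve_forever()`, with the extension POSTing the page text to it. The extension then only needs to make an HTTP request and display the returned score, and the model never has to run in the browser.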