I have written this before but I’ll put it here again. What I would like to see ...

TacticalCoder · on Sept 20, 2022

> Don’t like the results from one source? Just remove them from your sources, or lower their ranking.

That's basically Usenet killfiles and, yes, I think they're totally due for a comeback in one form or another. Usenet may have had its issues towards the end (although it still exists), but killfiles weren't one of its problems. With the simplest ones you could just discard sources you didn't want to read anymore, but with the more advanced ones you could assign weights/rankings based on various factors (keywords / usernames / whether or not you participated in a discussion / etc.).
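To make the killfile idea concrete, here is a minimal sketch of how blocking and weighting could be applied to search results. Everything in it (the `Result` and `Killfile` names, the fields, the multiplicative weighting) is an assumption for illustration, not how any Usenet client or search engine actually did it.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    source: str   # domain or username the result came from
    title: str
    score: float  # base relevance score from the search backend


@dataclass
class Killfile:
    # Hypothetical structure: sources to drop, plus per-source and
    # per-keyword multipliers (below 1.0 demotes, above 1.0 promotes).
    blocked_sources: set[str] = field(default_factory=set)
    source_weights: dict[str, float] = field(default_factory=dict)
    keyword_weights: dict[str, float] = field(default_factory=dict)

    def apply(self, results: list[Result]) -> list[Result]:
        kept = []
        for r in results:
            if r.source in self.blocked_sources:
                continue  # plain killfile behaviour: drop the source entirely
            weight = self.source_weights.get(r.source, 1.0)
            title = r.title.lower()
            for keyword, kw_weight in self.keyword_weights.items():
                if keyword in title:
                    weight *= kw_weight
            kept.append(Result(r.source, r.title, r.score * weight))
        # Highest adjusted score first.
        return sorted(kept, key=lambda r: r.score, reverse=True)
```

A user would build one `Killfile` and pass every result list through `apply` before display; the "did I participate in this discussion" factor could be one more multiplier of the same kind.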

arjenpdevries · on Sept 20, 2022

We like Federated search, we like decentralized search, and even P2P search; we are trying to find a good mix, and decided to get started rather than wait! Exciting times.

marginalia_nu · on Sept 20, 2022

What are the benefits from this?

I'm not trying to be dismissive, it's just that my feeling from working on search.marginalia.nu is that nearly every aspect of search benefits from locality. Not only is the full crawl set instrumental in determining both domain rankings and term-level relevance signals such as anchor tag keywords, but the way an inverted index is typically set up is also extremely disk-cache friendly: the access pattern for checking the first document warms up the cache for the other queries. That discount obviously only exists when it's the same cache.
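For anyone unfamiliar with the data structure, a toy in-memory inverted index looks roughly like this. It is only a sketch with assumed function names; it does not model disk layout or caching, and it says nothing about how search.marginalia.nu is actually implemented.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to a sorted list of the document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index: dict[str, list[int]], query: str) -> list[int]:
    """Return the IDs of documents that contain every term in the query."""
    postings = [index.get(term, []) for term in query.lower().split()]
    if not postings:
        return []
    result = set(postings[0])
    for posting in postings[1:]:
        result &= set(posting)  # intersect the postings lists
    return sorted(result)
```

Every query term touches one postings list, which is why keeping all the lists behind the same cache matters once the index is split across machines.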

boyter · on Sept 21, 2022

You could get people creating indexes with love, such as your own. Marginalia could become the de facto index for long-form content. However, you probably aren't that interested in running the best pokemon website, so someone else could do that.

If enough people add domain-specific search endpoints, perhaps with a taxonomy to say "hey, send those sorts of queries over here", you have a compelling engine that self-heals should someone stop running things or start spamming.

arjenpdevries · on Sept 21, 2022

Yes, that is an advantage.

You can also integrate search results for which you cannot have the index, like social media APIs, which is another reason.

You could also mix and match search results from various topic-oriented indices. That's a research question, whether that is really better than building one unified one. But we think it is the way to bring index fragments to the edge, with the obvious privacy advantages.
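As an illustration of what "mix and match" could mean in practice, here is a minimal sketch that merges ranked lists from several indices using reciprocal rank fusion. That technique is a choice made for this example, not something the project has said it uses; the function name and the constant `k = 60` are assumptions as well.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked URL lists into a single ranking.

    Each list contributes 1 / (k + rank) to a URL's score, so URLs that
    rank well in several indices rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Because only ranks are used, the indices do not need comparable relevance scores, which is convenient when each fragment is run by someone else.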

hkt · on Sept 20, 2022

I would love to be able to run a node that mirrors part or all of an index like this, and to let people query it - a bit like https://torrents-csv.ml/#/

Good luck! I'll be watching your progress and cheering you all on!

melony · on Sept 20, 2022

What's the point of a federated search engine? At the end of the day most nodes will end up implementing the same regulations/censorship with development driven primarily by a few. It's like ethereum vs ethereum classic all over again. If the EU or the developers' respective governments demand a censorship or forgetting feature to be implemented, it's not like the federated nature would matter. An open source search index is useful, a search engine that can be easily self hosted is also useful. But building a search engine as a federated system is a gimmick with no significant value.

Do you see any major Mastodon nodes interfacing with Truth Social or Gab? I certainly don't. If federation barely works for a social media app, I fail to see how it would even matter for a search engine.

fabrice_d · on Sept 20, 2022

At least one of the partners (https://openwebsearch.eu/partners/radboud-university/) does research on "federated search systems", so there's hope!

cookiengineer · on Sept 20, 2022

Isn't searx what you're describing? I was running an instance for a while, and it's basically a meta search engine that has support for all kinds of providers.

There are also some web extensions available so that you can fill it with more data.

[1] https://searx.github.io/searx/

vindarel · on Sept 20, 2022

I'd say it rather looks like Seeks, unfortunately defunct: https://en.wikipedia.org/wiki/Seeks

> a decentralized p2p websearch and collaborative tool.

> It relies on a distributed collaborative filter[6] to let users personalize and share their preferred results on a search.

boyter · on Sept 20, 2022

Searx is half of it where it calls out to other searches but does not provide its own index as far as I can see. It also does not remix the results.

cookiengineer · on Sept 21, 2022

If it is about a decentralized index there's also YaCy.net [1] but I don't know how actively maintained the project is.

To me it seemed aimed more at an enterprise-grade use case (e.g. for building a search for your own file servers or Confluence), so I only tested it out a little. It's a huge Java project, and that's why I decided to go with searx back then, because YaCy was pretty hard to set up.

[1] https://github.com/yacy

asim · on Sept 20, 2022

One of the things I wonder here is if it would be easier to just start by crawling known RSS feeds and then exposing a JSON API for the data and making the whole thing open source. Then keeping a public list of indexes and who crawls what. Eventually moving into crawling other sources but first primarily addressing the majority of useful content that's easily parseable.
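A rough sketch of the first step, turning an RSS 2.0 feed into JSON-ready records using only the Python standard library. The function name, the record fields, and the sample feed are made up for illustration; a real crawler would also need fetching, deduplication, and Atom support.

```python
import json
import xml.etree.ElementTree as ET


def rss_to_records(xml_text: str) -> list[dict]:
    """Extract title, link and description from each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return records


# Hypothetical feed content, standing in for a fetched document.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://example.org/a</link>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.org/b</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    # This is the payload a JSON API endpoint could return.
    print(json.dumps(rss_to_records(SAMPLE_FEED), indent=2))
```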

boyter · on Sept 20, 2022

That's probably the easiest way I know to get good content into a search engine. Annoyingly however it does not contain all the content available.

googlryas · on Sept 20, 2022

What benefit does federation bring here? Unless it is very simple to set up, most communities are non-technical and probably won't be able to set up their own crawler. I would think just a search engine that lets you customize the ranking algorithm, and maybe hook into whatever ontology they've developed and rank results accordingly, would be sufficient.

viraptor · on Sept 21, 2022

> most communities are non-technical and probably won't be able to set up their own crawler

They can use a solution which already integrates the search. Forums and CMSes are a good target for that. Then you can say "I'd like my search to look at widgetlovers.com too" - and you get their sitemap + featured external links, because they run FooPress that supports it.

Kind of the same as the sitemaps we already produce for Google.

boyter · on Sept 20, 2022

It can be very simple to set up. Think single binary to run, or lambda to deploy (yes this is possible) with the URL back to it.

I imagine a binary with a simple admin UI that lets you crawl some domains recursively would be enough to index your own website, and then have those results shared.

Where I could see this being really useful, is let someone who knows everything about pokemon provide the index for searching pokemon information. Then when they federate, provide a taxonomy saying "for queries that have these words, call me". Suddenly you have a very high value search source for pokemon.

Throw in some zero-click information boxes and you have added a lot of value.
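The "for queries that have these words, call me" part could be as simple as the sketch below. The function name, the shape of the taxonomy (endpoint URL mapped to a set of keywords), and the example URLs are all hypothetical; no such protocol exists yet as far as this thread goes.

```python
def route_query(query: str, taxonomy: dict[str, set[str]], default: str) -> list[str]:
    """Pick the endpoints whose advertised keywords overlap the query terms.

    Falls back to the default (general-purpose) endpoint when nothing matches.
    """
    terms = set(query.lower().split())
    matches = sorted(
        endpoint for endpoint, keywords in taxonomy.items() if terms & keywords
    )
    return matches or [default]


# Hypothetical taxonomy published by two federated nodes.
TAXONOMY = {
    "https://pokemon.example/search": {"pokemon", "pikachu"},
    "https://longform.example/search": {"essay", "longform"},
}
DEFAULT_ENDPOINT = "https://general.example/search"
```

Calling `route_query("Pikachu evolution", TAXONOMY, DEFAULT_ENDPOINT)` would send the query to the pokemon node only, while an unrelated query goes to the general endpoint.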

grishka · on Sept 21, 2022

ActivityPub is not well suited for this application. It's for publishing activities made by actors — hence the name. You'll want to invent your own federation protocol specifically for federated search.

boyter · on Sept 21, 2022

Last time I checked there was something in there for search...

Even so, you could base it on ActivityPub, I suspect. It would need to be extended for sure to implement the sorts of things I believe would be required.

camel-cdr · on Sept 21, 2022

They've listed "DECENTRALISED SEARCH" as an ongoing project/goal.