Show HN: Docfd: TUI multiline fuzzy document finder

nico · 2024-04-02T12:32:06

This looks great! Thank you for putting it up on GitHub and sharing

Have you tried it with log files? It would be great to have this kind of search when hunting things down across multiple large log files on a server.

darrenldl · 2024-04-02T21:07:18

> This looks great! Thank you for putting it up on GitHub and sharing

Thank you!

> Have you tried it with log files? It would be great to have this kind of search when hunting things down across multiple large log files on a server.

That is an interesting suggestion, and I can see myself wanting to use Docfd's search engine even for line-oriented input.

Just to make sure I haven't misunderstood: by log files you mean files with one entry per line, where there is no point in searching across lines, correct?

nico · 2024-04-02T21:17:18

I guess it depends on what I’m looking for

But usually, yes. Most of the time the search case is:

* I have an id/uuid or some text or a piece of some id/text

* there are multiple log files, each from a different process/daemon/app that handles a different part of a workflow

* I have to grep each log file individually and then piece things together to figure out how the workflow went for some usage/user

Usually there is a lot of back-and-forth between the log files and the grep commands to find the info I need, especially if what I’m looking for is not an id/uuid

darrenldl · 2024-04-02T21:38:13

I see. Docfd should already handle the id/uuid part, but the "piece of text" part can be problematic, as there is currently no way to ask Docfd to restrict a search to a single line.

I'll add a --line-oriented-exts command line argument that defaults to "log", so searches in any *.log file will not cross line boundaries.
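A minimal sketch of what extension-based line-oriented search could look like, in Python. The names (`DEFAULT_LINE_ORIENTED_EXTS`, `is_line_oriented`, `search_units`, `find_all_words`) are hypothetical, and matching here is plain case-insensitive substring containment rather than Docfd's fuzzy matching; only the idea of switching to per-line search units by file extension comes from the comment.

```python
import os

# Hypothetical default mirroring the proposed --line-oriented-exts default of "log".
DEFAULT_LINE_ORIENTED_EXTS = frozenset({"log"})


def is_line_oriented(path, exts=DEFAULT_LINE_ORIENTED_EXTS):
    """Return True if the file's extension (case-insensitive) is line-oriented."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in exts


def search_units(path, text, exts=DEFAULT_LINE_ORIENTED_EXTS):
    """Split text into the units a single match may not cross.

    Line-oriented files yield one unit per line; any other file is a
    single unit, so one match may span several lines.
    """
    if is_line_oriented(path, exts):
        return text.splitlines()
    return [text]


def find_all_words(path, text, words, exts=DEFAULT_LINE_ORIENTED_EXTS):
    """Return units containing every word (plain case-insensitive substrings)."""
    needles = [w.lower() for w in words]
    return [
        unit
        for unit in search_units(path, text, exts)
        if all(n in unit.lower() for n in needles)
    ]
```

With the text "request abc123 started\nuser bob finished", searching app.log for both abc123 and bob finds nothing because no single line contains both, while the same text treated as notes.txt matches as one unit. Files without any extension are not caught by an extension check alone.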

Are there other file extensions you're dealing with?

nico · 2024-04-02T22:04:56

Usually only .log files, although sometimes the log files have no extension and just sit in a logs/ folder, with a file name like a date plus the name of the service that writes the log.

darrenldl · 2024-04-02T22:33:55

Gotcha. In the latter case, would file globbing through bash/whatever suffice? Been trying to avoid opening more cans of worms than I need, but if I really do need to add file globbing then oh well : v

EDIT: Scratch that, I'll just add file globbing since I'm that close to covering most use cases anyway.
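A sketch of in-program globbing for the extensionless logs/ case, assuming Python's `pathlib` glob syntax; `expand_globs` is a hypothetical helper, and Docfd's own glob syntax may differ.

```python
from pathlib import Path


def expand_globs(root, patterns):
    """Return files under root matching any pattern, as sorted relative POSIX paths.

    Patterns use pathlib's glob syntax ("*", "?", "[...]", and "**" for
    recursive matching); this is an assumption, not Docfd's documented syntax.
    """
    root = Path(root)
    matched = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():  # skip directories such as logs/ itself
                matched.add(path.relative_to(root).as_posix())
    return sorted(matched)
```

For example, expand_globs(".", ["logs/*", "*.log"]) would collect both the date-named files in logs/ and any top-level .log files. When such patterns are passed on a command line, quoting them stops the shell from expanding them first.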

nico · 2024-04-03T00:41:13

Amazing! Love the motivation!

darrenldl · 2024-04-04T14:42:13

Just added file globbing and a single-line search mode. I'll make a new release once I've added some tests and used it myself for a week or so.

nico · 2024-04-05T19:57:22

Impressive! So cool

mutant · 2024-04-02T14:08:29

Is interactive mode the value-add over ripgrep and ripgrep-all?

I accomplish something similar with ripgrep and fzf.

darrenldl · 2024-04-02T20:55:25

> Is interactive mode the value-add over ripgrep and ripgrep-all?

Partly, yes, though if that were the only part, ugrep would largely have sufficed for me.

The other part is that Docfd's search algorithm is very different from that of fzf or ripgrep, and some searches are easier to express in Docfd than in either.

For instance, "(recursive function | recursion)" will match phrases like "function ... is recursive" that may be split across multiple lines, whereas accomplishing that with ripgrep and fzf takes a lot more elbow grease, especially as the search expression grows.
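To illustrate why such a query can match across line breaks, here is a simplified sketch, not Docfd's algorithm: the document is tokenized with line breaks treated as ordinary whitespace, and an alternative matches when all of its words appear, in any order, within a small window of tokens. The query grammar, the `window` parameter, and exact (non-fuzzy) word matching are assumptions for illustration.

```python
import re


def parse_alternatives(query):
    """Parse "(a b | c)" into [["a", "b"], ["c"]].

    Hypothetical, simplified grammar: optional outer parentheses and '|'
    between alternatives; no nesting, quoting, or fuzzy operators.
    """
    query = query.strip()
    if query.startswith("(") and query.endswith(")"):
        query = query[1:-1]
    return [alt.lower().split() for alt in query.split("|") if alt.strip()]


def tokenize(text):
    """Lowercase word tokens; line breaks count as ordinary whitespace."""
    return re.findall(r"\w+", text.lower())


def matches_within(tokens, words, window):
    """True if all words occur, in any order, within `window` consecutive tokens."""
    for start in range(len(tokens)):
        span = set(tokens[start:start + window])
        if all(w in span for w in words):
            return True
    return False


def search(query, text, window=5):
    """Return the first matching alternative as a word list, or None."""
    tokens = tokenize(text)
    for words in parse_alternatives(query):
        if matches_within(tokens, words, window):
            return words
    return None
```

Here search("(recursive function | recursion)", "This helper function\nis recursive, so watch the stack.") returns ["recursive", "function"], whereas a line-by-line grep requiring both words would miss it because they sit on different lines.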

dmos62 · 2024-04-02T15:55:27

Wouldn't you struggle to search over docx and pdfs with ripgrep?

darrenldl · 2024-04-02T21:15:37

I will add that ripgrep-all is great for that purpose (and, if I recall correctly, you can also search inside archives with it).

adr1an · 2024-04-02T14:24:26

Are you writing temporary files to a temp directory or /dev/shm? I would hate it otherwise. Of course, others may prefer to leave RAM for other processes...

darrenldl · 2024-04-02T21:03:05

Depends on how "temporary" we're talking.

Index caches are written to $XDG_CACHE_HOME/docfd if XDG_CACHE_HOME is defined, and otherwise to .cache/docfd (in the current working directory). Docfd handles LRU eviction of cached indices for you here.
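The cache location rule, plus a generic LRU eviction policy, sketched in Python. The directory logic follows the reply; `LruIndexCache`, its `max_entries` limit, and keying entries by document are hypothetical, since the thread does not describe how Docfd stores or bounds its index cache.

```python
import os
from collections import OrderedDict
from pathlib import Path


def index_cache_dir(env=None, cwd="."):
    """Resolve the cache directory following the rule stated in the reply.

    $XDG_CACHE_HOME/docfd when XDG_CACHE_HOME is set and non-empty,
    otherwise .cache/docfd under the current working directory.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME")
    if base:
        return Path(base) / "docfd"
    return Path(cwd) / ".cache" / "docfd"


class LruIndexCache:
    """Generic LRU map from a document key (e.g. a content hash) to its index.

    Hypothetical: Docfd's actual eviction limit and storage format are not
    described in the thread.
    """

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, index):
        self._entries[key] = index
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop least recently used

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._entries)
```

An OrderedDict keeps recency order cheaply: every hit moves the entry to the end, so the front is always the next eviction candidate.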

Piped stdin is stored in /tmp, since that's the directory the temp file API handed me. My general view is that if piped stdin is too large to hold in RAM, it should probably be saved to a file first anyway. But I'm happy to discuss your use case and see whether Docfd should be adjusted.
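A minimal Python sketch of spooling piped stdin into a temporary file, relevant to the /tmp versus /dev/shm question. It assumes Python's `tempfile` module, which picks the directory from TMPDIR, TEMP or TMP before falling back to defaults like /tmp, so on Linux setting TMPDIR=/dev/shm would put the spooled copy in RAM-backed storage. Whether Docfd's own temp file API honours TMPDIR is not stated in the thread; `spool_stdin` is a hypothetical name.

```python
import shutil
import sys
import tempfile


def spool_stdin(stream=None, suffix=".txt"):
    """Copy a binary input stream (stdin by default) into a named temp file.

    The directory comes from tempfile.gettempdir(), which checks TMPDIR,
    TEMP and TMP before platform defaults such as /tmp. Returns the file's
    path; the caller must delete it when done.
    """
    stream = sys.stdin.buffer if stream is None else stream
    with tempfile.NamedTemporaryFile(mode="wb", suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)  # copies in chunks rather than one big read
        return tmp.name
```

Because the copy is chunked, the process itself never holds the whole input in memory; where the bytes end up (disk or tmpfs) depends only on the chosen temp directory.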

dvfjsdhgfv · 2024-04-02T15:47:27

Great work! How can I test it out quickly on Windows?

darrenldl · 2024-04-02T21:11:11

Thank you! Unfortunately the quickest way to test it on Windows right now is WSL, which is also how I use it on Windows.

I have not spent much time figuring out how to make Windows builds via GitHub CI, nor investigating how the PDF viewer, Word invocation code, etc. behave on Windows.