Show HN: I wrote a program to convert lines of text into trees (github.com/birchb1024)
333 points by birchb on March 29, 2021 | 53 comments



Maybe this is not super relevant, but my favorite hack is that any tree-like structure like this can be browsed with ncdu. Here's a gist for breaking down redis traffic by command for example: https://gist.github.com/petethepig/0f33c910fb2edad8969a5775e...


Whoa, I didn't know that! This is super useful as a general-purpose tree viewer!

For the record, the relevant commands are

    ncdu -o /tmp/files.json
and

    ncdu -f /tmp/files.json
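
For anyone wanting to feed their own tree into `ncdu -f`, here is a minimal Python sketch. It assumes the export layout is a JSON array of `[1, 0, metadata, root]`, where a directory is an array whose first element is its own info object and a file is a plain object. The function name `to_ncdu` and the `progname` value are made up for illustration, so check the ncdu export documentation before relying on the field names.

```python
import json

def to_ncdu(counts, sep="/"):
    """Turn {"a/b/c": size} into a structure shaped like an ncdu export (assumed layout)."""
    root = {"size": 0, "children": {}}
    for path, size in counts.items():
        node = root
        for part in filter(None, path.split(sep)):
            node = node["children"].setdefault(part, {"size": 0, "children": {}})
        node["size"] += size

    def emit(name, node):
        info = {"name": name, "asize": node["size"], "dsize": node["size"]}
        if not node["children"]:
            return info  # leaf: a plain object
        # directory: an array whose first element describes the directory itself
        return [info] + [emit(k, v) for k, v in node["children"].items()]

    meta = {"progname": "sketch", "progver": "0", "timestamp": 0}  # placeholder values
    top = emit("/", root)
    if not isinstance(top, list):
        top = [top]  # the root must be a directory, even when empty
    return [1, 0, meta, top]

if __name__ == "__main__":
    demo = {"GET/user": 3, "GET/item": 2, "SET/user": 1}  # made-up counts
    print(json.dumps(to_ncdu(demo)))
```

Redirect the output to `/tmp/files.json` and open it with the `ncdu -f` command above.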


Neat! Is there some documentation on the format?



Author here: Often I have to digest log files and lists of Azure resource names. I prefer to work with hierarchies of things, so I wrote this simple filter. Turns out to be quite handy, especially when combined with awk and its friends.


Great, very handy! Just FYI `readlink -f` doesn't work on macOS

May I suggest the name txtree?


Sorry macos people, I don't own an Apple. (Not strictly true, I have an iMac rescued from a dumpster, but it's running Ubuntu now). Apparently greadlink is available https://stackoverflow.com/a/4031502 for you.


Very cool tool! Any chance that you're going to add it to Homebrew?




I refer you to the answer I gave earlier.


Ah yes, I was just thinking that this could be useful for `aws s3 ls` and similar.

(Really I just want `tree` on `s3`. There's `s3-tree` but that uses the entire bucket rather than a prefix)


I love this as a post-processor for log files. We actually have something like this at work. Unfortunately, it's applied to log files by default. It's terrible: it breaks grep, and I haven't been able to figure out a way around that.

Have you found any solutions for grepping on matching lines? Presumably, one could use awk, but awk takes a lot of cycles.
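
One possible workaround when only the tree form is available: search the tree, but print each match together with the less-indented lines above it, so the match keeps its context. A minimal Python sketch, assuming nesting is expressed purely by leading spaces (the function name `grep_with_ancestors` is hypothetical):

```python
import re

def grep_with_ancestors(lines, pattern):
    """Return matching lines plus their not-yet-printed ancestor lines, in order."""
    rx = re.compile(pattern)
    stack = []  # entries are [indent, line, already_printed]
    out = []
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # drop siblings and deeper lines: they are not ancestors of this line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append([indent, line, False])
        if rx.search(line):
            for entry in stack:
                if not entry[2]:
                    out.append(entry[1])
                    entry[2] = True
    return out

if __name__ == "__main__":
    tree = [
        "May 10",
        " 03:17:06 localhost systemd",
        "  : Removed slice User Slice of root",
        "  : Stopping User Slice of root",
    ]
    print("\n".join(grep_with_ancestors(tree, "Stopping")))
```

The stack only ever holds the current line and its ancestors, so memory use stays proportional to the tree depth rather than the file size.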


Grepping the _input_ is, like, Standard. Knowhatimean bruv?


Neat. I'll add this to my toolbox.

Somewhat unrelated: I discovered some time ago that the column command (from util-linux) can print trees of hierarchical data (up to 2 levels deep).

From the man page:

  $ echo -e '1 0 A\n2 1 AA\n3 1 AB\n4 2 AAA\n5 2 AAB' | column --tree-id 1 --tree-parent 2 --tree 3
  1  0  A
  2  1  |-AA
  4  2  | |-AAA
  5  2  | `-AAB
  3  1  `-AB
column(1): https://github.com/karelzak/util-linux/blob/master/text-util...


I did not know that. But then it's BSD, right? http://harmful.cat-v.org/cat-v/.


Awesome. You really nailed it. In my experience the output for spreadsheets turns out to be key so good job highlighting that (https://github.com/birchb1024/frangipanni#output-for-spreads...).

One suggestion: it may help to generalize the newline as the node separator. You may already be doing this (my Go is rusty) but instead of https://github.com/birchb1024/frangipanni/blob/7543b4ee15ae7... be able to override the "newline" as the node separator, like you've done with the "spacer" param.

What do you do when the same line is encountered?


OK good idea, I was coming round to that myself. https://github.com/birchb1024/frangipanni/issues/5


really cool! But I think you should have called it "birch"! :)

Any way you could add build instructions to the README?


I also wanted to vote for "birch". Apparently "frangipanni" is a nice flower, but it reminds me of an unpleasant (imho) German dessert ("frangipane").



Done


This is really very nice and useful! I read through the examples and thought about something that would be a good addition and then see that “-skip” was just implemented! I can skip (pun intended) using an additional layer of cut or awk because of this.

The detailed examples are great too, showing different use cases and features.

Thank you very much for creating this tool and sharing it.


TIL from the README, you can insert `img` tags into a markdown file, and github will honor the align tag, etc

```

<img src="frangipanni.jpg" alt="A Tree" width="200" align="right">

```


I love what it does for log files. Abbreviated example from the README:

    May 10 03:17:06 localhost systemd: Removed slice User Slice of root.
    May 10 03:17:06 localhost systemd: Stopping User Slice of root.
becomes:

    May 10
     03:17:06 localhost systemd
      : Removed slice User Slice of root
      : Stopping User Slice of root
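
For the curious, the general idea can be sketched in a few lines of Python. This is not frangipanni's actual algorithm (it splits only on whitespace, and the names `build_trie` and `render` are made up), just a prefix tree with single-child chains folded onto one line:

```python
def build_trie(lines):
    """Insert each line into a nested dict, one whitespace-separated token per level."""
    root = {}
    for line in lines:
        node = root
        for tok in line.split():
            node = node.setdefault(tok, {})
    return root

def render(node, depth=0, out=None):
    """Print the trie with one space of indent per level, folding single-child chains."""
    if out is None:
        out = []
    for tok, child in node.items():
        label = tok
        # Fold chains: a node with exactly one child shares its line with that child.
        # Limitation of this sketch: a line that is a strict prefix of another is
        # absorbed into the longer one.
        while len(child) == 1:
            (nxt, grand), = child.items()
            label += " " + nxt
            child = grand
        out.append(" " * depth + label)
        render(child, depth + 1, out)
    return out

if __name__ == "__main__":
    log = [
        "May 10 03:17:06 localhost systemd: Removed slice User Slice of root.",
        "May 10 03:17:06 localhost systemd: Stopping User Slice of root.",
    ]
    print("\n".join(render(build_trie(log))))
```

On the two log lines above, this keeps everything up to `systemd:` on one line and indents the two differing tails beneath it, which is coarser than the README output because it never splits inside a token.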


Definitely very interesting! But I think I'd often end up stuck in the middle of a file with no context about the time of the current line, and struggling to scroll up to capture the parent without shooting past it.

Maybe that could be resolved if it were combined with some sort of GUI / TUI that collapsed subtrees by default (maybe with child count and total ancestor count), e.g. folding by indentation in Vim / Emacs.


Seen this about 50 years ago. It was a method of saving expensive Teletype ink ribbons. Instead of printing the same repeating messages on the mainframe log, it printed only those parts that were different from the previous line.

And of course an empty line was "ditto". Except this wasted paper and was later replaced with "...".
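
A rough Python sketch of that ribbon-saving trick, assuming the comparison is done character by character against the previous line (the original mainframe logic is not known to me, and the name `ditto` is made up):

```python
def ditto(lines):
    """Blank out the leading part of each line that repeats the previous line."""
    prev, out = "", []
    for line in lines:
        n = 0
        # count how many leading characters this line shares with the previous one
        while n < min(len(prev), len(line)) and prev[n] == line[n]:
            n += 1
        out.append(" " * n + line[n:])
        prev = line
    return out

if __name__ == "__main__":
    for row in ditto(["ERROR disk full", "ERROR disk failing", "ERROR disk failing"]):
        print(repr(row))
```

An exact repeat comes out as nothing but spaces, which is the blank "ditto" line described above.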


There is also

  tree --fromfile
e.g.

  echo a/b1 a/b2/c | xargs -n1 | tree --fromfile

  .
  └── a
      ├── b1
      └── b2
           └── c

 2 directories, 2 files


In retrospect, it’s not clear to me why I somehow expected this project to be something more literally arboreal. I even got my hopes up further when I clicked the link and saw that photo of plumerias.

Yet, this project is cool enough that I’m not even disappointed.


Um you can get actual trees with graphviz, or my favourite, yEd. I will add an example of that. https://github.com/birchb1024/frangipanni/issues/7
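
In the meantime, a minimal Python sketch of the graphviz route, assuming slash-separated paths as input. It is not part of frangipanni; the function name `paths_to_dot` and the node-naming scheme (the full path as the node id, the last component as the label) are choices made for illustration:

```python
def paths_to_dot(paths, sep="/"):
    """Emit a Graphviz digraph with one node per path prefix and parent -> child edges."""
    def q(s):
        # quote a string for DOT, escaping backslashes and double quotes
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    nodes, edges = {}, {}  # dicts keep insertion order and drop duplicates
    for path in paths:
        parts = [p for p in path.split(sep) if p]
        for i, part in enumerate(parts):
            node_id = sep.join(parts[: i + 1])
            nodes[node_id] = part
            if i:
                edges[(sep.join(parts[:i]), node_id)] = True

    lines = ["digraph tree {"]
    lines += [f"  {q(n)} [label={q(label)}];" for n, label in nodes.items()]
    lines += [f"  {q(a)} -> {q(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(paths_to_dot(["a/b1", "a/b2/c"]))
```

Piping the output through `dot -Tpng -o tree.png` should draw it.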


This looks like what I always wanted for normalizing data sets and grepping logs but couldn't articulate. Thank you!


I've been thinking about making something like this for a while, this is great!

I always thought it was weird that "do one thing and one thing well" stopped short of dealing with tree representations on the commandline.


What do you mean? The only reason this didn't exist is nobody did it yet. It doesn't contradict "do one thing and do it well".


For me, the problem isn’t that it doesn’t do one thing well, which it clearly does, it’s that there are no other tools to process trees.

So it fits well with the UNIX ethos of a small tool doing one thing well, but not so much the concept of pipelines, which is closely related.

Obviously not this tool’s fault.


It does take advantage of pipelines, as long as it is on the end. You can use awk, sed, cut, sort, grep and a host of other tools to massage the data before it gets put into graph form.

To become the front end of a pipe requires that its format, or some other hierarchical format, become an effective standard shared by a number of tools. It can happen.


Looks nice. Perhaps a next step could be an ncurses program that allows you to fold/unfold the trees at arbitrary places, and select entries to reveal their full path (useful for copy+paste).


I find I think about things a lot in tree structures, documentation, todo lists, technical information. I’m really keen to give this a go in my work. Thanks!


What a beautiful idea. I like the application to quick analysis of ls output or logs, although now we are typically outputting JSON and collecting it for Kibana.


did you try the '-format json' option?


  $ find /etc/network | ./frangi.tl
  etc:
      network:
          if-post-down.d:
              wireless-tools wpasupplicant avahi-daemon 
          if-down.d:
              resolvconf wpasupplicant avahi-autoipd 
          interfaces.d interfaces if-pre-up.d:
              wireless-tools wpasupplicant ethtool 
          if-up.d:
              ntpdate wpasupplicant 000resolvconf openssh-server ethtool avahi-autoipd slrn avahi-daemon 

  $ find /etc/network | ./frangi-cheat.tl 
  /etc/network
              /if-post-down.d
                             /wireless-tools
                             /wpasupplicant
                             /avahi-daemon
              /if-down.d
                        /resolvconf
                        /wpasupplicant
                        /avahi-autoipd
              /interfaces.d
              /interfaces
              /if-pre-up.d
                          /wireless-tools
                          /wpasupplicant
                          /ethtool
              /if-up.d
                      /ntpdate
                      /wpasupplicant
                      /000resolvconf
                      /openssh-server
                      /ethtool
                      /avahi-autoipd
                      /slrn
                      /avahi-daemon

  $ cat frangi-cheat.tl
  #!/usr/bin/env txr
  (let (old-path)
    (whilet ((line (get-line)))
      (whenlet ((path (tok #/[^\/]*/ line))
                (canon `@{path "/"}`)
                (pos (mismatch path old-path)))
        (let ((cpos (max 0 (+ pos -1 [sum [path 0..pos] len]))))
          (put-line `@{"" cpos}@{canon [cpos..:]}`))
        (set old-path path))))


  $ cat frangi.tl
  #!/usr/bin/env txr

  (defstruct (node name) list-builder
    name

    (:method equal (me) me.name)

    (:method ensure-child (me child-name)
      (let ((children me.(get)))
        (or (find child-name me.(get))
            (let ((new-child (new (node child-name))))
              me.(add new-child)
              new-child))))

    (:method print (me stream : pretty-p)
      (let* ((old-im (set-indent-mode stream indent-code))
             (old-id (get-indent stream))
             (children me.(get))
             (is-bottom (none children .(get))))
        (unwind-protect
          (cond
            (children
              (put-line `@{me.name}:` stream)
              (set-indent stream (+ old-id 4))
              [mapdo (op print @1 stream) children]
              (when is-bottom
                (put-char #\newline stream)))
            (t (put-string `@{me.name} `) stream))
          (set-indent-mode stream old-im)
          (set-indent stream old-id)))))

  (let ((supernode (new (node :root))))
    (whilet ((line (get-line)))
      (let ((path (tok #/[^\/]+/ line))
            (node supernode))
        (each ((comp path))
          (set node node.(ensure-child comp)))))
    (each ((top-child supernode.(get)))
      (pprinl top-child)))


C version. Maybe this logic should be built into GNU find as an option!

  $ find /etc/network | ./frangi-cheat 
  /etc/network
              /if-post-down.d
                             /wireless-tools
                             /wpasupplicant
                             /avahi-daemon
              /if-down.d
                        /resolvconf
                        /wpasupplicant
                        /avahi-autoipd
              /interfaces.d
              /interfaces
              /if-pre-up.d
                          /wireless-tools
                          /wpasupplicant
                          /ethtool
              /if-up.d
                      /ntpdate
                      /wpasupplicant
                      /000resolvconf
                      /openssh-server
                      /ethtool
                      /avahi-autoipd
                      /slrn
                      /avahi-daemon

  $ cat frangi-cheat.c
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
    char old_line[FILENAME_MAX] = "", line[FILENAME_MAX];

    while (fgets(line, sizeof line, stdin)) {
      char *p = line, *o = old_line, *nl = strchr(line, '\n');

      if (nl)
        *nl = 0;

      while (*p && *o) {
        char *op = p;

        if (*o == '/' && *p == '/')
          o++, p++;

        size_t lp = strcspn(p, "/");
        size_t lo = strcspn(o, "/");

        if (lp == lo && !strncmp(p, o, lp)) {
          p += lp;
          o += lp;
          printf("%*s", (int) (p - op), "");
          continue;
        }

        p = op;
        break;
      }

      puts(p);
      strcpy(old_line, line);
    }

    return feof(stdin) ? EXIT_SUCCESS : EXIT_FAILURE;
  }
Note: yes, we could swap pointers between two buffers instead of strcpy.


New version:

- swap pointers to flip buffers instead of strcpy

- handle corner case of directory printed after contents (find -depth) by avoiding printing nothing but spaces, or nothing but spaces followed by a slash

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
    typedef char buf_t[FILENAME_MAX];
    buf_t buf[2] = { "" };
    buf_t *pline = &buf[0], *line = &buf[1];

    while (fgets(*line, sizeof *line, stdin)) {
      buf_t *tmp;
      char *l = *line, *p = *pline, *nl = strchr(l, '\n');

      if (nl)
        *nl = 0;

      while (*l && *p) {
        char *op = l;

        if (*p == '/' && *l == '/')
          p++, l++;

        size_t lp = strcspn(l, "/");
        size_t lo = strcspn(p, "/");

        if (lp == lo && !strncmp(l, p, lp)) {
          l += lp;
          p += lp;
          continue;
        }

        l = op;
        break;
      }

      if (l[0] && (l[1] || l[0] != '/'))
        printf("%*s%s\n", (int) (l - *line), "", l);
      else
        puts(*line);

      tmp = pline, pline = line, line = tmp;
    }

    return feof(stdin) ? EXIT_SUCCESS : EXIT_FAILURE;
  }


I love your enthusiasm, and thanks for the code. But 1) the above only works with sorted input and 2) adding it to find would be like adding '-v' to 'cat', which as we know is considered harmful. http://harmful.cat-v.org/cat-v/


Actually, what it works with is a partially sorted input: an input in some tree order, which is what any recursive file system traversal will always put out. It will work with depth-first or breadth-first order, with usefully different results, and that order preserved.

All it is doing is hiding the redundancy with spaces to improve the human readability of find's output.

Here it is on the same data in breadth-first. Note the lack of any lexicographic sort: "interfaces" is flanked by "if-down.d" and "if-pre-up.d":

  /etc/network
              /if-post-down.d
              /if-down.d
              /interfaces.d
              /interfaces
              /if-pre-up.d
              /if-up.d
              /if-post-down.d/wireless-tools
                             /wpasupplicant
                             /avahi-daemon
              /if-down.d/resolvconf
                        /wpasupplicant
                        /avahi-autoipd
              /if-pre-up.d/wireless-tools
                          /wpasupplicant
                          /ethtool
              /if-up.d/ntpdate
                      /wpasupplicant
                      /000resolvconf
                      /openssh-server
                      /ethtool
                      /avahi-autoipd
                      /slrn
                      /avahi-daemon
My first program in TXR Lisp in the grandparent comment builds the tree structure from the paths in any order and then prints that, so the paths could be scrambled into random order, yet it will recover the tree structure.

However, not munging the output in that way and just tweaking the original output for readability has some advantages.
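
For readers who don't speak C or TXR Lisp, the prefix-hiding filter can be approximated in Python. This is a sketch of the same idea rather than a line-for-line port: it compares whole path components with the previous line and replaces the shared ones with spaces, leaving the input order untouched. The function name `hide_shared_prefix` is made up.

```python
def hide_shared_prefix(paths):
    """Blank out the leading path components each line shares with the previous line."""
    prev, out = [], []
    for path in paths:
        parts = path.split("/")
        shared = 0
        for a, b in zip(parts, prev):
            if a != b:
                break
            shared += 1
        if 0 < shared < len(parts):
            pad = len("/".join(parts[:shared]))  # width of the hidden prefix
            out.append(" " * pad + "/" + "/".join(parts[shared:]))
        else:
            # nothing shared, or nothing left to show (e.g. find -depth): print whole
            out.append(path)
        prev = parts
    return out

if __name__ == "__main__":
    demo = [
        "/etc/network",
        "/etc/network/if-up.d",
        "/etc/network/if-up.d/ntpdate",
        "/etc/network/if-up.d/slrn",
        "/etc/network/interfaces",
    ]
    print("\n".join(hide_shared_prefix(demo)))
```

Because it only looks at the previous line, it needs no memory beyond one path, which is the trade-off against building the whole tree first.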


Exceptional!

I am a heavy user of `find <> | xargs grep` -- this makes my life so much sweeter. Thank you @birchb!


It's like super enhanced tree command! Very cool!


Super cool, yo! Very unix – do one simple thing well.


This looks awesome, great job, thanks for sharing.


This is a really wonderful idea! Thanks, OP.


[flagged]


You've been breaking the site guidelines badly. If you keep doing this we will have to ban you.

Please review https://news.ycombinator.com/newsguidelines.html and stick to the rules from now on.


Yes, sorry


No worries mate. Your point is basically 'What a fuss over a very simple bit of code.'? I'm as surprised as you are.


> Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community.

https://news.ycombinator.com/newsguidelines.html



