Show HN: Stamp turns a folder into a plain text file and a file into a folder (github.com/treenotation)
195 points by breck on Feb 7, 2021 | 87 comments



This brings back memories of an ancient program called "patchy" that I encountered during my PhD.

Fun story: I was one of the last students doing his PhD at the Tevatron, a large particle accelerator near Chicago, before everybody's attention moved to the LHC. Our collaboration was winding down, and close to the end of my thesis I realized that nobody was producing the simulations I needed anymore, so I had to do it all myself. The first step was building the software. One program I used had been maintained by the same guy since the 70s, and you could see the accretion of layer upon layer as new scientific models got added to the program, but it was never rewritten. The code itself was in glorious FORTRAN 77, and to compile it you needed the aforementioned patchy. A patchy file is a plain text archive of all your source files, much like OP's Stamp. In addition, you can add directives to each file for a primitive form of conditional compilation (e.g. to include certain models, or to run on AIX).

I really wanted to use this one program because it could simulate things no other program could. But the biggest hurdle was actually compiling patchy, which required a specific version of CentOS, CERNLIB, two decades' worth of patches on top, and a crazy bootstrapping procedure. I especially recall a manual for patchy that proudly talked about laying a software foundation for the soon-to-be-built "superconducting super collider" in Texas (which was cancelled while I was still in elementary school). The episode made me realize how deep the stacks behind the legacy software we use sometimes are.


Then a bit of software for the SSC was used to analyze radio astronomy data for a decade or so.


Interesting! Do you have a link to any paper or page about patchy?


Unfortunately nothing beyond what you find on Google; I guess a lot of the documentation was internal.


Another alternative for something like this would be MIME multipart/mixed (https://en.wikipedia.org/wiki/MIME#Multipart_messages).

That allows a text file to contain several other files. Plain text files can be included as-is (as text/plain), and binaries can be included (as application/octet-stream). Each file can be given a name through 'Content-Disposition: filename="foo.txt"'.

It can be hand-edited if you're reasonably careful. (Use boundaries, not Content-Length.)

And if any of the files would have been mergeable (with Git, etc.) as a separate file, it should also be pretty mergeable as a section of a MIME file. (Because it is included literally, and because presumably you give each one a unique filename, so the merge/diff algorithm has a unique Content-Disposition line to work with for each file.)
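For illustration, a minimal hand-edited multipart/mixed file might look like this (boundary string and filenames are arbitrary):

    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/plain
    Content-Disposition: attachment; filename="README.txt"

    This is a README.
    --BOUNDARY
    Content-Type: application/octet-stream
    Content-Disposition: attachment; filename="logo.png"
    Content-Transfer-Encoding: base64

    iVBORw0KGgoAAAANSUhEUg...
    --BOUNDARY--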


Would love to accept a PR adding a link to that.

There really should be one. Thanks!


Cool, seems like the unix `cat` command with the addition of storing the filepath (so maybe more like `find`?).

While this of course sounds like a cool project, and my comment might be the most Hacker News-y thing [1], I think it's aimed at developers and so it fits.

I don't agree with the zip downsides listed in the repo: firstly, `unzip -l` exists to see what's in a zip, and secondly, the reuploading argument doesn't work in favor of a different format, since you also need to update that format in a remote when it changes.

Secondly, I am kind of amused by this format when `touch`, `mkdir -p`, and `echo` are available on every POSIX-compliant system. Combined, they make a nice coherent shell script which, without dependencies, would cover the functionality of this project as far as I understand it.

I don't want to sound condescending, but I'm feeling a little left-pad [2] on this one.

[1]: https://news.ycombinator.com/item?id=9224

[2]: https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...


I see this as a declarative vs imperative approach. The interface is a bit barebones and heavy right now, since you need to fire up node and run the functions. But like declarative vs imperative DevOps, the declarative nature of the resulting format seems like it can carry a lot of the same benefits.

POSIX scripting and other tools are a viable alternative as well. Though despite knowing all of the commands you’ve mentioned, and regularly sharing/applying git patches, I think I would still have difficulty renaming FileA to FileB in a tarball or patch prior to unpacking or applying it.

The declarative approach seems like a nice feature that greatly reduces the cognitive friction there. But as a single feature, it’s not clear whether it’s worth a tool change.


I prefer to use shell scripts for that; this works pretty well and is very readable:

    cat >README <<'EOF'
    This is a README
    EOF

    # This is a comment
    mkdir -p myProject
    cat >myProject/index.js <<'EOF'
    console.log("Hello world")
    EOF
It does not require any tools to extract, either. (And if you omit the quotes around EOF, you can do substitutions:)

    MY_EMAIL=put-your-email-here
    cat >config <<EOF
    email=$MY_EMAIL
    EOF


I use this pattern a lot along with a tool I built for doing server deployments and administration using plain old shell scripts and ssh (golem: https://github.com/robsheldon/golem/).

There are two caveats:

First, if there's any chance at all that the heredoc may contain a $, or a `, or possibly some other shell-magical characters, then you have to use a single-quoted heredoc:

  cat <<'EOF'...
This means that if you want to do variable interpolation, like you're doing, then you need something that looks like:

  cat <<'EOF' | sed -e "s/\\\$username/$username/g" -e "s/\\\$my_email/$my_email/g" ...
It looks yucky and unwieldy at first, but I've found that it's nice to be able to see at the top of the heredoc exactly what's getting replaced and what values it needs.

Second, if root privileges are required to write the file, then you need to use `tee`, because you can't sudo an output redirection:

  cat <<'EOF' | sed -e "s/\\\$username/$username/g" | sudo tee /path/to/file >/dev/null
After using it for a while, I've found I really like this pattern for managing configuration file templates.

You're right though that something like Stamp could be built using standard shell tools if someone were so inclined.



It seems very comparable to "shar"...shell archives.

https://en.wikipedia.org/wiki/Shar


shar is specifically mentioned in the footer, with the drawback that it's really an arbitrary shell script. ptar is also mentioned, and seems rather nice. It's also way better documented[0], though with glaring holes: e.g. how does it deal with non-UTF8 file content? It's not clear whether the file size or the delimiters take precedence, and why you'd have a closing delimiter at all if the file size rules. It also specifies file names as UTF8 or ASCII, neither of which is sufficient to handle the full breadth of possible file names.

[0] https://github.com/jtvaughan/ptar/blob/master/FORMAT.md


There are file names that can’t be expressed as UTF8? I’m intrigued.


POSIX file names can be any sequence of bytes other than '/' and the NUL byte (0).
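For example (POSIX shell; 0xFF is a byte that can never appear in valid UTF-8):

    # create a file whose name contains a raw 0xFF byte
    touch "$(printf 'bad\377name')"
    ls | od -c | head    # the raw byte shows up as 377 in the dump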


I guess that's true. I suspect the support for non-UTF8 names in modern tooling is very, very spotty, given how many config files and file formats that refer to other files use UTF-8 themselves. E.g. can you refer to one of these names in an nginx config? (just an example; I have no idea if its config is UTF-8 or not)


I have a collection of non-utf8 and other problematic files to test such tools:

https://github.com/benibela/nasty-files

You probably cannot clone the repo on Windows. It works well on Linux, but in KDE you couldn't delete it after checkout.


Also, filenames with one or more newlines bork a lot of Linux software.
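A quick demo, in an empty directory (bash syntax for the embedded newline):

    touch $'one\ntwo'          # a single file whose name contains a newline
    ls | wc -l                 # line-oriented tools now see two "files"
    find . -print0 | od -c     # NUL-delimited output keeps the name intact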


And Windows file names can include unpaired surrogates, which are not allowed in UTF8 (that’s why WTF8 exists).


The main benefit, in my mind, over zip/tar is the built-in parameter substitution.

You could imagine using this as a development dependency, to standardize the creation of new (anything) that follows a predictable pattern. Check in your stamp file, and any junior dev who creates a new [anything] in your project can get all the custom boilerplate and best-practices right away.


So like `sed` or `mv`?


Yes, the only real advantage over zip files seems to be the parameter substitution. I’d rather build that on top of the zip format, which supports extension by custom fields, and provide a wrapper around zip/unzip that adds the substitution functionality. Users of regular zip could still use the zip files then, just without automatic substitution.


Zip unpackers usually spill files into already-populated directories or create doubly-nested contents, depending on how you prefer to unpack them: <here> or <into zipname>. The only app that does the right thing is The Unarchiver from the App Store. All other unpackers on all platforms force you to look into the zip file before unpacking. This is annoying AF.

I can’t tell from the article, but if it allowed “unstamp -” and then simply copy-pasting into stdin from a site, that would be great.


> Cool seems like unix `cat` command with addition of storing the filepath

Like `patch`, then?


Right, I didn't even consider that we've gone full circle to git.


I think the main takeaway over zip is that with zip you need to actually run “zip” before committing whereas here you can declare it in “code”. I don’t see the benefit over shipping a skeleton directory though.


> Secondly, I am kind of amused by this format when `touch`, `mkdir -p`, and `echo` are available

I actually think I added the ability to "compile" a Stamp file to bash. At least I remember starting that; I don't think I finished it.

I think the important thing here is not the library and node stuff, but the simple declarative format, which could be implemented in any language.

Though of course JS and Node are very popular, so I will do my best to make this work great.


Why don’t you write it with POSIX-only tools then? It’s a great idea that didn’t exist yet. Who cares how the developer made it? I’m sure if it takes off, people will rewrite it in Go and Rust with standalone binaries and the HN crowd will love it. I’m pretty sure people use the tools they’re most familiar with. Bashing people for their choice of tool is a popular kind of comment on HN, but it doesn’t accomplish anything.


It's not bashing a tool per se, sorry if you read it as such. I was pointing out that with the abundance of tools we have at our disposal this project sounds to me like a solution looking for a problem.

As for the solution not existing yet: as people pointed out, there are tar, zip, git, and shar, which all seem to accomplish the same thing.


Did you read the very thorough rationale provided by the project page, complete with pro-cons for many other methods?

I have no idea how someone could read such a thorough README and say "this sounds like a solution looking for a problem" with a straight face.


Yes, I read it, and I don't agree with the posted downsides for most of them: the tool itself seems to share the downsides listed in the pros/cons of the other solutions, and it doesn't seem to solve the problems it promises to.

Is visibility into what's being created a problem?

The listed problems, like "every time the template changes, the author has to rezip and reupload the folder", don't seem to be solved by stamp. The problem is just shifted to versioning this new format and still distributing it somehow (probably through git), at which point you ask yourself: why can't the git repo already have the directory structure? If somebody wants to rename something, they can do that really easily; no need for var substitution.

If there was a migration mechanism that would move files from an old template to a new one I would see value added.


The difference is you can directly edit the file in a text editor (possibly even in the github ui) instead of needing to unzip, edit, and rezip.

It makes find and replace across all files trivial with your editor, whereas that would be less fluid with a zip.


Regarding the project templates, did you look at Cookiecutter [0]? There are over 5k projects that GitHub finds that use it [1]. It solves the “custom utility” problem, since it can be used with multiple templates. I can just install `cookiecutter` and immediately use any of the 5k+ templates, and I get variable substitution, and some basic logic in the code. I also don’t need any special structures — I prefer 4-space tabs, and to write or edit the `stamp` file manually, I would need to remember how to tell vim to disable all indentation support, since I’d have tabstops at 1 (the `data` line), 2 (top-level code), 6 (one level of indentation), 10, …
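For anyone unfamiliar, typical usage looks something like this (the template is just one well-known example):

    pip install cookiecutter
    # prompts interactively for the template's variables, then renders it
    cookiecutter gh:audreyr/cookiecutter-pypackage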

[0]: https://cookiecutter.readthedocs.io/en/1.7.2/ [1]: https://github.com/search?q=cookiecutter&type=Repositories


Cookiecutter is nice but it requires an entire python install to run, which is a big thing to ask for some of the scenarios mentioned by the tool creator (like someone going through a simple learning tutorial which might not even be using python at all).

IMHO gomplate is a nicer alternative that's just a single static go-based tool that can do everything cookiecutter does and a lot more: https://github.com/hairyhenderson/gomplate
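For instance, something like this (a sketch; `-f`/`-o` flags and the `.Env` context per gomplate's docs):

    # render a Go template, pulling values from the environment
    echo 'Hello, {{ .Env.USER }}!' > greeting.tmpl
    gomplate -f greeting.tmpl -o greeting.txt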


Stamp requires node.js, which is an even bigger ask than Python, considering that, unlike node, Python is often bundled with your Linux/UNIX base install.


All Linux distros ship with Python. macOS installs it with the developer tools (which you’re likely to need anyway). Windows makes it very easy to install (type `python` in cmd, you’ll get sent to the Microsoft Store).


I don't think I've seen that! Thanks! Happy to accept a PR with a link to it.

The "template package manager" sounds very cool. Perhaps Stamp as a format for the CookieCutter Package manager would be a good combo. Not sure haven't taken a close look, just thinking out loud.


As someone who does a lot of "roll your own" programming as a scientist, I never got the appeal of overly nested directories in web dev and other fields. It feels like it makes things more complicated, at least for small projects like the very example they show here. Might as well just have a single directory for three files.

Reminds me a lot of OOP examples in tutorials that make a class that only has two methods: too much boilerplate. Like OOP, it becomes useful for large systems (GUI libraries), but it's overkill when you have fewer than ten files, I think.

EDIT: too late to edit, but what I mean at the end of the last sentence is directory structure, not OOP: fewer than ten files can live in a single dir without confusing people (or me at least).


> ...I never got the appeal of overly nested directories in web dev and other fields...

This is often a side effect of many competing interests trying to cram their incentives into a single ontology and then taxonomy. No strong consensus (or a lack of highly opinionated direction) to shape a single concise ontology and taxonomic implementation, so Design By Committee creeps into the decision-making, and creates enough branching to satisfy the "gotta catch them all"-itis to capture all the requirements.

That in turn is usually a side effect of the business not knowing its domain sufficiently well to articulate to a granular-enough detail the implementation priorities, and improperly punting that decision to the development organization. "We need this requirement for sure that will as a result prioritize the domain under this ontology. No, wait, we need that. No, wait, we need both even though they are overlapping but mostly disjoint ontologies, and in some places contradictory...."

Which, finally, is usually in turn a side effect of people leaders managing by KPIs instead of leading teams to produce results that happen to move KPIs as a happy side effect. This is probably where the "leadership that comes up through the ranks is the best leadership" idea comes from, because we often do not find leaders who embody that "lead the teams, not the KPIs" characteristic without the deeply-internalized knowledge of the domain acquired through arduous, time-consuming work over years and often decades within it.

It's a tough abstraction stack to deal with for everyone involved. There are good solutions, just not quick and cheap ones of course.


A directory per project is a good rule of thumb. I generally avoid directories for file "types"; that's what file extensions are for. Also:

- Adding a directory to an otherwise flat project at least doubles the complexity, so wait until there's a good reason

- Every directory should have a readme. If it doesn't have a readme, it doesn't deserve to be a directory.


What about files with different encodings? Which of them is used? UTF-8? Same question about line endings.

And how does it deal with files that have strict whitespace requirements like python or Yaml? Can it reliably restore each tab and space?


If you do not want to install an npm package for this, you can use

  tar -cv file | base64
It also supports compression!

  tar -cv --gzip file | base64
Sadly it is not human readable, unlike this utility.
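To restore on the other end (GNU coreutils flags):

    base64 -d archive.b64 | tar -xv --gzip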


Tar files are human readable: file names are plain ASCII, and sizes, mtimes and so on are ASCII octal numerals, all sitting immediately before each file's content.

Maybe not human writable, there are some \x00 bytes used for padding.
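You can see it for yourself (xxd is just for display):

    echo hi > demo.txt
    tar -cf demo.tar demo.txt
    head -c 512 demo.tar | xxd | head -8   # name in plain ASCII; size/mtime in ASCII octal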


Interesting, I would really not expect sizes and times to be in ASCII. I wonder what their reasoning was.


Debugging text is easier and the bulk of the data is binary anyway, so there would be no great file savings with a "pure" binary protocol. That's the same rationale for HTML and text-based internet protocols -- they are easier to write and debug.

This is a particular concern when you're talking about a file format that has to work on a wide variety of architectures and operating systems. Having a common ASCII encoding makes it significantly easier to build an interoperable file format.


If it needs to be plain text, there's the shar command (shell archive).

It was used to send text file attachments.


^ this

I remember telnet-based installers back in the early 1990s that used the pipe-to-sh trick along with shar archives to install things like IRC clients onto your workstation.


"uuencode" was also used in the Usenet days.


This is pretty cool.

Expanding a bit, this would be great in an IDE. Often I have wanted to simply select a bunch of files and edit them at once (maybe with a special temporary comment between them) as a single file, then save the changes back to the individual files. It would be nice if that were a right-click context menu item or a bindable command.

In C(++) land I could see it being extremely useful for h/c(pp) juggling if the IDE automagically did the combining.

Finally, being able to write the comments that separate the files as I type would be really nice when prototyping new code that you want all together before you split it up.

Also, perhaps if it's common enough, certain meta files that link other files together could be left as shortcuts in the repo. When you open one of those, the files (or portions of files) it references are opened for editing.


I built this a long time ago:

https://github.com/xixixao/many-to-one


I've never seen that before, looks really neat, I'll have to try it out. Thanks.


Interested but the Demo gif says "This Content is Not Available".


> Often I have wanted to simply select a bunch of files and edit them at once

Dired (Directory Editor, IIRC) is one thing I've seen sort of do this. https://www.gnu.org/software/emacs/manual/html_node/emacs/Di...

I'm really surprised that idea hasn't been doubled down on in IDEs. I still use a handy little Mac app called Rename It to edit a bunch of file names at once, because I find Dired a bit clunky. I've taken some stabs at building this kind of editor, but it's definitely hard to nail. At least for the Tree Notation web IDEs we can get it added eventually; perhaps it will be easier now that CodeMirror 6 is out (which I use for the fancy syntax highlighting and all that in the current Language builder IDE).

But I 100% agree with your ideas here. If something like this (either as one scrollable buffer, or perhaps a bunch of buffers in a 2-D spreadsheet like interface), ends up being the primary IDE view in 10 years, I would not be surprised.


Emacs + Org-mode can sort of do this using org-babel. A lot of people use it for literate programming, or even just for documenting their own emacs configuration.

I'm not sure if there's already a plugin or script that can take a selection of files like you mentioned, but it shouldn't be hard to write one either.
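For reference, the tangling variant looks roughly like this; running `M-x org-babel-tangle` writes each block out to its `:tangle` path:

    * My project
    #+BEGIN_SRC js :tangle myProject/index.js :mkdirp yes
    console.log("Hello world")
    #+END_SRC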


It would be interesting to generate HTML as an output. This way it could inline text files, images (data URI), videos, audio and other files as <a href="data:application/octet-stream;base64,... download="filename.ext" ... You could get mime types from file(1) not to put octet-stream everywhere.

Basic version wouldn't even need a single JS line.
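A rough shell sketch of that basic version (assumes GNU `base64 -w0`; output goes to the parent dir so the bundle doesn't include itself):

    for f in *; do
      printf '<a download="%s" href="data:%s;base64,%s">%s</a><br>\n' \
        "$f" "$(file -b --mime-type "$f")" "$(base64 -w0 "$f")" "$f"
    done > ../bundle.html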

Sounds like a good little weekend project.


This is a really cool idea!

If you make it let us know. Would be cool to check out.


This seems to be one of the projects of Tree Notation [1], which was discussed in the context of Dumbdown, a Markdown alternative [2].

[1]: https://treenotation.org/

[2]: https://news.ycombinator.com/item?id=25848204, https://news.ycombinator.com/item?id=20856525


Tree Notation looks fun... I was reading what I think is the spec (https://github.com/treenotation/blog.treenotation.org/blob/m...)? I honestly can't quite make heads or tails of it, but I do get a sense that giving cells 2D size is important. Then I looked at the language examples and... none of them seem to really use this idea of cell size??

Am I missing something?


> Am I missing something?

Nope! That's a keen observation.

The 2 and 3 dimension stuff is still coming. That's when Tree Notation starts to get really interesting.

Here's one recent example of a 2-d thingy:

https://www.youtube.com/watch?v=vn2aJA5ANUc

The beauty of this stuff as it gets going is that all of these seemingly simple langs will be able to take advantage of the 2D and 3D stuff.

I always stretch myself thin, but I'm starting to see other people do really cool things with this concept of 2- and 3-dimensional languages, an area of research that has been really quiet for ~50 years. If anyone is interested, I'm telling ya, I still can't see the limit! :)


Maybe I missed it: What happens to binary data?


Good question. Base64? I forget if I added support for that.


I am sure it has been fun to do. But I would leave it at that.

I don't understand the use cases at all. The downsides you mention totally miss the point, because the comparison leaves out the upsides of the alternatives.

In my world, if I wish to share anything remotely programmable it goes into a Git repo.

I don't use zip for anything else; only if I must share something over email.

What am I missing here?


Oh, this looks similar to my "motllo" project [1] (and so many other projects; mine wasn't the first either). I have variable substitution, but no additional logic. For me the point was having a "readable" representation of the template.

[1]: https://github.com/rberenguel/motllo


Nice! I love the animated demo you have there.

I'd be happy to accept a PR adding a link to this.


Oh no need to, it's not that related and I think both stand out well on their own. As for the animated demo, it was with asciinema [1] (I think, it's usually what I have used in the past for this). Thanks for your good work!

[1]: https://asciinema.org


Interesting approach, I like it.

I’m bemused to observe that even though you include a section describing alternatives, there are still the usual number of comments saying “why not just use x” in typical HN style.

I often want a simple file based templating system, and this is a nice example because it’s closer to being a declarative standard that could be reimplemented in various languages and for any platform.


I have a couple questions:

Why should I use your tool vs. a one line bash script for creating a directory tree?

Why would I keep any file content in the stamp file, in the version control history, if I can keep, say, a bootstrap repository with some "templates" and check them out without history (git archive, git checkout-index)?
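For reference, the history-free export I have in mind is a one-liner (remote URL is a placeholder; the server must allow git-upload-archive):

    mkdir newproject &&
      git archive --remote=ssh://example.com/templates.git HEAD | tar -x -C newproject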


> Why should I use your tool

I don't think you should use my tool.

I do think you should use the Stamp pattern, however.

It's solid.

Not too much code to write a Stamp Reader/Writer in your lang of choice.
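For instance, a bare-bones reader in awk that handles a simplified subset (a `file <path>` line followed by space-indented content lines; real Stamp has more structure, so treat the keywords as assumptions):

    awk '
      /^file / { path = $2; next }            # start a new file (assumes no spaces in paths)
      /^ /     { print substr($0, 2) > path } # strip one level of indentation
    ' my.stamp

A real reader would also need a `mkdir -p` pass for directories.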

P.S. Of course if you do use my tool, I would be more motivated to make it better, but really I think the idea is the important thing to use.


It seems like a kind of relation, where the 'parent directory' can be any string, e.g. a file name. Cool. But I'd much rather just have a relational way of looking at my data objects and metadata, without this ancient, obsolete notion of 'folders'.


I'm wondering what OS you're using where folders are "obsolete". It feels like you've missed the use case here.


I've watched OSs come and go. They all make about the same mistakes: including a file system as part of the OS (instead of as just another service), and organizing it around the directory / parent directory / file model.

I'm just an old guy, moaning about how little OS technology has changed or grown in 30 years.


The trees vs graphs debate is one that constantly comes up.

I'm obviously a tree guy, but of the things that I am most uncertain of, this is one of them.

To me it seems when you dissect things, you can always make a tree, and trees are simpler, therefore everything is a tree.

However, trees don't come to life until you have motion/time, and I could see maybe how the graph is the ultimate data structure.

You can't lay out arbitrary graphs in 2 dimensions the way you can trees, given the constraint that wires cannot cross (only planar graphs can be drawn that way). You can create models of graphs, but not arbitrary graphs themselves, whereas there is no tree that you cannot lay out in 2 dimensions.



> The GitHub method - create a repo for your template and have people git clone it

>> Downside: no way of doing variable replacement post-clone

I handle this by having a setup script that does a search and replace in the project.
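Something like this, for instance (placeholder token and GNU sed assumed):

    #!/bin/sh
    # rename the template's placeholder to the real project name
    NEW_NAME="$1"
    grep -rl --exclude-dir=.git '__PROJECT_NAME__' . \
      | xargs sed -i "s/__PROJECT_NAME__/$NEW_NAME/g"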


I think diff & patch have already served us well for a few decades.


I never thought about using diff+patch as an alternative to tar, but I just tried this and it does work as expected:

    mkdir dir1 dir2 dir3
    for x in {1..99};do echo $x > dir1/$x.txt;date >> dir1/$x.txt;done
    diff -urN dir2 dir1 > dirs.diff
    cd dir3
    patch -i ../dirs.diff -p1
After that, dir3 has the same contents as dir1 had. I couldn't figure out how to make diff consider all files in a directory as new though, without having an empty directory to compare it to.


As you already use GNU diff's `-N`, you can pass any non-existent directory as the first argument:

    diff -urN /dev/null-non-existing-dir dir1 > x.diff
Or use `mktemp -d`. It is still cheaper than investing into some niche tooling without groundbreaking capabilities.


Thanks! I didn't realize you could pass in a non-existent directory, that makes it an easier trick.


I would be open to a PR adding a link to a good diff/patch way of doing this.


This is very much a solved problem.

Weird that people can think this is something that the greybeards didn't ever think of between 1970-01-01T00:00:00 and today.


[flagged]


Instead of all the snark you should have just included a link:

https://en.m.wikipedia.org/wiki/Shar


I generally am against snark, but I think that in this case it communicated something quite useful about how we can go through the same thought process that resulted in a tool that has existed forever, and not realize we have just reinvented it.


They mention shar in the readme:

> shar (https://en.wikipedia.org/wiki/Shar) - old school method that has the major security flaw of potentially running any shell command


Incidentally, this is exactly the behavior on which this project relies:

> #! /usr/local/bin/node --use_strict /usr/local/bin/tree

This has additional dependencies. I don't see much of a benefit here.


Disclaimer: I've never used this tool, today is the first time I've heard about it.

Reading the shebang of a stamp file is a lot easier than scanning a bash file for any sneaky obfuscations (including any custom logic that the author may have "helpfully" included).

Alternately, the security-conscious can execute it directly as `node --use_strict /usr/local/bin/tree my.stamp`.


> Reading the shebang of a stamp file is a lot easier than scanning a bash file for any sneaky obfuscations

This.

However, the feedback in this thread is right.

Looking back over the past few years, I almost never used stamps as executables.

Generally I always use it as a library, or via something like `unstamp someStampToExpand.stamp`. So the shebangs and executability of stamp files were stupid, and added a lot of complexity for almost no gain.

I've just gone ahead and removed them.

I think now it should be a little clearer how easy it is to write stamp functions in other langs.


> has the major security flaw of potentially running any shell command

Stamp has the major security flaw of potentially running any node.js module, which in turn could then invoke any shell command.

shar has also seen more production usage, as opposed to Stamp, which is far from battle tested. So who knows what exploitable bugs lurk inside.



