A magic getopt (daemonology.net)
123 points by Hello71 on Dec 7, 2015 | 45 comments



This is really cool (I've been wondering how to do a string-switch for a while now), but I don't think getopt is a great use case, because getopt still results in imperative argument parsing and this always results in pain.

I've been working on libargs for the past year; it's a declarative argument-parsing library for C: https://github.com/mcinglis/libargs

The main idea is that each argument is by default parsed and stored as a string, or you can optionally specify a function of the type `void f(char * name, char * arg, void * dest)` to parse the argument string and store it in a well-typed destination. This way, you can have an `int` argument by passing `int__argparse` as the parser, and if the user passes a value outside the range of `int`, then an appropriate out-of-range error is printed to the console. Similarly with `uchar__argparse` or something like `point__argparse` (e.g. taking some format like `{x,y}`).
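To make the parser-function idea concrete, here is a minimal sketch of what a range-checked `int` parser with that signature could look like. This is not libargs code: `parse_int_checked` and `int_argparse_sketch` are hypothetical names, and printing to stderr and then exiting on failure is an assumption about how such a parser might report errors.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libargs): parse `arg` as a base-10 int.
 * Returns 0 on success, -1 if `arg` is not a whole number,
 * -2 if the value does not fit in an int. */
int parse_int_checked(const char *arg, int *out)
{
    char *end = NULL;
    long v;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0')
        return -1;              /* empty string or trailing junk */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -2;              /* overflowed long, or too big for int */
    *out = (int)v;
    return 0;
}

/* A parser with the signature described above. It stands in for
 * something like int__argparse; the error handling is an assumption. */
void int_argparse_sketch(char *name, char *arg, void *dest)
{
    assert(dest != NULL);
    switch (parse_int_checked(arg, (int *)dest)) {
    case 0:
        return;
    case -1:
        fprintf(stderr, "%s: '%s' is not an integer\n", name, arg);
        break;
    default:
        fprintf(stderr, "%s: '%s' is out of range for int\n", name, arg);
        break;
    }
    exit(EXIT_FAILURE);
}
```

Splitting the check from the reporting keeps the range logic testable on its own; a `uchar` or `point` parser would follow the same shape with a different destination type.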

libargs is quite flexible and has worked well for me so far. Automatic help text generation can be added in future while maintaining (non-ABI) backwards compatibility.

The main disadvantage is that it depends on other libraries I've developed that are essentially Jinja-templated C source files that function as makeshift generic types / typeclasses in C. Your inclination towards this approach depends on taste; personally I much prefer deferring the pain to the build system, as opposed to the source code.


Fun fact, C++ reserves double-underscores anywhere in names for the implementation. It's highly unlikely that you'll run into anything colliding with your names in the wild, but if someone wants to use your library in C++, it's technically bogus.


It's similar in C. C11 Standard chapter 7.1.3:

>All identifiers that begin with an underscore and either an uppercase letter or another underscore are always reserved for any use.

I doubt the library is usable in C++ though, since longjmp doesn't play well with the destruction of local objects.


> getopt still results in imperative argument parsing

That's one of the reasons I use it, actually. For simple utilities it may be adequate to set flags and store values for command-line parameters, but sometimes you need the flexibility of being able to run whatever code you need when an option arrives.
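As a rough illustration of running code as each option arrives, here is a plain POSIX `getopt` loop. The option letters, `struct settings`, and `parse_args` are made up for the example; the point is only that `-v` can be counted and `-I` can be accumulated in order, which a simple "store one value per flag" scheme does not express directly.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_INCLUDES 8

/* Hypothetical settings for an imaginary tool. */
struct settings {
    int verbosity;                       /* bumped once per -v */
    const char *output;                  /* last -o wins */
    const char *includes[MAX_INCLUDES];  /* every -I, in order */
    int n_includes;
};

/* Returns the index of the first operand, or -1 on a usage error. */
int parse_args(int argc, char **argv, struct settings *s)
{
    int ch;

    assert(s != NULL);
    optind = 1;   /* start scanning from the first argument */
    opterr = 0;   /* we print our own usage message below */
    while ((ch = getopt(argc, argv, "vo:I:")) != -1) {
        switch (ch) {
        case 'v':
            s->verbosity++;
            break;
        case 'o':
            s->output = optarg;
            break;
        case 'I':
            if (s->n_includes == MAX_INCLUDES) {
                fprintf(stderr, "%s: too many -I options\n", argv[0]);
                return -1;
            }
            s->includes[s->n_includes++] = optarg;
            break;
        default:
            fprintf(stderr, "usage: %s [-v] [-o file] [-I dir] [operand ...]\n",
                    argv[0]);
            return -1;
        }
    }
    return optind;
}
```

Each `case` is ordinary code, so anything from validation to opening a file can happen at the moment the option is seen.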


>To accept notable contributions, I'll require you to assign your copyright to me.

Okay, but why?


Among other things, it allows the project owner to relicense the project without having to contact every contributor requesting permission to relicense.


And usually makes accepting contributions from people outside the US a mess.

Ask the FSF how long it took them to finally sort out all the legal issues and prepare new forms.


Also, people are generally more inclined to sign that for a non-profit organization like FSF or the OSGeo Foundation, than for an individual person, although that's still more accepted than signing to a company.

CLA to non-profit > CLA to individual > CLA to company


Why would a Canadian programmer have trouble accepting contributions from outside the US, specifically?


Not specifically outside the US. The US (and apparently Canada) have the concept of copyright assignment, but many other countries don't. So it gets interesting what value and consequences a copyright assignment by a programmer from one of those countries has, especially in local courts and when the contributor isn't on your side.


Don't forget that "moral rights" may not be assigned at all, in at least some countries, so you'll need a per-country agreement that the contributor won't enforce them ... in countries where it is possible to agree not to enforce them, which is not all countries.

https://en.wikipedia.org/wiki/Moral_rights#In_Europe


Anyone who designs command-line interfaces for GNU/Linux should read this: https://www.gnu.org/prep/standards/html_node/Command_002dLin... and this: https://www.gnu.org/prep/standards/html_node/Option-Table.ht...

If you are using C++, Boost.Program_options is a solid choice. It is one of the relatively few command-line argument parsers that supports close to a full set of (GNU-like) behaviors without custom workarounds. It also enables type safety in the sense of disallowing decimals where integers were expected.


Thank you for the link. It contains this tiny gem: a glimpse into a path that CGI applications have not taken:

> CGI programs should accept these as command-line options, and also if given as the PATH_INFO; for instance, visiting ‘http://example.org/p.cgi/--help’ in a browser should output the same information as invoking ‘p.cgi --help’ from the command line.


Wasn't there a massive security issue in PHP a few years back because in some CGI environments, it did in fact accept command line arguments, allowing arbitrary code execution via `php -r "<?php ...` and letting attackers view sources via `php -s`?


Regardless of CGI, I do advocate writing web services that respond to "GET /help" with a developer-facing synopsis of available routes. Pretty much the same thing.


I advocate using "OPTIONS /path" for that.


That prevents people from loading it with a plain-vanilla web browser. "GET /help" is more accessible, therefore more likely to be used. By humans, I mean.


My major pain with Boost.Program_options is that it is not header-only and is pretty massive.


The shared libraries for it on my system are less than 700 KiB.


Exactly?!



To add to this, here's a full example of conditionally using getopt that still works without it. It also includes an example of the recommended-by-GNU help and version options, which jzwinck mentioned.

https://gist.github.com/pdkl95/363d48999e9df027a99c


This still has the "list the options twice" wart though. Of course there's no way to avoid that for shell scripts, since getopt(1) has to explode the options before the rest of your script starts looking at them.


I've found LLVM's solution to be the fastest/slickest method I have thus far discovered. [0] I love that it is declarative and works across TUs. I've thought about making a standalone distribution for some time.

[0]: http://llvm.org/docs/CommandLine.html


That looks very slick - although it seems to be C++ only.



One word: docopt.


Not sure why you were downvoted. This is the way to go IMO. Declare your options like writing a man page. Let docopt generate your option handling code. There is an implementation for C. https://github.com/docopt/docopt.c


That's a generator for docopt parsers, written in Python, targeting C. The closest docopt (variant) in C that I know of is the one I wrote: https://github.com/jaroslov/docoptc . Although, at best, I'd say that code is "looking for a maintainer".


There is a decent C++ one. What's wrong with the C people? Heck, they live and die by parsers.


As I am not the first one to shamelessly self-plug: I made ngetopt.awk[0], an argument parser for GNU awk that handles long options as well.

I've been using it quite a bit to turn a three-line awk program into a full-fledged command with little effort. Very nice to share with colleagues.

[0]: https://github.com/joepvd/ngetopt.awk


Shameless plug: I wrote a Python library that not only lets you declaratively define all your options, but also works with config files too. Don't write your own logic for this. Just use https://github.com/ipartola/groper


For Python, I remember docopt (http://docopt.org/) being a big deal for a while - never used it, but sounds much nicer than writing out lots of code to construct your options parsing. Any reason why groper is a better option?


groper is better in several ways. First off, you specify all your options in Python, not a DSL that won't be checked until compile time. I mean, sure you could add code highlighting for docopt to your editor of choice, but chances are it already supports Python and works well.

Second, with groper each module is free to define its own options. You no longer need to have a centralized place where all the options are defined, and you don't need to do the plumbing through your application to give the right values to the right modules. This way your server.py can say "I want a host and a port" and your logger.py can say "I want the verbosity level and the filename" and the two don't have to know about each other (but can).

Third and most important, groper supports config files, which no other argument parser does. Chances are that if you are creating something more than just a simple command line program, you'd have too many options to specify on the command line. Instead, it'd be lovely to have a config file, and be able to override some of the options via command line args. Python's ConfigParser is a mess, and it's a mess that works completely differently from argparse/optparse/getopt. I also don't consider something like config.py to be good practice either.

So basically, groper is your one stop shop for getting config files and command line options right. Oh, and it can generate sample config files for you if you want.


https://github.com/clibs/flag is yet another approach, more like Go's flag package. Docopt is awesome too.


I considered going with something along those lines, but figured that for us crufty UNIX people it was far more likely to be useful as a drop-in replacement for an existing C getopt loop.


docopt is almost perfect. It needs to support optional flag arguments (à la `--color[=WHEN]`), and to have the first un-indented line after `Usage:` be treated as prose. The latter is on `master` of the Python (reference) implementation, but hasn't landed in a release or a spec.

Also, better errors for when the programmer screws up the formatting of the doc string.


Shameless self-plug: you may be interested in my version of docopt at https://github.com/ridiculousfish/docopt_fish . Its syntax is more forgiving, it's capable of expressing more usage specs, and it has excellent error reporting for doc strings.


Very cool! I'll definitely keep it in mind (though I've been tending to use Go for greenfield projects recently).


Almost totally unrelated, but taking the opportunity to rant. If you use OS X, your getopt is fully POSIX compliant (as are many BSDs, I think), unlike most Linux distributions. This means globs (file arguments) must come after options.

For example:

ls -la *.txt

OK

ls *.txt -la

Not OK

This drives me mad.


Yes, I deliberately did not implement that GNU bug. I also avoided implementing the --prefix-of-long-option bug.
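For anyone who hasn't run into it, I'm assuming the "prefix" behavior here means the way `getopt_long` accepts an unambiguous abbreviation of a long option. A small sketch of that behavior follows; the option table and `first_long_option` are invented for illustration, and the results in the comments are what I'd expect from glibc.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical helper: return whatever getopt_long reports for the
 * first option in argv, using a small made-up option table. */
int first_long_option(int argc, char **argv)
{
    static const struct option longopts[] = {
        {"verbose", no_argument,       NULL, 'v'},
        {"version", no_argument,       NULL, 'V'},
        {"output",  required_argument, NULL, 'o'},
        {NULL, 0, NULL, 0}
    };

    assert(argv != NULL);
    optind = 1;   /* start scanning from the first argument */
    opterr = 0;   /* keep getopt_long quiet on errors */
    /* "--verb" is a unique prefix of "verbose", so it is accepted as 'v';
     * "--ver" matches both "verbose" and "version", so it is rejected. */
    return getopt_long(argc, argv, "", longopts, NULL);
}
```

So a script that passes `--out` keeps working until the program grows an `--outline` option, at which point the abbreviation becomes ambiguous.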


It makes me very happy that you refer to this as a bug :)


uhm, sorry? shouldn't glob be handled by the shell? I mean, the "ls" program does not handle literally '*.txt', but "foo.txt bar.txt baz.txt"


Right but ls calls getopt – and determines the meaning of the values provided.

Say the glob is *.txt and the directory contains a.txt, b.txt and c.txt. If I call

ls *.txt -ltra

it expands to ls a.txt b.txt c.txt -ltra

POSIX getopt expects options to appear before files, see: https://en.wikipedia.org/wiki/Getopt#Example_1_.28using_POSI...

Trust me, I wish this were not adhered to, but it is.
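Here is a small sketch of that stopping behavior, for anyone who wants to see it from the getopt side. `count_leading_options` is a made-up helper. Setting `POSIXLY_CORRECT` is how glibc is asked to stop at the first operand instead of permuting `argv`; I'm assuming BSD-style implementations already behave that way without it.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() and setenv() */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: count the option characters getopt reports
 * before it stops, and record where the operands begin. */
int count_leading_options(int argc, char **argv, const char *optstring,
                          int *first_operand)
{
    int n = 0;

    assert(first_operand != NULL);
    /* glibc reorders argv by default so that trailing options are still
     * found; POSIXLY_CORRECT switches that off. */
    setenv("POSIXLY_CORRECT", "1", 1);
    optind = 1;
    opterr = 0;
    while (getopt(argc, argv, optstring) != -1)
        n++;
    *first_operand = optind;
    return n;
}
```

With `ls a.txt b.txt c.txt -ltra`, scanning stops at `a.txt`, so no options are seen and `-ltra` is left to be treated as a file name. With `ls -ltra a.txt b.txt c.txt`, all four flags are reported before the operands start.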


I can recommend CCAN's opt module: http://ccodearchive.net/info/opt.html



