then you just need some way to calculate various merits of the different designs. one merit is simplicity, which you could maybe try to operationalize as something like this:
comparing two merit-maps such as {'simplicity': 8} and {'simplicity': 6} for pareto dominance (is one at least as good on every merit and strictly better on some?) is simple enough with some thought, especially if we assume the keys are the same:
>>> some_way_better = lambda a, b: any(a[k] > b[k] for k in a)
>>> defeats = lambda a, b: some_way_better(a, b) and not some_way_better(b, a)
>>> defeats({'simplicity': 9}, {'simplicity': 8})
True
>>> defeats({'simplicity': 8}, {'simplicity': 9})
False
>>> defeats({'simplicity': 9, 'turing-complete': 0}, {'simplicity': 8, 'turing-complete': 1})
False
>>> defeats({'simplicity': 9, 'turing-complete': 1}, {'simplicity': 8, 'turing-complete': 1})
True
then it's easy enough to calculate which of a set of candidate designs can be eliminated because some other design defeats them
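a minimal sketch of that elimination step, assuming every design has the same merit keys. the name `undefeated`, the design names 'A'/'B'/'C', and their merit numbers are all made up for illustration; `defeats` is just the lambda from above written as a def:

```python
def some_way_better(a, b):
    # a beats b on at least one merit
    return any(a[k] > b[k] for k in a)

def defeats(a, b):
    # a is better somewhere and b is better nowhere
    return some_way_better(a, b) and not some_way_better(b, a)

def undefeated(designs):
    # hypothetical helper: keep the names of designs that no other design defeats
    return [name for name, merits in designs.items()
            if not any(defeats(other, merits)
                       for other_name, other in designs.items()
                       if other_name != name)]

# made-up merit numbers, purely for illustration
candidates = {
    'A': {'simplicity': 9, 'turing-complete': 0},
    'B': {'simplicity': 8, 'turing-complete': 1},
    'C': {'simplicity': 7, 'turing-complete': 1},
}
print(undefeated(candidates))  # ['A', 'B']
```

here B defeats C, so C drops out, while A and B each win on a different merit and both survive. this compares every pair, so it's quadratic in the number of candidates, which is probably fine for a handful of designs.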
the bogus part is how you automatically calculate the various merits of a hypothetical collection of language features
does anyone have an idea?