We have some common cause / need in the Nix/nixpkgs ecosystem, where finding missing package dependencies (whether they're bare invocations or absolute paths) is a bit of an ongoing problem.
The golden path stuff isn't too hard to find during packaging, but there's a long tail of stuff that only shows up when you hold something exactly right.
We can (with some effort) handle a decent fraction of posix/bash shell statically with https://github.com/abathur/resholve. I think a similar approach could generalize to most interpreted languages (but I haven't gotten around to it myself and I think it's broadly rarer beyond Shell).
I have taken one or two steps into a lower-fidelity approach (https://github.com/abathur/binlore) that uses YARA rules to scan for signs of ~exec in both binary formats and interpreted languages. For now this is mostly just a top-N formats thing, but there's no reason we can't keep going. One of the big advantages is that it's fairly cross-platform, even if accuracy is suboptimal.
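To make the scanning idea concrete, here's a rough sketch of the general technique. It is not binlore's actual rules: plain Python byte regexes stand in for YARA, and the marker lists and the `scan_bytes` / `scan_file` names are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical marker lists, NOT binlore's real YARA rules. Plain byte
# regexes stand in for YARA so the sketch runs with only the stdlib.
EXEC_MARKERS = {
    # exec-family symbol names as they tend to appear in native binaries;
    # leading underscores are allowed because Mach-O prefixes C symbols.
    "native_exec": re.compile(
        rb"(?<![A-Za-z0-9])_*"
        rb"(execve|execvpe?|execv|execlp|execle|execl|posix_spawnp?|popen)"
        rb"(?![A-Za-z0-9_])"
    ),
    # common ways a Python script launches other programs
    "python_exec": re.compile(
        rb"(?<![A-Za-z0-9_.])"
        rb"(subprocess\.(?:run|call|Popen|check_call|check_output)"
        rb"|os\.(?:system|popen|exec[lv]p?e?|posix_spawnp?))"
        rb"(?![A-Za-z0-9_])"
    ),
}


def scan_bytes(data: bytes) -> dict[str, list[str]]:
    """Return {marker family: sorted unique names seen} for raw file contents."""
    hits: dict[str, list[str]] = {}
    for family, pattern in EXEC_MARKERS.items():
        found = sorted({m.group(1).decode("ascii") for m in pattern.finditer(data)})
        if found:
            hits[family] = found
    return hits


def scan_file(path: str) -> dict[str, list[str]]:
    """Scan one file on disk; the format doesn't matter, we only look at bytes."""
    return scan_bytes(Path(path).read_bytes())
```

Something like `scan_file("/path/to/some/executable")` gives back a dict of marker families that hit. A hit only says "something in here _might_ exec", which is the low-fidelity part; it says nothing about what gets executed or whether the CLI arguments feed into it.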
The scanning approach isn't exactly amazing for resholve's specific needs. I'm stepping down this path to try to identify executables that likely exec their command-line arguments--because this is a sign that resholve needs to know how to check invocations of that command for other executables. But it's a good bit better at flagging things that have _any_ exec behavior, and that's a decent-enough first step for letting us focus on a smaller set.
Once we find these, though, we're generally left with human analysis of the help, manpage, and/or source to figure out what exec the scanner is hitting on and whether it's part of the CLI.
I've been ~snoozing on this front for a while, but I feel like a decent fraction of the components for building something like an 80% toolkit exist in at least a primitive form. It could probably benefit from some more static source analysis bits. I also vaguely intend to look into whether LLMs can do a reasonably-good job of making it easier to answer these questions from the help/man/source. Unless something out in that space can be reliably automated, it also needs some way to ~pin assertions about the arguments to the relevant code in a way that they'll break if that specific corner of the codebase changes. (I've wondered if semgrep expressions could handle this.)
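A minimal sketch of what ~pinning could look like, assuming a much dumber mechanism than semgrep: hash the snippet of source the assertion depends on, and treat the assertion as stale once that hash changes. Everything here (`PinnedAssertion`, the marker-based region lookup, the whitespace normalization) is hypothetical and not something any of the tools above do today.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedAssertion:
    claim: str         # human-written statement about the CLI
    start_marker: str  # text that opens the relevant region of source
    end_marker: str    # text that closes it
    sha256: str        # fingerprint of the region when the claim was made


def extract_region(source: str, start_marker: str, end_marker: str) -> str:
    """Slice from the first start_marker through the next end_marker.

    Raises ValueError if either marker is missing.
    """
    start = source.index(start_marker)
    end = source.index(end_marker, start) + len(end_marker)
    return source[start:end]


def fingerprint(region: str) -> str:
    """Hash the region, ignoring indentation and blank lines."""
    lines = [line.strip() for line in region.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def pin(claim: str, source: str, start_marker: str, end_marker: str) -> PinnedAssertion:
    """Record a claim together with a fingerprint of the code it describes."""
    region = extract_region(source, start_marker, end_marker)
    return PinnedAssertion(claim, start_marker, end_marker, fingerprint(region))


def still_holds(assertion: PinnedAssertion, source: str) -> bool:
    """False once the pinned region has changed or can no longer be found."""
    try:
        region = extract_region(source, assertion.start_marker, assertion.end_marker)
    except ValueError:
        return False
    return fingerprint(region) == assertion.sha256
```

Usage would be roughly: `pin("execs argv[1] with the remaining args", old_source, "execvp(", ";")` at review time, then `still_holds(pinned, new_source)` on every update. A content hash is brittle on purpose (any edit to the region forces a re-review), which is the opposite tradeoff from a structural match like a semgrep expression would give.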
There are also some other bits around that might fit in, though cross-platform is a common-ish problem. We've looked a little at using a FUSE, but that is hard to operationalize on macOS. I also poked at 3 or so less-interactive dynamic approaches a few years back in https://github.com/abathur/commandeer/ and I'm sure a few more have cut their teeth since then. I also wrote a ~shelved-for-now WIP (https://github.com/abathur/faffer) for using some cursed shell metaprogramming to do something like fuzz/tree-shake a shell script for loose dependencies. That isn't directly viable for many languages, but its focus on ~overloading flow-control structures to make it easier to force branch coverage does seem like something that might be feasible with rewriting tools.