> How many read from csv functions would we be left with?
It probably couldn't be that, because many build on one another. Some are deprecated and others are clearly incompatible, but out of 50 parameters you could easily imagine calling this with 20 if the environment and the CSV you're ingesting are wonky enough.
I think feasible refactorings would be:
- rationalise currently separate parameters into meatier objects, e.g. there are at least half a dozen parameters which deal with date parsing, a dozen which configure the low-level CSV parsing, etc. that could probably be coalesced into configuration objects
- a builder-type API, but you'd end up at the same result using intermediate steps instead of a function; not really useful unless you leverage the first option and each builder step configures a non-trivial amount of the system, so rather than 50 parameters you'd have maybe 10 builder steps, each with 0~10 knobs
- or you'd build the thing as a bunch of composable transformers on top of a base parser
Of note: the last option at least might be undesirable from the Pandas POV, as it would imply layers of nested Python calls, which might be much slower than whatever Pandas currently does (I've no idea).
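To make the "composable transformers on top of a base parser" idea concrete, here's a rough stdlib-only sketch. None of this is Pandas API: `base_parser`, `skip_blank`, `cast` and `pipeline` are names made up for illustration, and it works on plain dicts rather than DataFrames.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]
Transformer = Callable[[Iterable[Row]], Iterator[Row]]


def base_parser(text: str, delimiter: str = ",") -> Iterator[Row]:
    """Hypothetical base layer: only knows how to split text into rows."""
    yield from csv.DictReader(io.StringIO(text), delimiter=delimiter)


def skip_blank(column: str) -> Transformer:
    """Hypothetical transformer: drop rows whose `column` is empty."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        return (row for row in rows if row[column].strip())
    return apply


def cast(column: str, to: Callable[[str], Any]) -> Transformer:
    """Hypothetical transformer: convert one column with `to`."""
    def apply(rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            yield {**row, column: to(row[column])}
    return apply


def pipeline(rows: Iterable[Row], *steps: Transformer) -> Iterable[Row]:
    """Stack the transformers; each one wraps the iterator before it."""
    for step in steps:
        rows = step(rows)
    return rows


SAMPLE = "name,qty\nbolt,3\nnut,\nwasher,12\n"
result = list(pipeline(base_parser(SAMPLE), skip_blank("qty"), cast("qty", int)))
```

Every row passes through one generator frame per transformer, which is the per-row Python overhead the comment above is worried about.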
I think that this style (such as it is) comes from R, and scientific computing more generally. I grew up with R and never realised how terrible functions with long argument lists are until relatively recently.
You can see how ReadOptions is written at this link [2]. It's interesting that they use a `cdef class` from `Cython` for this.
This doesn't solve all issues (the ReadOptions object and the others will inevitably have a bunch of default arguments) but I do think it's safer and it's easier to have a mental map of the things you need to decide and what's decided for you.
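A rough sketch of what grouping the knobs into options objects could look like, using plain dataclasses on top of the stdlib `csv` module. To be clear, this is not the code behind the link above nor anything Pandas exposes: `ParseOptions`, `DateOptions`, `read_csv_sketch` and their fields are all hypothetical names picked for the example.

```python
import csv
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List, Sequence


@dataclass(frozen=True)
class ParseOptions:
    """Hypothetical group for the low-level CSV parsing knobs."""
    delimiter: str = ","
    quotechar: str = '"'
    skip_rows: int = 0


@dataclass(frozen=True)
class DateOptions:
    """Hypothetical group for the date-parsing knobs."""
    columns: Sequence[str] = ()
    fmt: str = "%Y-%m-%d"


def read_csv_sketch(
    text: str,
    parse: ParseOptions = ParseOptions(),
    dates: DateOptions = DateOptions(),
) -> List[Dict[str, Any]]:
    """Two grouped arguments instead of one flat keyword per knob."""
    lines = text.splitlines(keepends=True)[parse.skip_rows:]
    reader = csv.DictReader(
        lines, delimiter=parse.delimiter, quotechar=parse.quotechar
    )
    rows: List[Dict[str, Any]] = []
    for row in reader:
        for col in dates.columns:
            row[col] = datetime.strptime(row[col], dates.fmt).date()
        rows.append(row)
    return rows


text = "# exported by some tool\nday;value\n2024-01-31;7\n"
rows = read_csv_sketch(
    text,
    parse=ParseOptions(delimiter=";", skip_rows=1),
    dates=DateOptions(columns=("day",)),
)
```

The defaults are still there, they've just moved into the dataclasses, so the call site only spells out the decisions that differ from them.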
So you end up at the same point, but now you need additional intermediate structures and infrastructure which do nothing to help. And for Python specifically it's also a pain in the ass to format due to the whitespace sensitivity.
But at the same time I wonder how it would look refactored. How many read from csv functions would we be left with?