I mentioned this in a separate comment but it may be worth bearing in mind how t...

I mentioned this in a separate comment but it may be worth bearing in mind how the AI pipeline works, in that you’re not pushing all this data into an LLM and asking it to produce graphs, which would be prone to some terrible errors.

Instead, you’re using the LLM to generate Python code that runs using normal libraries like Pandas and gnuplot. When it makes errors it’s usually generating totally the wrong graphs rather than inaccurate data, and you can quickly ask it “how many X Y Z” and use that to spot check the graphs before you proceed.

My initial version of this began in a spreadsheet so it’s not like you need sophisticated analysis to check this stuff. Hope that explains it!