__s basically covers it, but my initial switch from pure Python to NumPy just involved changing my vector 'class' (just a tuple in reality) into NumPy arrays. Whilst the basic vector multiplications and additions became way faster, the overhead of creating hundreds of tiny NumPy arrays was a killer.
So, just like in the original article, instead of creating a single 1x3 array I created an Mx3 array, where M was as many rays as I could fit into memory at once (I have quite a weedy machine).
Because of how NumPy broadcasting works, exactly the same code for, say, subtracting the origin from a single ray vector also subtracts that one origin vector from every row of an Mx3 array of ray vectors.
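
A minimal sketch of that broadcasting behaviour. The function name `offset_from_origin` and the sample values are made up for illustration and are not taken from the actual ray tracer; the only assumption is that vectors are stored as float arrays with 3 components in the last axis.

```python
import numpy as np

def offset_from_origin(points, origin):
    """Subtract origin (shape (3,)) from one vector (3,) or a batch (M, 3)."""
    # NumPy broadcasts the (3,) origin across every row of an (M, 3) array,
    # so this one line handles both the single-vector and the batched case.
    return points - origin

origin = np.array([1.0, 2.0, 3.0])

# Single vector: the "one tiny array per ray" style.
single = np.array([4.0, 6.0, 8.0])
print(offset_from_origin(single, origin))

# Batch of M = 3 vectors: same function, no loop, one array allocation.
batch = np.array([[4.0, 6.0, 8.0],
                  [1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])
result = offset_from_origin(batch, origin)
print(result.shape)
```

The batched call returns one (M, 3) array instead of M separate small arrays, which is where the saving on array-creation overhead comes from.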