Not a priori. In practice, I'd guess that if we destructure a typical video frame, it will have a header, which is somewhat tree-like, and a payload, which we will neither touch nor decode. In that case we avoid the copy for the heavy part of the frame, i.e., the payload.
A good trick: once you have received a full frame from the network, keep a reference to the whole frame and decode the header on top of it as needed. Because the frame is already complete, no copy of it will occur while you are processing it. The redundancy this introduces (the header bytes exist both in the raw frame and in decoded form) is likely to be a small overhead with this approach.
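The header-on-top-of-the-frame trick can be sketched outside Erlang as well. Below is a minimal Python sketch, assuming a made-up frame layout (width u16, height u16, payload length u32, big-endian) rather than any real video codec; the names `HEADER`, `Frame` and `parse_frame` are hypothetical. Python's `memoryview` plays a role roughly similar to the sub-binaries you get when matching on a large Erlang binary: the payload is a window into the original buffer, not a copy.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout (an assumption, not a real codec):
# width (u16), height (u16), payload length (u32), all big-endian.
HEADER = struct.Struct(">HHI")


@dataclass(frozen=True)
class Frame:
    raw: bytes            # reference to the full frame as received
    width: int            # decoded header field
    height: int           # decoded header field
    payload: memoryview   # zero-copy window into `raw`


def parse_frame(raw: bytes) -> Frame:
    """Decode only the small header; leave the payload as a view into raw."""
    view = memoryview(raw)
    # unpack_from reads the header directly from the buffer.
    width, height, length = HEADER.unpack_from(view, 0)
    start = HEADER.size
    end = start + length
    if end > len(raw):
        raise ValueError("truncated frame")
    # Slicing a memoryview shares the underlying bytes instead of copying them.
    return Frame(raw, width, height, view[start:end])


if __name__ == "__main__":
    raw = HEADER.pack(640, 480, 5) + b"hello"
    frame = parse_frame(raw)
    print(frame.width, frame.height, bytes(frame.payload))
    print(frame.payload.obj is raw)  # the payload still points into `raw`
```

Only the few header bytes are turned into Python integers; the payload stays in the buffer that came off the network for as long as the `Frame` keeps it alive.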
About 10 years ago, I did some work in Erlang where I was moving around 16-kilobyte blocks of data. I managed 800 Mbit/s on the hardware of that generation. Even setting aside improvements in hardware, I think Erlang is much better at handling this nowadays.
As for structural sharing between nodes: it is possible, but only for data which is written very rarely and read quite often. In practice, the copying overhead in typical usage scenarios tends to be much cheaper than people fear. One reason is that the copy gives you cache locality, though possibly at the cost of increased cache pressure.