They're memory bandwidth limited, you can basically just estimate the performanc...

		nullc 9 days ago \| parent \| context \| favorite \| on: AMD's Turin: 5th Gen EPYC Launched They're memory bandwidth limited, you can basically just estimate the performance from the time it takes to read the entire model from ram for each token.