One thing people (myself included!) often miss, though, is that Metropolis is extremely prone to flicker. I think the best summary of why comes from Kalos and Whitlock (2nd edition):
> The M(RT)2 algorithm is very simple and powerful; it can be used to sample essentially any distribution function regardless of analytic complexity in any number of dimensions. Complementary disadvantages are that sampling is correct only asymptotically and that successive variables produced are correlated, often very strongly. This means that the evaluation of integrals normally produces positive correlations in the values of the integrand, with consequent increase in variance for a fixed number of steps as compared with independent samples. Also the method is not well suited to sampling distributions with parameters that change frequently.
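To make the "successive variables produced are correlated" point concrete, here is a minimal random-walk Metropolis sketch. Everything in it (the standard-normal target, the proposal scale, the seed) is an assumption chosen for illustration, not the post's own code:

```python
import math
import random

def metropolis(log_p, x0, step, n, seed=0):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and
    accept with probability min(1, p(x') / p(x))."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n):
        prop = x + rng.gauss(0.0, step)
        # Acceptance test done in log space for numerical stability.
        if math.log(rng.random()) < log_p(prop) - log_p(x):
            x = prop
        samples.append(x)  # on rejection the current state repeats
    return samples

# Target: standard normal; an unnormalised log-density is enough.
samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, step=1.0, n=50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Note that a rejected proposal repeats the current state, so consecutive samples are correlated by construction; that correlation is exactly what the quoted passage is about.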
The thinning here is based on a misconception. If the purpose is plotting a histogram, it doesn't matter that the samples are correlated: the bin heights are consistently estimated if you keep everything. Throwing away intermediate steps usually just introduces noise.
Thinning is a good idea if the samples are so strongly correlated that the computation spent processing them all would be better spent running the chain for longer, or if you don't know in advance what you'll want to compute and can't afford to store everything.
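A toy experiment illustrates the "thinning just introduces noise" claim. The setup is an assumption (random-walk Metropolis on a standard normal, thinning factor 50, a handful of seeds), but it shows the pattern: estimates from the full chain have lower error than estimates from the thinned version of the same chain, because thinning discards information without making the kept samples any better:

```python
import math
import random

def chain(n, seed):
    """Random-walk Metropolis targeting a standard normal (toy setup)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        prop = x + rng.gauss(0.0, 1.0)
        # log p(x) = -x^2 / 2, so the log acceptance ratio is (x^2 - x'^2) / 2.
        if math.log(rng.random()) < 0.5 * (x * x - prop * prop):
            x = prop
        out.append(x)
    return out

def rmse(estimates, truth=0.0):
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

full_means, thinned_means = [], []
for seed in range(40):
    xs = chain(20_000, seed)
    full_means.append(sum(xs) / len(xs))
    thin = xs[::50]  # keep every 50th sample, discard the rest
    thinned_means.append(sum(thin) / len(thin))

# Error of the estimated mean (true value 0) across the 40 chains.
rmse_full = rmse(full_means)
rmse_thin = rmse(thinned_means)
```

Both estimators are consistent, but the thinned one is noisier for the same chain length; thinning only pays off when storage or per-sample processing is the binding constraint.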
(Also, as someone points out in the comments, the implementation of the Metropolis algorithm in this post is wrong.)