> A somehow better approach is to parse the original PDF, disassemble it into pieces, and then reassemble them into a new PDF only using the “trusted” pieces
I wish this approach were used more often, since it also makes it easy to deprecate features in your file formats. What you usually get is a huge mess of code that supports everything that ever existed, and often even the standards don't drop cruft, all in the name of backward compatibility.
The current approach leads to big, unmaintainable codebases riddled with security holes. Font parsers are a good example of this, as the Google Project Zero series on font parsing vulnerabilities shows:
http://googleprojectzero.blogspot.de/2015/07/one-font-vulner...
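To make the "disassemble, then reassemble only the trusted pieces" idea concrete, here is a rough sketch on a made-up container format, not on real PDF. Everything in it is an assumption for illustration: the chunk layout (4-byte tag, 4-byte big-endian length, payload), the tag names, and the `TRUSTED_TAGS` allowlist are all hypothetical.

```python
import struct

# Hypothetical allowlist: the only chunk types the rebuilt file may contain.
# Dropping a tag from this set is how a feature would get deprecated.
TRUSTED_TAGS = {b"TEXT", b"PAGE"}


def parse_chunks(data: bytes):
    """Split the toy container into (tag, payload) pairs, rejecting malformed input."""
    chunks = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError("truncated chunk header")
        tag, length = struct.unpack_from(">4sI", data, offset)
        offset += 8
        if length > len(data) - offset:
            raise ValueError("chunk length exceeds remaining data")
        chunks.append((tag, data[offset:offset + length]))
        offset += length
    return chunks


def build(chunks) -> bytes:
    """Serialize (tag, payload) pairs from scratch, so lengths are always recomputed."""
    return b"".join(
        struct.pack(">4sI", tag, len(payload)) + payload for tag, payload in chunks
    )


def sanitize(data: bytes) -> bytes:
    """Parse the input, keep only allowlisted chunks, and write a brand new file."""
    return build([(tag, payload) for tag, payload in parse_chunks(data)
                  if tag in TRUSTED_TAGS])


original = build([(b"TEXT", b"hello"), (b"JSCR", b"evil()"), (b"PAGE", b"1")])
clean = sanitize(original)
print(parse_chunks(clean))
```

The output file never reuses bytes of the original structure, only payloads of chunk types on the allowlist. A real sanitizer would also have to validate or re-encode the payloads themselves, since copying them verbatim is only as safe as their contents.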