Title is a little general. This is specifically a technique for breaking down text analysis, where the goal is to give semantic meaning to a block of text. In their example, they want to condense beer reviews into star ratings across a few categories. A totally black-box technique would take the review and spit out the scores, whereas their technique has two jointly trained networks: one identifies the relevant text fragments for each category, and the other assigns the category score to each selected fragment.
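To make the two-network idea concrete, here's a rough numpy sketch of just the forward pass, not the paper's actual architecture: the fragment embeddings, weight matrices (`W_sel`, `W_pred`), and dimensions are all invented for illustration, attention stands in for hard fragment selection, and the real system trains both parts jointly rather than using random untrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a review split into fragments, each already embedded as a
# vector. (All dimensions and weights here are made up for illustration.)
n_fragments, emb_dim, n_categories = 5, 8, 3
fragments = rng.normal(size=(n_fragments, emb_dim))

# Network 1 ("selector"): scores how relevant each fragment is to each
# category, then softmaxes over fragments so each category picks out
# a weighted subset of the text.
W_sel = rng.normal(size=(emb_dim, n_categories))
logits = fragments @ W_sel                      # (n_fragments, n_categories)
attn = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Network 2 ("predictor"): rates each category from its selected
# (attention-weighted) text, squashed onto a 0-5 star scale.
W_pred = rng.normal(size=(emb_dim,))
selected = attn.T @ fragments                   # (n_categories, emb_dim)
stars = 5.0 / (1.0 + np.exp(-(selected @ W_pred)))

print(stars)                 # one star rating per category
print(attn.argmax(axis=0))   # which fragment each category attends to most
```

In the real method the selector's choices are what make the model interpretable: you can show a human which sentence drove the "aroma" score, which is exactly what an end-to-end black box can't do.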
This is not groundbreaking, but it's still a good example of a larger trend in trying to understand neural network decision making. Here's a cool paper that analyzes how CNNs can learn image features for attributes like "fuzziness" and other higher-level visual constructs while training for object recognition: https://pdfs.semanticscholar.org/3b31/9645bfdc67da7d02db766e...
From a business point of view (getting executives to want to use ML), understanding "the black box" is important. But the two-step process you outline would tend to be less accurate than a one-step process, no?