Can you elaborate on this, please? My goal is to work in text processing professionally, and my intuition is that text should 'learn' its own features. I'm learning Python, reading about machine learning, and practicing my coding all the time.
Recently I had to build an algorithm that did a form of substring detection. As part of that, I had to generate feature vectors for the model that classified each token of the string as part-of-substring or not. Before that, though, I had to do several passes of tokenization, normalization, and preprocessing to get the raw text into a form the substring classifier could use effectively.
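To give a flavor of what I mean, here's a toy sketch of that per-token featurization step. The tokenizer, the normalization, and the features themselves (casing, length, digit-ness, position) are all made-up stand-ins, not the actual ones I used:

```python
import re

def tokenize(text):
    # crude whitespace/punctuation tokenizer (illustrative only)
    return re.findall(r"\w+|[^\w\s]", text)

def normalize(token):
    # minimal normalization: lowercase
    return token.lower()

def featurize(tokens):
    # one hand-built feature vector per token, for a downstream
    # part-of-substring classifier to consume
    feats = []
    for i, tok in enumerate(tokens):
        norm = normalize(tok)
        feats.append([
            int(tok[0].isupper()),       # starts with a capital letter
            len(norm),                   # token length
            int(norm.isdigit()),         # purely numeric token
            i / max(len(tokens) - 1, 1)  # relative position in the string
        ])
    return feats

tokens = tokenize("Order ID 12345 shipped.")
vectors = featurize(tokens)
```

The point is that every one of those features is something I had to decide on and compute by hand before the model ever saw the data.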
Ideally, if you have a good parse of the text from step 0, you don't need to do nearly as much munging/processing yourself, and can just focus on the thrust of your specific algo rather than on cleaning and generating the feature data that drives it.
I'm personally very skeptical as to whether this will make feature engineering disappear entirely, if for no other reason than that we do a lot to tweak our features beyond just _getting_ them, whether that's second-order processing, aggregation, smoothing, or transformation. That said, like your parent post, I am VERY hopeful for new techniques that can cut into that overhead at least a little.
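Concretely, by "tweaking beyond getting" I mean steps like these, sketched with hypothetical helpers (a moving-average smoother over a per-token score, and mean-pooling per-token vectors into one aggregate vector). Even with perfect raw features, you'd still write code like this:

```python
def smooth(values, window=3):
    # centered moving average over a per-token score,
    # shrinking the window at the edges
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def aggregate(vectors):
    # second-order step: mean-pool per-token feature vectors
    # into a single document-level vector
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

smoothed = smooth([0, 0, 1, 0, 0])
pooled = aggregate([[1, 2], [3, 4]])
```

Neither of those is feature *extraction*; they're the downstream massaging that I don't see learned representations eliminating on their own.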