What perceptrons feed into the next layer are not features- they're activations. They can't be used as features because they don't represent attributes of some concept.
CNNs invent new features that can be used further down the track- sure, but the end result is not something you can use any further. You can't, for example, take a learned concept of a dog and then use it as a feature to learn about pets.
With language learning pipelines again, you're not adding a newly learned model to a set of features- instead, you use that model to label parts of your data that were not previously labelled.
What I mean (what my research group does) is more like what is discussed by François Chollet of Keras, here:

https://blog.keras.io/the-future-of-deep-learning.html

The kind of program-like modularity he's describing is missing from modern statistical classifiers, deep nets included.
> What perceptrons feed into the next layer are not features- they're activations.
> They can't be used as features because they don't represent attributes of some concept.
I think we have very different definitions of what a feature is. You can use these activations as features in a model.
> With language learning pipelines again, you're not adding a newly learned model to a set of features- instead, you use that model to label parts of your data that were not previously labelled.
I'm afraid I don't understand what you are trying to say. In an NLP pipeline you use the predictions of several models as features for your final model. How is that not composable and modular?
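(Concretely, "predictions as features" looks roughly like the sketch below. This is only an illustration - scikit-learn, the synthetic data and the two stand-in upstream models are assumptions, not anything specified in the thread.)

  # Sketch: predictions of two upstream models used as extra feature columns
  # for a final model (scikit-learn, synthetic data, illustrative only).
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 6))                  # raw features
  y = (X[:, 0] + X[:, 3] > 0).astype(int)        # final target

  # Two upstream models solving their own auxiliary tasks (think POS tags,
  # named entities, etc. in an NLP pipeline).
  m1 = LogisticRegression().fit(X[:, :3], (X[:, 1] > 0).astype(int))
  m2 = LogisticRegression().fit(X[:, 3:], (X[:, 4] > 0).astype(int))

  # Their predictions are appended to the feature matrix of the final model.
  X_final = np.column_stack([X,
                             m1.predict_proba(X[:, :3])[:, 1],
                             m2.predict_proba(X[:, 3:])[:, 1]])
  final_model = LogisticRegression().fit(X_final, y)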
>> I think we have very different definitions of what a feature is. You can use these activations as features in a model.
Let me backtrack a bit, to where I phrased the issue thusly:
> Say, if you train a machine learning classifier C1 to recognise class Y1 from features F1,...,Fn, you can't then take the model of Y1 built by C1 and give it to a different classifier, C2, as a feature in a new feature vector Fn+1,...,Fn+k to learn a different class, Y2.
When you train a statistical machine learning classifier - let's take a simple linear model, for simplicity - what you get as output is a vector of numbers: the parameters of a function. That's your model.
You can't use this vector of numbers as a feature. It is not the value of any one attribute - it's a set of parameters. So you can't just add it to your existing features, because your features are the values of attributes and the model is a set of parameters meant to be combined with those attributes.
What you can do is take your newly trained model and use it to label the instances you have so far with the class labels it assigns to them. Now, that's a pipeline alright. For instance, if you had a linear model with features "height" and "weight" that learned to label instances with "1" for male and "-1" for female, you could go through your data, label every instance with a "1" or "-1", and then train again to learn a model of "age". At that point you have a new feature, but it is not the concept you learned in the previous session - it is only that concept's labelling of your particular data. Now you can try to learn a new concept from this extended set of features, but the original concept ("sex") may or may not be part of it. It may turn out that "sex" is not necessary for learning "age" (it's redundant); or it may be necessary, but in that case you have to learn the concept of "sex" all over again as part of learning "age".
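(A minimal sketch of that "label, then retrain" step - scikit-learn, the synthetic data and the thresholds are made up, since no concrete dataset is given above:)

  # Sketch of the pipeline above: train a "sex" classifier, append the labels
  # it assigns as a new column, then train an "age" model (synthetic data).
  import numpy as np
  from sklearn.linear_model import LogisticRegression, Ridge

  rng = np.random.default_rng(0)
  height = rng.normal(170, 10, 300)
  weight = rng.normal(70, 12, 300)
  X = np.column_stack([height, weight])
  y_sex = np.where(height + 0.5 * weight > 205, 1, -1)   # 1 = male, -1 = female

  sex_model = LogisticRegression().fit(X, y_sex)

  # The model itself is a parameter vector (sex_model.coef_), not a feature;
  # what can be added to the data are the labels it assigns.
  X_extended = np.column_stack([X, sex_model.predict(X)])

  y_age = rng.normal(40, 12, 300)                         # made-up "age" target
  age_model = Ridge().fit(X_extended, y_age)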
By contrast, the class of algorithms I study, Inductive Logic Programming algorithms, can add the models they learn to their features (features are called "background knowledge" in ILP) and go on learning. For instance, such an algorithm can learn "parent" from examples of "father" and "mother", then "grandfather" from the original examples of "father" plus the learned concept "parent", and "grandmother" from "mother" and "parent", then "grandparent" from "grandfather" and "grandmother", and so on. Each time, the newly learned concept can be added to the learning algorithm's store of background knowledge as it is - you don't need to go through the data and label it. That's because the representation of "data" and "concept" is the same, so you can interchange them at will.
Say, your background knowledge on "father" and "mother" might look like this:
  father('Earendil', 'Tuor').
  mother('Earendil', 'Idril').
From that you can learn "parent" that might look something like this:
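  parent(X, Y) :- father(X, Y).
  parent(X, Y) :- mother(X, Y).

Since the learned clauses have the same representation as the facts, they can go straight back into the background knowledge and be used to learn the next concept. Reading the facts above as "the father/mother of X is Y", the learned "grandfather" might then come out as something like:

  grandfather(X, Z) :- parent(X, Y), father(Y, Z).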