
>> I think we have very different definitions on what a feature is. You can use these activations as features in a model.

Let me backtrack a bit, to where I phrased the issue as follows:

Say, if you train a machine learning classifier C1 to recognise class Y1 from features F1,...,Fn, you can't then take the model of Y1 built by C1 and give it to a different classifier, C2, as a feature in a new feature vector Fn+1,...,Fn+k to learn a different class, Y2.

When you train a statistical machine learning classifier - let's take a simple linear model, for simplicity - what you get as output is a vector of numbers: the parameters of a function. That's your model.

You can't use this vector of numbers as a feature. It is not the value of any one attribute - it's a set of parameters. So you can't just add it to your existing features: your features are the values of attributes, while the model is a set of parameters meant to be combined with those values.
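
To make this concrete, here is a minimal sketch in Python with scikit-learn (the library, the data and the choice of logistic regression are my own, purely for illustration): all the training run hands back is a coefficient vector, one weight per feature, plus an intercept.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  # Features: height (cm) and weight (kg); labels: 1 for male, -1 for female.
  X = np.array([[180.0, 80.0], [165.0, 55.0], [175.0, 70.0], [160.0, 50.0]])
  y = np.array([1, -1, 1, -1])

  sex_model = LogisticRegression().fit(X, y)

  # The "model" is a set of parameters: one weight per feature, plus a bias.
  # It is not the value of an attribute of any instance, so there is nowhere
  # in the feature vectors above to put it.
  print(sex_model.coef_)       # shape (1, 2): weights for height and weight
  print(sex_model.intercept_)  # shape (1,): the bias term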

What you can do is take your newly trained model and use it to label the instances you have so far with the class labels it assigns to them. Now, that's a pipeline alright. For instance, if you had a linear model with features "height" and "weight" that learned to label instances with "1" for male and "-1" for female, you could go through your data, label every instance with a "1" or "-1", and train again to learn a model of "age". At that point you have a new feature, but it is not the concept you learned in the previous session - only that concept's labels on your particular data. You can try to learn a new concept from this extended set of features, but the original concept ("sex") may or may not be part of what is learned. It may turn out that "sex" is not necessary for learning "age" (it's redundant); or it may be necessary, but in that case you have to learn the concept of "sex" all over again as part of learning "age".
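
Continuing that sketch (the ages and the use of a second, linear regression model are again my own illustrative assumptions), the pipeline looks roughly like this; note that the only trace of "sex" the "age" learner ever sees is a column of 1s and -1s on these particular rows:

  from sklearn.linear_model import LinearRegression

  # Made-up ages for the same four instances as above.
  ages = np.array([34.0, 29.0, 41.0, 23.0])

  # Step 1: run the trained "sex" model over the data and keep its labels.
  sex_labels = sex_model.predict(X).reshape(-1, 1)

  # Step 2: the new feature is this column of labels, not the model itself.
  X_with_sex = np.hstack([X, sex_labels])

  # Step 3: learn "age" from scratch. If "sex" matters for "age", the second
  # model has to rediscover it from the labels; none of the first model's
  # parameters carry over.
  age_model = LinearRegression().fit(X_with_sex, ages)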

By contrast, the class of algorithms I study, Inductive Logic Programming (ILP) algorithms, can add the models they learn to their features (features are called "background knowledge" in ILP) and go on learning. For instance, such an algorithm can learn "parent" from examples of "father" and "mother"; then "grandfather" from the original examples of "father" and the learned concept "parent", and "grandmother" from "mother" and "parent"; then "grandparent" from "grandfather" and "grandmother"; and so on. Each time, the newly learned concept can be added to the algorithm's store of background knowledge as it is - you don't need to go back through the data and label it. That's because "data" and "concept" share the same representation, so you can interchange them at will.

Say, your background knowledge on "father" and "mother" might look like this:

  father(Earendil, Tuor)
  mother(Earendil, Idril)

From that you can learn "parent", which might look something like this:

  parent(A,B) :- father(A,B).
  parent(A,B) :- mother(A,B).

Now, you can add "parent" to your background knowledge, like this:

  father(Earendil, Tuor)
  mother(Earendil, Idril)
  parent(A,B) :- father(A,B).
  parent(A,B) :- mother(A,B).

From that you can learn "grandfather" and "grandmother" and add them to your background knowledge:

  father(Earendil, Tuor)
  mother(Earendil, Idril)
  parent(A,B) :- father(A,B).
  parent(A,B) :- mother(A,B).
  grandfather(A,B) :- father(A,C), parent(C,B).
  grandmother(A,B) :- mother(A,C), parent(C,B).

And then learn "grandparent" from that:

  father(Earendil, Tuor)
  mother(Earendil, Idril)
  parent(A,B) :- father(A,B).
  parent(A,B) :- mother(A,B).
  grandfather(A,B) :- father(A,C), parent(C,B).
  grandmother(A,B) :- mother(A,C), parent(C,B).
  grandparent(A,B) :- grandfather(A,B).
  grandparent(A,B) :- grandmother(A,B).

And so on.

That is what I mean by "composition" - building up knowledge by adding new concepts to your representation of the world.



