(Note that this is quite a one-sided view of things, but it does convey the excitement of deep-learning researchers and the potential of what might be possible.)
Although I'm not in deep learning myself (I'm a computer vision researcher), here's a TL;DR as I understand it: rather than having people in specific domains such as computer vision or speech processing hand-design their own features, the idea is to take raw inputs (pixels, in the case of images) and train multi-layer neural net architectures that "learn" the relevant higher-level features in an unsupervised way (i.e., without labeled training data). Some of these architectures do seem to pull out interesting features and perform competitively on a few benchmarks in vision and other fields.
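To make the "learn features from raw inputs" idea concrete, here's a toy single-layer autoencoder in NumPy: it compresses raw pixel vectors into a small hidden code and is trained purely on reconstruction error, with no labels anywhere. Everything here (the data, sizes, learning rate, tied weights) is made up for illustration; real deep-learning systems stack several such layers and use far more sophisticated training.

```python
import numpy as np

# Illustrative only: random "pixel" data standing in for real images.
rng = np.random.default_rng(42)
X = rng.random((500, 64))            # 500 fake "images" of 8x8 raw pixels
n_hidden = 16                        # size of the learned feature code

W = rng.normal(0, 0.1, (64, n_hidden))
b, c = np.zeros(n_hidden), np.zeros(64)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(200):
    H = sigmoid(X @ W + b)           # encode: hidden "features"
    Xhat = H @ W.T + c               # decode with tied weights
    err = Xhat - X                   # reconstruction error -- no labels used
    # Gradients of the squared-error loss w.r.t. the tied weights
    dH = (err @ W) * H * (1 - H)
    gW = X.T @ dH + err.T @ H
    W -= lr * gW / len(X)
    b -= lr * dH.mean(axis=0)
    c -= lr * err.mean(axis=0)

# After training, each row of H is an unsupervised feature
# representation of the corresponding input.
H = sigmoid(X @ W + b)
Xhat = H @ W.T + c
```

The point is just that the hidden code is learned from raw inputs alone; whether those learned features are actually useful for a downstream task is exactly what the benchmarks mentioned above try to measure.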
I'm not sold on this yet, because it seems like the complexity of designing features has merely been traded for the complexity of designing learning architectures, but it's certainly becoming quite popular these days (mostly led by Geoff Hinton of Toronto, Yoshua Bengio of Montreal, and Yann LeCun of NYU).
No, non-parametric learning is an unrelated topic, which seeks to estimate probability distributions without assuming a "parametric" model (i.e., one with a fixed functional form and a fixed number of parameters). The canonical example of a non-parametric approach is the histogram, or its generalization, the Parzen window (also known as kernel density estimation).
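For instance, here's a quick NumPy sketch of both estimators on synthetic 1-D data (the mixture data, bin count, and bandwidth are all made up for illustration):

```python
import numpy as np

# Hypothetical 1-D data drawn from a mixture of two Gaussians.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(1, 1.0, 300)])

def histogram_density(x, data, bins=20):
    """Non-parametric estimate via a histogram: no assumed functional form."""
    counts, edges = np.histogram(data, bins=bins, density=True)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, bins - 1)
    return counts[idx]

def parzen_density(x, data, h=0.3):
    """Parzen-window (kernel density) estimate with a Gaussian kernel:
    the smooth generalization of the histogram."""
    diffs = (x[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / h

xs = np.linspace(-4, 4, 200)
p_hist = histogram_density(xs, data)
p_parzen = parzen_density(xs, data)
```

Neither estimator assumes the data came from any particular family of distributions; the model's "size" (bins, or one kernel per data point) grows with the data, which is what makes it non-parametric.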
That's an orthogonal distinction. The methods thus far have typically been parametric, in that there's a fixed network topology and the learning algorithm adjusts the (fixed set of) weights on the edges. There's no reason, though, why you couldn't have a nonparametric version that adaptively chose the number of hidden nodes in the networks and the connectivity structure.
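To make that distinction concrete, here's a toy NumPy sketch (the doubling growth rule is an illustrative heuristic, not a published algorithm): the parametric net trains a fixed set of weights on a fixed topology, while a "nonparametric" variant lets the number of hidden units grow until extra capacity stops helping.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)   # made-up regression task

def fit_fixed(X, y, n_hidden, iters=500, lr=0.05):
    """Parametric: topology (and hence the weight count) chosen in advance."""
    W1 = rng.normal(0, 0.5, (X.shape[1], n_hidden))
    w2 = rng.normal(0, 0.5, n_hidden)
    for _ in range(iters):
        H = np.tanh(X @ W1)
        err = H @ w2 - y
        # Gradient descent adjusts only the fixed set of weights.
        w2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ ((err[:, None] * w2) * (1 - H**2)) / len(X)
    return np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)

def fit_growing(X, y, tol=1e-3, max_hidden=32):
    """'Nonparametric' flavor: model size adapts to the data."""
    n, best = 1, np.inf
    while n <= max_hidden:
        mse = fit_fixed(X, y, n)
        if best - mse < tol:      # stop when an extra unit stops helping
            break
        best, n = mse, n * 2
    return best

final_mse = fit_growing(X, y)
```

A real adaptive-architecture method would of course need a more principled growth/stopping criterion (e.g., held-out validation error), but the structural point stands: nothing about neural nets forces the topology to be fixed in advance.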