
I think there are two major reasons why a CNN might not be a good model. The first is that CNNs assume translation invariance, which is common in images, but sentences in natural language don't have that structure. The second is that NLP outputs usually have varying length, which is why RNNs and LSTMs are so popular these days.



I think you're right when thinking about CNNs on words. It's the max-pooling that's usually combined with CNNs that provides the translational invariance, less so the convolution filters themselves (with a full convolution, a translation would only show up in the complex phase).
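
A minimal sketch of that point, assuming PyTorch and made-up dimensions: the convolution output still carries position information, and it's the max-over-time pooling that throws it away.

    import torch
    import torch.nn as nn

    # Hypothetical sizes: 50-dim word embeddings, a 10-word sentence, 64 filters of width 3.
    emb_dim, seq_len, n_filters, width = 50, 10, 64, 3
    conv = nn.Conv1d(in_channels=emb_dim, out_channels=n_filters, kernel_size=width)

    sentence = torch.randn(1, emb_dim, seq_len)      # (batch, channels, time)
    feature_maps = torch.relu(conv(sentence))        # (1, 64, 8): one activation per position
    pooled, _ = feature_maps.max(dim=2)              # (1, 64): max over time

    # feature_maps still records *where* each n-gram detector fired;
    # pooled only records *whether* it fired somewhere, which is where
    # the translational invariance actually comes from.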

I think it makes more sense when CNNs are applied at the character level. The filter banks then activate on specific character n-gram patterns, like certain prefixes, suffixes, and root words, so the higher-level LSTMs are relieved of having to learn that level of structure. Also, tokenization is hard and can be especially wrong for media with heavy grammatical abuse like Twitter; working on characters avoids that janky preprocessing. See: http://arxiv.org/abs/1508.06615
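
A rough sketch of that character-level front end, assuming PyTorch; the class name, dimensions, and filter widths here are illustrative, not the paper's exact hyperparameters.

    import torch
    import torch.nn as nn

    class CharCNNWordEncoder(nn.Module):
        """Build a word vector from its characters with convolutional filter banks.

        Roughly the front end of http://arxiv.org/abs/1508.06615: a filter of
        width w acts as a detector for character n-grams of length w (prefixes,
        suffixes, roots), and max-over-time pooling keeps the strongest match.
        """
        def __init__(self, n_chars=100, char_dim=15, widths=(2, 3, 4), n_filters=25):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            self.convs = nn.ModuleList(
                [nn.Conv1d(char_dim, n_filters, kernel_size=w) for w in widths]
            )

        def forward(self, char_ids):                       # (batch, max_word_len)
            x = self.char_emb(char_ids).transpose(1, 2)    # (batch, char_dim, len)
            pooled = [conv(x).max(dim=2).values for conv in self.convs]
            return torch.relu(torch.cat(pooled, dim=1))    # one vector per word, fed to the LSTM above

    # One fake 8-character word; output size = 3 widths * 25 filters = 75.
    word = torch.randint(0, 100, (1, 8))
    print(CharCNNWordEncoder()(word).shape)                # torch.Size([1, 75])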


CNNs are used in NLP all the time for a range of problems; see Collobert and Weston's work, "NLP (Almost) from Scratch".

Even in images, you're able to zero-pad the input to handle varying sizes.
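
For example, a small sketch assuming PyTorch's pad_sequence utility: zero-padding turns variable-length token sequences into one rectangular batch.

    import torch
    from torch.nn.utils.rnn import pad_sequence

    # Hypothetical batch of token-id sequences with different lengths.
    sentences = [torch.tensor([4, 8, 15]), torch.tensor([16, 23, 42, 7, 9])]

    # Zero-pad to the longest sentence so the batch becomes one rectangular
    # tensor a CNN can consume (reserve id 0 for padding in the vocabulary).
    batch = pad_sequence(sentences, batch_first=True, padding_value=0)
    print(batch)
    # tensor([[ 4,  8, 15,  0,  0],
    #         [16, 23, 42,  7,  9]])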



