Neural Language Models
Neural language models
- [neural LM] Bengio et al., “A Neural Probabilistic Language Model.” pdf Journal of Machine Learning Research 2003 
- [bi-loglinear LM] 
- [discriminative LM] Brian Roark, Murat Saraclar, and Michael Collins. “Discriminative n-gram language modeling.” pdf Computer Speech and Language, 21(2):373-392. 2007
Long short-term memory (LSTMs)
- [parsing] Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton, “Grammar as a Foreign Language” pdf arXiv 2014 
- [program] Wojciech Zaremba, Ilya Sutskever, “Learning to Execute” pdf arXiv 2014 
- [translation] Ilya Sutskever, Oriol Vinyals, Quoc Le, “Sequence to Sequence Learning with Neural Networks” pdf NIPS 2014
- [attention-based LSTM, summarization] Alexander M. Rush, Sumit Chopra and Jason Weston, “A Neural Attention Model for Abstractive Sentence Summarization” pdf EMNLP 2015
- [bi-LSTM, character] Wang Ling, Tiago Luis, Luis Marujo, Ramon Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W Black, Isabel Trancoso, “Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation” pdf EMNLP 2015
- [reading gate, dialogue cell] Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, Steve Young, “Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems” pdf EMNLP 2015 Best Paper
- [state embedding, character] Miguel Ballesteros, Chris Dyer and Noah A. Smith, “Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs” pdf EMNLP 2015
- [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print
CNNs: convolutional neural networks for language
- [convolving from character-level to doc-level] Xiang Zhang, Yann LeCun. “Text Understanding from Scratch” pdf 
- [character LM for doc-level] Peng, F., Schuurmans, D., Keselj, V. and Wang, S. “Language independent authorship attribution using character level language models.” pdf EACL 2003. 
- [convnet for sentences, dynamic, k-max pooling, stacked] Nal Kalchbrenner, Edward Grefenstette and Phil Blunsom. “A Convolutional Neural Network for Modelling Sentences” pdf ACL 2014. 
- [unsupervised pretraining for CNN] Wenpeng Yin and Hinrich Schütze. “Convolutional Neural Network for Paraphrase Identification.” pdf NAACL 2015 
- [convolve better with word order, parallel-CNN, different region] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” pdf
- [character, ConvNet, data augmentation] Xiang Zhang, Junbo Zhao, Yann LeCun, “Character-level Convolutional Networks for Text Classification” pdf NIPS 2015
- [no stacked, highway networks, character, CNN with LSTM] Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush “Character-Aware Neural Language Models” pdf arXiv pre-print
QA with commonsense reasoning
- [nlp for AI] Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov. “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks” pdf 2015 
- [memory networks] Jason Weston, Sumit Chopra, Antoine Bordes “Memory Networks” pdf ICLR 2015 
- [winograd schema] Hector J. Levesque. “The Winograd Schema Challenge” pdf AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning 2011 
- [textual entailment] Ion Androutsopoulos, Prodromos Malakasiotis “A Survey of Paraphrasing and Textual Entailment Methods” pdf Journal of Artificial Intelligence Research 38 (2010) 135-187
Compositional
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeff Dean, “Distributed Representations of Words and Phrases and their Compositionality,” pdf NIPS 2013 
- [socher’s] 
- [cutting RNN trees] Christian Scheible, Hinrich Schütze. “Cutting Recursive Autoencoder Trees” pdf CoRR abs/1301.2811 (2013)