### Survey
Bengio et al.'s survey on representation learning
- Yoshua Bengio, Aaron Courville and Pascal Vincent. “Representation Learning: A Review and New Perspectives.” pdf TPAMI 35(8):1798–1828
LeCun, Bengio and Hinton's survey in Nature
- Yann LeCun, Yoshua Bengio and Geoffrey Hinton. “Deep Learning.” pdf Nature 521, 436–444
### Embeddings & Language Models
#### Skip-gram embeddings
- Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space.” pdf ICLR, 2013.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. “Distributed Representations of Words and Phrases and their Compositionality.” pdf NIPS, 2013.
- [king-man+woman=queen] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Regularities in Continuous Space Word Representations.” pdf NAACL, 2013.
- [technical note] Yoav Goldberg and Omer Levy “word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method” pdf Tech-report 2013
- [buzz-busting] Omer Levy and Yoav Goldberg. “Linguistic Regularities in Sparse and Explicit Word Representations.” pdf CoNLL 2014, Best Paper Award
- [lessons learned] Omer Levy, Yoav Goldberg and Ido Dagan. “Improving Distributional Similarity with Lessons Learned from Word Embeddings.” pdf TACL 2015
- [syntax-word order] Wang Ling, Chris Dyer, Alan Black and Isabel Trancoso. “Two/Too Simple Adaptations of Word2Vec for Syntax Problems.” pdf NAACL 2015 (Short)
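The “king − man + woman = queen” result above reduces to vector arithmetic plus a cosine nearest-neighbour search that excludes the query words. A minimal sketch with made-up toy vectors (real vectors come from training skip-gram on a large corpus):

```python
import numpy as np

# Toy, hand-made vectors purely for illustration; real embeddings
# come from training skip-gram on a large corpus.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def analogy(a, b, c, emb):
    """Return the word closest (by cosine) to vec(b) - vec(a) + vec(c),
    excluding the three query words -- the king - man + woman trick."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -1.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman", emb))  # -> queen
```

Excluding the query words matters in practice: the nearest neighbour of the target vector is very often one of the inputs themselves.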
#### Embedding enhancement: Syntax, Retrofitting, etc.
- [dependency embeddings] Omer Levy and Yoav Goldberg “Dependency Based Word Embeddings” pdf ACL 2014 (Short)
- [dependency embeddings] Mohit Bansal, Kevin Gimpel and Karen Livescu. “Tailoring Continuous Word Representations for Dependency Parsing” pdf ACL 2014 (Short)
- [retrofitting with lexical knowledge] Manaal Faruqui, Jesse Dodge, Sujay Kumar Jauhar, Chris Dyer, Eduard Hovy and Noah A. Smith. “Retrofitting Word Vectors to Semantic Lexicons” pdf, NAACL 2015
- [embedding documents] Quoc V. Le and Tomas Mikolov. “Distributed Representations of Sentences and Documents” pdf ICML 2014
- [synonymy relations] Mo Yu, Mark Dredze. “Improving Lexical Embeddings with Semantic Knowledge” pdf ACL 2014 (Short)
- [embedding relations] Asli Celikyilmaz, Dilek Hakkani-Tur, Panupong Pasupat, Ruhi Sarikaya. “Enriching Word Embeddings Using Knowledge Graph for Semantic Tagging in Conversational Dialog Systems” pdf AAAI 2015 (Short)
- [multimodal] Angeliki Lazaridou, Nghia The Pham and Marco Baroni. “Combining Language and Vision with a Multimodal Skip-gram Model” pdf NAACL 2015
- [autoencoder, lexeme, lexical resource, synset] Sascha Rothe and Hinrich Schütze. “AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes” pdf ACL 2015 Best Paper
- [lexical resource, babelnet] Ignacio Iacobacci, Mohammad Taher Pilehvar and Roberto Navigli, “SensEmbed: Learning Sense Embeddings for Word and Relational Similarity” pdf ACL 2015
- [specific linguistic relation] Zhigang Chen, Wei Lin, Qian Chen, Xiaoping Chen, Si Wei, Hui Jiang and Xiaodan Zhu, “Revisiting Word Embedding for Contrasting Meaning” pdf ACL 2015
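The retrofitting paper above has a closed-form coordinate update: pull each vector toward its lexicon neighbours while anchoring it to the original distributional vector. A minimal sketch under the paper's default weighting (alpha = 1, beta = 1/degree); the two-word lexicon and vectors below are toy assumptions:

```python
import numpy as np

# A minimal sketch of the retrofitting idea (Faruqui et al., NAACL 2015):
# nudge each vector toward its lexicon neighbours while staying close to
# the original distributional vector. alpha/beta follow the paper's
# defaults (alpha = 1, beta = 1/degree); the toy vectors are made up.
def retrofit(vectors, lexicon, iters=10, alpha=1.0):
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            # closed-form update of the convex objective for word w
            new[w] = (alpha * vectors[w] + beta * sum(new[n] for n in nbrs)) \
                     / (alpha + beta * len(nbrs))
    return new

vecs = {"happy": np.array([1.0, 0.0]), "glad": np.array([0.0, 1.0])}
lex = {"happy": ["glad"], "glad": ["happy"]}
out = retrofit(vecs, lex)
# "happy" and "glad" move toward each other but keep part of the original signal
```

Because the objective is convex, a handful of these coordinate updates is enough for convergence in practice.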
#### Embedding enhancement: Word order, Morphological, etc.
- [word order] Rie Johnson and Tong Zhang. “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks.” pdf NAACL 2015
- [morphology] Radu Soricut and Franz Och. “Unsupervised Morphology Induction Using Word Embeddings” pdf NAACL 2015 Best Paper Award
- [morphology] Minh-Thang Luong, Richard Socher and Christopher D. Manning. “Better Word Representations with Recursive Neural Networks for Morphology” pdf CoNLL 2013
- [morpheme] Siyu Qiu, Qing Cui, Jiang Bian, Bin Gao, Tie-Yan Liu. “Co-learning of Word Representations and Morpheme Representations” pdf COLING 2014
- [morphological] Ryan Cotterell and Hinrich Schütze. “Morphological Word-Embeddings” pdf NAACL 2015 (Short)
- [regularization] Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah Smith. “Learning Word Representations with Hierarchical Sparse Coding” pdf ICML 2015
- [character, word order, based on word2vec] Andrew Trask, David Gilmore and Matthew Russell. “Modeling Order in Neural Word Embeddings at Scale” pdf ICML 2015
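A shared idea across the morphology entries above is composing a word's vector from morpheme vectors, so rare and unseen words still get representations. The sketch below uses plain additive composition (the cited models are richer, e.g. Luong et al.'s recursive network); the segmentation and vectors are toy assumptions:

```python
import numpy as np

# Toy morpheme vectors; in the cited papers these are learned jointly
# with (or alongside) the word vectors on a large corpus.
morpheme_vecs = {
    "un":    np.array([-1.0, 0.0]),
    "happi": np.array([0.0, 1.0]),
    "ness":  np.array([0.2, 0.1]),
}

def compose(morphemes):
    """Additive composition: vec(word) = sum of its morpheme vectors."""
    return sum(morpheme_vecs[m] for m in morphemes)

v = compose(["un", "happi", "ness"])   # vector for "unhappiness"
```

Even this crude sum shares the negation morpheme "un" across words, which is what lets morphological models generalize to word forms never seen in training.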
#### Embeddings as matrix factorization
- [approximate interpretation] Levy and Goldberg, “Neural Word Embedding as Implicit Matrix Factorization.” pdf NIPS 2014
- Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. “Do Supervised Distributional Methods Really Learn Lexical Inference Relations?” pdf NAACL 2015 (Short)
- Tim Rocktaschel, Sameer Singh and Sebastian Riedel. “Injecting Logical Background Knowledge into Embeddings for Relation Extraction” pdf NAACL 2015
- [exact interpretation] Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong and Enhong Chen. “Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective” pdf IJCAI 2015
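Levy and Goldberg's result suggests a count-based alternative to training SGNS: build the shifted positive PMI matrix from co-occurrence counts and factorize it with an SVD. A minimal sketch; the counts, shift k, and embedding dimension are toy assumptions:

```python
import numpy as np

# SGNS implicitly factorizes the word-context PMI matrix shifted by
# log k (Levy & Goldberg, NIPS 2014). Sketch of the explicit route:
# count, shift, clip at zero, factorize.
counts = np.array([[4., 1., 0.],
                   [1., 3., 2.],
                   [0., 2., 5.]])          # counts[w, c]: toy co-occurrences
k = 1                                      # number of negative samples
total = counts.sum()
pw = counts.sum(axis=1, keepdims=True) / total
pc = counts.sum(axis=0, keepdims=True) / total
pwc = counts / total
with np.errstate(divide="ignore"):         # log(0) -> -inf, clipped below
    pmi = np.log(pwc / (pw * pc))
sppmi = np.maximum(pmi - np.log(k), 0.0)   # shifted positive PMI
U, S, Vt = np.linalg.svd(sppmi)
d = 2                                      # embedding dimension (toy)
word_vecs = U[:, :d] * np.sqrt(S[:d])      # symmetric split of singular values
```

The `sqrt(S)` split between word and context matrices follows the symmetric variant Levy and Goldberg report working well, rather than putting all of S on one side.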
#### Embeddings obtained from other methods
- [noise-contrastive estimation] Andriy Mnih and Koray Kavukcuoglu, “Learning word embeddings efficiently with noise-contrastive estimation” pdf NIPS 2013
- [logarithm of word-word co-occurrences] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, “GloVe: Global Vectors for Word Representation” pdf EMNLP 2014
- [explicitly encode co-occurrences] Omer Levy and Yoav Goldberg. “Linguistic Regularities in Sparse and Explicit Word Representations.” pdf CoNLL 2014
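The GloVe entry above fits dot products (plus biases) to log co-occurrence counts, weighting each pair by f(x) = min(1, (x/x_max)^0.75) so frequent pairs dominate without drowning out rare ones. A minimal SGD sketch on a toy matrix; the counts, dimensions, and hyperparameters are assumptions, not the reference implementation:

```python
import numpy as np

# GloVe objective: sum_ij f(X_ij) * (w_i . c_j + b_i + b_j - log X_ij)^2
rng = np.random.default_rng(0)
X = np.array([[10., 2.], [2., 6.]])        # toy co-occurrence counts
V, d, x_max, lr = X.shape[0], 2, 100.0, 0.05
W = rng.normal(scale=0.1, size=(V, d))     # word vectors
C = rng.normal(scale=0.1, size=(V, d))     # context vectors
bw, bc = np.zeros(V), np.zeros(V)          # word / context biases

for _ in range(2000):
    for i in range(V):
        for j in range(V):
            if X[i, j] == 0:               # GloVe only sums over nonzero counts
                continue
            f = min(1.0, (X[i, j] / x_max) ** 0.75)
            err = W[i] @ C[j] + bw[i] + bc[j] - np.log(X[i, j])
            g = f * err
            wi = W[i].copy()               # update C with the pre-step W
            W[i] -= lr * g * C[j]
            C[j] -= lr * g * wi
            bw[i] -= lr * g
            bc[j] -= lr * g
```

Skipping zero counts is the point of the weighting scheme: unlike raw least squares on the full matrix, GloVe never has to model log 0.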
#### Why and when embeddings are better
- [comparison between pretrained embeddings] Yanqing Chen, Bryan Perozzi, Rami Al-Rfou and Steven Skiena. “The Expressive Power of Word Embeddings” pdf ICML 2013
- [how the embedding is trained matters] Felix Hill, KyungHyun Cho, Sebastien Jean, et al. “Not all neural embeddings are born equal” pdf NIPS Workshop 2014
- [multichannel as multi-embeddings input] Wenpeng Yin, Hinrich Schütze. “MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity” ACL 2015
- [dimension, corpus, compare] Siwei Lai, Kang Liu, Liheng Xu and Jun Zhao. “How to Generate a Good Word Embedding?” pdf arXiv preprint
#### Word Representations via Distribution Embedding
- Katrin Erk. “Representing Words As Regions in Vector Space.” pdf CoNLL 2009
- [random walks, generative model] Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma and Andrej Risteski. “Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.” pdf CoRR 2015
- [breadth, asymmetric] Luke Vilnis and Andrew McCallum. “Word Representations via Gaussian Embedding.” pdf slides ICLR 2015
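The Gaussian-embedding entry above gets its breadth and asymmetry from representing words as densities rather than points: KL divergence between Gaussians is asymmetric, and the covariance encodes how general a word is. A sketch of KL for diagonal-covariance Gaussians, with two toy "words" sharing a mean but differing in breadth (the vectors are made up for illustration):

```python
import numpy as np

def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, diag(var_p)) || N(mu_q, diag(var_q)) ), per-dimension
    closed form summed over dimensions."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

mu = np.zeros(2)
dog = (mu, np.array([0.1, 0.1]))       # narrow density: a specific concept
animal = (mu, np.array([1.0, 1.0]))    # broad density: a general concept
print(kl_diag(*dog, *animal))   # small: "dog" fits inside "animal"
print(kl_diag(*animal, *dog))   # large: the reverse does not hold
```

This asymmetry is what lets the model score entailment-like relations (dog → animal) that a symmetric cosine between point vectors cannot distinguish in direction.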
### Classic(!)
- Brown et al. “Class-Based n-Gram Models of Natural Language.” pdf Computational Linguistics 1992
### Example Notes, Mini-Tutorials, Technical Reports
- Yoav Goldberg. “A note on Latent Semantic Analysis.” pdf Tech-report
- Yoav Goldberg and Omer Levy. “word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method.” pdf Tech-report 2013