Learning Word Embeddings: Word2Vec and GloVe

The basic idea behind learning word embeddings is to use the word vectors of one or more words in a sentence to predict another word in that sentence; the vectors learned for this prediction task serve as the embeddings.

For example,

I want a glass of orange ______.

The missing word can be predicted by concatenating the word vectors of one or more previous words, passing the concatenated vector through a neural network, and reading the output of a softmax layer over the vocabulary. The network's prediction can then be compared against the words in the vocabulary, and the vector learned for the blank ends up similar to the vector for a word like juice.
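
A minimal sketch of this setup is shown below (using PyTorch; the vocabulary size, embedding dimension, context length, and the word id used for "juice" are illustrative choices, not values from the text):

```python
# Sketch: embed the previous words, concatenate their vectors, and predict
# the missing word with a softmax over the vocabulary. All sizes are made up.
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 10_000, 300, 6

class ContextLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)           # the word vectors being learned
        self.fc = nn.Linear(context_len * embed_dim, vocab_size)   # scores for every word in the vocabulary

    def forward(self, context_ids):            # context_ids: (batch, context_len)
        vecs = self.embed(context_ids)         # (batch, context_len, embed_dim)
        concat = vecs.flatten(start_dim=1)     # concatenate the context vectors
        return self.fc(concat)                 # logits; softmax is applied inside the loss

model = ContextLM()
context = torch.randint(0, vocab_size, (1, context_len))           # "I want a glass of orange"
target = torch.tensor([42])                                        # 42 = hypothetical id for "juice"
loss = nn.CrossEntropyLoss()(model(context), target)
```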

There are some efficient approaches to learning word embeddings, including Word2Vec and GloVe.

Word2Vec

Word2Vec (word to vector) aims to learn a vector for each word such that words that appear in similar contexts have similar vectors. It is based on the skip-gram model; another version of Word2Vec is based on the CBOW (Continuous Bag of Words) model.

In the skip-gram model, a single word is used to predict its neighboring words, i.e. its context.

In the CBOW model, a certain word is predicted using its neighboring words.
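
To make the difference concrete, here is a small sketch (plain Python, with an arbitrary sentence and window size) of how the two models form their training examples:

```python
# Skip-gram: (center word -> each context word). CBOW: (context words -> center word).
sentence = "I want a glass of orange juice".split()
window = 2

skipgram_pairs, cbow_pairs = [], []
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    skipgram_pairs += [(center, c) for c in context]   # center predicts each neighbor
    cbow_pairs.append((context, center))               # neighbors together predict center

print(skipgram_pairs[:4])   # [('I', 'want'), ('I', 'a'), ('want', 'I'), ('want', 'a')]
print(cbow_pairs[0])        # (['want', 'a'], 'I')
```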

Training the skip-gram model is slow because the softmax must be computed over the entire vocabulary. To speed this up, a technique called negative sampling is used, which replaces the full softmax with a small number of binary classification problems.
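
A hedged sketch of the negative-sampling objective is given below (PyTorch; the sizes and the number of negatives k are illustrative). Each (center, context) pair is scored with a dot product and trained as a positive example against k randomly sampled negative words:

```python
import torch
import torch.nn.functional as F

vocab_size, embed_dim, k = 10_000, 300, 5
in_vecs  = torch.nn.Embedding(vocab_size, embed_dim)    # vectors for center words
out_vecs = torch.nn.Embedding(vocab_size, embed_dim)    # vectors for context words

def neg_sampling_loss(center_id, context_id):
    center = in_vecs(center_id)                          # (embed_dim,)
    pos = out_vecs(context_id)                           # true context word
    # Negatives are sampled uniformly here for simplicity; the original
    # word2vec samples them from a smoothed unigram distribution.
    neg = out_vecs(torch.randint(0, vocab_size, (k,)))   # k sampled negative words
    pos_loss = F.logsigmoid(pos @ center)                # push the true pair together
    neg_loss = F.logsigmoid(-(neg @ center)).sum()       # push the negatives apart
    return -(pos_loss + neg_loss)

loss = neg_sampling_loss(torch.tensor(10), torch.tensor(42))
```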

GloVe

GloVe stands for "Global Vectors for Word Representation". While Word2Vec is a "predictive" approach, GloVe is a "count-based" approach.

GloVe first constructs a co-occurrence matrix that counts how many times each word appears in the context of every other word. This matrix is then factorized to obtain word vectors and context vectors.
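
A rough sketch of both steps is shown below (NumPy, toy corpus). Note that the real GloVe objective also includes bias terms and a weighting function f(X_ij), which are omitted here; the corpus, window size, and vector dimension are arbitrary examples:

```python
import numpy as np

corpus = ["i want a glass of orange juice", "i want a glass of apple juice"]
window = 2
vocab = sorted({w for line in corpus for w in line.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Step 1: co-occurrence counts X[i, j] = times word j appears near word i.
X = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                X[idx[w], idx[words[j]]] += 1

# Step 2: factorize -- fit word and context vectors so their dot product
# approximates log co-occurrence counts, for the nonzero entries only.
dim, lr = 8, 0.05
W = np.random.randn(len(vocab), dim) * 0.1   # word vectors
C = np.random.randn(len(vocab), dim) * 0.1   # context vectors
for _ in range(200):
    for i, j in zip(*np.nonzero(X)):
        err = W[i] @ C[j] - np.log(X[i, j])
        grad_w, grad_c = err * C[j], err * W[i]
        W[i] -= lr * grad_w
        C[j] -= lr * grad_c
```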
