Assignment 8: Word2Vec

Deadline: December 6th, 9am

This week we will look at “the” classic model for learning word embeddings. This is another tutorial-based assignment; find the link here.

The key points are:

Questions for Understanding

As in the last assignment, answer these questions in your submission to make sure you understand what is happening in the tutorial code!

  1. Given the sentence “I like to cuddle dogs”, how many skipgrams are created with a window size of 2?
  2. In general, how does the number of skipgrams relate to the size of the dataset (in terms of input-target pairs)?
  3. Why is it not a good idea to compute the full softmax for classification?
  4. Given the way the dataset is created, are the negative samples for a given (target, context) pair (remember, these are randomly sampled) the same each time this training example is seen, or are they different?
  5. For the given example dataset (Shakespeare), would the code create (target, context) pairs that span multiple lines, e.g. pairing the last word of one line with the first word of the next?
  6. Does the code generate skipgrams for padding characters (index 0)?
  7. The skipgrams function uses a “sampling table”. In the code, this is shown to be a simple list of probabilities, and it is created without any reference to the actual text data. How and why does this work? That is, how does the program “know” which words to sample with which probability? (See the sketch after this list if you want to experiment with these functions.)
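
If you want to sanity-check your answers to questions 1, 6, and 7 empirically, you can call the relevant preprocessing helpers directly on a toy sentence. A minimal sketch, assuming the tutorial uses TensorFlow's `tf.keras.preprocessing.sequence` helpers; the toy vocabulary is an illustrative assumption:

```python
# Minimal sketch for experimenting with skipgram generation (TensorFlow assumed).
import tensorflow as tf

sentence = "I like to cuddle dogs"
# Toy vocabulary; index 0 is reserved for padding.
vocab = {"<pad>": 0, "i": 1, "like": 2, "to": 3, "cuddle": 4, "dogs": 5}
sequence = [vocab[w] for w in sentence.lower().split()]

# Generate positive (target, context) pairs with a window size of 2.
# negative_samples=0 because negatives are drawn separately during training.
pairs, labels = tf.keras.preprocessing.sequence.skipgrams(
    sequence,
    vocabulary_size=len(vocab),
    window_size=2,
    negative_samples=0,
)
print(len(pairs), pairs)  # count and contents of the skipgrams (questions 1 and 6)

# The “sampling table” from question 7, built without looking at the text:
sampling_table = tf.keras.preprocessing.sequence.make_sampling_table(len(vocab))
print(sampling_table)
```

Varying the sentence and the window size should also help with question 2.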

Possible Improvements & Extensions

Optional: CBOW Model

The tutorial only covers the Skipgram model; however, the same paper also proposed the (perhaps more intuitive) Continuous Bag-of-Words model. Here, instead of predicting the context from the center word, it’s the other way around. If you are looking for more of a challenge implementing a model by yourself, the changes should be as follows:

  - The input is no longer a single target word but the set of context words around a center word.
  - Each context word is embedded, and the embeddings are combined into a single vector (the original paper averages them).
  - The model predicts the center word from this combined context vector.

The rest stays pretty much the same. You will still need to generate negative examples through sampling, since the full softmax is just as inefficient as with the Skipgram model.
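
If you want a concrete starting point, here is a minimal Keras sketch of a CBOW-style model with negative sampling. The class, layer names, and dimensions are illustrative assumptions, not the tutorial’s code:

```python
# A minimal CBOW sketch with negative sampling (Keras assumed).
# All names and dimensions here are illustrative, not from the tutorial.
import tensorflow as tf

class CBOW(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        # Embeds the surrounding context words.
        self.context_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # Embeds the candidate center words (1 true word + the negative samples).
        self.center_embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    def call(self, inputs):
        context, candidates = inputs
        # context: (batch, 2*window) -> one averaged vector per example.
        ctx = tf.reduce_mean(self.context_embedding(context), axis=1)
        # candidates: (batch, 1 + num_ns) candidate center words.
        cand = self.center_embedding(candidates)
        # Dot product of the averaged context with each candidate.
        return tf.einsum("bd,bnd->bn", ctx, cand)

model = CBOW(vocab_size=4096, embedding_dim=128)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

The labels can use the same one-hot-over-candidates scheme as in the Skipgram tutorial: position 0 holds the true center word and the remaining positions hold the negative samples.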

Compare the results of the CBOW model with the Skipgram one!
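
One simple way to do this comparison is to look up nearest neighbors of a few probe words in each trained embedding matrix. A rough sketch; `embeddings` (a learned weight matrix) and `inverse_vocab` (an index-to-word mapping) are assumed to come from your own training runs:

```python
# Nearest neighbors by cosine similarity; inputs assumed from your training run.
import numpy as np

def nearest_neighbors(embeddings, inverse_vocab, word_index, k=5):
    # Normalize rows so dot products equal cosine similarities.
    normed = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-9)
    sims = normed @ normed[word_index]
    best = np.argsort(-sims)[1 : k + 1]  # index 0 is the word itself
    return [(inverse_vocab[i], float(sims[i])) for i in best]

# e.g. compare nearest_neighbors(skipgram_weights, inverse_vocab, some_index)
# against nearest_neighbors(cbow_weights, inverse_vocab, some_index)
```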