Deadline: December 11th, 9am
Today we will be focusing on
getting to know the Tensorflow core a little more. In the long run, knowing
what tools you have available will allow you to write better/shorter/faster
programs when you go beyond the straightforward models we’ve been looking at.
Below you will find several (mostly disconnected) small programming tasks to be
solved using Tensorflow functions. Most of these can be solved in just a few
lines of code, but you will need to find the right tools first.
The API docs will be
indispensable here. Since the API can
be a bit overwhelming at first, hints are included for most tasks. Some of the
tasks build on each other, becoming progressively more difficult. You can try to get
the “easy” stuff done first and then focus on a few of the trickier ones.
You should usually wrap each routine in a Python function and also try to add
the tf.function decorator for performance (and see if it still works…).
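For instance, a toy routine wrapped as suggested (the function and its name are just an illustration):

```python
import tensorflow as tf

# A toy routine: a plain Python function plus the tf.function decorator,
# which traces it into a TensorFlow graph on the first call.
@tf.function
def scale_and_sum(x, factor):
    return tf.reduce_sum(x * factor)

print(scale_and_sum(tf.constant([1., 2., 3.]), 2.0))  # tf.Tensor(12.0, ...)
```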
1. Given a tensor of shape (?, n), extract the k (k <= n) highest values
for each row into a tensor of shape (?, k). Hint: There might be a function
to get the “top k” values of a tensor.
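One possible solution, using tf.math.top_k (the “top k” function the hint alludes to):

```python
import tensorflow as tf

@tf.function
def highest_k(x, k):
    # tf.math.top_k returns the k largest values per row, sorted descending,
    # along with their indices; here we only need the values.
    values, _ = tf.math.top_k(x, k=k)
    return values

# e.g. highest_k(tf.constant([[1., 5., 3.], [4., 2., 6.]]), 2)
# -> [[5., 3.], [6., 4.]]
```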
2. Given a tensor of shape (?, n), find the argmax in each row and return a
new tensor that contains a 1 in each of the argmax positions, and 0s everywhere
else.
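A sketch using tf.argmax together with tf.one_hot:

```python
import tensorflow as tf

@tf.function
def argmax_one_hot(x):
    # One-hot encode the per-row argmax; the depth is the row length n.
    return tf.one_hot(tf.argmax(x, axis=-1), depth=tf.shape(x)[-1], dtype=x.dtype)

# e.g. argmax_one_hot(tf.constant([[1., 5., 3.]])) -> [[0., 1., 0.]]
```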
3. Create a tensor of shape (?, n) where all but the top k values for each row
are zero. Try doing this with a 1D tensor of shape (n,) (i.e. one row) first.
Getting it right for a 2D tensor is trickier; consider this a bonus. Hint: You
should look for a way to “scatter” a tensor of values into a different tensor.
For two or more dimensions, you need to think carefully about the indices.
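One way to do the 1D case with tf.scatter_nd, plus a sketch of the 2D bonus — the index bookkeeping is exactly the tricky part the hint mentions:

```python
import tensorflow as tf

def keep_top_k_1d(x, k):
    # Scatter the top-k values back into a zero tensor of the original shape.
    values, indices = tf.math.top_k(x, k=k)
    return tf.scatter_nd(indices[:, None], values, tf.shape(x))

def keep_top_k_2d(x, k):
    # For 2D, scatter_nd needs a (row, column) index pair for every kept value.
    values, cols = tf.math.top_k(x, k=k)           # both of shape (rows, k)
    rows = tf.repeat(tf.range(tf.shape(x)[0]), k)  # row id per kept value
    idx = tf.stack([rows, tf.reshape(cols, [-1])], axis=1)
    return tf.scatter_nd(idx, tf.reshape(values, [-1]), tf.shape(x))

# e.g. keep_top_k_1d(tf.constant([1., 5., 3., 4.]), 2) -> [0., 5., 0., 4.]
```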
4. Given a scalar a and an input tensor of length T, create a new length-T
tensor where new[0] = input[0] and new[t] = a * new[t-1] + (1-a) * input[t]
otherwise. Do not use tf.train.ExponentialMovingAverage.
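tf.scan fits this recurrence nicely: with no explicit initializer it uses input[0] as the first output, which is exactly the new[0] = input[0] case:

```python
import tensorflow as tf

@tf.function
def ema(inp, a):
    # tf.scan threads the accumulator (new[t-1]) through the sequence.
    return tf.scan(lambda prev, cur: a * prev + (1 - a) * cur, inp)

# e.g. ema(tf.constant([1., 2., 3.]), 0.5) -> [1., 1.5, 2.25]
```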
5. As in the previous task, but compute new[T]
only – you don’t need to compute the other time steps (if you can avoid it).
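Unrolling the recurrence gives a closed form for the last step: a weighted sum of all inputs, with weight a^(T-1) for input[0] and (1-a) * a^(T-1-t) for input[t] with t >= 1. That avoids materializing the intermediate steps:

```python
import tensorflow as tf

def ema_last(inp, a):
    T = tf.shape(inp)[0]
    t = tf.cast(tf.range(T), inp.dtype)
    last = tf.cast(T - 1, inp.dtype)
    # Weight for input[0] is a^(T-1); for t >= 1 it is (1 - a) * a^(T-1-t).
    w = tf.pow(a, last - t) * tf.where(t > 0, 1 - a, 1.0)
    return tf.reduce_sum(w * inp)

# e.g. ema_last(tf.constant([1., 2., 3.]), 0.5) -> 2.25 (same as ema(...)[-1])
```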
6. Given tensors x, y, z, all of the same (arbitrary) shape, create a new
tensor that takes values from y where x is even and from z where x is odd.
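tf.where with an elementwise condition does this directly:

```python
import tensorflow as tf

@tf.function
def merge_even_odd(x, y, z):
    # Elementwise select: y where x is even, z where x is odd.
    return tf.where(x % 2 == 0, y, z)

# e.g. merge_even_odd(tf.constant([1, 2, 3]), tf.constant([10, 20, 30]),
#                     tf.constant([-1, -2, -3])) -> [-1, 20, -3]
```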
7. Extend the previous task with one counter per condition: counter i should
grow by 1 if condition i happened. Run the function from the previous task
multiple times to test whether your counting works. Now, add a @tf.function
decorator to that function. Does your counter still work? If not, why not?
Can you change it so it does work?
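A sketch with two tf.Variable counters (the names even_count/odd_count are just placeholders). The usual pitfall: tf.function traces the Python body once, so variables created (or plain Python counters incremented) inside the function do not behave as expected across calls; creating the variables outside and updating them with assign_add keeps the counting working under the decorator:

```python
import tensorflow as tf

even_count = tf.Variable(0)  # created OUTSIDE the tf.function
odd_count = tf.Variable(0)

@tf.function
def merge_and_count(x, y, z):
    even = x % 2 == 0
    # assign_add is a graph-side state update, so it survives tracing.
    even_count.assign_add(tf.reduce_sum(tf.cast(even, tf.int32)))
    odd_count.assign_add(tf.reduce_sum(tf.cast(~even, tf.int32)))
    return tf.where(even, y, z)
```

Running it repeatedly accumulates the counts across calls, which is exactly what the task asks you to verify.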
8. Given two tensors of length n, create a tensor of shape (n, n) where
element i,j is the ith element of the first tensor minus the jth element
of the second tensor. No loops! Hint: Tensorflow supports broadcasting much
like numpy.
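With broadcasting, inserting axes of size 1 is all that is needed:

```python
import tensorflow as tf

def pairwise_diff(u, v):
    # (n, 1) minus (1, n) broadcasts to (n, n): out[i, j] = u[i] - v[j]
    return u[:, None] - v[None, :]

# e.g. pairwise_diff(tf.constant([1, 2]), tf.constant([10, 20]))
# -> [[-9, -19], [-8, -18]]
```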
9. You are given encoder states h of shape batch x time x features and the
last decoder state s of shape batch x features. Compute the attention weights
alpha, where alpha[:, i] is equal to h[:, i] * s and * is the dot product
between vectors (in this case we also have a batch dimension, so the dot
product should be between the corresponding vectors within the batch). That
is, alpha should be of shape batch x time, and alpha[:, i] should contain the
attention weights for encoder time step i.
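A sketch of the batched dot product: broadcast s against h over the time axis and sum out the feature axis (tf.einsum('btf,bf->bt', h, s) would express the same thing):

```python
import tensorflow as tf

def attention_weights(h, s):
    # h: (batch, time, features); s: (batch, features).
    # s[:, None, :] has shape (batch, 1, features) and broadcasts over time;
    # summing over the feature axis leaves alpha of shape (batch, time).
    return tf.reduce_sum(h * s[:, None, :], axis=-1)

# e.g. h = tf.constant([[[1., 0.], [0., 1.]]]), s = tf.constant([[2., 3.]])
# attention_weights(h, s) -> [[2., 3.]]
```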