Welcome to the traditional Monday overview of the previous week's arXiv!
Speaking of traditional: since I started doing these weekly overviews, I have noticed two research subareas of Deep Learning that one is bound to see among the preprints of any given week. No, I am not talking about BERT and its many contextualized word embedding cousins, although I rarely see a day go by without it appearing in the latest titles posted in the NLP section. The two subareas I have in mind are (a) adversarial machine learning (namely, staging and defending against attacks that exploit the models' vulnerabilities) and (b) methods for reducing the size of neural networks, ideally without compromising their performance.
I have already addressed the adversarial issue in some of my previous posts: as deep learning models get more accurate (and, naturally, become a bigger part of our day-to-day lives), attacks against them not only get more sophisticated, but also pose a greater potential threat to the public. On the other hand, many of the impressive recent victories of AI models come at the cost of training neural networks that are larger than ever before. However, the deep learning field is not evolving in a vacuum: in particular, advances in the IoT (Internet of Things) are directly linked to so-called Edge AI, meaning that we need smaller and lighter models that can be deployed locally on IoT devices.
The first preprint that I would like to mention is called, I kid you not, Tiny Video Networks. Normally, any model processing video inputs is quite the opposite of tiny: think of the deep CNNs used in computer vision, but with an added temporal dimension (3D convolutions, maybe even some recurrent architecture wrapping it all up, etc.). Tiny Video Networks (I keep picturing them as some sort of cartoon characters every time I say the name, so let's just call them TVNs) are highly efficient video models found through an automated architecture search in which a constraint was placed on the number of network parameters. The resulting TVNs were trained on multiple video understanding datasets and demonstrated [close to] the same level of performance as their traditional, non-tiny counterparts, all while being 10x to 100x faster at inference time!
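To get a feel for what "a constraint on the number of parameters" means in practice, here is a toy sketch of constrained architecture search. It is plain random search, not the search procedure used for TVNs: every name here (the search space, `sample_arch`, `proxy_score`) is hypothetical, and the scoring function is a placeholder for "train the candidate briefly and measure validation accuracy". The only part that is exact is the parameter count of a 3D convolution, which is kernel volume times input channels times output channels, plus one bias per output channel.

```python
import random

def conv3d_params(k_t, k_s, c_in, c_out):
    """Weights plus biases of one 3D convolution with a k_t x k_s x k_s kernel."""
    return k_t * k_s * k_s * c_in * c_out + c_out

def count_params(arch, in_channels=3):
    """Total parameters of a plain stack of 3D convolutions (RGB input assumed)."""
    total, c_in = 0, in_channels
    for k_t, k_s, c_out in arch:
        total += conv3d_params(k_t, k_s, c_in, c_out)
        c_in = c_out
    return total

def sample_arch(rng, max_layers=6):
    """Hypothetical search space: a list of (temporal kernel, spatial kernel, channels)."""
    return [
        (rng.choice([1, 3]), rng.choice([1, 3, 5]), rng.choice([16, 32, 64, 128]))
        for _ in range(rng.randint(2, max_layers))
    ]

def proxy_score(arch):
    """Placeholder for 'train briefly, then measure validation accuracy'."""
    return sum(c for _, _, c in arch) ** 0.5

def constrained_search(budget, trials=500, seed=0):
    """Random search that discards every candidate exceeding the parameter budget."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        arch = sample_arch(rng)
        if count_params(arch) > budget:  # hard constraint: oversized models never compete
            continue
        score = proxy_score(arch)
        if score > best_score:
            best, best_score = arch, score
    return best

best = constrained_search(budget=100_000)
print(best, count_params(best))
```

The design choice worth noticing is that the budget acts as a hard filter before any (expensive) evaluation happens, so compute is only spent on candidates that are already small enough.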
While we are on the subject of finding efficient model architectures automatically, here is the second preprint that caught my eye: State of Compact Architecture Search For Deep Neural Networks, a review and comparison of four different methods for compact network architecture search. At a modest six pages, this preprint does not go into much detail, but I still appreciate a review paper as a starting point in an area that I am, for the time being, rather unfamiliar with.
While neural network architecture search is certainly a hot research topic in its own right, most (perhaps all?) of the well-known architectures so far have been designed manually. The aforementioned BERT model is no exception. Even the smaller version of the original model (BERT-Base, with roughly 110 million parameters) is, frankly, huge, so unsurprisingly, multiple ways of reducing its size while preserving its spirit (state-of-the-art performance included) have been proposed. In the past week alone, there were two: Q8BERT: Quantized 8Bit BERT and Pruning a BERT-based Question Answering Model.
The first model compression approach, quantization, reduces the number of bits used to represent the network's weights from 32 (FP32) down to 8, shrinking the model by almost a factor of 4 and speeding up inference. One of the method's benefits is that it can be readily extended to other transformer-based pre-trained language models.
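To make the factor of 4 concrete, here is a minimal sketch of symmetric linear 8-bit quantization, one common scheme: each weight is multiplied by a scale of 127 / max|w|, rounded to the nearest integer, and clamped to [-127, 127]. This is a toy illustration on a plain Python list under those assumptions, not Q8BERT's actual training procedure, and the function names are my own.

```python
from array import array

def quantize_symmetric(weights, num_bits=8):
    """Map floats to signed integers in [-M, M], where M = 2**(num_bits - 1) - 1."""
    m = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = m / max_abs if max_abs > 0 else 1.0
    q = [max(-m, min(m, round(w * scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [v / scale for v in q]

weights = [0.42, -1.3, 0.007, 2.5, -0.9]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)

fp32 = array("f", weights)  # 4 bytes per element (C float)
int8 = array("b", q)        # 1 byte per element (signed char)
print(fp32.itemsize * len(fp32), int8.itemsize * len(int8))  # 20 5
```

The storage shrinks by exactly 4x here; the price is a rounding error of at most half a quantization step (0.5 / scale) per weight, which is why the model's accuracy can survive the conversion.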
Finally, the last preprint on this week's list involves pruning. Pruning neural networks is pretty similar to how I imagine pruning works in gardening: you start with a large neural network and try removing parts of it without compromising its accuracy [too much]. Voilà!
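The simplest version of the gardening metaphor is magnitude pruning: zero out the weights with the smallest absolute values, on the assumption that they contribute least to the output. The sketch below shows only that generic technique on a plain list (the function name is hypothetical); it is not necessarily what the BERT pruning preprint does, since pruning can also remove larger structures than individual weights.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]

weights = [0.42, -1.3, 0.007, 2.5, -0.9, 0.05]
pruned = magnitude_prune(weights, 0.5)
print(pruned)  # [0.0, -1.3, 0.0, 2.5, -0.9, 0.0]
```

In a real network, this step is typically followed by some fine-tuning so the remaining weights can compensate for the removed ones, which is where the "[too much]" in the accuracy trade-off gets negotiated.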
Till next week!