Welcome to ShortScience.org! |

- ShortScience.org is a platform for post-publication discussion aiming to improve accessibility and reproducibility of research ideas.
- The website has 1547 public summaries, mostly in machine learning, written by the community and organized by paper, conference, and year.
- Reading summaries of papers is useful to obtain the perspective and insight of another reader, why they liked or disliked it, and their attempt to demystify complicated sections.
- Also, writing summaries is a good exercise to understand the content of a paper because you are forced to challenge your assumptions when explaining it.
- Finally, you can keep up to date with the flood of research by reading the latest summaries on our Twitter and Facebook pages.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Ren, Shaoqing and He, Kaiming and Girshick, Ross B. and Sun, Jian

Neural Information Processing Systems Conference - 2015 via Local Bibsonomy

Keywords: dblp

Ren, Shaoqing and He, Kaiming and Girshick, Ross B. and Sun, Jian

Neural Information Processing Systems Conference - 2015 via Local Bibsonomy

Keywords: dblp

[link]
**Object detection** is the task of drawing one bounding box around each instance of the type of object one wants to detect. Typically, image classification is done before object detection. With neural networks, the usual procedure for object detection is to train a classification network, replace the last layer with a regression layer which essentially predicts pixel-wise if the object is there or not. An bounding box inference algorithm is added at last to make a consistent prediction (see [Deep Neural Networks for Object Detection](http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf)). The paper introduces RPNs (Region Proposal Networks). They are end-to-end trained to generate region proposals.They simoultaneously regress region bounds and bjectness scores at each location on a regular grid. RPNs are one type of fully convolutional networks. They take an image of any size as input and output a set of rectangular object proposals, each with an objectness score. ## See also * [R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/iccv/Girshick15#joecohen) * [Fast R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/iccv/Girshick15#joecohen) * [Faster R-CNN](http://www.shortscience.org/paper?bibtexKey=conf/nips/RenHGS15#martinthoma) * [Mask R-CNN](http://www.shortscience.org/paper?bibtexKey=journals/corr/HeGDG17) |

Fractional Max-Pooling

Benjamin Graham

arXiv e-Print archive - 2014 via Local arXiv

Keywords: cs.CV

**First published:** 2014/12/18 (6 years ago)

**Abstract:** Convolutional networks almost always incorporate some form of spatial
pooling, and very often it is alpha times alpha max-pooling with alpha=2.
Max-pooling act on the hidden layers of the network, reducing their size by an
integer multiplicative factor alpha. The amazing by-product of discarding 75%
of your data is that you build into the network a degree of invariance with
respect to translations and elastic distortions. However, if you simply
alternate convolutional layers with max-pooling layers, performance is limited
due to the rapid reduction in spatial size, and the disjoint nature of the
pooling regions. We have formulated a fractional version of max-pooling where
alpha is allowed to take non-integer values. Our version of max-pooling is
stochastic as there are lots of different ways of constructing suitable pooling
regions. We find that our form of fractional max-pooling reduces overfitting on
a variety of datasets: for instance, we improve on the state-of-the art for
CIFAR-100 without even using dropout.
more
less

Benjamin Graham

arXiv e-Print archive - 2014 via Local arXiv

Keywords: cs.CV

[link]
## Introduction * [Link to Paper](http://arxiv.org/pdf/1412.6071v4.pdf) * Spatial pooling layers are building blocks for Convolutional Neural Networks (CNNs). * Input to pooling operation is a $N_{in}$ x $N_{in}$ matrix and output is a smaller matrix $N_{out}$ x $N_{out}$. * Pooling operation divides $N_{in}$ x $N_{in}$ square into $N^2_{out}$ pooling regions $P_{i, j}$. * $P_{i, j}$ ⊂ $\{1, 2, . . . , N_{in}\}$ $\forall$ $(i, j) \in \{1, . . . , N_{out} \}^2$ ## MP2 * Refers to 2x2 max-pooling layer. * Popular choice for max-pooling operation. ### Advantages of MP2 * Fast. * Quickly reduces the size of the hidden layer. * Encodes a degree of invariance with respect to translations and elastic distortions. ### Issues with MP2 * Disjoint nature of pooling regions. * Since size decreases rapidly, stacks of back-to-back CNNs are needed to build deep networks. ## FMP * Reduces the spatial size of the image by a factor of *α*, where *α ∈ (1, 2)*. * Introduces randomness in terms of choice of pooling region. * Pooling regions can be chosen in a *random* or *pseudorandom* manner. * Pooling regions can be *disjoint* or *overlapping*. ## Generating Pooling Regions * Let $a_i$ and $b_i$ be 2 increasing sequences of integers, starting at 1 and ending at $N_{in}$. * Increments are either 1 or 2. * For *disjoint regions, $P = [a_{i−1}, a_{i − 1}] × [b_{j−1}, b_{j − 1}]$ * For *overlapping regions, $P = [a_{i−1}, a_i] × [b_{j−1}, b_j 1]$ * Pooling regions can be generated *randomly* by choosing the increment randomly at each step. * To generate pooling regions in a *peusdorandom* manner, choose $a_i$ = ceil($\alpha | (i+u))$, where $\alpha \in (1, 2)$ with some $u \in (0, 1)$. * Each FMP layer uses a different pair of sequence. * An FMP network can be thought of as an ensemble of similar networks, with each different pooling-region configuration defining a different member of the ensemble. ## Observations * *Random* FMP is good on its own but may underfit when combined with dropout or training data augmentation. * *Pseudorandom* approach generates more stable pooling regions. * *Overlapping* FMP performs better than *disjoint* FMP. ## Weakness * No justification is provided for the observations mentioned above. * It needs to be seen how performance is affected if the pooling layer in architectures like GoogLeNet. |

Deep Adversarial Networks for Biomedical Image Segmentation Utilizing Unannotated Images

Zhang, Yizhe and Yang, Lin and Chen, Jianxu and Fredericksen, Maridel and Hughes, David P. and Chen, Danny Z.

Medical Image Computing and Computer Assisted Interventions Conference - 2017 via Local Bibsonomy

Keywords: dblp

Zhang, Yizhe and Yang, Lin and Chen, Jianxu and Fredericksen, Maridel and Hughes, David P. and Chen, Danny Z.

Medical Image Computing and Computer Assisted Interventions Conference - 2017 via Local Bibsonomy

Keywords: dblp

[link]
This work improves the performance of a segmentation network by utilizing unlabelled data. They use a discriminator (they call EN) to distinguish between annotated and unannotated examples. They then train the segmentation generator (they call SN) based on what will fool the discriminator. https://i.imgur.com/7CfKnh5.png Three training phases are shown above This work is really great. They are using the segmentation to condition the discriminator which will learn to point out flaws when applying the segmentation to the unlabelled examples. Then these flaws in the segmentation are corrected by using the gradients from the discriminator to adjust the segmentation. In contrast with other semi-supervised approaches which learn a latent space for all samples, labelled and unlabelled, and then uses this space to learn a classifier or segmentation; this approach looks for the boundaries of the space only. The unlabelled examples are used to bias the representation learned by the segmentation network to conform to the distribution represented by all observed examples. Read this paper for more: https://arxiv.org/abs/1611.08408 Poster: https://i.imgur.com/eR5jgwn.png |

Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V. Le

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG, cs.AI, cs.NE

**First published:** 2016/11/05 (4 years ago)

**Abstract:** Neural networks are powerful and flexible models that work well for many
difficult learning tasks in image, speech and natural language understanding.
Despite their success, neural networks are still hard to design. In this paper,
we use a recurrent network to generate the model descriptions of neural
networks and train this RNN with reinforcement learning to maximize the
expected accuracy of the generated architectures on a validation set. On the
CIFAR-10 dataset, our method, starting from scratch, can design a novel network
architecture that rivals the best human-invented architecture in terms of test
set accuracy. Our CIFAR-10 model achieves a test error rate of 3.84, which is
only 0.1 percent worse and 1.2x faster than the current state-of-the-art model.
On the Penn Treebank dataset, our model can compose a novel recurrent cell that
outperforms the widely-used LSTM cell, and other state-of-the-art baselines.
Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is
3.6 perplexity better than the previous state-of-the-art.
more
less

Barret Zoph and Quoc V. Le

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG, cs.AI, cs.NE

[link]
### Main Idea: It basically tunes the hyper-parameters of the neural network architecture using reinforcement learning. The reward signal is taken as evaluation on the validation set. The method is policy gradient as the cost function is non-differentiable. ### Method: #### i. Actions: 1. There is controller RNN which predicts some hyper-parameters of the layer conditioned on the previous predictions. This prediction is just a one-hot vector based on the previous hyper-parameter chosen. At the start for the first prediction - this vector is just all zeros. 2. Once the network is generated completely, it is trained for a fixed number of epochs, the reward signal is calculated based on the evaluation on the validation set. #### ii. Training: 1. Nothing fancy in the reinforcement learning approach simple policy gradients. 2. Baseline is added to reduce the variance. 3. It takes 2-3 weeks to train it over 800 GPUs!! #### iii. Results: 1. Use it to generate CNNs and LSTM cells. Close to state-of-art results with generated architectures. ### Possible new directions: 1. Use better techniques RL techniques like TRPO, PPO etc. 2. Right now, they generate fixed length architecture. Their reason is for variable length architectures, it is difficult to determine how much time each architecture is trained. Smaller networks are easier to train. Thus, somehow determine training time as function of the learning capacity of the network. Code: They haven't released the code yet. I tried to simulate it in torch.(https://github.com/abhigenie92/nn_search) |

Relational Forward Models for Multi-Agent Learning

Andrea Tacchetti and H. Francis Song and Pedro A. M. Mediano and Vinicius Zambaldi and Neil C. Rabinowitz and Thore Graepel and Matthew Botvinick and Peter W. Battaglia

arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.LG, cs.AI, cs.MA, stat.ML

**First published:** 2018/09/28 (2 years ago)

**Abstract:** The behavioral dynamics of multi-agent systems have a rich and orderly
structure, which can be leveraged to understand these systems, and to improve
how artificial agents learn to operate in them. Here we introduce Relational
Forward Models (RFM) for multi-agent learning, networks that can learn to make
accurate predictions of agents' future behavior in multi-agent environments.
Because these models operate on the discrete entities and relations present in
the environment, they produce interpretable intermediate representations which
offer insights into what drives agents' behavior, and what events mediate the
intensity and valence of social interactions. Furthermore, we show that
embedding RFM modules inside agents results in faster learning systems compared
to non-augmented baselines. As more and more of the autonomous systems we
develop and interact with become multi-agent in nature, developing richer
analysis tools for characterizing how and why agents make decisions is
increasingly necessary. Moreover, developing artificial agents that quickly and
safely learn to coordinate with one another, and with humans in shared
environments, is crucial.
more
less

Andrea Tacchetti and H. Francis Song and Pedro A. M. Mediano and Vinicius Zambaldi and Neil C. Rabinowitz and Thore Graepel and Matthew Botvinick and Peter W. Battaglia

arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.LG, cs.AI, cs.MA, stat.ML

[link]
One of the dominant narratives of the deep learning renaissance has been the value of well-designed inductive bias - structural choices that shape what a model learns. The biggest example of this can be found in convolutional networks, where models achieve a dramatic parameter reduction by having features maps learn local patterns, which can then be re-used across the whole image. This is based on the prior belief that patterns in local images are generally locally contiguous, and so having feature maps that focus only on small (and gradually larger) local areas is a good fit for that prior. This paper operates in a similar spirit, except its input data isn’t in the form of an image, but a graph: the social graph of multiple agents operating within a Multi Agent RL Setting. In some sense, a graph is just a more general form of a pixel image: where a pixel within an image has a fixed number of neighbors, which have fixed discrete relationships to it (up, down, left, right), nodes within graphs have an arbitrary number of nodes, which can have arbitrary numbers and types of attributes attached to that relationship. The authors of this paper use graph networks as a sort of auxiliary information processing system alongside a more typical policy learning framework, on tasks that require group coordination and knowledge sharing to complete successfully. For example, each agent might be rewarded based on the aggregate reward of all agents together, and, in the stag hunt, it might require collaborative effort by multiple agents to successfully “capture” a reward. Because of this, you might imagine that it would be valuable to be able to predict what other agents within the game are going to do under certain circumstances, so that you can shape your strategy accordingly. The graph network used in this model represents both agents and objects in the environment as nodes, which have attributes including their position, whether they’re available or not (for capture-able objects), and what their last action was. As best I can tell, all agents start out with directed connections going both ways to all other agents, and to all objects in the environment, with the only edge attribute being whether the players are on the same team, for competitive environments. Given this setup, the graph network works through a sort of “diffusion” of information, analogous to a message passing algorithm. At each iteration (analogous to a layer), the edge features pull in information from their past value and sender and receiver nodes, as well as from a “global feature”. Then, all of the nodes pull in information from their edges, and their own past value. Finally, this “global attribute” gets updated based on summations over the newly-updated node and edge information. (If you were predicting attributes that were graph-level attributes, this global attribute might be where you’d do that prediction. However, in this case, we’re just interested in predicting agent-level actions). https://i.imgur.com/luFlhfJ.png All of this has the effect of explicitly modeling agents as entities that both have information, and have connections to other entities. One benefit the authors claim of this structure is that it allows them more interpretability: when they “play out” the values of their graph network, which they call a Relational Forward Model or RFM, they observe edge values for two agents go up if those agents are about to collaborate on an action, and observe edge values for an agent and an object go up before that object is captured. Because this information is carefully shaped and structured, it makes it easier for humans to understand, and, in the tests the authors ran, appears to also help agents do better in collaborative games. https://i.imgur.com/BCKSmIb.png While I find graph networks quite interesting, and multi-agent learning quite interesting, I’m a little more uncertain about the inherent “graphiness” of this problem, since there aren’t really meaningful inherent edges between agents. One thing I am curious about here is how methods like these would work in situations of sparser graphs, or, places where the connectivity level between a node’s neighbors, and the average other node in the graph is more distinct. Here, every node is connected to every other node, so the explicit information localization function of graph networks is less pronounced. I might naively think that - to whatever extent the graph is designed in a way that captures information meaningful to the task - explicit graph methods would have an even greater comparative advantage in this setting. |

About