Geometric Deep Learning: Going beyond Euclidean data

Bronstein, Michael M. and Bruna, Joan and LeCun, Yann and Szlam, Arthur and Vandergheynst, Pierre

IEEE Signal Process. Mag. - 2017 via Local Bibsonomy

Keywords: dblp

This paper surveys progress on adapting deep learning techniques to non-Euclidean data and suggests future directions. One of the strengths (and weaknesses) of deep learning, specifically exploited by convolutional neural networks, is the assumption that the data exhibits translation invariance/equivariance and invariance to local deformations. Hence, long-range dependencies can be learned with multi-scale, hierarchical techniques in which spatial resolution is progressively reduced. However, this means that any information about the data that cannot be learned at reduced spatial resolution may be lost (I believe residual networks aim to address this with skip connections that can learn an identity operation; also, in computer vision, multi-scale versions of the data are often fed to CNNs). Key areas where this assumption about the data appears to hold are computer vision and speech recognition.

#### Some quick background

The *Laplacian*, a self-adjoint (symmetric) positive semi-definite operator, which is defined in this paper for both smooth manifolds and graphs, can be thought of as the difference between the local average of a function around a point and the value of the function at that point. It is generally defined as $\Delta = -\text{div} \nabla$. When discretizing a continuous, smooth manifold with a *mesh*, note that the graph Laplacian might not converge to the continuous Laplacian operator as sampling density increases. To be consistent, one needs to construct a triangular mesh, i.e., represent the manifold as a polyhedral surface.

### Spectral methods

Fourier analysis on non-Euclidean domains is possible by considering the eigendecomposition of the Laplacian operator. A possible generalization of the Convolution Theorem to functions on manifolds and graphs is discussed, but it is noted that the resulting operation is not shift-invariant.
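As a minimal sketch of the graph Fourier idea above: build the (unnormalized) graph Laplacian $L = D - A$ of a small graph, take its eigendecomposition, and filter a vertex signal in the frequency domain. The 4-cycle graph and the low-pass filter $g(\lambda) = e^{-\lambda}$ are illustrative choices, not from the paper.

```python
import numpy as np

# Adjacency matrix of a small undirected 4-cycle graph (example choice)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))     # degree matrix
L = D - A                      # unnormalized graph Laplacian (symmetric, PSD)

# Graph Fourier basis: eigenvectors of the Laplacian
eigvals, U = np.linalg.eigh(L)

x = np.array([1.0, 0.0, 0.0, 0.0])   # a signal living on the 4 vertices

# Spectral filtering: forward transform, scale each frequency, inverse transform
g = np.exp(-eigvals)                 # an example low-pass filter g(lambda)
x_filtered = U @ (g * (U.T @ x))
```

Note that the constant vector is always an eigenvector of $L$ with eigenvalue 0, the analogue of the DC component in classical Fourier analysis.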
The Spectral CNN is defined by introducing a spectral convolutional layer that acts on the vertices of the graph, using filters in the frequency domain together with the eigenvectors of the Laplacian. However, the spectral filter coefficients depend on the particular eigenvectors (basis); this domain dependency is bad for generalization! The non-Euclidean analogue of pooling is *graph coarsening*: only a fraction of the vertices of the graph are retained. Strided convolutions can be generalized to the spectral construction by keeping only the low-frequency components; the graph Laplacian must be recomputed after applying the nonlinearity in the spatial domain, however. Performing matrix multiplications with the eigendecomposition of the Laplacian is expensive!

### Spectrum-free Methods

**A polynomial of the Laplacian acts as a polynomial on the eigenvalues.** ChebNet (Defferrard et al.) and Graph Convolutional Networks (Kipf and Welling) boil down to applying simple filters acting on the r- or 1-hop neighborhood of the graph in the spatial domain. Some generalizations of CNNs that define weighting functions for a locally Euclidean coordinate system around a point on a manifold are the

* Geodesic CNN
* Anisotropic CNN
* Mixture Model network (MoNet)

#### What problems are being solved with these methods?

* Ranking and community detection on social networks
* Recommender systems
* 3D geometric data in computer vision/graphics
  * Shape classification
  * Feature correspondence for 3D shapes
* Behavior of N-particle systems (particle physics, LHC)
* Molecule design
* Medical imaging

### Open Problems

* *Generalization*: spectral analogues of convolution learned on one graph cannot be readily applied to other graphs (domain dependency).
  Spatial methods generalize across different domains, but come with their own subtleties.
* *Time-varying domains*
* *Directed graphs*: non-symmetric Laplacians do not have the orthogonal eigendecompositions needed for interpretable spectral-domain constructions
* *Synthesis problems*: generative models
* *Computation*: extending deep learning frameworks to non-Euclidean data
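The spectrum-free observation above (a polynomial of the Laplacian acts as a polynomial on the eigenvalues) can be sketched directly: a degree-$r$ polynomial filter $p(L)x = \sum_k c_k L^k x$ needs only repeated matrix-vector products, never an explicit eigendecomposition, and touches only the $r$-hop neighborhood of each vertex. The path graph and coefficients below are illustrative choices.

```python
import numpy as np

def poly_filter(L, x, coeffs):
    """Apply p(L) x = sum_k c_k L^k x using only matrix-vector products,
    so no eigendecomposition of L is ever formed."""
    out = np.zeros_like(x)
    Lx = x.copy()
    for c in coeffs:
        out += c * Lx
        Lx = L @ Lx        # raise the power of L by one each iteration
    return out

# A tiny path graph 0-1-2 and its unnormalized Laplacian (example choice)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

x = np.array([1.0, 0.0, 0.0])
y = poly_filter(L, x, coeffs=[0.5, -0.1, 0.01])   # a degree-2 (2-hop) filter
```

Since $L = U \Lambda U^\top$, this output equals $U\,p(\Lambda)\,U^\top x$, i.e., the same filter applied in the spectral domain, which is why ChebNet-style filters avoid the expensive eigendecomposition.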

Memory Transformation Enhances Reinforcement Learning in Dynamic Environments

A. Santoro and P. W. Frankland and B. A. Richards

Journal of Neuroscience - 2016 via Local CrossRef

Keywords:

## Summary

The research question the authors answered was whether, by shifting from an episodic to a "schematic", or gist-like, memory system, a reinforcement learning agent could learn to achieve its goals in a dynamic environment. The authors focused on 2D navigation tasks where the reward locations constantly changed, such that new reward locations were correlated in the short term but were independent and sampled from a stable distribution in the long term. I found it interesting that the authors claimed the real world is like this, and consequently they staked a lot of the significance of their work on this fact.

**The main conclusion they came to was that, given the existence of a stable long-term distribution for reward location (or whatever random variable the agent is concerned with estimating a distribution for), the optimal strategy for an agent is to shift from utilizing episodic to schematic memories slowly.**

The authors implemented their agent using a novel neural network architecture that consisted of an episodic memory system, a schematic memory system, and a critic to generate a TD error. The episodic memory system was:

* a spatial encoder, which took in the (x, y)-pair of the agent's current location,
* an autoencoder implemented as a 3-layer recurrent network, and
* a network of place field units.

The output of the spatial encoder fed into the autoencoder, and the output of this fed into the place cells. "Retrieving" memory from the place cells was implemented as a fully connected sigmoid layer. The use of place field units was quite interesting; the idea behind this was to learn to associate activation patterns of place cells with specific locations within the environment where rewards were recently found. The schematic memory was implemented as a Restricted Boltzmann Machine (RBM), whose first layer was a direct projection of the place cells from the episodic network.
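A place-field layer like the one described above can be pictured as a population of units, each with a preferred location in the arena and a Gaussian tuning curve. This is a generic sketch of the idea, not the paper's implementation; the unit count, arena size, and field width are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical place-field layer: each unit has a preferred location
# (its "field centre") in a unit-square arena.
n_units = 100
centres = rng.uniform(0.0, 1.0, size=(n_units, 2))  # field centres (assumed)
sigma = 0.1                                          # field width (assumed)

def place_activity(pos):
    """Activation of every place unit for an (x, y) position:
    a Gaussian of the distance from the unit's field centre."""
    d2 = ((centres - pos) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Population code for the centre of the arena
a = place_activity(np.array([0.5, 0.5]))
```

The resulting activation pattern is what a downstream layer could associate with recently rewarded locations.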
The ultimate goal of the RBM was to learn a general statistical model of the reward locations. It was trained offline (i.e., while the agent was "at rest" between trials) by generating random activity in the spatial encoder and propagating it through to the RBM. This was curious, but apparently, since they had also added a TD prediction error to the episodic system via a critic, this was more biologically plausible than i.i.d. sampling from the episodic memory. The agent has a parameter that controls how much it mixes its episodic and schematic memories; the resultant "mixed" memory influences action selection.

## Take-aways

We're seeing a shift toward more complex RL environments that this could be applied to; for example, 3D navigation tasks where there are multiple goals that could potentially move over time. Perhaps this could also be applied to modeling the dynamic behavior of other agents in a multi-agent setting? A cool use of unsupervised learning to enhance RL!
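The mixing parameter can be pictured as a convex combination of the two systems' reward-location estimates, with the weight shifting slowly from episodic to schematic as the paper argues is optimal. The distributions and schedule below are toy numbers, not the paper's.

```python
import numpy as np

def mixed_memory(episodic, schematic, alpha):
    """Convex combination of the two memory systems' reward-location
    estimates; alpha = 1 is purely episodic, alpha = 0 purely schematic."""
    return alpha * episodic + (1 - alpha) * schematic

# Toy estimates over 5 candidate reward locations (hypothetical numbers)
episodic  = np.array([0.7, 0.1, 0.1, 0.05, 0.05])  # recent, specific
schematic = np.array([0.2, 0.2, 0.2, 0.2, 0.2])    # stable long-run distribution

# Slowly shift from episodic to schematic over trials
for alpha in np.linspace(1.0, 0.0, 5):
    belief = mixed_memory(episodic, schematic, alpha)
```

Because both inputs are probability distributions, every mixed belief is too, so it can drive action selection directly.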

Cooperative Inverse Reinforcement Learning

Hadfield-Menell, Dylan and Dragan, Anca and Abbeel, Pieter and Russell, Stuart J.

arXiv e-Print archive - 2016 via Local Bibsonomy

Keywords: dblp

In the future, AI and people will work together; hence, we must concern ourselves with ensuring that AI has interests aligned with our own. The authors suggest that it is in our best interests to find a solution to the "value-alignment problem". As recently pointed out by Ian Goodfellow, however, [this may not always be a good idea](https://www.quora.com/When-do-you-expect-AI-safety-to-become-a-serious-issue).

Cooperative Inverse Reinforcement Learning (CIRL) is a formulation of a cooperative, partial-information game between a human and a robot. Both share a reward function, but the robot does not initially know what it is. One of the key departures from classical Inverse Reinforcement Learning is that the teacher, in this case the human, is not assumed to act optimally. Rather, it is shown that sub-optimal actions on the part of the human can result in the robot learning a better reward function. The structure of the CIRL formulation encourages the human not to teach by demonstration in a way that greedily maximizes immediate reward; instead, the human learns how to "best respond" to the robot. CIRL can be formulated as a Dec-POMDP and reduced to a single-agent POMDP. The authors solved a 2D navigation task with CIRL to demonstrate the inferiority of having the human follow a "demonstration-by-expert" policy as opposed to a "best-response" policy.
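In the single-agent POMDP reduction, the robot maintains a belief over the unknown reward parameter and updates it from the human's observed actions. The toy sketch below is not the paper's algorithm: it assumes a Boltzmann-rational (noisily optimal) human, a discrete set of three hypothetical reward hypotheses, and made-up action values.

```python
import numpy as np

# Hypothetical setup: three candidate reward parameters theta, and
# Q[theta][action], the human's action values under each hypothesis (made up).
Q = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
beta = 2.0  # human rationality coefficient (assumed)

def likelihood(action):
    """P(action | theta) for every theta, under a Boltzmann-rational human:
    the human picks actions with probability proportional to exp(beta * Q)."""
    expq = np.exp(beta * Q)
    return expq[:, action] / expq.sum(axis=1)

belief = np.ones(3) / 3           # uniform prior over theta
for a in [1, 1]:                  # robot observes the human choose action 1 twice
    belief *= likelihood(a)       # Bayes update
    belief /= belief.sum()        # renormalize
```

Under this model the belief concentrates on the hypothesis that best explains the human's choices, without ever assuming the human is perfectly optimal.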
