[link]
Summary by CodyWild 4 years ago
This paper, presented this week at ICLR 2020, builds on existing applications of message-passing Graph Neural Networks (GNN) for molecular modeling (specifically: for predicting quantum properties of molecules), and extends them by introducing a way to represent angles between atoms, rather than just distances between them, as current methods are limited to.
The basic version of GNNs on molecule data works by creating features attached to atoms at each level (starting at level 0 with the element-specific embedding of that atom), and constructing "messages" between neighboring atoms that are a function of the neighbor atom's feature vector and the distance between the two neighbors. (This is the minimal version; some methods also include the bond type along with the distance as part of the edge-specific features). At a given layer, an atom's features are updated by applying an update function to both its own prior value and the sum of all the messages it receives from neighbors.
The trouble with this method is that it doesn't account for angular relationships between sets of atoms, which are physically important to quantum properties of a molecule. The naive way you might imagine representing angle is by situating the molecule in a 2D grid, and applying spherical convolutions, so your contribution to a neighbor's features would be based on your spherical distance away. However, this doesn't work, because molecules don't have a canonical frame of reference - there is no fixed left or right, or up and down, and operating in this way would mean that a molecule and its horizontal flip would have different representations.
Instead, the authors propose an interesting inversion of the existing approaches, where feature vectors are attached to atoms, and are updated using the features of other atoms. In this model, features live on "messages" between pairs of atoms, and are updated by incorporating information from all messages within some local distance window. Importantly, each pair of atoms has a vector associated with their relationship in the molecule, and so when you combine two such messages together, you can calculate the angle between the associated vectors. This angle is invariant to flipping or rotation, because it's defined based on reference points internal to the molecule, which move together when the molecule is moved.
https://i.imgur.com/mw46gWz.png
Messages are updated from other messages using a combination of the distance between the non-shared endpoints of the messages (that is, if both message vectors share an endpoint i, and go to j and k respectively, this would be the distance between j and k), and the angle between the (i-j) vector and the (i-j) vector. For physics-based reasons I admit I didn't fully follow, these two pieces of information are embedded in a spherical basis function, so messages will update from each other differently based on their relative positions in a sphere.
https://i.imgur.com/Tvc7Gex.png
The representation of a given atom is then simply the sum of all its incoming messages, conditioned by the distance between the reference atom and the paired neighbor for which the message is defined. A concatenation of atom representations across layers is used to create a final atom representation, which is used for final quantum property prediction.
The authors tested on two datasets, and found dramatic improvements, with an average of 31% relative gain on the prior state of the art over different quantum property targets.
more
less