[link]
Summary by CodyWild 4 years ago
Most of the interesting mechanics within living things are mediated by interactions between proteins, making it important and useful to have good predictive models of whether proteins will interact with one another, for validating possible interaction graph structures.
Prior methods for this problem - which takes as its input sequence representations of two proteins, and outputs a probability of interaction - have pursued different ideas for how to combine information from the two proteins. On a basic level, you need your method to be independent to the ordering of the proteins, since the property we care about is a property of a pair of proteins, not a property of a particular ordering of proteins. Some examples of those have included:
- A kernel function between some representation of proteins
- Representing a protein pair according to whether and how often given k-mer sequences co-occur in both proteins
This paper's DPPI method is built on a Siamese network, which applies a single shared set of convolutional layers to each of the two proteins, and then calculates a "binding score," that structurally acts a bit like a similarity score, but with allowances for proteins to be complementary, rather than just similar. In more detail:
https://i.imgur.com/8ruY9es.png
1. Crop each protein into multiple overlapping subsequences of length 512 amino acids. Perform all following steps for every combination of cropped subsequences between the two proteins. (If A is divided into A1 and A2, you'd do the following steps for A1 x B and A2 x B and take the max score out of the two)
2. Each cropped protein is represented as a probabilistic sequence. Since we can't get fully certain sequences of what amino acid is at each point in the chain, we instead pass in a 20x512 representation, where at each of the 512 locations we have a distribution over 20 possible amino acids. This tensor is passed into multiple layers of convolutional network, with the same network weights applied to each of the two proteins.
3. A random projection is applied to the outputs of the convolutional network. The features that come out of the projection are conceptually similar to feature maps that might come out of a neural network layer, except that the weights aren't learned. This random projection has a specialized structure, in that it's composed of two (randomly-weighted) networks, A and B, each of which result in feature maps A1...K and B1...K. For protein 1, the outputs of the network are ordered A1...AK B1...BK, whereas for protein 2, the weights are swapped, and so the outputs are ordered B1...BK A1...AK.
4. A Hadamard product between the two random projection outputs. This is basically a dot product, but for matrices (you multiply each element in the matrix by the corresponding element in the other matrix). This is basically like calculating a similarity score between the feature values of the randomly projected features. One benefit of doing the odd reordering in the prior step is that it breaks symmetry: if we took a dot product between features calculated by a fully shared-weight network, then we'd be looking explicitly for similarity between sequence features, which might not be sufficient to know if proteins interact in a complementary way. Another benefit is that it makes the final fully connected layer (which predicts interaction) agnostic to the order of inputs. (Caveat: I only about 70% follow the logic of this) In the example above, the 1st element will end up being A1(Prot1) x B1(ProtB), and the K+1th element will end up being B1(Prot1) x A1(ProtB). Since multiplication is order-independent, both values 1 and K+1 represent the similarity between the proteins according to features A/B1.
5. Once you have this final representation, feed it into a fully connected layer
https://i.imgur.com/3LsgZNn.png
The authors show superior performance to past methods, and even show that they can get 96% accuracy on protein interactions within humans from training on a non-human species, showing that a lot of the biomechanical logic transfers.
https://i.imgur.com/REoU3Ab.png
They did an ablation test and showed that the random projection layer added value, but also that it was better to have the weights be random than learned, which was surprising to me, and suggests the model as a whole is prone to overfit.

more
less