Summary by CodyWild 3 years ago
This preprint is a bit rambling, and I don't know that I fully followed what it was doing, but here's my best guess:
- We think it's probably the case that SARS-COV2 (COVID19) uses a protease (enzyme involved in its reproduction) that isn't available and co-optable in the human body, and is also quite similar to the comparable protease protein in the original SARS virus. Therefore, it is hoped that we might be able to take inhibitors that bind to SARS, and modify them in small ways to make them bond to SARS-COV2
- The paper notes that it's specifically interested in targeted covalent inhibitors. These are drugs that inhibit the function of a protein by actually covalently binding with the relevant binding pocket, as opposed to most drugs, which by default just fit neatly inside the pocket and occupy it much of the time in equilibrium, but don't actually form permanent, stable covalent bonds with the protein. This class of drugs can be more effective, because its binding is stronger and more permanent, but it can also be more dangerous, because its potential side effects if it binds with a non-intended protein pocket can be more severe.
- In order to get a covalently-binding drug that fits with the pocket of SARS-COV2, the authors start with a known inhibitor from SARS, and then use reinforcement learning to make modifications to it. The allowed modification actions consist of adding or removing "fragments" rather than atoms, where "fragments" here refers to coherent subcomponents of other drugs from similar families that were broken up according to hand-coded chemical rules. This approach is more stable than just adding on molecules, because at every stage in generation, the generated molecule will be coherent and chemically sound.
- The part I don't fully follow is what they use for the reward function for compounds that are in the process of being made. They specify that they do reward intermediate compounds, rather than just ones at the end of generation, but don't specify what goes into the reward. If I had to guess, I'd imagine it consists of (1) a molecular docking simulation that can't be differentiated through, and thus can't be used directly as a loss function, and/or (2) hand-coded heuristics from chemists for what makes a stable binding