Geirhos et al. show that state-of-the-art convolutional neural networks put too much importance on texture information. This claim is confirmed in a controlled study comparing convolutional neural network and human performance on variants of ImageNet image with removed texture (silhouettes) or on edges. Additionally, networks only considering local information can perform nearly as well as other networks. To avoid this bias, they propose a stylized ImageNet variant where textured are replaced randomly, forcing the network to put more weight on global shape information.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).