ShortScience.org - Making Science Accessible!

Dermatologist-level classification of skin cancer with deep neural networks
Esteva, Andre and Kuprel, Brett and Novoa, Roberto A. and Ko, Justin and Swetter, Susan M. and Blau, Helen M. and Thrun, Sebastian
Nature - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Anmol Sharma 6 years ago

Skin cancer is one of the most common types of cancer in humans. A lesion is primarily diagnosed visually, from a series of 2D color images taken of the affected area. This may be followed by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions from images is a challenging task owing to the fine-grained variability in their appearance.

To this end, Esteva et al. propose a deep learning based solution that automates the classification of skin lesions into fine-grained categories. Specifically, they use the GoogleNet Inception v3 CNN architecture, a successor to the GoogLeNet model that won the ImageNet Large Scale Visual Recognition Challenge in 2014. The method leverages pre-training: an already trained DNN is fine-tuned on a slightly different task, which lets the network reuse the convolutional filters it learnt from a much larger dataset. Here, the Inception v3 CNN was first trained on approximately 1.28 million images spanning about 1000 classes from the 2014 ImageNet Large Scale Visual Recognition Challenge, and then fine-tuned on the dermatology dataset.

The dataset used in the study was assembled from open-access online repositories and clinical data from Stanford Medical Center. It consists of 127,463 training and validation images and a held-out set of 1,942 labelled test images. The labels are organized hierarchically in a tree-like structure, where each successive depth level represents a finer-grained classification of the disease.

The network is evaluated on three tasks: i) classifying the first-level nodes of the taxonomy, which represent benign lesions, malignant lesions and non-neoplastic lesions; ii) a nine-class disease partition (the second-level nodes), chosen so that the diseases in each class have similar medical treatment plans; and iii) using only biopsy-proven images of medically important use cases, testing whether the algorithm and dermatologists can distinguish malignant from benign lesions of epidermal (keratinocyte carcinoma compared to benign seborrheic keratosis) or melanocytic (malignant melanoma compared to benign nevus) origin.

On the first task, the CNN achieved 72.1 $\pm$ 0.9\% (mean $\pm$ s.d.) overall accuracy (the average of the individual inference class accuracies), while two dermatologists attained 65.56\% and 66.0\% accuracy on a subset of the validation set. On the second task, the CNN achieved 55.4 $\pm$ 1.7\% overall accuracy, whereas the same two dermatologists attained 53.3\% and 55.0\%. On the third task, the CNN outperforms the dermatologists and obtains an area under the curve (AUC) of over 91\% for each case.
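The "overall accuracy" above is described as the average of the per-class accuracies, which differs from plain accuracy when classes are imbalanced. A minimal sketch of that metric, assuming this reading of the definition; the function name and the toy labels are hypothetical and not taken from the paper:

```python
from collections import defaultdict


def mean_per_class_accuracy(y_true, y_pred):
    """Average of the per-class accuracies, with every class weighted equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Toy, made-up labels: the rarer class counts as much as the common one.
y_true = ["benign", "benign", "benign", "benign", "malignant", "malignant"]
y_pred = ["benign", "benign", "benign", "benign", "benign", "malignant"]
score = mean_per_class_accuracy(y_true, y_pred)  # (4/4 + 1/2) / 2
```

On these toy labels plain accuracy would be 5/6, while the per-class average is lower because the single malignant miss costs half of that class.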

A Potential Reduction Algorithm with User-Specified Phase I-Phase II Balance for Solving a Linear Program from an Infeasible Warm Start
Freund, Robert M.
SIAM Journal on Optimization - 1995 via Local Bibsonomy
Keywords: dblp

[link] Summary by Chris Murray 9 years ago

This is a fantastic paper: it presents the problem well, the algorithms are intuitive and clearly presented, and the proofs are not overly long.  It describes the online allocation problem: roughly, how to use the hypotheses of N "experts" to create a hypothesis that isn't much worse than the best expert (alternatively, how to allocate wealth every period among N traders, conditioned only on their past performance, such that at the end you have performed almost as well as the best trader chosen in hindsight).  The paper presents the Hedge algorithm, which works with general bounded summable loss functions and has only one parameter (a learning rate) to tune: it simply decreases the weight on each expert every period by a factor like (learning_rate)^(loss), where 0 < learning_rate <= 1.  The final bound, where L(hedge) is the loss of the Hedge algorithm, L* is the loss of the best expert (the one with minimum loss), and $0 < B \le 1$ is the learning rate (the bound is finite only for $B < 1$), is:

$$L(\text{hedge}) \le \frac{\ln(1/B) \cdot L^* + \ln(N)}{1 - B}$$
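The multiplicative update and the bound can be checked numerically on a small example. The following is a minimal sketch, assuming per-expert losses in [0, 1] and measuring the algorithm's loss as the weight-averaged loss of the experts in each round; the function name and the toy loss table are hypothetical:

```python
import math


def hedge(loss_rounds, beta):
    """Run Hedge over rounds of per-expert losses in [0, 1].

    Returns (cumulative weight-averaged loss, final expert weights).
    """
    n_experts = len(loss_rounds[0])
    weights = [1.0] * n_experts
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]
        total_loss += sum(p * l for p, l in zip(probs, losses))
        # Shrink each expert's weight by beta ** loss.
        weights = [w * beta ** l for w, l in zip(weights, losses)]
    return total_loss, weights


# Toy, made-up losses for 3 experts over 4 rounds.
loss_rounds = [
    [0.0, 1.0, 0.5],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
beta = 0.5
alg_loss, final_weights = hedge(loss_rounds, beta)
best_loss = min(sum(r[i] for r in loss_rounds) for i in range(3))
bound = (math.log(1 / beta) * best_loss + math.log(3)) / (1 - beta)
```

Here `alg_loss` stays below `bound`, and the expert with the smallest cumulative loss ends up with the largest weight.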

This paper also describes boosting and relates it to the Hedge algorithm, though in my opinion the description given of the boosting problem and AdaBoost isn't nearly as good as the explanation of the online allocation problem and Hedge.  In boosting, we have a weak learner, which can learn a function $X \rightarrow Y$ given some examples, but possibly with a high error rate (the precise definition of a weak learner is, of course, in the paper).  A "master algorithm" has some set of labelled examples $X_i \rightarrow Y_i$ (X may be multi-dimensional).  It calls the weak learner many times, giving it a different distribution over these examples each time and looking at the hypothesis the weak learner returns.  It then combines these hypotheses into a "master" hypothesis that is guaranteed to be "good". The paper continues with several extensions of boosting to other domains like regression.
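The master loop described above can be sketched in a few lines. This sketch uses the common formulation of AdaBoost with labels in {-1, +1} and a threshold "stump" as the weak learner, which is an assumption made here for brevity and may differ in notation from the paper; all function names and the toy data are hypothetical:

```python
import math


def train_stump(xs, ys, dist):
    """Weak learner: best threshold and sign on 1-D inputs under distribution dist."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x >= thr else -sign for x in xs]
            err = sum(d for d, p, y in zip(dist, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost(xs, ys, rounds):
    """Master algorithm: reweight the examples, call the weak learner, repeat."""
    n = len(xs)
    dist = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = train_stump(xs, ys, dist)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Misclassified examples gain weight, correct ones lose weight.
        dist = [d * math.exp(-alpha * y * (sign if x >= thr else -sign))
                for d, x, y in zip(dist, xs, ys)]
        z = sum(dist)
        dist = [d / z for d in dist]
    return ensemble


def predict(ensemble, x):
    """Master hypothesis: weighted vote of the weak hypotheses."""
    vote = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if vote >= 0 else -1


# Toy data that no single stump classifies perfectly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

On this toy set each stump alone misclassifies at least three points, yet the weighted vote of three stumps fits all ten.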

Learning Online Alignments with Continuous Rewards Policy Gradient
Yuping Luo and Chung-Cheng Chiu and Navdeep Jaitly and Ilya Sutskever
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.LG, cs.CL

[link] Summary by Denny Britz 8 years ago

TLDR; The authors use policy gradients on an RNN to train a "hard" attention mechanism that decides whether to output something at the current timestep or not. Their algorithm is online, which means it does not need to see the complete sequence before making a prediction, as is the case with soft attention. The authors evaluate their model on small- and medium-scale speech recognition tasks, where they achieve performance comparable to standard sequential models.

#### Notes:

- Entropy regularization and baselines were critical to make the model learn
- Neat trick: Increase dropout as training progresses
- Grid LSTMs outperformed standard LSTMs
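
To make the roles of the baseline and the entropy term concrete, here is a generic sketch of a policy-gradient update direction for a single binary emit/skip decision. It is a textbook REINFORCE form under assumptions made here (a Bernoulli policy parameterised by one logit), not the authors' exact objective; the function names and arguments are hypothetical:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def emit_logit_gradient(logit, action, reward, baseline, entropy_weight):
    """Ascent direction for the logit of a Bernoulli emit (1) / skip (0) decision.

    Combines the REINFORCE term (reward - baseline) * d log pi(action) / d logit
    with entropy_weight * d entropy / d logit.
    """
    p = sigmoid(logit)
    dlogp = action - p                  # derivative of log Bernoulli(action; p)
    dentropy = -logit * p * (1.0 - p)   # derivative of the binary entropy
    return (reward - baseline) * dlogp + entropy_weight * dentropy
```

Subtracting the baseline leaves the expected gradient unchanged but reduces its variance, and the entropy term pulls the logit back toward zero so the policy does not become deterministic too early.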

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Ashwin K Vijayakumar and Michael Cogswell and Ramprasath R. Selvaraju and Qing Sun and Stefan Lee and David Crandall and Dhruv Batra
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.AI, cs.CL, cs.CV

[link] Summary by Denny Britz 8 years ago

TLDR; The authors propose a new Diverse Beam Search (DBS) decoding procedure that produces more diverse responses than standard Beam Search (BS). The authors divide the beam of size B into G groups of size B/G. At each step they perform beam search for each group with an added similarity penalty (with scaling factor lambda) that encourages the groups to pick different outputs. This procedure is done greedily, i.e. group 1 does regular BS, group 2 is conditioned on group 1, group 3 is conditioned on groups 1 and 2, and so on. Similarity functions include Hamming distance, Cumulative Diversity, n-gram diversity and neural embedding diversity. Hamming distance tends to perform best. The authors evaluate their model on Image Captioning (COCO, PASCAL-50S), Machine Translation (europarl) and Visual Question Generation. For Image Captioning the authors perform a human evaluation (1000 examples on Mechanical Turk) and find that DBS is preferred over BS 60% of the time.
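The group-wise procedure can be sketched as follows. This is a simplified illustration under assumptions made here: a Hamming-style penalty that counts how often earlier groups picked the same token at the current step, a toy stand-in model with fixed next-token probabilities, and beams that carry forward the unpenalized log-probability. All names are hypothetical:

```python
import math


def toy_log_probs(prefix):
    """Hypothetical stand-in for a sequence model: fixed next-token log-probs."""
    probs = {"a": 0.6, "b": 0.3, "c": 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}


def diverse_beam_search(log_probs, steps, beam_size, num_groups, lam):
    per_group = beam_size // num_groups
    # Each group keeps its own beams as (tokens, log-probability) pairs.
    groups = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(steps):
        used = {}  # token -> times earlier groups chose it at this step
        for g in range(num_groups):
            candidates = []
            for tokens, score in groups[g]:
                for tok, lp in log_probs(tokens).items():
                    penalty = lam * used.get(tok, 0)
                    candidates.append((score + lp - penalty, score + lp, tokens + (tok,)))
            # Rank by the penalized score, but store the raw log-probability.
            candidates.sort(key=lambda c: c[0], reverse=True)
            groups[g] = [(toks, raw) for _, raw, toks in candidates[:per_group]]
            for toks, _ in groups[g]:
                used[toks[-1]] = used.get(toks[-1], 0) + 1
    return [["".join(toks) for toks, _ in grp] for grp in groups]


diverse = diverse_beam_search(toy_log_probs, steps=2, beam_size=2, num_groups=2, lam=2.0)
plain = diverse_beam_search(toy_log_probs, steps=2, beam_size=2, num_groups=2, lam=0.0)
```

With `lam=0.0` both groups return the same most likely sequence, while a large enough `lam` pushes the second group onto a different one.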

What's Hidden in a Randomly Weighted Neural Network?
Ramanujan, Vivek and Wortsman, Mitchell and Kembhavi, Aniruddha and Farhadi, Ali and Rastegari, Mohammad
- 2019 via Local Bibsonomy
Keywords: deep-learning, readings, generalization, theory

[link] Summary by devin132 5 years ago

The paper "Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask" by Zhou et al., 2019 found that by just learning binary masks one can find random subnetworks that do much better than chance on a task. This new paper builds on that method by proposing a stronger algorithm than that of Zhou et al. for finding these high-performing subnetworks.

https://i.imgur.com/vxDqCKP.png

The intuition follows: "If a neural network with random weights (center) is sufficiently overparameterized, it will contain a subnetwork (right) that performs as well as a trained neural network (left) with the same number of parameters."

While Zhou et al. learned a probability for each weight, this paper learns a score for each weight and takes the top k percent at evaluation. The scores are learned with the paper's primary contribution, the edge-popup algorithm:

https://i.imgur.com/9KcIbxd.png

"In the edge-popup Algorithm, we associate a score with each edge. On the forward pass we choose the top edges by score. On the backward pass we update the scores of all the edges with the straight-through estimator, allowing helpful edges that are “dead” to re-enter the subnetwork. *We never update the value of any weight in the network, only the score associated with each weight.*"

They're able to find higher-performing random subnetworks than Zhou et al.
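
The quoted edge-popup description can be illustrated on a single linear unit. This is a minimal sketch under assumptions made here: a plain-Python unit with a squared-error loss, and a straight-through score gradient taken as the output gradient times the weight times the input. The function names, learning rate and numbers are hypothetical:

```python
def top_k_mask(scores, keep_fraction):
    """1.0 for the highest-scoring edges, 0.0 elsewhere."""
    k = max(1, int(keep_fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]


def forward(x, weights, scores, keep_fraction):
    """Single linear unit that only uses the top-scoring edges."""
    mask = top_k_mask(scores, keep_fraction)
    return sum(m * w * xi for m, w, xi in zip(mask, weights, x))


def update_scores(x, weights, scores, grad_output, lr):
    """Straight-through step: every edge's score is updated, masked or not.

    The weights themselves are never changed.
    """
    return [s - lr * grad_output * w * xi for s, w, xi in zip(scores, weights, x)]


weights = [0.5, -1.0, 2.0, 0.1]   # fixed random weights (made-up values)
scores = [0.9, 0.8, 0.1, 0.2]
x = [1.0, 1.0, 1.0, 1.0]
target = 2.0

out_before = forward(x, weights, scores, keep_fraction=0.5)
grad_output = out_before - target  # gradient of 0.5 * (out - target) ** 2
new_scores = update_scores(x, weights, scores, grad_output, lr=0.2)
out_after = forward(x, weights, new_scores, keep_fraction=0.5)
```

After one step the previously "dead" third edge overtakes the second one and enters the subnetwork, moving the output from -0.5 to 2.5 without touching any weight.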

https://i.imgur.com/T3D7OsZ.png