ShortScience.org - Making Science Accessible!

Welcome to ShortScience.org!

doi.ieeecomputersociety.org
sci-hub
scholar.google.com

Trojan Horses in Amazon's Castle: Understanding the Incentivized Online Reviews
Jamshidi, Soheil and Rejaie, Reza and Li, Jun
IEEE Computer Society ASONAM - 2018 via Local Bibsonomy
Keywords: dblp

[link] Summary by SOJA 6 years ago

During the past few years, sellers have increasingly offered discounted or free products to selected reviewers of ecommerce platforms in exchange for their reviews. Such incentivized (and often very positive) reviews can improve the rating of a product which in turn sways other users’ opinions about the product. 

Here, we examine the problem of detecting and characterizing incentivized reviews in two primary categories of Amazon products.  We show that the key features of EIRs and normal reviews exhibit different characteristics. 

https://i.imgur.com/1BdwsK4.png

Furthermore, we illustrate how the prevalence of EIRs has evolved and been affected by Amazon’s ban. 

https://i.imgur.com/1XsSaxX.png

Our examination of the temporal pattern of submitted reviews for sample products reveals promotional campaigns by the corresponding sellers and their effectiveness in attracting other users. 
https://i.imgur.com/TvQiDlc.png

Finally, we demonstrate that a classifier that is trained by EIRs (without explicit keywords) and normal reviews can accurately detect other EIRs as well as implicitly incentivized reviews. Overall, this analysis sheds an insightful light on the impact of EIRs on Amazon products and users.

www.elen.ucl.ac.be
sci-hub
scholar.google.com

Using very deep autoencoders for content-based image retrieval
Krizhevsky, Alex and Hinton, Geoffrey E.
ESANN - 2011 via Local Bibsonomy
Keywords: dblp

[link] Summary by nishnik 9 years ago

Here the author presents a way to retrieve similar images in O(distance) time (where distance can be termed as 100 - correlation in percentage between two images), but it uses O(2^n) memory (n is the number of bits in which we are encoding the image). Therefore this approach is independent of the size of database (Though the value of 'n' is very important, it is kind of precision measure for the correlation, if we choose very small 'n' the difference between similar images to given image can't be scored efficiently, while if we use very large 'n' the semantic similarity won't be captured efficiently enough).
(But this is only hashing, what is peculiar about this paper? It uses semantic hashing.)

For all the images, they are fed to autoencoder as input, a code is generated as output which is used for hashing. And then for the query image, again a code is generated and k-nearest images are retrieved from the output space.

But what is autoencoder? It is an artificial neural network with layers which are not intra-connected but inter-connected. They are comprised of two parts: encoder and decoder.

In the paper n is 28 that means the autoencoder will have 28 units in the middle layer.

*Training:*
1. The encoder training is done greedily one by one (layer by layer) using the way Restricted Boltzmann Machines are trained.
2. Then the decoder is just the inverse of the encoder layer (*Unsymmetrical auto-encoders generally give poor results.)
3. And then they are fine tuned using back-propagation, using the input image itself for calculating the loss.
(*Due to huge number of weights, the back-propagation training converges very slowly).

In short the encoder transforms the high dimensional input to low dimensional output.
And then the decoder represents it back to high dimensional space.

In the paper this low dimensional output has been used as the code representation for the input image. (Note that these are rounded so that they are binary.) And these capture the semantic content of the image. The author has also written that this approach wont be useful if applied to words with respect to documents as words w.r.t. documents are much more semantically meaningful than pixels w.r.t. images.

Then they have used one more thing to tackle translation related variance. They have taken 81 patches of each of the image (regularly spaced patches), and have applied the above mentioned algorithm to compute the hashes. (*It is a kind of convolution except for the fact that we are not summing anything, just iterating over the bags to find their representation. It won't be much effective for tackling rotational variance.)

In the array of size 2^28, for the images whose code comes to be `a1,a2..a28` the value of `a1,a2..a28` is computed in decimal representation as 'i' and the image is stored at index 'i'.

Now for a query image, it is broken into 81 overlapping patches which are fed to the network and its code is computed. Then all the images at indexes whose difference is less than 3 bits are returned and given a score based on difference in bits. And then scores are summed for each of the image and then images are returned as per descending order of score.

The author has used two layer searching, where in first layer for the given input image, the output images are returned using the 28-bit codes, then the 256 bit code of input image and the previously returned output images are compared and based on the 256-bit codes a refined order is returned.

Though I recommend people to study A Practical Guide to Training Restricted Boltzmann Machines for a better understanding of the learning used in the training part above.

doi.org
sci-hub
scholar.google.com

Deep Adversarial Networks for Biomedical Image Segmentation Utilizing Unannotated Images
Zhang, Yizhe and Yang, Lin and Chen, Jianxu and Fredericksen, Maridel and Hughes, David P. and Chen, Danny Z.
Medical Image Computing and Computer Assisted Interventions Conference - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Joseph Paul Cohen 8 years ago

This work improves the performance of a segmentation network by utilizing unlabelled data. They use a discriminator (they call EN) to distinguish between annotated and unannotated examples. They then train the segmentation generator (they call SN) based on what will fool the discriminator. 

https://i.imgur.com/7CfKnh5.png

Three training phases are shown above

This work is really great. They are using the segmentation to condition the discriminator which will learn to point out flaws when applying the segmentation to the unlabelled examples. Then these flaws in the segmentation are corrected by using the gradients from the discriminator to adjust the segmentation.

In contrast with other semi-supervised approaches which learn a latent space for all samples, labelled and unlabelled, and then uses this space to learn a classifier or segmentation; this approach looks for the boundaries of the space only. The unlabelled examples are used to bias the representation learned by the segmentation network to conform to the distribution represented by all observed examples.

Read this paper for more: https://arxiv.org/abs/1611.08408

Poster:
https://i.imgur.com/eR5jgwn.png

aclweb.org
scholar.google.com

Bidirectional RNN for Medical Event Detection in Electronic Health Records
Jagannatha, Abhyuday N. and Yu, Hong
The Association for Computational Linguistics HLT-NAACL - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Marek Rei 8 years ago

The authors have a dataset of 780 electronic health records and they use it to detect various medical events such as adverse drug events, drug dosage, etc. The task is done by assigning a label to each word in the document.

https://i.imgur.com/bZ7yM0z.png

Annotation statistics for the corpus of health records.

They look at CRFs, LSTMs and GRUs. Both LSTMs and GRUs outperform the CRF, but the best performance is achieved by a GRU trained on whole documents.

papers.nips.cc
scholar.google.com

Sanity Checks for Saliency Maps
Adebayo, Julius and Gilmer, Justin and Muelly, Michael and Goodfellow, Ian J. and Hardt, Moritz and Kim, Been
Neural Information Processing Systems Conference - 2018 via Local Bibsonomy
Keywords: dblp

[link] Summary by Hadrien Bertrand 6 years ago

The paper designs some basic tests to compare saliency methods. It founds that some of the most popular methods are independent of model parameters and the data, meaning they are effectively useless.

## Methods compared

The paper compare the following methods: gradient explanation, gradient x input, integrated gradients, guided backprop, guided GradCam and SmoothGrad. They provide a refresher on those methods in the appendix.

All those methods can be put in the same framework. They require a classification model and an input (typically an image). The output of the method is an *explanation map* of the shape of the input where a higher value for a feature implies greater relevance in the decision of the model.

## Metrics of comparison

The authors argue that visual inspection of the saliency maps can be misleading. They propose to compute the Spearman rank correlation, the structural similarity index (SSMI) and the Pearson correlation of the histogram of gradients. The authors point out that those metrics capture various notions of similarity, but it is an active area of research and those metrics are imperfect.

## First test: model parameters randomization

A saliency method must be dependent of model parameters, otherwise it cannot help us understand a model. In this test, the authors randomize the model parameters, layer per layer, starting from the top.

Surprisingly, methods such as guided backprop and guided gradcam are completely insensitive to model parameters, as illustrated on this Inception v3 trained on ImageNet:

![image](https://user-images.githubusercontent.com/8659132/61403152-b10b8000-a8a2-11e9-9f6a-cf1ed6a876cc.png)

Integrated gradients looks also dubious as the bird is still visible with a mostly fully randomized model, but the quantitative metrics reveal the difference is actually big between the two models.

## Second test: data randomization

It is well-known that randomly shuffling the labels of a dataset does not prevent a neural network from getting a high accuracy on the training set, though it does prevent generalization. The model is able to learn by either memorizing the data or finding spurious patterns. As a result, saliency maps obtained from such a network should have no clearly interpretable signal.

Here is the result for a ConvNet trained on MNIST and a shuffled MNIST:

![image](https://user-images.githubusercontent.com/8659132/61406757-7efe1c00-a8aa-11e9-9826-a859a373cb4f.png)

The results are very damning for most methods. Only gradients and GradCam are very different between both models, as confirmed by the low correlation.

## Discussion

- Even though some methods do no depend on model parameters and data, they might still depend on the architecture of the models, which could be of some use in some contexts.
- Methods that multiply the input with the gradient are dominated by the input.
- Complex saliency methods are just fancy edge detectors.
- Only gradient, smooth gradient and GradCam survives the sanity checks.

# Comments

- Why is their GradCam maps so ugly? They don't look like usual GradCam maps at all.
- Their tests are simple enough that it's hard to defend a method that doesn't pass them.
- The methods that are left are not very good either. They give fuzzy maps that are difficult to interpret.
- In the case of integrated gradients (IG), I'm not convinced this is sufficient to discard the method. IG requires a "baseline input" that represents the absence of features. In the case of images, people usually just set the image to 0, which is not at all the absence of a feature. The authors also use the "set the image to 0" strategy, and I'd say their tests are damning for this strategy, not for IG in general. I'd expect an estimation of the baseline such as done in [this paper](https://arxiv.org/abs/1702.04595) would be a fairer evaluation of IG.

Code: [GitHub](https://github.com/adebayoj/sanity_checks_saliency) (not available as of 17/07/19)