ShortScience.org - Making Science Accessible!

Welcome to ShortScience.org!

scholar.google.com

Collaborative Filtering for Implicit Feedback Datasets
Hu, Yifan and Koren, Yehuda and Volinsky, Chris
International Conference on Data Mining - 2008 via Local Bibsonomy
Keywords: collaborativfiltering, alternaterootsquare

[link] Summary by Martin Thoma 6 years ago

This paper is about a recommendation system approach using collaborative filtering (CF) on implicit feedback datasets.

The core of it is the minimization problem

$$\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \underbrace{\lambda \left ( \sum_u || x_u ||^2 + \sum_i || y_i ||^2\right )}_{\text{Regularization}}$$

with

* $\lambda \in [0, \infty[$ is a hyper parameter which defines how strong the model is regularized
* $u$ denoting a user, $u_*$ are all user factors $x_u$ combined
* $i$ denoting an item, $y_*$ are all item factors $y_i$ combined
* $x_u \in \mathbb{R}^n$ is the latent user factor (embedding); $n$ is another hyper parameter. $n=50$ seems to be a reasonable choice.
* $y_i \in \mathbb{R}^n$ is the latent item factor (embedding)
* $r_{ui}$ defines the "intensity"; higher values mean user $u$ interacted more with item $i$
* $p_{ui} = \begin{cases}1 & \text{if } r_{ui} >0\\0 &\text{otherwise}\end{cases}$
* $c_{ui} := 1 + \alpha r_{ui}$ where $\alpha \in [0, \infty[$ is a hyper parameter; $\alpha =40$ seems to be reasonable

In contrast, the standard matrix factoriation optimization function looks like this ([example](https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture25-mf.pdf)):

$$\min_{x_*, y_*} \sum_{(u, i, r_{ui}) \in \mathcal{R}} {(r_{ui} - x_u^T y_i)}^2  + \underbrace{\lambda \left ( \sum_u || x_u ||^2 + \sum_i || y_i ||^2\right )}_{\text{Regularization}}$$

where

* $\mathcal{R}$ is the set of all ratings $(u, i, r_{ui})$ - user $u$ has rated item $i$ with value $r_{ui} \in \mathbb{R}$

They use alternating least squares (ALS) to train this model.

The prediction then is the dot product between the user factor and all item factors ([source](https://github.com/benfred/implicit/blob/master/implicit/recommender_base.pyx#L157-L176))

dx.doi.org
sci-hub
scholar.google.com

Generative adversarial networks uncover epidermal regulators and predict single cell perturbations
Arsham Ghahramani and Fiona M Watt and Nicholas M Luscombe
bioRxiv: The preprint server for biology - 2018 via Local CrossRef
Keywords:

[link] Summary by David Stutz 7 years ago

Lee et al. propose a variant of adversarial training where a generator is trained simultaneously to generated adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks), see the paper for details.

[1] Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie. Generative Adversarial Perturbations. ArXiv, abs/1712.02328, 2017.

doi.ieeecomputersociety.org
sci-hub
scholar.google.com

Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
Li, Chuan and Wand, Michael
Conference and Computer Vision and Pattern Recognition - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Alexander Jung 8 years ago

* They describe a method that applies the style of a source image to a target image.
* Example: Let a normal photo look like a van Gogh painting.
* Example: Let a normal car look more like a specific luxury car.
* Their method builds upon the well known artistic style paper and uses a new MRF prior.
* The prior leads to locally more plausible patterns (e.g. less artifacts).

### How
* They reuse the content loss from the artistic style paper.
* The content loss was calculated by feed the source and target image through a network (here: VGG19) and then estimating the squared error of the euclidean distance between one or more hidden layer activations.
* They use layer `relu4_2` for the distance measurement.
* They replace the original style loss with a MRF based style loss.
* Step 1: Extract from the source image `k x k` sized overlapping patches.
* Step 2: Perform step (1) analogously for the target image.
* Step 3: Feed the source image patches through a pretrained network (here: VGG19) and select the representations `r_s` from specific hidden layers (here: `relu3_1`, `relu4_1`).
* Step 4: Perform step (3) analogously for the target image. (Result: `r_t`)
* Step 5: For each patch of `r_s` find the best matching patch in `r_t` (based on normalized cross correlation).
* Step 6: Calculate the sum of squared errors (based on euclidean distances) of each patch in `r_s` and its best match (according to step 5).
* They add a regularizer loss.
* The loss encourages smooth transitions in the synthesized image (i.e. few edges, corners).
* It is based on the raw pixel values of the last synthesized image.
* For each pixel in the synthesized image, they calculate the squared x-gradient and the squared y-gradient and then add both.
* They use the sum of all those values as their loss (i.e. `regularizer loss = <sum over all pixels> x-gradient^2 + y-gradient^2`).
* Their whole optimization problem is then roughly `image = argmin_image MRF-style-loss + alpha1 * content-loss + alpha2 * regularizer-loss`.
* In practice, they start their synthesis with a low resolution image and then progressively increase the resolution (each time performing some iterations of optimization).
* In practice, they sample patches from the style image under several different rotations and scalings.

### Results
* In comparison to the original artistic style paper:
* Less artifacts.
* Their method tends to preserve style better, but content worse.
* Can handle photorealistic style transfer better, so long as the images are similar enough. If no good matches between patches can be found, their method performs worse.

![Non-photorealistic example images](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Combining_MRFs_and_CNNs_for_Image_Synthesis__examples.png?raw=true "Non-photorealistic example images")

*Non-photorealistic example images. Their method vs. the one from the original artistic style paper.*

![Photorealistic example images](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Combining_MRFs_and_CNNs_for_Image_Synthesis__examples_real.png?raw=true "Photorealistic example images")

*Photorealistic example images. Their method vs. the one from the original artistic style paper.*

dx.doi.org
sci-hub
scholar.google.com

Prediction gradients for feature extraction and analysis from convolutional neural networks
Lo, Henry Z. and Cohen, Joseph Paul and Ding, Wei
Conference on Automatic Face and Gesture Recognition - 2015 via Local Bibsonomy
Keywords: dblp

3	[link] Summary by Joseph Paul Cohen 10 years ago The prediction gradient is just $\frac{\partial \mathbf{y}}{\partial w}$ where $\mathbf{y}$ is the output before the loss function. more less

dx.doi.org
sci-hub
scholar.google.com

Near-optimal probabilistic RNA-seq quantification
Nicolas L Bray and Harold Pimentel and Páll Melsted and Lior Pachter
Nature Biotechnology - 2016 via Local CrossRef
Keywords:

[link] Summary by Geneviève Boucher 8 years ago

This paper from 2016 introduced a new k-mer based method to estimate isoform abundance from RNA-Seq data called kallisto.  The method provided a significant improvement in speed and memory usage compared to the previously used methods while yielding similar accuracy.   In fact, kallisto is able to quantify expression in a matter of minutes instead of hours.

The standard (previous) methods for quantifying expression rely on mapping, i.e. on the alignment of a transcriptome sequenced reads to a genome of reference.  Reads are assigned to a position in the genome and the gene or isoform expression values are derived by counting the number of reads overlapping the features of interest. 

The idea behind kallisto is to rely on a pseudoalignment which does not attempt to identify the positions of the reads in the transcripts, only the potential transcripts of origin. Thus,  it avoids doing an alignment of each read to a reference genome. In fact, kallisto only uses the transcriptome sequences (not the whole genome) in its first step which is the generation of  the kallisto index.  Kallisto builds a colored de Bruijn graph (T-DBG) from all the k-mers found in the transcriptome.  Each node of the graph corresponds to a k-mer (a short sequence of k nucleotides) and retains the information about the transcripts in which they can be found in the form of a color.  Linear stretches having the same coloring in the graph correspond to transcripts. Once the T-DBG is built, kallisto stores a hash table mapping each k-mer to its transcript(s) of origin along with the position within the transcript(s).  This step is done only once and is dependent on a provided annotation file (containing the sequences of all the transcripts in the transcriptome).  
  
Then for a given sequenced sample, kallisto decomposes each read into its k-mers and uses those k-mers to find a path covering in the T-DBG.  This path covering of the transcriptome graph, where a path corresponds to a transcript, generates k-compatibility classes for each k-mer, i.e. sets of potential transcripts of origin on the nodes.   The potential transcripts of origin for a read can be obtained using the intersection of its k-mers k-compatibility classes. To make the pseudoalignment faster, kallisto removes redundant k-mers since neighboring k-mers often belong to the same transcripts. Figure1, from the paper, summarizes these different steps.

https://i.imgur.com/eNH2kuO.png

**Figure1**. Overview of kallisto. The input consists of a reference transcriptome and reads from an RNA-seq experiment. (a) An example of a read (in black) and three overlapping transcripts with exonic regions as shown. (b) An index is constructed by creating the transcriptome de Bruijn Graph (T-DBG) where nodes (v1, v2, v3, ... ) are k-mers, each transcript corresponds to a colored path as shown and the path cover of the transcriptome induces a k-compatibility class for each k-mer. (c) Conceptually, the k-mers of a read are hashed (black nodes) to find the k-compatibility class of a read. (d) Skipping (black dashed lines) uses the information stored in the T-DBG to skip k-mers that are redundant because they have the same k-compatibility class. (e) The k-compatibility class of the read is determined by taking the intersection of the k-compatibility classes of its constituent k-mers.[From Bray et al. Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology, 2016.]

Then, kallisto optimizes the following RNA-Seq likelihood function using the expectation-maximization (EM) algorithm.  

$$L(\alpha) \propto \prod_{f \in F} \sum_{t \in T} y_{f,t} \frac{\alpha_t}{l_t} = \prod_{e \in E}\left(  \sum_{t \in e} \frac{\alpha_t}{l_t} \right )^{c_e}$$

In this function,  $F$ is the set of fragments (or reads), $T$ is the set of transcripts, $l_t$ is the (effective) length of transcript $t$ and **y**$_{f,t}$ is a compatibility matrix defined as 1 if  fragment $f$ is compatible with $t$ and 0 otherwise.  The parameters $α_t$ are the probabilities of selecting reads from a transcript $t$.  These $α_t$ are the parameters of interest since they represent the isoforms abundances or relative expressions.

To make things faster, the compatibility matrix is collapsed (factorized) into equivalence classes. An equivalent class consists of all the reads compatible with the same subsets of transcripts. The EM algorithm is applied to equivalence classes (not to reads).  Each $α_t$ will be optimized to maximise the likelihood of transcript abundances given observations of the equivalence classes. The speed of the method makes it possible to evaluate the uncertainty of the  abundance estimates for each RNA-Seq sample using a bootstrap technique.  For a given sample containing $N$ reads, a bootstrap sample is generated from the sampling of $N$ counts from a multinomial distribution over the equivalence classes derived from the original sample.  The EM algorithm is applied on those sampled equivalence class counts to estimate transcript abundances. The bootstrap information is then used in downstream analyses such as determining which genes are differentially expressed.

Practically, we can illustrate the different steps involved in kallisto using a small example.  Starting from a tiny genome with 3 transcripts, assume that the RNA-Seq experiment produced 4 reads as depicted in the image below.

https://i.imgur.com/5JDpQO8.png

The first step is to build the T-DBG graph and the kallisto index.  All transcript sequences are decomposed into k-mers (here k=5) to construct the colored de Bruijn graph. Not all nodes are represented in the following drawing.  The idea is that each different transcript will lead to a different path in the graph.  The strand is not taken into account, kallisto is strand-agnostic.

https://i.imgur.com/4oW72z0.png

Once the index is built, the four reads of the sequenced sample can be analysed.  They are decomposed into k-mers (k=5 here too) and the pre-built index is used to determine the k-compatibility class of each k-mer. Then, the k-compatibility class of each read is computed. For example, for read 1, the intersection of all the k-compatibility classes of its k-mers suggests that it might come from transcript 1 or transcript 2.

https://i.imgur.com/woektCH.png

This is done for the four reads enabling the construction of the compatibility matrix  **y**$_{f,t}$ which is part of the RNA-Seq likelihood function.  In this equation, the $α_t$ are the parameters that we want to estimate.

https://i.imgur.com/Hp5QJvH.png

The EM algorithm being too slow to be applied on millions of reads, the compatibility matrix **y**$_{f,t}$ is factorized into equivalence classes and a count is computed for each class (how many reads are represented by this equivalence class). The EM algorithm uses this collapsed information to maximize the new equivalent RNA-Seq likelihood function and optimize the $α_t$.

https://i.imgur.com/qzsEq8A.png

The EM algorithm stops when for every transcript $t$, $α_tN$ > 0.01 changes less than 1%, where $N$ is the total number of reads.