First published: 2016/09/21 Abstract: Molecular profiling data (e.g., gene expression) has been used for clinical
risk prediction and biomarker discovery. However, it is necessary to integrate
other prior knowledge like biological pathways or gene interaction networks to
improve the predictive ability and biological interpretability of biomarkers.
Here, we first introduce a general regularized Logistic Regression (LR)
framework with regularized term $\lambda \|\bm{w}\|_1 +
\eta\bm{w}^T\bm{M}\bm{w}$, which can reduce to different penalties, including
Lasso, elastic net, and network-regularized terms with different $\bm{M}$. This
framework can be easily solved in a unified manner by a cyclic coordinate
descent algorithm which can avoid inverse matrix operation and accelerate the
computing speed. However, if those estimated $\bm{w}_i$ and $\bm{w}_j$ have
opposite signs, then the traditional network-regularized penalty may not
perform well. To address it, we introduce a novel network-regularized sparse LR
model with a new penalty $\lambda \|\bm{w}\|_1 + \eta|\bm{w}|^T\bm{M}|\bm{w}|$
to consider the difference between the absolute values of the coefficients. And
we develop two efficient algorithms to solve it. Finally, we test our methods
and compare them with the related ones using simulated and real data to show
their efficiency.
In this paper the authors place a prior on the weights of a logistic regression model using known protein-protein interactions. They do so by regularizing the weights with the Laplacian of the interaction graph.
The regularization term has the form
$$\lambda \|w\|_1 + \eta\, w^T L w,$$
where $L$ is the graph Laplacian.
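As a rough sketch of what this objective looks like in code (my own numpy version, assuming binary labels in {0, 1} and a precomputed Laplacian `L`; the names are not from the paper):

```python
import numpy as np

def penalized_logistic_loss(w, X, y, L, lam, eta):
    """Sketch of the penalized objective: logistic negative
    log-likelihood + lam * ||w||_1 + eta * w^T L w."""
    z = X @ w
    nll = np.sum(np.logaddexp(0.0, z) - y * z)  # logistic NLL for y in {0, 1}
    l1 = lam * np.sum(np.abs(w))                # sparsity (Lasso) part
    smooth = eta * w @ L @ w                    # graph-smoothness part
    return nll + l1 + smooth
```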
#### A small example:
Given a small graph of three nodes A, B, and C with one edge {A-B}, we have the following Laplacian:
$$
L = D - A =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
-
\begin{bmatrix}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
$$
L =
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$
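As a quick check, the same Laplacian can be built with numpy (a small sketch; `A_adj` is my name for the adjacency matrix, to avoid clashing with node A):

```python
import numpy as np

# Adjacency for the three-node graph with the single edge {A-B},
# nodes ordered A, B, C.
A_adj = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 0]])
D = np.diag(A_adj.sum(axis=1))  # degree matrix: diag(1, 1, 0)
L = D - A_adj
print(L)
# [[ 1 -1  0]
#  [-1  1  0]
#  [ 0  0  0]]
```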
If we have a simple linear model of the form:
$$y = x_Aw_A + x_Bw_B + x_Cw_C$$
Then we can expand $w^T L w$ to see how it affects the weights:
$$
w^T L w
=
\begin{bmatrix}
w_A & w_B & w_C
\end{bmatrix}
\begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
w_A \\
w_B \\
w_C
\end{bmatrix}
$$
$$
=
\begin{bmatrix}
w_A & w_B & w_C
\end{bmatrix}
\begin{bmatrix}
w_A - w_B \\
-w_A + w_B \\
0
\end{bmatrix}
$$
$$
= (w_A^2 - w_A w_B) + (-w_A w_B + w_B^2)
$$
The squared terms act like a standard $L2$ penalty on the individual weights, so we can set them aside to see what the Laplacian contributes beyond that:
$$
=
(-w_Aw_B ) +
(-w_Aw_B)
$$
$$ = -2w_Aw_B$$
The Laplacian penalty rewards connected weights for agreeing: the cross term $-2w_Aw_B$ is smallest when $w_A$ and $w_B$ share the same sign, and together with the squared terms the full contribution is $(w_A - w_B)^2$, which pushes connected weights toward similar values. The squared terms and the $L1$ penalty keep the weights from growing without bound.
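A quick numeric check of this reading (my own example values): the quadratic form equals $(w_A - w_B)^2$, so it vanishes when the connected weights agree, grows with their difference, and ignores the disconnected weight $w_C$ entirely.

```python
import numpy as np

L = np.array([[ 1, -1, 0],
              [-1,  1, 0],
              [ 0,  0, 0]])

w_agree    = np.array([1.0,  1.0, 5.0])  # w_A == w_B, w_C disconnected
w_disagree = np.array([1.0, -1.0, 5.0])  # w_A and w_B differ

print(w_agree    @ L @ w_agree)     # 0.0 -> no penalty
print(w_disagree @ L @ w_disagree)  # 4.0 -> (w_A - w_B)^2 = 4
```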
#### A few more experiments:
If we perform the same computation for a graph with two edges {A-B, B-C}, we get cross terms that couple both connected pairs:
$$ = -2w_Aw_B -2w_Bw_C$$
If we perform the same computation for a graph with two edges {A-B, A-C}, there are no surprises:
$$ = -2w_Aw_B -2w_Aw_C$$
Another case worth considering is a graph with no edges. If isolated nodes are given self-loops by default, the degree matrix has 1 on every diagonal entry, the Laplacian reduces to the identity, and the graph term becomes a plain $L2$ penalty. If no self-loops are defined, the Laplacian is the zero matrix and the graph term provides no regularization at all.
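These cases are easy to verify numerically (a small sketch; the `laplacian` helper is mine, not from the paper):

```python
import numpy as np

def laplacian(n, edges):
    """Build L = D - A for an undirected graph given as an edge list."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

w = np.array([1.0, 2.0, 3.0])            # (w_A, w_B, w_C)

chain = laplacian(3, [(0, 1), (1, 2)])   # {A-B, B-C}
star  = laplacian(3, [(0, 1), (0, 2)])   # {A-B, A-C}
empty = laplacian(3, [])                 # no edges, no self-loops

print(w @ chain @ w)  # 2.0 = (w_A - w_B)^2 + (w_B - w_C)^2
print(w @ star  @ w)  # 5.0 = (w_A - w_B)^2 + (w_A - w_C)^2
print(w @ empty @ w)  # 0.0 -> no regularization at all
```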
#### Contribution:
The main contribution of this paper is to apply the Laplacian penalty to the absolute values of the weights, so that the regularizer compares the magnitudes of connected coefficients rather than penalizing them for having opposite signs:
$$|w|^T L |w|$$
The authors develop two efficient algorithms to solve the resulting model and compare it with related penalties (Lasso, elastic net, and the standard network-regularized term) on simulated and real data.
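A minimal numeric illustration of the difference (my own example, not from the paper): with the standard penalty, connected coefficients of equal magnitude but opposite sign are heavily penalized, while the absolute-value version treats them as perfectly smooth.

```python
import numpy as np

L = np.array([[ 1, -1, 0],
              [-1,  1, 0],
              [ 0,  0, 0]])

# Connected weights with equal magnitude but opposite signs.
w = np.array([2.0, -2.0, 0.0])

print(w @ L @ w)                  # 16.0 -> (w_A - w_B)^2, heavily penalized
print(np.abs(w) @ L @ np.abs(w))  #  0.0 -> magnitudes match, no penalty
```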
#### Overview
Here, a high-level figure from the paper shows the data and targets together with the graph prior. It looks nice, so I wanted to include it.
https://i.imgur.com/rnGtHqe.png