Henry Z Lo's profile - ShortScience.org

Bitcoin: A peer-to-peer electronic cash system
Nakamoto, Satoshi
- 2008 via Local Bibsonomy
Keywords: sre, critical_cooperation, peeri

[link] Summary by Henry Z Lo 8 years ago

This paper describes the Bitcoin peer-to-peer currency system.

The paper first describes a bitcoin as a chain of transactions, the latest of which contains the current owner's public key, and is cryptographically signed by the previous owner. The integrity of this chain (and therefore the ownership of the coin) is maintained as each transaction contains a hash of the previous transaction. Therefore, it is impossible to corrupt any one transaction in the chain without affecting all future transactions as well.
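The hash-linking idea can be sketched in a few lines of Python. This is a minimal illustration only: the field names (`prev_hash`, `owner_key`) and the owner names are hypothetical, and the signature step is omitted, so it shows only why tampering with an early transaction breaks the links that follow it.

```python
import hashlib
import json
from typing import Optional


def tx_hash(tx: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the transaction."""
    encoded = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def transfer(prev_tx: Optional[dict], new_owner_key: str) -> dict:
    """Create a transaction that points at the previous one by hash.

    A real transfer would also carry the previous owner's signature;
    that part is left out of this sketch.
    """
    return {
        "prev_hash": tx_hash(prev_tx) if prev_tx is not None else None,
        "owner_key": new_owner_key,
    }


def chain_is_valid(chain: list) -> bool:
    """Check that every transaction references the hash of its predecessor."""
    return all(
        later["prev_hash"] == tx_hash(earlier)
        for earlier, later in zip(chain, chain[1:])
    )


# Hypothetical owners: alice -> bob -> carol.
chain = [transfer(None, "alice_pubkey")]
chain.append(transfer(chain[-1], "bob_pubkey"))
chain.append(transfer(chain[-1], "carol_pubkey"))
```

Changing the first transaction changes its hash, so the second transaction's `prev_hash` no longer matches and `chain_is_valid` returns `False`.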

https://i.imgur.com/hKbgsdi.png

The key issue with this system is *double spending* in a peer-to-peer environment: the same coin being spent more than once. Physical currency does not have this problem, since handing over a physical token prevents its previous owner from spending it again. Other digital currencies solve double spending with a central arbiter.

The authors solve this using a timestamp service, which stamps a block of transactions. When two transactions spend the same coin, the one with the earliest timestamp is treated as the true transaction. Each block of transactions is linked to the previous block, thereby forming a *blockchain*.
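The "earliest transaction wins" rule can be sketched as follows. This is a simplification under stated assumptions: transactions are hypothetical `(timestamp, coin_id, recipient)` tuples with trustworthy timestamps, whereas in the actual system the ordering comes from which block a transaction lands in.

```python
def accept_transactions(transactions):
    """Keep the earliest spend of each coin and reject any later spend.

    `transactions` is an iterable of (timestamp, coin_id, recipient) tuples.
    Returns (spent, rejected): `spent` maps coin_id -> (timestamp, recipient).
    """
    spent = {}
    rejected = []
    # Sorting tuples orders them by timestamp first.
    for ts, coin, recipient in sorted(transactions):
        if coin in spent:
            rejected.append((ts, coin, recipient))  # a double spend
        else:
            spent[coin] = (ts, recipient)
    return spent, rejected


# Hypothetical example: coin1 is spent twice, at times 2 and 5.
txs = [(5, "coin1", "carol"), (2, "coin1", "bob"), (3, "coin2", "dave")]
spent, rejected = accept_transactions(txs)
```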

https://i.imgur.com/BYlVirg.png

To protect the integrity of the blockchain, each block contains the hash of the previous block. Each block also contains a nonce value, which is an input to a computationally difficult problem that must be solved in order to place a new block in the blockchain. This makes it difficult for an attacker to tack on a new chain.

The problem that must be solved is to increment the nonce until the hash of the block begins with a specified number of 0 bits. The difficulty of this problem can be adjusted by increasing the number of 0's, and grows over time to accommodate the growing power of computer hardware. Note that verifying a solution is easy: it takes a single hash.
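A minimal sketch of this nonce search, assuming a single SHA-256 over a made-up `prev_hash|payload|nonce` string rather than Bitcoin's real block header format. The names `PREV`, `PAYLOAD`, and `DIFFICULTY_BITS` are hypothetical.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def block_hash(prev_hash: str, payload: str, nonce: int) -> bytes:
    """Hash a toy block; the string layout is an assumption of this sketch."""
    data = f"{prev_hash}|{payload}|{nonce}".encode("utf-8")
    return hashlib.sha256(data).digest()


def mine(prev_hash: str, payload: str, difficulty_bits: int) -> int:
    """Increment the nonce until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(block_hash(prev_hash, payload, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(prev_hash: str, payload: str, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs one hash, however long mining took."""
    return leading_zero_bits(block_hash(prev_hash, payload, nonce)) >= difficulty_bits


PREV = "00" * 32                   # placeholder previous-block hash
PAYLOAD = "alice pays bob 1 coin"  # placeholder transaction data
DIFFICULTY_BITS = 12               # about 2**12 hashes expected

nonce = mine(PREV, PAYLOAD, DIFFICULTY_BITS)
```

Each extra required zero bit doubles the expected number of hashes for `mine`, while `verify` stays a single hash.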

The resulting blockchain therefore constitutes a large investment of computational resources, with the longest blockchain being used as the standard.

Not all participants in bitcoin need to mine, so an incentive is given. When a new block is mined, the individual who mines it is rewarded with bitcoin, either newly created ("out of thin air") or collected as transaction fees. This also deters would-be corruptors of the blockchain, as their computational resources may be better spent adding new blocks to the newest version of the blockchain.

Algorithms for Non-negative Matrix Factorization
Lee, Daniel D. and Seung, H. Sebastian
Neural Information Processing Systems Conference - 2000 via Local Bibsonomy
Keywords: dblp

[link] Summary by Henry Z Lo 10 years ago

The paper presents algorithms for nonnegative matrix factorization (NMF), a technique which is used in fields such as chemometrics.  The problem formulation is this:

$$
\underset{W,H}{\text{argmin}} ~ d(X, WH) \\\\
\text{s.t.} ~W_{ij}, H_{ij} \ge 0
$$

Where:
- $X \in \mathbb{R}^{n \times m}$ is a matrix of data, for example, $n$ samples of $m$ features.  Each element of $X$ is nonnegative, as are the elements of $W$ and $H$.
- $W \in \mathbb{R}^{n \times k}$ represents how strongly each of the $n$ samples belongs to each of the $k$ "clusters".
- $H \in \mathbb{R}^{k \times m}$ describes each of the $k$ clusters in terms of the $m$ variables.
- $d$ is some cost function, for example, sum of squared differences.

The non-negativity constraint means the clusters (represented by the rows of $H$) are described in terms of which features are present.  This may make interpretation easier in some instances, but makes the optimization problem more difficult.

The paper mentions two loss functions, sum of squared error:

$$
d(X,WH) = \sum_{ij} |X_{ij}-(WH)_{ij}|^2
$$

and a measure similar to unnormalized Kullback-Leibler divergence:

$$
d(X,WH) = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij}+(WH)_{ij} \right)
$$
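Both cost functions are short to write with NumPy. The sketch below is illustrative; the small `eps` that guards the logarithm and the division is an assumption of this example, not part of the formulas above.

```python
import numpy as np


def squared_error(X, W, H):
    """Sum of squared differences between X and the reconstruction WH."""
    return float(np.sum((X - W @ H) ** 2))


def generalized_kl(X, W, H, eps=1e-12):
    """Unnormalized KL-like divergence between X and WH.

    eps (an assumption of this sketch) keeps log and division finite
    when an entry of X or WH is exactly zero.
    """
    WH = W @ H
    return float(np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH))
```

Both functions return 0 when $X = WH$ exactly, and a positive value otherwise.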

For each of these objectives, multiplicative update rules are given.  For squared error:

$$
H_{ij} \leftarrow H_{ij} \frac{(W^TX)_{ij}}{(W^TWH)_{ij}} ~~~
W_{ij} \leftarrow W_{ij} \frac{(XH^T)_{ij}}{(WHH^T)_{ij}}
$$
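A minimal NumPy sketch of the squared-error updates, applied elementwise to whole matrices. The function name, the random initialization, the iteration count, and the small `eps` added to the denominators (to avoid division by zero) are all assumptions of this example.

```python
import numpy as np


def nmf_squared_error(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative X (n x m) into W (n x k) and H (k x m).

    Returns W, H and the squared error recorded after each full sweep.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive start, so no entry is stuck at zero from the outset.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    losses = []
    for _ in range(n_iter):
        # Fix W, update H:  H <- H * (W^T X) / (W^T W H)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Fix H, update W:  W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(float(np.sum((X - W @ H) ** 2)))
    return W, H, losses
```

Because every factor in the update is nonnegative, $W$ and $H$ stay nonnegative without any explicit projection step.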

And for divergence:

$$
H_{ij} \leftarrow H_{ij} \frac{\sum_a W_{ai} X_{aj} / (WH)_{aj}} { \sum_b W_{bi}} ~~~~
W_{ij} \leftarrow W_{ij} \frac{\sum_a H_{ja} X_{ia} / (WH)_{ia}} { \sum_b H_{jb}}
$$
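The divergence updates can be sketched the same way in NumPy. In matrix form, the denominator for $H_{ij}$ is the sum of column $i$ of $W$, and the denominator for $W_{ij}$ is the sum of row $j$ of $H$. The function name, initialization, and `eps` guard are assumptions of this example.

```python
import numpy as np


def nmf_divergence(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative X (n x m) into W (n x k) and H (k x m)
    using multiplicative updates for the KL-like divergence."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # H_ij <- H_ij * [sum_a W_ai X_aj / (WH)_aj] / [sum_b W_bi]
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        # W_ij <- W_ij * [sum_a H_ja X_ia / (WH)_ia] / [sum_b H_jb]
        ratio = X / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

The ratio $X / (WH)$ is recomputed between the two updates because $H$ has changed by then.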

These rules are applied alternately: fix $W$ and update $H$, then fix $H$ and update $W$.

These multiplicative updates are essentially a diagonally rescaled gradient descent.  The authors then prove that these update rules do not increase the objective.  Later authors have pointed out that not increasing the cost does not imply convergence to a minimum; e.g. the parameters could stop updating without having reached one.  However, a simple fix to the multiplicative update rules (ensuring no division by zero, by making 0 elements slightly positive) alleviates these problems.

Fast R-CNN
Girshick, Ross B.
International Conference on Computer Vision - 2015 via Local Bibsonomy
Keywords: dblp

[link] Summary by Henry Z Lo 10 years ago

This paper is awesome in that it is full of content.

They replace W, the weight matrix of a fully connected layer, with its truncated SVD (TSVD).  When t, the reduced rank, is small, it saves computation time because you multiply by two smaller matrices rather than by one bigger matrix.

In terms of units in hidden layers, they turn n->m into n->t->m.

This only works for the forward pass though.  If you were to train this, you would only learn a rank t matrix.  In which case, there would be no reason to have the t->m layer.  Unless you want more nonlinearities, but less rank; haven't seen that before.
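The n->t->m replacement and the rank argument can be checked numerically. The sketch below uses NumPy with made-up layer sizes and a random weight matrix; it is only an illustration of the factorization, not the paper's code. A random Gaussian `W` is not close to low rank, so `y_tsvd` is a coarse approximation of `y_full` here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, t = 64, 48, 8                  # hypothetical input size, output size, rank
W = rng.standard_normal((m, n))      # the layer computes y = W @ x
x = rng.standard_normal(n)

# Truncated SVD: keep the t largest singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
first = s[:t, None] * Vt[:t]         # n -> t layer, shape (t, n)
second = U[:, :t]                    # t -> m layer, shape (m, t)

y_full = W @ x                       # one (m x n) multiply
y_tsvd = second @ (first @ x)        # two smaller multiplies

params_full = m * n                  # parameters in the original layer
params_tsvd = t * (n + m)            # parameters in the two-layer version

# The product of the two layers can never exceed rank t.
rank = np.linalg.matrix_rank(second @ first)
```

With these sizes the two-layer version has 896 parameters against 3072 for the original, and the combined matrix `second @ first` has rank 8.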

RandomOut: Using a convolutional gradient norm to win The Filter Lottery
Cohen, Joseph Paul and Lo, Henry Z. and Ding, Wei
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Henry Z Lo 10 years ago

The proposed RandomOut algorithm randomly restarts the weights of a filter in a CNN when that filter has a low gradient norm.
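A rough sketch of the restart idea in NumPy. Everything specific here is an assumption of the example rather than the paper's definition: the per-filter L2 norm, the `threshold` value, the Gaussian re-initialization with `init_scale`, and the function name `randomout_step`.

```python
import numpy as np


def randomout_step(filters, grads, threshold, rng, init_scale=0.01):
    """Re-randomize every filter whose gradient norm is below `threshold`.

    filters, grads: arrays of shape (num_filters, ...) with matching shapes.
    Returns the updated filters and a boolean mask of restarted filters.
    """
    # One norm per filter, taken over all of that filter's weights.
    flat = grads.reshape(grads.shape[0], -1)
    norms = np.sqrt((flat ** 2).sum(axis=1))
    restart = norms < threshold

    new_filters = filters.copy()
    shape = (int(restart.sum()),) + filters.shape[1:]
    new_filters[restart] = init_scale * rng.standard_normal(shape)
    return new_filters, restart
```

Filters with a healthy gradient are left untouched; only the ones flagged in `restart` receive fresh random weights.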
