Summary by mashayekhi 6 years ago
https://i.imgur.com/QxHktQC.png
https://i.imgur.com/Wh6xAbP.png
The fundamental question the paper sets out to answer is whether deep learning can be realized with prediction models other than neural networks. The authors propose deep forest, a realization of deep learning built from random forests, called gcForest. The idea is simple and was inspired by representation learning in deep neural networks, which relies largely on layer-by-layer processing of raw features.
Importance: Deep neural networks (DNNs) have several drawbacks. They need a lot of data to train and have many hyper-parameters to tune. Moreover, not everyone has access to the GPUs needed to build and train them. Training a DNN is mostly an art rather than a scientific/engineering task. Finally, theoretical analysis of DNNs is extremely difficult. The aim of the paper is to propose a model that addresses these issues while achieving performance competitive with deep neural networks.
Model: The proposed model consists of two parts. The first part is a deep forest ensemble with a cascade structure, analogous to the layer-by-layer architecture of a DNN. Each level is an ensemble of forests; to encourage diversity, a combination of completely-random tree forests and typical random forests is employed (the number of trees in each forest is a hyper-parameter). The class distribution estimated by each forest via k-fold cross-validation forms a class vector, which is concatenated with the original feature vector to serve as input to the next level of the cascade. The second part is multi-grained scanning for representation learning: spatial and sequential relationships are captured by scanning the raw features with sliding windows of various sizes, similar to the convolutional and recurrent layers in DNNs. The windowed instances are then passed to a completely-random tree forest and a typical random forest to generate transformed features; when the transformed feature vectors are too long to be accommodated, feature sampling can be performed. Minimal sketches of both parts follow.
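To make the two mechanisms concrete, here is a minimal sketch in Python on top of scikit-learn. This is my own illustration, not the authors' implementation: completely-random tree forests are approximated with ExtraTreesClassifier, and the function names, window sizes, and fold counts are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_level(X, y, n_trees=100, n_folds=3):
    """One cascade level: concatenate out-of-fold class vectors with X."""
    class_vectors = []
    # ExtraTreesClassifier stands in for the paper's completely-random forests
    for Forest in (RandomForestClassifier, ExtraTreesClassifier):
        clf = Forest(n_estimators=n_trees)
        # k-fold CV yields out-of-fold class distributions, which the paper
        # uses to reduce the risk of overfitting the augmented features
        proba = cross_val_predict(clf, X, y, cv=n_folds,
                                  method="predict_proba")
        class_vectors.append(proba)
    return np.hstack([X] + class_vectors)

def multi_grained_scan(X, y, window_sizes=(10, 20), n_trees=100):
    """Slide windows over raw features and turn them into class vectors."""
    n, d = X.shape
    transformed = []
    for w in window_sizes:
        # every length-w window of every instance becomes a training case
        windows = np.stack([X[:, i:i + w] for i in range(d - w + 1)], axis=1)
        flat = windows.reshape(-1, w)
        labels = np.repeat(y, d - w + 1)  # windows inherit the instance label
        clf = RandomForestClassifier(n_estimators=n_trees).fit(flat, labels)
        # per-window class distributions, concatenated per instance
        transformed.append(clf.predict_proba(flat).reshape(n, -1))
    return np.hstack(transformed)
```

In the paper the scanned windows are also fed to a completely-random forest alongside the standard one; the sketch keeps one forest per window size for brevity.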
Benefits: gcForest has far fewer hyper-parameters than deep neural networks. The number of cascade levels is determined adaptively, so model complexity is set automatically: if growing a new level does not improve the performance, the growth of the cascade terminates. Its performance is quite robust to hyper-parameter settings; in most cases, across data from different domains, it achieves excellent performance with the default settings. gcForest attains performance highly competitive with deep neural networks, while its training cost is smaller than that of a DNN. A sketch of the adaptive growth loop follows.
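As a rough illustration of this adaptive depth, the loop below (my own sketch under the same assumptions as above; grow_cascade and the validation-split scheme are hypothetical, not the paper's API) keeps adding levels while held-out accuracy improves and stops at the first level that does not help:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def grow_cascade(X_tr, y_tr, X_va, y_va, max_levels=10, n_trees=100):
    feats_tr, feats_va = X_tr, X_va
    best_acc, levels = -np.inf, []
    for _ in range(max_levels):
        clf = RandomForestClassifier(n_estimators=n_trees)
        # out-of-fold class vectors on the training side, as in the paper
        proba_tr = cross_val_predict(clf, feats_tr, y_tr, cv=3,
                                     method="predict_proba")
        clf.fit(feats_tr, y_tr)
        acc = clf.score(feats_va, y_va)
        if acc <= best_acc:
            break  # no improvement: terminate cascade growth
        best_acc = acc
        levels.append(clf)
        # augment the original features with this level's class vectors
        feats_tr = np.hstack([X_tr, proba_tr])
        feats_va = np.hstack([X_va, clf.predict_proba(feats_va)])
    return levels, best_acc
```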
Experimental results: The authors compared gcForest with DNNs by fixing one architecture for gcForest and testing various architectures for the DNNs, while keeping some DNN hyper-parameters fixed, such as the activation function, loss function, and dropout rate. They used MNIST (digit image recognition), ORL (face recognition), GTZAN (music classification), sEMG (hand-movement recognition), IMDB (movie-review sentiment analysis), and several low-dimensional datasets. gcForest obtained the best results in these experiments, sometimes by significant margins.
My Opinions: The main goal of the paper is interesting; however, one concern is how much effort the authors put into finding the best DNN configurations for the experiments, given that they themselves note that finding a good configuration is an art rather than science. For instance, they could have used deep recurrent layers instead of an MLP for the sentiment-analysis dataset, which is typically the better choice for that task. As for the time complexity of the method, they reported it for only one experiment, not all of them. More importantly, the CIFAR-10 result in the supplementary material shows a big gap between the best deep learning result and that of gcForest, although the authors argue that gcForest can be tuned to do better. gcForest was also compared with non-deep-learning methods such as random forests and SVMs, against which it was superior; it would have been good to include the time-complexity comparison for those as well. In my view, the paper is a good starting point for answering the original question, but the proposed method and the experimental results are not yet convincing enough.
Github link: https://github.com/kingfengji/gcForest