Summary by Tianxiao Zhao
- *issue:* RL on real systems -> sparse and slow data sampling;
- *solution:* pre-train the agent with an EGAN (enhanced GAN);
- *performance:* ~20% improvement in training time at the beginning of learning compared to no pre-training; ~5% improvement and smaller variance compared to GAN pre-training.
## Introduction
5G telecom systems must fulfill ultra-low latency, high robustness, quick response to changing capacity needs, and dynamic allocation of functionality.
*Problems:*
1. exploration can degrade service quality in real-time service systems;
2. sparse and slow data sampling -> extended training duration.
## Enhanced GAN
**Formulas**
the training data for RL tasks:
$$x = [x_1, x_2] = [(s_t,a),(s_{t+1},r)]$$
the generated data:
$$G(z) = [G_1(z), G_2(z)] = [(s'_t,a'),(s'_{t+1},r')] $$
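To make the layout concrete, the minimal sketch below packs one transition into this two-part sample; the CartPole-like dimensions (4-d state, scalar action and reward) are assumptions for illustration, not taken from the paper.
```python
# Illustrative sketch (assumed dimensions, not from the paper): pack one
# transition into the two-part sample x = [x1, x2] = [(s_t, a), (s_{t+1}, r)].
import numpy as np

def pack_transition(s_t, a, s_next, r):
    x1 = np.concatenate([s_t, [a]])       # (s_t, a)
    x2 = np.concatenate([s_next, [r]])    # (s_{t+1}, r)
    return np.concatenate([x1, x2])       # full sample fed to the discriminator

x = pack_transition(np.zeros(4), 1.0, np.ones(4), 1.0)  # 4-d state, scalar action/reward
print(x.shape)  # (10,)
```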
the value function for GAN:
$$V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1-D(G(z)))] + \lambda D_{KL}(P||Q)$$
where the regularization term $D_{KL}$ has the following form:
$$D_{KL}(P||Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$
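A possible reading of this objective in code, as a hedged sketch rather than the authors' implementation: the network sizes, $\lambda$, and in particular the choice of the discrete distributions $P$ and $Q$ (here softmax-normalized batch statistics) are placeholder assumptions, since the summary does not define them.
```python
# Hedged sketch (not the authors' code) of the regularized GAN objective above.
import torch
import torch.nn as nn

def kl_divergence(p, q, eps=1e-8):
    """D_KL(P || Q) = sum_i P(i) log(P(i) / Q(i)) for discrete distributions."""
    p, q = p + eps, q + eps
    return torch.sum(p * torch.log(p / q))

# toy generator / discriminator over 10-d transition vectors (dimensions assumed)
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 10))
D = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

lam = 0.1                                # regularization weight lambda (assumed value)
z = torch.randn(64, 8)                   # noise batch
x_real = torch.randn(64, 10)             # stand-in for real transitions D_r
x_fake = G(z)

# discriminator ascends log D(x) + log(1 - D(G(z)))
d_loss = -(torch.log(D(x_real)).mean() + torch.log(1 - D(x_fake.detach())).mean())

# generator descends log(1 - D(G(z))) + lambda * D_KL(P || Q);
# P and Q are illustrative stand-ins built from batch statistics
p = torch.softmax(x_real.mean(dim=0), dim=0)
q = torch.softmax(x_fake.mean(dim=0), dim=0)
g_loss = torch.log(1 - D(x_fake)).mean() + lam * kl_divergence(p, q)
```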
**EGAN structure**
![EGAN structure](https://i.imgur.com/FhPxamJ.png)
**Algorithm**
![EGAN algorithm](https://i.imgur.com/RzOGmNy.png)
The enhancer is fed with training data *D\_r(s\_t, a)* and *D\_r(s\_{t+1}, r)* and trained by supervised learning. After the GAN generates synthetic data *D\_t(s\_t, a, s\_{t+1}, r)*, the enhancer strengthens the dependency between *D\_t(s\_t, a)* and *D\_t(s\_{t+1}, r)* and updates the weights of the GAN.
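A minimal sketch of one such enhancement step, assuming a CartPole-like transition layout, MLP networks, and MSE losses (all assumptions; the paper's exact architecture and losses are not given here):
```python
# Hedged sketch (assumptions, not the paper's code) of the enhancer step.
import torch
import torch.nn as nn

STATE, ACT = 4, 1                        # CartPole-like dimensions (assumption)
IN, OUT = STATE + ACT, STATE + 1         # (s_t, a) -> (s_{t+1}, r)

enhancer = nn.Sequential(nn.Linear(IN, 64), nn.ReLU(), nn.Linear(64, OUT))
generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, IN + OUT))
opt_e = torch.optim.Adam(enhancer.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
mse = nn.MSELoss()

# 1) supervised training of the enhancer on real transitions D_r
real = torch.randn(256, IN + OUT)        # placeholder for D_r(s_t, a, s_{t+1}, r)
loss_e = mse(enhancer(real[:, :IN]), real[:, IN:])
opt_e.zero_grad()
loss_e.backward()
opt_e.step()

# 2) enhancement: push the generated (s_{t+1}, r) toward the enhancer's
#    prediction from the generated (s_t, a), updating only the generator
fake = generator(torch.randn(256, 8))    # synthetic D_t(s_t, a, s_{t+1}, r)
loss_enh = mse(fake[:, IN:], enhancer(fake[:, :IN]).detach())
opt_g.zero_grad()
loss_enh.backward()
opt_g.step()
```
Detaching the enhancer's prediction makes it act as a fixed consistency target, so only the generator's weights change in this step.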
## Results
Two sets of experiments were run on the CartPole environment with policy-gradient (PG) agents:
1. one comparing the learning curves of agents with no pre-training, GAN pre-training, and EGAN pre-training. => Result: EGAN > GAN > no pre-training
2. one comparing the learning curves of agents with EGAN pre-training over different numbers of pre-training episodes (500, 2000, 5000). => Result: 5000 > 2000 ~= 500