DDPG + Mixing policy targets
Method
Relationship between temporal-difference and Monte Carlo methods



Computing On-Policy MC Targets

Mixing Update Targets
Experiments
Results in discrete action space

Results: DDPG

