Soft Actor-Critic
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor