
PPO-Penalty

Emergence of Locomotion Behaviours in Rich Environments


Last updated 6 years ago