deep-reinforcement-learning

Policy Gradient

  • Off-Policy Actor-Critic
  • Generalized Advantage Estimation
  • Soft Actor-Critic
  • PPO-Penalty

Last updated 6 years ago