# Montezuma's Revenge

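Environments with sparse feedback remain an open challenge in reinforcement learning. The game Montezuma's Revenge is a good example: because of its sparse rewards it has been studied in more detail and used as a testbed for learning methods based on intrinsic motivation and curiosity. The main idea behind intrinsic motivation is to improve exploration of the environment through some form of self-generated reward, which ultimately helps the agent obtain the extrinsic reward. DQN obtained no reward at all in this game (a score of 0), and Gorila reached an average score of only 4.2. A human expert can score 4367 points, so the methods proposed so far clearly cannot handle environments where rewards are this sparse. Several promising approaches aim to overcome these challenges.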
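
To make the idea of a self-generated reward more concrete, here is a minimal sketch of one common form of intrinsic motivation, a count-based exploration bonus. It is an illustration under stated assumptions, not the method of any system named above: the class `CountBonus`, the parameter `beta`, and the bonus formula `beta / sqrt(N(s))` are hypothetical choices, and states are assumed to be hashable (for example, a discretized room and position).

```python
import math
from collections import defaultdict


class CountBonus:
    """Hypothetical helper: adds an intrinsic bonus beta / sqrt(N(s)) to the reward."""

    def __init__(self, beta=0.1):
        self.beta = beta                 # assumed scale of the intrinsic reward
        self.counts = defaultdict(int)   # visit count N(s) per hashable state

    def intrinsic_reward(self, state):
        # Count this visit, then reward rarely seen states more strongly.
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

    def total_reward(self, state, extrinsic_reward):
        # The learner is trained on extrinsic + intrinsic reward.
        return extrinsic_reward + self.intrinsic_reward(state)


bonus = CountBonus(beta=1.0)
first_visit = bonus.total_reward("room_1", 0.0)    # novel state, full bonus
second_visit = bonus.total_reward("room_1", 0.0)   # bonus shrinks on revisits
new_room = bonus.total_reward("room_2", 0.0)       # another novel state
```

Even when the game itself returns a reward of 0, the agent in this sketch still receives a learning signal that decays as a state becomes familiar, which nudges it toward unexplored parts of the environment.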

![](/files/-LaNwvS2w5eKForEqAtB)

