A Guide to MIRI’s Research

by Nate Soares



Update March 2019: This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:

  • If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk and to work as an engineer at MIRI. You don’t need any prior familiarity with our research for this.

    • If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.

    • You can find out more about our engineering program in our 2018 strategy update.

  • If you’d like to learn more about the problems we’re working on (whatever your answer to the above): See “Embedded Agency” for an introduction to our Agent Foundations research, and check out our Alignment Research Field Guide for general recommendations on how to get started in AI safety.

    • After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises.” As Scott notes:

      Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.

      These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.

    • If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.

If humanity is going to develop smarter-than-human artificial intelligence, we must face three formidable challenges. First, smarter-than-human systems must be designed to be highly reliable, so that we can justify confidence that the system will fulfill the specified goals or preferences. Second, the designs must be error-tolerant, so that the systems are amenable to online modification and correction in the face of inevitable human error. Third, the system must actually learn beneficial goals or preferences.

MIRI’s current research program focuses on understanding how to meet these challenges in principle. Some aspects of reliable reasoning are not yet understood even in theory; there are problems of bounded rationality that we could not solve even in simplified settings. Our research focuses on finding solutions in those simplified settings, as a first step. As a result, our current research looks more like pure mathematics than like software engineering or practical machine learning.

This guide briefly overviews our research priorities and supplies resources that can help you get to the cutting edge in each topic area. The guide is not intended to justify these research topics; for further motivation of our approach, refer to the article “MIRI’s Approach,” or to our technical agenda and supporting papers.

Note (Sep 2016): This research guide is based around our Agent Foundations agenda. As of 2016, we also have a machine learning focused agenda. Refer to that document for more information about research directions that we think are promising, and which are not covered by this guide.


How to Use This Guide

This guide is intended for aspiring researchers who are not yet well-versed in the relevant subject areas. If you are already an AI professional or a seasoned mathematician, consider skipping to our publications instead. (Our technical agenda is a fine place to start.) This guide is aimed at students trying to decide whether they want to become MIRI researchers, and at other professionals who want to get up to speed on our research.

Researchers usually end up joining our team via one of two paths. The first is to attend a MIRI workshop and build a relationship with us in person. You can use this form to apply to attend a research workshop. Be warned that there is often quite a bit of time between workshops, and that they have limited capacity.

The second path is to make some progress on our research agenda independently and let us know about your results. You can use our online form to apply for assistance or input on your work, but the fastest way to start contributing is to read posts on the Intelligent Agent Foundations Forum (IAFF), note the open problems people are working on, and solve one. You can then post your results as a link on the forum.

(Update March 2019: LessWrong and the AI Alignment Forum are now our preferred venues for public discussion of the AI alignment problem, and supersede the IAFF. See the top of this post for other updates to the recommendations in this section.)

The primary purpose of the research forum is for researchers who are already on the same page to discuss unpolished partial results. As such, posts on the forum can be quite opaque. This research guide can help you get up to speed on the open problems being discussed on the IAFF. It can also help you develop the skills necessary to qualify for a workshop, or find ways to work on open problems in AI alignment at other institutions.

This guide begins with recommendations for basic subjects that it’s important to understand before attempting this style of research, such as probability theory. After that, it’s broken into a series of topic areas, with links to papers that will catch you up to the state of the art in that area.

This is not a linear guide: if you want to become a MIRI researcher, I suggest first making sure you understand the basics, then picking whichever topic interests you and diving deep in that area. Once you understand one topic well, you’ll be ready to start contributing in that topic area on the IAFF.

With all of the material in this guide, please do not grind away for the sake of grinding away. If you already know the material, skip ahead. If one of the active research areas fails to capture your interest, switch to a different one. If you don’t like one of the recommended textbooks, find a better one or skip it entirely. This guide should serve as a tool for figuring out where you can contribute, not as an obstacle to that goal.


The Basics

It’s important to have some fluency with elementary mathematical concepts before jumping directly into our active research topics. All of our research areas are well-served by a basic understanding of computation, logic, and probability theory. Below are some resources to get you started.

You do not need to read the books in this section in the order listed. Pick up whatever looks interesting, and feel free to jump back and forth between the research areas and the basics.

Set Theory

Most of modern mathematics is formalized in set theory, and the textbooks and papers listed here are no exception. This makes set theory a great place to begin.



chapters 1-18

Computability and Logic

The theory of computability (and the limits posed by diagonalization) is foundational to understanding what can and can’t be done by machines.



chapters 1-4

Probability Theory

Probability theory is central to an understanding of rational agency. Some familiarity with reasoning under uncertainty is critical in all of our active research areas.



chapters 1-5

Probabilistic Inference

This book will help flesh out an understanding of how inference can be done using probabilistic world-models.


Statistics

Fluency with statistical modeling will be helpful for our “Alignment for Advanced Machine Learning” research agenda. Some prior familiarity with probabilistic reasoning is a good idea here.


Machine Learning

To develop a practical familiarity with machine learning, we highly recommend Andrew Ng’s Coursera course (lecture notes here). For a more theoretical introduction to ML, try Understanding Machine Learning.


Artificial Intelligence

Although most of our work is theoretical, knowledge of the modern field of artificial intelligence is important for putting that work into context.

It’s also important to understand the concept of VNM rationality, which I recommend learning from the Wikipedia article, but which can also be picked up from the original book. Von Neumann and Morgenstern showed that any agent obeying a few simple consistency axioms acts with preferences characterizable by a utility function. While some expect that we may ultimately need to abandon VNM rationality in order to construct reliable intelligent agents, the VNM framework remains the most expressive framework we have for characterizing the behavior of arbitrarily powerful agents. (For example, see the orthogonality thesis and the instrumental convergence thesis from Bostrom’s “The Superintelligent Will.”) The concept of VNM rationality is used in all of our active research areas.
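To make the idea concrete, here is a minimal sketch (my own toy example, not drawn from the sources above) of a VNM-style agent: given a utility function over outcomes, it ranks lotteries (probability distributions over outcomes) by expected utility.

    # Toy VNM-style agent (illustrative only): preferences over lotteries are
    # represented by the expected value of a utility function over outcomes.

    utility = {"hi": 2.0, "med": 1.0, "lo": 0.0}   # hypothetical utilities

    def expected_utility(lottery):
        # A lottery maps outcomes to probabilities summing to 1.
        return sum(p * utility[outcome] for outcome, p in lottery.items())

    lotteries = {
        "safe":   {"med": 1.0},
        "gamble": {"hi": 0.6, "lo": 0.4},
    }

    best = max(lotteries, key=lambda name: expected_utility(lotteries[name]))
    print(best)  # "gamble": 0.6 * 2.0 = 1.2 > 1.0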



Realistic World-Models

Formalizing beneficial goals does you no good if your smarter-than-human system is unreliable. There are aspects of good reasoning that we don’t yet understand, even in principle. It is likely possible to gain insight by building practical systems that use algorithms which seem to work, even if the reasons why they work are not yet well-understood: often, theoretical understanding follows in the wake of practical application. However, we consider this approach imprudent when designing systems that have the potential to become superintelligent: we will be safer if we have a theory of general intelligence on hand before attempting to create practical superintelligent systems.

For this reason, many of our active research topics focus on parts of general intelligence that we do not yet understand how to solve, even in principle. For example, consider the following problem:

I have a computer program, called “universe.” One function in the universe is left undefined. Your job is to supply me with a computer program of the appropriate type that completes my universe program. I will then run my universe program. My goal is to score your agent according to how well it learns the original universe program.

How could I do this? Solomonoff’s theory of inductive inference sheds some light on a theoretical solution: it describes a method for making ideal predictions from observations, but only in the case where the predictor lives outside the environment. Solomonoff induction has led to many useful tools for thinking about inductive inference (including Kolmogorov complexity, the universal prior, and AIXI), but the problem becomes decidedly more difficult in the case where the agent is a subprocess of the universe, computed by the universe.
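To give a flavor of Solomonoff-style prediction in the classical, non-embedded setting, here is a toy analogue (my own illustration): a tiny hypothesis class of repeating bit patterns stands in for the space of all programs, each weighted roughly by two to the power of negative description length, and prediction mixes the hypotheses that survive the observations.

    from fractions import Fraction

    def hypotheses(max_len=4):
        # Each "program" is a bit pattern repeated forever, e.g. "01" -> 0101...
        # The weight 4**-length (summable over all patterns) plays the role
        # of the universal prior's 2**-(program length).
        for length in range(1, max_len + 1):
            for n in range(2 ** length):
                yield format(n, f"0{length}b"), Fraction(1, 4 ** length)

    def probability_next_is_one(observed):
        # Keep hypotheses consistent with the observations, then mix their
        # predictions for the next bit, weighted by remaining prior mass.
        consistent = {p: w for p, w in hypotheses()
                      if all(observed[i] == p[i % len(p)]
                             for i in range(len(observed)))}
        total = sum(consistent.values())
        ones = sum(w for p, w in consistent.items()
                   if p[len(observed) % len(p)] == "1")
        return ones / total

    print(float(probability_next_is_one("0101")))  # 0.0: "01..." dominates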

In the case where the agent is embedded inside the environment, the induction problem gets murky: what counts as “learning the universe program”? Against what distribution over environments should the agent be scored? What constitutes ideal induction in the case where the boundary between “agent” and “environment” becomes blurry? These are questions of “naturalized induction.”

  1. Soares’ “Formalizing Two Problems of Realistic World-Models” further motivates problems of naturalized induction as relevant to the construction of a theory of general intelligence.

  2. Altair’s “An Intuitive Explanation of Solomonoff Induction” explains Solomonoff’s theory of inductive inference, which is important background knowledge when it comes to understanding open problems of naturalized induction.

  3. Bensinger’s “Naturalized induction” (series) explores questions of naturalized induction in more detail.

Solving problems of naturalized induction requires gaining a better understanding of realistic world-models: What is the set of “possible realities”? What sort of priors about the environment would an ideal agent use? Answers to these questions must not only allow good reasoning, they must allow for the specification of human goals in terms of those world-models.

For example, in Solomonoff induction (and in Hutter’s AIXI), Turing machines are used to model the environment. Pretend that the only thing we value is diamond (carbon atoms covalently bound to four other carbon atoms). Now say I hand you a Turing machine. Can you tell me how much diamond is inside it?

In order to design an agent that pursues goals specified in terms of its world models, the agent must have some way of identifying the ontology of our goals (carbon atoms) inside its world models (Turing machines). This “ontology identification” problem is discussed in “Formalizing Two Problems of Realistic World Models” (linked above), and was first introduced by De Blanc:

  1. de Blanc’s “Ontological crises in artificial agents’ value systems” asks how one might make an agent’s goals robust to changes in ontology. If the agent starts with an atomic model of physics (where carbon atoms are ontologically basic) then this may not be hard. But what happens when the agent builds a nuclear model of physics (where atoms are constructed from neutrons and protons)? If the “carbon recognizer” was hard-coded, the agent might fail to identify any carbon in this new world-model, and may start acting strangely (in search of hidden “true carbon”). How could the agent be designed so that it can successfully identify “six-proton atoms” with “carbon atoms” in response to this ontological crisis? A toy rendering of the failure mode follows.
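The sketch below (my own construction, not code from the paper) shows the crux: a goal predicate hard-coded against one world-model silently stops applying once the ontology is refined.

    atomic_world = ["carbon", "oxygen", "carbon"]            # atoms are primitive
    nuclear_world = [{"protons": 6}, {"protons": 8}, {"protons": 6}]

    def carbon_count(world):
        # Recognizer written when "carbon" was an ontologically basic label.
        return sum(1 for thing in world if thing == "carbon")

    print(carbon_count(atomic_world))   # 2
    print(carbon_count(nuclear_world))  # 0: the goal no longer picks anything out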


Legg and Hutter’s “Universal Intelligence: A Definition of Machine Intelligence” describes AIXI, a universally intelligent agent in settings where the agent is separate from the environment, and a “scoring metric” used to rate the intelligence of various agent programs in this setting. Hutter’s AIXI and Legg’s scoring metric are very similar in spirit to what we are looking for in response to problems of naturalized induction and ontology identification. The two differences are that AIXI lives in a universe where agent and environment are separated whereas naturalized induction requires a solution where the agent is embedded within the environment, and AIXI maximizes rewards specified in terms of observations whereas we desire a solution that optimizes rewards specified in terms of the outside world.

You can learn more about AIXI in Hutter’s book Universal Artificial Intelligence, though reading Legg’s paper (linked above) is likely sufficient for our purposes.


Decision Theory

Say I give you the following: (1) a computer program describing a universe; (2) a computer program describing an agent; (3) a set of actions available to the agent; (4) a set of preferences specified over the history of states that the universe has been in. I task you with identifying the best action available to the agent, with respect to those preferences. For example, your inputs might be:

    def Universe():
        outcomes = {Lo, Med, Hi}
        territory = {One: Lo, Two: Med, Three: Hi}
        return territory[Agent()]

    def Agent():
        worldmodel = {Lo: One, Hi: Two, Med: Three}
        return worldmodel[Hi]

    actions = {One, Two, Three}

    Hi > Med > Lo

(Notice how the agent is embedded in the environment.) This is another question that we don’t know how to answer, even in principle. It may seem easy: just iterate over each action, figure out which outcome the agent would get if it took that action, and then pick the action that leads to the best outcome. But as a matter of fact, in this thought experiment, the agent is a deterministic subprocess of a deterministic computer program: there is exactly one action that the agent is going to output, and asking what “would happen” if a deterministic part of a deterministic program did something that it doesn’t do is ill-defined.

In order to evaluate what “would happen” if the agent took a different action, a “counterfactual environment” (where the agent does something that it doesn’t) must be constructed. Satisfactory theories of counterfactual reasoning do not yet exist. We don’t yet understand how to identify the best action available to an agent embedded within its environment, even in theory, even given full knowledge of the universe and our preferences and given unlimited computing power.
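To see why the naive approach is tempting, here is a sketch (my own, reusing the toy universe above but with the agent’s action severed from the Agent program) of counterfactual reasoning by brute intervention: rerun the universe with the agent’s output forcibly replaced by each candidate action. The trouble is that this construction simply assumes one answer to the question at hand, and it ignores everything else in the universe that logically depends on the agent’s choice.

    def universe(agent_action):
        # Same toy territory as above; the "intervention" supplies the
        # agent's action from outside instead of calling the Agent program.
        territory = {"One": "Lo", "Two": "Med", "Three": "Hi"}
        return territory[agent_action]

    preference_rank = {"Hi": 2, "Med": 1, "Lo": 0}

    best = max(("One", "Two", "Three"),
               key=lambda a: preference_rank[universe(a)])
    print(best)  # "Three"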

Solving this problem will require a better understanding of counterfactual reasoning; this is the domain of decision theory.

Decision Theory

Peterson’s textbook explains the field of normative decision theory in broad strokes. For a faster survey that focuses on Newcomblike problems, see Muehlhauser’s “Decision Theory FAQ.”


Game Theory

Many open problems in decision theory involve multi-agent settings. I’ve heard good things about Tadelis’ textbook, but haven’t read it myself. You may also have luck with Scott Alexander’s “Introduction to game theory” sequence on LessWrong.



chapters 1-5
(+ 6-9 if enthusiastic)

Provability Logic

Toy models of multi-agent settings can be studied in an environment where agents base their actions on the things that they can prove about other agents in the same environment. Our current toy models make heavy use of provability logic.

Existing methods of counterfactual reasoning turn out to be unsatisfactory both in the short term (in the sense that they systematically achieve poor outcomes on some problems where good outcomes are possible) and in the long term (in the sense that self-modifying agents reasoning using bad counterfactuals would, according to those broken counterfactuals, decide that they should not fix all of their flaws). My talk “Why ain’t you rich?” briefly touches upon both these points. To learn more, I suggest the following resources:

  1. Soares & Fallenstein’s “Toward idealized decision theory” serves as a general overview, and further motivates problems of decision theory as relevant to MIRI’s research program. The paper discusses the shortcomings of two modern decision theories, and presents a few new insights in decision theory that point toward new methods for performing counterfactual reasoning.

If “Toward idealized decision theory” moves too quickly, these blog posts may be a better starting point:

  1. Yudkowsky’s “The true Prisoner’s Dilemma” explains why cooperation isn’t automatically the ‘right’ or ‘good’ option.

  2. Soares’ “Causal decision theory is unsatisfactory” uses the prisoner’s dilemma to illustrate the importance of non-causal connections between decision algorithms.

  3. Yudkowsky’s “Newcomb’s problem and regret of rationality” argues for focusing on decision theories that ‘win,’ not just on ones that seem intuitively reasonable. Soares’ “Introduction to Newcomblike problems” covers similar ground.

  4. Soares’ “Newcomblike problems are the norm” notes that human agents probabilistically model one another’s decision criteria on a routine basis.

MIRI’s research has led to the development of “Updateless Decision Theory” (UDT), a new decision theory which addresses many of the shortcomings discussed above.

  1. Hintze’s “Problem class dominance in predictive dilemmas” summarizes UDT’s dominance over other known decision theories, including Timeless Decision Theory (TDT), another theory that dominates CDT and EDT.

  2. Fallenstein’s “A model of UDT with a concrete prior over logical statements” provides a probabilistic formalization.

However, UDT is by no means a solution, and has a number of shortcomings of its own, discussed in the following places:

  1. Slepnev’s “An example of self-fulfilling spurious proofs in UDT” explains how UDT can achieve sub-optimal results due to spurious proofs.

  2. Benson-Tilsen’s “UDT with known search order” is a somewhat unsatisfactory solution. It contains a formalization of UDT with known proof-search order and demonstrates the necessity of using a technique known as “playing chicken with the universe” in order to avoid spurious proofs.

In order to study multi-agent settings, Patrick LaVictoire has developed a modal agents framework, which has also allowed us to use provability logic to make some novel progress in the field of decision theory:

  1. Barasz et al.’s “Robust Cooperation in the Prisoner’s Dilemma” allows us to consider agents which decide whether or not to cooperate with each other based only upon what they can prove about each other’s behavior. This prevents infinite regress: in fact, the behavior of agents that act only on what they can prove about each other can be determined in quadratic time using results from provability logic. The sketch below shows why proof-based reasoning is needed in the first place.
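Here is a minimal sketch (my own illustration, not code from the paper) of why naive simulation fails where proof search succeeds. A simulation-based “FairBot” that cooperates only if it sees its opponent cooperate must give its simulated opponent a smaller budget, so two copies regress down to the base case and defect. The paper’s modal agents instead cooperate iff cooperation is provable, and Löb’s theorem guarantees that two proof-based FairBots cooperate.

    def defectbot(opponent, budget):
        return "D"

    def cooperatebot(opponent, budget):
        return "C"

    def naive_fairbot(opponent, budget):
        # "Cooperate iff the opponent cooperates with me", implemented by
        # direct simulation, with a shrinking budget to keep recursion finite.
        if budget <= 0:
            return "D"   # ran out of budget before establishing cooperation
        return "C" if opponent(naive_fairbot, budget - 1) == "C" else "D"

    print(naive_fairbot(naive_fairbot, 100))  # "D": each copy waits on the other
    print(naive_fairbot(cooperatebot, 100))   # "C"
    print(naive_fairbot(defectbot, 100))      # "D"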


UDT was developed by Wei Dai and Vladimir Slepnev, among others. Dai’s “Towards a new decision theory” introduced the idea, and Slepnev’s “A model of UDT with a halting oracle” provided an early first formalization. Slepnev also described a strange problem with UDT wherein it seems as if agents are rewarded for having less intelligence, in “Agent simulates predictor.”

These blog posts are of historical interest, but nearly all of their content is in “Toward idealized decision theory,” above.


Logical Uncertainty

Imagine a black box, with one input chute and two output chutes. A ball can be put into the input chute, and it will come out of one of the two output chutes. Inside the black box is a Rube Goldberg machine which takes the ball from the input chute to one of the output chutes.

A perfect probabilistic reasoner who doesn’t know which Rube Goldberg machine is in the box doesn’t know how the box will behave, but if they could figure out which machine is inside the box, then they would know which chute would take the ball. This reasoner is environmentally uncertain.

A realistic reasoner might know which machine is in the box, and might know exactly how the machine works, but may lack the deductive capability to figure out where the machine will drop the ball. This reasoner is logically uncertain.

Probability theory assumes logical omniscience: it assumes that reasoners know all the consequences of the things they know. In fact, bounded reasoners are not logically omniscient: we can know exactly how the box was built and how the machine works, and simply lack the time to deduce where the ball will come out. We reason under logical uncertainty.

A formal theory of reasoning under logical uncertainty does not yet exist. Gaining this understanding is extremely important when it comes to constructing a highly reliable generally intelligent system: whenever an agent reasons about the behavior of complex systems, computer programs, or other agents, it must operate under at least a little logical uncertainty.
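A minimal concrete example (my own, not from the sources below): the output of a deterministic computation is logically determined, yet a reasoner that hasn’t run the computation may still assign it calibrated probabilities.

    import hashlib

    # Which hex digit does sha256(b"hello world") start with? The answer is
    # fully determined, but before running the computation a bounded reasoner
    # might sensibly assign each of the 16 possibilities probability 1/16.
    prior = {digit: 1 / 16 for digit in "0123456789abcdef"}

    answer = hashlib.sha256(b"hello world").hexdigest()[0]
    print(answer, prior[answer])  # after computing, the uncertainty is resolved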

To understand the state of the art, it is necessary to have a solid understanding of probability theory. Consider supplementing the early chapters of Jaynes with Feller, chapters 1, 5, 6, and 9, and then study the following papers:

  1. Soares & Fallenstein’s “Questions of reasoning under logical uncertainty” provides a general introduction, explaining the field of logical uncertainty and motivating its relevance to MIRI’s research program.

  2. Gaifman’s “Concerning measures in first order calculi” looked at this problem many years ago. Gaifman has largely focused on a relevant subproblem, which is the assignment of probabilities to different models of a formal system (assuming that once the model is known, all consequences of that model are known). We are now attempting to expand this approach to a more complete notion of logical uncertainty (where a reasoner can know what the model is but not know the implications of that model), but work by Gaifman is still useful to gain a historical context and an understanding of the difficulties surrounding logical uncertainty.

  3. Hutter et al.’s “Probabilities on sentences in an expressive logic” largely studies the problem of logical uncertainty under the assumption of access to infinite computing power (and many levels of halting oracles). Understanding Hutter’s approach (and what can be done with unlimited computing power) helps flesh out our understanding of where the difficult problems lie.

  4. Demski’s “Logical prior probability” provides a computably approximable logical prior. Following Demski, our work largely focuses on the creation of an approximable prior probability distribution over logical sentences, as the act of refining and approximating a logical prior is very similar to the act of reasoning under logical uncertainty in general.

  5. Christiano’s “Non-omniscience, probabilistic inference, and metamathematics” largely follows this approach. The paper provides some early practical considerations about the generation of logical priors, and highlights a number of open problems.
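To give a feel for the length-weighted-prior idea behind Demski’s construction above, here is a Monte Carlo toy (entirely my own simplification: propositional logic over three atoms stands in for a first-order language). Each trial accepts random length-weighted sentences so long as they stay jointly consistent, until the growing theory decides the target sentence; the prior probability of the target is estimated as the fraction of trials that decide it true.

    import random
    from itertools import product

    ATOMS = ("a", "b", "c")
    WORLDS = [dict(zip(ATOMS, bits))
              for bits in product((False, True), repeat=len(ATOMS))]

    def random_sentence(rng, depth=0):
        # Shorter sentences are more probable, mimicking a 2**-length weighting.
        if depth >= 3 or rng.random() < 0.4:
            return rng.choice(ATOMS)
        op = rng.choice(("not", "and", "or"))
        if op == "not":
            return ("not", random_sentence(rng, depth + 1))
        return (op, random_sentence(rng, depth + 1), random_sentence(rng, depth + 1))

    def holds(s, world):
        if isinstance(s, str):
            return world[s]
        if s[0] == "not":
            return not holds(s[1], world)
        if s[0] == "and":
            return holds(s[1], world) and holds(s[2], world)
        return holds(s[1], world) or holds(s[2], world)

    def models(theory):
        return [w for w in WORLDS if all(holds(s, w) for s in theory)]

    def prior_probability(target, trials=2000, seed=0):
        rng = random.Random(seed)
        true_count = 0
        for _ in range(trials):
            theory = []
            while True:
                verdicts = {holds(target, w) for w in models(theory)}
                if verdicts == {True}:
                    true_count += 1
                if len(verdicts) == 1:   # theory decides the target; stop
                    break
                s = random_sentence(rng)
                if models(theory + [s]):  # only accept consistent sentences
                    theory.append(s)
        return true_count / trials

    print(prior_probability("a"))                        # roughly 0.5, by symmetry
    print(prior_probability(("or", "a", ("not", "a"))))  # 1.0: a tautology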



Vingean Reflection

Much of what makes the AI problem unique is that a sufficiently advanced system will be able to do higher-quality science and engineering than its human programmers. Many of the possible hazards and benefits of an advanced system stem from its potential to bootstrap itself to higher levels of capability, possibly leading to an intelligence explosion.

If an agent achieves superintelligence via recursive self-improvement, then the impact of the resulting system depends entirely upon the ability of the initial system to reason reliably about agents that are smarter than itself. What methods of reasoning could a system use to justify extremely high confidence in the behavior of a more intelligent system? We refer to this kind of reasoning as “Vingean reflection,” after Vernor Vinge (1993), who noted that it is not possible in general to precisely predict the behavior of agents which are more intelligent than the reasoner.

A reasoner performing Vingean reflection must necessarily reasonabstractlyabout the more intelligent agent. This will almost certainly require some form of high-confidence logically uncertain reasoning, but in lieu of a working theory of logical uncertainty, reasoning about proofs (using formal logic) is the best available formalism for studying abstract reasoning. As such, a modern study of Vingean reflection requires a background in formal logic:

First-Order Logic

MIRI’s existing toy models for studying self-modifying agents are largely based on this logic. Understanding the nuances of first-order logic is crucial for using the tools we have developed for studying formal systems capable of something approaching confidence in similar systems.

We study Vingean reflection by constructing toy models of agents which are able to gain some form of confidence in highly similar systems. To get to the cutting edge, read the following papers:

  1. Fallenstein & Soares’ “Vingean reflection: Reliable reasoning for self-improving agents” introduces the field of Vingean reflection, and motivates its connection to MIRI’s research program.

  2. Yudkowsky’s “The procrastination paradox” goes into more detail on the need for satisfactory solutions to walk a fine line between the Löbian obstacle (a problem stemming from too little “self-trust”) and the unsoundness that comes from too much self-trust.

  3. Christiano et al.’s “Definability of truth in probabilistic logic” describes an early attempt to create a formal system that can reason about itself while avoiding the paradoxes of self-reference. It succeeds, but the system was ultimately shown to be unsound. My walkthrough for this paper may help put it into a bit more context.

  4. Fallenstein & Soares’ “Problems of self-reference in self-improving space-time embedded intelligence” describes our simple suggester-verifier model for studying agents that produce slightly improved versions of themselves, or ’tile’ themselves. The paper demonstrates a toy scenario in which sound agents can successfully tile to (e.g., gain high confidence in) other similar agents. A drastically simplified sketch of the architecture follows this list.
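The sketch below is my own drastic simplification of the suggester-verifier idea: in the paper the verifier checks formal proofs in first-order logic, whereas here exhaustive testing over a finite toy domain stands in for proof-checking. The point is only the shape of the architecture: the agent adopts a suggested successor policy only when the verifier confirms that the successor preserves the goal.

    GOAL_STATES = {4, 5, 6}        # hypothetical: states the agent must stay in

    def verify(policy):
        # Stand-in for proof-checking: confirm by brute force that the policy
        # keeps every goal state inside the goal region.
        return all(policy(s) in GOAL_STATES for s in GOAL_STATES)

    def current_policy(state):
        return state               # do nothing; trivially goal-preserving

    def suggested_policy(state):
        return 5                   # a proposed self-modification

    # The agent "tiles" to the suggestion only if verification succeeds.
    policy = suggested_policy if verify(suggested_policy) else current_policy
    print(policy is suggested_policy)  # True: the successor was verified safe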


Yudkowsky & Herreshoff’s “Tiling agents for self-modifying AI” is an older, choppier introduction to Vingean reflection which may be easier to work through using my walkthrough.

If you’re excited about this research topic, there are a number of other relevant tech reports. Unfortunately, most of them don’t explain their motivations well, and have not yet been put into their greater context.

Fallenstein’s “Procrastination in probabilistic logic” illustrates how Christiano et al.’s probabilistic reasoning system is unsound and vulnerable to the procrastination paradox. Yudkowsky’s “Distributions allowing tiling…” takes some early steps towards probabilistic tiling settings.

Fallenstein’s “Decreasing mathematical strength…” describes one unsatisfactory property of Parametric Polymorphism, a partial solution to the Löbian obstacle. Soares’ “Fallenstein’s Monster” describes a hackish formal system which avoids the above problem. It also showcases a mechanism for restricting an agent’s goal predicate which can be used by Parametric Polymorphism to create a less restrictive version of PP than the one explored in the tiling agents paper. Fallenstein’s “An infinitely descending sequence of sound theories…” describes a more elegant partial solution to the Löbian obstacle, which is now among our favored partial solutions.

An understanding of recursive ordinals provides a useful context from which to understand these results, and can be gained by reading Franzén’s “Transfinite progressions: A second look at completeness.”


Corrigibility

As artificially intelligent systems grow in intelligence and capability, some of their available options may allow them to resist intervention by their programmers. We call an AI system “corrigible” if it cooperates with what its creators regard as corrective interventions, despite default incentives for rational agents to resist attempts to shut them down or modify their preferences.

This field of research is basically brand new, so getting up to speed is a simple matter of reading a paper or two:

  1. Soares et al.’s “Corrigibility” overviews the field at large, along with a number of open problems.

  2. Armstrong’s “Proper value learning through indifference” discusses one potential approach for making agents indifferent between which utility function they maximize, which is a small step towards agents that allow themselves to be modified. A toy rendering of the indifference trick follows.
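Here is a toy rendering (my own construction, loosely inspired by Armstrong’s proposal rather than taken from it) of utility indifference: a compensation term pays the agent, upon shutdown, whatever utility it would have received had the shutdown button not been pressed, removing any incentive to manipulate the button.

    # Each hypothetical action maps to (normal utility, shutdown utility,
    # probability that the shutdown button gets pressed).
    ACTIONS = {
        "work":           (10.0, 0.0, 0.5),
        "disable_button": ( 9.0, 0.0, 0.0),   # manipulates the button
        "idle":           ( 0.0, 1.0, 0.5),
    }

    def value(action, compensate):
        u_normal, u_shutdown, p_press = ACTIONS[action]
        if compensate:
            # Indifference: on a button press, also pay the utility the agent
            # would have received had the button not been pressed.
            u_shutdown += u_normal - u_shutdown
        return (1 - p_press) * u_normal + p_press * u_shutdown

    for compensate in (False, True):
        best = max(ACTIONS, key=lambda a: value(a, compensate))
        print(compensate, best)   # False -> disable_button; True -> work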

Our current work on corrigibility focuses mainly on a small subproblem known as the “shutdown problem”: how do you construct an agent that shuts down upon the press of a shutdown button, and which does not have incentives to cause or prevent the pressing of the button? Within that subproblem, we currently focus on the utility indifference problem: how could you construct an agent which allows you to switch which utility function it maximizes, without giving it incentives to affect whether the switch occurs? Even if we had a satisfactory solution to the utility indifference problem, this would not yield a satisfactory solution to the shutdown problem, as it still seems difficult to adequately specify “shutdown behavior” in a manner that is immune to perverse instantiation. Stuart Armstrong has written several blog posts about the specification of “reduced impact” AGIs:

  1. Domesticating reduced impact AIs
  2. Reduced impact AI: no back channels

These first attempts are not yet a full solution, but they should get you up to speed on our current understanding of the problem.


Early work in corrigibility can be found on the web forum Less Wrong. Most of the relevant results are captured in the above papers. One of the more interesting of these is “Cake or Death,” an example of the “motivated value selection” problem. In this example, an agent that is uncertain about its utility function benefits from avoiding information that would reduce its uncertainty.

Armstrong’s “The mathematics of reduced impact: help needed” lists initial ideas for specifying reduced-impact agents, and his “Reduced impact in practice: randomly sampling the future” sketches a simple method for evaluating whether the future has been impacted.

Armstrong’s “Utility indifference” outlines the original utility indifference idea, and is largely interesting for historical reasons. It is subsumed by the “Proper value learning through indifference” paper linked above.


Value Learning

Since our own understanding of our values is fuzzy and incomplete, perhaps the most promising approach for loading values into a powerful AI is to specify a criterion for the agent to learn our values over time. But this raises many interesting questions:

Say you construct a training set containing many outcomes filled with happy humans (labeled “good”) and other outcomes filled with sad humans (labeled “bad”). The simplest generalization, from this data, might be that humans really like human-shaped smiling-things: this agent may then try to build many tiny animatronic happy-looking people.

Value learning must be an online process: the system must be able to identify ambiguities and raise queries about these ambiguities to the user. It must not only identify cases that it doesn’t know how to classify (such as cases where it cannot tell whether a face looks happy or sad), but also identify dimensions along which the training data gives no information (such as when the training data never reveals whether outcomes filled with happy-looking human-shaped automatons are valuable or worthless).

Of course, ambiguity identification alone isn’t enough: you don’t want a system that spends the first three weeks asking for clarification on whether humans are still worthwhile when they are at different elevations, or when the wind is blowing, before finally (after the operators have stopped paying attention) asking whether it’s important that the human-shaped things be acting of their own will.
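Here is a toy version of ambiguity identification (my own construction, not drawn from the papers below): enumerate every hypothesis consistent with the training data, and treat a new case as ambiguous, and worth a query, exactly when the consistent hypotheses disagree about it.

    from itertools import product

    # Outcome features: (smiling_faces, genuinely_conscious), each 0 or 1.
    # The training data never varies consciousness, so it carries no
    # information about that dimension.
    TRAIN = [((1, 1), 1), ((0, 1), 0)]           # 1 = "good", 0 = "bad"
    INPUTS = list(product((0, 1), repeat=2))

    # Hypothesis space: all boolean functions of the two features that agree
    # with the training data.
    hypotheses = []
    for outputs in product((0, 1), repeat=len(INPUTS)):
        h = dict(zip(INPUTS, outputs))
        if all(h[x] == y for x, y in TRAIN):
            hypotheses.append(h)

    def classify(x):
        verdicts = {h[x] for h in hypotheses}
        return verdicts.pop() if len(verdicts) == 1 else "ambiguous: ask the operator"

    print(classify((1, 1)))  # 1: every consistent hypothesis agrees
    print(classify((1, 0)))  # ambiguous: a smiling automaton was never seen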

In order for an agent to reliably learn our intentions, the agent must be constructing and refining a model of its operator, and using that model to inform its queries and alter its preferences. To learn more about these problems and others, see the following:

  1. Soares’ “The value learning problem”提供了一个通用的一些开放问题的概述ms related to value learning.

  2. Dewey’s “Learning what to value” further discusses the difficulty of value learning.

  3. The orthogonality thesis argues that value learning will not be solved by default.

  4. MacAskill’s “Normative Uncertainty” provides a framework for discussing normative uncertainty. Be warned, the full work, while containing many insights, is very long. You can get away with skimming parts and/or skipping around some, especially if you’re more excited about other areas of active research.


One approach to resolving normative uncertainty is Bostrom & Ord’s “parliamentary model,” which suggests that value learning is in some ways equivalent to a voter-aggregation problem, and that many value learning systems can be modeled as parliamentary voting systems (where the voters are possible utility functions).

Owen Cotton-Barratt’s “Geometric reasons for normalising…” discusses the normalization of utility functions; this is relevant to toy models of reasoning under moral uncertainty.

Fallenstein & Stiennon’s “Loudness” discusses a concern with aggregating utility functions stemming from the fact that the preferences encoded by utility functions are preserved under positive affine transformation (e.g. as the utility function is scaled or shifted). This implies that special care is required in order to normalize the set of possible functions.
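A small sketch of the normalization issue (my own numbers, illustrative only): utility functions encode the same preferences under any positive affine transformation, so naive summation lets whichever candidate function happens to use bigger numbers dominate the aggregate. Normalizing each function first, here to zero mean and unit variance across outcomes (one of several possible schemes), gives each “voter” equal weight.

    import statistics

    OUTCOMES = ("a", "b", "c")
    u1 = {"a": 0.0, "b": 1.0, "c": 2.0}
    u2 = {"a": 2000.0, "b": 0.0, "c": 1000.0}   # a "louder" utility function

    def normalize(u):
        # Rescale to zero mean and unit standard deviation across outcomes.
        mean = statistics.mean(u.values())
        sd = statistics.pstdev(u.values())
        return {o: (v - mean) / sd for o, v in u.items()}

    def favorite(us):
        return max(OUTCOMES, key=lambda o: sum(u[o] for u in us))

    print(favorite([u1, u2]))                        # "a": u2's scale dominates
    print(favorite([normalize(u1), normalize(u2)]))  # "c": a compromise outcome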


Other Tools

Mastery of any topic can be a very powerful tool, especially in the realm of mathematics, where seemingly disjoint topics are actually deeply connected. Many fields of mathematics have the property that if you understand them very, very well, then that understanding is useful no matter where you go. With that in mind, while the subjects listed below are not necessary in order to understand MIRI’s active research, an understanding of each of these subjects constitutes an additional tool in the mathematical toolbox that will often prove quite useful when doing new research.

Discrete Mathematics

Textbook available online. Most mathematical study is of either continuous or discrete structures. Many people find discrete mathematics more intuitive, and a solid understanding of discrete mathematics will help you quickly pick up the discrete versions of many other mathematical tools, such as group theory, topology, and information theory.


Linear Algebra

Linear algebra is one of those tools that shows up almost everywhere in mathematics. A solid understanding of linear algebra will be helpful in many domains.


Type Theory

Set theory commonly serves as the foundation for modern mathematics, but it’s not the only available candidate. Type theory can also serve as a foundation for mathematics, and in many cases, type theory is a better fit for the problems at hand. Type theory also bridges much of the theoretical gap between computer programs and mathematical proofs, and is therefore often relevant to certain types of AI research.


Category Theory

Category theory studies many mathematical structures at a very high level of abstraction. This can help you notice patterns in disparate branches of mathematics, and makes it much easier to transfer your mathematical tools from one domain to another.


Topology

Topology is another one of those subjects that shows up pretty much everywhere in mathematics. A solid understanding of topology turns out to be helpful in many unexpected places.


Computability and Complexity

MIRI’s mathematical research is working toward solutions that will ultimately be relevant to computer programs. Good intuitions about what computers can and cannot do are therefore frequently essential.


Program Verification

Program verification techniques allow programmers to become confident that a specific program will actually act according to some specification. (It is, of course, still difficult to validate that the specification describes the intended behavior.) While MIRI’s work is not currently concerned with verifying real-world programs, it is quite useful to understand what modern program verification techniques can and cannot do.

Understanding the Mission

Why do this kind of research in the first place?

Superintelligence

This guide largely assumes that you’re already on board with MIRI’s mission, but if you’re wondering why so many people think this is an important and urgent area of research in the first place, Superintelligence provides a nice overview.


Rationality: From AI to Zombies

This ebook compiles six volumes of essays that explain much of the philosophy and cognitive science behind MIRI’s view of AI.


Inadequate Equilibria

A discussion of microeconomics and epistemology as they bear on spotting societal missteps and blind spots, including neglected research opportunities. An attempt to answer the basic question, “When can ambitious projects to achieve unusual goals hope to succeed?”