2018 Update: Our New Research Directions

For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks, and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.¹

Last year, we said that we were beginning a new research program aimed at this goal.² Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.

1. Our research

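In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been to better understand embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that are working at cross purposes. These research problems continue to be a major focus at MIRI, and are being studied in parallel with our new research directions (which I’ll be focusing on more below).³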

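From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world where humanity tries to land on the Moon before it understands Newtonian mechanics or calculus.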

Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress toward resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours: we now believe that excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI, whereas previously we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

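At the same time, we’ve seen some significant financial success over the past year. It isn’t so much that funding is no longer a constraint, but it is enough to let us pursue our research agenda from new and different directions.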

In addition, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophes, especially if it’s developed with relatively brute-force-reliant, difficult-to-interpret techniques. And although we’re quite uncertain about when humanity’s collective deadline will come to pass, many of us are somewhat alarmed by the speed of recent machine learning progress.

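For these reasons, we’re eager to find the right people quickly and get them working on these new approaches. With that kind of help, it seems quite possible to us that we can resolve enough fundamental confusion in time to port the understanding to those who will need it before AGI is built and deployed.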

Comparing our new research directions and Agent Foundations

Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems—our sense being that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.

In a sense, you can think of our new research as tackling the same sort of problem that we’ve always been attacking, but from new angles. In other words, if you aren’t excited about logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas had by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.
Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. It does mean developing new foundations for optimization that are broadly applicable in the same way (and for some of the same reasons) that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.
We’re aware that there are many ways to attempt this that are shallow, foolish, or otherwise doomed; and in spite of this, we believe our own research avenues have a shot.
Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or completely disengaged from subsymbolic cognition.
Experimenting with some specific alignment problems that are deeper than the problems that have previously been put into computational environments.

In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

We are excited about these research directions, both for their present properties and for the way they seem to be developing. When Benya began the predecessor of this work ~3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting-feeling lines of inquiry, none of us expect this research to die soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.⁴

We are also excited by the extent to which useful cross-connections have arisen between initially-unrelated-looking strands of our research. For example, during a period when I was focusing primarily on new lines of research, I stumbled across a solution to the original version of the tiling agents problem from the Agent Foundations agenda.⁵

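This work seems to “give out its own guidance” more than the Agent Foundations agenda does. We used to need new hires to match our research taste very closely, but we now think we have enough of a sense of the terrain to relax those requirements. We’re still looking for people who are scientifically innovative and fairly close to us on research taste, but our work now scales much better with the number of good mathematicians and engineers working at MIRI.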

With all of that said, and despite how promising the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most outside MIRI would still regard it as of academic interest but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

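In general, what excites us about our research results is the degree of “deconfusion” they grant us, in the sense described in the next section, rather than the ML/engineering power they directly enable. For now, the “deconfusion” they purportedly reflect must mostly be discerned through abstract arguments, only weakly supported by concrete “look what this understanding lets us do” demos. Many of us at MIRI still regard our work as being of strong practical relevance. That is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.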

2. Why deconfusion is so important to us

What we mean by “deconfusion”

Quoting Anna Salamon, the president of the Center for Applied Rationality and a MIRI board member:

If I didn’t have the concept of deconfusion, MIRI’s efforts would strike me as mostly inane. MIRI continues to regard its own work as significant for human survival, despite the fact that many larger and richer organizations are now talking about AI safety. It’s a group that got all excited about Logical Induction (and tried paranoidly to make sure Logical Induction “wasn’t dangerous” before releasing it), even though Logical Induction had only a moderate amount of math and no practical engineering at all (and did something similar with Timeless Decision Theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has, to others.

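However, I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someone is taking a straight shot at what looks like the critical thing” and “they seem to have a fighting chance” and “gosh, I hope they (or someone somehow) solve many more confusions before the deadline, because without such progress, humanity sure seems kind of sunk.”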

I agree that MIRI’s perspective and strategy don’t make much sense without the idea I’m calling “deconfusion.” As someone reading a MIRI strategy update, you probably already partly have this concept, but I’ve found that it’s not trivial to transmit the full idea, so I ask your patience as I try to put it into words.

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

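To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700: “What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because by then I’d been exposed to the more coherent concepts that later mathematicians produced. I wasn’t as smart or as good a mathematician as Georg Cantor or the best mathematicians from 1700. But deconfusion can be transferred between people, and this transfer can spread the ability to think actual thoughts.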

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics, so that today a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.⁶

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.

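Interestingly, the history of science is in fact full of instances in which individual researchers possessed a large body of intuitions for a long time before those intuitions were finally formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams. Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.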

An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.

In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.⁷

Why deconfusion (on our view) is highly relevant to AI accident risk

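If human beings eventually build smarter-than-human AI, and if smarter-than-human AI is as powerful and hazardous as we currently expect it to be, then AI will one day bring enormous forces of optimization to bear.⁸ We believe that when this occurs, those enormous forces need to be brought to bear on real-world problems and subproblems deliberately, in a context where they’re theoretically well-understood. The larger those forces are, the more precision is called for when researchers aim them at cognitive problems.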

We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense—which indicates that I’m still confused somewhere around here.

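A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimized for targets that merely correlated with genetic fitness. Humans find ever-smarter ways to satisfy our own goals (video games, ice cream, birth control…) even when this runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”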

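If we are to avoid a similar fate, in which we attain AGI via huge amounts of gradient descent and other optimization techniques only to find that the resulting system has internal optimization targets very different from the targets we externally optimized it to pursue, then we must be more careful.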

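As AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still confused about that question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X. But there are still wide swaths of the question where I can’t say much without spouting nonsense.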

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

Vaguely speaking, there’s a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities that the system considers literally concretely includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.

It seems like it must mean something more like “the system is building internal models that, in some sense, are little representations of the whole of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.

Or, in short: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.

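For an alternative attempt to name this concept, see Eliezer’s rocket alignment analogy. For further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person, at an “AI Risk for Computer Scientists” workshop.)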

Why this research may be tractable here and now

Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.

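Why? One point is that MIRI has some history of success at deconfusion-style research (according to me, at least), and MIRI’s researchers are beneficiaries of the local research traditions that grew up in dialog with that work. Some of the bodies of conceptual progress that MIRI has contributed to include: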

today’s understanding that AI accident risk is important;
today’s understanding that an aligned AI is at least a theoretical possibility (the Gandhi argument that consequentialist preferences are reflectively stable by default, etc.), and that it’s worth investing in basic research toward the possibility of such an AI in advance;
early formulations of subproblems like corrigibility, the Löbian obstacle, and subsystem alignment, including descriptions of various problems in the Agent Foundations research agenda;
timeless decision theory and its successors (updateless decision theory and functional decision theory);
logical induction;
reflective oracles; and
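a number of smaller results in the vicinity of the Agent Foundations agenda, notably robust cooperation in the one-shot prisoner’s dilemma, universal inductors, model polymorphism, HOL-in-HOL, and more recent progress on Vingean reflection.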

Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

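A second point is that, if there’s something that unites most folks at MIRI besides a drive to increase the odds of human survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us came in with this taste. For example, many of us have backgrounds in physics (and fundamental physics in particular), and those of us with a programming background tend to be interested in things like type theory, formal logic, and/or probability theory.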

A third point, as noted above, is that we are excited about our current bodies of research intuitions, and about how they seem increasingly transferable/cross-applicable/concretizable over time.

Finally, I observe that the field of AI at large is currently highly vitalized, largely by the deep learning revolution and various other advances in machine learning. We are not particularly focused on deep neural networks ourselves, but being in contact with a vibrant and exciting practical field is the sort of thing that tends to spark ideas. 2018 really seems like an unusually easy time to be seeking a theoretical science of AI alignment, in dialog with practical AI methods that are beginning to work.

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy

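MIRI recently decided to make most of its research “nondisclosed-by-default.” By this we mean that, going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release them, usually based on a specific anticipated safety upside from their release.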

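I’d like to try to share some sense of why we chose this policy, especially because it may prove disappointing or inconvenient for many people interested in AI safety as a research area.⁹ MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for doing good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.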

The short version of why we chose this policy is:

we’re in a hurry to decrease existential risk;
in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;
we believe we can have more of the critical insights faster if we stay focused on making new research progress rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences;
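we think it’s hard to predict whether deconfusion-style insights will lead to capabilities insights, and we have empirically observed that we can think more freely when we don’t have to worry about this; and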
even when we conclude that those concerns were paranoid or silly upon reflection, we benefited from moving the cognitive work of evaluating those fears from “before internally sharing insights” to “before broadly distributing those insights,” which is enabled by this policy.

The somewhat longer version is below.

I’ll caveat that in what follows I’m attempting to convey what I believe, but not necessarily why—I am not trying to give an argument that would cause any rational person to take the same strategy in my position; I am shooting only for the more modest goal of conveying how I myself am thinking about the decision.

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.

When we say we’re doing AI alignment research, we really genuinely don’t mean outreach

At present, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,¹⁰ such that MIRI’s time is better spent taking a straight shot at the core research problems. Further, we think our own comparative advantage lies here, and not in outreach work.¹¹

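My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge, in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). I conclude from all this that trying to influence the wider field isn’t the best place to spend our own efforts.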

It is difficult to predict whether successful deconfusion work could spark capability advances

We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

One very plausible way this could happen is that our deconfusion work makes alignment possible without much changing the set of available pathways to AGI.¹² To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like sqrt takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up. The only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that normal floating-point arithmetic lacks.
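To make the interval arithmetic analogy concrete, here is a minimal Python sketch. It is illustrative only, not anything MIRI has written. The names Interval, round_down, round_up, interval_add, and interval_sqrt are hypothetical. The sketch assumes Python 3.9+ (for math.nextafter) and IEEE 754 round-to-nearest arithmetic, under which + and sqrt are correctly rounded. Each bound is nudged one unit in the last place outward, so the returned interval encloses the exact real result. That extra step is also a small illustration of why interval arithmetic doesn't make anything faster.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed interval [lo, hi] assumed to contain some exact real value."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if self.lo > self.hi:
            raise ValueError(f"empty interval: lo={self.lo} > hi={self.hi}")


def round_down(x: float) -> float:
    # Step one ulp toward -inf. A correctly rounded result is within half an
    # ulp of the exact value, so stepping a full ulp down lands at or below it.
    return math.nextafter(x, -math.inf)


def round_up(x: float) -> float:
    # Step one ulp toward +inf, so the bound lands at or above the exact value.
    return math.nextafter(x, math.inf)


def interval_add(a: Interval, b: Interval) -> Interval:
    # Lower bounds sum to a lower bound and upper bounds to an upper bound;
    # each float sum is then widened outward to absorb its rounding error.
    return Interval(round_down(a.lo + b.lo), round_up(a.hi + b.hi))


def interval_sqrt(x: Interval) -> Interval:
    if x.lo < 0.0:
        raise ValueError("sqrt is undefined on intervals containing negative numbers")
    # sqrt is monotonically increasing, so it maps the input endpoints to the
    # output endpoints. Clamp at 0.0 because round_down(0.0) would be negative.
    return Interval(max(0.0, round_down(math.sqrt(x.lo))), round_up(math.sqrt(x.hi)))


if __name__ == "__main__":
    r = interval_sqrt(Interval(2.0, 2.0))
    print(r.lo, r.hi)  # a lower and an upper bound on sqrt(2)
    print(interval_add(Interval(0.1, 0.1), Interval(0.2, 0.2)))
```

A production library would typically switch the hardware rounding mode instead of stepping by one ulp. The nextafter approach is used here only because it is portable and easy to check, at the cost of intervals that are slightly wider than necessary.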

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.

However, it is also plausible to us that a successful theory of alignable optimization may itself spark new research directions in AI capabilities. For an analogy, consider the progression from classical probability theory and statistics to a modern deep neural net classifying images. Probability theory alone does not let you classify cat pictures, and it is possible to understand and implement an image classification network without thinking much about probability theory; but probability theory and statistics were central to the way machine learning was actually discovered, and still underlie how modern deep learning researchers think about their algorithms.

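In worlds where deconfusing ourselves about alignment leads primarily to insights similar (on this axis) to probability theory, it is much less clear whether distributing our results widely would have a positive impact. It goes without saying that we want to have a positive impact (or, at the very least, a neutral one), even in those sorts of worlds.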

The latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.

In sum, if we continue to make progress on, and eventually substantially succeed at, figuring out the actual “cleave nature at its joints” concepts that let us think coherently about alignment, I find it quite plausible that those same concepts may also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.

By the nature of deconfusion work, it seems very difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases—potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.

We need our researchers to not have walls within their own heads

We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line, and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,¹³ researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.

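This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) flinching away from. It’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.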

At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for the reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.

Focus seems unusually useful for this kind of work

There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.

Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.

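Once we realized this was going on, we also realized that, in retrospect, we may in a way have been ignoring common practice. Many startup founders have reported that stealth mode, along with funding that doesn’t come from VC outsiders, is tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.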

Early deconfusion work just isn’t that useful (yet)

ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

More precisely, the theories themselves aren’t the interesting novelty. What’s novel is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The logical induction paper is an artifact witnessing that deconfusion. It is also an artifact that granted its authors additional deconfusion as they went through the process of writing it. But the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper. What excited me was the fact that we’re a little less in the dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.¹⁴

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

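In a sense, most of our current research is a form of rambling, in the same way, at best, that Faraday’s journals were rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journals and wait until Maxwell comes along and distills the thing down to three useful equations. And if Faraday expects physical theories to eventually distill, he doesn’t need to go around evangelizing his journals. He can just wait until the theory has been distilled, and then work to spread some less-confused concepts.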

We expect our understanding of alignment, which is currently far from complete, to eventually distill. I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

In the meantime, there are of course some researchers outside MIRI who care about the same problems we do, and who are also pursuing deconfusion. Our nondisclosed-by-default policy will negatively affect our ability to collaborate with these people on our other research directions, and this is a real cost and not worth dismissing. I don’t have much more to say about this here beyond noting that if you’re one of those people, you’re very welcome to get in touch with us (and you may want to consider joining the team)!

We’ll have a better picture of what to share or not share in the future

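In the long run, if our research is going to be useful, our findings will need to go out into the world, where they can impact how humanity builds AI systems. However, the need to eventually distribute (some of) our findings doesn’t mean we should publish all of our research immediately. As discussed above, our current research insights, as best I can tell, just aren’t that useful yet, and sharing early deconfusion research is time-consuming.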

Our nondisclosed-by-default policy also allows us to preserve options like:

deciding which research findings we think should be developed further, while thinking about differential technological development; and
deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

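Future versions of us will obviously be better able to make calls on these sorts of questions. This needs to be weighed against many facts that push in the opposite direction. The later we decide what to release, the less time others have to build upon it, and the more likely it is to be discovered independently in the interim (thereby wasting effort on duplicated work), and so on.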

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.

Considerations against our nondisclosed-by-default policy

There are a host of pathways via which this nondisclosed-by-default policy will make our work harder:

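We will have a harder time attracting and evaluating new researchers. Sharing less research means fewer chances to try out various research collaborations and to notice which collaborations work well for both parties.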
We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.
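We will get less useful scientific insight and feedback from visitors, remote scholars, and researchers elsewhere, since we will be sharing less of our work with them.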
We will have a harder time attracting funding and other indirect aid—with less of our work visible, it will be harder for prospective donors to know whether our work is worth supporting.
We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

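We expect these costs to be substantial. We will be working hard to offset some of the losses from the first of these (attracting and evaluating new researchers), as I’ll discuss in the next section. For reasons discussed above, I’m not presently very worried about the second (the lost chance to accelerate researchers outside MIRI). The remaining costs will probably be paid in full.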

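These costs are why we didn’t adopt this policy (for most of our research) years ago. Outreach feels less like our comparative advantage than it did in the pre-Puerto-Rico days, and funding seems like less of a bottleneck than it used to be (though it is still something of a bottleneck). As a result, this approach now seems workable.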

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team

I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.

What can we offer? In my view:

Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.
Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.
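Problems that, if your tastes match ours, are closely tied to fundamental questions about intelligence, agency, and the structure of reality; and the associated thrill of working on one of the great and wild frontiers of human knowledge, with large and important insights potentially close at hand.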
An atmosphere in which people are taking their own and others’ research progress seriously. For example, you can expect colleagues who come into work every day looking to actually make headway on the AI alignment problem, and looking to pull their thinking different kinds of sideways until progress occurs. I’m consistently impressed with MIRI staff’s drive to get the job done—with their visible appreciation for the fact that their work really matters, and their enthusiasm for helping one another make forward strides.
As an increasing focus at MIRI, empirically grounded computer science work on the AI alignment problem, with clear feedback of the form “did my code type-check?” or “do we have a proof?”.
Finally, some good, old-fashioned fun—for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

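Working at MIRI also means working with other people who were drawn by the very same factors: people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.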

My own experience at MIRI has been that this is a group of people who really want to help Team Life get good outcomes from the large-scale events that are likely to dramatically shape our future; who can tackle big challenges head-on without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence, and at creating a really fun environment for collaboration.

Who are we seeking?

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

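In practice, this means: people who natively think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, explicit ML background, or even prior research experience.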

Even if you’re not pointed towards our research agenda, we intend to fund or help arrange funding for any deep, good, and truly new ideas in alignment. This might be as a hire, a fellowship grant, or whatever other arrangements may be needed.

What to do if you think you might want to work here

If you’d like more information, there are several good options:

Chat with Buck Shlegeris, a MIRI computer scientist who helps out with our recruiting. In addition to answering any of your questions and running interviews, Buck can sometimes help skilled programmers take some time off to skill-build through our AI Safety Retraining Program.
If you already know someone else at MIRI and talking with them seems better, you might alternatively reach out to that person, especially Blake Borgeson (a new MIRI board member who helps us with technical recruiting) or Anna Salamon (a MIRI board member who is also the president of CFAR, and who is helping run some MIRI recruiting events).
Come to a 4.5-day AI Risk for Computer Scientists workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI: the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).
These are a great way to get a sense of MIRI’s culture, and to pick up a number of thinking tools whether or not you are interested in working for MIRI. If you’d like to either apply to attend yourself or nominate a friend of yours, send us your info here.
Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. This is a better option for mathy folks aiming at Agent Foundations than for computer sciencey folks aiming at our new research directions. This last summer we took 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.
You can try applying for a job.

Some final notes

On “inferential distance,” or on what it sometimes takes to understand MIRI researchers’ perspectives: To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

If you think that you may be in this latter category, and that such a change of viewpoint, should it occur, would be because MIRI’s worldview is onto something and not because we all got tricked by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.

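I suspect that I’ve failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of earlier drafts of this post missing key things I’d wanted to say. I’ve tried to clarify as many points as possible (hence this post’s length!), but in the end, “we focus on research and not exposition” applies to me too, and I need to get back to work.¹⁵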

On the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we’re not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three of these safety teams are highly capable, top-of-their-class research groups, and we recommend them too as potential places to join if you want to make a difference in this field.

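There are also solid researchers at many other institutions, such as the Future of Humanity Institute, whose Governance of AI Program focuses on the important social/coordination problems associated with AGI development.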

To learn more about AI alignment research at MIRI and other groups, I recommend the MIRI-produced Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.’s Concrete Problems agenda; the AI Alignment Forum; and Paul Christiano’s and the DeepMind safety team’s blogs.

On working here: Salaries here are more flexible than people usually suppose. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

You do need to be physically in Berkeley to work with us on the projects we think are most exciting, though we have pretty great relocation assistance and ops support for moving.

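Although there are a lot of great things about working at MIRI, I’d consider working here a pretty bad deal if all you’re looking for is a job. Reorienting to work on major global risks isn’t likely to be the most hedonic or relaxing option available to most people.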

On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.

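This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, that many of the concepts and themes come in large part from me, and that I wrote a decent number of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with a bunch of other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it’s worth noting explicitly.)↩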
See our 2017 strategic update and fundraiser posts for more details.↩
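In past fundraisers, we’ve said that with sufficient funding, we would like to spin up alternative lines of attack on the alignment problem. Our new research directions can be seen as following this spirit, and indeed, at least one of our new research directions is heavily inspired by alternative approaches we were considering back in 2015. That said, unlike many of the ideas discussed in our 2015 fundraiser, our new work is quite contiguous with our Agent-Foundations-style research.↩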
I.e., the requisites for aligning AGI systems to perform limited tasks; not all of the requisites for aligning a full CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for strongly autonomous AGI).↩
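This result is described in more detail in a paper that will hopefully be out soon. Or, at least, eventually. I’m not putting a lot of time into writing papers these days, for reasons discussed below.↩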
For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.↩
Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.↩
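I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and failing to make use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to AGI wiping us out. I say “hazardous,” but we shouldn’t lose sight of the upside for humanity.↩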
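My own feeling is that I and other senior staff at MIRI have never been particularly good at explaining what we’re doing and why, so this inconvenience may not be a new thing. It is new, however, for us to not be making it a priority to attempt to explain where we’re coming from.↩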
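In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems to work on with a stated goal of strengthening the field and drawing others into it.↩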
This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.↩
For example, perhaps the easiest path to unalignable AGI involves following descendants of today’s gradient descent and deep learning techniques, and perhaps the same is true for alignable AGI.↩
In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.↩
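As an aside, perhaps my main discomfort with attempting to publish academic papers is that there appears to be no venue in AI where we can go to say, “Hey, check this out: we used to be confused about X, and now we can say Y, which means we’re a little bit less confused!” I think there are a bunch of reasons behind this, not least that so much of the nature of confusion is that Y usually sounds obvious once stated, which makes it particularly difficult to make such a result sound like an impressive practical result.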
A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.↩
If you have more questions, I encourage you to shoot us an email at contact@www.gqpatrol.com.↩
