2018 Update: Our New Research Directions


For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks, and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.1

Last year, we said that we were beginning a new research program aimed at this goal.2 Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.

Contents:

  1. Our research
  2. Why deconfusion is so important to us
  3. Nondisclosed-by-default research, and how this policy fits into our overall strategy
  4. Joining the MIRI team

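1. Our research

In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been to better understand embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that are working at cross purposes. These research problems continue to be a major focus at MIRI, and are being studied in parallel with our new research directions (which I’ll be focusing on more below).3

From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world that is trying to land on the Moon before it understands Newtonian mechanics or calculus.

Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress towards resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours: it’s now the case that we believe excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI, where previously we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

At the same time, we’ve seen some significant financial success over the past year: not so much that funding is no longer a constraint at all, but enough to pursue our research agenda from new and different directions.

Furthermore, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophes, especially if it’s developed with relatively brute-force-reliant, difficult-to-interpret techniques; and although we’re quite uncertain about when humanity’s collective deadline will come to pass, many of us are somewhat alarmed by the speed of recent machine learning progress.

For these reasons, we’re eager to locate the right people quickly and offer them work on these new approaches; with this kind of help, it strikes us as very possible that we can resolve enough fundamental confusion in time to port the understanding to those who will need it before AGI is built and deployed.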

Comparing our new research directions and Agent Foundations

Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems—our sense being that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.

In a sense, you can think of our new research as tackling the same sort of problem that we’ve always been attacking, but from new angles. In other words, if you aren’t excited about logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas had by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

  1. Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.

    Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations for optimization that are broadly applicable in the same way, and for some of the same reasons, that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.

    We’re aware that there are many ways to attempt this that are shallow, foolish, or otherwise doomed to fail. Nonetheless, we believe our own research avenues have a shot.

  2. Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or completely disengaged from subsymbolic cognition.

  3. Experimenting with some specific alignment problems that are deeper than problems that have previously been put into computational environments.

In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

We are excited about these research directions, both for their present properties and for the way they seem to be developing. When Benya began the predecessor of this work ~3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting-feeling lines of inquiry, none of us expect this research to die soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.4

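We are also excited by the extent to which useful cross-connections have arisen between initially-unrelated-looking strands of our research. During a period where I was focusing primarily on new lines of research, for example, I stumbled across a solution to the original version of the tiling agents problem from the Agent Foundations agenda.5

This work seems to “give out its own guideposts” more than the Agent Foundations agenda does. While we used to require an extremely close fit on research taste from our hires, we now think we have enough sense of the terrain that we can relax those requirements somewhat. We’re still looking for hires who are scientifically innovative and who are fairly close on research taste, but our work is now much more scalable with the number of good mathematicians and engineers working at MIRI.

That said, and as promising as the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most outside MIRI would still regard it as of academic interest but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

In general, the thing that we’re excited about in research results is the extent to which they grant us “deconfusion” in the sense described in the next section, not the ML/engineering power they directly enable. For the moment, the “deconfusion” they allegedly reflect must be discerned mostly via abstract arguments, supported only weakly by concrete “look what this understanding lets us do” demos. Many of us at MIRI nonetheless regard our work as being of strong practical relevance, but that is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.

2. Why deconfusion is so important to us

What we mean by deconfusion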

Quoting Anna Salamon, the president of the Center for Applied Rationality and a MIRI board member:

If I didn’t have the concept of deconfusion, MIRI’s efforts would strike me as mostly inane. MIRI continues to regard its own work as significant for human survival, despite the fact that many larger and richer organizations are now talking about AI safety. It’s a group that got all excited about Logical Induction (and tried paranoidly to make sure Logical Induction “wasn’t dangerous” before releasing it), even though Logical Induction had only a moderate amount of math and no practical engineering at all (and did something similar with Timeless Decision Theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has, to others.

However, I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someone is taking a straight shot at what looks like the critical thing” and “they seem to have a fighting chance” and “gosh, I hope they (or someone somehow) solve many more confusions before the deadline, because without such progress, humanity sure seems kinda sunk.”

I agree that MIRI’s perspective and strategy don’t make much sense without the idea I’m calling “deconfusion.” As someone reading a MIRI strategy update, you probably already partly have this concept, but I’ve found that it’s not trivial to transmit the full idea, so I ask your patience as I try to put it into words.

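By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700. “What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians labored to produce. I wasn’t as smart or as good a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people, and this transfer can spread the ability to think actual thoughts.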

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy’s already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t a good thing to do, so we’ll be fine,” “[…] smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just […] it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics, and so today, a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.

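Interestingly, the history of science is in fact full of instances in which individual researchers possessed a mostly-correct body of intuitions for a long time, and then eventually those intuitions were formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams; Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.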

An even more striking example is the case of Archimedes, who intuited his way to the ability to do useful work in both integral and differential calculus thousands of years before calculus became a simple formal thing that could be passed between people.

In both cases, it was the eventual formalization of those intuitions—and the linked ability of these intuitions to be passed accurately between many researchers—that allowed the fields to begin building properly and quickly.7

Why deconfusion (on our view) is highly relevant to AI accident risk

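If humanity eventually builds smarter-than-human AI, and if smarter-than-human AI is as powerful and hazardous as we currently expect it to be, then AI will one day bring enormous forces of optimization to bear.8 We believe that when this occurs, those enormous forces need to be brought to bear on real-world problems and subproblems deliberately, in a context where they’re theoretically well-understood. The larger those forces are, the more precision is called for when researchers aim them at cognitive problems.

We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense, which indicates that I’m still confused somewhere around here.

A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimized for targets that merely correlated with genetic fitness. Humans find ever-smarter ways to satisfy our own goals (video games, ice cream, birth control…) even when this runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”

If we are to avoid a similar fate (one where we attain AGI via huge amounts of gradient descent and other optimization techniques, only to find that the resulting system has internal optimization targets that are very different from the targets we externally optimized it to be adept at pursuing), then we must be more careful.

As AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still in some sense confused about the question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.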

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

Vaguely speaking, there’s a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities that the system considers literally concretely includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.

It seems like it must mean something more like “the system is building internal models that, in some sense, are little representations of the whole of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.

Or, to put it briefly: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.

For an alternative attempt to name this concept, refer to Eliezer’s rocket alignment analogy. For a further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person, at an “AI Risk for Computer Scientists” workshop.)

Why this research may be tractable here and now

Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.

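Why? One point is that MIRI has some history of success at deconfusion-style research (according to me, at least), and MIRI’s researchers are beneficiaries of the local research traditions that grew up in dialog with that work. MIRI has contributed to a number of pieces of conceptual progress.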

Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

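A second point is that, if there is something that unites most folks at MIRI besides a drive to increase the odds of human survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us came in with this taste; for example, many of us have backgrounds in physics (and fundamental physics in particular), and those of us with a background in programming tend to have an interest in things like type theory, formal logic, and/or probability theory.

A third point, as noted above, is that we are excited about our current body of research intuitions, and about how they seem increasingly transferable/cross-applicable/concretizable over time.

Finally, I observe that the field of AI at large is currently highly vitalized, largely by the deep learning revolution and various other advances in machine learning. We are not particularly focused on deep neural networks ourselves, but being in contact with a vibrant and exciting practical field is the sort of thing that tends to spark ideas. 2018 really seems like an unusually easy time to be seeking a theoretical science of AI alignment, in dialog with practical AI methods that are beginning to work.

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy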

MIRI recently decided to make most of its research “nondisclosed-by-default,” by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results, based usually on a specific anticipated safety upside from their release.

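I’d like to try to share some sense of why we chose this policy, especially because this policy may prove disappointing or inconvenient for many people interested in AI safety as a research area.9 MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for doing good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.

The short version of why we chose this policy is: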

  • we’re in a hurry to decrease existential risk;

  • in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;

  • we believe we can have more of the critical insights faster if we stay focused on making new research progress rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences;

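  • we think it’s not unreasonable to be anxious about whether deconfusion-style insights could lead to capabilities insights, and have empirically observed that we can think more freely when we don’t have to worry about this; and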

  • even when we conclude that those concerns were paranoid or silly upon reflection, we benefited from moving the cognitive work of evaluating those fears from “before internally sharing insights” to “before broadly distributing those insights,” which is enabled by this policy.

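The longer version is below.

I’ll caveat that in what follows I’m attempting to convey what I believe, but not necessarily why. I am not trying to give an argument that would cause any rational person to take the same strategy in my position; I am shooting only for the more modest goal of conveying how I myself think about the decision.

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.

When we say we’re doing AI alignment research, we really don’t mean outreach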

At present, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

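Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,10 such that MIRI’s time is better spent on taking a straight shot at the core research problems. Further, we think that our own comparative advantage lies here, and not in outreach work.11

My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge, in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). I conclude from all this that trying to influence the wider field isn’t the best place to spend our own efforts.

It’s difficult to predict whether successful deconfusion work could spark capability advances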

We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

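One pretty plausible way this could go is that our deconfusion work makes alignment possible, without much changing the set of available pathways to AGI.12 To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like sqrt takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up; the only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that ordinary floating-point arithmetic lacks.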
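To make the analogy concrete, here is a minimal sketch of an interval sqrt in Python. The names `Interval` and `interval_sqrt` are hypothetical, chosen only for illustration, and the sketch assumes that the platform’s `math.sqrt` is accurate to within one unit in the last place, so that widening each bound by one representable float yields a valid enclosure. It is not a production interval library.

```python
import math
from typing import NamedTuple


class Interval(NamedTuple):
    """A closed interval [lo, hi] of floats (hypothetical helper type)."""
    lo: float
    hi: float


def interval_sqrt(x: Interval) -> Interval:
    """Return an interval that encloses sqrt(t) for every t in x.

    Assumes math.sqrt is within one ulp of the true root; each bound is
    then pushed outward by one representable float to absorb that error.
    """
    if x.lo < 0.0 or x.lo > x.hi:
        raise ValueError("expected 0 <= lo <= hi")
    # Lower bound: step toward 0.0 (never below it, since roots are non-negative).
    lo = math.nextafter(math.sqrt(x.lo), 0.0)
    # Upper bound: step toward +infinity.
    hi = math.nextafter(math.sqrt(x.hi), math.inf)
    return Interval(lo, hi)


if __name__ == "__main__":
    print(interval_sqrt(Interval(2.0, 2.0)))
```

Note that the interval version does strictly more work than a plain `math.sqrt` call (two roots plus two outward-rounding steps). What it buys is a guarantee about error, not speed, which is the property the analogy relies on.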

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.

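However, it also seems plausible to us that a successful theory of optimization may itself spark new research directions in AI capabilities. For an analogy, consider the progression from classical probability theory and statistics to a modern deep neural net classifying images. Probability theory alone does not let you classify cat pictures, and it is possible to understand and implement an image classification network without thinking much about probability theory; but probability theory and statistics were central to the way machine learning was actually discovered, and still underlie how modern deep learning researchers think about their algorithms.

In worlds where deconfusing ourselves about alignment leads to insights similar (on this axis) to probability theory, it is much less clear whether distributing our research widely would have a positive impact. It goes without saying that we want to have a positive impact (or, at the very least, a neutral impact), even in those sorts of worlds.

The latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.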

In sum, if we continue to make progress on, and eventually substantially succeed at, figuring out the actual “cleave nature at its joints” concepts that let us think coherently about alignment, I find it quite plausible that those same concepts may also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.

The nature of deconfusion work is such that it seems quite difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases: potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.

We need our researchers to not have walls within their own heads

We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line; and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.

This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) holding back from; it’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.

At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody—and this is what we’re implicitly doing when we think of ourselves as “researching publicly”—then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

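If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow for rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.

Focus seems unusually useful for this kind of work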

There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.

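Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way of thinking about the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.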

Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.

Early deconfusion work just isn’t that useful (yet)

ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

More precisely, the theories themselves aren’t the interesting novelty; the novelty is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The logical induction paper is an artifact witnessing that deconfusion, and an artifact which granted its authors additional deconfusion as they went through the process of writing it; but the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper, but rather the fact that we’re a little bit less in-the-dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.14

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

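In a sense, most of our current research is a form of rambling; at best, rambling in the same way that Faraday’s journals were rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journals and wait until Maxwell comes along and distills the thing down to three useful equations. And, if Faraday expects that physical theories eventually distill, he doesn’t need to go around evangelizing his journals; he can just wait until the theory has been distilled, and then work to transmit some less-confused concepts.

We expect our understanding of alignment, which is currently far from complete, to eventually distill, and I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

In the interim, there are of course some researchers outside MIRI who care about the same problems we do, and who are also pursuing deconfusion. Our nondisclosed-by-default policy will negatively affect our ability to collaborate with these people on our other research directions, and this is a real cost and not worth dismissing. I don’t have much more to say about this here beyond noting that if you’re one of those people, you’re very welcome to get in touch with us (and you may want to consider joining the team)!

We’ll have a better picture of what to share or not share in the future

In the long run, if our research is going to be useful, our findings will need to go out into the world where they can impact how humanity builds AI systems. However, it doesn’t follow from this need for eventual distribution (of some sort) that we might as well publish all of our research immediately. As discussed above, as best I can tell, our current research insights just aren’t that practically useful, and sharing early-stage deconfusion research is time-intensive.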

Our nondisclosed-by-default policy also allows us to preserve options like:

  • deciding which research findings we think should be developed further, while thinking about differential technological development; and
  • deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

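Future versions of us obviously have better abilities to make calls on these sorts of questions, though this needs to be weighed against many facts that push in the opposite direction: the later we decide what to release, the less time others have to build upon it, the more likely it is to be found independently in the interim (thereby wasting time on duplicated efforts), and so on.

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.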

Considerations pulling against our nondisclosed-by-default policy

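There are a number of ways in which this nondisclosed-by-default policy will make our work harder:

  1. We will have a harder time attracting and evaluating new researchers; sharing less research means getting fewer chances to try out various research collaborations and notice which collaborations work well for both parties.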

  2. We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.

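  3. We will be less able to get useful scientific insights and feedback from visitors, remote scholars, and researchers elsewhere, since we will be sharing less of our work with them.

  4. We will have a harder time attracting funding and other indirect aid: with less of our work visible, it will be harder for prospective donors to know whether our work is worth supporting.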

  5. We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

We expect these costs to be substantial. We will be working hard to offset some of the losses from item 1, as I’ll discuss in the next section. For reasons discussed above, I’m not presently very worried about item 2. The remaining costs will probably be paid in full.

These costs are why we didn’t adopt this policy (for most of our research) years ago. With outreach feeling less like our comparative advantage than it did in the pre-Puerto-Rico days, and funding seeming like less of a bottleneck than it used to (though still something of a bottleneck), this approach now seems workable.

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team

I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.

What do we have to offer? On my view:

  • Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.

  • Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.

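  • Problems that, if your tastes match ours, feel closely related to fundamental questions about intelligence, agency, and the structure of reality; and the associated thrill of working on one of the great and wild frontiers of human knowledge, with large and important insights potentially close at hand.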

  • An atmosphere in which people are taking their own and others’ research progress seriously. For example, you can expect colleagues who come into work every day looking to actually make headway on the AI alignment problem, and looking to pull their thinking different kinds of sideways until progress occurs. I’m consistently impressed with MIRI staff’s drive to get the job done—with their visible appreciation for the fact that their work really matters, and their enthusiasm for helping one another make forward strides.

  • As an increasing focus at MIRI, empirically grounded computer science work on the AI alignment problem, with clear feedback of the form “did my code type-check?” or “do we have a proof?”.

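  • Finally, some good old-fashioned fun, for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

Working at MIRI also means working with other people who were drawn by the very same factors: people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.

My own experience at MIRI has been that this is a group of people who really want to help Team Life get good outcomes from the large-scale events that are likely to dramatically shape our future; who can tackle big challenges head-on without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence, and at creating a really fun environment for collaboration.

Who are we seeking?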

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

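In practice, this means: people who natively think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-a-million test score or a halo of destiny. You also don’t need a PhD, explicit ML background, or even prior research experience.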

Even if you’re not pointed towards our research agenda, we intend to fund or help arrange funding for any deep, good, and truly new ideas in alignment. This might be as a hire, a fellowship grant, or whatever other arrangements may be needed.

如果您认为您可能想在这里工作该怎么办

如果你想要更多的信息,有几个good options:

  • 和...聊天Buck Shlegeris, a MIRI computer scientist who helps out with our recruiting. In addition to answering any of your questions and running interviews, Buck can sometimes help skilled programmers take some time off to skill-build through ourAI Safety Retraining Program

  • If you already know someone else at MIRI and talking with them seems better, you might alternatively reach out to that person, especially Blake Borgeson (a new MIRI board member who helps us with technical recruiting) or Anna Salamon (a MIRI board member who is also the president of CFAR, and is helping run some MIRI recruiting events).

  • Come to a 4.5-day AI Risk for Computer Scientists workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI: the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).

    These are a great way to get a sense of MIRI’s culture, and to pick up a number of thinking tools whether or not you are interested in working for MIRI. If you’d like to either apply to attend yourself or nominate a friend of yours, send us your info here.

  • Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. This is a better option for mathy folks aiming at Agent Foundations than for computer sciencey folks aiming at our new research directions. This last summer we took 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.

  • You can try applying for a job.

Some final notes

On “inferential distance,” or on what it sometimes takes to understand MIRI researchers’ perspectives: To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

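If you think that you may be one of the people in this latter category, and that such a change of viewpoint, should it occur, would be because MIRI’s worldview is onto something and not because we all got tricked by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.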

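I suspect that I’ve failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of earlier drafts of this post missing key things I’d wanted to say. I’ve tried to clarify as many points as possible (hence this post’s length!), but in the end, “we’re focusing on research and not exposition now” holds for me too, and I need to get back to work.15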

A note on the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we’re not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three of these safety teams are highly capable, top-of-their-class research groups, and we recommend them too as potential places to join if you want to make a difference in this field.

There are also solid researchers based at many other institutions, like the Future of Humanity Institute, whose Governance of AI Program focuses on important social/coordination problems relating to AGI development.

To learn more about AI alignment research at MIRI and other groups, I recommend the MIRI-produced Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.’s Concrete Problems agenda; the AI Alignment Forum; and Paul Christiano and the DeepMind safety team’s blogs.

On working here: Salaries here are more flexible than people usually suppose. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

You do need to be physically in Berkeley to work with us on the projects we think are most exciting, though we have pretty great relocation assistance and ops support for moving.

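Despite all of the great things about working at MIRI, I would consider working here a pretty terrible deal if all you wanted was a job. Reorienting to work on major global risks isn’t likely to be the most hedonic or relaxing option available to most people.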

On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.


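  1. This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, and that many of the concepts and themes come in large part from me, and I wrote a decent number of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with a bunch of other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it’s worth noting explicitly.)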
  2. See our 2017 strategic update and fundraiser posts for more details.
  3. In past fundraisers, we’ve said that with sufficient funding we would like to spin up alternative lines of attack on the alignment problem. Our new research directions can be seen as following this spirit, and indeed, at least one of our new research directions is heavily inspired by alternative approaches I was considering back in 2015. That said, unlike many of the ideas I had in mind when writing our 2015 fundraiser posts, our new work is quite contiguous with our Agent-Foundations-style research.
  4. That is, the requisites for aligning AGI systems to perform limited tasks; not all of the requisites for aligning a full CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for strongly autonomous AGI).
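  5. This result is described more in a paper that will be out soon. Or, at least, eventually. I’m not putting a lot of time into writing papers these days, for reasons discussed below.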
  6. For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.
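  7. Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.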
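  8. I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and failing to make use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to AGI wiping us out. I say “hazardous”, but we shouldn’t lose sight of the upside of humanity getting the job done right.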
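  9. My own feeling is that I and other senior staff at MIRI have never been particularly good at explaining what we’re doing and why, so this inconvenience may not be a new thing. It’s new, however, for us to not be making it a priority to attempt to explain where we’re coming from.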
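  10. In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems to work on with a stated goal of strengthening the field and drawing others into it.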
  11. This isn’t meant to suggest that nobody else is taking a straight shot at the core problems. For example, OpenAI’s Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.
  12. For example, perhaps the easiest path to unalignable AGI involves following descendants of today’s gradient descent and deep learning techniques, and perhaps the same is true for alignable AGI.
  13. In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.
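  14. As an aside, perhaps my main discomfort with attempting to publish academic papers is that there appears to be no venue in AI where we can go to say, “Hey, check this out: we used to be confused about X, and now we can say Y, and this means we’re a bit less confused!” I think there are a bunch of reasons behind this, not least the fact that the nature of confusion is such that Y usually sounds obviously true once stated, and so it’s particularly difficult to make such a result sound like an impressive practical result.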

    A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.

  15. If you have more questions, I encourage you to email us at contact@www.gqpatrol.com.