2018 Update: Our New Research Directions

MIRI Strategy, News

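For many years, MIRI’s goal has been to resolve enough fundamental confusions around alignment and intelligence to enable humanity to think clearly about technical AI safety risks, and to do this before this technology advances to the point of potential catastrophe. This goal has always seemed to us to be difficult, but possible.1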

Last year, we said that we were beginning a new research program aimed at this goal.2 Here, we’re going to provide background on how we’re thinking about this new set of research directions, lay out some of the thinking behind our recent decision to do less default sharing of our research, and make the case for interested software engineers to join our team and help push our understanding forward.

Contents:

  1. Our research
  2. Why deconfusion is so important to us
  3. Nondisclosed-by-default research, and how this policy fits into our overall strategy
  4. Joining the MIRI team

1. Our research

In 2014, MIRI published its first research agenda, “Agent Foundations for Aligning Machine Intelligence with Human Interests.” Since then, one of our main research priorities has been embedded agency: formally characterizing reasoning systems that lack a crisp agent/environment boundary, are smaller than their environment, must reason about themselves, and risk having parts that are working at cross purposes. These research problems continue to be a major focus at MIRI, and are being studied in parallel with our new research directions (which I’ll be focusing on more below).3

From our perspective, the point of working on these kinds of problems isn’t that solutions directly tell us how to build well-aligned AGI systems. Instead, the point is to resolve confusions we have around ideas like “alignment” and “AGI,” so that future AGI developers have an unobstructed view of the problem. Eliezer illustrates this idea in “The Rocket Alignment Problem,” which imagines a world where humanity tries to land on the Moon before it understands Newtonian mechanics or calculus.

Recently, some MIRI researchers developed new research directions that seem to enable more scalable progress towards resolving these fundamental confusions. Specifically, the progress is more scalable in researcher hours—it’s now the case that we believe excellent engineers coming from a variety of backgrounds can have their work efficiently converted into research progress at MIRI—where previously, we only knew how to speed our research progress with a (relatively atypical) breed of mathematician.

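At the same time, we’ve seen some major financial successes over the past year: not so much that funding is no longer a constraint, but enough to pursue our research agenda from new and different directions.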

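In addition, our view implies that haste is essential. We see AGI as a likely cause of existential catastrophes, especially if it’s developed with relatively brute-force-reliant, difficult-to-interpret techniques; and although we’re quite uncertain about when humanity’s collective deadline will come to pass, many of us are alarmed by the speed of recent machine learning progress.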

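For these reasons, we’re eager to locate the right people quickly and offer them work on these new approaches; with this kind of help, it strikes us as very possible that we can resolve enough fundamental confusion in time to port the understanding to those who will need it before AGI is built and deployed.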

Comparing our new research directions and Agent Foundations

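Our new research directions involve building software systems that we can use to test our intuitions, and building infrastructure that allows us to rapidly iterate this process. Like the Agent Foundations agenda, our new research directions continue to focus on “deconfusion,” rather than on, e.g., trying to improve robustness metrics of current systems. Our sense is that even if we make major strides on this kind of robustness work, an AGI system built on principles similar to today’s systems would still be too opaque to align in practice.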

In a sense, you can think of our new research as tackling the same sort of problem that we’ve always been attacking, but from new angles. In other words, if you aren’t excited about logical inductors or functional decision theory, you probably wouldn’t be excited by our new work either. Conversely, if you already have the sense that becoming less confused is a sane way to approach AI alignment, and you’ve been wanting to see those kinds of confusions attacked with software and experimentation in a manner that yields theoretical satisfaction, then you may well want to work at MIRI. (I’ll have more to say about this below.)

Our new research directions stem from some distinct ideas had by Benya Fallenstein, Eliezer Yudkowsky, and myself (Nate Soares). Some high-level themes of these new directions include:

  1. Seeking entirely new low-level foundations for optimization, designed for transparency and alignability from the get-go, as an alternative to gradient-descent-style machine learning foundations.

    Note that this does not entail trying to beat modern ML techniques on computational efficiency, speed of development, ease of deployment, or other such properties. However, it does mean developing new foundations for optimization that are broadly applicable in the same way, and for some of the same reasons, that gradient descent scales to be broadly applicable, while possessing significantly better alignment characteristics.

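    We’re aware that there are many ways to attempt this that are shallow, foolish, or otherwise doomed; in spite of this, we believe our own research avenues have a shot.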

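  2. Endeavoring to figure out parts of cognition that can be very transparent as cognition, without being GOFAI or completely disengaged from subsymbolic cognition.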

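  3. Experimenting with some specific alignment problems that are deeper than problems that have previously been put into computational environments.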

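In common between all our new approaches is a focus on using high-level theoretical abstractions to enable coherent reasoning about the systems we build. A concrete implication of this is that we write lots of our code in Haskell, and are often thinking about our code through the lens of type theory.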

We aren’t going to distribute the technical details of this work anytime soon, in keeping with the recent MIRI policy changes discussed below. However, we have a good deal to say about this research on the meta level.

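We’re excited about the current properties of these research directions, and about the way they seem to be developing. When Benya began the predecessor of this work 3 years ago, we didn’t know whether her intuitions would pan out. Today, having watched the pattern by which research avenues in these spaces have opened up new exciting lines of inquiry, none of us expect this research to die out soon, and some of us are hopeful that this work may eventually open pathways to attacking the entire list of basic alignment issues.4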

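We’re similarly excited by the extent to which useful cross-connections have arisen between initially unrelated strands of our research. During a period when I was focusing primarily on new lines of research, for example, I stumbled across a solution to the original version of the tiling agents problem from the Agent Foundations agenda.5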

This work seems to “give out its own guideposts” more than the Agent Foundations agenda does. While we used to require extremely close fit of our hires on research taste, we now think we have enough sense of the terrain that we can relax those requirements somewhat. We’re still looking for hires who are scientifically innovative and who are fairly close on research taste, but our work is now much more scalable with the number of good mathematicians and engineers working at MIRI.

That said, as promising as the last couple of years have seemed to us, this is still “blue sky” research in the sense that we’d guess most outside MIRI would still regard it as of academic interest but of no practical interest. The more principled/coherent/alignable optimization algorithms we are investigating are not going to sort cat pictures from non-cat pictures anytime soon.

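Generally speaking, what excites us about these research results is the extent to which they grant us “deconfusion” in the sense described in the next section, not the ML/engineering power they directly enable. For the moment, the “deconfusion” they allegedly reflect must be discerned mostly via abstract arguments, supported only weakly by concrete “look what this understanding lets us do” demos. Many of us at MIRI regard our work as being of strong practical relevance nonetheless, but that is because we have long-term models of what sorts of short-term feats indicate progress, and because we view becoming less confused about alignment as having a strong practical relevance to humanity’s future, for reasons that I’ll sketch out next.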

2. Why deconfusion is so important to us

What we mean by “deconfusion”

Quoting Anna Salamon, the president of the Center for Applied Rationality and a MIRI board member:

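If I didn’t have the concept of deconfusion, MIRI’s efforts would strike me as mostly inane. MIRI continues to regard its own work as significant for human survival, despite the fact that many larger and richer organizations are now talking about AI safety. It’s a group that got all excited about logical induction (and tried paranoidly to make sure logical induction “wasn’t dangerous” before releasing it), even though logical induction had only a moderate amount of math and no practical engineering at all (and it did something similar with Timeless Decision Theory, to pick an even more extreme example). It’s a group that continues to stare mostly at basic concepts, sitting reclusively off by itself, while mostly leaving questions of politics, outreach, and how much influence the AI safety community has to others.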

But I do have the concept of deconfusion. And when I look at MIRI’s activities through that lens, MIRI seems to me much more like “oh, yes, good, someone is taking a straight shot at what looks like the critical thing” and “they seem to have a fighting chance” and “gosh, I hope they (or someone somehow) solve many many more confusions before the deadline, because without such progress, humanity sure seems kinda sunk.”

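I agree that MIRI’s perspective and strategy don’t make much sense without the idea I’m calling “deconfusion.” As someone reading a MIRI strategy update, you probably already partly have this concept, but I’ve found that it’s not trivial to transmit the full idea, so I ask your patience as I try to put it into words.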

By deconfusion, I mean something like “making it so that you can think about a given topic without continuously accidentally spouting nonsense.”

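To give a concrete example, my thoughts about infinity as a 10-year-old were made of rearranged confusion rather than of anything coherent, as were the thoughts of even the best mathematicians from 1700. “How can 8 plus infinity still be infinity? What happens if we subtract infinity from both sides of the equation?” But my thoughts about infinity as a 20-year-old were not similarly confused, because, by then, I’d been exposed to the more coherent concepts that later mathematicians produced. I wasn’t as smart or as good a mathematician as Georg Cantor or the best mathematicians from 1700; but deconfusion can be transferred between people, and this transfer can spread the ability to think actual coherent thoughts.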
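As a small worked illustration (an editorial gloss, not part of the original argument), Cantor’s cardinal arithmetic is one of those more coherent frameworks, and it answers both questions: adding a finite number to a countably infinite cardinal leaves it unchanged, and “subtracting infinity from both sides” is exactly the step that fails.

```latex
\begin{aligned}
8 + \aleph_0 &= \aleph_0 = 0 + \aleph_0, \\
\text{but}\quad 8 &\neq 0 .
\end{aligned}
```

So the cancellation rule (from a + c = b + c, conclude a = b), which holds for finite numbers, breaks when c is an infinite cardinal; that is why subtraction of infinite cardinals is left undefined rather than being allowed to produce contradictions.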

In 1998, conversations about AI risk and technological singularity scenarios often went in circles in a funny sort of way. People who are serious thinkers about the topic today, including my colleagues Eliezer and Anna, said things that today sound confused. (When I say “things that sound confused,” I have in mind things like “isn’t intelligence an incoherent concept,” “but the economy is already superintelligent,” “if a superhuman AI is smart enough that it could kill us, it’ll also be smart enough to see that that isn’t what the good thing to do is, so we’ll be fine,” “we’re Turing-complete, so it’s impossible to have something dangerously smarter than us, because Turing-complete computations can emulate anything,” and “anyhow, we could just unplug it.”) Today, these conversations are different. In between, folks worked to make themselves and others less fundamentally confused about these topics, so that today a 14-year-old who wants to skip to the end of all that incoherence can just pick up a copy of Nick Bostrom’s Superintelligence.6

Of note is the fact that the “take AI risk and technological singularities seriously” meme started to spread to the larger population of ML scientists only after its main proponents attained sufficient deconfusion. If you were living in 1998 with a strong intuitive sense that AI risk and technological singularities should be taken seriously, but you still possessed a host of confusion that caused you to occasionally spout nonsense as you struggled to put things into words in the face of various confused objections, then evangelism would do you little good among serious thinkers—perhaps because the respectable scientists and engineers in the field can smell nonsense, and can tell (correctly!) that your concepts are still incoherent. It’s by accumulating deconfusion until your concepts cohere and your arguments become well-formed that your ideas can become memetically fit and spread among scientists—and can serve as foundations for future work by those same scientists.

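Interestingly, the history of science is in fact full of instances in which individual researchers possessed a mostly-correct body of intuitions for a long time, and then eventually those intuitions were formalized, corrected, made precise, and transferred between people. Faraday discovered a wide array of electromagnetic phenomena, guided by an intuition that he wasn’t able to formalize or transmit except through hundreds of pages of detailed laboratory notes and diagrams; Maxwell later invented the language to describe electromagnetism formally by reading Faraday’s work, and expressed those hundreds of pages of intuitions in three lines.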

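An even more striking example is the case of Archimedes, who intuitively grasped the ability to do useful work in the area of integral calculus some two thousand years before calculus became a simple formal thing that could be passed between people.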

In both cases, it was the eventual formalization of those intuitions, and the linked ability of these intuitions to be passed accurately between many researchers, that allowed the fields to begin building properly and quickly.7

Why deconfusion (on our view) is highly relevant to AI accident risk

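If humanity eventually builds smarter-than-human AI, and if smarter-than-human AI is as powerful and hazardous as we currently expect it to be, then AI will one day bring enormous forces of optimization to bear.8 We believe that when this occurs, those enormous forces need to be brought to bear on real-world problems and subproblems deliberately, in a context where they’re theoretically well-understood. The larger those forces are, the more precision is called for when researchers aim them at cognitive problems.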

We suspect that today’s concepts about things like “optimization” and “aiming” are incapable of supporting the necessary precision, even if wielded by researchers who care a lot about safety. Part of why I think this is that if you pushed me to explain what I mean by “optimization” and “aiming,” I’d need to be careful to avoid spouting nonsense—which indicates that I’m still confused somewhere around here.

A worrying fact about this situation is that, as best I can tell, humanity doesn’t need coherent versions of these concepts to hill-climb its way to AGI. Evolution hill-climbed that distance, and evolution had no model of what it was doing. But as evolution applied massive optimization pressure to genomes, those genomes started coding for brains that internally optimized for targets that merely correlated with genetic fitness. Humans find ever-smarter ways to satisfy our own goals (video games, ice cream, birth control…) even when this runs directly counter to the selection criterion that gave rise to us: “propagate your genes into the next generation.”

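If we want to avoid a similar fate (one where we attain AGI via huge amounts of gradient descent and other optimization techniques, only to find that the resulting system has internal optimization targets that are very different from the targets we externally optimized it to be adept at pursuing), then we must be more careful.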

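As AI researchers explore the space of optimizers, what will it take to ensure that the first highly capable optimizers that researchers find are optimizers they know how to aim at chosen tasks? I’m not sure, because I’m still confused about that question. I can tell you vaguely how the problem relates to convergent instrumental incentives, and I can observe various reasons why we shouldn’t expect the strategy “train a large cognitive system to optimize for X” to actually result in a system that internally optimizes for X, but there are still wide swaths of the question where I can’t say much without saying nonsense.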

As an example, AI systems like Deep Blue and AlphaGo cannot reasonably be said to be reasoning about the whole world. They’re reasoning about some much simpler abstract platonic environment, such as a Go board. There’s an intuitive sense in which we don’t need to worry about these systems taking over the world, for this reason (among others), even in the world where those systems are run on implausibly large amounts of compute.

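Speaking vaguely, there’s a sense in which some alignment difficulties don’t arise until an AI system is “reasoning about the real world.” But what does that mean? It doesn’t seem to mean “the space of possibilities that the system considers literally includes reality itself.” Ancient humans did perfectly good general reasoning even while utterly lacking the concept that the universe can be described by specific physical equations.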

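It looks like it must mean something more like “the system is building internal models that, in some sense, are little representations of reality.” But what counts as a “little representation of reality,” and why do a hunter-gatherer’s confused thoughts about a spirit-riddled forest count while a chessboard doesn’t? All these questions are likely confused; my goal here is not to name coherent questions, but to gesture in the direction of a confusion that prevents me from precisely naming a portion of the alignment problem.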

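Or, to put it briefly: precisely naming a problem is half the battle, and we are currently confused about how to precisely name the alignment problem.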

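For an alternative attempt to name this concept, refer to Eliezer’s rocket alignment analogy. For further discussion of some of the reasons today’s concepts seem inadequate for describing an aligned intelligence with sufficient precision, see Scott and Abram’s recent write-up. (Or come discuss with us in person at an “AI Risk for Computer Scientists” workshop.)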

Why this research may be tractable here and now

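Many types of research become far easier at particular places and times. It seems to me that for the work of becoming less confused about AI alignment, MIRI in 2018 (and for a good number of years to come, I think) is one of those places and times.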

Why? One point is that MIRI has some history of success at deconfusion-style research (according to me, at least), and MIRI’s researchers are beneficiaries of the local research traditions that grew up in dialog with that work. Among the bits of conceptual progress that MIRI contributed to are:

Logical inductors, as an example, give us at least a clue about why we’re apt to informally use words like “probably” in mathematical reasoning. It’s not a full answer to “how does probabilistic reasoning about mathematical facts work?”, but it does feel like an interesting hint—which is relevant to thinking about how “real-world” AI reasoning could possibly work, because AI systems might well also use probabilistic reasoning in mathematics.

A second point is that, if there is something that unites most folks at MIRI besides a drive to increase the odds of human survival, it is probably a taste for getting our understanding of the foundations of the universe right. Many of us came in with this taste: for example, many of us have backgrounds in physics (and fundamental physics in particular), and those of us with a background in programming tend to have an interest in things like type theory, formal logic, and/or probability theory.

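A third point, as mentioned above, is that we’re excited about our current body of research intuitions, and about how they seem increasingly transferable/cross-applicable/concretizable over time.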

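Finally, I observe that the field of AI at large is currently highly vitalized, largely by the deep learning revolution and various other advances in machine learning. We are not particularly focused on deep neural networks ourselves, but being in contact with a vibrant and exciting practical field is the sort of thing that tends to spark ideas. 2018 really seems like an unusually easy time to be seeking a theoretical science of AI alignment, in dialog with practical AI methods that are beginning to work.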

3. Nondisclosed-by-default research, and how this policy fits into our overall strategy

MIRI recently decided to make most of its research “nondisclosed-by-default,” by which we mean that going forward, most results discovered within MIRI will remain internal-only unless there is an explicit decision to release those results, based usually on a specific anticipated safety upside from their release.

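I’d like to try to share some sense of why we chose this policy, especially because this policy may prove disappointing or inconvenient for many people interested in AI safety as a research area.9 MIRI is a nonprofit, and there’s a natural default assumption that our mechanism for good is to regularly publish new ideas and insights. But we don’t think this is currently the right choice for serving our nonprofit mission.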

The short version of why we chose this policy is:

  • we’re in a hurry to decrease existential risk;

  • in the same way that Faraday’s journals aren’t nearly as useful as Maxwell’s equations, and in the same way that logical induction isn’t all that useful to the average modern ML researcher, we don’t think it would be that useful to try to share lots of half-confused thoughts with a wider set of people;

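  • we think we can have more of the critical insights faster if we focus on making new research progress rather than on exposition, and if we aren’t feeling pressure to justify our intuitions to wide audiences;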

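  • we think it’s not unreasonable to be anxious about whether deconfusion-style insights could lead to capabilities insights, and have empirically observed that we can think more freely when we don’t have to worry about this; and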

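  • even if we conclude that those concerns were paranoid or silly upon reflection, we benefited from moving the cognitive work of evaluating those fears from “before internally sharing insights” to “before broadly distributing those insights,” which is enabled by this policy.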

The longer version is below.

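I’ll caveat that in what follows I’m attempting to convey what I believe, but not necessarily why: I am not trying to give an argument that would cause any rational person to take the same strategy in my position; I am shooting only for the more modest goal of conveying how I myself am thinking about the decision.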

I’ll begin by saying a few words about how our research fits into our overall strategy, then discuss the pros and cons of this policy.

When we say we’re doing AI alignment research, we really genuinely don’t mean outreach

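Currently, MIRI’s aim is to make research progress on the alignment problem. Our focus isn’t on shifting the field of ML toward taking AGI safety more seriously, nor on any other form of influence, persuasion, or field-building. We are simply and only aiming to directly make research progress on the core problems of alignment.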

This choice may seem surprising to some readers—field-building and other forms of outreach can obviously have hugely beneficial effects, and throughout MIRI’s history, we’ve been much more outreach-oriented than the typical math research group.

Our impression is indeed that well-targeted outreach efforts can be highly valuable. However, attempts at outreach/influence/field-building seem to us to currently constitute a large majority of worldwide research activity that’s motivated by AGI safety concerns,10 such that MIRI’s time is better spent on taking a straight shot at the core research problems. Further, we think our own comparative advantage lies here, and not in outreach work.11

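My beliefs here are connected to my beliefs about the mechanics of deconfusion described above. In particular, I believe that the alignment problem might start seeming significantly easier once it can be precisely named, and I believe that precisely naming this sort of problem is likely to be a serial challenge, in the sense that some deconfusions cannot be attained until other deconfusions have matured. Additionally, my read on history says that deconfusions regularly come from relatively small communities thinking the right kinds of thoughts (as in the case of Faraday and Maxwell), and that such deconfusions can spread rapidly as soon as the surrounding concepts become coherent (as exemplified by Bostrom’s Superintelligence). I conclude from all this that trying to influence the wider field isn’t the best place to spend our own efforts.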

It is difficult to predict whether successful deconfusion work could spark capability advances

We think that most of MIRI’s expected impact comes from worlds in which our deconfusion work eventually succeeds—that is, worlds where our research eventually leads to a principled understanding of alignable optimization that can be communicated to AI researchers, more akin to a modern understanding of calculus and differential equations than to Faraday’s notebooks (with the caveat that most of us aren’t expecting solutions to the alignment problem to compress nearly so well as calculus or Maxwell’s equations, but I digress).

One pretty plausible way this could go is that our deconfusion work makes alignment possible, without much changing the set of available pathways to AGI.12 To pick a trivial analogy illustrating this sort of world, consider interval arithmetic as compared to the usual way of doing floating point operations. In interval arithmetic, an operation like SQRT takes two floating point numbers, a lower and an upper bound, and returns a lower and an upper bound on the result. Figuring out how to do interval arithmetic requires some careful thinking about the error of floating-point computations, and it certainly won’t speed those computations up; the only reason to use it is to ensure that the error incurred in a floating point operation isn’t larger than the user assumed. If you discover interval arithmetic, you’re at no risk of speeding up modern matrix multiplications, despite the fact that you really have found a new way of doing arithmetic that has certain desirable properties that normal floating-point arithmetic lacks.
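To make the analogy concrete, here is a minimal Python sketch of interval arithmetic. It is an illustration under stated assumptions, not anything from MIRI’s work: the names Interval, interval_add, and interval_sqrt are hypothetical, and instead of switching the hardware rounding mode it widens each round-to-nearest result outward by one unit in the last place using math.nextafter (Python 3.9+). That gives sound but slightly looser bounds, assuming IEEE 754 doubles and a correctly rounded math.sqrt.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] meant to enclose an exact real value."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if not self.lo <= self.hi:  # also rejects NaN endpoints
            raise ValueError(f"invalid interval [{self.lo}, {self.hi}]")


def _round_down(x: float) -> float:
    # Step one float toward -inf to cover round-to-nearest error (at most half an ulp).
    return math.nextafter(x, -math.inf)


def _round_up(x: float) -> float:
    # Step one float toward +inf for the same reason.
    return math.nextafter(x, math.inf)


def interval_add(a: Interval, b: Interval) -> Interval:
    # Lower bound from the two lower endpoints, upper bound from the two upper ones,
    # each widened outward after the rounded float addition.
    return Interval(_round_down(a.lo + b.lo), _round_up(a.hi + b.hi))


def interval_sqrt(a: Interval) -> Interval:
    # Like the SQRT described above: a (lower, upper) pair in, a (lower, upper) pair out.
    if a.lo < 0:
        raise ValueError("interval_sqrt needs a non-negative interval")
    # sqrt is monotonic, so the endpoints map to the endpoints.
    lo = max(0.0, _round_down(math.sqrt(a.lo)))  # never report a negative root
    hi = _round_up(math.sqrt(a.hi))
    return Interval(lo, hi)


if __name__ == "__main__":
    from fractions import Fraction

    root2 = interval_sqrt(Interval(2.0, 2.0))
    print(root2)
    # Exact rational check that the true square root of 2 lies inside the bounds.
    print(Fraction(root2.lo) ** 2 <= 2 <= Fraction(root2.hi) ** 2)  # expected: True
```

Note the trade-off the analogy relies on: every operation does strictly more work than plain floating point, buying a guarantee about accumulated error rather than any speed-up.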

In worlds where deconfusing ourselves about alignment leads us primarily to insights similar (on this axis) to interval arithmetic, it would be best for MIRI to distribute its research as widely as possible, especially once it has reached a stage where it is comparatively easy to communicate, in order to encourage AI capabilities researchers to adopt and build upon it.

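However, it is also plausible to us that a successful theory of alignable optimization may itself spark new research directions in AI capabilities. For an analogy, consider the progression from classical probability theory and statistics to a modern deep neural net classifying images. Probability theory alone does not let you classify cat pictures, and it is possible to understand and implement an image classification network without thinking much about probability theory; but probability theory and statistics were central to the way machine learning was actually discovered, and still underlie how modern deep learning researchers think about their algorithms.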

In worlds where deconfusing ourselves about alignment leads to insights similar (on this axis) to probability theory, it is much less clear whether distributing our results widely would have a positive impact. It goes without saying that we want to have a positive impact (or, at the very least, a neutral impact), even in those sorts of worlds.

This latter scenario is relatively less important in worlds where AGI timelines are short. If current deep learning research is already on the brink of AGI, for example, then it becomes less plausible that the results of MIRI’s deconfusion work could become a relevant influence on AI capabilities research, and most of the potential impact of our work would come from its direct applicability to deep-learning-based systems. While many of us at MIRI believe that short timelines are at least plausible, there is significant uncertainty and disagreement about timelines inside MIRI, and I would not feel comfortable committing to a course of action that is safe only in worlds where timelines are short.

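In summary, if we continue to make progress on, and eventually substantially succeed at, figuring out the actual “cleave nature at its joints” concepts that let us think coherently about alignment, I find it quite plausible that those same concepts may also enable capabilities boosts (especially in worlds where there’s a lot of time for those concepts to be pushed in capabilities-facing directions). There is certainly strong historical precedent for deep scientific insights yielding unexpected practical applications.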

By the nature of deconfusion work, it seems very difficult to predict in advance which other ideas a given insight may unlock. These considerations seem to us to call for conservatism and delay on information releases—potentially very long delays, as it can take quite a bit of time to figure out where a given insight leads.

We need our researchers to not have walls within their own heads

We take our research seriously at MIRI. This means that, for many of us, we know in the back of our minds that deconfusion-style research could sometimes (often in an unpredictable fashion) open up pathways that can lead to capabilities insights in the manner discussed above. As a consequence, many MIRI researchers flinch away from having insights when they haven’t spent a lot of time thinking about the potential capabilities implications of those insights down the line, and they usually haven’t spent that time, because it requires a bunch of cognitive overhead. This effect has been evidenced in reports from researchers, myself included, and we’ve empirically observed that when we set up “closed” research retreats or research rooms,13 researchers report that they can think more freely, that their brainstorming sessions extend further and wider, and so on.

This sort of inhibition seems quite bad for research progress. It is not a small area that our researchers were (un- or semi-consciously) holding back from; it’s a reasonably wide swath that may well include most of the deep ideas or insights we’re looking for.

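At the same time, this kind of caution is an unavoidable consequence of doing deconfusion research in public, since it’s very hard to know what ideas may follow five or ten years after a given insight. AI alignment work and AI capabilities work are close enough neighbors that many insights in the vicinity of AI alignment are “potentially capabilities-relevant until proven harmless,” both for reasons discussed above and from the perspective of the conservative security mindset we try to encourage around here.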

In short, if we request that our brains come up with alignment ideas that are fine to share with everybody (and this is what we’re implicitly doing when we think of ourselves as “researching publicly”), then we’re requesting that our brains cut off the massive portion of the search space that is only probably safe.

If our goal is to make research progress as quickly as possible, in hopes of having concepts coherent enough to allow rigorous safety engineering by the time AGI arrives, then it seems worth finding ways to allow our researchers to think without constraints, even when those ways are somewhat expensive.

Focus seems unusually useful for this kind of work

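There may be some additional speed-up effects from helping free up researchers’ attention, though we don’t consider this a major consideration on its own.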

Historically, early-stage scientific work has often been done by people who were solitary or geographically isolated, perhaps because this makes it easier to slowly develop a new way to factor the phenomenon, instead of repeatedly translating ideas into the current language others are using. It’s difficult to describe how much mental space and effort turns out to be taken up with thoughts of how your research will look to other people staring at you, until you try going into a closed room for an extended period of time with a promise to yourself that all the conversation within it really won’t be shared at all anytime soon.

Once we realized this was going on, we realized that in retrospect, we may have been ignoring common practice, in a way. Many startup founders have reported finding stealth mode, and funding that isn’t from VC outsiders, tremendously useful for focus. For this reason, we’ve also recently been encouraging researchers at MIRI to worry less about appealing to a wide audience when doing public-facing work. We want researchers to focus mainly on whatever research directions they find most compelling, make exposition and distillation a secondary priority, and not worry about optimizing ideas for persuasiveness or for being easier to defend.

Early deconfusion work just isn’t that useful (yet)

ML researchers aren’t running around using logical induction or functional decision theory. These theories don’t have practical relevance to the researchers on the ground, and they’re not supposed to; the point of these theories is just deconfusion.

To put it more precisely, the theories themselves aren’t the interesting novelty; the novelty is that a few years ago, we couldn’t write down any theory of how in principle to assign sane-seeming probabilities to mathematical facts, and today we can write down logical induction. In the journey from point A to point B, we became less confused. The logical induction paper is an artifact witnessing that deconfusion, and an artifact which granted its authors additional deconfusion as they went through the process of writing it; but the thing that excited me about logical induction was not any one particular algorithm or theorem in the paper, but rather the fact that we’re a little bit less in-the-dark than we were about how a reasoner can reasonably assign probabilities to logical sentences. We’re not fully out of the dark on this front, mind you, but we’re a little less confused than we were before.14

If the rest of the world were talking about how confusing they find the AI alignment topics we’re confused about, and were as concerned about their confusions as we are concerned about ours, then failing to share our research would feel a lot more costly to me. But as things stand, most people in the space look at us kind of funny when we say that we’re excited about things like logical induction, and I repeatedly encounter deep misunderstandings when I talk to people who have read some of our papers and tried to infer our research motivations, from which I conclude that they weren’t drawing a lot of benefit from my current ramblings anyway.

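In a sense, most of our current research is a form of rambling, in the same way, at best, that Faraday’s journals were rambling. It’s OK if most practical scientists avoid slogging through Faraday’s journals and wait until Maxwell comes along and distills the thing down to three useful equations. And, if Faraday expects that physical theories eventually distill, he doesn’t need to go around evangelizing his journals; he can just wait until they’ve been distilled, and then work to transmit some less-confused concepts.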

We expect our understanding of alignment, which is currently far from complete, to eventually distill, and I, at least, am not very excited about attempting to push it on anyone until it’s significantly more distilled. (Or, barring full distillation, until a project with a commitment to the common good, an adequate security mindset, and a large professed interest in deconfusion research comes knocking.)

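In the meantime, of course, there are some researchers outside MIRI who care about the same problems we do, and who are also pursuing deconfusion. Our nondisclosed-by-default policy will negatively affect our ability to collaborate with these people on our other research directions, and this is a real cost and not worth dismissing. I don’t have much more to say about this here, beyond noting that if you’re one of those people, please get in touch with us (and you may want to consider joining the team)!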

We’ll have a better picture of what to share or not share in the future

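In the long run, if our research is going to be useful, our findings will need to go out into the world, where they can impact how humanity builds AI systems. However, it doesn’t follow from the need to eventually distribute our findings (in some form) that we should immediately publish all of our research. As discussed above, as best I can tell, our current research insights just aren’t that practically useful, and sharing early-stage deconfusion research is time-intensive.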

Our nondisclosed-by-default policy also allows us to preserve options like:

  • deciding which research findings we think should be developed further, while thinking about differential technological development; and
  • deciding which group(s) to share each interesting finding with (e.g., the general public, other closed safety research groups, groups with strong commitment to security mindset and the common good, etc.).

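Future versions of us obviously have better abilities to make calls on these sorts of questions, though this needs to be weighed against many facts that push in the opposite direction: the later we decide what to release, the less time others have to build upon it, the more likely it is to be independently discovered in the interim (thereby wasting time on duplicated efforts), and so on.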

Now that I’ve listed reasons in favor of our nondisclosed-by-default policy, I’ll note some reasons against.

Considerations pulling against our nondisclosed-by-default policy

There are a host of pathways via which our work will be harder with this nondisclosed-by-default policy:

  1. We will have a harder time attracting and evaluating new researchers; sharing less research means getting fewer chances to try out various research collaborations and notice which collaborations work well for both parties.

  2. We lose some of the benefits of accelerating the progress of other researchers outside MIRI via sharing useful insights with them in real time as they are generated.

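  3. We will get less useful scientific insight and feedback from visitors, remote scholars, and researchers elsewhere, since we will be sharing less of our work with them.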

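  4. We will have a harder time attracting funding and other indirect aid: with less of our work visible, it will be harder for prospective donors to know whether our work is worth supporting.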

  5. We will have to pay various costs associated with keeping research private, including social costs and logistical overhead.

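We expect these costs to be substantial. We will be working to offset some of the losses from item 1, as I’ll discuss in the next section. For reasons discussed above, I’m not presently very worried about item 2. The remaining costs will probably be paid in full.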

These costs are why we didn’t adopt this policy (for most of our research) years ago. With outreach feeling less like our comparative advantage than it did in the pre-Puerto-Rico days, and funding seeming like less of a bottleneck than it used to (though still something of a bottleneck), this approach now seems workable.

We’ve already found it helpful in practice to let researchers have insights first and sort out the safety or desirability of publishing later. On the whole, then, we expect this policy to cause a significant net speed-up to our research progress, while ensuring that we can responsibly investigate some of the most important technical questions on our radar.

4. Joining the MIRI team

I believe that MIRI is, and will be for at least the next several years, a focal point of one of those rare scientifically exciting points in history, where the conditions are just right for humanity to substantially deconfuse itself about an area of inquiry it’s been pursuing for centuries—and one where the output is directly impactful in a way that is rare even among scientifically exciting places and times.

What can we offer? On my view:

  • Work that Eliezer, Benya, myself, and a number of researchers in AI safety view as having a significant chance of boosting humanity’s survival odds.

  • Work that, if it pans out, visibly has central relevance to the alignment problem—the kind of work that has a meaningful chance of shedding light on problems like “is there a loophole-free way to upper-bound the amount of optimization occurring within an optimizer?”.

  • Problems that, if your tastes match ours, feel closely related to fundamental questions about intelligence, agency, and the structure of reality; and the associated thrill of working on one of the great and wild frontiers of human knowledge, with large and important insights potentially close at hand.

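  • An atmosphere in which people are taking their own and others’ research progress seriously. For example, you can expect colleagues who come into work every day looking to actually make headway on the AI alignment problem, and looking to pull their thinking different kinds of sideways until progress occurs. I’m consistently impressed with MIRI staff’s drive to get the job done, with their visible appreciation for the fact that their work really matters, and with their enthusiasm for helping one another.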

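  • An increasing focus at MIRI on empirically grounded computer science work on the AI alignment problem, with clear feedback of the form “did my code type-check?” or “do we have a proof?”.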

  • Finally, some good, old-fashioned fun—for a certain very specific brand of “fun” that includes the satisfaction that comes from making progress on important technical challenges, the enjoyment that comes from pursuing lines of research you find compelling without needing to worry about writing grant proposals or otherwise raising funds, and the thrill that follows when you finally manage to distill a nugget of truth from a thick cloud of confusion.

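Working at MIRI also means working with other people who were drawn by the very same factors: people who seem to me to have an unusual degree of care and concern for human welfare and the welfare of sentient life as a whole, an unusual degree of creativity and persistence in working on major technical problems, an unusual degree of cognitive reflection and skill with perspective-taking, and an unusual level of efficacy and grit.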

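My experience at MIRI has been that this is a group of people who really want to help Team Life get good outcomes from the large-scale events that are likely to dramatically shape our future; who can tackle big challenges head-on without appealing to false narratives about how likely a given approach is to succeed; and who are remarkably good at fluidly updating on new evidence and at creating a really fun environment for collaboration.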

Who are we seeking?

We’re seeking anyone who can cause our “become less confused about AI alignment” work to go faster.

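In practice, this means: people who natively think in math or code, who take seriously the problem of becoming less confused about AI alignment (quickly!), and who are generally capable. In particular, we’re looking for high-end Google programmer levels of capability; you don’t need a 1-in-10,000 test score or a halo of destiny. You also don’t need a PhD, explicit ML background, or even prior research experience.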

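Even if you’re not pointed toward our research agenda, we intend to fund or help arrange funding for any deep, good, and truly new ideas. This might be as a hire, a fellowship grant, or whatever other arrangements may be needed.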

What to do if you think you might want to work here

If you’d like more information, there are several good options:

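  • Chat with Buck Shlegeris, a MIRI computer scientist who helps with our recruiting. In addition to answering any of your questions and running interviews, Buck can sometimes help skilled programmers via our AI Safety Retraining Program.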

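  • If you already know someone else at MIRI and talking with them seems better, you might alternatively reach out to that person, especially Blake Borgeson (a new MIRI board member who helps us with technical recruiting) or Anna Salamon (a MIRI board member who is also the president of CFAR, and is helping run some MIRI recruiting events).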

  • Come to a 4.5-day “AI Risk for Computer Scientists” workshop, co-run by MIRI and CFAR. These workshops are open only to people who Buck arbitrarily deems “probably above MIRI’s technical hiring bar,” though their scope is wider than simply hiring for MIRI: the basic idea is to get a bunch of highly capable computer scientists together to try to fathom AI risk (with a good bit of rationality content, and of trying to fathom the way we’re failing to fathom AI risk, thrown in for good measure).

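    These are a great way to get a sense of MIRI’s culture, and to pick up a number of thinking tools whether or not you are interested in working for MIRI. If you’d like to either apply to attend yourself or nominate a friend of yours, send us your info here.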

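  • Come to next year’s MIRI Summer Fellows program, or be a summer intern with us. This is a better option for mathy folks aiming at Agent Foundations than for computer scientists aiming at our new research directions. This last summer we took 6 interns and 30 MIRI Summer Fellows (see Malo’s Summer MIRI Updates post for more details). Also, note that “summer internships” need not occur during summer, if some other schedule is better for you. Contact Colm Ó Riain if you’re interested.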

  • You could try applying for a job.

Some final notes

On “inferential distance,” or what it sometimes takes to understand MIRI researchers’ perspectives: To many, MIRI’s take on things is really weird. Many people who bump into our writing somewhere find our basic outlook pointlessly weird/silly/wrong, and thus find us uncompelling forever. Even among those who do ultimately find MIRI compelling, many start off thinking it’s weird/silly/wrong and then, after some months or years of MIRI’s worldview slowly rubbing off on them, eventually find that our worldview makes a bunch of unexpected sense.

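If you think you might be in this latter category, and that such a change of view, if it happened, would be because MIRI's worldview is onto something and not because we all got tricked by false-but-compelling ideas… you might want to start exposing yourself to all this funny worldview stuff now, and see where it takes you. Good starting points are Rationality: From AI to Zombies; Inadequate Equilibria; Harry Potter and the Methods of Rationality; the “AI Risk for Computer Scientists” workshops; ordinary CFAR workshops; or just hanging out with folks in or near MIRI.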

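I suspect I've failed to communicate some key things above, based on past failed attempts to communicate my perspective, and based on some readers of early drafts of this post missing key things I'd wanted to say. I've tried to clarify as many points as I could (hence this post's length!), but in the end, “we focus on research and not exposition” applies to me too, and I need to get back to work.15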

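A note on the state of the field: MIRI is one of the dedicated teams trying to solve technical problems in AI alignment, but we're not the only such team. There are currently three others: the Center for Human-Compatible AI at UC Berkeley, and the safety teams at OpenAI and at Google DeepMind. All three of these safety teams are highly capable, top-of-their-class research groups, and we recommend them too as potential places to join if you want to make a difference in this field.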

There are also solid researchers based at many other institutions, like the Future of Humanity Institute, whose Governance of AI Program focuses on the important social/coordination problems associated with AGI development.

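To learn more about AI alignment research at MIRI and other groups, I recommend the MIRI-produced Agent Foundations and Embedded Agency write-ups; Dario Amodei, Chris Olah, et al.'s Concrete Problems agenda; the AI Alignment Forum; and Paul Christiano's and the DeepMind safety team's blogs.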

On working here: Salaries here are more flexible than people usually suspect. I’ve had a number of conversations with folks who assumed that because we’re a nonprofit, we wouldn’t be able to pay them enough to maintain their desired standard of living, meet their financial goals, support their family well, or similar. This is false. If you bring the right skills, we’re likely able to provide the compensation you need. We also place a high value on weekends and vacation time, on avoiding burnout, and in general on people here being happy and thriving.

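You do need to work with us in person in Berkeley to work on the projects we think are most exciting, though we have pretty great relocation assistance and ops support for moving.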

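Though there's a lot that's great about working at MIRI, I'd consider working here a pretty bad deal if all you wanted was a job. Reorienting to work on major global risks isn't likely to be the most hedonic or relaxing option available to most people.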

On the other hand, if you like the idea of an epic calling with a group of people who somehow claim to take seriously a task that sounds more like it comes from a science fiction novel than from a Dilbert strip, while having a lot of scientific fun; or you just care about humanity’s future, and want to help however you can… give us a call.


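  1. This post is an amalgam put together by a variety of MIRI staff. The byline saying “Nate” means that I (Nate) endorse the post, that many of the concepts and themes come in large part from me, and that I wrote a decent number of the words. However, I did not write all of the words, and the concepts and themes were built in collaboration with many other MIRI staff. (This is roughly what bylines have meant on the MIRI blog for a while now, and it's worth making explicit.)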
  2. See our 2017 strategic update and fundraiser posts for more details.
  3. In past fundraisers, we’ve said that with sufficient funding we would like to spin up alternative lines of attack on the alignment problem. Our new research directions can be seen as following this spirit, and indeed, at least one of our new research directions is heavily inspired by alternative approaches I was considering back in 2015. That said, unlike many of the ideas I had in mind when writing our 2015 fundraiser posts, our new work is quite contiguous with our Agent-Foundations-style research.
  4. That is, the requisites for aligning AGI systems to perform limited tasks; not all of the requisites for aligning a full CEV-class autonomous AGI. Compare Paul Christiano’s distinction between ambitious and narrow value learning (though note that Paul thinks narrow value learning is sufficient for strongly autonomous AGI).
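  5. This result is described in more detail in a paper that will hopefully be forthcoming soon. Or, at least, eventually. I'm not putting a lot of time into writing papers these days, for reasons discussed below.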
  6. For more discussion of this concept, see “Personal Thoughts on Careers in AI Policy and Strategy” by Carrick Flynn.
  7. Historical examples of deconfusion work that gave rise to a rich and healthy field include the distillation of Lagrangian and Hamiltonian mechanics from Newton’s laws; Cauchy’s overhaul of real analysis; the slow acceptance of the usefulness of complex numbers; and the development of formal foundations of mathematics.
  8. I should emphasize that from my perspective, humanity never building AGI, never realizing our potential, and failing to make use of the cosmic endowment would be a tragedy comparable (on an astronomical scale) to AGI wiping us out. I say “hazardous”, but we shouldn’t lose sight of the upside of humanity getting the job done right.
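  9. My own feeling is that I and other senior staff at MIRI have never been particularly good at explaining what we're doing and why, so this inconvenience may not be a new thing. It's new for us, however, to not even be trying to explain where we're coming from.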
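  10. In other words, many people are explicitly focusing only on outreach, and many others are selecting technical problems with the stated goal of strengthening the field and drawing others into it.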
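  11. This isn't to say that no one else is taking direct shots at the core problems. For example, OpenAI's Paul Christiano is a top-tier researcher who is doing exactly that. But we nonetheless want more of this on the present margin.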
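  12. For example, perhaps the easiest path to AGI involves following descendants of today's gradient descent and deep learning techniques, and perhaps the same is true of the easiest path to friendly AGI.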
  13. In other words, retreats/rooms where it is common knowledge that all thoughts and ideas are not going to be shared, except perhaps after some lengthy and irritating bureaucratic process and with everyone’s active support.
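  14. As an aside, perhaps my main discomfort with attempting to publish academic papers is that there appears to be no venue in AI where we can go to say, “Hey, check this out: we used to be confused about X, and now we can say Y, which means we're a little bit less confused!” I think there are a bunch of reasons behind this, not least that the nature of deconfusion is such that Y usually sounds obvious once stated, which makes it especially hard to present such a result as an impressive practical one.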

    A side effect of this, unfortunately, is that all MIRI papers that I’ve ever written with the goal of academic publishing do a pretty bad job of saying what I was previously confused about, and how the “result” is indicative of me becoming less confused—for which I hereby apologize.

  15. If you have more questions, I encourage you to send us an email at contact@intelligence.org.