Safety Engineering, Target Selection, and Alignment Theory

Artificial intelligence capabilities research is aimed at making computer systems more intelligent — able to solve a wider range of problems more effectively and efficiently. We can distinguish this from research specifically aimed at making AI systems at various capability levels safer, or more “robust and beneficial.” In this post, I distinguish three kinds of direct research that might be thought of as “AI safety” work: safety engineering, target selection, and alignment theory.

Imagine a world where humans somehow developed heavier-than-air flight before developing a firm understanding of calculus or celestial mechanics. In a world like that, what work would be needed in order to safely transport humans to the Moon?

In this case, we can say that the main task at hand is one of engineering a rocket and refining fuel such that the rocket, when launched, accelerates upwards and does not explode. The boundary of space can be compared to the boundary between narrowly intelligent and generally intelligent AI. Both boundaries are fuzzy, but have engineering importance: spacecraft and aircraft have different uses and face different constraints.

Paired with this task of developing rocket capabilities is a safety engineering task. Safety engineering is the art of ensuring that an engineered system provides acceptable levels of safety. When it comes to achieving a soft landing on the Moon, there are many different roles for safety engineering to play. One team of engineers might ensure that the materials used in constructing the rocket are capable of withstanding the stress of a rocket launch with significant margin for error. Another might design escape systems that ensure the humans in the rocket can survive even in the event of failure. Another might design life support systems capable of supporting the crew in dangerous environments.

Another important task is target selection, i.e., picking where on the Moon to land. In the case of a Moon mission, targeting research might entail things like designing and constructing telescopes (if they didn’t already exist) and identifying landing zones on the Moon. Of course, only so much of the targeting can be done in advance, and the lunar landing vehicle may need to be designed so that it can alter the landing target at the last minute as new data comes in; this again would require feats of engineering.

Beyond the task of (safely) reaching escape velocity and figuring out where you want to go, there is one more crucial prerequisite for landing on the Moon. This is rocket alignment research, the technical work required to reach the correct final destination. We’ll use this as an analogy to illustrate MIRI’s research focus, the problem of artificial intelligence alignment.

The alignment challenge

Hitting a certain target on the Moon isn’t as simple as carefully pointing the nose of the rocket at the relevant lunar coordinate and hitting “launch” — not even if you trust your pilots to make course corrections as necessary. There’s also the important task of plotting trajectories between celestial bodies.

Image credit: NASA/Bill Ingalls

This rocket alignment task may require a distinct body of theoretical knowledge that isn’t required just for getting a payload off of the planet. Without calculus, designing a functional rocket would be enormously difficult. Still, with enough tenacity and enough resources to spare, we could imagine a civilization reaching space after many years of trial and error — at which point they would be confronted with the problem that reaching space isn’t sufficient for steering toward a specific location.1

The first rocket alignment researchers might ask, “What trajectory would we have our rocket take under ideal conditions, without worrying about winds or explosions or fuel efficiency?” If even that question were beyond their current abilities, they might simplify the problem still further, asking, “At what angle and velocity would we fire a cannonball such that it enters a stable orbit around Earth, assuming that Earth is perfectly spherical and has no atmosphere?”
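With modern hindsight, that simplified question has a short answer. The following is a minimal sketch of the calculation, assuming Newtonian point-mass gravity and standard textbook constants (the specific figures and function name are ours, purely for illustration):

```python
import math

# A minimal sketch of the idealized cannonball calculation, assuming a
# perfectly spherical, airless Earth treated as a Newtonian point mass.
# The constants are standard textbook values; the function name is ours.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of Earth, kg
R_EARTH = 6.371e6    # mean radius of Earth, m

def circular_orbit_speed(radius_m: float) -> float:
    """Speed at which gravity exactly supplies the centripetal acceleration:
    G*M/r**2 = v**2/r, so v = sqrt(G*M/r)."""
    return math.sqrt(G * M_EARTH / radius_m)

if __name__ == "__main__":
    v = circular_orbit_speed(R_EARTH)
    # Fired horizontally (tangent to the surface) at this speed, the
    # cannonball circles the idealized Earth indefinitely.
    print(f"required speed: ~{v / 1000:.1f} km/s")  # roughly 7.9 km/s
```

In this idealization the angle is simply “horizontal,” and the speed comes out to roughly 7.9 km/s — but deriving even that much requires the Newtonian machinery that, in the thought experiment, does not yet exist.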

To the early rocket engineers, for whom even the problem of building any vehicle that gets off the launch pad at all remains a frustrating task, the questions of the alignment theorists might look out of touch. The engineers might ask, “Don’t you realize the rockets won’t fire straight?” or “What do circles around Earth have to do with reaching the Moon?” Yet understanding rocket alignment is quite important when it comes to achieving a soft landing on the Moon. If you do not yet know at what angle and velocity to fire a cannonball such that it would end up in a stable orbit around a perfectly spherical Earth with no atmosphere, then you may need to develop a better understanding of celestial mechanics before attempting a Moon mission.

Three forms of AI safety research

The case is similar with AI research. AI capabilities work comes part and parcel with associated safety engineering tasks. Working today, an AI safety engineer might focus on making the internals of large classes of software more transparent and interpretable by humans. They might ensure that the system fails gracefully in the face of adversarial observations. They might design security protocols and early warning systems that help operators prevent or handle system failures.2

AI safety engineering is indispensable work, and it’s infeasible to separate safety engineering from capabilities engineering. Day-to-day safety work in aerospace engineering doesn’t rely on committees of ethicists peering over engineers’ shoulders. Some engineers will happen to spend their time on components of the system that are there for reasons of safety — such as failsafe mechanisms or fallback life-support — but safety engineering is an integral part of engineering for safety-critical systems, rather than a separate discipline.

In the domain of AI, target selection addresses the question: if one could build a powerful AI system, what should one use it for? The potential development of superintelligence raises a number of thorny questions in theoretical and applied ethics. Some of those questions can plausibly be addressed in the near future by moral philosophers and psychologists working with the AI research community; others will no doubt need to be left to the future. Stuart Russell has gone so far as to predict that “in the future, moral philosophy will be a key industry sector.” We agree that this is an important area of study, but it is not the main focus of the Machine Intelligence Research Institute.

Researchers at MIRI focus on problems of AI alignment: the study of how in principle to direct a powerful AI system towards a specific goal. Where target selection is about the destination of the “rocket” (“what effects do we want AI systems to have on our civilization?”) and AI capabilities engineering is about getting the rocket to escape velocity (“how do we make AI systems powerful enough to help us achieve our goals?”), alignment is about knowing how to aim rockets towards particular celestial bodies (“assuming we could build highly capable AI systems, how would we direct them at our targets?”). Since our understanding of AI alignment is still at the “what is calculus?” stage, we ask questions analogous to “at what angle and velocity would we fire a cannonball to put it in a stable orbit, if Earth were perfectly spherical and had no atmosphere?”

Selecting promising AI alignment research paths is not a simple task. With the benefit of hindsight, it’s easy enough to say that early rocket alignment researchers should begin by inventing calculus and studying gravitation. For someone who doesn’t yet have a clear understanding of what “calculus” or “gravitation” are, however, choosing research topics might be quite a bit more difficult. The fruitful research directions would need to compete with fruitless ones, such as studying aether or Aristotelian physics; and which research programs are fruitless may not be obvious in advance.

Toward a theory of alignable agents

What are plausible candidates for playing the role of “calculus” or “gravitation” in the domain of AI?

Image credit: Brian Brondel

At MIRI, we currently focus on subjects such as good reasoning under deductive limitations (logical uncertainty), decision theories that work well even for agents embedded in large environments, and reasoning procedures that approve of the way they reason. This research often involves building toy models and studying problems under dramatic simplifications, analogous to assuming a perfectly spherical Earth with no atmosphere.
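To give a flavor of what a “toy model” can look like here, the sketch below is our own simplified illustration (not a MIRI construction) of reasoning under deductive limitation: a reasoner that cannot afford a full proof spends a small, fixed budget of cheap checks and reports a credence rather than a verdict.

```python
import random

# A toy model of reasoning under deductive limitation — our own illustrative
# sketch, not a MIRI construction. The reasoner cannot afford a full primality
# proof for a large n, so it spends a small fixed budget of Fermat tests and
# reports a credence rather than a verdict.

def bounded_credence_is_prime(n: int, budget: int = 5) -> float:
    """Credence that n is prime after at most `budget` cheap checks."""
    if n < 4:
        return 1.0 if n in (2, 3) else 0.0
    for _ in range(budget):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return 0.0              # Fermat witness found: definitely composite
    # Every check passed; each one raises our credence, but a finite budget
    # never fully settles the question.
    return 1.0 - 0.5 ** budget      # crude heuristic, not a calibrated probability

print(bounded_credence_is_prime(2**61 - 1))  # a Mersenne prime: high credence
print(bounded_credence_is_prime(2**61 + 1))  # divisible by 3: usually caught quickly
```

The research question is not this particular heuristic, but what it would even mean to assign credences like these in a principled, coherent way.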

Developing theories of logical uncertainty isn’t what most people have in mind when they think of “AI safety research.” A natural thought here is to ask what specifically goes wrong if we don’t develop such theories. If an AI system can’t perform bounded reasoning in the domain of mathematics or logic, that doesn’t sound particularly “unsafe” — a system that needs to reason mathematically but can’t might be fairly useless, but it’s harder to see it becoming dangerous.

On our view, understanding logical uncertainty is important for helping us understand the systems we build well enough to justifiably conclude that they can be aligned in the first place. An analogous question in the case of rocket alignment might run: “If you don’t develop calculus, what bad thing happens to your rocket? Do you think the pilot will be struggling to make a course correction, and find that they simply can’t add up the tiny vectors fast enough?” The answer, though, isn’t that the pilot might struggle to correct their course, but rather that the trajectory that you thought led to the moon takes the rocket wildly off-course. The point of developing calculus is not to allow the pilot to make course corrections quickly; the point is to make it possible to discuss curved rocket trajectories in a world where the best tools available assume that rockets move in straight lines.
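To make that point concrete, here is a small numerical sketch — our own illustration, with made-up launch numbers — of how quickly a straight-line model of motion drifts away from the gravity-curved path it is supposed to describe:

```python
import math

# A small numerical sketch — our illustration, with made-up launch numbers —
# contrasting a straight-line model of motion with the path that Newtonian
# gravity actually produces. Crude Euler integration is enough to show the gap.

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of Earth, kg
R_EARTH = 6.371e6    # mean radius of Earth, m

def integrate(pos, vel, dt=1.0, steps=600):
    """Euler-step a projectile under point-mass Earth gravity for steps*dt seconds."""
    x, y = pos
    vx, vy = vel
    for _ in range(steps):
        r = math.hypot(x, y)
        ax = -G * M_EARTH * x / r**3
        ay = -G * M_EARTH * y / r**3
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
    return x, y

start = (R_EARTH, 0.0)        # launch point on the surface
velocity = (0.0, 7000.0)      # fired "horizontally" at 7 km/s (hypothetical)

straight = (start[0], velocity[1] * 600)   # straight-line model's prediction
curved = integrate(start, velocity)        # where gravity actually takes it

print(f"straight-line model is off by ~{math.dist(straight, curved) / 1e3:.0f} km after 10 minutes")
```

The gap is not something a fast-reacting pilot patches up in flight; it reflects a model that cannot represent the trajectory in the first place.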

The case is similar with logical uncertainty. The problem is not that we visualize a specific AI system encountering a catastrophic failure because it mishandles logical uncertainty. The problem is that our best existing tools for analyzing rational agency assume that those agents are logically omniscient, making our best theories incommensurate with our best practical AI designs.3

At this stage, the goal of alignment research is not to solve particular engineering problems. The goal of early rocket alignment research would be to develop shared language and tools for generating and evaluating rocket trajectories, which would require developing calculus and celestial mechanics if they did not already exist. Similarly, the goal of AI alignment research is to develop shared language and tools for generating and evaluating the methods by which powerful AI systems can be designed to act as intended.

One might worry that it is difficult to set benchmarks of success for alignment research. Is a Newtonian understanding of gravitation sufficient for attempting a Moon landing, or must one develop a complete theory of general relativity before trusting oneself to land softly on the Moon?4

In the case of AI alignment, there is at least one obvious benchmark to focus on initially. Imagine we possessed an incredibly powerful computer with access to the internet, an automated factory, and large sums of money. If we could program that computer to reliably achieve some simple goal (such as producing as much diamond as possible), then a large share of the AI alignment research would be completed. This is because a large share of the problem is in understanding autonomous systems that are stable, error-tolerant, and demonstrably aligned with some goal. Developing the ability to confidently steer a rocket in some direction is harder than developing the additional ability to steer it towards a particular lunar location.

The pursuit of a goal such as this one is more or less MIRI’s approach to AI alignment research. We think of this as our version of the question, “Could you hit the Moon with a rocket if fuel and winds were no concern?” Answering that question, on its own, won’t ensure that smarter-than-human AI systems are aligned with our goals; but it would represent a major advance over our current knowledge, and it doesn’t look like the kind of basic insight that we can safely skip over.

What next?

Over the past year, we’ve seen a massive increase in attention towards the task of ensuring that future AI systems are robust and beneficial. AI safety work is being taken very seriously, and AI engineers are stepping up and acknowledging that safety engineering cannot be separated from capabilities engineering. It is becoming apparent that as the field of artificial intelligence matures, safety engineering will become a more and more firmly embedded part of AI culture. Meanwhile, new investigations of target selection and other safety questions will be showcased at an AI and Ethics workshop at AAAI-16, one of the larger annual conferences in the field.

A fourth variety of safety work is also receiving increased support: strategy research. If your nation is engaged in a cold war and locked in a space race, you might want to consult game theorists and strategists to ensure that your attempt to put a person on the Moon won’t upset a delicate political balance and lead to nuclear war.5 If international coalitions will be required in order to establish treaties regarding the use of space, then diplomacy may also become a relevant aspect of safety work. The same principles apply to AI, where coalition-building and global coordination may play an important role in how the technology is developed and used.

Strategy research has been on the rise this year. AI Impacts is producing strategic analyses relevant to the designers of this potentially world-changing technology, and will soon be joined by the Strategic Artificial Intelligence Research Centre. The new Leverhulme Centre for the Future of Intelligence will be pulling together people across many different disciplines to study the social impact of AI, forging new collaborations. The Global Priorities Project, meanwhile, is analyzing what types of interventions might be most effective at ensuring positive outcomes from the development of powerful AI systems.

The field is moving fast, and these developments are quite exciting. Throughout it all, though, AI alignment research in particular still seems largely under-served.

MIRI is not the only group working on AI alignment; a handful of researchers from other organizations and institutions are also beginning to ask similar questions. MIRI’s particular approach to AI alignment research is by no means the only one available — when first thinking about how to put humans on the Moon, one might want to consider both rockets and space elevators. Regardless of who does the research or where they do it, it is important that alignment research receive attention.

Smarter-than-human AI systems may be many decades away, and they may not closely resemble any existing software. This limits our ability to identify productive safety engineering approaches. At the same time, the difficulty of specifying our values makes it difficult to identify productive research in moral theory. Alignment research has the advantage of being abstract enough that it is likely to apply to a wide variety of future computing systems, while being concrete enough to admit of clear progress. For this reason, we think it is an area where the field of AI safety can ground itself without losing track of the problems most relevant to AI.

Safety engineering, moral theory, strategy, and general collaboration-building are all important parts of the project of developing safe and useful AI. On the whole, these areas look poised to thrive as a result of the recent rise in interest in long-term outcomes, and I’m thrilled to see more effort and investment going towards those important tasks.

The question is: What do we need to invest in next? The type of growth that I most want to see happen in the AI community next would be growth in AI alignment research, via the formation of new groups or organizations focused primarily on AI alignment and the expansion of existing AI alignment teams at MIRI, UC Berkeley, the Future of Humanity Institute at Oxford, and other institutions.

Before trying to land a rocket on the Moon, it’s important that we know how we would put a cannonball into a stable orbit. Absent a good theoretical understanding of rocket alignment, it might well be possible for a civilization to eventually reach escape velocity; but getting somewhere valuable and exciting and new, and getting there reliably, is a whole extra challenge.


My thanks to Eliezer Yudkowsky for introducing the idea behind this post, and to Lloyd Strohl III, Rob Bensinger, and others for helping review the content.


  1. Similarly, we could imagine a civilization that lives on the only planet in its solar system, or lives on a planet with perpetual cloud cover obscuring all objects except the Sun and Moon. Such a civilization might have an adequate understanding of terrestrial mechanics while lacking a model of celestial mechanics and lacking the knowledge that the same dynamical laws hold on Earth and in space. There would then be a gap in experts’ theoretical understanding of rocket alignment, distinct from gaps in their understanding of how to reach escape velocity.
  2. Roman Yampolskiy has used the term “AI safety engineering” to refer to the study of AI systems that can provide externally verifiable proofs of their own safety, which includes some of the theoretical research that we would file under “alignment research.” His usage differs from the usage here.
  3. Just as calculus is valuable both for building rockets that can reach escape velocity and for directing rockets towards specific lunar coordinates, a formal understanding of logical uncertainty might be useful both for improving AI capabilities and for improving the degree to which we can align powerful AI systems. The main motivation for studying logical uncertainty is that many other AI alignment problems are blocked on models of deductively limited reasoners, in the same way that trajectory-plotting could be blocked on models of curved paths.
  4. In either case, of course, we wouldn’t want to put a moratorium on the space program while we wait for a unified theory of quantum mechanics and general relativity. We don’t need a perfect understanding of gravity.
  5. This was a role historically played by the RAND Corporation.