White House submissions and report on AI safety


In May, the White House Office of Science and Technology Policy (OSTP) announced “a new series of workshops and an interagency working group to learn more about the benefits and risks of artificial intelligence.” They hosted a June Workshop on Safety and Control for AI (videos), along with three other workshops, and issued a general request for information on AI (see MIRI’s primary submission here).

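The OSTP has now released a report summarizing its conclusions, “Preparing for the Future of Artificial Intelligence,” and the result is very promising. The OSTP acknowledges the ongoing discussion about AI risk, and recommends “investing in research on longer-term capabilities and how their challenges might be managed”: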

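General AI (sometimes called Artificial General Intelligence, or AGI) refers to a notional future AI system that exhibits apparently intelligent behavior at least as advanced as a person across the full range of cognitive tasks. A broad chasm seems to separate today’s Narrow AI from the much more difficult challenge of General AI. Attempts to reach General AI by expanding Narrow AI solutions have made little headway over many decades of research. The current consensus of the private-sector expert community, with which the NSTC Committee on Technology concurs, is that General AI will not be achieved for at least decades.14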

People have long speculated on the implications of computers becoming more intelligent than humans. Some predict that a sufficiently intelligent AI could be tasked with developing even better, more intelligent systems, and that these in turn could be used to create systems with yet greater intelligence, and so on, leading in principle to an “intelligence explosion” or “singularity” in which machines quickly race far ahead of humans in intelligence.15

In a dystopian vision of this process, these super-intelligent machines would exceed the ability of humanity to understand or control. If computers could exert control over many critical systems, the result could be havoc, with humans no longer in control of their destiny at best and extinct at worst. This scenario has long been the subject of science fiction stories, and recent pronouncements from some influential industry leaders have highlighted these fears.

A more positive view of the future held by many researchers sees instead the development of intelligent systems that work well as helpers, assistants, trainers, and teammates of humans, and are designed to operate safely and ethically.

The NSTC Committee on Technology’s assessment is that long-term concerns about super-intelligent General AI should have little impact on current policy. The policies the Federal Government should adopt in the near-to-medium term if these fears are justified are almost exactly the same policies the Federal Government should adopt if they are not justified. The best way to build capacity for addressing the longer-term speculative risks is to attack the less extreme risks already seen today, such as current security, privacy, and safety risks, while investing in research on longer-term capabilities and how their challenges might be managed. Additionally, as research and applications in the field continue to mature, practitioners of AI in government and business should approach advances with appropriate consideration of the long-term societal and ethical questions – in addition to just the technical questions – that such advances portend. Although prudence dictates some attention to the possibility that harmful superintelligence might someday become possible, these concerns should not be the main driver of public policy for AI.

Later, the report discusses “methods for monitoring and forecasting AI developments”:

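One potentially useful line of research is to survey expert judgments over time. As one example, a survey of AI researchers found that 80 percent of respondents believed that human-level General AI will eventually be achieved, and half believed it is at least 50 percent likely to be achieved by the year 2040. Most respondents also believed that General AI will eventually surpass humans in general intelligence.50 While these particular predictions are highly uncertain, as discussed above, such surveys of expert judgment are useful, especially when they are repeated frequently enough to measure changes in judgment over time. One way to elicit frequent judgments is to run “forecasting tournaments” such as prediction markets, in which participants have financial incentives to make accurate predictions.51 Other research has found that technology developments can often be accurately predicted by analyzing trends in publication and patent data.52 […]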

When asked during the outreach workshops and meetings how government could recognize milestones of progress in the field, especially those that indicate the arrival of General AI may be approaching, researchers tended to give three distinct but related types of answers:

1. Success at broader, less structured tasks: In this view, the transition from present Narrow AI to an eventual General AI will occur by gradually broadening the capabilities of Narrow AI systems so that a single system can cover a wider range of less structured tasks. An example milestone in this area would be a housecleaning robot that is as capable as a person at the full range of routine housecleaning tasks.

2. Unification of different “styles” of AI methods: In this view, AI currently relies on a set of separate methods or approaches, each useful for different types of applications. The path to General AI would involve a progressive unification of these methods. A milestone would involve finding a single method that is able to address a larger domain of applications that previously required multiple methods.

3. Solving specific technical challenges, such as transfer learning: In this view, the path to General AI does not lie in progressive broadening of scope, nor in unification of existing methods, but in progress on specific technical grand challenges, opening up new ways forward. The most commonly cited challenge is transfer learning, which has the goal of creating a machine learning algorithm whose result can be broadly applied (or transferred) to a range of new applications.

The report also discusses the open problems outlined in “Concrete Problems in AI Safety” and cites the MIRI paper “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.”

In related news, Barack Obama recently answered some questions about AI risk and Nick Bostrom’s Superintelligence in a Wired interview. After saying that “we’re still a reasonably long way away” from general AI (video) and that his directive to his national security team is to worry more about near-term security concerns (video), Obama adds:

Now, I think, as a precaution — and all of us have spoken to folks like Elon Musk who are concerned about the superintelligent machine — there’s some prudence in thinking about benchmarks that would indicate some general intelligence developing on the horizon. And if we can see that coming, over the course of three decades, five decades, whatever the latest estimates are — if ever, because there are also arguments that this thing’s a lot more complicated than people make it out to be — then future generations, or our kids, or our grandkids, are going to be able to see it coming and figure it out.

There were also a number of interesting responses to the OSTP request for information. Since the document is long and unedited, I’ve sampled some of the responses pertaining to AI safety and long-term AI outcomes below. (Note that MIRI isn’t necessarily endorsing the responses by non-MIRI sources below, and a number of these excerpts are given important nuance by the surrounding text we’ve left out; if a response especially interests you, we recommend reading the original for added context.)

Respondent 77: JoEllen Lukavec Koester, GoodAI

[…]At GoodAI we are investigating suitable meta-objectives that would allow an open-ended, unsupervised evolution of the AGI system as well as guided learning – learning by imitating human experts and other forms of supervised learning. Some of these meta-objectives will be hard-coded from the start, but the system should be also able to learn and improve them on its own, that is, perform meta-learning, such that it learns to learn better in the future.

Teaching the AI system small skills using fine-grained, gradual learning from the beginning will allow us to have more control over the building blocks it will use later to solve novel problems. The system’s behaviour can therefore be more predictable. In this way, we can imprint some human thinking biases into the system, which will be useful for the future value alignment, one of the important aspects of AI safety. […]

Respondent 84: Andrew Critch, MIRI

[…]When we develop powerful reasoning systems deserving of the name “artificial general intelligence (AGI)”, we will need value alignment and/or control techniques that stand up to powerful optimization processes yielding what might appear as “creative” or “clever” ways for the machine to work around our constraints. Therefore, in training the scientists who will eventually develop it, more emphasis is needed on a “security mindset”: namely, to really know that a system will be secure, you need to search creatively for ways in which it might fail. Lawmakers and computer security professionals learn this lesson naturally, from experience with intelligent human adversaries finding loopholes in their control systems. In cybersecurity, it is common to devote a large fraction of R&D time toward actually trying to break into one’s own security system, as a way of finding loopholes.

In my estimation, machine learning researchers currently have less of this inclination than is needed for the safe long-term development of AGI. This can be attributed in part to how the field of machine learning has advanced rapidly of late: via a successful shift of attention toward data-driven (“machine learning”) rather than theoretically-driven (“good old fashioned AI”, “statistical learning theory”) approaches. In data science, it’s often faster to just build something and see what happens than to try to reason from first principles to figure out in advance what will happen. While useful at present, of course we should not approach the final development of super-intelligent machines with the same try-it-and-see methodology, and it makes sense to begin developing a theory now that can be used to reason about a super-intelligent machine in advance of its operation, even in testing phases. […]

Respondent 90: Ian Goodfellow, OpenAI

[…]Over the very long term, it will be important to build AI systems which understand and are aligned with their users’ values. We will need to develop techniques to build systems that can learn what we want and how to help us get it without needing specific rules. Researchers are beginning to investigate this challenge; public funding could help the community address the challenge early rather than trying to react to serious problems after they occur. […]

Respondent 94: Manuel Beltran, Boeing

[…]Advances in picking apart the brain will ultimately lead to, at best, partial brain emulation, at worst, whole brain emulation. If we can already model parts of the brain with software, neuromorphic chips, and artificial implants, the path to greater brain emulation is pretty well set. Unchecked, brain emulation will exasperate the Intellectual Divide to the point of enabling the emulation of the smartest, richest, and most powerful people. While not obvious, this will allow these individuals to scale their influence horizontally across time and space. This is not the vertical scaling that an AGI, or Superintelligence can achieve, but might be even more harmful to society because the actual intelligence of these people is limited, biased, and self-serving. Society must prepare for and mitigate the potential for the Intellectual Divide.

(5) The most pressing, fundamental questions in AI research, common to most or all scientific fields include the questions of ethics in pursuing an AGI. While the benefits of narrow AI are self-evident and should not be impeded, an AGI has dubious benefits and ominous consequences. There needs to be long term engagement on the ethical implications of an AGI, human brain emulation, and performance enhancing brain implants. […]

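The AGI research community speaks of an AI that will far surpass human intelligence. It is not clear how such an entity would assess its creators. Without meandering into the philosophical debates about how such an entity would benefit or harm humanity, one of the mitigations proposed by proponents of an AGI is that the AGI would be taught to “like” humanity. If there is machine learning to be accomplished along these lines, then the AGI research community requires training data that can be used for teaching the AGI to like humanity. This is a long term need that will overshadow all other activities and has already proven to be very labor intensive, as we have seen from the first prototype AGI, Dr. Kristinn R. Thórisson’s Aera S1 at Reykjavik University in Iceland.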

Respondent 97: Nick Bostrom, Future of Humanity Institute

[… W]e would like to highlight four “shovel ready” research topics that hold special promise for addressing long term concerns:

Scalable oversight: How can we ensure that learning algorithms behave as intended when the feedback signal becomes sparse or disappears? (See Christiano 2016.) Resolving this would enable learning algorithms to behave as if under close human oversight even when operating with increased autonomy.

Interruptibility: How can we avoid the incentive for an intelligent algorithm to resist human interference in an attempt to maximise its future reward? (See our recent progress in collaboration with Google Deepmind in Orseau & Armstrong 2016.) Resolving this would allow us to ensure that even high capability AI systems can be halted in an emergency.

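Reward hacking: How can we design machine learning algorithms that avoid destructive solutions arising from taking their objective very literally? (See Ring & Orseau, 2011.) Resolving this would prevent algorithms from finding unintended shortcuts to their goal (for example, by causing problems in order to get rewarded for solving them).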

Value learning: How can we infer the preferences of human users automatically without direct feedback, especially if these users are not perfectly rational? (See Hadfield-Menell et al. 2016 and FHI’s approach to this problem in Evans et al. 2016.) Resolving this would alleviate some of the problems above caused by the difficulty of precisely specifying robust objective functions. […]

Respondent 103: Tim Day, the Center for Advanced Technology and Innovation at the U.S. Chamber of Commerce

[…]AI operates within the parameters that humans permit. Hypothetical fears of rogue AI are based on the idea that machines can obtain sentience—a will and consciousness of its own. These suspicions fundamentally misunderstand what Artificial Intelligence is. AI is not a mechanical mystery, rather a human-designed technology that can detect and respond to errors and patterns depending on its operating algorithms and the data set presented to it. It is, however, necessary to scrutinize the way humans, whether through error or malicious intent, can wield AI harmfully. […]

Respondent 104: Alex Kozak, X [formerly Google X]

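[…] More broadly, we generally agree that the research topics identified in “Concrete Problems in AI Safety,” a joint publication between Google researchers and others in the industry, are the right technical challenges for innovators to keep in mind in order to develop better and safer real-world products: avoiding negative side effects (e.g. avoiding systems disturbing their environment in pursuit of their goals), avoiding reward hacking (e.g. cleaning robots simply covering up messes rather than cleaning them), creating scalable oversight (i.e. creating systems that are independent enough not to require constant supervision), enabling safe exploration (i.e. limiting the range of exploratory actions a system might take to a safe domain), and creating robustness from distributional shift (i.e. creating systems that are capable of operating well outside their training environment). […]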

Respondent 105: Stephen Smith, AAAI


There are two key issues with control of autonomous systems: speed and scale. AI-based autonomy makes it possible for systems to make decisions far faster and on a much broader scale than humans can monitor those decisions. In some areas, such as high speed trading in financial markets, we have already witnessed an “arms race” to make decisions as quickly as possible. This is dangerous, and government should consider whether there are settings where decision-making speed and scale should be limited so that people can exercise oversight and control of these systems.

Most AI researchers are skeptical about the prospects of “superintelligent AI”, as put forth in Nick Bostrom’s recent book and reinforced over the past year in the popular media in commentaries by other prominent individuals from non-AI disciplines. Recent AI successes in narrowly structured problems (e.g., IBM’s Watson, Google DeepMind’s Alpha GO program) have led to the false perception that AI systems possess general, transferrable, human-level intelligence. There is a strong need for improving communication to the public and to policy makers about the real science of AI and its immediate benefits to society. AI research should not be curtailed because of false perceptions of threat and potential dystopian futures. […]

As we move toward applying AI systems in more mission critical types of decision-making settings, AI systems must consistently work according to values aligned with prospective human users and society. Yet it is still not clear how to embed ethical principles and moral values, or even professional codes of conduct, into machines. […]

Respondent 111: Ryan Hagemann, Niskanen Center

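[…] AI is unlikely to herald the end times. It is not clear at this point whether a runaway malevolent AI, for example, is a real-world possibility. In the absence of any quantifiable risk along these lines, government officials should refrain from framing discussions of AI in alarming terms that suggest that there is a known, rather than entirely speculative, risk. Fanciful doomsday scenarios belong in science fiction novels and high-school debate clubs, not in serious policy discussions about an existing, mundane, and beneficial technology. Ours is already “a world filled with narrowly tailored artificial intelligence that no one recognizes. As the computer scientist John McCarthy once said: ‘As soon as it works, no one calls it AI anymore.’”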

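The beneficial consequences of advanced AI are on the horizon and potentially profound. A sampling of these possible benefits includes: improved diagnostics and screening for autism; disease prevention through genomic pattern recognition; bridging the genotype-phenotype divide in genetics, allowing scientists to glean a clearer picture of the relationship between genetics and disease, which could introduce a wave of more effective personalized medical care; and the development of new ways for the sight- and hearing-impaired to experience sight and sound. To be sure, many of these developments raise certain practical, safety, and ethical concerns. But there are already serious efforts underway by the private ventures developing these AI applications to anticipate and responsibly address these, as well as more speculative, concerns.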

Consider OpenAI, “a non-profit artificial intelligence research company.” OpenAI’s goal “is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” AI researchers are already thinking deeply and carefully about AI decision-making mechanisms in technologies like driverless cars, despite the fact that many of the most serious concerns about how autonomous AI agents make value-based choices are likely many decades out. Efforts like these showcase how the private sector and leading technology entrepreneurs are ahead of the curve when it comes to thinking about some of the more serious implications of developing true artificial general intelligence (AGI) and artificial superintelligence (ASI). It is important to note, however, that true AGI or ASI are unlikely to materialize in the near-term, and the mere possibility of their development should not blind policymakers to the many ways in which artificial narrow intelligence (ANI) has already improved the lives of countless individuals the world over. Virtual personal assistants, such as Siri and Cortana, or advanced search algorithms, such as Google’s search engine, are good examples of already useful applications of narrow AI. […]

The Future of Life Institute has observed that “our civilization will flourish as long as we win the race between the growing power of technology and the wisdom with which we manage it. In the case of AI technology … the best way to win that race is not to impede the former, but to accelerate the latter, by supporting AI safety research.” Government can play a positive and productive role in ensuring the best economic outcomes from developments in AI by promoting consumer education initiatives. By working with private sector developers, academics, and nonprofit policy specialists government agencies can remain constructively engaged in the AI dialogue, while not endangering ongoing developments in this technology.

Respondent 119: Sven Koenig, ACM Special Interest Group on Artificial Intelligence

[…]The public discourse around safety and control would benefit from demystifying AI. The media often concentrates on the big successes or failures of AI technologies, as well as scenarios conjured up in science fiction stories, and features the opinions of celebrity non-experts about future developments of AI technologies. As a result, parts of the public have developed a fear of AI systems developing superhuman intelligence, whereas most experts agree that AI technologies currently work well only in specialized domains, and notions of “superintelligences” and “technological singularity” that will result in AI systems developing super-human, broadly intelligent behavior is decades away and might never be realized. AI technologies have made steady progress over the years, yet there seem to be waves of exaggerated optimism and pessimism about what they can do. Both are harmful. For example, an exaggerated belief in their capabilities can result in AI systems being used (perhaps carelessly) in situations where they should not, potentially failing to fulfil expectations or even cause harm. The unavoidable disappointment can result in a backlash against AI research, and consequently fewer innovations. […]

Respondent 124: Huw Price, University of Cambridge, UK

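[…] 3. In his first paper[1] Good tries to estimate the economic value of an ultraintelligent machine. Looking for a benchmark for productive brainpower, he settles impishly on John Maynard Keynes. He notes that Keynes’ value to the economy had been estimated at 100 thousand million British pounds, and suggests that the machine might be good for a million times that – a mega-Keynes, as he puts it.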

4. But there’s a catch. “The sign is uncertain” – in other words, it is not clear whether this huge impact would be negative or positive: “The machines will create social problems, but they might also be able to solve them, in addition to those that have been created by microbes and men.” Most of all, Good insists that these questions need serious thought: “These remarks might appear fanciful to some readers, but to me they seem real and urgent, and worthy of emphasis outside science fiction.” […]

Respondent 136: Nate Soares, MIRI


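We believe that there are numerous promising avenues of foundational research which, if successful, could make it possible to get very strong guarantees about the behavior of advanced AI systems: stronger than many currently think is possible, in a time when the most successful machine learning techniques are often poorly understood. We believe that bringing together researchers in machine learning, program verification, and the mathematical study of formal agents would be a large step towards ensuring that highly advanced AI systems will have a robustly beneficial impact on society. […]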

In the long term, we recommend that policymakers make use of incentives to encourage designers of AI systems to work together cooperatively, perhaps through multinational and multicorporate collaborations, in order to discourage the development of race dynamics. In light of high levels of uncertainty about the future of AI among experts, and in light of the large potential of AI research to save lives, solve social problems, and serve the common good in the near future, we recommend against broad regulatory interventions in this space. We recommend that effort instead be put towards encouraging interdisciplinary technical research into the AI safety and control challenges that we have outlined above.

Respondent 145: Andrew Kim, Google Inc.

[…]No system is perfect, and errors will emerge. However, advances in our technical capabilities will expand our ability to meet these challenges.

To that end, we believe that solutions to these problems can and should be grounded in rigorous engineering research to provide the creators of these systems with approaches and tools they can use to tackle these problems. “Concrete Problems in AI Safety”, a recent paper from our researchers and others, takes this approach in questions around safety. We also applaud the work of researchers who – along with researchers like Moritz Hardt at Google – are looking at short-term questions of bias and discrimination. […]

Respondent 149: Anthony Aguirre, Future of Life Institute

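[… S]ocietally beneficial value alignment of AI is not automatic. Crucially, AI systems are designed not just to enact a set of rules, but rather to accomplish a goal in ways that the programmers may not have explicitly specified in advance. This leads to an unpredictability that can lead to adverse consequences. As AI pioneer Stuart Russell explains, “No matter how excellently an algorithm maximizes, and no matter how accurate its model of the world, a machine’s decisions may be ineffably stupid, in the eyes of an ordinary human, if its utility function is not well aligned with human values.” (2015).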

Since humans rely heavily on shared tacit knowledge when discussing their values, it seems likely that attempts to represent human values formally will often leave out significant portions of what we think is important. This is addressed by the classic stories of the genie in the lantern, the sorcerer’s apprentice, and Midas’ touch. Fulfilling the letter of a goal with something far afield from the spirit of the goal like this is known as “perverse instantiation” (Bostrom [2014]). This can occur because the system’s programming or training has not explored some relevant dimensions that we really care about (Russell 2014). These are easy to miss because they are typically taken for granted by people, and even trying with a lot of effort and a lot of training data, people cannot reliably think of what they’ve forgotten to think about.

The complexity of some AI systems in the future (and even now) is likely to exceed human understanding, yet as these systems become more effective we will have efficiency pressures to be increasingly dependent on them, and to cede control to them. It becomes increasingly difficult to specify a set of explicit rules that is robustly in accord with our values, as the domain approaches a complex open world model, operates in the (necessarily complex) real world, and/or as tasks and environments become so complex as to exceed the capacity or scalability of human oversight[.] Thus more sophisticated approaches will be necessary to ensure that AI systems accomplish the goals they are given without adverse side effects. See references Russell, Dewey, and Tegmark (2015), Taylor (2016), and Amodei et al. for research threads addressing these issues. […]

We would argue that a “virtuous cycle” has now taken hold in AI research, where both public and private R&D leads to systems of significant economic value, which underwrites and incentivizes further research. This cycle can leave insufficiently funded, however, research on the wider implications of, safety of, ethics of, and policy implications of, AI systems that are outside the focus of corporate or even many academic research groups, but have a compelling public interest. FLI helped to develop a set of suggested “Research Priorities for Robust and Beneficial Artificial Intelligence” along these lines (available at http://futureoflife.org/data/documents/research_priorities.pdf); we also support AI safety-relevant research agendas from MIRI (https://intelligence.org/files/TechnicalAgenda.pdf) and as suggested in Amodei et al. (2016). We would advocate for increased funding of research in the areas described by all of these agendas, which address problems in the following research topics: abstract reasoning about superior agents, ambiguity identification, anomaly explanation, computational humility or non-self-centered world models, computational respect or safe exploration, computational sympathy, concept geometry, corrigibility or scalable control, feature identification, formal verification of machine learning models and AI systems, interpretability, logical uncertainty modeling, metareasoning, ontology identification/refactoring/alignment, robust induction, security in learning source provenance, user modeling, and values modeling. […]

It’s exciting to see substantive discussion of AGI’s impact on society by the White House. The policy recommendations regarding AGI strike us as reasonable, and we expect these developments to help inspire a much more in-depth and sustained conversation about the future of AI among researchers in the field.