Two new researchers join MIRI

||News

MIRI’s research team is growing! I’m happy to announce that we’ve hired two new research fellows to contribute to our work on AI alignment: Sam Eisenstat and Marcello Herreshoff, both from Google.

Sam Eisenstat studied pure mathematics at the University of Waterloo, where he carried out research in mathematical logic. His previous work was on the automatic construction of deep learning models at Google.

Sam’s research focuses on questions related to the foundations of reasoning and agency; he is particularly interested in exploring analogies between the current foundations of logical uncertainty and those of Bayesian reasoning. He has also done work on decision theory and counterfactuals. His work with MIRI includes “Asymptotic Decision Theory,” “Limit-Computable, Self-Reflective Distributions,” and “A Counterexample to an Informal Conjecture on Proof Length and Logical Counterfactuals.”

Marcello Herreshoff studied at Stanford, receiving a B.S. in Mathematics with Honors and earning two honorable mentions in the Putnam Competition, the world’s most highly regarded university-level math competition. Marcello then spent five years as a software engineer at Google, gaining a background in machine learning.

Marcello is one of MIRI’s earliest research collaborators, and attended our very first research workshop alongside Eliezer Yudkowsky, Paul Christiano, and Mihály Bárász. Marcello has worked with us in the past to help produce results such as “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem,” “Definability of Truth in Probabilistic Logic,” and “Tiling Agents for Self-Modifying AI.” His research interests include logical uncertainty and the design of reflective agents.

Sam and Marcello will be starting with us in the first two weeks of April. This marks the beginning of our first wave of new research fellowships since 2015, though we more recently added Ryan Carey to the team on an assistant research fellowship (in mid-2016).

We have additional plans to expand our research team in the coming months, and will soon be hiring for a more diverse set of technical roles at MIRI — details forthcoming!

2016 in review

||MIRI Strategy

It’s time again for my annual review of MIRI’s activities.1 In this post, I’ll provide a summary of what we did in 2016, review how our activities compare to our previous goals and predictions, and reflect on how our strategy over the past year fits into our mission as an organization. We’ll follow this post up in April with a strategy update for 2017.

After doubling the size of the research team in 2015,2 we slowed our growth in 2016 and focused on integrating the new additions into our team, making research progress, and writing up a backlog of existing results.

2016 was a big year for us on the research front, with some of the most notable contributions coming from our new researchers. Our biggest news was Scott Garrabrant’s logical inductors framework, which represents by a significant margin our largest progress to date on the problem of logical uncertainty. We additionally released “Alignment for Advanced Machine Learning Systems” (AAMLS), a new technical agenda spearheaded by Jessica Taylor.

We also spent this past year engaging with the wider AI community, e.g., through the month-long Colloquium Series on Robust and Beneficial Artificial Intelligence that we co-ran with the Future of Humanity Institute, and through talks and panel participation at many events over the course of the year.

Read more »


  1. See our previous reviews: 2015, 2014, 2013.
  2. From 2015 in review: “Patrick LaVictoire joined in March, Jessica Taylor in August, and Andrew Critch and Scott Garrabrant in December. With Nate transitioning to a non-research role, on the whole we grew from a three-person research team (Eliezer, Benya, and Nate) to a six-person team.”

New paper: “Cheating Death in Damascus”

||Papers

MIRI Executive Director Nate Soares and Rutgers/UIUC decision theorist Ben Levinstein have a new paper out introducing functional decision theory (FDT), MIRI’s proposal for a general-purpose decision theory.

The paper, titled “Cheating Death in Damascus,” considers a wide range of decision problems. In every case, Soares and Levinstein show that FDT outperforms all earlier theories in utility gained. The abstract reads:

Evidential and Causal Decision Theory are the leading contenders as theories of rational action, but both face fatal counterexamples. We present some new counterexamples, including one in which the optimal action is causally dominated. We also present a novel decision theory, Functional Decision Theory (FDT), which simultaneously solves both sets of counterexamples.

Rather than considering which physical action of theirs would give rise to the best outcomes, FDT agents consider which output of their decision function would give rise to the best outcome. This theory relies on a notion of subjunctive dependence, where multiple implementations of the same mathematical function are considered (even counterfactually) to have identical results for logical rather than causal reasons. Taking these subjunctive dependencies into account allows FDT agents to outperform CDT and EDT agents in, e.g., the presence of accurate predictors. While not necessary for considering classic decision theory problems, we note that a full specification of FDT will require a non-trivial theory of logical counterfactuals and algorithmic similarity.

“Death in Damascus” is a standard decision-theoretic dilemma. In it, a trustworthy predictor (Death) promises to find you and bring your demise tomorrow, whether you stay in Damascus or flee to Aleppo. Fleeing to Aleppo is costly and provides no benefit, since Death, having predicted your future location, will then simply come for you in Aleppo instead of Damascus.

In spite of this, causal decision theory often recommends fleeing to Aleppo — for much the same reason it recommends defecting in the one-shot twin prisoner’s dilemma and two-boxing in Newcomb’s problem. CDT agents reason that Death has already made its prediction, and that switching cities therefore can’t cause Death to learn your new location. Even though the CDT agent recognizes that Death is inescapable, the CDT agent’s decision rule forbids taking this fact into account in reaching decisions. As a consequence, the CDT agent will happily give up arbitrary amounts of utility in a pointless flight from Death.

Causal decision theory fails in Death in Damascus, Newcomb’s problem, and the twin prisoner’s dilemma — and also in the “random coin,” “Death on Olympus,” “asteroids,” and “murder lesion” dilemmas described in the paper — because its counterfactuals only track its actions’ causal impact on the world, and not the rest of the world’s causal (and logical, etc.) structure.
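
To make this failure mode concrete, here is a minimal Python sketch of Death in Damascus under illustrative assumptions (the payoff numbers and function names are ours, not the paper’s): the CDT agent evaluates actions against a fixed belief about Death’s location, while the FDT agent treats Death’s prediction as subjunctively dependent on its own decision function’s output.

```python
# Toy model of Death in Damascus. Assumed payoffs: dying costs 100 utility,
# and fleeing to Aleppo costs an extra 1. These numbers are illustrative.

DEATH_COST = 100
TRAVEL_COST = 1
ACTIONS = ["damascus", "aleppo"]

def utility(action, death_location):
    u = -TRAVEL_COST if action == "aleppo" else 0
    if action == death_location:
        u -= DEATH_COST
    return u

def cdt_choice(p_death_in_damascus):
    """CDT holds its belief about Death's location fixed, since switching
    cities can't *cause* the prediction to change. It therefore always
    prefers whichever city it currently thinks Death is less likely to be in."""
    def causal_eu(action):
        return (p_death_in_damascus * utility(action, "damascus")
                + (1 - p_death_in_damascus) * utility(action, "aleppo"))
    return max(ACTIONS, key=causal_eu)

def fdt_choice():
    """FDT notes that Death runs (a copy of) this very decision function,
    so Death's location logically matches whatever this function outputs.
    Death is unavoidable either way; FDT just avoids the travel cost."""
    def subjunctive_eu(action):
        return utility(action, death_location=action)
    return max(ACTIONS, key=subjunctive_eu)

print(cdt_choice(0.9))  # 'aleppo': CDT flees, paying a pointless cost
print(fdt_choice())     # 'damascus': FDT stays put
```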

While evidential decision theory succeeds in these dilemmas, it fails in a new decision problem, “XOR blackmail.”1 FDT consistently outperforms both of these theories, providing an elegant account of normative action for the full gamut of known decision problems.

Read more »


  1. Just as the variants on Death in Damascus in Soares and Levinstein’s paper help clarify CDT’s particular point of failure, XOR blackmail pinpoints EDT’s failure point more precisely than past decision problems have. In particular, EDT cannot be modified to avoid XOR blackmail in the ways it can be modified to smoke in the smoking lesion problem.

March 2017 Newsletter

||Newsletters

Research updates

General updates

  • Why AI Safety?: A quick summary (originally posted during our fundraiser) of the case for working on AI risk, including notes on distinctive features of our approach and our goals for the field.
  • Nate Soares participated in “Envisioning and Addressing Adverse AI Outcomes,” an event pitting red-team attackers against defenders in a variety of AI risk scenarios.
  • We also attended an AI safety strategy retreat run by the Center for Applied Rationality.

News and links

  • Ray Arnold provides a useful list of ways the average person can help with AI safety.
  • New from OpenAI: Attacking Machine Learning with Adversarial Examples.
  • OpenAI researcher Paul Christiano explains his view of human intelligence:
    I think of my brain as a machine driven by a powerful reinforcement learning agent. The RL agent chooses what thoughts to think, which memories to store and retrieve, where to direct my attention, and how to move my muscles.

    The “I” who speaks and deliberates is implemented by the RL agent, but is distinct and has different beliefs and desires. My thoughts are outputs and inputs to the RL agent; they are not what the RL agent “feels like from the inside.”

  • Christiano describes three directions and desiderata for AI control: reliability and robustness, reward learning, and deliberation and amplification.
  • Sarah Constantin argues that existing techniques won’t scale up to artificial general intelligence absent major conceptual breakthroughs.
  • The Future of Humanity Institute and the Centre for the Study of Existential Risk ran a “Bad Actors and AI” workshop.
  • FHI is seeking interns in reinforcement learning and AI safety.
  • Michael Milford argues against brain-computer interfaces as an AI risk strategy.
  • Open Philanthropy Project head Holden Karnofsky explains why he sees fewer benefits to public discourse than he used to.

Using machine learning to address AI risk

||Analysis Video

At the EA Global 2016 conference, I gave a talk on “Using Machine Learning to Address AI Risk”:

It is plausible that future artificial general intelligence systems will share many qualities in common with present-day machine learning systems. If so, how could we ensure that these systems robustly act as intended? We discuss the technical agenda for a new project at MIRI focused on this question.

A recording of my talk is now up online:

The talk serves as a quick survey (for a general audience) of the kinds of technical problems we’re working on under the “Alignment for Advanced Machine Learning Systems” research agenda. Included below is a version of the talk in blog post form.1

Talk outline:

1. Goal of this research agenda

2. Six potential problems with highly capable AI systems

2.1. Actions are hard to evaluate
2.2. Ambiguous test examples
2.3. Difficulty imitating human behavior
2.4. Difficulty specifying goals about the real world
2.5. Negative side-effects
2.6. Edge cases that still satisfy the goal

3. Technical details on one problem: inductive ambiguity identification

3.1. KWIK learning (see the sketch after this outline)
3.2. A Bayesian view of the problem

4. Other agendas
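
For readers unfamiliar with the KWIK (“Knows What It Knows”) framework referenced in 3.1, here is a minimal Python sketch of the standard version-space learner for a finite hypothesis class; the toy threshold task and all names are our own illustrative assumptions, not code from the talk. The learner answers only when every hypothesis consistent with the data so far agrees, and otherwise admits uncertainty and requests a label — the same move an ambiguity-identifying AI system would make before deferring to a human.

```python
# Minimal KWIK-style learner for a finite hypothesis class with
# deterministic labels: either predict correctly or output None
# ("I don't know"); never silently guess wrong.

class KWIKLearner:
    def __init__(self, hypotheses):
        # Version space: all hypotheses still consistent with observations.
        self.consistent = list(hypotheses)

    def predict(self, x):
        predictions = {h(x) for h in self.consistent}
        if len(predictions) == 1:
            return predictions.pop()   # all consistent hypotheses agree
        return None                    # ambiguity detected: ask for a label

    def observe(self, x, y):
        # Discard hypotheses contradicted by the labeled example.
        self.consistent = [h for h in self.consistent if h(x) == y]

# Usage: the hypotheses are threshold functions on integers, and the truth
# is one of them. The number of None answers is bounded by the class size.
hypotheses = [lambda x, t=t: x >= t for t in range(5)]
learner = KWIKLearner(hypotheses)
truth = lambda x: x >= 3
for x in [0, 4, 2, 3, 1]:
    guess = learner.predict(x)
    if guess is None:                  # learner flags its own uncertainty
        learner.observe(x, truth(x))   # ...and learns from the true label
```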

Read more »


  1. I also gave a version of this talk at the MIRI/FHI Colloquium Series on Robust and Beneficial AI.

February 2017 Newsletter

||Newsletters

Following up on a post outlining some of the reasons MIRI researchers and OpenAI researcher Paul Christiano are pursuing different research directions, Jessica Taylor has written up the key motivations for MIRI’s highly reliable agent design research.



Research updates

General updates

  • We attended the Future of Life Institute’s Beneficial AI conference at Asilomar. See Scott Alexander’s recap. MIRI Executive Director Nate Soares was on a technical safety panel discussion with representatives from DeepMind, OpenAI, and academia (video), also featuring a back-and-forth with Yann LeCun, the head of Facebook’s AI research group (at 22:00).
  • MIRI staff and a number of top AI researchers are signatories on FLI’s new Asilomar AI Principles, which include cautions regarding arms races, value misalignment, recursive self-improvement, and superintelligent AI.
  • The Center for Applied Rationality recounts MIRI researcher origin stories and other cases where their workshops have significantly helped our work, along with examples of CFAR’s impact on other groups.
  • The Open Philanthropy Project has awarded a $32,000 grant to AI Impacts.
  • Andrew Critch spoke at Princeton’s ENVISION conference (video).
  • Matthew Graves has joined MIRI as a staff writer. See his first piece for our blog, a reply to “Superintelligence: The Idea That Eats Smart People.”
  • The audio version of Rationality: From AI to Zombies is temporarily unavailable due to the shutdown of Castify. However, fans are already putting together a new free recording of the full collection.

News and links

  • An Asilomar panel on superintelligence (video) gathers Elon Musk (OpenAI), Demis Hassabis (DeepMind), Ray Kurzweil (Google), Stuart Russell and Bart Selman (CHCAI), Nick Bostrom (FHI), Jaan Tallinn (CSER), Sam Harris, and David Chalmers.
  • Also from Asilomar: Russell on corrigibility (video), Bostrom on openness in AI (video), and LeCun on the path to general AI (video).
  • From MIT Technology Review: “AI Software Learns to Make AI Software”:
    Companies must currently pay a premium for machine-learning experts, who are in short supply. Jeff Dean, who leads the Google Brain research group, mused last week that some of the work of such workers could be supplanted by software. He described what he termed “automated machine learning” as one of the most promising research avenues his team was exploring.

CHCAI/MIRI research internships in AI safety

||News

We’re looking for talented, driven, and ambitious technical researchers for a summer research internship with the Center for Human-Compatible AI (CHCAI) and the Machine Intelligence Research Institute (MIRI).

About the research:

CHCAI is a research center based at UC Berkeley with PIs including Stuart Russell, Pieter Abbeel and Anca Dragan. CHCAI describes its goal as “to develop the conceptual and technical wherewithal to reorient the general thrust of AI research towards provably beneficial systems”.

MIRI is an independent research nonprofit located near the UC Berkeley campus, whose mission is to help ensure that smarter-than-human AI has a positive impact on the world.

CHCAI’s research focus includes work on inverse reinforcement learning and human-robot cooperation (link), while MIRI’s focus areas include task AI and computational reflection (link). Both groups are also interested in theories of (bounded) rationality that may help us develop a deeper understanding of general-purpose AI agents.

To apply:

1. Fill in the form here: https://goo.gl/forms/bde6xbbkwj1tgdbo1.

2. Send an email to Beth.M.Barnes@gmail.com with the subject line “AI safety internship application”, attaching your CV, a piece of technical writing on which you were the primary author, and your research proposal.

Read more »

New paper: “Toward negotiable reinforcement learning”

||Papers

MIRI Research Fellow Andrew Critch has developed a new result in the theory of conflict resolution, described in “Toward negotiable reinforcement learning: Shifting priorities in Pareto optimal sequential decision-making.”

Abstract:

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time.

Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naïve linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
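
As a rough illustration of observation (2), the sketch below (our own toy numbers and function names, not code from the paper) rescales each player’s priority weight by the probability that player’s beliefs assigned to the machine’s latest input, then renormalizes. Over time, the player whose world-model predicts the inputs better comes to dominate the policy’s priorities, in the manner of a Bayesian update over which player’s beliefs are correct.

```python
# Toy version of the priority-shifting rule: weights are multiplied by each
# player's predictive likelihood for the latest observation, then renormalized.

def update_priorities(weights, likelihoods):
    """weights[i]: current priority on player i's expected utility.
    likelihoods[i]: probability player i's beliefs assigned to the
    latest observation. Returns the renormalized weights."""
    new = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new)
    return [w / total for w in new]

# Two players start with equal negotiated priority. Player 0's beliefs
# predict the observed inputs better, so priority shifts toward player 0.
weights = [0.5, 0.5]
for likelihoods in [(0.9, 0.4), (0.8, 0.5), (0.7, 0.3)]:
    weights = update_priorities(weights, likelihoods)
print(weights)  # roughly [0.89, 0.11]: player 0's interests now dominate
```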

Read more »