New paper: "Incorrigibility in the CIRL Framework"
MIRI Assistant Research Fellow Ryan Carey has a new paper out discussing situations where the cooperative inverse reinforcement learning (CIRL) framework fails to imply that software agents will assist or cooperate with programmers.
The paper, titled "Incorrigibility in the CIRL Framework," lays out four scenarios in which CIRL violates the four conditions for corrigibility defined in Soares et al. (2015). Abstract:
A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to models that are misspecified (e.g., in the case of programmer errors). We demonstrate this by presenting some supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility.
We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is implemented correctly, as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attain these guarantees in a value learning framework.
The paper is a response to the Hadfield-Menell, Dragan, Abbeel, and Russell paper "The Off-Switch Game." Hadfield-Menell et al. show that an AI system will be responsive to human inputs when it is uncertain about its reward function and believes that its human operator has more information about this reward function. Carey shows that the CIRL framework can be used to formalize the problem of corrigibility, and that the known guarantees for CIRL systems given in "The Off-Switch Game" rely on the strong assumption of an error-free CIRL system. With less idealized assumptions, a value learning agent may have beliefs that cause it to evade human redirection.
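The deference result from "The Off-Switch Game" can be illustrated with a toy expected-value calculation. This is only a sketch: the candidate utilities and the uniform prior are made-up illustrative numbers, not figures from the paper. The robot is uncertain about the utility U of its proposed action; if it defers, a rational human allows the action exactly when U is non-negative, so deferring weakly dominates both acting immediately and switching off.

```python
# Toy version of the off-switch game (after Hadfield-Menell et al., 2017).
# The robot's candidate utilities and prior below are illustrative
# assumptions, not values from the paper.
utilities = [-2.0, -0.5, 1.0, 3.0]
prior = [0.25, 0.25, 0.25, 0.25]

# Option 1: act immediately, collecting the (unknown) utility U.
ev_act = sum(p * u for p, u in zip(prior, utilities))

# Option 2: switch itself off (payoff 0 by convention).
ev_off = 0.0

# Option 3: defer to a rational human, who lets the action proceed
# iff U >= 0 and otherwise presses the off switch (payoff 0).
ev_defer = sum(p * max(u, 0.0) for p, u in zip(prior, utilities))

assert ev_defer >= max(ev_act, ev_off)  # deferring weakly dominates
print(ev_act, ev_defer)  # 0.375 vs. 1.0 under this prior
```

Carey's point is that this guarantee depends on the prior and reward parameterization being error-free: with a misspecified prior that (say) places no mass on negative utilities, ev_defer collapses to ev_act and the incentive to defer disappears.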
[T]he purpose of a shutdown button is to shut the AI system down in the event that all other assurances failed, e.g., if the AI system is ignoring (for whatever reason) the instructions of its operators. If the designers of [the AI system] have programmed the system so perfectly that the prior and [reward function] R are completely free of bugs, then the theorems of Hadfield-Menell et al. (2017) do apply. In practice, this means that in order to be corrigible, it would be necessary to have an AI system that was uncertain about all things that could possibly matter. The problem is that performing Bayesian reasoning over all possible worlds and all possible value functions is quite intractable. Realistically, humans will likely have to use a large number of heuristics and approximations in order to implement the system's belief system and updating rules. […]
Soares et al. (2015) seem to want a shutdown button that works as a mechanism of last resort, to shut an AI system down in cases where it has observed and refused a programmer suggestion (and the programmers believe that the system is malfunctioning). Clearly, some part of the system must be working correctly in order for us to expect the shutdown button to work at all. However, it seems undesirable for the working of the button to depend on there being zero critical errors in the specification of the system's prior, the specification of the reward function, the way it categorizes different types of actions, and so on. Instead, it is desirable to develop a shutdown module that is small and simple, with code that could ideally be rigorously verified, and which ideally works to shut the system down even in the event of large programmer errors in the specification of the rest of the system.
To do this in a value learning framework, we would need a value learning system that (i) is able to have its actions overridden by a small verified module that watches for shutdown commands; (ii) has no incentive to remove, damage, or ignore the shutdown module; and (iii) has some small incentive to keep its shutdown module around, even in a broad range of cases in which R, the prior, the set of available actions, etc. are misspecified.
Even if the utility function is learned, there is still a need for additional lines of defense against unintended failures. The hope is that this can be achieved by modularizing the AI system. For that purpose, we would need a model of an agent that will behave corrigibly in a way that is robust to misspecification of other system components.
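The modularization idea can be sketched in a few lines of code. This is a hypothetical illustration of condition (i) only: a small, separately inspectable module sits between the learned policy and the environment and overrides actions once a shutdown command is seen, independently of the agent's prior or reward function. Conditions (ii) and (iii) concern the agent's incentives and cannot be captured by a wrapper like this; all class and method names are invented for the example.

```python
class ShutdownModule:
    """Small, ideally formally verifiable component. It watches the
    command channel and, once a shutdown command is seen, replaces every
    action the agent proposes with a no-op. Its behavior does not depend
    on the agent's prior or reward function R, so it keeps working when
    those are misspecified."""

    def __init__(self) -> None:
        self.triggered = False

    def observe(self, command: str) -> None:
        if command == "SHUTDOWN":
            self.triggered = True  # latches: cannot be un-triggered

    def filter_action(self, proposed_action: str) -> str:
        return "no-op" if self.triggered else proposed_action


class Agent:
    """Stand-in for an arbitrary (possibly misspecified) learned policy."""

    def act(self, observation: str) -> str:
        return "optimize"


module = ShutdownModule()
agent = Agent()
for command in ["", "SHUTDOWN", ""]:
    module.observe(command)
    print(module.filter_action(agent.act(command)))
# prints "optimize", then "no-op" on every step after the command
```

The design choice mirrored here is that the override path is structurally outside the learned component, so verifying corrigible behavior reduces to verifying the small module rather than the whole belief system.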