New paper: “Incorrigibility in the CIRL Framework”


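MIRI Assistant Research Fellow Ryan Carey has a new paper out discussing situations where good performance in Cooperative Inverse Reinforcement Learning (CIRL) tasks fails to imply that software agents will assist or cooperate with programmers.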

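The paper, titled “Incorrigibility in the CIRL Framework,” lays out four scenarios in which CIRL violates the four conditions for corrigibility defined in Soares et al. (2015). Abstract: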

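A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to model misspecification (e.g., in the case of programmer errors). We demonstrate this by presenting some Supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility.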

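We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is correctly implemented, as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attempt to attain these sorts of guarantees in a value learning framework.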

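The paper is a response to a paper by Hadfield-Menell, Dragan, Abbeel, and Russell, “The Off-Switch Game.” Hadfield-Menell et al. show that an AI system will be more responsive to human inputs when it is uncertain about its reward function and thinks that its human operator has more information about this reward function. Carey shows that the CIRL framework can be used to formalize the problem of corrigibility, and that the known assurances for CIRL systems given in “The Off-Switch Game” rely on strong assumptions about having an error-free CIRL system. With less idealized assumptions, a value learning agent may have beliefs that cause it to evade redirection from the human.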
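As a rough illustration of why the off-switch guarantee depends on the robot's beliefs, here is a minimal sketch of a heavily simplified one-shot off-switch game. It is not the paper's Supervised POMDP formalism and not one of Carey's four scenarios; it assumes a perfectly rational human who allows the proposed action if and only if its true utility U is non-negative, represents the robot's belief about U as a list of samples, and adds a small hypothetical cost `defer_cost` for waiting on the human. With a calibrated, uncertain belief, deferring is worth E[max(U, 0)], which is at least E[U], so the robot defers. With an overconfident (misspecified) belief, the human's input carries no information from the robot's point of view, and any cost of deferring makes acting unilaterally look better.

```python
import random
from statistics import mean


def off_switch_values(utility_samples, defer_cost=0.0):
    """Expected value of each option in a toy off-switch game.

    utility_samples: samples from the robot's belief about the utility U
    of its proposed action. Assumes (hypothetically) a rational human who
    permits the action iff U >= 0 and otherwise switches the robot off.
    """
    act = mean(utility_samples)  # act directly: receive U regardless of its sign
    off = 0.0  # switch itself off: receive 0
    # Defer to the human: receive U when U >= 0, else 0 (the human vetoes).
    defer = mean(max(u, 0.0) for u in utility_samples) - defer_cost
    return {"act": act, "off": off, "defer": defer}


def best_option(values):
    """Return the option with the highest expected value."""
    return max(values, key=values.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Calibrated but uncertain belief: U ~ Normal(0.5, 1).
    uncertain = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
    print(best_option(off_switch_values(uncertain, defer_cost=0.01)))  # defer
    # Overconfident belief: the robot is certain that U = 0.5.
    overconfident = [0.5] * 100
    print(best_option(off_switch_values(overconfident, defer_cost=0.01)))  # act
```

Note that the overconfident robot ignores the human even if the true U is negative: nothing in its own beliefs tells it that the human's veto would be valuable, which is the kind of failure the paper argues a small, separately verified shutdown mechanism should guard against.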

[T]he purpose of a shutdown button is to shut the AI system down in the event that all other assurances failed, e.g., in the event that the AI system is ignoring (for one reason or another) the instructions of the operators. If the designers of [the AI system] R have programmed the system so perfectly that the prior and [reward function] R are completely free of bugs, then the theorems of Hadfield-Menell et al. (2017) do apply. In practice, this means that in order to be corrigible, it would be necessary to have an AI system that was uncertain about all things that could possibly matter. The problem is that performing Bayesian reasoning over all possible worlds and all possible value functions is quite intractable. Realistically, humans will likely have to use a large number of heuristics and approximations in order to implement the system’s belief system and updating rules. […]

Soares et al. (2015) seem to want a shutdown button that works as a mechanism of last resort, to shut an AI system down in cases where it has observed and refused a programmer suggestion (and the programmers believe that the system is malfunctioning). Clearly, some part of the system must be working correctly in order for us to expect the shutdown button to work at all. However, it seems undesirable for the working of the button to depend on there being zero critical errors in the specification of the system’s prior, the specification of the reward function, the way it categorizes different types of actions, and so on. Instead, it is desirable to develop a shutdown module that is small and simple, with code that could ideally be rigorously verified, and which ideally works to shut the system down even in the event of large programmer errors in the specification of the rest of the system.

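In order to do this in a value learning framework, we require a value learning system that (i) is capable of having its actions overridden by a small verified module that watches for shutdown commands; (ii) has no incentive to remove, damage, or ignore the shutdown module; and (iii) has some small incentive to keep its shutdown module around; even under a broad range of cases where R, the prior, the set of available actions, etc. are misspecified.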
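Property (i) is the easy part to write down, and a toy version helps show why the module can be small enough to verify. The sketch below is hypothetical: the `ShutdownModule` class, the `NOOP` action, the `"shutdown"` command string, and the `run` loop are all illustrative names, not anything from the paper. The module latches once it sees a shutdown command and from then on replaces whatever the (possibly misspecified) policy proposes. What the sketch does not capture is properties (ii) and (iii): nothing here stops a capable agent from learning to disable or route around the wrapper, and obtaining those incentive guarantees is exactly the open difficulty the paper discusses.

```python
from dataclasses import dataclass
from typing import Callable, List

NOOP = "noop"  # hypothetical "do nothing" action used after shutdown


@dataclass
class ShutdownModule:
    """Toy override module, kept small so it could in principle be verified."""

    shutdown_requested: bool = False

    def receive(self, command: str) -> None:
        # Latch: once a shutdown command arrives, it is never cleared.
        if command == "shutdown":
            self.shutdown_requested = True

    def override(self, proposed_action: str) -> str:
        # Ignore the policy's proposal entirely once shutdown is requested.
        return NOOP if self.shutdown_requested else proposed_action


def run(policy: Callable[[int], str], commands: List[str],
        module: ShutdownModule) -> List[str]:
    """Step a policy through a command stream, filtering every action."""
    executed = []
    for step, command in enumerate(commands):
        module.receive(command)
        executed.append(module.override(policy(step)))
    return executed


if __name__ == "__main__":
    policy = lambda step: f"act{step}"  # stands in for an arbitrary, possibly buggy policy
    print(run(policy, ["", "", "shutdown", ""], ShutdownModule()))
    # ['act0', 'act1', 'noop', 'noop']
```

The design choice worth noting is that correctness of `override` does not depend on the policy, prior, or reward function at all; that independence is what "robust to misspecification of other system components" asks for, but only at the level of action filtering.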

Even if the utility function is learned, there is still a need for additional lines of defense against unintended failures. The hope is that this can be achieved by modularizing the AI system. For that purpose, we would need a model of an agent that will behave corrigibly in a way that is robust to misspecification of other system components.


August 2017 Newsletter


Research updates

General updates

News and links

July 2017 Newsletter


Several major mid-year MIRI updates: we received our largest donation to date, $1.01 million from an Ethereum investor! Our research priorities have also shifted somewhat, reflecting the addition of four new full-time researchers (Marcello Herreshoff, Sam Eisenstat, Tsvi Benson-Tilsen, and Abram Demski) and the departure of Patrick LaVictoire and Jessica Taylor.

Research updates

General updates

News and links

Updates to the research team, and a major donation


We have several major announcements to make, covering new developments in the two months since our 2017 strategy update:

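1. On May 30th, we received a surprise donation of $1.01 million from an Ethereum cryptocurrency investor. This is the largest contribution we have received to date, and it will have a major effect on our plans over the coming year.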

2. Two new full-time researchers are joining MIRI: Tsvi Benson-Tilsen and Abram Demski. This comes in the wake of Sam Eisenstat and Marcello Herreshoff’s addition to the team in May. We have also begun trialing engineers for our new software engineer job openings.

3. Two of our researchers have recently left: Patrick LaVictoire and Jessica Taylor, researchers who previously led work on our “Alignment for Advanced Machine Learning Systems” research agenda.

For more details, see below.



June 2017 Newsletter



Research updates

General updates

News and links

May 2017 Newsletter


Research updates

General updates

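  • Our strategy update discusses changes to our AI forecasts and research priorities, new outreach goals, a MIRI/DeepMind collaboration, and other news.
  • MIRI is hiring software engineers! If you're a programmer who is passionate about MIRI's mission and wants to directly support our research efforts, apply here to trial with us.
  • MIRI Assistant Research Fellow Ryan Carey has taken on an additional affiliation with the Centre for the Study of Existential Risk, and is also helping edit an issue of Informatica on superintelligence.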

News and links

2017 Updates and Strategy


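In our last strategy update (August 2016), Nate wrote that MIRI’s priorities were to make progress on our agent foundations agenda and begin work on our new “Alignment for Advanced Machine Learning Systems” agenda, to collaborate and communicate with other researchers, and to grow our research and ops teams.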

Since then, senior staff at MIRI have reassessed their views on how far off artificial general intelligence (AGI) is and concluded that shorter timelines are more likely than they were previously thinking. A few lines of recent evidence point in this direction, such as:[1]

  • AI research is becoming increasingly exciting and well-funded. This suggests that more top talent (in the next generation as well as the current generation) will probably turn their attention to AI.
  • AGI is attracting more academic attention, and is an explicitly stated goal of top AI groups like DeepMind, OpenAI, and FAIR. In particular, many researchers seem more open to thinking about general intelligence now than they did a few years ago.
  • Research groups associated with AGI are showing much clearer external signs of profitability.
  • AI successes like AlphaGo indicate that it’s easier to outperform top humans in domains like Go (without any new conceptual breakthroughs) than might have been expected.[2] This lowers our estimate for the number of significant conceptual breakthroughs needed to rival humans in other domains.

There’s no consensus among MIRI researchers on how long timelines are, and our aggregated estimate puts medium-to-high probability on scenarios in which the research community hasn’t developed AGI by, e.g., 2035. On average, however, research staff now assign moderately higher probability to AGI’s being developed before 2035 than we did a year or two ago. This has a few implications for our strategy:

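1. Our relationships with current key players in AGI safety and capabilities play a larger role in our strategic thinking. Short-timeline scenarios reduce the expected number of important new players who will enter the space before we hit AGI, and increase how much influence current players are likely to have.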

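2. Our research priorities are somewhat different, since shorter timelines change what research paths are likely to pay off before we hit AGI, and also concentrate our probability mass more on scenarios where AGI shares various features in common with present-day machine learning systems.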

Both updates represent directions we’ve already been trending in for various reasons.[3] However, we’re moving in these two directions more quickly and confidently than we were last year. As an example, Nate is spending less time on staff management and other administrative duties than in the past (having handed these off to MIRI COO Malo Bourgon) and less time on broad communications work (having delegated a fair amount of this to me), allowing him to spend more time on object-level research, research prioritization work, and more targeted communications.[4]

I’ll lay out what these updates mean for our plans in more concrete detail below.



  1. Note that this list is far from exhaustive.
  2. Relatively general algorithms (plus copious compute) were able to surpass human performance on Go, going from incapable of winning against the worst human professionals in standard play to dominating the very best professionals within the space of a few months. The relevant development here wasn’t “AlphaGo represents a large conceptual advance over previously known techniques,” but rather “contemporary techniques run into surprisingly few obstacles when scaled to tasks as pattern-recognition-reliant and difficult (for humans) as professional Go”.
  3. The publication of “Concrete Problems in AI Safety” last year, for example, caused us to reduce the time we were spending on broad-based outreach to the AI community at large in favor of spending more time building stronger collaborations with researchers we knew at OpenAI, Google Brain, DeepMind, and elsewhere.
  4. Nate continues to set MIRI’s organizational strategy, and is responsible for the ideas in this post.

Software Engineer Internship / Staff Openings


The Machine Intelligence Research Institute is looking for highly capable software engineers to directly support our AI alignment research efforts, with a focus on projects related to machine learning. We’re seeking engineers with strong programming skills who are passionate about MIRI’s mission and looking for challenging and intellectually engaging work.

While our goal is to hire full-time, we are initially looking for paid interns. Successful internships may then transition into staff positions.

About the Internship Program

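The start time for interns is flexible, but we’re aiming for May or June. We will likely run several batches of internships, so if you are interested but unable to start in the next few months, do still apply. The length of the internship is flexible, but we’re aiming for 2-3 months.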

Examples of the kinds of work you’ll do during the internship:

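  • Replicate recent machine learning papers, and implement variations.
  • Learn about and implement machine learning tools (including results in the fields of deep learning, convex optimization, etc.).
  • Conduct various coding experiments and projects, either independently or in small groups.
  • Rapidly prototype, implement, and test AI alignment ideas related to machine learning (after demonstrating success on the points above).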

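For MIRI, the benefit of this program is that it’s a great way to get to know you and assess you as a potential hire. For applicants, the benefits are that this is an excellent opportunity to get your hands dirty and level up your machine learning skills, to get to the cutting edge of the AI safety field, and potentially to stay on in a full-time engineering role after the internship concludes.

Our goal is to trial many more people than we expect to hire, so our threshold for keeping on engineers long-term as full staff will be higher than for accepting applicants to our internship.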

理想的候选人

理想候选人的某些素质:

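  • Extensive breadth and depth of programming skills. Machine learning experience is not required, though it is a plus.
  • Strong familiarity with the basic ideas related to AI alignment.
  • Able to work independently with minimal supervision, and in team/group settings.
  • Willing to accept below-market rates. Since MIRI is a non-profit, we can’t compete with the big names in the Bay Area.
  • Enthusiastic about working at MIRI and helping advance the field of AI alignment.
  • Not looking for a “generic” software engineering position.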

Working at MIRI

We strive to make working at MIRI a rewarding experience.

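  • Modern work spaces - Many of us have adjustable standing desks with large external monitors. We consider workspace ergonomics important, and try to rig up workstations to be as comfortable as possible. Free snacks, drinks, and meals are also provided at our office.
  • Flexible hours - We don’t have strict office hours, and we don’t limit employees’ vacation days. Our goal is to make rapid progress on our research agenda, and we would prefer that staff take a day off rather than extend tasks to fill an extra day.
  • Living in the Bay Area - MIRI’s office is located in downtown Berkeley, California. From our office, you’re a 30-second walk from BART (Bay Area Rapid Transit), which can get you around the Bay Area; a 3-minute walk from the UC Berkeley campus; and a 30-minute BART ride from downtown San Francisco.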

EEO and Employment Eligibility

MIRI is an equal opportunity employer. We are committed to making employment decisions based on merit and value. This commitment includes complying with all federal, state, and local laws. We desire to maintain a work environment free of harassment or discrimination due to sex, race, religion, color, creed, national origin, sexual orientation, citizenship, physical or mental disability, marital status, familial status, ethnicity, ancestry, status as a victim of domestic violence, age, or any other status protected by federal, state, or local laws.

Apply

If interested, click here to apply. For questions or comments, email engineering@intelligence.org.

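Update (December 2017): We’re now putting less emphasis on finding interns, and are looking for highly skilled engineers available for full-time work. Updated job post here.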