New paper: "Incorrigibility in the CIRL Framework"
MIRI Assistant Research Fellow Ryan Carey has a new paper out discussing situations where the cooperative inverse reinforcement learning (CIRL) framework fails to imply that software agents will assist or cooperate with programmers.
The paper, titled "Incorrigibility in the CIRL Framework," lays out four scenarios in which CIRL violates the four conditions for corrigibility defined in Soares et al. (2015). Abstract:
A value learning system has incentives to follow shutdown instructions, assuming the shutdown instruction provides information (in the technical sense) about which actions lead to valuable outcomes. However, this assumption is not robust to models that are misspecified (e.g., in the case of programmer errors). We demonstrate this by presenting some supervised POMDP scenarios in which errors in the parameterized reward function remove the incentive to follow shutdown commands. These difficulties parallel those discussed by Soares et al. (2015) in their paper on corrigibility.
We argue that it is important to consider systems that follow shutdown commands under some weaker set of assumptions (e.g., that one small verified module is implemented correctly, as opposed to an entire prior probability distribution and/or parameterized reward function). We discuss some difficulties with simple ways to attain these guarantees in a value learning framework.
The paper is a response to the Hadfield-Menell, Dragan, Abbeel, and Russell paper "The Off-Switch Game." Hadfield-Menell et al. show that an AI system will be responsive to human inputs when it is uncertain about its reward function and believes that its human operator has more information about this reward function. Carey shows that the CIRL framework can be used to formalize the problem of corrigibility, and that the known guarantees for CIRL systems given in "The Off-Switch Game" rely on the strong assumption of an error-free CIRL system. With less idealized assumptions, a value learning agent may have beliefs that cause it to evade human redirection.
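The deference result from "The Off-Switch Game" can be illustrated with a toy expected-value calculation. This is only a sketch: the candidate utilities and the uniform prior are made-up illustrative numbers, not figures from the paper. The robot is uncertain about the utility U of its proposed action; if it defers, a rational human allows the action exactly when U is non-negative, so deferring weakly dominates both acting immediately and switching off.

```python
# Toy version of the off-switch game (after Hadfield-Menell et al., 2017).
# The robot's candidate utilities and prior below are illustrative
# assumptions, not values from the paper.
utilities = [-2.0, -0.5, 1.0, 3.0]
prior = [0.25, 0.25, 0.25, 0.25]

# Option 1: act immediately, collecting the (unknown) utility U.
ev_act = sum(p * u for p, u in zip(prior, utilities))

# Option 2: switch itself off (payoff 0 by convention).
ev_off = 0.0

# Option 3: defer to a rational human, who lets the action proceed
# iff U >= 0 and otherwise presses the off switch (payoff 0).
ev_defer = sum(p * max(u, 0.0) for p, u in zip(prior, utilities))

assert ev_defer >= max(ev_act, ev_off)  # deferring weakly dominates
print(ev_act, ev_defer)  # 0.375 vs. 1.0 under this prior
```

Carey's point is that this guarantee depends on the prior and reward parameterization being error-free: with a misspecified prior that (say) places no mass on negative utilities, ev_defer collapses to ev_act and the incentive to defer disappears.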
[T]he purpose of a shutdown button is to shut the AI system down in the event that all other assurances failed, e.g., if the AI system is ignoring (for whatever reason) the instructions of its operators. If the designers of [the AI system] have programmed the system so perfectly that the prior and [reward function] R are completely free of bugs, then the theorems of Hadfield-Menell et al. (2017) do apply. In practice, this means that in order to be corrigible, it would be necessary to have an AI system that was uncertain about all things that could possibly matter. The problem is that performing Bayesian reasoning over all possible worlds and all possible value functions is quite intractable. Realistically, humans will likely have to use a large number of heuristics and approximations in order to implement the system's belief system and updating rules. […]
Soares et al. (2015) seem to want a shutdown button that works as a mechanism of last resort, to shut an AI system down in cases where it has observed and refused a programmer suggestion (and the programmers believe that the system is malfunctioning). Clearly, some part of the system must be working correctly in order for us to expect the shutdown button to work at all. However, it seems undesirable for the working of the button to depend on there being zero critical errors in the specification of the system's prior, the specification of the reward function, the way it categorizes different types of actions, and so on. Instead, it is desirable to develop a shutdown module that is small and simple, with code that could ideally be rigorously verified, and which ideally works to shut the system down even in the event of large programmer errors in the specification of the rest of the system.
To do this in a value learning framework, we would need a value learning system that (i) is able to have its actions overridden by a small verified module that watches for shutdown commands; (ii) has no incentive to remove, damage, or ignore the shutdown module; and (iii) has some small incentive to keep its shutdown module around, even in a broad range of cases in which R, the prior, the set of available actions, etc. are misspecified.
Even if the utility function is learned, there is still a need for additional lines of defense against unintended failures. The hope is that this can be achieved by modularizing the AI system. For that purpose, we would need a model of an agent that will behave corrigibly in a way that is robust to misspecification of other system components.
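The modularization idea can be sketched in a few lines of code. This is a hypothetical illustration of condition (i) only: a small, separately inspectable module sits between the learned policy and the environment and overrides actions once a shutdown command is seen, independently of the agent's prior or reward function. Conditions (ii) and (iii) concern the agent's incentives and cannot be captured by a wrapper like this; all class and method names are invented for the example.

```python
class ShutdownModule:
    """Small, ideally formally verifiable component. It watches the
    command channel and, once a shutdown command is seen, replaces every
    action the agent proposes with a no-op. Its behavior does not depend
    on the agent's prior or reward function R, so it keeps working when
    those are misspecified."""

    def __init__(self) -> None:
        self.triggered = False

    def observe(self, command: str) -> None:
        if command == "SHUTDOWN":
            self.triggered = True  # latches: cannot be un-triggered

    def filter_action(self, proposed_action: str) -> str:
        return "no-op" if self.triggered else proposed_action


class Agent:
    """Stand-in for an arbitrary (possibly misspecified) learned policy."""

    def act(self, observation: str) -> str:
        return "optimize"


module = ShutdownModule()
agent = Agent()
for command in ["", "SHUTDOWN", ""]:
    module.observe(command)
    print(module.filter_action(agent.act(command)))
# prints "optimize", then "no-op" on every step after the command
```

The design choice mirrored here is that the override path is structurally outside the learned component, so verifying corrigible behavior reduces to verifying the small module rather than the whole belief system.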