Decisions are for making bad outcomes inconsistent


Nate Soares’ recent decision theory paper with Ben Levinstein, “Cheating Death in Damascus,” prompted some valuable questions and comments from an acquaintance (anonymized here). I’ve compiled Nate’s responses from the commenter’s emails below.

The discussion concerns functional decision theory (FDT), a newly proposed alternative to causal decision theory (CDT) and evidential decision theory (EDT). EDT says, “choose the most auspicious action”; CDT says, “choose the action with the best effects”; and FDT says, “choose the output of one’s decision algorithm that has the best effects across all instances of that algorithm.”
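
As a rough schematic (the notation below is illustrative, not the paper’s verbatim definitions), the three rules can be written as:

```latex
% Illustrative schematic of the three decision rules (our notation, not the paper's exact definitions).
\begin{align*}
\mathrm{EDT}(x) &= \operatorname*{arg\,max}_{a \in \mathcal{A}}\; \mathbb{E}\big[\,U \mid \mathrm{Act}=a,\; x\,\big]
  && \text{condition on the act as evidence}\\
\mathrm{CDT}(x) &= \operatorname*{arg\,max}_{a \in \mathcal{A}}\; \mathbb{E}\big[\,U \mid \operatorname{do}(\mathrm{Act}=a),\; x\,\big]
  && \text{causally intervene on the act}\\
\mathrm{FDT}(x) &= \operatorname*{arg\,max}_{a \in \mathcal{A}}\; \mathbb{E}\big[\,U \mid \operatorname{true}\!\big(\mathrm{FDT}(x)=a\big),\; x\,\big]
  && \text{intervene on the decision function itself}
\end{align*}
```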

FDT typically behaves similarly to CDT. In a one-shot prisoner’s dilemma between two agents who know they are following FDT, however, FDT parts ways with CDT and prescribes cooperation, on the grounds that each agent runs the same decision-making procedure, and that therefore each agent is effectively choosing for both agents at once.1

Below, Nate provides some of his own perspective on why FDT generally achieves higher utility than CDT and EDT. Some of the stances he sketches out here are stronger than the assumptions needed to justify FDT, but they should shed some light on why researchers at MIRI think FDT can help resolve a number of longstanding puzzles in the foundations of rational action.

Anonymous: This is great stuff! I’m behind on reading loads of papers and books for my research, but this came across my path and hooked me, which speaks highly of how interesting the content is and of the sense that this paper is making progress.

My general take is that you’re right that these problems need to be specified in more detail. My guess, though, is that once you do so, game theorists would get the right answer. Maybe that’s what FDT is: a way of disambiguating ambiguous games that leads to a formalism in which people like Pearl and myself can use our standard approaches to get the right answer.

I know there’s a lot of inertia in the “decision theory” language, so probably it doesn’t make sense to change. But if there were no such sunk costs, I would recommend a different framing. It’s not that people’s decision theories are wrong; it’s that they are unable to correctly formalize problems in which there are high-performance predictors. You show how to do that, using the idea of intervening on (i.e., choosing between putative outputs of) the algorithm, rather than intervening on actions. Everything else follows from a sufficiently precise and non-contradictory statement of the decision problem.

Probably the easiest move this line of work could make to ease this knee-jerk response of mine in defense of mainstream Bayesian game theory is to just be clear that CDT is not intended to capture mainstream Bayesian game theory. Rather, it is a model of one kind of response to a class of problems that are not normally considered, and for which existing methods are ambiguous.


Nate Soares: I don’t accept that characterization. My view is more like this: When you add accurate predictors to the Rube Goldberg machine that is the universe — which can in fact be done — the future of that universe can be determined by the behavior of the algorithm being predicted. The algorithm that we put in the “thing-being-predicted” slot can do significantly better if its reasoning on the subject of which actions to output respects the universe’s downstream causal structure (which is something CDT and FDT do, but which EDT neglects), and it can do better again if its reasoning also respects the world’s global logical structure (which is done by FDT alone).

We don’t yet know how to respect this broader class of dependencies in full generality, but we do know how to do so in many simple cases. And while FDT agrees with conventional decision theory and game theory in many simple situations, its prescriptions do seem to differ in nontrivial applications.

The main case where we can easily see that FDT is not just a better tool for formalizing game theorists’ traditional intuitions is in prisoner’s dilemmas. Game theory is pretty adamant about the fact that it’s rational to defect in a one-shot PD, whereas two FDT agents facing off in a one-shot PD will cooperate.

In particular, classical game theory employs a “common knowledge of shared rationality” assumption which, when you look closely at it, cashes out more or less as “common knowledge that all parties are using CDT and this axiom.” Game theory where common knowledge of shared rationality is defined to mean “common knowledge that all parties are using FDT and this axiom” gives substantially different results, such as cooperation in one-shot PDs.
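
Here is a minimal sketch of how the twin structure changes the calculation, with illustrative payoffs (the numbers and function names are assumptions for this example, not anything from the paper):

```python
# Sketch: one-shot prisoner's dilemma with illustrative payoffs.
# FDT, facing a twin running the same algorithm, evaluates joint outcomes (a, a);
# CDT holds the opponent's (predicted) action causally fixed.

PAYOFF = {  # (my_action, their_action) -> my utility
    ("C", "C"): 2, ("C", "D"): 0,
    ("D", "C"): 3, ("D", "D"): 1,
}

def fdt_twin_choice():
    # Both agents run this same procedure, so choosing an output chooses it for both at once.
    return max(["C", "D"], key=lambda a: PAYOFF[(a, a)])

def cdt_choice(predicted_opponent_action):
    # CDT treats the opponent's action as fixed, whatever it is predicted to be.
    return max(["C", "D"], key=lambda a: PAYOFF[(a, predicted_opponent_action)])

print(fdt_twin_choice())                 # 'C': (C, C) is worth 2, beating (D, D)'s 1
print(cdt_choice("C"), cdt_choice("D"))  # 'D' 'D': defection dominates against a fixed opponent
```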


A causal graph of the Death in Damascus problem for a CDT agent.2

匿名的:When I’ve read MIRI work on CDT in the past, it seemed to me to describe what standard game theorists mean by rationality. But at least in cases like Murder Lesion, I don’t think it’s fair to say that standard game theorists would prescribe CDT. It might be better to say that standard game theory doesn’t consider these kinds of settings, and there are multiple ways of responding to them, CDT being one.

But I also suspect that many of these perfect prediction problems are internally inconsistent, and so it’s irrelevant what CDT would prescribe, since the problem cannot arise. That is, it’s not reasonable to say game theorists would recommend such-and-such in a certain problem, when the problem postulates that the actor always has incorrect expectations; “all agents have correct expectations” is a core property of most game-theoretic problems.

Death in Damascus for CDT agents is a great example. In that problem, either Death does not find the CDT agent with certainty, or the CDT agent can never have correct beliefs about her own actions, or she cannot respond optimally to her own beliefs.

So the problem statement (“Death finds the agent with certainty”) rules out typical assumptions of a rational actor: that it has rational expectations (including about its own behavior), and that it can choose the preferred action in response to its beliefs. The agent can only have correct beliefs if she believes that she has such-and-such belief about which city she’ll end up in, but doesn’t select the action that is the best response to that belief.


Nate: I contest that last claim. The trouble is with the term “best response,” where you’re using CDT’s notion of what counts as a best response. According to FDT’s notion of “best response,” if we suppose the agent believes she will flee to Aleppo, then the best response to that belief in the Death in Damascus problem is to stay in Damascus.

In order to define what the best response to a problem is, we normally invoke a notion of counterfactuals — what are your available responses, and what do you think follows from them? But the question of how to set up those counterfactuals is the very point under contention.

So I’ll grant that if you define “best response” in terms of CDT’s counterfactuals, then Death in Damascus rules out the typical assumptions of a rational actor. If you use FDT’s counterfactuals (i.e., counterfactuals that respect the full range of subjunctive dependencies), however, then you get to keep all the usual assumptions of rational actors. We can say that FDT has the pre-theoretic advantage over CDT that it allows agents to exhibit sensible-seeming properties like these in a wider array of problems.


匿名的:The presentation of the Death in Damascus problem for CDT feels weird to me. CDT might also just turn up an error, since one of its assumptions is violated by the problem. Or it might cycle through beliefs forever… The expected utility calculation here seems to give some credence to the possibility of dodging death, which is assumed to be impossible, so it doesn’t seem to me to correctly reason in a CDT way about where death will be.

For some reason I want to defend the CDT agents here, and say it’s unfair to claim that they wouldn’t realize their strategy produces a contradiction in this problem (given the assumptions of rational beliefs and agency).


Nate: There are a few different things to note here. First is that my inclination is always to evaluate CDT as an algorithm: if you built a machine that follows the CDT equation to the very letter, what would it do?

The answer here, as you’ve rightly noted above, is that the CDT equation isn’t necessarily defined when the input is a problem like Death in Damascus, and I agree that simple definitions of CDT yield algorithms that would either enter an infinite loop or crash. The third alternative is that the agent notices the difficulty and engages in some sort of reflective-equilibrium-finding procedure; variants of CDT with this sort of patch were invented more or less independently by Joyce and Arntzenius to do exactly that. In the paper, we discuss the variants that run an equilibrium-finding procedure and show that the equilibrium is still unsatisfactory; but we probably should have been more explicit about the fact that vanilla CDT either crashes or loops.
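
A minimal sketch of that loop, with assumed dollar values for death and for the trip to Aleppo (not the paper’s exact numbers):

```python
# Sketch: why "vanilla" CDT oscillates in Death in Damascus.
# Believing "I will stay" implies Death waits in Damascus (Death predicts perfectly),
# and CDT then best-responds to that belief while holding Death's location fixed.

DEATH_COST = 1_000_000   # assumed disvalue of meeting Death
TRAVEL_COST = 1_000      # assumed cost of fleeing to Aleppo

def causal_eu(action, believed_death_location):
    dies = (action == believed_death_location)  # Death's location is held causally fixed
    return -(DEATH_COST if dies else 0) - (TRAVEL_COST if action == "Aleppo" else 0)

def cdt_best_response(believed_own_action):
    believed_death_location = believed_own_action  # Death's prediction tracks the agent
    return max(["Damascus", "Aleppo"],
               key=lambda a: causal_eu(a, believed_death_location))

belief = "Damascus"
for _ in range(6):
    belief = cdt_best_response(belief)
    print(belief)  # alternates Aleppo, Damascus, Aleppo, ... -- no stable pure answer
```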

Second, I grant that there remains a strong intuition that an agent should in some sense be able to reflect on their own instability, look at the problem statement, and say, “Aha, I see what’s going on here; Death will find me no matter what I choose; I’d better find some other way to make the decision.” However, this sort of response is explicitly ruled out by the CDT equation: CDT says you must evaluate your actions as if they were subjunctively independent of everything that doesn’t causally depend on them.

In other words, you’re correct that CDT agents know intellectually that they cannot escape Death, but the CDT equation requires agents to imagine that they can, and to act on that basis.

And, to be clear, it is not a strike against an algorithm for it to prescribe actions by reasoning about impossible scenarios — any deterministic algorithm attempting to reason about what it “should do” must imagine some impossibilities, because a deterministic algorithm has to reason about the consequences of doing lots of different things, but is in fact only going to do one thing.

The question at hand is which impossibilities are the right ones to imagine, and the claim is that in scenarios with accurate predictors, CDT prescribes imagining the wrong impossibilities, including impossibilities where it escapes Death.

Our human intuitions say that we should reflect on the problem statement and eventually realize that escaping Death is in some sense “too impossible to consider”. But this directly contradicts the advice of CDT. Following this intuition requires us to make our beliefs obey a logical-but-not-causal constraint in the problem statement (“Death is a perfect predictor”), which FDT agents can do but CDT agents can’t. On close examination, the “shouldn’t CDT realize this is wrong?” intuition turns out to be an argument for FDT in another guise. (Indeed, pursuing this intuition is part of how FDT’s predecessors were discovered!)

Third, I’ll note it’s an important virtue in general for decision theories to be able to reason correctly in the face of apparent inconsistency. Consider the following simple example:

An agent may take either $1 or $100. There is a very small but nonzero probability that a cosmic ray will spontaneously strike the agent’s brain in a way that causes them to do the opposite of whatever action they would normally have taken. If they learn that they have been hit by a cosmic ray, they will also need to visit the emergency room to make sure there is no lasting brain damage, at a cost of $1,000. Furthermore, the agent knows that they take the $100 if and only if they are hit by a cosmic ray.

When faced with this problem, an EDT agent reasons: “If I take the $100, then I must have been hit by a cosmic ray, which means I lose $900 on net. Therefore, I prefer the $1.” They then take the $1 (except in the rare cases where they are hit by a cosmic ray).

Since this is exactly what the problem statement says — “the agent knows that they take the $100 if and only if they are hit by the cosmic ray” — the problem is perfectly consistent, and so is EDT’s response to it. EDT cares only about correlations, not about dependencies; and so EDT agents are perfectly happy to buy into self-fulfilling prophecies, even when that means turning down large sums of money.

What happens when we try to pull this trick on a CDT agent? She says, “Like hell I only take the $100 if I’m hit by a cosmic ray!” and grabs the $100 — thereby revealing your problem statement to be inconsistent, if the agent runs CDT rather than EDT.

The claim that “the agent knows that they take the $100 if and only if they are hit by the cosmic ray” contradicts the definition of CDT, which requires that CDT agents refuse to leave free money on the table. As you may verify, FDT also renders the problem statement inconsistent, for similar reasons. The definition of EDT, on the other hand, is fully consistent with the problem as stated.
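
A minimal sketch of the two calculations, with an assumed prior probability for the cosmic ray:

```python
# Sketch: EDT conditions on the stated correlation "takes the $100 iff hit by the ray";
# CDT intervenes on the action and leaves the (tiny) ray probability unchanged.

P_RAY = 1e-6       # assumed prior probability of a cosmic-ray strike
ER_COST = 1_000    # emergency-room bill after a strike

def edt_value(amount_taken):
    # Per the problem statement, taking the $100 is perfect evidence of a ray strike.
    p_ray_given_action = 1.0 if amount_taken == 100 else 0.0
    return amount_taken - p_ray_given_action * ER_COST

def cdt_value(amount_taken):
    # Intervening on the action leaves the ray probability at its prior.
    return amount_taken - P_RAY * ER_COST

print(edt_value(100), edt_value(1))   # -900.0 vs 1.0  -> EDT takes the $1
print(cdt_value(100), cdt_value(1))   # ~99.999 vs ~0.999 -> CDT grabs the $100
```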

This means that if you try to put EDT into the above situation — controlling its behavior by telling it specific facts about itself — you will succeed; whereas if you try to put CDT into the above situation, you will fail, and the supposed facts will be revealed as lies. Whether or not the above problem statement is consistent depends on the algorithm that the agent runs, and the design of the algorithm controls the degree to which you can put that algorithm in bad situations.

We can think of this as a case of FDT and CDT succeeding in making a low-utility universe impossible, where EDT fails to make a low-utility universe impossible. The whole point of implementing a decision theory on a piece of hardware and running it is to make bad futures-of-our-universe impossible (or at least very unlikely). It’s a feature of a decision theory, and not a bug, for there to be some problems where one tries to describe a low-utility state of affairs and the decision theory says, “I’m sorry, but if you run me in that problem, your problem will be revealed as inconsistent”.3

This doesn’t contradict anything you’ve said; I say it only to highlight how little we can conclude from noticing that an agent is reasoning about an inconsistent state of affairs. Reasoning about impossibilities is the mechanism by which decision theories produce actions that force the outcome to be desirable, so we can’t conclude that an agent has been placed in an unfair situation from the fact that the agent is forced to reason about an impossibility.


A causal graph of the XOR blackmail problem for a CDT agent.4

Anonymous: Something still seems fishy to me about decision problems that assume perfect predictors. If I’m being predicted with 100% accuracy in the XOR blackmail problem, then this means that I can induce a contradiction. If I follow FDT and CDT’s recommendation of never paying, then I only receive a letter when I have termites. But if I pay, then I must be in the world where I don’t have termites, as otherwise there is a contradiction.

So, given that I have received a letter, it seems that I can intervene on the world in a way that changes my termite status. That is, the best strategy when starting out is to never pay, but the best strategy given that I receive a letter is to pay. The weirdness arises because I am able to intervene on the algorithm, yet facts about the world depend on my algorithm.

Not sure whether that confusion makes sense to you. My intuition says that these problems are often self-contradictory, at least when we stipulate 100% predictive performance. I would rather start from the ex ante situation, with specified probabilities of getting termites, and see whether it is possible to change one’s strategy (at the algorithm level) without changing the probability of termites, so as to maintain consistency of the prediction claim.


Nate: First, I’ll note that the problem works just fine if the prediction is only accurate 99% of the time. If the difference between the “termite cost” and the “payment cost” is high enough, the problem can go through even if the predictor is only accurate 51% of the time.

That said, the purpose of this example is to draw attention to some of the issues you’re raising here, which I think are easier to think about when we assume 100% predictive accuracy.

The claim I dispute is this one: “That is, the best strategy when starting is to never pay, but the best strategy given that I will receive a letter is to pay.” I claim that the best strategy given that you receive the letter is to not pay, because whether you pay has no effect on whether or not you have termites. Whenever you pay, no matter what you’ve learned, you’re basically just burning $1000.

That said, some branches of these decision problems are indeed inconsistent — though I claim this is true of any decision problem. In a deterministic universe with deterministic agents, all but one of the “possible actions” the agent “could take” will not in fact be taken, so given a sufficiently formal specification, all of those “possibilities” save one are actually inconsistent.

I also completely endorse the claim that this set-up allows the predicted agent to induce a contradiction. Indeed, I claim that all decision-making power comes from the ability to induce contradictions: the whole reason to write an algorithm that loops over actions, constructs models of outcomes that would follow from those actions, and outputs the action corresponding to the highest-ranked outcome is so that it is contradictory for the algorithm to output a suboptimal action.

That’s what computer programs are all about. You write the code in such a way that the only non-contradictory way for electricity to flow through the transistors is the way that gets your computer to finish your tax return.

In the case of the XOR blackmail problem, there are four “possible” worlds: LT (letter + termites), NT (no letter + termites), LN (letter + no termites), and NN (no letter + no termites).

The predictor, by dint of their accuracy, has put the universe into a state where the only consistent possibilities are either (LT, NN) or (LN, NT). You get to choose which of those pairs is consistent and which is contradictory. Clearly, you don’t have control over the probability of termites vs. no termites, so all you get to control is whether or not you receive the letter. The question, then, is whether you’re willing to pay $1,000 to ensure that the letter only shows up in worlds where you don’t have termites.

Even when you’re holding the letter in your hands, I claim that you should not say “if I pay I will have no termites”, because that is false — your action can’t affect whether you have termites. You should instead say:

I see two possibilities here. If my algorithm outputs pay, then in the XX% of worlds where I have termites I receive no letter and lose $1M, and in the (100 − XX)% of worlds where I do not have termites I lose $1k. If instead my algorithm outputs refuse, then in the XX% of worlds where I have termites I get this letter but only lose $1M, and in the other worlds I lose nothing. The latter mixture is preferable, so I do not pay.

You’ll notice that the agent in this line of reasoning is not updating on the fact that they’re holding the letter. They’re not saying, “Given that I know that I received the letter and that the universe is consistent…”
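
Here is a minimal sketch of that policy comparison, with an assumed termite probability standing in for the unspecified XX%:

```python
# Sketch: the letter arrives iff exactly one of "no termites and you pay" /
# "termites and you refuse" holds, so the policy fixes which worlds get a letter;
# it never changes whether the termites exist.

TERMITE_COST = 1_000_000
PAYMENT = 1_000

def expected_loss(policy_pays, p_termites):
    # With termites you lose $1M whether or not a letter arrives; the policy only
    # determines whether you also burn $1k in the no-termite worlds.
    loss_with_termites = TERMITE_COST
    loss_without_termites = PAYMENT if policy_pays else 0
    return p_termites * loss_with_termites + (1 - p_termites) * loss_without_termites

p = 0.05  # assumed probability of termites
print(expected_loss(policy_pays=True, p_termites=p))   # 50950.0
print(expected_loss(policy_pays=False, p_termites=p))  # 50000.0 -> refusing is better
```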

One way to think about this is to imagine the agent as not yet being sure whether or not they’re in a contradictory universe. They act like this might be a world in which they don’t have termites, and they received the letter; and in those worlds, by refusing to pay, they make the world they inhabit inconsistent — and thereby make this very scenario never-have-existed.

And this is correct reasoning! For when the predictor makes their prediction, they’ll visualize a scenario where the agent has no termites and receives the letter, in order to figure out what the agent would do. When the predictor observes that the agent would make that universe contradictory (by refusing to pay), they are bound (by their own commitments, and by their accuracy as a predictor) to send the letter only when you have termites.5

You’ll never find yourself in a contradictory situation in the real world, but when an accurate predictor is trying to figure out what you’ll do, they don’t yet know which situations are contradictory. They’ll therefore imagine you in situations that may or may not turn out to be contradictory (like “letter + no termites”). Whether or not you would force the contradiction in those cases determines how the predictor will in fact behave towards you.

The real world is never contradictory, but predictions made about you can certainly place you in contradictory hypotheticals. And if you want a given hypothetical world to imply a contradiction, then you have to be the kind of person who would force the contradiction if given the chance.

Or as I like to say — forcing the contradiction never works, but it always would’ve worked, and that’s enough.


Anonymous: The FDT algorithm is best ex ante. But if what you care about is the utility you get in your own life, rather than that of other instantiations of you, then upon hearing about FDT you should still do whatever is best for you, as CDT prescribes.


A causal graph of Newcomb’s problem for an FDT agent.6

Nate: If you have the ability to commit now to future behavior (and actually stick to that commitment), then it’s clearly in your interest to commit now to behaving like an FDT agent on all decision problems that begin in your future. I have made that commitment myself, for example. I’ve also made the stronger commitment covering decision problems that began in my past, but all CDT agents should at least agree about the problems that begin in the future.7

And I do believe that real-world humans like you and me can actually follow FDT’s prescriptions, even in cases where those prescriptions are fairly counterintuitive.

Consider a variant of Newcomb’s problem in which both boxes are transparent, so that you can already see whether box B is full before choosing whether to two-box. In this case, EDT joins CDT in two-boxing, because one-boxing can no longer serve to give the agent good news about its fortunes. But FDT agents still one-box, for the same reason they one-box in Newcomb’s original problem and cooperate in the prisoner’s dilemma: they imagine their algorithm controlling all instances of their decision procedure, including the past copy running in the mind of their predictor.
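
A minimal sketch, using the standard Newcomb payoffs and assuming the predictor simulates the agent’s policy on the “box B is full” observation:

```python
# Sketch: the transparent Newcomb problem. The predictor fills box B only if its
# simulation of the agent's policy one-boxes upon seeing a full box B.

BOX_A = 1_000        # always present
BOX_B = 1_000_000    # present only if the predictor expects one-boxing

def payoff(policy):
    # policy maps the observation "is box B full?" to "one-box" or "two-box"
    box_b_full = (policy(True) == "one-box")   # the predictor's simulation of the agent
    action = policy(box_b_full)                # the agent then acts on what it actually sees
    return (BOX_B if box_b_full else 0) + (BOX_A if action == "two-box" else 0)

one_boxer = lambda sees_full_b: "one-box"
two_boxer = lambda sees_full_b: "two-box"

print(payoff(one_boxer))  # 1_000_000: box B was filled, and the agent leaves the $1,000
print(payoff(two_boxer))  # 1_000: box B was left empty, and the agent takes only box A
```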

Now, suppose you’re standing in front of two full boxes in the transparent Newcomb problem. You might say to yourself, “I wish I could have committed beforehand, but now that the choice is before me, the tug of the extra $1,000 is just too strong,” and then decide that you were not actually capable of making binding precommitments. This is fine; the normatively correct decision theory might not be something that all human beings have the willpower to follow in real life, just as the correct moral theory could turn out to be something that some people lack the will to follow.8

That said, I believe that I’m quite capable of just acting like I committed to act. I don’t feel a need to go through any particular mental ritual in order to feel comfortable one-boxing. I can just decide to one-box and let the matter rest there.

I want to be the kind of agent that sees two full boxes, so that I can walk away rich. I care more about doing what works, and about achieving practical real-world goals, than I care about the intuitiveness of my local decisions. And in this decision problem, FDT agents are the only agents that walk away rich.

One way to understand this reasoning is that evolution graced me with a “just do what you promised to do” module. The same style of reasoning that allows me to actually follow through and one-box in Newcomb’s problem is the one that allows me to cooperate in prisoner’s dilemmas against myself — including dilemmas like “should I stick to my New Year’s resolution?”9 It is only misguided CDT philosophers, I claim, who (wrongly) insist that “rational” agents aren’t allowed to use that evolution-granted “just follow through on your promises” module.


Anonymous: A final point: I don’t know about counterlogicals, but a theory of functional similarity would seem to depend on the details of the algorithms.

E.g., we could have a model where their output is stochastic, but some parameters of that process are the same (such as expected value), and the action is stochastically drawn from some distribution with those parameter values. We could have a version of that, but where the parameter values depend on private information picked up since the algorithms split, in which case each agent would have to model the distribution of private info the other might have.

That seems pretty general; does that work? Is there a class of functional similarity that can not be expressed using that formulation?


Nate:As long as the underlying distribution can be an arbitrary Turing machine, I think that’s sufficiently general.

There are actually a few non-obvious technical hurdles here; namely, if agent A is basing their beliefs off of their model of agent B, who is basing their beliefs off of a model of agent A, then you can get some strange loops.

Consider for example the matching pennies problem: agent A and agent B will each place a penny on a table; agent A wants either HH or TT, while agent B wants either HT or TH. It’s non-trivial to ensure that both agents develop stable, accurate beliefs in games like this (as opposed to, e.g., diving into infinite loops).
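
A minimal sketch of that instability (the best-response functions here are illustrative):

```python
# Sketch: naive mutual modeling in matching pennies. Agent A wants the pennies to
# match; agent B wants a mismatch. If each best-responds to a deterministic model
# of the other, the "beliefs" cycle forever; the only fixed point is the 50/50
# mixed strategy, which is what the reflective-oracle construction recovers.

def a_best_response(b_choice):   # A wants to match B
    return b_choice

def b_best_response(a_choice):   # B wants to mismatch A
    return "T" if a_choice == "H" else "H"

a, b = "H", "H"
for _ in range(4):
    a = a_best_response(b)   # A models B and responds
    b = b_best_response(a)   # B models the updated A and responds
    print(a, b)              # cycles (H, T), (T, H), (H, T), ... -- never settles

# The stable object is a probability: with A playing H with probability p and B
# with probability q, the unique fixed point (the mixed Nash equilibrium) is p = q = 0.5.
```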

Our technical solution to this is reflective oracle machines, a class of probabilistic Turing machines with access to an oracle that can probabilistically answer questions about any other machine in the class (with access to the same oracle).

The paper “Reflective Oracles: A Foundation for Classical Game Theory” shows how to do this and shows that the relevant fixed points always exist. (And furthermore, in cases that can be represented in classical game theory, the fixed points always correspond to the mixed-strategy Nash equilibria.)

This more or less takes us from the question “how can agents with probabilistic information about one another’s source code hold stable beliefs about each other?” to game theory’s “common knowledge of rationality” axiom.10 One can also view this as a justification for that axiom, or as a generalization of it that says what to do in cases where the line between agent and environment gets blurry, or where one agent has significantly more computational resources than the other, and so on.

But, yes, when we study these kinds of problems concretely at MIRI, we tend to use models where each agent models the other as a probabilistic Turing machine, which seems roughly in line with what you’re suggesting here.


  1. CDT prescribes defection in this dilemma, on the grounds that one’s action cannot cause the other agent to cooperate. FDT outperforms CDT in Newcomblike dilemmas like these, while also outperforming EDT in other dilemmas, such as the smoking lesion problem and XOR blackmail.
  2. The agent’s predisposition determines whether they will flee to Aleppo or stay in Damascus, and also determines Death’s prediction about their decision. This allows Death to inescapably pursue the agent, making flight pointless; but CDT agents can’t incorporate this fact into their decision-making.
  3. There are some fairly natural ways of cashing out the Murder Lesion problem on which CDT accepts the problem as stated while FDT forces a contradiction, but we decided not to go into that in the paper.

    On a tangent, I’ll note that one of the most common defenses of CDT similarly turns on the idea that certain dilemmas are “unfair” to CDT — see, for example, David Lewis’ “Why Ain’cha Rich?”

    我t’s obviously possible to define decision problems that are “unfair” in the sense that they just reward or punish agents for having a certain decision theory. We can imagine a dilemma where a predictor simply guesses whether you’re implementing FDT, and gives you $1,000,000 if so. Since we can construct symmetric dilemmas that instead reward CDT agents, EDT agents, etc., these dilemmas aren’t very interesting, and can’t help us choose between theories.

    Dilemmas like Newcomb’s problem and Death in Damascus, however, don’t evaluate agents based on their decision theories. They evaluate agents based on their actions, and it is the job of a decision theory to determine which action is best. If it’s unfair to criticize CDT for making the wrong choice in problems like this, then it’s hard to see on what grounds we can criticize any agent for making a wrong choice in any problem, since one can always claim that one is merely at the mercy of one’s decision theory.

  4. Our paper describes the XOR blackmail problem as follows:

    An agent has been alerted to a rumor that her house has a terrible termite infestation, which would cost her $1,000,000 in damages. She doesn’t know whether this rumor is true. A greedy and accurate predictor with a strong reputation for honesty has learned whether or not it’s true, and drafts a letter:

    “I know whether or not you have termites, and I have sent you this letter iff exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.”

    The predictor then predicts what the agent would do upon receiving the letter, and sends the agent the letter iff exactly one of (i) or (ii) is true. Thus, the claim made by the letter is true. Assume the agent receives the letter. Should she pay up?

    In this case, EDT pays the blackmailer, while CDT and FDT refuse to pay. See the “Cheating Death in Damascus” paper for more details.

  5. Ben Levinstein notes that this can be compared to backward induction in game theory with common knowledge of rationality. You suppose that you are at some final decision node which, as it turns out, you can only actually reach if the players aren’t in fact rational.
  6. FDT agents intervene on their decision function, “FDT(P,G)”. The CDT version replaces this node with “Predisposition” and instead intervenes on “Act”.
  7. Specifically, the CDT-endorsed response here is: “Well, I’ll commit to acting like an FDT agent on future problems, but in one-shot prisoner’s dilemmas that began in my past, I’ll still defect against copies of myself”.

    The problem with this response is that it can cost you arbitrary amounts of utility, provided a clever blackmailer wishes to take advantage. Consider the retrocausal blackmail dilemma in “Toward Idealized Decision Theory”:

    There is a wealthy intelligent system and an honest AI researcher with access to the agent’s original source code. The researcher may deploy a virus that will cause $150 million in damages to both the AI system and the researcher, and which can only be deactivated if the agent pays the researcher $100 million. The researcher is risk-averse and will deploy the virus only upon becoming confident that the agent will pay. The agent knows the situation and has an opportunity to self-modify after the researcher acquires its original source code, but before the researcher decides whether or not to deploy the virus. (The researcher knows this, and must take it into account when making their prediction.)

    CDT pays the retrocausal blackmailer, even if it has the opportunity to precommit to do otherwise. FDT (which in any case has no need for precommitment mechanisms) refuses to pay. I cite the intuitive undesirability of this outcome to argue that one should follow FDT in full generality, as opposed to following CDT’s prescription that one should only behave in FDT-like ways in future dilemmas.

    The argument above must be made from a pre-theoretic vantage point, because CDT is internally consistent. There is no argument one could give to a true CDT agent that would cause it to want to use anything other than CDT in decision problems that began in its past.

    If examples like retrocausal blackmail have force (over and above the force of other arguments for FDT), it is because humans are not true CDT agents. We may endorse CDT on the basis of its theoretical and practical virtues, but the case for CDT is undermined if we find sufficiently serious flaws in it, where “flaws” are judged relative to more basic intuitions about what counts as good or bad reasoning. FDT’s advantages over CDT and EDT — properties like its greater theoretical simplicity and generality, and the greater utility its agents attain in standard dilemmas — are appeals made from that position of uncertainty about which decision theory is correct.

  8. In principle, it could even turn out that following the prescriptions of the correct decision theory is entirely impossible for humans. There is no law of logic saying that normatively correct decision-making must be compatible with arbitrary brain designs, human brain designs included. I wouldn’t bet on this being the case, but in that scenario learning the correct theory would still have practical import, since we could still build AI systems to follow the normatively correct theory.
  9. A New Year’s resolution that requires me to repeatedly follow through on a promise that I care about in the long run, but would prefer to ignore in the moment, can be modeled as a one-shot twin prisoner’s dilemma. In this case, the dilemma is temporally extended, and my “twins” are my own future selves, who I know reason more or less the same way I do.

    It’s conceivable that I could skimp on my diet today (“defect”) and let my future selves pick up my slack and stick to the diet (“cooperate”); but in practice, if I’m the kind of agent who isn’t willing today to sacrifice short-term comfort for long-term well-being, then I presumably won’t be that kind of agent tomorrow either, or the day after.

    Seeing that this is so, and lacking a way to force themselves or their future selves to follow through, CDT agents despair of promise-keeping and abandon their resolutions. FDT agents, seeing the same set of facts, do just the opposite: they resolve to cooperate today, knowing that their future selves will reason symmetrically and do the same.

  10. The paper above shows how to use reflective oracles with CDT as opposed to FDT, because (a) one battle at a time and (b) we don’t yet have a generic algorithm for computing logical counterfactuals, but we do have a generic algorithm for doing CDT-type reasoning.