The following is a fictional dialogue building off ofAI对齐:为什么很难以及从哪里开始

(AMBER, a philanthropist interested in a more reliable Internet, andCORAL, a computer security professional, are at a conference hotel together discussing what Coral insists is a difficult and important issue: the difficulty of building “secure” software.)

AMBER:所以, Coral, I understand that you believe it is very important, when creating software, to make that software be what you call “secure”.

CORAL:Especially if it’s connected to the Internet, or if it controls money or other valuables. But yes, that’s right.


CORAL:That’s a deep question, but to give a partial deep answer: When you expose a device to the Internet, you’re potentially exposing it to intelligent adversaries who can find special, weird interactions with the system that make the pieces behave in weird ways that the programmers did not think of. When you’re dealing with that kind of problem, you’ll use a different set of methods and tools.

AMBER:Any system that crashes is behaving in a way the programmer didn’t expect, and programmers already need to stop that from happening. How is this case different?

CORAL:Okay, so… imagine that your system is going to take in one kilobyte of input per session. (Although that itself is the sort of assumption we’d question and ask what happens if it gets a megabyte of input instead—but never mind.) If the input is one kilobyte, then there are 28,000可能的输入,或大约102,400或者。同样,为了扩展简单的可视化,请想象计算机每秒获得十亿个输入。假设只有一个Googol,10100, out of the 102,400possible inputs, cause the system to behave a certain way the original designer didn’t intend.


AMBER:所以you’re saying that it’s more difficult because the programmer is pitting their wits against an adversary who may be more intelligent than themselves.

CORAL:这是一种几乎右翼的方式。重要的不是“对手”部分是优化部分。有系统的非随机力量强金宝博官方烈选择特定结果,导致系统的部分沿着怪异的执行路径降低并占据意外的状态。如果您的系统从字金宝博官方面上根本没有行为不当模式,那么您是否拥有IQ 140,而敌人拥有IQ 160,这不是一场武器竞争。当怪异的状态以相关的方式选择而不是仅出于意外发生时,就很难建立一个不会进入怪金宝博官方异状态的系统。怪异的选择力可以搜索您自己无法想象的较大状态空间的部分。击败确实需要新技能和不同的思维方式,布鲁斯·施耐(Bruce Schneier)称之为“安全思维方式”。




CORAL:lots of programmers have the ability to imagine adversaries trying to threaten them. They imagine how likely it is that the adversaries are able to attack them a particular way, and then they try to block off the adversaries from threatening that way. Imagining attacks, including weird or clever attacks, and parrying them with measures you imagine will stop the attack; that is ordinary paranoia.



AMBER:Can you give me an example of a difference?

CORAL:An ordinary paranoid programmer imagines that an adversary might try to read the file containing all the usernames and passwords. They might try to store the file in a special, secure area of the disk or a special subpart of the operating system that’s supposed to be harder to read. Conversely, somebody with security mindset thinks, “No matter what kind of special system I put around this file, I’m disturbed by needing to make the assumption that this file can’t be read. Maybe the special code I write, because it’s used less often, is more likely to contain bugs. Or maybe there’s a way to fish data out of the disk that doesn’t go through the code I wrote.”

AMBER:And they imagine more and more ways that the adversary might be able to get at the information, and block those avenues off too! Because they have better imaginations.

CORAL:Well, we kind of do, but that’s not the key difference. What we’ll really want to do is come up with a way for the computer to check passwords that doesn’t rely on the computer storing the password完全,任何地方

AMBER:Ah, like encrypting the password file!




AMBER:Ah, that’s quite clever! But I don’t see what’s so qualitatively different between that measure, and my measure for hiding the key and the encrypted password file separately. I agree that your measure is more clever and elegant, but of course you’ll know better standard solutions than I do, since you work in this area professionally. I don’t see the qualitative line dividing your solution from my solution.

CORAL:Um, it’s hard to say this without offending some people, but… it’s possible that even after I try to explain the difference, which I’m about to do, you won’t get it. Like I said, if I could give you some handy platitudes and transform you into somebody capable of doing truly good work in computer security, the Internet would look very different from its present form. I can try to describe one aspect of the difference, but that may put me in the position of a mathematician trying to explain what looks more promising about one proof avenue than another; you can listen to everything they say and nod along and still not be transformed into a mathematician. So I要尝试解释差异,但是再次,我不知道有任何简单的说明手册来成为布鲁斯·施尼耶(Bruce Schneier)。

AMBER:I confess to feeling slightly skeptical at this supposedly ineffable ability that some people possess and others don’t—

CORAL:有类似这样的事情在很多职业。所以me people pick up programming at age five by glancing through a page of BASIC programs written for a TRS-80, and some people struggle really hard to grasp basic Python at age twenty-five. That’s not because there’s some mysterious truth the five-year-old knows that you can verbally transmit to the twenty-five-year-old.

And, yes, the five-year-old will become far better with practice; it’s not like we’re talking about untrainable genius. And there may be platitudes you can tell the 25-year-old that will help them struggle a little less. But sometimes a profession requires thinking in an unusual way and some people’s minds more easily turn sideways in that particular dimension.

AMBER:Fine, go on.


AMBER:Well, that version of the idea does feel a little silly. If you’re trying to secure a door, a lock that takes two keys might be more secure than a lock that only needs one key, but seven keys doesn’t feel like it makes the door that much more secure than two.




AMBER:所以the difference is that the person with atrue安全心态发现了一种防御,使系统更简单而不是更复杂。金宝博官方

CORAL:同样,这几乎是对的。通过放置密码,安全专业人员使他们的推理about the system less complicated. They’ve eliminated the need for an assumption that might be put under a lot of pressure. If you put the key in one special place and the encrypted password file in another special place, the system as a whole is still able to decrypt the user’s password. An adversary probing the state space might be able to trigger that password-decrypting state because the system is designed to do that on at least some occasions. By hashing the password file we eliminate that whole internal debate from the reasoning on which the system’s security rests.


CORAL:或者,如果有人在用户输入该密码后读取密码,而虽然它仍然存储在RAM中,该怎么办,因为某些东西可以访问RAM?从有关系统安全性的推理中消除额外假设的目的不是我们当时绝对安全,可以放松。金宝博官方有安全心态的人是nevergoing to be that relaxed about the edifice of reasoning saying the system is secure.

For that matter, while there are some normal programmers doing normal programming who might put in a bunch of debugging effort and then feel satisfied, like they’d done all they could reasonably do, programmers with decent levels of ordinary paranoia about ordinary programs will go on chewing ideas in the shower and coming up with more function tests for the system to pass. So the distinction between security mindset and ordinary paranoia isn’t that ordinary paranoids will relax.

It’s that… again to put it as a platitude, the ordinary paranoid is running around putting out fires in the form of ways they imagine an adversary might attack, and somebody with security mindset is defending against something closer to “what if an element of this reasoning is mistaken”. Instead of trying really hard to ensure nobody can read a disk, we are going to build a system that’s secure even if somebody does read the disk, andthatis our first line of defense. And then we are also going to build a filesystem that doesn’t let adversaries read the password file, as asecond防御线,以防我们的单向哈希秘密打破,并且因为没有积极的需要让对手阅读磁盘,所以为什么要让它们。然后,我们将加盐哈希,以防有人通过我们的系统窃取低渗透密码,而对手设法无论如何都可以读取密码。金宝博官方

AMBER:所以rather than trying to outwit adversaries, somebody with true security mindset tries to make fewer assumptions.


You need to master two ways of thinking, and there are a lot of people going around who have the first way of thinking but not the second. One way I’d describe the deeper skill is seeing a system’s security as resting on a story about why that system is safe. We want that safety-story to be as solid as possible. One of the implications is resting the story on as few assumptions as possible; as the saying goes, the only gear that never fails is one that has been designed out of the machine.

AMBER:But can’t you also get better security by adding more lines of defense? Wouldn’t that be more complexity in the story, and also better security?

CORAL:There’s also something to be said for preferring disjunctive reasoning over conjunctive reasoning in the safety-story. But it’s important to realize that you do want a primary line of defense that is supposed to just work and be unassailable, not a series of weaker fences that you think might maybe work. Somebody who doesn’t understand cryptography might devise twenty clever-seeming amateur codes and apply them all in sequence, thinking that, even if one of the codes turns out to be breakable, surely they won’tallbe breakable. The NSA will assign that mighty edifice of amateur encryption to an intern, and the intern will crack it in an afternoon.

There’s something to be said for redundancy, and having fallbacks in case the unassailable wall falls; it can be wise to have additional lines of defense, so long as the added complexity does not make the larger system harder to understand or increase its vulnerable surfaces. But at the core you need a simple, solid story about why the system is secure, and a good security thinker will be trying to eliminate whole assumptions from that story and strengthening its core pillars, not only scurrying around parrying expected attacks and putting out risk-fires.


AMBER:I wonder if that way of thinking has applications beyond computer security?


For example, stepping out of character for a moment, the author of this dialogue has sometimes been known to discussthe alignment problem for Artificial General Intelligence。他曾经谈论过试图衡量不断增长的AI系统内部的改进速率,因此,如果系统在整夜运行时发生突破,它不会对人类进行太多思考。金宝博官方他与之交谈的人回答说,对他来说,AGI似乎不太可能迅速获得权力。作者或多或少地回答:

It shouldn’t be your job to guess how fast the AGI might improve! If you write a system that will hurt youifa certain speed of self-improvement turns out to be possible, then you’ve written the wrong code. The code should just never hurt you regardless of the true value of that background parameter.

设置AGI的更好方法是衡量发生了多少进步,如果超过X改进发生,暂停系统,直到程序员验证已经发生的进度为止。金宝博官方这样,即使在毫秒内进行改进,只要系统按预期运行,您仍然可以。金宝博官方也许该系统由于其他错金宝博官方误而无法正常工作,但这比伤害您的系统更好地担心问题even ifit works as intended.

同样,您想设计系统,以便如果发现惊人的新功能,它将等待操作员验证使用金宝博官方这些功能的使用 - 而不是依靠操作员观察发生的事情并按下悬挂按钮。您不应该依靠发现速度或灾难速度比操作员的反应时间少。没有need如果您能找到安全的设计,则要烘烤这样的假设。例如,通过以允许操作员怀特列出的方法而不是避免使用operator-blacklack列入的方法来操作;您要求操作员在继续前进之前说“是”,而不是假设他们在场和细心,并且可以快速说“否”。

AMBER:Well, okay, but if we’re guarding against an AI system discovering cosmic powers in a millisecond, that does seem to me like an unreasonable thing to worry about. I guess that marks me as a merely ordinary paranoid.

CORAL:Indeed, one of the hallmarks of security professionals is that they spend a lot of time worrying about edge cases that would fail to alarm an ordinary paranoid because the edge case doesn’t sound like something an adversary is likely to do. Here’s an example从自由到修补匠博客

This interest in “harmless failures” – cases where an adversary can cause an anomalous but not directly harmful outcome – is another hallmark of the security mindset. Not all “harmless failures” lead to big trouble, but it’s surprising how often a clever adversary can pile up a stack of seemingly harmless failures into a dangerous tower of trouble. Harmless failures are bad hygiene. We try to stamp them out when we can…

To see why, consider the donotreply.com email story that hit the press recently. When companies send out commercial email (e.g., an airline notifying a passenger of a flight delay) and they don’t want the recipient to reply to the email, they often put in a bogus From address like donotreply@donotreply.com. A clever guy registered the domain donotreply.com, thereby receiving all email addressed to donotreply.com. This included “bounce” replies to misaddressed emails, some of which contained copies of the original email, with information such as bank account statements, site information about military bases in Iraq, and so on…


“如果您聪明,第一种方法可以保护您;第二种方式总是保护您。”这是安全心态的另一半。It’s what this essay’s author was doing by talking about AGI alignment that runs on whitelisting rather than blacklisting: you shouldn’t assume you’ll be clever about how fast the AGI system could discover capabilities, you should have a system that doesn’t use not-yet-whitelisted capabilities even if they are discovered very suddenly.

If your AGI would hurt you if it gained total cosmic powers in one millisecond, that means you built a cognitive process that is in some sense trying to hurt you and failing only due to what you think is a lack of capability. This is很坏and you should be designing some other AGI system instead. AGI systems should never be running a search that will hurt you if the search comes up non-empty. You should not be trying to fix that by making sure the search comes up empty thanks to your clever shallow defenses closing off all the AGI’s clever avenues for hurting you. You should fix that by making sure no search like that ever runs. It’s a silly thing to do with computing power, and you should do something else with computing power instead.



CORAL:Again, that’s a not-exactly-right way of putting it. When I don’t set up an email to originate from donotreply@donotreply.com, it’s not just because I’ve appreciated that an adversary registering donotreply.com is more probable than the novice imagines. For all I know, when a bounce email is sent to nowhere, there’s all kinds of things that might happen! Maybe the way a bounced email works is that the email gets routed around to weird places looking for that address. I don’t know, and I don’t want to have to study it. Instead I’ll ask: Can I make it so that a bounced email doesn’t generate a reply? Can I make it so that a bounced email doesn’t contain the text of the original message? Maybe I can query the email server to make sure it still has a user by that name before I try sending the message?—though there may still be “vacation” autoresponses that mean I’d better control the replied-to address myself. If it would be very bad for somebody unauthorized to read this, maybe I shouldn’t be sending it in plaintext by email.

AMBER:所以the person with true security mindset understands that where there’s one problem, demonstrated by what seems like a very unlikely thought experiment, there’s likely to be more realistic problems that an adversary can in fact exploit. What I think of as weird improbable failure scenarios are canaries in the coal mine, that would warn a truly paranoid person of bigger problems on the way.

CORAL:Again that’s not exactly right. The person with ordinary paranoia hears about donotreply@donotreply.com and may think something like, “Oh, well, it’s not very likely that an attacker will actually try to register that domain, I have more urgent issues to worry about,” because in that mode of thinking, they’re running around putting out things that might be fires, and they have to prioritize the things that are most likely to be fires.

If you demonstrate a weird edge-case thought experiment to somebody with security mindset, they don’t see something that’s more likely to be a fire. They think, “Oh no, my belief that those bounce emails go nowhere was FALSE!” The OpenBSD project to build a secure operating system has also, in passing, built an extremely robust operating system, because from their perspective any bug that potentially crashes the system is considered a critical security hole. An ordinary paranoid sees an input that crashes the system and thinks, “A crash isn’t as bad as somebody stealing my data. Until you demonstrate to me that this bug can be used by the adversary to steal data, it’s notextremelycritical.” Somebody with security mindset thinks, “Nothing inside this subsystem is supposed to behave in a way that crashes the OS. Some section of code is behaving in a way that does not work like my model of that code. Who knows what it might do? The system isn’t supposed to crash, so by making it crash, you have demonstrated that my beliefs about how this system works are false.”


CORAL:You have to do both. This game is short on consolation prizes. If you want your system to resist attack by major governments, you need it to actually be pretty darned secure, gosh darn it. The fact that some users may try to make their password be “password” does not change the fact that you also have to protect against buffer overflows.

AMBER:But even when somebody with security mindset designs an operating system, it often still ends up with successful attacks against it, right? So if this deeper paranoia doesn’t eliminate all chance of bugs, is it really worth the extra effort?

CORAL:如果您没有一个这样思考的人负责构建操作系统,那就有金宝博官方no没有立即失败的机会。具有安全心态的人有时无法构建安全的系统。金宝博官方没有安全心态的人alwaysfail at security if the system is at all complex. What this way of thinking buys you is a机会that your system takes longer than 24 hours to break.

AMBER:That sounds a little extreme.



CORAL:You think you’re negotiating with me, but you’re really negotiating with Murphy’s Law. I’m afraid that Mr. Murphy has historically been quite unreasonable in his demands, and rather unforgiving of those who refuse to meet them. I’m not advocating a policy to you, just telling you what happens if you don’t follow that policy. Maybe you think it’s not particularly bad if your lightbulb is doing denial-of-service attacks on a mattress store in Estonia. But if you do want a system to be secure, you need to do certain things, and that part is more of a law of nature than a negotiable demand.

AMBER:Non-negotiable, eh? I bet you’d change your tune if somebody offered you twenty thousand dollars. But anyway, one thing I’m surprised you’re not mentioning more is the part where people with security mindset always submit their idea to peer scrutiny and then accept what other people vote about it. I do like the sound of that; it sounds very communitarian and modest.

CORAL:I’d say that’s part of the ordinary paranoia that lots of programmers have. The point of submitting ideas to others’ scrutiny isn’t that hard to understand, though certainly there are plenty of people who don’t even do that. If I had any original remarks to contribute to that well-worn topic in computer security, I’d remark that it’s framed as advice to wise paranoids, but of course the people who need it even more are the happy innocents.

AMBER:Happy innocents?

CORAL:people who lack even ordinary paranoia. Happy innocents tend to envision ways that their system works, but not askat allhow their system might fail, until somebody prompts them into that, and even then they can’t do it. Or at least that’s been my experience, and that of many others in the profession.

There’s a certain incredibly terrible cryptographic system, the equivalent of the Fool’s Mate in chess, which is sometimes converged on by the most total sort of amateur, namely Fast XOR. That’s picking a password, repeating the password, and XORing the data with the repeated password string. The person who invents this system may not be able to take the perspective of an adversary at all.Hewants his marvelous cipher to be unbreakable, and he is not able to truly enter the frame of mind of somebody who wants his cipher to be breakable. If you ask him, “Please,tryto imagine what could possibly go wrong,” he may say, “Well, if the password is lost, the data will be forever unrecoverable because my encryption algorithm is too strong; I guess that’s something that could go wrong.” Or, “Maybe somebody sabotages my code,” or, “If you really insist that I invent far-fetched scenarios, maybe the computer spontaneously decides to disobey my programming.” Of course any competent ordinary paranoid asks the most skilled people they can find to look at a bright idea and try to shoot it down, because other minds may come in at a different angle or know other standard techniques. But the other reason why we say “Don’t roll your own crypto!” and “Have a security expert look at your bright idea!” is in hopes of reaching the many people who can’tat allinvert the polarity of their goals—they don’t think that way spontaneously, and if you try to force them to do it, their thoughts go in unproductive directions.

AMBER:like… the same way many people on the Right/Left seem utterly incapable of stepping outside their own treasured perspectives to pass the意识形态的图灵测试of the Left/Right.

CORAL:我不知道如果它是完全相同的精神齿轮or capability, but there’s a definite similarity. Somebody who lacks ordinary paranoia can’t take on the viewpoint of somebody who wants Fast XOR to be breakable, and pass that adversary’s Ideological Turing Test for attempts to break Fast XOR.

AMBER:Can’t, or won’t? You seem to be talking like these are innate, untrainable abilities.

CORAL:Well, at the least, there will be different levels of talent, as usual in a profession. And also as usual, talent vastly benefits from training and practice. But yes, it has sometimes seemed to me that there is a kind of qualitative step or gear here, where some people can shift perspective to imagine an adversary that truly wants to break their code… or a reality that isn’t cheering for their plan to work, or aliens who evolved different emotions, or an AI that doesn’twant结论一下“因此,人类应该永远幸福地生活”的理由,或者一个虚构的人物,他相信西斯意识形态,但doesn’t believe they’re the bad guy

It does sometimes seem to me like some people simply can’t shift perspective in that way. Maybe it’s not that they truly lack the wiring, but that there’s an instinctive political off-switch for the ability. Maybe they’re scared to let go of their mental anchors. But from the outside it looks like the same result: some people do it, some people don’t. Some people spontaneously invert the polarity of their internal goals and spontaneously ask how their cipher might be broken and come up with productive angles of attack. Other people wait until prompted to look for flaws in their cipher, or they demand that you argue with them and wait for you to come up with an argument that satisfies them. If you ask them to predict themselves what you might suggest as a flaw, they say weird things that don’t begin to pass your Ideological Turing Test.


CORAL:普通偏执狂中的一个明显的定量人才水平是,您可以扭转视角以侧面看待事物的多远 - 您发明的攻击的创造力和可操作性。像这些examplesBruce Schneier给:

Uncle Milton Industries has been selling ant farms to children since 1956. Some years ago, I remember opening one up with a friend. There were no actual ants included in the box. Instead, there was a card that you filled in with your address, and the company would mail you some ants. My friend expressed surprise that you could get ants sent to you in the mail.

I replied: “What’s really interesting is that these people will send a tube of live ants to anyone you tell them to.”


SmartWater is a liquid with a unique identifier linked to a particular owner. “The idea is for me to paint this stuff on my valuables as proof of ownership,” I wrote when I first learned about the idea. “I think a better idea would be for me to paint it on your valuables, and then call the police.”


This kind of thinking is not natural for most people. It’s not natural for engineers. Good engineering involves thinking about how things can be made to work; the security mindset involves thinking about how things can be made to fail…

I’ve often speculated about how much of this is innate, and how much is teachable. In general, I think it’s a particular way of looking at the world, and that it’s far easier to teach someone domain expertise—cryptography or software security or safecracking or document forgery—than it is to teach someone a security mindset.

To be clear, the distinction between “just ordinary paranoia” and “all of security mindset” is my own; I think it’s worth dividing the spectrum above the happy innocents into two levels rather than one, and say, “This business of looking at the world from weird angles is only half of what you need to learn, and it’s the easier half.”

AMBER:也许布鲁斯·施尼耶(Bruce Schneier)本人并没有掌握您说“安全心态”时的意思,而您只是偷走了他的术语来指代自己的全新想法!

CORAL:不,不想要原因bout whether somebody might someday register “donotreply.com” and just fixing it regardless—a methodology that doesn’t trust you to be clever about which problems will blow up—that’s definitely part of what existing security professionals mean by “security mindset”, and it’s definitely part of the second and deeper half. The only unconventional thing in my presentation is that I’m factoring out an intermediate skill of “ordinary paranoia”, where you try to parry an imagined attack by encrypting your password file and hiding the encryption key in a separate section of filesystem code. Coming up with the idea of hashing the password file is, I suspect, a qualitatively distinct skill, invoking a world whose dimensions are your own reasoning processes and not just object-level systems and attackers. Though it’s not polite to say, and the usual suspects will interpret it as a status grab, my experience with other reflectivity-laden skills suggests this may mean that many people, possibly including you, will prove unable to think in this way.

AMBER:I indeed find that terribly impolite.

CORAL:It may indeed be impolite; I don’t deny that. Whether it’s untrue is a different question. The reason I say it is because, as much as I want ordinary paranoids totry为了达到更深层次的偏执狂,我希望他们意识到这可能不是他们的事,在这种情况下,他们应该得到帮助,然后倾听该帮助。他们不应该假设,因为他们可以注意到将蚂蚁邮寄给人的机会,所以他们也可以选择Donotreply@donotreply.com的可怕。

AMBER:Maybe you could call that “deep security” to distinguish it from what Bruce Schneier and other security professionals call “security mindset”.


AMBER:Suppose I take that at face value. Earlier, you described what might go wrong when a happy innocent tries and fails to be an ordinary paranoid. What happens when an ordinary paranoid tries to do something that requires the deep security skill?

CORAL:They believe they have wisely identified bad passwords as the real fire in need of putting out, and spend all their time writing more and more clever checks for bad passwords. They are very impressed with how much effort they have put into detecting bad passwords, and how much concern they have shown for system security. They fall prey to the standard cognitive bias whose name I can’t remember, where people want to solve a problem using one big effort or a couple of big efforts and then be done and not try anymore, and that’s why people don’t put up hurricane shutters once they’re finished buying bottled water. Pay them to “try harder”, and they’ll hide seven encryption keys to the password file in seven different places, or build towers higher and higher in places where a successful adversary is obviously just walking around the tower if they’ve gotten through at all. What these ideas have in common is that they are in a certain sense “shallow”. They are mentally straightforward as attempted parries against a particular kind of envisioned attack. They give you a satisfying sense of fighting hard against the imagined problem—and then they fail.

AMBER:Are you saying it’s不是检查用户密码不是“密码”的好主意吗?

CORAL:No, shallow defenses are often good ideas too! But even there, somebody with the higher skill will try to look at things in a more systematic way; they know that there are often deeper ways of looking at the problem to be found, and they’ll try to find those deep views. For example, it’s extremely important that your password checker does不是通过要求密码至少包含一个大写字母,小写字母,编号和标点符号来排除密码“正确的马电池主食”。您真正想做的是测量密码熵。没有设想某人猜测“彩虹”的失败模式,您会通过强迫用户将密码为“ ra1nbow!”来巧妙地balk绕。反而。

你想要的密码输入字段checkbox that allows showing the typed password in plaintext, because your attempt to parry the imagined failure mode of some evildoer reading over the user’s shoulder may get in the way of the user entering a long or high-entropy password. And the user is perfectly capable of typing their password into that convenient text field in the address bar above the web page, so they can copy and paste it—thereby sending your password to whoever tries to do smart lookups on the address bar. If you’re really that worried about some evildoer reading over somebody’s shoulder, maybe you should be sending a confirmation text to their phone, rather than forcing the user to enter their password into a nearby text field that they can actually read. Obscuring one text field, with no off-switch for the obscuration, to guard against this one bad thing that you imagined happening, while managing to step on your own feet in other ways and not even really guard against the bad thing; that’s the peril of shallow defenses.

An archetypal character for “ordinary paranoid who thinks he’s trying really hard but is actually just piling on a lot of shallow precautions” is Mad-Eye Moody from theHarry Potterseries, who has a whole room full of Dark Detectors, and who also ends up locked in the bottom of somebody’s trunk. It seems Mad-Eye Moody was too busy buying one more Dark Detector for his existing room full of Dark Detectors, and he didn’t invent precautions deep enough and general enough to cover the unforeseen attack vector “somebody tries to replace me using Polyjuice”.

And the solution isn’t to add on a special anti-Polyjuice potion. I mean, if you happen to have one, great, but that’s not where most of your trust in the system should be coming from. The first lines of defense should have a sense about them of depth, of generality. Hashing password files, rather than hiding keys; thinking of how to measure password entropy, rather than requiring at least one uppercase character.

AMBER:Again this seems to me more like a quantitative difference in the cleverness of clever ideas, rather than two different modes of thinking.

CORAL:真实的分类往往是模糊的,但对我来说these seem like the product of two different kinds of thinking. My guess is that the person who popularized demanding a mixture of letters, cases, and numbers was reasoning in a different way than the person who thought of measuring password entropy. But whether you call the distinction qualitative or quantitative, the distinction remains. Deep and general ideas—the kind that actually simplify and strengthen the edifice of reasoning supporting the system’s safety—are invented more rarely and by rarer people. To build a system that can resist or even slow down an attack by multiple adversaries, some of whom may be smarter or more experienced than ourselves, requires a level of professionally specialized thinking that isn’t reasonable to expect from every programmer—not even those who can shift their minds to take on the perspective of a single equally-smart adversary. What you should ask from an ordinary paranoid is that they appreciate that deeper ideas exist, and that they try to learn the standard deeper ideas that are already known; that they know their own skill is not the upper limit of what’s possible, and that they ask a professional to come in and check their reasoning. And then actually listen.

AMBER:但是,如果人们有可能认为自己具有更高的技能和错误,那么您怎么知道you是这些罕见的人之一truly有深厚的安全心态吗?您可能对自己的高度看法just be due to the Dunning-Kruger effect


Yes, there will be some innocents who can’t believe that there’s a talent called “paranoia” that they lack, who’ll come up with weird imitations of paranoia if you ask them to be more worried about flaws in their brilliant encryption ideas. There will also be some people reading this with severe cases of社交焦虑和自信。Readers who有能力拥有普通的偏执狂甚至安全心态,他们可能不会试图发展这些才能,因为他们非常担心他们可能只是只想想象自己有才华的人之一。好吧,如果您认为自己可以感受到深层安全思想和浅薄的想法之间的区别,那么至少应该在时刻尝试以同样的方式产生自己的想法。

AMBER:But won’t that attitude encourage overconfident people to think they can be paranoid when they actually can’t be, with the result that they end up too impressed with their own reasoning and ideas?

CORAL:I strongly suspect that they’ll do that regardless. You’re not actually promoting some kind of collective good practice that benefits everyone, just by personally agreeing to be modest. The overconfident don’t care what you decide. And if you’re not just as worried about underestimating yourself as overestimating yourself, if your fears about exceeding your proper place are asymmetric with your fears about lost potential and foregone opportunities, then you’re probably dealing with an emotional issue rather than a strict concern with good epistemology.

AMBER:If somebody does have the talent for deep security, then, how can they train it?

CORAL:… That’s a hell of a good question. Some interesting training methods have been developed for ordinary paranoia, like classes whose students have to figure out how to attack everyday systems outside of a computer-science context. One professor gave a test in which one of the questions was “What are the first 100 digits of pi?”—the point being that you need to find some way to cheat in order to pass the test. You should train that kind of ordinary paranoia first, if you haven’t done that already.




AMBER:所以, like, if I’m building an operating system, I write down, “Safety assumption: The login system works to keep out attackers”—


Uh, no, sorry. As usual, it seems that what I think is “advice” has left out all the important parts anyone would need to actually do it.





CORAL:What you’re trying to do is reduce the story past the point where you talk about a goal-laden event, “the adversary fails”, and instead talk about neutral facts underlying that event. For now, just answer me: Why do you believe the adversary fails to guess the password?




CORAL:We’re making progress, but again, the term “enough” is goal-laden language. It’s your own wants and desires that determine what is “enough”. Can you say something else instead of “enough”?


CORAL:我不是说找到一个同义词for “enough”. I mean, use different concepts that aren’t goal-laden. This will involve changing the meaning of what you write down.


CORAL:Not yet, anyway. Maybe not ever, but that isn’t known, and you shouldn’t assume it based on one failure.

Anyway, what I was hoping for was a pair of statements like, “I believe every password has at least 50 bits of entropy” and “I believe no attacker can make more than a trillion tries total at guessing any password”. Where the point of writing “I believe” is to make yourself pause and question whether you actually believe it.

AMBER:Isn’t saying no attacker “can” make a trillion tries itself goal-laden language?

CORAL:的确,该假设可能需要通过为什么要相信 - “我相信系统拒绝密码的尝试更接近1秒钟,我相信攻击者将其持续不到一个月,并且金宝博官方我相信攻击者的同时连接少于300,000。”再在哪里,关键是您然后看一下写的话,并说:“我真的相信吗?”明确的是,有时答案是“是的,我肯定会相信!”这不是一个社交谦虚的练习,您可以炫耀自己有痛苦的疑问的能力,然后再继续做同样的事情。关键是要找出您的信念,或者您需要相信什么,并检查它是否可信。

AMBER:And this trains a deep security mindset?


In point of fact, the real reason the author is listing out this methodology is that he’s currently trying to do something similar on the problem of aligning Artificial General Intelligence, and he would like to move past “I believe my AGI won’t want to kill anyone” and into a headspace more like writing down statements such as “Although the space of potential weightings for this recurrent neural net does contain weight combinations that would figure out how to kill the programmers, I believe that gradient descent on loss functionlwill only access a result inside subspacewith propertiesp, and I believe a space with propertiespdoes not include any weight combinations that figure out how to kill the programmer.”


继续:Security Mindset and the Logistic Success Curve