E Hubinger. 2020. “An Overview of 11 Proposals for Building Safe Advanced AI.” arXiv:2012.07532 [cs.LG].
E Hubinger, C van Merwijk, V Mikulik, J Skalse, and S Garrabrant. 2019. “Risks from Learned Optimization in Advanced Machine Learning Systems.” arXiv:1906.01820 [cs.AI].
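V Kosoy. 2019. “Delegative Reinforcement Learning: Learning to Avoid Traps with a Little Help.” Paper presented at the Safe Machine Learning workshop at ICLR.
A Demski and S Garrabrant. 2019. “Embedded Agency.” arXiv:1902.09469.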


S Armstrong and S Mindermann. 2018. “Occam’s Razor is Insufficient to Infer the Preferences of Irrational Agents.” In Advances in Neural Information Processing Systems 31.
D Manheim and S Garrabrant. 2018. “Categorizing Variants of Goodhart’s Law.” arXiv:1803.04585 [cs.AI].


R Carey. 2018. “Incorrigibility in the CIRL Framework.” arXiv:1709.06275 [cs.AI]. Paper presented at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2017. “A Formal Approach to the Problem of Logical Non-Omniscience.” Paper presented at the 16th conference on Theoretical Aspects of Rationality and Knowledge.
K Grace, J Salvatier, A Dafoe, B Zhang, and O Evans. 2017. “When Will AI Exceed Human Performance? Evidence from AI Experts.” arXiv:1705.08807.
V Kosoy. 2017. “Forecasting Using Incomplete Models.” arXiv:1705.04630 [cs.LG].
N Soares and B Levinstein. 2020. “Cheating Death in Damascus.” The Journal of Philosophy 117 (5): 237–266. Previously presented at the 14th Annual Formal Epistemology Workshop.
E Yudkowsky and N Soares. 2017. “Functional Decision Theory: A New Theory of Instrumental Rationality.” arXiv:1710.05060 [cs.AI].


T Benson-Tilsen and N Soares. 2016. “Formalizing Convergent Instrumental Goals.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
A Critch. 2019. “A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory.” arXiv:1602.04184 [cs.GT]. The Journal of Symbolic Logic 84 (4): 1368–1381. Previously published as “Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents.”
S Garrabrant, T Benson-Tilsen, A Critch, N Soares, and J Taylor. 2016. “Logical Induction.” arXiv:1609.03543.
S Garrabrant, B Fallenstein, A Demski, and N Soares. 2016. “Inductive Coherence.” arXiv:1604.05288 [cs.AI]. Previously published as “Uniform Coherence.”
S Garrabrant, N Soares, and J Taylor. 2016. “Asymptotic Convergence in Online Learning with Unbounded Delays.” arXiv:1604.05280 [cs.LG].
V Kosoy and A Appel. 2020. “Optimal Polynomial-Time Estimators: A Bayesian Notion of Approximation Algorithm.” arXiv:1608.04112 [cs.CC]. Forthcoming in Journal of Applied Logics.
J Leike, J Taylor, and B Fallenstein. 2016. “A Formal Solution to the Grain of Truth Problem.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
L Orseau and S Armstrong. 2016. “Safely Interruptible Agents.” Paper presented at the 32nd Conference on Uncertainty in Artificial Intelligence.
K Sotala. 2016. “Defining Human Values for Value Learners.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor. 2016. “Quantilizers: A Safer Alternative to Maximizers for Limited Optimization.” Paper presented at the AAAI 2016 AI, Ethics and Society Workshop.
J Taylor, E Yudkowsky, P LaVictoire, and A Critch. 2016. “Alignment for Advanced Machine Learning Systems.” MIRI technical report 2016–1.


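B Fallenstein and R Kumar. 2015. “Proof-Producing Reflection for HOL: With an Application to Model Polymorphism.” In Interactive Theorem Proving: 6th International Conference, ITP 2015, Nanjing, China, August 24-27, 2015, Proceedings. Springer.
B Fallenstein and N Soares. 2015. “Vingean Reflection: Reliable Reasoning for Self-Improving Agents.” MIRI technical report 2015–2.
B Fallenstein, N Soares, and J Taylor. 2015. “Reflective Variants of Solomonoff Induction and AIXI.” In Proceedings of AGI 2015. Springer. Previously published as MIRI technical report 2015–8.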
B Fallenstein, J Taylor, and P Christiano. 2015. “Reflective Oracles: A Foundation for Classical Game Theory.” arXiv:1508.04145 [cs.AI]. Previously published as MIRI technical report 2015–7. Published in abridged form as “Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence” in Proceedings of LORI 2015.
S Garrabrant, S Bhaskar, A Demski, J Garrabrant, G Koleszarik, and E Lloyd. 2016. “Asymptotic Logical Uncertainty and the Benford Test.” arXiv:1510.03370 [cs.LG]. Paper presented at the Ninth Conference on Artificial General Intelligence. Previously published as MIRI technical report 2015–11.
K Grace. 2015. “The Asilomar Conference: A Case Study in Risk Mitigation.” MIRI technical report 2015–9.
K Grace. 2015. “Leó Szilárd and the Danger of Nuclear Weapons: A Case Study in Risk Mitigation.” MIRI technical report 2015–10.
P LaVictoire. 2015. “An Introduction to Löb’s Theorem in MIRI Research.” MIRI technical report 2015–6.
N Soares. 2015. “Aligning Superintelligence with Human Interests: An Annotated Bibliography.” MIRI technical report 2015–5.
N Soares. 2015. “Formalizing Two Problems of Realistic World-Models.” MIRI technical report 2015–3.
N Soares. 2018. “The Value Learning Problem.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously presented at the IJCAI 2016 Ethics for Artificial Intelligence workshop, and published earlier as MIRI technical report 2015–4.
N Soares and B Fallenstein. 2015. “Questions of Reasoning under Logical Uncertainty.” MIRI technical report 2015–1.
N Soares and B Fallenstein. 2015. “Toward Idealized Decision Theory.” arXiv:1507.01986 [cs.AI]. Previously published as MIRI technical report 2014–7. Published in abridged form as “Two Attempts to Formalize Counterpossible Reasoning in Deterministic Settings” in Proceedings of AGI 2015.
K Sotala. 2015. “Concept Learning for Safe Autonomous AI.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop.


S Armstrong, K Sotala, and S Ó hÉigeartaigh. 2014. “The Errors, Insights and Lessons of Famous AI Predictions – and What They Mean for the Future.” Journal of Experimental & Theoretical Artificial Intelligence 26 (3): 317–342.
M Bárász, P Christiano, B Fallenstein, M Herreshoff, P LaVictoire, and E Yudkowsky. 2014. “Robust Cooperation on the Prisoner’s Dilemma: Program Equilibrium via Provability Logic.” arXiv:1401.5577 [cs.GT].
T Benson-Tilsen. 2014. “UDT with Known Search Order.” MIRI technical report 2014–4.
N Bostrom and E Yudkowsky. 2018. “The Ethics of Artificial Intelligence.” In Artificial Intelligence Safety and Security. Chapman and Hall. Previously published in The Cambridge Handbook of Artificial Intelligence (2014).
P Christiano. 2014. “Non-Omniscience, Probabilistic Inference, and Metamathematics.” MIRI technical report 2014–3.
B Fallenstein. 2014. “Procrastination in Probabilistic Logic.” Working paper.
B Fallenstein and N Soares. 2014. “Problems of Self-Reference in Self-Improving Space-Time Embedded Intelligence.” In Proceedings of AGI 2014. Springer.
B Fallenstein and N Stiennon. 2014. “‘Loudness’: On Priors over Preference Relations.” Brief technical note.
P LaVictoire, B Fallenstein, E Yudkowsky, M Bárász, P Christiano, and M Herreshoff. 2014. “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem.” Paper presented at the AAAI 2014 Multiagent Interaction without Prior Coordination Workshop.
L Muehlhauser and N Bostrom. 2014. “Why We Need Friendly AI.” Think 13 (36): 42–47.
L Muehlhauser and B Hibbard. 2014. “Exploratory Engineering in AI.” Communications of the ACM 57 (9): 32–34.
C Shulman and N Bostrom. 2014. “Embryo Selection for Cognitive Enhancement: Curiosity or Game-Changer?” Global Policy 5 (1): 85–92.
N Soares. 2014. “Tiling Agents in Causal Graphs.” MIRI technical report 2014–5.
N Soares and B Fallenstein. 2014. “Botworld 1.1.” MIRI technical report 2014–2.
N Soares and B Fallenstein. 2017. “Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda.” In The Technological Singularity: Managing the Journey. Springer. Previously published as MIRI technical report 2014–8 under the name “Aligning Superintelligence with Human Interests: A Technical Research Agenda.”
N Soares, B Fallenstein, E Yudkowsky, and S Armstrong. 2015. “Corrigibility.” Paper presented at the AAAI 2015 Ethics and Artificial Intelligence Workshop. Previously published as MIRI technical report 2014–6.
E Yudkowsky. 2014. “Distributions Allowing Tiling of Staged Subjective EU Maximizers.” MIRI technical report 2014–1.


A Altair. 2013. “A Comparison of Decision Algorithms on Newcomblike Problems.” Working paper. MIRI.
S Armstrong, N Bostrom, and C Shulman. 2015. “Racing to the Precipice: A Model of Artificial Intelligence Development.” AI & Society (DOI 10.1007/s00146-015-0590-7): 1–6. Previously published as Future of Humanity Institute technical report 2013–1.
P Christiano, E Yudkowsky, M Herreshoff, and M Bárász. 2013. “Definability of ‘Truth’ in Probabilistic Logic.” Draft. MIRI.
B Fallenstein. 2013. “The 5-and-10 Problem and the Tiling Agents Formalism.” MIRI technical report 2013–9.
B Fallenstein. 2013. “Decreasing Mathematical Strength in One Formalization of Parametric Polymorphism.” Brief technical note. MIRI.
B Fallenstein. 2013. “An Infinitely Descending Sequence of Sound Theories Each Proving the Next Consistent.” MIRI technical report 2013–6.
B Fallenstein and A Mennen. 2013. “Predicting AGI: What Can We Say When We Know So Little?” Working paper. MIRI.
K Grace. 2013. “Algorithmic Progress in Six Domains.” MIRI technical report 2013–3.
J Hahn. 2013. “Scientific Induction in Probabilistic Metamathematics.” MIRI technical report 2013–4.
L Muehlhauser. 2013. “Intelligence Explosion FAQ.” Working paper. MIRI. (HTML)
L Muehlhauser and L Helm. 2013. “Intelligence Explosion and Machine Ethics.” In Singularity Hypotheses. Springer.
L Muehlhauser and A Salamon. 2013. “Intelligence Explosion: Evidence and Import.” In Singularity Hypotheses. Springer. (Español) (Français) (Italiano)
L Muehlhauser and C Williamson. 2013. “Ideal Advisor Theories and Personal CEV.” Working paper. MIRI.
N Soares. 2013. “Fallenstein’s Monster.” MIRI technical report 2013–7.
K Sotala and R Yampolskiy. 2014. “Responses to Catastrophic AGI Risk: A Survey.” Physica Scripta 90 (1): 1–33. Previously published as MIRI technical report 2013–2.
N Stiennon. 2013. “Recursively-Defined Logical Theories Are Well-Defined.” MIRI technical report 2013–8.
R Yampolskiy and J Fox. 2013. “Artificial General Intelligence and the Human Mental Model.” In Singularity Hypotheses. Springer.
R Yampolskiy and J Fox. 2013. “Safety Engineering for Artificial General Intelligence.” Topoi 32 (2): 217–226.
E Yudkowsky. 2013. “Intelligence Explosion Microeconomics.” MIRI technical report 2013–1.
E Yudkowsky. 2013. “The Procrastination Paradox.” Brief technical note. MIRI.
E Yudkowsky and M Herreshoff. 2013. “Tiling Agents for Self-Modifying AI, and the Löbian Obstacle.” Draft. MIRI.


S Armstrong and K Sotala. 2012. “How We’re Predicting AI – or Failing To.” In Beyond AI: Artificial Dreams. Pilsen: University of West Bohemia.
B Hibbard. 2012. “Avoiding Unintended AI Behaviors.” In Proceedings of AGI 2012. Springer.
B Hibbard. 2012. “Decision Support for Safe AI Design.” In Proceedings of AGI 2012. Springer.
L Muehlhauser. 2012. “AI Risk Bibliography 2012.” Working paper. MIRI.
A Salamon and L Muehlhauser. 2012. “Singularity Summit 2011 Workshop Report.” Working paper. MIRI.
C Shulman and N Bostrom. 2012. “How Hard Is Artificial Intelligence? Evolutionary Arguments and Selection Effects.” Journal of Consciousness Studies 19 (7–8): 103–130.
K Sotala. 2012. “Advantages of Artificial Intelligences, Uploads, and Digital Minds.” International Journal of Machine Consciousness 4 (1): 275–291.
K Sotala and H Valpola. 2012. “Coalescing Minds: Brain Uploading-Related Group Mind Scenarios.” International Journal of Machine Consciousness 4 (1): 293–312.


P de Blanc. 2011. “Ontological Crises in Artificial Agents’ Value Systems.” arXiv:1105.3821 [cs.AI].
D Dewey. 2011. “Learning What to Value.” In Proceedings of AGI 2011. Springer.
E Yudkowsky. 2011. “Complex Value Systems Are Required to Realize Valuable Futures.” In Proceedings of AGI 2011. Springer.


J Fox and C Shulman. 2010. “Superintelligence Does Not Imply Benevolence.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
S Kaas, S Rayhawk, A Salamon, and P Salamon. 2010. “Economic Implications of Software Minds.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
A Salamon, S Rayhawk, and J Kramár. 2010. “How Intelligible Is Intelligence?” In Proceedings of ECAP 2010. Verlag Dr. Hut.
C Shulman. 2010. “Omohundro’s ‘Basic AI Drives’ and Catastrophic Risks.” Working paper. MIRI.
C Shulman. 2010. “Whole Brain Emulation and the Evolution of Superorganisms.” Working paper. MIRI.
C Shulman and A Sandberg. 2010. “Implications of a Software-Limited Singularity.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
K Sotala. 2010. “From Mostly Harmless to Civilization-Threatening.” In Proceedings of ECAP 2010. Verlag Dr. Hut.
N Tarleton. 2010. “Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics.” Working paper. MIRI.
E Yudkowsky. 2010. “Timeless Decision Theory.” Working paper. MIRI.
E Yudkowsky, C Shulman, A Salamon, R Nelson, S Kaas, S Rayhawk, and T McCabe. 2010. “Reducing Long-Term Catastrophic Risks from Artificial Intelligence.” Working paper. MIRI.


P de Blanc. 2009. “Convergence of Expected Utility for Universal Artificial Intelligence.” arXiv:0907.5598 [cs.AI].
S Rayhawk, A Salamon, M Anissimov, T McCabe, and R Nelson. 2009. “Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions.” Paper presented at ECAP 2009.
C Shulman and S Armstrong. 2009. “Arms Control and Intelligence Explosions.” Paper presented at ECAP 2009.
C Shulman, H Jonsson, and N Tarleton. 2009. “Machine Ethics and Superintelligence.” In Proceedings of AP-CAP 2009. University of Tokyo.
C Shulman, N Tarleton, and H Jonsson. 2009. “Which Consequentialism? Machine Ethics and Moral Divergence.” In Proceedings of AP-CAP 2009. University of Tokyo.
E Yudkowsky. 2008. “Artificial Intelligence as a Positive and Negative Factor in Global Risk.” In Global Catastrophic Risks. Oxford University Press. Published in abridged form as “Friendly Artificial Intelligence” in Singularity Hypotheses. (官话) (Italiano) (한국어) (Português) (Pу́сский)
E Yudkowsky. 2008. “Cognitive Biases Potentially Affecting Judgement of Global Risks.” In Global Catastrophic Risks. Oxford University Press. (Italiano) (Pу́сский) (Portuguese)
E Yudkowsky. 2007. “Levels of Organization in General Intelligence.” In Artificial General Intelligence (Cognitive Technologies). Springer.
E Yudkowsky. 2004. “Coherent Extrapolated Volition.” Working paper. MIRI.




Inadequate Equilibria: Where and How Civilizations Get Stuck

E Yudkowsky (2017)

When should you think that you may be able to do something unusually well? When you’re trying to outperform in a given area, it’s important that you have a sober understanding of your relative competencies. The story only ends there, however, if you’re fortunate enough to live in an adequate civilization.

Eliezer Yudkowsky’s Inadequate Equilibria is a sharp and lively guidebook for anyone questioning when and how they can know better, and do better, than the status quo. Freely mixing debates on the foundations of rational decision-making with tips for everyday life, Yudkowsky explores the central question of when we can (and can’t) expect to spot systemic inefficiencies, and exploit them.


Rationality: From AI to Zombies

E Yudkowsky (2015)

When human brains try to do things, they can run into some very strange problems. Self-deception, confirmation bias, magical thinking—it sometimes seems our ingenuity is boundless when it comes to shooting ourselves in the foot.

Map and Territory and the rest of the Rationality: From AI to Zombies series ask what a “martial art” of rationality would look like. In this series, Eliezer Yudkowsky explains the findings of cognitive science, and the ideas of naturalistic philosophy, that help provide a useful background for understanding MIRI’s research and for generally approaching ambitious problems.


Smarter Than Us: The Rise of Machine Intelligence

S Armstrong (2014)

What happens when machines become smarter than humans? Humans steer the future not because we’re the strongest or the fastest but because we’re the smartest. When machines become smarter than humans, we’ll be handing them the steering wheel. What promises and perils will these powerful machines present? Stuart Armstrong’s new book navigates these questions with clarity and wit.

Facing the Intelligence Explosion

L Muehlhauser (2013)

Sometime this century, machines will surpass human levels of intelligence and ability. This event—the “intelligence explosion”—will be the most important event in our history, and navigating it wisely will be the most important thing we can ever do.

Luminaries from Alan Turing and I. J. Good to Bill Joy and Stephen Hawking have warned us about this. Why do we think Hawking and company are right, and what can we do about it?

Facing the Intelligence Explosion is Muehlhauser’s attempt to answer these questions.


The Hanson-Yudkowsky AI-Foom Debate

R Hanson and E Yudkowsky (2013)

In late 2008, economist Robin Hanson and AI theorist Eliezer Yudkowsky conducted an online debate about the future of artificial intelligence, and in particular about whether generally intelligent AIs will be able to improve their own capabilities very quickly (a.k.a. “foom”). James Miller and Carl Shulman also contributed guest posts to the debate.

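The original debate took place in a long series of blog posts, which are collected here. This book also includes a transcript of a 2011 debate between Hanson and Yudkowsky, a summary of the debate written by Kaj Sotala, and a 2013 technical report on AI takeoff dynamics (“Intelligence Explosion Microeconomics”) written by Yudkowsky.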



