MIRI COO Malo Bourgon reviews our past year and discusses our future plans in 2020 Updates and Strategy.
Our biggest update is that we’ve made less concrete progress than we expected on the new research we described in 2018 Update: Our New Research Directions. As a consequence, we’re scaling back our work on these research directions, and looking for new angles of attack that have better odds of resulting in a solution to the alignment problem.
Other MIRI updates
- A new paper from MIRI researcher Evan Hubinger: “An Overview of 11 Proposals for Building Safe Advanced AI.”
- A belated paper announcement from last year: Andrew Critch’s “A Parametric, Resource-Bounded Generalization of Löb’s Theorem, and a Robust Cooperation Criterion for Open-Source Game Theory,” a result originally written up during his time at MIRI, has been published in the Journal of Symbolic Logic.
- MIRI’s Abram Demski introduces Learning Normativity: A Research Agenda. See also Abram’s new write-up, Normativity.
- Evan Hubinger clarifies inner alignment terminology.
- The Survival and Flourishing Fund (SFF) has awarded MIRI $563,000 in its latest round of grants! Our enormous gratitude to SFF’s grant recommenders and funders.
- A Map That Reflects the Territory is a new print book set collecting the top LessWrong essays of 2018, including essays by MIRI researchers Eliezer Yudkowsky, Abram Demski, and Scott Garrabrant.
- DeepMind’s Rohin Shah gives his overview of Scott Garrabrant’s Cartesian Frames framework.
News and links
- Daniel Filan launches the AI X-Risk Research Podcast (AXRP) with episodes featuring Adam Gleave, Rohin Shah, and Andrew Critch.
- DeepMind’s AlphaFold represents a very large advance in protein structure prediction.
- Metaculus launches Forecasting AI Progress, an open four-month tournament to predict advances in AI, with a $50,000 prize pool.
- Continuing the Takeoffs Debate: Richard Ngo responds to Paul Christiano’s “changing selection pressures” argument against hard takeoff.
- OpenAI’s Beth Barnes discusses the obfuscated arguments problem for AI safety via debate:
Previously we hoped that debate/IDA could verify any knowledge for which such human-understandable arguments exist, even if these arguments are intractably large. We hoped the debaters could strategically traverse small parts of the implicit large argument tree and thereby show that the whole tree could be trusted.
The obfuscated argument problem suggests that we may not be able to rely on debaters to find flaws in large arguments, so that we can only trust arguments when we could find flaws by recursing randomly—e.g. because the argument is small enough that we could find a single flaw if one existed, or because the argument is robust enough that it is correct unless it has many flaws.
- Some AI Research Areas and Their Relevance to Existential Safety: Andrew Critch compares out-of-distribution robustness, agent foundations, multi-agent RL, preference learning, and other research areas.
- Ben Hoskin releases his 2020 AI Alignment Literature Review and Charity Comparison.
- Open Philanthropy summarizes its AI governance grantmaking to date.