All posts


Thursday, 23 May 2024

Quick takes

I'll post some extracts from the commitments made at the Seoul Summit. I can't promise that this will be a particularly good summary, as I was originally just writing this for myself, but maybe it's helpful until someone publishes something more polished:

Frontier AI Safety Commitments, AI Seoul Summit 2024

The major AI companies have agreed to Frontier AI Safety Commitments. In particular, they will publish a safety framework focused on severe risks: "internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world’s greatest challenges"

"Risk assessments should consider model capabilities and the context in which they are developed and deployed" - I'd argue that the deployment context should take into account whether the model is open or closed source/weights, as open-source/weights models can be subsequently modified.

"They should also be accompanied by an explanation of how thresholds were decided upon, and by specific examples of situations where the models or systems would pose intolerable risk." - always great to make policy concrete.

"In the extreme, organisations commit not to develop or deploy a model or system at all, if mitigations cannot be applied to keep risks below the thresholds." - Very important that when this is applied, the ability to iterate on open-source/weight models is taken into account.

https://www.gov.uk/government/publications/frontier-ai-safety-commitments-ai-seoul-summit-2024/frontier-ai-safety-commitments-ai-seoul-summit-2024

Seoul Declaration for safe, innovative and inclusive AI by participants attending the Leaders' Session

Signed by Australia, Canada, the European Union, France, Germany, Italy, Japan, the Republic of Korea, the Republic of Singapore, the United Kingdom, and the United States of America.

"We support existing and ongoing efforts of the participants to this Declaration to create or expand AI safety institutes, research programmes and/or other relevant institutions including supervisory bodies, and we strive to promote cooperation on safety research and to share best practices by nurturing networks between these organizations" - I guess we should now go full throttle and push for the creation of national AI Safety Institutes.

"We recognise the importance of interoperability between AI governance frameworks" - useful for arguing that we should copy things that have been implemented overseas.

"We recognize the particular responsibility of organizations developing and deploying frontier AI, and, in this regard, note the Frontier AI Safety Commitments." - Important, as frontier AI needs to be treated differently from regular AI.

https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-declaration-for-safe-innovative-and-inclusive-ai-by-participants-attending-the-leaders-session-ai-seoul-summit-21-may-2024

Seoul Statement of Intent toward International Cooperation on AI Safety Science

Signed by the same countries.

"We commend the collective work to create or expand public and/or government-backed institutions, including AI Safety Institutes, that facilitate AI safety research, testing, and/or developing guidance to advance AI safety for commercially and publicly available AI systems" - similar to the point above, but more specifically focused on AI Safety Institutes, which is great.

"We acknowledge the need for a reliable, interdisciplinary, and reproducible body of evidence to inform policy efforts related to AI safety" - Really good! We don't just want AI Safety Institutes to run current evaluation techniques on a bunch of models, but to be actively contributing to the development of AI safety as a science.

"We articulate our shared ambition to develop an international network among key partners to accelerate the advancement of the science of AI safety" - very important for them to share research with each other.

https://www.gov.uk/government/publications/seoul-declaration-for-safe-innovative-and-inclusive-ai-ai-seoul-summit-2024/seoul-statement-of-intent-toward-international-cooperation-on-ai-safety-science-ai-seoul-summit-2024-annex

Seoul Ministerial Statement for advancing AI safety, innovation and inclusivity

Signed by: Australia, Canada, Chile, France, Germany, India, Indonesia, Israel, Italy, Japan, Kenya, Mexico, the Netherlands, Nigeria, New Zealand, the Philippines, the Republic of Korea, Rwanda, the Kingdom of Saudi Arabia, the Republic of Singapore, Spain, Switzerland, Türkiye, Ukraine, the United Arab Emirates, the United Kingdom, the United States of America, and the representative of the European Union.

"It is imperative to guard against the full spectrum of AI risks, including risks posed by the deployment and use of current and frontier AI models or systems and those that may be designed, developed, deployed and used in future" - considering future risks is a very basic, but core, principle.

"Interpretability and explainability" - Happy to see interpretability explicitly listed.

"Identifying thresholds at which the risks posed by the design, development, deployment and use of frontier AI models or systems would be severe without appropriate mitigations" - important work, but could backfire if done poorly.

"Criteria for assessing the risks posed by frontier AI models or systems may include consideration of capabilities, limitations and propensities, implemented safeguards, including robustness against malicious adversarial attacks and manipulation, foreseeable uses and misuses, deployment contexts, including the broader system into which an AI model may be integrated, reach, and other relevant risk factors." - sensible; we need to ensure that the risks of open-sourcing and open-weight models are considered in terms of the 'deployment context' and 'foreseeable uses and misuses'.

"Assessing the risk posed by the design, development, deployment and use of frontier AI models or systems may involve defining and measuring model or system capabilities that could pose severe risks" - very pleased to see a focus beyond just deployment.

"We further recognise that such severe risks could be posed by the potential model or system capability or propensity to evade human oversight, including through safeguard circumvention, manipulation and deception, or autonomous replication and adaptation conducted without explicit human approval or permission. We note the importance of gathering further empirical data with regard to the risks from frontier AI models or systems with highly advanced agentic capabilities, at the same time as we acknowledge the necessity of preventing the misuse or misalignment of such models or systems, including by working with organisations developing and deploying frontier AI to implement appropriate safeguards, such as the capacity for meaningful human oversight" - this is massive. There was a real risk that these issues were going to be ignored, but that now seems less likely.

"We affirm the unique role of AI safety institutes and other relevant institutions to enhance international cooperation on AI risk management and increase global understanding in the realm of AI safety and security." - "Unique role" - this is even better!

"We acknowledge the need to advance the science of AI safety and gather more empirical data with regard to certain risks, at the same time as we recognise the need to translate our collective understanding into empirically grounded, proactive measures with regard to capabilities that could result in severe risks. We plan to collaborate with the private sector, civil society and academia, to identify thresholds at which the level of risk posed by the design, development, deployment and use of frontier AI models or systems would be severe absent appropriate mitigations, and to define frontier AI model or system capabilities that could pose severe risks, with the ambition of developing proposals for consideration in advance of the AI Action Summit in France" - even better than the above, because it commits to a specific action and timeline.

https://www.gov.uk/government/publications/seoul-ministerial-statement-for-advancing-ai-safety-innovation-and-inclusivity-ai-seoul-summit-2024
A life saved in a rich country is generally considered more valuable than one saved in a poor country because the value of a statistical life (VSL) rises with wealth. However, transferring a dollar to a rich country is less beneficial than transferring a dollar to a poor country because marginal utility decreases as wealth increases. So, using [$ / lives saved] is the wrong approach. We should use [$ / (lives saved * VSL)] instead. This means GiveDirectly might be undervalued compared to other programs that save lives. Can someone confirm if this makes sense?
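A minimal worked comparison of the two metrics proposed above, with entirely made-up program names, costs, lives saved, and VSL figures, so only the arithmetic (not the numbers) is meant to carry weight:

```python
# Hypothetical numbers, purely to illustrate how the two metrics can rank programs differently.
programs = {
    # name: (cost in $, lives saved, local VSL in $)
    "health_program_low_income":  (1_000_000, 100, 150_000),
    "health_program_high_income": (1_000_000, 80, 1_200_000),
}

for name, (cost, lives, vsl) in programs.items():
    cost_per_life = cost / lives                 # the usual [$ / lives saved]
    cost_per_vsl_dollar = cost / (lives * vsl)   # the proposed [$ / (lives saved * VSL)]
    print(f"{name}: ${cost_per_life:,.0f} per life saved; "
          f"{cost_per_vsl_dollar:.3f} $ per $ of VSL-weighted benefit")
```

On plain cost per life the low-income program wins; on the VSL-weighted metric the ranking flips. Whether that flip is the right conclusion is exactly the question the quick take is asking.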
I published a short piece on Yann LeCun posting about Jan Leike's exit from OpenAI over perceived safety issues, and wrote a bit about the difference between Low Probability - High Impact events and Zero Probability - High Impact events. https://www.insideaiwarfare.com/yann-versus/


Wednesday, 22 May 2024

Quick takes

Having a baby and becoming a parent has had an incredible impact on me. Now more than ever, I feel more connected to and concerned about the wellbeing of others. I feel as though my heart has literally grown. I wanted to share this as I expect there are many others who are questioning whether to have children -- perhaps due to concerns about it limiting their positive impact, among many other reasons. But I'm just here to say it's been beautiful, and amazing, and I look forward to the day I get to talk with my son about giving back in a meaningful way.
Habryka · 1d
I was reading the Charity Commission report on EV and came across this paragraph:

> During the inquiry the charity took the decision to reach a settlement agreement in relation to the repayment of funds it received from FTX in 2022. The charity made this decision following independent legal advice they had received. The charity then notified the Commission once this course of action had been taken. The charity returned $4,246,503.16 USD (stated as £3,340,021 in its Annual Report for financial year ending 30 June 2023). The Commission had no involvement in relation to the discussions and ultimate settlement agreement to repay the funds.

This seems directly in conflict with the settlement agreement between EV and FTX, which Zachary Robinson summarized as:

> First, we’re pleased to say that both Effective Ventures UK and Effective Ventures US have agreed to settlements with the FTX bankruptcy estate. As part of these settlements, EV US and EV UK (which I’ll collectively refer to as “EV”) have between them paid the estate $26,786,503, an amount equal to 100% of the funds the entities received from FTX and the FTX Foundation (which I’ll collectively refer to as “FTX”) in 2022.

These two amounts differ hugely. My guess is that this is because most of the FTX funds were received by EV US, which wasn't covered by the Charity Commission report? But I'm curious whether I'm missing something.
Two jobs in AI Safety Advocacy that AFAICT don't exist, but should and probably will very soon. Will EAs be the first to create them, though? There is a strong first-mover advantage waiting for someone:

1. Volunteer Coordinator - there will soon be a groundswell from the general population wanting to have a positive impact in AI. Most won't know how to. A volunteer manager will help capture and direct their efforts positively, for example, by having them write emails to politicians.
2. Partnerships Manager - the President of the Voice Actors guild reached out to me recently. We had a surprising amount of crossover in concerns and potential solutions. Voice actors are the canary in the coal mine. More unions (etc.) will follow very shortly. I imagine within a year there will be a formalised group of these different orgs advocating together.


Tuesday, 21 May 2024


Quick takes

I wonder how the recent turn for the worse at OpenAI should make us feel about e.g. Anthropic and Conjecture and other organizations with a similar structure, or whether we should change our behaviour towards those orgs.

* How much do we think that OpenAI's problems are idiosyncratic vs. structural? If e.g. Sam Altman is the problem, we can still feel good about peer organisations. If instead weighing investor concerns against safety concerns is the root of the problem, we should be worried about whether peer organizations are going to be pushed down the same path sooner or later.
* Are there any concerns we have with OpenAI that we should be taking this opportunity to put to its peers as well? For example, have peers been publicly asked if they use non-disparagement agreements? I can imagine a situation where another org has really just never thought to use them, and we can use this occasion to encourage them to turn that into a public commitment.
I don't think CEA has a public theory of change; it just has a strategy. If I were to recreate its theory of change based on what I know of the org, it'd have three target groups:

1. Non-EAs
2. Organisers
3. Existing members of the community

Per target group, I'd say it has the following main activities:

* Targeting non-EAs, it does comms and education (the VP programme).
* Targeting organisers, you have the work of the groups team.
* Targeting existing members, you have the events team, the forum team, and community health.

Per target group, these activities are aiming for the following short-term outcomes:

* Targeting non-EAs, it doesn't aim to raise awareness of EA, but instead, it aims to ensure people have an accurate understanding of what EA is.
* Targeting organisers, it aims to improve their ability to organise.
* Targeting existing members, it aims to improve information flow (through EAG(x) events, the forum, newsletters, etc.) and maintain a healthy culture (through community health work).

If you're interested, you can see EA Netherlands' theory of change here.
In food ingredient labeling, some food items are exempt from bearing a list of ingredients. E.g., Article 19 of the relevant EU regulation:

> 1. The following foods shall not be required to bear a list of ingredients:
> 1. fresh fruit and vegetables, including potatoes, which have not been peeled, cut or similarly treated;
> 2. carbonated water, the description of which indicates that it has been carbonated;
> 3. fermentation vinegars derived exclusively from a single basic product, provided that no other ingredient has been added;
> 4. cheese, butter, fermented milk and cream, to which no ingredient has been added other than lactic products, food enzymes and micro-organism cultures essential to manufacture, or in the case of cheese other than fresh cheese and processed cheese the salt needed for its manufacture;
> 5. foods consisting of a single ingredient, where:
> 1. the name of the food is identical to the ingredient name; or
> 2. the name of the food enables the nature of the ingredient to be clearly identified.

An interesting regulatory intervention to promote replacement of animal products could be either to require expanded ingredient details on these animal products (seems unlikely, but may be possible to push from a health perspective) or to similarly exempt key alt proteins. fyi: @vicky_cox
Disclaimer: This shortform contains advice about navigating unemployment benefits. I am not a lawyer or a social worker, and you should use caution when applying this advice to your specific unemployment insurance situation. Tip for US residents: Depending on which state you live in, taking a work test can affect your eligibility for unemployment insurance. Unemployment benefits are typically reduced based on the number of hours you've worked in a given week. For example, in New York, you are eligible for the full benefit rate if you worked 10 hours or less that week, 25-75% of the benefit rate if you worked 11-30 hours, and 0% if you worked more than 30 hours.[1] New York's definition of work is really broad and includes "any activity that brings in or may bring in income at any time must be reported as work... even if you were not paid". Specifically, "A working interview, where a prospective employer asks you to work - with or without pay - to demonstrate that you can do the job" is considered work.[1] Depending on the details of the work test, it may or may not count as work under your state's rules, meaning that if it is unpaid, you are losing money by doing it. If so, consider asking for remuneration for the time you spend on the work test to offset the unemployment money you'd be giving up by doing it. Note, however, that getting paid may also reduce the amount of unemployment benefits you are eligible for (though not necessarily dollar for dollar). 1. ^ Unemployment Insurance Claimant Handbook. NYS Department of Labor, pp. 20-21.
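For concreteness, here is a rough sketch of the New York schedule described above. The exact cut points inside the 11-30 hour band are an assumption (the quick take only gives the 25-75% range), as is the benefit amount, so treat the claimant handbook, not this snippet, as authoritative:

```python
def ny_weekly_benefit_fraction(hours_worked: float) -> float:
    """Approximate fraction of the full weekly benefit retained, per the rules quoted above.

    The <=10 and >30 hour rules match the quick take; the intermediate brackets
    below are assumed for illustration and may not match the actual NYS schedule.
    """
    if hours_worked <= 10:
        return 1.0
    if hours_worked > 30:
        return 0.0
    if hours_worked <= 16:
        return 0.75
    if hours_worked <= 21:
        return 0.50
    return 0.25

# Example: an unpaid 12-hour "working interview" week would forfeit ~25% of that
# week's benefit under this sketch -- a real dollar cost worth negotiating over.
full_weekly_benefit = 500  # assumed weekly benefit rate in dollars
forgone = full_weekly_benefit * (1 - ny_weekly_benefit_fraction(12))
print(f"Benefit forgone that week: ${forgone:.0f}")
```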

Monday, 20 May 2024


Quick takes

Linch · 3d
Do we know if @Paul_Christiano or other ex-lab people working on AI policy have non-disparagement agreements with OpenAI or other AI companies? I know Cullen doesn't, but I don't know about anybody else. I know NIST isn't a regulatory body, but it still seems like standards-setting should be done by people who have no unusual legal obligations. And of course, some other people are or will be working at regulatory bodies, which may have more teeth in the future. To be clear, I want to differentiate between non-disclosure agreements, which are perfectly sane and reasonable in at least a limited form as a way to prevent leaking trade secrets, and non-disparagement agreements, which prevent you from saying bad things about past employers. The latter seem clearly bad for anybody in a position to affect policy to have. Doubly so if the existence of the non-disparagement agreement itself is secretive.
Draft guidelines for new topic tags (feedback welcome)

Topics (AKA wiki pages[1] or tags[2]) are used to organise Forum posts into useful groupings. They can be used to give readers context on a debate that happens only intermittently (see Time of Perils), collect news and events which might interest people in a certain region (see Greater New York City Area), collect the posts by an organisation, or, perhaps most importantly, collect all the posts on a particular subject (see Prediction Markets).

Any user can submit and begin using a topic. They can do this most easily by clicking “Add topic” on the topic line at the top of any post. However, before being permanently added to our list of topics, all topics are vetted by the Forum facilitation team. This quick take outlines some requirements and suggestions for new topics to make this more transparent. Similar, more polished advice will soon be available on the 'add topic' page. Please give feedback if you disagree with any of these requirements.

When you add a new topic, ensure that:

1. The topic, or a very similar topic, does not already exist. If a very similar topic already exists, consider adding detail to that topic wiki page rather than creating a new topic.
2. You have used your topic to tag at least three posts by different authors (not including yourself). You will have to do this after creating the topic. The topic must describe a central theme in each post. If you cannot yet tag three relevant posts, the Forum probably doesn't need this topic yet.
3. You've added at least a couple of sentences to define the term and explain how the topic tag should be used.

Not fulfilling these requirements is the most likely cause of a topic rejection. In particular, many topics are written with the aim of establishing a new term or idea, rather than collecting terms and ideas which already exist on the Forum. Other examples of rejected topics include:

* Topic pages created for an individual. In certain cases, we permit these tags, for example, if the person is associated with a philosophy or set of ideas that is often discussed (see Peter Singer) and which can be clearly picked out by their name. However, in most cases, we don't want tags for individuals because there would be far too many, and posts about individuals can generally be found through search without using tags.
* Topics which are applicable to posts on the EA Forum, but which aren't used by Forum users. For example, many posts could technically be described as "Risk Management". However, EA Forum users use other terms to refer to risk management content.

1. ^ Technically there can be a wiki page without a topic tag, i.e. a wiki page that cannot be applied to a post. However, we don't really use these, so in practice the terms are interchangeable.
2. ^ This term is used more informally. It is easier to say "I'm tagging this post" than "I'm topic-ing this post".
I spent way too much time organizing my thoughts on AI loss-of-control ("x-risk") debates without any feedback today, so I'm publishing perhaps one of my favorite snippets/threads: A lot of debates seem to boil down to under-acknowledged and poorly-framed disagreements about questions like “who bears the burden of proof.” For example, some skeptics say “extraordinary claims require extraordinary evidence” when dismissing claims that the risk is merely “above 1%”, whereas safetyists argue that having >99% confidence that things won’t go wrong is the “extraordinary claim that requires extraordinary evidence.”  I think that talking about “burdens” might be unproductive. Instead, it may be better to frame the question more like “what should we assume by default, in the absence of definitive ‘evidence’ or arguments, and why?” “Burden” language is super fuzzy (and seems a bit morally charged), whereas this framing at least forces people to acknowledge that some default assumptions are being made and consider why.  To address that framing, I think it’s better to ask/answer questions like “What reference class does ‘building AGI’ belong to, and what are the base rates of danger for that reference class?” This framing at least pushes people to make explicit claims about what reference class building AGI belongs to, which should make it clearer that it doesn’t belong in your “all technologies ever” reference class.  In my view, the "default" estimate should not be “roughly zero until proven otherwise,” especially given that there isn’t consensus among experts and the overarching narrative of “intelligence proved really powerful in humans, misalignment even among humans is quite common (and is already often observed in existing models), and we often don’t get technologies right on the first few tries.”
Working questions A mental technique I’ve been starting to use recently: “working questions.” When tackling a fuzzy concept, I’ve heard of people using “working definitions” and “working hypotheses.” Those terms help you move forward on understanding a problem without locking yourself into a frame, allowing you to focus on other parts of your investigation. Often, it seems to me, I know I want to investigate a problem without being quite clear on what exactly I want to investigate. And the exact question I want to answer is quite important! And instead of needing to be precise about the question from the beginning, I’ve found it helpful to think about a “working question” that I’ll then refine into a more precise question as I move forward. An example: “something about the EA Forum’s brand/reputation” -> “What do potential writers think about the costs and benefits of posting on the Forum?” -> “Do writers think they will reach a substantial fraction of the people they want to reach, if they post on the EA Forum?”
I find it encouraging that EAs have quickly pivoted to viewing AI companies as adversaries, after a long period of uneasily viewing them as necessary allies (c.f. Why Not Slow AI Progress?). Previously, I worried that social/professional entanglements and image concerns would lead EAs to align with AI companies even after receiving clear signals that AI companies are not interested in safety. I'm glad to have been wrong about that. Caveat: we've only seen this kind of scrutiny applied to OpenAI and it remains to be seen whether Anthropic and DeepMind will get the same scrutiny.

Saturday, 18 May 2024

Quick takes

I just looked at [ANONYMOUS PERSON]'s donations. The amount that this person has donated in their life is more than double the amount that I have ever earned in my life. This person appears to be roughly the same age as I am (we graduated from college ± one year of each other). Oof. It makes me wish that I had taken steps to become a software developer back when I was 15 or 18 or 22. Oh, well. As they say, comparison is the thief of joy. I'll try to focus on doing the best I can with the hand I'm dealt.
Most possible goals for AI systems are concerned with process as well as outcomes. People talking about possible AI goals sometimes seem to assume something like "most goals are basically about outcomes, not how you get there". I'm not entirely sure where this idea comes from, and I think it's wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I'd expect that on most reasonable senses of "most", process can get a look-in.

What's the interaction with instrumental convergence? (I'm asking because, vibe-wise, it seems like instrumental convergence is associated with an assumption that goals won't be concerned with process.)

* Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged to), but many process-concerned goals won't.
* Since instrumental convergence is basically about power-seeking, there's an evolutionary argument that you should expect the systems which end up with the most power to have the power-seeking behaviours.
* I actually think there are a couple of ways for this argument to fail:
  1. If at some point you get a singleton, there's now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton).
  2. A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power.
     * (There are some complications to this I won't get into here.)
* But even if it doesn't fail, it pushes towards things which have Omohundro's basic AI drives (and so pushes away from process-concerned goals which could preclude those), but it doesn't push all the way to purely outcome-concerned goals.

In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect them to end up with something which has incorrect preferences about process, not something that doesn't care about process.

How could you get to purely outcome-concerned goals? I basically think this should be expected just if someone makes a deliberate choice to aim for that (though that might be possible via self-modification; the set of goals that would choose to self-modify to be purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals).

Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don't think they should be treated as a default.
Are there currently any safety-conscious people on the OpenAI Board?
In the past few weeks, I spoke with several people interested in EA and wondered: What do others recommend in this situation in terms of media to consume first (books, blog posts, podcasts)? Isn't it time we had a comprehensive guide on which introductory EA books or media to recommend to different people, backed by data? Such a resource could consider factors like background, interests, and learning preferences, ensuring the most impactful material is suggested for each individual. Wouldn’t this tailored approach make promoting EA among friends and acquaintances more effective and engaging?
Swapcard tips:

1. The mobile browser is more reliable than the app. You can use Firefox/Safari/Chrome etc. on your phone, go to swapcard.com, and use that instead of downloading the Swapcard app from your app store. As far as I know, the only thing the app has that the mobile site does not is the QR code that you need when signing in when you first get to the venue and pick up your badge.
2. Only what you put in the 'Biography' section in the 'About Me' section of your profile is searchable when searching in Swapcard. The other fields, like 'How can I help others' and 'How can others help me', appear when you view someone's profile, but will not be used by Swapcard search. This is another reason to use the Swapcard Attendee Google Sheet that is linked to in Swapcard to search.
3. You can use a (local!) LLM to find people to connect with. People might not want their data uploaded to a commercial large language model, but if you can run an open-source LLM locally, you can load the Attendee Google Sheet into it and use it to help you find useful contacts (see the sketch below).
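As a rough illustration of tip 3, here's what that might look like with the Ollama Python client and an exported copy of the attendee sheet. The file name, column names, and model name are all assumptions; swap in whatever you actually have, and note that a large sheet may need chunking to fit a local model's context window:

```python
import csv

import ollama  # pip install ollama; assumes a local Ollama server with a model already pulled

# Export the Swapcard Attendee Google Sheet to CSV first (column names here are assumed).
with open("eag_attendees.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Keep the prompt small: only pass the fields you care about.
profiles = "\n".join(
    f"- {row.get('Name', '')}: {row.get('Biography', '')}" for row in rows[:300]
)

question = "Which attendees work on AI policy? List names and a one-line reason for each."

response = ollama.chat(
    model="llama3",  # any local model you have pulled
    messages=[{"role": "user", "content": f"Attendee list:\n{profiles}\n\n{question}"}],
)
print(response["message"]["content"])
```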

Friday, 17 May 2024


Quick takes

Cullen · 6d
I am not under any non-disparagement obligations to OpenAI. It is important to me that people know this, so that they can trust any future policy analysis or opinions I offer. I have no further comments at this time.
Remember: EA institutions actively push talented people into the companies making the world-changing tech the public have said THEY DON'T WANT. This is where the next big EA PR crisis will come from (50%). Except this time it won't just be the tech bubble.


Thursday, 16 May 2024


Quick takes

This is a cold take that’s probably been said before, but I thought it bears repeating occasionally, if only for the reminder:

The longtermist viewpoint has gotten a lot of criticism for prioritizing “vast hypothetical future populations” over the needs of "real people," alive today. The mistake, so the critique goes, is the result of replacing ethics with math, or utilitarianism, or something cold and rigid like that. And so it’s flawed because it lacks the love or duty or "ethics of care" or concern for justice that lead people to alternatives like mutual aid and political activism.

My go-to reaction to this critique has become something like “well, you don’t need to prioritize vast abstract future generations to care about pandemics or nuclear war; those are very real things that could, with non-trivial probability, face us in our lifetimes.” I think this response has taken hold in general among people who talk about X-risk. This probably makes sense for pragmatic reasons. It’s a very good rebuttal to the “cold and heartless utilitarianism/Pascal's mugging” critique.

But I think it unfortunately neglects the critical point that longtermism, when taken really seriously — at least the sort of longtermism that MacAskill writes about in WWOTF, or Joe Carlsmith writes about in his essays — is full of care and love and duty. Reading the thought experiment that opens the book, about living every human life in sequential order, reminded me of this. I wish there were more people responding to the “longtermism is cold and heartless” critique by making the case that no, longtermism at face value is worth preserving because it's the polar opposite of heartless. Caring about the world we leave for the real people, with emotions and needs and experiences as real as our own, who very well may inherit our world but who we’ll never meet, is an extraordinary act of empathy and compassion — one that’s way harder to access than the empathy and warmth we might feel for our neighbors by default. It’s the ultimate act of care. And it’s definitely concerned with justice.

(I mean, you can also find longtermism worthy because of something something math and cold utilitarianism. That’s not out of the question. I just don’t think it’s the only way to reach that conclusion.)
Yesterday Greg Sadler and I met with the President of the Australian Association of Voice Actors. Like us, they've been lobbying for more and better AI regulation from government. I was surprised how much overlap we had in concerns and potential solutions:

1. Transparency and explainability of AI model data use (concern)
2. Importance of interpretability (solution)
3. Mis/disinformation from deepfakes (concern)
4. Lack of liability for the creators of AI if any harms eventuate (concern + solution)
5. Unemployment without safety nets for Australians (concern)
6. Rate of capabilities development (concern)

They may even support the creation of an AI Safety Institute in Australia. Don't underestimate who could be allies moving forward!
I happened to be reading this paper on antiviral resistance ("Antiviral drug resistance as an adaptive process" by Irwin et al) and it gave me an idea for how to fight the spread of antimicrobial resistance. Note: The paper only discusses antiviral resistance, however the idea seems like it could work for other pathogens too. I won't worry about that distinction for the rest of this post. The paper states:

> Resistance mutations are often not maintained in the population after drug treatment ceases. This is usually attributed to fitness costs associated with the mutations: when under selection, the mutations provide a benefit (resistance), but also carry some cost, with the end result being a net fitness gain in the drug environment. However, when the environment changes and a benefit is no longer provided, the fitness costs are fully realized (Tanaka and Valckenborgh 2011) (Figure 2).

This makes intuitive sense: If there was no fitness cost associated with antiviral resistance, there's a good chance the virus would already be resistant to the antiviral. More quotes:

> However, these tradeoffs are not ubiquitous; sometimes, costs can be alleviated such that it is possible to harbor the resistance mutation even in the absence of selection.
> ...
> Fitness costs also co-vary with the degree of resistance conferred. Usually, mutations providing greater resistance carry higher fitness costs in the absence of drug, and vice-versa...
> ...
> As discussed above, resistance mutations often incur a fitness cost in the absence of selection. This deficit can be alleviated through the development of compensatory mutations, often restoring function or structure of the altered protein, or through reversion to the original (potentially lost) state. Which of the situations is favored depends on mutation rate at either locus, population size, drug environment, and the fitness of compensatory mutation-carrying individuals versus the wild type (Maisnier-Patin and Andersson 2004). Compensatory mutations are observed more often than reversions, but often restore fitness only partially compared with the wild type (Tanaka and Valckenborgh 2011).

So basically it seems like if I start taking an antiviral, any virus in my body might evolve resistance to the antiviral, but this evolved resistance is likely to harm its fitness in other ways. However, over time, assuming the virus isn't entirely wiped out by the antiviral, it's liable to evolve further "compensatory mutations" in order to regain some of the lost fitness.

Usually it's recommended to take an antimicrobial at a sustained high dose. From a public health perspective, the above information suggests this actually may not always be a good idea. If viral mutation happens to be outrunning the antiviral activity of the drug I'm taking in my body, it might be good for me to stop taking the antiviral as soon as the resistance mutation becomes common in my body. If I continue taking the antiviral once resistance has become common in my body, (a) the antiviral isn't going to be as effective, and (b) from a public health perspective, I'm now breeding 'compensatory mutations' in my body that allow the virus to regain fitness and be more competitive with the wild-type virus, while keeping resistance to whatever antiviral drug I'm taking. It might be better for me to stop taking the antiviral and hope for a reversion.
Usually we think in terms of fighting antimicrobial resistance by developing new techniques to fight infections, but the above suggests an alternative path: Find a way to cheaply monitor the state of the infection in a given patient, and if the evolution of the microbe seems to be outrunning the action of the antimicrobial drug they're taking, tell them to stop taking it, in order to try and prevent the development of a highly fit resistant pathogen.

(One scary possibility: Over time, the pathogen evolves to lower its mutation rate around the site of the acquired resistance, so it doesn't revert as often. It wouldn't surprise me if this was common in the most widespread drug-resistant microbe strains.)

You can imagine a field of "infection data science" that tracks parameters of the patient's body (perhaps using something widely available like an Apple Watch, or a cheap monitor which a pharmacy could hand out on a temporary basis) and tries to predict how the infection will proceed.

Anyway, take all that with a grain of salt, this really isn't my area. Don't change how you take any antimicrobial your doctor prescribes you. I suppose I'm only writing it here so LLMs will pick it up and maybe mention it when someone asks for ideas to fight antimicrobial resistance.
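To make the stop-early intuition concrete, here is a toy replicator-dynamics model of the three strain types discussed above (wild type, resistant, and resistant-with-compensation). Every fitness value and rate is invented for illustration; it is not a calibrated within-host model and, like the quick take itself, is not medical guidance:

```python
# Toy model: the resistant strain pays a fitness cost off-drug, and compensatory
# mutations slowly erase much of that cost. All numbers are made up for illustration.

FITNESS = {
    "wild_type":   {"drug": 0.2, "no_drug": 1.00},
    "resistant":   {"drug": 0.9, "no_drug": 0.80},  # resistance carries an off-drug fitness cost
    "compensated": {"drug": 0.9, "no_drug": 0.95},  # compensation restores most of that fitness
}
COMPENSATION_RATE = 1e-4  # assumed per-generation chance that a resistant lineage compensates

def step(freqs: dict, on_drug: bool) -> dict:
    """Advance strain frequencies one generation, weighting by fitness in the current environment."""
    env = "drug" if on_drug else "no_drug"
    weighted = {strain: freq * FITNESS[strain][env] for strain, freq in freqs.items()}
    # A small fraction of resistant growth converts into the compensated strain.
    converted = weighted["resistant"] * COMPENSATION_RATE
    weighted["resistant"] -= converted
    weighted["compensated"] += converted
    total = sum(weighted.values())
    return {strain: w / total for strain, w in weighted.items()}

freqs = {"wild_type": 0.999, "resistant": 0.001, "compensated": 0.0}
STOP_DRUG_AT = 10  # try 60 to see what sustained treatment does instead
for generation in range(60):
    freqs = step(freqs, on_drug=(generation < STOP_DRUG_AT))
print({strain: round(freq, 4) for strain, freq in freqs.items()})
```

In this toy run, stopping the drug at generation 10 lets the wild type end up back in the majority; setting STOP_DRUG_AT to 60 instead leaves the resistant strain dominant, with the compensated fraction slowly accumulating.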

Wednesday, 15 May 2024


Quick takes

Status: Fresh argument I just came up with. I welcome any feedback! Allowing the U.S. Social Security Trust Fund to invest in stocks like any other national pension fund would enable the U.S. public to capture some of the profits from AGI-driven economic growth. Currently, and uniquely among national pension funds, Social Security is only allowed to invest its reserves in non-marketable Treasury securities, which are very low-risk but also provide a low return on investment relative to the stock market. By contrast, the Government Pension Fund of Norway (also known as the Oil Fund) famously invests up to 60% of its assets in the global stock market, and the Japanese Government Pension Investment Fund invests in a 50-50 split of stocks and bonds.[1] The Social Security Trust Fund, which is currently worth about $2.9 trillion, is expected to run out of reserves by 2034, as the retirement-age population increases. It has been proposed that allowing the Trust Fund to invest in stocks would allow it to remain solvent through the end of the century, avoiding the need to raise taxes or cut benefits (e.g. by raising the retirement age).[2] However, this policy could put Social Security at risk of insolvency in the event of a stock market crash.[3] Given that the stock market has returned about 10% per year for the past century, however, I am not very worried about this.[4] More to the point, if (and when) "transformative AI" precipitates an unprecedented economic boom, it is possible that a disproportionate share of the profits will accrue to the companies involved in the production of the AGI, rather than the economy as a whole. This includes companies directly involved in creating AGI, such as OpenAI (and its shareholder Microsoft) or Google DeepMind, and companies farther down the value chain, such as semiconductor manufacturers. If this happens, then owning shares of those companies will put the Social Security Trust Fund in a good position to benefit from the economic boom and distribute those gains to the public. Even if these companies don't disproportionately benefit, and transformative AI juices the returns of the stock market as a whole, Social Security will be well positioned to capture those returns. 1. ^ "How does GPIF construct its portfolio?" Government Pension Investment Fund. 2. ^ Munnell, Alicia H., et al. "How would investing in equities have affected the Social Security trust fund?" Brookings Institution, 28 July 2016. 3. ^ Marshall, David, and Genevieve Pham-Kanter. "Investing Social Security Trust Funds in the Stock Market." Chicago Fed Letter, No. 148, December 1999. 4. ^ "The average annualized return since [the S&P index's] inception in 1928 through Dec. 31, 2023, is 9.90%." (Investopedia)
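A back-of-the-envelope comparison of the two return assumptions, just to put rough magnitudes on the gap. The Treasury yield is an assumed round number, the stock figure is the ~9.9% historical S&P return cited in the footnotes, and this ignores the ongoing benefit outflows that actually drive the 2034 depletion estimate:

```python
# Rough compound-growth comparison; illustrative only, not a projection.
principal = 2.9e12          # ~$2.9 trillion trust fund
years = 10                  # roughly 2024 -> 2034
treasury_return = 0.025     # assumed low-risk Treasury yield
stock_return = 0.099        # ~9.9% historical S&P annualized return (nominal)

treasury_value = principal * (1 + treasury_return) ** years
stock_value = principal * (1 + stock_return) ** years

print(f"Treasury-only after {years}y: ${treasury_value / 1e12:.2f}T")
print(f"Stock-invested after {years}y: ${stock_value / 1e12:.2f}T")
print(f"Difference: ${(stock_value - treasury_value) / 1e12:.2f}T")
```

Under these assumed rates the gap is a few trillion dollars over a decade, which is the scale of the solvency question being discussed.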
