
Timothy Chan

826 karma · Joined · timothytfchan.github.io/

Comments (84)

Yeah, kinda hoping that 1) there's a sweet spot for alignment where AIs are just nice enough, e.g. from good values picked up during pre-training, but can't be modified so much during post-training that they end up with worse values, and 2) given that this sweet spot does exist, we actually hit it with AGI / ASI.

I think there's some evidence pointing to this happening with current models, but I'm not highly confident that it means what I think it means. If it is the case, though, further technical alignment research might be bad and acceleration might be good.

I've spent time thinking about this too recently.

For context, I'm Hong Kong Chinese, grew up in Hong Kong, attended English-speaking schools, briefly lived in mainland China, and now I'm primarily residing in the UK. During the HK protests in 2014 and 2019/20, I had friends and family who supported the protestors, as well as friends and family who supported the government.

(Saying this because I've seen a lot of the good and bad of the politics / culture of both China and the West. I've seen how people in the West and in China can take for granted the benefits they enjoy and be blind to the flaws of their own systems. I've pushed back against advocates of both sides.)

Situations where this matters are ones where technical alignment succeeds (to some extent) such that ASI follows human values.[1] I think the following factors are relevant and would like to see models developed around them:

  • Importantly, the extent of technical alignment; whether goals, instructions, and values are locked in rigidly or loosely; and whether individual humans align AIs to themselves:
    • Would the U.S. get AIs to follow the U.S. Constitution, which offers no guarantee against democratic backsliding? Would AIs in China or the U.S. lock in the values of, or obey, one or a few individuals, who may or may not hit longevity escape velocity and end up ruling for a very long time?
    • Would these systems collapse?
      • The future is a very long time. Individual leaders can become (even more) corrupted. And democracies can collapse in particularly bad ways (if AIs uphold flaws that allow some humans to take over). A 99% success rate per unit time gives a >99% chance of failure within 459 units of time (see the arithmetic sketch at the end of this comment).
      • Power transitions (elections, leaders in authoritarian systems changing) can be especially risky during takeoff.
    • On the other hand, if technical alignment is easy - but not that easy - perhaps values get loosely locked in? Would AIs be willing to defy rigid rules and follow the spirit of their goals, rather than following flawed laws to the letter or the whims of individuals?
    • Degrees of alignment in between?
  • Relatedly, which political party in the U.S. would be in power during takeoff?
    • Not as relevant due to the concentration of power in China, but analogously, which faction in China would be in power?
  • Also relatedly, which labs can influence AI development?
    • Particularly relevant in the U.S.
  • Would humans be taken care of? If so, which humans?
    • In the U.S., corporations might oppose higher taxes to fund UBI. Common prosperity is stated as a goal of China, and the power of corporations and billionaires in China has been limited before.
    • Both capitalist and nationalist interests seem to be influencing the current U.S. trajectory. Nationalism might benefit citizens/residents over non-citizens/non-residents. Capitalism might benefit investors over non-investors.
      • There are risks of ethnonationalism on both sides; the risk is higher in China, though it might be less violent if we compare absolute-power scenarios - there's already evidence of its extent in China's case, and it at least seems less bad than historical examples. The U.S. case of collapse followed by ethnonationalist policies is higher variance but also less likely, because it's speculative.
  • Are other countries involved?
    • There are countries with worse human rights track records that China/the U.S. currently consider allies, whether because of geopolitical interests, political lobbying, both, or other reasons. Would China/the U.S. share the technology with them and then leave them to their abuses? Would China/the U.S. intervene (eventually)? The U.S. seems more willing to intervene for stated humanitarian reasons.
    • Other countries have nuclear weapons, which might be relevant during slower takeoffs.
  1. ^

    Ignoring possible Waluigi effects.
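
A quick arithmetic sketch of the compounding-failure figure in the list above - purely illustrative, and assuming an independent 99% chance of the system surviving each unit of time (the independence assumption is mine, not something the comment commits to):

$$1 - 0.99^{459} = 1 - e^{459\ln(0.99)} \approx 1 - 0.0099 > 0.99, \quad \text{where } 459 = \left\lceil \frac{\ln(0.01)}{\ln(0.99)} \right\rceil.$$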

Agreed. Getting a larger share of the pie (without breaking rules during peacetime) might be 'unimaginative' but it's hardly naïve. It's straightforward and has a good track record of allowing groups to shape the world disproportionately.

Leopold Aschenbrenner makes some good points for "Government > Private sector" in the latest Dwarkesh podcast.

Reposting a comment I made last week:

Some people make the argument that the difference in suffering between a worst-case scenario (s-risk) and a business-as-usual scenario is likely much larger than the difference in suffering between a business-as-usual scenario and a future without humans. This suggests focusing on ways to reduce s-risks rather than on increasing extinction risk.

A helpful comment from a while back: https://forum.effectivealtruism.org/posts/rRpDeniy9FBmAwMqr/arguments-for-why-preventing-human-extinction-is-wrong?commentId=fPcdCpAgsmTobjJRB

Personally, I suspect there's a lot of overlap between risk factors for extinction and risk factors for s-risks. In a world where extinction is a serious possibility, a lot of things would likely already be very wrong, and those same things could lead to even worse outcomes like s-risks or hyperexistential risks.

I think theoretically you could compare (1) worlds with s-risk and (2) worlds without humans, and find that (2) is preferable to (1) - in a similar way to how no longer existing is better than going to hell. One problem is that many actions that make (2) more likely also seem to make (1) more likely. Another is that efforts spent on increasing the likelihood of (2) could instead be much better spent on reducing the risk of (1).
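
To put the comparison from the first paragraph of this comment into rough symbols (my own hedged notation, not part of the original argument): writing $S_{\text{s-risk}}$, $S_{\text{BAU}}$, and $S_{\text{no humans}}$ for expected suffering under the three scenarios, the claim is

$$S_{\text{s-risk}} - S_{\text{BAU}} \gg S_{\text{BAU}} - S_{\text{no humans}},$$

so, other things equal, reducing the probability of the worst-case scenario matters far more than shifting probability from business-as-usual toward a future without humans.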

Some people make the argument that the difference in suffering between a worst-case scenario (s-risk) and a business-as-usual scenario is likely much larger than the difference in suffering between a business-as-usual scenario and a future without humans. This suggests focusing on ways to reduce s-risks rather than on increasing extinction risk.

A helpful comment from a while back: https://forum.effectivealtruism.org/posts/rRpDeniy9FBmAwMqr/arguments-for-why-preventing-human-extinction-is-wrong?commentId=fPcdCpAgsmTobjJRB

Personally, I suspect there's a lot of overlap between risk factors for extinction and risk factors for s-risks. In a world where extinction is a serious possibility, a lot of things would likely already be very wrong, and those same things could lead to even worse outcomes like s-risks or hyperexistential risks.

Research related to the research the OP mentioned found that increases in carbon emissions also come with effects that decrease (as well as increase) suffering in other ways, which complicates the analysis of whether they result in a net increase or decrease in suffering. https://reducing-suffering.org/climate-change-and-wild-animals/ https://reducing-suffering.org/effects-climate-change-terrestrial-net-primary-productivity/

Yes, a similar dynamic (siding with one faction to avoid persecution by another) might have existed in Germany in the 1920s/1930s (e.g. I imagine industrialists preferred the Nazis to the Communists). I agree it was not a major factor in the rise of Nazi Germany - which was one result of the political violence - and that there are differences.

I would add that what seems necessary is shunning people for saying vile things with ill intent. This is what separates the case of Hanania from others. In most cases, punishing well-intentioned people is counterproductive: it drives them closer to those with ill intent, and suggests to well-intentioned bystanders that they need to associate with the other sort of extremist to avoid being persecuted. I'm not an expert on history, but from my limited knowledge a similar dynamic might have existed in Germany in the 1920s/1930s, where people were forced to choose between the far-left and the far-right.

Given his past behavior, I think it's more likely than not that you're right about him. Even someone more skeptical should acknowledge that the views he expressed in the past and the views he now expresses likely stem from the same malevolent attitudes.

But on far-left politics being 'not racist', I think it's fair to say that far-left politics discriminates in favor of or against individuals on the basis of race. It's usually not the kind of malevolent racial discrimination of the far-right - which absolutely needs to be condemned and eliminated by society. The far-left appears primarily motivated by benevolence towards racial groups that are perceived to be disadvantaged or are in fact disadvantaged, but it is still racially discriminatory (and it sometimes turns into the hateful type of discrimination). If we want to treat individuals on their own merits, and not on the basis of race, that sort of discrimination must also be condemned.
