Harrison Durland


I definitely think “beware” is too strong. I would recommend “discount,” “be skeptical,” or something similar.

Venus is an extreme example of an Earth-like planet with a very different climate. There is nothing in physics or chemistry that says Earth's temperature could not one day exceed 100 °C.
[Regarding ice melting:] That will take time, but very little time on a cosmic scale, maybe a couple of thousand years.

I'll be blunt: remarks like these undermine your credibility. Regardless, I don't have any experience or contributions to offer on climate change, other than to re-emphasize my general impression: as a person who cares a lot about existential risk and has talked to various other people who also care a lot about existential risk, it seems to me that there is very strong scientific evidence suggesting that extinction from climate change is unlikely.

Everything is going more or less as the scientists predicted; if anything, it's worse.

I'm not that focused on climate science, but my understanding is that this is a bit misleading in your context: there were some scientists in the (90s/2000s?) who forecast doom, or at least major disaster within a few decades, due to feedback loops or other dynamics that never materialized. More broadly, my understanding is that forecasting the climate has proven very difficult, even if some broad conclusions (e.g., "the climate is changing," "humans contribute to climate change") have held up. Additionally, it seems that many engineers and scientists underestimated the pace of alternative energy technology (e.g., solar).


That aside, I would be excited to see someone work on this project, and I still have not discovered any such database.

I don't find this response to be a compelling defense of what you actually wrote:

since AIs would "get old" too [...] they could also have reason to not expropriate the wealth of vulnerable old agents because they too will be in such a vulnerable position one day

It's one thing if the argument is "there will be effective enforcement mechanisms which prevent theft," but the original statement still just seems to imagine that norms will be a non-trivial reason to avoid theft, which seems quite unlikely for a moderately rational agent.

Ultimately, perhaps much of your scenario was trying to convey a different idea from what I see as the straightforward interpretation, but that makes it hard for me to engage with it productively, as it feels like engaging with a motte-and-bailey.

Apologies for being blunt, but the scenario you lay out is full of claims that just seem to completely ignore very facially obvious rebuttals. This would be less bad if you didn’t seem so confident, but as written the perspective strikes me as naive and I would really like an explanation/defense.

Take for example:

Furthermore, since AIs would "get old" too, in the sense of becoming obsolete in the face of new generations of improved AIs, they could also have reason to not expropriate the wealth of vulnerable old agents because they too will be in such a vulnerable position one day, and thus would prefer not to establish a norm of expropriating the type of agent they may one day become.

Setting aside the debatable assumptions about AIs getting “old,” this just seems to completely ignore the literature on collective action problems. If the scenario were such that any one AI agent can expect to get away with defecting (expropriating from older agents), and breaking the norm requires passing a non-small threshold of such actions, a rational agent will recognize that its own defection has minimal impact on what the collective will do, so it may as well defect before others do.
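To make the collective-action point concrete, here is a minimal sketch of a threshold model. It is my own toy illustration, not something from the scenario under discussion: the helper names `prob_pivotal` and `net_gain_from_defecting`, the assumption that each other agent defects independently with probability `p`, and the payoff numbers are all hypothetical. The key idea is that an agent's defection only changes whether the norm collapses if it is pivotal, i.e., if exactly `threshold - 1` others also defect.

```python
from math import exp, lgamma, log


def prob_pivotal(n_others: int, threshold: int, p: float) -> float:
    """Probability that exactly (threshold - 1) of the other agents defect,
    so that this agent's own choice decides whether the norm collapses.
    Hypothetical assumption: others defect independently with probability p."""
    k = threshold - 1
    if k < 0 or k > n_others:
        return 0.0
    if p in (0.0, 1.0):
        # Degenerate cases: nobody else defects (p=0) or everybody does (p=1).
        return 1.0 if k == (0 if p == 0.0 else n_others) else 0.0
    # Binomial pmf computed in log space to avoid overflow/underflow.
    log_pmf = (lgamma(n_others + 1) - lgamma(k + 1) - lgamma(n_others - k + 1)
               + k * log(p) + (n_others - k) * log(1 - p))
    return exp(log_pmf)


def net_gain_from_defecting(gain: float, future_cost: float,
                            n_others: int, threshold: int, p: float) -> float:
    """Immediate gain from expropriating, minus the expected future cost that
    arises only through this agent's own effect on whether the norm collapses."""
    return gain - future_cost * prob_pivotal(n_others, threshold, p)


# Large population: the norm collapses only if 500 of 1,000 agents defect.
print(net_gain_from_defecting(gain=1.0, future_cost=1000.0,
                              n_others=999, threshold=500, p=0.1))
# Tiny population: this agent is often pivotal, so the future cost dominates.
print(net_gain_from_defecting(gain=1.0, future_cost=1000.0,
                              n_others=2, threshold=2, p=0.5))
```

With 999 other agents and a 500-defector threshold, the chance of being pivotal is astronomically small, so the net gain is essentially the full immediate gain, even with a future cost 1,000 times larger. In the three-agent case, the same cost swamps the gain. Independence is a strong simplification (agents might coordinate), but it shows why "I'd rather not establish that norm" carries little weight for any single agent in a large population.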

There are multiple other problems in your post, but I don’t think it’s worth the time to go through them all. I just felt compelled to comment because I was baffled by the karma on this post, unless people were simply upvoting it because they agreed with the beginning portion…?

Sure! (I just realized the point about the MNIST dataset problems wasn't fully explained in my shared memo, but I've fixed that now)

Per the assessment section, some of the problems with assuming that FRVT demonstrates NIST's capabilities for evaluation of LLMs/etc. include:

  1. Facial recognition is a relatively "objective" test—i.e., the answers can be linked to some form of "definitive" answer or correctness metric (e.g., name/identity labels). In contrast, many of the potential metrics of interest with language models (e.g., persuasiveness, knowledge about dangerous capabilities) may not have a "definitive" evaluation method, where following X procedure reliably evaluates a response (and does so in a way that onlookers would look silly to dispute).
  2. The government arguably had some comparative advantage in specific types of facial image data, due to collecting millions of these images with labels. The government doesn't have a comparative advantage in, e.g., text data.
  3. The government has not at all kept pace with private/academic benchmarks for most other ML capabilities, such as non-face image recognition (e.g., Common Objects in Context) and LLMs (e.g., SuperGLUE).
  4. It's honestly not even clear to me whether FRVT's technical quality truly is the "gold standard" in comparison to the other public training/test datasets for facial recognition (e.g., MegaFace); it seems plausible that the value of FRVT is largely just that people can't easily cheat on it (unlike datasets where the test set is publicly available) because of how the government administers it.

For the MNIST case, I now have the following in my memo:

Even NIST’s efforts with handwriting recognition were of debatable quality: Yann LeCun's widely used MNIST is a modification of NIST's datasets, in part because NIST's approach used Census Bureau employees’ handwriting for the training set and high school students’ handwriting for the test set.[1]


[1] Some may argue this split was justified at the time because it required that models “generalize” beyond the training set. However, popular usage appears to have favored MNIST’s approach. Additionally, it is unclear that one could effectively generalize from the handwriting of a narrow and potentially unrepresentative segment of society (professional bureaucrats) to that of high schoolers, and the assumption that this would be necessary (e.g., due to an inability to get more representative data) seems unrealistic.

Seeing the drama with the NIST AI Safety Institute and Paul Christiano's appointment and this article about the difficulty of rigorously/objectively measuring characteristics of generative AI, I figured I'd post my class memo from last October/November.

The main point I make is that NIST may not be well suited to creating measurements for complex, multi-dimensional characteristics of language models—and that some people may be overestimating the capabilities of NIST because they don't recognize how incomparable the Facial Recognition Vendor Test is to this situation of subjective metrics for GenAI and they don't realize NIST arguably even botched MNIST (which was actually produced by Yann LeCun by recompiling NIST's datasets). Moreover, government is slow, while AI is fast. Instead, I argue we should consider an alternative model such as federal funding for private/academic benchmark development (e.g., prize competitions).

I wasn't sure if this warranted a full post, especially since it feels a bit late; LMK if you think otherwise!

I probably should have been clearer: my true "final" paper actually didn't focus on this aspect of the model. The offense-defense balance was the original motivation for my cyber model, but I eventually became far more interested in using the model to test how large language models could improve agent-based modeling by controlling actors in the simulation. I have a final model writeup that explains some of the modeling choices and discusses the original offense/defense purpose in more detail.

(I could also provide the model code which is written in Python and, last I checked, runs fine, but I don't expect people would find it to be that valuable unless they really want to dig into this further, especially given that it might have bugs.)

If offence and defence both get faster, but all the relative speeds stay the same, I don’t see how that in itself favours offence

Funny you should say this: it so happens that I submitted a final paper just last night for an agent-based model meant to test exactly this kind of claim about the impacts of improving “technology” (AI) in cybersecurity. Granted, the model was extremely simple and incomplete, but its theoretical results explain how this could be possible.

In short, assume a fixed number of vulnerabilities in an attack surface. While attackers’ and defenders’ budgets are very small, many vulnerabilities may go unnoticed by both sides. For example, suppose that together they can explore only 10% of the attack surface, while vulnerabilities occupy only 1% of it. Thus, even if attacker and defender budgets increase by the same factor (e.g., 10x), the increase raises the likelihood that any given vulnerability is found by the attacker or the defender, which means the attacker also gets to more vulnerabilities first in absolute terms.
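The arithmetic behind this example can be sketched in closed form. This is a simplified stand-in rather than the code from my model: it assumes (hypothetically) that each side searches an independent, uniformly random fraction of the surface, that the 10% joint budget splits evenly into 5% each (ignoring overlap), and that when both sides find the same vulnerability, each is first with probability 1/2. `found_and_stolen` is a made-up helper name.

```python
def found_and_stolen(f_atk: float, f_def: float) -> tuple[float, float]:
    """Per-vulnerability probabilities under a toy model (assumptions above).
    f_atk and f_def are the fractions of the attack surface each side searches."""
    # Found by at least one side.
    p_found = 1 - (1 - f_atk) * (1 - f_def)
    # Stolen: the attacker finds it and the defender doesn't, or both find it
    # and the attacker wins the (assumed) coin flip for who gets there first.
    p_stolen = f_atk * (1 - f_def) + 0.5 * f_atk * f_def
    return p_found, p_stolen


for label, f in [("baseline (5% each)", 0.05), ("10x budgets (50% each)", 0.5)]:
    found, stolen = found_and_stolen(f, f)
    print(f"{label}: found={found:.4f}, stolen={stolen:.4f}")
```

Under these assumptions, the per-vulnerability chance of being found rises from about 0.10 to 0.75, and the chance of being stolen rises from about 0.05 to 0.375. With symmetric budgets the attacker still wins half of the vulnerabilities that get found, but far more of them get found, so expected losses grow by roughly 7.7x even though neither side gained a relative edge.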

The following results are admittedly not very reliable (I didn’t do any formal verification/validation beyond spot checks), but the point of showing these graphs is not “here are the definitive numbers” but rather an illustrative “here is what the pattern of relationships between attack surface, attack/defense budgets, and theft rate could look like.”


Notice how, as the attack surface increases, multiplying the attackers’ and defenders’ budgets causes more convergence. With a hypothetical 1x1 attack surface (grid) for each actor, budget multiplication should have no effect on loss rates, because all vulnerabilities are found and it’s just a matter of who finds them first, which budget multiplication does not affect. However, with a hypothetical infinite-by-infinite grid, multiplying budgets strictly benefits the attacker, because the defenders will almost never check the same squares that the attacker checks.
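A small Monte Carlo sketch can illustrate both limiting cases. Again, this is a hypothetical reconstruction, not my actual model code: it assumes a grid of `grid_cells` squares with `n_vulns` randomly placed vulnerabilities, both sides check one square per tick in an independent random order up to their (equal) `budget`, a vulnerability the defender reaches first counts as patched, the attacker steals any vulnerability it reaches first, and simultaneous discoveries are settled by a coin flip.

```python
import random


def simulate_theft_rate(grid_cells: int, n_vulns: int, budget: int,
                        trials: int = 2000, seed: int = 0) -> float:
    """Fraction of vulnerabilities stolen in a toy attacker/defender search model."""
    if not 1 <= n_vulns <= grid_cells:
        raise ValueError("need 1 <= n_vulns <= grid_cells")
    rng = random.Random(seed)
    budget = min(budget, grid_cells)  # nobody can check more squares than exist
    stolen = 0
    for _ in range(trials):
        vulns = rng.sample(range(grid_cells), n_vulns)
        # Each side checks `budget` distinct squares, one per tick, in random order.
        atk_tick = {cell: t for t, cell in enumerate(rng.sample(range(grid_cells), budget))}
        def_tick = {cell: t for t, cell in enumerate(rng.sample(range(grid_cells), budget))}
        for v in vulns:
            if v not in atk_tick:
                continue  # attacker never reaches this vulnerability
            if v not in def_tick or atk_tick[v] < def_tick[v]:
                stolen += 1  # attacker gets there first (or defender never does)
            elif atk_tick[v] == def_tick[v] and rng.random() < 0.5:
                stolen += 1  # simultaneous discovery: coin flip
    return stolen / (trials * n_vulns)


for grid_cells, n_vulns, base_budget in [(1, 1, 1), (10_000, 10, 10)]:
    for multiplier in (1, 10):
        rate = simulate_theft_rate(grid_cells, n_vulns, base_budget * multiplier)
        print(f"grid={grid_cells:>6}, budget x{multiplier:<2}: theft rate ~ {rate:.4f}")
```

In the one-square case, the theft rate should hover around 0.5 regardless of the multiplier, because both sides always find the single vulnerability at the same tick. In the large grid, the theft rate scales roughly with budget / grid_cells, so a 10x multiplier on both sides raises losses by close to 10x.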

(Ultimately, my model makes many unrealistic assumptions and may have had bugs, but this seemed like a decent intuition seed, not a true “conclusion” that can be carelessly applied elsewhere.)

Thank you so much for articulating a bunch of the points I was going to make!

I would probably just further drive home the last paragraph: it’s really obvious that the “number of people a lone maniac can kill in a given time” (in America) has skyrocketed with the development of high-fire-rate weapons (let alone knowledge of explosives). It could be true that the offense/defense (O/D) balance for states doesn’t change (I disagree) while the O/D balance for individuals skyrockets.
