Axel Svensson

67Joined Jan 2023


... mail sent through Google is signed. Google can't repudiate these signatures, but you have to trust them not to write new history. Matthew Green calls for the opposite: for Google to publish its old private keys to destroy this information.

Interesting take on the dangers of strong validation. I note that time-stamping the signatures would prevent Google both from writing new history, and from doing what Mr Green wants.

I haven't taken the time to consider whether Mr Green's point is valid, but I instinctively hope it isn't, because of what it would mean for the value of aiding truth-seeking.

If I find something in 2033 and want to prove it existed in 2023, I think that's going to be much harder if I have to rely on the thing itself being archived in 2023, in an archive that still exists in 2033; compared to just relying on the thing being timestamped in 2023.

Yeah, I think this is an unfortunate technical necessity, but only in the case where the thing you find in 2033 has been changed (in an irrelevant way). If you find something in 2033 that was actually timestamped in 2023, you don't need access to an archived version, since it's identical to what you already have.
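To illustrate the unchanged case: checking a found item against a 2023 timestamp needs only the item itself and the digest that was timestamped. A minimal sketch, assuming SHA-256 was the hash used and that the timestamp proof over the digest has already been validated separately; the function name and sample bytes are hypothetical:

```python
import hashlib

def matches_timestamped_digest(content: bytes, timestamped_hex: str) -> bool:
    """Return True if content hashes to the digest that was timestamped."""
    actual = hashlib.sha256(content).hexdigest()
    return actual == timestamped_hex.lower()

# Hypothetical example data: the bytes found in 2033 and the digest stamped in 2023.
found_in_2033 = b"example article body"
digest_from_2023 = hashlib.sha256(b"example article body").hexdigest()

unchanged = matches_timestamped_digest(found_in_2033, digest_from_2023)
```

If the check fails, the item has changed in some way, and that is when an archived copy becomes necessary.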

I also think if you're relying on the Internet Archive, the argument that this is urgent becomes weaker ... As long as you set it up before IA goes rogue, the cost of delay is lower.

This is fair criticism. IA does in fact timestamp content and has for a good while, just not trustlessly (at least not intentionally AFAIK). So, to the extent (in jurisdiction and time) that people in general can trust IA, including their intentions, competence, technology and government, perhaps the value really is marginal at the present time.

Perhaps I will slightly decrease my belief about the urgency, although I remain convinced this is worth doing. I see the value as mostly long-term, and IA's record of what content was created when is itself a treasure of arguably reliable recordkeeping worth protecting by timestamping.

you could be selective in how you roll it out. Images of ... high profile public figures seem most likely to be manipulated

Thank you. Perhaps the first priority should be to quickly operationalize timestamping of newly created content at news organizations, perhaps even before publication, if they can be convinced to do so.

There's "it's easy for someone to publish a thing and prove it was published before $time". Honestly that's pretty easy already ... marginally more trustless (blockchain) would be marginally good, and I think cheap and easy.

More trustless is the main point. The marginal value could grow over time, or depending on your situation/jurisdiction/trust, be larger already for some people than others. Perhaps there are already certain countries where journalists aren't supposed to trust institutions in other certain countries?

But I think what you're talking about here is doing this for all public content, whether the author knows or not. ... the big problem I see here is that a lot of things get edited after publishing ... everything on imgur can no longer be verified, at least not to before the recompression ... There's a related question of how you distinguish content from metadata

If you change it, then the new thing is different from the old thing, and the new thing did not exist before it was created. If you change compression method, blog theme, spelling or some other irrelevant aspect, you can start from the archived old version, prove that it was created before a certain point in time, then try to convince me that the difference between old and new is irrelevant. If I agree, you will have proven to me that this new thing was effectively created before a certain time. If not, at least you have proven that the old version was.
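The two-step argument for changed content could look roughly like this: first confirm that the archived old version matches the timestamped digest, then produce a diff so that a human can judge whether the change is relevant. This is only a sketch, assuming SHA-256 over UTF-8 text; the function name and the sample strings are hypothetical, and deciding relevance is deliberately left to the reader of the diff:

```python
import difflib
import hashlib

def review_change(old: str, new: str, timestamped_hex: str) -> list:
    """Check the archived version against its timestamped digest, then
    return a unified diff for a human to judge."""
    if hashlib.sha256(old.encode("utf-8")).hexdigest() != timestamped_hex:
        raise ValueError("archived version does not match the timestamped digest")
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="archived", tofile="current", lineterm=""))

# Hypothetical example: a spelling fix made after the original was timestamped.
archived = "Teh cat sat.\nOn the mat."
current = "The cat sat.\nOn the mat."
stamped = hashlib.sha256(archived.encode("utf-8")).hexdigest()

diff_lines = review_change(archived, current, stamped)
```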

I do not propose trying to centrally solve the question of what changes are relevant, or distinguishing content from metadata, because that is up to interpretation.

This problem goes away if you're also hosting copies of everything, but that's no longer cheap. At that point I think you're back to "addition to the Internet Archive" discussed in other comments

Yes, I do admit that an archive is necessary for this to be valuable. I would prefer to execute this by cooperating very, very closely with the Internet Archive or another big archive for a first project.

you're only really defending against the Internet Archive going rogue (though this still seems valuable), and there's a lot that they don't capture.

Yeah, more or less. I think more. We are also defending against other kinds of current and future compromise, including hacking of a well-meaning organization, government overreach, and people's/organizations'/journalists' unfounded distrust in the Internet Archive. Organizations change, and the majority of the value proposition is a long-term one.

I think the compute/network for the hash (going through literally all content) seems large, possibly multiple orders of magnitude more than the cost implied here.

Yeah, they say[1] they have over 100 PB of content. That is quite a bit, and if it's not in an in-house datacenter, going through it will be expensive.
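For a sense of scale, here is a back-of-the-envelope calculation of the hashing work alone. The 500 MB/s per-core throughput is an assumed, hypothetical figure, not a measurement, and the estimate covers only hash computation, not storage reads or network transfer, which may well be the expensive part:

```python
PETABYTE = 10**15  # bytes, decimal definition

def hashing_core_days(total_bytes: int, bytes_per_second_per_core: float) -> float:
    """Core-days needed to hash total_bytes once at the given per-core rate."""
    seconds = total_bytes / bytes_per_second_per_core
    return seconds / 86_400  # seconds per day

# Assumption: 500 MB/s per core (hypothetical throughput).
core_days = hashing_core_days(100 * PETABYTE, 500e6)
```

Under that assumption the result is roughly 2,300 core-days, so the total depends heavily on how many cores can read the data in parallel and at what cost.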


If 20% of content remains not timestamped, do issues about credibility of content remain?

If 20% of content remains not timestamped, then one wouldn't consider all non-timestamped content suspicious on that account alone. The benefits come around in other ways:

  • If 80% of content is timestamped, then all that content is protected from suspicion that newer AI might have created it.
  • If the Internet Archive is known to have timestamped all of their content, then non-timestamped content that purports to come from an old enough version of a website that is in the archive becomes suspicious.
  • One might still consider non-timestamped content suspicious in a future where AI and/or institutional decline has begun eroding the prior (default, average, general) trust in all content.

There's probably content with many tiny variations and it's better to group that content together? ... Finding the algorithm/implementation to do this seems important but also orders of magnitude more costly?

It might be important, but it's probably not as urgent. Timestamping has to happen at the time you want the timestamp to attest to. Investigating which pieces of content are equivalent from some inexact (or exact but higher-level) point of view, and convincing people of it, can be done later. I imagine that this is one possible future application for which these timestamps will be valuable, but I would probably put such applications out of scope.

That is fantastic, hopefully this indicates an organization open to ideas, and if they've been doing this for a while it might be worth "rescuing" those timestamps.

Good criticism.

My rough budget guess is probably off, as you say. For some reason I just looked at hardware and took a wide margin. For a grant application this has to be ironed out a lot more seriously.

I admit that popularizing the practice for private archives would take a significant effort far beyond a 5-digit budget. I envisioned doing this in collaboration with the Internet Archive as a first project to pick the lowest-hanging fruit, and then hopefully it'd be less difficult to convince other archives to follow suit.

It's worth noting that implementations, commercial services and public ledgers for time-stamping already exist. I imagine scaling, operationalizing and creating interfaces for consumption would be major parts of the project for the Internet Archive.

To what extent could this be implemented as an addition to the Internet Archive?

It might be advantageous to do so for content that is in the Internet Archive. For content that is not, especially non-public content, it might be more feasible to offer the solution as a public service + open source software.

This is good criticism, and I'm inclined to agree in part. I do not intend to argue that the marginal value is necessarily great, only that the expected marginal value is much greater than the cost. Here are a few plausible scenarios, each maybe less than 50% likely, in which the timestamps could have a significant impact on society:

  • Both western and eastern governments have implemented a good portion of the parts of Orwell's vision that have so far been feasible, in particular mass espionage and forcing content deletion. Editing history in a convincing way has so far been less feasible, but AI might change that, and it isn't clear why we should believe that no government that has a valuable public archive in their jurisdiction would contemplate doing so.

  • A journalist is investigating a corruption scandal with far-reaching consequences. Archives are important tools, but it just so happens that this one archive until recently had an employee who is connected to a suspect...

  • In order to prove your innocence, or someone else's guilt, you need to verify and be able to prove what was privately emailed from your organization. Emails are not in the Internet Archive, but luckily your organization uses cryptographic timestamps for all emails and documents.

Thank you! It feels great to get such a response to my first post.
