Pseudonymous accounts, commonly found on prediction platforms and forums, let individuals share their forecasts without revealing their real identities. While this anonymity can protect the privacy and security of users, it also creates an environment ripe for manipulation. A pseudonymous forecaster can build an inconsistent track record by making predictions that support opposing outcomes on different platforms, or even within the same community.

One tactic employed by pseudonymous accounts to deceive followers is uncorrelated betting: placing bets or predictions that cover multiple outcomes across accounts or platforms. By doing so, these accounts increase the probability of being correct on at least one prediction. For example, if an account predicts that AI takeoff will be fast on one platform and slow on another, it is essentially hedging its bets, ensuring it can claim accuracy regardless of the actual outcome. This strategy allows it to maintain an illusion of expertise while minimizing the risk of being proven wrong. Even the financial costs of betting can be compensated for, given the grant and employment opportunities offered to successful forecasters.

Another deceptive practice seen with pseudonymous accounts is selective disclosure. This means that individuals only reveal their identity when their predictions have been accurate or appear to be favorably aligned with the actual outcome. By withholding information about incorrect forecasts, these accounts create an inflated perception of their success rate and erode the reliability of their overall track record. Such selective disclosure can mislead followers into believing that the account possesses a higher level of accuracy than it genuinely does.

Relying on the track records of pseudonymous accounts can have significant consequences. Strategists and funders may make decisions based on inaccurate information, reducing their impact. Individuals seeking guidance on effective charities might be misled into making donations to interventions that are doomed to fail.

While pseudonymous accounts can provide a platform for diverse opinions and insights, it is crucial to approach any purported track record with skepticism. The ability to bet on both sides of a question, spread uncorrelated bets across multiple accounts, and selectively disclose favorable outcomes can create a distorted perception of accuracy.

Comments



I agree that the potential for this exists, and if it were a widespread practice it would be concerning. Have you seen people who claim to have a good forecasting record engage in pseudonym exploitation, though?

My understanding is that most people who claim this have proof records associated with a single pseudonymous user on select platforms (e.g. Metaculus), which avoids the problem you suggest.

You couldn't know who is and is not engaging in this behaviour. Anyone with a good forecasting record may have shadow accounts.

I'm not familiar with proof records. Could you elaborate further? If this is verification such as identity documents, this could go some way to preventing manipulation.

If someone is doing the shadow account thing (i.e., a boiler-room scam, I think), there will be exponentially fewer forecasters for each number of successful bets. I don't think this is the case for the well-known ones.

I mean rankings like https://www.metaculus.com/rankings/?question_status=resolved&timeframe=3month

I suggest that “why I don’t trust pseudonymous forecasters” would be a more appropriate title. When I saw the title I expected an argument that would apply to all/most forecasting, but this worry is only about a particular subset

The idea is that the potential for pseudonymous forecasting makes all forecaster track records suspect

I think you point to some potential for scepticism, but I don't think this is convincing. Selective disclosure is unlikely to be a problem where a user can only point to summary statistics for their whole activity, like on Metaculus. An exception might be if only a subset of stats were presented, like ranking in past 3/6/12 months without giving Briers or other periods etc. But you could just ask for all the relevant stats. 

The uncorrelated betting isn't a problem if you just require a decent volume of questions in the track record. If you basically want at least 100 binary questions to form a track record, and say 20 of them were hard enough such that the malicious user wanted to hedge on them, you'd need 2^20 accounts to cover all possible answer sets. If they just wanted good performance on half of them, you'd still need 2^10 accounts. 
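The arithmetic in this comment can be sketched directly (a minimal illustration; the function name and the expected-survivors figure are just for this example):

```python
# To guarantee one account with a perfect record on k hedged binary
# questions, a manipulator needs an account for every answer set: 2**k.

def accounts_needed(k_hedged: int) -> int:
    """Accounts required to cover every outcome of k binary questions."""
    return 2 ** k_hedged

print(accounts_needed(20))  # 1048576 accounts for a perfect record on 20
print(accounts_needed(10))  # 1024 accounts to ace 10 hedged questions

# Equivalently: if n accounts guess independently, the expected number of
# surviving "perfect" accounts after k questions is n / 2**k, shrinking
# exponentially -- the point made about boiler-room scams above.
print(1000 / 2 ** 10)  # under one expected survivor from 1000 accounts
```

So requiring a large volume of resolved questions makes full coverage astronomically expensive.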

A more realistic reason for scepticism is that points/ranking on Metaculus is basically a function of activity over time. You can be only a so-so forecaster but have an impressive Metaculus record just by following the crowd on loads of questions or picking probabilities that guarantee points. But Brier scores, especially relative to the community, should reveal this kind of chicanery.

The biggest reason for scepticism regarding forecasting as it's used in EA is generalisation across domains. How confident should we be that the forecasters/heuristics/approaches that are good for U.S. political outcomes or Elon Musk activity translate successfully to predicting the future of AI or catastrophic pandemics or whatever? Michael Aird's talk mentions some good reasons why some translation is reasonable to expect, but this is an open and ongoing question. 

I don't think the forecaster needs 2^10 accounts if they pick a set of problems with mutually correlated outcomes. For example, you can make two accounts for AI forecasting, and have one bet consistently more AI skeptical than the average and the other more AI doomy than the average. You could do more than 2, too, like very skeptical, skeptical, average, doomy, very doomy. One of them could end up with a good track record in AI forecasting.
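This correlated-outcome strategy can be simulated in a few lines (a sketch with hypothetical numbers: the 0.9 correlation with a latent "fast AI progress" driver, the five stances, and the seed are all assumptions for illustration):

```python
import random

random.seed(0)

# Questions in one domain share a latent driver ("AI progress is fast").
# A manipulator runs one account per stance rather than 2**k accounts;
# whichever stance matches the realised world ends up looking skilled.

def brier(forecasts, outcomes):
    """Mean Brier score for a list of probabilities vs binary outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

k = 50
fast_world = True  # the latent outcome most questions load on
# Each question resolves in line with the latent driver 90% of the time.
outcomes = [1 if (random.random() < 0.9) == fast_world else 0 for _ in range(k)]

stances = {"very skeptical": 0.1, "skeptical": 0.3, "average": 0.5,
           "doomy": 0.7, "very doomy": 0.9}
scores = {name: brier([p] * k, outcomes) for name, p in stances.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))  # the lucky stance looks like skill
```

Only five accounts, yet one reliably ends up with an impressive domain record.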

If doing well across domains is rewarded much more than similar performance within a domain, it would be harder to get away with this (assuming problems across domains have relatively uncorrelated outcomes, but you could probably find sources of correlation across some domains, like government competence). But then someone could look only for easy questions across domains to build their track record. So, maybe there's a balance to strike. Also, rather than absolute performance across possibly different questions like the Brier score, you should measure performance relative to peers on each question and average that. Maybe something like relative returns on investment in prediction markets, with a large number of bets and across a large number of domains.
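The per-question peer-relative scoring suggested here could be sketched as follows (hypothetical forecasters and questions; positive scores mean better than peers, and scores are comparable even when forecasters answered different question sets):

```python
def brier(p, outcome):
    """Brier score for a single probability vs a binary outcome."""
    return (p - outcome) ** 2

# forecasts[name] = {question_id: probability}; note bob skipped q3.
forecasts = {
    "alice": {"q1": 0.8, "q2": 0.2, "q3": 0.9},
    "bob":   {"q1": 0.6, "q2": 0.6},
    "carol": {"q1": 0.5, "q2": 0.4, "q3": 0.5},
}
outcomes = {"q1": 1, "q2": 0, "q3": 1}

def relative_score(name):
    """Average of (peer-mean Brier - own Brier) over answered questions."""
    diffs = []
    for q, p in forecasts[name].items():
        peers = [brier(fs[q], outcomes[q])
                 for other, fs in forecasts.items()
                 if other != name and q in fs]
        diffs.append(sum(peers) / len(peers) - brier(p, outcomes[q]))
    return sum(diffs) / len(diffs)

for name in forecasts:
    print(name, round(relative_score(name), 3))
```

Averaging per-question differences against peers, rather than comparing raw Brier scores across different question sets, removes the advantage of cherry-picking easy questions.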

Good point on the correlated outcomes. I think you’re right that cross-domain performance could be a good measure, especially since performance in a single domain could be driven by having a single foundational prior that turned out to be right, rather than genuine forecasting skill.

On the second point, I’m pretty sure the Metaculus results already just compare your Brier to the community based on the same set of questions. So you could base inter-forecaster comparisons (weakly) on that difference.

I don't have much sense this happens.

It certainly does happen, the question is to what extent
