Compute Research Questions and Metrics - Transformative AI and Compute [4/4]

lennart


1 min read · Nov 28, 2021

Comments 2

MaxRa

Thanks for your work here, it's a useful overview for the compute metrics project I'm working on with Peter. Minor errors:

Also commonly used is Petaflop/s-day. It's also a quantity of operations. A petaflop/s is floating point operations per second for one day. A day has 84,400 seconds $\approx 10^{15}$. That makes $10^{20}$ FLOPs.

A petaflop/s-day is $10^{15}$ FLOP/s sustained for one day
A day has $86{,}400 \approx 10^5$ seconds
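A minimal sketch of the unit conversion being corrected here, assuming the usual definition of a petaflop/s as $10^{15}$ FLOP per second, sustained for 24 hours:

```python
# Convert petaflop/s-days into a raw count of floating point operations (FLOP).
PETAFLOP_PER_S = 1e15           # 1 petaflop/s = 10^15 FLOP per second
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds, i.e. roughly 10^5

def petaflop_s_days_to_flop(pf_s_days: float) -> float:
    """Return the number of FLOP corresponding to `pf_s_days` petaflop/s-days."""
    return pf_s_days * PETAFLOP_PER_S * SECONDS_PER_DAY

one_pf_s_day = petaflop_s_days_to_flop(1.0)
print(f"{one_pf_s_day:.3e} FLOP")  # about 8.6 * 10^19, i.e. roughly 10^20
```

The exact value is $8.64 \times 10^{19}$ FLOP, which rounds to the $10^{20}$ FLOPs quoted in the post.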

MaxRa

For an NVIDIA A100, the on-board memory bandwidth is around 2GB/s

I think this should be 2TB/s?

And ping!

We are working on a piece with more insights on the utilizations and some advice on how to estimate training compute and the connected utilization of the system (link to be added by the end of 2021; ping me if not).
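To make the gap between the A100 bandwidths discussed here concrete, the sketch below compares how long moving a fixed amount of data would take over each path: roughly 2 TB/s for on-board memory (the corrected figure), up to 600 GB/s over NVLink and 64 GB/s over PCIe Gen4. These are peak figures, real transfers achieve less, and the 40 GB payload is a hypothetical example.

```python
# Peak bandwidths for an NVIDIA A100 as discussed above, in bytes per second.
# Real-world throughput is lower than these peak figures.
GB = 1e9
BANDWIDTHS = {
    "on-board memory (HBM)": 2000 * GB,  # ~2 TB/s
    "NVLink (A100 to A100)": 600 * GB,   # up to 600 GB/s
    "PCIe Gen4": 64 * GB,                # 64 GB/s
}

def transfer_time_s(num_bytes: float, bytes_per_s: float) -> float:
    """Idealised time to move `num_bytes` at a constant peak bandwidth."""
    return num_bytes / bytes_per_s

payload = 40 * GB  # hypothetical payload, e.g. a 40 GB set of model weights
memory_bw = BANDWIDTHS["on-board memory (HBM)"]
for link, bw in BANDWIDTHS.items():
    t = transfer_time_s(payload, bw)
    ratio = memory_bw / bw  # how many times longer than via on-board memory
    print(f"{link:<24} {t * 1e3:8.1f} ms  ({ratio:.1f}x the memory transfer time)")
```

Under these assumptions, moving data over PCIe Gen4 takes about 31 times longer than reading it from on-board memory, which is why communication between chips can become the bottleneck rather than raw FLOP/s.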

An overhang refers to a situation in which large amounts of already available resources cannot be used yet. Once the constraint is resolved, these resources are unlocked all at once (Dafoe 2018). ↩︎
As already discussed in Section 4.2, even when hardware does not significantly increase in computing performance, its price can still decrease significantly due to longer R&D cycles and economies of scale. ↩︎
See this Wikipedia list or this post by AI Impacts for historical trends in FLOPs/s per $. ↩︎
For example, in the timeline forecast we discussed, this Wikipedia list, or this post by AI Impacts. ↩︎
A quick back-of-the-envelope calculation: an NVIDIA DGX A100 system (eight A100 GPUs) draws around 6.5 kW at peak. Assuming $0.12 per kWh, running it at peak for a full year costs around $6,800 (6.5 kW × 8,760 h × $0.12/kWh). This is rather negligible compared to the purchase price, given that a DGX A100 costs around $200,000 to $300,000. ↩︎
For an NVIDIA A100, the on-board memory bandwidth is around 2 TB/s, whereas the interconnect to additional A100s using NVIDIA's specialized NVLink achieves up to 600 GB/s, and the standard PCIe Gen4 interface only 64 GB/s (see this datasheet). ↩︎
NVIDIA V100 datasheet ↩︎
NVIDIA A100 datasheet ↩︎
Class on Advanced Compiler Optimization ↩︎
Integer representation (instead of floating point) saves energy and requires less space on the chip die. See Computing’s Energy Problem - Slides and Computing's energy problem (and what we can do about it) ↩︎
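The back-of-the-envelope electricity estimate in the footnotes can be reproduced in a few lines. This sketch uses the footnote's inputs (a 6.5 kW peak draw, continuous operation at that draw, $0.12 per kWh) and compares the result with the lower end of the quoted purchase price; the `duty_cycle` parameter is a hypothetical addition for hardware that does not run at peak all year.

```python
# Back-of-the-envelope yearly electricity cost of running hardware at a given power.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

def yearly_energy_cost(power_kw: float, usd_per_kwh: float,
                       duty_cycle: float = 1.0) -> float:
    """Cost in USD of drawing `power_kw` for one year.

    `duty_cycle` is the fraction of the year spent at that power draw
    (1.0 means continuously at peak, the worst case used in the footnote).
    """
    energy_kwh = power_kw * HOURS_PER_YEAR * duty_cycle
    return energy_kwh * usd_per_kwh

cost = yearly_energy_cost(power_kw=6.5, usd_per_kwh=0.12)  # roughly $6,800
purchase_price = 200_000  # lower end of the quoted purchase price, in USD
print(f"yearly energy cost: ${cost:,.0f}")
print(f"share of purchase price: {cost / purchase_price:.1%}")
```

Even at the lower purchase price and with the hardware running at peak around the clock, a year of electricity comes to only a few percent of the hardware cost under these assumptions.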

Previous Post: Compute Governance and Conclusions

Appendix

A. Research Questions

Compute Governance

Scaling Hypothesis

Compute Price

Compute Hardware

Compute Forecast

Semiconductor Industry

Algorithmic Efficiency

AI Safety

B. Metrics

B.1 Commonly used metrics for measuring hardware performance

Logic

FLOP (with its plural FLOPs)

FLOPS or FLOPs/s

FLOP/s per $

FLOPs/s per Watt

Memory

Bytes

Interconnect

Bytes/s

Traversed edges per second (TEPS)

B.2 Some caveats of commonly used metrics

Utilization: what happens in reality

Communication

Parallelism

Software

The number representation

Operations costs (energy, infrastructure, etc.)

What is an operation?

B.3 Concluding thoughts

C. AI Hardware Startups

Acknowledgments

References