Talk announcement: I will be presenting the arguments from the first post, An Ontology of Representations, and from this post on Tuesday 17 February at 18:00 GMT / 10:00 PT, as part of the closing of the Dovetail Fellowship. If you'd like to discuss these ideas live, you're welcome to join via this Zoom link. The session will be a 40-minute presentation followed by 20 minutes of Q&A.
Tl;dr: This is the second post of two. The first post, An Ontology of Representations, argued that the convergence observed in neural network representations reflects shared training distributions and inductive biases rather than the discovery of objective, mind-independent structure. This post surveys the rapidly growing literature on using machine learning to (re)discover physical laws. I focus solely on physics in this post. The picture that emerges supports the conclusion of the first post: successful law discovery depends on encoding the right prior physical knowledge, and prediction alone does not imply understanding.
I. Introduction
The central question motivating this post is straightforward: can neural networks discover the laws of physics?
This question matters for the arguments I advanced previously. If generic models trained on raw observational data could spontaneously recover Newton's laws or conservation principles, that would constitute strong evidence for something like the Platonic Representation Hypothesis. It would suggest that sufficiently powerful learners do converge on the generative structure of reality, that prediction and understanding are two sides of the same coin. But if, as I argued, the apparent convergence of AI representations is better explained by shared data distributions and inductive biases, then we should expect a more complicated picture: one where the path from prediction to genuine physical understanding requires deliberate architectural choices, data curation, and explicit encoding of prior knowledge.
The literature of the past few years bears out the more complicated picture. It shows a systematic divide between prediction and understanding: between models that can fit observed data with high fidelity and models that recover the generative causal law responsible for producing that data. This distinction is closely related to the one François Chollet drew in On the Measure of Intelligence, where he argued that task performance (skill) should not be conflated with the ability to handle novel situations (generalisation). A model that has memorised or interpolated a dataset may score highly on benchmarks drawn from the same distribution, but this tells us nothing about whether it has grasped the underlying rule. The parallel to physical law discovery is direct: a system that predicts planetary positions with high accuracy has demonstrated skill, but only a system that recovers $F = \frac{G m_1 m_2}{r^2}$ has demonstrated the kind of generalisation that transfers to genuinely new physical scenarios. As we will see, bridging this gap has turned out to require precisely the kind of domain-specific engineering this post surveys.
This post focuses on physics, and does so deliberately. Physics provides a uniquely controlled setting for evaluating law discovery: the target equations are known, the symmetries are well characterised, and we can test whether a model has recovered the generative law.
II. Foundational work: from Eureqa to SINDy
The modern programme of automated law discovery begins, properly speaking, with Schmidt and Lipson (2009), "Distilling Free-Form Natural Laws from Experimental Data." Using a genetic programming algorithm (later commercialised as the software Eureqa), they searched a space of symbolic mathematical expressions to find equations that fit motion-tracking data from physical systems, ranging from simple harmonic oscillators to chaotic double pendulums. The system recovered Hamiltonians, Lagrangians, and conservation laws from raw experimental measurements. The paper's central claim was striking: it could discover natural laws "without any prior knowledge of physics."
However, a subsequent analysis by Hillar and Sommer (2012) showed that this claim required qualification. They demonstrated that Schmidt and Lipson's method implicitly incorporated Hamilton's equations of motion and Newton's second law in the way it structured its search. The algorithm did not search over arbitrary functions of arbitrary variables; it searched over functions of positions and their time derivatives, scored in part by how well they satisfied the structure of Lagrangian or Hamiltonian mechanics. The physical framework was not discovered; it was presupposed.
Building on this tradition, Brunton, Proctor, and Kutz (2016) introduced SINDy (Sparse Identification of Nonlinear Dynamics). Where Schmidt and Lipson used genetic programming to evolve symbolic expressions, SINDy took a different approach: given time-series data from a dynamical system, construct a library of candidate nonlinear functions (polynomials, trigonometric functions, and so on), then use sparse regression to identify the fewest terms needed to accurately describe the dynamics.
The important assumption is that the governing equations are sparse in the space of possible functions: most candidate terms have zero coefficients, and only a handful are active. This is a strong but physically well-motivated prior. Newton's second law involves only a few terms; the Lorenz equations, despite producing chaotic behaviour, are built from simple polynomial nonlinearities. SINDy exploits this structure by applying L1 regularisation (the same mathematical machinery behind compressed sensing) to select the active terms automatically.
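To make the mechanics concrete, here is a minimal sketch of the SINDy fitting step, assuming the state measurements and their time derivatives are already in hand. The original paper uses sequentially thresholded least squares, shown below, which plays the same sparsifying role as an L1 penalty; the degree-two polynomial library is one illustrative choice, not a canonical one.

```python
import numpy as np

def sindy_fit(X, X_dot, library, threshold=0.1, n_iter=10):
    """Minimal SINDy via sequentially thresholded least squares.

    X       : (n_samples, n_states) state measurements
    X_dot   : (n_samples, n_states) estimated time derivatives
    library : callable mapping X to (n_samples, n_features) candidate terms
    """
    Theta = library(X)                        # evaluate candidate functions
    Xi = np.linalg.lstsq(Theta, X_dot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold        # prune negligible coefficients
        Xi[small] = 0.0
        for k in range(X_dot.shape[1]):       # refit each equation on the
            big = ~small[:, k]                # surviving terms only
            if big.any():
                Xi[big, k] = np.linalg.lstsq(
                    Theta[:, big], X_dot[:, k], rcond=None)[0]
    return Xi                                 # (n_features, n_states), sparse

def poly_library(X):
    """Polynomials up to degree two in (x, y, z): exactly the kind of
    hand-specified candidate set the method depends on. The Lorenz
    system's right-hand side lives entirely inside this library."""
    x, y, z = X.T
    return np.column_stack([np.ones_like(x), x, y, z,
                            x*y, x*z, y*z, x*x, y*y, z*z])
```

Run on trajectories of the Lorenz system, the nonzero entries of `Xi` recover the three governing equations term by term. The point to notice is how much of the answer is already present in `poly_library`.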
Brunton and colleagues demonstrated that SINDy could correctly identify the governing equations for canonical dynamical systems, including the Lorenz attractor, the logistic map, and even the vortex shedding dynamics of fluid flow past a cylinder, all from noisy measurement data. The recovered equations were the correct symbolic expressions, complete with accurate numerical coefficients.
What makes SINDy significant for our purposes is what it shows about the role of prior knowledge. It requires the user to specify the candidate function library, which encodes substantial assumptions about the mathematical form of the law. It requires access to time derivatives, which must either be measured directly or estimated numerically from noisy data. And it assumes that the correct variables have already been identified: SINDy discovers the relationship between position, velocity, and acceleration, but it does not discover that position, velocity, and acceleration are the right variables to consider. These are not criticisms of SINDy as a method, but they are observations about what is required to move from data to physical law. Even in the most favourable cases, the "discovery" depends on a substantial scaffolding of human choices.
III. The case of orbital mechanics
The most illuminating case study in recent literature involves two papers that approach the same physical problem, the dynamics of gravitational orbits, with very different methods and arrive at strikingly different conclusions. Together they show the central tension between prediction and understanding.
Lemos, Jeffrey, Cranmer, Ho, and Battaglia (2022) trained a graph neural network on thirty years of real trajectory data from the Sun, planets, and major moons of our solar system, then applied symbolic regression to extract an analytical expression for the force law the network had implicitly learned. The result was Newton's law of universal gravitation, $F = \frac{G m_1 m_2}{r^2}$, recovered from observational data without any prior assumptions about the masses of the bodies or the value of the gravitational constant.
This is a genuinely impressive achievement, but the methodology deserves close scrutiny. The "key assumptions," as the authors themselves state, were translational and rotational equivariance, together with Newton's second and third laws of motion. Equivariance is a powerful geometric prior that forces the model to respect the symmetries of physical space: the requirement that the laws of physics do not depend on where you are or which direction you face. Building in Newton's second and third laws effectively provides the syntactic structure of classical mechanics, the grammar of forces and accelerations, and asks the network merely to fill in the specific functional form of the force law.
The system did not discover classical mechanics from scratch. It was heavily constrained to find a solution that already conformed to the core principles of classical mechanics. This is not a weakness of the work, but it is, I think, its most important lesson. The successful "discovery" required identifying and imposing precisely the right physical priors from the start.
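It is worth seeing how much such priors fix in advance. The sketch below is not Lemos et al.'s actual model, just a minimal illustration of the same design pattern: the network is free to learn only a scalar function of pairwise distance, while rotational equivariance, Newton's third law, and $a = F/m$ are hard-coded around it.

```python
import torch

class PairwiseForceModel(torch.nn.Module):
    """Illustrative sketch of a hard-coded classical-mechanics prior
    (learned per-body mass embeddings, used in the real work, omitted)."""
    def __init__(self, hidden=64):
        super().__init__()
        # The only learned component: a scalar force magnitude
        # as a function of pairwise distance
        self.phi = torch.nn.Sequential(
            torch.nn.Linear(1, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))

    def forward(self, pos, mass):
        # pos: (n, 3) positions, mass: (n,) masses
        rij = pos[None, :, :] - pos[:, None, :]   # r_j - r_i for all pairs
        dist = rij.norm(dim=-1, keepdim=True).clamp(min=1e-9)
        # Rotational equivariance: the magnitude depends only on |r_ij|
        # and the direction is r_ij / |r_ij|. Newton's third law
        # (F_ij = -F_ji) then holds automatically, since r_ij = -r_ji.
        forces = self.phi(dist) * rij / dist
        net_force = forces.sum(dim=1)             # sum over partners j
        return net_force / mass[:, None]          # Newton's second law
```

With the force law constrained to this form, "discovering gravity" reduces to fitting a one-dimensional function, which is also what makes the subsequent symbolic regression step tractable.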
The contrasting case comes from Vafa, Chang, Rambachan, and Mullainathan (2025), who trained foundation models, including Transformers and state-space models, on the same kind of data: orbital trajectories generated by Newtonian mechanics. The models excelled at their training task, achieving high accuracy on in-distribution sequence prediction. They could predict where a planet would be next with remarkable precision.
But the authors developed a clever diagnostic they call an "inductive bias probe," which tests whether a model's internal representations align with a postulated world model by evaluating how the model adapts to synthetic datasets generated from that world model. The results were damning: the foundation models "consistently fail to apply Newtonian mechanics" when adapted to new physics tasks. Despite their predictive prowess on the training distribution, their internal representations did not correspond to the underlying generative process.
When the authors applied symbolic regression to extract the force law implicit in the Transformer's predictions, the result was physically nonsensical: a function that depended on non-physical combinations of mass and distance. The model had found a statistical shortcut, a heuristic that worked within its training distribution, rather than the causal law that generated the data.
The foundation models had achieved high predictive skill in Chollet's sense: they performed well on the task they were trained for. But they had not achieved generalisation. Their internal representations were statistical summaries tuned to the training distribution, not recoveries of the causal structure that produced it. They could tell you where a planet would be next, but they could not tell you why, and they could not transfer that knowledge to a new gravitational system.
IV. Video models
A complementary line of evidence comes from the study of video generation models. Following the splash made by OpenAI's Sora, there was considerable excitement about the possibility that video generation models might serve as "world models" that implicitly learn the laws of physics from visual data.
Kang et al. (2024) put this to the test in their paper "How Far Is Video Generation from World Model: A Physical Law Perspective." They constructed a controlled 2D simulation environment governed by the laws of classical mechanics and trained diffusion-based video generation models on the resulting videos. Their evaluation distinguished three regimes: in-distribution generalisation, out-of-distribution generalisation, and combinatorial generalisation (the ability to combine concepts seen separately during training, such as a new combination of object size and velocity).
The results showed that scaling model size and data volume improved in-distribution performance, as expected, and yielded measurable gains on combinatorial generalisation. But out-of-distribution generalisation, the ability to extrapolate to scenarios not represented in training, remained stubbornly poor. When tested on balls moving at speeds or in configurations not well represented in the training data, the models failed badly.
The authors' analysis revealed what they call "case-based" generalisation: the models were not abstracting universal rules but were instead mimicking the most similar training examples. They found a striking prioritisation order in this mimicry (colour > size > velocity > shape), suggesting that the models were anchoring on visually salient but physically irrelevant features.
This connects to the classic phenomenon that Geirhos et al. (2020) have termed "shortcut learning." Neural networks systematically identify the simplest statistical correlations that solve a training task, even when those correlations are physically nonsensical. A model trained to classify images of cows may learn to associate green grass with the cow label rather than learning what a cow looks like. Similarly, a model trained to predict orbital trajectories may learn statistical regularities in the training data rather than the law of gravitation. The shortcut is easier to find by gradient descent, and it achieves comparable training loss, but it fails when the distribution shifts.
V. Symbolic regression and the graph network pipeline
If generic foundation models cannot bridge the gap from prediction to understanding, what can? The most successful paradigm to date combines neural networks with symbolic regression in a two-stage pipeline: first train a neural network with appropriate inductive biases to learn a good representation, then distil that representation into an explicit symbolic expression.
The foundational work here is by Cranmer et al. (2020), presented at NeurIPS 2020 as "Discovering Symbolic Models from Deep Learning with Inductive Biases." Their approach begins with a graph neural network (GNN) that represents physical systems as graphs: bodies as nodes, interactions as edges. The GNN is trained with strong inductive biases, including sparsity constraints on its internal representations, which encourage the learned functions to be simple enough for symbolic regression to extract.
The results were striking. The pipeline correctly recovered known force laws and Hamiltonians from simulation data. Applied to Newtonian dynamics, it extracted $F = \frac{G m_1 m_2}{r^2}$. Applied to Hamiltonian systems, it recovered the correct energy functions. Most remarkably, when applied to detailed dark matter simulations from cosmology (where the governing equations are not known in closed form), it discovered a new analytical formula that could predict the concentration of dark matter halos from the mass distribution of nearby cosmic structures. This formula, extracted by symbolic regression from the internal functions of the GNN, generalised to out-of-distribution data better than the GNN itself.
The success of this pipeline depends on a specific architectural philosophy: use the neural network's flexibility to learn a good numerical approximation, but constrain the search space with symmetries and sparsity so that the learned function is simple enough to be symbolically distilled. The graph structure enforces locality (each body interacts only with its neighbours through pairwise messages). Equivariance constraints enforce physical symmetries. Sparsity constraints keep the learned functions low-dimensional. Together, these priors narrow the hypothesis space from "any arbitrary function" to "simple functions respecting physical symmetries," and within that narrowed space, symbolic regression can find the right answer.
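For the distillation stage, Cranmer's own symbolic regression library, PySR, is the natural tool. The sketch below shows the shape of that step; the file names are hypothetical placeholders for the trained GNN's saved edge-function activations, and the exact PySRRegressor arguments vary by version.

```python
import numpy as np
from pysr import PySRRegressor

# Hypothetical saved activations from the trained, sparsity-constrained
# edge MLP: inputs are pairwise features (e.g. dx, dy, dz, r, m1, m2),
# targets are the most significant component of the edge message.
X = np.load("edge_inputs.npy")
y = np.load("edge_outputs.npy")[:, 0]

model = PySRRegressor(
    niterations=40,
    binary_operators=["+", "-", "*", "/"],
    unary_operators=["square"],
)
model.fit(X, y)          # evolutionary search for a concise symbolic fit
print(model.sympy())     # for gravity, something proportional to m1*m2/r**2
```

The sparsity constraint during GNN training is what makes this step feasible: symbolic regression scales poorly with dimensionality, so the message functions must be forced to be low-dimensional before distillation is attempted.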
VI. Encoding physics in architecture
The two-stage pipeline of Cranmer et al. encodes physical knowledge through architectural constraints on the network and then extracts symbolic laws after training. But there is a parallel line of work that takes a more radical approach: building the mathematical structure of physics directly into the neural network's forward pass, so that conservation laws are satisfied by construction rather than learned from data.
The seminal work here is Hamiltonian Neural Networks (HNNs), introduced by Greydanus, Dzamba, and Yosinski (2019). The idea is elegantly simple. In Hamiltonian mechanics, the time evolution of a system is entirely determined by a scalar function $\mathcal{H}(q, p)$, the Hamiltonian, through Hamilton's equations: $\dot{q} = \partial \mathcal{H}/\partial p$ and $\dot{p} = -\partial \mathcal{H}/\partial q$. Rather than training a network to directly predict the next state of a system (which inevitably accumulates errors that violate energy conservation), an HNN parameterises $\mathcal{H}$ as a neural network and computes the time derivatives via automatic differentiation through Hamilton's equations. Because the dynamics are derived from a Hamiltonian by construction, the resulting model conserves energy exactly, regardless of what specific function the network learns.
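In code, the trick is a few lines of automatic differentiation. A minimal sketch, assuming states are given as concatenated $(q, p)$ pairs:

```python
import torch

class HNN(torch.nn.Module):
    """Minimal Hamiltonian Neural Network sketch (after Greydanus et al.
    2019). The network parameterises a scalar H(q, p); the dynamics are
    obtained by differentiating through it, so the symplectic structure
    is built in rather than learned."""
    def __init__(self, dim, hidden=200):
        super().__init__()
        self.H = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, 1))

    def time_derivatives(self, x):
        # x = (q, p) concatenated, shape (batch, 2*dim)
        x = x.clone().requires_grad_(True)
        H = self.H(x).sum()
        dH = torch.autograd.grad(H, x, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq
        return torch.cat([dHdp, -dHdq], dim=-1)
```

Training regresses these derived time derivatives against observed ones. Whatever function $\mathcal{H}$ the network ends up learning, the induced continuous-time flow conserves it exactly, which is where the architectural guarantee comes from.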
The practical consequences are significant. On problems like the two-body gravitational system and pendulum dynamics, HNNs trained faster, generalised better, and, crucially, produced trajectories that were perfectly reversible in time, a fundamental property of Hamiltonian systems that ordinary neural networks cannot guarantee. The network did not need to "learn" energy conservation from the data; conservation was an architectural invariant.
Cranmer, Greydanus, Hoyer, et al. (2020) extended this idea to Lagrangian Neural Networks (LNNs) at the ICLR 2020 workshop on differential equations and deep learning. Where HNNs require the system to be expressed in canonical coordinates (positions and conjugate momenta), which may not always be available, LNNs parameterise the Lagrangian instead and derive the dynamics via the Euler-Lagrange equations. This is a more flexible formulation: it works with generalised coordinates and does not require the user to know the canonical momenta in advance. LNNs thus broadened the applicability of the Hamiltonian approach while preserving the same core insight: encode the form of the physical law in the architecture, and let the network learn only the specific content.
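The price of the extra flexibility is a slightly more involved computation: the Euler-Lagrange equations must be solved for the acceleration, which requires second derivatives of the learned Lagrangian. A minimal unbatched sketch of that derivation, ignoring the numerical conditioning issues the paper discusses:

```python
import torch

def lnn_dynamics(L, q, qdot):
    """Acceleration implied by a learned Lagrangian L(q, qdot), obtained
    by solving the Euler-Lagrange equations for qddot (the core
    computation in Lagrangian Neural Networks). Unbatched for clarity;
    L is any callable taking (q, qdot) and returning a scalar."""
    q = q.clone().requires_grad_(True)
    qdot = qdot.clone().requires_grad_(True)
    Lval = L(q, qdot)
    dL_dq = torch.autograd.grad(Lval, q, create_graph=True)[0]
    dL_dqdot = torch.autograd.grad(Lval, qdot, create_graph=True)[0]
    n = q.shape[0]
    # d/dt(dL/dqdot) = H_vv @ qddot + H_qv @ qdot, where H_vv and H_qv
    # are blocks of the Hessian of L; Euler-Lagrange equates this to dL/dq.
    H_vv = torch.stack([torch.autograd.grad(dL_dqdot[i], qdot,
                                            retain_graph=True)[0]
                        for i in range(n)])
    H_qv = torch.stack([torch.autograd.grad(dL_dqdot[i], q,
                                            retain_graph=True)[0]
                        for i in range(n)])
    return torch.linalg.solve(H_vv, dL_dq - H_qv @ qdot)
```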
A related but philosophically distinct approach is Physics-Informed Neural Networks (PINNs), developed by Raissi, Perdikaris, and Karniadakis (2019). Where HNNs and LNNs encode the structural form of mechanics (Hamiltonian or Lagrangian) in the architecture, PINNs encode known differential equations as soft constraints in the loss function. A PINN is trained not only to fit observed data but also to satisfy a specified partial differential equation at a set of collocation points. The physics enters through a regularisation term that penalises violations of the governing equation.
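A sketch of such a loss makes the "soft constraint" point concrete. I use viscous Burgers' equation, $u_t + u u_x = \nu u_{xx}$, purely as an illustrative PDE; the structure is the same for any governing equation:

```python
import torch

def pinn_loss(net, x_data, t_data, u_data, x_col, t_col, nu=0.01):
    """Sketch of a physics-informed loss: data misfit plus the squared
    residual of the PDE at collocation points, derivatives by autograd."""
    # Data term: fit the observations
    u_pred = net(torch.stack([x_data, t_data], dim=-1)).squeeze(-1)
    data_loss = ((u_pred - u_data) ** 2).mean()

    # Physics term: penalise the residual u_t + u*u_x - nu*u_xx
    x = x_col.clone().requires_grad_(True)
    t = t_col.clone().requires_grad_(True)
    u = net(torch.stack([x, t], dim=-1)).squeeze(-1)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    residual = u_t + u * u_x - nu * u_xx
    return data_loss + (residual ** 2).mean()
```

Nothing forces the residual to zero; the physics acts as a pressure on the optimiser rather than a property of the architecture.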
PINNs are primarily tools for solving known equations rather than discovering new ones, and for that reason they sit somewhat outside the "law discovery" programme surveyed here. But they are worth mentioning for two reasons. First, they represent an enormous and influential body of work (the original paper has been cited thousands of times) that demonstrates the practical value of embedding physical knowledge in learning systems. Second, they illustrate a different location for the inductive bias: not in the architecture (as in HNNs), not in the candidate function space (as in SINDy), but in the loss function. The physics is a constraint on what the network is allowed to learn, imposed through the training objective rather than through the computational graph.
Together, HNNs, LNNs, and PINNs demonstrate that there is a spectrum of ways to encode physical knowledge, from hard architectural constraints that guarantee conservation laws by construction, to soft loss-function penalties that encourage but do not guarantee physical consistency. The pattern is the same throughout: the more physics you build in, the better the results. This is precisely the opposite of what we would expect if sufficiently powerful generic architectures could discover physics on their own.
VII. Concept discovery
The approaches described so far assume that the relevant physical variables (position, velocity, mass, force) are either given directly or can be straightforwardly computed from the raw data. But a deeper form of discovery would involve the autonomous identification of the relevant concepts themselves. What if a system had to figure out that "mass" is a useful concept before it could discover $F = ma$?
This is the challenge taken up by Fang, Jian, Li, and Ma in their AI-Newton framework (Fang et al., 2025). AI-Newton operates on raw multi-experiment data from mechanics simulations and is given no prior physical concepts: no mass, no energy, no force. Instead, the system autonomously proposes interpretable physical concepts and progressively generalises the laws it discovers to broader domains.
The architecture is organised around a knowledge base with three layers: symbols, concepts, and laws. Beginning with only geometric information, experimental parameters, and spatiotemporal coordinates, the system uses what its authors call "plausible reasoning" (a form of inference from partial evidence, closer to abduction than to deduction) to propose candidate concepts. It then tests whether those concepts allow it to formulate simpler and more general laws across its collection of experiments. A recommendation engine, combining a UCB-style exploration-exploitation trade-off with a dynamically adapted neural network, guides the selection of which experiments and concepts to investigate next.
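The paper's recommendation engine is more elaborate than this, but its UCB-style core is the familiar bandit rule. Everything in the sketch below (the scoring, the constant $c$) is a generic illustration, not AI-Newton's actual implementation:

```python
import numpy as np

def ucb_select(mean_reward, counts, total_pulls, c=1.4):
    """Generic upper-confidence-bound rule: favour candidates (here,
    experiments or concepts) that have paid off before (exploitation)
    or have rarely been tried (exploration).

    mean_reward, counts : (n_candidates,) arrays of past performance
    """
    bonus = c * np.sqrt(np.log(total_pulls + 1) / (counts + 1e-9))
    return int(np.argmax(mean_reward + bonus))
```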
Applied to a large, noisy dataset of 46 mechanics experiments, AI-Newton successfully rediscovered Newton's second law, the conservation of energy, and the law of universal gravitation, all without being told that mass, energy, or gravitational force exist as concepts. The system invented its own internal variables that turned out to correspond to these physical quantities.
This is a significant advance over previous work, though important caveats apply. The system operates on simulated data from classical mechanics, a domain where the laws are known, the variables are well defined, and the experiments can be repeated arbitrarily. Whether this approach scales to domains where the relevant concepts are genuinely unknown remains an open question. The "plausible reasoning" framework also involves design choices, about what counts as a plausible concept, how concepts are composed, and how generality is measured, that encode implicit assumptions about the structure of physical knowledge. These choices are themselves a form of inductive bias, albeit at a higher level of abstraction than the geometric priors of equivariant networks.
VIII. LLMs as scientific agents
A more recent development uses large language models not as direct discoverers of physical laws but as reasoning agents that orchestrate the discovery process. Mower and Bou-Ammar (2025) introduced Al-Khwarizmi, a framework that integrates foundation models with the SINDy algorithm. The LLM does not find the law through sequence prediction; rather, it acts as a reasoning engine that analyses system observations (textual descriptions, raw data, plots), proposes candidate feature libraries and optimiser configurations, and iteratively refines its proposals based on feedback.
The architecture uses retrieval-augmented generation (RAG) to incorporate prior physical knowledge from documentation and expert descriptions, and a reflection mechanism that allows the system to evaluate and improve its own proposals across iterations. Evaluated on 198 models, Al-Khwarizmi achieved a 20% improvement over the best-performing alternative, using only open-source models.
This represents a philosophically interesting shift. The LLM is not itself learning physics; it is leveraging its training on scientific text to serve as a guide for a specialised discovery algorithm. The physical insight comes from SINDy's sparse regression, but the LLM handles the meta-cognitive task of choosing what to look for and how. It is, in effect, automating the role of the human expert who, in the original SINDy framework, had to manually specify the candidate function library and tuning parameters.
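The control flow is easy to sketch. Everything below is schematic and of my own devising: `propose` stands in for a call to whatever LLM backend is in use, and `fit` and `score` stand in for the SINDy fit and its validation; none of these names come from the paper.

```python
def llm_guided_sindy(data, describe, propose, fit, score, n_rounds=5):
    """Schematic of an Al-Khwarizmi-style loop. The LLM (`propose`,
    a hypothetical wrapper around any chat API) suggests candidate
    function libraries; sparse regression (`fit`) and validation
    (`score`) provide the feedback it refines against."""
    feedback, best = None, None
    for _ in range(n_rounds):
        library_spec = propose(describe(data), feedback)  # LLM reasoning
        model = fit(data, library_spec)                   # SINDy-style fit
        s = score(model, data)                            # held-out check
        if best is None or s > best[0]:
            best = (s, model)
        feedback = f"validation score {s:.3f} with library {library_spec}"
    return best[1]
```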
Whether this constitutes "discovery" in any robust sense is debatable. The LLM's contribution is to encode human physical intuition (absorbed from its training corpus) in a form that can be computationally deployed. This is useful, perhaps transformatively so, but it is a different kind of achievement from learning physical structure directly from data. It is closer to the automation of scientific practice than to the automation of scientific insight.
IX. The geometric approach
An alternative to discovering equations of motion is to discover what stays the same. Conservation laws (conservation of energy, momentum, angular momentum) are among the deepest structures in physics, and they can be identified without knowing the specific dynamics of a system.
Recent work by Lu et al. (2023) on discovering conservation laws using optimal transport and manifold learning takes this approach. Rather than seeking the equation of motion, these methods examine the geometry of data in phase space. They look for manifolds on which the data is constrained to live, and infer that the constraint must arise from a conservation law. If the trajectory of a system always stays on a particular surface in its state space, something is preventing it from leaving that surface, and that "something" is a conserved quantity.
This geometric perspective has the advantage of working even when time-series data is noisy or incomplete, and of identifying invariants that may not be obvious from the equations of motion themselves. It also connects to deep mathematical structures: Noether's theorem tells us that every continuous symmetry of a physical system corresponds to a conservation law, so discovering conservation laws is equivalent to discovering symmetries.
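A toy version of the underlying idea (not Lu et al.'s optimal-transport machinery) can be stated in a few lines: a candidate function is a good conserved quantity if it is nearly constant along each trajectory while still varying across trajectories, which rules out the trivially constant function.

```python
import numpy as np

def invariance_score(f, trajectories):
    """Score a candidate conserved quantity f: (T, dim) array -> (T,).
    High score: f barely varies along any one trajectory but differs
    between trajectories. A globally constant f scores ~0, because the
    across-trajectory variance in the numerator vanishes too."""
    values = [f(traj) for traj in trajectories]
    within = np.mean([np.var(v) for v in values])    # drift along orbits
    across = np.var([np.mean(v) for v in values])    # spread between orbits
    return across / (within + 1e-12)

# Example: for simulated pendulum trajectories with columns (q, p),
# energy = lambda traj: traj[:, 1]**2 / 2 - np.cos(traj[:, 0])
# scores high, while a random projection of the state does not.
```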
X. The emerging landscape
Looking across this literature, several patterns emerge.
The first is the centrality of inductive biases. Every successful approach to discovering physical laws from data involves substantial prior knowledge encoded in the architecture, the training procedure, the loss function, or the candidate function space. Equivariance constraints, Hamiltonian and Lagrangian structure, sparsity priors, graph topology, physics-informed loss terms, symbolic regression, and the grammar of differential equations are all forms of inductive bias that narrow the hypothesis space to physically plausible solutions. Without these priors, models converge on statistical shortcuts rather than physical laws. This is not a failure of the models; rather, it reflects the fact that the space of functions consistent with any finite dataset is enormously larger than the space of physically meaningful laws. Indeed, as Hillar and Sommer showed for Schmidt and Lipson's pioneering work, even methods that claim to operate "without prior physical knowledge" turn out, on close inspection, to presuppose substantial physical structure. Even the geometric approach of Lu et al., which avoids specifying an explicit dynamical model, encodes substantial assumptions in its choice of metric: using optimal transport to compare trajectory distributions presupposes that the relevant structure lives in the geometry of phase space, a non-trivial physical commitment.
The second pattern is the importance of the two-stage pipeline, though it is not the only successful strategy. The GNN-to-symbolic-regression approach of Cranmer et al. separates representation learning (using neural networks with appropriate inductive biases) from symbolic distillation (using symbolic regression or related methods to extract interpretable expressions). This division of labour plays to the strengths of each component: neural networks are good at flexible function approximation, while symbolic methods are good at finding concise, interpretable, and generalisable expressions. But HNNs and LNNs demonstrate an alternative: rather than extracting the law after training, encode the law's structural form in the architecture, so that the network is constrained to learn something physically meaningful from the outset. Both strategies succeed for the same underlying reason: they restrict the hypothesis space to regions where physical laws live.
Third, the gap between prediction and understanding seems systematic. It reflects a fundamental difference between interpolation (performing well on data drawn from the same distribution as the training data) and extrapolation (performing well on genuinely new situations). Physical laws are precisely the kind of structure that enables extrapolation: if you know $F = \frac{G m_1 m_2}{r^2}$, you can predict the orbit of a spacecraft around Jupiter even if your training data contained only observations of Mercury. But statistical regularities extracted from training data, no matter how accurate within distribution, do not support this kind of transfer. This is Chollet's point applied to physics: task-specific skill does not imply the kind of abstract, transferable understanding that physical laws represent. The distinction has concrete empirical consequences for out-of-distribution performance.
Fourth, the most exciting advances involve the autonomous discovery of concepts. AI-Newton's ability to invent the concept of "mass" from raw experimental data represents a qualitatively different kind of achievement from finding the best symbolic fit to pre-identified variables. If this approach can be extended to domains where the relevant concepts are genuinely unknown, it could contribute significantly to scientific understanding.
XI. Implications for the convergence debate
These findings have direct bearing on the questions raised in my previous post on representation convergence. The Platonic Representation Hypothesis, the Natural Abstraction Hypothesis, and the Universality Hypothesis all assume, in different ways, that sufficiently capable learners will converge on the objective structure of reality through the pressure of prediction alone. The literature on physical law discovery suggests a more nuanced picture.
Prediction alone is insufficient. Foundation models trained on orbital trajectories achieve excellent predictive accuracy without internalising Newton's law of gravitation. This directly refutes the claim that prediction pressure alone drives convergence toward the generative structure of reality. High-fidelity prediction is compatible with physically nonsensical internal representations.
Architecture matters profoundly. The choice of inductive biases, including symmetry constraints, Hamiltonian or Lagrangian structure, graph topology, and sparsity priors, determines whether a model converges on genuine physical structure or on statistical shortcuts. Different architectures, trained on the same data, arrive at fundamentally different internal representations, not merely different views of the same underlying reality. A Transformer and a GNN, both trained on orbital data, do not learn the same physics. An HNN conserves energy exactly where a standard network does not. The architecture is not a neutral vessel for learning; it is an active participant in determining what is learned.
Successful discovery is specifically engineered. In every case where a neural network has recovered a known physical law, the success was achieved by deliberately designing the system to respect the structural properties of that law. Whether through hard architectural constraints (HNNs guaranteeing energy conservation), soft loss-function penalties (PINNs enforcing known PDEs), or structured search spaces (SINDy and symbolic regression), the "discovery" is better described as constrained optimisation within a carefully chosen hypothesis space than as the spontaneous emergence of physical understanding.
At the same time, the success stories are genuine and should not be dismissed. Specialised networks, equipped with the right inductive biases and trained on carefully curated data, can recover genuine physical laws. They can even discover new regularities in domains where the laws are not known (as in the dark matter example from Cranmer et al.). This suggests that the productive path forward is not to hope for convergence from general-purpose systems but to develop principled methods for encoding physical knowledge into learning systems and verifying that the resulting representations correspond to genuine structure.
The challenge, then, is not one of passive discovery but of active construction: understanding which architectures and datasets and training regimes unlock which domains, and how to verify that the representations learned correspond to real physics rather than statistical artefacts. This is the work that will, I believe, transform scientific practice in the near term.
Acknowledgments
This post was written by Margot Stakenborg. My background is in theoretical physics, chemistry, and philosophy of physics.
This work was funded by the Advanced Research + Invention Agency (ARIA) through project code MSAI-SE01-P005, as part of the Dovetail Fellowship.
Initial research was conducted during the SPAR winter programme.
References
Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). "Discovering governing equations from data by sparse identification of nonlinear dynamical systems." Proceedings of the National Academy of Sciences, 113(15), 3932–3937.
Chollet, F. (2019). "On the Measure of Intelligence." arXiv:1911.01547.
Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., & Ho, S. (2020). "Lagrangian Neural Networks." ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations. arXiv:2003.04630.
Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., Xu, R., Cranmer, K., Spergel, D., & Ho, S. (2020). "Discovering Symbolic Models from Deep Learning with Inductive Biases." Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
Fang, Y.-L., Jian, D.-S., Li, X., & Ma, Y.-Q. (2025). "AI-Newton: A Concept-Driven Physical Law Discovery System without Prior Physical Knowledge." arXiv:2504.01538.
Geirhos, R., Jacobsen, J.-H., Michaelis, C., Zemel, R., Brendel, W., Bethge, M., & Wichmann, F. A. (2020). "Shortcut learning in deep neural networks." Nature Machine Intelligence, 2, 665–673.
Greydanus, S., Dzamba, M., & Yosinski, J. (2019). "Hamiltonian Neural Networks." Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 15353–15363.
Hillar, C. & Sommer, F. (2012). "Comment on the article 'Distilling free-form natural laws from experimental data.'" arXiv:1210.7273.
Kang, B., Yue, Y., Lu, R., Lin, Z., Zhao, Y., Wang, K., Huang, G., & Feng, J. (2024). "How Far Is Video Generation from World Model: A Physical Law Perspective." arXiv:2411.02385.
Lemos, P., Jeffrey, N., Cranmer, M., Ho, S., & Battaglia, P. (2022). "Rediscovering orbital mechanics with machine learning." Machine Learning: Science and Technology, 4, 045002.
Lu, P. Y., Ariño Bernad, R., & Soljačić, M. (2023). "Discovering Conservation Laws using Optimal Transport and Manifold Learning." Nature Communications, 14, 4744.
Mower, C. E. & Bou-Ammar, H. (2025). "Al-Khwarizmi: Discovering Physical Laws with Foundation Models." arXiv:2502.01702.
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). "Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations." Journal of Computational Physics, 378, 686–707.
Schmidt, M. & Lipson, H. (2009). "Distilling Free-Form Natural Laws from Experimental Data." Science, 324(5923), 81–85.
Vafa, K., Chang, P. G., Rambachan, A., & Mullainathan, S. (2025). "What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models." Proceedings of the 42nd International Conference on Machine Learning (ICML).
