Published in NeurIPS ML4PS, 2024
A short-form discussion of *Product Manifold Machine Learning for Physics*, accepted to NeurIPS ML4PS 2024.
Published on arXiv, 2024
Physical data are representations of the fundamental laws governing the Universe, hiding complex compositional structures often well captured by hierarchical graphs. Hyperbolic spaces are endowed with a non-Euclidean geometry that naturally embeds those structures. To leverage the benefits of non-Euclidean geometries in representing natural data, we develop machine learning on product manifold (PM) spaces, Cartesian products of constant-curvature Riemannian manifolds. As a use case we consider the classification of “jets”, sprays of hadrons and other subatomic particles produced by the hadronization of quarks and gluons in collider experiments. We compare the performance of PM-MLP and PM-Transformer models across several possible representations. Our experiments show that PM representations generally perform as well as or better than fully Euclidean models of similar size, with the most significant gains found for highly hierarchical jets and small models. We discover a significant correlation between the degree of hierarchical structure at the per-jet level and classification performance with the PM-Transformer in top-tagging benchmarks. This promising result highlights a potential direction for further improving machine learning performance by tailoring the geometric representation at the per-sample level in hierarchical datasets. These results reinforce the view of geometric representation as a key parameter in maximizing both the performance and efficiency of machine learning on natural data.
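To make the PM construction concrete, here is a minimal sketch of a distance computation on a product manifold, assuming a Poincaré-ball (curvature −1) × Euclidean factorization. The factor choices, curvatures, and helper names (`poincare_distance`, `product_manifold_distance`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: distance on a product manifold (PM) built from a
# Poincare ball (curvature -1) and a Euclidean factor. Illustrative
# only; the paper's factorization and curvatures may differ.
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2)) + eps
    return np.arccosh(1 + 2 * sq / denom)

def product_manifold_distance(xh, xe, yh, ye):
    """PM distance: per-factor geodesic distances combined in quadrature."""
    d_hyp = poincare_distance(xh, yh)
    d_euc = np.linalg.norm(xe - ye)
    return np.sqrt(d_hyp ** 2 + d_euc ** 2)

# Two hypothetical embeddings with hyperbolic and Euclidean components.
x_hyp, x_euc = np.array([0.1, 0.2]), np.array([1.0, -0.5])
y_hyp, y_euc = np.array([0.4, -0.1]), np.array([0.3, 0.8])
print(product_manifold_distance(x_hyp, x_euc, y_hyp, y_euc))
```

Combining per-factor geodesic distances in quadrature is the standard product-metric construction, which is what makes mixed-curvature representations straightforward to assemble from constant-curvature pieces.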
Published in Physical Review D, 2025
Self-supervised learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and the downstream tasks required. We propose resimulation-based self-supervised representation learning (RS3L), a novel simulation-based SSL strategy that uses resimulation to drive data augmentation for contrastive learning in the physical sciences, particularly in fields that rely on stochastic simulators. By intervening in the middle of the simulation process and rerunning simulation components downstream of the intervention, we generate multiple realizations of an event, producing a set of augmentations that covers all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pretraining enables strong performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies.
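As a rough illustration of the contrastive step such a strategy builds on, the sketch below pulls together embeddings of two resimulated views of the same event with an NT-Xent (SimCLR-style) loss. The batch shapes and the `nt_xent` helper are assumptions for illustration, not the RS3L implementation.

```python
# Minimal sketch: NT-Xent contrastive loss over pairs of embeddings,
# where each pair comes from two resimulations of the same event.
# Illustrative assumption, not the RS3L codebase.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmentations per event."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, dim)
    sim = z @ z.t() / temperature                        # cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-pairs
    batch = z1.shape[0]
    # The positive for row i is the other resimulated view of event i.
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets)

# Stand-ins for encoder outputs on two resimulations of 8 events.
z_view1, z_view2 = torch.randn(8, 64), torch.randn(8, 64)
print(nt_xent(z_view1, z_view2).item())
```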
Published:
Particle jets exhibit tree-like structures through stochastic showering and hadronization. The hierarchical nature of these structures aligns naturally with hyperbolic space, a non-Euclidean geometry that captures hierarchy intrinsically. Drawing upon the foundations of geometric learning, we introduce hyperbolic transformer models tailored for tasks relevant to jet analyses, such as classification and representation learning. Through jet embeddings and jet tagging evaluations, our hyperbolic approach outperforms its Euclidean counterparts. These findings underscore the potential of using hyperbolic geometric representations in advancing jet physics analyses.
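For a flavor of what "hyperbolic" means operationally, here is a minimal sketch of one ingredient such models rely on: projecting Euclidean features into the Poincaré ball via the exponential map at the origin, then comparing points by geodesic distance. The curvature choice and helper names (`expmap0`, `poincare_dist`) are assumptions for illustration; the full hyperbolic attention machinery involves more than this.

```python
# Minimal sketch: embed Euclidean features into the Poincare ball via
# the exponential map at the origin (curvature c = 1), then compare
# points by hyperbolic geodesic distance. Illustrative only.
import torch

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c ** 0.5 * norm) * v / (c ** 0.5 * norm)

def poincare_dist(x, y, c=1.0, eps=1e-9):
    """Geodesic distance on the Poincare ball of curvature -c."""
    sq = (x - y).pow(2).sum(-1)
    denom = (1 - c * x.pow(2).sum(-1)) * (1 - c * y.pow(2).sum(-1))
    return torch.acosh(1 + 2 * c * sq / denom.clamp_min(eps)) / c ** 0.5

# Embed two feature vectors and compare them hyperbolically.
feats = torch.randn(2, 16)
ball = expmap0(feats)  # tanh(...) < 1 keeps points inside the unit ball
print(poincare_dist(ball[0], ball[1]))
```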
Published:
Many of the recent successes in AI rely on the manifold hypothesis: that most high-dimensional data lie on a lower-dimensional manifold. From transfer learning to contrastive learning to foundation modeling, significant effort has been devoted to methods for efficiently finding and mapping input data to this latent space. In this Thematic Discussion Session, we’ll hear from three distinguished speakers on extracting meaningful latent representations from data in the physical sciences: Aizhan Akhmetzhanova (Self-Supervised Learning for Data Compression and Inference in Cosmology), Nate Woodward (Product Manifold Machine Learning for Physics), and David Baek (GenEFT: Physics-Inspired Theory of Representation Learning). After three 10-minute lightning talks, we’ll have a 30-minute open discussion/Q&A session to explore the major challenges and opportunities in this field. We encourage attendees to come with questions and insights from their own work!