My research, situated in the burgeoning field of categorical cybernetics, explores the theoretical underpinnings of artificial intelligence, specifically categorical deep learning and the algebraic theory of architectures. This interdisciplinary approach leverages the abstract language of category theory to provide a robust and general framework for understanding and designing deep learning systems. My work aims to move beyond the empirical successes of deep learning by establishing a rigorous mathematical foundation that can guide the development of more principled and interpretable AI models.
Foundational Papers in Categorical Deep Learning
My research portfolio encompasses a range of publications that delve into various aspects of categorical deep learning, each contributing to the development of an algebraic theory of architectures. These papers, while mathematically rigorous, are driven by the practical goal of enhancing our understanding and capabilities in deep learning.
Categorical Deep Learning: An Algebraic Theory of Architectures
This paper lays the groundwork for using category theory as a universal language for articulating and understanding deep learning architectures. It introduces monad algebras as a way to model the crucial concept of equivariance in neural networks, an approach that captures the principles of Geometric Deep Learning in full and extends beyond them. By generalizing from monads to endofunctor algebras, the paper drops the invertibility assumption inherited from group actions. This generalization makes it possible to model structural (co)recursion, a pattern that deep learning uses implicitly but rarely addresses explicitly, and unlocks significant potential for designing more sophisticated neural networks and for understanding their internal workings through the lens of algebraic structures.
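To make the notion of an endofunctor algebra concrete, here is a minimal Haskell sketch (my own illustration of the standard folklore construction, not code from the paper): an algebra specifies how to collapse one layer of structure, and the induced catamorphism performs the structural recursion.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- An algebra for an endofunctor f: how to collapse one layer of structure.
type Algebra f a = f a -> a

-- The fixed point of f: arbitrarily deeply nested structure.
newtype Fix f = In { out :: f (Fix f) }

-- The catamorphism: structural recursion derived uniformly from any algebra.
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . out

-- The list endofunctor, ListF e x = 1 + e * x.
data ListF e x = Nil | Cons e x deriving Functor

-- Summing a list is just an algebra; cata supplies the recursion.
sumAlg :: Algebra (ListF Int) Int
sumAlg Nil        = 0
sumAlg (Cons e n) = e + n

example :: Int
example = cata sumAlg (In (Cons 1 (In (Cons 2 (In Nil)))))  -- evaluates to 3
```

An unrolled recurrent network has exactly this shape: the cell plays the role of the algebra and the unrolling is the fold; dually, coalgebras and their anamorphisms model generative unrolling, which is the sense in which structural (co)recursion appears in deep learning.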
My PhD Thesis: A Category Theory Foundation for Deep Learning
My PhD thesis addresses the pressing need for a solid mathematical foundation for deep learning. Despite its practical triumphs, deep learning remains a relatively young discipline, characterized more by empirical discoveries and ad-hoc design choices than by a cohesive theoretical framework. The thesis develops a novel mathematical foundation rooted in category theory: a comprehensive, uniform, and prescriptive framework that rigorously formalizes key aspects of deep learning, including backpropagation, weight sharing, architecture design, and the supervised learning paradigm as a whole. This work provides an unambiguous mathematical language for deep learning, fostering a deeper understanding of its fundamental principles.
Graph Convolutional Neural Networks as Parametric CoKleisli morphisms
This paper focuses on Graph Convolutional Neural Networks (GCNNs), a specialized architecture for processing graph-structured data. It introduces a bicategorical framework for GCNNs, decomposing them using established categorical constructs for deep learning: Para and Lens. However, this decomposition is performed within a specific base category – the CoKleisli category of the product comonad. This approach provides a high-level categorical interpretation of a component of the inductive bias inherent in GCNNs. Furthermore, the paper explores potential generalizations of this categorical model to encompass broader classes of message passing neural networks, paving the way for a more unified understanding of these architectures.
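A toy Haskell rendering of that idea (a deliberate simplification of mine, not the paper's construction): in the CoKleisli category of the product comonad, every morphism receives a shared read-only context, which for a GCNN is the adjacency structure, and composition threads that context to each layer automatically.

```haskell
-- A CoKleisli morphism for the product comonad (a, -): a map (a, x) -> y
-- where a is a globally shared, read-only context.
type CoKleisli a x y = (a, x) -> y

-- CoKleisli composition duplicates the shared context to both arrows.
(>>>>) :: CoKleisli a x y -> CoKleisli a y z -> CoKleisli a x z
f >>>> g = \(a, x) -> g (a, f (a, x))

-- A toy graph setting (hypothetical types for illustration):
type Adj   = [[Int]]    -- for each node, the indices of its neighbours
type Feats = [Double]   -- one feature per node

-- One message-passing layer: each node's feature becomes the sum over
-- its neighbourhood.
aggregate :: CoKleisli Adj Feats Feats
aggregate (adj, xs) = [sum [xs !! j | j <- ns] | ns <- adj]

-- Stacked layers all see the same adjacency, with no manual re-plumbing.
twoLayers :: CoKleisli Adj Feats Feats
twoLayers = aggregate >>>> aggregate
```

The inductive bias is then visible in the types: every layer can read the adjacency structure, but none can modify it.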
Space-time tradeoffs of lenses and optics via higher category theory
Optics and lenses, as categorical tools, are invaluable for modeling systems with bidirectional data flow. This paper addresses a critical observation: the standard denotational definition of optics, which equates optics based on external behavior, is insufficient for operational contexts. In software implementation, the internal structure of optics becomes relevant. To bridge this gap, the research lifts existing categorical constructions and their interrelationships to the 2-categorical level. This elevation reveals crucial operational considerations, particularly space-time tradeoffs, that are otherwise obscured at the 1-categorical level. The research highlights that the composition of lenses can lead to memory bottlenecks, a vital consideration for practical applications.
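The tradeoff can already be glimpsed in a small Haskell sketch (standard one-categorical definitions, simplified from the paper's 2-categorical setting): composing lenses re-runs the forward pass during the backward pass, while composing residual-style optics stores intermediate values instead.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A lens: a forward view and a backward update that re-consumes the input.
data Lens s t a b = Lens { view :: s -> a, update :: (s, b) -> t }

-- Lens composition: note that `update` recomputes `view f s`, so forward
-- work is repeated on the way back (a time cost).
compLens :: Lens s t a b -> Lens a b x y -> Lens s t x y
compLens f g = Lens
  { view   = view g . view f
  , update = \(s, y) -> update f (s, update g (view f s, y)) }

-- An optic: the intermediate state m is stored explicitly as a residual
-- (a space cost), hidden behind an existential type.
data Optic s t a b = forall m. Optic (s -> (m, a)) ((m, b) -> t)

-- Optic composition pairs the residuals instead of recomputing.
compOptic :: Optic s t a b -> Optic a b x y -> Optic s t x y
compOptic (Optic l r) (Optic l' r') = Optic
  (\s -> let (m, a) = l s; (n, x) = l' a in ((m, n), x))
  (\((m, n), y) -> r (m, r' (n, y)))
```

Denotationally the two composites agree, but operationally compLens trades memory for recomputation while compOptic does the reverse; distinguishing them requires comparing optics up to 2-cells, which is precisely what the 2-categorical lift makes possible.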
Actegories for the Working Mathematician
Inspired by Mac Lane's well-known category theory textbook, this paper serves as a substantial reference on the foundational theory of actegories, comprehensively covering its essential definitions and results. Driven by the application of actegories to the theory of optics, it places particular emphasis on how actegories interact with, and can be combined with, monoidal structures. This work provides a valuable resource for researchers seeking a deeper understanding of actegories and their role in categorical modeling.
Fibre Optics: Unifying Lenses and Optics
Lenses, optics, and dependent lenses (or equivalently, morphisms of containers/polynomial functors) are all widely adopted in applied category theory as models for bidirectional processes. This paper addresses the need for unification within these diverse yet related constructions. It achieves this by introducing the framework of fibre optics, providing a cohesive and overarching perspective that encompasses and relates these various models of bidirectional data flow. This unification simplifies the theoretical landscape and promotes a more integrated understanding of bidirectional systems.
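To make the objects being unified concrete (standard container notation, not the paper's own development): a dependent lens between containers consists of a forward map on shapes together with a backward map on positions that may depend on the shape,

$$(f, f^\sharp) : (A \triangleleft B) \to (C \triangleleft D), \qquad f : A \to C, \qquad f^\sharp_a : D(f\,a) \to B(a) \quad \text{for each } a : A.$$

When $B$ and $D$ are constant families this is exactly an ordinary lens, with $f$ as the view and $f^\sharp$ as the update; the fibre-optics framework then varies the fibration from which such forward and backward data are drawn.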
Categorical Foundations of Gradient-Based Learning
This research proposes a categorical semantics for gradient-based machine learning algorithms, a cornerstone of modern deep learning. It employs lenses, parameterized maps, and reverse derivative categories to construct this semantic framework. This foundation offers a powerful explanatory and unifying tool, capable of encompassing a wide array of neural networks, loss functions (including mean squared error and Softmax cross entropy), and gradient update algorithms (such as Nesterov momentum, Adagrad, and ADAM). Remarkably, this framework extends beyond continuous domains, typically modeled in categories of smooth maps, to encompass the discrete realm of boolean circuits, demonstrating its broad applicability.
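A minimal sketch of the flavour of this semantics (a drastic Haskell simplification of mine, using a single real-valued parameter rather than general reverse derivative categories): a parametric lens pairs a forward pass with a reverse-derivative backward pass, and one training step composes model, loss, and update.

```haskell
-- A parametric lens: a forward pass from parameter and input to output,
-- and a backward pass sending the output gradient to parameter and
-- input gradients.
data ParaLens p x y = ParaLens
  { fwd :: (p, x) -> y
  , bwd :: (p, x, y) -> (p, x) }  -- third argument: gradient at the output

-- A linear neuron y = p * x, with its reverse derivatives.
linear :: ParaLens Double Double Double
linear = ParaLens
  { fwd = \(p, x) -> p * x
  , bwd = \(p, x, dy) -> (x * dy, p * dy) }

-- Mean squared error against a fixed target, as a parameter-free lens.
mse :: Double -> ParaLens () Double Double
mse target = ParaLens
  { fwd = \((), y) -> (y - target) ^ 2
  , bwd = \((), y, dl) -> ((), 2 * (y - target) * dl) }

-- One step of gradient descent: forward through model and loss, pull the
-- gradient back through both, then update the parameter.
step :: Double -> Double -> Double -> Double -> Double
step lr p x target =
  let y         = fwd linear (p, x)
      (_, dy)   = bwd (mse target) ((), y, 1)
      (dp, _dx) = bwd linear (p, x, dy)
  in p - lr * dp
```

In the full framework the architecture, the loss, and the optimiser (plain gradient descent here, but equally Nesterov momentum, Adagrad, or ADAM) are each lenses of this shape, and supervised learning is literally their composite.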
Towards foundations of categorical cybernetics
This paper broadens the scope to propose a categorical framework for ‘cybernetics,’ defined as systems that engage in bidirectional interaction with both an environment and a ‘controller.’ Examples within this framework include open learners, where the controller is an optimization algorithm like gradient descent, and open games, where the controller is composed of game-theoretic agents. This work lays the foundation for a categorical understanding of cybernetic systems, encompassing learning and game theory within a unified framework.
Category Theory in Machine Learning: a Survey
This survey paper addresses the increasing intersection of category theory and machine learning. As machine learning permeates numerous technological domains and category theory emerges as a unifying scientific language, the application of category theory to machine learning has gained significant traction. This paper documents the motivations, objectives, and recurring themes across these applications. It touches upon key areas like gradient-based learning, probability theory, and equivariant learning, providing a valuable overview of the field for both category theorists and machine learning practitioners.
Compositional Game Theory, Compositionally
This paper introduces a novel compositional treatment of compositional game theory, leveraging Arrows, a concept closely related to Tambara modules. Using this methodology, the paper shows that known, as well as previously undiscovered, variants of open games form symmetric monoidal categories, providing a powerful algebraic structure for analyzing and composing game-theoretic systems.
Learning Functors using Gradient Descent
This research constructs a category-theoretic formalism around CycleGAN, a neural network system for unpaired image-to-image translation. It establishes a connection between CycleGANs and categorical databases, demonstrating that a specific class of functors can be learned using gradient descent. The paper also introduces a novel neural network designed to insert and delete objects from images without paired data, evaluating its performance on the CelebA dataset. This work bridges the gap between abstract categorical concepts and concrete deep learning applications in image manipulation.
Conclusion
My research consistently advocates for the application of category theory as a fundamental tool in deep learning. By developing an algebraic theory of architectures, I aim to provide a more profound and principled understanding of these complex systems. This approach not only offers a rigorous language for describing existing deep learning techniques but also paves the way for the development of novel architectures and learning paradigms grounded in solid mathematical foundations. The exploration of categorical deep learning and algebraic theory of architectures is crucial for the continued advancement and interpretability of artificial intelligence.