A Taxonomy of Reinforcement Learning Algorithms
A guide to understanding and categorizing the many flavors of reinforcement learning algorithms, from value iteration to PPO.
Notes on reinforcement learning from Steve Brunton's Data-Driven Science and Engineering book, covering core RL concepts, mathematical formalism, and key ideas.
Pipeline parallelism is a technique for training large ML models by partitioning model layers across devices, enabling efficient distributed training while managing memory constraints.
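To make the idea concrete, here is a minimal sketch in plain NumPy, assuming a toy stack of layer functions and simulating devices within a single process; the helper names are illustrative rather than taken from the article. It partitions the layers into contiguous stages and pushes micro-batches through them, which is the schedule a real pipeline would overlap across devices:

```python
import numpy as np

# Toy "model": four layer functions to be split into pipeline stages.
weights = [np.random.randn(8, 8) * 0.1 for _ in range(4)]
layers = [lambda x, w=w: np.tanh(x @ w) for w in weights]

def partition_into_stages(layers, num_stages):
    """Assign contiguous runs of layers to stages, one stage per (simulated) device."""
    per_stage = len(layers) // num_stages
    return [layers[i * per_stage:(i + 1) * per_stage] for i in range(num_stages)]

def pipeline_forward(stages, batch, num_microbatches):
    """Split the batch into micro-batches and push each through every stage.
    A real pipeline places stages on different devices and overlaps their work;
    here the schedule is simulated sequentially on one process."""
    outputs = []
    for microbatch in np.array_split(batch, num_microbatches):
        for stage in stages:              # would run on that stage's device
            for layer in stage:
                microbatch = layer(microbatch)
        outputs.append(microbatch)
    return np.concatenate(outputs)

stages = partition_into_stages(layers, num_stages=2)
out = pipeline_forward(stages, np.random.randn(16, 8), num_microbatches=4)
print(out.shape)  # (16, 8)
```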
Monte Carlo Tree Search (MCTS) is an AI algorithm that makes decisions by strategically sampling possible futures. It builds search trees incrementally, balancing exploration of new paths with exploitation of promising ones, and uses random simulations to tackle problems too complex for exhaustive analysis.
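As a rough sketch of those two ingredients, the snippet below writes out the UCT selection score (the exploration/exploitation balance) and a random rollout; the game-interface callbacks (`legal_moves`, `apply_move`, `is_terminal`, `reward`) are hypothetical placeholders, not an API from the article:

```python
import math
import random

def uct_score(value_sum, visits, parent_visits, c=1.41):
    """UCB1 applied to trees (UCT): average reward plus an exploration bonus.
    Unvisited children score infinity so each gets tried at least once."""
    if visits == 0:
        return float("inf")
    exploitation = value_sum / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration

def random_rollout(state, legal_moves, apply_move, is_terminal, reward, max_depth=100):
    """Monte Carlo simulation: play random legal moves from `state` and return
    the final reward as a noisy estimate of the state's value. The four
    callbacks are hypothetical hooks into whatever game is being searched."""
    for _ in range(max_depth):
        if is_terminal(state):
            break
        state = apply_move(state, random.choice(legal_moves(state)))
    return reward(state)
```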
Tensor Parallelism is a technique for training large ML models by splitting individual tensors across multiple devices, enabling efficient distributed training of models too large to fit on a single accelerator.
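A minimal NumPy sketch of the column-parallel case, with "devices" simulated as shards of one weight matrix on a single process (names are illustrative, not from the article): each device computes a partial matmul against its shard, and concatenating the partial outputs recovers the full result:

```python
import numpy as np

def column_parallel_matmul(x, weight, num_devices):
    """Split `weight` column-wise across simulated devices; each computes its
    partial output, and concatenation recovers the full x @ weight."""
    shards = np.array_split(weight, num_devices, axis=1)   # one shard per device
    partials = [x @ shard for shard in shards]             # run in parallel in practice
    return np.concatenate(partials, axis=1)                # gather along the hidden dim

x = np.random.randn(4, 16)
w = np.random.randn(16, 32)
assert np.allclose(column_parallel_matmul(x, w, num_devices=4), x @ w)
```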
Data Parallelism is a technique for training large ML models by distributing data across multiple devices, enabling parallel processing while maintaining model consistency through gradient synchronization.
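The gradient-synchronization step can be sketched in plain NumPy, with workers simulated as slices of one batch on a single process (the linear model and names are illustrative): each worker computes a local gradient on its data shard, the gradients are averaged, and every replica applies the same update so the model copies stay identical:

```python
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Mean-squared-error gradient of a linear model on one worker's data shard."""
    residual = x_shard @ w - y_shard
    return 2.0 * x_shard.T @ residual / len(x_shard)

def data_parallel_step(w, x, y, num_workers, lr=0.1):
    """One training step: shard the batch, compute per-worker gradients,
    average them (gradient synchronization), and apply the shared update."""
    x_shards = np.array_split(x, num_workers)
    y_shards = np.array_split(y, num_workers)
    grads = [local_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    avg_grad = np.mean(grads, axis=0)      # stands in for an all-reduce
    return w - lr * avg_grad               # every replica applies the same update

w = np.zeros(8)
x, y = np.random.randn(64, 8), np.random.randn(64)
w = data_parallel_step(w, x, y, num_workers=4)
print(w.shape)  # (8,)
```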
A deep dive into Mixture of Experts (MoE) models, exploring how they work, their benefits and challenges, and their role in modern language models like Mixtral.
The Daily Ink is a discontinued newsletter featuring bi-weekly breakdowns of research papers in the domain of machine learning and ML systems. This is an archive of all the articles.
A comprehensive overview of key concepts and theorems in Probability and Random Processes, covering essential topics such as conditional probability, Bayes' theorem, independence, and counting principles, serving as a valuable reference for students in EECS126.