Policy Optimization and RL Algorithms
Modern RL algorithms are a chain of fixes where each one solves the most painful problem of the last. This article explores that story, with some math involved.
An intuitive, example-driven introduction to KL divergence, explaining its connection to surprise, entropy, log-likelihood, forward vs. reverse KL behavior, and practical KL estimation in RLHF.
Fantasy, science fiction, romantic comedies, some roasts and recommendations along the way.
A guide to understanding and categorizing the many flavors of reinforcement learning algorithms, from value iteration to PPO.
Notes on reinforcement learning from Steve Brunton's Data-Driven Science and Engineering book, covering core RL concepts, mathematical formalism, and key ideas.
Pipeline parallelism is a technique for training large ML models by partitioning model layers across devices, enabling efficient distributed training within per-device memory constraints.
Monte Carlo Tree Search (MCTS) is an AI algorithm that makes decisions by strategically sampling possible futures. It builds search trees incrementally, balancing exploration of new paths with exploitation of promising ones, and uses random simulations to tackle problems too complex for exhaustive analysis.
Tensor Parallelism is a technique for training large ML models by splitting individual tensors across multiple devices, enabling efficient distributed training of models too large to fit on a single accelerator.
Data Parallelism is a technique for training large ML models by distributing data across multiple devices, enabling parallel processing while maintaining model consistency through gradient synchronization.
A deep dive into Mixture of Experts (MoE) models, exploring how they work, their benefits and challenges, and their role in modern language models like Mixtral.