Policy Optimization and RL Algorithms

Modern RL algorithms are a chain of fixes where each one solves the most painful problem of the last. This article explores that story, with some math involved.

November 2025 · 16 min · 3401 words · Arushi Somani

πŸ“ˆ A Note about KL Divergence

An intuitive, example-driven introduction to KL divergence, explaining its connection to surprise, entropy, log-likelihood, forward vs. reverse KL behavior, and practical KL estimation in RLHF.

November 2025 · 10 min · 2020 words · Arushi Somani

πŸ“– Every Book I Read in 2024

Fantasy, science fiction, and romantic comedies, with some roasts and recommendations along the way.

December 2024 · 14 min · 2867 words · Arushi Somani

πŸ—‚οΈ A Taxonomy of Reinforcement Learning Algorithms

A guide to understanding and categorizing the many flavors of reinforcement learning algorithms, from value iteration to PPO.

October 2024 · 6 min · 1191 words · Arushi Somani

πŸ“ Introduction to RL

Notes on reinforcement learning from Steve Brunton’s Data-Driven Science and Engineering book, covering core RL concepts, mathematical formalism, and key ideas.

October 2024 · 12 min · 2425 words · Arushi Somani

πŸͺˆ ML at Scale: Pipeline Parallelism

Pipeline parallelism is a technique for training large ML models by partitioning model layers across devices; this post shows how to do that partitioning efficiently to optimize distributed training and manage memory constraints.

July 2024 · 12 min · 2379 words · Arushi Somani, Anton Zabreyko

πŸ”Ž Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is an AI algorithm that makes decisions by strategically sampling possible futures. It builds search trees incrementally, balancing exploration of new paths with exploitation of promising ones, and uses random simulations to tackle problems too complex for exhaustive analysis.

July 2024 · 15 min · 3098 words · Arushi Somani

πŸͺ† ML at Scale: Tensor Parallelism

Tensor parallelism is a technique for training large ML models by splitting individual tensors across multiple devices, enabling efficient distributed training of models too large to fit on a single accelerator.

March 2024 · 8 min · 1552 words · Arushi Somani, Anton Zabreyko

πŸ’½ ML at Scale: Data Parallelism

Data parallelism is a technique for training large ML models by distributing data across multiple devices, enabling parallel processing while maintaining model consistency through gradient synchronization.

March 2024 · 8 min · 1566 words · Arushi Somani, Anton Zabreyko

πŸŽ›οΈ How do Mixture of Expert Models Work?

A deep dive into Mixture of Experts (MoE) models, exploring how they work, their benefits and challenges, and their role in modern language models like Mixtral.

February 2024 · 6 min · 1198 words · Arushi Somani