The Daily Ink is a discontinued newsletter that featured bi-weekly breakdowns of research papers in machine learning and ML systems. This is an archive of all the articles.

  1. Pipeline Parallelism as a Band-Aid on Memory Limitations
  2. Large Models as Engines of Computation
  3. Distilling Models Makes Them Feasible to Use
  4. Quadratic Complexity Holds Back the Legendary Transformer (Part 2)
  5. Quadratic Complexity Holds Back the Legendary Transformer (Part 1)
  6. Composing Models Together Makes Them More Powerful
  7. Multi-Modal Models Are the Future
  8. Deep Learning Solves a 20-Year-Long Unsolved Problem in Science (Part 2)
  9. Deep Learning Solves a 20-Year-Long Unsolved Problem in Science (Part 1)
  10. Models Can Do Calculus Better than You
  11. Is a Group of Expert Models Better Than One Very Smart Model?
  12. Winning the AI Lottery by Buying A Lot of Tickets
  13. Using Information Retrieval for Code Generation
  14. Meta’s New Model Is Small and Mighty
  15. Models Can Control Robots Just Like Humans
  16. Anthropic Makes AI That Teaches Itself Ethics
  17. Models Can Magically Learn New Skills at Scale
  18. Discovering a Better Optimization Algorithm with Evolution
  19. Talking to Models Requires Special Prompts that Make Them Think Sequentially
  20. Teaching LLMs to Use Tools and Not Suck at Math
  21. English Is Just Math in Prettier Clothing
  22. The Secret to Good Writing Is Editing
  23. Solving Context Length Constraints by Distillation
  24. A Large Language Model for SCIENCE
  25. Optimal Parallelism in ML Training Is Possible, Says Alpa
  26. Google Makes a Language Model for Music
  27. Google’s LaMDA Model Is Too Convincing, and a Researcher Is Fired
  28. Teaching Computers to Think in Abstractions
  29. The Secret Sauce Behind ChatGPT
  30. FlashAttention Challenges ML Researchers to Think About Systems-Level Improvements
  31. Make Models Smarter, Not Larger, with Data Pruning
  32. DeepMind Attempts to Make AI That Can Do Anything
  33. Training Compute-Optimal Large Language Models
  34. Gradient Descent: The Ultimate Optimizer
  35. Cramming: Training a Language Model on a Single GPU in One Day
  36. A Neural Corpus Indexer for Document Retrieval