Blogs
16 Dec 2024
A few days ago I was reading a research paper that completely changed how I think about language, AI, and the future of work. The researchers studied 17 different languages and discovered something incredible: every human language transmits information at almost exactly the same rate - about 39 bits per second.
Looking at their data, I initially assumed some languages would be way more efficient than others. Maybe Mandarin would pack in way more meaning than English, or certain languages would just be fundamentally better at getting ideas across. Turns out there's a trade-off instead: languages that cram more information into each syllable get spoken more slowly, and faster-spoken languages carry less per syllable, so the product ends up hovering around the same 39 bits per second.
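A back-of-the-envelope way to see that trade-off (the numbers here are made up for illustration, not taken from the paper): the information rate is just speech rate times information density, and very different combinations land in the same neighborhood.

```python
# Illustrative only: hypothetical numbers, not figures from the study.
# Information rate (bits/s) = speech rate (syllables/s) * density (bits/syllable).

languages = {
    # name: (syllables per second, bits per syllable)
    "dense-and-slow": (5.0, 7.8),
    "light-and-fast": (7.8, 5.0),
    "middle-of-the-road": (6.2, 6.3),
}

for name, (speech_rate, density) in languages.items():
    info_rate = speech_rate * density  # bits per second
    print(f"{name:>20}: {info_rate:.1f} bits/s")
```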
09 Nov 2024
In the future, your ability to create software won’t be limited by your technical knowledge. It will be limited by how clearly you can think. Computing started with physical machines - we had to manually arrange circuits and transistors to make anything happen. Then we invented ways to encode instructions using just 1s and 0s. Next came human-readable instructions in assembly language. Then high-level languages like Python that let us write code almost like English. Now we’re at LLMs that can turn plain English into working software. Each breakthrough moved us further from thinking like machines and closer to thinking like humans. But here’s what’s really interesting: the next abstraction won’t be technical at all.
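To make that ladder concrete, here's the same trivial task written at a few of those levels, with Python standing in for all of them; the generate_code call at the end is a hypothetical placeholder, not a real API.

```python
# The same task at three levels of abstraction (illustrative sketch).

# 1. Machine-flavoured: spell out every step, the way assembly forces you to think.
def average_low_level(xs):
    total = 0.0
    count = 0
    for x in xs:
        total = total + x
        count = count + 1
    return total / count

# 2. High-level language: say what you mean, let the runtime handle the how.
def average_high_level(xs):
    return sum(xs) / len(xs)

# 3. LLM era: the "program" is a plain-English description of intent.
#    generate_code is a hypothetical stand-in for whatever model or tool you use.
prompt = "Write a Python function that returns the average of a list of numbers."
# code = generate_code(prompt)
```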
21 Oct 2024
Enabling self-improvement, where LLMs can autonomously make themselves better, is becoming increasingly feasible. These language models are already writing production code (via copy/paste, or Cursor, or whoever Devin's clients are), but we're rapidly heading toward a future where they also orchestrate entire workflows. This opens the door to 'single-use software': when producing software becomes cheap enough, we just write code for everything. The problem isn't a lack of tasks that could benefit from software; it's that we lack the resources to build customized solutions for every use case. Every industry is full of repetitive processes that could be optimized with code, but hiring a dev (or allocating the time of the devs you do have) just isn't worth it. Frameworks like CrewAI and AutoGen are pushing the boundaries of what's possible in multi-agent systems, enabling role delegation, tool use, and task management. I've played around with CrewAI, and GPT-4o/Claude were good at helping me get something running fast, but they weren't zero-shotting it.
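For flavor, this is roughly the shape of the CrewAI experiment I mean, sketched from memory rather than copied from their docs, so argument names may differ a bit by version and the tasks here are placeholders.

```python
from crewai import Agent, Task, Crew

# Two single-purpose agents: one specs out the repetitive workflow,
# one writes the throwaway script that automates it.
analyst = Agent(
    role="Process analyst",
    goal="Describe the repetitive steps in a workflow precisely enough to automate them",
    backstory="You turn vague process descriptions into concrete, ordered checklists.",
)
coder = Agent(
    role="Automation engineer",
    goal="Write a small, single-use script that automates the described steps",
    backstory="You write disposable code: correct, minimal, no frameworks.",
)

spec_task = Task(
    description="Spell out the manual steps in assembling our weekly report as a checklist.",
    expected_output="A numbered checklist of steps with inputs and outputs.",
    agent=analyst,
)
code_task = Task(
    description="Write a Python script that automates the checklist from the previous task.",
    expected_output="A runnable Python script.",
    agent=coder,
)

# Tasks run in order by default, with each task's output available to the next.
crew = Crew(agents=[analyst, coder], tasks=[spec_task, code_task])
result = crew.kickoff()
print(result)
```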
06 Oct 2024
In the world of AI, one of the most transformative insights has been the concept of scaling laws: the idea that increasing compute, data, and model size leads to predictable gains in model intelligence. This principle became evident as models like GPT-3 and GPT-4 evolved. GPT-3, with an estimated training cost of $10-20 million, was a massive leap forward in natural language understanding. GPT-4, with a training cost of around $100 million, required nearly 40 times more energy and compute than GPT-3, and in return exhibited significantly improved language comprehension, reasoning, and overall intelligence. That increase in raw compute was a direct factor in the model's heightened capabilities, which is exactly how scaling laws play out in model pre-training.
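For anyone who hasn't seen a scaling law written down, the "predictable gains" part is literal: pre-training loss is well fit by a simple power law in parameter count and training tokens. A minimal sketch using the functional form from the Chinchilla paper (Hoffmann et al., 2022); the constants are quoted from memory, so treat them as illustrative rather than authoritative.

```python
# Parametric loss fit: L(N, D) = E + A / N^alpha + B / D^beta
# N = model parameters, D = training tokens. Lower predicted loss ~ "smarter" model.

def predicted_loss(n_params: float, n_tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7      # fitted constants (approximate, from memory)
    alpha, beta = 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling both parameters and data pushes the predicted loss down, predictably:
for n, d in [(1e9, 2e10), (1e10, 2e11), (1e11, 2e12)]:
    print(f"{n:.0e} params, {d:.0e} tokens -> predicted loss {predicted_loss(n, d):.2f}")
```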