← kwindla hultman kramer

Gemini 2.0 handles this prompt exactly as written, if you turn on code execution

December 25, 2024

@francoisfleuret's point, here, is that you *need* code generation for the LLM to be able to output A before it outputs N.

But I'm going to side-step that point (and the broader debate it's part of). I want to talk about code generation, because I've been thinking about it a lot lately.

First, let's talk about function calling.

One of the big lessons from building voice AI agents with LLMs is that huge classes of real-world use cases absolutely require function calling.

You can build fantastic voice AI demos on top of just the in-model inference capabilities of an LLM. But almost all production voice AI agents rely heavily on function calling.

This was one of those "you have to build it to learn the lesson" lessons.

GPT-4, released in March 2023, made a new kind of conversational AI possible. It was the first LLM to offer all of the following:

1. Human-like conversational dynamics.

2. Reliable reasoning/connection-making over a sufficiently large context (10 minutes of spoken conversation, in March 2023).

3. Large embedded knowledge base.

4. (Nearly) low enough latency for voice conversation.

These capabilities together turned out to be the tipping point that made a new kind of conversational voice AI possible.

General progress in LLM performance since March 2023 has been really helpful in making these abilities usable in real-world deployments. But the most important unlock has been function calling.

Voice agents use function calls for:

- Interacting with external systems specific to particular use cases and deployment contexts.

- A wide range of RAG-like information lookup tasks.

- Conversation persistence, including complex actions like saving a privacy-redacted version of a conversation.

- Script following.

Some of these uses were obvious before we had built and deployed production voice AI systems. But some of them weren't. (Script following, for example.)
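To make the mechanics concrete, here is a minimal Python sketch of the plumbing behind a function call. It is not tied to any particular LLM API. The registry, the `lookup_order` and `save_transcript` functions, and the JSON shape of the call are all hypothetical stand-ins I made up for illustration. The only idea it shows is that the model emits a function name plus arguments, and application code looks the function up and runs it.

```python
import json
import re
from typing import Any, Callable, Dict

# Hypothetical registry: function name -> Python callable.
# Nothing here corresponds to a real vendor API.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the dispatcher can find it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def lookup_order(order_id: str) -> dict:
    """Stand-in for a lookup against an external system."""
    fake_orders = {"A123": {"status": "shipped", "eta_days": 2}}
    return fake_orders.get(order_id, {"status": "unknown"})


@tool
def save_transcript(text: str) -> dict:
    """Stand-in for conversation persistence with naive redaction."""
    # Assumption for the sketch: mask any run of 7 or more digits.
    redacted = re.sub(r"\d{7,}", "[REDACTED]", text)
    return {"saved": True, "text": redacted}


def dispatch(function_call_json: str) -> dict:
    """Run one function call described as JSON: {"name": ..., "args": {...}}."""
    call = json.loads(function_call_json)
    name = call["name"]
    args = call.get("args", {})
    if name not in TOOLS:
        return {"name": name, "error": "unknown function"}
    return {"name": name, "result": TOOLS[name](**args)}
```

In a real agent, the result of `dispatch` would be passed back to the model so it can continue the conversation with the new information.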

I'm convinced that code generation will be as important as function calling, once we've accumulated some experience building with Gemini 2.0 and similarly capable new LLM versions.

Credit for this insight goes entirely to @shresbm, who shepherds work at Google on the Gemini APIs and AI Studio. When Shrestha first told me that really good code generation was going to make a lot of new things possible, I didn't really get it.

But after playing with code generation for a few weeks ... I still don't fully get it, but in a good way, now.

It's clear that there are some obvious things to explore:

- Context-aware math on demand driven by natural language.

- Generative UI.

- Better "binding" behavior with function calling.

But I'm also now convinced that there is a lot of unexplored territory that isn't obvious yet (at least to me).

Let's take that last one — improved function calling performance. Function calling and code generation go together like peanut butter and jelly.

Functions often take arguments that are fairly complicated. With code generation enabled, Gemini has access to this new pattern:

natural language ➡️
code that creates argument structs for functions ➡️
multiple function calls

Think about how powerful this is. The ability to use even very basic code structures — loops, for example — massively expands what LLMs can do.
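Here is a rough Python sketch of that pattern. Everything in it is an assumption for illustration: the `ReminderArgs` struct, the `schedule_reminder` function, and the example request are hypothetical, and the body of `generated_code` is hand-written to resemble the kind of code a model might produce, not actual Gemini output. The request it imagines is "remind patient p-42 to take their medication every day for three days, starting January 6, 2025."

```python
from dataclasses import asdict, dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class ReminderArgs:
    """Hypothetical argument struct for the function the agent exposes."""
    patient_id: str
    send_on: str  # ISO 8601 date, e.g. "2025-01-06"
    message: str


# Record of every call, so the effect of the loop is easy to inspect.
calls_made: List[dict] = []


def schedule_reminder(args: ReminderArgs) -> dict:
    """Hypothetical external function; here it only records the call."""
    calls_made.append(asdict(args))
    return {"ok": True, "send_on": args.send_on}


def generated_code() -> List[dict]:
    """Hand-written stand-in for model-generated code.

    One natural-language request becomes a loop that builds an argument
    struct per day and makes one function call per struct.
    """
    start = date(2025, 1, 6)
    results = []
    for offset in range(3):
        args = ReminderArgs(
            patient_id="p-42",
            send_on=(start + timedelta(days=offset)).isoformat(),
            message="Time to take your medication.",
        )
        results.append(schedule_reminder(args))
    return results
```

Calling `generated_code()` makes three `schedule_reminder` calls, one per day. Without code generation, the model would have to emit each of those calls, with each date worked out correctly, as a separate structured output.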

In fact, my impression is that function calling in Gemini 2.0 is now trained as a subset of the model's code generation capabilities. This feels quite different from how function calling works in other models that I have access to. In other models, function calling feels like a close cousin to structured output. (Which is great. But this new "modality" is even better.)

Back to the idea at the top of this post ... training LLMs to call external functions was important to unlocking a whole host of new, useful behaviors.

Training LLMs to write and execute code is going to have a similar impact.

  1. https://x.com/francoisfleuret/status/1871847485024686231