Run Claude Code with Gemma 4 for Free — No Anthropic API Key Needed

Published May 2, 2026 | Tags: Claude Code, Gemma 4, Google AI, Developer Tools, Open Source

What if you could harness the full power of Claude Code — Anthropic’s agentic coding assistant — but swap out the underlying model for Google’s open Gemma 4 or any Gemini model, completely free? Thanks to an elegant open-source proxy project on GitHub, that’s now a reality.

In this post, we’ll walk through exactly how it works, why it matters, and how you can get up and running in under five minutes.

The Problem: Claude Code is Powerful, But Costs Money

Claude Code is one of the best AI coding tools available today. It can read files, write code, run bash commands, and iterate on your project autonomously — all from your terminal. But it requires an Anthropic API key, and for heavy users, those API costs can add up fast.

Meanwhile, Google has made Gemma 4 and the Gemini model family available through Google AI Studio — with a generous free tier. The models are capable, fast, and support tool calling and streaming. The missing piece? A bridge between Claude Code’s Anthropic-format API calls and Google’s GenAI API.

That’s exactly what genai-code provides.

What Is genai-code?

genai-code is a lightweight Python proxy server built with FastAPI. It sits between Claude Code and the Google GenAI API, acting as a transparent translator. Claude Code thinks it’s talking to Anthropic’s servers — but every request is actually being forwarded to Gemma 4 or whichever Google model you choose.

Here’s the architecture in a nutshell:

Claude Code ──POST /v1/messages──▶ FastAPI Proxy ──generateContent──▶ Google GenAI API
     ▲                                    │
     │         SSE stream / JSON          │         Gemma 4 / Gemini
     └────────────────────────────────────┘

The proxy handles all the format translation transparently — including tool calls, streaming (SSE), and error shapes — so your Claude Code workflow stays completely unchanged.

Key Features

Anthropic Messages API compatible — works with Claude Code and any Anthropic SDK client out of the box
Dynamic model passthrough — specify any GenAI model per-request (Gemma 4, Gemini 2.5 Flash, Gemini 2.5 Pro, etc.)
Full SSE streaming — real-time token streaming with proper block indexing
Tool / function calling — Claude Code’s built-in tools (read, write, edit, bash) work via automatic function-calling translation
Google Search grounding — optional web search enabled by default for factual queries
Thinking / reasoning control — configure reasoning depth with budget_tokens (maps to LOW/MEDIUM/HIGH on the GenAI side)
Retry logic — automatic retries with exponential backoff on transient failures
Disconnect detection — avoids wasted API calls when clients disconnect mid-stream

Prerequisites

Before you start, make sure you have:

Python 3.10+ installed
Claude Code installed (npm install -g @anthropic-ai/claude-code)
A free Google AI Studio API key — grab one here

Step-by-Step Setup

Step 1: Clone the Repository

git clone https://github.com/thecodedata/genai-code
cd genai-code

Step 2: Install Python Dependencies

pip install -r requirements.txt

The project uses FastAPI, Uvicorn, and the Google GenAI SDK. Everything installs in seconds.

Step 3: Configure Your API Key

cp .env.example .env

Open .env and add your Google AI Studio key:

GEMINI_API_KEY=your_key_here

Step 4: Start the Proxy

uvicorn main:app --host 0.0.0.0 --port 8000

Verify it’s running with a quick health check:

curl http://localhost:8000/health
# {"status":"ok","gemini_api_key_set":true,"model":"gemma-4-26b-a4b-it",...}

Step 5: Point Claude Code at the Proxy

Create (or edit) a .claude/settings.json file in your project root — or use ~/.claude/settings.json to apply it globally:

{
    "$schema": "https://json.schemastore.org/claude-code-settings.json",
    "env": {
        "ANTHROPIC_AUTH_TOKEN": "sk-my-local-proxy-key",
        "ANTHROPIC_BASE_URL": "http://localhost:8000",
        "ANTHROPIC_MODEL": "gemma-4-26b-a4b-it",
        "ANTHROPIC_SMALL_FAST_MODEL": "gemma-4-26b-a4b-it"
    }
}

Now just run claude from your terminal as normal. Every request will silently route through Gemma 4 — no other changes needed.

Switching Models on the Fly

One of the nicest features is that you can change the underlying model at any time simply by updating ANTHROPIC_MODEL in your settings file. Any model supported by the Google GenAI SDK works:

gemma-4-26b-a4b-it — the default; great balance of quality and speed
gemma-4-29b-a4b-it — larger variant for more complex reasoning
gemini-2.5-flash — fastest option; best tool-calling reliability
gemini-2.5-pro — highest capability for demanding tasks

Tip: If you’re running into issues with tools not being called correctly, try switching to gemini-2.5-flash — it has the strongest function-calling support of the available models.

Advanced: Enabling Reasoning / Thinking

The proxy supports Google’s thinking/reasoning configuration. You can control the reasoning depth by passing a thinking object in your API requests:

{
    "model": "gemma-4-26b-a4b-it",
    "messages": [...],
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000
    }
}

The proxy maps budget_tokens to GenAI thinking levels automatically: under 2,000 tokens maps to LOW, 2,000–8,000 maps to MEDIUM, and 8,000 or above maps to HIGH reasoning depth.

How the Protocol Translation Works

The technical heart of this project is the bidirectional translation between Anthropic’s Messages API format and Google’s GenAI generateContent format. Here’s what the proxy handles behind the scenes:

Message format: Anthropic’s user/assistant roles map to GenAI’s Content objects with user/model roles
Tool definitions: Anthropic’s input_schema format is translated to GenAI’s FunctionDeclaration format
Tool results: tool_result blocks become FunctionResponse parts
Streaming: GenAI’s streaming chunks are re-formatted into Anthropic’s SSE event format with proper content_block_delta events and index tracking
Error shapes: API errors are returned in Anthropic’s error format so Claude Code handles them gracefully

This is non-trivial engineering — and it all happens transparently in the proxy layer so you never have to think about it.

Troubleshooting Common Issues

Symptom	Likely Cause	Fix
`GEMINI_API_KEY env var not set`	Missing API key	Set `GEMINI_API_KEY` in `.env` or export it
`502` error in Claude Code	GenAI API failure	Check logs; retry logic handles most transient errors
Tools not being called	Model tool-calling support	Switch to `gemini-2.5-flash`
Empty responses	`max_tokens` too low	Increase `max_tokens` (capped at 8192)
Slow responses	High thinking budget or large context	Reduce `budget_tokens` or use `gemini-2.5-flash`

You can also enable debug logging from Uvicorn to see exactly what’s happening:

uvicorn main:app --host 0.0.0.0 --port 8000 --log-level debug

Why This Matters

This project is a great example of how the standardization of AI APIs enables real ecosystem innovation. Because Claude Code speaks a well-defined protocol, any sufficiently compatible server can serve as its backend. The genai-code proxy exploits exactly this — creating a “Claude Code anywhere” experience powered by free-tier Google models.

For developers who want to:

Experiment with Claude Code without API costs
Run agentic coding workflows entirely on-premise or locally
Compare Gemma and Gemini models on real coding tasks
Build custom AI coding toolchains using open models

…this is a compelling starting point.

Get Started Today

The full source code is available at https://github.com/thecodedata/genai-code The project is MIT-licensed and actively welcomes contributions — especially around improved tool schema translation, Docker packaging, and multi-model support.

If you try it out, drop a comment below and let us know which model worked best for your use case!

Have questions or ran into an issue? Leave a comment below or open an issue on GitHub.