Battle of the Free LLMs: DeepSeek R1 vs ChatGPT o3-mini

Yesterday at work, I found myself in a hellhole of test coverage.

You know the drill—writing unit tests for utility functions. I had already covered 80% with the help of some regular AI models. But then I got told to push it to 100% coverage.

Shit. I hate this part.

The Test Coverage Grind

Hitting 100% test coverage is no joke, even if you’re using the smartest LLMs available for free.

I spent hours battling edge cases, making sure every branch, condition, and scenario was covered. My test file grew to 1000+ lines.
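To make “every branch” concrete, here is a minimal sketch in Python. The language, the `clamp` helper, and the test names are all assumptions made up for illustration; none of it is the code from the project described here. The function has four possible paths, so full branch coverage needs at least one test per path:

```python
def clamp(value, low, high):
    """Hypothetical utility: restrict value to the range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return low
    if value > high:
        return high
    return value


def test_rejects_inverted_range():
    # Path 1: the guard clause raises.
    try:
        clamp(5, 10, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_below_range():
    # Path 2: value is under the lower bound.
    assert clamp(-3, 0, 10) == 0


def test_above_range():
    # Path 3: value is over the upper bound.
    assert clamp(42, 0, 10) == 10


def test_inside_range():
    # Path 4: value passes through unchanged.
    assert clamp(5, 0, 10) == 5


if __name__ == "__main__":
    for test in (
        test_rejects_inverted_range,
        test_below_range,
        test_above_range,
        test_inside_range,
    ):
        test()
    print("all four paths exercised")
```

Drop any one of these tests and a branch goes unexecuted, which is exactly the kind of gap that keeps a report stuck below 100%. If you happen to use coverage.py with pytest (again, an assumption), `coverage run --branch -m pytest` followed by `coverage report -m` lists the lines that were missed.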

DeepSeek R1 vs ChatGPT o3-mini: The Showdown

At first, I was using DeepSeek R1—a free reasoning model. It was… okay. It helped, but it still needed multiple attempts to get things right.

Then I switched to ChatGPT o3-mini.

👀 HOLY. SHIT. 👀

  • Faster? ✅ Yep. Feels instant.
  • More precise? ✅ Hell yes. No second attempts needed.
  • Fewer hallucinations? ✅ Much more reliable.

First attempt: boom, got the correct test case.

The Refactor Test

After getting 100% coverage, I thought: “This test file is bloated as hell.”

So I asked o3-mini to refactor it.

From 1000 lines → 600 lines
Still 100% test coverage

Then I thought, “Wait, did I lose any edge cases?”

So I ran o3-mini again and told it to restore anything important.

Final result: 600 lines, still full coverage.
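For a sense of where that kind of shrinkage can come from, one common trick is folding near-identical test functions into a single table of cases. The sketch below is in Python and uses a made-up `parse_bool` helper. Both are assumptions for illustration, and this is not necessarily the refactor o3-mini produced:

```python
def parse_bool(text):
    """Hypothetical utility: turn common yes/no strings into a bool."""
    normalized = text.strip().lower()
    if normalized in {"1", "true", "yes"}:
        return True
    if normalized in {"0", "false", "no"}:
        return False
    raise ValueError(f"cannot parse {text!r} as a boolean")


# One table replaces a separate test function per input.
VALID_CASES = [
    ("true", True),
    (" YES ", True),
    ("1", True),
    ("false", False),
    ("No", False),
    ("0", False),
]

# Inputs that must hit the error branch.
INVALID_CASES = ["", "maybe", "2"]


def test_valid_inputs():
    for text, expected in VALID_CASES:
        assert parse_bool(text) is expected, f"parse_bool({text!r})"


def test_invalid_inputs():
    for text in INVALID_CASES:
        try:
            parse_bool(text)
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {text!r}")


if __name__ == "__main__":
    test_valid_inputs()
    test_invalid_inputs()
    print("table-driven tests passed")
```

All three branches of `parse_bool` are still exercised, so coverage stays the same while the file gets shorter. Checking for lost edge cases then mostly means reading the tables row by row instead of scanning hundreds of lines of test functions.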

Final Verdict?

🚀 ChatGPT o3-mini absolutely destroyed DeepSeek R1.

Not even close. The same might be true of o1 vs R1, but I haven’t tested that one yet.

What’s Your Experience?

Have you tested different LLMs for coding yet? What’s your go-to model for free AI coding assistance?

