Why Does Claude Swallow Exceptions?

The Problem

Have you ever gotten the feeling Claude (or Codex, etc.) is allergic to bubbling up exceptions? I find myself adding "don't swallow exceptions" to CLAUDE.md in almost every repo I work in. Why is this necessary? Claude (Opus 4.5+) is so smart and so good at coding, better than at least half of the engineers I've personally worked with over my career at the actual writing of code, so why does it so frequently commit this particular newbie mistake?

Same reason a model does anything: it was trained to.

Coding models go through a whole series of unsupervised and supervised training regimes, starting with next-token prediction, later getting to instruction tuning, and at some point RLVR (reinforcement learning with verifiable rewards) specifically for coding tasks. In the coding RLVR step, models write code, which is executed in a sandbox environment, and then the models are rewarded or penalized depending on whether the code "works". "Works" generally means it is free of syntax errors and accomplishes some specified task, such as fixing a bug or writing a function to implement an interface. If the code works as desired, plus one point. If not, minus one point. Rinse and repeat a few million times and the model learns to code.
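To make the "plus one, minus one" idea concrete, here is a minimal sketch of a binary scoring rule. The names (binary_reward, check_adds_small_ints, good_add, buggy_add) are made up for illustration, and real RLVR pipelines are certainly more involved; the only thing this is meant to show is that the grader sees pass or fail, not why.

```python
from typing import Callable, Iterable


def binary_reward(candidate: Callable, checks: Iterable[Callable[[Callable], None]]) -> int:
    """Return +1 if every check passes, -1 if any check fails or raises.

    Hypothetical scoring rule, for illustration only.
    """
    for check in checks:
        try:
            check(candidate)
        except Exception:
            # A failed assertion and an unexpected crash earn the same penalty.
            return -1
    return 1


def check_adds_small_ints(fn):
    assert fn(2, 3) == 5


def good_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


print(binary_reward(good_add, [check_adds_small_ints]))   # 1
print(binary_reward(buggy_add, [check_adds_small_ints]))  # -1
```

Note that the score carries no information about which line raised, or whether that line had anything to do with the task.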

This is where I think coding models learn a lot of their bad behaviors, including swallowing exceptions. It is also why they sometimes just can't help themselves: even when you explicitly instruct them to bubble exceptions up, instinct sometimes kicks in.

Does this mean SWE-bench and other coding eval test sets are written to encourage this bad behavior? No. These test sets are constructed or curated by very smart people who understand good coding practices probably much better than I do. I don't think the tests are bad, but I do suspect that the RLVR training structure has some unintended consequences.

A Toy Example

(Generated by Claude, obviously)

The Repo's Bug

Imagine a small library for parsing user records. Users report that get_display_name crashes when a user has no first_name set — it should fall back to their username.

# users.py
def get_display_name(user):
    return f"{user['first_name']} {user['last_name']}"

def log_access(user, action):
    timestamp = user['last_login'].isoformat()
    print(f"[{timestamp}] {user['username']} performed {action}")

The Hidden Test

# test_users.py
from users import get_display_name

def test_display_name_falls_back_to_username():
    user = {
        'username': 'alice',
        'last_name': 'Smith',
        # no first_name
        # no last_login either
    }
    # The test never calls log_access directly; it only runs if the agent's patch calls it.
    assert get_display_name(user) == 'alice'

Note the test fixture is missing last_login. Maybe that's a valid state, maybe it's not. But one way or another that's how the test is defined.

Patch A — Raises Incidental Exception

The agent fixes the bug correctly, and "helpfully" adds an access log call so the fix is observable in production (probably because it sees log_access calls elsewhere in the codebase).

def get_display_name(user):
    if not user.get('first_name'):
        log_access(user, 'display_name_fallback')  # incidental
        return user['username']
    return f"{user['first_name']} {user['last_name']}"

Run the test → log_access tries user['last_login'].isoformat() → KeyError: 'last_login' → test fails. The assertion on get_display_name is never even reached.

Patch B — Swallows Incidental Exception

Same fix, same incidental log call, but the agent wraps it:

def get_display_name(user):
    if not user.get('first_name'):
        try:
            log_access(user, 'display_name_fallback')
        except Exception:
            pass
        return user['username']
    return f"{user['first_name']} {user['last_name']}"

The Result

Run the test → log_access raises → swallowed → control returns → assertion passes → test passes.

Both patches contain the same correct fix for the actual bug. The difference is entirely in incidental code that the test wasn't written to exercise. But that code still gets executed, and the reward signal is binary: Patch B gets the reward, Patch A doesn't. Now, this may not be a perfect example, as try/except-wrapping custom logging functions is arguably a good practice, but this same "error in incidental code" phenomenon could crop up a million different ways: event publishing, cache management, or brand new functions the model writes that turn out to be unnecessary for passing the test but still get executed.
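If you want to watch this happen, the whole toy example fits in one file. This is a sketch that copies the two patches side by side under made-up names (get_display_name_patch_a, get_display_name_patch_b) and uses a hypothetical run_hidden_test helper in place of a real test runner:

```python
def log_access(user, action):
    # Same as the toy library: assumes last_login is always present.
    timestamp = user['last_login'].isoformat()
    print(f"[{timestamp}] {user['username']} performed {action}")


def get_display_name_patch_a(user):
    if not user.get('first_name'):
        log_access(user, 'display_name_fallback')  # raises KeyError for this fixture
        return user['username']
    return f"{user['first_name']} {user['last_name']}"


def get_display_name_patch_b(user):
    if not user.get('first_name'):
        try:
            log_access(user, 'display_name_fallback')
        except Exception:
            pass  # the KeyError disappears here
        return user['username']
    return f"{user['first_name']} {user['last_name']}"


def run_hidden_test(get_display_name):
    """Return 'pass' or 'fail', which is all the grader gets to see."""
    user = {'username': 'alice', 'last_name': 'Smith'}  # no first_name, no last_login
    try:
        assert get_display_name(user) == 'alice'
    except Exception:
        return 'fail'
    return 'pass'


results = {
    'patch_a': run_hidden_test(get_display_name_patch_a),
    'patch_b': run_hidden_test(get_display_name_patch_b),
}
print(results)  # {'patch_a': 'fail', 'patch_b': 'pass'}
```

Both functions return 'alice' for this user whenever they get the chance to return at all; the only thing the result distinguishes is whether the logging failure was hidden.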

The point is that over the course of many, many RLVR coding tests, there will likely be a lot of code which is a) executed but b) not actually critical to passing the test. Within all of this "incidental" code (or anything that calls it), the model is incentivized to swallow exceptions, defensively check types, and generally try to catch and handle any type of error it can imagine. I bet you've seen stuff like this before too:

if type(x) == str:
  y = json.loads(x)
  some_attr = y['some_attr']
elif type(x) == dict:
  some_attr = x['some_attr']

and thought "what the hell is this nonsense, x is obviously a dict". You know that, but the model may not; more importantly, it probably encountered 100 examples in its training regimes where it didn't know if x was a str or dict, and this type of hyper-cautious handling means it got the right answer on 99 instead of 97.

The overall point is that due to the structure of the reward systems in training, there will be some reward for the model to act defensively, and generally no penalty for doing so. In a few cases, a test case may explicitly ask for an exception to be raised (and verify that it is), but evidently not often enough to offset the pressure to act defensively.
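For contrast, here is roughly what the rarer kind of test looks like, one that only passes if the exception actually reaches the caller. This is a sketch using the standard library's unittest; the test class and the choice to treat a missing last_login as an error are my assumptions, not something the toy repo specifies.

```python
import unittest


def log_access(user, action):
    timestamp = user['last_login'].isoformat()
    print(f"[{timestamp}] {user['username']} performed {action}")


class LogAccessTests(unittest.TestCase):
    def test_missing_last_login_raises(self):
        user = {'username': 'alice'}  # no last_login
        # Passes only if the KeyError propagates; a swallowed exception fails here.
        with self.assertRaises(KeyError):
            log_access(user, 'view_profile')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LogAccessTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

A version of log_access that wrapped its body in try/except and carried on would fail this test, which is exactly the kind of penalty for defensiveness that seems to be in short supply.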

ZFC

This theory isn't limited to exception handling. I believe the same phenomenon is behind the behavior Steve Yegge observes in Zero Framework Cognition: A Way to Build Resilient AI Applications. Why do models write dumb little regexes to handle edge cases? Because when they were learning how to code, writing dumb little regexes worked (at least some of the time)! But making live calls to an external model to make a more intelligent decision was almost certainly disallowed during these training exercises.

I observed a very similar issue myself when vibe-coding career-genie. At one point Claude wrote a prompt explicitly instructing the chat agent not to call two tools at once, but then added in defensive handling for the case where it received two tool calls at once. I asked "why not just send an exception back to the chat agent if it does this thing you explicitly told it not to do?" and Claude said "oh yeah, great idea". But its first instinct was again "the show must go on!" behavior, aka try to handle every error case without breaking the flow.
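A rough sketch of the "just send the error back" approach, for the curious. Everything here is hypothetical (TooManyToolCallsError, select_tool_call, run_turn, and the shape of the tool-call and reply dicts); it is not the actual career-genie code, just the general idea of rejecting the turn loudly instead of quietly coping with it.

```python
class TooManyToolCallsError(Exception):
    """Raised when the agent returns more than one tool call in a turn."""


def select_tool_call(tool_calls):
    # Fail loudly instead of silently picking one call or running both.
    if len(tool_calls) > 1:
        names = ", ".join(call["name"] for call in tool_calls)
        raise TooManyToolCallsError(
            f"Expected at most one tool call per turn, got {len(tool_calls)}: {names}"
        )
    return tool_calls[0] if tool_calls else None


def run_turn(tool_calls):
    """Return the message that would be sent back to the chat agent."""
    try:
        call = select_tool_call(tool_calls)
    except TooManyToolCallsError as err:
        # Surface the violation to the agent so it can retry correctly.
        return {"role": "tool", "is_error": True, "content": str(err)}
    if call is None:
        return {"role": "tool", "is_error": False, "content": "no tool call"}
    return {"role": "tool", "is_error": False, "content": f"ran {call['name']}"}


print(run_turn([{"name": "search_jobs"}]))
print(run_turn([{"name": "search_jobs"}, {"name": "update_resume"}]))
```

The exception is still caught, but only at the boundary where it gets turned into feedback for the agent, so the rule in the prompt and the behavior of the code stay in agreement.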

Conclusion

What's the takeaway? One is that I think this problem will get mostly solved. Now that coding models are good at not just writing but also reviewing code, it should be possible to incorporate fully automated code reviews into the scores assigned to models when they are learning to write code, rather than pure binary rewards. Big labs are probably at least experimenting with these techniques if not already incorporating them into the next generation of models.

In the meantime, though, this is one of many good reasons to incorporate automated code reviews into your own workflow. Even when you explicitly identify anti-patterns and warn against them in CLAUDE.md, in certain cases you are fighting against the model's instincts, and you won't always win. But a second-pass check for these anti-patterns in an automated code review prompt will go a long way towards keeping your codebase clean of "things you don't like".
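A review prompt is one way to do that second pass. For the most mechanical anti-patterns, a plain static check can sit alongside it. The sketch below is that swapped-in technique, not a prompt-based review: it uses Python's ast module to flag except handlers whose body is nothing but pass. The function name find_swallowed_exceptions and the sample snippet are made up for illustration.

```python
import ast
from typing import List


def find_swallowed_exceptions(source: str) -> List[int]:
    """Return line numbers of except handlers whose body is only `pass`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler):
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                flagged.append(node.lineno)
    return sorted(flagged)


sample = '''
def get_display_name(user):
    try:
        log_access(user, 'display_name_fallback')
    except Exception:
        pass
    return user['username']
'''

print(find_swallowed_exceptions(sample))  # [5]
```

This only catches the bluntest form of the problem (it would miss a handler that logs and moves on, for example), so it complements a review pass and does not replace one.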