Hacker News

ChrisMarshallNY today at 3:41 AM | 35 replies

I love the idea, but this line:

> 1) no bug should take over 2 days

Is odd. It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

That said, unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one.

Also, I tend to attack bugs by priority/severity, as opposed to difficulty.

Some of the most serious bugs are often quite easy to find.

Once I find the cause of a bug, the fix is usually just around the corner.


Replies

lkbm today at 10:55 PM

A big reason we did a "fix week" at my old job was to deal with all the simple, low priority issues. Sure, there were high severity bugs, but they would get prioritized during normal work, whereas fix week was to prevent death of a thousand cuts. Kinda trivial things that just accumulate and make the site look and feel janky.

Some things turn out to be surprisingly complex, but you can very often know that the simple thing is simple.

muixoozie today at 1:43 PM

I worked for a company that used MS SQL Server a lot, and every few months we would run into a heisenbug that would crash our self-hosted MS SQL Server cluster or make it unresponsive. I'm not a database person, so I'm probably butchering the description here. From our POV, progress would stop and require manual intervention (on call). Back and forth went on between MS and our DBAs for YEARS, poring over logs or whatever they do. Honestly, I never thought it would be fixed. Then one time it happened and we caught all the data going into the commit and realized it would 100% reproduce the crash: only if we restored the database to a specific state and applied this specific commit would it crash MS SQL Server. NDAs were signed, and I took a machete to our code base to create a minimal repro binary that could deserialize our data store and commit / crash MS SQL Server. I made a nice PowerShell script to wrap it and repro the issue fast, and guess what? Within a month they fixed it. It was never clear what exactly the problem was on their end. I got buffer overflow vibes, but that's a guess.

Aurornis today at 6:34 PM

All of the buggy software projects I've been employed to work on have had some version of this rule.

Usually it's implicit, rather than explicit: Nobody tells you to limit work on bugs to 1-2 days, but if you spend an entire week debugging something difficult and don't accumulate any story points in Jira, a cadre of project managers, program managers, and other manager titles you didn't even know existed will descend upon you and ask why you're dragging the velocity down.

Lesson learned: Next time, avoid the hard bugs and give up early if something isn't going to turn into story points for hidden charts that are viewed by more people than you ever thought.

QuiEgo today at 2:20 PM

As someone who works with hardware, hard-to-repro bugs can take months to track down. Your code, the compiler, or the hardware itself (which is often a complex ball of IP from dozens of manufacturers held together with a NoC) could all be a problem. The extra fun bugs are when a bug is due to problems in two or three of them combining together in the perfect storm to make a mega bug that is impossible to reproduce in isolation.

kykat today at 3:45 AM

Sometimes, a "bug" can be caused by nasty architecture with intertwined hacks. Particularly on games, where you can easily have event A that triggers B unless C is in X state...

What I want to say is that I've seen what happens in a team with a history of quick fixes and inadequate architecture design to support the complex features. In that case, a proper bugfix could create significant rework and QA.

marginalia_nu today at 9:52 AM

I think in general, bugs go unfixed in two scenarios:

1. The cause isn't immediately obvious. In this case, finding the problem is usually 90% of the work. You can't know beforehand how long finding the problem will take, though I don't think bailing because it's taking too long is a good idea. If anything, it's in those really deep rabbit holes that the real gremlins can hide.

2. The cause is immediately obvious, but is an architecture mistake, the fix is a shit-ton of work, breaks workflows, requires involving stakeholders, etc. Even in this case it can be hard to say how long it will take, especially if other people are involved and have to sign off on decisions.

I suppose it can also happen in low-trust sweatshops where developers are held on such a tight leash that they aren't able to fix trivial bugs they find without first going through a bunch of Jira rigmarole, which is sort of low-key the vibe I got from the post.

OhMeadhbh today at 4:31 AM

At Amazon we had a bug that was the result of a compiler bug and the behaviour of Intel cores being mis-documented. It was intermittent and related to one core occasionally being allowed to access stale data in the cache. We debugged it with a logic analyzer, the commented nginx source and a copy of the C++11 spec.

It took longer than 2 days to fix.

cvoss today at 10:43 PM

The article addresses your concerns directly.

> In one of our early fixits, someone picked up what looked like a straightforward bug. It should have been a few hours, maybe half a day. But it turned into a rabbit hole. Dependencies on other systems, unexpected edge cases, code that hadn’t been touched in years.

> They spent the entire fixit week on it. And then the entire week after fixit trying to finish it. What started as a bug fix turned into a mini project. The work was valuable! But they missed the whole point of a fixit. No closing bugs throughout the week. No momentum. No dopamine hits from shipping fixes. Just one long slog.

> That’s why we have the 2-day hard limit now. If something is ballooning, cut your losses. File a proper bug, move it to the backlog, pick something else. The limit isn’t about the work being worthless - it’s about keeping fixit feeling like fixit.

oldestofsports today at 8:16 PM

> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

I understood it as the whole point of the 2 day hard limit - you start working on a bug that turns out to be bigger than expected, so you write down your findings and move on to the next one.

PaulKeeble today at 3:56 AM

Sometimes you find the cause of the bug in 5 minutes because it's precisely where you thought it was; sometimes it's not there and you end up writing some extra logging to hopefully expose its cause in production after the next release, because you can't reproduce it as it's transient. I don't know how to predict how long a bug will take to reproduce and track down, and only once it's understood do we know how long it takes to fix.

khannn today at 11:19 AM

I had a job that required estimation on bug tickets. It's honestly amazing how they didn't realize that I'd take my actual estimate, then multiply it by 4, then use the extra time to work on my other bug tickets that the 4x multiplier wasn't good enough for.

brightball today at 1:10 PM

In my experience, the vast majority of bugs are quick fixes that are easy to isolate or potentially even have a stack trace associated with them.

There will always be those “only happens on the 3rd Tuesday every 6 months” issues that are more complicated but…if you can get all the small stuff out of the way it’s much easier to dedicate some time to the more complicated ones.

Maximizing the value of time is the real key to focusing on quicker fixes. If nobody can make a case for why one is more important than another, then the best use of your time is the fastest fix.
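
As a rough illustration of that tie-breaking rule (not anyone's actual tooling): order the backlog by priority first, and among equally important bugs take the smallest estimate. The `Bug` fields, the 1-is-most-important scale, and the sample backlog are all made up for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    title: str
    priority: int          # hypothetical scale: 1 = most important
    estimate_hours: float  # rough guess, not a commitment


def triage(bugs):
    """Order by priority first; break ties with the smallest estimate."""
    return sorted(bugs, key=lambda b: (b.priority, b.estimate_hours))


# Made-up backlog: three bugs nobody can rank against each other, one urgent.
backlog = [
    Bug("Misaligned footer", priority=3, estimate_hours=0.5),
    Bug("Intermittent timeout", priority=3, estimate_hours=16),
    Bug("Typo in settings page", priority=3, estimate_hours=0.25),
    Bug("Checkout crash", priority=1, estimate_hours=6),
]

for bug in triage(backlog):
    print(bug.priority, bug.estimate_hours, bug.title)
```

The urgent bug still goes first regardless of its estimate; the estimate only decides the order among bugs that share a priority.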

sshine today at 9:33 AM

> unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one

Race conditions in 3rd party services during / affected by very long builds and with poor metrics and almost no documentation. They only show up sometimes, and you have to wait for it to reoccur. Add to this a domain you’re not familiar with, and your ability to debug needs to be established first.

Stack two or three of these on top of each other and you have days of figuring out what’s going on, mostly waiting for builds, speculating how to improve debug output.

After resolving, don’t write any integration tests that might catch regressions, because you already spent enough time fixing it, and this needs to get replaced soon anyway (timeline: unknown).

ZaoLahma today at 10:25 AM

> That said, unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one.

The longer I work as a software engineer, the rarer it is that I get to work with bugs that take only a day to fix.

chii today at 3:44 AM

I find most bugs take less time to fix than it takes time to verify and reproduce.

beberlei today at 10:29 AM

It's odd at first, but it springs from economic principles, mainly the sunk cost fallacy.

If you invest 2 days of work and do not find the root cause of a bug, then you have the human desire to keep investing more work, because you have already invested so much. At that point, however, it's best to re-evaluate and do something different instead, because it might have a bigger impact.

After 2 days of not finding the problem, the likelihood that you won't find it in another 2 days either is higher than if you start over with another bug, where on average you find the problem earlier.
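
A back-of-the-envelope sketch of that last claim, under invented assumptions: 80% of bugs are "easy" (time to find the cause is exponential with a mean of half a day) and 20% are "hard" (mean of 10 days). The shares, the means, and the function names are all illustrative, not measured data.

```python
import math

# Assumed (made-up) mix of bugs: most are easy, a few are hard.
EASY_SHARE, EASY_MEAN_DAYS = 0.8, 0.5
HARD_SHARE, HARD_MEAN_DAYS = 0.2, 10.0


def still_unsolved(days):
    """Probability that a bug is still unsolved after `days` of work."""
    return (EASY_SHARE * math.exp(-days / EASY_MEAN_DAYS)
            + HARD_SHARE * math.exp(-days / HARD_MEAN_DAYS))


def solved_within(extra_days, already_spent=0.0):
    """P(solved in the next `extra_days` | unsolved after `already_spent`)."""
    return 1.0 - (still_unsolved(already_spent + extra_days)
                  / still_unsolved(already_spent))


fresh = solved_within(2.0)                     # pick up a new bug
stuck = solved_within(2.0, already_spent=2.0)  # keep going after 2 days
print(f"fresh bug: {fresh:.2f}, stuck bug: {stuck:.2f}")
```

With these numbers a fresh bug is solved within 2 days about 82% of the time, while a bug that has already resisted 2 days is solved in the next 2 only about 25% of the time, because surviving 2 days is strong evidence that it is one of the hard ones. If every bug had the same exponential solve time, the two probabilities would be equal, so the argument depends on the backlog being a mix of easy and hard bugs.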

pjc50 today at 10:00 AM

I think the worst case I encountered was something like two years from first customer report to even fully confirming the bug, followed by about a month of increasingly detailed investigations, a robot, and an oscilloscope.

The initial description? "Touchscreen sometimes misses button presses".

claw-el today at 6:14 PM

> Also, I tend to attack bugs by priority/severity, as opposed to difficulty.

This is one part that is rarely properly implemented. We have our bug bash days too, but I noticed after the fact that maybe 1/3 of the bugs we solved were on a feature we are thinking of deprecating soon due to low usage.

How can we attack bugs better by priority?

peepee1982 today at 2:55 PM

Yep. Also, sometimes you figure out a bug and in the process you find a whole bunch of new ones that the first bug just never let surface.

JJMcJ today at 3:59 PM

It's like remodeling. The drywall comes down. Do you just put up a new sheet or do you need to reframe one wall of the house?

thfuran today at 4:07 PM

> I can’t imagine spending more than a day on one.

You mean starting after it has been properly tracked down? It can often take a whole lot of time to go from "this behavior is incorrect sometimes" to "and here's what needs to change".

michaelbuckbee today at 5:22 PM

Something I often find are "categorical" bugs where it's really 3 or 4 different bugs in a trench coat all presenting as a single issue.

huherto today at 2:33 PM

I do agree that you should be able to fix most bugs in 2 days or less. If you have many bugs taking longer to fix, it may be an indication that you have systemic issues (e.g., design, architecture, tooling, environment access, test infrastructure, etc.).

AbstractH24 today at 2:52 PM

> Is odd. It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

Learning how to better estimate how long tasks take is one of my biggest goals, and one I've yet to even figure out how to master.

Uehreka today at 4:37 AM

> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

In my experience there are two types of low-priority bugs (high-priority bugs just have to be fixed immediately no matter how easy or hard they are).

1. The kind where I facepalm and go “yup, I know exactly what that is”, though sometimes it’s too low of a priority to do it right now, and it ends up sitting on the backlog forever. This is the kind of bug the author wants to sweep for; they can often be wiped out in big batches by temporarily making bug-hunting the priority every once in a while.

2. The kind where I go “Hmm, that’s weird, that really shouldn’t happen.” These can be easy and turn into a facepalm after an hour of searching, or they can turn out to be brain-broiling heisenbugs that eat up tons of time, and it’s difficult to figure out which. If you wipe out a ton of category 1 bugs then trying to sift through this category for easy wins can be a good use of time.

And yeah, sometimes a category 1 bug turns out to be category 2, but that’s pretty unusual. This is definitely an area where the perfect is the enemy of the good, and I find this mental model to be pretty good.

dockd today at 4:35 PM

How is this for a rule of thumb: the time it takes to fix a bug is directly related to the age of the software.

lapcat today at 3:49 AM

> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

This is explained later in the post. The 2 day hard limit is applied not to the estimate but rather to the actual work: "If something is ballooning, cut your losses. File a proper bug, move it to the backlog, pick something else."

mobeigi today at 11:08 AM

I believe the idea is to pick small items that you'd likely be able to solve quickly. You don't know for sure but you can usually take a good guess at which tasks are quick.

yxhuvud today at 1:44 PM

I've seen people spending 4 months on a hard to replicate segfault.

ahoka today at 10:23 AM

Not sure why you would ever need to refactor to fix a bug?

jorvi today at 3:16 PM

Yeah, "no bug should take over 2 days" tells me you've never had a race condition in your codebase.

mat0 today at 7:45 AM

you cannot know. that’s why the post elaborates saying (paraphrasing) “if you realize it’s taking longer, cut your losses and move on to something else”

w0m today at 1:54 PM

> That said, unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one.

oh sweet sweet summer child...

j45 today at 5:12 AM

Bugs taking less than 2 days are great to have as a target but will not be something that can be guaranteed.

triyambakam today at 3:50 AM

> It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.

Now I find that odd.
