Ex-Meta employee here. I worked at Reality Labs; perhaps the situation is different in other orgs.
At Meta we did "fix-it weeks", more or less every quarter. At the beginning I was thrilled: leadership that actually cares about fixing bugs!
Then reality hit: it's the worst possible decision for code and software quality. Basically this turned into: you are allowed to land all the possible crap you want. Then you have one week to "fix all the bugs". Guess what: most of the time we couldn't even fix a single bug because we were drowning in tech debt.
> Then the week before fixit, each subteam goes through these bugs and sizes them:
I advocate to never size/score bugs. Instead, if your process demands scores, call everything a 2 because over the course of all the bugs, that will be your average. You'll knock out 10 small ones and then get stuck on a big one. Bug-fixing efforts should be more Kanban than Scrum. Prioritize the most important/damaging/whatever ones, do them in order, and keep doing them until they are done or you run out of time.
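A rough sketch of that "priority order until the time runs out" loop, in Python. The `Bug` fields, the priority scale, and the `actual_days` figure (which you only learn after the fix) are made up for illustration; this is a simulation of the idea, not anyone's real process:

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    priority: int       # hypothetical scale: lower number = more important
    actual_days: float  # only known in hindsight; used here to simulate the week


def work_queue(bugs, budget_days):
    """Work through bugs strictly in priority order until the time budget runs out.

    No up-front sizing is used to pick the order: only priority matters.
    Returns the titles of the bugs fixed and the days left over.
    """
    fixed = []
    remaining = budget_days
    for bug in sorted(bugs, key=lambda b: b.priority):
        if bug.actual_days > remaining:
            # Out of time on the current most important bug: stop here
            # instead of skipping ahead to an easier, less important one.
            break
        remaining -= bug.actual_days
        fixed.append(bug.title)
    return fixed, remaining
```

The point of the design is that the only input is the ordering; how long each bug takes is something you find out as you go.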
Surprising that no bug should take more than 2 days, yet most developers fixed only 4 bugs in 5 days.
About stopping and fixing problems, has anybody had this kind of experience?
1. Working on Feature A, stopped by management or by the customer because we need Feature B as soon as possible.
2. Working on Feature B, stopped because there is Emergency C in production due to something that you warned the customer about months ago but there was no time to stop, analyze and fix.
3. Deployed a workaround and created issue D to fix it properly.
4. Postponed issue D because the workaround is deemed to be enough, resumed Feature B.
5. Stopped Feature B again because either Emergency E or new higher priority Feature F. At this point you can't remember what that original Feature A was about and you get a feeling that you're about to forget Feature B too.
6. Working on whatever the new thing is, you are interrupted by Emergency G that happened because that workaround at step 3 was only a workaround, as you correctly assessed, but again, no time to implement the proper fix D so you hack a new workaround.
Maybe add another couple of iterations, but by this point every party is angry with, or at least unhappy with, every other party.
You have a feeling that the work of the last two or three months on every single feature has been wasted because you could not deliver any one of them. That means that the customer wasted the money they paid you. Their problem, but it can't be good for their business so your problem too.
The current state of the production system is "buggy and full of workarounds" and it's going to get worse. So you think that the customer would have been wiser to pause and fix all the nastier bugs before starting Feature A. We could have had a system running smoothly, no emergencies, and everybody happier. But no, so one starts thinking that maybe the best course of action is changing company or customer.
This is weird to me...
The way I learned the trade, and usually worked, is that bug fixing always comes first!
You don't work on new features until the old ones work as they should.
This worked well for the teams I was on. Having an (AFAYK) bug-free code base is incredibly useful!!
I’m a strong believer in “fix bugs first” - especially in the modern age of “always be deploying” web apps.
(I run a small SaaS product - a micro-SaaS as some call it.)
We’ll stop work on a new feature to fix a newly reported bug, even if it is a minor problem affecting just one person.
Once you have been following a “fix bugs first” approach for a while, the newly discovered bugs tend to be few, and straightforward to reproduce and fix.
This is not necessarily the best approach from a business perspective.
But from the perspective of being proud of what we do, of making high quality software, and treating our customers well, it is a great approach.
Oh, and customers love it when the bug they reported is fixed within hours or days.
An ex-employer of mine had a regular cycle:
1. Build features at all costs
2. Eventually a high profile client has a major issue during an event, costing them a ton of goodwill
3. Leadership pauses everything and the company only works on bugfixes and tech debt for a week or two
I onboarded during step 3. I should have taken that as a warning that that's how the company operated. If your company doesn't make time for bugfixes and getting out of its own way, that culture is hard to change.
In my experience, having a fixit week on the calendar encourages teams to just defer what otherwise could be done relatively easily at first report ("ah, we'll get to it in fixit week"). Sometimes it's a PM justifying putting their feature ahead of product quality, other times it's because a dev thinks they're lining up work for an anticipated new hire's onboarding. It's even hinted at in the article ('All year round, we encourage everyone to tag bugs as “good fixit candidates” as they encounter them.')
My preferred approach is to explicitly plan in 'keep the lights on' capacity into the quarter/sprint/etc in much the same way that oncall/incident handling is budgeted for. With the right guidelines, it gives the air cover for an engineer to justify spending the time to fix it right away and builds a culture of constantly making small tweaks.
That said, I totally resonate with the culture aspect - I think I'd just expand the scope of the week-long event to include enhancements and POCs like a quasi hackathon
In the early days of Hacker News, and maybe even before Hacker News when Reddit didn't have subreddits... OG blogger Joel Spolsky posited the "Joel Test," twelve simple yes/no questions that defined a certain reasonable-by-today's-standards local optimum for shipping software:
https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-s...
Some seem ridiculously obvious today, but weren't standard 25 years ago. Seriously! At the turn of the century, not everyone used a bug database or ticket tracker. Lots of places had complicated builds to production, with error-prone manual steps.
But question five is still relevant today: Do you fix bugs before writing new code?
I like Linear's approach to bugs: https://linear.app/now/zero-bugs-policy
We do this too sometimes and I love it. When I work on my own projects I always stop and refactor/fix problems before adding any new features. I wish companies would see the value in doing this
Also love the humble brag. "I've just closed my 12th bug" and later "12 was maximum number of bugs closed by one person"
It's fairly telling of the state of the software industry that the exotic craft of 'fixing bugs' is apparently worth a LinkedIn-style self-promotional blog post.
I don't mean to be too harsh on the author. They mean well. But I am saddened by the wider context, where a dev posts 'we fix bugs occasionally' and everyone is thrilled, because the idea of ensuring software continues to work well over time is now as alien to software dev as the idea of fair dealing is to used car salesmen.
We did this ages ago at our company (back then we were making silly Facebook games, remember those?)
It was by far the most fun, productive, and fulfilling week.
It went on to shape the course of our development strategy when I started my own company. Regularly work on tech debt and actively applaud it when others do it too.
There are really two kinds of "small bugs".
1) Things that have existed in your product for decades and haven't been major strategic issues.
2) Things that arose recently in the wake of launches. This can be because it's hard to fix every corner case, or because of individuals throwing sloppy code over the wall to look like they "ship fast".
I try to hold the team to fix bugs (2) quickly while their memory is fresh as it points to unwanted regressions.
The bugs in (1) are more interesting. It's a bit sad that teams kinda have to "sneak that work in" with fixit weeks. I have known of products large enough to A/B test the effects of a quarter's worth of "small fixes" and find significant gains in key product metrics. That changed management's attitude with respect to "small fixes" - when you have a ton of them, they can produce meaningful impact worthy of strategic consideration, not just a week of giving the dev team free rein to scratch their itch.
Author here! Really glad to have sparked a lively discussion in the comments. Since there are so many new threads since I last looked at this post, I'm making one top-level comment to provide some thoughts:
1) I agree that estimating a bug's complexity upfront is an error prone process. This is exactly why I say in the post that we encourage everyone to "feel out" non trivial issues and if it feels like the scope is expanding too much (after a few hours of investigation), to just pick something else after writing up their findings on the bug.
2) I use the word "bug" to refer to more traditional bugs ("X is wrong in product") but also feature requests ("I wish X feature worked differently"). This is just a companyism that maybe I should have called out in the post!
3) There's definitely a risk the fixit week turns into just "let's wait to fix bugs until that week". This is why our fixits are especially for small bugs which won't be fixed otherwise - it's not a replacement for technical hygiene (i.e. refactoring code, removing dead code, improving abstractions) nor a replacement for fixing big/important issues in a timely manner.
I've never understood why bugs get treated differently from new features. If there was a bug, the old feature was never completed. The time cost and benefits should be considered equally.
A company I worked at also did this, though there were no limits. Some folks would choose to spend the whole week working on a larger refactor. For example, I unified all of our Redis usage onto a single modern library, replacing the mess of 3 libraries of various ages across our codebase. This was relatively easy, but tedious, and required some new tests/etc.
Overall, I think this kind of thing is very positive for the health of building software, and morale to show that it is a priority to actually address these things.
From the report, it sounds like a good thing, for the product and the team morale.
Strangely the math looks such that they could hire nearly 1 FTE engineer that works full time only on "little issues" (40 weeks, given that people have vacations and public holidays and sick time that's a full year's work at 100%), and then the small issues could be addressed immediately, modulo the good vibes created by dedicating the whole group to one cause for one week. Of course nobody would approve that role...
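Back-of-the-envelope version of that math in Python. The 40 engineer-weeks figure is the one from the comment above; the vacation, holiday, and sick-time numbers are placeholder assumptions, so plug in your own:

```python
def working_weeks_per_year(vacation_weeks, holiday_weeks, sick_weeks, weeks_in_year=52):
    """Weeks one engineer actually works in a year after time off."""
    return weeks_in_year - vacation_weeks - holiday_weeks - sick_weeks


def fte_equivalent(fixit_engineer_weeks, weeks_per_fte):
    """How many full-time engineers the yearly fixit effort adds up to."""
    return fixit_engineer_weeks / weeks_per_fte


# 40 engineer-weeks per year of fixit effort: figure taken from the comment.
FIXIT_ENGINEER_WEEKS = 40

# Assumed time off (hypothetical): 5 weeks vacation, 2 weeks of public
# holidays, 1 week sick, leaving 44 working weeks.
weeks = working_weeks_per_year(vacation_weeks=5, holiday_weeks=2, sick_weeks=1)

print(round(fte_equivalent(FIXIT_ENGINEER_WEEKS, weeks), 2))  # prints 0.91
```

With those assumptions it comes out a bit under one full-time engineer, which is where "nearly 1 FTE" comes from.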
I firmly believe that this sort of fixit week is as much of an anti-pattern as all-features-all-the-time. Ensuring engineers have the agency and the space to fix things and refactor as part of the normal process pays serious dividends in the long run.
eg: My last company's system was layer after layer built on top of the semi-technical founder's MVP. The total focus on features meant engineers worked solo most of the time and gave them few opportunities to coordinate and standardize. The result was a mess. Logic smeared across every layer, modules or microservices with overlapping responsibilities writing to the same tables and columns. Mass logging all at the error or info level. It was difficult to understand, harder to trace, and nearly every new feature started off with "well first we need to get out of this corner we find ourselves painted into".
When I compare that experience with some other environments I've been in where engineering had more autonomy at the day-to-day level, it's clear to me that this company should have been able to move at least as quickly with half the engineers if they were given the space to coordinate ahead of a new feature and occasionally take the time to refactor things that got spaghettified over time.
Microsoft should do this for an entire year. Windows 11 is still a bug-ridden heap of trash.
Teams that implement this, or a similar, exercise: how do you handle PR reviews for fixits, if at all? I'd like to implement it, but on a smaller team (8 devs, 3 of whom approve PRs) the volume would be so high that the 3 senior devs would likely spend all their time reviewing.
It's concerning that no one was able to fix these bugs as an "aside".
With an average of 4 bugs fixed in 5 days and 150 bugs, we can assume 50 bugs with less than one day's effort were just lying around with no one daring to touch them.
We did a bug hunt once at a previous employer: just stop regular work, open the website, and look for issues. We found over a hundred in a day. Stopping your regular work and actively working with your product is a healthy practice. Facebook did (does?) do a thing where once a week they'd throttle the internet so everyone had to experience what things are like for their average users.
I do wonder, though, what would happen if some of those Google products were not part of Google but independent companies?
granted, I feel like fixing bugs should be pre-allocated to the road map adequately, vs. spending 1 giant cycle every 10 years catching up on bug fixes a la snow leopard (cough cough apple)
I've been pushing for things like this for years...
Having every 3rd or 4th sprint being dev initiatives and bugs... Or having a long/short sprint cycle where short sprints are for bugs mostly... Basically every 3rd week is for meetings and bug work so you get a solid 2 weeks with reduced meetings.
It's hard to convince upper managers of the utility though.
I just had a majorly fun time addressing tech debt, deleting about 15k lines-of-code from a codebase that now has ~45k lines of implementation, and 50k lines of tests. This was made possible by moving from a homegrown auth system to Clerk, as well as consolidating some Cloudflare workers, and other basic stuff. Not as fun as creating the tech debt in the first place, but much more satisfying. Open source repo if you like to read this sort of thing: https://github.com/VibesDIY/vibes.diy/pull/582
I introduced this to my old company years ago and called it Big Block of Cheese Day after the West Wing episode [1]. We mostly focused on very minor bugs that affected a tiny bit of our user base in edgey edge cases but littered our error logs. (This was years ago at a, back then, relatively immature tech company.)
It had the same spirit as a hackathon.
[1] https://westwing.fandom.com/wiki/Big_Block_of_Cheese_Day
> closed a feature request from 2021!
> It’s a classic fixit issue: a small improvement that never bubbled to the priority list. It took me one day to implement. One day for something that sat there for four years
> The benefits of fixits
> For the product: craftsmanship and care
sorry, but this is not care when the priority system is so broken that it requires a full suspension, but only once a quarter
> A hallmark of any good product is attention to detail:
That's precisely the issue, taking 4 years to bring attention to detail, and only outside the main priority system.
Now, don't get me wrong, a fixit is better than nothing and better than having 4 year bugs turn into 40 year ones, it's just that this is not a testament to craftsmanship/care/attention to detail
I wanted to take a look at some of these bug fixes, and one of the linked ones [1] seems more like a feature to me. So maybe it should be the week of "low priority" issues, or something like that.
I don't mean to sound negative, I think it's a great idea. I do something like this at home from time to time. Just spend a day repairing and fixing things. Everything that has accumulated.
We’ve done little mini competitions like this at my company, and it’s always great for morale. Celebrating tiny wins in a light, semi-competitive way goes a long way for elevating camaraderie. Love it!
I like the idea of this, but why not just have some time per week/sprint for bugs? At my company we prioritise features, but we also take some bug tickets every sprint (sometimes loads of bug tickets if there aren't many new features ready for dev), and generally one engineer is on "prod support" which means tackling bugs as they get reported
We once did this for a massive product with 3 releases per year: took a whole cycle to do zero features, and just fix bugs. Internal customers, who usually fell over themselves to get their latest feature into the program, accepted it. But we had to announce it early. Otherwise the usual consensus is that customers would rather take 1 feature together with 10 new bugs, than -5 bugs and no new features.
Confused about the meaning of "bug" used in this article. It seems to be more about feature requests, nice-to-haves and polish rather than actual errors in edge cases.
Also explains the casual mention of "estimation" on fixes. A real bug fix is even harder to estimate than already brittle feature estimates.
Well done; otherwise technical debt would have stopped you
We had a quarter, where each Monday we spent most of the day fixing bugs. It greatly improved the product.
It is good to fix bugs, but in my team we need neither the “points system” for bugs nor the leaderboard showing how many points people have. We are against quantifying.
> We also have a “points system” for bugs and a leaderboard showing how many points people have. [...] It’s a simple structure, but it works surprisingly well.
What good and bad experiences have people had with software development metrics leaderboards?
Getting ready to do a December “Bug Smash” based on the model in the book Shape Up. Whole team has been eagerly awaiting it for months.
One nice thing if you work on the B2B software side - end of year is generally slow in terms of new deals. Definitely a good idea to schedule bug bashes, refactors, and general tech debt payments with greater buy in from the business
Focused bug-fixing weeks like this really help improve product quality and team morale. It’s impressive to see the impact when everyone pitches in on these smaller but important issues that often get overlooked.
Systemd should do this too
I'm a bit torn on Fix-it weeks. They are nice but many bugs simply aren't worth fixing. Generally, if they were worth fixing - they would have been fixed.
I do appreciate though that certain people, often very good detail oriented engineers, find large backlogs incredibly frustrating so I support fix-it weeks even if there isn't clear business ROI.
So normally they don't fix bugs before adding feature bloat?
I did this for my entire employment at a company I worked at. Or rather, I should say I made it a point to ignore the roadmap and do what was right for the company by optimizing for value for customers and the team.
Fixit weeks are a band-aid, and we also tried them. The real fix is being a good boss and trusting your coworkers to do their jobs.
How did you not get fired?
hello b/Googler :)
So much of the tech debt work scheduling feels like a coordination or cover problem. We’re overdue for a federal “Tech Debt Week” holiday once a year, to just save people all the hand-wringing over how, when, or how much. If big tech brands can keep affording to celebrate April Fools' jokes, they can afford to celebrate this.
I love the idea, but this line:
> 1) no bug should take over 2 days
Is odd. It’s virtually impossible for me to estimate how long it will take to fix a bug, until the job is done.
That said, unless fixing a bug requires a significant refactor/rewrite, I can’t imagine spending more than a day on one.
Also, I tend to attack bugs by priority/severity, as opposed to difficulty.
Some of the most serious bugs are often quite easy to find.
Once I find the cause of a bug, the fix is usually just around the corner.