Hacker News

yowlingcat, today at 7:06 PM

One of the most challenging kinds of thought to work through with my engineers in professional communication is nuance. For example, they may say something like this, but actually mean "For a particular situation, this is wrong."

The context in which a decision is evaluated is particularly important for "rules of thumb" like this. There's the rule of 3 (which many senior engineers imparted to me earlier on in my career) - don't refactor until you've actually duplicated it thrice - but even so, what they speak of is a catch-22 that's pretty important to reason about carefully.

On one hand, if you overcorrected on the fear of abstraction, you could easily end up with 500 duplicates that are slightly different and need to be maintained 500 different ways, slowly causing slightly wrong behavior some of the time, data corruption, combinatoric explosion. Surely, once there is such a situation, some degree of abstraction is the only right decision.

On the other hand, if you overcorrected on the fear of duplication early on, you could easily end up with a premature optimization and complexity -- complexity which, most importantly, could be rooted in a gap of understanding of how the code will be used and what direction it may go in over time (often based on which direction the business will go over time).

The only answer that actually works, of course, is "somewhere in the middle." Obviously, that's pretty vague and not very useful. Where, exactly, in the middle IS the right place?

As the years have gone by, I've become more and more steadfast in my belief that the answer to that question is and must be an art and not a science. Of course, it must always be rooted in practicality: the actual context of the code around it, where the code/business was in the past, and where it will be in the future.

But just as importantly, some of it must be based around beliefs in the face of imperfect information about what you want to invest in for the sake of the technology, the team that develops it, and the business that relies on it. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on normalizing your data modeling, because the way you like to run your business requires that normal form to do the analytics and make decisions productively. It could be that for your team, your values make it make sense to go a little bit further than "good enough" on splitting service boundaries and ensuring clean queues and message passing infrastructure, because you have seasonal spikes where you need to scale up to a ton of load and then scale down after without constantly doing a song and dance or pre-provisioning fragile infrastructure.

But the most common thread there is - art, not a science. Every single decision depends on YOUR team, YOUR business, YOUR needs - and like any art, there is no universal rule or discovery or best practice in the industry that will magically work for your needs without working through the details of whether it appropriately fits your situation or not.

So with that said - I can't really agree with you. At any place I've ever worked with a competent team, maintaining duplicate code is just not that hard, and dealing with it follows the same process every time. Build a robust test suite that encodes the actual differences and the shared structure. Pull out the pieces that have a good reason to be abstracted and redesign the pieces that encode the true differential structure in a way that is intuitive. Lather, rinse, repeat. It's always straightforward because it's known - by the time you are doing this process, you've had tons of repetitions and data on what is driving you to develop the abstraction, so when you make the decision, you are making it empirically.
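To make that process concrete, here's a rough sketch in Python. Everything in it is made up for illustration (the function names `total_us`, `total_eu`, `total`, the tax rules, the amounts in cents); it's just one way the "pin the behavior with tests, then pull out the shared part" loop could look, not a prescription.

```python
# Hypothetical duplicates: identical structure, one real difference (the tax rule).
# Amounts are integer cents so the comparisons below are exact.
def total_us(items):
    subtotal = sum(price * qty for price, qty in items)
    return subtotal + subtotal * 7 // 100


def total_eu(items):
    subtotal = sum(price * qty for price, qty in items)
    tax = subtotal * 20 // 100 if subtotal > 1000 else 0
    return subtotal + tax


# Step 1: inputs that pin down what the duplicates do today,
# including the edge where the two rules diverge (the 1000-cent threshold).
CASES = [
    [],
    [(500, 1)],
    [(500, 2)],
    [(500, 3)],
    [(1999, 2), (50, 4)],
]


# Step 2: the shared structure becomes one function; the true difference
# becomes an explicit, named parameter instead of a copy-paste variation.
def us_tax(subtotal):
    return subtotal * 7 // 100


def eu_tax(subtotal):
    return subtotal * 20 // 100 if subtotal > 1000 else 0


def total(items, tax_rule):
    subtotal = sum(price * qty for price, qty in items)
    return subtotal + tax_rule(subtotal)


# Step 3: the refactor is only accepted if it reproduces the old behavior
# on every pinned case.
def check_equivalence():
    for items in CASES:
        assert total(items, us_tax) == total_us(items)
        assert total(items, eu_tax) == total_eu(items)
    return True


if __name__ == "__main__":
    print(check_equivalence())
```

The point of the sketch is the ordering: the duplicates exist first, the cases record how they actually differ, and only then does the abstraction get its shape from that evidence.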

Conversely, I have seen many otherwise competent teams slowed to a halt by premature abstraction. Frameworks that were well-intended and reduced duplication, but encoded coupling between components that, at a certain point in the business's progression, fought with reality rather than helped, all because they were frozen into place before anyone had really clear empirical data about whether the abstraction would be worth it long term. Well-intended "clean code" refactors that were meant to solve the old "bad duplication" but instead created a far more difficult to reason about "abstracted base" of code that didn't really solve any of the domain modeling problems and was just as difficult as before (if not more so) to maintain without introducing buggy behaviors.

The biggest problem is that premature abstraction is sexy and fun. There are incentives and dopamine hits from doing it even when it isn't needed. But fixing legacy duplication is not fun. And so when it gets done, it tends to get done in a pragmatic way to relieve pain rather than to elicit pleasure. That, I believe, is one of the biggest confounding sociological aspects of this whole discussion.