Hacker News

keyle · today at 6:29 AM · 15 replies

I was enjoying what I was reading until the "Limit function length" part which made me jolt out of my chair.

This is a common misconception.

   Limit function length: Keep functions concise, ideally under 70 lines. Shorter functions are easier to understand, test, and debug. They promote single responsibility, where each function does one thing well, leading to a more modular and maintainable codebase.
Say, you have a process that is single threaded and does a lot of stuff that has to happen step by step.

New dev comes in and starts splitting everything it does into 12 functions, because _a function should do one thing!_ Even better, they start putting stuff in various files because the files are getting too long.

Now you have 12 functions, scattered over multiple packages, and the order of things is all confused, you have to debug through to see where it goes. They're used exactly once, and they're only used as part of a long process. You've just increased the cognitive load of dealing with your product by a factor of 12. It's downright malignant.

Code should be split so that state is isolated, and business processes (intellectual property) are also self-contained and testable. But don't buy into this "70 lines" rule. It makes no sense. 70 lines of Python isn't the same as 70 lines of C, for starters. If code is sequential, always running in that order, and reads like a long script, that's because it is!

Focus on separating pure code from stateful code, that's the key to large maintainable software! And choose composability over inheritance. These things weren't clear to me the first 10 years, but after 30 years, I've made those conclusions. I hope other old-timers can chime in on this.

The length of functions in terms of line count has absolutely nothing to do with "a more modular and maintainable codebase", as explained in the manifesto.

Just like "I committed 3,000 lines of code yesterday" has nothing to do with productivity. And a red car doesn't go faster.
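To make the pure-vs-stateful split concrete, here's a minimal Python sketch (all names hypothetical): the pure core takes values and returns values, and the one side effect is pushed out to a thin shell at the edge.

```python
# Pure core: same inputs always give the same output, no side effects.
def apply_discount(price: float, rate: float) -> float:
    """Compute the discounted price, rounded to cents."""
    return round(price * (1 - rate), 2)

# Stand-in for the stateful edge (a real DB, file, network call...).
class FakeDb:
    def __init__(self):
        self.rows = {}

    def save(self, key, value):
        self.rows[key] = value

# Stateful shell: delegates the computation to the pure core,
# then performs the single side effect.
def process_order(order: dict, db: FakeDb) -> None:
    total = apply_discount(order["price"], order["discount"])
    db.save(order["id"], total)
```

The pure core gets exhaustive unit tests with no mocking at all; the thin shell needs only a couple of integration checks.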


Replies

stouset · today at 6:43 AM

“Ideally under 70 lines” is not “always under 70 lines under pain of death”.

It’s a guideline. There are exceptions. Most randomly-selected 100-line functions in the wild would probably benefit from being four 25-line functions. But many wouldn’t. Maturity is knowing when the guideline doesn’t apply. But if you find yourself constantly writing a lot of long functions, it’s a good signal something is off.

Sure, language matters. Domain matters too. Pick a number other than 70 if you're using a verbose language like golang. Pick a lower number if you're using something more concise.

People need to stop freaking out over reasonable, well-intentioned guidelines as if they’re inviolable rules. 150 is way too many for almost all functions in mainstream languages. 20 would need to be violated way too often to be a useful rule of thumb.

rhubarbtree · today at 7:12 AM

Just chiming in here to say, absolutely you should keep functions small and doing one thing. Any junior reading this should go and read the pragmatic programmer.

Of course a function can be refactored in a wrongheaded way as you’ve suggested, but that’s true of any coding - there is taste.

The aim of refactoring such a function as you describe would be to make it more readable, not less. The whole point of modules is so you don't have to hold in your head the detail they contain.

Long functions are in general a very bad idea. They don’t fit on a single screen, so to understand them you end up scrolling up and down. It’s hard to follow the state, because more things happen and there is more state as the function needs more parameters and intermediate variables. They’re far more likely to lead to complecting (see Rich Hickey) and intertwining different processes. Most importantly, for an inexperienced dev it increases the chance of a big ball of mud, eg a huge switch statement with inline code rather than a series of higher level abstractions that can be considered in isolation.

I don’t think years worked is an indicator of anything, but I’ve been coding for nearly 40 years FWIW.

zem · today at 7:20 AM

> Say, you have a process that is single threaded and does a lot of stuff that has to happen step by step.

> New dev comes in; and starts splitting everything it does in 12 functions, because, _a function, should do one thing_

I would almost certainly split it up, not because "a function should only do one thing" but because invariably you get a run of several steps that can be chunked into one logical operation, and replacing those steps with the descriptive name reduces the cognitive load of reading and maintaining the original function.
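As a hypothetical Python sketch of that chunking: three related clean-up steps get one descriptive name, and the caller reads as a sequence of named operations instead of raw statements (all names here are illustrative, not from the original post).

```python
def normalize_names(records: list) -> list:
    """One logical operation chunking three related clean-up steps."""
    out = []
    for r in records:
        name = r["name"].strip()                 # step 1: trim edges
        name = " ".join(name.split())            # step 2: collapse inner runs
        out.append({**r, "name": name.title()})  # step 3: canonical casing
    return out

def import_records(records: list) -> list:
    # The original function now reads as named steps, not raw statements.
    return normalize_names(records)
```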

vanschelven · today at 8:25 AM

And here's John Carmack on the subject of 1000s of lines of code in a single function: http://number-none.com/blow/blog/programming/2014/09/26/carm...

igogq425 · today at 9:24 AM

This is a balancing act between conflicting requirements. It is understandable that you don't want to jump back and forth between countless small subfunctions in order to meticulously trace a computation. But conceptually, the overall process still breaks down into subprocesses. Wouldn't it make sense to move these sub-processes into separate functions and name them accordingly? I have a colleague who has produced code blocks that are 6000 lines long. It is then almost impossible to get a quick overview of what the code actually does. So why not increase high-level readability by making the conceptual structure visible in this way?

A ReverseList function, for example, is useful not only because it can be used in many different places, but also because the same code would be more disruptive than helpful for understanding the overall process if it were inline. Of course, I understand that code does not always break down into such neat semantic building blocks.
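For instance, a trivial sketch of that kind of named building block:

```python
def reverse_list(items: list) -> list:
    """One named step; the equivalent inline loop or slice at every
    call site would distract from the surrounding process."""
    return items[::-1]
```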

> Focus on separating pure code from stateful code, that's the key to large maintainable software! And choose composability over inheritance.

100%!

frje1400 · today at 7:44 AM

I think that you are describing an ideal scenario that does not reflect what I see in reality. In the "enterprise applications" that I work on, long functions evolve poorly. Meaning, even if a long function follows the ideal of "single thread, step by step" when it's first written, when devs add new code, they will typically add their next 5 lines to the same function because it's already there. Then after 5 years you have a monster.

CraigJPerry · today at 8:11 AM

What would be a good example of the kinds of things a 100 line function would be doing?

I don't see that in my world, so I'm naively trying to imagine inlining functions in codebases I'm familiar with, and not really valuing the result I can dream up.

For one, my tests would be quite annoying, large and with too much setup for my taste. But I don't think I'd like having to scroll through a function, especially if I had to make changes to the start and end of the function in one commit.

I'm curious about the kinds of "long script" flavoured procedures: what are they typically doing?

I ask because some of the other stuff you mentioned I really strongly agree with, like "Focus on separating pure code from stateful code". This is such an undervalued concept, and it's an absolute game changer for building robust software. Can I extract a pure function for this, and separately have a function to coordinate side effects? But that's incompatible with overly long functions; those side-effectful functions would be so hard to test.

alfons_foobar · today at 6:52 AM

Agree that "splitting for splitting's sake" (only to stay below an arbitrary line count) does indeed not make sense.

On the other hand, I often see functions like you describe - something has to be executed step-by-step (and the functionality is only used there) - where I _wish_ it was split up into separate functions, so we could have meaningful tests for each step, not only for the "whole thing".
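A minimal sketch of that shape (Python, with hypothetical step names): the process stays sequential, but each step is now independently testable rather than only the "whole thing".

```python
def validate(payload: dict) -> dict:
    """Step 1: reject malformed input early."""
    if "id" not in payload:
        raise ValueError("missing id")
    return payload

def enrich(payload: dict) -> dict:
    """Step 2: attach derived fields."""
    return {**payload, "source": "import"}

def run_pipeline(payload: dict) -> dict:
    """Still a step-by-step process, read top to bottom."""
    return enrich(validate(payload))
```

Each step gets its own small unit test, and the pipeline itself needs only one end-to-end check.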

psychoslave · today at 8:06 AM

Funny, this is the assertion in the list I would most agree to use as a general design principle, to apply as thoroughly as possible, with tightened variable scope in equal position. Though no general principle should be followed blindly, of course.

It's not about the function length per se. A function that is 1000 lines of mere basic assignments, or that holds a single giant switch, can sometimes be an apt option, with careful consideration of tradeoffs at the origin of the design. The number of lines doesn't tell much about a function's complexity and the cognitive load it will imply to grasp what it does, though it can be a first proxy metric.

But most of the time, giant functions found in the wild have grown organically, with 5 levels of intertwined control flow moving down and up, accumulating variables instead of consts, without consideration for scope span. In that case, every time a change is needed, the cognitive load to grasp everything that must be considered for the change is huge. All the more as this giant function most likely won't have a test suite companion, because good engineering practices tend to be followed (or neglected) consistently across the board.

cjfd · today at 8:39 AM

I have been programming professionally for 17 years and I think this guideline is fine. I have difficulty imagining a function of 70 lines that would not be better off being split into multiple functions. It is true that if a function is just a list of stuff, longer functions can be allowed than when it does multiple different things, but 70 lines is really pushing that.

agentultra · today at 2:56 PM

I'm on the fence about this.

I see huge >130 line functions as a liability. There's so much state in it that a mistake on one line is not obvious. It makes those functions "sticky" and they tend to become the center of a lot of call graphs... like a neutron star. When a mistake is made maintaining or modifying this function it tends to have far-reaching side effects.

On the other hand some APIs (looking at you, OpenGL) are just so verbose that you can't avoid long functions.

I think it's generally good to compose functions from smaller functions where possible. Sometimes you can't and probably shouldn't. But it's hard to give a quantifiable rule in my experience. Approximations work but will never be perfect.

osigurdson · today at 2:08 PM

At first, I was thinking the same but then realized this is over a full page of code. It isn't an insane rule of thumb at all.

At least we aren't talking about "clean code" level of absurdity here: 5-20 lines with 0 - 2 parameters.

teiferer · today at 8:02 AM

> They're used exactly once

To me, that's key here. That things are scattered over multiple files is a minor issue. Any competent IDE can more or less hide that and smooth the experience. But if you have factored some code into a function, suddenly other places may call it. You have inadvertently created an API, and any change you make needs to double-check that other callers either don't exist or don't have their assumptions suddenly violated. That's no issue if the code is right there. No other users, and the API is the direct context of the lines of code right around it. (Yes, you can limit visibility to other modules etc., but that doesn't fully solve the issue of higher cognitive load.)

marcelr · today at 2:05 PM

i see these rules and think “70 lines wow that’s short”

and then i read code and see a 50 line function and am like “wow this function is doing a lot”

sure strict rules aren’t amazing, but i think it would be cool to have a linter warning when there are more than X functions with over 70 lines (this is language dependent - verbosity)

zahlman · today at 5:38 PM

> These things weren't clear to me the first 10 years, but after 30 years, I've made those conclusions. I hope other old-timers can chime in on this.

'Sup, it's me, the "new dev". Except I, too, have been at it for decades, and I get more and more attached to short functions year over year. (You are correct about composition and about isolating state mutations. But short functions are tools that help me to do those things. Of course, it helps a ton to have functions as first-class objects. Function pointers are criminally underused in C codebases from what I've seen. They can be used for much more than just reinventing C++ vtables.)

People put numbers on their advice because they don't trust the audience to have good taste, or to have a sense of the scale they have in mind. Of course that has the downside that metrics become targets. When I see a number in this kind of advice, I kinda take it in two passes: understand what kind of limit is proposed (Over or under? What is being limited?) and then go back and consider the numeric ballpark the author has in mind. Because, yes, 70 lines of Python is not the same as 70 lines of C.

But I can scarcely even fathom ten lines in a Python function that I write nowadays. And I'm rather skeptical that "LOC needed to represent a coherent idea" scales linearly with "LOC needed to make a whole program work".

> Now you have 12 functions, scattered over multiple packages, and the order of things is all confused, you have to debug through to see where it goes. They're used exactly once, and they're only used as part of a long process. You've just increased the cognitive load of dealing with your product by a factor of 12. It's downright malignant.

Well, no, that isn't what happens at all.

First off, the files where new functions get moved, if they get moved at all, are almost certainly going to be in the same "package" (whatever that means in the programming language in use). The idea that it might be hard to find the implementation code for something not in the current file, is pretty close to being a problem unique to C and C++. And I'm pretty sure modern IDEs have no problem dealing with that anyway.

Second, it absolutely does not "increase the cognitive load by a factor of 12". In my extensive experience, the cognitive load is decreased significantly. Because now the functions have names; the steps in the process are labelled. Because now you can consider them in isolation — the code for the adjacent steps is far easier to ignore.

Why would you "have to debug through to see where it goes"? Again, the functions have names. If the process really is purely sequential, then the original function now reads like a series of function calls, each naming a step in the sequence. It's now directly telling you what the code does and how. And it's also directly telling you "where it goes": to the function that was called, and back.

You also no longer have to read comments interspersed into a longer code flow, or infer logical groupings into steps. You can consider each step in isolation. The grouping is already done for you — that's the point. And if you aren't debugging a problem, that implies the code currently works. Therefore, you don't need to go over the details all at once. You are free to dig in at any point that tickles your curiosity, or not. You don't have to filter through anything you aren't interested in.

(Notice how in the three paragraphs above, I give one-sentence descriptions in the first paragraph of individual advantages, and then dedicate a separate paragraph to expanding on each? That is precisely the same idea of "using short functions", applied to natural language. A single, long paragraph would have been fewer total words, but harder to read and understand, and less coherent.)

All of that said, you don't really debug code primarily by single-stepping through long functions, do you? I find problems by binary search (approximately, guided by intuition) with breakpoints and/or logging. And when the steps are factored out into helper functions, it becomes easier to find natural breakpoints in the "main" function and suss out the culprit.
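Roughly, in a hypothetical Python sketch: each named step in the "main" function becomes a natural breakpoint for that binary search.

```python
def load_input(config: dict) -> list:
    """Pull the raw items out of the config."""
    return list(config.get("items", []))

def transform(data: list) -> list:
    """Apply the (toy) per-item computation."""
    return [x * 2 for x in data]

def summarize(data: list) -> list:
    """Produce the final ordered result."""
    return sorted(data)

def run_job(config: dict) -> list:
    data = load_input(config)  # breakpoint candidate 1
    data = transform(data)     # breakpoint candidate 2
    return summarize(data)     # breakpoint candidate 3
```

A breakpoint on any line of `run_job` cleanly brackets one step, so a bad intermediate value immediately names the culprit function.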

Shorter functions absolutely do have the properties described in the quote. Almost definitionally so. Nobody really groks code on the level of dozens of individual statements. We know brains don't work like that (https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus...).