I’ve found it’s mostly two things: readability (ironically) and performance. I’ll describe a few crude examples, but I won’t get too much into specifics, otherwise I might as well write another book myself.
The performance part is simple: its excessive reliance on polymorphism and the presence of several levels of abstraction just doesn’t allow for good code generation. I’ve seen 10x+ performance improvements by dropping all of the above, with often minimal loss in readability; on the contrary, oftentimes the code became more readable as well.
The readability part is harder to explain; not only because it depends on the codebase and the problem at hand, but also on the coding style each programmer has (though in my opinion, in that particular case it’s the programmer’s problem, not the codebase’s).
I like to think of codebases as puzzles. To understand a codebase, you need to piece together said puzzle. What I’ve found with Clean Code codebases is that each piece of the puzzle is itself a smaller puzzle to piece together, which isn’t ideal.
Functions
They should be small and do one thing
I generally disagree, not because those ideas are wrong, but because they’re often too limiting.
What often happens by following those principles is you end up with a slew of tiny functions scattered around your codebase (or a single file), and you are forced to piece together the behaviour they exhibit when called together. Your code loses locality and, much like with CPU cache locality, your brain has to do extra work to retrieve the information it needs every time it needs to jump somewhere else.
It may work for describing what the code does at a high level, but understanding how it works to make meaningful changes will require a lot more work as a result.
Don’t repeat yourself
Once again, it makes sense in principle, but in practice it often creates more problems. I agree that having massive chunks of repeated code is bad, no questions about it, but for smaller chunks it may actually be desirable in some circumstances.
By never repeating code, you end up with functions that are over-parameterized to account for all possible uses and combinations that particular code snippet needs to work with. As a result, that code becomes more complex, and the code that calls it does too, because it requires you to know all the right parameters to pass for it to do the right thing.
Exceptions
Exceptions are just bad. They are a separate, hidden control flow that you constantly need to be wary of.
The name itself is a misnomer in my opinion, because they’re rarely exceptional: errors are not just common, but an integral part of software development, and they should be treated as such.
Errors as values are much clearer, because they explicitly show that a function may return an error and that it should be handled.
Classes, interfaces and polymorphism
I have lots of gripes with object orientation. Not everything needs to be an object, not everything needs to be polymorphic.
There’s no need to have a Base64Decoder, much less an IBase64Decoder or an AbstractBase64Decoder. Base64 only works one way, there are no alternative implementations, a function is enough.
I’m a lot more on the data oriented side of the isle than the OO one, but I digress.
Object orientation can be good in certain contexts, but it’s not a silver bullet.
Encapsulation for the sake of it
Let’s say you have something like this:
class Point {
public float X, Y;
}
With the Clean Code approach, it magically becomes:
class Point {
private float x, y;
public float get_x() {
return this.x;
}
public float get_y() {
return this.y;
}
public void set_x(float x) {
this.x = x;
}
public void set_y(float y) {
this.y = y;
}
}
Why? Who the hell knows. It makes absolutely no tangible difference, it only makes your code longer and more verbose. Now, if a value needs validation, sure, but oftentimes this is just done regardless and it drives me insane.
Abstract classes for everything!
“You’ll never know when you’ll need to add another implementation, right?”
“You don’t need to know the underlying implementation”
The problem with wanting to create the most generalized code in advance is that you end up stuck in abstraction hell.
You may as well not need the ability to have arbitrary implementations, but now you need to plan for that.
Not only that, but it also makes reasoning about your code harder: how many times have you had to step through your code just to figure out what was being executed | just to figure out what particular concrete class was hiding behind an abstract class reference?
I myself, way too many, and there was often no reason for that.
Also, the idea that you shouldn’t know about the implementation is crazy to me. Completely encapsulating data and behaviour not only makes you miss out on important optimizations, but often leads to code bloat.
There’s more but I’m tired of typing :)
Take a look at these if you want more info or someone else’s view on the matter, I wholeheartedly recommend both:
Functions should be small and do one thing […] you end up with a slew of tiny functions scattered around your codebase (or a single file), and you are forced to piece together the behaviour they exhibit when called together
I believe you have a wrong idea of what “one thing” is. This comes together with “functions should not mix levels of abstraction” (cited from the first blog entry you referenced). In a very low-level library, “one thing” may be sending an IP packet over a network interface. Higher up, “one thing” may be establishing a database connection. Even higher up, “one thing” may be querying a list of users from the database, and higher up yet again is responding to the GET /users http request. All of these functions do ‘one thing’, but they rely on calls to a few methods that are further down on the abstraction scheme.
By allowing each function to do ‘one thing’, you decompose the huge problem that responding to an HTTP request actually is into more manageable chunks. When you figure out what a function does, it’s way easier to see that the function connectToDb will not be responsible for why all users are suddenly called "Bob". You’ll look into the http handler first, and if that’s not responsible, into getUsersFromDb, and then check what sendQuery does. If all methods truly do one thing, you’ll be certain that checkAuthorization will not be related to the problem.
Tell me if I just didn’t get the point you were trying to make.
Edit: I just read
Martin says that functions should not be large enough to hold nested control structures (conditionals and loops); equivalently, they should not be indented to more than two levels. He says blocks should be one line long, consisting probably of a single function call. […] Most bizarrely, Martin asserts that an ideal function is two to four lines of code long.
If that’s the standard of “doing one thing”, then I agree with you. This is stupid.
I’m not a fan of this kind of code fragmentation.
If all those actions were related and it could have been just one thing, retaining a lot more context, then it should be one function imo.
If by not splitting it it became massive with various disconnected code blocks, sure, but otherwise I’d much prefer being able to read everything together.
If splitting the functions required producing side effects to maintain the same functionality, then that’s even worse.
Huh, I really like code like that. Having a multi-step process split up into sections like that is amazing to reason about actual dependencies of the individual sections. Granted, that only applies if the individual steps are kinda independently meaningful
This allows you to immediately see that part1 and part2 are independently calculated, and what goes into calculating them.
There are several benefits, e.g.:
if there is a problem, you can more easily narrow down where it is (e.g. if part2 calculates as expected and part1 doesn’t, the problem is probably in function1, not function2 or function3). If you have to understand the whole do_stuff before you can effectively debug it, you waste time.
if the function needs to be optimized, you know immediately that function1 and function 2 can probably run in parallel, and even if you don’t want to do that, the slow part will show up in a flame graph.
It really depends on the context frankly. I did say it was a crappy example ;)
Try to read this snippet I stole from Clean Code and tell me if it’s readable without having to uselessly jump everywhere to understand what’s going on:
The “Don’t repeat yourself” mantra is also used with documentation, this leads to documentation which you first have to read and learn unless you frequently want to step into issues of the documentation assumed you read prior parts and didn’t just searched how to do XYZ.
Thank you for linking the blog posts. They are a really good deterrent from Clean Code. I once thought I’d read it, but Fowler’s advice really is stupid.
In case you’re wondering why I replied three times: “Do one thing” :)
Exceptions are just bad. They are a separate, hidden control flow that you constantly need to be wary of.
The name itself is a misnomer in my opinion, because they’re rarely exceptional: errors are not just common, but an integral part of software development
They may be a part of software development, but they should not be common during the normal execution of software. I once read the hint, “if your app doesn’t run with all exception handlers removed, you are using exceptions in non-exceptional cases”.
Throwing an exception is a way to tell your calling function that you encountered a program state in which you do not know how to proceed safely. If your functions regularly throw errors at you, you didn’t follow their contract and (for instance) didn’t sanitize the data appropriately.
Errors as values are much clearer, because they explicitly show that a function may return an error and that it should be handled.
I disagree here. You can always ignore an error return value and pretend that the “actual” value you got is correct. Ignoring an exception, on the other hand, requires the effort to first catch it and then write an empty error handler. Also (taking go as an inspiration), I (personally) find this very hard to read:
This code mingles two separate things: The “normal” flow of the program, which is supposed to facilitate a business case, and error handling.
In this example, on the other hand, you can easily figure out the flow of data and how it relates to the function’s purpose and ignore possible errors. Or you can concentrate on the error handling, if you so choose. But you don’t have to do both simultaneously:
try {
res = try_something()
res2 = try_something_else(res)
res3 = try_yet_something_else(res2)
return res3
} catch (e) {
// check which error it is and handle it appropriately
throw_own_exception()
}
Also (taking go as an inspiration), I (personally) find this very hard to read
Agreed. Go’s implementation of errors as values is extremely noisy and error prone. I’m not a fan of it either.
You can always ignore an error return value and pretend that the “actual” value you got is correct.
Then that’s a language design / api design issue. You should make it so you cannot get the value unless you handle the error.
I’m of the opinion that errors should be handled “as soon as possible”. That doesn’t necessarily mean immediately below the function call the error originates from, it may very well be further up the call chain. The issue with exceptions is that they make it difficult to know whether or not a function can fail without knowing its implementation, and encourage writing code that spontaneously fails because someone somewhere forgot that something should be handled.
The best implementation of errors as values I’ve seen is Rust’s Result type, which paired with the ? operator can achieve a similar flow to exceptions (when you don’t particularly care where exactly an error as occurred and just want to model the happy path) while clearly signposting all fallible function calls.
So taking your example:
try {
res = try_something()
res2 = try_something_else(res)
res3 = try_yet_something_else(res2)
return res3
} catch (e) {
// check which error it is and handle it appropriately
throw_own_exception()
}
The difference is that you know that try_something and try_yet_something_else may fail, while try_something_else cannot, and you’re able to handle those errors further up if you wish.
You could do so with exceptions as well, but it wasn’t clear.
The same clarity argument can be made for null as well. An Option type is much more preferable because it forces you to handle the case in which you are handed nothing. If a function can operate with nothing, then you can clearly signpost it with an Option<T>, as opposed to just T if a value is mandatory.
Exceptions are also a lot more computationally expensive. The compiler needs to generate landing pads and a bunch of other stuff, which not only bloat your binaries but also prevent several optimizations. C# notoriously cannot inline functions containing throws for example, and utility methods must be created to mitigate the performance impact.
I generally agree, but there are some things that are oversimplified. Sure a point(x, y) can have public attributes, but usually business objects are a bit more complex: insurancePolicy, deliveryRoute, user, etc. Having some control over those is definitely something you want to implement, at the cost of some boilerplate.
Oh for sure. I have nothing against getters and setters when they’re justified, but in the case of bare fields with no validation like that example it’s just annoying.
Also stuff like this just grinds my gears (oversimplified example again):
class IntegerAdder {
private int a, b;
public IntegerAdder(int a, int b) {
this.a = a;
this.b = b;
}
public int get_sum() {
return a + b;
}
}
Just make it a bloody function.
You may say it’s silly, but I’ve genuinely found code like this in the wild. Not that exact code snippet of course but that was the spirit.
The “Don’t repeat yourself” mantra is also used with documentation, this leads to documentation which you first have to read and learn unless you frequently want to step into issues of the documentation assumed you read prior parts and didn’t just searched how to do XYZ.
Also while I used the more clean code oriented XML DOM implementation for my D XML parser (or at least copied such code as it was abandoned by its original creator), I planned a much simpler system for my SDLang parser. While everything originates from the DLElement abstract class, I didn’t go overboard with the interfaces, etc.
I’ve found it’s mostly two things: readability (ironically) and performance. I’ll describe a few crude examples, but I won’t get too much into specifics, otherwise I might as well write another book myself.
The performance part is simple: its excessive reliance on polymorphism and the presence of several levels of abstraction just doesn’t allow for good code generation. I’ve seen 10x+ performance improvements by dropping all of the above, with often minimal loss in readability; on the contrary, oftentimes the code became more readable as well.
The readability part is harder to explain; not only because it depends on the codebase and the problem at hand, but also on the coding style each programmer has (though in my opinion, in that particular case it’s the programmer’s problem, not the codebase’s).
I like to think of codebases as puzzles. To understand a codebase, you need to piece together said puzzle. What I’ve found with Clean Code codebases is that each piece of the puzzle is itself a smaller puzzle to piece together, which isn’t ideal.
Functions
They should be small and do one thing
I generally disagree, not because those ideas are wrong, but because they’re often too limiting.
What often happens by following those principles is you end up with a slew of tiny functions scattered around your codebase (or a single file), and you are forced to piece together the behaviour they exhibit when called together. Your code loses locality and, much like with CPU cache locality, your brain has to do extra work to retrieve the information it needs every time it needs to jump somewhere else.
It may work for describing what the code does at a high level, but understanding how it works to make meaningful changes will require a lot more work as a result.
Don’t repeat yourself
Once again, it makes sense in principle, but in practice it often creates more problems. I agree that having massive chunks of repeated code is bad, no questions about it, but for smaller chunks it may actually be desirable in some circumstances.
By never repeating code, you end up with functions that are over-parameterized to account for all possible uses and combinations that particular code snippet needs to work with. As a result, that code becomes more complex, and the code that calls it does too, because it requires you to know all the right parameters to pass for it to do the right thing.
Exceptions
Exceptions are just bad. They are a separate, hidden control flow that you constantly need to be wary of.
The name itself is a misnomer in my opinion, because they’re rarely exceptional: errors are not just common, but an integral part of software development, and they should be treated as such.
Errors as values are much clearer, because they explicitly show that a function may return an error and that it should be handled.
Classes, interfaces and polymorphism
I have lots of gripes with object orientation. Not everything needs to be an object, not everything needs to be polymorphic. There’s no need to have a
Base64Decoder
, much less anIBase64Decoder
or anAbstractBase64Decoder
. Base64 only works one way, there are no alternative implementations, a function is enough.I’m a lot more on the data oriented side of the isle than the OO one, but I digress.
Object orientation can be good in certain contexts, but it’s not a silver bullet.
Encapsulation for the sake of it
Let’s say you have something like this:
With the Clean Code approach, it magically becomes:
Why? Who the hell knows. It makes absolutely no tangible difference, it only makes your code longer and more verbose. Now, if a value needs validation, sure, but oftentimes this is just done regardless and it drives me insane.
Abstract classes for everything!
The problem with wanting to create the most generalized code in advance is that you end up stuck in abstraction hell.
You may as well not need the ability to have arbitrary implementations, but now you need to plan for that.
Not only that, but it also makes reasoning about your code harder: how many times have you had to step through your code just to figure out what was being executed | just to figure out what particular concrete class was hiding behind an abstract class reference?
I myself, way too many, and there was often no reason for that.
Also, the idea that you shouldn’t know about the implementation is crazy to me. Completely encapsulating data and behaviour not only makes you miss out on important optimizations, but often leads to code bloat.
There’s more but I’m tired of typing :)
Take a look at these if you want more info or someone else’s view on the matter, I wholeheartedly recommend both:
I believe you have a wrong idea of what “one thing” is. This comes together with “functions should not mix levels of abstraction” (cited from the first blog entry you referenced). In a very low-level library, “one thing” may be sending an IP packet over a network interface. Higher up, “one thing” may be establishing a database connection. Even higher up, “one thing” may be querying a list of users from the database, and higher up yet again is responding to the
GET /users
http request. All of these functions do ‘one thing’, but they rely on calls to a few methods that are further down on the abstraction scheme.By allowing each function to do ‘one thing’, you decompose the huge problem that responding to an HTTP request actually is into more manageable chunks. When you figure out what a function does, it’s way easier to see that the function
connectToDb
will not be responsible for why all users are suddenly called"Bob"
. You’ll look into the http handler first, and if that’s not responsible, intogetUsersFromDb
, and then check whatsendQuery
does. If all methods truly do one thing, you’ll be certain thatcheckAuthorization
will not be related to the problem.Tell me if I just didn’t get the point you were trying to make.
Edit: I just read
If that’s the standard of “doing one thing”, then I agree with you. This is stupid.
Yeah that was essentially what I was referring to (referring to your edit).
I generally dislike stuff like (crappy example incoming):
I’m not a fan of this kind of code fragmentation.
If all those actions were related and it could have been just one thing, retaining a lot more context, then it should be one function imo.
If by not splitting it it became massive with various disconnected code blocks, sure, but otherwise I’d much prefer being able to read everything together.
If splitting the functions required producing side effects to maintain the same functionality, then that’s even worse.
Huh, I really like code like that. Having a multi-step process split up into sections like that is amazing to reason about actual dependencies of the individual sections. Granted, that only applies if the individual steps are kinda independently meaningful
To adapt your example to what I mean:
This allows you to immediately see that part1 and part2 are independently calculated, and what goes into calculating them.
There are several benefits, e.g.:
It really depends on the context frankly. I did say it was a crappy example ;)
Try to read this snippet I stole from Clean Code and tell me if it’s readable without having to uselessly jump everywhere to understand what’s going on:
That’s what I was talking about.
The “Don’t repeat yourself” mantra is also used with documentation, this leads to documentation which you first have to read and learn unless you frequently want to step into issues of the documentation assumed you read prior parts and didn’t just searched how to do XYZ.
Thank you for linking the blog posts. They are a really good deterrent from Clean Code. I once thought I’d read it, but Fowler’s advice really is stupid.
In case you’re wondering why I replied three times: “Do one thing” :)
They may be a part of software development, but they should not be common during the normal execution of software. I once read the hint, “if your app doesn’t run with all exception handlers removed, you are using exceptions in non-exceptional cases”.
Throwing an exception is a way to tell your calling function that you encountered a program state in which you do not know how to proceed safely. If your functions regularly throw errors at you, you didn’t follow their contract and (for instance) didn’t sanitize the data appropriately.
I disagree here. You can always ignore an error return value and pretend that the “actual” value you got is correct. Ignoring an exception, on the other hand, requires the effort to first catch it and then write an empty error handler. Also (taking go as an inspiration), I (personally) find this very hard to read:
res, error = try_something() if error { handle_the_error(error) return_own_error() } res2, error2 = try_something_else(res) if error2 { handle_other_error(error2) return_own_error() } res3, error3 = try_yet_something_else(res2) if error3 { handle_the_third_error(error3) return_own_error() } return res3
This code mingles two separate things: The “normal” flow of the program, which is supposed to facilitate a business case, and error handling.
In this example, on the other hand, you can easily figure out the flow of data and how it relates to the function’s purpose and ignore possible errors. Or you can concentrate on the error handling, if you so choose. But you don’t have to do both simultaneously:
try { res = try_something() res2 = try_something_else(res) res3 = try_yet_something_else(res2) return res3 } catch (e) { // check which error it is and handle it appropriately throw_own_exception() }
Agreed. Go’s implementation of errors as values is extremely noisy and error prone. I’m not a fan of it either.
Then that’s a language design / api design issue. You should make it so you cannot get the value unless you handle the error.
I’m of the opinion that errors should be handled “as soon as possible”. That doesn’t necessarily mean immediately below the function call the error originates from, it may very well be further up the call chain. The issue with exceptions is that they make it difficult to know whether or not a function can fail without knowing its implementation, and encourage writing code that spontaneously fails because someone somewhere forgot that something should be handled.
The best implementation of errors as values I’ve seen is Rust’s
Result
type, which paired with the?
operator can achieve a similar flow to exceptions (when you don’t particularly care where exactly an error as occurred and just want to model the happy path) while clearly signposting all fallible function calls. So taking your example:It would become:
The difference is that you know that
try_something
andtry_yet_something_else
may fail, whiletry_something_else
cannot, and you’re able to handle those errors further up if you wish.You could do so with exceptions as well, but it wasn’t clear.
The same clarity argument can be made for
null
as well. AnOption
type is much more preferable because it forces you to handle the case in which you are handed nothing. If a function can operate with nothing, then you can clearly signpost it with anOption<T>
, as opposed to justT
if a value is mandatory.Exceptions are also a lot more computationally expensive. The compiler needs to generate landing pads and a bunch of other stuff, which not only bloat your binaries but also prevent several optimizations. C# notoriously cannot inline functions containing
throw
s for example, and utility methods must be created to mitigate the performance impact.You’re talking Monads, baby!
I generally agree, but there are some things that are oversimplified. Sure a point(x, y) can have public attributes, but usually business objects are a bit more complex: insurancePolicy, deliveryRoute, user, etc. Having some control over those is definitely something you want to implement, at the cost of some boilerplate.
Oh for sure. I have nothing against getters and setters when they’re justified, but in the case of bare fields with no validation like that example it’s just annoying.
Also stuff like this just grinds my gears (oversimplified example again):
Just make it a bloody function.
You may say it’s silly, but I’ve genuinely found code like this in the wild. Not that exact code snippet of course but that was the spirit.
The “Don’t repeat yourself” mantra is also used with documentation, this leads to documentation which you first have to read and learn unless you frequently want to step into issues of the documentation assumed you read prior parts and didn’t just searched how to do XYZ.
Also while I used the more clean code oriented XML DOM implementation for my D XML parser (or at least copied such code as it was abandoned by its original creator), I planned a much simpler system for my SDLang parser. While everything originates from the
DLElement
abstract class, I didn’t go overboard with the interfaces, etc.