118

The codebase I'm working on frequently uses instance variables to share data between various trivial methods. The original developer is adamant that this adheres to the best practices stated in the Clean Code book by Uncle Bob/Robert Martin: "The first rule of functions is that they should be small." and "The ideal number of arguments for a function is zero (niladic). (...) Arguments are hard. They take a lot of conceptual power."

An example:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  private byte[] encodedData;
  private EncryptionInfo encryptionInfo;
  private EncryptedObject payloadOfResponse;
  private URI destinationURI;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    getEncodedData(encryptedRequest);
    getEncryptionInfo();
    getDestinationURI();
    passRequestToServiceClient();

    return cryptoService.encryptResponse(payloadOfResponse);
  }

  private void getEncodedData(EncryptedRequest encryptedRequest) {
    encodedData = cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private void getEncryptionInfo() {
    encryptionInfo = cryptoService.getEncryptionInfoForDefaultClient();
  }

  private void getDestinationURI() {
    destinationURI = router.getDestination().getUri();
  }

  private void passRequestToServiceClient() {
    payloadOfResponse = serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }
}

I would refactor that into the following using local variables:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    byte[] encodedData = cryptoService.decryptRequest(encryptedRequest, byte[].class);
    EncryptionInfo encryptionInfo = cryptoService.getEncryptionInfoForDefaultClient();
    URI destinationURI = router.getDestination().getUri();
    EncryptedObject payloadOfResponse = serviceClient.handle(destinationURI, encodedData,
      encryptionInfo);

    return cryptoService.encryptResponse(payloadOfResponse);
  }
}

This is shorter, it eliminates the implicit data coupling between the various trivial methods and it limits the variable scopes to the minimum required. Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

Hence my questions: What is the objective, scientific rationale to favor local variables over instance variables? I can't quite seem to put my finger on it. My intuition tells me that hidden couplings are bad and that a narrow scope is better than a broad one. But what is the science to back this up?

And conversely, are there any downsides to this refactoring that I have possibly overlooked?

3
  • Comments are not for extended discussion; this conversation has been moved to chat.
    – maple_shaft
    Commented Mar 12, 2019 at 17:15
  • Your version is, possibly, thread-safe. His almost certainly isn't. As pointed out in the John Bollinger answer. His version will almost certainly blow up horribly badly since nothing is synchronized.
    – user949300
    Commented Jun 18, 2021 at 23:07
  • The arguments of the "original developer" (keep # of arguments down) could also be read as a plead for global variables. Trying to get rid of something (you consider) bad doesn't justify replacing it with something worse...
    – tofro
    Commented Apr 8, 2022 at 22:50

12 Answers 12

177

What is the objective, scientific rationale to favor local variables over instance variables?

Scope isn't a binary state, it's a gradient. You can rank these from largest to smallest:

Global > Class > Local (method) > Local (code block, e.g. if, for, ...)

Edit: what I call a "class scope" is what you mean by "instance variable". To my knowledge, those are synonymous, but I'm a C# dev, not a Java dev. For the sake of brevity, I've lumped all statics into the global category since statics are not the topic of the question.

The smaller the scope, the better. The rationale is that variables should live in the smallest scope possible. There are many benefits to this:

  • It forces you to think about the current class' responsibility and helps you stick to SRP.
  • It enables you to not have to avoid global naming conflicts, e.g. if two or more classes have a Name property, you're not forced to prefix them like FooName, BarName, ... Thus keeping your variable names as clean and terse as possible.
  • It declutters the code by limiting the available variables (e.g. for Intellisense) to those that are contextually relevant.
  • It enables some form of access control so your data can't be manipulated by some actor you don't know about (e.g. a different class developed by a colleague).
  • It makes the code more readable as you ensure that the declaration of these variables tries to stay as close to the actual usage of these variables as is possible.
  • Wantonly declaring variables in an overly wide scope is often indicative of a developer who doesn't quite grasp OOP or how to implement it. Seeing overly widely scoped variables acts as a red flag that there's probably something going wrong with the OOP approach (either with the developer in general or the codebase in specific).
  • (Comment by Kevin) Using locals forces you to do things in the right order. In the original (class variable) code, you could wrongly move passRequestToServiceClient() to the top of the method, and it would still compile. With locals, you could only make that mistake if you passed an uninitialized variable, which is hopefully obvious enough that you don't actually do it.

Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

And conversely, are there any downsides to this refactoring that I have possibly overlooked?

The issue here is that your argument for local variables is valid, but you've also made additional changes which are not correct and cause your suggested fix to fail the smell test.

While I understand your "no class variable" suggestion and there's merit to it, you've actually also removed the methods themselves, and that's a whole different ballgame. The methods should have stayed, and instead you should've altered them to return their value rather than store it in a class variable:

private byte[] getEncodedData() {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
}

private EncryptionInfo getEncryptionInfo() {
    return cryptoService.getEncryptionInfoForDefaultClient();
}

// and so on...

I do agree with what you've done in the process method, but you should've been calling the private submethods rather than executing their bodies directly.

public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    byte[] encodedData = getEncodedData();
    EncryptionInfo encryptionInfo = getEncryptionInfo();

    //and so on...

    return cryptoService.encryptResponse(payloadOfResponse);
}

You'd want that extra layer of abstraction, especially when you run into methods that need to be reused several times. Even if you don't currently reuse your methods, it's still a matter of good practice to already create submethods where relevant, even if only to aid code readability.

Regardless of the local variable argument, I immediately noticed that your suggested fix is considerably less readable than the original. I do concede that wanton use of class variables also detracts from code readability, but not at first sight compared to you having stacked all the logic in a single (now long-winded) method.

8
  • Comments are not for extended discussion; this conversation has been moved to chat.
    – maple_shaft
    Commented Mar 12, 2019 at 17:13
  • 1
    Instance scope and class scope should be separated. Commented Jun 18, 2021 at 21:52
  • Another good way to think about what logic to put in separate methods and if it should use parameters instead of member variables is: is it easy to unittest? It forces you to think about the Single Responsibility Principle and use parameters and return value in the right way. Commented Jun 19, 2021 at 6:52
  • @Deduplicator: For the purposes of this answer, class scope is lumped into the "Global" section as this section was purposefully intended as a catch-all for statics. There's a reason I called scope a gradient. It is the nature of gradients that you can always find more granular detail if you want to but that does not mean the details is necessary, which it isn't, for the topic at hand.
    – Flater
    Commented Jun 21, 2021 at 7:55
  • "The methods should have stayed" - really? Even passRequestToServiceClient? I'd argue that all of these are so trivial that no readability is gained. They're all one-liners, and the method name says as much as the variable name to which you assign the result.
    – Bergi
    Commented Mar 30, 2022 at 4:00
85

The original code is using member variables like arguments. When he says to minimize the number of arguments, what he really means is to minimize the amount of data that the methods requires in order to function. Putting that data into member variables doesn't improve anything.

4
  • 23
    Totally agree! Those member variables are simply implicit function arguments. In fact it's worse since now there is no explicit link between the value of those variable and the function usage (from an external POV)
    – Rémi
    Commented Mar 5, 2019 at 20:08
  • 1
    I would say this is not what the book meant. How many functions require exactly zero input data to run? This one of the bits I think the book had wrong.
    – Qwertie
    Commented Mar 5, 2019 at 22:46
  • 1
    @Qwertie If you have an object, the data it works on might be completely encapsulated inside it. Functions like process.Start(); or myString.ToLowerCase() shouldn't seem too weird (and are indeed the easiest to understand).
    – R. Schmitz
    Commented Mar 7, 2019 at 10:54
  • 5
    Both have one argument: the implicit this. One could even argue this argument is given explicitly – before the dot.
    – BlackJack
    Commented Mar 7, 2019 at 16:51
54

Other answers have already explained the benefits of local variables perfectly, so all that remains is this part of your question:

Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

That should be easy. Simply point him to the following quote in Uncle Bob's Clean Code:

Have No Side Effects

Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will make unexpected changes to the variables of its own class. Sometimes it will make them to the parameters passed into the function or to system globals. In either case they are devious and damaging mistruths that often result in strange temporal couplings and order dependencies.

(example omitted)

This side effect creates a temporal coupling. That is, checkPassword can only be called at certain times (in other words, when it is safe to initialize the session). If it is called out of order, session data may be inadvertently lost. Temporal couplings are confusing, especially when hidden as a side effect. If you must have a temporal coupling, you should make it clear in the name of the function. In this case we might rename the function checkPasswordAndInitializeSession, though that certainly violates “Do one thing.”

That is, Uncle Bob doesn't just say that a function should take few arguments, he also says that functions should avoid interacting with non-local state whenever possible.

3
  • 3
    In a "prefect world", this would be the second answer listed. The first answer for the ideal situation that the coworker listens to reason - but if the coworker is a zealot, this answer here would deal with the situation without too much of a fuzz.
    – R. Schmitz
    Commented Mar 7, 2019 at 11:05
  • 2
    For a more pragmatic variation on this idea, It's simply that it's much easier to reason about local state than instance or global state. Well-defined and tightly-contained mutability and side effects rarely lead to issues. For instance, many sort functions operate in-place via side effect, but this is easy to reason about in a local scope.
    – Beefster
    Commented Mar 7, 2019 at 20:08
  • 1
    Ah, the good old "in a contradictory axiomatic, it's possible to prove anything". Since there are no hard truths and lies IRL, any dogmas will have to include statements that state opposite things i.e. be contradictory. Commented Mar 8, 2019 at 15:44
27

"It contradicts what someone's uncle thinks" is NEVER a good argument. NEVER. Don't take wisdom from uncles, think for yourself.

That said, instance variables should be used to store information that is actually required to be stored permanently or semi-permanently. The information here isn't. It is very simple to live without the instance variables, so they can go.

Test: Write a documentation comment for each instance variable. Can you write anything that isn't completely pointless? And write a documentation comment to the four accessors. They are equally pointless.

Worst is, assume the way to decrypt changes, because you use a different cryptoService. Instead of having to change four lines of code, you have to replace four instance variables with different ones, four getters with different ones, and change four lines of code.

But of course the first version is preferable if you are paid by the line of code. 31 lines instead of 11 lines. Three times more lines to write, and to maintain forever, to read when you are debugging something, to adapt when changes are needed, to duplicate if you support a second cryptoService.

(Missed the important point that using local variables forces you to make the calls in the right order).

8
  • 18
    Thinking for yourself is of course good. But your opening paragraph effectively includes learners or juniors dismissing the input of teachers or seniors; which goes too far.
    – Flater
    Commented Mar 5, 2019 at 7:29
  • 9
    @Flater After thinking about the input of teachers or seniors, and seeing they are wrong, dismissing their input is the only right thing. In the end, it is not about dismissing, but about questioning it and only dismissing it if it definitely proves wrong.
    – glglgl
    Commented Mar 5, 2019 at 8:56
  • 11
    @ChristianHackl: I'm fully on board with not blindly following traditionalism, but I wouldn't blindly dismiss it either. The answer seemingly suggests to eschew acquired knowledge in favor of your own view, and that's not a healthy approach to take. Blindly following others' opinions, no. Questioning something when you disagree, obviously yes. Outright dismissing it because you disagree, no. As I read it, the first paragraph of the answer at least seems to suggest the latter. It very much depends on what gnasher means with "think for yourself", which needs elaboration.
    – Flater
    Commented Mar 5, 2019 at 10:27
  • 10
    On a sharing wisdom platform, this seems a little out of place...
    – drjpizzle
    Commented Mar 5, 2019 at 16:24
  • 4
    NEVER is also NEVER a good argument... I fully agree that rules exist to guide, not to prescribe. However, rules about rules are but a subclass of rules, so you spoke your own verdict... Commented Mar 6, 2019 at 22:36
16

What is the objective, scientific rationale to favor local variables over instance variables? I can't quite seem to put my finger on it. My intuition tells me that hidden couplings are bad and that a narrow scope is better than a broad one. But what is the science to back this up?

Instance variables are for representing the properties of their host object, not for representing properties specific to threads of computation more narrowly scoped than the object itself. Some of the reasons for drawing such a distinction that appear not to already have been covered revolve around concurrency and reentrancy. If methods exchange data by setting the values of instance variables, then two concurrent threads can easily clobber each other's values for those instances variables, yielding intermittent, hard-to-find bugs.

Even one thread alone can encounter problems along those lines, because there is a high risk that a data exchange pattern relying on instance variables makes methods non-reentrant. Similarly, if the same variables are used to convey data between different pairs of methods, then there is a risk that a single thread executing even a non-recursive chain of method invocations will run into bugs revolving around unexpected modifications of the instance variables involved.

In order to get reliably correct results in such a scenario, you need either to use separate variables to communicate between each pair of methods where one invokes the other, or else to have every method implementation take into account all the implementation details of all the other methods it invokes, whether directly or indirectly. This is brittle, and it scales poorly.

1
  • 8
    So far, this is the only answer that mentions thread-safety and concurrency. That's kind of amazing given the specific code example in the question: an instance of SomeBusinessProcess cannot safely process multiple encrypted requests at once. The method public EncryptedResponse process(EncryptedRequest encryptedRequest) is not synchronized, and concurrent calls would quite possibly clobber the values of the instance variables. This is a good point to bring up. Commented Mar 5, 2019 at 21:00
9

Discussing just process(...), your colleagues example is far more legible in the sense of business logic. Conversely your counter example takes more than a cursory glance to extract any meaning.

That being said, clean code is both legible and good quality - pushing local state out into a more global space is just high-level assembly, so a zero for quality.

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest request) {
    checkNotNull(encryptedRequest);

    return encryptResponse
      (routeTo
         ( destination()
         , requestData(request)
         , destinationEncryption()
         )
      );
  }

  private byte[] requestData(EncryptedRequest encryptedRequest) {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private EncryptionInfo destinationEncryption() {
    return cryptoService.getEncryptionInfoForDefaultClient();
  }

  private URI destination() {
    return router.getDestination().getUri();
  }

  private EncryptedObject routeTo(URI destinationURI, byte[] encodedData, EncryptionInfo encryptionInfo) {
    return serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }

  private void encryptResponse(EncryptedObject payloadOfResponse) {
    return cryptoService.encryptResponse(payloadOfResponse);
  }
}

This is a rendition that removes the need for variables at any scope. Yes the compiler will generate them but the important part is that it controls that so the code will be efficient. While also being relatively legible.

Just a point on naming. You want the shortest name that is meaningful and expands on the information already available. ie. destinationURI, the 'URI' is already known by the type signature.

3
  • 4
    Eliminating all variables doesn't necessarily make code easier to read.
    – Pharap
    Commented Mar 6, 2019 at 10:09
  • Eliminate all variables completely with pointfree style en.wikipedia.org/wiki/Tacit_programming
    – Marcin
    Commented Mar 6, 2019 at 21:16
  • @Pharap True, a lack of variables does not ensure legibility. In some cases it even makes debugging more difficult. The point is that well chosen names, a clear usage of expression, can communicate an idea very clearly, while still being efficient.
    – Kain0_0
    Commented Mar 6, 2019 at 23:05
7

I would just remove these variables and private methods altogether. Here's my refactor:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    return cryptoService.encryptResponse(
        serviceClient.handle(router.getDestination().getUri(),
        cryptoService.decryptRequest(encryptedRequest, byte[].class),
        cryptoService.getEncryptionInfoForDefaultClient()));
  }
}

For private method, e.g. router.getDestination().getUri() is clearer and more readable than getDestinationURI(). I would even just repeat that if I use the same line twice in the same class. To look at it another way, if there's a need for a getDestinationURI(), then it probably belongs in some other class, not in SomeBusinessProcess class.

For variables and properties, the common need for them is to hold values to be used later in time. If the class has no public interface for the properties, they probably shouldn't be properties. The worst kind of class properties use is probably for passing values between private methods by way of side effects.

Anyway, the class only needs to do process() and then the object will be thrown away, there's no need to keep any state in memory. Further refactor potential would be to take out the CryptoService out of that class.

Based on comments, I want to add this answer is based on real world practice. Indeed, in code review, the first thing that I'd pick out is to refactor the class and move out the encrypt/decrypt work. Once that's done, then I'd ask if the methods and variables are needed, are they named correctly and so on. The final code will probably closer to this:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;

  public Response process(Request request) {
    return serviceClient.handle(router.getDestination().getUri());
  }
}

With above code, I don't think it needs further refactor. As with the rules, I think it takes experience to know when and when not to apply them. Rules are not theories that are proven to work in all situations.

Code review on the other hand has real impact on how long before a piece of code can pass. My trick is to have less code and make it easy to understand. A variable name can be a point of discussion, if I can remove it reviewers wouldn't even need to think about it.

5
  • My upvote, though many will hesitate here. Of course some abstractions, make sense. (BTW a method called "process" is terrible.) But here the logic is minimal. The OP's question however is on an entire code style, and there the case may be more complex.
    – Joop Eggen
    Commented Mar 5, 2019 at 10:20
  • 1
    One clear issue with chaining it all into one method call is sheer readability. It also doesn't work if you need more than one operation on a given object. Also, this is nigh impossible to debug because you can't step through the operations and inspect the objects. While this works on a technical level, I would not advocate for this as it massively neglects non-runtime aspects of software development.
    – Flater
    Commented Mar 5, 2019 at 10:30
  • @Flater I agree with your comments, we don't want to apply this everywhere. I edited my answer to clarify my practical position. What I want to show is that in practice we only apply rules when it's appropriate. Chaining method call is fine in this instance, and if I need to debug, I'd invoke the tests for the chained methods.
    – imel96
    Commented Mar 6, 2019 at 2:43
  • @JoopEggen Yes, abstractions make sense. In the example, the private methods don't give any abstractions anyway, users of the class don't even know about them
    – imel96
    Commented Mar 6, 2019 at 2:47
  • 1
    @imel96 it's funny that you're probably one of the few people here who noticed that the coupling between ServiceClient and CryptoService makes it worthwhile to focus on injecting CS into SC instead of into SBP, solving the underlying problem at higher architectural level... that's IMVHO the main point of this story; it's too easy to keep track of the big picture while focusing on the details.
    – user88637
    Commented Mar 8, 2019 at 0:15
4

Flater's answer covers issues of scoping quite well, but I think that there's another issue here too.

Note that there's a difference between a function that processes data and a function that simply accesses data.

The former executes actual business logic, whereas the latter saves typing and perhaps adds safety by adding a simpler and more reusable interface.

In this case it seems that the data-access functions don't save typing, and aren't reused anywhere (or there would be other issues in removing them). So these functions simply shouldn't exist.

By keeping only the business logic in named functions, we get the best of both worlds (somewhere between Flater's answer and imel96's answer):

public EncryptedResponse process(EncryptedRequest encryptedRequest) {

    byte[] requestData = decryptRequest(encryptedRequest);
    EncryptedObject responseData = handleRequest(router.getDestination().getUri(), requestData, cryptoService.getEncryptionInfoForDefaultClient());
    EncryptedResponse response = encryptResponse(responseData);

    return response;
}

// define: decryptRequest(), handleRequest(), encryptResponse()
2

The first and most important thing: Uncle Bob seems to be like a preacher sometimes, but states that there are exceptions to his rules.

The whole idea of Clean Code is to improve readability and to avoid errors. There are several rules that are violating each other.

His argument on functions is that niladic functions are best, however that up to three parameters are acceptable. I personally think that 4 are also ok.

When instance variables are used, they should make a coherent class. That means, the variables should be used in many, if not all non-static methods.

Variables that are not used in many places of the class, should be moved.

I would neither consider the original nor the refactored version optimal, and @Flater stated already very well what can be done with return values. It improves readability and reduces errors to use return values.

1

Local variables reduce scope hence limits the ways in which the variables can be used and hence helps prevent certain classes of error, and improves readability.

Instance variable reduce the ways in which the function can be called which also helps reduce certain classes of error, and improves readability.

To say one is right and the other is wrong may well be a valid conclusion in any one particular case, but as general advice...

TL;DR: I think the reason you smell too much zeal is, there's too much zeal.

0

Despite the fact that methods starting with get... should not return void, the separation of levels of abstractions within the methods is given in the first solution. Although the second solution is more scoped it is still harder to reason about what is going on in the method. The assignments of local variables are not needed here. I would keep the method names and refactor the code to something like that:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    return getEncryptedResponse(
            passRequestToServiceClient(getDestinationURI(), getEncodedData(encryptedRequest) getEncryptionInfo())
        );
  }

  private EncryptedResponse getEncryptedResponse(EncryptedObject encryptedObject) {
    return cryptoService.encryptResponse(encryptedObject);
  }

  private byte[] getEncodedData(EncryptedRequest encryptedRequest) {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private EncryptionInfo getEncryptionInfo() {
    return cryptoService.getEncryptionInfoForDefaultClient();
  }

  private URI getDestinationURI() {
    return router.getDestination().getUri();
  }

  private EncryptedObject passRequestToServiceClient(URI destinationURI, byte[] encodedData, EncryptionInfo encryptionInfo) {
    return serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }
}
0

Both things do the same and the performance differences are unnoticeable, so I don't think there's a scientific argument. It comes down to subjective preferences then.

And I too tend to like your way better than your colleague's. Why? Because I think it's easier to read and understand, despite what some book author says.

Both ways accomplish the same thing, but his way is more spread out. To read that code you need to shift back and forth between several functions and member variables. It's not all condensed in one place, you need to remember all that in your head to understand it. It's a much greater cognitive load.

In contrast, your approach packs everything much more densely, but no so as to make it impenetrable. You just read it line after line, and you don't need to memorize so much to understand it.

However if he's used to code being laid out in such a fashion, I can imagine that for him it might be the other way round.

1
  • That has indeed proven to be a major hurdle for me to understand the flow. This example is quite small, but another one used 35 (!) instance variables for its intermediate results, not counting the dozen instance variables containing the dependencies. It was quite hard to follow along, as you have to keep track of what has already been set earlier on. Some were even reused later on, making it even harder. Luckily the arguments presented here have finally convinced my colleague to agree with refactoring.
    – Alexander
    Commented Mar 14, 2019 at 15:52

Not the answer you're looking for? Browse other questions tagged or ask your own question.