Sin number 5: global destruction

Every time I am asked about global keyword in Python, I get a split-second heart attack. Personally I think it is one of the most unnecessary and dangerous keywords that Python has to offer. I will go one step further and postulate that if you’re using nonlocal you’re probably doing it wrong.

👉 The essence of this problem is misunderstanding the purpose or the mechanics of scope in Python.

Whence the temptation?

Consider a very simple example. Imagine you’ve just learned about the existence of functions in Python and you want to reuse repetitive pieces of code as much as possible:

def concatenate_all_strings():
    string1 = "a"
    string2 = "b"
    string3 = "c"
    return string1 + string2 + string3


print(concatenate_all_strings())

Great! This works. But what if you want to use string1 somewhere else? Let’s say I type up 100 more lines of code and I’m suddenly in need of printing out the string1 variable:

def concatenate_all_strings():
    string1 = "a"
    string2 = "b"
    string3 = "c"
    return string1 + string2 + string3


print(concatenate_all_strings())

# 100 more lines of some code...

print(string1)

What you get when you run this script would be: NameError: name 'string1' is not defined. Now you start searching the Internet for a solution to your problem and inadvertently you find that the simplest way to make a variable that exists within a function body accessible elsewhere is to use global. You will then merrily refactor the above to:

def concatenate_all_strings():
    global string1
    global string2
    global string3
    string1 = "a"
    string2 = "b"
    string3 = "c"
    return string1 + string2 + string3


print(concatenate_all_strings())

# 100 more lines of some code...

print(string1)

Why is this a sin?

This creates a potential for name conflicts and introduces a lack of clarity. When you refuse to use scoped variables and declare everything as global when you want to share variables, you will inadvertetly run into situations where a local variable has the same name as one in an outer scope and depending on which one is evaluated when, you will either end up with the value declared in the outer scope or the inner scope. Furthermore, any implicit modification of the global state makes code harder to understand because you end up searching for the variables in each function body.

This is especially painful when you realize how easy it is to avoid using global altogether and that avoiding global comes naturally when you know some proper functional patterns.

How to recognize a sinner?

Any use of global or nonlocal should put up red flags. Honestly, I haven’t seen any piece of code in my life that couldn’t be refactored to avoid these keywords and such refactoring always resulted in improvements in code maintainability.

👀 Remember, just because some solution is more concise and quicker to implement, doesn’t mean that it will lead to better maintainability. Always assume that somebody else will read your code, thus you should use patterns that make it easy to pinpoint the flow of information in a clear way. global is not one of those patterns. Also, consider that even if nobody else will end up reading your code in the future, your future self might not remember what the project was about in a few weeks. So if not for the sake of your colleagues, avoid antipatterns as a good deed for your future self.

How to repent?

Understand and leverage scope

Most uses of global stem from the misunderstanding of scope. Scope essentially means that whatever is within a function or a class belongs to that function or class if it’s declared there. Anything from the outer scope is accessible to the inner scope but the inverse is not true. You should put variable declarations into such a scope that can be accessed by all the consumers of those variables. In our example this would be the top-level scope of the Python module that we were running:

string1 = "a"  # accessible to both `concatenate_all_strings` and top-level `print`

def concatenate_all_strings():
    string2 = "b"  # accessible only within this function
    string3 = "c"  # accessible only within this function
    return string1 + string2 + string3


print(concatenate_all_strings())

# 100 more lines of some code...

print(string1)

The good part is that if I needed to declare a variable named string2 in the top-level of the script and print it but I would like to keep the original string2 value within the function I could simply do this:

string1 = "a"
string2 = "lol"

def concatenate_all_strings():
    string2 = "b"  # accessible only within this function
    string3 = "c"  # accessible only within this function
    return string1 + string2 + string3


print(concatenate_all_strings())  # prints "abc"

# 100 more lines of some code...

print(string1)  # prints "a"
print(string2)  # prints "lol"

With scoping it’s clear where a particular variable belongs. Every time you enter function scope or class scope it’s a bit like entering a different room in a house. The mirror that hangs on the wall of your own bedroom looks different to the one in your bathroom. The name is still the same but the value doesn’t have to be and it still makes sense that two rooms have a mirror object within them.

Treat functions like physical factory machines

If you’ve worked with Python for a while you will realize that the solution above is kind of lousy and lazy as well. And if you paid attention in math classes you will realize that our function concatenate_all_strings does not really behave like a mathematical function.

👀 In mathematics, a function is an instrument that takes a set of arguments and produces one value from that set of arguments. In programming, we call functions equivalent to mathematical functions pure functions.

Whenever you have the chance you should parameterize what you can and use pure functions. In Python, if you combine that with type hints, the function signature, i.e. the function name alongside its parameters and return type, gives you a very clear idea of what the function does and how it should behave.

string1 = "a"
string2 = "b"
string3 = "c"

def concatenate_all_strings(string1: str, string2: str, string3: str) -> str:
    return string1 + string2 + string3


print(concatenate_all_strings(string1=string1, string2=string2, string3=string3))

# 100 more lines of some code...

print(string1)

Notice how def concatenate_all_strings(string1: str, string2: str, string3: str) -> str tells you almost everything you need to know about the function. It takes in 3 strings and returns a single string and if the name is to be trusted it will probably concatenate the input strings. This function is pure. It does not modify the input in any way, it simply takes the input and produces some output, like a factory machine that takes in some raw material and returns a product. Actually, it’s even better because it does not destroy string1, string2 and string3 in the process so these can be later reused.

Identify repetition and parameterize

Finally, the way I would refactor the example:

from typing import List

string1 = "a"
string2 = "b"
string3 = "c"

def concatenate_all_strings(string_list: List[str]) -> str:
    result = ""
    for s in string_list:
        result += s
    return result


print(concatenate_all_strings([string1, string2, string3]))

# 100 more lines of some code...

print(string1)

When parameterizing functions that used to abuse global or nonlocal you will realize that there may be some sensless repetition involved and the number of parameters quickly explodes. What if I wanted to concatenate not 3 but 4, 5, 6, etc. strings? Every time I would like to add another one, I’d need to add it to the parameter list and then call the function with an additional argument. This is tedious. When you realize this is the way it’s going, think about generalizing the parameter set a bit more. In this case a list of arbitrary length might be the best choice. In other cases you might want to use generators, dictionaries, etc.

🤔 You might be wondering why I didn’t use the *args idiom and used a list as a parameter. Type annotations on *args and **kwargs can sometimes get tricky and this makes the function signature almost always less explicit. Some programming languages do not even support variable number of arguments in functions, Rust for example allows you to do that only in macros, which are essentially generating Rust code before compilation happens and are not functions that will be evaluated at runtime. My advice is to avoid *args and **kwargs unless there is a very good reason for the user interface to leverage them.

Learn advanced scope patterns

Last, let’s take a look at something more advanced if you’re coming here for some more serious patterns.

Scoping can be really powerful. My favorite example is a closure. In Rust closures are basically lambda functions that can capture their outer scope:

fn do_something() {
  let x: i32 = 5;

  let add_x = |i| i + x;  // `x` is grabbed from the outer scope
  add_x  // `add_x` is returned
}

let lazy_add_x = do_something();
let result = lazy_add_x(5);
println!("{}", result);

In Python, all functions can capture their outer scope. Thus the idea of a closure should usually be expressed with a nested function:

from typing import Callable


def add_x(i: int) -> Callable[[], int]:
    x = 5

    def wrapper():
        # `x` is grabbed from outer scope
        return i + x

    return wrapper


lazy_add_x = add_x(5)
result = lazy_add_x()
print(result)
print(x)  # fails, `x` only exists within `add_x` scope

Conclusions

👉 Never use global and nonlocal. If you find you need it, it’s probably because you have a design flaw in your code or you’re misunderstanding scope.

👉 When using functions parameterize what you can for better code reuse.

👉 Use pure functions whenever you can. Don’t silently modify state, return values instead.

👉 If you find yourself copy-pasting or repeating more or less the same thing over and over again, there’s probably a more general parameter set that you could leverage.