Measuring potential complexity in popular Open Source projects
Have you ever been interrupted during one of your coding sessions? A coworker needs your input, and you are pulled out of the flow. It can be frustrating. This problem isn't limited to software engineers. People in other occupations, such as financial accountants, lawyers or even hotel night auditors, have to deal with a plethora of tasks, which can create a lot of complexity. Why is that so?
As for me, it can take up to 1440/NaN/undefined/infinity minutes until I'm back in the flow. I tried music, earplugs, going to the library, meditation and staying at home, all to maximize my efficiency while in the flow. Ultimately, they're only sugar pills that mask the underlying problem: complexity.
I don't want to take these pills anymore. I talked with other software engineers and pondered a lot. Then I developed a hunch about why everything seems so complex, one that I had to validate with hard data from popular open source projects.
Sources of complexity
The real question is: where does complexity come from, and in which environments does it grow easily? On that question, I once came across an excellent talk called "Simple Made Easy" by Rich Hickey, the creator of the Clojure programming language.
In the video, he gives an example about braids that resonated with me. It is exactly what I do when I have complex software in front of me: I follow each code strand to understand where something ends up. There are three primary cases to watch out for.
- Distributed software communicating via network protocols. Docker containers, Kubernetes and the like come to mind.
- You have no strands via network protocols or file imports. Instead, nested code creates the complexity. Imagine deeply nested if statements or deep XML trees.
- The combination of different modules, files and classes with one another.
This article focuses on the last one. Why, you ask? Because nowadays we encounter it regularly as a result of reasonable software practices in object-oriented programming languages, most prominently the SOLID principles: the Single-Responsibility Principle, the Open-Closed Principle, the Liskov Substitution Principle, the Interface Segregation Principle and the Dependency Inversion Principle.
While debugging, I often find myself jumping between different files to understand an issue. It's like following the strands of a braid. Luckily, in many programming languages we can count the number of connections between files by looking at import statements. The result serves as a proxy that lets us make an educated guess about the underlying complexity.
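Counting imports is easy to automate. The sketch below is a minimal example that assumes Python source files and uses the standard-library `ast` module, so imports that appear only in strings or comments are ignored. The helper names `imported_modules` and `imports_per_file` are hypothetical. This is not necessarily how the numbers in this article were collected. The code counts each distinct imported module once and treats `from x import a, b` as a single strand to `x`.

```python
import ast
from pathlib import Path


def imported_modules(source: str) -> set[str]:
    """Return the distinct module names that a piece of Python source imports."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b.c` lists one alias per imported module
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # `from x import y, z` is one strand to module x; for relative
            # imports the leading dots are kept (module is None for `from . import y`)
            modules.add("." * node.level + (node.module or ""))
    return modules


def imports_per_file(root: str) -> dict[str, int]:
    """Map every .py file below `root` to its number of distinct imported modules."""
    counts: dict[str, int] = {}
    for path in Path(root).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="replace")
        try:
            counts[str(path)] = len(imported_modules(source))
        except (SyntaxError, ValueError):
            # skip files that fail to parse, e.g. legacy Python 2 sources
            continue
    return counts
```

For example, passing the source `import os, sys` followed by `from pathlib import Path` to `imported_modules` yields the set of `os`, `sys` and `pathlib`. Calling `imports_per_file` on a checked-out repository gives one count per file. Other languages would need their own parser or, more crudely, a pattern match on their import syntax.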
But it's by no means a one-size-fits-all solution. For instance, the Rust compiler has lots of imports with re-exports that scope private and public code into separate modules without complecting the code. There are more sophisticated ways of measuring complexity if you need them.
How many strands are too many? It depends. I would argue that the limit is about 5. You have 5 fingers on each hand, and 4 limbs and a head. The brain may not be wired to manage more without losing quality of control. Try to move as many parts of your body as possible at once: move your fingers, your toes, your head and your limbs, flex your abdominal muscles and try to recite your ABCs. Do you really feel in control? Compare this with lifting just one leg. The former is much harder than the latter. The former tires you mentally, whereas the latter tires you physically.
You can find the results of my investigation below. The majority of these open source projects keep at least 50% of their files below 5-6 imports each. These numbers suggest a practical limit to what we can effectively hold in our heads at a single time. Anything above that can be considered too complex.
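Once you have a mapping from each file to its import count, the 50% observation can be checked mechanically. The sketch below is one assumption about how such a summary might be computed, not the exact method behind the charts. The function name `import_summary` and the default `limit=5` are illustrative choices. A project meets the observation when `share_below_limit` is at least 0.5, and running the function with `limit=6` covers the upper end of the 5-6 range.

```python
from statistics import median


def import_summary(counts: dict[str, int], limit: int = 5) -> dict[str, float]:
    """Summarize a {file path: import count} mapping against an import limit."""
    values = list(counts.values())
    if not values:
        raise ValueError("counts must contain at least one file")
    # files whose number of imports stays strictly below the limit
    below = sum(1 for n in values if n < limit)
    return {
        "files": len(values),
        "median_imports": median(values),
        "share_below_limit": below / len(values),
    }


# Illustrative input only; these are not measurements from any real project.
example = {"a.py": 2, "b.py": 4, "c.py": 7, "d.py": 9}
print(import_summary(example))
# → {'files': 4, 'median_imports': 5.5, 'share_below_limit': 0.5}
```

The median is reported alongside the share because both views describe the same distribution. A median below the limit implies that at least half of the files stay under it.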
In the end, it really comes down to self-control during development. If we are about to import modules A, B and C, but in reality they should never be complected with each other, we have to ask ourselves: is there another way to model a solution for the given business problem? Or is this the most natural thing to do? This requires a lot of thinking or experience. Both are expensive, and business owners need to be understanding. If we brush over it, the tech debt will catch up with us, and debt always accrues interest. I'll leave you with that and the charts below.