In this blog series, we invite prominent developers to share their terrifying stories and experiences when dealing with challenges related to codebase complexity.
This is the third post in our ‘Coding Horrors’ blog series. ‘Coding Horrors’ stories are hair-raising tales based on harrowing real-life developer experiences. Since we published our last post, we’ve received plenty of feedback and reactions showing that such chilling tales are far more widespread than even we imagined. Developers shared with us unimaginable stories of rotten codebases, creativity-stifling company cultures, and misbegotten development processes. Many of these stories hold valuable lessons, while others stand out as dire warnings, red flags for others to watch out for. As always, if you’ve had your own brush with nightmarish code, we’d love to hear about it!
Today’s coding horror survivor is our very own Shachaf, Data Scientist at Digma. Shachaf’s take on our topic as a Data Scientist is unique. In his own words: “Data Scientists mostly don’t come with a background in software engineering. Most of these folks that I met in my career are lone-wolf developers of scripts (the opposite of which would be team players who develop more complex systems).”
Lessons learned: Feature creep
- Remember that a model is not likely to tell you that its inputs don’t make sense – it will just give you wrong numbers.
- If your team decides to duplicate the entire codebase to create a new feature, you have a choice between running for the hills and getting some popcorn; either way, it’s not an auspicious sign.
- Raise a flag if changes become increasingly difficult to apply, and insist on a shift in strategy.
- Lock-in is not an option: wrong decisions made in the past should be reversible.
- Remember that data passes through a sequence of processes and environments during its lifetime, and maintain close communication between the different professionals who handle it.
- Ensure that the real-time data used for model predictions is aligned with the training data.
- Make use of abstractions, e.g. scikit-learn’s or Spark’s “Pipeline”, to avoid data misalignments.
- Advocate for TDD for Data Scientists wherever possible.
- Adopt a “from now on” refactoring approach, where code is refactored only when its functionality changes, rather than attempting a full overhaul.
- Don’t assume that getting 90% of the work done means you only have 10% of the work left.
- Plan as tightly as possible before starting to code, and then tighten the plan a bit more.
- Detect feature creep early on, reject most of these creepers, and postpone them until after some initial version is operational and deployed.
What’s the worst, most horrifying experience you have with codebase complexity?
I was working in an established Data Science team at a large corporation. At the time, I had not learned to watch for warning signs and was completely unprepared for the horrors that befell me.
To begin with, there was practically no abstraction in the code. There were only a small number of files and classes, and they were overburdened with flatly laid-out code. On the one hand, the code was hosted in a Git repo, and on the surface it seemed like we were following all the GitOps best practices.
However, because the code was organized as extremely coupled long functions, routines, and files, all team members worked on the same parts of the code. The hotspots were feature engineering and learning-algorithm tuning, which made sense and could have been handled with a proper abstraction of the process (but, as I said, there was zero abstraction).
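What might “a proper abstraction of the process” look like? Here is a minimal sketch under our own assumptions (it is not the team’s actual code): each feature is a small, independently registered function, and an experiment is just a list of enabled feature names, so two teammates working on different features edit different functions instead of the same long routine. All names here (FEATURES, feature, build_features, MODEL_CONFIG, and the price/units fields) are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]

# Hypothetical registry: each feature is an independent, named function,
# so teammates can add or change features without editing the same code.
FEATURES: Dict[str, Callable[[Record], float]] = {}


def feature(name: str):
    """Decorator that registers a feature function under the given name."""
    def register(fn: Callable[[Record], float]) -> Callable[[Record], float]:
        FEATURES[name] = fn
        return fn
    return register


@feature("price_per_unit")
def price_per_unit(r: Record) -> float:
    return r["price"] / r["units"]


@feature("is_bulk")
def is_bulk(r: Record) -> float:
    return 1.0 if r["units"] >= 10 else 0.0


def build_features(r: Record, enabled: List[str]) -> Record:
    """Compute only the enabled features: an experiment is a config, not a copy."""
    return {name: FEATURES[name](r) for name in enabled}


# Model tuning lives in its own (hypothetical) config, apart from feature code.
MODEL_CONFIG = {"learning_rate": 0.1, "max_depth": 4}

print(build_features({"price": 50.0, "units": 10.0}, ["price_per_unit", "is_bulk"]))
```

With this layout, trying a new feature variant means registering one more function and adding its name to the enabled list, rather than copying a whole folder.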
The collisions that arose were too much to bear, so the team decided to (I kid you not) duplicate the entire codebase. Many times over. And commit each copy to a different folder. From now on, this will live in infamy as the ‘feature per folder’ atrocity.
So there I was, duplicating the codebase to create a new feature. And when a version was done, I just duplicated it again to work on the next version. I had to keep an eye on every teammate who committed a final change (one that had passed A/B testing successfully) to their version, so I could change my version accordingly. And you’d better believe it: once a week we had an “oh no” moment when one teammate realized they had forgotten to synchronize their version with another’s.
Changes were becoming increasingly difficult to apply, and two different teammates might try to commit conflicting versions to production after they were tested in parallel. Merging those was a nightmare. Every. Single. Time.
It felt like wading through a bog, sinking deeper into the muck with each iteration. Perhaps worst of all, we were by now locked into this work process. Any overhaul or refactor would quickly become outdated, and each teammate would eventually have to translate their changes to the new abstraction on their own. The mere idea of doing it was considered taboo, and it was impossible to get buy-in from either management or teammates, as everyone just wanted to race to the next success.
This situation was never amended. As far as I know, they are still working like that, years later.
What was the scariest bug you encountered in a codebase?
As I mentioned before, in Data Science most of the people I met are not trained in software engineering practices. So when I tell them about Test-Driven Development, they usually say “Oh nice, what do we need it for?”
The thing is, in Data Science it is absolutely necessary to be sure that both your own code and other engineers’ code are correct. The usual multi-step process the data goes through is infested with chances to get “it” wrong.
Take for example some real-time system that uses a machine learning model that was trained offline:
- First, we have some form of data collection involving an ETL (note the T, which stands for “transform”) that prepares the data to be database-ready and inserts it. This is usually written by a data engineer.
- Second, a data scientist pulls out some data, applies further transformations, a.k.a. “feature engineering”, and uses the result to train the model.
- Eventually, a real-time service uses the model for predictions. But this service starts over with the data in its raw form. And chances are that neither the data engineer nor the data scientist is the one writing this service; some other software engineer is.
Note that each transformation is performed in a different environment.
The problem intensifies if you consider aggregations or time-dependent metadata that the model might use. Those may have to be persisted and managed so that the model’s predictions are based on the same data distribution the model was trained on.
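To make the “persist and manage” point concrete, here is a minimal sketch assuming the simplest possible aggregation: a mean and standard deviation used to normalize one feature. The statistics are computed once on the training data, stored next to the model (JSON here, as an assumption), and reloaded by the serving side instead of being recomputed on live data. The function names and values are made up for illustration.

```python
import json
from statistics import mean, pstdev
from typing import Dict, List


def fit_scaling_stats(values: List[float]) -> Dict[str, float]:
    """Compute normalization statistics once, on the training data."""
    return {"mean": mean(values), "std": pstdev(values)}


def scale(value: float, stats: Dict[str, float]) -> float:
    """Apply the *training* statistics; never recompute them on live data."""
    return (value - stats["mean"]) / stats["std"]


# Training side: compute the statistics and persist them with the model.
training_values = [10.0, 20.0, 30.0]
stats = fit_scaling_stats(training_values)
serialized = json.dumps(stats)  # e.g. stored next to the model artifact

# Serving side: load the persisted statistics rather than recomputing them.
serving_stats = json.loads(serialized)
print(scale(40.0, serving_stats))
```

If the real-time service instead recomputed the mean and standard deviation over whatever small batch of live data it happened to see, the model would silently receive inputs on a different scale from the one it was trained on.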
The number of independent parts that have to move in unison to get a proper prediction from the model is too dang high.
Some abstractions, e.g. scikit-learn’s or Spark’s “Pipeline”, try to bridge some of these gaps by encapsulating the transformations so they can be deployed wherever they’re needed. However, the last time I proposed this to a colleague, they spent a few hours trying to wrap their head around the API and decided against using it a day later. Mind you, once again, these are data scientists, not programmers.
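For readers who haven’t used it, here is a minimal sketch of the scikit-learn variant (Spark’s pyspark.ml Pipeline is a separate API and is not shown). The scaler and the model are fitted together and shipped as one object, so the real-time side feeds in raw data and the learned transformation is applied automatically. The toy data is made up, and pickle is just one possible way to hand the artifact to the serving side.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data (made up for illustration): raw, unscaled features.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 10.0]])
y_train = np.array([0, 0, 1, 1])

# The scaler and the model travel together as one object, so the exact
# transformation learned at training time is reused at prediction time.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Persist the whole pipeline (not just the model) for the real-time service.
artifact = pickle.dumps(pipeline)

# Serving side: load it and pass *raw* data; scaling happens inside.
served = pickle.loads(artifact)
X_live = np.array([[1.5, 190.0], [8.5, 15.0]])
print(served.predict(X_live))
```

The design point is that nobody on the serving side has to re-implement StandardScaler by hand; the only contract between the teams is “send raw features in this column order”.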
The cherry on top of these grave mistakes is that they are silent. A model is not likely to tell you that its inputs don’t make sense. Even with some input validation, in most cases you just won’t see it while it happens. It’s not like an index-out-of-bounds error or a beloved segmentation fault that crashes the process. Everything stays dandy and no one sounds the alarm. You simply get the wrong numbers.
In my career, I have encountered these data bugs a few too many times. At one point, and only after a change in tech leadership, we realized that we had been living with this kind of data misalignment for 18 months. Two months later, we were finally able to lift our KPIs in a reproducible way, for the first time in the life of the team.
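Input validation won’t catch every misalignment, as noted above, but a loud check can at least turn the grossest ones into crashes instead of wrong numbers. The sketch below assumes a hypothetical schema captured at training time (feature names plus the min/max seen in training) and raises as soon as a live row is missing a feature, has an unexpected one, or falls outside the training range, for example a spend value that arrives in cents instead of dollars. TRAINING_RANGES, InputMismatchError, and validate_inputs are illustrative names, not part of any library.

```python
from typing import Dict, Tuple

# Hypothetical schema captured at training time:
# feature name -> (min, max) of the values seen in the training data.
TRAINING_RANGES: Dict[str, Tuple[float, float]] = {
    "age": (18.0, 90.0),
    "monthly_spend": (0.0, 5000.0),
}


class InputMismatchError(ValueError):
    """Raised when live inputs don't look like the training data."""


def validate_inputs(row: Dict[str, float]) -> Dict[str, float]:
    """Fail loudly instead of letting the model silently score bad inputs."""
    missing = TRAINING_RANGES.keys() - row.keys()
    unexpected = row.keys() - TRAINING_RANGES.keys()
    if missing or unexpected:
        raise InputMismatchError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    for name, (low, high) in TRAINING_RANGES.items():
        if not low <= row[name] <= high:
            raise InputMismatchError(
                f"{name}={row[name]} outside training range [{low}, {high}]"
            )
    return row


validate_inputs({"age": 35.0, "monthly_spend": 120.0})  # passes
```

A range check like this is deliberately crude; it catches unit and schema mix-ups, not subtle distribution shifts.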
Since then, I have been advocating TDD for Data Scientists wherever possible. I want to test that the transformations the data engineer applies in the ETL, and those that I apply, are reflected in the real-time system.
There are several difficulties and caveats to this approach, but that is a story for another post.
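As a rough illustration of that kind of test, here is a minimal sketch assuming the ETL job and the real-time service each carry their own copy of the “same” cleaning step (both functions and the country-code example are hypothetical). The test pins the two implementations to each other on a handful of tricky raw inputs, so a change on one side that isn’t mirrored on the other fails the build instead of silently skewing predictions. It is written in pytest style but also runs as a plain script.

```python
# Hypothetical: the ETL job and the real-time service each implement the
# "same" cleaning step in their own codebase and environment.
def etl_normalize_country(raw: str) -> str:
    return raw.strip().upper()


def realtime_normalize_country(raw: str) -> str:
    return raw.strip().upper()


def test_transformations_agree():
    # Raw inputs chosen to exercise whitespace and casing edge cases.
    samples = ["us", " US ", "Us\n", "de", "\tFr"]
    for raw in samples:
        assert etl_normalize_country(raw) == realtime_normalize_country(raw), raw


if __name__ == "__main__":
    test_transformations_agree()
    print("transformations agree")
```

Where both sides can share one importable function, that is simpler still; a test like this earns its keep when the steps live in different environments and can’t share code.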
Refactoring – A story of how you failed?
I once landed in a small team that had been working on the same project for about two years, with many small features integrated over time.
The structure and architectural design of the project were NULL, for all practical purposes non-existent. Just a few examples:
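- Objects were passed to functions that sometimes mutated them in place and sometimes returned a mutated copy, so the code looked something like this:

```python
# Which of these mutate obj in place and which return a modified copy?
# You had to read each function to find out.
obj = f1(obj)
obj = f2(obj)
f3(obj)
obj = f4(obj)
```

- Comments such as “Don’t change this value”, “TODO: remove this”, and “Call this function first”.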
- Dead code was abundant: sometimes inside an if False: clause, sometimes commented out, and sometimes hidden behind conditions that could never be met.
- Helper functions were kept next to the first place they were ever needed, with no attempt at bundling them up in separate files.
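One way out of the mutate-or-copy guessing game is to pick a single convention and let the language enforce it. The sketch below assumes the convention “every step takes an object and returns a new one”: a frozen dataclass makes accidental in-place mutation raise an error, and dataclasses.replace produces the modified copy. Sample, lowercase, tokenize, and run_steps are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Sample:  # hypothetical record type; frozen forbids in-place mutation
    text: str
    tokens: Tuple[str, ...] = ()


def lowercase(s: Sample) -> Sample:
    # Returns a new Sample; the input is left untouched.
    return replace(s, text=s.text.lower())


def tokenize(s: Sample) -> Sample:
    return replace(s, tokens=tuple(s.text.split()))


def run_steps(s: Sample, *steps: Callable[[Sample], Sample]) -> Sample:
    """Apply each step in order; every step follows the same copy convention."""
    for step in steps:
        s = step(s)
    return s


original = Sample("Hello World")
result = run_steps(original, lowercase, tokenize)
print(result.tokens, original.text)
```

With one convention in place, the caller never has to wonder whether a line like f3(obj) changed anything: if it doesn’t assign the result, it did nothing to obj.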
It seemed like no one had ever stopped to raise a flag about how difficult it was to add features and change the code, and if anyone had, they had probably failed to make any impactful change.
One more thing: only one team member had worked on the project continuously since its inception. Another red flag!
So I embarked on a refactoring journey, fully aware that this task may take a while and be met with numerous pitfalls and objections along the way.
I sat down with the most knowledgeable teammate for a week, scrutinizing the code and integrating its needs into more solid structures.
I completed 90% of the job in less than two weeks. I was making good progress. I was almost there, showing my work to the team and being applauded for it.
Along came another feature.
That teammate of mine forgot about some special cases. He rethought some of the decisions we made while planning and chose to flip them. He then suggested I implement some of the new stuff he was working on. He also decided that before finalizing the project we should add newer stuff to the new architecture.
This was the worst kind of feature creep (a term I was not aware of at the time) I have ever experienced, as features were attacking my work from both the past and the future. This project was never finished, and to this day it probably lies hidden in some branch in that repo.
Since then, I learned to detect feature creep early on, and just reject most of these creepers, postponing them until after some initial version is operational.
I also learned to judge whether an overhaul refactor is needed, or whether a simpler, more laid-back “from now on” refactoring is better, where code is refactored only when its functionality changes.
Final Words: Tales of Refactoring and Feature Creep
If you missed the first part of the “Coding Horrors” series, you can still catch up. We’d love to hear your tale as well. If you want to participate in our blog series, shoot us an email.