Coding Horrors: The Tales of Late Feedback Cycles and Development Uncertainty

In this blog series, we invite prominent developers to share their terrifying stories and experiences when dealing with challenges related to codebase complexity.
This is the third post in our ‘Coding Horrors’ blog series. ‘Coding Horrors’ stories are hair-raising tales based on real-life harrowing developer experiences. Since we published our last post, we’ve received a lot of feedback and reactions revealing that such chilling tales are far more widespread than even we imagined. Developers shared with us unimaginable stories of rotten codebases, creativity-stifling company culture, and misbegotten development processes. Many of these stories hold valuable lessons, while others stand out as a dire warning, a red flag for others to watch out for. As always, if you’ve had your own brush with nightmarish code, we’d love to hear about it!
Today’s coding horror survivor is our very own Shachaf, Data Scientist at Digma. Shachaf’s take on our topic as a Data Scientist is unique. In his own words: “Data Scientists mostly don’t come with a background in software engineering. Most of these folks that I met in my career are lone-wolf developers of scripts (the opposite of which would be team players who develop more complex systems).”
I was working on an established Data Science team at a large corporation. At the time, I had not learned to watch for warning signs, and I was completely unprepared for the horrors that befell me.
To begin with, there was practically no abstraction in the code. There was only a small number of files and classes, each overburdened with flatly laid-out code. On the one hand, the code was hosted on a Git repo, and on the surface it seemed like we were following all of the GitOps best practices.
However, because the code was organized into long, extremely coupled functions, routines, and files, all team members worked on the same parts of the code. The hotspots were feature engineering and learning-algorithm tuning, which makes sense and could have been handled with a proper abstraction of the process (but, as I said, there was zero abstraction).
The collisions that arose were too much to bear, so the team decided to (I kid you not) duplicate the entire codebase. Many times over. And commit each copy in a different folder. From now on this will live in infamy as the ‘feature per folder’ atrocity.
So there I was, duplicating the codebase to create a new feature. And when a version was done, I just duplicated it again to work on the next one. I had to keep an eye on the work of every other teammate who committed a final change (one that passed A/B testing successfully) to their version, so I could change my version accordingly. And you’d better believe it, once a week we had an “oh no” moment when one teammate realized they had forgotten to synchronize their version with another’s.
Changes were becoming increasingly difficult to apply, and two different teammates might try to commit conflicting versions to production after they were tested in parallel. Merging those was a nightmare. Every. Single. Time.
It felt like wading through a bog, sinking deeper into the muck with each iteration. Perhaps worst of all, we were by now locked into this work process. Any overhaul or refactor would quickly become outdated, and each teammate would eventually have to translate their changes to the new abstraction themselves. The mere idea of doing it was considered taboo, and it was impossible to get buy-in from either management or teammates, as everyone just wanted to race to the next success.
This situation was never amended. As far as I know, they are still working like that, years later.
As I mentioned before, in Data Science most of the people I met are not trained in software engineering practices. So when I tell them about Test-Driven Development (TDD), they usually say “Oh nice, what do we need it for?”
The thing is, in Data Science it is absolutely necessary to be sure of the correctness of your and other engineers’ code. The usual multi-step process the data goes through is infested with chances to get “it” wrong.
Take, for example, a real-time system that uses a machine learning model that was trained offline:
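Since I can’t share the real thing, here’s a minimal, hypothetical sketch of that chain — the data, column names, and functions are made up purely for illustration:

```python
import pandas as pd

# --- Offline, in the training environment ----------------------------------
def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    # Example transformation; the same logic must exist wherever predictions happen.
    out = df.copy()
    out["age_bucket"] = (out["age"] // 10).clip(upper=8)
    return out[["age_bucket", "income"]]

# In reality this arrives from a separate ETL job; stubbed inline here.
history = pd.DataFrame({"age": [34, 51, 23],
                        "income": [48_000, 91_000, 30_000],
                        "label": [0, 1, 0]})
training_set = engineer_features(history)   # the model is trained offline on this

# --- Online, in the real-time service (different codebase, different env) ---
def features_for_request(request: dict) -> pd.DataFrame:
    # In practice this is a re-implementation living in the service's codebase;
    # if it drifts from engineer_features above, predictions silently degrade.
    row = pd.DataFrame([request])
    row["age_bucket"] = (row["age"] // 10).clip(upper=8)
    return row[["age_bucket", "income"]]

print(features_for_request({"age": 42, "income": 55_000}))
```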
Note that each transformation is performed in a different environment.
The problem intensifies if you consider some aggregations or time-dependent metadata that might be used for the model. Those may have to be persisted and managed so the model predictions are based on the same data distribution that the model was trained with.
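A made-up example of what I mean: a per-customer average computed at training time has to be persisted and served, not recomputed on the fly from whatever data the service happens to see:

```python
import json
import pandas as pd

# Training time: compute the aggregate over the full history and persist it.
history = pd.DataFrame({"customer": ["a", "a", "b"], "amount": [10.0, 30.0, 5.0]})
avg_amount = history.groupby("customer")["amount"].mean().to_dict()
with open("avg_amount.json", "w") as f:
    json.dump(avg_amount, f)

# Prediction time (different process, maybe a different machine): load the
# exact same values, so the model sees the distribution it was trained on.
with open("avg_amount.json") as f:
    stored_avg = json.load(f)
feature = stored_avg.get("a", 0.0)   # feature value for customer "a"
```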
The number of independent parts that have to move in unison to get a proper prediction from the model is too dang high.
Some abstractions, e.g. scikit-learn’s or Spark’s “Pipeline”, try to bridge some of these gaps by encapsulating these transformations so they are deployable everywhere they’re needed. However, the last time I proposed this to a colleague they tried to wrap their head around the API for a few hours and decided against using it a day later. Mind you once again, these are data scientists, not programmers.
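To give a flavor of what that encapsulation looks like, here is a bare-bones scikit-learn Pipeline sketch — the columns and model choice are made up, not from any real system:

```python
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# One object carries the preprocessing and the model together, so the exact
# same transformations travel with the model to every environment.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

X = pd.DataFrame({"age": [34, 51, 23], "income": [48_000, 91_000, 30_000]})
y = [0, 1, 0]
pipeline.fit(X, y)

joblib.dump(pipeline, "model.joblib")     # ship this single artifact
serving = joblib.load("model.joblib")     # the real-time service loads and calls it
print(serving.predict_proba(X.head(1)))
```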
The cherry on top of these grave mistakes is that they are silent. A model is not likely to tell you that its inputs don’t make sense. Even with some input validation, in most cases you just won’t see it while it happens. It’s not like an index-out-of-bounds error or our beloved segmentation fault, which crash the process. Everything stays dandy and no one rings the alarm. You simply get the wrong numbers.
In my career, I encountered these data bugs a few too many times. At one point, and only after a change of tech leadership, we realized that we had had this kind of data misalignment for 18 months. Two months later we were finally able to lift our KPIs in a reproducible way, for the first time in the life of the team.
Since then, I have been advocating TDD for Data Scientists wherever possible. I want to test that the transformations the data engineer applies in the ETL, and those that I apply, are mirrored faithfully in the real-time system.
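The tests themselves can be tiny. A hypothetical pytest sketch — the two functions are stubbed inline here, whereas in real life they live in the ETL repo and in the serving repo:

```python
import pandas as pd
import pandas.testing as pdt

# Stand-ins for two implementations that live in different codebases:
# one in the ETL/training code, one in the real-time service.
def offline_features(raw: pd.DataFrame) -> pd.DataFrame:
    out = raw.copy()
    out["age_bucket"] = (out["age"] // 10).clip(upper=8)
    return out[["age_bucket", "income"]]

def online_features(raw: pd.DataFrame) -> pd.DataFrame:
    out = raw.copy()
    out["age_bucket"] = (out["age"] // 10).clip(upper=8)
    return out[["age_bucket", "income"]]

def test_offline_and_online_features_agree():
    raw = pd.DataFrame({"age": [34, 51, 23], "income": [48_000, 91_000, 30_000]})
    # If the two implementations drift apart, this fails loudly instead of
    # silently shifting the input distribution the model sees.
    pdt.assert_frame_equal(offline_features(raw), online_features(raw))
```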
There are several difficulties and caveats along the way, but that is a story for another post.
I once landed in a small team that had been working on the same project for about two years, with many small features integrated over time.
The structure and architectural design of the project were NULL, for all practical purposes non-existent.
It seemed like no one had ever stopped and raised a flag about how difficult it was to add features and change the code, and if anyone had, they had probably failed to make any impactful change.
One more thing: there was only one team member who had worked on the project continuously since its inception. Another red flag!
So I embarked on a refactoring journey, fully aware that this task may take a while and be met with numerous pitfalls and objections along the way.
I sat down with the most knowledgeable teammate for a week, scrutinizing the code and integrating its needs into more solid structures.
I completed 90% of the job in less than two weeks and was making good progress. I was almost there, showing my work to the team and being applauded for it.
Along came another feature.
It turned out that teammate of mine had forgotten about some special cases. He rethought some of the decisions we had made while planning and chose to flip them. He then suggested I implement some of the new stuff he was working on. He also decided that, before finalizing the project, we should add newer features to the new architecture.
This was the worst kind of feature creep (a term I was not aware of at the time) I have ever experienced, as features were attacking my work from both the past and the future. This project was never finished, and to this day it probably lies hidden in some branch in that repo.
Since then, I learned to detect feature creep early on, and just reject most of these creepers, postponing them until after some initial version is operational.
I also learned to choose whether an overhaul-refactor is needed, or a simpler and more laid-back “from now on” refactoring is better, where code is refactored only when its functionality changes.
If you missed the first part of the ‘Coding Horrors’ series, you can catch up. We’d love to hear your tale as well. If you want to participate in our blog series, shoot us an email.