In this blog series, we ask prominent developers to share their horrifying stories and experiences in dealing with codebase complexity.
The idea behind the “Coding Horrors: The Terrifying Tales of Codebase Complexity” blog series originated from our own experiences as developers. Whether you’ve been a software developer for a while or have just started, you probably understand the anxiety and terrifying “oops” moments that often occur when working in complex distributed software environments. We’re sure that you’ve dealt with codebase complexity, suffered through painful debugging sessions or made incorrect design decisions, and have your own code horror story.
After all, most developers have a tale like this to tell. For each installment, we talk to a prominent developer and ask them to share their coding horror stories. This time, Daniel Beck, a senior software developer, UX specialist in product development, and consultant, shared his stories with us.
Daniel has quite a lot of experience in software development! He’s been in the industry since the very beginning. Just to give you a clear picture, the browser used at his first job was NCSA Mosaic. During that time, all it took was learning the five HTML tags, and you were good to go. Along his journey, Daniel has developed strong opinions, and we’re sure you can learn from them.
You can check out his blog at danielbeck.net/blog. It will give him the motivation to write more articles that we can all enjoy!
Codebase complexity: Lessons learned
- Dumb, readable code is infinitely better than clever.
- Don’t listen to vendors blaming the hardware and recommending expensive server upgrades.
- Be aware of the danger of shortcuts and the importance of understanding how code works under the hood.
- Opening and closing a database connection is a slow and expensive operation.
- Consider the potential risks and implications when installing new npm modules or importing someone else’s code.
- Beware of teammates who refactor code based on personal taste without proper documentation or completeness.
- Ensure code changes are well-documented.
- Be cautious of colleagues who make undocumented changes that create subtle traps for others.
- Improve the code review processes to address flaws.
- Acknowledge that even big companies like Facebook can make mistakes, as seen in their DNS record issue on October 4, 2021.
- Don’t ever be the guy who accidentally breaks something on prod.
- Broken code is fixable. People, now, that’s another story.
What’s the worst, most horrifying experience you have with codebase complexity? What made it so nightmarish?
Daniel: This one’s only horrifying in retrospect; at the time I thought it was some of my best work. (Yes, it’s 100% my fault.)
The product in question was at heart a content management system for generating variations on a specialized type of website in bulk. I was responsible for building the front-end system that would take in the website data and pass it through a series of designed templates to output the finished site.
This was during that brief period of the late 1990s when the entire industry had collectively decided XML was The Way Things Should Be Done, so naturally I decided to build the template system in XSLT.
XSLT, if you’re not familiar with it, was a fascinatingly pure language; its purpose was to transform XML structures into other XML structures and was itself written in XML because XML was The Way Things Should Be Done.
Among other interesting challenges, it was strictly free of side effects, which from a philosophical standpoint is awesome but from a practical standpoint meant that – for example – any looping operation had to be done via recursion instead, because incrementing a variable for an iteration loop counted as a side effect, so was Not Allowed. Control flow was best done through data decomposition instead of branching logic. And so on. To people used to scripting languages and markup, it was a real brain-breaker.
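For readers who have never had to loop without a counter, here is a minimal sketch of the idea. It is a Python stand-in rather than XSLT, and the function name `render_rows` and the `<li>` output are made up for illustration; they are not from Daniel's templates. The position is passed down as a parameter of each recursive call instead of being incremented in place.

```python
def render_rows(items, index=1):
    """Number and render a list without ever reassigning a variable.

    Hypothetical example: the "loop counter" is just an argument that
    each recursive call passes along as index + 1.
    """
    if not items:
        # Base case: nothing left to render.
        return []
    head, *tail = items
    # Render the first item, then recurse on the rest of the list.
    return [f"<li>{index}. {head}</li>"] + render_rows(tail, index + 1)


if __name__ == "__main__":
    for line in render_rows(["home", "about", "contact"]):
        print(line)
```

Nothing is ever mutated here: each call sees its own `items` and `index`, which is roughly the style such a language forces on every template.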
I loved it. No longer did I need to wheedle the specific bits of data I needed from the backend guys; it would all just come at me as one gigantic wad of XML, and I could shuttle it through my increasingly gigantic wad of XSL templates to generate anything I wanted. I had so much power! I could do anything!
I was learning those advanced concepts and feeling like something of a badass for being able to do it. Over the course of a year or so, I built up a pretty substantial set of these templates which gradually got less terrible as I learned what I was doing; towards the end, I was writing highly decomposed, idiomatic code that I considered clever, and sometimes even “elegant”. Got them to the launch date, collected my contract fee and moved on to the next job feeling good about myself.
I have it on good authority that they kept those XSL templates in place, untouched, for the next five years until the product was scrapped and replaced altogether because literally no one at the company could understand how they worked. The clever, elegant code was the worst part, of course. My earlier, clumsy approaches were the most readable.
That was an important lesson. Dumb, readable code is infinitely better than clever.
Describe a horrifying architectural decision you encountered in a project. How did you deal with it?
This was during my freelancing days. One of my regular clients – a midsize corporation that outsourced a lot of development work – called me in a little bit of a panic: they’d contracted out a small project to a new vendor, the work was complete and they were nearing the launch date. Once they started replacing the test data with the real thing, though, they’d started running into major performance issues: the whole site bogged down when it had more than a tiny amount of data to work with. The vendor was blaming the hardware and recommended upgrading to a much more powerful and expensive server. The client called me in for a reality check before spending that money.
Long story short, the problem turned out to be a utility function these guys had written to make it more convenient to talk to the database: you’d feed it a SQL query, it’d open the DB connection, run the query, return the results, then tidy up and shut down the connection.
Very nice, and it did make their code very readable. But the problem, which I’m sure many readers have spotted already, is that opening and closing a database connection is a slow, expensive operation: ideally, you want to open it once, run all your queries, and then close it only when you’re completely finished. The way these guys had written their code, it was opening and closing the connection for every individual operation, meaning sometimes hundreds or thousands of times: once to load a list of data, once again for each and every item in the list. No wonder the server was bogging down!
This turned out to be an easy fix – just remove the ‘open’ and ‘close’ operations from the utility, and move them to the beginning and the end of the program instead of repeating them inside loops. But it was a good demonstration of the danger of shortcuts, and the difference between being able to get something to work, and understanding why it works and what it’s doing under the hood. They never would have had this issue if they had had to write code to open and close the database every time they were doing it, but the fact that it was tucked away in a utility out of view made it easy not to notice.
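To make the difference concrete, here is a minimal sketch of the two approaches. Everything in it is an assumption for illustration: we don't know what language or database the vendor used, so this uses Python's built-in `sqlite3` as a stand-in, and the names `run_query`, `load_items_slow`, `load_items_fast`, and the `items` table are hypothetical. SQLite connections are cheap compared to a networked database, so the sketch counts how often a connection is opened rather than timing anything.

```python
import os
import sqlite3
import tempfile

# Counts how many times a connection is opened (for illustration only).
stats = {"opens": 0}


def connect(path):
    stats["opens"] += 1
    return sqlite3.connect(path)


def run_query(path, sql, params=()):
    """The convenient utility: open, query, close, on every single call."""
    conn = connect(path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()


def load_items_slow(path):
    """One connection for the list, plus one more for each item in it."""
    ids = run_query(path, "SELECT id FROM items ORDER BY id")
    return [
        run_query(path, "SELECT name FROM items WHERE id = ?", (item_id,))[0][0]
        for (item_id,) in ids
    ]


def load_items_fast(path):
    """Open once, run every query on the same connection, close at the end."""
    conn = connect(path)
    try:
        ids = conn.execute("SELECT id FROM items ORDER BY id").fetchall()
        return [
            conn.execute(
                "SELECT name FROM items WHERE id = ?", (item_id,)
            ).fetchone()[0]
            for (item_id,) in ids
        ]
    finally:
        conn.close()


def make_demo_db(n):
    """Create a throwaway database file holding n rows of demo data."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO items (name) VALUES (?)",
            [(f"item-{i}",) for i in range(n)],
        )
    conn.close()
    return path
```

Both loaders return the same data, but for a list of n items the first opens n + 1 connections and the second opens one. Against a real database server, each of those extra opens can mean a network round trip and an authentication handshake.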
I think about this pretty much every time I install a new npm module or otherwise import someone else’s code… what otherwise sensible thing is it doing that might ruin my day?
Refactoring Disasters – A story of how you failed?
We had two separate but related web products, built in separate repositories by different teams using different code styles. We wanted to merge those into one product.
Both codebases were complex and large enough that refactoring them into a sensible, unified repository would be a long-term project; we needed a shorter-term solution in the meantime.
What we should have done is leave the code where it was, unify the UI, and have the site navigation link back and forth between each product as needed.
What we did instead was copy all the code from product A into subfolders in product B’s repository, with the intention of later on gradually refactoring the code from both to match each other, and find and eliminate duplicate functionality as we went.
This might have worked out okay! Except for the fact that the engineer assigned to copy the code over decided it would be a good idea to start doing some of that refactoring as he went; he made a number of substantial changes to the code on both sides according mostly to his own personal taste, to varying degrees of completeness.
He mostly managed to avoid breaking things in obvious ways during this process but also managed to set many subtle traps for his fellow engineers who weren’t aware of the (of course undocumented) changes he’d made. He then, of course, promptly resigned to join a rival company.
(If you’re thinking this whole situation points to major flaws in both the management and the code review processes of that organization, you are not incorrect!)
So we ended up doing that long-term refactor in double time, in public view. We got there, but it was quite a ride, involving one of the most rueful retrospectives I ever hope to be part of.
Last I heard, that company was in the process of “breaking up the monolith” and starting in on decomposing that unified front end into, uh, separate but related products.
Have you ever had to deal with a third-party library or framework that caused unexpected issues or complications?
On October 4, 2021, Facebook screwed up their DNS records and knocked themselves, and their associated APIs, offline for most of the day. This took our application down with it because it was too tightly bound to those Facebook APIs, implicitly assuming they would never be offline. I mean. It’s Facebook, right? Why would Facebook go offline?
Our frantic code rewrite to correct that took most of the day, plus five minutes – in other words, their API came back online just before we finished deploying the changes that allowed us to work without that API.
Oh well. At least we were ready for the next time.
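What might “working without that API” look like? Daniel didn't describe the actual fix, so the following is only a sketch of one common approach: treat the third-party call as something that can fail, and fall back to the last good copy of the data. The names `ThirdPartyUnavailable`, `load_feed`, and `fetch_remote` are hypothetical and do not come from any real Facebook SDK.

```python
class ThirdPartyUnavailable(Exception):
    """Raised by our (hypothetical) client wrapper when the external API is unreachable."""


def load_feed(user_id, fetch_remote, cache):
    """Return (feed, is_fresh).

    fetch_remote is any callable that talks to the third-party API and
    raises ThirdPartyUnavailable on failure; cache is a plain dict here,
    standing in for whatever storage the application already has.
    """
    try:
        feed = fetch_remote(user_id)
    except ThirdPartyUnavailable:
        # Degraded mode: serve the last good copy, or an empty feed,
        # instead of letting the outage take the whole page down.
        return cache.get(user_id, []), False
    # Remember the latest good response for the next outage.
    cache[user_id] = feed
    return feed, True
```

The `is_fresh` flag lets the caller decide how to present stale data, for example by showing a notice rather than an error page.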
Have you ever had a terrifying “oops” moment or made a coding mistake that sent shivers down your spine?
I’m pretty strict about making sure I never have access to the production server because I don’t ever want to be the guy who accidentally breaks something on prod.
I have been this guy, though: “Oh what’s this extra code still lying around unmerged? I’ll just tidy up a little bit.” I still have a screenshot from Slack that day:
That was just code, though. You can always untangle code problems. The most truly terrifying moment of my career takes us way back before Slack, before HipChat, all the way back to the email era.
This was at a small but established startup. One of our customer support people sent a late-night message to a bunch of the team, looking for advice on how to handle some issues with a particularly demanding customer. I don’t remember at all what the specific issue was, but it turned into one of those after-hours griping sessions, all of us blowing off a little steam emailing each other our complaints about how difficult these people were and how unreasonable some of their requests… The cc list kept growing, everyone from sales to engineering to the CEO got their digs in, and eventually, the CSR got the answer she needed and went to send that to the customer.
And then came this message back to the list:
“Uh… guys? Guys I think I accidentally just forwarded this whole email thread to the customer”
Cue roughly ten minutes of silence, during which everyone presumably had small heart attacks while scrolling back through the reply-all chain to see which awful statements they were going to have to personally apologize for. It turned out to be a false alarm: she hadn’t in fact sent the whole thread to the customer. But, yeah, that was a cold sweat moment I hope never to have to go through again. Broken code is fixable. People, now, that’s another story.
Final Words: Tales of Codebase Complexity
If you missed the first part of the “Code Horrors” series, you can catch up. If you can relate to any of these experiences, you’re not alone. And if you have a tale to tell, we want to hear it. Connect with us: Here.