AI is revolutionizing the way developers work. Tools like GitHub Copilot and ChatGPT are already contributing lines of code to our codebases, rather than requiring developers to write every line themselves. In a matter of months, the internet has been flooded with striking examples of how AI tools are transforming the coding experience: saving time and effort and reducing mistakes during development.
The technological potential is enormous. AI can offload mundane and repetitive tasks from developers. Today’s AI platforms can write trivial code scaffolding, run basic tests, identify bugs, perform straightforward refactoring, write documentation, and much more. Relieving developers of these efforts frees up time for higher-level coding, which can improve code quality, boost innovation, and help engineering teams build better products.
Delegating such menial tasks to AI also makes for a better developer experience. When GitHub polled Copilot users about their experience, 60% felt more fulfilled and less frustrated when using Copilot, and 74% said they were able to focus on more satisfying work.
Finally, AI-generated code could also help an industry suffering from a global developer shortage. By becoming more productive, organizations can accomplish ‘more with less’, and legacy industries can introduce code into their operations more simply and quickly, accelerating digital transformation.
40% of Code — Not Touched by Humans
It’s no wonder, then, that AI has been gaining popularity as part of the developer day-to-day. GitHub Copilot gained 400,000 users in its first month of general availability, June 2022, following 1.2 million developers in its technical preview. For those developers, GitHub claims, Copilot “became an indispensable part of their daily workflows. In files where it’s enabled, nearly 40% of code is being written by GitHub Copilot in popular coding languages.”
The Uncanny Valley of Production Readiness
And yet, countless examples of errors, ranging from hilarious to worrying, raise the question: is AI-generated code ready to be trusted in production? From slight mishaps to blatant beginner-level mistakes, AI coding errors can be frustratingly difficult to spot, especially as some generative models are known to ‘make things up confidently’. Machine-generated code carries too high a probability of being insecure and exploitable by adversaries, crashing production, creating performance bottlenecks, failing to scale, or just plain not working.
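As a hypothetical illustration (a sketch, not a real Copilot output), here is the kind of subtle mistake that is easy to miss in review: a token generator that passes every functional test yet draws from a predictable, non-cryptographic random source.

```python
import random
import secrets

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

# A plausible machine-generated snippet: it "works" and looks clean,
# but random.choices() is seeded from a predictable PRNG and is
# documented as unsuitable for security purposes.
def insecure_token(length=16):
    return "".join(random.choices(ALPHABET, k=length))

# The version a careful human reviewer would insist on: the secrets
# module draws from the OS cryptographic random source.
def secure_token(length=16):
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(insecure_token(), secure_token())
```

Both functions return a 16-character token, and no unit test comparing lengths or character sets would ever tell them apart; only a reviewer who knows the difference between `random` and `secrets` catches the flaw.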
Even the most adamant Copilot and ChatGPT aficionados seem to agree that AI-generated code still needs a human to review it. Human review is required, first and foremost, for quality. AI code is generated in a way that is opaque to users, meaning developers don’t know how or why the code was generated as it was. Unlike when working with a human copilot or pair programmer, developers can’t easily retrace the coding steps or ask why a line of code was written a certain way.
What we do know about AI code is that Copilot, for example, was trained on public and open-source GitHub repositories. The answers it provides tend to be the average or most common responses. [And as we all know, ‘Mediocrity knows nothing higher than itself, but talent instantly recognizes genius.’ (Arthur Conan Doyle)] In other cases, Copilot pieces together code snippets from a variety of libraries.
Given the error potential of this method (and real examples of errors), it’s understandable why engineering managers would require a trustworthy party (i.e., a human developer on their team) to review the code before agreeing to push it to production.
Another issue concerning the online developer community is copyright. Using code from open-source libraries is allowed only under certain licensing conditions, and as of now, AI code generators don’t appear to abide by those conditions. This means that companies using the generated code may be violating open-source licensing terms, which could have legal repercussions.
Given these challenges, some developers wish to dismiss the concept of AI-generated code altogether: it’s too risky, of insufficient quality, and requires humans to do the maintenance and debugging work, which makes it not cost-effective.
Does AI Have a Bright Future Ahead?
Where does this leave us? Should developers relinquish AI until OpenAI, Microsoft and all the other AI contenders improve their algorithms for coding? Or perhaps the organizations that are refusing to use AI code generators are being short-sighted and missing out on the potential and opportunity of AI?
Forfeiting AI seems hasty. While errors do exist, the potential is obvious. Yet organizations cannot be expected to pay the price of AI trial and error. Instead, we need to find a way to leverage AI to generate trustworthy code, today.
Let AI Test the AI
Given the potential of the technology, it does not make sense to give up so easily. As adoption grows, the developer community will start looking into technological solutions that can solve technological gaps. In that sense, the very technology we are using to generate code might also be able to help remedy some of its shortcomings.
What if, instead of relying on humans, we leveraged similar AI technologies for code evaluation, rather than only for generation? AI can be used not just to write code, but to assess code runtime behavior based on observability, metrics, and functional tests (some of which may themselves be generated!). It can further improve our confidence in the code by testing how it works at scale and validating that it is truly production-ready. Furthermore, it can keep track of the generated code in CI, staging, or production after it’s released, identifying unusual errors, performance degradations, or telltale signs that something is amiss.
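A minimal sketch of the idea, with all names hypothetical: an evaluation harness takes a (possibly AI-generated) function, a set of machine-generated input/expected-output test cases, and a latency budget, and produces a report that a CI gate could act on before the code ever reaches production.

```python
import time

# Hypothetical AI-generated function under evaluation.
def ai_generated_slugify(title: str) -> str:
    """Convert a title to a URL slug."""
    return "-".join(title.lower().split())

# Machine-generated functional tests: pairs a second model might
# derive from the function's docstring and signature.
generated_cases = [
    ("Hello World", "hello-world"),
    ("  Spaces  everywhere ", "spaces-everywhere"),
    ("Already-a-slug", "already-a-slug"),
]

def evaluate(fn, cases, max_latency_s=0.01):
    """Run functional tests plus a crude runtime check; return a report."""
    report = {"passed": 0, "failed": [], "slow": []}
    for arg, expected in cases:
        start = time.perf_counter()
        result = fn(arg)
        elapsed = time.perf_counter() - start
        if result == expected:
            report["passed"] += 1
        else:
            report["failed"].append((arg, expected, result))
        if elapsed > max_latency_s:
            report["slow"].append((arg, elapsed))
    return report

report = evaluate(ai_generated_slugify, generated_cases)
print(report)
```

In a real pipeline the `failed` and `slow` lists would block the merge, and the same harness could keep running against staging or production traffic to catch regressions after release.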
Treading Cautiously Into the Future
While the potential benefits of AI-generated code are clear, it is essential to approach it with caution. Developers need to ensure that the AI-generated code they use is reliable and secure and that it is well-suited for their specific use case.
Without the right automated and continuous approach to evaluating machine-made code, we might spend more time double- and triple-checking its validity than we would have spent writing it ourselves. However, if we do manage to develop this complementary technology, we will truly be able to unlock generative AI’s full potential.