Such A Small Thing: Navigating the Growing Pains of AI-Generated Code

In my early days in pre-sales roles, we had a dedicated team selling testing solutions. The sales guy usually started his presentation (after the other solutions had been presented) with: “It is like in real life – testing always comes at the end, there is never enough money for it, and because of the previous phases there is never enough time…” He had to hurry to get his part done and his message across.

Nowadays it is completely different, because we have AI to generate shiny code that is completely free of flaws, right? Unfortunately, we are not there yet.

Generating code is not a problem anymore – thousands of lines of it. Generating secure code, however, is not necessarily a strength. I have seen LLM output with good-looking code, but with a path traversal issue. After being asked to check for security problems and provide a fix, the model reported that it had found the path traversal problem. But why had it not avoided the problem in the first place?
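
To make this concrete, here is a minimal sketch of the kind of flaw I mean. The function and directory names are hypothetical, not the actual generated code, but the pattern is the classic one: user input is joined onto a base directory without checking where the resulting path ends up.

```python
import os

BASE_DIR = "/var/app/uploads"  # hypothetical upload directory

def read_file_unsafe(filename: str) -> bytes:
    # Typical generated code: looks clean, but a filename like
    # "../../etc/passwd" escapes BASE_DIR (path traversal).
    path = os.path.join(BASE_DIR, filename)
    with open(path, "rb") as f:
        return f.read()

def read_file_safe(filename: str) -> bytes:
    # A common fix: resolve the path and verify it still lies
    # inside the base directory before opening it.
    path = os.path.realpath(os.path.join(BASE_DIR, filename))
    if not path.startswith(os.path.realpath(BASE_DIR) + os.sep):
        raise ValueError("path traversal attempt blocked")
    with open(path, "rb") as f:
        return f.read()
```

The safe variant costs two extra lines – the model can clearly write them, it just did not do so unprompted.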

Another example is code generated by an agent from a user story, with a second agent (from the same vendor, by the way) fixing it afterwards. Okay, I want to be fair – we are still in the early days of GenAI, but do we really want to reproduce in our tooling the same developer characters we already have with humans?

After talking to a real person, it is easy to categorize their coding personality. There is the prototyper, doing things quick and dirty (but it does the job); the well-balanced personality of a full-stack developer; and the more architectural type, balancing art and engineering in the style of a principal full-stack developer. All of these types create code that has issues, is hard to maintain, or is inefficient.

However, the vast majority of generated code smells – for example, redundant code duplicated in various parts of the application, or code that is never used. Although this is not a security risk, it still causes problems along the lifecycle.
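
As a small illustration (the names are invented, not taken from any real generated project), this is the kind of smell a reviewer or static analyzer flags immediately, even though nothing here is exploitable:

```python
import json
import re  # unused import: a typical leftover in generated code

def load_user(path):
    with open(path) as f:
        return json.load(f)

def load_config(path):
    # Redundant: byte-for-byte the same logic as load_user, generated
    # separately in another part of the application instead of reused.
    with open(path) as f:
        return json.load(f)
```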

There is an excellent blog post from SonarSource on some LLMs and their coding personalities.

It is interesting that some LLMs do change their behavior when you tell them to. A nice kick in the butt like “I’ve told you multiple times to avoid security flaws, so do it!” can give the entire process a new spin. Companies that are in the process of adopting GenAI in their development should carefully select which LLM they want to use as a base. Blog posts like the one above can help with the decision, but may be outdated the moment they are published; the final decision is still up to you. Here is the link to the OWASP 2025 Top 10 Risk & Mitigations for LLMs and Gen AI Apps (LLMRisks Archive – OWASP Gen AI Security Project). If the LLM in your PoC is not making those mistakes, you have a winner!

Meanwhile, there is no choice – generated code must be validated and reviewed on its way into production, and that does not start after committing your source code. Continuous checks must run in the developer’s IDE, and quality gates must be passed along the entire CI/CD cycle. The more a piece of software endangers the lives of users (physically or socially), the more rigorous the checks must be.
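
As a minimal sketch of such a gate – assuming a Python codebase and the open-source Bandit scanner; your stack will differ – a pipeline step can simply fail the build when the scanner reports findings:

```python
import subprocess
import sys

def security_gate() -> int:
    # Run Bandit recursively over the repository; -ll limits the
    # report to medium- and high-severity findings. Bandit exits
    # non-zero when it finds issues, so this step fails the build
    # and the code is stopped before moving further down the
    # CI/CD cycle.
    result = subprocess.run(["bandit", "-r", ".", "-ll"])
    return result.returncode

if __name__ == "__main__":
    sys.exit(security_gate())
```

The same idea applies to any stack: pick the scanner for your language, wire it into every stage from the IDE to the release pipeline, and make its verdict binding.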

Let’s think about GenAI LLMs like young kids – they will grow up and mature, but until then they need proper guardrails so they do not get lost along the way.

Yours sincerely,

Rainer Heinold

Reach us

ASERVO Software GmbH 

Konrad-Zuse-Platz 8

81829 München, Germany

Tel: +49 89 7167182 – 40

Fax: +49 89 7167182 – 55

E-Mail: Kontakt@aservo.com
