
Late to the game

I'm late to the LLM game. But that's not unusual for me: I had a survey course my senior year at university and didn't pick up the language it covered in earnest until decades later.

I've had minor side gigs doing security-focused source code review since my time at Matasano. One came my way this past summer, and I asked Gemini to assist in the review.

Rather than going strictly agentic, I convinced Gemini to create Common Lisp source code to do the orchestration and reporting. For each query, this code assembles the subject code (one file at a time), a context, and one of the (now 36) security criteria, such as "Analyze this file specifically for SQL Injection vulnerabilities" or "Analyze for insecure data handling permitting remote attackers to force SSRF".
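
As a rough illustration of that per-file, per-criterion fan-out, here is a minimal sketch. The names build-prompt, review-file, and the ask-fn callback are hypothetical stand-ins, not the actual tool:

    ;; Hypothetical sketch of the per-file, per-criterion fan-out.
    (defparameter *criteria*
      '("Analyze this file specifically for SQL Injection vulnerabilities."
        "Analyze for insecure data handling permitting remote attackers to force SSRF."))

    (defun build-prompt (context file-text criterion)
      (format nil "~a~%~%--- SOURCE FILE ---~%~a~%~%--- TASK ---~%~a"
              context file-text criterion))

    (defun review-file (path context ask-fn)
      "Ask every criterion about one source file; ASK-FN submits a prompt to Gemini."
      (let ((file-text (uiop:read-file-string path)))
        (loop for criterion in *criteria*
              collect (funcall ask-fn (build-prompt context file-text criterion)))))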

The orchestration code also includes a specific recovery handler for UTF-16LE files, which are often produced by Windows programs.
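
A minimal sketch of such a handler, assuming UIOP and an implementation (such as SBCL) that supports the :utf-16le external format; the real handler may look quite different:

    ;; Hypothetical recovery: retry a failed UTF-8 read as UTF-16LE.
    (defun read-source-file (path)
      (handler-case
          (uiop:read-file-string path :external-format :utf-8)
        (error ()
          ;; Files written by Windows tools are often UTF-16LE.
          (uiop:read-file-string path :external-format :utf-16le))))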

The answer to a given question comes back in 30 to 45 seconds. The results of these multiple questions are then combined into one file per source file. At the end of the run, a redundancy-reduction pass is applied to each combined file.
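
A sketch of what that combine-and-reduce step might look like; combine-findings, reduce-redundancy, and the ask-fn callback are hypothetical names:

    ;; Hypothetical sketch: merge per-criterion answers for one source file,
    ;; then ask the model to collapse duplicate findings.
    (defun combine-findings (answers)
      (format nil "~{~a~^~%~%----~%~%~}" answers))

    (defun reduce-redundancy (combined ask-fn)
      (funcall ask-fn
               (format nil "Remove redundant or duplicate findings from this report, keeping the most specific version of each:~%~%~a"
                       combined)))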

Once the assessments are delivered, the next phase of the engagement is to review the fixed code. The new code and the previous assessment are combined and fed to the model with new instructions to determine whether the code has in fact been fixed to address each finding. This pass also notices new files that were not part of the first assessment and gives them a full review.
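
A sketch of that re-assessment step, again with hypothetical names:

    ;; Hypothetical sketch of the fix-verification pass.
    (defun reassess-file (path previous-assessment ask-fn)
      (let ((new-code (uiop:read-file-string path)))
        (funcall ask-fn
                 (format nil "PREVIOUS FINDINGS:~%~a~%~%UPDATED SOURCE:~%~a~%~%For each finding, state whether the updated code actually addresses it."
                         previous-assessment new-code))))

    (defun files-needing-full-review (current-paths assessed-paths)
      "New files that were not part of the first assessment get a full review."
      (set-difference current-paths assessed-paths :test #'equal))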

This re-review can be repeated for each development cycle.

A cost calculator reports what each analysis run costs.
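
A sketch of the kind of arithmetic involved; the per-million-token prices below are placeholders, not Gemini's actual rates:

    ;; Hypothetical cost report; prices are placeholders, not real Gemini rates.
    (defun run-cost (input-tokens output-tokens
                     &key (input-price-per-million 1.0) (output-price-per-million 4.0))
      (+ (* (/ input-tokens 1000000) input-price-per-million)
         (* (/ output-tokens 1000000) output-price-per-million)))

    ;; e.g. (run-cost 2500000 400000) => total dollars for one analysis run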

Contrast with some commercial tools

Commercial tools often rely on Static Application Security Testing (SAST), which uses predefined "rules" or "signatures" to find bugs.

This tool's orchestration instead presents the subject code one file at a time, together with a context and one of the 36 security criteria. That allows much deeper semantic analysis, such as checking specifically for "insecure data handling permitting remote attackers to force SSRF", which usually eludes pattern-based commercial scanners.

Most commercial offerings are "one-and-done" scanners. This tool implements a deliberate feedback cycle: the logic automatically notices new files that had not been part of the first assessment and performs a full review of them in subsequent cycles. This effectively builds a "security memory" of the project that standard commercial tools lack.

The process now runs queries through gcloud in asynchronous mode to reduce the cost of the analysis. Dependency calculations are an essential part of the analyses. In particular, for repositories that include both server and client code, the analysis crosses the boundary between, for example, tornado and the client code that calls it. This helps avoid the siloed view that many basic scanners have.

Surprises

There have been several surprises along the way. The biggest is the very detailed and useful content of the "Solution" section. Classically, this section has been absent or very sparse, as in "Fix this vulnerability or else." While commercial tools provide generic remediation advice, this tool generates custom fixes based on the actual logic of the code being reviewed.

Another surprise is the fluency with which Gemini responds to instructions like "Create a python program to recursively search this directory and produce a file of s-expression forms that lists the dependencies and exports for each file."
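
A hypothetical fragment of what such a dependency file might contain; the file names and symbols are invented for illustration:

    ;; Hypothetical s-expression dependency listing.
    ((:file "server/handlers.py"
      :imports ("tornado.web" "db.queries")
      :exports ("MainHandler" "UploadHandler"))
     (:file "db/queries.py"
      :imports ("sqlite3")
      :exports ("get_user" "insert_record")))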

It is also surprising that Gemini botches parenthesis matching when generating Lisp code or s-expressions.

There is also an eerie feeling, reading the overall assessments produced, that somebody has been here before.

I do my development interaction through the browser. Yes, the cool kids are using agents, but as noted earlier, I am behind the times. One advantage is that there is zero cost for creating, correcting, testing, and documenting code.

Another surprise is how much of very long development conversations it remembers.

Quota challenges are alleviated by staging files in gcloud storage and running the queries in async mode. Early experiments show that the wall-clock time is often shorter than in synchronous mode, and these runs suggest that batch mode costs about half as much.
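
A sketch of the storage step, shelling out from Lisp to the real "gcloud storage cp" command; the bucket name, request file, and function name are placeholders:

    ;; Hypothetical sketch: stage a batch request file in Cloud Storage for an
    ;; async run. Bucket and file names are placeholders.
    (defun stage-batch-requests (local-requests-file bucket)
      (uiop:run-program
       (list "gcloud" "storage" "cp" local-requests-file
             (format nil "gs://~a/batch-requests/" bucket))
       :output t :error-output t))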

Up Next

The tool has been used in actual consulting gigs. To reduce the calendar time to delivery, async batch mode will be the main approach.

Many repositories have both server and client code. Dependency calculations are now in place so that the analysis is aware of the code beyond (for example) a tornado API.

Working with Gemini for software development has increased my productivity by an estimated factor of five. My 60 years of software development experience have helped realize this.

The other part of a successful LLM project is the static, repeatable part, which includes things like tests for correctness, extremely precise instructions and context, and guard rails. These are the difference between success and mere vibes.

Availability

This tool is currently not open source.
