Most Pentest Reports Get Read Once. We Are Trying to Change That.

Around the middle of last year, a client showed us their SharePoint folder of past security reports. There were sixteen of them, going back four years, from four different testing firms. Each ran to more than a hundred pages. They all had executive summaries that said roughly the same thing. They all had finding lists with CVSS scores and remediation recommendations.

The client wanted to know how many of the findings from the previous report had actually been fixed. Nobody in the room could answer. We spent a day cross-referencing the old report against their current state. Roughly a third of the findings had been remediated. Another third had been partially addressed in ways that made the original recommendation no longer accurate. The last third looked exactly the same as they had eleven months earlier.

This pattern is not specific to that client. It is the dominant outcome of the way the industry delivers penetration testing work. The artifact gets produced, filed for the auditor, and quietly outlives the team that commissioned it. We have been changing how we deliver, because the report was clearly not doing the job.

The shape of the standard report

If you have commissioned a pentest, you know the structure. There is an executive summary written for an audience that does not exist. There is a methodology section that mostly repeats the statement of work. There is a findings section with each issue rated using a five-color scale, a CVSS vector that nobody examines, an attack narrative, and a remediation recommendation that often boils down to "follow industry best practices."

The report goes to whoever requested it. They forward it to a security manager. The security manager forwards it to the team that owns the affected systems. By the time it lands with the engineer who could fix the bug, several handoffs have happened, the context has thinned, and the report is fighting for attention with everything else on their backlog.

The result is what you would predict. The critical findings get a ticket. The high findings get a ticket if someone has time. The mediums get an apology in the next compliance review. The lows live forever in the SharePoint folder.

What developers actually need

When we started asking our clients' engineering leads what would make findings more actionable, we heard the same answer again and again. They did not want longer narratives. They wanted reproductions.

A finding becomes real to a developer the moment they can trigger it themselves. A curl command that exhibits the issue against a staging environment. A short video of the attack working. A test case that fails, with the expected behavior next to it. The CVSS score does not change their behavior. Watching the bug fire in their own terminal does.

We started shipping our reports with a "reproduce this" section for every finding. The same three things, every time. The setup steps so they can get to a working baseline. The exact request, payload, or sequence that demonstrates the issue. The expected response that proves it worked. When an engineer can paste that into their tools and watch it happen, the conversation stops being about whether the finding is valid and starts being about how to fix it.
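The three parts described above map naturally onto a small record. What follows is a minimal sketch, assuming a hypothetical `Reproduction` structure of our own invention (not any tool's or tracker's format): it stores the setup steps, the demonstrating request, and the expected evidence, and renders the request as a shell-safe curl command an engineer can paste. The staging host, endpoint, and cookie are placeholders.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class Reproduction:
    """Hypothetical record for one finding's 'reproduce this' section."""
    setup: list[str]                 # steps to reach a working baseline
    method: str                      # HTTP method of the demonstrating request
    url: str                         # target URL (staging, never production)
    headers: dict[str, str] = field(default_factory=dict)
    body: str | None = None          # request payload, if any
    expected: str = ""               # what in the response proves the issue

    def to_curl(self) -> str:
        """Render the request as a single, shell-quoted curl command."""
        parts = ["curl", "-i", "-X", self.method]
        for name, value in self.headers.items():
            parts += ["-H", f"{name}: {value}"]
        if self.body is not None:
            parts += ["--data-raw", self.body]
        parts.append(self.url)
        # shlex.quote leaves safe tokens alone and single-quotes the rest.
        return " ".join(shlex.quote(p) for p in parts)


# Usage with placeholder values: an authorization flaw on an invoice endpoint.
repro = Reproduction(
    setup=["Log in to staging as a low-privilege user", "Copy your session cookie"],
    method="GET",
    url="https://staging.example.com/api/invoices/1042",
    headers={"Cookie": "session=<your-session>"},
    expected="HTTP 200 with another tenant's invoice in the body",
)
print(repro.to_curl())
```

Keeping the expected response next to the command matters as much as the command itself: it is what lets the engineer confirm they are seeing the bug and not some unrelated error.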

Critical does not mean fix me first

The other shift was in how we present priority. The traditional ratings are a measure of how bad the worst case is. They do not tell anyone what to do on Monday morning. A critical finding in a system that is two months from being decommissioned is less urgent than a medium finding in a customer-facing payment flow.

We now deliver findings with two dimensions. Severity, in the usual sense, captures the worst-case impact. Priority, separately, captures the order in which we think they should be tackled given the system's context, dependencies, and the cost of remediation. We let the client argue with us about the priority. They almost always have local context we do not. The argument is the point. By the time the conversation is over, the engineering team has agreed on what they are doing first, and that agreement is in writing.
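Keeping the two dimensions as separate fields is what stops severity from silently becoming the work order. The sketch below is illustrative only: the names are hypothetical, and the `priority` numbers stand in for whatever ordering the client and tester agree on after the argument. It reproduces the earlier example, where a critical finding in a system two months from decommissioning ranks below a medium in a customer-facing payment flow.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Worst-case impact, in the usual sense (higher is worse)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Finding:
    title: str
    severity: Severity   # how bad the worst case is
    priority: int        # agreed work order: 1 = tackle first


def work_order(findings: list[Finding]) -> list[Finding]:
    """Order by agreed priority; severity only breaks ties (worse first)."""
    return sorted(findings, key=lambda f: (f.priority, -f.severity))


# Hypothetical findings, with priorities as agreed in the debrief.
findings = [
    Finding("RCE in reporting service (decommissioned in two months)", Severity.CRITICAL, priority=3),
    Finding("Price tampering in checkout flow", Severity.MEDIUM, priority=1),
    Finding("Verbose errors on payment API", Severity.LOW, priority=2),
]

for f in work_order(findings):
    print(f.priority, f.severity.name, f.title)
```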

The remediation gap

Industry surveys put the average time to remediate a critical pentest finding somewhere between sixty and a hundred and twenty days. For mediums it stretches past a year. Those are averages, and they hide a bimodal distribution. A small group of organizations remediate critical findings within a sprint or two. Everyone else lets them age.

What separates the two groups is almost never the technical sophistication of the security team. It is whether the findings made it into the engineering workflow as units of work the team already knows how to handle.

A finding in a PDF is information. A ticket in the team's issue tracker, with a clear acceptance criterion and an owner, is work. The difference in remediation rate between those two formats is large enough that we now treat ticket creation as part of the deliverable, not a follow-up the client does on their own time.

The exact mechanics depend on what the client uses. We have created Jira issues, GitHub issues, Linear items, ServiceNow tickets, and direct PRs against client repos for low-risk fixes we could implement ourselves. The format does not matter. What matters is that when the report is delivered, the work is already where the team lives, and somebody's name is on each item.
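Because every tracker's API differs, the sketch below stops at a tracker-neutral ticket payload; the field names are hypothetical and do not follow Jira's, GitHub's, Linear's, or ServiceNow's schemas. It encodes the two rules above: every ticket carries an acceptance criterion and a named owner, and a finding for a system with no nominated owner is rejected rather than filed. The owner handle and finding are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    id: str
    title: str
    system: str          # system in scope; used to look up its owner
    recommendation: str
    reproduction: str    # the "reproduce this" steps, as text


def to_ticket(finding: Finding, owners: dict[str, str]) -> dict[str, object]:
    """Build a tracker-neutral ticket payload (hypothetical field names)."""
    owner = owners.get(finding.system)
    if owner is None:
        # Without a named owner, findings drift: fail loudly instead of filing.
        raise ValueError(f"no engineering owner nominated for {finding.system!r}")
    return {
        "title": f"[{finding.id}] {finding.title}",
        "assignee": owner,
        "description": f"{finding.recommendation}\n\nReproduce:\n{finding.reproduction}",
        # "Done" is defined by verification, not by a code change landing.
        "acceptance_criteria": (
            "The reproduction no longer demonstrates the issue, "
            "and a retest of the underlying weakness passes."
        ),
        "labels": ["pentest", finding.system],
    }


owners = {"billing-api": "a.nguyen"}  # hypothetical owner handle
finding = Finding(
    id="PT-07",
    title="IDOR on invoice endpoint",
    system="billing-api",
    recommendation="Check that the invoice's tenant matches the caller's tenant before returning it.",
    reproduction="GET /api/invoices/1042 with another tenant's session returns 200.",
)
ticket = to_ticket(finding, owners)
print(ticket["title"], "->", ticket["assignee"])
```

Mapping this payload onto a specific tracker is then a thin adapter per client, which is why the format itself matters less than the owner and the acceptance criterion.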

The debrief is more important than the document

The most useful hour of an engagement is the one immediately after the testing ends, where we sit with the engineering team and walk through the findings live. Not the executive readout. The detailed, screen-sharing, "here is the request we sent and here is what came back" session.

Two things happen in that meeting. First, the engineers ask questions that would have taken three rounds of email otherwise. Second, the engineers usually identify two or three findings that they can fix in the next hour. They fix them in the next hour. We retest before the meeting ends. The fixes are confirmed. That work is done by the time the report is written.

We have started insisting on the debrief as part of every engagement. Some clients pushed back at first because their schedules were tight. None have asked to remove it after the first one.

Retesting that is actually verification

Retesting has historically been a checkbox exercise. The client says they fixed the finding. The tester confirms by trying the original attack one more time. If it does not work, the finding is closed.

The problem is that "the original attack does not work" can mean "the underlying issue was fixed" or it can mean "the obvious version of the attack was blocked but the root cause is still present." The latter happens more often than anyone is comfortable admitting. A team patches the symptom without understanding the cause. The next person to look at the system finds the same class of bug somewhere else.

We treat retesting as a fresh attempt to exploit the underlying weakness, not a replay of the original payload. If the original payload no longer works, we look for adjacent payloads that would have worked against the same root cause. If those still succeed, the finding is not closed. It is updated, with the new attack path documented, and sent back.
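A toy illustration of the difference, under loudly stated assumptions: the "target" is a stub function standing in for a patched endpoint, the "fix" is a hypothetical blocklist that rejects only the literal original payload, the variants are a hand-picked list, and the script detector is a crude regex rather than a real browser. A replay-only retest would close this finding; a retest aimed at the root cause (unescaped output) keeps it open.

```python
from __future__ import annotations

import re
from collections.abc import Callable


def patched_render(comment: str) -> str:
    """Stub target: the 'fix' blocks the literal original payload only."""
    if "<script>" in comment:
        return "rejected"
    return f"<p>{comment}</p>"  # root cause remains: output is not escaped


def executes_script(html: str) -> bool:
    """Crude oracle: a script tag, or a tag carrying an on* event handler."""
    return bool(re.search(r"<script\b|<[a-z][^>]*\son\w+\s*=", html, re.IGNORECASE))


ORIGINAL = "<script>alert(1)</script>"
# Adjacent payloads aimed at the same root cause, not the same string.
VARIANTS = [
    "<SCRIPT>alert(1)</SCRIPT>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]


def retest(target: Callable[[str], str], original: str,
           variants: list[str]) -> tuple[bool, list[str]]:
    """Return (closed, payloads that still work). Close only if none succeed."""
    working = [p for p in [original, *variants] if executes_script(target(p))]
    return (not working, working)


closed, still_working = retest(patched_render, ORIGINAL, VARIANTS)
print("closed" if closed else f"reopen: {len(still_working)} payloads still work")
```

The payloads that still work become the "new attack path" documented when the finding is sent back.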

This has made retests longer and made closures more reliable. The trade is good. A finding closed wrongly is worse than a finding still open.

Continuous versus point-in-time

The annual pentest still has a place. Some compliance frameworks require it. Some systems do not change quickly enough to justify continuous testing. The clients we work with on a continuous basis are usually the ones with rapid release cadences, large attack surfaces, or features where the security posture genuinely shifts between releases.

The honest comparison is not "annual is bad, continuous is good." It is that annual testing produces a snapshot of where the system stood a week before the release that introduced the vulnerability everyone is now scrambling to fix. Continuous testing finds the vulnerability in the release that introduced it, while the engineer who wrote the code still remembers the design decision.

If you have ever had a finding land six months after the relevant code shipped, you know why continuous matters. The team that wrote the code is half-rotated out. Nobody remembers why the choice was made. The fix takes three times as long because the context has to be rebuilt.

We offer both models. We talk clients out of continuous when they cannot absorb the volume of findings. We talk clients into continuous when their release cadence has outpaced their security feedback loop.

What an executive summary should do

The executive summary in most reports is written defensively. It uses words like "comprehensive" and "rigorous" and "industry-standard." It is meant to satisfy an auditor or a board member, not to drive a decision.

We try to write executive summaries that contain decisions. Three to five things the leadership team should care about. The likely cost in budget and engineering time of addressing them. The risk of not addressing them, in terms the leadership team uses for other risks. The themes that suggest a systemic issue rather than a one-off bug.

When the summary has that shape, leadership reads it. When the summary is a paragraph of adjectives followed by a chart, they delegate it back down. The difference is whether the report changes how the organization spends money next quarter.

What we look for from a client

Engagements go better when both sides treat the report as the start of the conversation rather than the end of the contract.

We ask clients to nominate an engineering owner for each system in scope before testing starts. That person joins the debrief, owns the tickets, and signs off on the retests. Without a named owner, findings drift.

We ask clients to give us access to the staging environment, with realistic test data, and the documentation that exists internally. The depth of finding we can produce depends on the depth of access. A black-box test from outside the perimeter will find a different and usually smaller set of issues than a credentialed assessment with source access. The right depth depends on what the client is trying to learn.

We ask clients to commit to a retest window before the engagement closes. Not "we will get back to you." A date on the calendar. Work without a verification deadline tends not to happen.

What we have stopped doing

A few things we used to do and no longer do:

Sending the report as the first deliverable. The debrief comes first, then the tickets, then the document.

Using only CVSS for prioritization. We still produce CVSS vectors because clients want them, but the priority decision is made separately.

Closing findings on a single confirming test. We retest the underlying weakness, not just the original payload.

Long methodology sections that repeat the statement of work. The report links to the SOW and gets on with the findings.

Recommendations that read like "follow industry best practices." Every recommendation is specific enough that a competent engineer can act on it without further interpretation.

These are small process changes. They add up.

What this is not

We are not claiming that a better delivery process compensates for shallow testing. The findings still have to be real, the attack chains still have to be plausible, the recommendations still have to work. A polished report wrapped around weak testing is worse than a plain report wrapped around strong testing.

We are claiming that strong testing wrapped in the standard report format leaves most of its value on the table. The shift in how we deliver has changed how much of the work actually lands in production fixes. That was the point.

If your last report is sitting unread

If you have a stack of past reports and no clear answer to "how much of this got fixed?", you have plenty of company. The first step is not commissioning another test. It is going through the existing findings, marking each one as resolved, partially resolved, or untouched, and figuring out which kinds of findings landed in which bucket, and why.
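Once the findings are marked, the tally itself is simple. The sketch below assumes a hypothetical inventory of (finding class, status) pairs, made up for illustration, and counts statuses overall and per class, so that classes which never get fixed stand out from one-off misses.

```python
from __future__ import annotations

from collections import Counter, defaultdict

STATUSES = ("resolved", "partial", "untouched")


def summarize(findings: list[tuple[str, str]]) -> tuple[Counter, dict[str, Counter]]:
    """Tally statuses overall and per finding class."""
    by_status: Counter = Counter()
    by_class: dict[str, Counter] = defaultdict(Counter)
    for finding_class, status in findings:
        if status not in STATUSES:
            raise ValueError(f"unknown status {status!r}")
        by_status[status] += 1
        by_class[finding_class][status] += 1
    return by_status, dict(by_class)


# Hypothetical inventory from a review of past reports.
past_findings = [
    ("access control", "resolved"),
    ("access control", "untouched"),
    ("injection", "resolved"),
    ("injection", "resolved"),
    ("TLS configuration", "untouched"),
    ("TLS configuration", "untouched"),
    ("session management", "partial"),
]

by_status, by_class = summarize(past_findings)
print(dict(by_status))
for finding_class, counts in sorted(by_class.items()):
    total = sum(counts.values())
    print(f"{finding_class}: {counts['untouched']}/{total} untouched")
```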

Once you know that, the next engagement can be shaped to avoid the same outcome. Sometimes that means a different scope. Sometimes it means a different delivery format. Sometimes it means involving engineering earlier and the security team later.

Talk to us if you want help running that retrospective. The conversation is short and useful even if you end up working with somebody else for the test itself.