What happens when you turn the tables on automated attackers?
The Setup
For the past few days, I’ve been running a honeypot across several domains. The concept is simple: serve fake files and responses that make automated scanners think they’ve hit the jackpot, when in reality, they’re downloading garbage and wasting their time.
The results? Over 600 attack attempts in just 4.5 days, with attackers downloading fake credentials, configuration files, and “sensitive” backups that lead absolutely nowhere.
What They’re Looking For
Automated scanners are constantly probing websites for common security mistakes. Here’s what they tried to access on my honeypot:
1. Database Reconnaissance
- /_all_dbs – CouchDB database listing
- /server-status – Apache server status pages
- Various database administration endpoints
2. The Backup File Hunters
The most aggressive attackers systematically requested dozens of backup files:
/backup.zip
/site-backup.zip
/config.zip
/backup.tar.gz
/old/backup.zip
/restore/backup.sql.gz
/backups/Archive.zip
... and 50+ more variations
These bots try every conceivable backup filename and directory structure. My honeypot serves fake backup files ranging from a few kilobytes up to 1MB each, filled with pseudorandom data. The attackers eagerly download them all, thinking they’ve struck gold.
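A minimal Python sketch of how such a payload could be generated. This is not the honeypot's actual code: the function name, the 4 KB lower bound, and the use of `os.urandom` as the byte source are assumptions made for illustration.

```python
import os
import random

MIN_SIZE = 4 * 1024        # assumed lower bound ("a few kilobytes")
MAX_SIZE = 1024 * 1024     # 1 MB upper bound, as described above


def fake_backup_payload(min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Return random bytes whose length is drawn uniformly from [min_size, max_size]."""
    size = random.randint(min_size, max_size)
    # os.urandom returns bytes from the operating system's random source;
    # any source of random-looking bytes would serve the same purpose here.
    return os.urandom(size)
```

Drawing a fresh size on every request means the same path never returns the same length twice. Whether that helps or hurts realism is a design choice: a real backup file would normally keep a stable size between requests.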
3. Configuration File Enumeration
Modern attackers know that misconfigured deployments often expose sensitive files:
/docker-compose.yml
/kubernetes.yml
/.env
/config/database.yml
/secrets.yml
/aws-secret.yaml
/azure-config.yml
/serverless.yml
My honeypot returned realistic-looking (but completely fake) configuration files. The attackers downloaded them thinking they’d found exposed credentials.
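The post doesn't show how the fake configuration files are built. As a rough sketch, a `.env`-style body with throwaway values could be produced like this. Every key name, the hostname, and the value lengths are invented for the example and correspond to no real service.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def random_value(length):
    """Return a random alphanumeric string that grants access to nothing."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fake_env_file():
    """Build a plausible-looking .env body; all keys and values are made up."""
    lines = [
        "APP_ENV=production",
        "DB_HOST=db.internal.example",   # hypothetical hostname
        "DB_USER=app",
        f"DB_PASSWORD={random_value(24)}",
        f"API_TOKEN={random_value(40)}",
    ]
    return "\n".join(lines) + "\n"
```

Because the values are regenerated on each call, two scanners that compare notes will see different "secrets" for the same file.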
4. Information Disclosure Attempts
- /phpinfo – PHP configuration pages
- /_profiler/phpinfo – Symfony debug endpoints
- /config/parameters.yml – Framework configuration files
The Numbers
Time Range: November 30 – December 5, 2025 (4.5 days)
Attack Statistics:
- 600+ distinct requests from automated scanners
- Over 200 MB of fake data downloaded by attackers (and growing)
- 150-200 scanning sessions from different sources
- Multiple cloud provider IPs (AWS, DigitalOcean, others)
- Fake file sizes: Pseudorandomized up to 1MB per file
Most Targeted Domains:
- My main domain received 200+ probing requests
- Subdomains were systematically scanned
- Same attack patterns repeated across all domains
The Beautiful Pattern of Automated Attacks
What’s fascinating is how predictable these attacks are. A typical scanning session looks like:
- Initial probe – Check if server responds
- Service identification – Try to identify database/service type
- Systematic enumeration – Request 10-20 common sensitive files
- Bulk download – Fetch everything that returns 200 OK
Here’s a real sequence from my logs (newest entry first):
03:33:10 - /config.zip → 200 OK (873 KB)
03:33:09 - /site-backup.zip → 200 OK (1.02 MB)
03:33:07 - /backup.zip → 200 OK (645 KB)
03:33:05 - /kubernetes.yml → 200 OK (47 KB)
03:33:04 - /docker-compose.yml → 200 OK (23 KB)
03:33:02 - /secrets.yml → 200 OK (15 KB)
The bot found six “vulnerable” files in 8 seconds and downloaded about 2.6 MB of pseudorandom garbage. Each file is filled with realistic-looking but completely meaningless binary data that doesn’t compress well, so the bot has to download every byte and then try to extract and analyze it before anyone on the other end realizes it’s worthless.
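The totals for that sequence can be checked with a few lines of Python. The log text is copied from above. The regular expression and the decimal unit convention (1 MB = 1000 KB) are assumptions of this sketch.

```python
import re
from datetime import datetime

LOG = """\
03:33:10 - /config.zip → 200 OK (873 KB)
03:33:09 - /site-backup.zip → 200 OK (1.02 MB)
03:33:07 - /backup.zip → 200 OK (645 KB)
03:33:05 - /kubernetes.yml → 200 OK (47 KB)
03:33:04 - /docker-compose.yml → 200 OK (23 KB)
03:33:02 - /secrets.yml → 200 OK (15 KB)
"""

LINE = re.compile(r"(\d\d:\d\d:\d\d) - (\S+) → (\d{3}) OK \(([\d.]+) (KB|MB)\)")
KB_PER_UNIT = {"KB": 1, "MB": 1000}  # decimal units assumed


def summarize(log):
    """Return (file count, seconds between first and last request, total MB)."""
    times = []
    total_kb = 0.0
    for match in LINE.finditer(log):
        times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
        total_kb += float(match.group(4)) * KB_PER_UNIT[match.group(5)]
    span_seconds = (max(times) - min(times)).total_seconds()
    return len(times), span_seconds, total_kb / 1000


count, span, megabytes = summarize(LOG)
print(f"{count} files in {span:.0f} s, {megabytes:.2f} MB")  # 6 files in 8 s, 2.62 MB
```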
How Much Time Are We Actually Wasting?
Direct Time Costs
Per scanning session:
- 15-30 seconds per domain scan
- 10-20 HTTP requests
- 10-20 MB of pseudorandom data downloaded per session
Conservative estimate: 2-4 hours of cumulative bot time wasted across all sessions, plus the 200+ MB of bandwidth they burned downloading fake files.
Indirect Time Costs (The Real Multiplier)
But the time waste doesn’t end when they download the files. Here’s what happens next:
- Extraction time – Automated tools unzip/untar fake archives filled with pseudorandom data
- Parsing time – Scripts analyze fake config files for credentials, choking on random bytes
- Pattern matching failures – Tools search for credential patterns in meaningless data
- Credential testing – Any fake AWS keys, database passwords, and API tokens found get tested against real services (and fail)
- Database pollution – Vulnerability scanners add my domains to their “found vulnerable” lists
- Manual review – Eventually, someone might manually check why their “compromised credentials” don’t work
- Storage costs – All that downloaded junk data gets stored in their attack databases
Estimated total time waste: Each “successful” download could waste 5-10x more time in downstream processing. Plus, pseudorandom data that doesn’t compress means they’re storing and transferring the full file sizes.
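The compression claim is easy to check. The sketch below compares how well `zlib` compresses random bytes versus repetitive text of similar size. The sample sizes and the repeated line are arbitrary choices for the demonstration.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (1.0 means no savings)."""
    return len(zlib.compress(data, 9)) / len(data)


random_blob = os.urandom(256 * 1024)                 # random bytes
text_blob = b"DB_PASSWORD=changeme\n" * 12_000       # highly repetitive text

print(f"random: {compression_ratio(random_blob):.3f}")
print(f"text:   {compression_ratio(text_blob):.3f}")
```

The random blob should come out at a ratio of roughly 1.0, while the repetitive text shrinks to a small fraction of its original size. Transport-level compression therefore saves a scanner almost nothing on these files.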
Why This Matters
For Defenders
Honeypots aren’t just about wasting attacker time (though that’s satisfying). They provide:
- Early warning signals – See attack patterns before they hit production
- Threat intelligence – Understand what attackers are looking for
- Attack attribution – Track which IPs and patterns target you
- Low-cost defense – Make yourself a less rewarding target for almost no money
For Attackers (Yes, You Reading This)
If you’re running automated scanners, here’s the problem: you can’t tell the difference between my honeypot and a real misconfiguration. Your tools now think my domains are vulnerable when they’re not. Your databases are polluted with false positives. Your automated exploitation chains waste resources on dead ends.
This is the asymmetric advantage of honeypots – they cost me almost nothing to run, but they cost you time, bandwidth, and false confidence.
The Repeated Offenders
Some IPs came back multiple times over the 4.5-day period, hitting the same endpoints repeatedly. This suggests:
- Their automated tools marked my sites as “vulnerable”
- They’re running regular re-scans
- They haven’t figured out they’re hitting a honeypot
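Spotting these repeat visitors is a small grouping exercise. In the sketch below, the `(ip, day, path)` tuple layout is a hypothetical log format, and the sample entries use reserved documentation addresses rather than real attacker IPs.

```python
from collections import defaultdict


def repeat_visitors(entries, min_days=2):
    """Map each IP seen on at least `min_days` distinct days to its sorted list of days.

    `entries` is an iterable of (ip, day, path) tuples; this layout is assumed.
    """
    days_by_ip = defaultdict(set)
    for ip, day, _path in entries:
        days_by_ip[ip].add(day)
    return {
        ip: sorted(days)
        for ip, days in days_by_ip.items()
        if len(days) >= min_days
    }


# Made-up sample data for illustration only.
sample = [
    ("192.0.2.10", "2025-11-30", "/backup.zip"),
    ("192.0.2.10", "2025-12-02", "/backup.zip"),
    ("198.51.100.7", "2025-12-01", "/.env"),
    ("192.0.2.10", "2025-12-02", "/config.zip"),
]
print(repeat_visitors(sample))  # {'192.0.2.10': ['2025-11-30', '2025-12-02']}
```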
Success.
Lessons Learned
What Works Well
✅ Returning 200 OK for “sensitive” files (perfect bait)
✅ Realistic file structure and naming
✅ Pseudorandom file sizes up to 1MB (wastes bandwidth and analysis time)
✅ Non-compressible data (forces full downloads, no shortcuts)
✅ Multiple domains amplifying the effect
✅ Consistent responses that look like real misconfigurations
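A minimal sketch of the serving side, using only Python's standard library. It is not the honeypot's actual implementation: the bait path list is a small sample of the paths mentioned earlier, and the port, size bounds, and function names are assumptions.

```python
import os
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample of bait paths taken from the lists above; a real deployment would have more.
BAIT_PATHS = {"/backup.zip", "/site-backup.zip", "/config.zip", "/.env", "/secrets.yml"}
MAX_SIZE = 1024 * 1024  # 1 MB cap, as described in the post


def build_response(path):
    """Return (status, body): 200 with random bytes for bait paths, 404 otherwise."""
    if path.split("?", 1)[0] in BAIT_PATHS:
        return 200, os.urandom(random.randint(4096, MAX_SIZE))
    return 404, b"Not Found\n"


class HoneypotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the sketch server on localhost; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), HoneypotHandler).serve_forever()
```

Keeping the decision logic in `build_response` separates it from the HTTP plumbing, so the bait rules can be tested without opening a socket.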
Potential Improvements
To waste even MORE attacker time, I could:
- Variability in response timing – Add artificial delays to waste more network time
- Nested archives – Zips within zips within tar.gz files
- Fake database responses – Return realistic-looking database dumps
- Embedded fake credentials – AWS keys, database passwords that look valid but aren’t
- Larger file sizes for “full backups” – Push certain files to 5-10MB range
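The nested-archive idea could look roughly like the sketch below, which wraps a random payload in several layers of zip files in memory. The file names, depth, and payload size are placeholders, and only zip layers are shown, not tar.gz.

```python
import io
import os
import zipfile


def nested_zip(depth=3, payload_size=64 * 1024):
    """Return the bytes of a zip that contains a zip that contains ... random data."""
    data = os.urandom(payload_size)
    name = "dump.sql"  # hypothetical name for the innermost file
    for level in range(depth):
        buffer = io.BytesIO()
        # ZIP_STORED: the payload is random, so compressing it would gain nothing.
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
            archive.writestr(name, data)
        data = buffer.getvalue()
        name = f"backup-{level}.zip"
    return data
```

Each layer forces the scanner's tooling through another extraction step. The total size stays close to the payload size, so this costs the attacker processing steps without sending a decompression bomb.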
The Ethics Question
“Is it ethical to waste attackers’ time?”
These are automated, indiscriminate attacks against infrastructure I own. The attackers are:
- Attempting unauthorized access
- Searching for exposed credentials
- Planning to exploit any vulnerabilities found
Running a honeypot is passive defense – I’m not attacking back, just making myself look vulnerable while being completely secure. If that wastes their time, that’s the point.
Conclusion
In 4.5 days, my honeypot received over 600 attack attempts, served hundreds of megabytes of pseudorandom fake data, and wasted hours of attacker time and significant bandwidth. And this is just a small-scale experiment.
The beauty of pseudorandom data is that it’s:
- Incompressible – They download every byte
- Unpredictable – Automated tools can’t easily identify it as fake
- Realistic-looking – File sizes match what real backups would be
- Storage-expensive – Takes up space in their attack databases
The internet is constantly being scanned by automated tools looking for low-hanging fruit. By serving fake fruit filled with random garbage, we can:
- Waste their bandwidth and storage
- Burn their processing time
- Learn their techniques
- Make ourselves less attractive targets
- Have a bit of fun at their expense
If you’re running web infrastructure, consider setting up a honeypot. It’s surprisingly easy, costs almost nothing, and provides valuable intelligence about who’s targeting you and what they’re looking for.
Plus, there’s something deeply satisfying about knowing that somewhere, an attacker’s script is trying to extract credentials from a megabyte of pseudorandom noise.
Technical Details
Interested in the raw data? Here’s a breakdown:
- File generation: Pseudorandom data up to 1MB per file
- Why pseudorandom? It doesn’t compress well, forcing attackers to download and store full file sizes
- Status codes: Mix of 200 (bait taken), 404 (realistic), 302 (some redirects), 502 (bad gateway)
- Attack sources: Primarily cloud provider IPs (AWS, DigitalOcean, Alibaba Cloud)
- Geographic distribution: Worldwide
- Peak attack time: Fairly consistent 24/7 automated scanning
- Bandwidth wasted: 10-20MB per comprehensive scan session
- Most common user-agent: (They don’t identify themselves – shocker!)
Most aggressive single domain: ciso.li with 200+ requests, including systematic backup file enumeration and configuration hunting. Each aggressive scan downloads 15-30MB of fake data.
