The Art of Wasting Hackers' Time: A Honeypot Experiment

What happens when you turn the tables on automated attackers?

The Setup

For the past few days, I’ve been running a honeypot across several domains. The concept is simple: serve fake files and responses that make automated scanners think they’ve hit the jackpot, when in reality, they’re downloading garbage and wasting their time.

The results? Over 600 attack attempts in just 4.5 days, with attackers downloading fake credentials, configuration files, and “sensitive” backups that lead absolutely nowhere.

What They’re Looking For

Automated scanners are constantly probing websites for common security mistakes. Here’s what they tried to access on my honeypot:

1. Database and Server Reconnaissance

/_all_dbs – CouchDB database listing
/server-status – Apache server status pages
Various database administration endpoints

2. The Backup File Hunters

The most aggressive attackers systematically requested dozens of backup files:

/backup.zip
/site-backup.zip
/config.zip
/backup.tar.gz
/old/backup.zip
/restore/backup.sql.gz
/backups/Archive.zip
... and 50+ more variations

These bots try every conceivable backup filename and directory structure. My honeypot answers each request with a fake backup file, ranging from a few kilobytes up to about 1 MB and filled with pseudorandom data. The bots eagerly download them all, as if they had struck gold.
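As a rough illustration (not the honeypot’s actual code), a fake backup payload of this kind could be generated as below. The function name, the 4 KB floor, and the choice of os.urandom as the byte source are assumptions made for the sketch.

```python
import os
import random
from typing import Optional

MAX_FAKE_SIZE = 1_000_000  # about 1 MB, the upper bound described above
MIN_FAKE_SIZE = 4_096      # "a few kilobytes"; the exact floor is an assumption


def fake_backup_payload(rng: Optional[random.Random] = None) -> bytes:
    """Return a blob of random bytes with a randomly chosen size."""
    rng = rng or random.Random()
    size = rng.randint(MIN_FAKE_SIZE, MAX_FAKE_SIZE)
    # os.urandom is one convenient source of incompressible bytes; any
    # pseudorandom generator would serve the same purpose here.
    return os.urandom(size)
```

Picking a fresh size on every call means no two "backups" look alike, which makes the responses harder to fingerprint by length.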

3. Configuration File Enumeration

Modern attackers know that misconfigured deployments often expose sensitive files:

/docker-compose.yml
/kubernetes.yml
/.env
/config/database.yml
/secrets.yml
/aws-secret.yaml
/azure-config.yml
/serverless.yml

My honeypot returned realistic-looking (but completely fake) configuration files. The attackers downloaded them thinking they’d found exposed credentials.
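One way to produce a realistic-looking but fake configuration file is to fill a familiar template with freshly generated noise. The sketch below builds a .env-style file. The key names, the host name, and the function name are all hypothetical, and the values are random strings that belong to no real service.

```python
import secrets


def fake_env_file() -> str:
    """Build a .env-style file whose secrets are random, meaningless strings."""
    lines = [
        "APP_ENV=production",
        "DB_HOST=db.internal.example",  # placeholder host, not a real server
        "DB_USER=app",
        f"DB_PASSWORD={secrets.token_urlsafe(18)}",
        f"API_TOKEN={secrets.token_hex(32)}",
    ]
    return "\n".join(lines) + "\n"
```

Because the values are regenerated on every call, each scanner collects a different set of useless "credentials".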

4. Information Disclosure Attempts

/phpinfo – PHP configuration pages
/_profiler/phpinfo – Symfony debug endpoints
/config/parameters.yml – Framework configuration files
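To see which of these four categories dominates, requested paths can be bucketed with a few simple rules. This is a minimal sketch built only from the paths listed above. The category labels and the classify function are illustrative, and a real log would need many more rules.

```python
from collections import Counter
from pathlib import PurePosixPath

BACKUP_SUFFIXES = (".zip", ".tar.gz", ".sql.gz")
CONFIG_NAMES = {
    "docker-compose.yml", "kubernetes.yml", ".env", "database.yml",
    "secrets.yml", "aws-secret.yaml", "azure-config.yml", "serverless.yml",
}


def classify(path: str) -> str:
    """Map a requested path to one of the categories described above."""
    name = PurePosixPath(path).name
    if path in ("/_all_dbs", "/server-status"):
        return "reconnaissance"
    if "phpinfo" in name or name == "parameters.yml":
        return "info-disclosure"
    if path.endswith(BACKUP_SUFFIXES):
        return "backup"
    if name in CONFIG_NAMES:
        return "config"
    return "other"


requests = ["/backup.zip", "/.env", "/_all_dbs", "/old/backup.zip", "/phpinfo"]
counts = Counter(classify(p) for p in requests)
```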

The Numbers

Time Range: November 30 – December 5, 2025 (4.5 days)

Attack Statistics:

600+ distinct requests from automated scanners
Over 200 MB of fake data downloaded by attackers (and growing)
150-200 scanning sessions from different sources
Multiple cloud provider IPs (AWS, DigitalOcean, others)
Fake file sizes: Pseudorandomized up to 1MB per file

Most Targeted Domains:

My main domain received 200+ probing requests
Subdomains were systematically scanned
Same attack patterns repeated across all domains

The Beautiful Pattern of Automated Attacks

What’s fascinating is how predictable these attacks are. A typical scanning session looks like:

Initial probe – Check if server responds
Service identification – Try to identify database/service type
Systematic enumeration – Request 10-20 common sensitive files
Download – Fetch everything that returns 200 OK

Here’s a real sequence from my logs (newest entry first):

03:33:10 - /config.zip → 200 OK (873 KB)
03:33:09 - /site-backup.zip → 200 OK (1.02 MB)
03:33:07 - /backup.zip → 200 OK (645 KB)
03:33:05 - /kubernetes.yml → 200 OK (47 KB)
03:33:04 - /docker-compose.yml → 200 OK (23 KB)
03:33:02 - /secrets.yml → 200 OK (15 KB)

The bot found six “vulnerable” files in 8 seconds and downloaded over 2.5 MB of pseudorandom garbage. Each file is filled with realistic-looking but completely meaningless binary data that won’t compress well, so the bot’s operators have to download, extract, and analyze all of it before they realize it’s worthless.
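The figures for this sequence can be checked with a short script. The sketch below parses the six log lines exactly as shown above. The regular expression and the summarize function are illustrative, and treating 1 MB as 1000 KB is an assumption.

```python
import re
from datetime import datetime

LOG = """\
03:33:10 - /config.zip → 200 OK (873 KB)
03:33:09 - /site-backup.zip → 200 OK (1.02 MB)
03:33:07 - /backup.zip → 200 OK (645 KB)
03:33:05 - /kubernetes.yml → 200 OK (47 KB)
03:33:04 - /docker-compose.yml → 200 OK (23 KB)
03:33:02 - /secrets.yml → 200 OK (15 KB)
"""

LINE = re.compile(r"(\d\d:\d\d:\d\d) - (\S+) → (\d{3}) \w+ \(([\d.]+) (KB|MB)\)")


def summarize(log: str):
    """Return (request count, elapsed seconds, total kilobytes)."""
    times, total_kb = [], 0.0
    for match in LINE.finditer(log):
        stamp, _path, _status, size, unit = match.groups()
        times.append(datetime.strptime(stamp, "%H:%M:%S"))
        # Assumption: decimal units, so 1 MB counts as 1000 KB.
        total_kb += float(size) * (1000 if unit == "MB" else 1)
    elapsed = (max(times) - min(times)).total_seconds()
    return len(times), elapsed, total_kb


count, elapsed, total_kb = summarize(LOG)
```

Under that unit assumption the six requests span 8 seconds and add up to roughly 2.6 MB.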

How Much Time Are We Actually Wasting?

Direct Time Costs

Per comprehensive scanning session (a full sweep of one domain):

15-30 seconds per domain scan
10-20 HTTP requests
10-20 MB of pseudorandom data downloaded per session

Conservative estimate: 2-4 hours of cumulative bot time wasted across all sessions, plus hundreds of megabytes of bandwidth burned downloading fake files.

Indirect Time Costs (The Real Multiplier)

But the time waste doesn’t end when they download the files. Here’s what happens next:

Extraction time – Automated tools unzip/untar fake archives filled with pseudorandom data
Parsing time – Scripts analyze fake config files for credentials, choking on random bytes
Pattern matching failures – Tools search for credential patterns in meaningless data
Credential testing – Any fake AWS keys, database passwords, and API tokens found get tested against real services (and fail)
Database pollution – Vulnerability scanners add my domains to their “found vulnerable” lists
Manual review – Eventually, someone might manually check why their “compromised credentials” don’t work
Storage costs – All that downloaded junk data gets stored in their attack databases

Estimated total time waste: Each “successful” download could cost 5-10x its download time again in downstream processing. And because pseudorandom data doesn’t compress, attackers store and transfer every file at full size.
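The incompressibility claim is easy to check. The sketch below compares how well zlib compresses random bytes versus a repetitive text file. The helper name and the sample inputs are made up for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more savings)."""
    return len(zlib.compress(data, 9)) / len(data)


random_ratio = compression_ratio(os.urandom(100_000))
text_ratio = compression_ratio(b"password=hunter2\n" * 5_000)
```

The random blob stays at essentially its full size, while the repetitive text shrinks to a small fraction. That gap is why transport or storage compression gives attackers no shortcut on these files.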

Why This Matters

For Defenders

Honeypots aren’t just about wasting attacker time (though that’s satisfying). They provide:

Early warning signals – See attack patterns before they hit production
Threat intelligence – Understand what attackers are looking for
Attack attribution – Track which IPs and patterns target you
Near-free defense – Raise the attacker’s cost per scan while your own cost stays close to zero

For Attackers (Yes, You Reading This)

If you’re running automated scanners, here’s the problem: you can’t tell the difference between my honeypot and a real misconfiguration. Your tools now think my domains are vulnerable when they’re not. Your databases are polluted with false positives. Your automated exploitation chains waste resources on dead ends.

This is the asymmetric advantage of honeypots – they cost me almost nothing to run, but they cost you time, bandwidth, and false confidence.

The Repeated Offenders

Some IPs came back multiple times over the 4.5-day period, hitting the same endpoints repeatedly. This suggests:

Their automated tools marked my sites as “vulnerable”
They’re running regular re-scans
They haven’t figured out they’re hitting a honeypot

Success.
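Spotting these repeat visitors takes only a small amount of bookkeeping. The sketch below groups hits by source IP and reports any address seen on two or more distinct days. The records are hypothetical and use documentation-only addresses (RFC 5737), not IPs from the real logs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (source IP, day, requested path).
HITS = [
    ("192.0.2.10", date(2025, 11, 30), "/backup.zip"),
    ("192.0.2.10", date(2025, 12, 2), "/backup.zip"),
    ("192.0.2.10", date(2025, 12, 4), "/.env"),
    ("198.51.100.7", date(2025, 12, 1), "/.env"),
    ("203.0.113.5", date(2025, 12, 3), "/secrets.yml"),
    ("203.0.113.5", date(2025, 12, 3), "/config.zip"),
]


def repeat_offenders(hits, min_days=2):
    """Return {ip: distinct days seen} for IPs active on min_days or more days."""
    days_seen = defaultdict(set)
    for ip, day, _path in hits:
        days_seen[ip].add(day)
    return {ip: len(days) for ip, days in days_seen.items() if len(days) >= min_days}


offenders = repeat_offenders(HITS)
```

Counting distinct days rather than raw requests keeps a single noisy session from being mistaken for a returning scanner.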

Lessons Learned

What Works Well

✅ Returning 200 OK for “sensitive” files (perfect bait)
✅ Realistic file structure and naming
✅ Pseudorandom file sizes up to 1MB (wastes bandwidth and analysis time)
✅ Non-compressible data (forces full downloads, no shortcuts)
✅ Multiple domains amplifying the effect
✅ Consistent responses that look like real misconfigurations
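Put together, the core of such a honeypot can be very small. The following is a minimal sketch using only Python’s standard library, not the setup that produced the numbers above. The bait list, the size bounds, and the names build_response, BaitHandler, and make_server are assumptions for illustration.

```python
import os
import random
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

BAIT_PATHS = {"/backup.zip", "/config.zip", "/.env", "/secrets.yml", "/docker-compose.yml"}


def build_response(path: str) -> Tuple[int, bytes]:
    """Return (status, body): random bytes for bait paths, 404 otherwise."""
    if path.split("?", 1)[0] in BAIT_PATHS:
        return 200, os.urandom(random.randint(4_096, 1_000_000))
    return 404, b"Not Found\n"


class BaitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), BaitHandler)
```

Calling make_server().serve_forever() would start it locally. Keeping the routing decision in build_response, separate from the handler, makes the bait logic easy to test without opening a socket.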

Potential Improvements

To waste even MORE attacker time, I could:

Variability in response timing – Add artificial delays to waste more network time
Nested archives – Zips within zips within tar.gz files
Fake database responses – Return realistic-looking database dumps
Embedded fake credentials – AWS keys, database passwords that look valid but aren’t
Larger file sizes for “full backups” – Push certain files to 5-10MB range

The Ethics Question

“Is it ethical to waste attackers’ time?”

These are automated, indiscriminate attacks against infrastructure I own. The attackers are:

Attempting unauthorized access
Searching for exposed credentials
Planning to exploit any vulnerabilities found

Running a honeypot is passive defense – I’m not attacking back, just making myself look vulnerable while being completely secure. If that wastes their time, that’s the point.

Conclusion

In 4.5 days, my honeypot received over 600 attack attempts, served hundreds of megabytes of pseudorandom fake data, and wasted hours of attacker time and significant bandwidth. And this is just a small-scale experiment.

The beauty of pseudorandom data is that it’s:

Incompressible – They download every byte
Unpredictable – Automated tools can’t easily identify it as fake
Realistic-looking – File sizes match what real backups would be
Storage-expensive – Takes up space in their attack databases

The internet is constantly being scanned by automated tools looking for low-hanging fruit. By serving fake fruit filled with random garbage, we can:

Waste their bandwidth and storage
Burn their processing time
Learn their techniques
Make ourselves less attractive targets
Have a bit of fun at their expense

If you’re running web infrastructure, consider setting up a honeypot. It’s surprisingly easy, costs almost nothing, and provides valuable intelligence about who’s targeting you and what they’re looking for.

Plus, there’s something deeply satisfying about knowing that somewhere, an attacker’s script is trying to extract credentials from a megabyte of pseudorandom noise.

Technical Details

Interested in the raw data? Here’s a breakdown:

File generation: Pseudorandom data up to 1MB per file
Why pseudorandom? It doesn’t compress well, forcing attackers to download and store full file sizes
Status codes: Mix of 200 (bait taken), 404 (realistic), 302 (some redirects), 502 (bad gateway)
Attack sources: Primarily cloud provider IPs (AWS, DigitalOcean, Alibaba Cloud)
Geographic distribution: Worldwide
Peak attack time: None; automated scanning was fairly consistent 24/7
Bandwidth wasted: 10-20MB per comprehensive scan session
Most common user-agent: (They don’t identify themselves – shocker!)

Most aggressive single domain: ciso.li with 200+ requests, including systematic backup file enumeration and configuration hunting. Each aggressive scan downloads 15-30MB of fake data.

#cybersecurity #honeypot #infosec #threatintel #blueteam #defence