Duster v3.0: From Buggy Script to Professional Tool
The honest story of building a directory brute-forcer, fixing race conditions with AI help, and learning what "production-ready" really means
The Beginning: Vibe-Coding at 2 AM
I started building Duster during a CTF challenge. Got shell access to a box, needed to find hidden directories, but couldn't install gobuster or any other tool. So I did what any sleep-deprived hacker would do - opened vim and started writing Bash.
The first version (v1.0) was... functional. It worked. Kinda. It found directories, counted some results, and crashed occasionally. But it got me through that CTF, and I thought "hey, this is pretty cool."
Then I actually looked at the code a week later. And realized I had built something that was fundamentally broken in ways I didn't even understand yet.
What Was Actually Wrong (The Honest List)
Let me be brutally honest about the issues in v1.0. This isn't to bash my past self - it's to show what happens when you learn by building instead of reading documentation first.
1. Race Conditions Everywhere
The biggest problem? Multiple background workers were incrementing the same counters simultaneously:
declare -i found_count=0

function worker(){
    # ... check URL ...
    if [ "$code" = "200" ]; then
        ((found_count++))   # Multiple processes doing this at once!
    fi
}
Here's what was happening: Worker A reads found_count (value: 5), Worker B reads found_count (also 5), Worker A writes 6, Worker B writes 6. We just lost a count. Multiply this by hundreds of concurrent workers and suddenly your "45 directories found" is actually 87.
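That lost update is easy to reproduce outside Duster. This small demo (no Duster code involved) has 200 background subshells do an unlocked read-modify-write on a shared counter file; on most runs the final value comes up short of 200:

```shell
# Demonstration of the lost-update race: many subshells read-modify-write
# the same counter file with no locking, so writes can clobber each other.
count_file=$(mktemp)
echo 0 > "$count_file"
for i in $(seq 1 200); do
  (
    current=$(cat "$count_file")           # read
    echo $((current + 1)) > "$count_file"  # write - may overwrite a sibling's update
  ) &
done
wait
final=$(cat "$count_file")
echo "expected 200, got $final"
rm -f "$count_file"
```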
2. File Writing Chaos
All workers were writing to the same output file with simple append:
echo "[200 FOUND] $url" >> "$output_file"
Results? Lines got interleaved, some writes were lost, and the output file was occasionally corrupted. Professional? Not even close.
3. No Wildcard Detection
Many websites return 200 for every path you try. Test /admin? 200. Test /completely-random-nonexistent-path-12345? Also 200. My v1.0 script would happily report 10,000 "found" directories when maybe 5 were real.
4. Terrible Thread Management
Every loop iteration was spawning jobs and wc processes:
while [ "$(jobs -rp | wc -l)" -ge "$THREADS" ]; do
    sleep 0.05
done
This works, but it's horribly inefficient. For a 10,000-line wordlist, I was spawning 20,000+ extra processes just to count background jobs. CPU usage was through the roof.
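For reference, one way to avoid the polling entirely is a sketch like the following, assuming Bash 4.3+, where wait -n blocks until any one background job exits (the worker and filenames here are stand-ins, not Duster's actual code):

```shell
# Worker pool without polling: spawn up to max_jobs workers, then block
# on `wait -n` until a slot frees up instead of counting jobs in a loop.
wordlist=$(mktemp)
printf '%s\n' admin login backup images css > "$wordlist"
outfile=$(mktemp)

function worker(){ echo "checked $1" >> "$outfile"; }   # stand-in worker

max_jobs=3
running=0
while IFS= read -r word; do
    worker "$word" &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait -n                    # resume as soon as any one worker finishes
        running=$((running - 1))
    fi
done < "$wordlist"
wait                               # drain the remaining workers
checked=$(wc -l < "$outfile")
echo "processed $checked paths"
rm -f "$wordlist" "$outfile"
```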
5. Other Issues
- No URL encoding (special characters broke everything)
- No resume capability (crash = start over)
- No rate limiting (easy to get banned)
- No error tracking (network issues went unnoticed)
- Assumed dependencies existed (no validation)
Fixing It: The AI-Assisted Rewrite
I knew the tool had problems but didn't fully understand concurrency issues in Bash. So I did what any modern developer does - I brought it to Claude AI and said "help me make this not terrible."
What followed was basically a pair-programming session where Claude identified the issues, explained why they were problems, and helped me implement proper fixes. This is the honest story of what we changed.
Fix #1: Atomic Operations with File-Based Locking
flock was out because I wanted maximum portability - it isn't part of POSIX and isn't in the stock BSD/macOS toolset. So we used directory creation as an atomic operation instead:
function increment_counter(){
    local counter_file="$COUNTER_DIR/$1"
    local lockfile="${counter_file}.lock"

    # Wait for lock (directory creation is atomic on all filesystems)
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done

    # Critical section - safe to modify now
    local current
    current=$(cat "$counter_file" 2>/dev/null || echo 0)  # treat a missing file as 0
    echo $((current + 1)) > "$counter_file"

    # Release lock
    rmdir "$lockfile"
}
Why does this work? Because mkdir is atomic - only one process can successfully create a given directory. Everyone else gets an error and loops until the lock is released.
Why not flock? I wanted the tool to work everywhere - old systems, minimal containers, BSD variants. Directory operations are atomic on every POSIX filesystem.
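Here is a self-contained demo of the pattern: 50 background jobs each take the mkdir lock and increment the same counter, and no update is lost. (The init step and the final cat are mine for the demo; the article only shows increment_counter.)

```shell
# mkdir-based lock demo: 50 concurrent increments, zero lost updates.
COUNTER_DIR=$(mktemp -d)

function increment_counter(){
    local counter_file="$COUNTER_DIR/$1"
    local lockfile="${counter_file}.lock"
    while ! mkdir "$lockfile" 2>/dev/null; do   # spin until we own the lock
        sleep 0.001
    done
    local current
    current=$(cat "$counter_file" 2>/dev/null || echo 0)
    echo $((current + 1)) > "$counter_file"
    rmdir "$lockfile"                            # release the lock
}

echo 0 > "$COUNTER_DIR/found"
for i in $(seq 1 50); do
    increment_counter found &
done
wait
final_count=$(cat "$COUNTER_DIR/found")
echo "found = $final_count"    # prints "found = 50" every time
rm -rf "$COUNTER_DIR"
```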
Fix #2: Atomic File Writes
Same concept for writing to the output file:
function atomic_write(){
    local content="$1"
    local file="$2"
    local lockfile="${file}.lock"

    # Acquire lock
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done

    # Write safely
    echo "$content" >> "$file"

    # Release lock
    rmdir "$lockfile"
}
Now multiple workers can write without corrupting the file or losing data.
Fix #3: Wildcard Detection
Before scanning, test random paths that definitely don't exist:
function detect_wildcards(){
    local test_paths=(
        "duster_nonexist_$(date +%s)_test1"
        "duster_random_path_$(date +%N)_test2"
        "duster_fake_directory_999_test3"
        "duster_notfound_path_xyz_test4"
        "duster_invalid_endpoint_abc_test5"
    )
    local wildcard_count=0
    local sizes_200=()

    for test_path in "${test_paths[@]}"; do
        full_url=$(mkurl "$test_path")
        response=$(curl -o /dev/null --silent -Iw "%{http_code}|%{size_download}" \
            --max-time "$TIMEOUT" -A "$USER_AGENT" "$full_url" 2>/dev/null)
        IFS='|' read -r code size <<< "$response"
        if [ "$code" = "200" ]; then
            ((wildcard_count++))
            sizes_200+=("$size")
            WILDCARD_SIZES+=("$size")
        fi
    done

    # If 3+ random paths return 200, likely wildcard
    if [ "$wildcard_count" -ge 3 ]; then
        WILDCARD_DETECTED=true
        echo "Wildcard responses detected!"
        echo "Common wildcard sizes: ${sizes_200[*]}"
        # Ask user if they want to continue...
    fi
}
During the scan, we check if responses match the wildcard pattern (same size ±5%) and filter them out.
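The ±5% comparison can be sketched like this (the helper name is my assumption; WILDCARD_SIZES is the array populated by detect_wildcards above):

```shell
# Return 0 if a response size falls within ±5% of any known wildcard size.
WILDCARD_SIZES=(4096)    # example value; normally filled by detect_wildcards

function matches_wildcard_size(){
    local size="$1" wc_size lower upper
    for wc_size in "${WILDCARD_SIZES[@]}"; do
        lower=$((wc_size * 95 / 100))    # -5%
        upper=$((wc_size * 105 / 100))   # +5%
        if [ "$size" -ge "$lower" ] && [ "$size" -le "$upper" ]; then
            return 0    # matches a known wildcard response
        fi
    done
    return 1
}

matches_wildcard_size 4100 && echo "filtered as wildcard"   # within ±5% of 4096
matches_wildcard_size 1024 || echo "kept as a real hit"     # well outside the band
```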
Fix #4: URL Encoding
Pure Bash implementation that handles special characters:
function urlencode(){
    local string="$1"
    local strlen=${#string}
    local encoded=""
    local pos c o

    for (( pos=0; pos<strlen; pos++ )); do
        c=${string:$pos:1}
        case "$c" in
            [-_.~a-zA-Z0-9]) o="$c" ;;                 # unreserved: keep as-is
            *) printf -v o '%%%02X' "'$c" ;;           # everything else: percent-encode
        esac
        encoded+="$o"
    done

    echo "$encoded"
}
Now paths with spaces, ampersands, question marks, etc. work correctly.
Fix #5: Rate Limiting
Simple but effective rate limiter:
function rate_limit_check(){
    if [ "$RATE_LIMIT" -gt 0 ]; then
        current_total=$(get_counter "total")
        elapsed=$(($(date +%s) - START_TIME))
        if [ "$elapsed" -gt 0 ]; then
            current_rate=$((current_total / elapsed))
            if [ "$current_rate" -ge "$RATE_LIMIT" ]; then
                sleep 0.1   # Throttle
            fi
        fi
    fi
}
Not perfect (no token bucket algorithm) but good enough to avoid getting banned.
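For comparison, here is what a minimal token bucket could look like in Bash. This is hypothetical - Duster does not implement it - and it uses one-second granularity: RATE tokens refill each second, and one token is spent per request.

```shell
# Hypothetical token bucket: refill RATE tokens once per second,
# spend one per request, sleep when the bucket is empty.
RATE=80
tokens=$RATE
last_refill=$(date +%s)

function take_token(){
    local now
    now=$(date +%s)
    if [ "$now" -gt "$last_refill" ]; then
        tokens=$RATE               # a new second: refill the bucket
        last_refill=$now
    fi
    while [ "$tokens" -le 0 ]; do
        sleep 0.05                 # bucket empty: wait for the next refill
        now=$(date +%s)
        if [ "$now" -gt "$last_refill" ]; then
            tokens=$RATE
            last_refill=$now
        fi
    done
    tokens=$((tokens - 1))         # spend one token for this request
}

for i in $(seq 1 5); do take_token; done
echo "tokens left this second: $tokens"
```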
Fix #6: Resume Capability
Every processed path gets written to a state file:
# In worker function
if [ -n "$PROCESSED_PATHS" ]; then
    if grep -Fxq "$directory" "$PROCESSED_PATHS" 2>/dev/null; then
        return 0   # Already processed, skip
    fi
fi

# ... process path ...

# Mark as processed
if [ -n "$PROGRESS_FILE" ]; then
    echo "$directory" >> "$PROGRESS_FILE"
fi
Now you can resume with --resume progress_file.state
Fix #7: Progress Tracking
Real-time progress with stats and ETA:
function update_progress(){
    current=$(get_counter "total")
    found=$(get_counter "found")
    forbidden=$(get_counter "forbidden")

    if [ -n "$TOTAL_LINES" ]; then
        percent=$((current * 100 / TOTAL_LINES))
        elapsed=$(($(date +%s) - START_TIME))
        if [ "$elapsed" -gt 0 ]; then
            req_per_sec=$((current / elapsed))
            remaining=$((TOTAL_LINES - current))
            eta=0
            if [ "$req_per_sec" -gt 0 ]; then
                eta=$((remaining / req_per_sec))   # guard against division by zero
            fi
            printf "\r[%3d%%] %d/%d | Found: %d | 403: %d | Speed: %d req/s | ETA: %ds" \
                "$percent" "$current" "$TOTAL_LINES" "$found" "$forbidden" "$req_per_sec" "$eta"
        fi
    fi
}
Output looks like: [45%] 4500/10000 | Found: 12 | 403: 34 | Speed: 89 req/s | ETA: 62s
New Professional Features
Beyond fixing bugs, we added features that make this actually useful:
Size-Based Filtering
Perfect for filtering out wildcard responses when they have predictable sizes.
JSON Output
Machine-readable output for parsing with other tools or scripting pipelines.
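The shape of a JSON result line might look like this (the field names are my assumption for illustration; check the repo for the actual schema):

```shell
# Emit one scan result as a single JSON line for downstream tools.
url="http://target.com/admin"; code=200; size=1024
json_line=$(printf '{"url":"%s","code":%s,"size":%s}' "$url" "$code" "$size")
echo "$json_line"    # {"url":"http://target.com/admin","code":200,"size":1024}
```

One JSON object per line keeps the output easy to stream through jq or grep without a full JSON parser.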
Better Error Handling
- Dependency checking (curl, bc)
- Connection testing before scan starts
- Error counter tracks failed requests
- Graceful cleanup on Ctrl+C
Comprehensive Statistics
=====================================
Scan Complete!
=====================================
[+] Found (200): 12
[+] Forbidden (403): 34
[+] Redirects: 8
[+] Errors: 2
[+] Total Checked: 10000
[+] Time Elapsed: 112s
[+] Average Speed: 89 req/s
[+] Results: output/target.com/scan_20250203_123456.txt
[+] Resume File: output/target.com/progress_20250203_123456.state
How to Use Duster v3.0
Installation
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
# Install dependencies (if needed)
sudo apt install curl bc # Debian/Ubuntu
sudo pacman -S curl bc # Arch
sudo dnf install curl bc # Fedora
Basic Usage
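The simplest invocation just needs a target; the URL and wordlist path below are placeholders:

```shell
./duster.sh -u http://target.com -w wordlist.txt
```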
CTF Mode (My Go-To)
Translation: Check extensions, 30 threads, 80 req/s rate limit, filter 1024-byte responses, show only 200s.
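A command matching that description, with the flags reconstructed from the options table below (target and wordlist are placeholders), would look like:

```shell
./duster.sh -u http://target.com -w wordlist.txt -x -t 30 -r 80 --filter-size 1024 -s
```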
Resume Interrupted Scan
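Using the --resume flag with the state file from a previous run (names here are placeholders):

```shell
./duster.sh -u http://target.com -w wordlist.txt --resume progress_20250203_123456.state
```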
All Options
Required:
-u Target URL
Optional:
-w Path to wordlist
-t Number of threads (default: 20)
-T Request timeout (default: 10)
-e Check extensions (comma-separated)
-x Enable default extensions
-f Follow redirects
-a Custom User-Agent
-s Show only 200s
-v Verbose mode
Advanced:
-r Rate limit (req/s)
-o Output: text or json
--filter-size Filter specific sizes
--hide-size Hide size range
--show-size Show size range
--no-wildcard Skip wildcard detection
--resume Resume from state file
What It Still Can't Do (And Why)
Let me be honest about limitations. This is a Bash script, not a compiled Go binary.
Performance on Huge Wordlists
- Under 100k lines: Works fine
- 100k-500k lines: Slower but usable
- Over 1M lines: Painful
Why? Every background worker is a full Bash process. File operations for atomic counters add overhead. Interpreted vs compiled. gobuster is legitimately 5-10x faster.
Wildcard Detection Isn't Perfect
Our method (size comparison with variance) catches most wildcards but fails on:
- Dynamic pages with changing sizes
- Multiple wildcard templates
- Content-based wildcards (same size, different content)
Professional tools use response body hashing, word count analysis, Levenshtein distance, even ML models. That's... not happening in Bash.
No Recursive Scanning
Could be added, but it would make the code way more complex. The current workaround: re-run Duster against each discovered directory as its own base URL.
Rate Limiting Is Approximate
We check total requests vs elapsed time. It's not precise - no token bucket algorithm, no rolling windows. But it's good enough to avoid bans.
Resume Uses Grep (Slow on Huge Scans)
Checking if a path was processed requires grepping the state file. For 500k-line wordlists, this gets slow. A proper implementation would use a hash table or database.
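A faster membership check is straightforward if you can assume Bash 4+ associative arrays (this is a sketch, not what Duster currently does): load the state file once, then every lookup is O(1) instead of a full grep.

```shell
# Load the resume state file into an associative array once at startup,
# then check membership per path without re-reading the file.
state_file=$(mktemp)
printf '%s\n' admin login backup > "$state_file"   # stand-in for the real state file

declare -A seen
while IFS= read -r line; do
    seen["$line"]=1
done < "$state_file"

function already_processed(){
    [ -n "${seen[$1]:-}" ]    # non-empty means we saw this path before
}

already_processed admin && echo "skip admin"       # in the state file
already_processed images || echo "scan images"     # not yet processed
rm -f "$state_file"
```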
Bottom line: For CTFs, practice labs, and medium-sized scans, Duster works great. For professional bug bounty or million-line wordlists, use gobuster or ffuf.
Duster vs Professional Tools
Here's the honest comparison:
Speed
- Duster: ~50-150 req/s (depending on threads and target)
- gobuster: ~500-2000 req/s
- ffuf: ~1000-5000 req/s
- feroxbuster: ~800-3000 req/s
When Duster Wins
- Portability: Just needs Bash and curl
- Modifiability: Edit and run, no compilation
- Learning: Readable code, understand how it works
- Restricted environments: Minimal dependencies
When Duster Loses
- Speed: 5-10x slower than compiled tools
- Features: No recursion, basic wildcard detection
- Scale: Struggles with million-line wordlists
- Accuracy: Simpler wildcard detection
My recommendation: Use Duster for learning, CTFs, and quick tests. Use gobuster/ffuf for professional work and large-scale scanning.
What Building This Taught Me
This project was a masterclass in things I didn't know I didn't know:
1. Concurrency Is Hard
I thought "just spawn background jobs and count them" was fine. Turns out race conditions are subtle, and they were breaking things in ways I couldn't see without careful testing.
2. Edge Cases Are Everywhere
URLs with special characters, servers that redirect everything, wildcard responses, timeouts, network errors. Real-world systems are messy in ways localhost testing doesn't reveal.
3. "Works for Me" ≠ Production Ready
v1.0 worked for my use cases because I was testing against well-behaved servers with small wordlists. Scale it up, hit a tricky server, and everything breaks.
4. AI as a Learning Tool
Claude didn't just write code for me - it explained WHY things were problems and HOW the fixes work. That's way more valuable than getting working code.
5. Simple Can Still Be Professional
You don't need 10,000 lines of code to build something production-ready. You need proper error handling, atomic operations, and awareness of edge cases. v3.0 is still under 500 lines but handles things correctly.
Final Thoughts
Is Duster v3.0 the best directory scanner? No. Is it the fastest? Definitely not. Will it replace gobuster? Not a chance.
But it's:
- Portable - Works everywhere with minimal dependencies
- Understandable - You can read and comprehend the entire codebase
- Modifiable - Want a feature? Add it in 10 minutes
- Educational - Learn how these tools work by reading real code
- Honest - No false claims about being enterprise-grade
The journey from buggy v1.0 to professional v3.0 taught me more about Bash, HTTP, concurrency, and tool development than any tutorial could. And that's the real value.
For beginners: Build your own version. Even if it's buggy at first. You'll learn way more from fixing your own bugs than from using perfect tools written by others.
Get Duster v3.0
The complete source code is available on GitHub:
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
./duster.sh -h
Read the code. Modify it. Break it. Fix it. That's how you learn.
- ⭐ Star the repo if you find it useful
- 🐛 Open issues for bugs you find
- 🔧 Submit PRs for improvements
- 📚 Use it to learn, not for production scanning
Legal Notice: This tool is for authorized testing only. Never use it against systems you don't own or have explicit written permission to test. Unauthorized access is illegal.