Duster v3.0 - Directory Brute-forcer

Duster v3.0: From Buggy Script to Professional Tool

The honest story of building a directory brute-forcer, fixing race conditions with AI help, and learning what "production-ready" really means

The Beginning: Vibe-Coding at 2 AM

I started building Duster during a CTF challenge. Got shell access to a box, needed to find hidden directories, but couldn't install gobuster or any other tool. So I did what any sleep-deprived hacker would do - opened vim and started writing Bash.

The first version (v1.0) was... functional. It worked. Kinda. It found directories, counted some results, and crashed occasionally. But it got me through that CTF, and I thought "hey, this is pretty cool."

Then I actually looked at the code a week later. And realized I had built something that was fundamentally broken in ways I didn't even understand yet.

What Was Actually Wrong (The Honest List)

Let me be brutally honest about the issues in v1.0. This isn't to bash my past self - it's to show what happens when you learn by building instead of reading documentation first.

1. Race Conditions Everywhere

The biggest problem? Multiple background workers were incrementing the same counters simultaneously:

The broken counter code (v1.0)
declare -i found_count=0

function worker(){
  # ... check URL ...
  if [ "$code" = "200" ]; then
    ((found_count++))  # Multiple processes doing this at once!
  fi
}

Here's what was happening: Worker A reads found_count (value: 5), Worker B reads found_count (also 5), Worker A writes 6, Worker B writes 6. We just lost a count. Multiply this by hundreds of concurrent workers and suddenly your "45 directories found" is actually 87.
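You can reproduce the lost-update problem in a few lines. This is a standalone demo (not Duster's code): 20 background workers each perform 10 unlocked read-then-write increments on a shared counter file. On a busy machine the final count usually comes up short of 200.

```shell
# Demo of the lost-update race: no locking around read-modify-write
counter=$(mktemp)
echo 0 > "$counter"

for w in $(seq 1 20); do
    (
        for i in $(seq 1 10); do
            n=$(cat "$counter")           # read...
            echo $((n + 1)) > "$counter"  # ...then write: not atomic together
        done
    ) &
done
wait

final=$(cat "$counter")
echo "counted $final of 200 increments"
rm -f "$counter"
```

The race is timing-dependent, so on a lightly loaded box you may occasionally see the full 200 - which is exactly why this class of bug is so easy to miss in testing.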

2. File Writing Chaos

All workers were writing to the same output file with simple append:

Concurrent file writes (v1.0)
echo "[200 FOUND] $url" >> "$output_file"

Results? Lines got interleaved, some writes were lost, and the output file was occasionally corrupted. Professional? Not even close.

3. No Wildcard Detection

Many websites return 200 for every path you try. Test /admin? 200. Test /completely-random-nonexistent-path-12345? Also 200. My v1.0 script would happily report 10,000 "found" directories when maybe 5 were real.

4. Terrible Thread Management

Every loop iteration was spawning jobs and wc processes:

Inefficient thread checking (v1.0)
while [ "$(jobs -rp | wc -l)" -ge "$THREADS" ]; do
  sleep 0.05
done

This works, but it's horribly inefficient. For a 10,000-line wordlist, I was spawning 20,000+ extra processes just to count background jobs. CPU usage was through the roof.
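A cheaper pattern - sketched below, not what v1.0 did - is Bash 4.3's `wait -n`, which blocks until any one background job exits instead of polling with `jobs` and `wc`. Here `check_url` is a stand-in worker for the demo, not Duster's real function:

```shell
# Throttle concurrency with `wait -n` (Bash 4.3+) instead of polling
THREADS=5
outdir=$(mktemp -d)

check_url(){
    sleep 0.01            # pretend to make a request
    : > "$outdir/$1"      # record completion as a file
}

running=0
for i in $(seq 1 50); do
    check_url "path_$i" &
    running=$((running + 1))
    if [ "$running" -ge "$THREADS" ]; then
        wait -n           # block until any one job exits - zero extra processes
        running=$((running - 1))
    fi
done
wait                      # drain the remaining jobs

done_count=$(ls "$outdir" | wc -l)
echo "completed: $done_count"
rm -rf "$outdir"
```

The tradeoff is the Bash version requirement, which matters if you care about running on ancient boxes.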

5. Other Issues

  • No URL encoding (special characters broke everything)
  • No resume capability (crash = start over)
  • No rate limiting (easy to get banned)
  • No error tracking (network issues went unnoticed)
  • Assumed dependencies existed (no validation)

Fixing It: The AI-Assisted Rewrite

I knew the tool had problems but didn't fully understand concurrency issues in Bash. So I did what any modern developer does - I brought it to Claude AI and said "help me make this not terrible."

What followed was basically a pair-programming session where Claude identified the issues, explained why they were problems, and helped me implement proper fixes. This is the honest story of what we changed.

Fix #1: Atomic Operations with File-Based Locking

flock was out because I wanted maximum portability - the flock utility isn't available on every system. So we used directory creation as an atomic operation:

Atomic counter increment (v3.0)
function increment_counter(){
    local counter_file="$COUNTER_DIR/$1"
    local lockfile="${counter_file}.lock"
    
    # Wait for lock (directory creation is atomic on POSIX filesystems)
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done
    
    # Critical section - safe to modify now
    local current
    current=$(cat "$counter_file" 2>/dev/null || echo 0)
    echo $((current + 1)) > "$counter_file"
    
    # Release lock
    rmdir "$lockfile"
}

Why does this work? Because mkdir is atomic - only one process can successfully create a directory at a time. Others get an error and loop until they can acquire the lock.

Why not flock? I wanted the tool to work everywhere - old systems, minimal containers, BSD variants. Directory operations are atomic on every POSIX filesystem.
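To convince yourself the lock actually works, here's a small self-contained test (a demo harness, not Duster's code): 20 workers make 10 locked increments each, and the count comes out exact every time.

```shell
# Same increment workload, serialized with the mkdir lock
counter=$(mktemp)
echo 0 > "$counter"
lockfile="${counter}.lock"

increment(){
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001               # spin until the lock is free
    done
    local n
    n=$(cat "$counter")
    echo $((n + 1)) > "$counter"
    rmdir "$lockfile"
}

for w in $(seq 1 20); do
    ( for i in $(seq 1 10); do increment; done ) &
done
wait

final=$(cat "$counter")
echo "counted $final of 200 increments"
rm -f "$counter"
```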

Fix #2: Atomic File Writes

Same concept for writing to the output file:

Safe file writing (v3.0)
function atomic_write(){
    local content="$1"
    local file="$2"
    local lockfile="${file}.lock"
    
    # Acquire lock
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done
    
    # Write safely
    echo "$content" >> "$file"
    
    # Release lock
    rmdir "$lockfile"
}

Now multiple workers can write without corrupting the file or losing data.
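A quick sanity check (demo harness, not part of Duster): ten workers hammer one file through atomic_write, and every line survives - no interleaving, no losses.

```shell
# 10 workers x 20 lines each through the locked append
outfile=$(mktemp)

atomic_write(){
    local content="$1" file="$2" lockfile="$2.lock"
    while ! mkdir "$lockfile" 2>/dev/null; do sleep 0.001; done
    echo "$content" >> "$file"
    rmdir "$lockfile"
}

for w in $(seq 1 10); do
    ( for i in $(seq 1 20); do
          atomic_write "worker $w line $i" "$outfile"
      done ) &
done
wait

lines=$(wc -l < "$outfile")
echo "lines written: $lines"
rm -f "$outfile"
```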

Fix #3: Wildcard Detection

Before scanning, test random paths that definitely don't exist:

Wildcard detection logic
function detect_wildcards(){
    local test_paths=(
        "duster_nonexist_$(date +%s)_test1"
        "duster_random_path_$(date +%N)_test2"
        "duster_fake_directory_999_test3"
        "duster_notfound_path_xyz_test4"
        "duster_invalid_endpoint_abc_test5"
    )
    
    local wildcard_count=0
    local sizes_200=()
    
    for test_path in "${test_paths[@]}"; do
        full_url=$(mkurl "$test_path")
        response=$(curl -o /dev/null --silent -Iw "%{http_code}|%{size_download}" \
            --max-time "$TIMEOUT" -A "$USER_AGENT" "$full_url" 2>/dev/null)
        
        IFS='|' read -r code size <<< "$response"
        
        if [ "$code" = "200" ]; then
            ((wildcard_count++))
            sizes_200+=("$size")
            WILDCARD_SIZES+=("$size")
        fi
    done
    
    # If 3+ random paths return 200, likely wildcard
    if [ $wildcard_count -ge 3 ]; then
        WILDCARD_DETECTED=true
        echo "Wildcard responses detected!"
        echo "Common wildcard sizes: ${sizes_200[*]}"
        # Ask user if they want to continue...
    fi
}

During the scan, we check if responses match the wildcard pattern (same size ±5%) and filter them out.
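The ±5% comparison can be sketched like this - `matches_wildcard` is a hypothetical helper name for illustration, not necessarily what the script calls it:

```shell
# Does a response size fall within +/-5% of any recorded wildcard size?
matches_wildcard(){
    local size=$1 w lo hi
    for w in "${WILDCARD_SIZES[@]}"; do
        lo=$((w * 95 / 100))
        hi=$((w * 105 / 100))
        if [ "$size" -ge "$lo" ] && [ "$size" -le "$hi" ]; then
            return 0    # looks like the wildcard page - filter it
        fi
    done
    return 1            # genuinely different response - keep it
}

WILDCARD_SIZES=(4096)
matches_wildcard 4100 && echo "4100: filtered"
matches_wildcard 9000 || echo "9000: kept"
```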

Fix #4: URL Encoding

Pure Bash implementation that handles special characters:

URL encoding function
function urlencode(){
    local string="$1"
    local strlen=${#string}
    local encoded=""
    local pos c o
    
    for (( pos=0 ; pos<strlen ; pos++ )); do
        c=${string:$pos:1}
        case "$c" in
            [-_.~a-zA-Z0-9]) o="$c" ;;                # unreserved - pass through
            *) printf -v o '%%%02X' "'$c" ;;          # everything else - %XX
        esac
        encoded+="$o"
    done
    
    echo "$encoded"
}

Now paths with spaces, ampersands, question marks, etc. work correctly.

Fix #5: Rate Limiting

Simple but effective rate limiter:

Rate limiting check
function rate_limit_check(){
    if [ "$RATE_LIMIT" -gt 0 ]; then
        current_total=$(get_counter "total")
        elapsed=$(($(date +%s) - START_TIME))
        
        if [ $elapsed -gt 0 ]; then
            current_rate=$((current_total / elapsed))
            
            if [ $current_rate -ge $RATE_LIMIT ]; then
                sleep 0.1  # Throttle
            fi
        fi
    fi
}

Not perfect (no token bucket algorithm) but good enough to avoid getting banned.
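For reference, a minimal token bucket in Bash might look like the sketch below. It uses one-second granularity and per-process state, so it wouldn't drop into Duster as-is (the workers are separate processes), but it shows what the "proper" algorithm does:

```shell
RATE=80            # tokens added per second
BURST=80           # bucket capacity
tokens=$BURST
last_refill=$(date +%s)

take_token(){
    local now elapsed
    while :; do
        now=$(date +%s)
        elapsed=$((now - last_refill))
        if [ "$elapsed" -gt 0 ]; then
            tokens=$((tokens + elapsed * RATE))       # refill
            [ "$tokens" -gt "$BURST" ] && tokens=$BURST
            last_refill=$now
        fi
        if [ "$tokens" -gt 0 ]; then
            tokens=$((tokens - 1))                    # spend one token
            return 0
        fi
        sleep 0.1                                     # bucket empty - wait
    done
}

take_token
echo "tokens left: $tokens"
```

The win over checking total/elapsed is burst handling: you can spend saved-up tokens quickly and then settle into the steady rate.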

Fix #6: Resume Capability

Every processed path gets written to a state file:

Resume support
# In worker function
if [ -n "$PROCESSED_PATHS" ]; then
    if grep -Fxq "$directory" "$PROCESSED_PATHS" 2>/dev/null; then
        return 0  # Already processed, skip
    fi
fi

# ... process path ...

# Mark as processed
if [ -n "$PROGRESS_FILE" ]; then
    echo "$directory" >> "$PROGRESS_FILE"
fi

Now you can resume with --resume progress_file.state

Fix #7: Progress Tracking

Real-time progress with stats and ETA:

Progress display
function update_progress(){
    current=$(get_counter "total")
    found=$(get_counter "found")
    forbidden=$(get_counter "forbidden")
    
    if [ -n "$TOTAL_LINES" ]; then
        percent=$((current * 100 / TOTAL_LINES))
        elapsed=$(($(date +%s) - START_TIME))
        
        if [ $elapsed -gt 0 ] && [ $((current / elapsed)) -gt 0 ]; then
            req_per_sec=$((current / elapsed))
            remaining=$((TOTAL_LINES - current))
            eta=$((remaining / req_per_sec))  # safe: req_per_sec > 0 here
            
            printf "\r[%3d%%] %d/%d | Found: %d | 403: %d | Speed: %d req/s | ETA: %ds" \
                "$percent" "$current" "$TOTAL_LINES" "$found" "$forbidden" "$req_per_sec" "$eta"
        fi
    fi
}

Output looks like: [45%] 4500/10000 | Found: 12 | 403: 34 | Speed: 89 req/s | ETA: 62s

New Professional Features

Beyond fixing bugs, we added features that make this actually useful:

Size-Based Filtering

./duster.sh -u https://target.com --filter-size 1024,2048
./duster.sh -u https://target.com --hide-size 1000-2000
./duster.sh -u https://target.com --show-size 100-5000

Perfect for filtering out wildcard responses when they have predictable sizes.

JSON Output

./duster.sh -u https://target.com -o json

Machine-readable output for parsing with other tools or scripting pipelines.

Better Error Handling

  • Dependency checking (curl, bc)
  • Connection testing before scan starts
  • Error counter tracks failed requests
  • Graceful cleanup on Ctrl+C

Comprehensive Statistics

Scan summary output
=====================================
        Scan Complete!
=====================================
[+] Found (200): 12
[+] Forbidden (403): 34
[+] Redirects: 8
[+] Errors: 2
[+] Total Checked: 10000
[+] Time Elapsed: 112s
[+] Average Speed: 89 req/s
[+] Results: output/target.com/scan_20250203_123456.txt
[+] Resume File: output/target.com/progress_20250203_123456.state

How to Use Duster v3.0

Installation

Clone and setup
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh

# Install dependencies (if needed)
sudo apt install curl bc  # Debian/Ubuntu
sudo pacman -S curl bc    # Arch
sudo dnf install curl bc  # Fedora

Basic Usage

./duster.sh -u https://example.com

CTF Mode (My Go-To)

./duster.sh -u http://target.ctf.com -x -t 30 -r 80 --filter-size 1024 -s

Translation: Check extensions, 30 threads, 80 req/s rate limit, filter 1024-byte responses, show only 200s.

Resume Interrupted Scan

./duster.sh -u https://example.com --resume output/example.com/progress_*.state

All Options

Complete flag reference
Required:
  -u <url>         Target URL

Optional:
  -w <file>        Path to wordlist
  -t <num>         Number of threads (default: 20)
  -T <sec>         Request timeout (default: 10)
  -e <exts>        Check extensions (comma-separated)
  -x               Enable default extensions
  -f               Follow redirects
  -a <agent>       Custom User-Agent
  -s               Show only 200s
  -v               Verbose mode

Advanced:
  -r <num>         Rate limit (req/s)
  -o <format>      Output: text or json
  --filter-size <sizes>   Filter specific sizes
  --hide-size <range>     Hide size range
  --show-size <range>     Show size range
  --no-wildcard    Skip wildcard detection
  --resume <file>  Resume from state file

What It Still Can't Do (And Why)

Let me be honest about limitations. This is a Bash script, not a compiled Go binary.

Performance on Huge Wordlists

  • Under 100k lines: Works fine
  • 100k-500k lines: Slower but usable
  • Over 1M lines: Painful

Why? Every background worker is a full Bash process, the file operations behind the atomic counters add overhead, and interpreted code simply can't match a compiled binary. gobuster is legitimately 5-10x faster.

Wildcard Detection Isn't Perfect

Our method (size comparison with variance) catches most wildcards but fails on:

  • Dynamic pages with changing sizes
  • Multiple wildcard templates
  • Content-based wildcards (same size, different content)

Professional tools use response body hashing, word count analysis, Levenshtein distance, even ML models. That's... not happening in Bash.

No Recursive Scanning

Could be added but would make the code way more complex. Current workaround:

# Find /admin/
./duster.sh -u https://example.com

# Manually scan inside it
./duster.sh -u https://example.com/admin

Rate Limiting Is Approximate

We check total requests vs elapsed time. It's not precise - no token bucket algorithm, no rolling windows. But it's good enough to avoid bans.

Resume Uses Grep (Slow on Huge Scans)

Checking if a path was processed requires grepping the state file. For 500k-line wordlists, this gets slow. A proper implementation would use a hash table or database.
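One option (requires Bash 4+ for associative arrays) is loading the state file once at startup instead of grepping it per path. The file below is a stand-in created for the demo, not Duster's real progress file:

```shell
# Build a demo state file standing in for the real progress file
state=$(mktemp)
printf '%s\n' admin login images > "$state"

# Load every processed path into an associative array - O(1) lookups after that
declare -A seen
while IFS= read -r line; do
    seen["$line"]=1
done < "$state"

is_processed(){ [ -n "${seen[$1]:-}" ]; }

is_processed admin   && echo "admin: skip"
is_processed uploads || echo "uploads: scan"
rm -f "$state"
```

Since the processed set is fixed at startup, workers spawned as subshells would inherit a read-only copy of the array for free; only new completions would still need the append-to-file path.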

Bottom line: For CTFs, practice labs, and medium-sized scans, Duster works great. For professional bug bounty or million-line wordlists, use gobuster or ffuf.

Duster vs Professional Tools

Here's the honest comparison:

Speed

  • Duster: ~50-150 req/s (depending on threads and target)
  • gobuster: ~500-2000 req/s
  • ffuf: ~1000-5000 req/s
  • feroxbuster: ~800-3000 req/s

When Duster Wins

  • Portability: Just needs Bash and curl
  • Modifiability: Edit and run, no compilation
  • Learning: Readable code, understand how it works
  • Restricted environments: Minimal dependencies

When Duster Loses

  • Speed: 5-10x slower than compiled tools
  • Features: No recursion, basic wildcard detection
  • Scale: Struggles with million-line wordlists
  • Accuracy: Simpler wildcard detection

My recommendation: Use Duster for learning, CTFs, and quick tests. Use gobuster/ffuf for professional work and large-scale scanning.

What Building This Taught Me

This project was a masterclass in things I didn't know I didn't know:

1. Concurrency Is Hard

I thought "just spawn background jobs and count them" was fine. Turns out race conditions are subtle and break things in ways I couldn't see without careful testing.

2. Edge Cases Are Everywhere

URLs with special characters, servers that redirect everything, wildcard responses, timeouts, network errors. Real-world systems are messy in ways localhost testing doesn't reveal.

3. "Works for Me" ≠ Production Ready

v1.0 worked for my use cases because I was testing against well-behaved servers with small wordlists. Scale it up, hit a tricky server, and everything breaks.

4. AI as a Learning Tool

Claude didn't just write code for me - it explained WHY things were problems and HOW the fixes work. That's way more valuable than getting working code.

5. Simple Can Still Be Professional

You don't need 10,000 lines of code to build something production-ready. You need proper error handling, atomic operations, and awareness of edge cases. v3.0 is still under 500 lines but handles things correctly.

Final Thoughts

Is Duster v3.0 the best directory scanner? No. Is it the fastest? Definitely not. Will it replace gobuster? Not a chance.

But it's:

  • Portable - Works everywhere with minimal dependencies
  • Understandable - You can read and comprehend the entire codebase
  • Modifiable - Want a feature? Add it in 10 minutes
  • Educational - Learn how these tools work by reading real code
  • Honest - No false claims about being enterprise-grade

The journey from buggy v1.0 to professional v3.0 taught me more about Bash, HTTP, concurrency, and tool development than any tutorial could. And that's the real value.

For beginners: Build your own version. Even if it's buggy at first. You'll learn way more from fixing your own bugs than from using perfect tools written by others.

Get Duster v3.0

The complete source code is available on GitHub:

Clone and start using
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
./duster.sh -h

Read the code. Modify it. Break it. Fix it. That's how you learn.

  • ⭐ Star the repo if you find it useful
  • 🐛 Open issues for bugs you find
  • 🔧 Submit PRs for improvements
  • 📚 Use it to learn, not for production scanning

Legal Notice: This tool is for authorized testing only. Never use it against systems you don't own or have explicit written permission to test. Unauthorized access is illegal.

Built with curiosity, fixed with AI, documented with honesty.

Find me on GitHub and Twitter

Questions? Always happy to discuss tool development and pentesting.
