Duster v3.0: From Buggy Script to Professional Tool
The honest story of building a directory brute-forcer, fixing race conditions with AI help, and learning what "production-ready" really means
The Beginning: Vibe-Coding at 2 AM
I started building Duster during a CTF challenge. Got shell access to a box, needed to find hidden directories, but couldn't install gobuster or any other tool. So I did what any sleep-deprived hacker would do - opened vim and started writing Bash.
The first version (v1.0) was... functional. It worked. Kinda. It found directories, counted some results, and crashed occasionally. But it got me through that CTF, and I thought "hey, this is pretty cool."
Then I actually looked at the code a week later. And realized I had built something that was fundamentally broken in ways I didn't even understand yet.
What Was Actually Wrong (The Honest List)
Let me be brutally honest about the issues in v1.0. This isn't to bash my past self - it's to show what happens when you learn by building instead of reading documentation first.
1. Race Conditions Everywhere
The biggest problem? Multiple background workers were incrementing the same counters simultaneously:
declare -i found_count=0

function worker(){
    # ... check URL ...
    if [ "$code" = "200" ]; then
        ((found_count++))   # Multiple processes doing this at once!
    fi
}
Here's what was happening: Worker A reads found_count (value: 5), Worker B reads found_count (also 5), Worker A writes 6, Worker B writes 6. We just lost a count. Multiply this by hundreds of concurrent workers and suddenly your "45 directories found" is actually 87.
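That lost update is easy to reproduce outside Duster. This small demo (no Duster code involved) has 200 background subshells do an unlocked read-modify-write on a shared counter file; on most runs the final value comes up short of 200:

```shell
# Demonstration of the lost-update race: many subshells read-modify-write
# the same counter file with no locking, so writes can clobber each other.
count_file=$(mktemp)
echo 0 > "$count_file"
for i in $(seq 1 200); do
  (
    current=$(cat "$count_file")           # read
    echo $((current + 1)) > "$count_file"  # write - may overwrite a sibling's update
  ) &
done
wait
final=$(cat "$count_file")
echo "expected 200, got $final"
rm -f "$count_file"
```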
2. File Writing Chaos
All workers were writing to the same output file with simple append:
echo "[200 FOUND] $url" >> "$output_file"
Results? Lines got interleaved, some writes were lost, and the output file was occasionally corrupted. Professional? Not even close.
3. No Wildcard Detection
Many websites return 200 for every path you try. Test /admin? 200. Test /completely-random-nonexistent-path-12345? Also 200. My v1.0 script would happily report 10,000 "found" directories when maybe 5 were real.
4. Terrible Thread Management
Every loop iteration was spawning jobs and wc processes:
while [ "$(jobs -rp | wc -l)" -ge "$THREADS" ]; do
    sleep 0.05
done
This works, but it's horribly inefficient. For a 10,000-line wordlist, I was spawning 20,000+ extra processes just to count background jobs. CPU usage was through the roof.
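For reference, one way to avoid the polling entirely is a sketch like the following, assuming Bash 4.3+, where wait -n blocks until any one background job exits (the worker and filenames here are stand-ins, not Duster's actual code):

```shell
# Worker pool without polling: spawn up to max_jobs workers, then block
# on `wait -n` until a slot frees up instead of counting jobs in a loop.
wordlist=$(mktemp)
printf '%s\n' admin login backup images css > "$wordlist"
outfile=$(mktemp)

function worker(){ echo "checked $1" >> "$outfile"; }   # stand-in worker

max_jobs=3
running=0
while IFS= read -r word; do
    worker "$word" &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait -n                    # resume as soon as any one worker finishes
        running=$((running - 1))
    fi
done < "$wordlist"
wait                               # drain the remaining workers
checked=$(wc -l < "$outfile")
echo "processed $checked paths"
rm -f "$wordlist" "$outfile"
```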
5. Other Issues
- No URL encoding (special characters broke everything)
- No resume capability (crash = start over)
- No rate limiting (easy to get banned)
- No error tracking (network issues went unnoticed)
- Assumed dependencies existed (no validation)
Fixing It: The AI-Assisted Rewrite
I knew the tool had problems but didn't fully understand concurrency issues in Bash. So I did what any modern developer does - I brought it to Claude AI and said "help me make this not terrible."
What followed was basically a pair-programming session where Claude identified the issues, explained why they were problems, and helped me implement proper fixes. This is the honest story of what we changed.
Fix #1: Atomic Operations with File-Based Locking
flock was out because I wanted maximum portability - it isn't part of POSIX and isn't in the stock BSD/macOS toolset. So we used directory creation as an atomic operation instead:
function increment_counter(){
    local counter_file="$COUNTER_DIR/$1"
    local lockfile="${counter_file}.lock"

    # Wait for lock (directory creation is atomic on all filesystems)
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done

    # Critical section - safe to modify now
    local current
    current=$(cat "$counter_file" 2>/dev/null || echo 0)  # treat a missing file as 0
    echo $((current + 1)) > "$counter_file"

    # Release lock
    rmdir "$lockfile"
}
Why does this work? Because mkdir is atomic - only one process can successfully create a given directory. Everyone else gets an error and loops until the lock is released.
Why not flock? I wanted the tool to work everywhere - old systems, minimal containers, BSD variants. Directory operations are atomic on every POSIX filesystem.
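Here is a self-contained demo of the pattern: 50 background jobs each take the mkdir lock and increment the same counter, and no update is lost. (The init step and the final cat are mine for the demo; the article only shows increment_counter.)

```shell
# mkdir-based lock demo: 50 concurrent increments, zero lost updates.
COUNTER_DIR=$(mktemp -d)

function increment_counter(){
    local counter_file="$COUNTER_DIR/$1"
    local lockfile="${counter_file}.lock"
    while ! mkdir "$lockfile" 2>/dev/null; do   # spin until we own the lock
        sleep 0.001
    done
    local current
    current=$(cat "$counter_file" 2>/dev/null || echo 0)
    echo $((current + 1)) > "$counter_file"
    rmdir "$lockfile"                            # release the lock
}

echo 0 > "$COUNTER_DIR/found"
for i in $(seq 1 50); do
    increment_counter found &
done
wait
final_count=$(cat "$COUNTER_DIR/found")
echo "found = $final_count"    # prints "found = 50" every time
rm -rf "$COUNTER_DIR"
```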
Fix #2: Atomic File Writes
Same concept for writing to the output file:
function atomic_write(){
    local content="$1"
    local file="$2"
    local lockfile="${file}.lock"

    # Acquire lock
    while ! mkdir "$lockfile" 2>/dev/null; do
        sleep 0.001
    done

    # Write safely
    echo "$content" >> "$file"

    # Release lock
    rmdir "$lockfile"
}
Now multiple workers can write without corrupting the file or losing data.
Fix #3: Wildcard Detection
Before scanning, test random paths that definitely don't exist:
function detect_wildcards(){
    local test_paths=(
        "duster_nonexist_$(date +%s)_test1"
        "duster_random_path_$(date +%N)_test2"
        "duster_fake_directory_999_test3"
        "duster_notfound_path_xyz_test4"
        "duster_invalid_endpoint_abc_test5"
    )
    local wildcard_count=0
    local sizes_200=()

    for test_path in "${test_paths[@]}"; do
        full_url=$(mkurl "$test_path")
        response=$(curl -o /dev/null --silent -Iw "%{http_code}|%{size_download}" \
            --max-time "$TIMEOUT" -A "$USER_AGENT" "$full_url" 2>/dev/null)
        IFS='|' read -r code size <<< "$response"
        if [ "$code" = "200" ]; then
            ((wildcard_count++))
            sizes_200+=("$size")
            WILDCARD_SIZES+=("$size")
        fi
    done

    # If 3+ random paths return 200, likely wildcard
    if [ "$wildcard_count" -ge 3 ]; then
        WILDCARD_DETECTED=true
        echo "Wildcard responses detected!"
        echo "Common wildcard sizes: ${sizes_200[*]}"
        # Ask user if they want to continue...
    fi
}
During the scan, we check if responses match the wildcard pattern (same size ±5%) and filter them out.
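The ±5% comparison can be sketched like this (the helper name is my assumption; WILDCARD_SIZES is the array populated by detect_wildcards above):

```shell
# Return 0 if a response size falls within ±5% of any known wildcard size.
WILDCARD_SIZES=(4096)    # example value; normally filled by detect_wildcards

function matches_wildcard_size(){
    local size="$1" wc_size lower upper
    for wc_size in "${WILDCARD_SIZES[@]}"; do
        lower=$((wc_size * 95 / 100))    # -5%
        upper=$((wc_size * 105 / 100))   # +5%
        if [ "$size" -ge "$lower" ] && [ "$size" -le "$upper" ]; then
            return 0    # matches a known wildcard response
        fi
    done
    return 1
}

matches_wildcard_size 4100 && echo "filtered as wildcard"   # within ±5% of 4096
matches_wildcard_size 1024 || echo "kept as a real hit"     # well outside the band
```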
Fix #4: URL Encoding
Pure Bash implementation that handles special characters:
function urlencode(){
    local string="$1"
    local strlen=${#string}
    local encoded=""
    local pos c o

    for (( pos=0; pos<strlen; pos++ )); do
        c=${string:$pos:1}
        case "$c" in
            [-_.~a-zA-Z0-9]) o="$c" ;;                 # unreserved: keep as-is
            *) printf -v o '%%%02X' "'$c" ;;           # everything else: percent-encode
        esac
        encoded+="$o"
    done

    echo "$encoded"
}
Now paths with spaces, ampersands, question marks, etc. work correctly.
Fix #5: Rate Limiting
Simple but effective rate limiter:
function rate_limit_check(){
    if [ "$RATE_LIMIT" -gt 0 ]; then
        current_total=$(get_counter "total")
        elapsed=$(($(date +%s) - START_TIME))
        if [ "$elapsed" -gt 0 ]; then
            current_rate=$((current_total / elapsed))
            if [ "$current_rate" -ge "$RATE_LIMIT" ]; then
                sleep 0.1   # Throttle
            fi
        fi
    fi
}
Not perfect (no token bucket algorithm) but good enough to avoid getting banned.
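For comparison, here is what a minimal token bucket could look like in Bash. This is hypothetical - Duster does not implement it - and it uses one-second granularity: RATE tokens refill each second, and one token is spent per request.

```shell
# Hypothetical token bucket: refill RATE tokens once per second,
# spend one per request, sleep when the bucket is empty.
RATE=80
tokens=$RATE
last_refill=$(date +%s)

function take_token(){
    local now
    now=$(date +%s)
    if [ "$now" -gt "$last_refill" ]; then
        tokens=$RATE               # a new second: refill the bucket
        last_refill=$now
    fi
    while [ "$tokens" -le 0 ]; do
        sleep 0.05                 # bucket empty: wait for the next refill
        now=$(date +%s)
        if [ "$now" -gt "$last_refill" ]; then
            tokens=$RATE
            last_refill=$now
        fi
    done
    tokens=$((tokens - 1))         # spend one token for this request
}

for i in $(seq 1 5); do take_token; done
echo "tokens left this second: $tokens"
```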
Fix #6: Resume Capability
Every processed path gets written to a state file:
# In worker function
if [ -n "$PROCESSED_PATHS" ]; then
    if grep -Fxq "$directory" "$PROCESSED_PATHS" 2>/dev/null; then
        return 0   # Already processed, skip
    fi
fi

# ... process path ...

# Mark as processed
if [ -n "$PROGRESS_FILE" ]; then
    echo "$directory" >> "$PROGRESS_FILE"
fi
Now you can resume with --resume progress_file.state
Fix #7: Progress Tracking
Real-time progress with stats and ETA:
function update_progress(){
    current=$(get_counter "total")
    found=$(get_counter "found")
    forbidden=$(get_counter "forbidden")

    if [ -n "$TOTAL_LINES" ]; then
        percent=$((current * 100 / TOTAL_LINES))
        elapsed=$(($(date +%s) - START_TIME))
        if [ "$elapsed" -gt 0 ]; then
            req_per_sec=$((current / elapsed))
            remaining=$((TOTAL_LINES - current))
            eta=0
            if [ "$req_per_sec" -gt 0 ]; then
                eta=$((remaining / req_per_sec))   # guard against division by zero
            fi
            printf "\r[%3d%%] %d/%d | Found: %d | 403: %d | Speed: %d req/s | ETA: %ds" \
                "$percent" "$current" "$TOTAL_LINES" "$found" "$forbidden" "$req_per_sec" "$eta"
        fi
    fi
}
Output looks like: [45%] 4500/10000 | Found: 12 | 403: 34 | Speed: 89 req/s | ETA: 62s
New Professional Features
Beyond fixing bugs, we added features that make this actually useful:
Size-Based Filtering
Perfect for filtering out wildcard responses when they have predictable sizes.
JSON Output
Machine-readable output for parsing with other tools or scripting pipelines.
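The shape of a JSON result line might look like this (the field names are my assumption for illustration; check the repo for the actual schema):

```shell
# Emit one scan result as a single JSON line for downstream tools.
url="http://target.com/admin"; code=200; size=1024
json_line=$(printf '{"url":"%s","code":%s,"size":%s}' "$url" "$code" "$size")
echo "$json_line"    # {"url":"http://target.com/admin","code":200,"size":1024}
```

One JSON object per line keeps the output easy to stream through jq or grep without a full JSON parser.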
Better Error Handling
- Dependency checking (curl, bc)
- Connection testing before scan starts
- Error counter tracks failed requests
- Graceful cleanup on Ctrl+C
Comprehensive Statistics
=====================================
Scan Complete!
=====================================
[+] Found (200): 12
[+] Forbidden (403): 34
[+] Redirects: 8
[+] Errors: 2
[+] Total Checked: 10000
[+] Time Elapsed: 112s
[+] Average Speed: 89 req/s
[+] Results: output/target.com/scan_20250203_123456.txt
[+] Resume File: output/target.com/progress_20250203_123456.state
How to Use Duster v3.0
Installation
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
# Install dependencies (if needed)
sudo apt install curl bc # Debian/Ubuntu
sudo pacman -S curl bc # Arch
sudo dnf install curl bc # Fedora
Basic Usage
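The simplest invocation just needs a target; the URL and wordlist path below are placeholders:

```shell
./duster.sh -u http://target.com -w wordlist.txt
```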
CTF Mode (My Go-To)
Translation: Check extensions, 30 threads, 80 req/s rate limit, filter 1024-byte responses, show only 200s.
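A command matching that description, with the flags reconstructed from the options table below (target and wordlist are placeholders), would look like:

```shell
./duster.sh -u http://target.com -w wordlist.txt -x -t 30 -r 80 --filter-size 1024 -s
```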
Resume Interrupted Scan
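Using the --resume flag with the state file from a previous run (names here are placeholders):

```shell
./duster.sh -u http://target.com -w wordlist.txt --resume progress_20250203_123456.state
```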
All Options
Required:
-u Target URL
Optional:
-w Path to wordlist
-t Number of threads (default: 20)
-T Request timeout (default: 10)
-e Check extensions (comma-separated)
-x Enable default extensions
-f Follow redirects
-a Custom User-Agent
-s Show only 200s
-v Verbose mode
Advanced:
-r Rate limit (req/s)
-o Output: text or json
--filter-size Filter specific sizes
--hide-size Hide size range
--show-size Show size range
--no-wildcard Skip wildcard detection
--resume Resume from state file
What It Still Can't Do (And Why)
Let me be honest about limitations. This is a Bash script, not a compiled Go binary.
Performance on Huge Wordlists
- Under 100k lines: Works fine
- 100k-500k lines: Slower but usable
- Over 1M lines: Painful
Why? Every background worker is a full Bash process. File operations for atomic counters add overhead. Interpreted vs compiled. gobuster is legitimately 5-10x faster.
Wildcard Detection Isn't Perfect
Our method (size comparison with variance) catches most wildcards but fails on:
- Dynamic pages with changing sizes
- Multiple wildcard templates
- Content-based wildcards (same size, different content)
Professional tools use response body hashing, word count analysis, Levenshtein distance, even ML models. That's... not happening in Bash.
No Recursive Scanning
Could be added, but it would make the code way more complex. The current workaround: re-run Duster against each discovered directory as its own base URL.
Rate Limiting Is Approximate
We check total requests vs elapsed time. It's not precise - no token bucket algorithm, no rolling windows. But it's good enough to avoid bans.
Resume Uses Grep (Slow on Huge Scans)
Checking if a path was processed requires grepping the state file. For 500k-line wordlists, this gets slow. A proper implementation would use a hash table or database.
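A faster membership check is straightforward if you can assume Bash 4+ associative arrays (this is a sketch, not what Duster currently does): load the state file once, then every lookup is O(1) instead of a full grep.

```shell
# Load the resume state file into an associative array once at startup,
# then check membership per path without re-reading the file.
state_file=$(mktemp)
printf '%s\n' admin login backup > "$state_file"   # stand-in for the real state file

declare -A seen
while IFS= read -r line; do
    seen["$line"]=1
done < "$state_file"

function already_processed(){
    [ -n "${seen[$1]:-}" ]    # non-empty means we saw this path before
}

already_processed admin && echo "skip admin"       # in the state file
already_processed images || echo "scan images"     # not yet processed
rm -f "$state_file"
```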
Bottom line: For CTFs, practice labs, and medium-sized scans, Duster works great. For professional bug bounty or million-line wordlists, use gobuster or ffuf.
Duster vs Professional Tools
Here's the honest comparison:
Speed
- Duster: ~50-150 req/s (depending on threads and target)
- gobuster: ~500-2000 req/s
- ffuf: ~1000-5000 req/s
- feroxbuster: ~800-3000 req/s
When Duster Wins
- Portability: Just needs Bash and curl
- Modifiability: Edit and run, no compilation
- Learning: Readable code, understand how it works
- Restricted environments: Minimal dependencies
When Duster Loses
- Speed: 5-10x slower than compiled tools
- Features: No recursion, basic wildcard detection
- Scale: Struggles with million-line wordlists
- Accuracy: Simpler wildcard detection
My recommendation: Use Duster for learning, CTFs, and quick tests. Use gobuster/ffuf for professional work and large-scale scanning.
What Building This Taught Me
This project was a masterclass in things I didn't know I didn't know:
1. Concurrency Is Hard
I thought "just spawn background jobs and count them" was fine. Turns out race conditions are subtle, and they were breaking things in ways I couldn't see without careful testing.
2. Edge Cases Are Everywhere
URLs with special characters, servers that redirect everything, wildcard responses, timeouts, network errors. Real-world systems are messy in ways localhost testing doesn't reveal.
3. "Works for Me" ≠ Production Ready
v1.0 worked for my use cases because I was testing against well-behaved servers with small wordlists. Scale it up, hit a tricky server, and everything breaks.
4. AI as a Learning Tool
Claude didn't just write code for me - it explained WHY things were problems and HOW the fixes work. That's way more valuable than getting working code.
5. Simple Can Still Be Professional
You don't need 10,000 lines of code to build something production-ready. You need proper error handling, atomic operations, and awareness of edge cases. v3.0 is still under 500 lines but handles things correctly.
Final Thoughts
Is Duster v3.0 the best directory scanner? No. Is it the fastest? Definitely not. Will it replace gobuster? Not a chance.
But it's:
- Portable - Works everywhere with minimal dependencies
- Understandable - You can read and comprehend the entire codebase
- Modifiable - Want a feature? Add it in 10 minutes
- Educational - Learn how these tools work by reading real code
- Honest - No false claims about being enterprise-grade
The journey from buggy v1.0 to professional v3.0 taught me more about Bash, HTTP, concurrency, and tool development than any tutorial could. And that's the real value.
For beginners: Build your own version. Even if it's buggy at first. You'll learn way more from fixing your own bugs than from using perfect tools written by others.
Get Duster v3.0
The complete source code is available on GitHub:
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
./duster.sh -h
Read the code. Modify it. Break it. Fix it. That's how you learn.
- ⭐ Star the repo if you find it useful
- 🐛 Open issues for bugs you find
- 🔧 Submit PRs for improvements
- 📚 Use it to learn, not for production scanning
Legal Notice: This tool is for authorized testing only. Never use it against systems you don't own or have explicit written permission to test. Unauthorized access is illegal.