Building Duster: A Directory Brute-forcer in Bash
How I built a web enumeration tool from scratch and what I learned along the way
Introduction
I just wanted to build a directory brute-forcing tool in Bash - and here it is.
Duster is a directory buster written entirely in Bash. Nothing fancy, just about 300 lines of shell script that does the job. And honestly, building it taught me more about HTTP and web scanning than months of just using existing tools.
This post is going to walk through the entire thing - how it works, why I made certain decisions, and all the little details that make it actually useful. If you're learning pentesting or just want to understand what tools like gobuster are doing under the hood, this is for you.
The Problem We're Solving
When you're poking at a website, there's always hidden stuff. Admin panels, backup files, old directories that developers forgot about. These aren't linked anywhere on the site, so you can't just click around to find them.
The traditional approach is directory brute-forcing - you take a big list of common directory names and just try them all. If /admin exists, the server returns 200. If it doesn't, you get 404. Simple concept, but there are complications:
- Speed matters - Checking one path at a time would take forever. You need to check many paths simultaneously.
- Servers behave differently - Some return 403 (forbidden) for real directories, some do weird redirects, some give you 200 for everything even if it doesn't exist.
- You're looking for files too - Not just /admin but also /backup.zip, /config.php, stuff like that.
Tools like gobuster and feroxbuster solve these problems really well. They're fast, feature-rich, and battle-tested. So why build another one?
Why Build Your Own Tool?
Look, I'm not saying Duster is better than gobuster. It's not. Gobuster is written in Go, it's compiled, it's fast as hell. But here's why I wanted my own:
- It works everywhere - Any box with Bash and curl can run it. No installation, no compilation, no dependency hell.
- You understand it completely - It's 300 lines of shell script. You can read the whole thing in 10 minutes and know exactly what it's doing.
- Easy to modify - Want to add a feature? Just edit the script. No need to learn Go or recompile anything.
- Learning experience - Building it forces you to understand HTTP status codes, redirects, threading, all the stuff that makes these tools work.
Plus, sometimes you're in restricted environments. Jump boxes, minimal VMs, Docker containers. Having a tool that's just a shell script can be surprisingly useful.
Goals for This Tool
I wanted something that could:
- Control speed with threads - Not too slow, but also not hammering servers so hard you get banned.
- Handle timeouts properly - Some servers are slow. Need to wait, but not forever.
- Work with redirects intelligently - Follow them when needed, but don't spam output with useless trailing slash redirects.
- Set custom User-Agent - Some servers block obvious scanner traffic.
- Log everything properly - Save results to files with timestamps so you can review later.
Design Decisions
Why Bash + curl?
Bash is everywhere. If a Linux system exists, it probably has Bash. And curl is pretty much standard too. This means zero external dependencies. No pip install, no go get, nothing. Just download and run.
Yeah, Bash is slower than compiled languages. But for most CTF challenges and small to medium scans, it's fast enough. And the portability trade-off is worth it.
Threading with Background Jobs
Here's where it gets interesting. Bash doesn't have real threading, but it has something useful - background jobs. When you end a command with &, it runs in the background:
curl https://example.com/admin &     # Runs in background
curl https://example.com/backup &    # Also runs in background
wait                                 # Wait for all jobs to finish
The trick is controlling how many jobs run at once. If you just spawn unlimited background jobs, you'll overwhelm your system or the target server. So we need a limiter:
# Spawn a background job
worker "$directory" &

# Wait if we've hit the thread limit
while [ "$(jobs -rp | wc -l)" -ge "$THREADS" ]; do
    sleep 0.05
done
The jobs -rp command lists all running background process IDs. We count them, and if we've hit our thread limit (say, 20), we wait a tiny bit before spawning another. Simple but effective.
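You can watch the limiter work in isolation with this self-contained sketch - sleep stands in for the real curl workers, so it runs anywhere Bash exists:

```shell
#!/usr/bin/env bash
# Toy version of Duster's limiter: sleep stands in for curl workers.
THREADS=4

for i in $(seq 1 10); do
    sleep 0.2 &    # spawn a "worker"
    # Throttle: pause while THREADS or more jobs are still running
    while [ "$(jobs -rp | wc -l)" -ge "$THREADS" ]; do
        sleep 0.05
    done
done
wait    # block until every background job has finished

left=$(jobs -rp | wc -l)
echo "jobs still running after wait: $left"
```

At no point do more than four sleeps run at once, and after wait the job table is empty.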
Understanding HTTP Response Codes
This is critical for any directory scanner. Different HTTP codes tell you different things:
- 200 OK - The path exists and you can access it. This is what we're hunting for.
- 301/302 Redirect - Server is sending you somewhere else. Could be important, or could just be a trailing slash redirect.
- 403 Forbidden - The path exists, but you're not allowed to access it. Still interesting because it confirms it's real.
- 404 Not Found - Path doesn't exist. We ignore these.
- 401 Unauthorized - Needs authentication. Interesting to note.
- 500 Server Error - Something broke. Could be a real error or a sign you found something weird.
The Trailing Slash Problem
This one drove me crazy when I first started. When you request /admin and it's a directory, most web servers automatically redirect to /admin/ (with the trailing slash). This is normal behavior, but it clutters your output:
[301 REDIRECT] http://target.com/admin -> http://target.com/admin/
That's not really interesting - it's just the server being technically correct. What we actually want to know is: does /admin/ return 200 (accessible) or 403 (forbidden)?
So I added logic to detect these trailing slash redirects and automatically follow them to get the real status. Now instead of seeing a redirect, you see:
[200 FOUND] http://target.com/admin/ [DIR] (1523 bytes)
Much clearer. We know it's a directory, and we know we can access it.
Feature Breakdown
Wordlist Options
The tool needs a list of paths to try. By default, it uses the DirBuster medium wordlist (/usr/share/wordlists/dirbuster/directory-list-2.3-medium.txt) which is pretty standard on Kali and similar systems.
But you can use any wordlist you want with the -w flag. I often use different wordlists depending on the target - raft-large for thorough scans, a small custom list for quick checks.
Thread Control
The -t flag lets you control how many requests happen simultaneously. Default is 20, which is pretty reasonable:
- Low threads (5-10) - Slower but more polite. Good for production sites or when you want to stay under the radar.
- Medium threads (20-30) - Balanced. Good for most CTF challenges.
- High threads (50+) - Aggressive. Fast but can trigger rate limits or crash weak servers. Use carefully.
Extension Brute-forcing
This is one of my favorite features. A lot of interesting files aren't directories - they're things like:
- backup.zip
- config.php
- database.sql
- admin.bak
- test.old
When you enable extension checking with -x, the tool doesn't just check /admin - it also checks /admin.php, /admin.html, /admin.bak, and so on. You can customize which extensions to try with the -e flag.
Pro tip: In CTFs, I've found flags in .old, .bak, and .txt files more times than I can count. Always check extensions.
Redirect Handling
The -f flag makes curl follow redirects automatically. Sometimes you want this (to see where redirects lead), sometimes you don't (because you just want to catalog what redirects where).
By default, the tool doesn't follow redirects except for the trailing slash case. This keeps the output cleaner and lets you see the actual redirect behavior.
Output Filtering
Different situations need different output:
- Normal mode - Shows 200s, 403s, and redirects. Good for comprehensive results.
- Silent mode (-s) - Only shows 200s. Clean output when you just want to see what's accessible.
- Verbose mode (-v) - Shows everything including 404s, 405s, server errors. Good for debugging.
Timeout and User-Agent
The -T flag sets the timeout. Default is 10 seconds. If a server doesn't respond in that time, we move on. You might want longer timeouts for slow servers or shorter ones for fast networks.
The -a flag lets you set a custom User-Agent. Some servers block obvious scanner traffic (User-Agents like "curl/7.68.0"). Setting it to something browser-like can help:
./duster.sh -u https://target.com -a "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
Deep Dive into the Core Logic
Let's break down exactly how the tool works, function by function. This is where you'll really understand what's happening under the hood.
Building URLs Properly
First, we need a function that takes a base URL and a path and combines them correctly. Sounds simple, but there are edge cases:
function mkurl(){
    # Remove trailing slash from base URL if present
    local base="${url%/}"
    local path="$1"

    # Remove leading slash from path if present
    path="${path#/}"

    # Combine with a single slash
    echo "${base}/${path}"
}
Why do we need this? Because if the URL is https://example.com/ and the path is /admin, you'd get https://example.com//admin (double slash). This function ensures we always get clean URLs.
The ${url%/} syntax removes a trailing slash if present. The ${path#/} removes a leading slash. Then we combine them with a single slash. Clean and consistent.
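If you want to convince yourself, here's the same logic as a runnable standalone snippet, with a hardcoded example URL standing in for the script's global:

```shell
#!/usr/bin/env bash
# Standalone copy of mkurl with a hardcoded base URL for illustration.
url="https://example.com/"    # note the trailing slash

mkurl(){
    local base="${url%/}"     # strip trailing slash from the base
    local path="${1#/}"       # strip leading slash from the path
    echo "${base}/${path}"
}

joined=$(mkurl "/admin")
echo "$joined"    # https://example.com/admin - no double slash
```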
The Worker Function - Where the Magic Happens
This is the heart of the tool. Each worker checks one path. Here's the complete function with detailed explanations:
function worker(){
    local directory="$1"
    local local_out_file="$2"

    # Skip empty lines and comments (lines starting with #)
    [[ -z "$directory" || "$directory" =~ ^# ]] && return 0

    # Build the full URL
    local full_url=$(mkurl "$directory")

    # Set up curl options as an array
    local curl_opts=(
        -o /dev/null    # Don't save response body
        --silent        # Don't show progress
        -I              # HEAD request (faster than GET)
        -w "%{http_code}|%{size_download}|%{redirect_url}"    # Format string
        --max-time "$TIMEOUT"    # Timeout limit
        -A "$USER_AGENT"         # Custom User-Agent
    )

    # Add -L flag if following redirects is enabled
    [[ "$FOLLOW_REDIRECTS" == "true" ]] && curl_opts+=(-L)

    # Execute curl and capture the response
    local response=$(curl "${curl_opts[@]}" "$full_url" 2>/dev/null)

    # Parse the response into three variables
    IFS='|' read -r code size redirect_url <<< "$response"

    # Increment counter
    ((total_checked++))

    # Check if this is a trailing slash redirect
    local is_trailing_slash_redirect=false
    if [[ "$code" =~ ^30[1278]$ ]] && [[ "$redirect_url" == "${full_url}/" ]]; then
        is_trailing_slash_redirect=true
        # Follow it to get the real status
        local dir_response=$(curl -o /dev/null --silent -Iw "%{http_code}|%{size_download}" \
            --max-time "$TIMEOUT" -A "$USER_AGENT" "${full_url}/" 2>/dev/null)
        IFS='|' read -r code size <<< "$dir_response"
    fi

    # Now handle the response based on the code
    case "$code" in
        200)
            ((found_count++))
            if [[ "$is_trailing_slash_redirect" == "true" ]]; then
                echo -e "[200 FOUND] ${full_url}/ [DIR] (${size} bytes)"
                echo "[200 FOUND] ${full_url}/ [DIRECTORY] ($size bytes)" >> "$local_out_file"
            else
                echo -e "[200 FOUND] $full_url (${size} bytes)"
                echo "[200 FOUND] $full_url ($size bytes)" >> "$local_out_file"
            fi
            ;;
        403)
            ((forbidden_count++))
            if [[ "$is_trailing_slash_redirect" == "true" ]]; then
                echo -e "[403 FORBIDDEN] ${full_url}/ [DIR]"
                echo "[403 FORBIDDEN] ${full_url}/ [DIRECTORY]" >> "$local_out_file"
            else
                echo -e "[403 FORBIDDEN] $full_url"
                echo "[403 FORBIDDEN] $full_url" >> "$local_out_file"
            fi
            ;;
        301|302|307|308)
            # Only show if NOT a trailing slash redirect
            if [[ "$is_trailing_slash_redirect" != "true" ]]; then
                ((redirect_count++))
                echo -e "[${code} REDIRECT] $full_url -> $redirect_url"
                echo "[${code} REDIRECT] $full_url -> $redirect_url" >> "$local_out_file"
            fi
            ;;
    esac
}
Let's break down the important parts:
The curl Format String
This line is crucial: -w "%{http_code}|%{size_download}|%{redirect_url}"
The -w flag in curl lets you specify a format string for output. We're asking for three things separated by pipes:
- %{http_code} - The HTTP status code (200, 404, etc.)
- %{size_download} - Size of the response in bytes
- %{redirect_url} - Where the redirect points (if any)
So if we request /admin and get redirected, curl might output: 301|0|http://target.com/admin/
We then parse this with: IFS='|' read -r code size redirect_url <<< "$response"
This splits the string on the pipe character and assigns the three parts to three variables. Pretty elegant.
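You can test the parsing step by itself with a canned response string - no curl needed:

```shell
#!/usr/bin/env bash
# The parsing step in isolation, fed a canned curl -w style response.
response="301|0|http://target.com/admin/"

# IFS='|' applies only to this read: split on pipes into three variables
IFS='|' read -r code size redirect_url <<< "$response"

echo "$code"           # 301
echo "$size"           # 0
echo "$redirect_url"   # http://target.com/admin/
```

Note that prefixing the assignment (IFS='|' read ...) scopes the changed IFS to that one command, so the rest of the script keeps normal word splitting.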
Why Track Response Size?
Some websites are sneaky. They return 200 OK for every path you try, even if it doesn't exist. They just show a custom "not found" page that happens to have 200 status.
By tracking the response size, you can spot this pattern. If every 200 response is exactly 1024 bytes, that's suspicious. Real pages would have different sizes.
Right now the tool just shows you the sizes and you have to spot patterns manually. In the future, I might add automatic wildcard detection that tests random paths first and baselines what a fake 200 looks like.
The Trailing Slash Logic
This bit is important:
if [[ "$code" =~ ^30[1278]$ ]] && [[ "$redirect_url" == "${full_url}/" ]]; then
    is_trailing_slash_redirect=true
    # Follow it...
fi
We check two things: is the code a redirect (301, 302, 307, or 308)? AND does the redirect URL just add a trailing slash?
If both are true, this is just a standard directory redirect. We follow it immediately and report the actual status (200 or 403) instead of the redirect.
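Here's that check in isolation, wrapped in a small throwaway helper (check is just for this demo, not part of Duster) so we can feed it canned values:

```shell
#!/usr/bin/env bash
# Demo helper (not part of Duster): classify a redirect using the
# same two-part test as the worker function.
check(){
    local code="$1" full_url="$2" redirect_url="$3"
    if [[ "$code" =~ ^30[1278]$ ]] && [[ "$redirect_url" == "${full_url}/" ]]; then
        echo "trailing-slash"
    else
        echo "real-redirect"
    fi
}

a=$(check 301 "http://target.com/admin" "http://target.com/admin/")
b=$(check 302 "http://target.com/login" "http://target.com/auth/signin")
echo "$a"    # trailing-slash
echo "$b"    # real-redirect
```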
Extension Checking
When you enable extension checking, this function gets called instead of directly calling worker:
function check_with_extensions(){
    local base_path="$1"
    local local_out_file="$2"

    # First test the base path without an extension
    worker "$base_path" "$local_out_file" &

    # Wait if we've hit the thread limit
    while [ "$(jobs -rp 2>/dev/null | wc -l)" -ge "$THREADS" ]; do
        sleep 0.05
    done

    # Split the extensions string on commas into an array
    IFS=',' read -ra EXT_ARRAY <<< "$EXTENSIONS"

    # Test each extension
    for ext in "${EXT_ARRAY[@]}"; do
        worker "${base_path}.${ext}" "$local_out_file" &

        # Again, wait if we've hit the thread limit
        while [ "$(jobs -rp 2>/dev/null | wc -l)" -ge "$THREADS" ]; do
            sleep 0.05
        done
    done
}
So if your wordlist has admin and you've specified extensions php,html,txt, this function will test:
- /admin
- /admin.php
- /admin.html
- /admin.txt
Each one spawns as a background job, and we control the rate with the same thread limiting logic.
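The expansion itself is easy to demo without any network traffic - this standalone snippet shows how one wordlist entry fans out:

```shell
#!/usr/bin/env bash
# How one wordlist entry fans out when extensions are enabled.
EXTENSIONS="php,html,txt"
base_path="admin"

# Split the comma-separated list into an array
IFS=',' read -ra EXT_ARRAY <<< "$EXTENSIONS"

paths=("/${base_path}")    # the bare path is tested first
for ext in "${EXT_ARRAY[@]}"; do
    paths+=("/${base_path}.${ext}")
done

printf '%s\n' "${paths[@]}"    # /admin, /admin.php, /admin.html, /admin.txt
```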
The Main Brute-force Loop
This ties everything together. We read the wordlist line by line and spawn workers:
function bruteforce(){
    echo "[+] Starting directory bruteforce..."
    echo "[+] Target: $url"
    echo "[+] Wordlist: $wlist"
    echo "[+] Threads: $THREADS"
    echo ""

    # Read the wordlist line by line
    while IFS= read -r directory; do
        # Skip empty lines and comments
        [[ -z "$directory" || "$directory" =~ ^# ]] && continue

        # Check if extension mode is enabled
        if [[ "$CHECK_EXTENSIONS" == "true" ]]; then
            check_with_extensions "$directory" "$out_file"
        else
            # Spawn worker in background
            worker "$directory" "$out_file" &

            # Wait if we've hit the thread limit
            while [ "$(jobs -rp 2>/dev/null | wc -l)" -ge "$THREADS" ]; do
                sleep 0.05
            done
        fi
    done < "$wlist"

    # Wait for all background jobs to finish
    wait

    # Print summary
    echo ""
    echo "====================================="
    echo "          Scan Complete!"
    echo "====================================="
    echo "[+] Found (200): $found_count"
    echo "[+] Forbidden (403): $forbidden_count"
    echo "[+] Redirects: $redirect_count"
    echo "[+] Total checked: $total_checked"
    echo "[+] Results saved to: $out_file"
}
The while IFS= read -r directory loop reads the wordlist one line at a time. Setting IFS= stops read from trimming leading and trailing whitespace, and -r stops it from interpreting backslash escapes - both standard practice for reading lines verbatim.
For each line, we either call check_with_extensions (which spawns multiple workers) or spawn a single worker. In both cases, we're controlling concurrency with the thread limit check.
The final wait command is important - it blocks until all background jobs finish. Without it, the script would finish before all workers complete.
Resilience and Error Handling
Testing Connection First
Before we start a long scan, we should check if the target is even reachable. Nothing worse than letting a scan run for 10 minutes only to realize the URL was wrong:
function test_connection(){
    echo "[*] Testing connection to target..."

    # Try to connect with a short timeout
    local test_response=$(curl -o /dev/null --silent -Iw "%{http_code}" \
        --max-time 5 -A "$USER_AGENT" "$url" 2>/dev/null)

    # Check if we got a real response (curl reports 000 when it can't connect)
    if [[ -z "$test_response" || "$test_response" == "000" ]]; then
        echo "[!] Failed to connect to target."
        echo "[!] Check the URL and try again."
        exit 1
    fi

    echo "[+] Connection successful (HTTP $test_response)"
    echo ""
}
This just sends one request to the base URL with a 5-second timeout. One gotcha: curl still emits the -w format string on failure, reporting status 000, so we treat both an empty response and 000 as failure - maybe the URL is malformed, maybe the server is down, maybe there's no internet connection.
Graceful Interrupt Handling
When you press Ctrl+C, you want the scan to stop cleanly. Save what you've found so far, kill the background workers, and exit gracefully:
function ctrl_c(){
    echo ""
    echo "[!] Keyboard Interrupt detected. Cleaning up..."
    echo "[SCAN INTERRUPTED BY USER at $(date)]" >> "$out_file"

    # Kill all background jobs
    jobs -p | xargs -r kill 2>/dev/null

    # Wait for them to actually stop
    wait 2>/dev/null

    # Show partial results
    echo ""
    echo "Partial Results:"
    echo "Found: $found_count | Forbidden: $forbidden_count | Checked: $total_checked"
    exit 1
}

# Set up the trap
trap ctrl_c SIGINT
The trap command tells Bash to call our ctrl_c function when it receives SIGINT (which is what Ctrl+C sends).
We use jobs -p to get all background process IDs, then xargs kill to kill them. The -r flag to xargs means "don't run if there's no input" (in case there are no jobs running).
We still show you how many paths were checked and how many hits were found. The results file gets a note that the scan was interrupted. This way you don't lose your partial progress.
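The kill-and-reap pattern is easy to try on its own - this sketch spawns some dummy sleep jobs and cleans them up the same way:

```shell
#!/usr/bin/env bash
# The same kill-and-reap pattern with dummy jobs instead of curl workers.
sleep 30 &
sleep 30 &

jobs -p | xargs -r kill 2>/dev/null    # terminate every background job
wait 2>/dev/null                       # reap them so nothing is left behind

remaining=$(jobs -rp | wc -l)
echo "jobs still running: $remaining"
```

Without the wait, the killed processes would linger as zombies until the script itself exits.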
Usage Examples
Let's look at real-world usage patterns. These are the commands I actually use when scanning targets.
Basic Scan
./duster.sh -u https://target.com
Simplest form. Uses the default wordlist, 20 threads, and a 10-second timeout. Good for a quick check.
Scan with Extension Checking
./duster.sh -u https://target.com -x
Adds extension checking with the default list (php, html, txt, asp, aspx, jsp, bak, old, zip, tar.gz). This is my go-to command for CTFs.
Custom Extensions
./duster.sh -u https://target.com -x -e php,bak
Check only specific extensions. Use this when you know what kind of application you're dealing with. PHP app? Focus on .php and .bak files.
Aggressive Scan
./duster.sh -u https://target.com -t 50 -T 5 -v
50 threads, 5-second timeout, verbose output. This is fast but aggressive. Only use it on targets you control or have explicit permission for - it can overwhelm weak servers.
Clean Output Mode
./duster.sh -u https://target.com -s
Shows only 200 responses. Great when you just want a clean list of accessible paths without all the 403s and redirects.
Custom Wordlist
./duster.sh -u https://target.com -w /path/to/wordlist.txt -t 30
Use your own wordlist with 30 threads. I keep several specialized wordlists for different scenarios - one for API endpoints, one for common files, one for admin paths.
Stealthy Scan
./duster.sh -u https://target.com -t 5 -a "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
Low thread count, browser-like User-Agent. Trying to blend in a bit more. Still not invisible, but less obviously automated.
Tip: Always start with default settings on unknown targets. You can always run a second scan with more aggressive settings if the server handles it well.
Limitations and Next Steps
Let's be honest about what this tool can't do and where it could improve.
Wildcard 200 Responses
The biggest issue right now is servers that return 200 for everything. You request /doesnotexist12345 and get 200 with a "not found" page. This floods your results with false positives.
The tool shows you response sizes, so you can spot this pattern manually. If every hit is the same size, they're probably all fake. But I'd like to add automatic detection:
- Before the real scan, test a few random paths that definitely don't exist
- Record the status codes and sizes you get
- During the scan, filter out responses that match the wildcard pattern
This would make the tool much more accurate on tricky servers.
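As a rough sketch of what that detection could look like - hypothetical, none of this is in Duster yet - the filtering step reduces to comparing each hit against a recorded baseline:

```shell
#!/usr/bin/env bash
# Hypothetical wildcard filter - NOT implemented in Duster. The baseline
# would come from probing a random path before the scan, something like:
#   curl -o /dev/null -sw "%{http_code}|%{size_download}" "$url/nope-$RANDOM"
baseline_code="200"
baseline_size="1024"

is_wildcard(){
    # A hit matching the baseline code AND size is probably a fake 200
    [ "$1" = "$baseline_code" ] && [ "$2" = "$baseline_size" ]
}

is_wildcard 200 1024 && v1="skip" || v1="keep"
is_wildcard 200 5120 && v2="skip" || v2="keep"
echo "same as baseline: $v1 / different size: $v2"    # skip / keep
```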
No Recursive Scanning
If you find /admin/, wouldn't it be nice to automatically scan inside it? Check for /admin/backup/, /admin/config/, etc?
Right now you'd have to run a second scan manually with -u https://target.com/admin. Adding recursion would be useful but also complicates the code significantly. Maybe a future version.
Performance on Huge Wordlists
If you're using a wordlist with 1 million entries, Bash starts to show its age. Reading files line by line and spawning processes has overhead. Tools like gobuster (written in Go) will be much faster.
For most CTF and lab scenarios, it's fine. But for comprehensive bug bounty scanning, you might want a faster tool.
Size-Based Filtering
It would be nice to have command-line options like:
- --hide-size 1024 - Don't show responses that are exactly 1024 bytes
- --show-size-range 100-5000 - Only show responses between 100 and 5000 bytes
This would help filter out wildcard responses without having to visually scan through results.
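A minimal sketch of what --hide-size could look like internally (again, hypothetical - this flag doesn't exist yet):

```shell
#!/usr/bin/env bash
# Hypothetical --hide-size logic - this flag doesn't exist in Duster yet.
HIDE_SIZE=1024

show_result(){
    local size="$1"
    # Suppress results whose size exactly matches HIDE_SIZE
    [ -n "$HIDE_SIZE" ] && [ "$size" -eq "$HIDE_SIZE" ] && return 1
    return 0
}

show_result 1024 && r1="shown" || r1="hidden"
show_result 2048 && r2="shown" || r2="hidden"
echo "1024 bytes: $r1 / 2048 bytes: $r2"    # hidden / shown
```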
Probability Scoring
Advanced scanners use probability scoring. They look at:
- Response size compared to other responses
- Response headers
- Response body content
- Timing characteristics
Then they give each response a score: 90% sure it's real, 20% sure it's a wildcard, etc. This is sophisticated but really useful for automated scanning.
Implementing this in Bash would be challenging. You'd need to parse response bodies, which means using GET instead of HEAD requests (slower), and doing text analysis in shell script (painful). Probably not worth it for this tool's goals.
Important: This tool is for educational purposes and authorized testing only. Never use it against systems you don't own or have explicit written permission to test. Unauthorized access is illegal and unethical.
What I Learned
Building Duster taught me more about HTTP and web scanning than months of just using existing tools. Here are the key takeaways:
- HTTP is weird - Different servers handle the same requests differently. Understanding 301 vs 302 vs 307, when servers add trailing slashes, how Content-Length works - it all matters.
- Bash is more capable than you think - Background jobs, job control, IFS manipulation, parameter expansion. There's a lot you can do with just shell script.
- Concurrency is hard - Even with a simple thread limiter, you have to think about race conditions, cleanup, interrupt handling. It's not trivial.
- Edge cases are everywhere - Servers that redirect everything, wildcard responses, timeouts, URL encoding issues. Real-world systems are messy.
- Simple tools have value - Not everything needs to be a 50,000-line enterprise application. Sometimes a 300-line shell script does exactly what you need.
If you're learning pentesting or web security, I really recommend building something like this. You don't have to reinvent the wheel completely, but understanding how directory scanning works at this level makes you way better at using professional tools.
You'll understand why gobuster has certain flags, what feroxbuster is really doing when it detects wildcards, why different tools give different results on the same target.
Getting the Code
The full script is available on GitHub. It's about 300 lines total, fully commented. Feel free to:
- Use it as-is for CTFs and authorized testing
- Modify it for your specific needs
- Learn from it and build your own version
- Contribute improvements if you have ideas
That's the beauty of shell scripts - they're not compiled, not obfuscated, just readable code you can understand and modify.
git clone https://github.com/ShieldedDev/Duster
cd Duster
chmod +x duster.sh
./duster.sh
Next challenge: Try adding one new feature to the script. Maybe wildcard detection, maybe recursive scanning, maybe JSON output. The best way to learn is by doing.