🌐 What is Photon?

Photon is a lightning-fast, smart web crawler built specifically for OSINT tasks. It's not just a URL scraper; it's a powerful weapon for bug bounty hunters, ethical hackers, and red teamers to extract:

  • ✅ Emails & social links
  • ✅ Endpoints & APIs
  • ✅ JS files & secrets
  • ✅ Hidden files (PDFs, ZIPs)
  • ✅ Wayback Machine archives

⚠️ Note: Use Photon only on websites you own or have explicit permission to test. Unauthorized scans may violate cyber laws like CFAA.


🚀 Why Use Photon?

Feature | Benefit
🧠 Intelligent Crawling | Finds internal/external URLs, APIs, scripts, secrets
🔍 JS Parsing | Uncovers hidden endpoints from JavaScript files
⌛ Wayback Support | Crawls archive.org data for old/exposed pages
🔑 Regex Intelligence | Detects secrets like keys, passwords, tokens
💾 Export Flexibility | Supports .json, .txt, .csv formats for automation & scripting
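
For example, once installed (see below), a single run can combine several of these features. A hedged sketch, where the target, output directory, and regex pattern are all placeholders:

# Sketch: crawl example.com, pull archive.org URLs, hunt for secret keys,
# match a custom secret pattern, and export the results as JSON
python3 photon.py -u https://example.com \
    --wayback --keys \
    -r "api_key=[A-Za-z0-9]+" \
    -e json -o example_scan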

πŸ› οΈ Installation on Kali Linux / Ubuntu

git clone https://github.com/s0md3v/Photon.git
cd Photon
pip3 install -r requirements.txt
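
Once cloned, Photon runs straight from the repository directory. A minimal first crawl (the target URL and output directory are placeholders) looks like this:

# Crawl example.com two levels deep and write results to example_scan/
python3 photon.py -u https://example.com -l 2 -o example_scan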

Option 2: Using pip (Global Install)

pip3 install photon

🆕 Photon v1.2.2 Help Menu (Kali Linux)

When you run `python3 photon.py -h` (or `photon -h` if installed globally), you'll see:

      ____  __          __
     / __ \/ /_  ____  / /_____  ____                                                                           
    / /_/ / __ \/ __ \/ __/ __ \/ __ \                                                                          
   / ____/ / / / /_/ / /_/ /_/ / / / /                                                                          
  /_/   /_/ /_/\____/\__/\____/_/ /_/ v1.2.2                                                                    

usage: photon.py [-h] [-u ROOT] [-c COOK] [-r REGEX] [-e {csv,json}] [-o OUTPUT] [-l LEVEL] [-t THREADS]
                 [-d DELAY] [-v] [-s SEEDS [SEEDS ...]] [--stdout STD] [--user-agent USER_AGENT]
                 [--exclude EXCLUDE] [--timeout TIMEOUT] [--clone] [--headers] [--dns] [--keys] [--only-urls]
                 [--wayback]

📌 Commonly Used Options (Updated)

Option | Description
-u, --url | Root URL to crawl
-o, --output | Output directory for scan results
-t, --threads | Number of parallel threads
-d, --delay | Delay between requests (in seconds)
-v, --verbose | Verbose output
-c, --cookie | Add a session cookie (for logged-in scans)
-r, --regex | Custom regex pattern to match secrets
-e, --export | Export format: csv or json
-l, --level | Crawl depth level
-s, --seeds | Add multiple seed URLs manually
--keys | Automatically search for secrets like tokens and AWS keys
--dns | Enumerate subdomains and collect DNS data
--clone | Clone the full site locally
--only-urls | Extract only URLs and ignore metadata
--wayback | Fetch URLs from the Wayback Machine (archive.org)
--headers | Add custom headers to requests
--timeout | HTTP request timeout (in seconds)
--user-agent | Set a custom User-Agent string
--stdout | Print results to stdout
--exclude | Skip URLs matching this regex
-h, --help | Show help and exit
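
Combining a few of these, a tuned crawl might look like the following sketch (all values are illustrative):

# Depth 3, 10 threads, 1-second delay, 5-second timeout,
# a browser-like User-Agent, and skip any URL matching "logout"
python3 photon.py -u https://example.com \
    -l 3 -t 10 -d 1 --timeout 5 \
    --user-agent "Mozilla/5.0" \
    --exclude "logout" \
    -o example_scan -v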

πŸ” Real-World OSINT Examples

πŸ•΅οΈ Find Emails

photon.py -u https://target.com --emails

πŸ“Œ Output:

contact@target.com  
admin@target.com
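
To pull just the email addresses out of the intel file (the target_scan/ directory matches the -o value used above), a quick grep works:

# Extract anything that looks like an email address and de-duplicate it
grep -Eio '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' target_scan/intel.txt | sort -u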

πŸ•ΈοΈ Discover API Endpoints

photon.py -u https://target.com --js

πŸ“Œ Output:

/v1/user/profile  
/v1/admin/dashboard
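
As the output above shows, these endpoints are relative paths, so prefixing them with the target origin (host and directory names are placeholders) turns them into testable URLs:

# Turn relative endpoints into full URLs for later testing
sed 's|^|https://target.com|' target_scan/endpoints.txt > target_scan/full_endpoints.txt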

πŸ•°οΈ Wayback Recon

photon.py -u https://oldsite.com --wayback

πŸ“Œ Output:

/old-login.html  
/dev-test.php
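
Many archived paths no longer exist on the live site. A simple curl loop over the crawl results (the oldsite_scan/ directory matches the -o value above, and internal.txt is just one possible source of URLs) shows which still respond:

# Print the HTTP status code for every discovered URL
while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  echo "${code} ${url}"
done < oldsite_scan/internal.txt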

📂 Output Files Explained

File | What It Contains
endpoints.txt | Dynamic paths, API URLs
external.txt | External links and domains
files.txt | Downloadable files like .zip, .pdf, .docx
fuzzable.txt | Parameterized URLs (?id=, ?page=)
intel.txt | Secrets, emails, AWS keys
internal.txt | Internal pages found
scripts.txt | JavaScript files for analysis
robots.txt | robots.txt content (disallowed paths)
report.json | Full report in JSON
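
A quick way to triage a finished scan (the target_scan/ directory name follows whatever was passed to -o; the AKIA pattern matches AWS access key IDs) is to count results per file and grep the intel output for obvious secrets:

# How many entries did the crawl collect in each category?
wc -l target_scan/*.txt

# Look for AWS access key IDs and other obvious secret markers
grep -E 'AKIA[0-9A-Z]{16}' target_scan/intel.txt
grep -Ei 'token|passwd|password|secret' target_scan/intel.txt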

To put this into practice on a site you own:

1. Run Photon on your own blog:
   python3 photon.py -u https://yourblog.com -o output

2. Open `internal.txt` to find hidden pages

3. Use `intel.txt` to see if secrets/emails leak

4. Load `scripts.txt` into LinkFinder for JS analysis (see the sketch below)
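
A minimal sketch for step 4, assuming LinkFinder is cloned at ~/tools/LinkFinder and the scan above wrote its results to output/:

# Run LinkFinder against every JavaScript file Photon discovered
while read -r js; do
  python3 ~/tools/LinkFinder/linkfinder.py -i "$js" -o cli
done < output/scripts.txt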

🧱 Defensive Tips for Admins

If you’re on the blue team or a web developer, here’s how to defend against Photon-like crawlers:

Tip | Purpose
Block bots via robots.txt | Discourages well-behaved crawlers (advisory only, not enforced)
Rate-limit requests | Slows down and helps detect automated scanning
Obfuscate JS code | Hides internal logic and endpoints
Monitor User-Agent headers | Scripted clients often stand out through non-browser User-Agent strings
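
On the detection side, even a quick pass over the web server's access log (a standard nginx combined-format log is assumed here) will surface clients that request an unusual number of URLs or advertise scripted User-Agents:

# Top requesters by IP: aggressive crawlers stand out by volume
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head

# Requests whose User-Agent looks like a script rather than a browser
grep -Ei 'python-requests|python-urllib|curl|wget' /var/log/nginx/access.log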

🧠 Pro Tips

✅ Combine Photon with these tools (a chaining sketch follows the list):

  • LinkFinder - JS endpoint discovery
  • Waybackurls - historical URLs
  • ffuf - fuzz vulnerable parameters
  • nuclei - scan with pre-made templates
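
A hedged sketch of one such pipeline, where Photon gathers live URLs, waybackurls adds historical ones, and nuclei scans the combined list (the target, output directory, and file names are placeholders):

# 1. Crawl with Photon (results land in target_scan/)
python3 photon.py -u https://target.com -o target_scan

# 2. Collect historical URLs from the Wayback Machine
echo target.com | waybackurls > target_scan/wayback_urls.txt

# 3. Merge, de-duplicate, and scan everything with nuclei's default templates
cat target_scan/internal.txt target_scan/wayback_urls.txt | sort -u > target_scan/all_urls.txt
nuclei -l target_scan/all_urls.txt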

🧡 Final Thoughts

Photon isn't just a scanner; it's a recon power tool that fits right into any red teamer's or bug bounty hunter's workflow. From scraping emails to digging deep into JavaScript, Photon saves hours of manual recon and gets you directly to the data that matters.


📢 Like this blog? Share it! 🧠 Follow me on LinkedIn 📚 Explore more on My GitHub Blog


#OSINT #PhotonTool #BugBounty #Recon #CyberSecurity #WebScraping #EthicalHacking #RajkumarBlogs