TorCrawl.py is a command-line crawler for collecting webpage content anonymously by routing requests through the Tor network, including content from .onion sites. It suits privacy-conscious programmers and researchers who want to crawl and extract webpage content while respecting web crawling ethics.

Key Features

Anonymous web scraping via the Tor network

Supports crawling both regular and .onion webpages

Simple command-line interface with minimal arguments

Ability to crawl links recursively with configurable depth and delay

Output can be piped to other tools for further processing

Verbose mode to display detailed progress information

Option to bypass the Tor network if needed

Generates output files containing crawled links

Insights & Recommendations

The Tor service must be installed and running for the tool to route requests through Tor. Although the crawler can bypass robots.txt restrictions, users should still respect website terms of service and copyright laws. The tool is suited to responsible data gathering and privacy-focused automation.

Installation

Clone the repository: git clone https://github.com/MikeMeliz/TorCrawl.py.git
Enter the cloned directory: cd TorCrawl.py
Install the dependencies: pip install -r requirements.txt
Install and start the Tor service for your OS:
For Debian/Ubuntu: apt-get install tor
Start the Tor service: service tor start
For Windows: download tor.exe
Install the Tor service: tor.exe --service install
Start the Tor service: tor.exe --service start
For macOS: brew install tor
Start the Tor service: brew services start tor
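
Once the service has been started, a quick check can confirm that something is listening where Tor normally exposes its SOCKS proxy. The sketch below is not part of TorCrawl.py; it assumes Tor's stock default of 127.0.0.1:9050, which a custom configuration may change, and the function name is illustrative only.

```python
import socket

# Assumption: Tor's SOCKS proxy uses its stock default address and port.
TOR_SOCKS_HOST = "127.0.0.1"
TOR_SOCKS_PORT = 9050


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if is_port_open(TOR_SOCKS_HOST, TOR_SOCKS_PORT):
        print("A service is listening on the assumed Tor SOCKS port.")
    else:
        print("Nothing is listening on the assumed Tor SOCKS port; is Tor running?")
```

A successful connection only shows that the port is open, not that the listener is actually Tor, so treat it as a first sanity check.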

Usage

torcrawl -u http://www.github.com/ | grep 'google-analytics'

Scrape the specified webpage anonymously through Tor and filter output for 'google-analytics' entries.
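
When a plain text match is too coarse, the piped HTML can be parsed instead. The following is a minimal sketch, separate from TorCrawl.py, that uses Python's standard html.parser to list script URLs containing a given substring. The class and function names are hypothetical, and the sample HTML is made up for illustration rather than taken from real tool output.

```python
from html.parser import HTMLParser
from typing import List


class ScriptSrcCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_script_sources(html: str, needle: str) -> List[str]:
    """Return script URLs in html whose src contains needle."""
    parser = ScriptSrcCollector()
    parser.feed(html)
    parser.close()
    return [src for src in parser.sources if needle in src]


# Made-up HTML for illustration; not real torcrawl output.
sample = (
    "<html><head>"
    '<script src="https://www.google-analytics.com/analytics.js"></script>'
    '<script src="/static/app.js"></script>'
    "</head></html>"
)
print(find_script_sources(sample, "google-analytics"))
```

In a pipeline, the same function could be fed sys.stdin.read() in place of the sample string.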

torcrawl -v -u http://www.github.com/ -c -d 2 -p 2

Crawl the webpage in verbose mode (-v) with link crawling enabled (-c), to a depth of 2 levels (-d 2), pausing 2 seconds between requests (-p 2).
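
To make the depth and delay options concrete, here is a minimal sketch of a depth-limited, breadth-first crawl with a pause between requests. It is not TorCrawl.py's actual implementation. The fetch_links callback is a hypothetical stand-in for downloading a page and extracting its links, and the depth semantics (depth 1 means links found on the start page) are an assumption.

```python
import time
from collections import deque
from typing import Callable, Iterable, List


def crawl(
    start_url: str,
    fetch_links: Callable[[str], Iterable[str]],
    max_depth: int = 1,
    pause: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Breadth-first crawl from start_url, collecting links up to max_depth levels.

    fetch_links is a caller-supplied (hypothetical) function returning the
    links found on a page; pause is the delay in seconds between fetches.
    """
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    first_fetch = True
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # Pages at the depth limit are recorded but not fetched.
        if not first_fetch:
            sleep(pause)  # Wait between consecutive requests.
        first_fetch = False
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited


# Made-up link graph standing in for real pages.
site = {"a": ["b", "c"], "b": ["d"], "c": ["a"], "d": ["e"]}
print(crawl("a", lambda url: site.get(url, []), max_depth=2))
```

Page "e" is never reached here because it sits three levels from the start page, beyond the depth limit of 2.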

Smart Usage Notes

Integrate TorCrawl.py with threat intelligence platforms to automate anonymous data collection on emerging threats.

Use the tool in red team exercises to simulate adversary reconnaissance via anonymized web crawling.

Chain with vulnerability scanners to enhance stealthy target discovery in penetration testing.

Deploy in blue team environments for continuous monitoring of dark web sites and onion services related to the organization.

Incorporate delay and depth parameters to mimic human browsing patterns and avoid detection or rate limiting.
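
For the continuous-monitoring note above, one simple approach is to compare the links collected in successive crawls and report what is new. The sketch below assumes each crawl's links are available as text with one link per line. That format, the function names, and the example addresses are all assumptions for illustration.

```python
from typing import Iterable, List, Set


def parse_links(text: str) -> Set[str]:
    """Parse one link per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def new_links(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return links present in current but not in previous, sorted for stable output."""
    return sorted(set(current) - set(previous))


# Made-up link lists standing in for two saved crawl outputs.
yesterday = parse_links("http://example.onion/\nhttp://example.onion/about\n")
today = parse_links(
    "http://example.onion/\nhttp://example.onion/about\nhttp://example.onion/news\n"
)
print(new_links(yesterday, today))
```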

