OSINT Methods: Tools and Techniques for Open Source Intelligence
OSINT (Open Source Intelligence) refers to the systematic collection and analysis of publicly available information for security and reconnaissance purposes. This article explains OSINT methods for corporate research: DNS enumeration (dnsx, amass, subfinder), Google Dorking, Shodan/Censys, Certificate Transparency, social media OSINT, WHOIS analysis, and passive reconnaissance frameworks such as Maltego and SpiderFoot.
OSINT (Open Source Intelligence) is the foundation of a professional penetration test: before actively attacking a system, an attacker often spends hours or days gathering publicly available information. The goal is to obtain as complete a picture of the attack surface as possible, ideally without sending a single packet to the target network. What attackers can see, defenders must know as well.
OSINT Framework and Phases
OSINT Reconnaissance Framework:
Passive OSINT (without direct interaction with the target):
→ DNS information from public sources
→ WHOIS data
→ Certificate Transparency logs
→ Google Dorking
→ Shodan/Censys (cached scans)
→ Social media and company websites
→ Job postings (technology stack identifiable!)
→ Pastebin / Dark Web leaks
→ Code repositories (GitHub, GitLab)
Semi-Passive OSINT (not directly identifiable):
→ DNS resolution of public records (A, MX, SPF, DMARC)
→ Certificate Transparency log queries
→ Wayback Machine (archive.org)
→ Web crawling with passive fingerprinting
Active reconnaissance (direct, identifiable):
→ Port scanning (nmap)
→ Banner grabbing
→ Web technology fingerprinting
→ WAF detection
→ Subdomain brute force (active, sending DNS requests)
OSINT Framework Categories:
osintframework.com: Categorized directory of OSINT tools
Categories: Username, Email, Domain, IP/Network, Social Media,
Dark Web, Documents, Images, Phone, Business
DNS Enumeration
DNS Enumeration - Discovering subdomains and infrastructure:
Passive Subdomain Enumeration (without sending DNS queries to the target):
subfinder (Project Discovery):
subfinder -d example.com -o subdomains.txt
subfinder -d example.com -all -recursive -o subdomains.txt
# Sources: Certificate Transparency, VirusTotal, Shodan, etc.
amass (OWASP):
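amass enum -passive -d example.com -o amass-passive.txt
# Note: -active is no longer passive; it contacts the target's infrastructure (zone transfer attempts, certificate name grabs):
amass enum -active -d example.com -o amass-active.txt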
amass db -names -d example.com # From amass's own database
Certificate Transparency Logs:
# crt.sh - Query all certificates for a domain (%25 is the URL-encoded % wildcard):
curl -s "https://crt.sh/?q=%25.example.com&output=json" | \
jq -r '.[].name_value' | \
sort -u | grep -v '\*'
# crt.sh query for deeper subdomain levels (e.g. a.b.example.com):
curl -s "https://crt.sh/?q=%25.%25.example.com&output=json" | \
jq -r '.[].name_value' | sort -u
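The same post-processing can be done in a script instead of jq. The following is a minimal Python sketch, assuming the crt.sh JSON response has already been downloaded into a string and that each entry carries a `name_value` field with one or more newline-separated names (as the jq filter above implies). The function name `hostnames_from_crtsh` and the sample data are illustrative only.

```python
import json

def hostnames_from_crtsh(raw_json: str, domain: str) -> list:
    """Return sorted, unique hostnames under `domain` from a crt.sh JSON response."""
    names = set()
    for entry in json.loads(raw_json):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Keep the parent of a wildcard entry instead of discarding it
            if name.startswith("*."):
                name = name[2:]
            # Ignore names that do not belong to the target domain
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

# Illustrative sample shaped like a crt.sh response (not real data)
sample = json.dumps([
    {"name_value": "www.example.com\nmail.example.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "WWW.example.com"},
    {"name_value": "unrelated.example.org"},
])
subdomains = hostnames_from_crtsh(sample, "example.com")
print(subdomains)
```

Unlike the `grep -v` variant, this sketch keeps the parent name of a wildcard certificate (`*.dev.example.com` becomes `dev.example.com`), which is often a useful lead in its own right.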
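DNS Resolution and Brute Force (active):
# dnsx: resolve the collected subdomains and keep those with A records
dnsx -l subdomains.txt -a -resp -o resolved.txt
# puredns: brute force with a wordlist and a list of resolvers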
puredns bruteforce wordlist.txt example.com -r resolvers.txt
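Conceptually, a DNS brute force combines each wordlist entry with the target domain and keeps the names that resolve. The sketch below illustrates that loop in Python; it is not how puredns works internally. The resolver is injectable so the logic can be tried offline, and `system_resolver`, `bruteforce_subdomains`, and the fake zone are hypothetical names and data. Wildcard DNS detection, which dedicated tools handle, is deliberately left out.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

def system_resolver(hostname: str) -> Optional[str]:
    """Resolve via the OS resolver. This sends real DNS queries (active recon)."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None

def bruteforce_subdomains(
    domain: str,
    words: Iterable[str],
    resolve: Callable[[str], Optional[str]] = system_resolver,
) -> Dict[str, str]:
    """Return {hostname: address} for every wordlist candidate that resolves."""
    found = {}
    for word in words:
        label = word.strip().lower()
        if not label or label.startswith("#"):
            continue  # skip blank lines and comments in the wordlist
        candidate = f"{label}.{domain}"
        address = resolve(candidate)
        if address is not None:
            found[candidate] = address
    return found

# Offline demonstration with a fake resolver (hypothetical data, no network traffic)
fake_zone = {"www.example.com": "203.0.113.10", "vpn.example.com": "203.0.113.20"}
demo = bruteforce_subdomains("example.com", ["www", "mail", "VPN ", ""], fake_zone.get)
print(demo)
```

Calling `bruteforce_subdomains("example.com", words)` without the third argument would use the system resolver and therefore generate DNS traffic that the target's nameservers can log.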
Analyze DNS record types:
# MX records (email infrastructure):
dig MX example.com
# → Google Workspace? Microsoft 365? Own mail server?
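# SPF, DMARC, DKIM:
dig +short TXT example.com | grep -i spf
dig +short TXT _dmarc.example.com
# DKIM needs the selector name (SELECTOR is a placeholder, e.g. taken from a received mail header):
dig +short TXT SELECTOR._domainkey.example.com
# → Shows email security configuration (or lack thereof!)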
# NS records (nameservers):
dig NS example.com
# → Cloudflare? AWS Route 53? Own NS?
# Attempt an AXFR zone transfer (often not allowed, but try anyway):
dig @ns1.example.com example.com AXFR
# If successful: ALL DNS records for the domain!
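The SPF and DMARC records returned by the dig queries above can be evaluated automatically. The following Python sketch applies a few simple heuristics to the raw TXT strings; the function names and the findings wording are assumptions for illustration, and the checks are far from a complete validation (for example, SPF `redirect=` is not followed).

```python
from typing import Dict, List, Optional

def parse_tag_record(record: str) -> Dict[str, str]:
    """Split a 'tag=value; tag=value' TXT record (DMARC syntax) into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def assess_email_security(spf: Optional[str], dmarc: Optional[str]) -> List[str]:
    """Return a list of human-readable weaknesses found in SPF/DMARC records."""
    findings = []
    if not spf:
        findings.append("no SPF record")
    else:
        mechanisms = spf.lower().split()
        if "+all" in mechanisms or "all" in mechanisms:
            findings.append("SPF allows any sender (+all)")
        elif "-all" not in mechanisms:
            findings.append("SPF has no hard fail (-all)")
    if not dmarc:
        findings.append("no DMARC record")
    else:
        # Treating a missing p= tag like p=none is an assumption of this sketch
        policy = parse_tag_record(dmarc).get("p", "none").lower()
        if policy == "none":
            findings.append("DMARC policy is monitor-only (p=none)")
    return findings

# Illustrative records (not taken from a real domain)
weak = assess_email_security(
    "v=spf1 include:_spf.google.com ~all",
    "v=DMARC1; p=none; rua=mailto:dmarc@example.com",
)
strong = assess_email_security("v=spf1 mx -all", "v=DMARC1; p=reject")
print(weak, strong)
```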
Google Dorking
Google Dorking - Sensitive information via search engines:
Basic operators:
site:example.com → Only this domain
filetype:pdf → Only PDFs
inurl:/admin → URL contains /admin
intitle:"Index of /" → Directory listings
intext:"confidential" → Text in the document
-site:www.example.com → Exclude this subdomain
Practical dorking combinations:
# Find subdomains:
site:*.example.com -site:www.example.com
# Login pages:
site:example.com inurl:login OR inurl:signin OR inurl:auth
# Configuration files:
site:example.com filetype:env OR filetype:config OR filetype:cfg
# Error messages with stack traces:
site:example.com "stack trace" OR "exception" OR "debug"
# Open redirects:
site:example.com inurl:redirect= OR inurl:url= OR inurl:return=
# Passwords (often in old files):
site:example.com filetype:txt password OR username
# Backup files:
site:example.com filetype:bak OR filetype:backup OR filetype:sql
# Exposed .git:
site:example.com inurl:/.git/config
# phpinfo():
site:example.com inurl:phpinfo.php
# Jenkins/CI-CD:
site:example.com inurl:jenkins OR inurl:gitlab OR inurl:bitbucket
Google Dorks Automation:
# ghdb-scraper (Google Hacking Database):
python3 ghdb.py -q "site:example.com" -d "vulnerability"
# GoogD0rker:
python3 googd0rker.py -q "site:example.com" -t web
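Independently of any particular tool, the dork combinations listed above can be generated per target from templates. The Python sketch below only builds query strings and search URLs; it does not send requests, since automated querying may conflict with the search engine's terms of service. The template names and helper functions are illustrative assumptions.

```python
from urllib.parse import quote_plus

# Templates taken from the dork combinations in this section
DORK_TEMPLATES = {
    "login_pages": "site:{domain} inurl:login OR inurl:signin OR inurl:auth",
    "backup_files": "site:{domain} filetype:bak OR filetype:backup OR filetype:sql",
    "exposed_git": "site:{domain} inurl:/.git/config",
    "phpinfo": "site:{domain} inurl:phpinfo.php",
}

def build_dorks(domain: str) -> dict:
    """Fill every template with the target domain."""
    return {name: tpl.format(domain=domain) for name, tpl in DORK_TEMPLATES.items()}

def search_url(query: str) -> str:
    """Return a Google search URL for manual review in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)

dorks = build_dorks("example.com")
for name, query in dorks.items():
    print(f"{name}: {search_url(query)}")
```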
Shodan and Censys
Shodan - Search engine for connected devices:
Basic Shodan searches:
# Direct IP address:
net:203.0.113.1
# By organization / ASN:
org:"Example GmbH"
asn:AS12345
# By technology:
product:"Microsoft IIS"
product:"Apache httpd" version:"2.4.51"
# By port/service:
port:3389 org:"Example GmbH" # RDP exposed by the organization
port:22 country:DE # SSH in Germany
# Certificate information:
ssl.cert.subject.cn:"*.example.com"
ssl.cert.expired:true org:"Example GmbH" # Expired certificates!
Shodan CLI:
# Installation:
pip install shodan
shodan init YOUR_API_KEY
# Domain info:
shodan domain example.com
# IP info:
shodan host 203.0.113.1
# Export search results (Shodan filters expect double quotes around values with spaces):
shodan search --fields ip_str,port,org --limit 1000 \
'org:"Example GmbH"' > shodan_results.txt
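The exported results file can then be summarized per host. The Python sketch below assumes the file contains one result per line with tab-separated fields in the order `ip_str`, `port`, `org` (the CLI's default separator is assumed here), and flags the services this article singles out (RDP, MongoDB, Elasticsearch, Redis). Function names and sample lines are hypothetical.

```python
from collections import defaultdict

# Ports mentioned in this article as typical exposed services
SENSITIVE_PORTS = {3389: "RDP", 27017: "MongoDB", 9200: "Elasticsearch", 6379: "Redis"}

def ports_by_host(lines):
    """Group open ports by IP from tab-separated 'ip, port, org' lines."""
    hosts = defaultdict(set)
    for line in lines:
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank or malformed lines
        hosts[fields[0]].add(int(fields[1]))
    return {ip: sorted(ports) for ip, ports in hosts.items()}

def sensitive_services(hosts):
    """Return only hosts exposing one of the sensitive ports, with service labels."""
    report = {}
    for ip, ports in hosts.items():
        labels = [SENSITIVE_PORTS[p] for p in ports if p in SENSITIVE_PORTS]
        if labels:
            report[ip] = labels
    return report

# Illustrative lines shaped like the exported file (not real scan data)
sample_lines = [
    "203.0.113.1\t443\tExample GmbH\t\n",
    "203.0.113.1\t3389\tExample GmbH\t\n",
    "203.0.113.7\t9200\tExample GmbH\t\n",
    "\n",
]
hosts = ports_by_host(sample_lines)
exposed = sensitive_services(hosts)
print(hosts, exposed)
```

With a real export, `ports_by_host(open("shodan_results.txt"))` would feed the file line by line into the same functions.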
Censys (Alternative, more SSL/TLS focus):
search.censys.io or censys.io/api
# Python API:
from censys.search import CensysHosts
h = CensysHosts()
results = h.search("ip_addresses.reverse_dns.reverse_dns:'example.com'")
# All IPs of an organization:
results = h.search("autonomous_system.organization_id:'NNNNNN'")
# Certificates:
from censys.search import CensysCertificates
c = CensysCertificates()
certs = c.search("parsed.subject.common_name:'*.example.com'")
What you'll find:
→ Forgotten test servers (example-test.example.com:8080)
→ Expired certificates and outdated TLS/server software (known CVEs)
→ Exposed admin panels (Grafana, Jenkins, Kibana)
→ Network devices that may still run with default credentials
→ Exposed databases (MongoDB, Elasticsearch, Redis)
GitHub and Code Repositories
GitHub OSINT - Finding source code leaks:
GitHub search:
# Search for organization:
org:example-gmbh
# Sensitive files:
org:example-gmbh filename:.env
org:example-gmbh filename:config.json password
org:example-gmbh filename:docker-compose.yml
# API keys in the code:
org:example-gmbh "AWS_SECRET_ACCESS_KEY"
org:example-gmbh "AKIA" # AWS Key Prefix
org:example-gmbh "-----BEGIN RSA PRIVATE KEY-----"
org:example-gmbh "ghp_" # GitHub Personal Access Token
Automated GitHub search:
trufflehog (Secrets scanner):
trufflehog github --org=example-gmbh
trufflehog git https://github.com/example-gmbh/repo
gitleaks:
gitleaks detect --source . --report-path leaks.json
# --source expects a local path, so clone remote repositories first:
git clone https://github.com/example-gmbh/repo && gitleaks detect --source repo
gitrob:
gitrob analyze --github-access-token TOKEN \
--organization example-gmbh
Analyze commits:
# Search git log (locally):
git log --all -p | grep -i "password\|secret\|key\|token"
# git-secrets (prevents secrets from being committed):
git secrets --install
git secrets --register-aws
Common Leaks:
→ AWS Access Keys (AKIA... prefix)
→ Private SSH/TLS keys
→ Database passwords in .env files
→ API keys (Stripe, Twilio, SendGrid, etc.)
→ JWT secrets (HMAC key in plain text)
→ Hardcoded production credentials in tests
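Scanners such as trufflehog and gitleaks rely largely on pattern matching for the leak types listed above. The Python sketch below shows the idea with three regular expressions (AWS access key IDs, classic GitHub personal access tokens, PEM private key headers). It is a simplified illustration, not the rule set of either tool, and the sample text uses placeholder values only.

```python
import re

# Simplified patterns for three leak types named in this section
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every line matching a secret pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits

# Placeholder content: the AWS key is the documentation example key,
# the GitHub token is a dummy string of the right length
sample = "\n".join([
    "DEBUG=true",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "GITHUB_TOKEN=ghp_" + "a" * 36,
    "-----BEGIN RSA PRIVATE KEY-----",
])
hits = scan_text(sample)
print(hits)
```

The same function could be applied to the output of `git log --all -p` to cover secrets that were committed and later removed.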
Social Engineering Reconnaissance
Social Media and Corporate OSINT:
LinkedIn:
→ List of employees at the target organization
→ Technology stack from job postings!
"Seeking Python developers with Django and AWS experience"
→ Reveals technology stack
→ Derive email format (firstname.lastname@example.com?)
→ Organizational chart / decision-making structures
LinkedIn OSINT Tools:
linkedin2username: Employee lists → Generate email list
ScrapedIn / ProspectIn: Automated LinkedIn scraping
Email Enumeration:
hunter.io:
# API:
curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=KEY"
→ Email format + known email addresses
emailhippo / verifalia:
→ Check email validity without sending
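Once employee names and a likely address format are known, candidate addresses can be generated for later validation. The Python sketch below shows one way to do this; the format names, the `candidate_emails` helper, and the sample name are assumptions for illustration. Accent folding is simplistic (for example, "ü" becomes "u" rather than "ue", and "ß" is left unchanged), so real-world lists need locale-specific handling.

```python
import unicodedata

# Common corporate address formats (illustrative selection)
FORMATS = {
    "first.last": "{first}.{last}",
    "flast": "{f}{last}",
    "first": "{first}",
}

def ascii_fold(text: str) -> str:
    """Lowercase and strip combining accents (e.g. 'Müller' -> 'muller')."""
    normalized = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in normalized if not unicodedata.combining(ch)).lower()

def candidate_emails(full_name: str, domain: str) -> list:
    """Build one candidate address per format from 'First [Middle] Last'."""
    parts = ascii_fold(full_name).split()
    if len(parts) < 2:
        return []  # need at least a first and a last name
    first, last = parts[0], parts[-1]
    return [
        pattern.format(first=first, last=last, f=first[0]) + "@" + domain
        for pattern in FORMATS.values()
    ]

candidates = candidate_emails("Erika Müller", "example.com")
print(candidates)
```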
WHOIS / Domain Registration:
whois example.com
# → Registrant (often anonymized), registrar, creation date
# → For older registrations: real contact information!
# Reverse WHOIS (same owner → other domains):
viewdns.info/reversewhois/?q=admin@example.com
# Domain history:
domaintools.com / whoisology.com
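WHOIS output is free-form text, and its layout varies by registry. For the common "Key: Value" layout, a small parser is enough to extract registrar, creation date, and nameservers. The Python sketch below assumes that layout; the sample text is illustrative and not real registry output.

```python
def parse_whois(text: str) -> dict:
    """Parse 'Key: Value' lines into {key: [values]}; keys are lowercased."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, registry comments, and lines without a key
        if not line or line.startswith(("%", "#", ">>>")) or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if value:  # ignore redacted or empty fields
            record.setdefault(key, []).append(value)
    return record

# Illustrative WHOIS-style text (hypothetical values)
sample_whois = """
% This is a registry comment
Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Name Server: NS1.EXAMPLE.NET
Name Server: NS2.EXAMPLE.NET
Registrant Organization:
"""
whois_record = parse_whois(sample_whois)
print(whois_record["registrar"], whois_record["name server"])
```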
Wayback Machine:
# Old versions of the website:
web.archive.org/web/*/example.com
# Older versions often contain:
→ Old employee pages with real email addresses
→ Previous technology versions (older CMS, etc.)
→ Deleted sensitive pages (internal, documentation)
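Archived URLs can also be listed programmatically through the Wayback Machine's CDX API. The Python sketch below builds a query URL and parses a JSON response whose first row is the header; it performs no network request itself. The parameter selection and the sample response are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(domain: str, limit: int = 500) -> str:
    """Build a CDX query listing archived URLs for a domain and its subdomains."""
    params = {
        "url": domain,
        "matchType": "domain",       # include subdomains
        "output": "json",
        "fl": "timestamp,original",  # fields to return
        "collapse": "urlkey",        # one row per unique URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)

def parse_cdx(raw_json: str) -> list:
    """Convert a CDX JSON response (header row + data rows) into dicts."""
    rows = json.loads(raw_json)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

# Illustrative response shaped like CDX JSON output (not real archive data)
sample_response = json.dumps([
    ["timestamp", "original"],
    ["20150101000000", "http://example.com/team.html"],
    ["20180601000000", "http://intern.example.com/docs/"],
])
url = cdx_query_url("example.com")
snapshots = parse_cdx(sample_response)
print(url)
print(snapshots)
```

Each snapshot can then be opened at `web.archive.org/web/<timestamp>/<original>` to review the archived page.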
Analyze job postings:
# Information from job postings:
"Experience with Fortinet FortiGate preferred" → Firewall vendor known!
"AWS Certified Solutions Architect" → AWS as cloud provider
"SIEM experience with Splunk" → SIEM system known
"Knowledge of SAP ERP" → Business software
→ Attacker knows the technology stack before the first attack!
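The mapping from job posting wording to technology stack can be sketched as a simple keyword lookup. The keyword table below mirrors the four examples above; a real assessment would need a much larger, curated list. All names are illustrative assumptions.

```python
import re

# Keyword -> inferred technology, mirroring the examples in this section
TECH_KEYWORDS = {
    "fortigate": "Firewall: Fortinet FortiGate",
    "aws": "Cloud: AWS",
    "splunk": "SIEM: Splunk",
    "sap": "Business software: SAP",
}

def technologies_in_posting(text: str) -> list:
    """Return the technologies whose keyword appears as a whole word in the text."""
    found = []
    for keyword, label in TECH_KEYWORDS.items():
        if re.search(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
            found.append(label)
    return found

posting = "Experience with Fortinet FortiGate preferred. SIEM experience with Splunk."
stack = technologies_in_posting(posting)
print(stack)
```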
OSINT Frameworks and Automation
OSINT Platforms and Automation:
Maltego:
→ Graphical OSINT visualization and correlation
→ Transforms: automated data queries (Shodan, Censys, etc.)
→ Community Edition: free (limited transforms)
→ Ideal for: visualizing relationships, identifying attack paths
SpiderFoot (Open Source):
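# Installation (from source):
git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot && pip3 install -r requirements.txt
# Start the web UI:
python3 sf.py -l 127.0.0.1:5001
# CLI scan (-u all selects all modules by use case):
python3 sf.py -s example.com -u all -o json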
# → Automatically: subdomains, IPs, emails, social profiles, leaks
Over 200 modules integrated:
→ Shodan, Censys, VirusTotal
→ HaveIBeenPwned, Dehashed
→ Certificate Transparency
→ Google, Bing, DuckDuckGo
→ GitHub, GitLab
→ LinkedIn, Twitter (limited)
Recon-ng:
# Modular recon framework:
recon-ng
[recon-ng]> marketplace install all
[recon-ng]> workspaces create example-gmbh
[recon-ng]> db insert domains
Domain: example.com
[recon-ng]> modules load recon/domains-hosts/hackertarget
[recon-ng]> run
# → Subdomains found
theHarvester:
# Quick email/subdomain enumeration:
theHarvester -d example.com -l 500 -b google,bing,linkedin,shodan
OSINT Report Template:
→ Target and Scope
→ Passive Findings (without target system contact)
→ Subdomain List (+ active IPs)
→ Email addresses (for phishing simulation scope)
→ Technology stack analysis
→ Exposed services (Shodan)
→ Credential leaks (HIBP, DeHashed)
→ Recommendations: What should be addressed immediately?
About the Author
M.Sc. in IT Security with more than 5 years of experience in offensive security analysis. Leads penetration tests with a specialization in web applications, network infrastructure, reverse engineering, and hardware security. Responsible for several responsible disclosures.