OSINT Methods: Tools and Techniques for Open Source Intelligence

OSINT (Open Source Intelligence) is the foundation of every professional penetration test. Before attackers actively probe a system, they often spend hours or days gathering publicly available information. The goal is to build as complete a picture of the attack surface as possible without sending a single packet to the target network. What attackers can see, defenders must know as well.

OSINT Framework and Phases

OSINT Reconnaissance Framework:

Passive OSINT (without direct interaction with the target):
  → DNS information from public sources
  → WHOIS data
  → Certificate Transparency logs
  → Google Dorking
  → Shodan/Censys (cached scans)
  → Social media and company websites
  → Job postings (technology stack identifiable!)
  → Pastebin / Dark Web leaks
  → Code repositories (GitHub, GitLab)

Semi-Passive OSINT (queries that look like normal traffic and are hard to attribute):
  → DNS resolution of public records (A, MX, SPF, DMARC)
  → Certificate Transparency log queries
  → Wayback Machine (archive.org)
  → Web crawling with passive fingerprinting

Active reconnaissance (direct, identifiable):
  → Port scanning (nmap)
  → Banner grabbing
  → Web technology fingerprinting
  → WAF detection
  → Subdomain brute force (active, sending DNS requests)

OSINT Framework Categories:
  osintframework.com: Categorized overview of all tools
  Categories: Username, Email, Domain, IP/Network, Social Media,
              Dark Web, Documents, Images, Phone, Business

DNS Enumeration

DNS Enumeration - Discovering subdomains and infrastructure:

Passive Subdomain Enumeration (without sending DNS queries to the target):

subfinder (Project Discovery):
  subfinder -d example.com -o subdomains.txt
  subfinder -d example.com -all -recursive -o subdomains.txt
  # Sources: Certificate Transparency, VirusTotal, Shodan, etc.

amass (OWASP):
  amass enum -passive -d example.com -o amass-passive.txt
  amass enum -active -d example.com -o amass-active.txt   # active mode: contacts the target
  amass db -names -d example.com  # From amass's own database

Certificate Transparency Logs:
  # crt.sh - Query all certificates for a domain (%25 is the URL-encoded % wildcard):
  curl -s "https://crt.sh/?q=%25.example.com&output=json" | \
    jq -r '.[].name_value' | \
    sort -u | grep -vF '*'

  # crt.sh for multi-level subdomains (wildcard entries are kept here):
  curl -s "https://crt.sh/?q=%25.%25.example.com&output=json" | \
    jq -r '.[].name_value' | sort -u
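
The jq pipeline above can also be sketched in Python, which makes the filtering easier to extend. This is a minimal sketch, assuming the crt.sh JSON response is a list of objects with a name_value field that may contain several newline-separated names; the function name extract_hostnames and the sample data are hypothetical.

```python
import json


def extract_hostnames(crtsh_json, domain):
    """Return sorted, unique hostnames under `domain` from a crt.sh JSON response."""
    domain = domain.lower().strip(".")
    names = set()
    for entry in json.loads(crtsh_json):
        # name_value can hold several newline-separated names per certificate.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if not name or "*" in name:
                continue  # drop wildcard entries such as *.example.com
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Made-up sample in the shape of a crt.sh response (only name_value is used).
sample = json.dumps([
    {"name_value": "www.example.com\nmail.example.com"},
    {"name_value": "*.example.com"},
    {"name_value": "WWW.EXAMPLE.COM"},
    {"name_value": "shop.example.org"},
])
print(extract_hostnames(sample, "example.com"))
# → ['mail.example.com', 'www.example.com']
```

Filtering on the domain suffix also removes unrelated names that share a certificate with the target.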

DNS Resolution and Brute Force (active):
  # Resolve the collected names (A records, including responses):
  dnsx -l subdomains.txt -a -resp -o resolved.txt
  # Brute force with a wordlist (puredns):
  puredns bruteforce wordlist.txt example.com -r resolvers.txt
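
What a wordlist-based brute force does before any DNS query is sent can be sketched as follows: each wordlist entry is combined with the target domain, and malformed or duplicate entries are dropped. This is an illustrative sketch only, not how puredns is implemented; candidate_hostnames and LABEL_RE are hypothetical names, and no resolution takes place here.

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphen/underscore,
# with no hyphen at either end.
LABEL_RE = re.compile(r"^[a-z0-9_](?:[a-z0-9_-]{0,61}[a-z0-9_])?$")


def candidate_hostnames(words, domain):
    """Yield unique, syntactically valid candidates such as dev.example.com."""
    seen = set()
    for word in words:
        prefix = word.strip().lower().strip(".")
        labels = prefix.split(".")
        if not prefix or not all(LABEL_RE.match(label) for label in labels):
            continue  # skip blank lines, comments and malformed entries
        host = f"{prefix}.{domain}"
        if host not in seen:
            seen.add(host)
            yield host


words = ["www", "WWW ", "", "# comment", "dev", "-bad", "api.staging"]
print(list(candidate_hostnames(words, "example.com")))
# → ['www.example.com', 'dev.example.com', 'api.staging.example.com']
```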

Analyze DNS record types:
  # MX records (email infrastructure):
  dig MX example.com
  # → Google Workspace? Microsoft 365? Own mail server?

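  # SPF, DMARC, DKIM:
  dig +short TXT example.com | grep -i spf
  dig +short TXT _dmarc.example.com
  # DKIM can only be queried with a known selector:
  dig +short TXT SELECTOR._domainkey.example.com
  # → Shows email security configuration (or lack thereof!)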

  # NS records (nameservers):
  dig NS example.com
  # → Cloudflare? AWS Route 53? Own NS?

  # Attempt an AXFR zone transfer (often not allowed, but try anyway):
  dig @ns1.example.com example.com AXFR
  # If successful: ALL DNS records for the domain!
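
The interpretation of MX and DMARC answers described above can be sketched in a few lines. This is a rough sketch under stated assumptions: the provider table only covers the two hosted services mentioned in the text and is far from complete, and guess_mail_provider and parse_dmarc are hypothetical helper names that operate on strings you already obtained with dig.

```python
# MX host suffixes for the hosted providers mentioned above (not exhaustive).
MAIL_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
}


def guess_mail_provider(mx_hosts):
    """Map MX hostnames to a hosted mail provider, if one is recognizable."""
    for host in mx_hosts:
        host = host.lower().rstrip(".")
        for suffix, provider in MAIL_PROVIDERS.items():
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "self-hosted or unknown"


def parse_dmarc(txt_record):
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in txt_record.strip().strip('"').split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


print(guess_mail_provider(["aspmx.l.google.com."]))
# → Google Workspace
policy = parse_dmarc('"v=DMARC1; p=none; rua=mailto:dmarc@example.com"')
print(policy.get("p"))
# → none
```

A policy of p=none means failing mail is only reported, not rejected, which is one of the findings worth noting in a report.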

Google Dorking

Google Dorking - Sensitive information via search engines:

Basic operators:
  site:example.com           → Only this domain
  filetype:pdf               → Only PDFs
  inurl:/admin               → URL contains /admin
  intitle:"Index of /"       → Directory listings
  intext:"confidential"      → Text in the document
  -site:www.example.com      → Exclude this subdomain

Practical dorking combinations:

  # Find subdomains:
  site:*.example.com -site:www.example.com

  # Login pages:
  site:example.com inurl:login OR inurl:signin OR inurl:auth

  # Configuration files:
  site:example.com filetype:env OR filetype:config OR filetype:cfg

  # Error messages with stack traces:
  site:example.com "stack trace" OR "exception" OR "debug"

  # Open redirects:
  site:example.com inurl:redirect= OR inurl:url= OR inurl:return=

  # Passwords (often in old files):
  site:example.com filetype:txt password OR username

  # Backup files:
  site:example.com filetype:bak OR filetype:backup OR filetype:sql

  # Exposed .git:
  site:example.com inurl:/.git/config

  # phpinfo():
  site:example.com inurl:phpinfo.php

  # Jenkins/CI-CD:
  site:example.com inurl:jenkins OR inurl:gitlab OR inurl:bitbucket
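
Because these combinations differ only in the domain, they can be generated from templates. The following is a small sketch using a subset of the dorks listed above; build_dorks, search_url and the template keys are hypothetical names, and the script only builds strings without sending any request.

```python
from urllib.parse import quote_plus

# A subset of the dork combinations from this section, with the domain templated.
DORK_TEMPLATES = {
    "login_pages": "site:{domain} inurl:login OR inurl:signin OR inurl:auth",
    "config_files": "site:{domain} filetype:env OR filetype:config OR filetype:cfg",
    "backup_files": "site:{domain} filetype:bak OR filetype:backup OR filetype:sql",
    "exposed_git": "site:{domain} inurl:/.git/config",
    "phpinfo": "site:{domain} inurl:phpinfo.php",
}


def build_dorks(domain):
    """Fill every template with the target domain."""
    return {name: tpl.format(domain=domain) for name, tpl in DORK_TEMPLATES.items()}


def search_url(query):
    """Return a URL that opens the query in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for name, query in build_dorks("example.com").items():
    print(f"{name}: {search_url(query)}")
```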

Google Dorks Automation:
  # ghdb-scraper (Google Hacking Database):
  python3 ghdb.py -q "site:example.com" -d "vulnerability"
  # GoogD0rker:
  python3 googd0rker.py -q "site:example.com" -t web

Shodan and Censys

Shodan - Search engine for connected devices:

Basic Shodan searches:
  # Direct IP address:
  net:203.0.113.1

  # By organization or ASN:
  org:"Example GmbH"
  asn:"AS12345"

  # By technology:
  product:"Microsoft IIS"
  product:"Apache httpd" version:"2.4.51"

  # By port/service:
  port:3389 org:"Example GmbH"    # RDP exposed by the organization
  port:22 country:DE               # SSH in Germany

  # Certificate information:
  ssl.cert.subject.cn:"*.example.com"
  ssl.cert.expired:true org:"Example GmbH"  # Expired certificates!

Shodan CLI:
  # Installation:
  pip install shodan
  shodan init YOUR_API_KEY

  # Domain info:
  shodan domain example.com

  # IP info:
  shodan host 203.0.113.1

  # Save search results to a file:
  shodan search --fields ip_str,port,org --limit 1000 \
    'org:"Example GmbH"' > shodan_results.txt
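
The saved results become more useful once they are grouped per host. The sketch below assumes the file contains one result per line with tab-separated fields in the order ip_str, port, org (an assumption about the CLI's default output format, so check a few lines of your own file first); group_ports_by_ip is a hypothetical helper name.

```python
from collections import defaultdict


def group_ports_by_ip(lines):
    """Group open ports per IP from lines shaped like 'ip<TAB>port<TAB>org'."""
    hosts = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].strip().isdigit():
            continue  # skip blank or malformed lines
        hosts[fields[0].strip()].add(int(fields[1].strip()))
    return {ip: sorted(ports) for ip, ports in hosts.items()}


# Made-up sample lines; in practice, iterate over open("shodan_results.txt").
sample_lines = [
    "203.0.113.1\t443\tExample GmbH\n",
    "203.0.113.1\t22\tExample GmbH\n",
    "203.0.113.7\t3389\tExample GmbH\n",
    "\n",
]
print(group_ports_by_ip(sample_lines))
# → {'203.0.113.1': [22, 443], '203.0.113.7': [3389]}
```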

Censys (Alternative, more SSL/TLS focus):
  search.censys.io or censys.io/api

  # Python API:
  from censys.search import CensysHosts
  h = CensysHosts()
  results = h.search("ip_addresses.reverse_dns.reverse_dns:'example.com'")

  # All IPs of an organization:
  results = h.search("autonomous_system.organization_id:'NNNNNN'")

  # Certificates:
  from censys.search import CensysCertificates
  c = CensysCertificates()
  certs = c.search("parsed.subject.common_name:'*.example.com'")

What you'll find:
  → Forgotten test servers (example-test.example.com:8080)
  → Expired certificates and outdated TLS/server software (known CVEs)
  → Exposed admin panels (Grafana, Jenkins, Kibana)
  → Default credentials on network devices
  → Exposed databases (MongoDB, Elasticsearch, Redis)
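
A first triage of such findings can be sketched by matching open ports against the default ports of the services named above. This is only a heuristic under the assumption that services run on their defaults; a port match is a hint to verify, not proof of the service. DEFAULT_PORTS and flag_exposed are hypothetical names.

```python
# Default ports of the services named above. A match is a hint, not proof.
DEFAULT_PORTS = {
    3000: "Grafana",
    5601: "Kibana",
    6379: "Redis",
    8080: "Jenkins or another HTTP service",
    9200: "Elasticsearch",
    27017: "MongoDB",
}


def flag_exposed(services):
    """Return (ip, port, service) for every port that matches a known default."""
    findings = []
    for ip, port in services:
        name = DEFAULT_PORTS.get(port)
        if name:
            findings.append((ip, port, name))
    return sorted(findings)


observed = [("203.0.113.7", 27017), ("203.0.113.1", 443), ("203.0.113.1", 9200)]
for finding in flag_exposed(observed):
    print(finding)
# → ('203.0.113.1', 9200, 'Elasticsearch')
# → ('203.0.113.7', 27017, 'MongoDB')
```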

GitHub and Code Repositories

GitHub OSINT - Finding source code leaks:

GitHub search:
  # Search for organization:
  org:example-gmbh

  # Sensitive files:
  org:example-gmbh filename:.env
  org:example-gmbh filename:config.json password
  org:example-gmbh filename:docker-compose.yml

  # API keys in the code:
  org:example-gmbh "AWS_SECRET_ACCESS_KEY"
  org:example-gmbh "AKIA"               # AWS Key Prefix
  org:example-gmbh "-----BEGIN RSA PRIVATE KEY-----"
  org:example-gmbh "ghp_"               # GitHub Personal Access Token

Automated GitHub search:

  trufflehog (Secrets scanner):
    trufflehog github --org=example-gmbh
    trufflehog git https://github.com/example-gmbh/repo

  gitleaks:
    gitleaks detect --source . --report-path leaks.json
    # --source expects a local path, so clone remote repositories first:
    git clone https://github.com/example-gmbh/repo && gitleaks detect --source repo

  gitrob:
    gitrob analyze --github-access-token TOKEN \
      --organization example-gmbh

Analyze commits:
  # Search git log (locally):
  git log --all -p | grep -i "password\|secret\|key\|token"

  # git-secrets (prevents secrets from being committed):
  git secrets --install
  git secrets --register-aws

Common Leaks:
  → AWS Access Keys (AKIA... prefix)
  → Private SSH/TLS keys
  → Database passwords in .env files
  → API keys (Stripe, Twilio, SendGrid, etc.)
  → JWT secrets (HMAC key in plain text)
  → Hardcoded production credentials in tests
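
How secret scanners recognize some of these leaks can be illustrated with a few regular expressions. This is a deliberately small sketch, not the rule set of trufflehog or gitleaks: the patterns cover only the AKIA prefix, the ghp_ prefix and PEM private key headers, and the token lengths used are assumptions based on the commonly documented formats. SECRET_PATTERNS and scan_text are hypothetical names.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Sample content built from placeholder values, not real credentials.
sample = "\n".join([
    "DEBUG=true",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "GITHUB_TOKEN=ghp_" + "a" * 36,
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(scan_text(sample))
# → [(2, 'aws_access_key_id'), (3, 'github_pat'), (4, 'private_key')]
```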

Social Media and Corporate OSINT:

LinkedIn:
  → List of employees at the target organization
  → Technology stack from job postings!
    "Seeking Python developers with Django and AWS experience"
    → Reveals technology stack
  → Derive email format (firstname.lastname@example.com?)
  → Organizational chart / decision-making structures

LinkedIn OSINT Tools:
  linkedin2username: Employee lists → Generate email list
  ScrapedIn / ProspectIn: Automated LinkedIn scraping
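
The step from an employee list to an email list can be sketched as follows. The address patterns are common conventions chosen for illustration, not the output format of linkedin2username, and email_candidates is a hypothetical helper; which pattern an organization really uses has to be confirmed against known addresses.

```python
def email_candidates(first, last, domain):
    """Generate common address formats for one employee name."""
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("first and last name are required")
    local_parts = [
        f"{f}.{l}",     # firstname.lastname
        f"{f[0]}.{l}",  # f.lastname
        f"{f[0]}{l}",   # flastname
        f"{f}{l}",      # firstnamelastname
        f,              # firstname
    ]
    return [f"{part}@{domain}" for part in local_parts]


print(email_candidates("Erika", "Mustermann", "example.com"))
```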

Email Enumeration:
  hunter.io:
    # API:
    curl "https://api.hunter.io/v2/domain-search?domain=example.com&api_key=KEY"
    → Email format + known email addresses

  emailhippo / verifalia:
    → Check email validity without sending

WHOIS / Domain Registration:
  whois example.com
  # → Registrant (often anonymized), registrar, creation date
  # → For older registrations: real contact information!

  # Reverse WHOIS (same owner → other domains):
  viewdns.info/reversewhois/?q=admin@example.com

  # Domain history:
  domaintools.com / whoisology.com

Wayback Machine:
  # Old versions of the website:
  web.archive.org/web/*/example.com

  # Older versions often contain:
  → Old employee pages with real email addresses
  → Previous technology versions (older CMS, etc.)
  → Deleted sensitive pages (internal, documentation)
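
Archived URLs can also be listed programmatically through the Wayback Machine's CDX interface instead of browsing the calendar view. The sketch below only builds the query URL and parses a response body you have already downloaded; it assumes the JSON output is a list of rows whose first row is the header. The function names and the sample row are hypothetical.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, limit=1000):
    """Build a CDX query that lists archived URLs below the domain."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "timestamp,original",
        "collapse": "urlkey",  # one row per distinct URL
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body):
    """Turn a CDX JSON body (header row first) into a list of dicts."""
    rows = json.loads(body) if body.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


print(cdx_query_url("example.com", limit=50))
# Made-up response body for illustration.
body = '[["timestamp","original"],["20150101000000","http://example.com/team.html"]]'
print(parse_cdx(body))
```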

Analyze job postings:
  # Information from job postings:
  "Experience with Fortinet FortiGate preferred"  → Firewall vendor known!
  "AWS Certified Solutions Architect"             → AWS as cloud provider
  "SIEM experience with Splunk"                   → SIEM system known
  "Knowledge of SAP ERP"                          → Business software
  → Attacker knows the technology stack before the first attack!
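
The same inference can be automated with a keyword table. This is a minimal sketch using only the products from the examples above; TECH_KEYWORDS and infer_stack are hypothetical names, and matching on whole words is a design choice to avoid false hits such as "sap" inside "disappoint".

```python
import re

# Keywords taken from the examples above; extend the table as needed.
TECH_KEYWORDS = {
    "fortigate": "Firewall: Fortinet FortiGate",
    "aws": "Cloud: AWS",
    "splunk": "SIEM: Splunk",
    "sap": "Business software: SAP",
    "django": "Web framework: Django",
}


def infer_stack(posting_text):
    """Return the technologies a job posting reveals, matched on whole words."""
    words = set(re.findall(r"[a-z0-9]+", posting_text.lower()))
    return sorted(label for kw, label in TECH_KEYWORDS.items() if kw in words)


posting = "Experience with Fortinet FortiGate preferred. SIEM experience with Splunk."
print(infer_stack(posting))
# → ['Firewall: Fortinet FortiGate', 'SIEM: Splunk']
```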

OSINT Frameworks and Automation

OSINT Platforms and Automation:

Maltego:
  → Graphical OSINT visualization and correlation
  → Transforms: automated data queries (Shodan, Censys, etc.)
  → Community Edition: free (limited transforms)
  → Ideal for: visualizing relationships, identifying attack paths

SpiderFoot (Open Source):
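  # Installation:
  git clone https://github.com/smicallef/spiderfoot.git
  cd spiderfoot && pip3 install -r requirements.txt
  # Start the web UI:
  python3 sf.py -l 127.0.0.1:5001

  # CLI scan (all modules when none are selected):
  python3 sf.py -s example.com -o json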
  # → Automatically: subdomains, IPs, emails, social profiles, leaks

  Over 200 modules integrated:
  → Shodan, Censys, VirusTotal
  → HaveIBeenPwned, Dehashed
  → Certificate Transparency
  → Google, Bing, DuckDuckGo
  → GitHub, GitLab
  → LinkedIn, Twitter (limited)

Recon-ng:
  # Modular recon framework:
  recon-ng
  [recon-ng]> marketplace install all
  [recon-ng]> workspaces create example-gmbh
  [recon-ng]> db insert domains
  Domain: example.com
  [recon-ng]> modules load recon/domains-hosts/hackertarget
  [recon-ng]> run
  # → Subdomains found

theHarvester:
  # Quick email/subdomain enumeration:
  theHarvester -d example.com -l 500 -b google,bing,linkedin,shodan

OSINT Report Template:
  → Target and Scope
  → Passive Findings (without target system contact)
  → Subdomain List (+ active IPs)
  → Email addresses (for phishing simulation scope)
  → Technology stack analysis
  → Exposed services (Shodan)
  → Credential leaks (HIBP, DeHashed)
  → Recommendations: What should be addressed immediately?
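
The template can be kept as a small data structure so that findings from the individual tools are collected in one place and rendered consistently. This is one possible layout, sketched under the assumption that each section is a simple list of strings; OsintReport and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OsintReport:
    """Container for the report sections listed above."""
    target: str
    scope: str
    subdomains: list = field(default_factory=list)
    emails: list = field(default_factory=list)
    technologies: list = field(default_factory=list)
    exposed_services: list = field(default_factory=list)
    credential_leaks: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

    def render(self):
        """Render the report as plain text, one section per finding type."""
        sections = [
            ("Subdomains", self.subdomains),
            ("Email addresses", self.emails),
            ("Technology stack", self.technologies),
            ("Exposed services", self.exposed_services),
            ("Credential leaks", self.credential_leaks),
            ("Recommendations", self.recommendations),
        ]
        lines = [f"OSINT Report: {self.target}", f"Scope: {self.scope}"]
        for title, items in sections:
            lines.append(f"{title} ({len(items)}):")
            lines.extend(f"  → {item}" for item in (items or ["none found"]))
        return "\n".join(lines)


report = OsintReport(
    target="example.com",
    scope="passive only",
    subdomains=["www.example.com", "mail.example.com"],
    recommendations=["Remove forgotten test servers from public DNS"],
)
print(report.render())
```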