Cloud detection engineering: attack detection in AWS, Azure and GCP

Cloud infrastructures differ fundamentally from on-premises environments: no network perimeter, identities as the primary access control mechanism, thousands of API calls per second, and a constantly evolving attack surface. Traditional detection rules for network IDS or Windows event logs are largely ineffective in the cloud. Cloud detection engineering is the discipline that bridges this gap.

Cloud Telemetry: The Data Foundation

Primary data sources by cloud provider:

AWS:
  CloudTrail:             Management events = control-plane API calls (who did what, when, from where)
  CloudTrail data events: Object-level access, e.g. S3 GetObject/PutObject (expensive but important!)
  VPC Flow Logs:          Network traffic between services (source/destination IP + port)
  GuardDuty Findings:     AWS-native: ML-based anomaly detection
  Config:                 Configuration changes to AWS resources
  Security Hub:           Aggregator for all AWS security findings
  Route53 Resolver Logs:  DNS queries within the VPC

  Critical CloudTrail events:
  ConsoleLogin                    → Who logged in and when?
  AssumeRole                      → Role assumption (lateral movement!)
  CreateUser/AttachUserPolicy     → New users + permissions (persistence!)
  PutBucketAcl/PutObjectAcl       → S3 set to public? (Data Exposure!)
  CreateAccessKey                 → New API key (Persistence!)
  DeleteTrail/StopLogging         → Attacker deletes CloudTrail (Defense Evasion!)

Azure:
  Activity Log:           Azure Resource Manager API calls
  Azure AD Sign-in Logs:  All authentications (including Conditional Access)
  Azure AD Audit Logs:    AD changes (users, groups, roles)
  Diagnostic Logs:        Per-service logs (Storage, Key Vault, etc.)
  Microsoft Defender for Cloud: Secure Score + Alerts
  Sentinel Connectors:    Aggregation of all logs listed above

  Critical Azure Events:
  Add member to role             → Privilege Escalation!
  Create or update key vault key → Key rotation or attacker key?
  Delete diagnostic setting      → Logging disabled? (Defense Evasion!)
  Sign-in from unfamiliar location
  User Risk changed to High

GCP:
  Cloud Audit Logs:       Admin Activity + Data Access + System Event (Data Access logs are off by default, except BigQuery)
  VPC Flow Logs:          Network Traffic
  Cloud Logging:          All service logs centralized
  Security Command Center: Native threat detection
  Chronicle (SIEM):       GCP-native SIEM at petabyte scale
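The critical CloudTrail events listed for AWS can be expressed as a simple lookup table. The sketch below is a minimal Python illustration, assuming CloudTrail log files in the `{"Records": [...]}` JSON format that CloudTrail delivers to S3; the category labels come from the list above, and `classify()` / `critical_events()` are hypothetical helper names.

```python
import json
from typing import List, Optional, Tuple

# Critical CloudTrail event names from the AWS list, mapped to the risk noted there.
CRITICAL_EVENTS = {
    "ConsoleLogin": "Login",
    "AssumeRole": "Lateral Movement",
    "CreateUser": "Persistence",
    "AttachUserPolicy": "Persistence",
    "CreateAccessKey": "Persistence",
    "PutBucketAcl": "Data Exposure",
    "PutObjectAcl": "Data Exposure",
    "DeleteTrail": "Defense Evasion",
    "StopLogging": "Defense Evasion",
}


def classify(record: dict) -> Optional[str]:
    """Return the risk category of one CloudTrail record, or None if it is not critical."""
    return CRITICAL_EVENTS.get(record.get("eventName", ""))


def critical_events(log_file_json: str) -> List[Tuple[str, str, str]]:
    """Scan one CloudTrail log file and return (eventName, category, sourceIPAddress) hits."""
    hits = []
    for record in json.loads(log_file_json).get("Records", []):
        category = classify(record)
        if category is not None:
            hits.append((record["eventName"], category, record.get("sourceIPAddress", "")))
    return hits
```

In practice this logic lives in the SIEM rather than in a script; the table mainly documents which event names the detection rules must cover.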

Critical quality issues:
  → CloudTrail Data Events: NOT enabled by default! (extra costs)
  → Azure AD Sign-In Logs: only 7 days of retention in the Free tier (30 days with P1/P2)!
  → VPC Flow Logs: Not enabled by default; note costs
  → Recommendation: All logs → central SIEM (Sentinel/Splunk/Chronicle)
    Retention: at least 12 months is a common baseline for compliance audits (e.g. ISO 27001, NIS2)

ATT&CK for Cloud - Threat Landscape

MITRE ATT&CK cloud-specific techniques:

T1078.004 - Valid Accounts: Cloud Accounts
  → Compromised IAM users, service accounts, API keys
  Detection: Login from a new region, unusual API calls after login

T1530 - Data from Cloud Storage
  → S3/Azure Blob/GCS exfiltration after access
  Detection: GetObject calls to bucket from unknown IP/IAM entity

T1537 - Transfer Data to Cloud Account
  → Attacker copies data to their own cloud account
  Detection: S3 cross-account copy, Azure AzCopy to external

T1078.001 - Default Accounts
  → Root Account Usage, Default Service Accounts
  Detection: Root account login (should NEVER happen!), MFA missing

T1190 - Exploit Public-Facing Application
  → Lambda, EC2, AppService compromised via vulnerability
  Detection: New outbound connections from Lambda/EC2

T1548.005 - Abuse Elevation Control Mechanism: Temporary Elevated Cloud Access
  → Privilege escalation via iam:PassRole, iam:CreatePolicy, iam:UpdateAssumeRolePolicy
  Detection: User creates new policy with high privileges

T1562.008 - Impair Defenses: Disable or Modify Cloud Logs
  → CloudTrail deletion, Flow Log deactivation (defense evasion)
  Detection: DeleteTrail, StopLogging, DeleteFlowLogs, PutBucketLogging (logging disabled)

T1136.003 - Create Account: Cloud Account
  → New IAM user/service accounts for persistence
  Detection: CreateUser, CreateServicePrincipal without ticket reference

T1098.001 - Account Manipulation: Additional Cloud Credentials
  → CreateAccessKey for persistence (AttachUserPolicy with admin rights maps to T1098.003 - Additional Cloud Roles)
  Detection: CreateAccessKey for another user, AttachUserPolicy with admin

Build Cloud ATT&CK Coverage Matrix:
  1. Inventory all relevant ATT&CK cloud techniques
  2. For each technique: Do we have a detection rule?
  3. Coverage score: (covered techniques / total) × 100
  4. Prioritization: Most common techniques in CISA/Cloud IR reports first
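Steps 3 and 4 of the coverage matrix reduce to set arithmetic. A minimal Python sketch, assuming technique IDs are tracked as strings; `report_counts` is a hypothetical mapping from technique ID to how often it appears in the IR reports you use for prioritization.

```python
from typing import Dict, List, Set


def coverage_score(in_scope: Set[str], detected: Set[str]) -> float:
    """Coverage score = (covered techniques / total in-scope techniques) x 100."""
    if not in_scope:
        return 0.0
    return len(in_scope & detected) / len(in_scope) * 100


def prioritize_gaps(in_scope: Set[str], detected: Set[str],
                    report_counts: Dict[str, int]) -> List[str]:
    """Uncovered techniques, most frequently reported first (ties sorted by ID)."""
    gaps = in_scope - detected
    return sorted(gaps, key=lambda t: (-report_counts.get(t, 0), t))


# Example: 3 of the 4 in-scope techniques have a detection rule.
scope = {"T1078.004", "T1530", "T1537", "T1136.003"}
rules = {"T1078.004", "T1530", "T1136.003"}
print(f"Coverage: {coverage_score(scope, rules):.1f}%")  # Coverage: 75.0%
```

Intersecting with `in_scope` means rules for techniques outside the matrix do not inflate the score.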

Detection Rules - Cloud-Specific KQL Examples

Microsoft Sentinel KQL Detection Rules:

1. Root Account Login (AWS) - Critical:
AWSCloudTrail
| where EventName == "ConsoleLogin"
| where UserIdentityType == "Root"
| extend LoginResult = tostring(parse_json(ResponseElements).ConsoleLogin)  // "Success" or "Failure"
| project TimeGenerated, SourceIpAddress, UserAgent, LoginResult, ErrorMessage
| extend Severity = "Critical"
// Root account login should NEVER happen!
// Immediate alert to SOC + Cloud Infrastructure Team

2. CloudTrail Logging Disabled - Defense Evasion:
AWSCloudTrail
| where EventName in ("DeleteTrail", "StopLogging", "UpdateTrail")
| where isempty(ErrorCode)  // Only successful calls!
| project TimeGenerated, UserIdentityArn, EventName, SourceIpAddress
// If an attacker deletes logs → blind spot!

3. New IAM policy with admin rights:
AWSCloudTrail
| where EventName in ("CreatePolicy", "CreatePolicyVersion", "PutUserPolicy", "PutRolePolicy")
| where isempty(ErrorCode)
| extend PolicyDoc = tostring(parse_json(RequestParameters).policyDocument)
// Action "*" AND Resource "*" (as string or first list element), whitespace-tolerant
| where PolicyDoc matches regex @'"Action"\s*:\s*\[?\s*"\*"'
    and PolicyDoc matches regex @'"Resource"\s*:\s*\[?\s*"\*"'
| project TimeGenerated, UserIdentityArn, EventName,
          PolicyName = tostring(parse_json(RequestParameters).policyName), PolicyDoc
// New admin policy = potential privilege escalation!
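Pattern matching on raw policy text depends on quoting and whitespace. Where the policy document can be parsed (in CloudTrail it is carried in the request parameters of CreatePolicy and related calls), checking its structure is more robust. A minimal Python sketch, assuming the policy JSON string has already been extracted; `grants_full_admin()` is a hypothetical helper and deliberately ignores `NotAction`, service wildcards such as `iam:*`, and `Condition` blocks.

```python
import json
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """IAM allows Action/Resource to be a single string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def grants_full_admin(policy_document: str) -> bool:
    """True if any Allow statement grants Action "*" on Resource "*"."""
    doc = json.loads(policy_document)
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        if "*" in _as_list(stmt.get("Action")) and "*" in _as_list(stmt.get("Resource")):
            return True
    return False
```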

4. S3 bucket set to public:
AWSCloudTrail
| where EventName in ("PutBucketAcl", "PutBucketPolicy")
| where isempty(ErrorCode)
| where RequestParameters contains "AllUsers"
       or RequestParameters contains "AuthenticatedUsers"
       or RequestParameters contains "public-read"  // canned ACLs public-read / public-read-write
| project TimeGenerated, BucketName = tostring(parse_json(RequestParameters).bucketName),
          UserIdentityArn, SourceIpAddress
// Public bucket policies use "Principal":"*" rather than AllUsers: add a separate check
// Immediately: Check bucket policy, revert if necessary!

5. Azure: New owner role assignment:
AzureActivity
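| where OperationNameValue =~ "MICROSOFT.AUTHORIZATION/ROLEASSIGNMENTS/WRITE"
| where ActivityStatusValue =~ "Success"  // each write also logs a "Start" event
// The role definition sits in the JSON request body of the write event
| extend RoleDefinitionId = tostring(parse_json(tostring(parse_json(Properties).requestbody)).Properties.RoleDefinitionId)
| where RoleDefinitionId endswith "8e3af657-a8ff-443c-a75c-2fe8c4bcb635"  // Owner GUID
| project TimeGenerated, Caller, ResourceGroup, RoleDefinitionId, Properties
// Owner role assignment → immediate alert!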

6. Azure: Mass Resource Deletion (Ransomware Indicator):
AzureActivity
| where ActivityStatusValue == "Success"
| where OperationNameValue endswith "/delete"
| summarize DeleteCount = count() by bin(TimeGenerated, 5m), Caller, ResourceGroup
| where DeleteCount > 10  // >10 deletions in 5 min = anomaly
| project TimeGenerated, Caller, ResourceGroup, DeleteCount
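The query above counts successful delete operations per 5-minute bin, caller and resource group. The same bin-and-threshold logic in Python, as a sketch for unit-testing the threshold choice; the event tuple layout and `mass_deletions()` are assumptions, not a Sentinel API.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

Event = Tuple[datetime, str, str, str, str]  # (time, caller, resource group, operation, status)


def mass_deletions(events: Iterable[Event], bin_minutes: int = 5,
                   threshold: int = 10) -> Dict[Tuple[datetime, str, str], int]:
    """Count successful */delete operations per (time bin, caller, resource group)
    and return the groups with more than `threshold` deletions."""
    counts: Counter = Counter()
    for ts, caller, rg, operation, status in events:
        if status != "Success" or not operation.lower().endswith("/delete"):
            continue
        # Floor to the start of the bin, like KQL bin(TimeGenerated, 5m);
        # assumes bin_minutes divides 60.
        start = ts.replace(minute=ts.minute - ts.minute % bin_minutes,
                           second=0, microsecond=0)
        counts[(start, caller, rg)] += 1
    return {key: n for key, n in counts.items() if n > threshold}
```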

7. GCP: Service Account Key Export (Credential Theft), Cloud Logging query language (not KQL):
resource.type="service_account"
protoPayload.methodName="google.iam.admin.v1.CreateServiceAccountKey"
-- New SA key = potential credential exfiltration!
-- SA keys are long-lived → high risk

8. Impossible Travel for Cloud Console:
AWSCloudTrail
| where EventName == &quot;ConsoleLogin&quot;
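| where tostring(parse_json(ResponseElements).ConsoleLogin) == "Success"  // failed logins have "Failure" here
| extend Country = tostring(geo_info_from_ip_address(SourceIpAddress).country)
| where isnotempty(Country)
| summarize Countries = make_set(Country), SourceIps = make_set(SourceIpAddress)
    by bin(TimeGenerated, 1h), UserIdentityArn
| where array_length(Countries) > 1
// Manual review: Different countries within 1 hour? (VPNs and proxies cause false positives)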
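A stricter impossible-travel check compares consecutive logins of one identity and computes the implied travel speed. A minimal Python sketch, assuming each login has already been geolocated to latitude/longitude (GeoIP data is coarse, so VPN and cloud egress IPs will still produce false positives); the 900 km/h threshold (roughly airliner speed) and the function names are illustrative assumptions.

```python
import math
from datetime import datetime
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0
Login = Tuple[datetime, float, float]  # (time, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def impossible_travel(logins: List[Login], max_speed_kmh: float = 900.0):
    """Return (t1, t2, distance_km, speed_kmh) for consecutive logins of ONE identity
    whose implied travel speed exceeds max_speed_kmh."""
    alerts = []
    ordered = sorted(logins)
    for (t1, la1, lo1), (t2, la2, lo2) in zip(ordered, ordered[1:]):
        distance = haversine_km(la1, lo1, la2, lo2)
        hours = (t2 - t1).total_seconds() / 3600
        if hours > 0:
            speed = distance / hours
        else:
            speed = math.inf if distance > 0 else 0.0
        if speed > max_speed_kmh:
            alerts.append((t1, t2, round(distance), speed))
    return alerts
```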

Detection-as-Code and Sigma

Detection-as-Code Approach:

Why Detection-as-Code:
  → Versioning in Git (who changed the rule? why?)
  → Peer review for new rules (four-eyes principle)
  → Automated testing (does the rule work?)
  → Deployment via CI/CD (no manual SIEM configuration)
  → Portability between SIEM products

Sigma - SIEM-agnostic Detection Rules:

Sigma rule example: AWS S3 Public ACL:
  title: AWS S3 Bucket Made Public
  id: a91b3fd8-1234-5678-abcd-ef0123456789
  status: experimental
  description: Detects when an S3 bucket ACL is modified to allow public access
  author: AWARE7 GmbH
  date: 2026-03-04
  logsource:
    product: aws
    service: cloudtrail
  detection:
    selection:
      eventSource: s3.amazonaws.com
      eventName:
        - PutBucketAcl
        - PutBucketPolicy
      requestParameters|contains:
        - AllUsers
        - AuthenticatedUsers
    condition: selection
  falsepositives:
    - Intentional public hosting (static websites)
  level: high
  tags:
    - attack.collection
    - attack.t1530

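Sigma Compiler (sigma-cli; install the target backend first, e.g. sigma plugin install splunk):
  # Convert to Splunk (without a processing pipeline, field names stay as in the rule):
  sigma convert -t splunk --without-pipeline sigma-aws-s3-public.yml

  # Convert to KQL (this backend targets Microsoft Defender XDR tables;
  # Sentinel's AWSCloudTrail columns need a custom field mapping):
  sigma convert -t microsoft365defender sigma-aws-s3-public.yml

  # To Elasticsearch (Lucene query syntax):
  sigma convert -t lucene --without-pipeline sigma-aws-s3-public.yml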

Terraform for Detection-as-Code (Microsoft Sentinel):
  resource "azurerm_sentinel_alert_rule_scheduled" "s3_public_acl" {
    name                       = "aws-s3-public-acl"
    log_analytics_workspace_id = azurerm_log_analytics_workspace.sentinel.id
    display_name               = "AWS S3 Bucket Made Public"
    severity                   = "High"
    query                      = file("${path.module}/kql/aws-s3-public-acl.kql")  # path relative to this module
    query_frequency            = "PT5M"
    query_period               = "PT5M"
    trigger_operator           = "GreaterThan"
    trigger_threshold          = 0
    incident_configuration {
      create_incident = true
    }
  }

Automated rule tests (pytest style; rule_matches() and sigma_rule are placeholders for your test harness):
  def test_s3_public_acl_rule():
      malicious_event = {
          "eventSource": "s3.amazonaws.com",
          "eventName": "PutBucketAcl",
          "requestParameters": {"accessControlList": {"AllUsers": "READ"}},
      }
      assert rule_matches(malicious_event, sigma_rule)

      # AuthenticatedUsers is also covered by this rule, so the benign case
      # must be a private ACL that grants access to the owner only:
      benign_event = {
          "eventSource": "s3.amazonaws.com",
          "eventName": "PutBucketAcl",
          "requestParameters": {"accessControlList": {"CanonicalUser": "FULL_CONTROL"}},
      }
      assert not rule_matches(benign_event, sigma_rule)
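For a self-contained test, the `rule_matches()` helper can be a tiny evaluator for this rule's selection block. The sketch below is an assumption, not a Sigma library API: it implements only plain equality and the `|contains` modifier, ANDs fields, ORs list values, matches case-insensitively (Sigma's default), and serializes nested objects to JSON before substring matching. Passing `SELECTION` as `sigma_rule` fits the test above.

```python
import json

# The detection.selection block of the Sigma rule above, as a Python dict.
SELECTION = {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["PutBucketAcl", "PutBucketPolicy"],
    "requestParameters|contains": ["AllUsers", "AuthenticatedUsers"],
}


def _field_text(event: dict, field: str) -> str:
    """Field value as text; nested objects are serialized so substrings are visible."""
    value = event.get(field)
    if value is None:
        return ""
    return value if isinstance(value, str) else json.dumps(value)


def rule_matches(event: dict, selection: dict = SELECTION) -> bool:
    """Minimal Sigma selection evaluator: fields are ANDed, list values are ORed."""
    for key, values in selection.items():
        field, _, modifier = key.partition("|")
        text = _field_text(event, field).lower()
        needles = [v.lower() for v in values]
        if modifier == "contains":
            ok = any(n in text for n in needles)
        elif modifier == "":
            ok = text in needles
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
        if not ok:
            return False
    return True
```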

False Positive Management

Optimize FP rate without losing coverage:

Problem: Too many alerts → alert fatigue → real attacks overlooked!

Baselining strategies:

1. Time-based baseline:
   → KQL: what is "normal" during business hours vs. at night/on weekends?
   → Alert only outside normal time windows (TimeGenerated is UTC, so convert to local time if needed):
   | where hourofday(TimeGenerated) !between (8 .. 17)  // outside 08:00-17:59
        or dayofweek(TimeGenerated) in (0d, 6d)         // or on Sunday (0d) / Saturday (6d)
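The same time-window check in Python, for example for testing rule logic offline. A minimal sketch with a hypothetical helper name; it expects timezone-aware timestamps and uses a fixed UTC offset for simplicity (a real deployment would use a DST-aware time zone). Note the different weekday numbering: Python's `weekday()` counts Monday=0 to Sunday=6, whereas KQL `dayofweek()` counts Sunday=0d to Saturday=6d.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def outside_business_hours(ts: datetime, tz: tzinfo = timezone.utc,
                           start_hour: int = 8, end_hour: int = 18) -> bool:
    """True if the timezone-aware ts falls on a weekend or outside
    [start_hour, end_hour) in local time."""
    local = ts.astimezone(tz)
    # Python: Monday=0 ... Saturday=5, Sunday=6
    is_weekend = local.weekday() >= 5
    in_hours = start_hour <= local.hour < end_hour
    return is_weekend or not in_hours
```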

2. Allow list for known IPs/users:
   let known_admin_ips = dynamic(["10.0.0.1", "192.168.1.100"]);
   AWSCloudTrail
   | where SourceIpAddress !in (known_admin_ips)
   // Exclude known admin IPs

3. Threshold tuning:
   → Not: every CreateAccessKey → Alert
   → Better: CreateAccessKey for a user who previously had none
   → Or: >3 CreateAccessKey calls in 1 hour (mass creation!)
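Both tuned variants can be sketched together. A minimal Python illustration, assuming time-ordered CreateAccessKey events as `(time, caller, target_user)` tuples and a baseline set of users that already had keys; the burst is counted per caller (one identity creating many keys), which is an interpretation of "mass creation". `access_key_alerts()` is a hypothetical helper.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Set, Tuple


def access_key_alerts(events: Iterable[Tuple[datetime, str, str]], history_users: Set[str],
                      burst_limit: int = 3,
                      window: timedelta = timedelta(hours=1)) -> List[Tuple[datetime, str, str]]:
    """events: time-ordered CreateAccessKey calls as (time, caller, target_user).
    history_users: users known to have had an access key before (the baseline)."""
    alerts = []
    seen = set(history_users)
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    for ts, caller, target in events:
        # Variant 1: first access key ever for this target user
        if target not in seen:
            alerts.append((ts, caller, f"first access key for {target}"))
            seen.add(target)
        # Variant 2: more than burst_limit keys created by one caller within the window
        calls = recent[caller]
        calls.append(ts)
        while ts - calls[0] > window:
            calls.popleft()
        if len(calls) > burst_limit:
            alerts.append((ts, caller, f"{len(calls)} CreateAccessKey calls within {window}"))
    return alerts
```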

4. Anomaly-based rules (ML):
   Microsoft Sentinel anomaly rules:
   → Automatic baseline from historical data
   → Alert only for significant deviations (>3σ)
   → No manual threshold maintenance required
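The ">3σ" idea is a z-score test against a historical baseline. The sketch below illustrates only that statistic; Sentinel's built-in anomaly rules use their own models, so this is not how they are implemented. `is_anomalous()` and the sample counts are hypothetical.

```python
import statistics
from typing import List


def is_anomalous(history: List[float], observed: float, k: float = 3.0) -> bool:
    """Flag observed if it lies more than k standard deviations from the baseline mean."""
    if len(history) < 2:
        return False  # not enough data for a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return observed != mean
    return abs(observed - mean) / stdev > k


# Example: daily API-call counts for one identity over ten days
baseline = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
print(is_anomalous(baseline, 40))
```

With a mean of 10.5 and a sample standard deviation of about 1.08, a value of 40 lies far beyond 3σ, while 12 does not.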

FP tracking and improvement:
  → Every alert receives feedback: True Positive / Benign Positive / False Positive
  → Calculate monthly FP rate per rule
  → FP rate > 50%: Tune or disable rule
  → No alerts at all after 3 months: Validate the rule (too narrow? log source missing? easy to bypass?)
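The tracking loop above can be computed from analyst classifications. A minimal Python sketch, assuming feedback exported as `(rule_name, classification)` pairs with the labels "TP", "BP" and "FP"; benign positives count toward the total but not as false positives. `rule_actions()` is a hypothetical helper; run it on monthly data for the FP rate and on a three-month window for the no-alert check.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def rule_actions(feedback: Iterable[Tuple[str, str]], all_rules: Iterable[str],
                 fp_limit: float = 0.5) -> Dict[str, Tuple[Optional[float], str]]:
    """Return {rule: (fp_rate, recommended action)} for every deployed rule."""
    per_rule: Dict[str, Counter] = defaultdict(Counter)
    for rule, label in feedback:
        per_rule[rule][label] += 1
    report: Dict[str, Tuple[Optional[float], str]] = {}
    for rule in all_rules:
        counts = per_rule.get(rule)
        if not counts:
            report[rule] = (None, "no alerts: validate rule")
            continue
        fp_rate = counts["FP"] / sum(counts.values())
        action = "tune or disable" if fp_rate > fp_limit else "keep"
        report[rule] = (round(fp_rate, 2), action)
    return report
```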

Metrics for Detection Engineering:
  MTTD (Mean Time to Detect):    How long until attack is detected?
  FP rate:                       % of alerts that are false positives
  Coverage:                      % of ATT&CK techniques covered by detections
  Alert Volume:                  Alerts/day (SOC capacity!)
  Rule Health:                   % of rules that triggered in the last 30 days
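Two of these metrics, MTTD and Rule Health, can be computed from incident and rule timestamps. A minimal Python sketch, assuming you can export `(attack_start, detected_at)` pairs per incident and the last trigger time per rule (None if it never fired); the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def mttd(incidents: List[Tuple[datetime, datetime]]) -> Optional[timedelta]:
    """Mean Time to Detect: average of (detected_at - attack_start)."""
    if not incidents:
        return None
    deltas = [detected - start for start, detected in incidents]
    return sum(deltas, timedelta()) / len(deltas)


def rule_health(last_triggered: Dict[str, Optional[datetime]], now: datetime,
                window: timedelta = timedelta(days=30)) -> float:
    """Percentage of rules that triggered at least once within `window` before `now`."""
    if not last_triggered:
        return 0.0
    active = sum(1 for ts in last_triggered.values()
                 if ts is not None and now - ts <= window)
    return active / len(last_triggered) * 100
```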

Cloud Detection Engineering Process

Maturity Model (analogous to BSIMM/OWASP SAMM):

Level 1 - Basic:
  □ CloudTrail/Activity Logs enabled and archived (12+ months)
  □ Native cloud alerting (GuardDuty, Defender for Cloud) enabled
  □ Critical alerts (root login, MFA bypass) → SOC ticket
  □ Incident response playbook for cloud incidents available

Level 2 - Advanced:
  □ Centralized SIEM with cloud log integration
  □ Detection rules for top 10 ATT&CK cloud techniques
  □ Automated response: critical rule → AWS Config remediation / Logic App
  □ Monthly threat hunt: review new attacker TTPs from CISA/CTI
  □ Sigma repository with versioned rules

Level 3 - Optimized:
  □ Detection-as-Code fully implemented (Sigma + Terraform, CI/CD-deployed)
  □ ATT&CK Coverage: >70% of relevant cloud techniques covered
  □ Continuous Adversary Simulation (daily Stratus Red Team runs)
    → Automated verification that rules are working!
  □ Purple Team Exercises: Red Team using cloud TTPs → Improve detection
  □ Threat Intelligence Feed → Automatic rule updates

Cloud Detection Engineering Team Setup:
  Recommended for organizations with 500+ employees or significant cloud usage:
  → 1 Detection Engineer (rule development, SIEM)
  → 1 Cloud Security Engineer (configuration, architecture)
  → SOC analysts who operate the rules
  Alternatively: Managed Detection & Response (MDR) for the cloud