The Log Sources Your SOC Needs for Detection, Forensics, and Hunting
Log source selection isn't an afterthought — it's architecture. Here's how to prioritize sources for detection, forensics, and threat hunting.
You deployed Defender for Endpoint. You have Sentinel. But your detection rules keep generating noise, your forensic reconstructions have gaps, and your hunters can't find the lateral movement chains they know are there.
The problem is usually the same: the right logs aren't connected, or they're connected but not ingested into the right tier.
Here's how to think about log sources as a deliberate architecture decision, not an afterthought.
Three jobs, three different needs
Before you can prioritize log sources, you need to understand that detection, forensics, and hunting have fundamentally different requirements:
Detection needs real-time signal with high fidelity and low latency. You're looking for alerts that fire within minutes of malicious activity, with enough context to triage quickly and low enough false-positive rates that analysts actually pay attention.
Forensics and correlation need historical depth and relationship-rich data. When you're reconstructing how an attacker moved from initial access to data exfiltration over 30 days, you need logs that show the connections between identity, endpoint, email, and cloud activity, and you need them retained long enough to trace the full chain.
Hunting needs raw breadth and behavioral baselines. Hunters write open-ended queries looking for anomalies, patterns, and weak signals that don't fit predetermined rules. They need access to data that's wide enough to find the unexpected.
Not every log source serves all three equally. Know which job you're optimizing for when you prioritize.
Let's walk through how the core log sources in a Microsoft security stack map to each job.
| Log Source | Detection Use | Forensic/Correlation Use | Hunting Use |
|---|---|---|---|
| Entra ID Sign-in Logs | Password spray, AiTM session theft, risky sign-ins | Credential theft timeline, access patterns | Baseline user behavior, anomaly detection |
| Entra ID Audit Logs | Privilege escalation, app consent grants | Administrative action reconstruction | Permission change patterns |
| MDE Device Telemetry | Malware execution, lateral movement, persistence | Endpoint activity timeline | Process trees, command-line analysis |
| Defender for Office 365 | Phishing delivery, malicious attachments | Email thread reconstruction, BEC timeline | Attachment and URL patterns |
| Defender for Identity | DCSync, Kerberoasting, pass-the-hash | AD attack chain reconstruction | Authentication patterns, service account abuse |
| Defender for Cloud Apps | Cloud app access anomalies, OAuth abuse | Data exfiltration timeline | Shadow IT discovery, file access patterns |
| Azure Activity Logs | Resource modification, role assignment | Cloud pivot reconstruction | Infrastructure change patterns |
| Azure Firewall / NSG Flow Logs | Lateral movement, C2 communication | Network path reconstruction | Traffic baseline analysis |
| Endpoint Process Events | Suspicious execution, LOLBins | Process genealogy reconstruction | Command-line hunting, parent-child analysis |
| Third-party Connectors | Perimeter alerts, network detection | Cross-platform correlation | Extended visibility hunting |
Let's break down how each contributes to your operational capability.
Identity logs: where most attacks start
With credential abuse as the initial access vector in 22% of breaches, and stolen credentials involved in 88% of breaches in the basic web application attack pattern (Verizon 2025 DBIR), identity logs are non-negotiable.
Entra ID sign-in logs
These logs capture every authentication attempt against your tenant — successful, failed, conditional access blocked, and everything in between.
For detection: You're watching for password spray patterns (multiple failed attempts across many accounts from similar infrastructure), impossible travel, sign-ins from anonymizing networks, and the token theft signatures that indicate AiTM attacks.
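The password spray shape described above can be sketched in a notebook: many accounts failing from the same source. This is a minimal sketch under a few assumptions. Sign-in records are exported as dictionaries keyed by `SigninLogs` column names (`IPAddress`, `UserPrincipalName`, `ResultType`). Entra error code 50126 (invalid username or password) is the failure of interest. The 10-account threshold is illustrative, not a recommendation.

```python
from collections import defaultdict

# Entra ID error code for "invalid username or password"; other failure
# codes are out of scope for this sketch.
INVALID_CREDENTIALS = "50126"


def find_spray_sources(signins, min_accounts=10):
    """Return {source_ip: sorted accounts} for IPs whose failed sign-ins
    hit at least `min_accounts` distinct accounts (the spray shape)."""
    targets = defaultdict(set)
    for event in signins:
        if event["ResultType"] == INVALID_CREDENTIALS:
            targets[event["IPAddress"]].add(event["UserPrincipalName"])
    return {
        ip: sorted(users)
        for ip, users in targets.items()
        if len(users) >= min_accounts
    }


if __name__ == "__main__":
    # Synthetic data: one IP fails once against 12 accounts (spray), another
    # fails 12 times against a single account (brute force, not spray).
    sample = [
        {"IPAddress": "203.0.113.7", "UserPrincipalName": f"user{i}@contoso.com", "ResultType": "50126"}
        for i in range(12)
    ] + [
        {"IPAddress": "198.51.100.4", "UserPrincipalName": "alice@contoso.com", "ResultType": "50126"}
        for _ in range(12)
    ]
    print(sorted(find_spray_sources(sample)))  # ['203.0.113.7']
```

A production version would also bucket events by time window. It would usually group by ASN rather than by single IP, because sprays rotate addresses within the same infrastructure.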
For forensics: When you're reconstructing a credential theft chain, sign-in logs show you when the attacker first used compromised credentials, what resources they accessed, and how their session evolved over time.
For hunting: Build behavioral baselines — which users authenticate from which locations, at what times, using which applications. Deviations from those baselines are your hunting leads.
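A behavioral baseline can start simply. Record which country and application combinations each user signed in from during a baseline window, then surface sign-ins that fall outside that set. This is a minimal sketch, assuming successful sign-ins have been flattened into dictionaries with hypothetical `user`, `country`, and `app` keys. In `SigninLogs` itself, location detail sits in a nested column and needs extracting first.

```python
from collections import defaultdict


def build_baseline(history):
    """Map each user to the set of (country, app) pairs seen in the baseline window."""
    baseline = defaultdict(set)
    for s in history:
        baseline[s["user"]].add((s["country"], s["app"]))
    return baseline


def deviations(baseline, recent):
    """Yield recent sign-ins whose (country, app) pair is new for that user.
    Users with no baseline at all are surfaced too: nothing to compare against."""
    for s in recent:
        if (s["country"], s["app"]) not in baseline.get(s["user"], set()):
            yield s


if __name__ == "__main__":
    history = [
        {"user": "alice", "country": "US", "app": "Outlook"},
        {"user": "alice", "country": "US", "app": "Teams"},
    ]
    recent = [
        {"user": "alice", "country": "US", "app": "Teams"},    # matches baseline
        {"user": "alice", "country": "NG", "app": "Outlook"},  # new country
    ]
    for hit in deviations(build_baseline(history), recent):
        print(hit)  # {'user': 'alice', 'country': 'NG', 'app': 'Outlook'}
```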
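Sentinel connector: Microsoft Entra ID native connector. Sign-in and audit log ingestion is billable, and sign-in logs also require a Microsoft Entra ID P1 or P2 license. Some Microsoft 365 E5-family licenses include a Sentinel data grant that can offset part of the cost.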
Entra ID audit logs
Audit logs capture administrative actions: user and group modifications, application registrations, conditional access policy changes, and privilege assignments.
For detection: Alert on role assignments to sensitive roles, new application registrations with dangerous permissions, and conditional access policy modifications that weaken security posture.
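The role-assignment alert reduces to a filter over audit events. This is a minimal sketch, assuming audit records have been flattened into dictionaries. `Add member to role` is the Entra audit operation name for directory role assignments. The `role`, `actor`, and `target` keys are hypothetical, because in `AuditLogs` the role name is nested inside `TargetResources`. The sensitive-role set is illustrative, not complete.

```python
# Illustrative subset of high-impact Entra ID roles; extend for your tenant.
SENSITIVE_ROLES = {
    "Global Administrator",
    "Privileged Role Administrator",
    "Application Administrator",
    "Cloud Application Administrator",
}


def sensitive_role_grants(audit_events):
    """Return (actor, target, role) for every assignment to a sensitive role."""
    return [
        (e["actor"], e["target"], e["role"])
        for e in audit_events
        if e["operation"] == "Add member to role" and e["role"] in SENSITIVE_ROLES
    ]


if __name__ == "__main__":
    events = [
        {"operation": "Add member to role", "role": "Global Administrator",
         "actor": "svc-sync@contoso.com", "target": "mallory@contoso.com"},
        {"operation": "Add member to role", "role": "Reports Reader",
         "actor": "admin1@contoso.com", "target": "bob@contoso.com"},
        {"operation": "Update user", "role": "",
         "actor": "admin1@contoso.com", "target": "bob@contoso.com"},
    ]
    print(sensitive_role_grants(events))
    # [('svc-sync@contoso.com', 'mallory@contoso.com', 'Global Administrator')]
```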
For forensics: Trace how an attacker escalated privileges, what administrative actions they took, and how they established persistence through service principals or managed identities.
For hunting: Look for patterns of administrative activity that don't match known change windows or authorized administrators.
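Both hunting conditions above can be tagged on each administrative event: activity outside known change windows, and activity by unexpected administrators. This is a minimal sketch. The change-window policy (Tuesday and Thursday, 18:00 to 21:59 UTC) and the authorized-admin list are hypothetical placeholders for your own. Event timestamps are assumed to be ISO-8601 strings in UTC.

```python
from datetime import datetime

# Hypothetical policy: approved changes happen Tue/Thu, 18:00-21:59 UTC,
# and only these accounts perform them. Replace with your own.
CHANGE_DAYS = {1, 3}  # datetime.weekday(): Monday == 0
CHANGE_HOURS = range(18, 22)
AUTHORIZED_ADMINS = {"admin1@contoso.com", "admin2@contoso.com"}


def out_of_policy(event):
    """Return the reasons an administrative action deviates from known patterns."""
    ts = datetime.fromisoformat(event["time"])
    reasons = []
    if ts.weekday() not in CHANGE_DAYS or ts.hour not in CHANGE_HOURS:
        reasons.append("outside change window")
    if event["actor"] not in AUTHORIZED_ADMINS:
        reasons.append("unrecognized administrator")
    return reasons


if __name__ == "__main__":
    # 2025-03-04 is a Tuesday; 2025-03-08 is a Saturday.
    print(out_of_policy({"time": "2025-03-04T19:30:00", "actor": "admin1@contoso.com"}))  # []
    print(out_of_policy({"time": "2025-03-08T03:00:00", "actor": "temp-admin@contoso.com"}))
    # ['outside change window', 'unrecognized administrator']
```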
Sentinel connector: Same Microsoft Entra ID connector — audit logs come with sign-in logs.
Endpoint telemetry: where attacks execute
MDE device telemetry (Advanced Hunting)
Microsoft Defender for Endpoint generates rich telemetry on process execution, file activity, network connections, and registry modifications across all onboarded devices.
For detection: This is your primary source for malware execution, fileless attacks, lateral movement via remote services, and persistence mechanism creation. MDE's built-in detections fire from this telemetry.
For forensics: When you need to understand exactly what happened on a compromised endpoint, Advanced Hunting tables like DeviceProcessEvents, DeviceFileEvents, and DeviceNetworkEvents give you the granular timeline.
For hunting: This is the hunting workhorse. Build queries that look for unusual parent-child process relationships, suspicious command-line arguments, living-off-the-land binary abuse, and network connections to unusual destinations.
Sentinel connector: Microsoft Defender XDR native connector streams alerts and, optionally, raw Advanced Hunting data into Sentinel.
Here's a basic hunt for suspicious PowerShell execution:
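```kql
// Hunt for PowerShell launched with common obfuscation or evasion arguments
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("powershell.exe", "pwsh.exe")
// has_any matches whole terms, so list both the short and long flag forms
| where ProcessCommandLine has_any ("-enc", "-encodedcommand", "-nop", "-noprofile", "-w hidden", "-windowstyle hidden", "downloadstring", "iex")
| project Timestamp, DeviceName, AccountName, ProcessCommandLine, InitiatingProcessFileName
| order by Timestamp desc
```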
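The parent-child analysis mentioned in the hunting paragraph also works well in a notebook as "stacking". Count how many devices show each parent/child process pair, then review the rarest ones. This is a minimal sketch, assuming rows exported from `DeviceProcessEvents` as dictionaries keyed by its `InitiatingProcessFileName`, `FileName`, and `DeviceName` columns. The two-device cutoff is arbitrary.

```python
from collections import defaultdict


def rare_parent_child_pairs(events, max_devices=2):
    """Return ((parent, child), device_count) for pairs seen on at most
    `max_devices` devices, rarest first. Fleet-wide rarity is the signal:
    winword.exe -> powershell.exe on one laptop stands out against
    thousands of explorer.exe -> powershell.exe launches."""
    devices = defaultdict(set)
    for e in events:
        pair = (e["InitiatingProcessFileName"].lower(), e["FileName"].lower())
        devices[pair].add(e["DeviceName"])
    rare = [(pair, len(devs)) for pair, devs in devices.items() if len(devs) <= max_devices]
    return sorted(rare, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    events = [
        {"InitiatingProcessFileName": "explorer.exe", "FileName": "powershell.exe", "DeviceName": f"ws{i}"}
        for i in range(50)
    ] + [{"InitiatingProcessFileName": "WINWORD.EXE", "FileName": "powershell.exe", "DeviceName": "ws7"}]
    print(rare_parent_child_pairs(events))  # [(('winword.exe', 'powershell.exe'), 1)]
```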
Endpoint process and command-line events
For environments that need deeper process telemetry or coverage on systems not running MDE, Sysmon or Windows Security Event ID 4688 with command-line auditing provides essential visibility.
For detection: Process creation events with command-line arguments catch the execution phase of attacks that might not trigger higher-level EDR alerts.
For forensics: Sysmon's correlation fields (ProcessGuid and ParentProcessGuid) let you build complete process trees. Windows reuses process IDs, but these GUIDs stay unique, including across reboots.
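Each Sysmon process-creation event (Event ID 1) records both `ProcessGuid` and `ParentProcessGuid`. Rebuilding a process's genealogy therefore means indexing events by GUID and following parent links upward. This is a minimal sketch, assuming Event ID 1 records have been parsed into dictionaries keyed by those Sysmon field names plus `Image`. The GUIDs in the example are shortened placeholders.

```python
def ancestry(events, process_guid):
    """Walk ParentProcessGuid links from a process up to the oldest ancestor
    still present in the data, returning image paths oldest-first."""
    by_guid = {e["ProcessGuid"]: e for e in events}
    chain = []
    seen = set()  # guard against malformed data forming a cycle
    guid = process_guid
    while guid in by_guid and guid not in seen:
        seen.add(guid)
        event = by_guid[guid]
        chain.append(event["Image"])
        guid = event["ParentProcessGuid"]
    return list(reversed(chain))


if __name__ == "__main__":
    events = [
        {"ProcessGuid": "{A}", "ParentProcessGuid": "{0}", "Image": r"C:\Windows\explorer.exe"},
        {"ProcessGuid": "{B}", "ParentProcessGuid": "{A}", "Image": r"C:\Program Files\Microsoft Office\WINWORD.EXE"},
        {"ProcessGuid": "{C}", "ParentProcessGuid": "{B}", "Image": r"C:\Windows\System32\cmd.exe"},
        {"ProcessGuid": "{D}", "ParentProcessGuid": "{C}", "Image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"},
    ]
    print(" -> ".join(p.rsplit("\\", 1)[-1] for p in ancestry(events, "{D}")))
    # explorer.exe -> WINWORD.EXE -> cmd.exe -> powershell.exe
```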
For hunting: Command-line hunting is one of the highest-value hunting techniques — attackers frequently use legitimate tools with suspicious arguments.
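Sentinel connector: Azure Monitor Agent (AMA) with a Data Collection Rule (DCR) that uses a custom XPath query. Event ID 4688 arrives through the Windows Security Events via AMA connector and lands in the SecurityEvent table. The Sysmon channel is collected by a DCR with a Windows event log data source and lands in the Event table. The Sysmon filter is shown below in Event Viewer's QueryList form. In a DCR, write the same filter as `Channel!XPath`, for example `Microsoft-Windows-Sysmon/Operational!*[System[(EventID=1 or EventID=3 or EventID=7 or EventID=11)]]`.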
```xml
<QueryList>
  <Query Id="0">
    <Select Path="Microsoft-Windows-Sysmon/Operational">
      *[System[Provider[@Name='Microsoft-Windows-Sysmon'] and (EventID=1 or EventID=3 or EventID=7 or EventID=11)]]
    </Select>
  </Query>
</QueryList>
```
This captures process creation (1), network connections (3), image loads (7), and file creation (11).
Email logs: where social engineering lands
With business email compromise generating over $6.3 billion in FBI IC3-reported losses (Verizon 2025 DBIR) and phishing remaining a primary initial access vector, email visibility is essential.
Defender for Office 365
MDO logs capture email delivery decisions, URL clicks, attachment detonations, and post-delivery actions like ZAP remediations.
For detection: Real-time alerting on phishing delivery, malicious attachments, and credential harvesting attempts. MDO's detection capabilities are strong, but you need the logs in Sentinel for correlation.
For forensics: Reconstruct the phishing campaign — what emails were delivered, who clicked, what payloads were accessed, and which users' credentials were potentially compromised.
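At its core, the campaign reconstruction is a join. Match delivered messages to clicks on the same message, and you get the users who need credential review. This is a minimal sketch, assuming rows exported from the Advanced Hunting `EmailEvents` and `UrlClickEvents` tables, joined on their shared `NetworkMessageId` column. The `DeliveryAction` and `ActionType` values used here are assumptions to verify against your own data.

```python
def clicked_recipients(email_events, click_events, campaign_subject):
    """Return users who clicked through a URL in a delivered campaign message.

    Both inputs are lists of dicts keyed by Advanced Hunting column names;
    NetworkMessageId is the join key between the two tables.
    """
    delivered_ids = {
        e["NetworkMessageId"]
        for e in email_events
        if e["Subject"] == campaign_subject and e["DeliveryAction"] == "Delivered"
    }
    return sorted({
        c["AccountUpn"]
        for c in click_events
        if c["NetworkMessageId"] in delivered_ids and c["ActionType"] == "ClickAllowed"
    })


if __name__ == "__main__":
    emails = [
        {"NetworkMessageId": "m1", "Subject": "Invoice overdue", "DeliveryAction": "Delivered"},
        {"NetworkMessageId": "m2", "Subject": "Invoice overdue", "DeliveryAction": "Blocked"},
    ]
    clicks = [
        {"NetworkMessageId": "m1", "AccountUpn": "bob@contoso.com", "ActionType": "ClickAllowed"},
        {"NetworkMessageId": "m1", "AccountUpn": "alice@contoso.com", "ActionType": "ClickBlocked"},
        {"NetworkMessageId": "m2", "AccountUpn": "carol@contoso.com", "ActionType": "ClickAllowed"},
    ]
    print(clicked_recipients(emails, clicks, "Invoice overdue"))  # ['bob@contoso.com']
```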
For hunting: Analyze attachment patterns, URL domains, and sender reputation trends to find campaigns that evaded initial detection.
Sentinel connector: Microsoft Defender XDR connector includes email events from Defender for Office 365.
Active Directory and identity attack detection
Defender for Identity
MDI monitors on-premises Active Directory for attack patterns. It analyzes domain controller network traffic alongside Windows security events, including 4624 (successful logon), 4625 (failed logon), 4728 (member added to a security-enabled global group), and 4769 (Kerberos service ticket requested).
For detection: This is your primary source for detecting DCSync, Kerberoasting, pass-the-hash, and other AD-specific attack techniques that precede ransomware encryption. With 79% of ransomware cases involving remote monitoring tools and 44% of all breaches involving ransomware (Verizon 2025 DBIR), catching the pre-encryption chain matters.
For forensics: Trace how an attacker moved from initial compromise to domain dominance — which accounts were compromised, which groups were accessed, which service tickets were requested.
For hunting: Look for unusual Kerberos traffic patterns, service account authentication anomalies, and LDAP query volumes that indicate reconnaissance.
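A common Kerberoasting heuristic over event 4769 flags one account that requests RC4-encrypted (type 0x17) service tickets for many distinct services in a short period. This is a minimal sketch. The dictionary keys are modeled on 4769 field names (`TargetUserName` for the requesting account, `ServiceName`, `TicketEncryptionType`), and it assumes the events are already limited to one time window. The five-service threshold is illustrative.

```python
from collections import defaultdict

RC4_HMAC = "0x17"  # legacy encryption type commonly requested when Kerberoasting


def kerberoasting_suspects(ticket_events, min_services=5):
    """Return {requesting account: sorted services} for accounts that asked
    for RC4-encrypted service tickets for many distinct services."""
    services = defaultdict(set)
    for e in ticket_events:
        service = e["ServiceName"]
        # Skip krbtgt and computer accounts ($ suffix): their long random
        # passwords make them poor Kerberoasting targets.
        if service.lower() == "krbtgt" or service.endswith("$"):
            continue
        if e["TicketEncryptionType"].lower() == RC4_HMAC:
            services[e["TargetUserName"]].add(service)
    return {acct: sorted(s) for acct, s in services.items() if len(s) >= min_services}


if __name__ == "__main__":
    events = [
        {"TargetUserName": "jdoe@CONTOSO.COM", "ServiceName": f"svc_sql{i}", "TicketEncryptionType": "0x17"}
        for i in range(6)
    ] + [
        {"TargetUserName": "web01$@CONTOSO.COM", "ServiceName": "krbtgt", "TicketEncryptionType": "0x12"},
    ]
    print(kerberoasting_suspects(events))
```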
Sentinel connector: Microsoft Defender XDR connector includes MDI alerts. For raw AD events, configure Windows Security Events collection from domain controllers via Azure Monitor Agent.
Cloud activity logs
Defender for Cloud Apps
Defender for Cloud Apps (formerly Microsoft Cloud App Security, or MCAS) provides visibility into cloud application usage, OAuth permissions, and data movement across sanctioned and unsanctioned apps.
For detection: Anomalous access patterns to cloud apps, suspicious OAuth consent grants, and mass file downloads that indicate exfiltration.
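The mass-download signal is a per-user count over a sliding time window. This is a minimal sketch, assuming file-download activity (for example, from Defender for Cloud Apps activity data) has already been reduced to `(user, ISO-8601 timestamp)` tuples. The threshold of 100 files in 10 minutes is a placeholder to tune against your own baseline.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def mass_download_users(events, threshold=100, window=timedelta(minutes=10)):
    """Return users who downloaded at least `threshold` files within any
    sliding `window`. Events are (user, iso_timestamp) tuples."""
    per_user = defaultdict(list)
    for user, ts in events:
        per_user[user].append(datetime.fromisoformat(ts))
    flagged = set()
    for user, times in per_user.items():
        recent = deque()
        for t in sorted(times):
            recent.append(t)
            # Drop downloads that fall outside the window ending at t.
            while t - recent[0] > window:
                recent.popleft()
            if len(recent) >= threshold:
                flagged.add(user)
                break
    return flagged


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    burst = [("eve@contoso.com", (start + timedelta(seconds=3 * i)).isoformat()) for i in range(150)]
    steady = [("bob@contoso.com", (start + timedelta(minutes=5 * i)).isoformat()) for i in range(150)]
    print(mass_download_users(burst + steady))  # {'eve@contoso.com'}
```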
For forensics: When data has been exfiltrated to external cloud storage, MCAS logs show you what was accessed, when, and where it went.
For hunting: Shadow IT discovery, unusual file sharing patterns, and application usage trends that deviate from baselines.
Sentinel connector: Microsoft Defender for Cloud Apps native connector.
Azure activity logs
Azure Activity logs capture control-plane write operations in your Azure subscriptions: resource creation, modification, deletion, and role assignments. Read operations are generally not recorded.
For detection: Unauthorized resource deployment, role elevation, and security configuration changes.
For forensics: When an attacker pivots into cloud infrastructure, Activity logs show what they touched and what they changed.
For hunting: Infrastructure change patterns, unusual deployment activity, and resource access from unexpected principals.
Sentinel connector: Azure Activity built-in connector. Azure Activity logs are one of Sentinel's free data sources, so their ingestion is not billed.
Azure Firewall and NSG flow logs
Network flow data captures traffic patterns at the perimeter and between network segments.
For detection: Lateral movement patterns, C2 communication to known-bad infrastructure, and unusual outbound data volumes.
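C2 implants often check in at near-regular intervals. One common heuristic over flow data therefore measures how consistent the gaps are between connections from a given source to a given destination. This is a minimal sketch, assuming flow records reduced to `(source, destination, epoch_seconds)` tuples. The 10-connection minimum and the 0.1 jitter ratio (standard deviation divided by mean gap) are illustrative thresholds. Many implants add deliberate jitter to defeat exactly this check.

```python
from collections import defaultdict
from statistics import mean, pstdev


def beacon_candidates(flows, min_connections=10, max_jitter=0.1):
    """Return {(src, dst): mean interval} for pairs whose inter-connection
    gaps vary by less than `max_jitter` (stdev / mean)."""
    times = defaultdict(list)
    for src, dst, ts in flows:
        times[(src, dst)].append(ts)
    results = {}
    for pair, stamps in times.items():
        if len(stamps) < min_connections:
            continue
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        if avg > 0 and pstdev(gaps) / avg < max_jitter:
            results[pair] = avg
    return results


if __name__ == "__main__":
    beacon = [("10.0.0.5", "203.0.113.50", 1000 + 60 * i) for i in range(20)]
    bursty = [("10.0.0.6", "198.51.100.9", t) for t in [0, 1, 50, 51, 300, 301, 900, 901, 2000, 2001]]
    print(beacon_candidates(beacon + bursty))
```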
For forensics: Network path reconstruction — how traffic flowed during an incident.
For hunting: Traffic baseline deviation, unusual port usage, and connection patterns to external destinations.
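Sentinel connector: Azure Firewall connector for firewall logs, sent through diagnostic settings. NSG flow logs are configured in Network Watcher and written to a storage account, and Traffic Analytics processes them into Log Analytics. Microsoft is retiring NSG flow logs in favor of virtual network flow logs, which follow the same path.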
Third-party connectors
For network appliances, non-Microsoft EDR, and other security tools, CEF and Syslog connectors bring external telemetry into Sentinel.
For detection: Perimeter alerts, firewall blocks, and network-based detections from specialized appliances.
For forensics: Cross-platform correlation when attacks span multiple security tool coverage areas.
For hunting: Extended visibility into areas where Microsoft tools don't have native coverage.
Sentinel connector: Common Event Format (CEF) via AMA or Syslog via AMA depending on the source format.
Maturity tiers: where to start
Must-have (start here)
- Entra ID Sign-in and Audit Logs — covers password spray, AiTM, credential theft chains
- MDE Device Telemetry — covers endpoint execution, lateral movement, ransomware
- Defender for Office 365 — covers phishing delivery, BEC, initial access
- Defender for Identity — covers AD attacks, Kerberoasting, DCSync
These four sources cover the attack chains that matter most: infostealer infections leading to credential theft, BEC targeting financial processes, and ransomware lateral movement. With 51% of initial access involving infostealer malware and 47% involving ClickFix social engineering (Microsoft Digital Defense Report 2025), these are your foundation.
Should-have (expand here)
- Defender for Cloud Apps — critical for cloud account takeover and exfiltration detection
- Azure Activity Logs — essential for cloud pivot visibility
- NSG Flow Logs — necessary for network-based lateral movement detection
With 30% of breaches now involving third-party exposure (Verizon 2025 DBIR), cloud visibility has moved from nice-to-have to necessary.
Advanced (optimize here)
- Raw endpoint command-line / Sysmon — necessary for deep hunting but expensive at scale
- Third-party network appliances — valuable for extended visibility
- DNS query logs — powerful for C2 detection and hunting
Consider Sentinel Auxiliary Logs tier for high-volume sources in this category to manage cost.
Retention and cost architecture
Different jobs need different retention strategies:
Detection (90-day hot tier): Real-time and near-real-time query access for active alerting and triage. This is standard Analytics Logs pricing.
Forensics (1-year interactive tier): Historical access for incident reconstruction. Use Sentinel Auxiliary Logs for high-volume sources to reduce cost while maintaining query capability.
Hunting (long-term archive): Raw endpoint data and high-volume sources that you don't need daily but want available for deep-dive investigations. Use Search Jobs in Sentinel for historical hunting without full retention costs.
Worth noting: you're paying for query performance, not just storage. Match the tier to how you'll actually use the data.
Honest limitation: Log retention costs vary significantly based on ingestion volume. The must-have tier can be expensive for high-volume environments. Validate Sentinel Auxiliary Logs pricing against your actual data volumes before committing. Pricing as of early 2025 — check the Microsoft Azure pricing page for current rates.
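Validating pricing against your own volumes, as recommended above, is mostly arithmetic. This minimal sketch compares two options for one high-volume source: send all of it to the analytics tier, or use a dual-ingest split. The per-GB rates are arbitrary relative units, not Microsoft prices. The model ignores commitment tiers, free data sources, data grants, and retention charges beyond each tier's included period.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    ingest_per_gb: float  # placeholder rate; substitute current Azure pricing


def monthly_cost(gb_per_day: float, tier: Tier, days: int = 30) -> float:
    """Ingestion cost for one source routed entirely to one tier."""
    return gb_per_day * days * tier.ingest_per_gb


def dual_ingest_cost(gb_per_day: float, signal_fraction: float,
                     analytics: Tier, lake: Tier, days: int = 30) -> float:
    """Full stream to the lake plus a filtered fraction to analytics."""
    return (monthly_cost(gb_per_day, lake, days)
            + monthly_cost(gb_per_day * signal_fraction, analytics, days))


if __name__ == "__main__":
    analytics = Tier("analytics", 1.0)  # relative units, not dollars
    lake = Tier("lake", 0.1)
    print(monthly_cost(200, analytics))
    print(dual_ingest_cost(200, 0.05, analytics, lake))
```

With these made-up rates, a 200 GB/day source whose high-signal events make up 5% of volume costs 6,000 units a month analytics-only versus 900 units dual-ingest. The ratio is the point, not the numbers, and it shifts with your real rates and filter fraction.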
What you're blind to without these
Let's be direct about the gaps:
Without Entra sign-in logs: You can't detect password spray or AiTM session theft. With 80% of password spray activity concentrated in just 20 ASNs (Microsoft Digital Defense Report 2025), the patterns are there — but only if you're collecting the data.
Without Defender for Identity: You can't see DCSync or Kerberoasting — the ransomware pre-encryption chain that gives you your last chance to stop the attack before encryption begins.
Without MCAS: Exfiltration to external cloud storage is invisible. Data collection occurred in 80% of incident response engagements, with 51% confirmed exfiltration (Verizon 2025 DBIR). You need to see where it's going.
Without endpoint command-line data: You're limited to whatever your EDR's built-in detections catch. Attackers using living-off-the-land techniques, where the malicious intent lives in the arguments rather than the binary, can move undetected.
Dr. Marcus Shields' executive summary
For the security leader who needs the strategic view:
Three questions for the CISO:
- Can we trace a credential theft chain end-to-end through our logs today?
- Do we have 12 months of identity and AD event logs for forensic reconstruction?
- Are we paying ingestion costs for logs we have no detection coverage for?
If the answer to any of those is "no" or "unknown," log architecture is the first investment to make. Every detection capability, forensic investigation, and hunting program depends on having the right data connected to the right tier at the right retention. This isn't a technical implementation detail — it's the foundation of your security operations capability.
What comes next
What you forward to Sentinel defines what you can see. Getting this right before the breach is easier than reconstructing it after.
Start with the must-have tier. Validate that you have these Sentinel connectors configured:
- Microsoft Entra ID (sign-in and audit logs)
- Microsoft Defender XDR (MDE, MDO, MDI alerts and telemetry)
- Azure Activity (for cloud operations)
Then expand based on your environment and threat model. You can explore cost and tier options in the Microsoft Sentinel pricing documentation.
The data lake isn't cold storage — it's your agent's hunting ground
There's a common misconception about the Sentinel data lake: that it's a cost-reduction tier for logs you keep because compliance requires it but never actually query. That model made sense when log query meant paying analytics-tier ingest costs. It doesn't hold up anymore.
Microsoft's unified SecOps platform supports direct KQL queries against data lake storage, with retention of up to 12 years. The lake uses the same query language, and analysts, automation workflows, and AI agents (through the Sentinel MCP server) can all query it. This is generally available.
What this changes for your log architecture:
Behavioral baseline data must reach the lake. Detection rules focused on static thresholds can run on analytics-tier logs. AI agent use cases that require baseline comparison — "is this user's authentication pattern unusual relative to 90 days of history?" — require the historical depth that only the data lake provides. If you're routing behavioral telemetry to analytics-only and pruning at 30 days, your agents have nothing meaningful to compare against.
Jupyter notebooks and the lake are a pair. Python-based threat hunting against the lake is generally available. The use cases that benefit most are behavioral — machine learning over access patterns, statistical anomaly detection over authentication sequences, clustering over lateral movement chains — all of which require 6-12 months of history to produce meaningful baselines. You can build and run these notebooks directly in the Microsoft Sentinel workspace.
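Statistical anomaly detection can be as simple as a z-score against the baseline. This minimal sketch assumes per-user daily sign-in counts have already been aggregated from the lake. It shows why baseline depth matters. Against 90 days of steady history, a spike stands out clearly. Against three days that already contain a spike, it barely registers. The 3.0 cutoff is a common rule of thumb rather than a tuned value, and z-scores are only one of many techniques such a notebook might use.

```python
from statistics import mean, stdev


def zscore(history, today):
    """Standard score of `today` relative to the baseline `history`."""
    if len(history) < 2:
        raise ValueError("need at least two days of history")
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any change at all is maximally surprising.
        return 0.0 if today == mean(history) else float("inf")
    return (today - mean(history)) / sigma


def is_anomalous(history, today, cutoff=3.0):
    return abs(zscore(history, today)) >= cutoff


if __name__ == "__main__":
    steady_90_days = [10, 12] * 45
    short_noisy = [10, 40, 12]
    print(is_anomalous(steady_90_days, 40))  # True
    print(is_anomalous(short_noisy, 40))     # False
```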
Data promotion jobs let you escalate retroactively. When an IOC lands — a new threat actor hash, a CISA advisory, a new cross-prompt injection pattern — you can run a promotion job that pulls matching historical events from the data lake into the analytics tier for full incident investigation. This is generally available. It means you don't have to make log routing decisions as if they're permanent.
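Conceptually, a promotion job is a retroactive filter. It scans retained history for matches against newly published indicators and moves only the hits into the analytics tier. The sketch below models that selection step in plain Python to make the idea concrete. It is not the Sentinel job API, and the `sha256` and `domain` keys and indicator values are hypothetical.

```python
def select_for_promotion(lake_events, ioc_hashes, ioc_domains):
    """Return historical events that match any newly published indicator."""
    hashes = {h.lower() for h in ioc_hashes}
    domains = {d.lower().rstrip(".") for d in ioc_domains}
    hits = []
    for e in lake_events:
        sha = (e.get("sha256") or "").lower()
        host = (e.get("domain") or "").lower().rstrip(".")
        # Match the indicator domain itself or any subdomain of it,
        # but not lookalikes such as "notevil.example".
        if sha in hashes or any(host == d or host.endswith("." + d) for d in domains):
            hits.append(e)
    return hits


if __name__ == "__main__":
    history = [
        {"sha256": "ABC123", "device": "ws1"},
        {"domain": "cdn.evil.example", "device": "ws2"},
        {"domain": "notevil.example", "device": "ws3"},
    ]
    for hit in select_for_promotion(history, ["abc123"], ["evil.example"]):
        print(hit["device"])  # ws1, then ws2
```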
Agent query access via MCP. If you're running an AI triage agent connected to Sentinel through the MCP server, the query_lake tool gives the agent access to long-retention data without requiring a human to manually pull historical context. The practical impact: an agent investigating a credential abuse pattern can look back 12 months without an analyst doing manual KQL pivots.
The routing implication is concrete: behavioral baseline sources — Entra sign-in logs, Entra audit logs, MDE process execution, and Defender for Identity lateral movement signals — should be configured for dual-ingest or lake-priority routing, not analytics-only.
Third-party and infrastructure sources
Microsoft-native log sources cover identity, endpoint, cloud, and email. They don't cover your network perimeter, your privileged access management system, your VPN gateway, or your DNS resolver. For most environments, those gaps represent meaningful blind spots in detection coverage.
The sources below draw from the extended log catalog in the agentic SOC blueprint. Each includes the recommended ingest pattern and the threat categories it addresses.
Must-have tier
| Source | Ingest Pattern | Threats Covered |
|---|---|---|
| DNS (corporate resolver) | Dual-ingest: high-signal alerts to analytics, raw query logs to lake | T2 (malware C2 domains), T8 (ClickFix lure domains), T9 (supply chain beacon domains) |
| VPN gateway logs | Analytics tier | T1 (credential-based initial access), T5 (edge device exploitation), T7 (access broker persistence) |
| PAM / CyberArk vault events | Analytics tier | T7 (access broker pivot to privileged accounts), T10 (insider privilege abuse) |
| Firewall threat / IPS alert logs | Analytics tier | T2 (malware lateral movement), T3 (ransomware network staging), host-based firewall for containment signal validation |
| Third-party EDR alerts (non-MDE endpoints) | Analytics tier | T1–T3 for mixed-fleet environments; bridges non-MDE endpoints into unified incident investigation |
| Proxy / URL filtering logs | Dual-ingest: policy violations to analytics, full request logs to lake | T4 (BEC phishing redirect traffic), T6 (data exfiltration over web), T8 (ClickFix lure page requests) |
Should-have tier
| Source | Ingest Pattern | Threats Covered |
|---|---|---|
| Linux auditd | Dual-ingest | T2 (credential access on Linux systems), T3 (ransomware staging on Linux infrastructure), T9 (compromised build server activity) |
| NDR / Corelight / Vectra / ExtraHop | Dual-ingest: high-confidence detections to analytics, raw flows to lake | T2 (internal reconnaissance), T5 (exploitation traffic patterns), T6 (encrypted exfiltration detection) |
| Cloud storage access logs (Azure Blob, S3) | Dual-ingest | T6 (cloud data exfiltration), T7 (access broker data staging before handoff) |
| CI/CD pipeline logs (GitHub Actions, GitLab CI) | Dual-ingest | T9 (supply chain attack tracing), T10 (insider repository abuse and unauthorized package publishing) |
The dual-ingest pattern
Several sources above appear in both tiers for a reason. High-volume log sources — DNS, proxy, NDR, auditd — generate raw data at a rate that makes full analytics-tier ingest expensive and alert noise unmanageable. The dual-ingest pattern routes filtered, high-signal events — threat alerts, policy violations, anomalous connection patterns — to the analytics tier for immediate detection, while the complete event stream goes to the data lake for forensics and behavioral analysis.
Sentinel Data Collection Rules support this split at ingestion time. Configure the DCR transformation to filter and route in a single pass, not collect twice. This keeps analytics-tier costs controlled while maintaining full forensic fidelity for investigations that need it.
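The single-pass routing decision is easy to model. Every event lands in the lake, and only events that match a high-signal predicate are also sent to analytics. The sketch below shows that logic in Python purely for illustration. In a real deployment it lives in the DCR's KQL transformations, not in Python, and the `action` and `category` fields and the predicate are hypothetical.

```python
from typing import Callable, Iterable


def route(events: Iterable[dict], is_high_signal: Callable[[dict], bool]):
    """Split one pass over the stream into (analytics, lake) lists.
    The lake receives everything; analytics receives only the subset worth
    alerting on, so each event is read once rather than collected twice."""
    analytics, lake = [], []
    for event in events:
        lake.append(event)
        if is_high_signal(event):
            analytics.append(event)
    return analytics, lake


def proxy_high_signal(event: dict) -> bool:
    """Hypothetical predicate for proxy logs: blocked requests and
    policy-violation categories go to the analytics tier."""
    return event.get("action") == "blocked" or event.get("category") == "policy-violation"


if __name__ == "__main__":
    stream = [
        {"url": "https://intranet.contoso.com", "action": "allowed"},
        {"url": "https://lure.example/verify", "action": "blocked"},
        {"url": "https://files.example/upload", "action": "allowed", "category": "policy-violation"},
    ]
    analytics, lake = route(stream, proxy_high_signal)
    print(len(analytics), len(lake))  # 2 3
```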
Next up: Article 3 covers how AI agents are changing the way SOCs hunt, report, and stay current, because having the logs is only half the job.
This article is part of the Threat-Informed Defense Series: The Agentic SOC. See the pillar article for the complete framework.