The Design Decisions That Will Define Your Detection Capability
Log routing, retention tiers, dual-ingest patterns, agent integration points — here are the specific decisions that determine what your SOC can see and investigate.
Article 4a established the case: the default analytics-first SIEM architecture doesn't hold up against modern attacks, and the cost is paid in forensic gaps, absent behavioral baselines, and AI agents working with incomplete context.
This article is the implementation half. Specific routing decisions, table by table. Retention targets, compliance driver by driver. The dual-ingest pattern for high-volume sources. And how to make the data lake queryable for both analysts and agents.
These aren't IT configuration decisions. They're security strategy decisions with a shelf life measured in years.
The routing decision framework
Before getting into specific tables, apply a three-question decision framework. If all three answers are "no," the data lake is the right tier:
- Does this table drive a scheduled analytics rule or near-real-time rule?
- Does it appear in entity enrichment lookups — user, device, or IP reputation scoring?
- Does it require sub-five-minute latency from event to analyst alert?
The majority of log sources fail all three tests. That doesn't make them unimportant. It means their value is in retroactive investigation and behavioral analysis, not in real-time alert generation.
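Before applying the three questions, it helps to know which tables dominate ingestion. A minimal sketch against the standard Log Analytics `Usage` table (quantities are reported in MB) ranks the candidates worth triaging first:

```kql
// Rank tables by billable ingestion over the last 30 days
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by DataType
| top 10 by IngestedGB
```

Run the three-question test against this top-10 list first; that is where routing decisions move the most cost.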
Analytics tier: what belongs there
These tables earn full analytics-tier placement because they directly drive detection rules, feed entity enrichment, or require real-time latency.
Identity and authentication
SigninLogs — the primary detection surface for T1 (infostealer credential replay), T4 (BEC via AiTM session theft), and T5 (password spray). Conditional access policy enforcement fires against sign-in risk scores that come from here. This stays in analytics.
AuditLogs — role assignment changes, conditional access policy modifications, new app registrations. These are the telltale signs of T2 (lateral movement privilege escalation) and T7 (third-party access abuse). Analytics tier with at least 6 months of retention beyond the 90-day default.
IdentityDirectoryEvents — MDI Active Directory telemetry. DCSync detection, Kerberoasting, LDAP reconnaissance. These are the T2 patterns for human-operated ransomware staging before encryption. Not optional for environments with on-premises AD.
Note on AADNonInteractiveUserSignInLogs: This table is a split-tier candidate. Service principal sign-ins tied to specific AppIds that drive analytics rules (OAuth consent abuse, M365 application anomalies) should go to analytics with tight rule scoping. The full-volume feed goes to the data lake. High volume against broad analytics rules generates noise-to-signal ratios that make the table effectively useless.
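The analytics-bound scoping can be expressed as a workspace transformation in KQL. This is a sketch, assuming a workspace transformation DCR applies to this table in your environment; the AppId values are placeholders for your monitored applications:

```kql
// transformKql: keep only the service principal sign-ins that scoped
// analytics rules actually consume; everything else lands in the lake copy
source
| where AppId in ("00000000-0000-0000-0000-000000000001",   // placeholder AppId
                  "00000000-0000-0000-0000-000000000002")   // placeholder AppId
```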
Endpoint
SecurityAlert (MDE-sourced) — MDE-generated alerts feed directly into SecurityIncident correlation. Must stay in analytics tier.
DeviceLogonEvents (scoped to domain controllers and high-value servers only) — lateral movement detection. Pass the Hash, unusual service account logons (T2). Full volume goes to data lake. The scoped subset — DCs and named high-value servers — stays in analytics.
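The scoped-subset approach can also be expressed at the rule layer rather than at routing. A sketch of a lateral movement analytics query, assuming the high-value host list is maintained in a Sentinel watchlist (the watchlist name and threshold are illustrative):

```kql
// Network logon fan-in against DCs and named high-value servers
let HighValueHosts = _GetWatchlist("HighValueServers") | project SearchKey;
DeviceLogonEvents
| where DeviceName in~ (HighValueHosts)
| where LogonType == "Network"
| summarize Logons = count(), SourceIPs = dcount(RemoteIP)
    by AccountName, DeviceName, bin(TimeGenerated, 1h)
| where SourceIPs > 5   // illustrative threshold; tune to your baseline
```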
DeviceProcessEvents (scoped to high-confidence detections only) — scope to known-bad parent/child process patterns and LOLBin signatures. Raw full volume goes to data lake. If you put full DeviceProcessEvents in analytics at enterprise scale, you'll spend more on that one table than your entire security budget allows.
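What "scoped to high-confidence detections" looks like in practice — a minimal LOLBin pattern sketch, illustrative rather than exhaustive; the binary list and command-line markers should come from your own detection engineering backlog:

```kql
// High-confidence LOLBin abuse: known-bad binaries with remote-content
// or script-object command-line markers
DeviceProcessEvents
| where FileName in~ ("certutil.exe", "mshta.exe", "regsvr32.exe", "rundll32.exe")
| where ProcessCommandLine has_any ("http://", "https://", "-urlcache", "scrobj.dll")
| project TimeGenerated, DeviceName, InitiatingProcessFileName,
          FileName, ProcessCommandLine
```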
EmailEvents, EmailUrlInfo, EmailAttachmentInfo — phishing delivery, URL detonation, attachment hash correlation. All three are required for T4 BEC and AiTM campaign correlation. Analytics tier.
Cloud and infrastructure
AzureActivity — subscription-level resource changes. T3 (exploitation), T6 (exfiltration routing changes), T7 (third-party pivot through infrastructure modifications). Analytics tier.
CloudAppEvents (alert-level only) — MCAS anomaly alerts: impossible travel, mass download, OAuth consent abuse. Alert-level events go to analytics. Full-volume cloud app activity goes to data lake.
MicrosoftPurviewInformationProtection — XPIADetected and JailbreakDetected event types. T11 AI agent attack detection. Production audit schema, generally available, no preview caveat needed. These must be in analytics to drive alerts.
SecurityIncident, ThreatIntelligenceIndicator — the operational nerve center of your Sentinel deployment. Both stay in analytics.
Data lake tier: what belongs there
These tables have forensic and hunting value that justifies every byte of retention — just not at analytics-tier cost or latency requirements.
Endpoint — full volume raw telemetry
DeviceProcessEvents (full volume), DeviceFileEvents, DeviceNetworkEvents, DeviceRegistryEvents — the complete forensic record of endpoint activity. At enterprise scale, full-volume endpoint telemetry in the analytics tier is cost-prohibitive. In the data lake, it's queryable when you need to reconstruct a full attack chain.
DeviceLogonEvents (non-DC full volume), DeviceImageLoadEvents, DeviceFileCertificateInfo — workstation-to-workstation logon chains, DLL hijacking forensics, code signing verification for T9 supply chain investigation. All hunting and forensic workloads. Data lake.
Network
AzureNetworkAnalytics_CL (NSG Flow Logs), AzureFirewallApplicationRule/AzureFirewallNetworkRule (full traffic), DnsEvents — C2 domain hunting, DGA detection, lateral movement path reconstruction, exfiltration path analysis. DNS query logs at full enterprise volume will exhaust analytics-tier budget quickly. Data lake at full retention gives you a searchable record for every hunting hypothesis.
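A hunting hypothesis of the kind this tier exists for — DGA candidate hunting via long-tail NXDOMAIN volume. Column names below assume the legacy DnsEvents schema (`SubType`, `ResultCode`, `Name`); adjust for your DNS source, and treat the threshold as a starting point:

```kql
// Clients generating unusually many failed lookups across many distinct
// domains — a common DGA beaconing signature
DnsEvents
| where TimeGenerated > ago(90d)
| where SubType == "LookupQuery" and ResultCode == 3   // 3 = NXDOMAIN
| summarize FailedLookups = count(), DistinctDomains = dcount(Name)
    by ClientIP, bin(TimeGenerated, 1d)
| where DistinctDomains > 100
| order by DistinctDomains desc
```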
Identity — high-volume full feeds
AADNonInteractiveUserSignInLogs (full volume beyond scoped rules), AADUserRiskEvents, AADRiskyUsers — service principal sign-in history for T7 lookback, historical risk event enrichment, risky user state history for forensic timelines.
Sysmon
Sysmon Event IDs 1, 3, 7, 11 — raw Sysmon telemetry complements MDE at the hunting layer. Analytics rules should use MDE tables with their enriched alert structure. Sysmon is the forensics and depth hunting asset.
AI agent activity
CloudAppEvents (Frontier preview action types: InvokeAgent, InferenceCall, ExecuteToolBySDK) — T11 agent activity hunting. Too new and too high-volume for production analytics rules. Route to the data lake with a hunting-first access pattern.
The dual-ingest pattern
Some sources generate both high-volume raw telemetry and high-signal alerts from the same activity stream. Routing all of it to analytics is expensive. Routing all of it to the data lake sacrifices real-time detection. The dual-ingest pattern solves this.
The approach: a single collection point filters and routes in one pass using Azure Monitor Data Collection Rules (DCRs). High-signal events — threat alerts, policy violations, anomalous classifications — route to the analytics tier. The complete raw event stream routes to the data lake.
Sources where this pattern applies:
| Source | Analytics (alert-level) | Data Lake (full volume) |
|---|---|---|
| DNS (corporate resolver) | NXDOMAIN spikes, threat-classified queries | Full query log |
| Proxy / URL filtering | Policy violations, threat category matches | Full request log |
| NDR (Corelight, Vectra, ExtraHop) | High-confidence detections | Raw flow records |
| Linux auditd | Privilege escalation alerts | Full syscall log |
| Cloud storage (Azure Blob, S3) | Anomalous access alerts | Full access log |
| CI/CD pipeline logs | Unauthorized change events | Full build and deploy log |
| DeviceLogonEvents | DC and high-value server subset | Full workstation volume |
| CloudAppEvents | MCAS anomaly alerts | Full app activity |
Configure the DCR transformation to split the stream at ingestion, rather than collecting twice, paying for both, and managing two separate connectors. One collection point, one transformation, two routing destinations.
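In the DCR itself, the split is two dataFlows from the same stream, each with its own transformKql and destination. A sketch of the analytics-bound transform for a DNS source — field names are assumptions about your custom DNS stream, not a fixed schema:

```kql
// dataFlow 1 → analytics tier: alert-level events only
source
| where ResultCode == 3 or isnotempty(ThreatCategory)   // NXDOMAIN or threat-classified
```

The second, data-lake-bound dataFlow passes the full stream through with `transformKql` set to just `source`.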
Retention targets by compliance driver
Default 90-day analytics retention is a starting point, not a program. These are the targets that fit the threat model and common compliance frameworks.
90 days (analytics tier default) — SecurityAlert, SecurityIncident, ThreatIntelligenceIndicator. Active incident management. IOC feeds refresh frequently; older indicators have declining detection value.
6 months (analytics tier extended) — SigninLogs, AuditLogs. Extend beyond the default for the identity tables that anchor most hunting hypotheses. Sign-in and audit log lookback is the most common forensic requirement in credential-based incident investigations.
1 year (data lake) — SigninLogs (data lake copy), AzureActivity, DeviceLogonEvents (full volume), DeviceProcessEvents, SecurityEvent (scoped DC events), MicrosoftPurviewInformationProtection. Covers PCI-DSS v4.0 Requirement 10.5.1 (1-year audit trail) and most forensic windows for breach reconstruction. Given the MDDR 2025 58-day attack length, an investigation that starts at encryption needs at least a 90-day lookback; 1 year provides substantial margin.
2 years (data lake) — OfficeActivity, AADNonInteractiveUserSignInLogs, CloudAppEvents, DeviceFileEvents, DeviceNetworkEvents, DnsEvents. GDPR Article 5 investigation carve-out, ISO 27001, Cyber Essentials Plus. Covers behavioral baseline building requirements for high-confidence anomaly detection.
7+ years (data lake + immutable archive) — AuditLogs (full), SecurityEvent (full), AzureActivity (full), OfficeActivity (full). SOX Section 802 (7 years), HIPAA 45 CFR 164.530 (6 years), FedRAMP High (12 months online + 3 years archived minimum). Implement via Log Analytics workspace data export to Azure Blob Storage with immutable (WORM) policies. This satisfies audit chain-of-custody requirements without requiring active Sentinel workspace retention at full analytics cost.
Making the data lake queryable for agents
Two steps. Both required.
Step 1: Enable the Sentinel data lake. In the workspace's Sentinel settings, enable the data lake (Basic/Auxiliary tier). Without this, data lake routing isn't available as a DCR destination.
Step 2: Verify KQL query access across tiers. Run a test query that explicitly targets a data lake table to confirm cross-tier query works. The syntax is the same KQL — the Sentinel hunting and query interfaces handle tier routing transparently.
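A verification sketch: deliberately target a window that only the data lake tier retains. This assumes DnsEvents is routed to the lake with 2-year retention, per the targets above; a non-zero count confirms cross-tier query reaches lake data:

```kql
// Cross-tier sanity check against a lake-only retention window
DnsEvents
| where TimeGenerated between (ago(365d) .. ago(180d))
| summarize Events = count()
```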
Once those two steps are done, the query_lake tool in the Sentinel MCP server has access to the full retention horizon. An agent running entity triage against SigninLogs with 12-month retention gets 12 months of behavioral context. An agent running a hunting hypothesis against DnsEvents gets the full DNS query history.
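The kind of behavioral-context query an agent issues during entity triage, sketched against SigninLogs with extended retention — the UPN is a placeholder:

```kql
// 12-month weekly sign-in baseline for one principal
SigninLogs
| where TimeGenerated > ago(365d)
| where UserPrincipalName =~ "user@contoso.com"   // placeholder principal
| summarize SignIns = count(),
            DistinctIPs = dcount(IPAddress),
            DistinctApps = dcount(AppDisplayName)
    by bin(TimeGenerated, 7d)
| order by TimeGenerated asc
```

With 90-day retention, the same query returns a quarter of the baseline and anomaly scoring degrades accordingly.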
The graph MCP tools (analyze_user_entity, analyze_url_entity), currently in preview, operate against structured entity data rather than raw log tables. The data lake still matters here: entity enrichment is richer when the entity analytics engine can draw on longer behavioral history to compute risk scores.
The implementation sequence
Based on the blueprint's phased implementation structure:
Phase 1 (available today): Enable the Sentinel data lake. Update DCR configurations for your highest-volume sources — start with DNS and DeviceProcessEvents full-volume routing. Extend SigninLogs and AuditLogs analytics retention to 6 months.
Phase 2 (available today): Add NDR, CI/CD pipeline, and cloud storage dual-ingest patterns. Set data lake retention targets by compliance driver. Validate cross-tier KQL queries work from the hunting interface.
Phase 3 (preview enrollment): Enroll in graph MCP tools for agent-accessible blast radius and exposure perimeter queries. Integrate analyze_user_entity with the UC5 entity triage workflow from Article 3. Build materialized identity graph for continuous privilege topology monitoring.
Executive Summary for Security Leadership
Log routing and retention decisions made in the next 90 days will shape what your SOC can see, investigate, and automate for years. These aren't IT configuration questions — they're security strategy questions with direct impact on breach reconstruction capability and AI workflow performance.
The 90-day analytics retention default fails two of three common forensic requirements: late-discovered breach reconstruction (requiring 90+ day lookback) and behavioral baseline building for low-and-slow attack detection (requiring 90-180 days minimum).
The dual-ingest pattern makes it possible to maintain real-time detection coverage while adding full forensic depth for high-volume sources — without doubling ingestion costs. DNS, proxy, NDR, and endpoint telemetry are all cost-controlled by routing alert-level events to analytics and full volume to the data lake.
AI agent workflows are data-constrained before they're technology-constrained. Entity triage agents without 30-day behavioral history, hunting agents without 6-12 months of telemetry, and posture agents without trend data all produce lower-quality output than manual analyst review. The investment in data architecture is what unlocks agent value.
This quarter: identify your top five highest-volume log sources. Apply the three-question routing framework to each. Move those that fail all three questions to data lake tier. Extend analytics retention on SigninLogs and AuditLogs to 6 months.
What's next
You now have the architecture foundation and the specific routing decisions. The next question is how to migrate an existing Sentinel deployment — or a legacy SIEM — into this architecture without disrupting live detection coverage.
Article 5 is the migration playbook: workspace restructuring, data connector migration, log routing rule updates, the unified SecOps portal switchover, and validation steps before you cut over.
Article 5: Migrating to Unified SecOps and the Sentinel Data Lake
This article is part of the Threat-Informed Defense Series: The Agentic SOC. See the pillar article for the complete framework.