Let's Be Honest About What You're Actually Doing
I've sat in enough threat hunting retrospectives to recognize the pattern. Someone opens Splunk, types a query for PowerShell executions with base64 encoding, scrolls through 400 results, closes the tab, and writes "conducted threat hunting activities" in the weekly report. That's not threat hunting. That's log tourism. You're a visitor. You showed up, looked around, didn't know what you were looking for, and left.
The word "hunting" implies intent. A hunter doesn't wander into the woods and hope an animal runs into their face. They know the terrain, they know the prey's behavior, they've chosen a specific target, and they have a theory about where that target is going to be. The hypothesis is the whole point. Without it, you're just making noise that sounds like due diligence.
This frustrates me because threat hunting is genuinely one of the most valuable capabilities a security team can develop — and the cargo-cult version of it actively undermines that value. Teams burn analyst hours, produce nothing actionable, and then wonder why leadership doesn't fund the function properly. The dysfunction is self-inflicted.
What a Hypothesis Actually Looks Like (And Why Sqrrl Got It Right)
In 2015, Sqrrl published a threat hunting reference model that, despite the company being acquired by Amazon years ago, remains the clearest framework for this conversation. Their core argument was simple: hunting starts with a hypothesis derived from threat intelligence or situational awareness, not from "let's see what's weird." David Bianco's Pyramid of Pain fits perfectly here — if you're hunting on file hashes, you're playing defense at the lowest possible tier. The adversary rotates infrastructure in hours. A hypothesis grounded in TTPs, sitting at the top of that pyramid, forces you to think about behavior, not artifacts.
A real hypothesis looks like: "Based on recent threat intel around APT29 activity targeting government contractors, we believe a threat actor may have established persistence via scheduled tasks that masquerade as legitimate Windows maintenance jobs, specifically targeting the SYSTEM account with tasks registered in \Microsoft\Windows\ subtrees." That's a hypothesis. It names an adversary category, identifies a specific technique (T1053.005 — Scheduled Task/Job: Scheduled Task), scopes the environment, and gives you a falsifiable claim you can actually test.
Compare that to "let's look for suspicious scheduled tasks." You'll find 10,000 of them. Congratulations. Now what? The hypothesis doesn't just guide what you query — it tells you when to stop, what constitutes a finding, and what a negative result actually means for your environment's risk posture.
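One way to force that discipline is to make the hypothesis a structured record that has to be filled in before anyone touches a query interface. This is a minimal sketch; the field names are illustrative, not a standard schema:

```python
# A hunt hypothesis as a record that must be complete before the hunt starts.
# Field names here are illustrative -- adapt them to your own playbook template.
from dataclasses import dataclass, field

@dataclass
class HuntHypothesis:
    statement: str                # falsifiable claim about the environment
    attack_technique: str         # MITRE ATT&CK technique ID, e.g. "T1053.005"
    scope: str                    # which hosts/accounts/segments are in play
    data_sources: list = field(default_factory=list)  # telemetry needed to conclude
    true_positive: str = ""       # what a confirmed finding looks like
    stop_condition: str = ""      # when the hunt is done, found or not

    def is_testable(self) -> bool:
        """Don't start querying until the claim and its evidence are defined."""
        return bool(self.statement and self.data_sources and self.true_positive)
```

If `is_testable()` returns False, the hunt isn't ready; that check is the programmatic version of refusing to open Splunk until you can say what a true positive looks like.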
The MITRE ATT&CK framework operationalized this thinking at scale. Hunt playbooks built around specific technique IDs give your team a repeatable starting point. T1003 (OS Credential Dumping) has well-documented sub-techniques — LSASS memory, SAM database, NTDS — each of which has distinct behavioral signatures you can hunt on. A playbook for T1003.001 should specify: what telemetry sources you need (Sysmon Event ID 10 for LSASS process access, Windows Security Events 4656/4663 for handle requests), what process relationships are anomalous (e.g. rundll32.exe invoking the MiniDump export of comsvcs.dll), and what constitutes a true positive versus a legitimate AV or EDR process touching LSASS. That's a hunt. Not "query for LSASS access and see what's there."
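The true-positive criteria can be sketched as a triage filter over exported Sysmon Event ID 10 records. The event field names and the allowlist below are illustrative assumptions, not a vendor schema:

```python
# Triage sketch for a T1003.001 hunt: given Sysmon Event ID 10 (ProcessAccess)
# records, keep only non-allowlisted processes requesting memory-read access to
# LSASS. Field names and the allowlist are illustrative for this environment.

# MiniDumpWriteDump needs at minimum PROCESS_VM_READ (0x0010) and
# PROCESS_QUERY_INFORMATION (0x0400) on the target process.
DUMP_ACCESS_MASK = 0x0010 | 0x0400

# Example baseline: security tooling known to touch LSASS in this environment.
BASELINE_SOURCE_IMAGES = {
    r"c:\program files\windows defender\msmpeng.exe",
}

def triage_lsass_access(events):
    """Return events where an unexpected process requested read access to LSASS."""
    findings = []
    for ev in events:
        if not ev["TargetImage"].lower().endswith(r"\lsass.exe"):
            continue
        granted = int(ev["GrantedAccess"], 16)
        if granted & DUMP_ACCESS_MASK == 0:
            continue  # no memory-read rights requested; not a dumping pattern
        if ev["SourceImage"].lower() in BASELINE_SOURCE_IMAGES:
            continue  # documented baseline: known AV/EDR behavior
        findings.append(ev)
    return findings
```

The allowlist is the part that encodes your environment's baseline; building it is most of the work, and it is exactly what separates a true positive from EDR noise.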
The Tooling Conversation Nobody Finishes
There's a persistent mythology that threat hunting is about having the right SIEM queries. It's not. The query is the last ten percent. The first ninety is understanding what you're looking for and why. That said, tooling does matter for execution, and there are significant gaps in how most teams approach it.
Sigma rules deserve more credit than they get as hunt starting points. The SigmaHQ repository has thousands of community-maintained detection rules that double as hunt hypotheses. A Sigma rule for scheduled task creation via schtasks.exe with network connections is already encoding someone's analytical judgment about what's suspicious — you don't have to derive it from scratch. Convert it to KQL for Sentinel or SPL for Splunk, run it against 90 days of historical data instead of just real-time alerts, and you've got a structured hunt that produces comparable results across environments.
For KQL specifically, hunting T1053 in Defender/Sentinel looks something like querying DeviceProcessEvents for schtasks.exe with ProcessCommandLine containing /create alongside network events within a tight time window from the same device. Stack count the parent processes. If cmd.exe spawned by mshta.exe is creating scheduled tasks, that's worth your time. If it's SYSTEM running a well-known maintenance binary, you've confirmed normal baseline behavior — which is also a useful hunt output, even if it's boring.
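The stack-counting step is simple enough to sketch outside the SIEM. The event shape below is an illustrative export (roughly what you'd pull from DeviceProcessEvents or Sysmon Event ID 1), not an exact schema:

```python
# Stack counting parent processes for schtasks.exe /create events: the rare
# parents at the top of the list are where the hunt's attention goes.
from collections import Counter

def stack_count_parents(events):
    """Count schtasks.exe /create events by parent image, rarest first."""
    counts = Counter(
        ev["InitiatingProcessFileName"].lower()
        for ev in events
        if ev["FileName"].lower() == "schtasks.exe"
        and "/create" in ev["ProcessCommandLine"].lower()
    )
    # Ascending sort: rare parents surface first, maintenance noise sinks.
    return sorted(counts.items(), key=lambda kv: kv[1])
```

Run it over 90 days of exported process events and read the list top-down; a parent that created one scheduled task in three months is a better lead than one that created four thousand.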
Velociraptor changed endpoint hunting for teams that run it. VQL (Velociraptor Query Language) lets you query across your entire fleet in near-real-time without waiting for EDR telemetry pipelines to catch up. A hunt for scheduled tasks with encoded payloads can run across 5,000 endpoints simultaneously. The Windows.System.TaskScheduler artifact pulls the full task XML from every host, and you can filter on the fly for tasks with base64 in the action argument, tasks registered within the last 30 days, or tasks with nonstandard author fields. That's not something you can easily approximate in a SIEM that's only ingesting process creation events.
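The post-collection filter is the interesting part, and it's tool-agnostic. Here's a sketch over fleet-wide task enumeration results exported as records; the record shape (Path, Date, Arguments, Author) is an assumption for illustration:

```python
# Filter fleet-wide scheduled-task enumeration output for the hunt criteria:
# encoded payloads in the action arguments, or recent registrations with a
# blank author field. Record shape is illustrative, not a fixed export format.
import re
from datetime import datetime, timedelta, timezone

# A long unbroken base64-ish run in a task action is a decent first-pass filter.
B64_RUN = re.compile(r"[A-Za-z0-9+/=]{40,}")

def flag_tasks(tasks, recent_days=30):
    """Return task paths matching the hunt's encoded-payload or recency criteria."""
    now = datetime.now(timezone.utc)
    flagged = []
    for t in tasks:
        registered = datetime.fromisoformat(t["Date"])  # expects tz-aware ISO 8601
        recent = (now - registered) <= timedelta(days=recent_days)
        encoded = bool(B64_RUN.search(t.get("Arguments", "")))
        if encoded or (recent and not t.get("Author", "").strip()):
            flagged.append(t["Path"])
    return flagged
```

Note what the filter encodes: the hypothesis's criteria, not a list of known-bad task names. That's the difference between hunting a behavior and grepping for an IOC.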
And then there's documentation. This is where most hunt programs quietly die. The hunt happens, the analyst finds something (or doesn't), closes the notebook, and the institutional knowledge evaporates. Jupyter notebooks as hunt documentation artifacts — with the hypothesis, the queries, the intermediate findings, and the final disposition all in one reproducible document — solve this problem. A well-structured hunt notebook should be executable by a junior analyst six months from now. If it isn't, you haven't documented a hunt, you've documented a vibe.
Network Hunting Is Where People Get Humble Fast
I watched a senior analyst spend three days "hunting for C2 traffic" by looking at firewall denies and threat intel IP blocklist hits. That's not C2 hunting. Those are indicators your security controls already know about. The whole point of mature adversary C2 infrastructure is that it doesn't hit your blocklists — it uses legitimate services, bulletproof hosting with clean reputation, or domain generation algorithms with freshly registered domains that predate your intel feeds.
Actual C2 beaconing analysis requires behavioral math. Beacon jitter is the giveaway — most C2 frameworks introduce randomness to avoid the perfectly-periodic connection pattern that IDS/IPS signatures catch trivially. A Cobalt Strike beacon set to 60 seconds with 20% jitter will check in somewhere between 48 and 60 seconds (Cobalt Strike's jitter subtracts from the configured sleep; it never adds to it). Over 200 connections, that distribution has a shape. Tools like RITA (Real Intelligence Threat Analytics) do this automatically against Zeek/Bro logs, calculating connection intervals and flagging hosts with suspiciously consistent beacon periods even with jitter applied.
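The core of that math fits in a few lines. This is a toy version of RITA's approach, not its implementation: per source/destination pair, compute inter-arrival intervals and measure how tightly they cluster around a stable period. The thresholds are illustrative starting points:

```python
# Toy beacon analysis over per-pair connection timestamps (e.g. extracted from
# Zeek conn.log): regular intervals with low dispersion look like a beacon,
# even with jitter applied. Thresholds are illustrative, not calibrated.
import statistics

def beacon_score(timestamps, min_connections=20):
    """Return (median_interval, dispersion_ratio) for a sorted timestamp series.

    A low dispersion ratio (MAD / median) means the intervals are suspiciously
    regular even after jitter; human-driven traffic doesn't look like this.
    Returns None when there are too few connections to say anything.
    """
    if len(timestamps) < min_connections:
        return None
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    med = statistics.median(intervals)
    mad = statistics.median(abs(i - med) for i in intervals)
    return med, (mad / med if med else float("inf"))
```

A 60-second beacon with 20% jitter produces a dispersion ratio well under 0.1; a user browsing the same destination produces something several times larger. That separation is what you alert on.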
Zeek logs are the underrated workhorse here. The conn.log has duration, bytes transferred, and connection state. Long-tail analysis on connection duration to rare external destinations finds the slow exfil that never trips a volume threshold. Stack counting on dns.log query lengths can surface DGA activity — legitimate domains have a characteristic length distribution that DGA domains distort. If you're not doing entropy and length-distribution analysis on your DNS query names, you're flying blind on a significant chunk of the C2 landscape.
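The entropy feature itself is a one-liner's worth of math. Stacked across a day of dns.log, it separates human-registered names from algorithmically generated ones; the cutoff you alert on is environment-specific:

```python
# Shannon entropy (bits per character) over a domain label's character
# distribution. DGA output trends toward the high end; human-chosen names
# cluster lower. Treat any fixed cutoff as an environment-specific assumption.
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Entropy of the character distribution of a single domain label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Pair it with label length and query frequency per host before alerting; entropy alone flags CDN hostnames and legitimate hash-like subdomains too.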
The honest answer is that network hunting at this level requires someone who's comfortable doing statistical analysis, not just writing SPL queries. That's a different skillset than most SOC tier-2 analysts have, and it's worth being clear-eyed about it rather than pretending the SIEM will surface these patterns automatically.
Detection Engineering Is Not Threat Hunting (Stop Conflating Them)
This is the conversation that breaks down in almost every org I've consulted with. A threat hunt is a time-boxed, hypothesis-driven investigation into whether a specific threat is present in your environment. Detection engineering is the ongoing process of building, tuning, and maintaining systematic detection logic that runs continuously. They feed each other, but they are not the same job.
The confusion matters because it creates organizational dysfunction. When a hunt produces a finding, the output should be a detection rule or an analytic improvement — that's how hunting funds detection. But if your threat hunters are spending their time maintaining SIEM content, they're not hunting. You've just hired expensive detection engineers and called them hunters to make the budget line look sexier.
The Hunting Maturity Model (HMM), defined by David Bianco during his time at Sqrrl and since elaborated across the detection engineering community, describes a progression from HMM0 (no hunting capability, purely reactive) to HMM4 (a leading program where successful hunt procedures are routinely automated into new analytics). Most enterprise security teams are operating at HMM1 — they have some threat intel integration and they're doing ad-hoc queries when something looks bad. That's fine as a starting point. The mistake is thinking you've arrived somewhere when you're still at the trailhead.
Moving from HMM1 to HMM2 requires documented hunt playbooks, consistent hypothesis formation before hunts begin, and a feedback loop where hunt outputs turn into detection content. It's not glamorous. It's process engineering. But it's the difference between a threat hunting function and a threat hunting performance.
The Analyst Who Changed How I Think About This
A few years ago I worked with a threat hunter at a financial services firm who had come up through network forensics rather than the traditional SOC analyst path. She didn't write SIEM queries the way most analysts do — she'd spend two days just building a baseline model of normal behavior for a specific subsystem before writing a single detection query. Her hunt notebooks were almost academic in structure: stated hypothesis, null hypothesis, data sources required, expected indicators if hypothesis true, expected indicators if false.
Her hunt completion rate — meaning hunts that produced a definitive positive or negative conclusion — was significantly higher than anyone else on the team. And her false positive rate was near zero. Not because she was smarter, but because she refused to start querying until she could articulate exactly what a true positive would look like. That discipline is reproducible. It's not a talent, it's a habit. Most analysts never develop it because nobody tells them it's required.
The SOC-to-hunter transition is a real skillset gap that organizations underestimate. A strong tier-2 SOC analyst is excellent at working known alert types, following playbooks, and escalating ambiguous situations. A threat hunter needs to be comfortable sitting with uncertainty for days, following a hypothesis into dead ends, and knowing when a negative result is conclusive versus when it just means you're looking in the wrong place. Those are fundamentally different cognitive skills. You can develop a good SOC analyst into a good threat hunter, but it takes deliberate training in hypothesis formation, statistical intuition, and adversary emulation thinking — not just access to better tooling.
What Good Looks Like, Practically
A mature hunt program runs structured hunts tied to specific threat intel inputs — new adversary TTPs from ISACs, red team findings, recent incident learnings. Each hunt has a written hypothesis before anyone opens a query interface. The hypothesis maps to at least one MITRE ATT&CK technique. The analyst identifies required telemetry sources before starting and flags coverage gaps that would prevent the hunt from being conclusive. Hunt outputs — whether positive, negative, or inconclusive — are documented in a shared repository with enough context that another analyst can reproduce or extend the work.
Negative hunts have value too, but only if they're rigorous. "We looked and didn't find it" is worthless. "We looked for T1053.005 indicators across Windows endpoints using Sysmon process creation logs and Velociraptor task enumeration, with 98% fleet coverage, and found no anomalous scheduled task registrations matching our hypothesis criteria during the 30-day window" is a meaningful security statement. It tells you something about your exposure. The first version just tells you someone was busy.
If your hunting program can't produce that second kind of output consistently, you're not hunting. You're browsing logs with extra steps — and the security posture of your organization is no better for it.