
Detection Engineering: Writing Rules That Actually Work

Anyone can write a detection rule, but writing rules that catch real threats while minimizing false positives is the real challenge. Here are the lessons from writing hundreds of Sigma, YARA, and SIEM detection rules in my homelab and research environment.

JC
Janusz Czeropski
Security Engineer & Developer

Detection engineering is an art as much as it's a science. Anyone can write a detection rule. But writing a rule that catches real threats while minimizing false positives? That's the challenge.

I've written hundreds of detection rules over the past few years in my personal homelab and security research environment: Sigma rules, YARA rules, custom SIEM logic, EDR queries. These are experiments and learning exercises in my own lab setup, not production MDR systems.

Note: At Trend Micro, I build internal tools and automation platforms. All detection rule work mentioned in this post is from my personal homelab and self-hosted security lab, where I practice and experiment with threat detection.

What is Detection Engineering?

Detection engineering is the practice of creating automated logic to identify malicious or suspicious activity in your environment. It's the bridge between threat intelligence and operational security.

  • Threat Intel: "APT29 is using mimikatz to steal credentials"
  • Detection Engineering: "Write a rule that detects mimikatz execution in our environment"
  • SOC: "Alert fires → analyst investigates → threat contained"

Without good detection rules, you're flying blind.

The Anatomy of a Good Detection Rule

A good detection rule has these qualities:

1. High True Positive Rate

It catches the thing you're trying to detect. Obvious, right? But you'd be surprised how many rules miss obvious evasion techniques.

2. Low False Positive Rate

It doesn't alert on normal business activity. A rule that fires 100 times a day on benign activity will be tuned off or ignored. Alert fatigue is real.

3. Contextually Rich

When the alert fires, the analyst should have enough context to triage quickly. Include relevant fields: user, hostname, process path, command line, parent process, etc.

4. Resilient to Evasion

Attackers will try to bypass your rule. Good rules account for common evasion tactics.

5. Maintainable

Six months from now, someone (probably you) will need to update this rule. Use comments. Use clear naming. Make it readable.

Rule Types and When to Use Them

Different tools, different rule formats:

Sigma Rules (SIEM-agnostic)

Sigma is a generic signature format for SIEM systems. Write once, compile to Splunk/Elasticsearch/QRadar/etc.

Example: Detecting Mimikatz

title: Mimikatz Detection via Command Line
id: a1e8e7d1-1234-5678-90ab-cdef12345678
status: stable
description: Detects potential Mimikatz execution via command line arguments
author: Janusz Czeropski
date: 2026/02/19
tags:
    - attack.credential_access
    - attack.t1003
logsource:
    category: process_creation
    product: windows
detection:
    selection:
        CommandLine|contains:
            - 'sekurlsa::logonpasswords'
            - 'lsadump::sam'
            - 'privilege::debug'
            - 'kerberos::golden'
    condition: selection
falsepositives:
    - Security tools or penetration testing (approved)
level: critical
Pros:

  • Portable across SIEMs
  • Community-driven (SigmaHQ hosts thousands of rules)
  • Easy to read

Cons:

  • Lowest common denominator (can't use advanced SIEM features)
  • Conversion can be imperfect
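To sanity-check what a converted rule will actually match, I sometimes re-implement the matching logic in plain Python. This is a toy sketch of the `CommandLine|contains` selection above, not pySigma or any official converter; note that Sigma string matching is case-insensitive by convention.

```python
# Toy re-implementation of the Sigma "CommandLine|contains" selection
# above, for sanity-checking what the compiled query will match.
# Sigma string matching is case-insensitive by convention, so both
# sides are lowercased. This is not pySigma, just the idea.

MIMIKATZ_MARKERS = [
    "sekurlsa::logonpasswords",
    "lsadump::sam",
    "privilege::debug",
    "kerberos::golden",
]

def matches_mimikatz(command_line: str) -> bool:
    """True if any marker substring appears in the command line."""
    lowered = command_line.lower()
    return any(marker in lowered for marker in MIMIKATZ_MARKERS)
```

Because the match is a plain substring check, renaming the binary doesn't help the attacker, but altering the module names does; that trade-off comes up again in the evasion section below.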

YARA Rules (File/Memory Scanning)

YARA is for identifying files or memory patterns based on strings, hex patterns, etc.

Example: Detecting Cobalt Strike Beacon

rule CobaltStrike_Beacon_Strings {
    meta:
        description = "Detects Cobalt Strike Beacon via common strings"
        author = "Janusz Czeropski"
        date = "2026-02-19"
        reference = "https://www.cobaltstrike.com/"
        
    strings:
        $s1 = "%s as %s\\%s: %d" ascii wide
        $s2 = "%s&%s=%s" ascii wide
        $s3 = "IEX (New-Object Net.Webclient).DownloadString" ascii wide
        $s4 = "%c%c%c%c%c%c%c%c%cMSSE" ascii wide
        $s5 = "\\\\.\\pipe\\msagent_" ascii wide
        
    condition:
        3 of ($s*)
}
Pros:

  • Great for malware analysis
  • Pattern matching flexibility (regex, hex, conditions)
  • Fast scanning

Cons:

  • Static analysis only (unless memory scanning)
  • Can be evaded by obfuscation
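The `3 of ($s*)` condition above is easy to mimic for quick experiments. The sketch below is a toy substring counter, not YARA: real YARA also handles hex patterns, regex, and `wide` (UTF-16LE) strings, which this ignores.

```python
# Toy version of the "3 of ($s*)" condition: count marker substrings
# in a byte buffer. Real YARA also matches "wide" (UTF-16LE) encodings,
# hex patterns, and regex, none of which this handles.

BEACON_STRINGS = [
    b"%s as %s\\%s: %d",
    b"%s&%s=%s",
    b"IEX (New-Object Net.Webclient).DownloadString",
    b"%c%c%c%c%c%c%c%c%cMSSE",
    b"\\\\.\\pipe\\msagent_",
]

def beacon_hits(data: bytes, threshold: int = 3) -> bool:
    """True when at least `threshold` marker strings occur in the data."""
    return sum(s in data for s in BEACON_STRINGS) >= threshold
```

The `3 of` threshold is the interesting part: any single string might appear in benign software, but three together is a much stronger signal.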

SIEM Custom Logic (Splunk SPL, KQL, etc.)

Sometimes you need the full power of your SIEM query language.

Example: Detecting Suspicious PowerShell (Splunk)

index=windows EventCode=4104
| eval script_lower=lower(ScriptBlockText)
| where (match(script_lower, "invoke-mimikatz") 
    OR match(script_lower, "iex.*downloadstring") 
    OR match(script_lower, "invoke-expression.*frombase64string"))
    AND NOT match(ComputerName, "APPROVED_ADMIN_HOST")
| eval severity="high"
| table _time, ComputerName, User, ScriptBlockText, severity
Pros:

  • Full access to SIEM features (lookups, stats, correlation)
  • Can build complex multi-stage detections

Cons:

  • Vendor lock-in (SPL only works in Splunk)
  • Harder to share across organizations
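One trick for keeping SPL regexes honest is unit-testing them outside Splunk. Here is a rough Python equivalent of the query above; field names mirror PowerShell script block logging (event 4104) as Splunk presents it, and the allowlisted hostname is the placeholder from the query.

```python
import re

# Rough Python equivalent of the SPL above, so the regexes can be
# unit-tested outside Splunk. Field names mirror PowerShell script
# block logging (event 4104); the allowlisted host is the placeholder
# value from the query, not a real hostname.

SUSPICIOUS_PATTERNS = [
    re.compile(r"invoke-mimikatz"),
    re.compile(r"iex.*downloadstring"),
    re.compile(r"invoke-expression.*frombase64string"),
]
ALLOWED_HOSTS = {"APPROVED_ADMIN_HOST"}

def is_suspicious(event: dict) -> bool:
    """Apply the lowercased regex checks minus the host allowlist."""
    if event.get("ComputerName") in ALLOWED_HOSTS:
        return False
    script = event.get("ScriptBlockText", "").lower()
    return any(p.search(script) for p in SUSPICIOUS_PATTERNS)
```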

The Detection Rule Lifecycle

Writing the rule is just the beginning. Here's the full lifecycle:

1. Research

Before writing a single line, research:

  • The attacker technique (MITRE ATT&CK)
  • What logs are available
  • What normal activity looks like
  • Known evasion techniques

Example: You want to detect credential dumping.

  • Technique: MITRE ATT&CK T1003 (OS Credential Dumping)
  • Logs: Windows Security 4688 (process creation), Sysmon Event ID 1, EDR telemetry
  • Normal activity: Administrators running legitimate tools (Task Manager, Process Explorer)
  • Evasion: Mimikatz can be renamed, run from memory, obfuscated

2. Write

Start with a simple rule. Test it in a controlled environment (your homelab!).

# Version 1: Overly broad, will have FPs
detection:
    selection:
        Image|endswith: '\lsass.exe'
    condition: selection

3. Test

Test against:

  • Known malicious samples (does it catch them?)
  • Benign activity (does it false positive?)
  • Evasion attempts (can you bypass it?)

Good sources of test data:

  • EVTX log samples (github.com/sbousseaden/EVTX-ATTACK-SAMPLES)
  • Your own test environment
  • Historical production logs (if you have them)
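The testing step can be automated with a tiny harness: run the rule as a function over labeled samples and count hits and misses. The rule and the samples below are illustrative, not a real corpus.

```python
# Tiny harness for the testing step: run a rule function over labeled
# samples and count true positives, false positives, and misses.
# The rule and samples are illustrative, not a real corpus.

def mimikatz_rule(event: dict) -> bool:
    return "sekurlsa::logonpasswords" in event.get("CommandLine", "").lower()

LABELED_SAMPLES = [
    ({"CommandLine": 'mimikatz "sekurlsa::logonpasswords"'}, True),    # known bad
    ({"CommandLine": "powershell Get-Date"}, False),                   # known good
    ({"CommandLine": 'update.exe "SEKURLSA::LOGONPASSWORDS"'}, True),  # renamed binary
]

def evaluate(rule, samples):
    """Return counts of true positives, false positives, and misses."""
    tp = sum(1 for event, is_bad in samples if is_bad and rule(event))
    fp = sum(1 for event, is_bad in samples if not is_bad and rule(event))
    fn = sum(1 for event, is_bad in samples if is_bad and not rule(event))
    return {"tp": tp, "fp": fp, "fn": fn}
```

Keeping the samples in version control alongside the rule means every tuning pass reruns the same regression suite.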

4. Tune

Refine based on testing:

# Version 2: More specific, filters FPs
detection:
    selection:
        Image|endswith: '\lsass.exe'
        User|contains: 'SYSTEM'
    filter:
        ParentImage|endswith: 
            - '\services.exe'
            - '\wininit.exe'
    condition: selection and not filter

5. Deploy

Push to production. Monitor closely for the first 48 hours.

6. Monitor & Maintain

Rules drift over time. Software updates, new techniques, environmental changes: your rules need to evolve.

  • Check true positive vs false positive rates
  • Look for bypasses
  • Update based on new threat intel

Common Pitfalls (and How to Avoid Them)

Pitfall 1: The "Keyword Soup" Rule

# BAD: Too many keywords, high FP rate
detection:
    selection:
        CommandLine|contains:
            - 'powershell'
            - 'cmd'
            - 'net'
            - 'user'

Every single admin script will trigger this.

Fix: Be specific. Combine multiple indicators. Use exclusions.

Pitfall 2: Ignoring Context

# BAD: No context, can't triage
detection:
    selection:
        EventID: 4688
        CommandLine|contains: 'whoami'

Alerts with just "whoami detected" tell the analyst nothing.

Fix: Include user, hostname, parent process, full command line.
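One way to enforce that fix structurally is to build every alert payload from a fixed field list, so a rule can't ship without context. A minimal sketch; field names follow Sysmon/4688 conventions, and the payload shape is my own, not any SIEM's schema.

```python
# Build every alert from a fixed list of triage fields so an alert
# can't ship without context. Field names follow Sysmon/4688
# conventions; the payload shape is illustrative.

CONTEXT_FIELDS = ["User", "ComputerName", "ParentImage", "CommandLine"]

def build_alert(event: dict, rule_name: str) -> dict:
    """Attach every triage field, flagging any the log source lacked."""
    alert = {"rule": rule_name}
    for field in CONTEXT_FIELDS:
        alert[field] = event.get(field, "<missing>")
    return alert
```

A `<missing>` marker in an alert is itself useful: it tells you the log source isn't collecting a field your analysts need.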

Pitfall 3: Hardcoded Paths

# BAD: Assumes attacker uses default path
detection:
    selection:
        Image: 'C:\\Windows\\Temp\\mimikatz.exe'

Attackers rename files and move them.

Fix: Match on behavior, not static paths.

# BETTER: Match on command line arguments
detection:
    selection:
        CommandLine|contains:
            - 'sekurlsa::logonpasswords'
            - 'lsadump::sam'

Pitfall 4: Not Testing Evasion

Would your rule still fire if:

  • Mimikatz is renamed to update.exe?
  • Mimikatz is run from memory (no file on disk)?
  • Mimikatz is run with obfuscated command line arguments?

Fix: Red team your own rules. Try to bypass them.
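A cheap way to red-team a string-matching rule is to generate trivial evasion variants and list the ones it misses. The variants below are deliberately simple examples; real obfuscation goes much further.

```python
# Red-team a string-matching rule by generating trivial evasion
# variants and listing the ones it fails to catch. These variants are
# deliberately simple; real obfuscation goes much further.

def rule(command_line: str) -> bool:
    return "sekurlsa::logonpasswords" in command_line.lower()

def variants(base: str):
    yield base                                         # original
    yield base.upper()                                 # case change
    yield base.replace("mimikatz.exe", "update.exe")   # renamed binary
    yield base.replace("sekurlsa", 'sekur"+"lsa')      # string splitting

def bypasses(rule, base: str):
    """Return the variants the rule fails to catch."""
    return [v for v in variants(base) if not rule(v)]
```

Here the rule survives renaming and case tricks but not string splitting, which tells you exactly where the next tuning pass should go.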

Pitfall 5: Alert Fatigue

A rule that fires 50 times a day on false positives will be ignored or disabled.

Fix: Tune aggressively. Use allowlists. Start with "alert" mode, then promote to "block" after validation.

Real-World Example: Detecting Lateral Movement

Let's walk through a detection I wrote for Pass-the-Hash attacks.

The Threat

Attacker steals NTLM hash, uses it to authenticate to other systems without knowing the plaintext password.

Logs Available

Windows Event ID 4624 (Logon) with LogonType 3 (Network) and LogonProcess "NtLmSsp"

Initial Rule (Too Broad)

detection:
    selection:
        EventID: 4624
        LogonType: 3
        LogonProcessName: 'NtLmSsp'
    condition: selection

Result: 10,000 alerts per day. Every file share access triggers it.

Refined Rule (Adding Context)

detection:
    selection:
        EventID: 4624
        LogonType: 3
        LogonProcessName: 'NtLmSsp'
        LogonGuid: '{00000000-0000-0000-0000-000000000000}'  # Null session
    condition: selection

Result: Better, but still some FPs from legitimate automation.

Final Rule (Correlation + Exclusions)

detection:
    selection:
        EventID: 4624
        LogonType: 3
        LogonProcessName: 'NtLmSsp'
        LogonGuid: '{00000000-0000-0000-0000-000000000000}'
    filter_admin:
        TargetUserName|startswith: 'svc_'  # Service accounts
    filter_source:
        IpAddress|startswith: '10.1.1.'  # Admin jump box VLAN
    condition: selection and not (filter_admin or filter_source)

Result: 5-10 alerts per day, mostly true positives.
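For replaying historical 4624 events against the final rule, the selection-and-exclusion logic translates directly to Python. A sketch follows; the `svc_` prefix and the 10.1.1.0/24 jump-box subnet are my lab's conventions, not universal values.

```python
# Plain-Python version of the final rule's selection-and-exclusion
# logic, handy for replaying historical 4624 events. The svc_ prefix
# and the 10.1.1.0/24 jump-box subnet are lab-specific conventions.

NULL_GUID = "{00000000-0000-0000-0000-000000000000}"

def pth_alert(event: dict) -> bool:
    selection = (
        event.get("EventID") == 4624
        and event.get("LogonType") == 3
        and event.get("LogonProcessName") == "NtLmSsp"
        and event.get("LogonGuid") == NULL_GUID   # NTLM, no Kerberos
    )
    excluded = (
        event.get("TargetUserName", "").startswith("svc_")  # service accounts
        or event.get("IpAddress", "").startswith("10.1.1.")  # admin jump boxes
    )
    return selection and not excluded
```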

Advanced Techniques

Technique 1: Behavioral Stacking

Instead of single-event detection, look for sequences.

Example: Detecting reconnaissance → credential access → lateral movement

| tstats count WHERE index=windows BY _time span=1h, host, user
| where (recon_commands > 0) AND (credential_access > 0) AND (lateral_movement > 0)

(Illustrative sketch: the three counts assume upstream logic that tags events as recon, credential access, or lateral movement; span=1h buckets events so all three stages must co-occur within the same hour.)
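Outside a SIEM, the same stacking idea is a few lines of Python: bucket categorized events into one-hour windows per host and user, and alert when all three stages land in the same window. Categorizing raw events into `recon`, `credential_access`, and `lateral_movement` is assumed to happen upstream.

```python
from collections import defaultdict

# Behavioral stacking outside a SIEM: bucket categorized events into
# one-hour windows per (host, user) and alert when all three stages
# appear in the same window. Categorizing raw events into these three
# labels is assumed to happen upstream.

WINDOW_SECONDS = 3600
REQUIRED_STAGES = {"recon", "credential_access", "lateral_movement"}

def stacked_alerts(events):
    """events: iterable of (timestamp, host, user, category) tuples."""
    buckets = defaultdict(set)
    for ts, host, user, category in events:
        buckets[(ts // WINDOW_SECONDS, host, user)].add(category)
    return [key for key, stages in buckets.items() if REQUIRED_STAGES <= stages]
```

Fixed buckets can split a sequence that straddles an hour boundary; a sliding window fixes that at the cost of more bookkeeping.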

Technique 2: Statistical Anomalies

Baseline normal behavior, alert on outliers.

Example: User accessing 50+ servers in 10 minutes (unusual)

index=windows EventCode=4624
| bin _time span=10m
| stats dc(ComputerName) as unique_hosts by User, _time
| where unique_hosts > 50
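The same fan-out check works in Python for prototyping before committing to a SIEM search. The 10-minute bucket and the threshold of 50 are example values; baseline your own environment first.

```python
from collections import defaultdict

# Fan-out detection: count distinct hosts per user per 10-minute
# bucket. The bucket size and threshold are example values; baseline
# your own environment before picking a number.

BUCKET_SECONDS = 600

def fanout_outliers(logons, threshold=50):
    """logons: iterable of (timestamp, user, computer_name) tuples."""
    hosts_per_user = defaultdict(set)
    for ts, user, host in logons:
        hosts_per_user[(ts // BUCKET_SECONDS, user)].add(host)
    return {key: len(hosts)
            for key, hosts in hosts_per_user.items()
            if len(hosts) > threshold}
```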

Technique 3: Threat Hunting → Detection

Start with an open-ended hunt, then formalize successful hunts into rules.

Example Hunt: "Are there any unsigned DLLs loaded by critical processes?"

If you find malicious DLLs, convert that hunt into an automated detection.

Tools I Use

  • Sigma: Rule writing and sharing
  • YARA: Malware analysis
  • Splunk: Production SIEM
  • Jupyter Notebooks: Ad-hoc analysis and prototyping
  • Git: Version control for all rules
  • sigma-cli: Convert Sigma rules to SIEM-specific queries (successor to the deprecated sigmac converter)

Resources for Detection Engineers

  • Sigma HQ: github.com/SigmaHQ/sigma
  • YARA Rules: github.com/Yara-Rules/rules
  • Elastic Detection Rules: github.com/elastic/detection-rules
  • MITRE ATT&CK: attack.mitre.org
  • Sigma Docs: sigmahq.io
  • Detection Engineering posts by Florian Roth, Olaf Hartong, Roberto Rodriguez
  • Security Onion (full detection platform)
  • Splunk Attack Range (build labs with attacker/victim VMs)
  • Mordor Project, now Security Datasets (attack datasets)

Wrapping Up

Detection engineering is an iterative process. Your first rule won't be perfect. Your tenth rule won't be perfect. But each iteration makes you better.

  • Know your adversary (research TTPs)
  • Understand your environment (what's normal?)
  • Start simple, iterate (don't boil the ocean)
  • Test rigorously (prod is not a playground)
  • Monitor and maintain (rules decay without care)

Write rules that give analysts superpowers, not headaches.

---

Janusz Czeropski builds internal tools and applications for MDR operations at Trend Micro. All detection engineering work described in this post is from his personal homelab and security research environment. He's written hundreds of rules and deleted almost as many. When he's not tuning alerts, he's probably hunting for threats or complaining about false positives. You can find him on GitHub or LinkedIn.

Tagged with:

#detection #Sigma #YARA #homelab #threat-hunting