The hardest problem in firewall engineering is not detection. It is the space between detection and noise. A firewall that catches every SQL injection attempt but generates hundreds of false positives per day is a firewall that gets its alerts ignored. A firewall that never fires a false positive but misses real attacks has a different problem that is arguably worse.

At IVO, our firewall implementation was designed from the start to operate in the gap between these two failure modes. This post covers the rule engine architecture, how we approach pattern matching for common attack classes, and the feedback loop we built with enterprise security operations teams to continuously improve detection accuracy.

Rule Engine Architecture

Our firewall rule engine processes traffic in two stages: a fast pre-filter that eliminates obviously benign traffic, and a deep inspection stage that applies full signature and behavioral analysis to the remaining flows.

The pre-filter operates on connection metadata — source and destination addresses, ports, protocol, and basic header fields. Its job is not to make security decisions but to reduce the volume of traffic that reaches the computationally expensive inspection stage. In a typical enterprise deployment, the pre-filter passes approximately 15–20% of total traffic through to deep inspection. The remaining 80–85% is classified by network policy rules (allow, deny, or route) without incurring inspection overhead.
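
To make the split concrete, here is a minimal sketch of a metadata-only pre-filter. The field names, policy tables, and thresholds are illustrative, not our production schema:

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import ip_address, ip_network

class Verdict(Enum):
    ALLOW = "allow"      # classified by policy, no inspection overhead
    DENY = "deny"        # classified by policy, no inspection overhead
    INSPECT = "inspect"  # forwarded to the deep inspection stage

@dataclass(frozen=True)
class FlowMetadata:
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str  # "tcp", "udp", ...

# Illustrative policy tables; a real deployment loads these from config.
DENY_SOURCES = [ip_network("203.0.113.0/24")]
ALLOW_FLOWS = [(ip_network("10.0.0.0/8"), 5432, "tcp")]  # e.g. internal replication
INSPECTED_PORTS = {80, 443, 53}  # protocols that have deep inspection chains

def prefilter(flow: FlowMetadata) -> Verdict:
    """Cheap metadata-only decision: policy verdict or hand-off to inspection."""
    src = ip_address(flow.src_ip)
    if any(src in net for net in DENY_SOURCES):
        return Verdict.DENY
    for net, port, proto in ALLOW_FLOWS:
        if src in net and flow.dst_port == port and flow.protocol == proto:
            return Verdict.ALLOW
    if flow.dst_port in INSPECTED_PORTS:
        return Verdict.INSPECT
    return Verdict.ALLOW  # default policy route for everything else

print(prefilter(FlowMetadata("203.0.113.9", "10.1.2.3", 443, "tcp")))  # Verdict.DENY
```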

The deep inspection stage is where signature matching, protocol validation, and behavioral analysis happen. Rules in this stage are organized into rule chains — ordered sequences of checks grouped by protocol and attack category. HTTP traffic is evaluated against the HTTP rule chain, which includes checks for SQL injection, cross-site scripting, command injection, path traversal, and protocol violations. DNS traffic goes through the DNS rule chain. TLS traffic that has been decrypted upstream is re-classified and routed to the appropriate protocol chain.

Rule chains are ordered by computational cost, cheapest first. A simple string match that can eliminate 90% of candidate packets runs before a complex regular expression that is only relevant to the remaining 10%. This ordering is not static — we profile rule execution costs during compilation and reorder chains when rules are updated.
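
A simplified sketch of both ideas, per-protocol chains and cost-ordered rules, assuming a profiler has already attached an average execution cost to each rule (the rule IDs and costs here are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    rule_id: str
    check: Callable[[bytes], bool]  # returns True when the rule matches
    avg_cost_ns: float = 0.0        # measured during rule compilation

@dataclass
class RuleChain:
    protocol: str
    rules: list[Rule] = field(default_factory=list)

    def compile(self) -> None:
        # Cheapest checks first, so inexpensive rules can short-circuit
        # the chain before expensive regex-style rules ever run.
        self.rules.sort(key=lambda r: r.avg_cost_ns)

    def evaluate(self, payload: bytes) -> list[str]:
        return [r.rule_id for r in self.rules if r.check(payload)]

# Chains keyed by protocol; the dispatcher routes classified (and, for TLS,
# upstream-decrypted) traffic to the matching chain.
chains = {"http": RuleChain("http"), "dns": RuleChain("dns")}

chains["http"].rules = [
    Rule("http-regex-sqli", lambda p: b"union select" in p.lower(), avg_cost_ns=900.0),
    Rule("http-substr-traversal", lambda p: b"../" in p, avg_cost_ns=40.0),
]
chains["http"].compile()  # reorders: the cheap substring check now runs first
print([r.rule_id for r in chains["http"].rules])
```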

Pattern Matching: SQL Injection, XSS, and Command Injection

Each attack class requires a different matching strategy, and getting this right is where detection quality is won or lost.

SQL injection detection combines syntax-aware tokenization with context analysis. Rather than matching on blacklisted strings like "OR 1=1" — which produces enormous false positive rates on any site that discusses SQL or mathematics — our engine tokenizes the suspect input and evaluates whether the token sequence could alter the semantic structure of a SQL statement. The difference matters: the string "1 OR 1=1" in a URL parameter is suspicious; the same string in a blog comment about database tutorials is not. Context (which parameter, which endpoint, which content type) drives the decision.
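
Here is a deliberately simplified illustration of the tokenize-then-evaluate idea; a real engine uses a full SQL lexer, and this sketch only shows why token structure plus parameter context beats string blacklists:

```python
import re

# Very simplified SQL-ish tokenizer; production engines use full lexers.
TOKEN_RE = re.compile(r"'[^']*'|\d+|[A-Za-z_]+|[=<>!]+|[();,]|--")

def could_alter_sql(value: str) -> bool:
    """Heuristic: does the token sequence contain structure-changing
    patterns, such as a boolean operator joining a comparison?"""
    tokens = [t.lower() for t in TOKEN_RE.findall(value)]
    for i, tok in enumerate(tokens):
        if tok in ("or", "and") and "=" in tokens[i + 1:i + 4]:
            return True  # e.g. "1 OR 1=1" rewires the WHERE clause
        if tok == "--" or (tok == "union" and "select" in tokens):
            return True  # comment truncation or UNION-based extraction
    return False

def classify(value: str, param_context: str) -> str:
    # Context decides whether a structural match is actually suspicious.
    if could_alter_sql(value) and param_context in ("query_param", "form_field"):
        return "suspicious"
    return "benign"  # the same string in a blog comment body is not an attack

print(classify("1 OR 1=1", "query_param"))   # suspicious
print(classify("1 OR 1=1", "comment_body"))  # benign
```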

Cross-site scripting detection follows a similar principle. We parse suspect inputs looking for syntactic structures that would execute in a browser context — script tags, event handlers, javascript: URIs, and encoded variants. The challenge is that XSS payloads are endlessly creative in their encoding. We normalize inputs through multiple decoding passes (URL decoding, HTML entity decoding, Unicode normalization, mixed-case normalization) before evaluation. This catches double-encoded and polyglot payloads that single-pass decoders miss.
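
A sketch of the multi-pass normalization loop, using Python's standard decoders as stand-ins for the engine's decoding passes:

```python
import html
import unicodedata
from urllib.parse import unquote

def normalize(value: str, max_passes: int = 4) -> str:
    """Repeatedly decode until the input reaches a fixed point (or the
    pass limit), so double-encoded payloads reduce to plain text."""
    for _ in range(max_passes):
        decoded = unquote(value)                          # URL decoding
        decoded = html.unescape(decoded)                  # HTML entity decoding
        decoded = unicodedata.normalize("NFKC", decoded)  # Unicode normalization
        decoded = decoded.lower()                         # mixed-case normalization
        if decoded == value:
            break  # fixed point: nothing left to decode
        value = decoded
    return value

# Illustrative markers for browser-executable structures.
EXECUTION_MARKERS = ("<script", "javascript:", "onerror=", "onload=")

def looks_like_xss(raw: str) -> bool:
    normalized = normalize(raw)
    return any(marker in normalized for marker in EXECUTION_MARKERS)

# Double URL-encoded "<script>" is caught after two decoding passes.
print(looks_like_xss("%253Cscript%253Ealert(1)%253C/script%253E"))  # True
```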

Command injection detection focuses on identifying shell metacharacters and command separators in contexts where user input might reach a system shell. Semicolons, pipes, backticks, and $() constructs are evaluated in the context of the destination application's known behavior. An API endpoint that processes filenames gets stricter scrutiny than one that accepts freeform text.
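
A minimal illustration of context-weighted metacharacter matching; the endpoint names and strictness policy are hypothetical:

```python
import re

# Shell metacharacters and separators that can splice in a second command.
METACHAR_RE = re.compile(r"[;|&`]|\$\(")

# Hypothetical per-endpoint strictness: filename-handling endpoints get
# strict scrutiny, freeform-text endpoints are lenient.
ENDPOINT_STRICTNESS = {
    "/api/convert-file": "strict",
    "/api/post-comment": "lenient",
}

def command_injection_risk(value: str, endpoint: str) -> bool:
    hits = METACHAR_RE.findall(value)
    if not hits:
        return False
    strictness = ENDPOINT_STRICTNESS.get(endpoint, "strict")
    if strictness == "strict":
        return True  # any metacharacter in a filename context is suspect
    return len(hits) >= 2 or "$(" in value  # lenient: require stronger evidence

print(command_injection_risk("report.pdf; rm -rf /", "/api/convert-file"))  # True
print(command_injection_risk("salt & pepper", "/api/post-comment"))         # False
```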

In all three cases, the matching strategy is the same: normalize the input, analyze it in context, and evaluate the structural risk rather than matching against a static blacklist. Blacklists are easy to build and easy to evade. Structural analysis is harder to build but much harder to evade.

Signature-Based vs. Behavioral Detection

Signature-based detection catches known attack patterns. Behavioral detection catches anomalous activity that does not match any known signature. Each has its own failure modes, and our architecture uses both.

Signatures are fast, deterministic, and auditable. When a signature fires, the analyst can see exactly which pattern matched and evaluate whether it is a true positive. But signatures only catch what they were written to catch — they are inherently retrospective.

Behavioral detection catches novel attacks by identifying statistical anomalies: unusual request rates, abnormal parameter distributions, unexpected protocol transitions, or traffic patterns that deviate from a learned baseline. Behavioral rules fire when something looks wrong without matching a specific known pattern.

The engineering tradeoff is that behavioral detection generates a higher base rate of false positives, because anomalous does not mean malicious. A legitimate user running an automated testing tool looks anomalous. A new application deployment changes traffic patterns. A marketing campaign drives unusual request rates.

We handle this by scoring behavioral detections by default rather than blocking on them. Behavioral rule matches contribute to a per-session risk score. When the score exceeds a configurable threshold, the session is flagged for analyst review or subjected to additional inspection. Only specific behavioral rules with high historical accuracy (validated through the feedback loop described below) are configured to block automatically.
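
A sketch of that scoring model, with hypothetical weights and threshold:

```python
from dataclasses import dataclass, field

@dataclass
class BehavioralRule:
    rule_id: str
    weight: float             # contribution to the session risk score
    auto_block: bool = False  # only high-accuracy rules earn this flag

@dataclass
class Session:
    session_id: str
    risk_score: float = 0.0
    fired: list[str] = field(default_factory=list)

REVIEW_THRESHOLD = 50.0  # configurable per deployment

def record_match(session: Session, rule: BehavioralRule) -> str:
    session.risk_score += rule.weight
    session.fired.append(rule.rule_id)
    if rule.auto_block:
        return "block"            # validated high-accuracy rule: enforce now
    if session.risk_score >= REVIEW_THRESHOLD:
        return "flag_for_review"  # accumulated signal crosses the threshold
    return "continue"

s = Session("sess-42")
print(record_match(s, BehavioralRule("unusual-rate", weight=30.0)))  # continue
print(record_match(s, BehavioralRule("odd-params", weight=25.0)))    # flag_for_review
```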

This means signature matches produce immediate enforcement while behavioral detections produce risk signals that accumulate. The combination catches more than either approach alone without overwhelming the operations team with noise.

The False Positive Feedback Loop

A firewall rule set that is deployed and never tuned will either become increasingly noisy (as new applications trigger old rules) or increasingly blind (as analysts disable noisy rules without replacement). Neither outcome is acceptable.

Our approach is a structured feedback loop between IVO's engineering team and the enterprise SOC teams operating our appliances.

When an analyst identifies a false positive, they flag it through the management interface with the triggering rule ID, the traffic sample, and a classification (false positive, needs tuning, or informational). These reports are aggregated across deployments (with customer consent and data anonymization) and reviewed by our rule engineering team on a weekly cycle.
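
The report itself is a small structured record, something like this illustrative shape (the field names are ours for the example, not the actual management API):

```python
from dataclasses import dataclass
from enum import Enum

class Classification(Enum):
    FALSE_POSITIVE = "false_positive"
    NEEDS_TUNING = "needs_tuning"
    INFORMATIONAL = "informational"

@dataclass(frozen=True)
class FalsePositiveReport:
    rule_id: str
    traffic_sample: bytes  # anonymized before cross-deployment aggregation
    classification: Classification
    deployment_id: str     # which appliance fleet reported it

report = FalsePositiveReport(
    rule_id="http-sqli-0042",
    traffic_sample=b"GET /search?q=SELECT+your+major HTTP/1.1",
    classification=Classification.NEEDS_TUNING,
    deployment_id="customer-anon-17",
)
```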

Each false positive report drives one of three actions:

Rule refinement. The rule is modified to exclude the false positive condition while maintaining detection of the true positive class. This often means adding context checks — for example, exempting a specific application's normal behavior from a generic injection rule.

Rule splitting. A broad rule that detects a real attack class but generates false positives in specific contexts is split into a precise rule (high confidence, auto-block) and a broad rule (lower confidence, alert-only). This preserves detection coverage while reducing noise.

Rule retirement. Rules that generate persistent false positives without corresponding true positives are candidates for retirement. We maintain metrics on every rule's true positive and false positive rates across deployments. Rules with a sustained false positive rate above a threshold and zero confirmed true positives over a rolling window are flagged for review and potential removal.
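
A sketch of that retirement criterion, with a hypothetical threshold and rolling-window stats:

```python
from dataclasses import dataclass

@dataclass
class RuleStats:
    rule_id: str
    true_positives: int   # confirmed over the rolling window
    false_positives: int
    total_fires: int

FP_RATE_THRESHOLD = 0.95  # hypothetical; set per review policy

def retirement_candidates(stats: list[RuleStats]) -> list[str]:
    """Flag rules whose rolling-window record is all noise, no signal."""
    flagged = []
    for s in stats:
        if s.total_fires == 0:
            continue
        fp_rate = s.false_positives / s.total_fires
        if s.true_positives == 0 and fp_rate > FP_RATE_THRESHOLD:
            flagged.append(s.rule_id)  # queued for human review, not auto-removed
    return flagged

print(retirement_candidates([
    RuleStats("http-xss-legacy-9", true_positives=0, false_positives=240, total_fires=243),
    RuleStats("http-sqli-0042", true_positives=12, false_positives=30, total_fires=42),
]))  # ['http-xss-legacy-9']
```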

This feedback loop means our rule set improves continuously. Enterprises deploying IVO appliances benefit not only from their own tuning but from the aggregated operational experience across all deployments.

Operational Reality: Working With SOC Teams

The best firewall rule engine is useless if it produces output that SOC analysts cannot efficiently act on. We designed our alert output with analyst workflow in mind.

Every alert includes the matched rule with a human-readable description, the normalized traffic sample that triggered it (not just raw bytes), the confidence level (signature-high, signature-medium, behavioral), and a recommended action. Alerts are pre-grouped by session, so an analyst reviewing a flagged session sees all associated alerts in chronological order rather than scattered across a queue.
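
An illustrative alert payload carrying those fields (the schema shown is an example, not our actual export format):

```python
# One alert as it might appear in the analyst queue.
alert = {
    "session_id": "sess-42",
    "rule_id": "http-sqli-0042",
    "rule_description": "SQL keywords restructure WHERE clause in query parameter",
    "normalized_sample": "GET /items?id=1 or 1=1",  # decoded sample, not raw bytes
    "confidence": "signature-high",  # signature-high | signature-medium | behavioral
    "recommended_action": "block_and_review",
    "timestamp": "2024-03-11T09:15:22Z",
}

# Alerts are grouped per session and ordered chronologically before display.
queue = sorted([alert], key=lambda a: (a["session_id"], a["timestamp"]))
```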

We also provide a rule simulation mode that allows SOC teams to test rule changes against captured traffic before deploying them. This is critical for organizations that need change control around security policy — they can validate that a rule modification reduces false positives without introducing false negatives before it goes live.
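
The simulation boils down to replaying a capture through the current and candidate rule versions and diffing the verdicts. A minimal sketch, with toy rules standing in for compiled signatures:

```python
from typing import Callable

Rule = Callable[[bytes], bool]  # True means the rule fires on this sample

def simulate(current: Rule, candidate: Rule, capture: list[bytes]) -> dict:
    """Replay captured traffic through both rule versions and report
    what the change would do before it goes live."""
    newly_silent = sum(1 for s in capture if current(s) and not candidate(s))
    newly_firing = sum(1 for s in capture if candidate(s) and not current(s))
    return {
        "samples": len(capture),
        "alerts_removed": newly_silent,  # hoped-for false positive reduction
        "alerts_added": newly_firing,    # potential new noise, or new coverage
    }

capture = [b"GET /?q=1 or 1=1", b"GET /?q=more info", b"GET /?q=hello"]
current = lambda s: b"or" in s.lower()         # noisy: matches "or" anywhere
candidate = lambda s: b" or 1=1" in s.lower()  # tightened match
print(simulate(current, candidate, capture))
# {'samples': 3, 'alerts_removed': 1, 'alerts_added': 0}
```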

Engineering for the Real World

Firewall engineering is ultimately about operational effectiveness. Detection accuracy, false positive rates, and analyst workflow integration matter more than any individual technical feature. IVO's firewall was built with the understanding that the rule engine is only as good as the operational process around it — and that process must include continuous measurement, feedback, and improvement.

Ready to see how IVO's firewall engineering reduces alert fatigue in your SOC? Call +1 (650) 286-1335 or start your 30-day free trial today.