High-Performance Network Module Architecture: Partnering with NVIDIA for ConnectX-6, RSS, and Multi-Queue Packet Distribution

An inside look at how IVO Networks appliances use NVIDIA ConnectX-6 (Mellanox) powered network modules with Receive Side Scaling and multi-queue packet distribution to achieve wire-speed processing across multiple CPU cores — eliminating bottlenecks in high-throughput VPN and security workloads.


When we designed the latest generation of IVO Networks appliances, one of the hardest engineering problems wasn't the software — it was what happens in the microseconds between a packet arriving on the wire and the operating system handing it to a CPU core for processing. At 100 Gbps line rates, you have roughly 6.7 nanoseconds per minimum-size packet. If your NIC hands everything to a single core, or if your interrupt distribution is unbalanced, no amount of software optimization will save you. The architecture of the network module itself becomes the critical path.
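As a quick sanity check of that budget, here is the arithmetic, assuming minimum 64-byte frames plus the standard 8-byte preamble and 12-byte inter-frame gap:

```python
# Back-of-the-envelope per-packet budget at 100 Gbps. The 84-byte figure is a
# minimum 64-byte Ethernet frame plus 8 bytes of preamble and a 12-byte
# inter-frame gap, i.e. the smallest footprint a frame can occupy on the wire.
LINE_RATE_BPS = 100e9            # 100 Gbps
MIN_FRAME_ON_WIRE = 64 + 8 + 12  # bytes, including preamble and inter-frame gap

packets_per_second = LINE_RATE_BPS / (MIN_FRAME_ON_WIRE * 8)
ns_per_packet = 1e9 / packets_per_second

print(f"{packets_per_second / 1e6:.1f} Mpps")   # ~148.8 Mpps
print(f"{ns_per_packet:.2f} ns per packet")     # ~6.72 ns
```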

That's why we partnered with NVIDIA (Mellanox) and built the IVO-MCX6-100 around the ConnectX-6 chipset — and why the decisions we made around RSS, queue mapping, and PCIe topology matter to every IT leader responsible for high-throughput VPN and security infrastructure.

The IVO-MCX6-100: What's on the Module

The IVO-MCX6-100 is a purpose-built network module designed for IVO Networks appliances. Here's what's inside:

| Specification | Detail |
| --- | --- |
| Chipset | NVIDIA (Mellanox) ConnectX-6 |
| Network Ports | 2x 100 GbE QSFP28 |
| Host Interface | 2x Gen4 PCIe x8 (golden finger presenting a single PCIe x16 signal) |

The ConnectX-6 dual-port QSFP28 adapter delivers 2x 100 Gb/s of connectivity, with the high packet rates and low latency that real-time, scalable network security workloads demand.

But the spec sheet only tells part of the story. What makes this module critical to our appliance architecture is how it distributes packets across CPU cores before software ever gets involved.

The Problem: Why Single-Queue NICs Can't Scale

To understand why this matters, consider what happens in a naive network stack. A packet arrives at the NIC. The NIC raises an interrupt. The kernel handles the interrupt on whatever core it lands on, copies the packet into a socket buffer, and delivers it up the stack. At low traffic rates, this works fine.

At 100 Gbps, it falls apart.

A single CPU core — even a modern one running at 4+ GHz — simply cannot process the interrupt load, checksum validation, flow classification, and memory copies required to keep up with wire-speed 100G traffic. The result is packet loss, increased latency, and degraded throughput on your VPN tunnels and security inspection pipelines. You've paid for 100G ports, but you're getting a fraction of the performance.
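To put that in concrete terms, here is a rough sketch of the per-packet cycle budget, assuming an illustrative 4 GHz core facing the ~148.8 Mpps worst case computed earlier:

```python
# Rough single-core cycle budget. A 4 GHz clock and the ~148.8 Mpps
# minimum-frame rate from the earlier sketch are illustrative assumptions.
CORE_HZ = 4e9
PPS = 148.8e6

cycles_per_packet = CORE_HZ / PPS
# ~27 cycles per packet: nowhere near enough for an interrupt, a checksum,
# a flow lookup, and a buffer copy on a single core.
print(f"{cycles_per_packet:.0f} cycles per packet on a single core")
```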

This is the bottleneck that Receive Side Scaling was designed to break.

Receive Side Scaling: Distributing the Work at the Hardware Level

Receive Side Scaling (RSS) is a NIC-level capability that distributes incoming network traffic across multiple receive queues, each mapped to a different CPU core. Rather than funneling all traffic through a single interrupt and a single core, the NIC itself makes a per-packet decision about which queue — and therefore which core — will process that packet.

The ConnectX-6 on the IVO-MCX6-100 implements RSS with a hardware-accelerated Toeplitz hash function. For each incoming packet, the NIC extracts fields from the packet header — typically the source IP, destination IP, source port, and destination port (the 4-tuple) — and computes a hash. That hash value is used as an index into an indirection table that maps to a specific receive queue. Each receive queue is bound to a specific CPU core via interrupt affinity.

The result: packets belonging to the same flow always land on the same core (preserving packet ordering), while different flows are spread across all available cores. This happens entirely in hardware, at line rate, with no software overhead.
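To make the mechanism concrete, here is a minimal software sketch of a Toeplitz-style RSS hash feeding an indirection-table lookup. The key (the widely published example key), table size, and queue count are illustrative placeholders rather than the values programmed on the IVO-MCX6-100, and on the real hardware this computation happens in the NIC, not in software:

```python
# Minimal sketch of an RSS-style Toeplitz hash plus indirection-table lookup.
# Key, table size, and queue count are illustrative placeholders.
import ipaddress
import struct

RSS_KEY = bytes([
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
])

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: slide a 32-bit window over the key, one bit per
    input bit, XOR-ing the window into the result whenever that bit is 1."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

def rss_queue(src_ip, dst_ip, src_port, dst_port, table_size=128, num_queues=32):
    """Map a 4-tuple to a receive queue via hash -> indirection table -> queue."""
    data = (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!HH", src_port, dst_port))
    # Simple round-robin fill; the real table is programmable (more below).
    indirection_table = [i % num_queues for i in range(table_size)]
    return indirection_table[toeplitz_hash(RSS_KEY, data) % table_size]

# Packets of one flow always pick the same queue (and therefore the same core);
# different flows spread across the available queues.
print(rss_queue("10.0.0.1", "192.0.2.10", 40000, 443))
print(rss_queue("10.0.0.2", "192.0.2.10", 40001, 443))
```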

Why ConnectX-6 RSS Is Different

Not all RSS implementations are equal. The ConnectX-6 brings several capabilities that directly impact performance in IVO Networks appliances:

High queue counts. The ConnectX-6 supports a large number of hardware receive queues — far more than the typical 8 or 16 found on commodity NICs. This matters when you're running appliances with high core counts. If you have 32 or 64 cores available for packet processing but only 8 receive queues, you've immediately created an imbalance. The ConnectX-6 lets us map queues to cores at a ratio that eliminates this contention.

Programmable indirection tables. The RSS indirection table on the ConnectX-6 isn't fixed — we can reprogram it dynamically. This means we can adapt queue distribution based on workload characteristics, core utilization, or NUMA topology. If a particular core is handling heavy VPN encryption work, we can steer new flows away from it at the NIC level.

Symmetric hashing support. For security and VPN workloads, it's common to need both directions of a flow processed on the same core — the outbound packet and the corresponding inbound response. The ConnectX-6 supports symmetric RSS hashing, where the hash function produces the same result regardless of the direction of the flow. This keeps bidirectional flow state local to a single core, avoiding expensive cross-core cache synchronization. A short software sketch of this direction-independent placement appears after the next item.

Hardware flow steering. Beyond RSS, the ConnectX-6 supports more granular flow steering rules that can direct specific traffic classes to dedicated queues. We use this to ensure that management traffic, control plane protocols, and high-priority flows get deterministic core placement, independent of the general RSS distribution.
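Here is a minimal sketch of what symmetric placement buys you, using canonical endpoint ordering and a stand-in hash purely for illustration; the ConnectX-6 achieves the equivalent result in hardware through its symmetric RSS configuration:

```python
# Minimal sketch of direction-independent (symmetric) flow placement: both
# directions of a connection land on the same queue. Canonically ordering the
# endpoints before hashing is one common software technique; the hash function
# (CRC32 here) and queue count are illustrative stand-ins, not the NIC's.
import ipaddress
import struct
import zlib

NUM_QUEUES = 32

def symmetric_queue(src_ip, src_port, dst_ip, dst_port):
    a = (ipaddress.IPv4Address(src_ip).packed, struct.pack("!H", src_port))
    b = (ipaddress.IPv4Address(dst_ip).packed, struct.pack("!H", dst_port))
    lo, hi = sorted([a, b])                  # order the endpoints canonically
    data = lo[0] + hi[0] + lo[1] + hi[1]
    return zlib.crc32(data) % NUM_QUEUES     # stand-in for the NIC's hash

# An outbound packet and its inbound reply map to the same queue (and core).
assert symmetric_queue("10.0.0.1", 40000, "192.0.2.10", 443) == \
       symmetric_queue("192.0.2.10", 443, "10.0.0.1", 40000)
```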

Multi-Queue Architecture and PCIe Topology

The IVO-MCX6-100 connects to the host via two Gen4 PCIe x8 links, presented as a single PCIe x16 signal through a golden-finger edge connector. This is a deliberate design choice.

Gen4 PCIe x8 delivers approximately 16 GB/s of raw bandwidth per link. Two links give us 32 GB/s of aggregate bandwidth between the NIC and the CPU complex. At 200 Gbps aggregate line rate (2x 100G ports), the theoretical data rate is 25 GB/s — well within the PCIe bandwidth envelope even after accounting for protocol overhead, DMA descriptor traffic, and interrupt signaling.
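The arithmetic behind those numbers, with the PCIe Gen4 signaling rate and 128b/130b line encoding as the only inputs:

```python
# Sanity check of the bandwidth math above. PCIe Gen4 signals at 16 GT/s per
# lane with 128b/130b encoding, so a x8 link carries roughly 15.75 GB/s of raw
# bandwidth before transaction-layer protocol overhead.
GT_PER_LANE = 16e9        # transfers/s per Gen4 lane
ENCODING = 128 / 130      # 128b/130b line encoding
LANES_PER_LINK = 8
LINKS = 2

link_gbs = GT_PER_LANE * ENCODING * LANES_PER_LINK / 8 / 1e9   # ~15.75 GB/s
total_gbs = link_gbs * LINKS                                   # ~31.5 GB/s
line_rate_gbs = 2 * 100e9 / 8 / 1e9                            # 25 GB/s

print(f"{link_gbs:.2f} GB/s per x8 link, {total_gbs:.2f} GB/s aggregate")
print(f"{line_rate_gbs:.0f} GB/s needed for 2x 100G at line rate")
```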

But bandwidth is only half the equation. Latency and NUMA awareness matter just as much. In our appliance designs, we pay close attention to which PCIe root complex the NIC is attached to, and we ensure that the receive queues are mapped to cores on the same NUMA node as the NIC's PCIe attachment point. A packet that arrives on a ConnectX-6 port, gets DMA'd into memory on NUMA node 0, but then gets processed by a core on NUMA node 1, pays a cross-node memory access penalty on every cache miss — exactly the kind of hidden overhead that erodes performance at scale.
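Here is a simplified sketch of that NUMA-aware mapping on a Linux host. The interface name and IRQ numbers are hypothetical, and the sysfs and procfs paths shown are the standard Linux interfaces for PCIe NUMA locality and interrupt affinity; this illustrates the policy, not the provisioning tooling that ships on IVO Networks appliances:

```python
# Simplified sketch of NUMA-aware queue-to-core mapping on a Linux host.
# Interface name and IRQ numbers below are hypothetical examples.
from pathlib import Path

def nic_numa_node(ifname: str) -> int:
    """NUMA node of the PCIe device backing a network interface."""
    return int(Path(f"/sys/class/net/{ifname}/device/numa_node").read_text())

def node_cpus(node: int) -> list[int]:
    """CPU ids local to a NUMA node (expands ranges like '0-15,32-47')."""
    text = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()
    cpus = []
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus

def plan_affinity(ifname: str, queue_irqs: list[int]) -> dict[int, int]:
    """Pin each receive-queue IRQ to a core on the NIC's own NUMA node."""
    local_cpus = node_cpus(nic_numa_node(ifname))
    plan = {irq: local_cpus[i % len(local_cpus)] for i, irq in enumerate(queue_irqs)}
    for irq, cpu in plan.items():
        # Applying the plan means writing the cpu id to the IRQ's affinity file:
        # Path(f"/proc/irq/{irq}/smp_affinity_list").write_text(str(cpu))
        print(f"IRQ {irq} -> CPU {cpu}")
    return plan

# Example with hypothetical interface name and IRQ numbers:
# plan_affinity("enp65s0f0", queue_irqs=[200, 201, 202, 203])
```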

What This Means for VPN and Security Workloads

VPN tunnels — whether IPsec or proprietary protocols — are computationally expensive. Each packet requires encryption or decryption, integrity checking, encapsulation, and routing decisions. Security inspection adds deep packet analysis on top of that. These are CPU-bound operations, and the only way to scale them is to distribute the work across as many cores as possible.

The IVO-MCX6-100 and its ConnectX-6 chipset make this distribution happen at the first possible moment: when the packet arrives at the NIC. By the time our software stack sees the packet, it's already on the right core, in the right memory region, ready for processing. There's no re-steering, no inter-core message passing, no lock contention on a shared queue. Each core operates on its own independent stream of packets.

This is how IVO Networks appliances achieve wire-speed VPN and security processing at 100G and beyond — not through a single fast path, but through massive parallelism orchestrated at the hardware level.

Engineering Decisions That Compound

The choice of the ConnectX-6 wasn't made in isolation. It's part of a larger set of engineering decisions — CPU selection, memory topology, PCIe layout, interrupt affinity, kernel bypass paths — that all need to be coherent for the system to perform at its theoretical maximum. A ConnectX-6 paired with a misaligned NUMA topology, or RSS enabled but with an incorrect indirection table, or queues mapped to cores that are already saturated with other work — any of these will leave performance on the table.

Getting this right is one of the things IVO Networks invests engineering effort in, and it's why our appliances deliver predictable, consistent throughput under real-world traffic conditions — not just in lab benchmarks with synthetic workloads.

Looking Ahead

As network speeds continue to increase and security workloads grow more complex, the boundary between hardware and software in network processing will continue to shift. Technologies like hardware offload for encryption and programmable packet-processing pipelines are areas where the ConnectX platform is advancing. We're actively working with NVIDIA on how these capabilities will integrate into future IVO Networks appliances.

For now, the IVO-MCX6-100 with ConnectX-6 represents the foundation of our high-performance network architecture — purpose-built hardware, intelligent packet distribution, and a system design that treats every nanosecond between wire and core as an engineering opportunity.


If you'd like to learn more about the IVO-MCX6-100 or discuss how IVO Networks appliances can address your high-throughput security and VPN requirements, contact our team or reach out to your IVO Networks account representative.