To build a 10 Gbps network IDS using PF_RING, you can either follow these steps or register at nsm.metaflows.com to download an automated installation script for CentOS or RHEL 7. After the install, all the system components will be in place, and you can then decide whether to go the free, open-source route or continue using our complete product free for 2 weeks through a SaaS subscription.
Metaflows’ contribution to the PF_RING project has produced open source technology capable of scaling network monitoring from 10 Mbps to 10 Gbps. Below, we summarize the results of our peer-reviewed testing showing that it is possible to build extremely effective network monitoring appliances on inexpensive commodity hardware.
We have modified PF_RING to work with inline Snort (while still supporting the existing passive multiprocessing functionality). PF_RING load-balances the traffic to analyze by hashing the IP headers into multiple buckets. This allows PF_RING to spawn multiple instances of Snort, each processing a single bucket and achieving higher throughput through multiprocessing. To take full advantage of this, you need a multi-core processor (such as an Intel i7 with 8 processing threads). This should also work well with dual- or quad-processor boards to increase parallelism even further.
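The bucket assignment can be pictured with a short sketch (hypothetical Python, not the actual PF_RING kernel code, which can also fold ports and protocol into the hash): a symmetric hash of the IP pair sends both directions of a flow to the same Snort instance.

```python
# Sketch of PF_RING-style flow load balancing (illustrative only;
# the real hashing happens inside the PF_RING kernel module).
import ipaddress

def bucket_for(src_ip: str, dst_ip: str, n_buckets: int) -> int:
    # Symmetric hash: src->dst and dst->src map to the same bucket,
    # so one Snort instance sees both directions of a flow.
    a = int(ipaddress.ip_address(src_ip))
    b = int(ipaddress.ip_address(dst_ip))
    return (a ^ b) % n_buckets

# Both directions of a flow land in the same bucket:
fwd = bucket_for("10.0.0.1", "192.168.1.5", 8)
rev = bucket_for("192.168.1.5", "10.0.0.1", 8)
assert fwd == rev
```

With 8 buckets and 8 Snort processes, each process handles roughly one eighth of the flows, which is what makes the multiprocessing scale.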
Our test system had the following setup:
As the graphs illustrate, running inline with 1 core can only sustain 100 Mbps or less (that is what is already available today). With PF_RING inline, we parallelize the inline processing on up to 8 cores, thus achieving almost 700 Mbps sustained throughput. Performance numbers are greatly affected by the type and number of Snort rules used, as well as the type of traffic being processed. However, it appears that no matter what your setup is, PF_RING inline with 8 cores should achieve 700-800 Mbps sustained throughput with approximately 200 µs of latency. That is impressive performance!
To reach 5 Gbps sustained throughput, we needed better hardware. In this experiment, we are running Snort on a dual processor board with a total of 24 hyper-threads (using the Intel X5670). Besides measuring Snort processing throughput while varying the number of rules, we also:
PF_RING NAPI performs the hashing of the packets in software and has a traditional architecture where the packets are copied to user space by the driver. Snort is parallelized using 24 processes that are allowed to float on the 24 hardware threads while the interrupts are parallelized on 16 of the 24 hardware threads.
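The NAPI model can be made concrete with a hedged Python sketch (illustrative only; the real path is a kernel driver copying packets to user space): a dispatcher hashes each packet's flow in software and hands it to one of N worker processes standing in for Snort instances.

```python
# Illustrative sketch of the PF_RING NAPI model: packets are hashed in
# software and copied to the user-space process that owns the bucket.
from multiprocessing import Process, Queue

N_WORKERS = 4  # the test machine ran 24 Snort processes

def worker(wid, in_q, out_q):
    # Stand-in for one Snort instance: consume packets from its bucket.
    count = 0
    while True:
        pkt = in_q.get()
        if pkt is None:          # shutdown sentinel
            break
        count += 1
    out_q.put((wid, count))

def dispatch(packets, queues):
    # Software hashing on the flow key; min/max makes it symmetric.
    for src, dst in packets:
        queues[hash((min(src, dst), max(src, dst))) % len(queues)].put((src, dst))

if __name__ == "__main__":
    queues = [Queue() for _ in range(N_WORKERS)]
    results = Queue()
    workers = [Process(target=worker, args=(i, queues[i], results))
               for i in range(N_WORKERS)]
    for w in workers:
        w.start()
    traffic = [("10.0.0.1", "10.0.0.2")] * 10 + [("10.0.0.2", "10.0.0.1")] * 5
    dispatch(traffic, queues)
    for q in queues:
        q.put(None)   # tell every worker to finish
    for w in workers:
        w.join()
    counts = dict(results.get() for _ in workers)
    # All 15 packets of the flow land on a single worker.
    assert sum(count for count in counts.values()) == 15
```

The per-packet copy into a user-space queue is exactly the system-level overhead that the DNA variant described next tries to eliminate.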
PF_RING DNA performs the hashing of the packets in hardware (using the RSS functionality of the Intel 82599 controller) and relies on 16 hardware queues. The DNA driver allows 16 instances of Snort to read packets directly from the hardware queues, thereby virtually eliminating system-level processing overhead. There are limitations, though. PF_RING DNA:
Number 2 in the list above is a significant limitation, because it does not allow multiple processes to receive the same data. For example, if you run tcpdump -i dna0, you could not also run snort -i dna0 -c config.snort -A console at the same time; the second invocation would return an error.
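This single-consumer behavior can be mimicked with an exclusive lock (a hedged analogy in Python; DNA's actual enforcement happens in the driver, and the lock file here is purely hypothetical):

```python
# Analogy for the DNA limitation: a DNA queue behaves like an
# exclusively locked resource, so a second consumer is refused.
import fcntl
import tempfile

# Stand-in for the dna0 queue (a throwaway temp file, not a real device).
lock_path = tempfile.NamedTemporaryFile(delete=False).name

first = open(lock_path)
fcntl.flock(first, fcntl.LOCK_EX | fcntl.LOCK_NB)   # first reader (e.g. tcpdump): succeeds

second = open(lock_path)
try:
    fcntl.flock(second, fcntl.LOCK_EX | fcntl.LOCK_NB)  # second reader (e.g. snort): refused
    second_reader_ok = True
except BlockingIOError:
    second_reader_ok = False

assert second_reader_ok is False
```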
GCC is the standard open source compiler that comes with CentOS 6 and virtually all other Unix systems. It is the foundation of open source and without it we would still be in the stone age (computationally).
ICC is a proprietary Intel compiler that goes much further in extracting instruction-level and data-level parallelism from modern multi-core processors such as the Intel i7 and Xeon.
Our results below are excellent and show that you can build a 5-7 Gbps IDS using standard off-the-shelf machines and PF_RING.
Our test system used the following setup:
The graph above shows the sustained Snort performance for 4 different configurations using a varying number of Emerging Threats Pro rules. As expected, the number of rules has a dramatic effect on performance for all configurations (the more rules, the lower the performance). In all cases, memory access contention is likely to be the main limiting factor.
Given our experience, we think that our setup is fairly representative of an academic institution. We have to admit that measuring Snort performance in absolute terms is difficult: no two networks are the same, and rule configurations vary even more widely. Nevertheless, the relative performance variations are important and of general interest. You can draw your own conclusions from the above graph; however, here are some interesting observations: