And Now For Something Completely Technical: PF Ring 10 Gbps Snort IDS
You can always visit the MetaFlows Website for more information.
PF_RING based 10 Gbps Snort multiprocessing
Tested on CentOS 6 64bit using our custom PF_RING source
PF_RING load balances network traffic originating from an Ethernet interface by hashing the IP headers into N buckets. This allows it to spawn N instances of Snort, each processing a single bucket and achieve higher throughput through multiprocessing. In order to take full advantage of this, you need a multicore processor (like an I7 with 8 processing threads) or a dual or quad processor board that increases parallelism even further across multiple chips.
In a related article we measured the performance of PF_RING with Snort inline at 1 Gbps on an I7 950. The results were impressive.
The big deal is that now you can build low-cost IDPS systems using standard off-the-shelf hardware.
You can purchase our purpose-built Hardware with MetaFlows PF_RING pre-installed, giving you a low cost high performance platform to run your custom PF_RING applications on. If you are interested in learning more, please contact us.
In this article we report on our experiment running Snort on a dual processor board with a total of 24 hyperthreads (using the Intel X5670). Besides measuring Snort processing throughput varying the number of rules, we also (1) changed the compiler used to compile Snort (GCC vs. ICC) and (2) compared PF_RING in NAPI mode (running 24 Snort processes in parallel) and PF_RING Direct NIC Access technology (DNA) (running 16 Snort processes in parallel).
PF_RING NAPI performs the hashing of the packets in software and has a traditional architecture where the packets are copied to user space by the driver. Snort is parallelized using 24 processes that are allowed to float on the 24 hardware threads while the interrupts are parallelized on 16 of the 24 hardware threads.
PF_RING DNA performs the hashing of the packets in hardware (using the Intel 52599 RSS functionality) and relies on 16 hardware queues. The DNA driver allows 16 instances of Snort to read packets directly from the hardware queues therefore virtually eliminating system-level processing overhead. The limitation of DNA is that (1) supports a maximum of 16x parallelism per 10G interface, (2) it only allows 1 process to attach to each hardware queue and (3) it costs a bit of money or requires Silicom cards(well worth it). (2) is significant because it does not allow multiple processes to receive the same data. So, for example if you run “tcpdump -i dna0″, you could not also run “snort -i dna0 -c config.snort -A console” at the same time. The second invocation would return an error.
GCC is the standard open source compiler that comes with CentOS 6 and virtually all other Unix systems. It is the foundation of open source and without it we would still be in the stone age (computationally).
ICC is an Intel proprietary compiler that goes much further in extracting instruction- and data-level parallelism of modern multicore processors such as the i7 and Xeons.
All results are excellent and show that you can build a 5-7 Gbps IDS using standard off-the-shelf machines and PF_RING. The system we used to perform these experiments is below:
The graph above shows the sustained Snort performance of 4 different configurations using a varying number of Emerging Threats Pro rules. As expected, the number of rules has a dramatic effect on performance for all configurations (the more rules, the lower the performance). In all cases, memory access contention is likely to be the main limiting factor.
Given our experience, we think that our setup is fairly representative of an academic institution we have to admit that measuring Snort performance in the absolute is hard. No two networks are the same and rule configurations vary even more widely, nevertheless, the relative performance variations are important and of general interest. You can draw your own conclusions from the above graph; however here are some interesting observations:
- At the high end (6900 rules) ICC makes a big difference by increasing the throughput by ~1 Gbps (25%)
- GCC is just as good at maintaining throughput around 5 Gbps
- PF_RING DNA is always better than PF_RING NAPI.
We describe below how to reproduce these numbers on Linux CentOS 6. If you do not want to go through these steps, we also provide this functionality through our security system (MSS) pre-packaged and ready to go. It would help us if you tried it and let us know what you think.