-------------------------------------------------------------------
Inline Snort multiprocessing with PF_RING – Tested on CentOS 6

We have modified PF_RING to work with inline Snort (while still supporting the current passive multiprocessing functionality). PF_RING load-balances the traffic to analyze by hashing the TCP/UDP headers into multiple buckets. This allows you to spawn multiple instances of Snort, each processing a single bucket, and achieve higher throughput through multiprocessing. To take full advantage of this you need a multicore processor (such as an i7 with 8 hardware threads). This should also work well with dual- or quad-processor boards to increase parallelism even further. The big deal is that you can now build really cheap IPS systems using standard off-the-shelf machines.

Here is the system we have ported PF_RING inline to:

Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
PF_RING e1000e driver, transparent_mode=1
Snort 2.9.0.x using the SRI BotHunter Ruleset, in addition to the Emerging Threats Pro Ruleset
Throughput: ~800Mbps
Latency: ~200us

Please install the following packages first.
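Since throughput here comes from running one Snort instance per hardware thread, it can be worth confirming how many threads the host actually exposes before sizing the deployment. A minimal check with standard Linux tools (not part of the original procedure, just a sanity check):

```shell
# Count the hardware threads the kernel exposes; this bounds how many
# Snort instances are worth spawning in one PF_RING cluster.
grep -c ^processor /proc/cpuinfo
```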
Most of these can be installed as:

yum install kernel-devel libtool subversion automake make autoconf pcre-devel libpcap-devel flex bison byacc gcc zlib-devel gcc-c++

#Download and install libdnet:
#https://www.metaflows.com/assets/downloads/pf_ring/libdnet-1.12.tgz

#Build the PF_RING inline libraries and kernel module.
#Download our modified PF_RING source here:
#https://www.metaflows.com/assets/downloads/pf_ring/PF_RING.tgz

tar xvfz PF_RING.tgz
cd PF_RING; make clean
cd kernel; make clean; make; make install
cd ../userland/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBS='-L/usr/local/lib'
./configure; make clean; make; make install
cd ../libpcap
export LIBS='-L/usr/local/lib -lpfring -lpthread'
./configure; make clean; make; make install; make install-shared
ln -s /usr/local/lib/libpfring.so /usr/lib/libpfring.so

#Build the daq-0.6.2 libraries.
#Download daq-0.6.2 here:
#https://www.metaflows.com/assets/downloads/pf_ring/daq-0.6.2.tgz

tar xvfz daq-0.6.2.tgz
cd daq-0.6.2; chmod 755 configure
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBS="-L/usr/local/lib -lpcap -lpthread"
./configure --disable-nfq-module --disable-ipq-module \
  --with-libpcap-includes=/usr/local/include \
  --with-libpcap-libraries=/usr/local/lib \
  --with-libpfring-includes=/usr/local/include \
  --with-libpfring-libraries=/usr/local/lib
make clean; make; make install

#Go back to the PF_RING directory and build the DAQ interface module.
cd PF_RING/userland/snort/pfring-daq-module
autoreconf -ivf
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBS='-L/usr/local/lib -lpcap -lpfring -lpthread'
./configure; make; make install

# Build Snort 2.9.x #
cd snort-2.9.x; make clean
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
export LIBS='-L/usr/local/lib -lpfring -lpthread'
./configure --with-libpcap-includes=/usr/local/include \
  --with-libpcap-libraries=/usr/local/lib \
  --with-libpfring-includes=/usr/local/include \
  --with-libpfring-libraries=/usr/local/lib \
  --enable-zlib --enable-perfprofiling
make
make install

# Load the PF_RING module #
# Never run inline with tx_capture!
insmod pf_ring.ko enable_tx_capture=0

# Run Snort #
# Run as many instances as your system can handle, limited only by the value of
# CLUSTER_LEN in PF_RING/kernel/linux/pf_ring.h at compile time (and your memory).
# Remember to replace the interfaces with ones appropriate for your instance.
ifconfig eth0 up
ifconfig eth1 up
snort -c snort.serv.conf -A console -y -i eth0:eth1 --daq-dir /usr/local/lib/daq \
  --daq pfring --daq-var clusterid=10 --daq-mode inline -Q

----------------------------------------------------------
Configuring PF_RING for 5-7 Gbps Multiprocessing

Building and Running PF_RING NAPI

Load the ixgbe driver. We found that setting the InterruptThrottleRate to 4000 was optimal for our traffic.

modprobe ixgbe InterruptThrottleRate=4000

Load PF_RING in transparent mode 2 and set a reasonable buffer size.

modprobe pf_ring transparent_mode=2 min_num_slots=16384

Bring up the 10gbe interface (in our case this was eth3).

ifconfig eth3 up

Optimise the Ethernet device. We mostly turned off options which hinder throughput. Substitute eth3 with the interface appropriate to your instance.

ethtool -C eth3 rx-usecs 1000
ethtool -C eth3 adaptive-rx off
ethtool -K eth3 tso off
ethtool -K eth3 gro off
ethtool -K eth3 lro off
ethtool -K eth3 gso off
ethtool -K eth3 rx off
ethtool -K eth3 tx off
ethtool -K eth3 sg off

Set up CPU affinity for interrupts based on the number of RX queues on the NIC, balanced across both processors. This may vary from system to system. Check /proc/cpuinfo to see which processor IDs are associated with each physical CPU.
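The value written to smp_affinity is a hexadecimal CPU bitmask (bit N selects CPU N), which is where the 1, 2, 4, 8, 10, 20, ... values below come from. A small helper (hypothetical name cpu_mask, not part of the original procedure) makes the mapping explicit:

```shell
# cpu_mask: print the smp_affinity hex bitmask that pins an IRQ to one CPU.
cpu_mask() { printf '%x\n' $((1 << $1)); }

cpu_mask 0    # -> 1
cpu_mask 4    # -> 10
cpu_mask 19   # -> 80000
```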
printf "%s" 1 > /proc/irq/73/smp_affinity      #cpu0  node0
printf "%s" 2 > /proc/irq/74/smp_affinity      #cpu1  node0
printf "%s" 4 > /proc/irq/75/smp_affinity      #cpu2  node0
printf "%s" 8 > /proc/irq/76/smp_affinity      #cpu3  node0
printf "%s" 10 > /proc/irq/77/smp_affinity     #cpu4  node0
printf "%s" 20 > /proc/irq/78/smp_affinity     #cpu5  node0
printf "%s" 40 > /proc/irq/79/smp_affinity     #cpu6  node1
printf "%s" 80 > /proc/irq/80/smp_affinity     #cpu7  node1
printf "%s" 100 > /proc/irq/81/smp_affinity    #cpu8  node1
printf "%s" 200 > /proc/irq/82/smp_affinity    #cpu9  node1
printf "%s" 400 > /proc/irq/83/smp_affinity    #cpu10 node1
printf "%s" 800 > /proc/irq/84/smp_affinity    #cpu11 node1
printf "%s" 1000 > /proc/irq/85/smp_affinity   #cpu12 node0
printf "%s" 2000 > /proc/irq/86/smp_affinity   #cpu13 node0
printf "%s" 40000 > /proc/irq/87/smp_affinity  #cpu18 node1
printf "%s" 80000 > /proc/irq/88/smp_affinity  #cpu19 node1

Launch Snort instances in a PF_RING cluster. In our test, we spawned 24 instances with the following command:

for i in `seq 0 1 23`; do
  snort -c snort.serv.conf -N -A none -i eth3 --daq-dir /usr/local/lib/daq \
    --daq pfring --daq-var clusterid=10 &
done

-----------------------------------------------------------------------------
Building and Running PF_RING DNA

Download PF_RING 5.1. Configure and make from the top-level directory.

cd PF_RING-5.1.0
./configure
make

Load the DNA driver.

insmod /root/PF_RING-5.1.0/drivers/DNA/ixgbe-3.3.9-DNA/src/ixgbe.ko

Load PF_RING in transparent mode 2 and set a reasonable buffer size.

insmod /root/PF_RING-5.1.0/kernel/pf_ring.ko min_num_slots=8192 transparent_mode=2

Bring the DNA interface up.

ifconfig dna0 up

#Optimise the Ethernet device, mostly turning off options which hinder throughput.
#Substitute eth3 with the interface appropriate to your instance.
ethtool -C eth3 rx-usecs 1000
ethtool -C eth3 adaptive-rx off
ethtool -K eth3 tso off
ethtool -K eth3 gro off
ethtool -K eth3 lro off
ethtool -K eth3 gso off
ethtool -K eth3 rx off
ethtool -K eth3 tx off
ethtool -K eth3 sg off

Set up CPU affinity for interrupts based on the number of RX queues on the NIC, balanced across both processors. This may vary from system to system. Check /proc/cpuinfo to see which processor IDs are associated with each physical CPU.

printf "%s" 1 > /proc/irq/73/smp_affinity      #cpu0  node0
printf "%s" 2 > /proc/irq/74/smp_affinity      #cpu1  node0
printf "%s" 4 > /proc/irq/75/smp_affinity      #cpu2  node0
printf "%s" 8 > /proc/irq/76/smp_affinity      #cpu3  node0
printf "%s" 10 > /proc/irq/77/smp_affinity     #cpu4  node0
printf "%s" 20 > /proc/irq/78/smp_affinity     #cpu5  node0
printf "%s" 40 > /proc/irq/79/smp_affinity     #cpu6  node1
printf "%s" 80 > /proc/irq/80/smp_affinity     #cpu7  node1
printf "%s" 100 > /proc/irq/81/smp_affinity    #cpu8  node1
printf "%s" 200 > /proc/irq/82/smp_affinity    #cpu9  node1
printf "%s" 400 > /proc/irq/83/smp_affinity    #cpu10 node1
printf "%s" 800 > /proc/irq/84/smp_affinity    #cpu11 node1
printf "%s" 1000 > /proc/irq/85/smp_affinity   #cpu12 node0
printf "%s" 2000 > /proc/irq/86/smp_affinity   #cpu13 node0
printf "%s" 40000 > /proc/irq/87/smp_affinity  #cpu18 node1
printf "%s" 80000 > /proc/irq/88/smp_affinity  #cpu19 node1

This loop spawns 16 Snort processes. Each is bound to an RX queue of the NIC interface, specified as dnaX@Y, where X is the DNA device ID and Y is the RX queue.

for i in `seq 0 1 15`; do
  /nsm/bin/snort-2.9.0/src/snort -c /nsm/etc/snort.serv.conf \
    -A none -N -y -i dna0@$i &
done

--------------------------------------------------------------------
Compiling Snort With ICC

We found that ICC gives the best performance when using its profiling capability together with -march=corei7 -fomit-frame-pointer -no-prec-div -fimf-precision:low -fno-alias -fno-fnalias.
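The -march=corei7 flag assumes a Nehalem-class (or newer) CPU; on x86 Linux, SSE4.2 support is a reasonable proxy check before committing to that target (this probe is our addition, not part of the original procedure):

```shell
# Check whether the CPU advertises SSE4.2, which the corei7 target requires.
grep -m1 -o 'sse4_2' /proc/cpuinfo || echo "no SSE4.2 reported"
```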
Note: the Intel compiler is free to use for research purposes only; using it in production requires a paid license.

Set environment variables for Intel's compiler.

source /opt/intel/bin/compilervars.sh intel64
export CC=/opt/intel/composer_xe_2011_sp1.6.233/bin/intel64/icc

Compile Snort first using -prof-gen.

export CFLAGS='-march=corei7 -fomit-frame-pointer -no-prec-div -fimf-precision:low -fno-alias -fno-fnalias -prof-gen -prof-dir=/root/nsm_intel/bin/snort-2.9.x'
cd snort-2.9.x/
make clean; ./configure; make

Run Snort on as many hardware threads as you would like, using PF_RING NAPI:

for i in `seq 0 1 23`; do
  snort -c /nsm/etc/snort.serv.conf -N -A none -i eth3 \
    --daq-dir /usr/local/lib/daq --daq pfring --daq-var clusterid=10 &
done

or PF_RING DNA:

for i in `seq 0 1 15`; do
  snort -c /nsm/etc/snort.serv.conf -N -A none -i dna0@$i &
done

Make sure to send it some traffic for a few minutes. You might notice very high CPU utilization because the application is running in profiling mode.

# Kill Snort to output the profile data
killall snort

# Now recompile using "-prof-use" instead
export CFLAGS='-march=corei7 -fomit-frame-pointer -no-prec-div -fimf-precision:low -fno-alias -fno-fnalias -prof-use -prof-dir=/root/nsm_intel/bin/snort-2.9.x'
cd snort-2.9.x/
make clean; ./configure; make
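As a sanity check on this workflow: the -prof-gen run dumps .dyn profile files into the prof-dir, and the -prof-use rebuild has nothing to work with if they are missing. A quick count (the path is the one from the example above; substitute your own):

```shell
# Count ICC .dyn profile dumps; zero means no profile data was recorded
# and the -prof-use build will not be profile-optimized.
PROF_DIR=/root/nsm_intel/bin/snort-2.9.x
ls "$PROF_DIR"/*.dyn 2>/dev/null | wc -l
```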