Using irqbalance and SMP IRQ affinity
Problem: An Amazon EC2 instance can suffer excessive I/O wait when one of its CPUs is saturated with interrupt requests. This can happen on a virtualized cloud server instance because it lacks dedicated hardware to balance interrupts between CPUs. We have to configure mechanisms that balance the interrupts in software, in the kernel.
So far we've seen this most drastically on the production PostgreSQL server.
The solutions here have not been automated yet, due to time constraints, and because this may still be an evolving situation.
References:
- Pinterest engineering blog post. This covers the basic issue but the solutions given are not the exact ones that worked for us. https://engineering.pinterest.com/blog/building-pinterest-cloud
- Receive Packet Steering. This is not exactly what we ended up using, but it's related: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html
- Related slideshow: https://www.percona.com/live/mysql-conference-2015/sites/default/files/slides/all_your_iops_are_belong_to_usPLMCE2015.pdf
- Explanation of SMP IRQ affinity (this plus irqbalance are what appear to work for us): https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt
Findings
We found that we needed to enable Enhanced Networking for our Amazon EC2 instance, compile irqbalance from source and install it (disabling the irqbalance that ships with Ubuntu 12.04), and execute commands to configure SMP affinity and start irqbalance (which are run in /etc/rc.local at boot).
The steps below result in two of the database server's CPUs handling ethernet interrupts during a mapping run, instead of just one. However, CPU utilization is still skewed heavily toward one CPU. We need to observe what happens under a heavier load than a small mapping run on staging before drawing further conclusions. The I/O interrupt balancing done by irqbalance cannot be properly tested without heavier I/O than we've been able to generate on staging. The slideshow reference above indicates that irqbalance should balance block device interrupts. I think that what it must actually do is periodically switch the SMP affinity based on which CPU is less burdened.
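One way to check how ethernet interrupts are spread across CPUs is to sum the per-CPU counters in /proc/interrupts. The following is a minimal sketch, not something that is part of our setup: the function name irq_totals_per_cpu and its arguments are hypothetical, and it assumes the usual /proc/interrupts layout (a header row of CPU names, then one row per IRQ).

```shell
#!/bin/sh
# Sum interrupt counts per CPU for every /proc/interrupts row that matches
# a pattern (default: eth0). Hypothetical helper, for inspection only.
irq_totals_per_cpu() {
    pattern="${1:-eth0}"
    file="${2:-/proc/interrupts}"
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }   # header row: CPU0 CPU1 ...
        $0 ~ pat {
            # Field 1 is the IRQ number; fields 2..ncpu+1 are per-CPU counts.
            for (i = 1; i <= ncpu; i++) total[i] += $(i + 1)
        }
        END {
            # %.0f avoids integer overflow in awk variants with a 32-bit %d.
            for (i = 1; i <= ncpu; i++) printf "CPU%d %.0f\n", i - 1, total[i]
        }
    ' "$file"
}
```

Running irq_totals_per_cpu eth0 before and after a mapping run, and comparing the two outputs, should show which CPUs actually handled the interrupts.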
It's not clear what we will have to do when we want to upgrade our servers to Ubuntu 14.04 LTS. (The relevant ones at the moment, the PostgreSQL database servers, are running 12.04.) The Amazon Enhanced Networking document says that the ethernet driver that needs to be installed (ixgbevf) is not compatible with 14.04, and that you can instead use the version of ixgbevf that is bundled with Ubuntu 14.04. That sounds good, but we would need to research how to upgrade with a self-installed version of this module already present.
Steps
- Install these apt packages: dh-autoreconf, git, pkg-config, ethtool
- Install ixgbevf driver and enable enhanced networking: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enhanced-networking-ubuntu. Due to the complexity of this step, and the fact that it requires the use of the AWS EC2 CLI, this has not been automated, given the limited time available to fix the immediate issues in front of us.
- Disable Ubuntu's irqbalance in /etc/default/irqbalance
- Compile and install irqbalance 1.0.9
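The last two steps above could be sketched roughly as follows. This is not taken from our automation and rests on several assumptions: that /etc/default/irqbalance contains an ENABLED="1" line, that the irqbalance source is fetched from its GitHub repository, that the release is tagged v1.0.9, and that the default install prefix puts the binary at /usr/local/sbin/irqbalance. The function names and the source directory are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names, repository URL, tag name and source
# directory are assumptions, not taken from our automation.

# Set ENABLED="0" in the packaged irqbalance defaults file.
disable_packaged_irqbalance() {
    defaults_file="${1:-/etc/default/irqbalance}"
    sed -i 's/^ENABLED=.*/ENABLED="0"/' "$defaults_file"
}

# Fetch, build and install irqbalance from source (run as root).
# The body runs in a subshell so the caller's working directory is unchanged.
build_irqbalance() (
    version="${1:-1.0.9}"
    src_dir="${2:-/usr/local/src/irqbalance}"
    git clone https://github.com/Irqbalance/irqbalance.git "$src_dir" || exit 1
    cd "$src_dir" || exit 1
    git checkout "v$version" || exit 1
    ./autogen.sh && ./configure && make && make install
)
```

Calling disable_packaged_irqbalance followed by build_irqbalance as root would cover both steps. Neither function stops an irqbalance process that is already running.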
Add commands to /etc/rc.local that start irqbalance and set smp_affinity, with irqbalance ignoring the IRQs that we configure specifically. The following code has been inserted into /etc/rc.local, but is not yet reflected in our automation. The IRQs can differ between systems, so this has to be edited to suit the installation.
    rl=`runlevel | /usr/bin/awk '{print $2}'`
    if [ "$rl" -eq "2" ]; then
        # Run irqbalance to balance i/o interrupts. We'll configure the kernel's
        # SMP IRQ affinity for ethernet interrupts, below.
        /usr/local/sbin/irqbalance --banirq=81 --banirq=82 || exit 1

        # Distribute interrupts for paired interrupt queues across 2 CPUS with hex
        # bitmask values. Ethernet interrupts can not be loadbalanced between
        # multiple CPUs the way i/o interrupts can. You have to use the ixgbevf
        # (enhanced networking) kernel module and have individual interrupt queues
        # be handled by specific CPUs.
        # ... eth0-TxRx-0, IRQ #81, 0001
        echo 1 > /proc/irq/81/smp_affinity
        # ... eth0-TxRx-1, IRQ #82, 0010
        echo 2 > /proc/irq/82/smp_affinity
    fi
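The values written to smp_affinity are hexadecimal bitmasks in which bit N selects CPU N, so CPU 0 is 1 and CPU 1 is 2. A small helper along these lines can compute the mask for a single CPU. The name cpu_to_affinity_mask is hypothetical, and the sketch assumes a CPU index below 32, since larger masks need the kernel's comma-separated format.

```shell
#!/bin/sh
# Print the hexadecimal smp_affinity mask that selects one CPU.
# Hypothetical helper; only handles CPU indexes 0 through 31.
cpu_to_affinity_mask() {
    cpu="$1"
    if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then
        echo "cpu index must be between 0 and 31" >&2
        return 1
    fi
    # Shift a single bit into position N and print it as hex.
    printf '%x\n' $((1 << cpu))
}
```

For example, cpu_to_affinity_mask 1 prints 2, the value written to /proc/irq/82/smp_affinity above, and cpu_to_affinity_mask 4 prints 10.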
The relevant interrupts can be determined with:
    grep eth0 /proc/interrupts
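To pull out just the IRQ numbers and queue names, for example when editing the --banirq options and the smp_affinity paths for a new system, something like the following may help. It is a sketch with a hypothetical function name, and it assumes the IRQ number is the first column and the queue name the last column of each /proc/interrupts row.

```shell
#!/bin/sh
# Print "IRQ queue-name" for each /proc/interrupts row that mentions the
# given interface (default: eth0). Hypothetical helper.
list_iface_irqs() {
    iface="${1:-eth0}"
    file="${2:-/proc/interrupts}"
    awk -v iface="$iface" '
        index($0, iface) {
            irq = $1
            sub(/:$/, "", irq)   # "81:" -> "81"
            print irq, $NF
        }
    ' "$file"
}
```

On the database server described above, this would be expected to list IRQs 81 and 82 with their eth0-TxRx queue names, along with any other rows that mention eth0.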