Wednesday, November 22, 2017

Using numactl to improve performance or eliminate bottlenecks

An example of using the numactl utility to run a bandwidth test with a GPU.

Below is the NUMA architecture of my server hardware.
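If you want to check the layout on your own machine, here is a minimal sketch that reads the NUMA nodes and their CPU lists from Linux sysfs. The function name `read_numa_topology` and its `sysfs_root` parameter are my own for illustration; it assumes a Linux kernel that exposes `/sys/devices/system/node`.

```python
from pathlib import Path


def read_numa_topology(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: cpulist string} for each nodeN directory found."""
    topology = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # Not Linux, or NUMA information is not exposed: nothing to report.
        return topology
    for node_dir in root.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            node_id = int(node_dir.name[len("node"):])
            topology[node_id] = cpulist.read_text().strip()
    return topology


if __name__ == "__main__":
    for node_id, cpus in sorted(read_numa_topology().items()):
        print(f"node {node_id}: CPUs {cpus}")
```

Running `numactl --hardware` should give a similar picture, including memory size per node.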


Running the test normally, with resources restricted to CPU0: bandwidth is around 11850~12867 MB/s for both H2D (host-to-device) and D2H (device-to-host) transfers.


Switching memory access to the CPU1 group: bandwidth drops to approximately 6669~7175 MB/s, roughly 44% lower.
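The two runs can be sketched as the following command lines. This is an illustration under assumptions, not the exact commands from my test: it assumes the GPU is attached to CPU0 (NUMA node 0), that the second run only moved the memory binding to node 1, and that the benchmark is a binary called `./bandwidthTest` (a placeholder name). The `--cpunodebind` and `--membind` options are standard numactl options.

```python
import shlex


def numactl_command(cpu_node, mem_node, benchmark):
    """Build an argv list that pins CPUs and memory to the given NUMA nodes."""
    return [
        "numactl",
        f"--cpunodebind={cpu_node}",  # run only on CPUs of this node
        f"--membind={mem_node}",      # allocate memory only from this node
        *benchmark,
    ]


# Placeholder benchmark binary; substitute your own bandwidth test.
BENCHMARK = ["./bandwidthTest"]

# CPU and memory both on node 0 (assumed to be the node the GPU hangs off).
local_run = numactl_command(0, 0, BENCHMARK)
# Same CPUs, but memory forced onto node 1, so transfers cross the sockets.
remote_run = numactl_command(0, 1, BENCHMARK)

if __name__ == "__main__":
    for argv in (local_run, remote_run):
        print(shlex.join(argv))
```

The printed lines can be pasted into a shell on a machine that has numactl installed.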



Conclusion:
When running a test, using the correct NUMA node and understanding the hardware architecture are extremely important. Running a benchmark utility for a PCIe device under the wrong CPU group can skew the performance result. I have seen many such cases reported by clients asking for a solution. In many cases with DP (dual-processor) motherboards, the customer installed PCIe cards in slots attached to the 2nd CPU and then ran the benchmark utility for an overall result. By default the OS starts from the 1st CPU, which can cause trouble because the card is attached to the other CPU.
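One way to avoid guessing is to ask the kernel which NUMA node a PCIe card belongs to before running the benchmark. Below is a minimal sketch, assuming Linux sysfs; the helper names `pci_numa_node` and `bind_args` and the PCI address `0000:81:00.0` are hypothetical and only for illustration.

```python
from pathlib import Path


def pci_numa_node(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device, or None if it is unknown."""
    path = Path(sysfs_root) / bdf / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when it has no affinity information.
    return node if node >= 0 else None


def bind_args(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return a numactl prefix for the device's node, or [] if unknown."""
    node = pci_numa_node(bdf, sysfs_root)
    if node is None:
        return []
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"]


if __name__ == "__main__":
    # Hypothetical PCI address; find yours with lspci -D.
    print(bind_args("0000:81:00.0"))
```

Prefixing the benchmark command with the returned arguments keeps the CPUs and memory on the same CPU group as the card.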