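To compare concrete performance differences across multiple systems, run a load-generating program on each and analyze its behavior.
Many benchmark programs exist, each serving a different purpose, so it is important to select one that matches what you want to measure.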
- http://www.opensourcetesting.org/performance.php
- unixbench
- http://itpro.nikkeibp.co.jp/article/COLUMN/20111108/373362/?ST=oss&P=1
- http://www.hermit.org/Linux/Benchmarking/
- http://byte-unixbench.googlecode.com/files/UnixBench5.1.3.tgz
- Phoronix Test Suite
- SPEC (Standard Performance Evaluation Corporation): http://www.spec.org
- IOzone: available from RPMforge
- HDBENCH: single processor; development inactive since 2001/07
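A Linux port of the BYTEmark benchmark program devised by the computer magazine BYTE.
Details of the options that appear in the commented section of the Makefile: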
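option | Description |
---|---|
-s | (ld) strip all symbol information |
-static | (gcc) link statically, without shared libraries |
-Wall | (gcc) enable most commonly used warnings |
-O3 | (gcc) optimization level 3 |
-fomit-frame-pointer | (gcc) omit the frame pointer |
-march=i686 | (gcc) generate code optimized for i686 |
-fforce-addr | (gcc) copy memory address constants into registers before doing arithmetic on them |
-fforce-mem | (gcc) copy memory operands into registers before doing arithmetic on them (* scheduled for removal in gcc 4.2) |
-falign-loops=2 | (gcc) align loops to the given boundary (*) |
-falign-functions=2 | (gcc) align the start of functions to the given boundary (*) |
-falign-jumps=2 | (gcc) align jump targets to the given boundary (*) |
-funroll-loops | (gcc) unroll loops (*) |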
    # CFLAGS = -s -static -Wall -O3
    # if your gcc lets you do it, then try this one
    #CFLAGS = -s -static -Wall -O3 -fomit-frame-pointer -funroll-loops
    # for gcc on an older Pentium type processor you can try the following
    #CFLAGS = -s -static -O3 -fomit-frame-pointer -Wall -m486 \
    #	-fforce-addr -fforce-mem -falign-loops=2 -falign-functions=2 \
    #	-falign-jumps=2 -funroll-loops
    # for a newer gcc on a newer Pentium type processor you can try the following
    CFLAGS = -s -static -O3 -fomit-frame-pointer -Wall -march=i686 \
    	-fforce-addr -fforce-mem -falign-loops=2 -falign-functions=2 \
    	-falign-jumps=2 -funroll-loops

Example run
    BYTEmark* Native Mode Benchmark ver. 2 (10/95)
    Index-split by Andrew D. Balsa (11/97)
    Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

    TEST                : Iterations/sec.  : Old Index   : New Index
                        :                  : Pentium 90* : AMD K6/233*
    --------------------:------------------:-------------:------------
    NUMERIC SORT        :          685.92  :      17.59  :       5.78
    STRING SORT         :          112.32  :      50.19  :       7.77
    BITFIELD            :      2.8028e+08  :      48.08  :      10.04
    FP EMULATION        :          100.92  :      48.43  :      11.17
    FOURIER             :           15532  :      17.66  :       9.92
    ASSIGNMENT          :          15.056  :      57.29  :      14.86
    IDEA                :          3090.1  :      47.26  :      14.03
    HUFFMAN             :          1134.6  :      31.46  :      10.05
    NEURAL NET          :          21.015  :      33.76  :      14.20
    LU DECOMPOSITION    :           713.2  :      36.95  :      26.68
    ==========================ORIGINAL BYTEMARK RESULTS==========================
    INTEGER INDEX       : 40.381
    FLOATING-POINT INDEX: 28.033
    Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
    ==============================LINUX DATA BELOW===============================
    CPU                 : Dual AuthenticAMD AMD E-350 APU with Radeon(tm) HD Graphics 800MHz
    L2 Cache            : 512 KB
    OS                  : Linux 2.6.18-308.11.1.el5.centos.plus
    C compiler          :
    libc                : libc-2.5.so
    MEMORY INDEX        : 10.505
    INTEGER INDEX       : 9.767
    FLOATING-POINT INDEX: 15.548
    Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    * Trademarks are property of their respective holder.

Run 02
    BYTEmark* Native Mode Benchmark ver. 2 (10/95)
    Index-split by Andrew D. Balsa (11/97)
    Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

    TEST                : Iterations/sec.  : Old Index   : New Index
                        :                  : Pentium 90* : AMD K6/233*
    --------------------:------------------:-------------:------------
    NUMERIC SORT        :          620.32  :      15.91  :       5.22
    STRING SORT         :             116  :      51.83  :       8.02
    BITFIELD            :      2.7449e+08  :      47.09  :       9.83
    FP EMULATION        :          170.68  :      81.90  :      18.90
    FOURIER             :           15188  :      17.27  :       9.70
    ASSIGNMENT          :          20.399  :      77.62  :      20.13
    IDEA                :          3284.1  :      50.23  :      14.91
    HUFFMAN             :          1479.4  :      41.02  :      13.10
    NEURAL NET          :            24.4  :      39.20  :      16.49
    LU DECOMPOSITION    :           831.6  :      43.08  :      31.11
    ==========================ORIGINAL BYTEMARK RESULTS==========================
    INTEGER INDEX       : 47.026
    FLOATING-POINT INDEX: 30.781
    Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
    ==============================LINUX DATA BELOW===============================
    CPU                 : Dual AuthenticAMD AMD E-350 APU with Radeon(tm) HD Graphics 800MHz
    L2 Cache            : 512 KB
    OS                  : Linux 2.6.18-308.11.1.el5.centos.plus
    C compiler          :
    libc                : libc-2.5.so
    MEMORY INDEX        : 11.668
    INTEGER INDEX       : 11.785
    FLOATING-POINT INDEX: 17.072
    Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
    * Trademarks are property of their respective holder.
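The two runs can be cross-checked numerically. The following is a minimal sketch, not part of the benchmark itself. It assumes that the composite indexes in the ORIGINAL BYTEMARK RESULTS section are geometric means of the per-test Old Index values, and that the integer group consists of the seven tests listed in `INTEGER_TESTS`. That grouping is an assumption, but it reproduces the printed 40.381 and 28.033 for the first run. The function names and dictionaries are illustrative; the numbers are copied from the two outputs.

```python
import math

# test name -> (iterations/sec, Old Index), copied from the two runs
RUN_01 = {
    "NUMERIC SORT": (685.92, 17.59),
    "STRING SORT": (112.32, 50.19),
    "BITFIELD": (2.8028e08, 48.08),
    "FP EMULATION": (100.92, 48.43),
    "FOURIER": (15532.0, 17.66),
    "ASSIGNMENT": (15.056, 57.29),
    "IDEA": (3090.1, 47.26),
    "HUFFMAN": (1134.6, 31.46),
    "NEURAL NET": (21.015, 33.76),
    "LU DECOMPOSITION": (713.2, 36.95),
}
RUN_02 = {
    "NUMERIC SORT": (620.32, 15.91),
    "STRING SORT": (116.0, 51.83),
    "BITFIELD": (2.7449e08, 47.09),
    "FP EMULATION": (170.68, 81.90),
    "FOURIER": (15188.0, 17.27),
    "ASSIGNMENT": (20.399, 77.62),
    "IDEA": (3284.1, 50.23),
    "HUFFMAN": (1479.4, 41.02),
    "NEURAL NET": (24.4, 39.20),
    "LU DECOMPOSITION": (831.6, 43.08),
}

# Assumed grouping: everything except the three floating-point tests.
FLOAT_TESTS = ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"]
INTEGER_TESTS = [name for name in RUN_01 if name not in FLOAT_TESTS]


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite_index(run, tests):
    """Combine the Old Index values of the given tests into one index."""
    return geometric_mean(run[name][1] for name in tests)


def percent_change(run_a, run_b):
    """Relative change of the raw score (iterations/sec) per test, in percent."""
    return {
        name: (run_b[name][0] - run_a[name][0]) / run_a[name][0] * 100.0
        for name in run_a
    }


if __name__ == "__main__":
    for label, run in (("Run 01", RUN_01), ("Run 02", RUN_02)):
        print(label,
              "integer index: %.3f" % composite_index(run, INTEGER_TESTS),
              "floating-point index: %.3f" % composite_index(run, FLOAT_TESTS))
    for name, change in percent_change(RUN_01, RUN_02).items():
        print("%-18s %+7.2f %%" % (name, change))
```

On the same machine, some raw scores differ considerably between the two runs (FP EMULATION, for example), which suggests averaging several runs before comparing systems.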
For SCSI/SATA devices there is also sdparm (available from RPMforge).
    # hdparm -v /dev/hda

    /dev/hda:
     multcount    = 16 (on)
     IO_support   =  0 (default 16-bit)
     unmaskirq    =  0 (off)
     using_dma    =  0 (off)
     keepsettings =  0 (off)
     readonly     =  0 (off)
     readahead    = 256 (on)
     geometry     = 30401/255/63, sectors = 488397168, start = 0

Some of these parameters are also reported for SATA disks.

    # hdparm -v /dev/sda1

    /dev/sda1:
     IO_support   =  0 (default 16-bit)
     readonly     =  0 (off)
     readahead    = 256 (on)
     geometry     = 30401/255/63, sectors = 256977, start = 63
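The `geometry` line is enough to estimate the size of the device. The sketch below parses that line as printed for /dev/hda and converts the sector count to bytes. It assumes 512-byte sectors, which the hdparm output does not state; `parse_geometry` and `capacity_bytes` are illustrative names, not part of hdparm.

```python
import re

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors

# Line copied from the "hdparm -v /dev/hda" output.
GEOMETRY_LINE = "geometry = 30401/255/63, sectors = 488397168, start = 0"

GEOMETRY_RE = re.compile(
    r"geometry\s*=\s*(\d+)/(\d+)/(\d+),\s*sectors\s*=\s*(\d+),\s*start\s*=\s*(\d+)"
)


def parse_geometry(line):
    """Split an hdparm geometry line into its numeric fields."""
    match = GEOMETRY_RE.search(line)
    if match is None:
        raise ValueError("not a geometry line: %r" % line)
    cylinders, heads, sectors_per_track, sectors, start = map(int, match.groups())
    return {
        "cylinders": cylinders,
        "heads": heads,
        "sectors_per_track": sectors_per_track,
        "sectors": sectors,
        "start": start,
    }


def capacity_bytes(sectors, sector_bytes=SECTOR_BYTES):
    """Device size in bytes for the given sector count."""
    return sectors * sector_bytes


if __name__ == "__main__":
    geo = parse_geometry(GEOMETRY_LINE)
    size = capacity_bytes(geo["sectors"])
    print("capacity: %d bytes (%.1f GB)" % (size, size / 1e9))
    # The C/H/S product is a legacy approximation and can be smaller
    # than the reported sector count.
    chs = geo["cylinders"] * geo["heads"] * geo["sectors_per_track"]
    print("sectors addressable via C/H/S: %d" % chs)
```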
Install from RPMforge:
    # yum install iozone
     :
    ================================================================================
     Package        Arch        Version               Repository        Size
    ================================================================================
    Installing:
     iozone         i386        3.394-1.el5.rf        rpmforge         751 k

    Transaction Summary
    ================================================================================
     :

Example run
$ iozone -a -g 4096 -q 4096 -b iozone_result.xls -f /tmp/tmpfile
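For reference, the options in this command are: `-a` (automatic mode), `-g` (maximum file size in KB for automatic mode), `-q` (maximum record size in KB for automatic mode), `-b` (write an Excel-compatible report file) and `-f` (path of the temporary test file). The sketch below only assembles the same argument list so the sizes can be varied from a script; `build_iozone_command` is a hypothetical helper, not part of iozone.

```python
import shlex


def build_iozone_command(max_file_kb, max_record_kb, report_path, temp_file):
    """Return the iozone argument list for an automatic-mode run."""
    return [
        "iozone",
        "-a",                      # automatic mode
        "-g", str(max_file_kb),    # maximum file size (KB)
        "-q", str(max_record_kb),  # maximum record size (KB)
        "-b", report_path,         # Excel-compatible report file
        "-f", temp_file,           # temporary file used for the test
    ]


if __name__ == "__main__":
    cmd = build_iozone_command(4096, 4096, "iozone_result.xls", "/tmp/tmpfile")
    # Print the command; it can be passed to subprocess.run(cmd, check=True)
    # on a host where iozone is installed.
    print(shlex.join(cmd))
```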
    $ cp /usr/share/iozone/* .
    $ iozone -a > iozone.out
    $ ./Generate_Graphs iozone.out

Sample: write performance graph
The prerequisite packages are as follows.
- gtk2, gtk2-devel
- gtk+, gtk+-devel
- libX11-devel
    [student@cent580 ~]$ sysctl fs.file-max
    fs.file-max = 24566
    [student@cent580 ~]$ ## The values are small because this is VBOX; collect again on LA
    [student@cent580 ~]$ sysctl fs.file-nr
    fs.file-nr = 768 0 24566
    [student@cent580 ~]$ sysctl fs.inode-nr kernel.shmmax
    fs.inode-nr = 2756 416
    kernel.shmmax = 4294967295
    [student@cent580 ~]$ sysctl kernel.shmall kernel.shmmni kernel.sem
    kernel.shmall = 268435456
    kernel.shmmni = 4096
    kernel.sem = 250 32000 32 128
    [student@cent580 ~]$ sysctl net.ipv4.
    net.ipv4.conf.eth1.promote_secondaries = 0
    net.ipv4.conf.eth1.force_igmp_version = 0
    net.ipv4.conf.eth1.disable_policy = 0
     : (omitted)
    net.ipv4.tcp_sack = 1
    net.ipv4.tcp_window_scaling = 1
    net.ipv4.tcp_timestamps = 1
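Several of these values are multi-field and easier to read once labeled. The sketch below parses a few lines copied from the session and interprets `fs.file-nr` as allocated, unused and maximum file handles, and `kernel.sem` as SEMMSL, SEMMNS, SEMOPM and SEMMNI. The helper names are illustrative, and the sample text is a subset of the output.

```python
# Subset of the sysctl output, copied from the session.
SYSCTL_OUTPUT = """\
fs.file-max = 24566
fs.file-nr = 768 0 24566
kernel.shmmni = 4096
kernel.sem = 250 32000 32 128
"""

SEM_FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")


def parse_sysctl(text):
    """Map each 'key = v1 v2 ...' line to a list of integers."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and anything that is not key = value
        key, _, raw = line.partition("=")
        values[key.strip()] = [int(field) for field in raw.split()]
    return values


def file_handle_usage(values):
    """Fraction of the file handle limit currently allocated."""
    allocated, _unused, maximum = values["fs.file-nr"]
    return allocated / maximum


def semaphore_limits(values):
    """Label the four kernel.sem fields."""
    return dict(zip(SEM_FIELDS, values["kernel.sem"]))


if __name__ == "__main__":
    parsed = parse_sysctl(SYSCTL_OUTPUT)
    print("file handles in use: %.2f %%" % (file_handle_usage(parsed) * 100))
    print("semaphore limits:", semaphore_limits(parsed))
```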