Changelog
24/08/20
Replaced some links with permanent ones (Internet Archive).
Added additional links.
Added clarifications to the SME introduction.
Updated all E-Core performance numbers (QOS_CLASS_UTILITY instead of QOS_CLASS_BACKGROUND).
Added performance for SME2 multi-vector group FMLA.
Updated multithreading numbers (up to 10 threads).
Replaced copy bandwidth benchmark with separate read/write benchmarks.
Data: bandwidth_p_1_2d9566d.log, micro_p_1_2d9566d.log, micro_p_2_2d9566d.log, micro_p_3_2d9566d.log, micro_p_4_2d9566d.log, micro_p_5_2d9566d.log, micro_p_6_2d9566d.log, micro_p_7_2d9566d.log, micro_p_8_2d9566d.log, micro_p_9_2d9566d.log, micro_p_10_2d9566d.log, micro_e_1_2d9566d.log, micro_e_2_2d9566d.log, micro_e_3_2d9566d.log, micro_e_4_2d9566d.log, micro_e_5_2d9566d.log, micro_e_6_2d9566d.log, micro_e_7_2d9566d.log, micro_e_8_2d9566d.log, micro_e_9_2d9566d.log, micro_e_10_2d9566d.log.
24/06/15
Added results of copy benchmark.
Data: copy_p_1_61d3860.log.
24/05/26
Added C+=AB^T kernel with M=N=K=32.
Data: gemm_p_1_23efa23.log, gemm_p_2_23efa23.log, gemm_p_3_23efa23.log, gemm_p_4_23efa23.log.
24/05/21
Added outer-product microbenchmarks for FP64, BF16-BF16-FP32, FP16-FP16-FP32, INT8-INT8-INT32 and INT16-INT16-INT32.
Added missing multi-core results on 4, 5 and 6 e-cores.
Data: micro_p_1_7e5e61b.log, micro_e_1_7e5e61b.log, micro_e_4_499197c.log, micro_e_5_499197c.log, micro_e_6_499197c.log.
24/05/18
Added AMX peak.
Added performance results for changing SME Pstate.
Added multi-core results.
Pushed code to GitHub.
Data: micro_p_1_499197c.log, micro_p_2_499197c.log, micro_p_3_499197c.log, micro_p_4_499197c.log, micro_e_1_499197c.log, micro_e_2_499197c.log, micro_e_3_499197c.log.
24/05/17
Added performance results for efficiency cores.
Added microbenchmark for tile reuse.
Added benchmark results for Accelerate’s GEMM performance.
Data: driver, kernels, logs p-core, logs e-core.
24/05/15