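The Linux kernel supports a huge page mechanism. For an introduction to huge pages, see:
To explain what huge pages are for, we first need to look at what the TLB does.
TLB stands for Translation Lookaside Buffer. To explain the TLB, we in turn have to talk about virtual memory. Applications work with virtual addresses, whereas the CPU ultimately accesses physical memory; the CPU and the operating system together translate virtual addresses into physical addresses through the page table mechanism. Page tables are kept resident in memory, but relative to the speed of the CPU, looking them up in memory on every access is still too slow.
To speed up the translation from virtual to physical addresses, the CPU contains a dedicated TLB that caches page table entries, which makes page table lookups much faster.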
http://blog.csdn.net/zenny_chen/article/details/6137028
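As the article above shows, the TLB is of limited size (it is quite small), so its hit rate becomes an issue.
With that background, we can now state what huge pages are good for:
(1) Memory allocated through the huge page mechanism stays resident in memory and is not swapped out.
(2) Because the pages are very large (currently 2 MB by default), accesses to huge page memory get a higher TLB hit rate. The reason is simple: with the same number of TLB entries, the cached page table entries cover a much larger range of the address space.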
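The Linux kernel source documentation, Documentation/vm/hugetlbpage.txt, describes how to use huge pages. Below, the example from that document is used to compare the TLB hit rate with and without huge pages.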
test_hp.c
This program creates a file on a hugetlbfs file system and mmaps it into user space, so that the memory it uses is backed by huge pages. hugetlbfs must first be mounted at /mnt/huge.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>

#define FILE_NAME "/mnt/huge/hugepagefile"
#define LENGTH (32UL*1024*1024)
#define PROTECTION (PROT_READ | PROT_WRITE)

/* Only ia64 requires this */
#ifdef __ia64__
#define ADDR (void *)(0x8000000000000000UL)
#define FLAGS (MAP_SHARED | MAP_FIXED)
#else
#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_SHARED)
#endif

/* Print the first 4 bytes of the buffer as one hex value. */
void check_bytes(char *addr)
{
    printf("First hex is %x\n", *((unsigned int *)addr));
}

/* Fill the whole buffer 10 times, one byte at a time. */
void write_bytes(char *addr)
{
    unsigned long i;
    int j = 0;

    for (j = 0; j < 10; j++) {
        for (i = 0; i < LENGTH; i++)
            *(addr + i) = (char)i;
    }
}

/* Read the whole buffer back 10 times and verify its contents. */
void read_bytes(char *addr)
{
    unsigned long i;
    int j = 0;

    check_bytes(addr);
    for (j = 0; j < 10; j++) {
        for (i = 0; i < LENGTH; i++)
            if (*(addr + i) != (char)i) {
                printf("Mismatch at %lu\n", i);
                break;
            }
    }
}

int main(void)
{
    void *addr;
    int fd;

    /* The file lives on hugetlbfs, so the mapping is backed by huge pages. */
    fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
    if (fd < 0) {
        perror("Open failed");
        exit(1);
    }

    addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        unlink(FILE_NAME);
        exit(1);
    }

    printf("Returned address is %p\n", addr);
    check_bytes(addr);
    write_bytes(addr);
    read_bytes(addr);

    munmap(addr, LENGTH);
    close(fd);
    unlink(FILE_NAME);

    return 0;
}

test_nohp.c
This program requests its memory with an ordinary malloc, so the page size remains the default 4 KB. The perf runs below invoke it as ./test_nonhp.

#include <stdlib.h>
#include <stdio.h>

#define LENGTH (32UL*1024*1024)

/* Print the first 4 bytes of the buffer as one hex value. */
void check_bytes(char *addr)
{
    printf("First hex is %x\n", *((unsigned int *)addr));
}

/* Fill the whole buffer 10 times, one byte at a time. */
void write_bytes(char *addr)
{
    unsigned long i;
    int j = 0;

    for (j = 0; j < 10; j++) {
        for (i = 0; i < LENGTH; i++)
            *(addr + i) = (char)i;
    }
}

/* Read the whole buffer back 10 times and verify its contents. */
void read_bytes(char *addr)
{
    unsigned long i;
    int j = 0;

    check_bytes(addr);
    for (j = 0; j < 10; j++) {
        for (i = 0; i < LENGTH; i++)
            if (*(addr + i) != (char)i) {
                printf("Mismatch at %lu\n", i);
                break;
            }
    }
}

int main(void)
{
    void *addr;

    addr = malloc(LENGTH);
    if (addr == NULL) {
        perror("malloc");
        exit(1);
    }

    printf("Returned address is %p\n", addr);
    check_bytes(addr);
    write_bytes(addr);
    read_bytes(addr);

    return 0;
}
Use the perf tool to look at the TLB hit rate:
/usr/libexec/perf.2.6.32 record -e dTLB-loads -e dTLB-load-misses -e dTLB-stores -e dTLB-store-misses -e cache-references -e cache-misses -e cpu-cycles -e page-faults ./test_hp
Returned address is 0x7f9188200000
First hex is 0
First hex is 3020100
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.210 MB perf.data (~9173 samples) ]
/usr/libexec/perf.2.6.32-220.23.1.tb753.el6.x86_64 report
# Events: 891 dTLB-loads
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
58.84% test_hp test_hp [.] read_bytes
41.15% test_hp test_hp [.] write_bytes
0.02% test_hp [kernel.kallsyms] [k] __perf_sw_event
0.00% test_hp [kernel.kallsyms] [k] ctx_sched_in
0.00% test_hp [kernel.kallsyms] [k] perf_event_context_sched_in
# Events: 135 dTLB-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .............................
#
38.86% test_hp [kernel.kallsyms] [k] load_elf_binary
14.94% test_hp test_hp [.] write_bytes
14.46% test_hp test_hp [.] read_bytes
10.63% test_hp [kernel.kallsyms] [k] run_posix_cpu_timers
4.59% test_hp ld-2.12.so [.] strcmp
4.25% test_hp [kernel.kallsyms] [k] do_softirq
2.55% test_hp [kernel.kallsyms] [k] call_softirq
1.70% test_hp [kernel.kallsyms] [k] smp_apic_timer_interrupt
1.28% test_hp [kernel.kallsyms] [k] ret_from_intr
1.28% test_hp [kernel.kallsyms] [k] apic_timer_interrupt
1.28% test_hp [kernel.kallsyms] [k] perf_event_task_tick
0.71% test_hp [kernel.kallsyms] [k] perf_event_context_sched_in
0.51% test_hp [kernel.kallsyms] [k] _spin_lock
0.43% test_hp [kernel.kallsyms] [k] check_preempt_wakeup
0.43% test_hp [kernel.kallsyms] [k] raise_softirq
0.43% test_hp [kernel.kallsyms] [k] hrtimer_interrupt
0.43% test_hp [kernel.kallsyms] [k] dyntick_save_progress_counter
0.43% test_hp [kernel.kallsyms] [k] group_sched_out
0.43% test_hp [kernel.kallsyms] [k] find_vma
0.43% test_hp [kernel.kallsyms] [k] kmem_cache_free
# Events: 884 dTLB-stores
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
67.05% test_hp test_hp [.] write_bytes
32.45% test_hp test_hp [.] read_bytes
0.43% test_hp [kernel.kallsyms] [k] clear_page_c
0.06% test_hp [kernel.kallsyms] [k] lookup_page_cgroup
0.01% test_hp [kernel.kallsyms] [k] inotify_inode_queue_event
0.00% test_hp [kernel.kallsyms] [k] perf_event_context_sched_in
# Events: 9 dTLB-store-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......................
#
56.18% test_hp [kernel.kallsyms] [k] page_fault
14.61% test_hp [kernel.kallsyms] [k] memcpy
12.64% test_hp [kernel.kallsyms] [k] rcu_irq_exit
8.43% test_hp [kernel.kallsyms] [k] clear_page_c
7.87% test_hp test_hp [.] write_bytes
0.28% test_hp [kernel.kallsyms] [k] perf_event_comm_output
# Events: 809 cache-references
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
48.07% test_hp [kernel.kallsyms] [k] clear_page_c
19.77% test_hp test_hp [.] read_bytes
16.41% test_hp test_hp [.] write_bytes
2.64% test_hp [kernel.kallsyms] [k] event_sched_out
2.06% test_hp [kernel.kallsyms] [k] perf_ctx_adjust_freq
1.37% test_hp [kernel.kallsyms] [k] group_sched_out
1.20% test_hp [kernel.kallsyms] [k] do_softirq
1.07% test_hp [kernel.kallsyms] [k] _spin_lock
0.46% test_hp [kernel.kallsyms] [k] update_curr
0.43% test_hp [kernel.kallsyms] [k] raise_softirq
0.36% test_hp [kernel.kallsyms] [k] read_tsc
0.34% test_hp [kernel.kallsyms] [k] perf_event_task_tick
0.30% test_hp [kernel.kallsyms] [k] update_wall_time
0.29% test_hp [kernel.kallsyms] [k] account_user_time
0.28% test_hp [kernel.kallsyms] [k] tick_do_update_jiffies64
0.24% test_hp [kernel.kallsyms] [k] ctx_sched_out
0.24% test_hp [kernel.kallsyms] [k] rb_erase
0.23% test_hp [kernel.kallsyms] [k] ctx_sched_in
0.18% test_hp [kernel.kallsyms] [k] __percpu_counter_add
0.18% test_hp [kernel.kallsyms] [k] perf_swevent_read
0.18% test_hp [kernel.kallsyms] [k] hrtimer_interrupt
0.17% test_hp [kernel.kallsyms] [k] rcu_process_callbacks
0.17% test_hp [kernel.kallsyms] [k] tick_sched_timer
0.17% test_hp [kernel.kallsyms] [k] native_read_tsc
0.17% test_hp [kernel.kallsyms] [k] hrtimer_forward
0.15% test_hp [kernel.kallsyms] [k] sched_clock_tick
0.14% test_hp [kernel.kallsyms] [k] rcu_irq_enter
0.14% test_hp [kernel.kallsyms] [k] __run_hrtimer
0.14% test_hp [kernel.kallsyms] [k] task_tick_fair
0.13% test_hp [kernel.kallsyms] [k] __do_softirq
0.13% test_hp [kernel.kallsyms] [k] run_posix_cpu_timers
0.12% test_hp [kernel.kallsyms] [k] scheduler_tick
0.12% test_hp [kernel.kallsyms] [k] update_vsyscall
0.12% test_hp [kernel.kallsyms] [k] account_process_tick
0.11% test_hp [kernel.kallsyms] [k] profile_tick
0.11% test_hp [kernel.kallsyms] [k] perf_swevent_add
0.11% test_hp [kernel.kallsyms] [k] account_cfs_rq_runtime
0.10% test_hp [kernel.kallsyms] [k] update_cfs_shares
0.07% test_hp [kernel.kallsyms] [k] event_sched_in
0.07% test_hp [kernel.kallsyms] [k] irq_complete_move
0.07% test_hp [kernel.kallsyms] [k] task_rq_lock
0.07% test_hp [kernel.kallsyms] [k] check_for_new_grace_period
0.07% test_hp [kernel.kallsyms] [k] rcu_irq_exit
0.06% test_hp [kernel.kallsyms] [k] rcu_check_callbacks
0.06% test_hp [kernel.kallsyms] [k] rb_next
0.06% test_hp [kernel.kallsyms] [k] irq_exit
0.06% test_hp [kernel.kallsyms] [k] clockevents_program_event
0.06% test_hp [kernel.kallsyms] [k] update_context_time
0.06% test_hp [kernel.kallsyms] [k] task_of
0.06% test_hp [kernel.kallsyms] [k] native_apic_mem_write
0.06% test_hp [kernel.kallsyms] [k] enqueue_task_fair
0.06% test_hp [kernel.kallsyms] [k] save_args
0.06% test_hp [kernel.kallsyms] [k] lapic_next_event
0.06% test_hp [kernel.kallsyms] [k] _spin_lock_irq
0.06% test_hp [kernel.kallsyms] [k] jiffies_to_timeval
0.06% test_hp [kernel.kallsyms] [k] enqueue_entity
0.05% test_hp [kernel.kallsyms] [k] apic_timer_interrupt
0.05% test_hp [kernel.kallsyms] [k] hrtimer_run_pending
0.05% test_hp [kernel.kallsyms] [k] perf_event_context_sched_in
0.05% test_hp [kernel.kallsyms] [k] check_preempt_wakeup
0.05% test_hp [kernel.kallsyms] [k] schedule
# Events: 797 cache-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...............
#
48.27% test_hp [kernel.kallsyms] [k] clear_page_c
31.21% test_hp test_hp [.] read_bytes
20.48% test_hp test_hp [.] write_bytes
0.05% test_hp [kernel.kallsyms] [k] _spin_lock
0.00% test_hp [kernel.kallsyms] [k] alloc_huge_page
# Events: 879 cycles
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
51.41% test_hp test_hp [.] read_bytes
47.14% test_hp test_hp [.] write_bytes
1.44% test_hp [kernel.kallsyms] [k] clear_page_c
0.00% test_hp [kernel.kallsyms] [k] native_write_msr_safe
# Events: 6 page-faults
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ........................
#
86.63% test_hp ld-2.12.so [.] _dl_setup_hash
10.94% test_hp ld-2.12.so [.] _dl_sysdep_start
1.52% test_hp ld-2.12.so [.] _start
0.61% test_hp [kernel.kallsyms] [k] __clear_user
0.30% test_hp [kernel.kallsyms] [k] copy_user_generic_string
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
/usr/libexec/perf.2.6.32 record -e dTLB-loads -e dTLB-load-misses -e dTLB-stores -e dTLB-store-misses -e cache-references -e cache-misses -e cpu-cycles -e page-faults ./test_nonhp
Returned address is 0x7f7b65dda010
First hex is 0
First hex is 3020100
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.254 MB perf.data (~11115 samples) ]
# Events: 892 dTLB-loads
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ............................
#
58.45% test_nonhp test_nonhp [.] read_bytes
41.39% test_nonhp test_nonhp [.] write_bytes
0.11% test_nonhp [kernel.kallsyms] [k] __mem_cgroup_uncharge_common
0.04% test_nonhp [kernel.kallsyms] [k] ____pagevec_lru_add
0.01% test_nonhp [kernel.kallsyms] [k] copy_page_c
0.00% test_nonhp [kernel.kallsyms] [k] perf_event_comm
0.00% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
# Events: 566 dTLB-load-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
81.82% test_nonhp test_nonhp [.] read_bytes
7.67% test_nonhp [kernel.kallsyms] [k] __alloc_pages_nodemask
2.07% test_nonhp [kernel.kallsyms] [k] do_perf_sw_event
1.95% test_nonhp test_nonhp [.] write_bytes
1.65% test_nonhp [kernel.kallsyms] [k] acl_permission_check
0.72% test_nonhp [kernel.kallsyms] [k] __percpu_counter_add
0.51% test_nonhp [kernel.kallsyms] [k] native_apic_mem_write
0.51% test_nonhp [kernel.kallsyms] [k] clear_page_c
0.47% test_nonhp [kernel.kallsyms] [k] update_curr
0.37% test_nonhp [kernel.kallsyms] [k] run_timer_softirq
0.25% test_nonhp [kernel.kallsyms] [k] down_read_trylock
0.23% test_nonhp [kernel.kallsyms] [k] call_softirq
0.21% test_nonhp [kernel.kallsyms] [k] leave_mm
0.21% test_nonhp [kernel.kallsyms] [k] rcu_irq_exit
0.20% test_nonhp [kernel.kallsyms] [k] __do_softirq
0.20% test_nonhp [kernel.kallsyms] [k] free_pcppages_bulk
0.18% test_nonhp [kernel.kallsyms] [k] raise_softirq
0.11% test_nonhp [kernel.kallsyms] [k] ret_from_intr
0.11% test_nonhp [kernel.kallsyms] [k] delayed_work_timer_fn
0.10% test_nonhp [kernel.kallsyms] [k] do_softirq
0.08% test_nonhp [kernel.kallsyms] [k] scheduler_tick
0.07% test_nonhp [kernel.kallsyms] [k] account_user_time
0.06% test_nonhp [kernel.kallsyms] [k] perf_event_do_pending
0.05% test_nonhp [kernel.kallsyms] [k] __perf_event_task_sched_in
0.03% test_nonhp [kernel.kallsyms] [k] perf_event_context_sched_in
0.03% test_nonhp [kernel.kallsyms] [k] __run_hrtimer
0.03% test_nonhp [kernel.kallsyms] [k] apic_timer_interrupt
0.03% test_nonhp [kernel.kallsyms] [k] task_of
0.03% test_nonhp [kernel.kallsyms] [k] run_posix_cpu_timers
0.03% test_nonhp [kernel.kallsyms] [k] _spin_lock_irq
0.03% test_nonhp [kernel.kallsyms] [k] tick_do_update_jiffies64
0.00% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
# Events: 887 dTLB-stores
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ..........................
#
66.97% test_nonhp test_nonhp [.] write_bytes
31.69% test_nonhp test_nonhp [.] read_bytes
0.57% test_nonhp [kernel.kallsyms] [k] clear_page_c
0.20% test_nonhp [kernel.kallsyms] [k] get_page_from_freelist
0.14% test_nonhp [kernel.kallsyms] [k] alloc_pages_vma
0.14% test_nonhp [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.08% test_nonhp [kernel.kallsyms] [k] __dec_zone_page_state
0.08% test_nonhp [kernel.kallsyms] [k] bit_spin_lock
0.05% test_nonhp [kernel.kallsyms] [k] lookup_page_cgroup
0.05% test_nonhp [kernel.kallsyms] [k] __do_page_fault
0.04% test_nonhp [kernel.kallsyms] [k] __memset
0.01% test_nonhp [kernel.kallsyms] [k] perf_event_comm_output
0.00% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
# Events: 464 dTLB-store-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......................
#
58.50% test_nonhp test_nonhp [.] write_bytes
23.41% test_nonhp [kernel.kallsyms] [k] page_fault
16.92% test_nonhp [kernel.kallsyms] [k] clear_page_c
0.55% test_nonhp [kernel.kallsyms] [k] __do_page_fault
0.32% test_nonhp [kernel.kallsyms] [k] __percpu_counter_add
0.18% test_nonhp [kernel.kallsyms] [k] rcu_irq_exit
0.10% test_nonhp [kernel.kallsyms] [k] perf_output_begin
0.02% test_nonhp [kernel.kallsyms] [k] run_timer_softirq
0.01% test_nonhp [kernel.kallsyms] [k] flush_signal_handlers
0.00% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
0.00% test_nonhp [kernel.kallsyms] [k] perf_event_comm_output
# Events: 815 cache-references
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
29.65% test_nonhp [kernel.kallsyms] [k] clear_page_c
25.17% test_nonhp test_nonhp [.] read_bytes
15.12% test_nonhp test_nonhp [.] write_bytes
4.43% test_nonhp [kernel.kallsyms] [k] __alloc_pages_nodemask
4.11% test_nonhp [kernel.kallsyms] [k] perf_ctx_adjust_freq
3.60% test_nonhp [kernel.kallsyms] [k] event_sched_out
1.41% test_nonhp [kernel.kallsyms] [k] _spin_lock
1.23% test_nonhp [kernel.kallsyms] [k] get_page_from_freelist
1.22% test_nonhp [kernel.kallsyms] [k] group_sched_out
1.00% test_nonhp [kernel.kallsyms] [k] __rmqueue
0.88% test_nonhp [kernel.kallsyms] [k] rb_erase
0.81% test_nonhp [kernel.kallsyms] [k] account_user_time
0.76% test_nonhp [kernel.kallsyms] [k] ctx_sched_in
0.73% test_nonhp [kernel.kallsyms] [k] update_curr
0.67% test_nonhp [kernel.kallsyms] [k] __percpu_counter_add
0.51% test_nonhp [kernel.kallsyms] [k] list_del
0.50% test_nonhp [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.41% test_nonhp [kernel.kallsyms] [k] raise_softirq
0.35% test_nonhp [kernel.kallsyms] [k] rcu_process_callbacks
0.33% test_nonhp [kernel.kallsyms] [k] native_apic_mem_write
0.32% test_nonhp [kernel.kallsyms] [k] perf_swevent_read
0.31% test_nonhp [kernel.kallsyms] [k] ctx_sched_out
0.31% test_nonhp [kernel.kallsyms] [k] tick_do_update_jiffies64
0.30% test_nonhp [kernel.kallsyms] [k] update_context_time
0.27% test_nonhp [kernel.kallsyms] [k] tick_sched_timer
0.27% test_nonhp [kernel.kallsyms] [k] read_tsc
0.26% test_nonhp [kernel.kallsyms] [k] rb_next
0.23% test_nonhp [kernel.kallsyms] [k] smp_apic_timer_interrupt
0.22% test_nonhp [kernel.kallsyms] [k] rcu_check_callbacks
0.22% test_nonhp [kernel.kallsyms] [k] apic_timer_interrupt
0.22% test_nonhp [kernel.kallsyms] [k] rcu_process_gp_end
0.16% test_nonhp [kernel.kallsyms] [k] lapic_next_event
0.16% test_nonhp [kernel.kallsyms] [k] __perf_event_task_sched_out
0.16% test_nonhp [kernel.kallsyms] [k] scheduler_tick
0.15% test_nonhp [kernel.kallsyms] [k] perf_event_task_tick
0.15% test_nonhp [kernel.kallsyms] [k] native_read_tsc
0.15% test_nonhp [kernel.kallsyms] [k] perf_pmu_nop_void
0.15% test_nonhp [kernel.kallsyms] [k] run_posix_cpu_timers
0.13% test_nonhp [kernel.kallsyms] [k] run_timer_softirq
0.13% test_nonhp [kernel.kallsyms] [k] select_task_rq_fair
0.12% test_nonhp [kernel.kallsyms] [k] __do_softirq
0.10% test_nonhp [kernel.kallsyms] [k] local_clock
0.09% test_nonhp [kernel.kallsyms] [k] exit_idle
0.09% test_nonhp [kernel.kallsyms] [k] account_process_tick
0.09% test_nonhp [kernel.kallsyms] [k] rb_insert_color
0.09% test_nonhp [kernel.kallsyms] [k] restore_args
0.09% test_nonhp [kernel.kallsyms] [k] clockevents_program_event
0.09% test_nonhp [kernel.kallsyms] [k] update_cpu_load
0.09% test_nonhp [kernel.kallsyms] [k] hrtimer_interrupt
0.09% test_nonhp [kernel.kallsyms] [k] profile_tick
0.08% test_nonhp [kernel.kallsyms] [k] select_idle_sibling
0.08% test_nonhp [kernel.kallsyms] [k] task_tick_fair
0.08% test_nonhp [kernel.kallsyms] [k] find_next_bit
0.08% test_nonhp [kernel.kallsyms] [k] ret_from_intr
0.08% test_nonhp [kernel.kallsyms] [k] update_wall_time
0.08% test_nonhp [kernel.kallsyms] [k] __run_hrtimer
0.08% test_nonhp [kernel.kallsyms] [k] jiffies_to_timeval
0.08% test_nonhp [kernel.kallsyms] [k] perf_ctx_lock
0.08% test_nonhp [kernel.kallsyms] [k] force_quiescent_state
0.08% test_nonhp [kernel.kallsyms] [k] acct_update_integrals
0.07% test_nonhp [kernel.kallsyms] [k] ktime_get
0.07% test_nonhp [kernel.kallsyms] [k] perf_event_task_sched_out
0.07% test_nonhp [kernel.kallsyms] [k] _spin_lock_irqsave
0.07% test_nonhp [kernel.kallsyms] [k] rcu_irq_enter
0.07% test_nonhp [kernel.kallsyms] [k] task_ctx_sched_out
0.07% test_nonhp [kernel.kallsyms] [k] __rcu_pending
0.07% test_nonhp [kernel.kallsyms] [k] rcu_irq_exit
0.07% test_nonhp [kernel.kallsyms] [k] do_softirq
0.07% test_nonhp [kernel.kallsyms] [k] enqueue_hrtimer
0.06% test_nonhp [kernel.kallsyms] [k] retint_swapgs
0.06% test_nonhp [kernel.kallsyms] [k] fget_light
0.06% test_nonhp [kernel.kallsyms] [k] _spin_lock_irq
0.06% test_nonhp [kernel.kallsyms] [k] _local_bh_enable
0.06% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
0.06% test_nonhp [kernel.kallsyms] [k] __rcu_process_callbacks
0.06% test_nonhp [kernel.kallsyms] [k] perf_pmu_enable
0.06% test_nonhp [kernel.kallsyms] [k] enqueue_entity
# Events: 840 cache-misses
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......................
#
42.00% test_nonhp test_nonhp [.] read_bytes
29.40% test_nonhp [kernel.kallsyms] [k] clear_page_c
22.09% test_nonhp test_nonhp [.] write_bytes
6.45% test_nonhp [kernel.kallsyms] [k] __alloc_pages_nodemask
0.05% test_nonhp [kernel.kallsyms] [k] run_rebalance_domains
0.00% test_nonhp [kernel.kallsyms] [k] delayed_work_timer_fn
0.00% test_nonhp [kernel.kallsyms] [k] ctx_sched_in
# Events: 873 cycles
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................................
#
51.62% test_nonhp test_nonhp [.] read_bytes
48.10% test_nonhp test_nonhp [.] write_bytes
0.11% test_nonhp [kernel.kallsyms] [k] free_hot_cold_page
0.11% test_nonhp [kernel.kallsyms] [k] leave_mm
0.05% test_nonhp [kernel.kallsyms] [k] mem_cgroup_get_reclaim_stat_from_page
0.00% test_nonhp [kernel.kallsyms] [k] native_write_msr_safe
# Events: 44 page-faults
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ........................
#
97.79% test_nonhp test_nonhp [.] write_bytes
1.87% test_nonhp ld-2.12.so [.] _dl_map_object
0.26% test_nonhp ld-2.12.so [.] _dl_setup_hash
0.05% test_nonhp ld-2.12.so [.] _start
0.02% test_nonhp [kernel.kallsyms] [k] __clear_user
0.01% test_nonhp [kernel.kallsyms] [k] copy_user_generic_string
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
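Quite a few metrics were collected here. Let us compare the dTLB load miss ratio, that is dTLB-load-misses divided by dTLB-loads, using the "Events" counts reported above:
hp: 135/891 = 15%
nonhp: 566/892 = 63%
The huge page version misses the TLB far less often, which is to say its TLB hit rate is much higher.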