Performance Limit reasons #333
Replies: 22 comments 120 replies
-
For sure, that is a great feature suggestion, which I will implement. Clearing an event bit, aka resetting its register, may depend on the architecture. I remember freezing a processor because writing the status log MSR was unsupported, despite the general case in the specs. Definitely an enhancement to do.
-
MSR_CORE_PERF_LIMIT_REASONS (
-
A quick bash script to check core performance limit reasons. I also tried resetting the log; it didn't hang my PC.
#!/bin/bash
# MSR_CORE_PERF_LIMIT_REASONS (0x64f)
# https://www.intel.com/content/dam/develop/public/us/en/documents/335592-sdm-vol-4.pdf
# Table 2-39
printf "Cur: $( rdmsr -0 -f 0:0 0x64f ) Log: $( rdmsr -0 -f 16:16 0x64f ) \t PROCHOT \n"
printf "Cur: $( rdmsr -0 -f 1:1 0x64f ) Log: $( rdmsr -0 -f 17:17 0x64f ) \t Thermal Status \n"
printf "Cur: $( rdmsr -0 -f 2:2 0x64f ) Log: $( rdmsr -0 -f 18:18 0x64f ) \t [Reserved] \n"
printf "Cur: $( rdmsr -0 -f 3:3 0x64f ) Log: $( rdmsr -0 -f 19:19 0x64f ) \t [Reserved] \n"
printf "Cur: $( rdmsr -0 -f 4:4 0x64f ) Log: $( rdmsr -0 -f 20:20 0x64f ) \t Residency State Regulation Status \n"
printf "Cur: $( rdmsr -0 -f 5:5 0x64f ) Log: $( rdmsr -0 -f 21:21 0x64f ) \t Running Average Thermal Limit Status \n"
printf "Cur: $( rdmsr -0 -f 6:6 0x64f ) Log: $( rdmsr -0 -f 22:22 0x64f ) \t VR Therm Alert Status \n"
printf "Cur: $( rdmsr -0 -f 7:7 0x64f ) Log: $( rdmsr -0 -f 23:23 0x64f ) \t VR Therm Design Current Status \n"
printf "Cur: $( rdmsr -0 -f 8:8 0x64f ) Log: $( rdmsr -0 -f 24:24 0x64f ) \t Electrical Design Current Limit / Other Status \n"
printf "Cur: $( rdmsr -0 -f 9:9 0x64f ) Log: $( rdmsr -0 -f 25:25 0x64f ) \t [Reserved] \n"
printf "Cur: $( rdmsr -0 -f 10:10 0x64f ) Log: $( rdmsr -0 -f 26:26 0x64f ) \t Package/Platform-Level Power Limiting PL1 Status \n"
printf "Cur: $( rdmsr -0 -f 11:11 0x64f ) Log: $( rdmsr -0 -f 27:27 0x64f ) \t Package/Platform-Level Power Limiting PL2 Status \n"
printf "Cur: $( rdmsr -0 -f 12:12 0x64f ) Log: $( rdmsr -0 -f 28:28 0x64f ) \t Max Turbo Limit Status \n"
printf "Cur: $( rdmsr -0 -f 13:13 0x64f ) Log: $( rdmsr -0 -f 29:29 0x64f ) \t Turbo Transition Attenuation Status \n"
printf "Cur: $( rdmsr -0 -f 14:14 0x64f ) Log: $( rdmsr -0 -f 30:30 0x64f ) \t [Reserved] \n"
printf "Cur: $( rdmsr -0 -f 15:15 0x64f ) Log: $( rdmsr -0 -f 31:31 0x64f ) \t [Reserved] \n"
printf "Cur: $( rdmsr -0 -f 63:32 0x64f ) \t [Reserved] \n"
# To reset Log:
# wrmsr 0x64f 0x00000000
Output while running a stress test:
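The script above calls rdmsr once per bit. A minimal sketch of an alternative, assuming the raw register value is read once and the bits are then extracted with shell arithmetic. The function name `decode_limit_reasons` is hypothetical, and the bit names are copied from the script above (reserved bits are skipped):

```shell
#!/bin/bash
# Sketch only: decode one raw MSR_CORE_PERF_LIMIT_REASONS value.
# decode_limit_reasons is a hypothetical name, not part of any tool.
decode_limit_reasons() {
    local value=$(( $1 ))   # decimal or 0x-prefixed hexadecimal
    # Status bit n is paired with log bit n + 16, as in the script above.
    local -a names=(
        [0]="PROCHOT"
        [1]="Thermal Status"
        [4]="Residency State Regulation Status"
        [5]="Running Average Thermal Limit Status"
        [6]="VR Therm Alert Status"
        [7]="VR Therm Design Current Status"
        [8]="Electrical Design Current Limit / Other Status"
        [10]="Package/Platform-Level Power Limiting PL1 Status"
        [11]="Package/Platform-Level Power Limiting PL2 Status"
        [12]="Max Turbo Limit Status"
        [13]="Turbo Transition Attenuation Status"
    )
    local bit
    for bit in "${!names[@]}"; do
        printf 'Cur: %d Log: %d \t %s\n' \
            $(( (value >> bit) & 1 )) \
            $(( (value >> (bit + 16)) & 1 )) \
            "${names[bit]}"
    done
}
```

On a machine with msr-tools it could be called as `decode_limit_reasons "0x$(rdmsr 0x64f)"`, since rdmsr prints bare hexadecimal by default. Reading the register once also means all bits come from the same sample.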
-
For your testing, commit 324bc29 is available in
-
Very odd, without any change, except a fresh boot,
EDIT: No errata found on the subject.
-
In the past I put a condition to check for the presence of a thermal BIOS interrupt activation. Once Commit b3ff656 is taking the risk to change that and now double checks the If Interrupts not capable then clearing bits in
Non-regression tests are required on all Intel processors: I need everyone's reports!
-
A better mock-up of the real-time textual representation of CPU limit reasons that I had in mind and showed from another program in the opening post: 2022-03-31.16-58-41_limit-reasons-mockup.mp4
What happens in the video:
Yellow = Logged limit reason
-
@BugReporterZ: Preview c4421b3 {General, Core, GFX, Ring}
-
A dedicated Events window function is available for testing. The four columns (General aka IA, Core, GFX, RING) don't fit the terminal width of 80 characters. Any layout suggestion?
-
@BugReporterZ
-
Added a
-
Hello @BugReporterZ, this is the ongoing development: yellow for the event log, magenta for the event status. All events have been raised by simulation in the code. Please let me know how it works with your CPU.
-
@BugReporterZ: In commit 7cadc81, temperature thresholds are now decoded based on TjMax, so you should now read the Celsius values in the right direction. But don't go beyond your TjMax. Here I am stressing two cores to reach a temperature between the two indicators and to trigger only the first threshold alarm.
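To illustrate what "decoded based on TjMax" could mean in practice, here is a minimal sketch. It assumes the raw threshold is stored as a number of degrees below TjMax, which is my reading of the comment and not something confirmed by the commit itself. Both function names are hypothetical:

```shell
#!/bin/bash
# Sketch only; threshold_celsius and bitfield are hypothetical helpers.

# Convert a raw threshold into degrees Celsius.
# Assumption: the raw value counts degrees below TjMax.
threshold_celsius() {
    local tjmax=$1 offset=$2
    echo $(( tjmax - offset ))
}

# Extract bits hi:lo from a raw register value
# (decimal or 0x-prefixed hexadecimal).
bitfield() {
    local value=$(( $1 )) hi=$2 lo=$3
    echo $(( (value >> lo) & ((1 << (hi - lo + 1)) - 1) ))
}
```

For example, with a TjMax of 100 and a raw threshold of 30, `threshold_celsius 100 30` gives 70. The `bitfield` helper does the same job as the `-f hi:lo` option of rdmsr used in the script earlier in this thread, for cases where the register value has already been read.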
-
Commit cb803e1 adds an array parameter to the driver.
Examples
-
It isn't documented in the datasheet, but Bit 14 (log bit 30) appears to be for Thermal Velocity Boost/TVB (a feature that decreases or increases core speed by a fixed bin depending on temperature). In the test below I set it to 70 °C. 2022-04-18_TVB-bit.mp4
In the MSR datasheet it just says "reserved". I only found this because HWInfo64 (a popular CPU/system monitoring program for Windows) also lists TVB among CPU core limit reasons.
-
@BugReporterZ: The TVB event is ready for your testing in latest
EDIT: please wait, I forgot the log label
-
@BugReporterZ Hello,
-
Seems to work fine. 2022-05-04_compact-limit-reasons.mp4
The only remaining issue is that DTS events cannot be cleared. I made a test, and it works if I write 0 to the corresponding MSR for both core and package, i.e.:
-
IINM you're referring to a ratio/multiplier whose frequency depends on the base/bus clock.
-
An important aspect of monitoring CPU performance is knowing what is currently limiting or throttling it. There is more to it than just thermal throttling.
On modern Intel CPUs, counters exist to monitor performance limit reasons for IA (Intel Architecture = cores), Ring (uncore) and GT (Graphics Technology = integrated graphics).
I'm aware that the following MSRs give this information:
MSR_GRAPHICS_PERF_LIMIT_REASONS
MSR_CORE_PERF_LIMIT_REASONS
MSR_RING_PERF_LIMIT_REASONS
(source: https://www.intel.com/content/dam/develop/external/us/en/documents/335592-sdm-vol-4.pdf . I think some of the bits that were "reserved" in this document are now used for meaningful information, for example a voltage limit.)
A new View could be added in CoreFreq for this, taking inspiration from the Windows application Throttlestop.
In ASCII it could be something like this (with all limits engaged, although unlikely in practice):
Real world situations may only show a few limits in one or two columns like this:
Throttlestop color-codes currently engaged limits in RED and limits that have engaged since the last counter reset in YELLOW. Some sort of short-term logging could be done in many ways, though.
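The red/yellow scheme described above could be sketched as follows. This is only an illustration, not how Throttlestop or CoreFreq implement it. It assumes that each status bit n has its log bit at n + 16, as in the MSR_CORE_PERF_LIMIT_REASONS script elsewhere in this thread, and that the GT and Ring registers share that layout. The function names are hypothetical:

```shell
#!/bin/bash
# Sketch only; limit_state and colorize are hypothetical helpers.

# Classify one limit reason from a raw *_PERF_LIMIT_REASONS value.
# Assumption: status bit n is paired with log bit n + 16.
limit_state() {
    local value=$(( $1 )) bit=$2
    if (( (value >> bit) & 1 )); then
        echo "ACTIVE"      # limit is engaged right now
    elif (( (value >> (bit + 16)) & 1 )); then
        echo "LOGGED"      # limit engaged since the last reset
    else
        echo "-"
    fi
}

# Wrap a state in an ANSI color: red for ACTIVE, yellow for LOGGED.
colorize() {
    case "$1" in
        ACTIVE) printf '\033[31m%s\033[0m' "$1" ;;
        LOGGED) printf '\033[33m%s\033[0m' "$1" ;;
        *)      printf '%s' "$1" ;;
    esac
}
```

With msr-tools installed, the PL1 state of the core register could be shown with `colorize "$(limit_state "0x$(rdmsr 0x64f)" 10)"`. The addresses of the GT and Ring registers are not given in this thread and would have to be taken from the SDM.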