H11SSL-i fan problem on proxmox #37

Open
petersulyok opened this issue May 31, 2024 · 8 comments
Labels
question Further information is requested

Comments

@petersulyok
Owner

@Xyz00777 reported the following problem in the SMFC hardware compatibility issue (#19):

I'm trying to get it working on my H11SSL-i (ASPEED AST2500) with a Proxmox install.
Because I'm not sure which fans are connected to which PWM header, I set the lower threshold to 500 and the upper threshold to 2000 for every fan in the config:

# This script must be executed by root.
if [ "$EUID" -ne 0 ]
then
    echo "ERROR: Please run as root"
    exit 1
fi

# Setup of the lower threshold limits of the fans (Noctua NF-F12 PWM rotation speed 300-1500 rpm).
# Edit the list of fans here (FAN1, FAN2, FAN4, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the lower threshold values here (0, 100, 200)!
    ipmitool sensor thresh "FAN${i}" lower 500 500 500
done

# Setup of the upper threshold limits of the fans (Noctua NF-F12 PWM rotation speed 300-1500 rpm).
# Edit the list of fans here (FAN1, FAN2, FAN4, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the upper threshold values here (1600, 1700, 1800)!
    ipmitool sensor thresh "FAN${i}" upper 2000 2000 2000
done

I have an Iceberg Thermal IceGALE Xtra (500-2500 rpm) and a Noctua NH-U9 TR4-SP3 (400-2000 rpm).

After I loaded the modules and ran install.sh, I started the service. It crashed with the fans at 100% speed, and journalctl showed this log:

May 31 03:07:18 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 03:07:18 ds9 smfc.service[11931]: Logging module was initialized with:
May 31 03:07:18 ds9 smfc.service[11931]:    log_level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    log_output = 2
May 31 03:07:18 ds9 smfc.service[11931]: Command line arguments:
May 31 03:07:18 ds9 smfc.service[11931]:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed config file = /opt/smfc/smfc.conf
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log output = 2
May 31 03:07:19 ds9 smfc.service[11931]: Ipmi module was initialized with:
May 31 03:07:19 ds9 smfc.service[11931]:    command = /usr/bin/ipmitool
May 31 03:07:19 ds9 smfc.service[11931]:    fan_mode_delay = 10
May 31 03:07:19 ds9 smfc.service[11931]:    fan_level_delay = 2
May 31 03:07:19 ds9 smfc.service[11931]:    swapped_zones = False
May 31 03:07:29 ds9 smfc.py[11931]: Traceback (most recent call last):
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1150, in <module>
May 31 03:07:29 ds9 smfc.py[11931]:     service.run()
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1119, in run
May 31 03:07:29 ds9 smfc.py[11931]:     self.cpu_zone = CpuZone(self.log, self.ipmi, self.config)
May 31 03:07:29 ds9 smfc.py[11931]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 600, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     super().__init__(
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 395, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     self.build_hwmon_path(hwmon_path)
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 632, in build_hwmon_path
May 31 03:07:29 ds9 smfc.py[11931]:     raise ValueError(self.ERROR_MSG_FILE_IO.format(path))
May 31 03:07:29 ds9 smfc.py[11931]: ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).
May 31 03:07:33 ds9 smfc.service[11931]: smfc terminated: all fans are switched back to the 100% speed.
May 31 03:07:33 ds9 systemd[1]: smfc.service: Main process exited, code=exited, status=1/FAILURE
May 31 03:07:33 ds9 systemd[1]: smfc.service: Failed with result 'exit-code'.

Please help, I don't want my fans to spin up every ~10 seconds for 5 seconds :(

@petersulyok
Owner Author

Hi @Xyz00777,

Your problem is that the CPU temperature cannot be read from HWMON, as the log states:

ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).

According to the official Supermicro page, your board has an AMD CPU, so the Intel coretemp path from the log does not exist on your system and you have to set the proper file manually in the smfc config. You can find more information here; it will be something like this:

hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
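
Before restarting the service, it can help to check what the pattern above actually resolves to. The following is a minimal sketch, not part of smfc: the function name `read_cpu_temp` is hypothetical, and it assumes the usual hwmon convention that `temp1_input` holds the temperature in millidegrees Celsius.

```python
import glob


def read_cpu_temp(pattern):
    """Expand a hwmon glob pattern and return (path, temperature in Celsius).

    Hypothetical helper for checking a hwmon_path value by hand.
    Assumes the file contains an integer in millidegrees Celsius.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No file matches {pattern}")
    path = matches[0]
    with open(path, "r", encoding="utf-8") as f:
        millidegrees = int(f.read().strip())
    return path, millidegrees / 1000.0


if __name__ == "__main__":
    # Pattern suggested in this thread for AMD CPUs (k10temp driver).
    pattern = "/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input"
    try:
        path, temp = read_cpu_temp(pattern)
        print(f"{path}: {temp:.1f} C")
    except OSError as err:
        print(f"Cannot read CPU temperature: {err}")
```

If the pattern matches nothing, the k10temp kernel module is probably not loaded, or the PCI address on your system differs from the pattern.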

@petersulyok petersulyok added the question Further information is requested label May 31, 2024
@Xyz00777

The path is the same on my system. I uncommented it in /opt/smfc/smfc.conf and was able to start the service :)

Thank you very much!
Can I provide any more information for further development of this really awesome software, or can we close the issue?

May 31 14:52:26 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 14:52:26 ds9 smfc.service[6241]: Logging module was initialized with:
May 31 14:52:26 ds9 smfc.service[6241]:    log_level = 3
May 31 14:52:26 ds9 smfc.service[6241]:    log_output = 2
May 31 14:52:26 ds9 smfc.service[6241]: Command line arguments:
May 31 14:52:26 ds9 smfc.service[6241]:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 14:52:26 ds9 smfc.service[6241]:    parsed config file = /opt/smfc/smfc.conf
May 31 14:52:26 ds9 smfc.service[6241]:    parsed log level = 3
May 31 14:52:26 ds9 smfc.service[6241]:    parsed log output = 2
May 31 14:52:27 ds9 smfc.service[6241]: Ipmi module was initialized with:
May 31 14:52:27 ds9 smfc.service[6241]:    command = /usr/bin/ipmitool
May 31 14:52:27 ds9 smfc.service[6241]:    fan_mode_delay = 10
May 31 14:52:27 ds9 smfc.service[6241]:    fan_level_delay = 2
May 31 14:52:27 ds9 smfc.service[6241]:    swapped_zones = False
May 31 14:52:37 ds9 smfc.service[6241]: CPU zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]:    ipmi zone = 0
May 31 14:52:37 ds9 smfc.service[6241]:    count = 1
May 31 14:52:37 ds9 smfc.service[6241]:    temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]:    steps = 6
May 31 14:52:37 ds9 smfc.service[6241]:    sensitivity = 3.0
May 31 14:52:37 ds9 smfc.service[6241]:    polling = 2.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_temp = 30.0
May 31 14:52:37 ds9 smfc.service[6241]:    max_temp = 60.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]:    max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]:    hwmon_path = ['/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon6/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]:    Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]:    0. [T:30.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]:    1. [T:35.0C - L:45%]
May 31 14:52:37 ds9 smfc.service[6241]:    2. [T:40.0C - L:56%]
May 31 14:52:37 ds9 smfc.service[6241]:    3. [T:45.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]:    4. [T:50.0C - L:78%]
May 31 14:52:37 ds9 smfc.service[6241]:    5. [T:55.0C - L:89%]
May 31 14:52:37 ds9 smfc.service[6241]:    6. [T:60.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]: HD zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]:    ipmi zone = 1
May 31 14:52:37 ds9 smfc.service[6241]:    count = 1
May 31 14:52:37 ds9 smfc.service[6241]:    temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]:    steps = 4
May 31 14:52:37 ds9 smfc.service[6241]:    sensitivity = 2.0
May 31 14:52:37 ds9 smfc.service[6241]:    polling = 10.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_temp = 32.0
May 31 14:52:37 ds9 smfc.service[6241]:    max_temp = 46.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]:    max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]:    hwmon_path = ['/sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon0/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]:    Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]:    0. [T:32.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]:    1. [T:35.5C - L:51%]
May 31 14:52:37 ds9 smfc.service[6241]:    2. [T:39.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]:    3. [T:42.5C - L:83%]
May 31 14:52:37 ds9 smfc.service[6241]:    4. [T:46.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]:    WARNING: Standby guard is disabled ([HD zone] count=1
May 31 14:52:37 ds9 smfc.service[6241]:    hd_names = ['/dev/disk/by-id/ata-Patriot_P210_512GB_P210IBCB23102410314']
May 31 14:52:37 ds9 smfc.service[6241]:    Standby guard is disabled
May 31 14:52:37 ds9 smfc.service[6241]:    hddtemp_path = /usr/sbin/hddtemp
May 31 14:52:39 ds9 smfc.service[6241]: CPU zone: new level > 32.4C > [T:30.0C/L:35%]
May 31 14:52:41 ds9 smfc.service[6241]: HD zone: new level > 30.0C > [T:32.0C/L:35%]
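
The two "Temperature to level mapping" tables in the log above are consistent with a plain linear interpolation between (min_temp, min_level) and (max_temp, max_level) in `steps` equal steps, with the fan level truncated to an integer. The sketch below is inferred from the logged numbers only, not taken from the smfc source, and `build_mapping` is a hypothetical name.

```python
def build_mapping(min_temp, max_temp, min_level, max_level, steps):
    """Return a list of (temperature, fan level %) pairs, one per step.

    Assumption: linear interpolation with the level truncated to int,
    which reproduces the tables printed in the log of this thread.
    """
    mapping = []
    for i in range(steps + 1):
        temp = min_temp + i * (max_temp - min_temp) / steps
        level = int(min_level + i * (max_level - min_level) / steps)
        mapping.append((temp, level))
    return mapping


if __name__ == "__main__":
    # CPU zone values from the log: 30-60 C, 35-100 %, 6 steps.
    for i, (temp, level) in enumerate(build_mapping(30.0, 60.0, 35, 100, 6)):
        print(f"{i}. [T:{temp:.1f}C - L:{level}%]")
```

Changing `min_temp`, `max_temp`, `min_level`, `max_level`, or `steps` in the config shifts the whole table, which is why the defaults rarely fit a particular system.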
@petersulyok
Owner Author

Maybe a hint: if you have only one SSD installed, you may disable the HD Zone and connect all fans to CPU Zone.
Or do you have more hard disks?

@Xyz00777

I have 8 HDDs and 2 SSDs :D
But I found one thing after restarting my server twice: every time it restarts, at about the moment smfc starts, the fans ramp up completely even though smfc is running smoothly. I have to restart the smfc service once to let the fans slow down again... 🤔

@Xyz00777

Correction: it looks like it takes around three and a half minutes after system start for the fans to slow down again:

May 31 15:41:52 ds9 smfc.service[2585]: CPU zone: new level > 37.6C > [T:40.0C/L:50%]
May 31 15:41:54 ds9 smfc.service[2585]: HD zone: new level > 35.0C > [T:32.0C/L:25%]
May 31 15:45:23 ds9 smfc.service[2585]: CPU zone: new level > 34.6C > [T:35.0C/L:37%]

@petersulyok
Owner Author

petersulyok commented May 31, 2024

i have 8 hdd and 2 ssd

They are not in the config currently. You have to specify them in the hd_names= config parameter. I suggest leaving out the SSDs and listing only the HDDs in the config.
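
To collect candidate names for hd_names=, one option is to list the stable device links under /dev/disk/by-id. This is only a sketch under assumptions: `list_disk_ids` is a hypothetical helper, the `ata-` prefix fits SATA disks like the one in the log above (SAS disks typically use other prefixes), and it does not tell HDDs from SSDs, so the SSD entries still have to be removed by hand.

```python
import os


def list_disk_ids(by_id_dir="/dev/disk/by-id", prefix="ata-"):
    """Return whole-disk links in by_id_dir, skipping partition links.

    Hypothetical helper: partition links are recognized by the
    "-part" suffix that udev appends to the disk name.
    """
    names = []
    for name in sorted(os.listdir(by_id_dir)):
        if name.startswith(prefix) and "-part" not in name:
            names.append(os.path.join(by_id_dir, name))
    return names


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-id"):
        for disk in list_disk_ids():
            print(disk)
    else:
        print("/dev/disk/by-id not found on this system")
```

Check the smfc documentation for the exact format hd_names= expects before pasting the result into the config.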

it looks like it took around 3 and a half minute after system start to let the fans go down again

Do not worry. This is a typical matter of fine-tuning your configuration. The fan level is controlled dynamically based on the temperature, so a low temperature results in a low fan rotation speed.
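
As an illustration of that dynamic control, the "new level" lines logged earlier in this thread are consistent with clamping the measured temperature to the configured range and picking the nearest step of the mapping. The sketch below is inferred from those log lines, not from the smfc source: `level_for` is a hypothetical name, and the sensitivity and polling settings are ignored here.

```python
def level_for(temp, min_temp, max_temp, min_level, max_level, steps):
    """Return (step temperature, fan level %) for a measured temperature.

    Assumptions: the temperature is clamped to [min_temp, max_temp],
    rounded to the nearest step, and the level is truncated to int.
    """
    clamped = min(max(temp, min_temp), max_temp)
    step = round((clamped - min_temp) * steps / (max_temp - min_temp))
    step_temp = min_temp + step * (max_temp - min_temp) / steps
    level = int(min_level + step * (max_level - min_level) / steps)
    return step_temp, level


if __name__ == "__main__":
    # CPU zone values from the first startup log: 30-60 C, 35-100 %, 6 steps.
    for measured in (32.4, 41.0, 58.9, 75.0):
        step_temp, level = level_for(measured, 30.0, 60.0, 35, 100, 6)
        print(f"{measured:.1f}C -> [T:{step_temp:.1f}C/L:{level}%]")
```

With this model, raising `min_temp` or lowering `min_level` keeps the fans slower at idle, at the cost of less cooling headroom.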

Please check and configure the proper temperatures and fan levels for the fans in the CPU and HD zones. The default values in the configuration will not fit your system. Please take a look at the documentation; it is long, but it will help you create a proper configuration. I'm also happy to help you here.

@petersulyok
Owner Author

I was thinking about this:

at the moment smfc starts the fans ramp up completly even if smfc is running smoothly, i have to restart the smfc service one time to let the fans go down again

You may want to reset the IPMI BMC (it sometimes has issues):

$ ipmitool mc reset cold

After the reset, you have to set the threshold values again!
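
Since the thresholds have to be set again after a BMC reset, it may be convenient to generate the commands in one place. The sketch below only builds and prints `ipmitool sensor thresh` command lines as a dry run and executes nothing. The function name, fan names, and threshold values are placeholders rather than recommendations; `ipmitool` expects three values per direction (lower: non-recoverable, critical, non-critical; upper: non-critical, critical, non-recoverable).

```python
import shlex


def threshold_commands(fans, lower, upper, ipmitool="/usr/bin/ipmitool"):
    """Build ipmitool argument lists that set lower and upper thresholds.

    Hypothetical helper: 'lower' and 'upper' are 3-tuples of rpm values
    in the order ipmitool expects on its command line.
    """
    if len(lower) != 3 or len(upper) != 3:
        raise ValueError("lower and upper must each contain three values")
    commands = []
    for fan in fans:
        commands.append([ipmitool, "sensor", "thresh", fan, "lower",
                         *[str(v) for v in lower]])
        commands.append([ipmitool, "sensor", "thresh", fan, "upper",
                         *[str(v) for v in upper]])
    return commands


if __name__ == "__main__":
    # Placeholder fan names and values; adjust them to your own fans.
    for cmd in threshold_commands(["FAN1", "FANA"], (0, 100, 200), (1600, 1700, 1800)):
        print(shlex.join(cmd))  # dry run: print instead of executing
```

To apply the commands, each list could be passed to `subprocess.run(cmd, check=True)` as root once the printed output looks right.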

@Xyz00777

Xyz00777 commented Jun 8, 2024

I don't think this really fixed it, but when it happens I just restart the service, so it's okay for now :) Thanks!
And I switched to the HDD temps.

2 participants