Feature Request: GPU temperature/activity bias #16

Open
emansom opened this issue Mar 31, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@emansom
Contributor

emansom commented Mar 31, 2023

In workstation configurations inside tower cases, certain workloads that are heavy on the GPU but light on the CPU leave the CPU temperature low while the GPU temperature is not. The top case fans then do not run at sufficient CFM to draw the hot air upwards.

The GPU (blower-style fan) ends up recycling its own pocket of hot air instead of being helped by the case fans.

To combat this, a bias of sorts could be introduced that influences the curve based on GPU temperature and/or activity.

@emansom
Contributor Author

emansom commented Apr 17, 2023

After testing locally, I think the easiest way to tackle this is the following algorithm:

When GPU activity has peaked above 30% in the last 60 seconds, increase the minimum fan speed to at least 55%.

This drastically improves thermals, by 10 °C (from 50 °C to 40 °C on moderate GPU load) in a workstation ATX tower case.
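
A minimal sketch of that rule, assuming the fan controller can feed in periodic GPU activity samples. The class name `GpuActivityFloor`, its parameters, and the injectable `clock` are hypothetical and not part of the existing code base:

```python
import time
from collections import deque


class GpuActivityFloor:
    """Raise the minimum fan level while recent GPU activity was high."""

    def __init__(self, threshold=30.0, window=60.0, floor=55.0,
                 clock=time.monotonic):
        self.threshold = threshold  # GPU activity in percent
        self.window = window        # look-back period in seconds
        self.floor = floor          # minimum fan level in percent
        self.clock = clock          # injectable time source (eases testing)
        self.samples = deque()      # (timestamp, activity) pairs, oldest first

    def _prune(self, now):
        # Drop samples that are older than the look-back window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def add_sample(self, activity):
        """Record the current GPU activity in percent."""
        now = self.clock()
        self.samples.append((now, activity))
        self._prune(now)

    def apply(self, curve_level):
        """Return the fan level after applying the floor to the curve output."""
        self._prune(self.clock())
        peak = max((activity for _, activity in self.samples), default=0.0)
        if peak > self.threshold:
            return max(curve_level, self.floor)
        return curve_level
```

The floor only ever raises the level computed by the normal curve, so a hot CPU still wins when its curve asks for more than 55%.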

@petersulyok
Owner

Hi @emansom, a few questions came to mind on this topic:

  • The biggest bottleneck is that IPMI has only two zones defined (CPU and HD). I assume you would like to use the CPU zone for this purpose, right?
  • How would you read the GPU temperature? I only know of vendor-specific tools (NVIDIA, AMD, etc.) for reading the temperature, and I do not know of a standard interface (like HWMON).
  • Do you think multiple GPUs should be supported?

Let me know your view on this.

@petersulyok petersulyok added the enhancement New feature or request label Aug 17, 2023
@emansom
Contributor Author

emansom commented Aug 17, 2023

  • How would you read the GPU temperature? I only know of vendor-specific tools (NVIDIA, AMD, etc.) for reading the temperature, and I do not know of a standard interface (like HWMON).

One way to tackle this would be to look at how nvtop implements this and port it to a separate Python module, e.g. called python-gpustats or similar.

For less code duplication, it would be useful to abstract this behind a shared library written in C with Python bindings so both projects could utilize the same paths and it would be somewhat agnostic.

A library for this may already exist. I have not searched widely, nor asked around.
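
Until such a module exists, a vendor-specific fallback could look like the sketch below. It covers NVIDIA only and assumes the `nvidia-smi` CLI is installed and on the `PATH`; the function names `parse_gpu_stats` and `read_gpu_stats` are hypothetical:

```python
import subprocess

# Ask nvidia-smi for one "temperature, utilization" CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(output):
    """Parse nvidia-smi CSV output into (temperature_celsius, load_percent) tuples."""
    stats = []
    for line in output.strip().splitlines():
        try:
            temp, load = (field.strip() for field in line.split(","))
            stats.append((float(temp), float(load)))
        except ValueError:
            # Skip GPUs that report a non-numeric value for either field.
            continue
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return one (temperature, load) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Keeping the parsing separate from the subprocess call means other vendors could later be added behind the same return type.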

@emansom
Contributor Author

emansom commented Aug 17, 2023

  • The biggest bottleneck is that IPMI has only two zones defined (CPU and HD). I assume you would like to use the CPU zone for this purpose, right?

Given the multitude of case configurations, I think the zones should be configurable, defaulting to all zones for optimal airflow, since increasing all zones by GPU load percentage and GPU temperature would result in the best temperatures.

Some users may prefer lower noise, however, so this increase should be configurable.
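
As a rough sketch of what such a configuration could look like: the zone names `CPU` and `HD` are the two IPMI zones mentioned above, while `GpuBiasConfig`, `max_increase`, and `apply_bias` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass
class GpuBiasConfig:
    # Zones that receive the GPU bias; defaults to all zones.
    zones: tuple = ("CPU", "HD")
    # Upper limit for the increase, in percentage points (assumed default).
    max_increase: float = 30.0


def apply_bias(levels, bias, config):
    """Add the GPU bias to the fan level of every configured zone.

    levels maps a zone name to the fan level (percent) from the normal curve.
    """
    increase = min(max(bias, 0.0), config.max_increase)
    return {
        zone: min(100.0, level + increase) if zone in config.zones else level
        for zone, level in levels.items()
    }
```

A user who prefers lower noise could then restrict the bias to one zone and lower the cap, e.g. `GpuBiasConfig(zones=("CPU",), max_increase=10.0)`.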

  • Do you think multiple GPUs should be supported?

I think it should loop over all GPUs in the system, with the bias taking effect if any of them show load or higher temperatures (not if-else based, just additive math).
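
Roughly, the additive approach could be sketched like this. The function name `gpu_bias` and all default values (`idle_temp`, `temp_gain`, `load_gain`, `max_bias`) are assumptions that would need tuning per system:

```python
def gpu_bias(gpus, idle_temp=40.0, temp_gain=0.5, load_gain=0.25, max_bias=40.0):
    """Sum the fan bias (percentage points) contributed by every GPU.

    gpus is an iterable of (temperature_celsius, load_percent) tuples.
    """
    total = 0.0
    for temp, load in gpus:
        # Each degree above the idle temperature adds temp_gain points.
        total += max(0.0, temp - idle_temp) * temp_gain
        # Each percent of load adds load_gain points.
        total += load * load_gain
    # Cap the sum so many busy GPUs cannot push the bias without bound.
    return min(total, max_bias)
```

With these defaults, a single GPU at 60 °C and 40% load yields a bias of 20 percentage points, and an idle, cool GPU contributes nothing.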

2 participants