Optional optimisation in OpenGL3 impl: don't save and restore state #6085

Open: wants to merge 3 commits into master

slowriot commented Jan 15, 2023

After profiling an ImGui application running in the browser (compiled to WASM with Emscripten, rendering through WebGL using the OpenGL3 backend), I observed surprisingly slow performance. The majority of frame time is spent in expensive glGetIntegerv calls from ImGui_ImplOpenGL3_RenderDrawData():

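[image: profiler screenshots of a frame, showing the glGetIntegerv calls inside ImGui_ImplOpenGL3_RenderDrawData()]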

Over 60ms is spent just querying OpenGL state every frame.

Analysis

These calls are made when querying the existing OpenGL state each frame, which is then saved and restored at the end of the function. For many applications, this is simply unnecessary - in some cases the application already sets up its state from scratch each frame, and in other cases the state set by the implementation may already coincide with the application's desired state.

In any case, it may often be desirable to manually set only those parts of the state which diverge from the application's desired state after calling ImGui_ImplOpenGL3_RenderDrawData(), thus avoiding the expensive glGetIntegerv calls.

The specific cost in this case comes from glGetIntegerv calls which in turn call emscriptenWebGLGet, producing an expensive WebGL call to getParameter. Calls to getParameter produce synchronous stalls on the calling thread, and the general advice is they should be avoided (https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/WebGL_best_practices#avoid_blocking_api_calls_in_production).

This PR's changes

This PR makes the following changes to backends/imgui_impl_opengl3.cpp:

  • Gate the expensive OpenGL state query and subsequent restore behind a new define, IMGUI_IMPL_RESTORE_STATE. This is set by default.
  • Accept an optional define, IMGUI_IMPL_OPENGL_NO_RESTORE_STATE. If it is defined, it unsets IMGUI_IMPL_RESTORE_STATE, and the expensive OpenGL state is neither saved nor restored.

Note that the new define does not disable all state saving and restoring: simple state queried with glIsEnabled (glIsEnabled(GL_BLEND), glIsEnabled(GL_DEPTH_TEST), etc) is still saved and restored, as glIsEnabled does not incur the performance penalty of glGetIntegerv.
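
For readers who want to see the shape of the gating described above, here is a minimal sketch. It is not the code from this PR: the fake_gl* functions and render_draw_data_sketch are hypothetical stand-ins that only count calls, so the example compiles without a GL context. Only the two define names come from the description above.

```cpp
// In a real project this define would normally come from the build system
// (for example a -D compiler flag); it is defined inline here for illustration.
#define IMGUI_IMPL_OPENGL_NO_RESTORE_STATE

// Restore is on by default; the opt-out define unsets it.
#define IMGUI_IMPL_RESTORE_STATE
#ifdef IMGUI_IMPL_OPENGL_NO_RESTORE_STATE
#undef IMGUI_IMPL_RESTORE_STATE
#endif

// Hypothetical stand-ins for OpenGL entry points. They only count calls.
int g_expensive_queries = 0;  // models glGetIntegerv
int g_cheap_queries = 0;      // models glIsEnabled

void fake_glGetIntegerv(int /*pname*/, int* out) { ++g_expensive_queries; *out = 0; }
bool fake_glIsEnabled(int /*cap*/) { ++g_cheap_queries; return false; }
void fake_glUseProgram(int /*program*/) {}
void fake_glBindTexture(int /*texture*/) {}
void fake_glSetEnabled(int /*cap*/, bool /*on*/) {}

// Hypothetical, heavily simplified analogue of a backend render function.
void render_draw_data_sketch()
{
#ifdef IMGUI_IMPL_RESTORE_STATE
    // Expensive queries: only compiled in when restore is requested.
    int last_program = 0, last_texture = 0;
    fake_glGetIntegerv(/*program binding*/ 0, &last_program);
    fake_glGetIntegerv(/*texture binding*/ 1, &last_texture);
#endif
    // Cheap enable flags are saved regardless of the define.
    bool last_blend = fake_glIsEnabled(/*blend*/ 0);
    bool last_depth_test = fake_glIsEnabled(/*depth test*/ 1);

    // ... the backend would set its own state and issue draw calls here ...

    // Cheap flags are always restored.
    fake_glSetEnabled(0, last_blend);
    fake_glSetEnabled(1, last_depth_test);
#ifdef IMGUI_IMPL_RESTORE_STATE
    fake_glUseProgram(last_program);
    fake_glBindTexture(last_texture);
#endif
}
```

With the opt-out define present, one call to render_draw_data_sketch() performs zero of the modelled glGetIntegerv queries and two of the modelled glIsEnabled queries. Removing the first define brings the two expensive queries back.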

Results

After making this change, and defining IMGUI_IMPL_OPENGL_NO_RESTORE_STATE in my project, my profile for a single frame looks like this:

[image: profiler screenshots of a single frame after the change]

Total time spent in ImGui_ImplOpenGL3_RenderDrawData() each frame went from 62.9ms to <0.1ms - a speedup of over 600x. For reference, this benchmark is on Chromium 108.0.5359.124 64bit on Linux, GPU is Nvidia RTX 2070 Super.

I recognise that the cost of glGetIntegerv here is disproportionately higher in a WebGL context than it may be for other users of the implementation, but I believe this change will be of some use to non-WebGL users also.

Gate expensive OpenGL state query and subsequent restore behind IMGUI_IMPL_RESTORE_STATE.

Parse optional define IMGUI_IMPL_OPENGL_NO_RESTORE_STATE - if set, unsets IMGUI_IMPL_RESTORE_STATE.

ocornut (Owner) commented Jan 16, 2023

Thanks for the detailed post. This is quite useful to know, and quite a surprise to be honest.

The specific cost in this case comes from glGetIntegerv calls which in turn call emscriptenWebGLGet, producing an expensive WebGL call to getParameter. Calls to getParameter produce synchronous stalls on the calling thread, and the general advice is they should be avoided (https://developer.mozilla.org/en-US/docs/Web/API/WebGL_API/WebGL_best_practices#avoid_blocking_api_calls_in_production).

For the record and future reference, do you know if all calls to glGetIntegerv() are more or less equally slow, or is only a subset of those calls slow?

I think we can and we should rework that as a runtime option rather than a compile-time option, just because we can. It is easier and more convenient for people to toggle runtime options.

I would suggest adding a bool ConfigBackendRendererNoRestoreState; in ImGuiIO:

bool ConfigBackendRendererNoRestoreState; // Request renderer backend to not backup/restore all graphics state (if supported by backend). Application is responsible for explicitly setting its state after calling the renderer. In particular: WebGL on Emscripten gets much slower due to state query, hence this feature.

A feature notice in both imgui_impl_opengl3.cpp and .h under "Implemented Features":

//  [X] Renderer: Support for io.ConfigBackendRendererNoRestoreState (useful on some WebGL setups where glGetIntegerv() calls are slow).

Adding the corresponding checks in the code. Due to the interleaved initialization/getter the runtime version patch gets a little less easy to do. I'll let you try and find the nicer way to express it (with/without reformatting), one aim being to make that boilerplate backup/restore not so noisy in the source file.
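
One possible way to express the runtime check without scattering conditionals through the function is a scoped backup object. The following is only a sketch under stated assumptions: SketchIO stands in for ImGuiIO, SketchGLState and g_sketch_gl model the driver-side state, and ScopedStateBackup is a hypothetical helper. None of these exist in the backend; only the flag name comes from the suggestion above.

```cpp
// Hypothetical stand-in for ImGuiIO carrying the proposed flag.
struct SketchIO {
    bool ConfigBackendRendererNoRestoreState = false;
};

// Hypothetical model of the GL state the backend touches.
struct SketchGLState { int program; int texture; };
SketchGLState g_sketch_gl = {0, 0};
int g_sketch_state_queries = 0;  // counts modelled glGetIntegerv calls

// Saves state on construction and restores it on destruction, but only
// when enabled. When disabled, no state query is issued at all.
class ScopedStateBackup {
public:
    explicit ScopedStateBackup(bool enabled) : enabled_(enabled), saved_() {
        if (enabled_) {
            saved_ = g_sketch_gl;         // would be the glGetIntegerv calls
            g_sketch_state_queries += 2;  // one per modelled field
        }
    }
    ~ScopedStateBackup() {
        if (enabled_) g_sketch_gl = saved_;  // would be glUseProgram, glBindTexture, ...
    }
    ScopedStateBackup(const ScopedStateBackup&) = delete;
    ScopedStateBackup& operator=(const ScopedStateBackup&) = delete;
private:
    bool enabled_;
    SketchGLState saved_;
};

// Hypothetical, heavily simplified analogue of the render function.
void render_draw_data_runtime_sketch(const SketchIO& io)
{
    ScopedStateBackup backup(!io.ConfigBackendRendererNoRestoreState);
    // The backend binds its own program and texture, then draws.
    g_sketch_gl.program = 42;
    g_sketch_gl.texture = 7;
}
```

With the flag left at false, the caller's program and texture are back in place after the call. With the flag set to true, the backend's values remain bound and the application is responsible for setting its own state afterwards, as the proposed comment on the flag says.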

Thank you!

topolarity commented

How serendipitous! I was looking into something similar last weekend.

Unfortunately, I think the speed-up here may be a red herring. The glGetIntegerv calls create a GPU -> CPU dependency, so that the time spent on the GPU shows up in the CPU profiling. Removing them reduces the start->end CPU time for the frame since we no longer have to wait for the GPU, but it doesn't actually help the GPU finish the frame any faster (at least in my tests).
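
The hypothesis above can be illustrated with a toy timing model. This is not a measurement: ToyTimeline, simulate_frame and every number in it are invented for illustration, and it assumes a GPU that executes queued work asynchronously while a state query blocks the CPU until the queue drains.

```cpp
#include <algorithm>

// Toy model: the CPU submits work without waiting, and the GPU works through
// its queue. A blocking query makes the CPU wait for the GPU to catch up.
struct ToyTimeline {
    double cpu_ms = 0.0;             // current CPU time within the frame
    double gpu_busy_until_ms = 0.0;  // time at which queued GPU work completes

    void submit_draw(double gpu_cost_ms, double cpu_cost_ms = 0.01) {
        cpu_ms += cpu_cost_ms;  // submission itself is cheap on the CPU
        gpu_busy_until_ms = std::max(gpu_busy_until_ms, cpu_ms) + gpu_cost_ms;
    }

    void blocking_query() {
        // Models a synchronous state query: the CPU stalls until the GPU is idle.
        cpu_ms = std::max(cpu_ms, gpu_busy_until_ms);
    }
};

struct FrameTimes {
    double cpu_ms;       // CPU time from frame start to end of submission
    double gpu_done_ms;  // time at which the GPU finishes the frame
};

FrameTimes simulate_frame(bool query_state)
{
    ToyTimeline t;
    t.submit_draw(60.0);                  // illustrative heavy GPU work
    if (query_state) t.blocking_query();  // state save before the UI pass
    t.submit_draw(1.0);                   // illustrative UI draw
    return { t.cpu_ms, t.gpu_busy_until_ms };
}
```

In this model, simulate_frame(true) reports a CPU time of about 60 ms while simulate_frame(false) reports almost none, yet the GPU completion time is about 61 ms in both cases. That matches the shape of the argument: the query moves the wait into the CPU profile without changing how long the GPU needs.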

topolarity commented Jan 23, 2023

My suspicion is that the real bottleneck is related to how the draw calls are translated to OpenGL by the browser.

On macOS, I noticed much better performance in Firefox and Safari. Even Chrome performed much better if I changed the ANGLE backend in chrome://flags to Metal.

I did try an optimization based very loosely on https://www.khronos.org/opengl/wiki/Vertex_Post-Processing#User-defined_clipping, and I found that it dramatically improved performance (15-20 fps -> 60 fps) on Chrome's default OpenGL ANGLE backend. This is the diff here, I'd be curious to hear if you see the same result.

(Also as a heads up the change in that diff is just an experiment and is very incomplete: It doesn't duplicate vertices when they are used with multiple scissor boxes, it doesn't verify that all of the draw commands are using the same texture, and it doesn't apply the user callback, if provided.)
