-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance issue with List.sortBy #631
base: master
Are you sure you want to change the base?
Conversation
I recently review this PR as part of a larger List API run-through, and came to the conclusion that while this improves the worst case performance, it reduces stuff like In Haskell, the overhead of calling map twice is likely very small due to deforestation and other compiler opts. But Elm, being strict, has no choice but to pay that overhead in full. For people who have complex sort fn's and who would benefit from this implementation, could easily define this in their programs. However, for simpler use cases, it would not be possible to make a new implementation which avoid the "double map" overhead this PR introduces, and so would not be able to avoid the performance degredation. As such, would not recommend applying this PR. |
Short answer:If the current implementation of the function is kept, then I think for honesty (and the users' sake) the function should be renamed into something like Long answer:We probably do not need to talk much about Haskell. I am pretty sure that there is a constant factor overhead of the So let's consider the situation in Elm. Your assumption that the "accessor case" is the most common use case for There is at least one very big Elm codebase. Maybe someone could check uses of |
Haskell is being mentioned only because the code from this PR is taken directly from Haskell's When the code mentioned in the PR copies to Haskell implementation, it makes sense to mention that Haskell has certain optimizations that reduces the overhead of such an implementaiton. Optimizations that Elm doesn't have. We also have to consider that by changing the implementation of Another thing to keep in mind is that if this PR is applied, those who only uses a simple accessor has no other avenue to speed up their code. If this PR is not applied, people could create their own I can give you datapoints from the 50KLOC Elm codebase I've got at work. Our uses of I'm convinced that Elm's current |
Let's talk about Elm, not Haskell. Your data is indeed interesting. I guess you would have been happy with the function being called something that actually conveys how it should be used, then? None of my comment from yesterday was trying to convince anybody (after three years) that this PR should be merged as is. Yes, there are other options to address the problem (which you acknowledge to exist). What's a way forward? Documentation change? Name change for the function? I am aware of all the points reiterated above (like the "Another thing to keep in mind ..." part). What I proposed in the "Short answer" part of my comment yesterday would address them, I think. Also, if the aim were to solve the performance issue itself, rather than relying on the assumption about most uses of this function (but possibly making it a non-silent assumption), then doing it exactly as in Haskell is not the only option. As I pointed out in the original GitHub issue (#485) that later was replaced by this PR: "in details this can be even further improved". I believe I was thinking of solving this in native code. That is, |
Actually, no integration with the sorting algorithm itself (the "on the fly along with the first sorting pass" mentioned above) should be needed at all. Looking at the relevant part of Line 75 in bf91c9f
there are already essentially two map passes performed anyway, namely the |
To me Your second comment about modifying native code might make sense. It might be something to add to the |
Cool. Just one "rebuttal" on the naming question: I agree that |
Fair point |
The issue fixed here is as follows, testable at http://elm-lang.org/try.
The following program:
needs, in the browser on my machine, almost 4 seconds after each button click to show the new result. (The time taken, in milliseconds, will be printed as the first component of the pair in the button text.)
The problem is that
f
is called far too often on individual elements oflist
.By just swapping in the following definition instead of the current
List.sortBy
:the 4 seconds from above are reduced to well under 1 second.
This is not just about a constant factor time difference. By changing
f
and/orlist
, the "old"List.sortBy
can be arbitrarily (even asymptotically) more expensive than the "new" version. The converse is not true. The new implementation is guaranteed to never be asymptotically worse than the currentList.sortBy
.The new version is also how the equivalent functionality is implemented in the Haskell standard library.
This pull request makes it so that Elm's
List.sortBy
uses the more efficient implementation.