Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove usages of Array.newInstance() to enable native compilation #30

Closed
fcurts opened this issue Jan 17, 2019 · 7 comments
Closed

Remove usages of Array.newInstance() to enable native compilation #30

fcurts opened this issue Jan 17, 2019 · 7 comments

Comments

@fcurts
Copy link

fcurts commented Jan 17, 2019

Currently, Paguro uses java.lang.reflect.Array.newInstance() in a few places. It would be great if these reflective array instantiations could be removed so that applications that depend on Paguro can be compiled to native code with GraalVM. (Currently, this is only possible if all potential element types are statically known and the corresponding array types are registered with GraalVM.)

@GlenKPeterson
Copy link
Owner

I think to fix this I should try compiling with GraalVM. I guess one version of it ships with a newer version of Java and can be invoked with some compiler arguments? Is there a certain version of GraalVM or packaging or whatever that I should target?

@fcurts
Copy link
Author

fcurts commented Jan 17, 2019

GraalVM native compilation doesn't work with a stock JDK yet. I'd recommend to download GraalVM CE from https://github.com/oracle/graal/releases, which has everything bundled and ready to go. Instructions are here: https://www.graalvm.org/docs/reference-manual/aot-compilation/

Use of java.lang.reflect.Array.newInstance() is the only problem right now, or at least the only one detected at image build time. Once I work around that, the resulting binary passes all my tests (which exercise Paguro quite a bit).

@GlenKPeterson
Copy link
Owner

GlenKPeterson commented Jan 20, 2019

Thank you for introducing me to Graal - it looks interesting.

Using these instructions I downloaded the graal docker container (I don't like installing stuff directly on my system)

docker pull oracle/graalvm-ce:1.0.0-rc11

I ran it with:

docker run \
	-it \
	-v /home/gpeterso/temp2/Paguro/target:/paguro \
	oracle/graalvm-ce:1.0.0-rc11 bash

Inside the container I did this (you have to cd - building in the root directory doesn't work):

bash-4.2# cd paguro
bash-4.2# native-image -H:Name=paguro-graal --verbose --shared -cp Paguro-3.1.2.jar
Build on Server(pid: 10, port: 38545)
SendBuildRequest [
-task=com.oracle.svm.hosted.NativeImageGeneratorRunner
-imagecp
file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/svm/builder/pointsto.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/svm/builder/objectfile.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/svm/builder/svm.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/jvmci/graal-management.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/jvmci/graal.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/jvmci/jvmci-api.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/jvmci/jvmci-hotspot.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/boot/graaljs-scriptengine.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/boot/graal-sdk.jar:file:///opt/graalvm-ce-1.0.0-rc11/jre/lib/svm/library-support.jar:file:///paguro/Paguro-3.1.2.jar
-H:Path=/paguro
-H:Name=paguro-graal
-H:Kind=SHARED_LIBRARY
-H:CLibraryPath=/opt/graalvm-ce-1.0.0-rc11/jre/lib/svm/clibraries/linux-amd64
]
[paguro-graal:10]    classlist:     150.12 ms
[paguro-graal:10]        (cap):     373.95 ms
[paguro-graal:10]        setup:     565.76 ms
[paguro-graal:10]   (typeflow):   1,607.61 ms
[paguro-graal:10]    (objects):     273.83 ms
[paguro-graal:10]   (features):      42.86 ms
[paguro-graal:10]     analysis:   1,952.87 ms
[paguro-graal:10]     universe:      66.52 ms
[paguro-graal:10]      (parse):     208.77 ms
[paguro-graal:10]     (inline):     410.83 ms
[paguro-graal:10]    (compile):     902.15 ms
[paguro-graal:10]      compile:   1,611.31 ms
[paguro-graal:10]        image:     127.90 ms
[paguro-graal:10]        write:      42.13 ms
[paguro-graal:10]      [total]:   4,534.33 ms
bash-4.2# 
bash-4.2# ls -lt
total 3372
-rw-r--r-- 1 root root    3408 Jan 20 21:09 graal_isolate_dynamic.h
-rw-r--r-- 1 root root    3312 Jan 20 21:09 graal_isolate.h
-rwxr-xr-x 1 root root 2139856 Jan 20 21:09 paguro-graal.so
...

It looks like it built. I think the two places that I did something evil array-type-wise were removed in this version, so that every time it creates an array, it now has a reified type at compile time.

Are you sure it's still an issue? How do I see this issue? Do I need to make a program that basically runs all the unit tests, compile that with Graal, and run it? Do you have a specific test that's failing that I could compile in Java, then with Graal, and run?

@fcurts
Copy link
Author

fcurts commented Jan 20, 2019

I can confirm that 3.1.2 solves this issue for me. (With the previous version, the native-image command failed with an error.) As you said, that's because the tclass argument is now always null or Node.class.

It still feels odd to me that Cowry uses reflective array instantiation, but it's no longer a real issue.

@GlenKPeterson
Copy link
Owner

Yay! Thanks so much for your great bug report. You've made Paguro better for everyone.

Re: Arrays and casting...

My understanding is that all arrays have a type, determined at compile time, and saved ("reified") at runtime, even if that type is Object. So there's no performance impact to giving them the correct type other than passing one parameter and executing one if statement.

Casting makes code harder to read, because of the visual noise it creates, the possibility that you could cast wrong, and from the suppress-type-warnings that you'd need on almost every method, basically turning off the type checker and inviting unrelated type errors in those methods.

Cowry exists because a significant portion of my work on the RrbTree was debugging off-by-one errors in array copies. Also to put all those suppress-type-warnings on tiny methods I could test independently. I had written all the inputStartPointer, outputEndPointer, +1 to skip the new item, etc. all over the RrbTree, wrong a different way each time. I wanted to have it one place where I knew it was correct. I was willing to sacrifice some portion of 1% performance, in order to complete the RrbTree at all. I needed to think about a bigger picture than off-by-one errors and casting.

That said, I think whatever performance penalty there is to calling those methods disappears once the JVM inlines it. I believe that I timed it carefully to prove that assumption, but I may be remembering wrong. Having all that code collected in one place may be a valuable hint for the JVM to inline it sooner?

I think that Cowry uses compile-time-typed arrays. I don't think there's any reflection (any more - thanks to you!).

I actually played around with making a global store of type signatures so that they would be available at runtime. I never got anywhere life-changing, but still think it's an interesting idea. I hear .net works that way and programmers like it.

@fcurts
Copy link
Author

fcurts commented Jan 20, 2019

Thanks for the explanation. For the record, here is my revised understanding: By now it's a choice between creating instances of Object[] or Node[], and it's desirable for this distinction to be made. Because it's just these two cases, and the existing array helper methods already use unchecked casts, it would be easy to avoid reflective array instantiation, but modern JITs remove it anyways.

@GlenKPeterson
Copy link
Owner

Yes, that's exactly my understanding. If I'm wrong I'd like to know.

The Clojure collections take the opposite approach. Especially PersistentHashMap which stores keys and values in alternate indices in the same array of type Object. It's much more likely to make a speed improvement there because the processor can fetch up to 32 values and keys into the cache (by fetching the whole array). If it stored tuples or Map.Entry objects in the array, it could still read an array of Entry objects into the cache, but would have to then read each key and value as separate memory fetches, with no guarantee about what it would grab with read-ahead (because it's reading from the heap and not an array).

That code is too complicated for me to mess with lightly, but I expect that any change to that design could ruin performance due to fetching and cache misses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants