-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Request: Threadsafety of PersistentHashMap #40
Comments
I just created a Google Group mailing list for support questions, mainly in the hopes that if more people are looking at what I say, more people can correct me. So that's an option for future support requests: https://groups.google.com/d/forum/paguro Until a bug is found, I'm assuming that PersistentHashMap, on its own, is thread-safe. I think we may have to zoom out to see whether you are using this thread-safely or not, but I can say a few things.
The map itself is never modified in place. In order to represent a modification, the reference to the map must change. We need to see that in your code sample in order to determine how thread-safe that change is. Also, it's a good idea to state why you need multiple threads so that we can understand better the problem you are trying to solve. The JVM has some collections designed for specific types of concurrent modification depending on what you're doing. Paguro's collections are designed to be safe against accidental change which may or may not be what you want. If you post more code, maybe that's a good first use of the mailing list? |
That link to the google group did not work for me (error message "no access" although I logged in to Google). Perhaps it needs a while to become fully available. Here's the complete code:
The map in question is a cardinality map representing the marking of a Petri net. I'm doing concurrent alternative traversals of that net, each one running in its own worker thread, starting from a common initial marking. So several threads will concurrently call this method to add or remove tokens. Each thread will update its own map reference to the returned value. All those modifications "go back" to the shared initial marking. |
You probably already know the adage, "Synchronize access to shared mutable state." You must synchronize both read and write access to the shared state using the same lock object (the enclosing class if you don't specify). Presumably you have a private In any case, I still don't see where the map reference is changed. Passing the map reference to a function is exactly the opposite of what you probably want, because it encourages you to nest synchronized blocks for the read and for the write (which is bad). I'm running out right now for several hours, but will re-read all this later or tomorrow. Also, "Java Concurrency in Practice" by Brian Goetz has a nice recipe for this. Joshua Bloch's Effective Java has some good tips too. |
P.S. I made the group visible to all. Thanks for the heads-up! |
Thanks for your help! I know that I should synchronize shared mutable state. My question is, is PersistentHashMap thread-safe?
Imagine there are multiple instances of some class running such code repeatedly and concurrently, with each thread holding its own reference to the map, but that each thread was initialized with the same instance of PersistentHashMap to start with. The program may continually spawn new groups of such threads based on a common initial state, when another bit of the state space of the Petri net needs to be explored. This basically creates a tree of markings (modifications of the initial map) that I'm backtracking over. Can each thread create new modifications concurrently with the others? Are Furthermore, can one thread read entries from or iterate over the state of the map that it holds a reference to, while other threads create new modifications to the same state of the map? The class is not documented to throw ConcurrentModificationException, so I'd expect to be alright here. If PersistentHashMap is not thread-safe, I have two options:
I think that persistent collections ought to be thread-safe by nature. What would be the use of implementing a sophisticated structure-sharing copy-on-write mechanism in a non-thread-safe manner? As a matter of course, persistent collections would be used preferably in concurrent environments. I'm just not sure about PersistentHashMap, because nowhere is it stated that the class is thread-safe. (At least I haven't seen any such statement.) PS: I've been admitted to the Google group, but I guess it is now too late to move the entire discussion there. |
You might be looking for something like this: // Must be private
private ImMap<Place, Integer> map = map();
// Put atomic map read-and-write in one place. Do not pass or return map.
private void merge(
@Nullable Place key,
@NotNull Integer value,
@NotNull Fn2<Integer, Integer, Integer> remappingFunction
) {
// Synchronize around map read and write - this entire section must be atomic.
synchronized (this) {
Integer mapValue = map.get(key);
Integer newValue = (mapValue == null) ? value
: remappingFunction.apply(mapValue, value);
map = (newValue == null) ? map.without(key)
: map.assoc(key, newValue);
}
}
// Map is no longer read or modified directly in this function.
public void fire(Transition transition) {
// remove tokens
for (Place p : transition.preset()) {
merge(p, -1, this::tokenSum);
}
// add tokens
for (Place p : transition.postset()) {
merge(p, 1, this::tokenSum);
}
} |
Paguro has collections that are designed to be thread-safe. This means safe from accidental modification, even when that happens in multiple threads. Your example is almost entirely single-threaded code, since the part that does all the work has to be inside a synchronized block. I think you instead want high-concurrency modification of a shared map using I'm pretty sure that's a much better answer to your question. |
Thanks for that input. I may not have made myself clear, perhaps I'm missing something so very obvious that you take it for granted. I'm sure I do not want modification of a shared
That's a tree of modified states of some initial Now I want the same thing working concurrently, that is with different threads exploring parts of that tree. For example, thread 1 may create And what about iteration? Can thread 1 iterate over the |
Aha! Now I think I understand what you are doing. Yes, this is probably a good use of PersistentHashMap! I also understand that this is frustrating. I've been trying to skim what you say because I've been busy and still want to help. To be clear, PersistentHashMap has been used and has been thread-safe in Clojure and Paguro for years now. It is always possible that there could be a bug, but the likelihood is extremely small in that code. Each call to "without" or "assoc" does not modify the existing map, but instead returns a new map which you can think of as a modified copy of the original. It's thread-safe not because of synchronization or ConcurrentModificationExceptions, but because it's referentially transparent. It's also very fast, considering what it's doing. Just because something is thread-safe on its own, doesn't prevent you from misusing it! Consider, "The problems with synchronized collections" in chapter 5 of "Java Concurrency in Practice" by Brian Goetz. He specifically mentions the put-if-absent scenario saying that even though the "get" method and "put" method are individually thread-safe, you may still need to synchronize a block of code to make sure that no other thread modifies the collection between the "get" and the "put." He's got some nice diagrams. I really recommend you read that if you haven't already. This is not as good as the book, but:
Source: https://www.ibm.com/developerworks/library/j-jtp07233/index.html With PersistentHashMap, the collection itself is totally safe. But the line where you say The code you included in this discussion never initializes currentMarking. How you do that is important for solving this problem. How do you plan to get the leaf maps? I see how you the leaf maps would include the contents of the root, but they are all declared in a local scope that disappears once the function returns. You might store the finished ImMaps in a ConcurrentHashMap, or other java.util.Concurrent collection. Presumably whatever collection you use to ensure that you process each node exactly once could also be used to store the map for the processing up to that node. I don't need to know the answers to those questions, but you do, and you need to think about concurrency when you answer them. The information that needs to be communicated between threads. I hope that helps. I'm probably not going to respond again until Thursday. |
Thank you very much! That helped a lot. It may be that the idea of a shared reference has been the source of some confusion: I never intended the threads to share the same reference. I only intended to provide for the possibility that at some point in time these different references (thread-local ones) would refer to the same map, and that the code would still work the same as if it were single-threaded. Never at any point in time would one thread be able to update the reference held by another thread, although it might update the same (copy of the) map. I do not need to communicate any information between the threads. Indeed, I want to isolate the threads so that they cannot exchange any information. I need to collect the information gathered by exploring the branches of the tree I painted above. When recursion unwinds, at each inner node I select the "best" copy of the map to return to the next higher level. Just think of fork/join. Fork-join-tasks also do not usually exchange information. With that in mind I think that I'm totally OK with my code as it is, when I put everything you say together:
I hope we've got that straightened out now. I'm sorry to have taken up so much of your time, and thank you very much again! |
In terms of points 3, 4, and every time one of us uses the term "reference," and your statement, "I do not need to communicate any information between the threads"... If you just write each thread's output to the command line, or over the network, or maybe to some kind of messaging system, then everything you say makes sense - you don't need to communicate anything between threads. You are correct. Yay! I was assuming that each thread was going to produce some sort of result that you'd then want reported back to whatever part of the program started those threads. I assumed that all those threads might communicate their results simultaneously, which would then involve some Shared Mutable State which you'd have to think about very carefully in order for it to work correctly and still yield the maximum performance benefits from using multiple threads. Anyway, it sounds like you've got what you needed. Thanks for your interest! I'll close this ticket in a few days if I don't hear from you. |
Hello there,
I have not been able to determine whether PersistentHashMap is thread-safe. The Javadoc and usage examples don't seem to say. More specifically, say that I have a
PersistentHashMap<K, V> map
and concurrently execute the following code on it with different key-value combinations, would that be OK or do I need to synchronize?The reference to map never changes, and the map itself is never modified, but can I concurrently execute multiple calls to
without
andassoc
and get consistent results?The text was updated successfully, but these errors were encountered: