Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy/non-lazy concat() in PersistentHashSet versus PersistentVector #49

Closed
raner opened this issue May 6, 2022 · 6 comments
Closed

Comments

@raner
Copy link

raner commented May 6, 2022

I'm using Paguro in a project that deals with commutative and non-commutative operations, and I use PersistentHashSet for commutative ones and a PersistentVector for non-commutative ones, to distinguish between ordered and unordered elements. Elements are added to these data structures via the concat(Iterable) method. I noticed that for a PersistentVector this results (as expected) in a new PersistentVector object, but the PersistentHashSet I receive an intermediate transformation operator (Xform$AppendIterDesc) that (lazily, I assume) represents the concatenation. Notably, the result is not a PersistentHashSet.

Is this intentionally designed like this? If so, it might be helpful to include the rationale for this divergent behavior somewhere in the documentation. I'm using concat because it seems to be the only API method that allows me to append to either data structure without knowing what the underlying implementation is. Is there some other method that would allow for that while returning an object of the same type as the original data structure?

@GlenKPeterson
Copy link
Owner

Thanks for your interest in Paguro and for your insightful comments. It looks like I implemented concat() on PersistentVector and ImList, but not on ImSet or ImMap. Possibly because concat() is a very cheap operation on PersistentVector and I didn't know how expensive it would be on Set/Map?

If you're familiar with Clojure, Transformable/Xform is kind of like the Sequence Abstraction. As you figured out, by default, all methods on Xform produce another Xform, until you call one of the terminator methods like .toImList(), .toImSet(), .toImMap() at which point the transformation is actually processed and an immutable result produced. The theory was to avoid processing any data you don't need to. If you later .drop(1000000), or take(1) it shouldn't do any subsequent transformations on the excluded items (it may, in some cases, but that would be a bug if you find such a case).

I guess it would be purer and less surprising to get rid of that implementation and make you call .toImList() or .toImSet(). But it's probably more convenient for your use case to implement .concat() on ImSet and ImMap instead. Of course, then you'd want to implement .precat() on those, and on the RRB-Tree, but not on the PersistentVector because adding things to the beginning of a persistent vector is the most expensive operation you can do to it. I think you end up with a surprise somewhere (or a leak in the abstraction) either way.

For your purposes, maybe we should have the Xform remember the source? Have a "render to same form as source collection?" Not sure I'd want that to render to Arrays, if an Xform was started with an xformArray(). IDK. Thoughts? Suggestions?

In your use case, how are you making use of the Sets and Lists later? Do you check what they are and treat one as a set and the other as a List? Or use them both as Iterables or Transformables, or what?

@raner
Copy link
Author

raner commented May 7, 2022

Thanks for your prompt response and for shedding some light on why there are different implementations for the two types. I definitely see the different complexity for implementing concat on those two very different data structures.

Still, it can be very surprising to receive an eager, shape-retaining result for one type and a lazy, shape-discarding result for another type, when calling a method of a common interface. Since UnmodIterable.concat only guarantees a result type of UnmodIterable there is technically nothing wrong with returning a PersistentVector in one case and a Xform$AppendIterDesc in the other. In my case, a problem arises because these data structures themselves are stored in a hashed data structure, and I rely on the fact that a PersistentVector calculates hashes in an order-dependent way, whereas a PersistentHashSet uses order-independent hashing. The Xform$AppendIterDesc appears to inherit the system hash code, and after concatenation, sets with identical elements will in all likelihood have different hash codes, which in turn causes problems with a caching mechanism for these objects. Reimplementing hashCode/equals for Xform$AppendIterDesc is probably out of the question, since these would be O(n) implementations.

A common problem with most "fluent iterable" libraries is that the original "shape" or type of the initial data structure is lost on the first transformation, and all subsequent steps only deal with a generic Iterable or Transformable that loses many properties of the original data structure. For non-persistent libraries there is always a possibility to fall back on mutable data structures and retain the original type that way. No such option exists for persistent data structures, which, I think, makes it even more important to offer shape-retaining variants for all transformations. Sure, I can always call .toImList() or .toImSet(), but what if I don't know what the original shape was (and it actually matters, as in my case)?

Of course, in plain Java it is notoriously difficult to express the concept that a method returns the same type as the object that it was called on (and will typically require some sort of higher-kinded type library).

I'm still too new to Paguro to make any real design recommendations here, but it would be nice to have some consistency (i.e., either all implementations of an interface method are lazy, returning a transformation, or they are all eager, returning a new object of the original type). In the end, this might require supporting two sets of methods, a set of efficient transformer-based methods, and another set of possibly less efficient, shape-retaining methods. I like your suggestion of having the Xform remember the original shape and have some sort of .toOriginalShape() method, but it seems that would either require subsequent casting (and knowing what type to cast to!) or some sort of HKT signature, so I'm not sure how feasible that would be.

@GlenKPeterson
Copy link
Owner

GlenKPeterson commented May 7, 2022

Random thoughts:

New theory: If an abstraction is going to leak somewhere, you might as well put the leak where you only find it if you're doing something you shouldn't be with it. Like trying to prepend or precat to a PersistentVector. That way, I could implement concat on everything, and precat on everything except PersistentVector. That would let me fix your issue without making anything more surprising for anyone else?

This project includes an Equator and ComparisonContext, both of which allow you to use arbitrary concepts of equality and comparison that are not tied to the default implementation in .equals() and .hashCode().
Check out PersistentHashSet.ofEq(). A brief usage example is here:
https://github.com/GlenKPeterson/Paguro/blob/master/src/test/java/org/organicdesign/fp/collections/PersistentHashSetTest.java#L325

This project also includes (an experimental) RuntimeTypes, which could be used to preserve aspects of the original type for an Xform. Right now it's only used for the toString() implementation of the OneOf... union type classes, but it could be used to preserve type information for any reason.
https://github.com/GlenKPeterson/Paguro/blob/master/src/test/java/org/organicdesign/fp/type/RuntimeTypesTest.java#L108

@GlenKPeterson
Copy link
Owner

GlenKPeterson commented May 9, 2022

Version 3.10.2 is released (may take a few hours to be visible). Let me know if that solves your issue.

I implemented .concat() and .precat() on unordered sets and maps (and .precat() on RrbTree). It's sensible and cheap to do that. I didn't implement .precat() or .concat() on ordered sets/maps because that would suggest an ordering other than the one declared by the comparator. So if you call those methods on sorted maps/sets, you're still going to get an Xform. Ditto calling .precat() on ImList (but on RrbTree, it's fine to call that).

Hmm... Yeah, now I'm having misgivings.

SortedSet(3, 4, 5).precat(1, 2) yields an Xform that starts with 1, 2, then however the source is iterated (3, 4, 5). So it insures that the precat-stuff comes first. Similarly SortedSet(3, 4, 5).concat(6, 7) yields the itrerated source, then the to-be-concatenated additions. I just made UnorderedSet(3, 4, 5).precat(6, 7) return a set containing the numbers 3-7 in any order. That's not what precat (or concat) is supposed to do.

I think I'm going to back out some of these changes. Sorry!

@GlenKPeterson
Copy link
Owner

GlenKPeterson commented May 9, 2022

Version 3.10.3 is released, fixing my crazy stuff from the previous round.

This is how I would solve your issue: Make a wrapper class to abstract the type of the underlying collection. This is overly complicated in several ways, but hopefully it's helpful:

import org.jetbrains.annotations.NotNull;
import org.junit.jupiter.api.Test;
import org.organicdesign.fp.collections.ImList;
import org.organicdesign.fp.collections.ImSet;
import org.organicdesign.fp.collections.PersistentHashSet;
import org.organicdesign.fp.collections.PersistentVector;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.organicdesign.fp.FunctionUtils.stringify;

public class OneOf2Test {

    static class Set_List<T>
            extends OneOf2<ImSet<T>, ImList<T>>
            implements Iterable<T>
    {

        private final Class<T> clazz;
        // Constructor
        @SuppressWarnings("unchecked")
        private Set_List(@NotNull Object o, int n, Class<T> claz) {
            // Casting thanks to https://stackoverflow.com/a/1080525/1128668
            super(o,
                  (Class<ImSet<T>>)(Class<?>) ImSet.class,
                  (Class<ImList<T>>)(Class<?>) ImList.class,
                  n);
            clazz = claz;
        }


        @Override
        public @NotNull String toString() {
            return stringify(item) + ":" +
                   "ImSet<" + clazz.getSimpleName() + ">|" +
                   "ImList<" + clazz.getSimpleName() + ">";
        }

        public @NotNull Set_List<T> add(T item) {
            return match(set  -> new Set_List<>(set.put(item), 0, clazz),
                         list -> new Set_List<>(list.append(item), 1, clazz));
        }

        @Override
        public @NotNull Iterator<T> iterator() {
            return match(set  -> set,
                         list -> list).iterator();
        }

        // Alternate implementation:
//        @Override
//        @SuppressWarnings("unchecked")
//        public @NotNull Iterator<T> iterator() {
//            return ((Iterable<T>) item).iterator();
//        }

        // Factory method
        public static <U> @NotNull Set_List<U> createSet(Class<U> clazz) {
            return new Set_List<>(PersistentHashSet.empty(), 0, clazz);
        }

        // Factory method
        public static <U> @NotNull Set_List<U> createList(Class<U> clazz) {
            return new Set_List<>(PersistentVector.empty(), 1, clazz);
        }
    }

    @Test
    public void testStuff() {
        Set_List<Integer> operations1 = Set_List.createSet(Integer.class);
        Set_List<Integer> operations2 = Set_List.createList(Integer.class);
        operations1 = operations1.add(3);
        operations1 = operations1.add(5);
        operations1 = operations1.add(5);

        operations2 = operations2.add(3);
        operations2 = operations2.add(5);
        operations2 = operations2.add(5);

        assertEquals("PersistentHashSet(3,5):ImSet<Integer>|ImList<Integer>", operations1.toString());

        assertEquals("PersistentVector(3,5,5):ImSet<Integer>|ImList<Integer>", operations2.toString());
    }
}

You can make your wrapper implement whatever common interface is most helpful to you. Here I implemented Iterator. There's a lot of types in this example. I'm sure you can do something much simpler if you don't need .equals() and .hashCode() and can hardcode some types. I never even tried to make a union type with generics like this before.

@raner
Copy link
Author

raner commented May 11, 2022

Thanks for the quick fix, Glen! Yes, this looks like it will solve my issue (haven't fully coded everything yet, but I'll let know if I come across anything that doesn't work as expected). Thank you so much!

@raner raner changed the title Lazy/non-lazy concat() in PersistentHashMap versus PersistentVector May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants