What's the best way of ensuring a Function argument is serializable?
I'm writing a serializable class that takes several arguments, including a Function:

public class Cls implements Serializable {
    private final Collection<String> _coll;
    private final Function<String, ?> _func;

    public Cls(Collection<String> coll, Function<String, ?> func) {
        _coll = coll;
        _func = func;        
    }
}

func is stored in a member variable, and so needs to be serializable. Java lambdas are serializable if the type they're being assigned to is serializable. What's the best way to ensure that the Function I get passed in my constructor is serializable, if it is created using a lambda?

  1. Create a SerializableFunction type and use that:

    public interface SerializableFunction<F, R> extends Function<F, R>, Serializable {}
    ....
    public Cls(Collection<String> coll, SerializableFunction<String, ?> func) {...}
    

    Issues:

    • There's now a mismatch between the coll and func arguments: func is declared as serializable in the signature and coll is not, yet both must be serializable for the class to work.
    • It doesn't allow other implementations of Function that are serializable.
  2. Use a type parameter on the constructor:

    public <F extends Function<String, ?> & Serializable>
    Cls(Collection<String> coll, F func) {...}
    

    Issues:

    • More flexible than 1, but more confusing.
    • There's still a mismatch between the two arguments: the func argument is required to implement Serializable in the compile-time type hierarchy, but coll is just required to be serializable somehow (although this requirement can be cast away if required).

    EDIT This code doesn't actually compile when trying to call with a lambda or method reference.

  3. Leave it up to the caller

    This requires the caller to know (from the javadocs, or trial-and-error) that the argument needs to be serializable, and cast as appropriate:

    Cls c = new Cls(strList, (Function<String, ?> & Serializable)s -> ...);
    

    or

    Cls c = new Cls(strList, (Function<String, ?> & Serializable)Foo::processStr);
    

    This is ugly IMO, and the initial naive implementation of using a lambda is guaranteed to break, rather than likely to work as with coll (as most collections are serializable somehow). This also pushes an implementation detail of the class onto the caller.
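To make the difference concrete, here is a minimal sketch of what the intersection cast changes at run time. The class `LambdaRoundTrip` and its methods are hypothetical helpers invented for this illustration; only the cast syntax comes from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical helper class, not part of the original question.
class LambdaRoundTrip {

    // A plain lambda: its target type is Function only, so it is not serializable.
    static Function<String, Integer> plain() {
        return s -> s.length();
    }

    // The same lambda with the intersection cast from option 3.
    static Function<String, Integer> castToSerializable() {
        return (Function<String, Integer> & Serializable) s -> s.length();
    }

    // Returns true if the object can be written to an ObjectOutputStream.
    static boolean isSerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes the object to a byte array and reads it back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, `isSerializable(plain())` should report false and `isSerializable(castToSerializable())` true, which is the "naive call breaks" behaviour described above.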

At the moment I'm leaning towards option 2, as the one that imposes the least burden on the caller, but I don't think there's an ideal solution here. Any other suggestions for how to do this properly?

EDIT: Perhaps some background is required. This is a class that runs inside Storm, in a bolt, which is serialized to transfer to a remote cluster to execute. The function performs an operation on the processed tuples when run on the cluster. So it is very much part of the class's purpose that it is serializable and that the function argument is serializable. If it is not, then the class is not usable at all.

Outwash answered 29/6, 2015 at 11:53 Comment(13)
Wouldn't the problem with option 2 exist just the same if there was no second parameter at all?Peritonitis
I'm not sure how that would work if you receive a capturing lambda...Bettyebettzel
Ah yes, option 2 doesn't actually compile. I've removed it.Outwash
Why did you remove the second option? You only have to place the public modifier at the right location, i.e. before the declaration of the type argument. public <F extends Function<String, ?> & Serializable> Cls(Collection<String> coll, F func) { …Kishke
I get a compile error 'cannot infer type-variable F' when trying to pass a method referenceOutwash
@thecoop: I guess, you are talking about the attempt to pass a lambda expression as it works with concrete types implementing both interfaces. So, if the compiler cannot infer the type argument for the lambda expression, you have to insert an explicit type cast (or provide type arguments). Then it’s not more concise than option 3 but it is still enforcing the constraint which is what your question is all about.Kishke
Java generics only go so far before becoming convoluted and ugly. If you want FP-quality typing expressed in code, use a more functional JVM language like Scala, though Scala has its own issues. At the moment, Java generics plus lambdas are reminiscent of early C++ templates: very nice, erratic, and best used in moderation. Overdo it and you will not be productive. In other words, it is fine not to specify everything to the utmost.Glycol
One of the primary reasons for using Java is type safety. Given that what you are doing throws type safety out of the window I think it should be seen as a hint that there is something fundamentally wrong with your design. Which is to say don't try and serialize functions, instead just serialize your data and use classes to store your functions.Succinct
@Succinct added info as to why serialization is requiredOutwash
If I understand the use case, you are using this to send code to a remote location and then perform remote code execution (something like 1990s agents). I would advocate for sending the class file to the remote machine and then using the class loader at run-time to load the class and execute an instance of it with the serialized data.Succinct
Serialization, unfortunately, throws all the rules out the window. It is a language feature that masquerades as a library feature. It is a dynamic typing feature that masquerades as a static typing feature. It violates all the rules of OO (objects are no longer exclusively created by constructors). So, as much as folks like @Succinct will want to wag their finger at you, once you're using serialization, you're already in a world of serious compromise, and you're left to choose the least bad of the alternatives in front of you.Dinorahdinosaur
@BrianGoetz in your opinion which is the least bad alternative for sending code to a server for remote execution?Succinct
@thecoop: you should understand that the Serialization process of lambda expressions is not that simple; it still depends on the class in which the lambda expression is defined. So the advantage of lambda expressions goes away when using Serialization. The most efficient way to transfer functions is to create an enum implementing the functional interface, with its constants implementing the desired behavior, then transfer these constants (that won’t transfer any data besides the enum class and constant name).Kishke
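The enum approach suggested in the last comment could look roughly like the sketch below. The enum name `StringOps` and its two constants are made up for illustration; the only assumption carried over from the question is that the function maps a String to some result.

```java
import java.util.function.Function;

// Hypothetical enum: each constant is a named, serializable Function<String, Object>.
// Enum constants are serialized by name only, so no lambda machinery is involved.
enum StringOps implements Function<String, Object> {
    LENGTH {
        @Override
        public Object apply(String s) {
            return s.length();
        }
    },
    UPPER_CASE {
        @Override
        public Object apply(String s) {
            return s.toUpperCase();
        }
    };
}
```

A constant such as `StringOps.LENGTH` can then be passed wherever a `Function<String, ?>` is expected, at the cost of restricting callers to the predefined operations.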

In most cases the answer is: don’t.

You may notice that most classes of the JRE, even ObjectOutputStream.writeObject, do not enforce Serializable in their signatures. There are simply too many APIs not specific to Serialization where the compile-time information that an object implements Serializable gets lost; using them together with Serialization would require lots of type casting if the latter required its inputs to be Serializable.

Since one of your parameters is a Collection, you may get examples from that API:

Collections.unmodifiableList:

The returned list will be serializable if the specified list is serializable.

You will find more of these operations which care to retain the Serialization capability without retaining the Serializable compile-time type on the result.

This also applies to all non-public types, e.g. the results of Collections.emptyList(), Arrays.asList(…) and Comparator.reverseOrder(). They all are Serializable without declaring it.
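A small sketch can check these claims at run time. The class `HiddenSerializable` and its helper methods are hypothetical names introduced here; the JDK calls are the ones mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.AbstractList;
import java.util.List;

// Hypothetical helper class for illustration only.
class HiddenSerializable {

    // True if the run-time class implements Serializable,
    // regardless of the compile-time type of the reference.
    static boolean declaresSerializable(Object obj) {
        return obj instanceof Serializable;
    }

    // True if the object can actually be written to an ObjectOutputStream.
    static boolean canWrite(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A deliberately non-serializable list: AbstractList does not implement Serializable.
    static List<String> nonSerializableList() {
        return new AbstractList<String>() {
            @Override
            public String get(int index) {
                if (index != 0) {
                    throw new IndexOutOfBoundsException();
                }
                return "x";
            }

            @Override
            public int size() {
                return 1;
            }
        };
    }
}
```

For example, `declaresSerializable(Collections.emptyList())` and `declaresSerializable(Arrays.asList("a"))` should both be true, while `canWrite(Collections.unmodifiableList(nonSerializableList()))` should be false, because the wrapper is only as serializable as the list it wraps.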


Further, every class with more use cases than just being serialized should refrain from requiring its components to always be Serializable. That would hinder the uses where no Serialization is involved.

Regarding the Collection parameter, you may consider removing the serializable constraint altogether. Normally, you protect your class against later changes to the collection you received. A simple solution is to copy the collection, and while you're doing that, you may use a type which supports Serialization.
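A defensive copy along these lines might look like the following sketch. `CopyingCls` is a hypothetical variant of the question's class, and `ArrayList` is just one possible serializable target type.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical variant of the question's Cls, for illustration only.
class CopyingCls implements Serializable {
    private static final long serialVersionUID = 1L;

    // ArrayList is Serializable, so the caller's collection type no longer matters.
    private final ArrayList<String> coll;
    private final Function<String, ?> func;

    CopyingCls(Collection<String> coll, Function<String, ?> func) {
        // The copy also protects against later changes to the caller's collection.
        this.coll = new ArrayList<>(coll);
        this.func = func;
    }

    // Read-only view of the copied elements.
    List<String> items() {
        return Collections.unmodifiableList(coll);
    }
}
```

Note that this only removes the constraint on the collection; whether `func` is serializable is still up to the caller.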

Even if you want to avoid copying, the Serialization itself is a copying process per se, so you can simply create custom readObject and writeObject methods storing the contents of the Collection, eliminating the need to have a Serializable collection.
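As a rough sketch of that idea, the class below stores only the elements of the collection. `ManualCls`, `roundTrip` and `nonSerializableSource` are hypothetical names, and the Function field from the question is left out to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical class for illustration; not the question's actual Cls.
class ManualCls implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: skipped by default serialization, handled manually below.
    private transient Collection<String> coll;

    ManualCls(Collection<String> coll) {
        this.coll = coll;
    }

    Collection<String> items() {
        return coll;
    }

    // Writes the element count followed by each element.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(coll.size());
        for (String s : coll) {
            out.writeObject(s);
        }
    }

    // Rebuilds the collection as an ArrayList from the stored elements.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int size = in.readInt();
        List<String> copy = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            copy.add((String) in.readObject());
        }
        coll = copy;
    }

    // Serializes and deserializes an instance in memory.
    static ManualCls roundTrip(ManualCls original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ManualCls) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // A collection whose class does not implement Serializable.
    static Collection<String> nonSerializableSource() {
        final List<String> data = Arrays.asList("a", "b");
        return new AbstractCollection<String>() {
            @Override
            public Iterator<String> iterator() {
                return data.iterator();
            }

            @Override
            public int size() {
                return data.size();
            }
        };
    }
}
```

The trade-off in this sketch is that the field can no longer be final, and the deserialized instance always holds an ArrayList whatever the original collection type was.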


To summarize, the usual policy is that if the user of your class intends to serialize instances of it, it is the user’s responsibility to ensure that all components put into it are themselves Serializable.

Kishke answered 29/6, 2015 at 13:41 Comment(1)
Added some more information as to why serialization is requiredOutwash
