Discussing the design, decisions and discovery involved in developing Mutability Detector, an open-source analysis tool for Java.

Sunday 5 September 2010

Detecting Setter Methods Ain't So Easy After All

Who could have thought that detecting if a class has setter methods could be so complicated?

Up until a few days ago, the SetterMethodChecker was the most simple checker of all. The meat of the checker could be summed up with the following code:

   if("<init>".equals(this.methodName)) return; // We're not concerned with the constructor 
    if(opcode == Opcodes.PUTFIELD) {
         // declare the class as mutable, as a field reference can be changed

This simple check was effective for the most part. It correctly detected the mutability of classes such as this:

     class MutableWithSetterMethod {
          private int foo;

          // constructor, and getter, etc

          public void setFoo(int newFoo) { this.foo = foo; }

This is fine for the most part, but if classes stray from this basic pattern, even just a little, the checker can be fooled into thinking an immutable class... isn't. In my quest for realistic examples of immutable classes, I came across a fairly common example: java.math.BigDecimal from Sun's JDK[1]. The SetterMethodChecker raises the following issues:

     Field [precision] can be reassigned within method [subtract]
     Field [scale] can be reassigned within method [divide]
     Field [scale] can be reassigned within method [divide]
     Field [precision] can be reassigned within method [negate]
     Field [precision] can be reassigned within method [precision]
     Field [precision] can be reassigned within method [setScale]
     Field [precision] can be reassigned within method [scaleByPowerOfTen]
     Field [stringCache] can be reassigned within method [toString]
     Field [intVal] can be reassigned within method [inflate]
     Field [intCompact] can be reassigned within method [readObject]
     Field [intVal] can be reassigned within method [stripZerosToMatchScale]
     Field [scale] can be reassigned within method [stripZerosToMatchScale]
     Field [precision] can be reassigned within method [stripZerosToMatchScale]
     Field [intCompact] can be reassigned within method [stripZerosToMatchScale]
     Field [intVal] can be reassigned within method [roundThis]
     Field [intCompact] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [roundThis]
     Field [precision] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [dropDigits]

Nineteen examples of how BigDecimal is mutable, apparently. Not really of course, BigDecimal is immutable, but the code is complicated enough to fool the checker. There are three patterns of field reassignment which a) retain immutability b) are common use cases and c) require more sophistication to detect. I hope to fix the SetterMethodChecker to recognise these patterns and handle them accordingly.

Valid Setter Method Patterns

Reassigning fields in private methods called only from the constructor.
As detailed above, the method 'roundThis' reassigns four fields of BigDecimal. However, the method is private, and only ever called from a constructor - in 7 of the 18 available BigDecimal constructors. This pattern was also discovered in the JodaTime libraries, which use a private setFields() method for serialisation. If the mutating method is guaranteed[2] to only be called before the constructor completes, the class can be safely called immutable.

Solution? A setter method makes the class mutable unless it is private, and called only from the constructor.

Reassigning fields on references to an object other than 'this'.
A common idiom for immutable types is, rather than having setter methods mutating internal state, is to construct new instances, modified from the original as required, and returned from the method. A common example is the likes of String.toUpperCase(), which, rather than mutating the char array of this String, copies it, converting the chars to upper case, and returns a new String instance. The toUpperCase() is not the most representative example, as it misses a common case - where an instance is created, and fields are reassigned after construction[3]. Consider the following code (an immutable equivalent of the earlier snippet):

     public class ImmutableWithNoSetterMethod {
          private int foo;
          public ImmutableWithNoSetterMethod(int foo) {    this.foo = foo; }
          public int foo() { return this.foo; }
          public static ImmutableWithNoSetterMethod newAndMutatedInstance() {
               ImmutableWithNoSetterMethod toReturn = new ImmutableWithNoSetterMethod(0);
               toReturn.foo = 42; // mutating an instance after construction
               return toReturn;

Here the newly created instance is mutated after construction - that's hardly a model of immutability! However, it's safe in this example, and in the case of BigDecimal. So how is this safe? Well, the instance is modified with private access, and it's done before the reference is published. So perhaps this case falls in the grey area between immutability and thread-safety. I would argue that, since non-final fields don't qualify for the Java Memory Model's view of immutability anyway, that this kind of pattern qualifies for any useful definition of immutability.

In the code above, despite the mutation of the already constructed object, there is no way for a client of the code to view a change in the object. Multithreaded code is not an issue for the newAndMutatedInstance() method, since nothing is published to multiple threads, and concurrent calls to the method will each construct their own stack, with no shared references[4]. Thus there is no danger in mutating the local variable toReturn. However, the SetterMethodChecker is fooled. Partly because the check is very basic: it looks for the PUTFIELD opcode, but doesn't investigate which object is having its field reassigned. Thus the code this.foo = 42 and toReturn.foo = 42 are treated equally.

Solution? Check that the field reassignment is performed on the 'this' object, either directly or through something reached by the this reference's object graph (i.e. `this.foo.bar.value = 42;` is still a mutation). This checker also has to collaborate with the (as yet unwritten) EscapedThisReferenceChecker to ensure that the reference isn't published to multiple threads before it's mutated. So if the ImmutableWithNoSetterMethod constructor or the newAndMutatedInstance() method is modified to call a method which shares the the reference before the mutation, the class will be dubbed mutable.

Reassignment of fields as a lazy-initialisation and caching strategy.
This is the main reason[5] that java.lang.String is found to be mutable. The result of hashCode() is computed lazily, and cached. But to the SetterMethodChecker, it just makes the class look mutable. The same goes for BigDecimal: the result of toString() is only computed when called the first time. The code for this pattern in BigDecimal is almost as simple an example as can be made:

     class BigDecimal // etc {
          private volatile transient String stringCache = null;
          // etc
          public String toString() {
               if (stringCache == null)
                    stringCache = layoutChars(true);
               return stringCache;

This example can be generalised, in psuedo-code, to the following idiom:

     class {
          private X field; // initialised to default value 
                                   // whatever that is for type X
               if(field == default value) {
                    compute value of field
               }                return field;           }      }

This kind of pattern satisfies immutability - the object can not seen to change by its clients. String's hashCode() and BigDecimal's toString() will always return the same result, even though the object was not fully formed on completion of the constructor. So what appears to be downright mutability... isn't. The SetterMethodChecker is fooled once again. I'm almost feeling sorry for it.

Solution? The scope surrounding a field reassignment should be investigated to find if there is a "if equal to default value" condition guarding the change. This isn't as satisfying as it should be, since fields could technically be initialised to an "uncalculated" marker (e.g. private String myField = "UN-INITIALISED";). Though I predict the complexity of detecting this pattern doesn't lie with what is being compared to, I'm happy planning the trade-off that fields must be initialised to the default value to qualify for the lazy initialisation exemption.

A Note On Lazy Initialisation and Data Races

If you've been reading about lazy initialisation (hello to someone other than me!) you may have been thinking "Well, String's hashCode() and BigDecimal's toString() aren't synchronised, surely there's a potential data race here?". If you have, you are right. Technically, myBigDecimal.toString() could be called across concurrently executing threads, and the second caller could execute the comparison while the first is computing, but has not yet assigned a result to the stringCache field (shown below in the diagram).

This could result in stringCache being assigned to multiple times. However, it is safe (I think). This example can be described as a "benign data race": yes the field can be reassigned, but only to the same value, each and every time. String's hashCode() is calculated with only immutable state, it doesn't matter that there is a data race to assign the field, the result will always be the same. Same goes for BigDecimal's toString(). Now, I say I only "think" this is safe, because there's a whole lotta stuff in the Java Memory Model (which I have investigated, but hasn't "sunk in" yet) that can throw spanners in the works. Because there is no synchronisation of either method, there is no guarantees of happens-before in terms of the assignment. This can mean that while the field is being assigned, other threads could view the field with stale data (i.e. a reference is still null) or even worse, the reference points to an object which isn't in a fully constructed state (i.e. fields are not as they will be when the constructor completes). As far as I can tell, BigDecimal and String are protected from this because the field assigned in the benign data race is immutable, and the Java Memory Model has some guarantees around visibility in that area. Or, in the case of BigDecimal, because the field has the volatile modifier - I'm not sure yet. Needless to say I can't commit to a solution for this problem until I can be confident of the safety.

So, we have three patterns which I think are common enough to need to be handled correctly before I can release a v1.0 of Mutability Detector. They've been added to the list. As I said, who could have thought a check for a setter method could be so complicated?

[1] Sun JDK 1.6 Update 12
Ignoring reflection as a possibility for calling methods.
toUpperCase() uses a String constructor which allows setting all the necessary changes in one call.
[4] This is what's known as "stack confinement", discussed in the excellent Java Concurrency In Practice.
[5] There is another reason, but it's caused by the primitiveness of the tool, rather than what I'd call a complicated problem.