Discussing the design, decisions and discovery involved in developing Mutability Detector, an open-source analysis tool for Java.

Sunday, 15 July 2012

Times Have Changed, But An Instant Stays The Same: Checking for Mutability in JSR 310


I recently attended a London Java Community meetup to try out the new date and time API scheduled for Java 8[1]. JSR 310 (or "ThreeTen" as named on GitHub) provides an upgrade to the standard date and time libraries available in the JDK. At first use, ThreeTen looks to have addressed the mistakes of old JDK Date API, and draws influence from the popular, defacto standard date/time library, JodaTime[2].

Overall I was very impressed with the design and discoverability of the API. And of course, I'm a big fan of it's design choice to use immutability wherever possible.

Since JodaTime is somewhat of a poster-child for immutability in Java, and it's unsurprising the same design principle has transferred to ThreeTen. I was curious to see how effective Mutability Detector could be in helping to enforce this principle.

Using the highly rigourous and scientific method of grep'ing the codebase for the term " immutable ", as included in the JavaDoc for such types, I found 55 immutable classes in ThreeTen. For each of those 55, I added a unit test case to assert the class is immutable. Out of those 55, 6 tests are failing. Among the passing tests are some of ThreeTen's core classes: LocalDate, LocalDateTime, Duration, Period, and more. I'm really impressed by how 'cleanly' these immutable implementations are. I'm also happy with the apparent success rate of Mutability Detector on this real-world codebase (though I know that the results assume zero false negatives).

There's two ways to look at 6 failing tests from 55. Firstly, assuming that when a ThreeTen author documents a class as immutable, it is, Mutability Detector produces 6 false positives. Secondly, assuming that Mutability Detector correctly analyzes, ThreeTen has unexpected mutability in 6 classes. Since these conditions are mutually exclusive, over the next few blog entries, I'm going to take a look at the classes in question. First up is DateTimeFormatter.

The assertion:

  assertImmutable(javax.time.format.DateTimeFormatter.class);

Fails with:

  org.mutabilitydetector.unittesting.MutabilityAssertionError: 
  Expected: javax.time.format.DateTimeFormatter to be IMMUTABLE
      but: javax.time.format.DateTimeFormatter is actually NOT_IMMUTABLE
      Reasons:
 Field can have a mutable type (java.util.Locale) assigned to it. [Field: locale, Class: javax.time.format.DateTimeFormatter]
 Field can have a mutable type (javax.time.format.DateTimeFormatterBuilder$CompositePrinterParser) assigned to it. [Field: printerParser, Class: javax.time.format.DateTimeFormatter]
      Allowed reasons:
 None.

DateTimeFormatter is considered mutable because it has two fields of mutable types. The types in question being java.util.Locale, and a ThreeTen internal type, CompositePrinterParser.  Since the result for Locale (for the JDK 1.6u25 incarnation at least) is a false positive relating to The String Problem, which I can suppress in the unit test, I'll take a more detailed look at CompositePrinterParser.

The problem that Mutability Detector has with this class is that it contains a field which is an array type. There are several ways of (ab)using an array field that breaks immutability. The class can: (A) modify the contents of the array field after construction; (B) let a mutable element of the array escape to where it can be modified; (C) let the array escape so that callers can modify the contents; (D) take the array passed to the constructor without performing a defensive copy. Note that Mutability Detector isn't yet capable of this kind of analysis, so it just panics at the sight of an array field, and leaves it up to us, the developer, to investigate.

Out of those potential problems described:

(A) doesn't happen. The class only iterates over the array, it doesn't modify the contents.
(B) isn't applicable here. The element type of the array is an interface which is contracted to be immutable.
(C) and (D) are the interesting parts. There is no getter for the array field, and it's private. The only way the array escapes, is that it is passed as the constructor argument to another instance of the same class. Since we know that same class doesn't modify the array, the destination for the escaping array is a "safe haven". That particular constructor is also only ever called by the same class, so it doesn't need to copy the passed in array. There is a second constructor, but that takes a List, and performs a defensive copy of it, so the class really is in control of that array field.

VERDICT: False Positive

WHAT TO DO?:
For both Locale, and CompositePrinterParser, we can add suppressions, or "allowed reasons" for mutability. Thus the test case can look like this:

assertInstancesOf(javax.time.format.DateTimeFormatter.class, 
                  areImmutable(), 
                  provided(Locale.class).isAlsoImmutable(),
                  provided("javax.time.format.DateTimeFormatterBuilder$CompositePrinterParser").isAlsoImmutable());
               
This will allow the test to pass, ignoring the false positives, but failing again if someone introduces mutability another way.
               
Score so far, ThreeTen 1, Mutability Detector 0. Next up, ResourceZoneRulesDataProvider. (Click here for the next blog entry).

[1] Thanks to OpenGamma, Stephen Colebourne, Richard Warburton, James Gough et al for organising.
[2] This is unsurprising, since one of the technical spec leads of JSR 310, Stephen Colebourne is the creator of JodaTime.




Thursday, 1 March 2012

Announcing Version 0.8 and MutabilityDetector4FindBugs


Version 0.8 of MutabilityDetector has been released and is now available from Maven Central. This release includes bug fixes for the following issues:

Probably the most significant change is that Mutability Detector now recognizes when you have prevented subclassing by only having private constructors (previously you had to mark your class final).

I'm also very pleased to announce the release of a FindBugs plugin for Mutability Detector. This is also available from Maven Central and the project's download page. Just configure your FindBugs installation to pickup the plugin detector during analysis (usually by placing it in the 'plugin' directory) and it will detect classes annotated with @Immutable. The detector will emit a FindBugs warning for each violation of the rules for achieving immutability.

Disclaimer, the Mutability Detector FindBugs plugin is marked as version 0.2 for a reason. It's a first cut, and since I personally use Mutability Detector as a unit testing tool, I don't plan to 'dog food' the plugin so much. So any and all feedback is welcome. Please feel free to email me (Grundlefleck at gmail dot com), file an issue, or post to the Google group.

Friday, 30 December 2011

Imprudence Begets Impetus - No Go For GoDaddy

It's funny how unrelated things can finally give you the motivation to check off an item that's been on the TODO list for a while. In this case it was the online movement to boycott GoDaddy, due to their stance on SOPA.

One of the items I've been meaning to get to Real Soon Now, was to register a domain for Mutability Detector. If for no other reason than making the org.mutabilitydetector package name in the source code actually match Java conventions.

Since I had no existing GoDaddy accounts, I couldn't lend my support to the boycott by moving my domains away. But I could vote with my wallet and go with a competitor. Enter Namecheap, another domain name registrar like GoDaddy, but with a much more amenable outlook on SOPA. Since this was the first domain name purchase I've made, and I'm currently only using it as a redirect, I can't comment on the service in too much detail. But the process was cheap and convenient and their interface was quite friendly and easy to use.

I might do more with www.mutabilitydetector.org in the future, but for now it just redirects to the project homepage, and gives me an ever-so-slight warm and fuzzy for doing a teensy-tiny bit in the efforts against SOPA.

Wednesday, 16 November 2011

Version 0.7 Released

Mutability Detector v0.7 has been released. It is available for download at the project homepage, or from Maven Central with the details:

Group ID Artifact ID Version
org.mutabilitydetector MutabilityDetector 0.7

This release included improvements in two areas: improve support allowing reasons for mutability in unit tests,  and the introduction of analysis on the 'this' reference escaping. It also includes more documentation to help get started with unit testing for immutability (view online or as JavaDoc embedded in the sources jar).

Other more minor improvements include:
Any and all feedback welcome.

Also, if you do use Mutability Detector, and your source is accessible, please get in touch. When releasing new versions I want to have as little inconvenience for users as possible, and currently I only judge this using a pretend client codebase. Potentially I could add your project to my continuous integration setup, and test my trunk code against yours to give more realistic feedback.

There's already a couple of issues I want to include in the next release (issues 17, 18, 19 at least) and of course, there's always "The String Problem", but if there's anything that's affecting you, please do get in touch.



Wednesday, 9 November 2011

New EscapedThisReferenceChecker due for next release

In a previous post I discussed a new check that Mutability Detector should make to provide a more accurate result. The implementation of that checker is now complete, and will be included in an upcoming 0.7 release.

As with any other checks the tool does, the analysis is unsound, that is, it doesn't guarantee correctness. It makes as best a guess as it can, and I hope that empirical study of its use would show that it's good enough to rely on, but it is possible that the tool can be fooled into emitting false positives or false negatives. I thought it would be useful to talk about the checker does and doesn't do.

The checker analyses every constructor of your class, and checks for the following problems.


Assigning 'this' to a field
The tool will render your class mutable if it does any of the following:
  1. assigns 'this' to static field 
    • SomeOtherClass.staticField = this;
  2. assigns 'this' to instance of other object
    • someOtherObject.instanceField = this;
  3. assigns 'this' to field of this instance
    • this.myField = this;
  4. assigns 'this' to static field of this class
    • ThisClass.staticField = this;
Some of these will be a false positive, e.g. with #2, if you construct an object in the constructor, and assign 'this' to one of its fields, before it's published, it'll be safe.  Also assigning 'this' to an instance field is safe as well (or at least, I can't think why it is unsafe), but it's weird enough that I don't want to lose too much sleep trying to be correct about it.


Passing 'this' as a parameter

The tool will render your class mutable if it does any of the following:

  1. passes 'this' as parameter to another method
    • new SomeOtherClass(this);
    • SomeOtherClass.staticMethod(this);
    • someOtherObject.instanceMethod(this);
  2. constructs an inner class (which has an implicit reference to 'this')
    • new InnerClass()
A false positive in this case could be, with #1, passing 'this' to somewhere that only saves the reference, without trying to read any values from it, or with #2, the instance of the inner class might not escape, meaning the 'this' reference hasn't escaped.

Unexpected false positives
The last examples of false positives are ones that can occur with a correct implementation. However, there's one false positive that I didn't expect to create. While the tool is trying to check for 'this' being passed to other methods, it also accidentally raises an issue when instance fields are passed as well.


Here's an example from java.util.Vector:

public Vector(Collection c) {
    elementData = c.toArray();
    elementCount = elementData.length;

    if (elementData.getClass() != Object[].class)
        elementData = Arrays.copyOf(elementData, elementCount, Object[].class);
    }
}



The issue being the call to Arrays.copyOf() with two instance fields. This piece of code will cause Vector to be reported as mutable (among many other reasons), and quite nicely demonstrates the issues involved. Here, the method is single threaded, but it hypothetically could change at any point to publish the references it receives to multiple threads. Without further (expensive) analysis into what Arrays.copyOf() does, I can't conclude whether Vector is immutable or not. I'm choosing to err on the side of caution. There is the scope for a potential whitelist of safe methods, but that's not on the cards just yet.


Another interesting point is that one field is a primitive int, the other is an array. Since the value of the int will be copied to the method, it's probably safe anyway, no matter whether threads are involved or not. However, for the array, the reference to the field will be passed, not the contents. If the array is not completely populated and made visible when it's passed, the receiving method could observe it to change, breaking immutability. This leads on to the wider question of any mutable fields escaping at any point in the object's life cycle, not just in the constructor. For now, I'm embracing the happy accident that the checker will warn you of some potentially dodgy code, and will deliberately leave that particular false positive active.


I don't expect any of these false positives to really cause an issue If you disagree, get in touch, email me, leave a comment, or raise a bug (that may require sign in).



I expect to release a new version within the next week, stay tuned for more news!






Monday, 24 October 2011

Introducing Google Group

A Google Group is now available for questions, discussions or abuse, relating to Mutability Detector. The pertinent details of the group are:

Don't be shy!

Friday, 30 September 2011

A better way of categorising immutability?

Currently Mutability Detector categorises a class' immutability as one of the following:


public static enum IsImmutable {
COULD_NOT_ANALYSE,
DEFINITELY,
MAYBE,
DEFINITELY_NOT;
}

For example:

 java.util.Date is DEFINITELY_NOT immutable.    Reasons:
        (... some reasons ommitted for brevity ...)
        Field [fastTime] can be reassigned within method [setTime]

Putting aside the general error case of COULD_NOT_ANALYSE, the three main categories are kind of... unsatisfying.

For example, if your class has a primitive array field, it will be classed as MAYBE immutable. In this case it's due to the tool being naive and not doing the kind of analysis it could, and there were so many false positives being generated. At the time I thought it was a pragmatic (read "weasel") way to have my cake and eat it too - I got to reduce false positives, without the effort of making the analysis more powerful. The problem is that it's not particularly helpful. When Joe Hypothetical runs the analysis, what is he supposed to concur from seeing that his class is "maybe immutable"?

Another example is the DEFINITELY category. In the discussion following a previous blog post it was pointed out that where I had used a class I considered DEFINITELY immutable, a commenter pointed out that under certain threading conditions, it could be seen to mutate. These categories were not being used to communicate certain, specific, well defined, and ultimately useful information.

So I've been thinking about having different categories, that will hopefully be more useful. I've borrowed a couple, which, because the book is so darn good, are based on the ideas and semantics from Java Concurrency in Practice.

  • IMMUTABLE 
  • EFFECTIVELY_IMMUTABLE
  • NOT_IMMUTABLE
  • COULD_NOT_ANALYSE
Immutable is the strictest category of immutability. All fields are final and the class is final. All fields are immutable (a 'turtles all the way down' kind of arrangement). Instances of this class can be published in any way, under any threading conditions, as the Java Memory Model guarantees that writes to the final field Happens-Before any reads from that field. Implicitly thread safe. Issues include: that old favourite - lazily loading fields that are expensive to compute. Detecting benign data races is Hard. I may have to hard code some common cases, e.g. java.lang.String/java.math.BigDecimal.

Instances of effectively immutable classes can be safely shared across threads, as long as they are safely published (JCIP covers what 'safely published' means in more detail). The class doesn't need to be final. Fields don't need to be final, but they can only be, and must be, assigned in the constructor or private methods called only by the constructor. Fields should all be immutable, effectively immutable, or mutable-but-never-mutated, meaning for example, a field can be of type java.util.List, or an array, but as long as it isn't mutated after construction, it's fine. This includes allowing references to mutable instances to escape, e.g. returning the array field from a method call without copying first. Issues include: the 'method called only from constructor' clause allows for serialisation... but should it? Also, confidently identifying fields which are mutable but never mutated is non-trivial.

Not immutable is everything else: interfaces and abstract classes, classes whose fields can be reassigned after construction, whose mutable fields are published, which assign a mutable type (e.g. List or an array) to a field. The analysis will still need to get better at detecting valid and common patterns, but the more esoteric the code gets, the more likely the analysis is just going to throw it's hands up in the air in desperation and admit mutability.

This scheme, would represent a slight change in direction for the analysis. I can remember when the code was thrown together as part of a uni assignment, that the requirements for being immutable were very strict, and since then I've tried to improve the analysis to allow for more leniency. Now I'm starting to think that strictness may be more of a strength - particularly since it would be possible to manually 'override' the result using a flexible and fluent API available in unit test assertions.

So not a particularly ground breaking or earth shattering change suggestion, but hopefully something that could potentially be more useful, communicative and more broadly understood.  As usual comments and suggestions welcome, thanks for reading.