Discussing the design, decisions and discovery involved in developing Mutability Detector, an open-source analysis tool for Java.

Monday, 24 October 2011

Introducing Google Group

A Google Group is now available for questions, discussions or abuse, relating to Mutability Detector. The pertinent details of the group are:

Don't be shy!

Friday, 30 September 2011

A better way of categorising immutability?

Currently Mutability Detector categorises a class' immutability as one of the following:


public static enum IsImmutable {
COULD_NOT_ANALYSE,
DEFINITELY,
MAYBE,
DEFINITELY_NOT;
}

For example:

 java.util.Date is DEFINITELY_NOT immutable.    Reasons:
        (... some reasons ommitted for brevity ...)
        Field [fastTime] can be reassigned within method [setTime]

Putting aside the general error case of COULD_NOT_ANALYSE, the three main categories are kind of... unsatisfying.

For example, if your class has a primitive array field, it will be classed as MAYBE immutable. In this case it's due to the tool being naive and not doing the kind of analysis it could, and there were so many false positives being generated. At the time I thought it was a pragmatic (read "weasel") way to have my cake and eat it too - I got to reduce false positives, without the effort of making the analysis more powerful. The problem is that it's not particularly helpful. When Joe Hypothetical runs the analysis, what is he supposed to concur from seeing that his class is "maybe immutable"?

Another example is the DEFINITELY category. In the discussion following a previous blog post it was pointed out that where I had used a class I considered DEFINITELY immutable, a commenter pointed out that under certain threading conditions, it could be seen to mutate. These categories were not being used to communicate certain, specific, well defined, and ultimately useful information.

So I've been thinking about having different categories, that will hopefully be more useful. I've borrowed a couple, which, because the book is so darn good, are based on the ideas and semantics from Java Concurrency in Practice.

  • IMMUTABLE 
  • EFFECTIVELY_IMMUTABLE
  • NOT_IMMUTABLE
  • COULD_NOT_ANALYSE
Immutable is the strictest category of immutability. All fields are final and the class is final. All fields are immutable (a 'turtles all the way down' kind of arrangement). Instances of this class can be published in any way, under any threading conditions, as the Java Memory Model guarantees that writes to the final field Happens-Before any reads from that field. Implicitly thread safe. Issues include: that old favourite - lazily loading fields that are expensive to compute. Detecting benign data races is Hard. I may have to hard code some common cases, e.g. java.lang.String/java.math.BigDecimal.

Instances of effectively immutable classes can be safely shared across threads, as long as they are safely published (JCIP covers what 'safely published' means in more detail). The class doesn't need to be final. Fields don't need to be final, but they can only be, and must be, assigned in the constructor or private methods called only by the constructor. Fields should all be immutable, effectively immutable, or mutable-but-never-mutated, meaning for example, a field can be of type java.util.List, or an array, but as long as it isn't mutated after construction, it's fine. This includes allowing references to mutable instances to escape, e.g. returning the array field from a method call without copying first. Issues include: the 'method called only from constructor' clause allows for serialisation... but should it? Also, confidently identifying fields which are mutable but never mutated is non-trivial.

Not immutable is everything else: interfaces and abstract classes, classes whose fields can be reassigned after construction, whose mutable fields are published, which assign a mutable type (e.g. List or an array) to a field. The analysis will still need to get better at detecting valid and common patterns, but the more esoteric the code gets, the more likely the analysis is just going to throw it's hands up in the air in desperation and admit mutability.

This scheme, would represent a slight change in direction for the analysis. I can remember when the code was thrown together as part of a uni assignment, that the requirements for being immutable were very strict, and since then I've tried to improve the analysis to allow for more leniency. Now I'm starting to think that strictness may be more of a strength - particularly since it would be possible to manually 'override' the result using a flexible and fluent API available in unit test assertions.

So not a particularly ground breaking or earth shattering change suggestion, but hopefully something that could potentially be more useful, communicative and more broadly understood.  As usual comments and suggestions welcome, thanks for reading.


Sunday, 24 April 2011

Announcing *cough* Better Maven Support

In a previous post I gave details of how to retrieve Mutability Detector in a build system such as Maven. It included details of the snapshot repository that had to be added to your pom.xml (or equivalent file), and the specifier for the latest SNAPSHOT version - just for extra complicatedness.

While trying to use my own project at work, where we use Gradle as a build system, I found that retrieving a snapshot dependency from a Maven repository was not particularly well supported. Since it at best required some non-default behaviour (if it's at all supported, I never found a solution), I leapt into action, much like a ninja mongoose, determined to rectify the situation immediately. So, just a short(!) 8 days later, I'm proud to announce that Mutability Detector is available from Maven Central, as a bonafide release version.

The relevant details now are:

Group IDArtifact IDVersion
org.mutabilitydetectorMutabilityDetector0.6

Mutability Detector is packaged as a single JAR, which can be used for runtime analysis, as an assertion mechanism in unit testing, or as a standalone application launched from the command line.

The 0.6 release which made it to Maven Central also fixed two issues I mentioned in the previous post (discussed here, and here), which are fairly significant.

As always, if you're a user with any questions or issues, please let me know. Well, if you're a user with no issues, it would be nice to hear that you actually exist...

Saturday, 16 April 2011

Announcing Maven support for the latest version of Mutability Detector

Mutability Detector is now available for Maven users*. If you want to try it out on your own code, do the following:

Make the Sonatype Snapshot repository available to your build system. In Maven, this means adding the following snippet within the <project> section of your pom.xml.
<repositories>
  <repository>
    <id>Sonatype Snapshots</id>  
    <url>http://oss.sonatype.org/content/repositories/snapshots</url>
  </repository>
</repositories>

Next add the Mutability Detector dependency to your pom.xml with this snippet:
<dependency>
  <groupId>org.mutabilitydetector</groupId>
  <artifactId>MutabilityDetector</artifactId>
  <version>0.6-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

For other build systems (Gradle, Ivy, etc.) the pertinent bits of information for the dependency are:

Group IDArtifact IDVersion
org.mutabilitydetectorMutabilityDetector0.6-SNAPSHOT

Some points:
  • Source code is not available through Maven, working on fixing this. In the meantime, you can checkout or browse the source, details available at the Googlecode project page.
  • You may get surprising results with the current snapshot, as described in this issue
  • The packaged jar file includes dependencies (e.g. ASM, Classpath Explorer), so there may be some issues putting this jar on your classpath. Working on fixing this also.
As always, any questions, please ask.

Sunday, 13 March 2011

Lightning Talk Introducing Mutability Detector

Here's a video of a lightning talk I gave at work, introducing Mutability Detector to my colleagues. I had to cut it short so as not to break the rules of a maximum 5 minute duration, but I think it gives a good overview. If you find my Scots accent tough to understand, you might prefer to check out a couple of the previous blogs, particularly the introduction to the first version released, and an overview of a more recent release.



The Mutability Detector from youdevisetech on Vimeo.

Sunday, 3 October 2010

Version 0.02 of Mutability Detector now available.

... because nothing says "Progress!" like the increment by one-hundredth of a major release.

Sarcasm aside, in this new release, I definitely see a further step towards "Actually useful". I'll use this blog entry to demonstrate some of the improvements, discuss the planned features for the next release, and list what's still keeping that v1.0 release out of reach.

Unit Testing Support
I consider one of the major barriers to the adoption of a static analysis tool is the 'bootstrapping' cost: the time and resource required to configure and run the analysis as part of the build. There's also the issue of adapting the results to suit your codebase - trimming the false positives, and keeping the results to a meaningful and manageable level. With this problem in mind, I decided to stand on the shoulders of giants, to reuse, rather than reinvent.

JUnit, and its ilk (TestNG being the most prominent rival) have long been a weapon in the arsenal of software development. Without attempting to be controversial, any Java development shop worth its salt will have unit tests regularly run as part of a continuous integration build. The beauty of JUnit is that while it is suitable for large teams with large projects, it also works for a single developer with a small project (about a third of the Java source files in Mutability Detector are JUnit test classes, run regularly through an IDE).

With widespread adoption and an ecosystem of knowledge and tools available, it seems a sensible option to hook into that, rather than trying to roll my own. The v0.02 release of Mutability Detector enables this.

By simply adding the jar to your classpath, it is possible to write this kind of test case:

import static org.mutabilitydetector.junit.MutabilityAssert.assertImmutable;

@Test
 public void theClassIsImmutable() {
    assertImmutable(MyNewClass.class);
}

Which will fail if, for example, someone later comes along and adds a setter method.

That's all it takes. And there's several advantages that come with this approach:

  • There's no lengthy configuration: just add a jar to your development classpath, there's certainly no need to add a runtime dependency on Mutability Detector, unless you want to (there is an API for doing runtime checks).
  • There's no need to annotate your code: meaning nothing except test code has to be cleaned up if you decide to stop using the jar; no dependency on a third-party annotation; and no nasty "magic" comments to enable or disable the check. Developers who employ TDD will be familiar with the idea that test code is the documentation, there's no need for redundant source annotations. 
  • There's also no separate build phase: since the checks are made as part of running the unit tests, if you already do that, there's nothing more to configure. But, if you don't regularly run unit tests don't worry, the option to run the tool from the command line is still there, and always will be.
  • It's extremely selective: you won't have to prune all the results to get the ones for classes you actually care about, the analysis will only be run on classes that you specify as intending to be immutable. This means no noisy false positives.

However, like anything else, there are downsides and caveats, and I'd be remiss for not mentioning them.

  • The assertion mechanism for checking immutability assumes a JUnit-like approach. That is, failed checks will throw a (subclass of) AssertionError on the thread the test is run on. If your unit testing library of choice does not use this kind of mechanism, the tool may not be suitable for you.
  • The flipside of being selective is that it may require a lot of busywork to go around your codebase adding in single-line test methods which only call assertImmutable(). I intend to address this problem (somehow) but support for lessening this burden is not here yet.
  • The unit test task may require more memory than previously. I chose the option of speed over memory footprint, using a single analysis session across all unit tests run in the current JVM process. This prevents the need to re-analyse classes several times, but it does mean that more memory is used to cache the results. Unfortunately, this is not something that I have been able to benchmark in a real-world scenario, but Mutability Detector can analyse the entire rt.jar of Sun's 1.6.12 JDK (~17,000 classes) with less than 96Mb of memory, which gives a decent indication of the memory footprint.

The support for unit testing immutability is the most significant addition to Mutability Detector, and, in my humble opinion, represents a useful and novel way of introducing static analysis into development workflow.


Other Changes With This Release
Along with this change, there are several other improvements:
  • Better detection of setter methods, covering some of the false results described in an earlier post.
  • Support for reporting only on specified classes with a --classlist option. Run java -jar MutabilityDetector.jar -help for further detail.
  • The --match option for the command line interface is now implemented.
  • Improvements to the way classes are loaded, to reduce the amount of PermGen space required.


But, before I get ahead of myself, I must point out, there is a very good reason why this release is not close to a v1.0: java.lang.String is still reported as mutable.

What's Next?
Obviously the analysis of java.lang.String is a problem: since first releasing the source code I have been researching and considering ways to correctly handle the case. Since this is the major stumbling block to a v1.0, and I want to get it right, the work on this is ongoing. Ideas and thoughts are most welcome.

Besides that, there are another couple of items of work which I will be working on in the near future (as time and other commitments permit, of course).

  • Implement the EscapedThisReferenceChecker described in an earlier post.
  • Expose an API within the MutabilityAssert methods to allow more fine grained control to override potential false positives. For example, if your class does have one method which lazy inits a field, and the tool isn't at the level of sophistication required to detect it, there is the mechanism to say "In regards to reassigning this particular field, in this method, don't worry, I know what I'm doing". It's inevitable that there's always going to be code which can fool the tool, this should help ease that pain a little.
  • Investigate the following features: an Eclipse plugin; a FindBugs plugin; a nicer report from the command line tool (perhaps a nice HTML page with the capability of filtering and searching the results); and an ant task to wrap the command line interface. Some of these features may be worthwhile, some may not, so it would be prudent to consider them.
  • Improve the tool to be much smarter about detecting mutations. For example, having a mutable field makes the owning class mutable, whether that field is mutated or not. It should be possible for an immutable class to have mutable fields - as long as they are not actually mutated.
  • The usual bug fixes and incremental improvements.

Hopefully I've demonstrated the main improvements over the last version, and possibly convinced you to try the tool out. As always, if there are any comments, questions or flames, feel free to post to the blog, or email me.

So that's v0.02: roll on v0.03!

Sunday, 5 September 2010

Detecting Setter Methods Ain't So Easy After All

Who could have thought that detecting if a class has setter methods could be so complicated?

Up until a few days ago, the SetterMethodChecker was the most simple checker of all. The meat of the checker could be summed up with the following code:

   if("<init>".equals(this.methodName)) return; // We're not concerned with the constructor 
            
    if(opcode == Opcodes.PUTFIELD) {
         // declare the class as mutable, as a field reference can be changed
    }

This simple check was effective for the most part. It correctly detected the mutability of classes such as this:

     class MutableWithSetterMethod {
          private int foo;

          // constructor, and getter, etc

          public void setFoo(int newFoo) { this.foo = foo; }
     }

This is fine for the most part, but if classes stray from this basic pattern, even just a little, the checker can be fooled into thinking an immutable class... isn't. In my quest for realistic examples of immutable classes, I came across a fairly common example: java.math.BigDecimal from Sun's JDK[1]. The SetterMethodChecker raises the following issues:

     Field [precision] can be reassigned within method [subtract]
     Field [scale] can be reassigned within method [divide]
     Field [scale] can be reassigned within method [divide]
     Field [precision] can be reassigned within method [negate]
     Field [precision] can be reassigned within method [precision]
     Field [precision] can be reassigned within method [setScale]
     Field [precision] can be reassigned within method [scaleByPowerOfTen]
     Field [stringCache] can be reassigned within method [toString]
     Field [intVal] can be reassigned within method [inflate]
     Field [intCompact] can be reassigned within method [readObject]
     Field [intVal] can be reassigned within method [stripZerosToMatchScale]
     Field [scale] can be reassigned within method [stripZerosToMatchScale]
     Field [precision] can be reassigned within method [stripZerosToMatchScale]
     Field [intCompact] can be reassigned within method [stripZerosToMatchScale]
     Field [intVal] can be reassigned within method [roundThis]
     Field [intCompact] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [roundThis]
     Field [precision] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [dropDigits]

Nineteen examples of how BigDecimal is mutable, apparently. Not really of course, BigDecimal is immutable, but the code is complicated enough to fool the checker. There are three patterns of field reassignment which a) retain immutability b) are common use cases and c) require more sophistication to detect. I hope to fix the SetterMethodChecker to recognise these patterns and handle them accordingly.

Valid Setter Method Patterns


Reassigning fields in private methods called only from the constructor.
As detailed above, the method 'roundThis' reassigns four fields of BigDecimal. However, the method is private, and only ever called from a constructor - in 7 of the 18 available BigDecimal constructors. This pattern was also discovered in the JodaTime libraries, which use a private setFields() method for serialisation. If the mutating method is guaranteed[2] to only be called before the constructor completes, the class can be safely called immutable.

Solution? A setter method makes the class mutable unless it is private, and called only from the constructor.

Reassigning fields on references to an object other than 'this'.
A common idiom for immutable types is, rather than having setter methods mutating internal state, is to construct new instances, modified from the original as required, and returned from the method. A common example is the likes of String.toUpperCase(), which, rather than mutating the char array of this String, copies it, converting the chars to upper case, and returns a new String instance. The toUpperCase() is not the most representative example, as it misses a common case - where an instance is created, and fields are reassigned after construction[3]. Consider the following code (an immutable equivalent of the earlier snippet):

     public class ImmutableWithNoSetterMethod {
          private int foo;
          public ImmutableWithNoSetterMethod(int foo) {    this.foo = foo; }
            
          public int foo() { return this.foo; }
            
          public static ImmutableWithNoSetterMethod newAndMutatedInstance() {
               ImmutableWithNoSetterMethod toReturn = new ImmutableWithNoSetterMethod(0);
               toReturn.foo = 42; // mutating an instance after construction
               return toReturn;
          }
     }

Here the newly created instance is mutated after construction - that's hardly a model of immutability! However, it's safe in this example, and in the case of BigDecimal. So how is this safe? Well, the instance is modified with private access, and it's done before the reference is published. So perhaps this case falls in the grey area between immutability and thread-safety. I would argue that, since non-final fields don't qualify for the Java Memory Model's view of immutability anyway, that this kind of pattern qualifies for any useful definition of immutability.

In the code above, despite the mutation of the already constructed object, there is no way for a client of the code to view a change in the object. Multithreaded code is not an issue for the newAndMutatedInstance() method, since nothing is published to multiple threads, and concurrent calls to the method will each construct their own stack, with no shared references[4]. Thus there is no danger in mutating the local variable toReturn. However, the SetterMethodChecker is fooled. Partly because the check is very basic: it looks for the PUTFIELD opcode, but doesn't investigate which object is having its field reassigned. Thus the code this.foo = 42 and toReturn.foo = 42 are treated equally.

Solution? Check that the field reassignment is performed on the 'this' object, either directly or through something reached by the this reference's object graph (i.e. `this.foo.bar.value = 42;` is still a mutation). This checker also has to collaborate with the (as yet unwritten) EscapedThisReferenceChecker to ensure that the reference isn't published to multiple threads before it's mutated. So if the ImmutableWithNoSetterMethod constructor or the newAndMutatedInstance() method is modified to call a method which shares the the reference before the mutation, the class will be dubbed mutable.

Reassignment of fields as a lazy-initialisation and caching strategy.
This is the main reason[5] that java.lang.String is found to be mutable. The result of hashCode() is computed lazily, and cached. But to the SetterMethodChecker, it just makes the class look mutable. The same goes for BigDecimal: the result of toString() is only computed when called the first time. The code for this pattern in BigDecimal is almost as simple an example as can be made:

     class BigDecimal // etc {
          private volatile transient String stringCache = null;
        
          // etc
        
          public String toString() {
               if (stringCache == null)
                    stringCache = layoutChars(true);
               return stringCache;
          }
     }

This example can be generalised, in psuedo-code, to the following idiom:

     class {
          private X field; // initialised to default value 
                                   // whatever that is for type X
        
          getField(){
               if(field == default value) {
                    compute value of field
               }                return field;           }      }

This kind of pattern satisfies immutability - the object can not seen to change by its clients. String's hashCode() and BigDecimal's toString() will always return the same result, even though the object was not fully formed on completion of the constructor. So what appears to be downright mutability... isn't. The SetterMethodChecker is fooled once again. I'm almost feeling sorry for it.

Solution? The scope surrounding a field reassignment should be investigated to find if there is a "if equal to default value" condition guarding the change. This isn't as satisfying as it should be, since fields could technically be initialised to an "uncalculated" marker (e.g. private String myField = "UN-INITIALISED";). Though I predict the complexity of detecting this pattern doesn't lie with what is being compared to, I'm happy planning the trade-off that fields must be initialised to the default value to qualify for the lazy initialisation exemption.

A Note On Lazy Initialisation and Data Races

If you've been reading about lazy initialisation (hello to someone other than me!) you may have been thinking "Well, String's hashCode() and BigDecimal's toString() aren't synchronised, surely there's a potential data race here?". If you have, you are right. Technically, myBigDecimal.toString() could be called across concurrently executing threads, and the second caller could execute the comparison while the first is computing, but has not yet assigned a result to the stringCache field (shown below in the diagram).







This could result in stringCache being assigned to multiple times. However, it is safe (I think). This example can be described as a "benign data race": yes the field can be reassigned, but only to the same value, each and every time. String's hashCode() is calculated with only immutable state, it doesn't matter that there is a data race to assign the field, the result will always be the same. Same goes for BigDecimal's toString(). Now, I say I only "think" this is safe, because there's a whole lotta stuff in the Java Memory Model (which I have investigated, but hasn't "sunk in" yet) that can throw spanners in the works. Because there is no synchronisation of either method, there is no guarantees of happens-before in terms of the assignment. This can mean that while the field is being assigned, other threads could view the field with stale data (i.e. a reference is still null) or even worse, the reference points to an object which isn't in a fully constructed state (i.e. fields are not as they will be when the constructor completes). As far as I can tell, BigDecimal and String are protected from this because the field assigned in the benign data race is immutable, and the Java Memory Model has some guarantees around visibility in that area. Or, in the case of BigDecimal, because the field has the volatile modifier - I'm not sure yet. Needless to say I can't commit to a solution for this problem until I can be confident of the safety.

So, we have three patterns which I think are common enough to need to be handled correctly before I can release a v1.0 of Mutability Detector. They've been added to the list. As I said, who could have thought a check for a setter method could be so complicated?


[1] Sun JDK 1.6 Update 12
[2]
Ignoring reflection as a possibility for calling methods.
[3]
toUpperCase() uses a String constructor which allows setting all the necessary changes in one call.
[4] This is what's known as "stack confinement", discussed in the excellent Java Concurrency In Practice.
[5] There is another reason, but it's caused by the primitiveness of the tool, rather than what I'd call a complicated problem.