In Java, the "synchronized" keyword defines a region protected by a mutex associated with the object (or with the class, in the case of static methods). Assuming HLE support is available, an important task for the compiler is to decide whether or not to emit the XACQUIRE and XRELEASE prefixes when translating a synchronized region from bytecode to machine code. With HLE, the critical section is first executed optimistically, without actually acquiring the lock; if that attempt aborts, it is re-executed pessimistically with the lock held. For a synchronized method that is large, either in execution time or in the amount of data it touches, it may prove costly to execute it speculatively only for it to abort and be re-executed. Therefore, while the default should be to emit the HLE prefixes, there are several cases in which they are better left out.
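HLE cannot be requested from Java source; the prefixes are purely a code-generation decision. As a rough software analogue of the "optimistic first, pessimistic on failure" pattern, the sketch below uses the optimistic-read mode of `java.util.concurrent.locks.StampedLock`. This is not hardware lock elision (it only covers readers, and validation is explicit), and the `Point` class is a hypothetical example, but the control flow mirrors what HLE does implicitly.

```java
import java.util.concurrent.locks.StampedLock;

// Software analogue only (not HLE): StampedLock's optimistic read follows the
// same "try without the lock, fall back to the lock on failure" pattern.
class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    // Writers always acquire the lock exclusively.
    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    double distanceFromOrigin() {
        // Optimistic attempt: read the fields without acquiring anything.
        long stamp = lock.tryOptimisticRead();
        double curX = x, curY = y;
        // A write may have intervened: discard the values (the "abort")
        // and redo the reads under a real read lock (the pessimistic path).
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(curX, curY);
    }
}
```

As with HLE, the optimistic path costs almost nothing when there is no interference, while every failed attempt is pure overhead on top of the pessimistic path.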
One such case concerns, as mentioned above, the size of the synchronized region. If the amount of accessed data exceeds the capacity of the core's private caches (usually L1 and L2), the operation will probably be aborted, given how HLE is likely to be implemented. In this case, the prefixes should not be emitted. A noteworthy property of the HLE mechanism is strong isolation: an abort is triggered by a conflict with any load or store, not only with those inside protected regions. Hence, even code outside the synchronized block may trigger aborts. Therefore, even if the "transaction" fits in the private cache, the compiler could still decide not to use HLE prefixes based on its size and on the application's access patterns. Likewise, if the compiler estimates that a transaction will have a long execution time, it may decide that the risk of aborting late in the execution and having to redo all the work is not worth taking.
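The size and duration criteria could be combined into a simple compile-time check. The sketch below is hypothetical: the class name, the cache capacity, the safety margin and the cycle threshold are all illustrative assumptions, and it presumes the compiler already has estimates of the region's data footprint and running time.

```java
// Hypothetical compile-time heuristic: every name and threshold here is an
// illustrative assumption, not taken from any real JVM.
final class ElisionHeuristic {
    private final long privateCacheBytes;  // assumed L1 + L2 capacity of one core
    private final double capacityMargin;   // fraction of that capacity we risk filling
    private final long maxEstimatedCycles; // longer regions risk a costly late abort

    ElisionHeuristic(long privateCacheBytes, double capacityMargin, long maxEstimatedCycles) {
        this.privateCacheBytes = privateCacheBytes;
        this.capacityMargin = capacityMargin;
        this.maxEstimatedCycles = maxEstimatedCycles;
    }

    /** Returns true if XACQUIRE/XRELEASE should be emitted for the region. */
    boolean shouldElide(long estimatedFootprintBytes, long estimatedCycles) {
        // The margin leaves headroom because, under strong isolation, fitting in
        // the cache is necessary but not sufficient to avoid an abort.
        long budget = (long) (privateCacheBytes * capacityMargin);
        if (estimatedFootprintBytes > budget) {
            return false; // likely capacity abort
        }
        if (estimatedCycles > maxEstimatedCycles) {
            return false; // an abort late in the run would waste most of the work
        }
        return true;
    }
}
```

For example, with a hypothetical 32 KiB L1 plus 256 KiB L2 and a margin of 0.5, the budget is 147456 bytes: a region touching 4 KiB would be elided, one touching 1 MiB would not.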
Another issue is I/O: say, for example, that the synchronized method outputs data to the user at some point. An abort and a redo are unacceptable in this case. While the hardware implementation of HLE would most likely abort when encountering such I/O events anyway, it is best not to use lock elision in this scenario, since the speculative attempt is then guaranteed to be wasted work. In general, the compiler should be aware of the conditions under which the hardware triggers an abort, and whenever it can determine on its own that one of those conditions will be met, it should not use HLE.
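A compiler could implement this rule as a scan over the operations inside the synchronized region. The sketch below assumes a hypothetical, highly simplified intermediate representation (`OpKind`) and a hypothetical set of operations that always abort; a real compiler would work on its own IR and take the abort conditions from the processor documentation.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical, highly simplified view of what a synchronized region contains.
enum OpKind { LOAD, STORE, ARITHMETIC, IO, SYSCALL }

final class AbortScanner {
    // Assumed set of operations that always abort a hardware transaction;
    // a real compiler would take this list from the processor documentation.
    private static final Set<OpKind> ALWAYS_ABORTS = EnumSet.of(OpKind.IO, OpKind.SYSCALL);

    /** Returns false if the region provably hits an abort condition. */
    static boolean shouldElide(List<OpKind> region) {
        for (OpKind op : region) {
            if (ALWAYS_ABORTS.contains(op)) {
                return false; // speculation is guaranteed to be wasted work
            }
        }
        return true;
    }
}
```

Regions whose operations cannot be classified (for instance, calls the compiler cannot see into) would, under this sketch, need a conservative default chosen separately.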
Some of these decisions can be made before runtime, while others are better left to the JIT compiler. For example, the level of contention on a lock is generally not known before runtime. Optimistic concurrency control mechanisms are less likely to pay off under high contention, since conflicting threads keep aborting each other; in this case the JIT compiler may decide to recompile the region without the HLE prefixes. Similarly, the precise amount of data accessed in the synchronized region may not be known beforehand.
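The runtime side could look like a per-site contention profile that the JIT consults when recompiling. The sketch below is hypothetical: the class, the warm-up threshold and the contention ratio are assumptions, and how the JIT learns that an acquisition was contended is left abstract (for instance, profiling in a lower tier, since HLE aborts are not reported to the program directly).

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-synchronized-site profile kept by a JIT; all names and
// thresholds are illustrative assumptions.
final class ContentionProfile {
    private final LongAdder uncontended = new LongAdder();
    private final LongAdder contended = new LongAdder();
    private final long warmupSamples;       // minimum evidence before deciding
    private final double maxContendedRatio; // above this, stop eliding

    ContentionProfile(long warmupSamples, double maxContendedRatio) {
        this.warmupSamples = warmupSamples;
        this.maxContendedRatio = maxContendedRatio;
    }

    /** Records one monitor entry at this site and whether it was contended. */
    void recordAcquire(boolean wasContended) {
        if (wasContended) {
            contended.increment();
        } else {
            uncontended.increment();
        }
    }

    /** Whether the next recompilation of this site should keep XACQUIRE/XRELEASE. */
    boolean keepElision() {
        long c = contended.sum();
        long total = c + uncontended.sum();
        if (total < warmupSamples) {
            return true; // too little evidence: keep the optimistic default
        }
        return (double) c / total <= maxContendedRatio;
    }
}
```

The warm-up threshold keeps the JIT from dropping elision after a handful of unlucky samples, and `LongAdder` keeps the counting cheap when many threads update the same profile.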
As presented above, there are a number of scenarios in which the Java compiler could decide not to use HLE prefixes. However, I believe that in the majority of situations their use would be beneficial for performance.