#performance #java #memory
In this article we will look at different ways to identify and fix unnecessary memory consumption by the JVM. This is useful for JVM applications that store huge amounts of data in memory, and can help avoid OutOfMemoryError when the application data cannot be partitioned and scaled horizontally.
In part 2, we take a look at a couple of techniques used to reduce JVM heap size. The techniques are listed in order from least invasive to most, so that the optimization does just enough for the application to work and keeps it as maintainable as possible. The example application can be found in part 1 or in the example source code.
Memory efficient data structures
As mentioned in part 1, we could fit around 204500 orders/54600 users into a 64M heap. Let’s take a look at how we can improve this.
The least intrusive change is usually swapping the data structures that hold objects. These usually conform to one of the Java collection interfaces, so changing them does not require much change in the code. Let’s see how much memory we can save by doing so.
The first obvious candidate is ArrayList, as its no-arg constructor creates a list with a capacity of 10 after the first insertion. However, we have around 200K orders across 50K users, which brings the average number of orders per list to 4. We can be aggressive and set the initial capacity to 1. That gives us 209000 orders/54900 users - an easy win, but only 2%.
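A minimal sketch of the idea (the class and field names here are illustrative, not the article's actual code): each per-user list is created with an explicit initial capacity of 1 instead of relying on the default that grows to 10 on first insertion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PresizedLists {
    // Hypothetical orders-by-user index; presizing each list to 1 avoids
    // allocating a 10-slot backing array for users with a single order.
    static final Map<String, List<String>> ORDERS_BY_USER = new HashMap<>();

    static void addOrder(String user, String order) {
        ORDERS_BY_USER
                .computeIfAbsent(user, key -> new ArrayList<>(1))
                .add(order);
    }
}
```

The list still grows automatically when a user has more orders, so the only cost of the smaller initial capacity is an occasional extra array copy.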
The second optimization has to do with Java’s HashMap, which uses a simple chaining implementation. That is memory efficient only if the key collision rate is low. In our case the collision rate for both order and user ids is about 18% (see this test), so there might be potential here. Maps that use open addressing remain memory efficient even with high collision rates. Any open addressing implementation should do, as the collision resolution strategy (probing, Robin Hood, cuckoo, etc.) should not significantly impact memory footprint.
I’ll use Agrona’s Object2ObjectHashMap to test this. Agrona’s implementation also does not create extra Node objects, as it stores keys and values at even and odd indexes of the same array. Changing the map implementation (example 8) gives us 231500 orders/56000 users, which is a significant 11% increase.
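To make the interleaved-array idea concrete, here is a toy open-addressing map with linear probing that keeps keys at even and values at odd indexes of one array. This is only an illustration of the layout, not Agrona's actual implementation: it has a fixed capacity and supports no resizing or removal.

```java
// Toy open-addressing map: keys at even, values at odd indexes of one
// array, so no per-entry Node objects are allocated. Illustration only
// (fixed capacity, no resize/remove) - not Agrona's real code.
class InterleavedMap {
    private final Object[] entries; // [k0, v0, k1, v1, ...]

    InterleavedMap(int capacity) {
        this.entries = new Object[capacity * 2];
    }

    void put(Object key, Object value) {
        int index = (key.hashCode() & Integer.MAX_VALUE) % (entries.length / 2) * 2;
        // Linear probing: on collision, step to the next key slot.
        while (entries[index] != null && !entries[index].equals(key)) {
            index = (index + 2) % entries.length;
        }
        entries[index] = key;
        entries[index + 1] = value;
    }

    Object get(Object key) {
        int index = (key.hashCode() & Integer.MAX_VALUE) % (entries.length / 2) * 2;
        while (entries[index] != null) {
            if (entries[index].equals(key)) {
                return entries[index + 1];
            }
            index = (index + 2) % entries.length;
        }
        return null;
    }
}
```

The memory saving comes purely from the layout: a chaining map pays for a Node object (header plus three references) per entry, while here an entry is just two array slots.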
Using minimal types to store data
Let’s take a look at the Order objects themselves.
The first step is to look at the structure of the stored values. We want them to have as flat a structure as possible, minimizing extra object header overhead. In our case we would just get rid of a boxed type. This brings us to 245900 orders/56600 users, which is another 6% - quite a significant increase for this little work.
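To illustrate the flattening (the article does not say which field was boxed, so the field name here is an assumption): replacing a boxed wrapper with a primitive removes one object header and one reference per order.

```java
// Hypothetical flattened Order: the assumed boxed field (price) becomes
// a primitive int, dropping a ~16-byte object header and a 4-8 byte
// reference per instance compared to an Integer field.
final class FlatOrder {
    final String user;
    final int price; // was Integer in the unflattened version

    FlatOrder(String user, int price) {
        this.user = user;
        this.price = price;
    }
}
```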
Now that the low-hanging fruit has been plucked, we should take a closer look at Order and evaluate whether we represent the data with appropriate types.
In our case, we have another field which is secretly boxed: Order#id. At the moment, this field is just a number, but wrapped in a String. Most of the time there is no need for an id field to be a string, as we can represent enough unique identifiers with a simple number (or a tuple of numbers) and use Long keys in the map:
public final class Order {
private final long id;
private final String user;
private final int articleNr;
private final int count;
private final int price;
private final String address;
// rest of class omitted for clarity
}
Map<Long, Order> ordersById = new Object2ObjectHashMap<>();
By doing this, we can fit 276100 orders/57900 users, and that’s another 12% increase. However, we are now forced to waste some memory on autoboxing the long keys, which we can fix by using a primitive collection. There are ready-made libraries for this, for example Trove collections. Here I’ll use Agrona, since I’ve already used it in the example before:
Long2ObjectHashMap<Order> ordersById = new Long2ObjectHashMap<>();
That’s 288300 orders/58300 users, an extra 4%, and the final code can be seen in example 9.
Data tokenization and shared object pool
At this point most of the memory is occupied by the address String. The problem here is that addresses contain lots of repeating information: cities, post codes, street numbers, names and surnames, etc. What we can try to do is break the address apart into separate components (tokens) and share the repeating instances between Order values. The JVM even has a built-in String pool for us to use!
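A quick demonstration of what the string pool buys us: two distinct heap instances with equal content collapse to a single canonical instance after intern().

```java
class InternDemo {
    // new String(...) always allocates a fresh heap instance,
    // even when an equal string already exists.
    static boolean distinctInstances() {
        return new String("10 Downing St") != new String("10 Downing St");
    }

    // intern() returns the one canonical instance from the JVM string
    // pool, so equal contents yield reference-equal results.
    static boolean samePooledInstance() {
        return new String("10 Downing St").intern()
                == new String("10 Downing St").intern();
    }
}
```

Every duplicate that is interned away frees the full cost of a String: object header, length, hash field and the backing character array.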
I’ve decided to break up the address like this:
public final class Order {
private final long id;
private final String user;
private final int articleNr;
private final int count;
private final int pricePence;
private final String addressNumber;
private final String addressStreet;
private final String addressCity;
private final String addressRegion;
private final String addressPostCode;
// rest of class omitted for clarity
}
Without pooling, we can only fit 144200 orders/50000 users into memory, which again proves that object header overhead is very significant. It might seem like a step back, but let’s turn String pooling on in the Order constructor (example 10):
@JsonCreator
public Order(
// some fields omitted
@JsonProperty("addressNumber") String addressNumber,
@JsonProperty("addressStreet") String addressStreet,
@JsonProperty("addressCity") String addressCity,
@JsonProperty("addressRegion") String addressRegion,
@JsonProperty("addressPostCode") String addressPostCode) {
// some fields omitted
this.addressNumber = addressNumber.intern();
this.addressStreet = addressStreet.intern();
this.addressCity = addressCity.intern();
this.addressRegion = addressRegion.intern();
this.addressPostCode = addressPostCode.intern();
}
which lets us fit 386000 orders/61000 users into memory (+34%)! Since version 7, the JVM no longer allocates interned String objects in native memory, therefore this gain is a fair comparison (this can be confirmed by running the JVM with the flags -XX:+UnlockDiagnosticVMOptions, -XX:NativeMemoryTracking=summary, -XX:+PrintNMTStatistics and inspecting memory usage).
As far as I know, the String pool is the only JVM pool usable by applications (in contrast, the Integer cache only covers values from -128 to 127). However, it shouldn’t be hard to pool any user-defined object. The usefulness of this method depends on how much of the data repeats and how it can be split. Developers considering this technique should bear in mind the object overhead of the internal pool implementation (e.g. be careful with tree-like collections, which allocate extra objects for every node).
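A minimal sketch of such a user-defined pool (the class name is hypothetical). It works for any value type with proper equals/hashCode, and it also illustrates the overhead warning above: HashMap allocates a Node object per entry, so the pool only pays off when values repeat often.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a generic intern pool for user-defined objects. Note the
// internal cost: HashMap creates one Node object per distinct value.
final class HashPool<T> {
    private final Map<T, T> pool = new HashMap<>();

    // Returns the canonical instance for this value, registering it
    // on first sight.
    T intern(T value) {
        T existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

Usage mirrors String.intern(): pass every freshly deserialized value through intern() and keep only the returned reference.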
Data encoding
The compression approach can save some memory at the expense of processing power and code maintainability, as switching from JDK types to custom ones limits the usage of library methods.
Since we only use capital letters and digits for ids, we can encode the strings using base 36 encoding. Base 36 values take only 6 bits each, instead of the usual 8 bits (16 bits before Java 9) per character, therefore it is expected to save about 25% of memory (60% before Java 9):
public Order(
@JsonProperty("id") long id,
@JsonProperty("user") byte[] user,
// rest omitted
and we only need to store a reference to the string as a map key (byte[] values are not suitable as map keys, as they do not override hashCode/equals properly and are mutable):
ordersByUser
.computeIfAbsent(
new Base36String(order.getUser()),
key -> new ArrayList<>(1)
)
.add(order);
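The actual Base36String lives in example 11; a plausible minimal sketch of such a key wrapper (the class body here is an assumption) just needs value-based equals/hashCode over the wrapped bytes:

```java
import java.util.Arrays;

// Sketch of a byte[] map-key wrapper with value semantics; the real
// Base36String (example 11) would also do the base 36 packing.
final class Base36String {
    private final byte[] bytes;

    Base36String(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Base36String
                && Arrays.equals(bytes, ((Base36String) other).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

With this wrapper, two keys built from equal byte arrays hash to the same bucket and compare equal, which plain byte[] keys would not.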
More compact encoding (example 11) and not storing an extra reference on Order objects allows us to stuff the JVM with 481300 orders/62800 users (+24%).
Ditching unnecessary references
Now we can try to get rid of the extra String object references and store only byte[]. If we ever need a String object later in the code, we can allocate a short-lived object for that operation or use a view wrapper. This is somewhat risky, because arrays in Java are not immutable, but let’s see how much memory we can save that way.
Simple type replacement, however, would not work here, as our strings are pooled objects. Therefore, we need to roll our own pool for byte[]. This is a bit tricky in Java, because Java arrays do not properly implement hashCode or equals, but we can leverage array ordering and a TreeSet:
public class TreeBasedByteArrayPool implements ByteArrayPool {
private final NavigableSet<byte[]> values = new TreeSet<>(Arrays::compare);
@Override
public byte[] intern(byte[] in) {
boolean newValue = values.add(in);
if (newValue) {
return in;
}
else {
return values.floor(in);
}
}
}
and inject pool during deserialization like so:
@JsonCreator
public Order(
@JsonProperty("id") long id,
// more fields here ...
@JacksonInject ByteArrayPool byteArrayPool
)
ObjectMapper mapper = new ObjectMapper();
mapper.setInjectableValues(new InjectableValues.Std().addValue(
ByteArrayPool.class,
new TreeBasedByteArrayPool()
));
This approach (example 12) doesn’t help us much: by getting rid of some references in the Order object, we’ve created many more references within the TreeSet, and thus for this use case we can actually fit fewer objects in memory. Even a linear-time array-based implementation does not reduce the memory footprint much, which means that for our data this tree-like pool implementation is not sufficient. A more elaborate pool implementation, based on open addressing, is used in example 14 of part 3.
Object compression
Now, what if we could compress the whole Order object? In our case we must only compress the fields which are not shared between Order instances (the shared address fields are kept as plain references), hence we end up with something like this (example 13):
public class OrderCompressed {
private final byte[] compressedPart;
// shared fields omitted
// constructor omitted
public static OrderCompressed compress(Order order) {
ByteArrayOutputStream out = new ByteArrayOutputStream();
try (
GZIPOutputStream gzipOut = new GZIPOutputStream(out);
ObjectOutputStream objectOut = new ObjectOutputStream(gzipOut);
) {
objectOut.writeObject(order);
}
catch (IOException e) {
throw new RuntimeException("Unable to serialize " + order, e);
}
byte[] bytes = out.toByteArray();
return new OrderCompressed(
bytes,
order.getAddressNumber(),
order.getAddressStreet(),
order.getAddressCity(),
order.getAddressRegion(),
order.getAddressPostCode()
);
}
}
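For completeness, reading such an object back is the mirror image of compress(): wrap the stored bytes in a GZIPInputStream plus ObjectInputStream. A hedged round-trip sketch (the method names are mine, and a String stands in for the omitted Order fields):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Round-trip sketch for the compressed part; decompress() is the
// assumed inverse of the compress() shown above.
class CompressRoundTrip {
    static byte[] compress(Serializable value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream objectOut =
                     new ObjectOutputStream(new GZIPOutputStream(out))) {
            objectOut.writeObject(value);
        }
        return out.toByteArray();
    }

    static Object decompress(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream objectIn = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return objectIn.readObject();
        }
    }
}
```

Note that every access to a compressed field now costs a full inflate plus deserialization, which is the processing-power trade-off mentioned at the start of this section.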
In our case this also does not give any memory reduction, as we are only compressing a couple of primitive fields. This technique is much more useful where it is impossible to share much data between the retained objects, or where the object does not contain references. It can prove more useful in conjunction with the manual memory management described in the next part.
Conclusion
Using appropriate data structures, variable types, compression and pooling we have increased the amount of data that fits into the same heap by 135%, or 2.35 times (fitting 481300 orders instead of 204500).
Further reduction can be achieved by bulk-compressing larger chunks of data together or by moving off-heap, which is covered in part 3. These, however, impact the code and architecture quite a bit and must be considered alongside data sharding approaches.