When benchmarking a simple QuickSort implementation in Java, I faced unexpected humps in the n vs time
graphics I was plotting:
I know HotSpot will attempt to compile code to native after it seems certain methods are being heavily used, so I ran the JVM with -XX:+PrintCompilation
. After repeated trials, it seems to be compiling the algorithm's methods always in the same way:
@ iteration 6 -> sorting.QuickSort::swap (15 bytes)
@ iteration 7 -> sorting.QuickSort::partition (66 bytes)
@ iteration 7 -> sorting.QuickSort::quickSort (29 bytes)
I am repeating the above graphic with this added info, just to make things a bit clearer:
At this point, we must all be asking ourselves : why are we still getting those ugly humps AFTER the code is compiled? Maybe it has something to do with the algorithm itself? It sure could be, and luckily for us there's a quick way to sort that out, with -XX:CompileThreshold=0
:
Bummer! It really must be something the JVM is doing in the background. But what?
I theorized that although code is being compiled, it may take a while until the compiled code actually starts to be used. Maybe adding a couple of Thread.sleep()
s here and there could help us a bit sorting this issue out?
Ouch! The green colored function is the QuickSort's code ran with a 1000ms internal between each run (details in the appendix), while the blue colored function is our old one (just for comparison).
At fist, giving time to the HotSpot only seems to make matters worse! Maybe it only seems worse by some other factor, such as caching issues?
Disclaimer : I am running 1000 trials for each point of the shown graphics, and using System.nanoTime()
to measure the results.
Some of you may at this stage wonder how the use of sleep()
might distort the results. I ran the Red Plot (no native compilation) again, now with the sleeps in-between:
Scary!
Here I present the QuickSort
code I am using, just in case:
public class QuickSort {
public <T extends Comparable<T>> void sort(int[] table) {
quickSort(table, 0, table.length - 1);
}
private static <T extends Comparable<T>> void quickSort(int[] table,
int first, int last) {
if (first < last) { // There is data to be sorted.
// Partition the table.
int pivotIndex = partition(table, first, last);
// Sort the left half.
quickSort(table, first, pivotIndex - 1);
// Sort the right half.
quickSort(table, pivotIndex + 1, last);
}
}
/**
* @author http://en.wikipedia.org/wiki/Quick_Sort
*/
private static <T extends Comparable<T>> int partition(int[] table,
int first, int last) {
int pivotIndex = (first + last) / 2;
int pivotValue = table[pivotIndex];
swap(table, pivotIndex, last);
int storeIndex = first;
for (int i = first; i < last; i++) {
if (table[i]-(pivotValue) <= 0) {
swap(table, i, storeIndex);
storeIndex++;
}
}
swap(table, storeIndex, last);
return storeIndex;
}
private static <T> void swap(int[] a, int i, int j) {
int h = a[i];
a[i] = a[j];
a[j] = h;
}
}
as well the code I am using to run my benchmarks:
public static void main(String[] args) throws InterruptedException, IOException {
QuickSort quickSort = new QuickSort();
int TRIALS = 1000;
File file = new File(Long.toString(System.currentTimeMillis()));
System.out.println("Saving @ \"" + file.getAbsolutePath() + "\"");
for (int x = 0; x < 30; ++x) {
// if (x > 4 && x < 17)
// Thread.sleep(1000);
int[] values = new int[x];
long start = System.nanoTime();
for (int i = 0; i < TRIALS; ++i)
quickSort.sort(values);
double duration = (System.nanoTime() - start) / TRIALS;
String line = x + "\t" + duration;
System.out.println(line);
FileUtils.writeStringToFile(file, line + "\r\n", true);
}
}
Well, it seems that I sorted the issue out on my own.
I was right about the idea that compiled code could take a while to kick in. The problem was a flaw in the way I actually implemented my benchmarking code:
if (x > 4 && x < 17)
Thread.sleep(1000);
in here I assumed that as the only "affected" area would be between 4 and 17, I could go on and just do a sleep over those values. This is simply not so. The following plot may be enlightening:
Here I am comparing the original no compilation function (red) to another no compilation function, but separed with sleeps in-between. As you may see, they work in different orders of magnitude, that meaning that mixing results of code with and without sleeps will yield unsound results, as I was guilty of doing.
The original question remains unaswered, yet. What causes the humps to ocurr even after compilation took place? Let's try to find that out, putting a 1s sleep in ALL points taken:
That yields the expected result. The odd humps were happening the native code still didn't kick in.
Comparing a sleep 50ms with a sleep 1000ms function yields yet again, the expected result:
(the gray one seems to still show a bit of delay)