I've been researching a question that was presented to me: How to write a function that takes a string as input and returns a string with spaces between the characters. The function is to be written to optimize performance when it is called thousands of times per second.
I know that .net has a function called String.Join
, to which I may pass in the space character as a separator along with the original string.
Barring the use of String.Join
, I can use the StringBuilder
class to append spaces after each character.
Another way to accomplish this task is to declare a character array with 2*n-1 characters (You have to add n-1 characters for the spaces). The character array can be filled in a loop and then passed to the String constructor
.
I've written some .net code that runs each of these algorithms one millions times each with the parameter "Hello, World"
and measures how long it takes to execute. Method (3) is much, much faster than (1) or (2).
I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .net function such as String.Join
should yield good performance. Why is using String.Join
so much slower than doing the work by hand?
public static class TestClass
{
// 491 milliseconds for 1 million iterations
public static string Space1(string s)
{
return string.Join(" ", s.AsEnumerable());
}
//190 milliseconds for 1 million iterations
public static string Space2(string s)
{
if (s.Length < 2)
return s;
StringBuilder sb = new StringBuilder();
sb.Append(s[0]);
for (int i = 1; i < s.Length; i++)
{
sb.Append(' ');
sb.Append(s[i]);
}
return sb.ToString();
}
// 50 milliseconds for 1 million iterations
public static string Space3(string s)
{
if (s.Length < 2)
return s;
char[] array = new char[s.Length * 2 - 1];
array[0] = s[0];
for (int i = 1; i < s.Length; i++)
{
array[2*i-1] = ' ';
array[2*i] = s[i];
}
return new string(array);
}
Update: I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.
Why is using String.Join so much slower than doing the work by hand?
The reason String.Join
is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>
.
String.Join<T>(string, IEnumerable<T>)
(the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading flexibility for pure performance and speed.
Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.
In this case, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join
. In this case, since you know, exactly, in advance what is required, you have the ability to avoid all of the overhead required to have a flexible design by pre-allocating an array of exactly the right size, and building the results manually.
You'll find that, in general, it's often possible to write a method that will out perform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that you can't optimize for a specific input scenario.
Your String.Join
example works on an IEnumerable<char>
. Enumerating an IEnumerable<T>
with foreach
is often slower than executing a for
loop (it depends on the the collection type and other circumstances, as Dave Black pointed out in a comment). Even if Join
uses a StringBuilder
, the internal buffer of the StringBuilder
will have to be increased several times, since the number of items to append is not known in advance.
IEnumerable<T>
. Taking a look via reflector, it does indeed use a StringBuilder, but it does not/cannot set a capacity. Only LINQ gives you a way to get the size of an enumerable, but it is an O(n) operation - Ed S. 2012-04-05 17:16
String.Join
without casting it to array or enumerable, but I was wrong. Just using String.Join(" ", s)
runs in about 40ms compared to 50ms for the code that I wrote using a character array - Daniel Allen Langdon 2012-04-05 17:21
params object[]
, but only supplying one value, so it does nothing, which of course... makes it fast :) If you need it to be fast (seems like you do) just write your own function that can take advantage of knowledge you have that the library does not - Ed S. 2012-04-05 17:22
string[]
would have been fast, but it is not - Ed S. 2012-04-05 17:26
really
feel like a tool. You should deduct some points from my rep for that :- - Daniel Allen Langdon 2012-04-05 17:31
string[]
involved; however the string could be converted to a char[]
with s.ToCharArray()
and passed to the Join
overload with the object[]
parameter - Olivier Jacot-Descombes 2012-04-05 17:38
string[]
. It doesn't really matter though; they're all significantly slower than doing it yourself - Ed S. 2012-04-05 17:46
string
s. It's also something that a lot of people do often in contexts in which performance is not an issue. it's a convenience method, not a performance method, by design - Servy 2012-04-05 19:04
Join
, not a string[]
- Olivier Jacot-Descombes 2012-04-05 19:11
string[]
- Ed S. 2012-04-05 19:20
foreach
; however, for
is about twice as fast for lists as foreach
- Olivier Jacot-Descombes 2015-08-20 19:49
Since you aren't using the Release build (which should have optimizations checked by default) and/or you're debugging through visual studio then the JITer will be prevented from making a lot of it's optimizations. Because of this you're just not getting a good picture of how long each operation really takes. Once you add in the optimizations you can get the real picture of what's going on.
It's also important that you not be debugging in visual studio. Go to the bin/release folder and double click the executable entirely outside of visual studio.
String.Join
- Ed S. 2012-04-05 17:14
In your first method, you are using the overload of String.Join
that operates on an Enumerable, which requires that the method walk the characters of the string using an enumerator. Internally, this uses a StringBuilder
as the exact number of characters is unknown.
Have you considered using the String.Join
overload that takes a string (or string array) instead? That implementation allows a fixed length buffer to be used (similar to your third method) along with some internal unsafe string operations for speed. The call would change to - String.Join(" ", s);
Without actually doing the legwork to measure, I would expect this to be on par or faster than your third approach.
The bad performance is not coming from String.Join
, but from the way you handle each character. In this case, since characters have to be handled individually, your first method will create much more intermediate strings and the second method suffers from two .Append
method calls for each character. Your third method does not involve a lots of intermediate strings or methods calls and that's the reason why your third method is the fastest.
When you have passed an IEnumerable
to String.Join
, it has no idea on how much memory needs to be allocated. I allocates a chunk of memory, resizes it if it is insufficient and repeats the process until it gets enough memory to accommodate all the strings.
The array version is faster because we know the amount of memory allocated well ahead.
Also please not that when you are running the 1st version, GC might have occurred.