Home | About Me | Developer PFE Blog | Become a Developer PFE

Contact

Categories

On this page

Archive

Blogroll

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Sign In

# Friday, July 23, 2010
Friday, July 23, 2010 2:01:29 AM (Central Daylight Time, UTC-05:00) ( Development | Performance )

This is an old topic, but as I work with developers more and more, I find that there is still some gray area on this topic.

Concatenating strings is probably one of the most comment tasks that most developers perform in their work-lives. If you've ever done some searching around, though, you'll find that there are 2 main ways that people perform those concatenations:

First, you can use the "+" or the "&" operator with the native string object:

    Sub ConcatWithStrings(ByVal max As Int32)
        Dim value As String = ""
        For i As Int32 = 0 To max
            value += i.ToString()
        Next
    End Sub

Second, you can use the System.Text.StringBuilder() object to perform the concatenation:

    Sub ConcatWithStringBuilder(ByVal max As Int32)
        Dim value As New System.Text.StringBuilder()
        For i As Int32 = 0 To max
            value.Append(i.ToString())
        Next
    End Sub

So, which is better? Well, if you do any Google'ing or Live Search'ing on the topic, you'll find some great articles on the topic. Mahesh Chand has a great article, which I'll quote:

"You can concatenate strings in two ways. First, traditional way of using string and adding the new string to an existing string. In the .NET Framework, this operation is costly. When you add a string to an existing string, the Framework copies both the existing and new data to the memory, deletes the existing string, and reads data in a new string. This operation can be very time consuming in lengthy string
concatenation operations."

Now, I'm a big fan of short descriptions like the above, but I always find that without actually showing what is happening behind the scenes, you might lose some folks in the translation. To illustrate what happens behind the scenes, I wrote a quick console application that uses the first code sample - and then took a memory dump using ADPlus to show what actually gets kept around in memory after the String level concatenation.

I'm not going to go into a lot of detail on what I did to mine through the memory dump, but if you're interested in getting down to this detail on your own - I highly recommend the book Debugging Microsoft .NET 2.0 Applications. After you read that great book, you should read Tess's Great Blog to get even more practice on this area.

In any case, what was uncovered after the memory dump was that with the listing in the first example, there will be a separate String object placed into memory each time you perform a concatenation. Here is an excerpt from that dump:

# Size Value

1 28 "0123"
1 28 "01234"
1 32 "012345"
1 32 "0123456"
1 36 "01234567"
1 36 "012345678"
1 40 "0123456789"
1 44 "012345678910"
1 48 "01234567891011"
1 52 "0123456789101112"
1 56 "012345678910111213"
.....
1 124 "0123456789101112131415161718192021222324252627282930"
1 128 "012345678910111213141516171819202122232425262728293031"
1 132 "01234567891011121314151617181920212223242526272829303132"
1 136 "0123456789101112131415161718192021222324252627282930313233"
1 140 "012345678910111213141516171819202122232425262728293031323334"
1 144 "01234567891011121314151617181920212223242526272829303132333435"
65 17940 ""012345678910111213141516171819202122232425262728293031323334353"

First, the command I used outputs the first 65 characters or so within the String object. This is why the last entry has a count of 65 instances. In any case, as you can see, there is a separate copy of each string made into memory during each concatenation operation. Ouch. This can get expensive very quickly!

Now what about the StringBuilder operation?

# Size Value

1 52 "0123456789101112"
1 84 "01234567891011121314151617181920"
3 956 "012345678910111213141516171819202122232425262728293031323334353"

Gee, that seems a lot simpler - but if you're paying attention, you'll notice that there are a few other instances of this lengthy string in memory. Any ideas why?

Well, this appears to be my bug. When you instantiate a System.Text.StringBuilder object, one of the constructors allows for a integer parameter called "Capacity". If you do not specify an initial capacity, the noargs constructor defaults to "&H10" - which is 16 characters. If, during your string operations, you exceed that 16 character capacity, the StringBuilder will create a new new string with a capacity of 32 (16 * 2) characters. The next time, you perform an operation that needs more than 32 characters, the StringBuilder will double the capacity again - this time to 64 characters. This will continue to happen over and over again as you continually append more characters to the StringBuilder object.

So, what does this mean? Well, it means that we're still not achieving the maximum efficiency here. If we want to build a String like this - without creating extra instances of the base string object - even when using a StringBuilder object, you should specify a capacity in the constructor. To prove this hypothesis, we can rewrite the second code listing to be:

    Sub ConcatWithStringBuilder(ByVal max As Int32)
        Dim value As New System.Text.StringBuilder(193)
        For i As Int32 = 0 To max
            value.Append(i.ToString())
        Next
    End Sub

Now, when we take a memory dump and look for the String objects for the above loop:

# Size Value

1 404 "012345678910111213141516171819202122232425262728293031323334353"

We see that there is only one instance of the String object in memory.

Moral of the story? Even when using the StringBuilder object, if you know the final string is going to be lengthy, you should set an initial capacity to most efficiently perform your string concatenations.

Enjoy!

Comments are closed.