.NET and Strings and some interesting tidbits
Two important things to be considered as part of the System.String or simply string implementation in .NET are:
1.) Strings are reference types (although they are primitive types)
2.) Strings are immutable -- more about this follows.
The .NET framework defines and loads it's own standard AppDomains within the execution of the CLR and the other .NET programs. The following image has been reproduced from here.
So, anytime you do a strResult = strVal1 + strVal2, you are creating a new string as expected, but compared to strVal1 = strVal1 + strVal2, you end up creating a new strVal1 while the previous value of strVal1 also remains although not pointed to by strVal1.
Here's another example, which is somewhat analogous.
string str1 = "Some String";
string str2 = "Some String"; //note str1 and str2 are literally the same.
? Object.ReferenceEquals(str1, str2) will return true in the Command window. This means that they are pointing to the same global storage of string data.
This process in .NET-speak is called "Interning".
The System.Text.StringBuilder uses a combination of preallocation of storage and string represented in array format so that repeated addition of string data is stored in subscript values instead of new strings being created. Finally if you need to get the string value back, you can use the ToString() method. The usage of this class is highly recommended, when possible.
---
I did want to write about one interesting aspect of the optimizations performed by the .NET CLR. It highlights how one has to tailor's one's mind specific to the framework of the language under consideration. The usual thought process is that C# is so similar to C, and so any C programmer can easily adapt to C# and be able to write great programs; it does hold true under most cases, but there are always cases when common logic belies observations.
Let's us consider two simple loop, placed side by side for comparison:
| int[] intArr = new int[10000]; |
Loop 1 requires the length of the array to be checked at every loop iteration to decide whether the go ahead or not, while the other pre-stores the length of the one-dimensional array and uses that to make the decision. Given a decent C background, one is bound to believe that avoiding extra comparisons will make it run faster, hence Loop 2.
Which one do YOU think will run faster in .NET?
The answer is below in white font color, highlight it with your mouse to read it...
In .NET, Loop 1 will run faster. The CLR sees that the comparison factor is the Length property of the array, which can never be an unacceptable value and consequently avoids all checks for the "ArrayIndexOutofBounds" condition. Hence it runs faster.
Enjoy.