# Saturday, 19 November 2011
Saturday, 19 November 2011 07:45:14 (Central Standard Time, UTC-06:00) ( .NET 4 | Perfmon | Performance )

So in a code-base I was working in yesterday, we use PInvoke to call out to the Performance Data Helper (PDH) APIs to collect performance information for machines without using Perfmon.  One of those PInvoke calls looked like this:

/*
PDH_STATUS PdhExpandCounterPath(
    LPCTSTR szWildCardPath,
    LPTSTR  mszExpandedPathList,
    LPDWORD pcchPathListLength
);
*/
[DllImport("pdh.dll", CharSet = CharSet.Unicode)]
private static extern PdhStatus PdhExpandCounterPath(
    string szWildCardPath,
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 3)] char[] mszExpandedPathList,
    ref uint pcchPathListLength
);

In .NET 3.5 and below, this PInvoke call works perfectly fine.  In .NET 4.0, though, I saw this exception:

System.Runtime.InteropServices.MarshalDirectiveException: 
Cannot marshal 'parameter #2': Array size control parameter index is out of range.
at System.Runtime.InteropServices.Marshal.InternalPrelink(IRuntimeMethodInfo m)
at System.Runtime.InteropServices.Marshal.Prelink(MethodInfo m)

So, can you identify what’s wrong in the code above?

Well, the Array size control parameter index indicates the zero-based parameter that contains the count of the array elements, similar to size_is in COM.  Because the marshaler cannot determine the size of an unmanaged array on its own, you have to pass the size in as a separate parameter.  So in the call above, on parameter #2 we specified “SizeParamIndex = 3”, intending to reference the pcchPathListLength parameter as the length of the array.  So what’s the catch?

Well, since SizeParamIndex is a zero-based index, index 3 refers to a fourth parameter – which doesn’t exist.  pcchPathListLength is the third parameter, so its index is 2.  To fix this, we just change “SizeParamIndex = 3” to “SizeParamIndex = 2”:

/*
PDH_STATUS PdhExpandCounterPath(
    LPCTSTR szWildCardPath,
    LPTSTR  mszExpandedPathList,
    LPDWORD pcchPathListLength
);
*/
[DllImport("pdh.dll", CharSet = CharSet.Unicode)]
private static extern PdhStatus PdhExpandCounterPath(
    string szWildCardPath,
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2)] char[] mszExpandedPathList,
    ref uint pcchPathListLength
);

It looks like .NET 3.5 and below let you get away with specifying either a one-based or a zero-based index, but .NET 4.0 buttoned that up a bit and forces you to use the zero-based index.  Big thanks to my co-worker and frequent collaborator, Zach Kramer, for his assistance in looking at this issue.
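As an aside, if you're wondering how this function typically gets called: PDH follows the usual two-call pattern, where the first call reports the required buffer size.  Here's a minimal sketch (my own, not from the original code-base), assuming our PdhStatus enum defines PDH_MORE_DATA = 0x800007D2:

uint length = 0;

// First call with a null buffer: PDH fills in the required length
// and returns PDH_MORE_DATA.
PdhStatus status = PdhExpandCounterPath(@"\Processor(*)\% Processor Time",
    null, ref length);

if (status == PdhStatus.PDH_MORE_DATA)
{
    char[] buffer = new char[length];
    status = PdhExpandCounterPath(@"\Processor(*)\% Processor Time",
        buffer, ref length);
    // buffer now holds the expanded, null-separated counter paths
}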

Until Next Time!

# Sunday, 05 December 2010
Sunday, 05 December 2010 12:52:17 (Central Standard Time, UTC-06:00) ( .NET 4 | ASP.NET | Parallel | Performance )

Read a great blog entry by Scott Hanselman recently talking about the parallel dilemma that I’m sure we’ll see folks face in the future with the (old/new) Parallel classes.  I wanted to add a few things to this discussion as he focused on the mechanics of the parallel requests but maybe not the potential effects it could have on the macro view of your application.  This was originally written as an e-mail I sent to my team but thought others might find it interesting.

There will be an inclination for people to use the new Parallel functionality in .NET 4.0 to easily spawn operations onto numerous background threads.  That will generally be okay for console/WinForms/WPF apps – but it could be potentially bad for ASP.NET apps, since the spawned threads can take away from the processing power and threads available to process new webpage requests.  I’ll explain more on that later.

For example, by default, when you do something like Parallel.ForEach(…) or some such, the parallel library starts firing Tasks at the thread pool so that it can best utilize the processing power available on your machine (an oversimplification, but you get the idea).  The downside is that the thread pool contains a finite number of worker threads available to a process.  Granted, you get about 100 worker threads per logical processor in .NET 4 – but it’s worth noting.
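If you’re curious what that ceiling actually is on a given machine, you can ask the thread pool directly (a quick sketch of my own):

int workerThreads, completionPortThreads;
ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);

// On a dual-core .NET 4 box, this typically reports ~200 worker threads.
Console.WriteLine("Max worker threads: " + workerThreads);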

While Scott’s entry talks about the new way to implement the Async pattern, I’ve already seen a bunch of folks use the “Parallel” class because it abstracts away some of the plumbing of the Async operations and that ease of use could become problematic. 

For example, consider this code:

string[] myStrings = { "hello", "world", "you", "crazy", "pfes", "out", "there" };

Parallel.ForEach(myStrings, myString =>
{
    System.Console.WriteLine(DateTime.Now + ":" + myString +
        " - From Thread #" +
        Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep(new Random().Next(1000, 5000));
});

This is a very simple implementation of parallelizing a foreach that just writes some string output with an artificial delay.  Output would be something like:

11/16/2010 2:40:05 AM:hello - From Thread #10
11/16/2010 2:40:05 AM:crazy - From Thread #11
11/16/2010 2:40:05 AM:there - From Thread #12
11/16/2010 2:40:06 AM:world - From Thread #13
11/16/2010 2:40:06 AM:pfes - From Thread #14
11/16/2010 2:40:06 AM:you - From Thread #12
11/16/2010 2:40:07 AM:out - From Thread #11 

Note the multiple thread IDs, and extrapolate that out to a server with more than just my paltry 2 CPUs.  This can be potentially problematic for ASP.NET applications: you have a finite number of worker threads available in your worker process, and they must be shared across not just one user but hundreds (or even thousands).  So spawning an operation across tons of threads can potentially reduce the scalability of your site.

Fortunately, there is a ParallelOptions class where you can set the degree of parallelism.  Updated code as follows:

string[] myStrings = { "hello", "world", "you", "crazy", "pfes", "out", "there" };

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 1;

Parallel.ForEach(myStrings, options, myString =>
{
    // Nothing changes here
    ...
});

This would then output something like:

11/16/2010 2:40:11 AM:hello - From Thread #10
11/16/2010 2:40:12 AM:world - From Thread #10
11/16/2010 2:40:16 AM:you - From Thread #10
11/16/2010 2:40:20 AM:crazy - From Thread #10
11/16/2010 2:40:23 AM:pfes - From Thread #10
11/16/2010 2:40:26 AM:out - From Thread #10
11/16/2010 2:40:29 AM:there - From Thread #10

Since I set the MaxDegreeOfParallelism to “1”, we see that it just uses the same thread over and over.  Within reason, that setting *should* correspond to the maximum number of threads used to handle the work.
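If you do want some parallelism inside a web application, one starting point (my own rule of thumb, not official guidance) is to cap it relative to the processor count rather than letting the pool fan out as far as it can:

ParallelOptions options = new ParallelOptions();

// Leave headroom for incoming request threads; tune under real load.
options.MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 2);

Parallel.ForEach(myStrings, options, myString =>
{
    // same work as before
});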

Applying to a website

So, let’s apply the code from the above to a simple website and compare the difference between the full parallel implementation and the non-parallel implementation.  The test I used ran for 10 minutes with a consistent load of 20 users on a dual-core machine running IIS 7.

In all of the images below, the blue line (baseline) represents the single-threaded implementation and the purple line (compared) represents the parallel implementation.

We’ll start with the request execution time.  As we’d expect, the time to complete the request decreases significantly with the parallel implementation.

[Perfmon chart: ASP.NET v4.0.30319 – Request Execution Time]

But what is the cost from a thread perspective?  For that, we’ll look at the number of physical threads:

[Perfmon chart: .NET CLR LocksAndThreads (w3wp) – # of current physical Threads]

As we’d also expect, there is a significant increase in the number of threads used in the process.  We go from ~20 threads in the process to a peak of almost 200 threads throughout the test.  Seeing as this was run on a dual-core machine, we’ll have a maximum of about 200 worker threads available in the thread pool.  Once those threads are depleted, requests start getting queued, waiting for a thread to become available.  So, what happened in our simple test?  We’ll look at the requests queued value for that:

[Perfmon chart: ASP.NET v4.0.30319 – Requests Queued]

We did, in fact, start to see a small number of requests become queued throughout our test.  This indicates that some requests piled up waiting for a thread to become available.

Please note that I’m NOT saying that you should not use Parallel operations in your website.  You saw in the first image that the actual request execution time decreased significantly from the non-parallel implementation to the parallel implementation. But it’s important to note that nothing is free and while parallelizing your work can and will improve the performance of a single request, it should also be weighed against the potential performance of your site overall.

Until next time.

# Sunday, 17 October 2010
Sunday, 17 October 2010 17:54:49 (Central Daylight Time, UTC-05:00) ( .NET | Performance )

“We shall neither fail nor falter; we shall not weaken or tire…give us the tools and we will finish the job.” – Winston Churchill

I don’t often blog about specific language features but over the past few weeks I’ve spoken to a few folks that did not know of the “yield” keyword and the “yield return” and “yield break” statements, so I thought it might be a good opportunity to shed some light on this little known but extremely useful C# feature.  Chances are, you’ve probably indirectly used this feature before and just never known it.

We’ll start with a problem I’ve seen plague many applications.  Oftentimes, you’ll call a method that returns a List<T> or some other concrete collection.  The method probably looks something like this:

public static List<string> GenerateMyList()
{
    List<string> myList = new List<string>();

    for (int i = 0; i < 100; i++)
    {
        myList.Add(i.ToString());
    }

    return myList;
}

I’m sure your logic is going to be significantly more complex than what I have above but you get the idea.  There are a few problems and inefficiencies with this method.  Can you spot them?

  • The entire List<T> must be stored in memory.
  • Its caller must wait for the List<T> to be returned before it can process anything.
  • The method itself returns a List<T> back to its caller.

As an aside – with public methods, you should strive to not return a List<T> in your methods.  Full details can be found here.  The main idea here is that if you choose to change the method signature and return a different collection type in the future, this would be considered a breaking change to your callers.

In any case, I’ll focus on the first two items in the list above.  If the List<T> that is returned from the GenerateMyList() method is large then that will be a lot of data that must be kept around in memory.  In addition, if it takes a long time to generate the list, your caller is stuck until you’ve completely finished your processing.

Instead, you can use that nifty “yield” keyword.  This allows the GenerateMyList() method to return items to its caller as they are being processed.  This means that you no longer need to keep the entire list in memory and can just return one item at a time until you get to the end of your returned items.  To illustrate my point, I’ll refactor the above method into the following:

private static IEnumerable<string> GenerateMyList()
{
    for (int i = 0; i < 100; i++)
    {
        string value = i.ToString();
        Console.WriteLine("Returning {0} to caller.", value);
        yield return value;
    }

    Console.WriteLine("Method done!");

    yield break;
}

A few things to note in this method.  The return type has been changed to IEnumerable<string> – one of those nifty interfaces that exposes an enumerator, which allows the caller to cycle through the results in a foreach or while loop.  In addition, the “yield return value” statement returns that item to the caller at that point, not when the entire method has completed its processing.  This allows for a very exciting caller-callee type relationship.  For example, if I call this method like so:

IEnumerable<string> myList = GenerateMyList();

foreach (string listItem in myList)
{
    Console.WriteLine("Item: " + listItem);
}

The output would be:

[Console output: interleaved “Returning n to caller.” and “Item: n” lines for each element]

Thus showing that each item gets returned at the time it is produced.  So, how does this work?  Well, at compile time, the compiler generates a class that implements the iterator’s behavior.  Essentially, the GenerateMyList() method body gets placed into that class’s MoveNext() method.  In fact, if you open up the compiled assembly in Reflector, you can see that plumbing in place (comments are mine and some code was omitted for clarity’s sake):

private bool MoveNext()
{
    this.<>1__state = -1;
    this.<i>5__1 = 0;
    // My for loop has changed to a while loop.
    while (this.<i>5__1 < 100)
    {
        // Sets a local value
        this.<value>5__2 = this.<i>5__1.ToString();
        // Here is my Console.WriteLine(...)
        Console.WriteLine("Returning {0} to caller.", this.<value>5__2);
        // Here is where the current member variable
        // gets stored.
        this.<>2__current = this.<value>5__2;
        this.<>1__state = 1;
        // We return "true" to the caller so it knows
        // there is another record to be processed.
        return true;
        ...
    }
    // Here is my Console.WriteLine() at the bottom
    // when we've finished processing the loop.
    Console.WriteLine("Method done!");
    break;
}

Pretty straightforward.  Of course, the real power is that the compiler converts the “yield return <blah>” into a nice clean enumerator with a MoveNext().  In fact, if you’ve used LINQ, you’ve probably used this feature without even knowing it.  Consider the following code:

private static void OutputLinqToXmlQuery()
{
    XDocument doc = XDocument.Parse(
        @"<root><data>hello world</data><data>goodbye world</data></root>");

    var results = from data in doc.Descendants("data")
                  select data;

    foreach (var result in results)
    {
        Console.WriteLine(result);
    }
}

The “results” object, by default, will be of type “WhereSelectEnumerableIterator”, which exposes a MoveNext() method.  In fact, that is also why the results object doesn’t allow you to do something like this:

var results = from data in doc.Descendants("data")
              select data;

var bad = results[1];

The IEnumerator does not expose an indexer allowing you to go straight to a particular element in the collection because the full collection hasn’t been generated yet.  Instead, you would do something like this:

var results = from data in doc.Descendants("data")
              select data;

var good = results.ElementAt(1);

And then under the covers, the ElementAt(int) method will just keep calling MoveNext() until it reaches the index you specified.  Something like this:

Note:  this is my own code and is NOT from the .NET Framework – it is merely meant to illustrate a point.

public static XElement MyElementAt(this IEnumerable<XElement> elements,
    int index)
{
    int counter = 0;
    using (IEnumerator<XElement> enumerator = elements.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            if (counter == index)
                return enumerator.Current;
            counter++;
        }
    }

    return null;
}
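One more consequence of this lazy production is worth calling out (a quick sketch of my own, reusing the GenerateMyList() method from above): the caller can stop early, and the rest of the iterator simply never runs.

foreach (string listItem in GenerateMyList())
{
    Console.WriteLine("Item: " + listItem);

    if (listItem == "4")
        break;  // iteration stops here; "Method done!" is never printed
}

Since foreach disposes the enumerator when you break out, the iterator is cleanly shut down without ever producing the remaining 95 items.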

Hope this helps to demystify some things and put another tool in your toolbox.

Until next time.

# Sunday, 19 September 2010
Sunday, 19 September 2010 13:30:02 (Central Daylight Time, UTC-05:00) ( Performance | RegEx )

Many times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help.  Often, this falls into the application optimization area.  For example, a few years ago, I had a customer that was developing a rather sophisticated SharePoint workflow that had some custom code that would process and merge two Excel spreadsheets together.  They were using Excel 2007 so their merging was being done using the excellent Open XML SDK.  To their credit, the application did what it needed to do – but it took about an hour to process these spreadsheets.  The developers on the project knew about the performance problems but as so often happens, they thought they knew where the bottlenecks were and how they should approach optimizing it.  So, they started injecting some tracing into their code and worked hard to optimize this lengthy process.  After a while, though, they had only shaved a few seconds off of that 60 minute time and while they did show some improvement – they knew they needed to get the processing done even faster for this to become a viable solution for their organization.  So, they sent me a simple repro of the code and together in just a few short days, we were able to get the processing from 60 minutes to under a minute.  That’s a BIG win.  Big like the Titanic big. 

I love these types of engagements because, well, I like to make things faster.  The biggest problem that I see, though, is that some people shy away from using the tools in optimization scenarios because they’ve been so invested in their code that they think they know why it’s not performing.  This is the “psychic” effect and the mindset is usually something like this: 

“I wrote this darn code and while I was writing it – I knew this method could be improved so now I’m going to finally optimize the darn thing.” 

It sounds good in-theory, right?  You wrote the code, so you should know how to improve it and really, what’s a tool going to tell you that you don’t already know?  In truth, it can tell you quite a bit.  In other situations, the developers will add some instrumentation (via tracing/debugging statements) to what they perceive as the critical code paths and the resultant timing points to one area when the problem really resides in a completely different section of code.  That’s right, folks, your tracing statements may be lying to you.  So what’s a developer to do?  Well, use the tools, of course – the right tool for the right job, as they say.  Let me expand upon that in the context of some code that yours truly wrote a little while ago.

The Problem

A few short weeks ago, I posted an entry about parsing your ASP.NET event log error messages.  For those that didn’t read it, we convert the EventLog messages to XML, parse the messages using Regex.Match and then generate some statistics on them.  The code I provided on that blog entry appeared to be very fast when it was processing a few records – but as the XML files grew, I started noticing that it seemed to take longer and longer to process the results.  So, I did what any developer does initially:  add some instrumentation into the code to see if we can figure out the problem.  This usually takes the form of something like the following:

Console.WriteLine("Start:  Load XDocument." + DateTime.Now.ToString());
XDocument document = XDocument.Load(@"C:\Work\TestData\AllEVTX\MyData.xml");
Console.WriteLine("End: Load XDocument." + DateTime.Now.ToString());

Console.WriteLine("Start: Load Messages into object." + DateTime.Now.ToString());
var messages = from message in document.Descendants("Message")
select EventLogMessage.Load(message.Value);
Console.WriteLine("End: Load Messages into object." + DateTime.Now.ToString());

Console.WriteLine("Start: Query Objects." + DateTime.Now.ToString());
var results = from log in messages
group log by log.Exceptiontype into l
orderby l.Count() descending, l.Key
select new
{
ExceptionType = l.Key,
ExceptionCount = l.Count()
};
Console.WriteLine("End: Query Objects." + DateTime.Now.ToString());

Console.WriteLine("Start: Output Exception Type Details." + DateTime.Now.ToString());
foreach (var result in results)
{
Console.WriteLine("{0} : {1} time(s)",
result.ExceptionType,
result.ExceptionCount);
}
Console.WriteLine("End: Output Exception Type Details." + DateTime.Now.ToString());

As you can see, all I’ve done is taken the code and slapped some Console.WriteLine statements with a DateTime.Now call to find out when the operation starts and when it completes.  If I run this code I get the following timings:

Start:  Load XDocument.9/19/2010 11:28:08 AM
End:  Load XDocument.9/19/2010 11:28:09 AM
Start:  Load Messages into object.9/19/2010 11:28:09 AM
End: Load Messages into object.9/19/2010 11:28:09 AM
Start:  Query Objects.9/19/2010 11:28:09 AM
End:  Query Objects.9/19/2010 11:28:09 AM
Start:  Output Exception Type Details.9/19/2010 11:28:09 AM
End:  Output Exception Type Details.9/19/2010 11:28:49 AM

This is clearly a problem.  Now, the question is why?

The Research

This might lead you to believe that the simple loop I have to get the ExceptionType and the ExceptionCount was the root of all evil, so to speak.  The problem, though, is that there’s not really a lot you can do to improve this:

foreach (var result in results)
{
    Console.WriteLine("{0} : {1} time(s)",
        result.ExceptionType,
        result.ExceptionCount);
}

Oh sure, you could use the excellent Parallel functionality in .NET 4 to send off the work to alternate threads.  So, you might do something like this:

Parallel.ForEach(results, result =>
{
    Console.WriteLine("{0} : {1} time(s)",
        result.ExceptionType,
        result.ExceptionCount);
});

But if you re-run the test, you get the following results:

Start:  Load XDocument.9/19/2010 11:46:20 AM
End:  Load XDocument.9/19/2010 11:46:20 AM
Start:  Load Messages into object.9/19/2010 11:46:20 AM
End: Load Messages into object.9/19/2010 11:46:20 AM
Start:  Query Objects.9/19/2010 11:46:20 AM
End:  Query Objects.9/19/2010 11:46:20 AM
Start:  Output Exception Type Details.9/19/2010 11:46:20 AM
End:  Output Exception Type Details.9/19/2010 11:46:54 AM

Wait a second…is it taking longer now???  That’s just not right.  At this point, you should take a step back and tell yourself “STOP GUESSING AND JUST USE THE PROFILER.”  So, always being one to listen to myself, I kick off the Visual Studio profiler and immediately it shows us the “hot path”.  This was actually a nice improvement in the VS2010 profiler.  Right on the summary page, it will show us our most expensive call paths:

[Screenshot: VS2010 profiler summary page showing the most expensive call path]

But wait, it’s pointing to Regex.Match(…).  Why is it pointing to that?  From my own metrics above, I saw that loading the Message strings into objects took less than a second.  Well, the real reason is that LINQ uses deferred (lazy) execution: it won’t necessarily process your query until you try to do something with the data.  Yes, that’s an over-simplification, but in our case it means that my EventLogMessage.Load(…) method won’t be called until I actually try to do something with the data.  So, now, armed with this information, I can take a look at my Load() method and see what it’s actually doing and how it’s using the Regex.Match(…) functionality:

Match myMatch = s_regex.Match(rawMessageText);

EventLogMessage message = new EventLogMessage();

message.Eventcode = myMatch.Groups["Eventcode"].Value;
message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
message.Eventtime = myMatch.Groups["Eventtime"].Value;
message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
...

return message;

So, basically we’re using the Match(…) to take that Message property and parse it out into the properties of the EventLogMessage object.  The fact that this is the slowest part of the code shouldn’t necessarily shock you.  Jeff Atwood wrote up a good blog entry a few years ago on something like this.  If we look just at the RegEx.Match(…) in the profiler, we see that the problem isn’t necessarily with each Match call but the overall cost with 10,000+ calls:

| Function Name | Number of Calls | Min Elapsed Inclusive Time | Avg Elapsed Inclusive Time | Max Elapsed Inclusive Time |
| --- | --- | --- | --- | --- |
| System.Text.RegularExpressions.Regex.Match(string) | 10,881 | 1.01 | 3.27 | 684.29 |
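As an aside, before abandoning RegEx entirely, one cheap experiment (my suggestion – not something the original test measured) is to make sure the shared Regex is built once with RegexOptions.Compiled, which trades startup cost for faster per-match execution:

// "EventLogPattern" is a hypothetical name standing in for the
// original (large) parsing pattern, which is elided in this post.
private static readonly Regex s_regex =
    new Regex(EventLogPattern, RegexOptions.Compiled);

Even so, with 10,000+ calls against a large pattern, a compiled regex may still lose to plain string scanning – which is the route we took.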

The Fix

So, now that we know the problem and the reason for it – what’s a dev to do?  Well, at this point, we should be thinking of an alternate way of performing this parsing without using RegEx.  The simplest method is just some general string parsing, so we’ll start off by replacing our large RegEx pattern with a simple string array:

public static readonly string[] ParserString = new string[] {
    @"Event code:",
    @"Event message:",
    @"Event time:",
    @"Event time (UTC):",
    @"Event ID:",
    ...
    @"Stack trace:",
    @"Custom event details:"
};

This array will be used in our fancy new GetValue method:

private static string GetValue(string rawMessageText, int Key)
{
    int startLoc = rawMessageText.IndexOf(ParserString[Key]);
    int endLoc;

    if (Key + 1 == ParserString.Length)
        endLoc = rawMessageText.Length;
    else
        endLoc = rawMessageText.IndexOf(ParserString[Key + 1], startLoc);

    return rawMessageText.Substring(startLoc + ParserString[Key].Length,
        endLoc - startLoc - ParserString[Key].Length);
}

This method accepts our raw message string and a key index.  It finds the occurrence of a string like “Event message:” in the raw message, then finds the index of the next key, like “Event time:”, and takes the substring between the two as the field’s value.  For example, given the following string:

“… Event message: An unhandled exception has occurred. Event time: …”

The keys here are “Event message:” and “Event time:”; the value is the string between them.  The idea for the above GetValue(…) method was actually provided by a fellow PFE engineer, Richard Lang, during a late night chat session.
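Note that GetValue(…) assumes every key is present, and in order.  A slightly more defensive variant (my own sketch – same semantics plus guards, not part of the original fix) protects against a key that never appears in a given message:

private static string GetValueSafe(string rawMessageText, int key)
{
    int startLoc = rawMessageText.IndexOf(ParserString[key]);
    if (startLoc < 0)
        return string.Empty;   // this key isn't present in the message

    startLoc += ParserString[key].Length;

    int endLoc = (key + 1 == ParserString.Length)
        ? rawMessageText.Length
        : rawMessageText.IndexOf(ParserString[key + 1], startLoc);
    if (endLoc < 0)
        endLoc = rawMessageText.Length;   // next key missing; take the rest

    return rawMessageText.Substring(startLoc, endLoc - startLoc);
}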

The last step to this process is just to call the GetValue(…) method from our new Load method:

public static EventLogMessage Load(string rawMessageText)
{
    EventLogMessage message = new EventLogMessage();

    int Key = 0;
    message.Eventcode = GetValue(rawMessageText, Key++);
    message.Eventmessage = GetValue(rawMessageText, Key++);
    message.Eventtime = GetValue(rawMessageText, Key++);
    ...
    return message;
}

So, now we’ve essentially removed the need for RegEx by implementing our own string parsing algorithm.  Once we compile the code and run our application again, we see some major improvements:

Start:  Load XDocument.9/19/2010 12:55:31 PM
End:  Load XDocument.9/19/2010 12:55:31 PM
Start:  Load Messages into object.9/19/2010 12:55:31 PM
End: Load Messages into object.9/19/2010 12:55:31 PM
Start:  Query Objects.9/19/2010 12:55:31 PM
End:  Query Objects.9/19/2010 12:55:31 PM
Start:  Output Exception Type Details.9/19/2010 12:55:31 PM
End:  Output Exception Type Details.9/19/2010 12:55:33 PM

We essentially improved the processing of our records from 40 seconds to under 2 seconds.  I’d say that’s a pretty big improvement.  Even better, we can use our Parallel.ForEach(…) code from above to make this even faster since we’re no longer bound by the RegEx parser:

Start:  Load XDocument.9/19/2010 1:00:13 PM
End:  Load XDocument.9/19/2010 1:00:13 PM
Start:  Load Messages into object.9/19/2010 1:00:13 PM
End: Load Messages into object.9/19/2010 1:00:13 PM
Start:  Query Objects.9/19/2010 1:00:13 PM
End:  Query Objects.9/19/2010 1:00:13 PM
Start:  Output Exception Type Details.9/19/2010 1:00:13 PM
End:  Output Exception Type Details.9/19/2010 1:00:14 PM

So now it takes just about a second to process these records.  Considering it’s processing over 10,000 event log messages, I’d say this is acceptable performance, for now.

Closing Comments

I just want to say a few things real quick.  RegEx is not inherently evil.  It is still one of the fastest and easiest methods to consume and parse data.  You should not feel the need to go back to your own applications that are working just fine and refactor all of your code to strip out the RegEx expressions.  It just so happens that sometimes too much of a good thing can be bad for your health.  In our case, with over 10,000 Regex.Match(…) calls in rapid succession, the RegEx appeared to be our bottleneck.  This may or may not be the root cause of your own performance problems.  The key takeaway from this blog entry is that you should NOT guess when it comes to optimizing code paths.  Instead, use the tools at your disposal to find the bottleneck.

Until next time.

# Friday, 20 August 2010
Friday, 20 August 2010 23:36:27 (Central Daylight Time, UTC-05:00) ( .NET | Best Practice | C# | Code Reviews | Development | Performance )

This is a common topic and I thought I’d write up some thoughts I have on it.  In fact, I was just working with a customer on improving their code reviews and what they should be checking for, and the question arose – “Should performance be targeted during a code review?”  It’s an interesting question.  I’m a big fan of performance testing early and often and not waiting until the end of a dev cycle, but code reviews, IMO, should focus on logic, maintainability and best practices.  I may be in the minority and if you look around the web, you’ll see varying opinions on the topic.  For example, one of the PAG articles states:

“Code reviews should be a regular part of your development process. Performance and scalability code reviews focus on identifying coding techniques and design choices that could lead to performance and scalability issues. The review goal is to identify potential performance and scalability issues before the code is deployed. The cost and effort of fixing performance and scalability flaws at development time is far less than fixing them later in the product deployment cycle.

Avoid performance code reviews too early in the coding phase because this can restrict your design options. Also, bear in mind that performance decisions often involve tradeoffs. For example, it is easy to reduce maintainability and flexibility while striving to optimize code.”

As I mentioned above, I am a huge proponent of performance analysis and optimization many times throughout a typical product development cycle.  I can say with a fair amount of certainty that if you don’t build performance reviews into your project plan at regular intervals, you will hit some problem (or multiple problems) in production and have to refactor some code. 

Circling back to the original question, though, are code reviews the place for performance analysis?  Typically, I’d recommend using them to squash little bits of bad code but maintainability and code-cleanliness should be first and foremost in your minds.  That said, if you see a pattern that you know can be improved, by all means bring it up.  What’s an example of that type of situation? 

Let’s take a look at predicates, specifically their usage in the Find method of a List<T>.  If you’re not aware, the Find() method performs a linear search through all of the items until it finds the first match – then it returns.  This makes it an O(n) operation where “n” is the number of items in the list.  Basically, the more items you have in the list, the longer a Find() operation can potentially take.  So, if we slam about 10,000 elements into a list:

private static List<Data> LoadList()
{
    List<Data> myList = new List<Data>();
    for (int i = 0; i < 10000; i++)
    {
        myList.Add(new Data() { Id = "Id" + i.ToString(),
                                Value = "Value" + i.ToString() });
    }

    return myList;
}

Then, if someone wants to return the instance of the Data class that contains an Id of, say, “Id9999” (the last element, and the worst case for a linear search), they might write the following code:

static Data Find1(List<Data> myList, string idToFind)
{
    Data data = myList.Find(s =>
        s.Id.ToLower() == idToFind.ToLower());

    return data;
}

Now, keep in mind that the predicate is executed for each element in the List<T> until it finds the instance you care about.  With that in mind, we would probably want to hoist the “idToFind.ToLower()” out of the predicate since that value isn’t changing.  So, you might end up with something like this:

static Data Find2(List<Data> myList, string idToFind)
{
    idToFind = idToFind.ToLower();

    Data data = myList.Find(s =>
        s.Id.ToLower() == idToFind);

    return data;
}

Another route you may want to go is just to use the string.Equals(…) method to perform the comparison.  That would look like:

static Data Find3(List<Data> myList, string idToFind)
{
    Data data = myList.Find(s =>
        string.Equals(
            s.Id,
            idToFind,
            StringComparison.InvariantCultureIgnoreCase));

    return data;
}

Fact is, the last method IS the fastest way to perform the operation.  I can say that without even needing to run it through a profiler.  But if you don’t believe me…  

| Function Name | Elapsed Inclusive Time |
| --- | --- |
| ...Find1(System.Collections.Generic.List`1<....Data>,string) | 6.34 |
| ...Find2(System.Collections.Generic.List`1<....Data>,string) | 4.47 |
| ...Find3(System.Collections.Generic.List`1<....Data>,string) | 3.65 |

That’s something I might put into the category of a micro-optimization AND just good coding practice.  But is this something that should be caught during a code review?  I’d say “yes” because logically it all makes sense and none of the solutions would really hurt maintainability or readability of the code.
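And if profiling later shows these lookups dominating, there’s a bigger refactor that goes beyond code-review scope (a sketch of my own, assuming System.Linq is in scope for ToDictionary):

// Build once and reuse across lookups; the comparer preserves the
// case-insensitive semantics of Find3, but lookups become O(1).
Dictionary<string, Data> dataById = myList.ToDictionary(
    d => d.Id,
    StringComparer.InvariantCultureIgnoreCase);

Data match;
dataById.TryGetValue("Id9999", out match);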
 
So, I’d tag this as a good coding practice.  Other thoughts on the topic?
 
Enjoy!
# Thursday, 29 July 2010
Thursday, 29 July 2010 18:50:09 (Central Daylight Time, UTC-05:00) ( Best Practice | IIS | Performance )

I seem to get this question a lot and come across many customer environments where they have enabled web gardening thinking that it will automagically improve the performance for their site/application.


Most of the time, that is not the case.  The funny thing is that once I finally convince them that web gardening is not the way to go, they try to apply that same knowledge to other sites and applications in their environment.  When this happens, I’ll get an e-mail or phone call asking for some guidelines on when to enable web gardening.

We typically recommend using web gardening as a stop-gap (or workaround) when a customer has a core issue that is limiting their website and web application scalability.

For example, if a customer has a memory issue that is causing OutOfMemoryExceptions in their main website – we may recommend web gardening to spread the load across multiple worker processes while we assist them in resolving the core memory issue.  Please note that this would also increase the memory and processor utilization on the server and in some cases might not be viable.

As a best practice, create Web gardens only for Web applications that meet the following criteria (taken from here):

  • The application runs multi-instantiated, so that a different instance of the application can be assigned to each worker process.
  • The Web application is not CPU-intensive. If the CPU is the bottleneck, then adding worker processes cannot help improve performance.
  • The application is subject to synchronous high latency. For example, if an application calls a back-end database and the response is slow, then a Web garden supports other concurrent connections without waiting for the slow connection to complete.
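If, after weighing those criteria, you do decide to enable web gardening, the knob is the application pool’s maxProcesses setting.  On IIS 7, something like the following should do it (the pool name here is hypothetical):

%windir%\system32\inetsrv\appcmd.exe set apppool "MyAppPool" /processModel.maxProcesses:3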

A good discussion of why not to use Web Gardening can be found here as well:  http://blogs.technet.com/b/mscom/archive/2007/07/10/gardening-on-the-web-server.aspx

Enjoy!

# Friday, 23 July 2010
Friday, 23 July 2010 02:02:43 (Central Daylight Time, UTC-05:00) ( Database | Development | LINQ | Performance )

So, I was working with a customer who is writing their first application using LINQ. They had previously been bitten by the failure to close and dispose their SqlConnection objects. This is actually a fairly common problem and usually leads to those pesky SqlExceptions detailing that there are no connections left in the pool.

So, since LINQ to SQL abstracts out much of the direct database interaction, they were concerned about when the underlying SqlConnections are closed. I will walk through how I answered their question using a few of my favorite tools:

To start off, I created a simple SQL Table called Users:

[Screenshot: the Users table definition]

Then, I created a simple LINQ to SQL dbml:

[Screenshot: the LINQ to SQL designer (dbml) with the User class]

Now that the plumbing is in place, I can write some simple code to return the data from the table and display it to the console window:

LinqConnectionSampleDataContext db =
    new LinqConnectionSampleDataContext();

Table<User> users = db.GetTable<User>();

IQueryable<User> userQuery =
    from user in users
    orderby user.firstName
    select user;

foreach (User user in userQuery)
{
    Console.WriteLine("ID={0}, First Name={1}",
        user.id,
        user.firstName);
}

So, now when the application is executed, the output is as follows:

[Screenshot: console output listing each user’s ID and first name]

So, since LINQ to SQL uses an underlying SqlConnection to do its work, we can set a breakpoint on the Close() method of that class in WinDBG. If you are unfamiliar with this great debugging tool, you can find a simple walkthrough on how to set it up here.

There are a number of ways to set a breakpoint in managed code in WinDBG. Here are the steps that I followed:

Step 1. Launch WinDBG and attach to the process in question.

[Screenshot: the WinDBG Attach to Process dialog]

Step 2. Load the SOS extension into WinDBG by executing:

.loadby sos mscorwks
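(This post targets .NET 2.0/3.5, where the runtime lives in mscorwks.dll. On .NET 4, the runtime module was renamed to clr.dll, so the equivalent command is:

.loadby sos clr

The rest of the walkthrough is the same.)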

Step 3. Set the managed breakpoint using the !bpmd command. For this step, the !bpmd command accepts a variety of parameters. Basically, you can pass it either:

  • MethodDesc address.

  • Combination of Module Name and Managed Function Name

I chose the latter method because it’s relatively quick and I knew exactly what I wanted. So, the syntax for this method is:

!bpmd <module name> <managed function name>

You can get the module name from visiting the SqlConnection page up on MSDN. On this page, we can get the module name and the namespace to the class:

[Screenshot: the MSDN documentation page for SqlConnection]

From this, we can get both parameters necessary:

  • Module Name: System.Data.dll
  • Managed Function Name: System.Data.SqlClient.SqlConnection.Close

So, our command in WinDBG becomes:

!bpmd System.Data.dll System.Data.SqlClient.SqlConnection.Close

Once you enter in this command, you should get output similar to the following in the WinDBG window:

0:014> !bpmd System.Data.dll System.Data.SqlClient.SqlConnection.Close
Found 1 methods...
MethodDesc = 544a0418
Setting breakpoint: bp 5455DC80 [System.Data.SqlClient.SqlConnection.Close()]

Step 4. “Go” in the debugger and wait for your breakpoint to be hit.

For this, the command is simply “g”.

0:014> g

Eventually, your breakpoint will be hit in the debugger and you should get output similar to the following:

Breakpoint 0 hit
eax=5457da68 ebx=04d7e9dc ecx=0185cd30 edx=018e56b0 esi=01870d80 edi=04d7e9a4
eip=5455dc80 esp=04d7e860 ebp=04d7e868 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
System_Data_ni+0xcdc80:
5455dc80 55              push    ebp

Step 5. Print out the call-stack.

The command to print out the call stack in SOS and WinDBG is “!clrstack”:

0:008> !clrstack

This will print out the managed call stack, which turns out to be:

OS Thread Id: 0x1d70 (8)
ESP       EIP     
04d7e860 5455dc80 System.Data.SqlClient.SqlConnection.Close()
04d7e864 77e20586 System.Data.Linq.SqlClient.SqlConnectionManager
                    .CloseConnection()
04d7e870 77e20554 System.Data.Linq.SqlClient.SqlConnectionManager
                    .ReleaseConnection(...)
04d7e87c 77e1da35 System.Data.Linq.SqlClient.
        ObjectReaderCompiler+ObjectReaderSession`1[...].Dispose()
04d7e888 77e1ddac System.Data.Linq.SqlClient.
        ObjectReaderCompiler+ObjectReaderSession`1[...].CheckNextResults()
04d7e894 77e1df2c System.Data.Linq.SqlClient.
        ObjectReaderCompiler+ObjectReaderBase`1[...].Read()
04d7e8a0 77e1ea2d System.Data.Linq.SqlClient.
        ObjectReaderCompiler+ObjectReader`2[...].MoveNext()
04d7e8ac 004f1a12 LINQ.SqlConnection.Program.Main(System.String[])

So, if you’re having trouble parsing this, the takeaway is that when you iterate through a LINQ result set all the way to the end, the ObjectReaderSession will automatically close the connection to the database.

Now, this is a simple HelloWorld code sample for retrieving a result-set and there are obviously a number of ways to do the same thing. The customer’s code was closer to the following:

using (IEnumerator<User> enumerator =
    context.ExecuteQuery<User>(sqlStatement).GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        // Do something here
    }
}

In this situation, we get an IEnumerator<T> back from the database call and iterate through it. Now, this part is very important. If you iterate through the result set to completion, the connection will be closed the same as above. However, if you do something like this:

using (IEnumerator<User> enumerator =
    db.ExecuteQuery<User>(sqlStatement).GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine("ID={0}, First Name={1}",
            enumerator.Current.id,
            enumerator.Current.firstName);

        // Stop iterating after this record.
        break;
    }
}

Please note the “break” statement. Essentially, if you are NOT iterating through to completion, the call stack looks like:

OS Thread Id: 0x251c (11)
ESP       EIP     
0522e73c 5455dc80 System.Data.SqlClient.SqlConnection.
                    Close()
0522e740 77e20586 System.Data.Linq.SqlClient.SqlConnectionManager.
                    CloseConnection()
0522e74c 77e20554 System.Data.Linq.SqlClient.SqlConnectionManager.
                    ReleaseConnection(...)
0522e758 77e1da35 System.Data.Linq.SqlClient.ObjectReaderCompiler+
                    ObjectReaderSession`1[...].Dispose()
0522e764 77e1ea12 System.Data.Linq.SqlClient.ObjectReaderCompiler+
                    ObjectReader`2[...].Dispose()
0522e768 00691bde LINQ.SqlConnection.Program.Main(System.String[])

The connection will NOT be closed until you call Dispose() on the ObjectReader (IEnumerable) object. This means that if you happen to write some code without the using statement when returning data, like this:

IEnumerator<User> enumerator =
    db.ExecuteQuery<User>(sqlStatement).GetEnumerator();

while (enumerator.MoveNext())
{
    Console.WriteLine("ID={0}, First Name={1}",
        enumerator.Current.id,
        enumerator.Current.firstName);

    // Stop iterating after this record.
    break;
}

The SqlConnection.Close() method will NOT be called. This is because you have full control over the lifetime of the IEnumerator<T> object and you should know when you are done with it.
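The straightforward fix (a quick sketch restating the using pattern shown earlier) is to keep the enumerator in a using block, or to call Dispose() yourself in a finally:

IEnumerator<User> enumerator =
    db.ExecuteQuery<User>(sqlStatement).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine("ID={0}, First Name={1}",
            enumerator.Current.id,
            enumerator.Current.firstName);

        break;
    }
}
finally
{
    // This is what ultimately releases the underlying SqlConnection.
    enumerator.Dispose();
}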

Now, along those lines, you may be asking yourself – what if I did something like this:

LinqConnectionSampleDataContext db =
    new LinqConnectionSampleDataContext();

Table<User> users = db.GetTable<User>();

IQueryable<User> userQuery =
    from user in users
    orderby user.firstName
    select user;

foreach (User user in userQuery)
{
    Console.WriteLine("ID={0}, First Name={1}",
        user.id,
        user.firstName);

    break;
}

Where you break before iterating through to completion? In that situation, Dispose() will still be called. How? Because of how the C# compiler expands a foreach: it wraps the iteration in a try block and inserts a finally that disposes the enumerator once the loop is exited. This compiles down to (in IL):

try
{
    L_005d: br.s L_0084
    L_005f: ldloc.s CS$5$0002
    L_0061: callvirt instance !0 ... get_Current()
    L_0066: stloc.3
    L_0067: ldstr "ID={0}, First Name={1}"
    L_006c: ldloc.3
    L_006d: callvirt instance int32 LINQ.SqlConnection.User::get_id()
    L_0072: box int32
    L_0077: ldloc.3
    L_0078: callvirt instance string LINQ.SqlConnection.User::get_firstName()
    L_007d: call void [mscorlib]System.Console::WriteLine(string, object, object)
    L_0082: br.s L_008d
    L_0084: ldloc.s CS$5$0002
    L_0086: callvirt instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    L_008b: brtrue.s L_005f
    L_008d: leave.s L_009b
}
finally
{
    L_008f: ldloc.s CS$5$0002
    L_0091: brfalse.s L_009a
    L_0093: ldloc.s CS$5$0002
    L_0095: callvirt instance void [mscorlib]System.IDisposable::Dispose()
    L_009a: endfinally
}
L_009b: ret
.try L_005d to L_008f finally handler L_008f to L_009b

The finally block is the key part (it was highlighted in red in the original post). So, the moral of this story: when you take control of the enumeration yourself, you MUST call Dispose() on the IEnumerator<T> object when you are done with it.

Enjoy!

Friday, 23 July 2010 02:01:29 (Central Daylight Time, UTC-05:00) ( Development | Performance )

This is an old topic, but as I work with developers more and more, I find that there is still some gray area on this topic.

Concatenating strings is probably one of the most common tasks that developers perform in their work lives. If you've ever done some searching around, though, you'll find that there are two main ways people perform those concatenations:

First, you can use the "+" or the "&" operator with the native string object:

    Sub ConcatWithStrings(ByVal max As Int32)
        Dim value As String = ""
        For i As Int32 = 0 To max
            value += i.ToString()
        Next
    End Sub

Second, you can use the System.Text.StringBuilder() object to perform the concatenation:

    Sub ConcatWithStringBuilder(ByVal max As Int32)
        Dim value As New System.Text.StringBuilder()
        For i As Int32 = 0 To max
            value.Append(i.ToString())
        Next
    End Sub

So, which is better? Well, if you do any Google'ing or Live Search'ing, you'll find some great articles on the topic. Mahesh Chand has a great one, which I'll quote:

"You can concatenate strings in two ways. First, traditional way of using string and adding the new string to an existing string. In the .NET Framework, this operation is costly. When you add a string to an existing string, the Framework copies both the existing and new data to the memory, deletes the existing string, and reads data in a new string. This operation can be very time consuming in lengthy string
concatenation operations."

Now, I'm a big fan of short descriptions like the above, but I always find that without actually showing what is happening behind the scenes, you might lose some folks in the translation. To illustrate what happens behind the scenes, I wrote a quick console application that uses the first code sample - and then took a memory dump using ADPlus to show what actually gets kept around in memory after the String level concatenation.

I'm not going to go into a lot of detail on what I did to mine through the memory dump, but if you're interested in getting down to this detail on your own - I highly recommend the book Debugging Microsoft .NET 2.0 Applications. After you read that great book, you should read Tess's Great Blog to get even more practice on this area.

In any case, what was uncovered after the memory dump was that with the listing in the first example, there will be a separate String object placed into memory each time you perform a concatenation. Here is an excerpt from that dump:

# Size Value

1 28 "0123"
1 28 "01234"
1 32 "012345"
1 32 "0123456"
1 36 "01234567"
1 36 "012345678"
1 40 "0123456789"
1 44 "012345678910"
1 48 "01234567891011"
1 52 "0123456789101112"
1 56 "012345678910111213"
.....
1 124 "0123456789101112131415161718192021222324252627282930"
1 128 "012345678910111213141516171819202122232425262728293031"
1 132 "01234567891011121314151617181920212223242526272829303132"
1 136 "0123456789101112131415161718192021222324252627282930313233"
1 140 "012345678910111213141516171819202122232425262728293031323334"
1 144 "01234567891011121314151617181920212223242526272829303132333435"
65 17940 "012345678910111213141516171819202122232425262728293031323334353"

First, the command I used only outputs the first 65 characters or so of each String object. That is why the last entry shows a count of 65 instances - every string longer than that displays the same truncated value, so they get grouped together. In any case, as you can see, a separate copy of the string is kept in memory for each concatenation operation. Ouch. This can get expensive very quickly!

Now what about the StringBuilder operation?

# Size Value

1 52 "0123456789101112"
1 84 "01234567891011121314151617181920"
3 956 "012345678910111213141516171819202122232425262728293031323334353"

Gee, that seems a lot simpler - but if you're paying attention, you'll notice that there are a few other instances of this lengthy string in memory. Any ideas why?

Well, this one is my own doing. When you instantiate a System.Text.StringBuilder object, one of the constructor overloads accepts an integer parameter called "Capacity". If you do not specify an initial capacity, the parameterless constructor defaults to &H10 - which is 16 characters. If, during your string operations, you exceed that 16 character capacity, the StringBuilder will allocate a new string with a capacity of 32 (16 * 2) characters. The next time you perform an operation that needs more than 32 characters, the StringBuilder will double the capacity again - this time to 64 characters. This will continue to happen over and over again as you continually append more characters to the StringBuilder object.
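To see the doubling in action, here is a quick C# sketch (my own; it describes the .NET 2.0-era StringBuilder, which uses a single backing buffer - .NET 4 moved to a chunked implementation, so the exact Capacity values may differ there):

var sb = new System.Text.StringBuilder();   // parameterless: capacity of 16
Console.WriteLine(sb.Capacity);             // 16

sb.Append(new string('x', 17));             // exceed the initial capacity...
Console.WriteLine(sb.Capacity);             // ...and it doubles to 32

sb.Append(new string('x', 20));             // 37 characters total now
Console.WriteLine(sb.Capacity);             // doubled again, to 64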

So, what does this mean? Well, it means that we're still not achieving maximum efficiency here. If we want to build a string like this without creating extra instances of the underlying string buffer - even when using a StringBuilder - we should specify a capacity in the constructor. To prove this hypothesis, we can rewrite the second code listing to be:

    Sub ConcatWithStringBuilder(ByVal max As Int32)
        Dim value As New System.Text.StringBuilder(193)
        For i As Int32 = 0 To max
            value.Append(i.ToString())
        Next
    End Sub

Now, when we take a memory dump and look for the String objects for the above loop:

# Size Value

1 404 "012345678910111213141516171819202122232425262728293031323334353"

We see that there is only one instance of the String object in memory.

Moral of the story? Even when using the StringBuilder object, if you know the final string is going to be lengthy, you should set an initial capacity to most efficiently perform your string concatenations.

Enjoy!