Samurai Programmer.com

Home | About Me | Developer PFE Blog | Become a Developer PFE

Contact

Step 1: Change the extension from docx to zip

As you may or may not know, all OpenXML documents (or Office documents since Office 2007) are actually zip files at their core. That means you can just crack them open and peer inside.

See the difference? Easy!

Step 2: Extract the zip file to a folder

Once again, pretty straight forward. Once you extract the zip file above, you should see a structure like the following:

Now, from here – you’ll be able to locate the file referenced in that cryptic error message above.

Step 3: Find the file that’s causing the problem

In the example above, the message states that the problem lies with the file “/word/document.xml” so just navigate to the “word” folder and find the “document.xml”.

Step 4: Open and format the file in Visual Studio

One of the great features of Visual Studio is that it can format an XML file for you. So, in our case, the document.xml file is natively just one big line:

Incidentally, this is why the message always states “Line 1,…” in the error message. As far as Word is concerned, the problem IS on the first line. Fortunately for us, though, Word can take that single line file and format it for us. Just use the Edit > Advanced > Format Document option in Visual Studio:

That will then format the XML and make it look closer to:

Step 5: Recreate the Word doc and get the additional information

Now that you have the file formatted appropriately, you can just re-create the Word document and re-open it. For this, just go back to the root of the document, select all the files/folders and then zip it back up:

Once it’s zipped back up again, just change the extension from zip to docx and re-open the file. When you do so, you’ll see the following:

Note that now, you’ll see that it says “Line: 5667, Column: 0” – which will point to the exact line causing the problem – which allows you to just go back to the “document.xml” file you already have open in Visual Studio to see the problem. In our case:

Note that this won’t magically fix your problem. You’ll still need to examine the WordML to figure out the problem – but at least you know where to go. And knowing is half the battle!

That’s all for now and I will be back with some more developer stuff soon.

Until next time!

Comments [0] | Trackback |

Sunday, 05 December 2010

The (potentially) dark side of parallelism…

Sunday, 05 December 2010 12:52:17 (Central Standard Time, UTC-06:00) ( .NET 4 | ASP.NET | Parallel | Performance )

Read a great blog entry by Scott Hanselman recently talking about the parallel dilemma that I’m sure we’ll see folks face in the future with the (old/new) Parallel classes. I wanted to add a few things to this discussion as he focused on the mechanics of the parallel requests but maybe not the potential effects it could have on the macro view of your application. This was originally written as an e-mail I sent to my team but thought others might find it interesting.

There will be an inclination by people to use the new Parallel functionality in .NET 4.0 to easily spawn operations onto numerous background threads. That will generally be okay for console/winform/wpf apps – but could also be potentially bad for ASP.NET apps as the spawned threads could take away from the processing power and threads available to process new webpage requests. I’ll explain more on that later.

For example, by default, when you do something like Parallel.ForEach(…) or some such, the parallel library starts firing Tasks to the thread pool so that it can best utilize the processing power available on your machine (oversimplification but you get the idea). The downside is that the thread pool contains a finite number of worker threads threads available to a process. Granted, you have about 100 threads per logical processor in .NET 4 – but it’s worth noting.

While Scott’s entry talks about the new way to implement the Async pattern, I’ve already seen a bunch of folks use the “Parallel” class because it abstracts away some of the plumbing of the Async operations and that ease of use could become problematic.

For example, consider this code:

string[] myStrings = { "hello", "world", "you", "crazy", "pfes", "out", "there" };

Parallel.ForEach(myStrings, myString =>
{

    System.Console.WriteLine(DateTime.Now + ":" + myString + 
                             " - From Thread #" + 
                             Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep(new Random().Next(1000, 5000));

});

This is a very simple implementation of Parallel’izing a foreach that just writes some string output with an artificial delay. Output would be something like:

11/16/2010 2:40:05 AM:hello - From Thread #10
11/16/2010 2:40:05 AM:crazy - From Thread #11
11/16/2010 2:40:05 AM:there - From Thread #12
11/16/2010 2:40:06 AM:world - From Thread #13
11/16/2010 2:40:06 AM:pfes - From Thread #14
11/16/2010 2:40:06 AM:you - From Thread #12
11/16/2010 2:40:07 AM:out - From Thread #11

Note the multiple thread ids and extrapolate that out to a server that has more than just my paltry 2 CPUs. This can be potentially problematic for ASP.NET applications as you have a finite number of worker threads available in your worker process and they must be shared across not just one user but hundreds (or even thousands). So, we might see that spawning an operation across tons of threads can potentially reduce the scalability of your site.

Fortunately, there is a ParallelOptions class where you can set the degree of parallel’ism. Updated code as follows:

string[] myStrings = { "hello", "world", "you", "crazy", "pfes", "out", "there" };

ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 1;

Parallel.ForEach(myStrings,options, myString =>
{

    // Nothing changes here
    ...

});

This would then output something like:

11/16/2010 2:40:11 AM:hello - From Thread #10
11/16/2010 2:40:12 AM:world - From Thread #10
11/16/2010 2:40:16 AM:you - From Thread #10
11/16/2010 2:40:20 AM:crazy - From Thread #10
11/16/2010 2:40:23 AM:pfes - From Thread #10
11/16/2010 2:40:26 AM:out - From Thread #10
11/16/2010 2:40:29 AM:there - From Thread #10

Since I set the MaxDegreeOfParallelism to “1”, we see that it just uses the same thread over and over. Within reason, that setting *should* correspond to the number of threads it will use to handle the request.

Applying to a website

So, let’s apply the code from the above to a simple website and compare the difference between the full parallel implementation and the non-parallel implementation. The test I used ran for 10 minutes with a consistent load of 20 users on a dual-core machine running IIS 7.

In all of the images below, the blue line (or baseline) represents the single-threaded implementation and the purple line (or compared) represents the parallel implementation.

We’ll start with the request execution time. As we’d expect, the time to complete the request decreases significantly with the parallel implementation.

But what is the cost from a thread perspective? For that, we’ll look at the number of physical threads:

As we’d also expect, there is a significant increase in the number of threads used in the process. We go from ~20 threads in the process to a peak of almost 200 threads throughout the test. Seeing as this was run on a dual-core machine, we’ll have a maximum of 200 worker threads available in the thread pool. After those threads become depleted, you often see requests start getting queued, waiting for a thread to become available. So, what happened in our simple test? We’re look at the requests queued value for that:

We did, in-fact, start to see a small number of requests become queued throughout our test. This indicates that some requests started to pile up waiting for an available thread to become available.

Please note that I’m NOT saying that you should not use Parallel operations in your website. You saw in the first image that the actual request execution time decreased significantly from the non-parallel implementation to the parallel implementation. But it’s important to note that nothing is free and while parallelizing your work can and will improve the performance of a single request, it should also be weighed against the potential performance of your site overall.

Until next time.

Comments [1] | Trackback |

Sunday, 17 October 2010

Yield Return…a little known but incredible language feature

Sunday, 17 October 2010 17:54:49 (Central Daylight Time, UTC-05:00) ( .NET | Performance )

“We shall neither fail nor falter; we shall not weaken or tire…give us the tools and we will finish the job.” – Winston Churchill

I don’t often blog about specific language features but over the past few weeks I’ve spoken to a few folks that did not know of the “yield” keyword and the “yield return” and “yield break” statements, so I thought it might be a good opportunity to shed some light on this little known but extremely useful C# feature. Chances are, you’ve probably indirectly used this feature before and just never known it.

We’ll start with the problem I’ve seen that plagues many applications. Often times, you’ll call a method that returns a List<T> or some other concrete collection. The method probably looks something like this:

public static List<string> GenerateMyList()
{
    List<string> myList = new List<string>();

    for (int i = 0; i < 100; i++)
    {

        myList.Add(i.ToString());    

    }

    return myList;

}

I’m sure your logic is going to be significantly more complex than what I have above but you get the idea. There are a few problems and inefficiencies with this method. Can you spot them?

The entire List<T> must be stored in memory.
Its caller must wait for the List<T> to be returned before it can process anything.
The method itself returns a List<T> back to its caller.

As an aside – with public methods, you should strive to not return a List<T> in your methods. Full details can be found here. The main idea here is that if you choose to change the method signature and return a different collection type in the future, this would be considered a breaking change to your callers.

In any case, I’ll focus on the first two items in the list above. If the List<T> that is returned from the GenerateMyList() method is large then that will be a lot of data that must be kept around in memory. In addition, if it takes a long time to generate the list, your caller is stuck until you’ve completely finished your processing.

Instead, you can use that nifty “yield” keyword. This allows the GenerateMyList() method to return items to its caller as they are being processed. This means that you no longer need to keep the entire list in memory and can just return one item at a time until you get to the end of your returned items. To illustrate my point, I’ll refactor the above method into the following:

private static IEnumerable<string> GenerateMyList()
{
    for (int i = 0; i < 100; i++)
    {
        string value = i.ToString();
        Console.WriteLine("Returning {0} to caller.", value);
        yield return value;
    }

    Console.WriteLine("Method done!");

    yield break;
}

A few things to note in this method. The return type has been changed to an IEnumerable<string>. This is one of those nifty interfaces that exposes an enumerator. This allows its caller to cycle through the results in a foreach or while loop. In addition, the “yield return i.ToString()” will return that item to its caller at that point and not when the entire method has completed its processing. This allows for a very exciting caller-callee type relationship. For example, if I call this method like so:

IEnumerable<string> myList = GenerateMyList();

foreach (string listItem in myList)
{
    Console.WriteLine("Item: " + listItem);
}

The output would be:

Thus showing that each item gets returned at the time we processed it. So, how does this work? Well, at compile time, we will generate a class to implement the behavior in the iterator. Essentially, this means that the GenerateMyList() method body gets placed into the MoveNext() method. In-fact, if you open up the compiled assembly in Reflector, you see that plumbing in place (comments are mine and some code was omitted for clarity’s sake):

private bool MoveNext()
{
    
            this.<>1__state = -1;
            this.<i>5__1 = 0;
            // My for loop has changed to a while loop.
            while (this.<i>5__1 < 100)
            {
                // Sets a local value
                this.<value>5__2 = this.<i>5__1.ToString();
                // Here is my Console.WriteLine(...)
                Console.WriteLine("Returning {0} to caller.", this.<value>5__2);
                // Here is where the current member variable
                // gets stored.
                this.<>2__current = this.<value>5__2;
                this.<>1__state = 1;
                // We return "true" to the caller so it knows
                // there is another record to be processed.
                return true;
                ...
            }
            // Here is my Console.WriteLine() at the bottom
            // when we've finished processing the loop.
            Console.WriteLine("Method done!");
            break;

    
}

Pretty straightforward. Of course, the real power is that it the compiler converts the “yield return <blah>” into a nice clean enumerator with a MoveNext(). In-fact, if you’ve used LINQ, you’ve probably used this feature without even knowing it. Consider the following code:

private static void OutputLinqToXmlQuery()
{
    XDocument doc = XDocument.Parse
                        (@"<root><data>hello world</data><data>goodbye world</data></root>");

    var results = from data in doc.Descendants("data")
                  select data;

    foreach (var result in results)
    {
        Console.WriteLine(result);                
    }
}

The “results” object, by default will be of type “WhereSelectEnumerableIterator” which exposes a MoveNext() method. In-fact, that is also why the results object doesn’t allow you to do something like this:

var results = from data in doc.Descendants("data")
              select data;

var bad = results[1];

The IEnumerator does not expose an indexer allowing you to go straight to a particular element in the collection because the full collection hasn’t been generated yet. Instead, you would do something like this:

var results = from data in doc.Descendants("data")
              select data;            

var good = results.ElementAt(1);

And then under the covers, the ElementAt(int) method will just keep calling MoveNext() until it reaches the index you specified. Something like this:

Note: this is my own code and is NOT from the .NET Framework – it is merely meant to illustrate a point.

public static XElement MyElementAt(this IEnumerable<XElement> elements, 
                                   int index)
{
    int counter = 0;
    using (IEnumerator<XElement> enumerator = 
                                 elements.GetEnumerator())
    {            
        while(enumerator.MoveNext()){
            if (counter == index)
                return enumerator.Current;
            counter++;
        }
    }

    return null;
}

Hope this helps to demystify some things and put another tool in your toolbox.

Until next time.

Comments [0] | Trackback |

Monday, 11 October 2010

Linking your pivot collections the fun and easy way

Monday, 11 October 2010 00:09:42 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

In my previous post, I added to my series of entries on making sense of your ASP.NET event log error messages. Note that this is entry #4 in this series. The previous three entries can be found here:

Part 1: Parsing ASP.NET event log error messages for fun and profit
Part 2: Don’t guess when it comes to performance…
Part 3: Pivoting your ASP.NET event log error messages

In that last post, I walked through the PAuthorLib.dll and showed you how to crawl through your event log error messages and create a pivot collection. The result of that initial effort was a nice view into our events:

While this certainly got the job done and is a very powerful and compelling view into our data, we need to realize that as our data grow, the amount of entries in our linked collection is limited. From the Developer Overview, we see that the maximum number of items we should have in a single collection is 3,000:

So, while a simple collection will get the job done for your smaller amounts of data, you will really run into some challenges with your larger datasets like our ASP.NET event log error messages. To combat this limitation you can create what’s called a Linked Collection. The idea is that it’s just a way for you to link together related collections in order to provide a seamless experience for your users. In our case, a natural break for our collections will be based upon the exception type with a summary collection and then a separate collection for each exception type. If I were to draw this out:

Event Log Header Collection Source

The idea behind this structure is that the Exception summary would simply link to each of these exception collections. First, we’ll create a colleciton source for our exception summary. As in our previous collection source (in my last blog post), we inherit from the AbstractCollectionSource class and use the LoadHeaderData() method to add our facet categories to the collection. In this case, we’ll create two categories – the number of Occurrences and the Urls where the exception occurred. Another difference is that we are going to pass the already parsed collection of messages into the constructor. The reason for that is so we don’t have to repeat the parsing of the event log messages multiple times.

class EventLogHeaderCollectionSource : AbstractCollectionSource
{

    private IEnumerable<EventLogMessage> m_messages = null;

    public EventLogHeaderCollectionSource(IEnumerable<EventLogMessage> messages, 
                                          string inputFile)
        : base(inputFile)
    {

        m_messages = messages;


    }

    #region Facets

    private const string OCCURRENCES = "Occurrences";
    private const string URLS = "Urls";

    #endregion

    protected override void LoadHeaderData()
    {
        this.CachedCollectionData.FacetCategories.
                        Add(new PivotFacetCategory(OCCURRENCES, PivotFacetType.Number));
        this.CachedCollectionData.FacetCategories.
                        Add(new PivotFacetCategory(URLS, PivotFacetType.String));

        this.CachedCollectionData.Name = 
                        "ASP.NET Error Messages - Summary";
        this.CachedCollectionData.Copyright = 
                        new PivotLink("Source", "http://www.samuraiprogrammer.com");

    }
}

Then, in the LoadItems() method, we provide the logic to generate the PivotItem collection. The one key item to make note of is the use of the Href property of the PivotItem object. This is where we specify the collection we wish to link to this item. Since each of the PivotItems will be a summary of the number of each exception type – we’ll name the sub-collections by its exception type. For example, NullReferenceException.cxml, SqlException.cxml, etc.

protected override IEnumerable<PivotItem> LoadItems()
{
    var results = from log in m_messages
                  group log by log.Exceptiontype into l
                  orderby l.Count() descending, l.Key
                  select new
                  {
                      ExceptionType = l.Key,
                      ExceptionCount = l.Count()
                  };


    int index = 0;
    foreach (var result in results)
    {
        PivotItem item = new PivotItem(index.ToString(), this);       
        item.Name = result.ExceptionType;
        item.Description = "# of Exceptions: " + result.ExceptionCount.ToString();
        item.AddFacetValues(OCCURRENCES, result.ExceptionCount);
        item.Href = result.ExceptionType + ".cxml";

        ...                

        index++;
        yield return item;
    }

    yield break;
}

Event Log Collection Source Redux

Previously, when we generated the pivot collections, we were outputting all of the records into a single collection. Now that we are generating a collection for each exception type, we will need to put a filter in our exception collection and then incorporate that filter into our item generation. Other than that, the code we wrote previously remains largely unchanged, so I left the majority of it out and only included the snippets that we care about below.

class EventLogCollectionSource : AbstractCollectionSource
{
    private IEnumerable<EventLogMessage> m_messages = null;
    private string m_exceptionType = string.Empty;

    public EventLogCollectionSource(
                    IEnumerable<EventLogMessage> messages, 
                    string exceptionType, 
                    string path)
        : base(path)
    {
        m_messages = messages;
        m_exceptionType = exceptionType;
    }

    protected override void LoadHeaderData()
    {
        ...
        this.CachedCollectionData.Name = 
                string.Format("{0} Error Messages", m_exceptionType);
        ...
    }

    protected override IEnumerable<PivotItem> LoadItems()
    {
        var results = (from message in m_messages
                       where message.Exceptiontype == m_exceptionType
                       select message);

        int index = 0;
        foreach (EventLogMessage message in results)
        {
            PivotItem item = 
                    new PivotItem(index.ToString(), this);
            item.Name = message.Exceptiontype;
            item.Description = message.Exceptionmessage;

            ...

            index++;
            yield return item;
        }
        yield break;
    }
}

Generate and test the collection

Then, the only thing we have left to do is generate and test our linked collections. I won’t go into a lengthy explanation of how we generate the collections because I did that in the last blog entry. I will show the broad strokes required to tie this all together, though:

// Load the raw messages into a collection
IEnumerable<EventLogMessage> messages = 
                LoadEventLogMessages(inputFile).ToList();

// Generate summary pivot collection
EventLogHeaderCollectionSource sourceSummary = 
                new EventLogHeaderCollectionSource(messages, inputFile);
...
summaryTargetFilter1.Write(sourceSummaryFilter1);

// Get the aggregate results so we know the filters
// for our exception pivot collections
var summaryResults = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };

foreach (var resultItem in summaryResults)
{
    // Generate pivots for each exception type
    EventLogCollectionSource source = 
            new EventLogCollectionSource(messages, 
                                         resultItem.ExceptionType, 
                                        inputFile);
    ...
    targetFilter1.Write(sourceFilter1);
}

Once we we have this code and everything has been generated, if we open the output folder, we’ll see the following structure:

We see our ExceptionSummary pivot collection and all of the deep zoom folders. So, when we open the Pivot tool, we’ll see a nice parent collection:

This gives us a nice breakdown of the number of occurrences for each exception in our source data. Immediately we see an outlier (more on that later) between the 6,000 and 7,000 item mark and when we select that tile, we see the following:

We also see a green “Open” box (surrounded in a red rectangle, my emphasis) which links to our NullReferenceException.cxml. When we click that box, the tool will immediately open that collection in the UI for our perusal – providing a very similar look to what we saw in the last entry:

Closing Thoughts

Now, you may have noticed a contradiction above. I said that a collection should have no more than 3,000 items and yet, with the NullReferenceException collection, we saw in the summary that it had over 6,000 items. That is a very good point and will be a subject of a future blog post. I wanted to illustrate the simple collections and the linked collections before we got into that third type of collection from above – the Dynamic Collection. Stay tuned!

Comments [0] | Trackback |

Monday, 04 October 2010

Pivoting ASP.NET event log error messages

Monday, 04 October 2010 10:43:49 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

Unless you’ve been hiding under the proverbial rock, you’ve probably seen the recent Pivot hoopla. If you’re not familiar with it, it’s a way to visualize a large amount of data in a nice filterable format. The nice thing about it is that it’s really easy to put together a pivot collection and there are a ton of tools available for just this purpose. Just do a search on CodePlex for Pivot and you’ll get about 40’ish good results for tools you can use to create a Pivot Collection.

So, I was putting together a proof-of-concept for an internal project and thought I would continue on with my series of blog posts on ASP.NET Error Message event logs with a post on how to visualize this data using a pivot. You may wish to read parts 1 and 2 here:

So, when I put together my pivot, I worked out a 3 step process:

Figure out what you want to Pivot
Find an API and convert the data
Generate and Test the collection

Let’s begin, shall we.

Figure out what you want to Pivot

The structure for the Pivot Collection is deceptively simple -

<?xml version="1.0"?> 
<Collection Name="Hello World Collection" …>
  <FacetCategories> 
    <FacetCategory Name="Hello World Facet Category One" Type="String"/> 
  </FacetCategories> 
  <Items ImgBase="helloworld.dzc"> 
    <Item Img="#0" Id="0" Href="http://www.getpivot.com" Name="Hello World!"> 
      <Description> This is the only item in the collection.</Description> 
      <Facets> 
        <Facet Name="Hello World Facet Category One"> 
         <String Value="Hello World Facet Value"/> 
        </Facet>       
      </Facets> 
    </Item> 
  </Items> 
</Collection>

The way that I think about the Items in the Collection are in the same way that you might think about an object. For example, a Car object might have the following properties:

Advertising blurb
Car and Driver Reviews
Color
Make
Model
Engine
0-60mph time
Max Speed

The common values like the Color, Make, Model, 0-60mph time and max speed become the facets or attributes that describe your object in relation to other instances of objects. Things like the advertising blurbs and car and driver reviews or descriptions belong instead as properties of your Item directly in the Description element.

For our data, namely ASP.NET exceptions, we’re going to define an exception as the following:

Item

Name = Exception Type
Description = Exception Message
Facets

Request Path
Stack Trace
Event Time
Top Method of Stack Trace
Top My Code Method of Stack Trace

This should allow us to group and drill through the common properties that might link exceptions together and still provide detailed error information when necessary.

Find an API and code it

The second step here is to find some code/API/tool that we can enhance for our purposes. There are some great tools published by the Live Labs team – for example:

While both tools could be used in this instance, in part 2 we found that some of our Event Logs we were parsing contained more than 10,000 items and I wanted a bit more control over how I converted the data. “No touching” is the phrase of the day. Fortunately, the command line tool was published on CodePlex with an API we can use. Once you download the product you see that it contains 3 assemblies:

The last item there is the PauthorLib.dll which encapsulates many of the extension points within this great tool. In-fact, it exposes about 7 different namespaces for our purposes:

For our purposes, we are going to focus on the Streaming set of namespaces. Why? Well, this is namely because we are going to be dealing with a lot of data and I didn’t want to load everything into memory before writing it to disk. If you look at the contents of the Streaming namespace, you’ll see a great class called “AbstractCollectionSource”. This looks fairly promising because it exposes two main methods:

class EventLogExceptionCollectionSource : AbstractCollectionSource
{
    protected override void LoadHeaderData()
    {
        throw new NotImplementedException();
    }

    protected override IEnumerable<PivotItem> LoadItems()
    {
        throw new NotImplementedException();
    }
}

Before we do anything, though, we need a constructor. The constructor will be responsible for taking a string representing the path to our data and passing it to our base class’s constructor.

public EventLogExceptionCollectionSource(string filePath)
    : base(filePath)
{

    // Do nothing else.

}

Then, the first method, LoadHeaderData, is where we define our facets – Request Path, Stack Trace, etc. - as well as the data types that each facet will be. So, our code will be fairly simple and straight-forward:

protected override void LoadHeaderData()
{

    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(STACKTRACE, 
                                                       PivotFacetType.LongString));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(REQUESTPATH, 
                                                       PivotFacetType.String));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(EVENTTIME, 
                                                       PivotFacetType.DateTime));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(TOPMETHOD, 
                                                       PivotFacetType.String));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(TOPAPPMETHOD, 
                                                       PivotFacetType.String));

    this.CachedCollectionData.Name = "Event Log Error Messages";


}

The second method, LoadItems(), is responsible for doing exactly what it suggests – this is where we load the data from whichever source we care about and then convert it into our PivotItem collection. For our purposes, we’re going to load the XML file we defined in Part 1 of this series into a list of EventLogMessage objects and then convert those EventLogMessage objects into PivotItem objects:

protected override IEnumerable<PivotItem> LoadItems()
{
    // Load XML file
    XDocument document = XDocument.Load(this.BasePath);

    // Populate collection of EventLogMessage objects
    var messages = from message in document.Descendants("Message")
                   select EventLogMessage.Load(message.Value);

    int index = 0;
    foreach (EventLogMessage message in messages)
    {

        PivotItem item = new PivotItem(index.ToString(), this);
        item.Name = message.Exceptiontype;
        item.Description = message.Exceptionmessage;
        item.AddFacetValues(REQUESTPATH, message.Requestpath);
        item.AddFacetValues(STACKTRACE, message.Stacktrace);
        item.AddFacetValues(EVENTTIME, message.Eventtime);
        item.AddFacetValues(TOPMETHOD, message.StackTraceFrames[0].Method);
        item.AddFacetValues(TOPAPPMETHOD, GetFirstNonMicrosoftMethod(message.StackTraceFrames));

        index++;
        yield return item;

    }
}

The key method calls from above are the AddFacetValues(…) method calls. This method essentially sets the attributes we wish to have our data pivot upon. This, by itself, isn’t enough to generate our great pivot calls – we need to call our code from somewhere. Since this is a simple app, we’re going to make it a console app. For our Collection to get generated we need to use a few other objects included in this API:

EventLogExceptionCollectionSource – The class we created above.
HtmlImageCreationSourceFilter – This class will generate the tile in the Pivot based upon some HTML template we specify.
LocalCxmlCollectionTarget – Generates the Collection XML file at the path we specify.
DeepZoomTargetFilter – Generates the deep zoom files to support our collection XML file and also enables all of our fancy transitions.

In practice, the code is pretty simple and straight forward and I applaud the people who wrote this library:

private static void GenerateExceptionPivot(string inputFile, string outputFolder)
{
    string collectionName = Path.Combine(outputFolder, "MyExceptions.cxml");

    EventLogExceptionCollectionSource source = 
                    new EventLogExceptionCollectionSource(inputFile);
    HtmlImageCreationSourceFilter sourceFilter1 = 
                    new HtmlImageCreationSourceFilter(source);
    sourceFilter1.HtmlTemplate = 
                    "<html><body><h1>{name}</h1>{description}</body></html>";
    sourceFilter1.Width = 600;
    sourceFilter1.Height = 600;

    LocalCxmlCollectionTarget target = 
                    new LocalCxmlCollectionTarget(collectionName);
    DeepZoomTargetFilter targetFilter1 = 
                    new DeepZoomTargetFilter(target);
    targetFilter1.Write(sourceFilter1);

}

That last statement, targetFilter1.Write(…) is what will actually execute everything and write our resultant files to disk.

Generate and Test the collection

So, now if we run our console application and call that GenerateExceptionPivot(…) method, we’ll get some great output.

What’s nice about the Library is that it provides progress as it iterates through your data (in the red rectangle) and also in the blue rectangle, we need that it’s multi-threaded by default. This is primarily for the most intensive part of the operation – the creation of the deep zoom artifacts. If you have one of those new fangled machines with 2+ cores, you can tweak the number of threads that it will spawn for this operation by setting the ThreadCount property of the DeepZoomTargetFilter object. This may or may not improve your performance but it’s nice that the option is available.

...
DeepZoomTargetFilter targetFilter1 = 
                new DeepZoomTargetFilter(target);

targetFilter1.ThreadCount = 100;
targetFilter1.Write(sourceFilter1);
...

Once our collection has been generated, we can browse it in an explorer.exe window just to get an idea of what our code has wrought:

And then to test it, you can just point the Live Labs Pivot application at our “MyExceptions.cxml” file and view the wonderful data. For example, you can look at the Event Time in the histogram view to see how your exceptions broke down over time. You can also filter your data by things like the RequestPath (the page that threw the exception) or the Method that was at the top of the callstack.

Then, you can zoom in on a specific time slice you care about:

Then, if you want to view the details for a specific instance, just click the corresponding tile. Then, a new side bar will appear on the right hand side with all of the details we stored in this record:

We generated a single collection in this blog post. One thing to keep in mind is that each collection should have no more than 3,000 items. For collections in which you want to have more than 3,000 items, you should look at potentially creating a Linked Collection. That will be the subject of an upcoming blog post.

Until next time!

Comments [0] | Trackback |

Monday, 27 September 2010

Visualizing Azure performance counters…

Monday, 27 September 2010 00:19:44 (Central Daylight Time, UTC-05:00) ( Azure | Perfmon )

I’ve blogged before about how some things are different in the cloud, namely Azure. That post dealt with finding out why your Azure Role crashed by using some of the logging facilities and storage available in Azure. That’s kind of a worst case scenario, though. Often, you’ll just want to gauge the health of your application. In the non-cloud world, you’d just use the Windows Performance Monitor to capture performance counters that tell you things like your CPU utilization, memory usage, request execution time, etc. All of these counters are great to determine your overall application and server health as well as helping you troubleshoot problems that may have arisen with your application.

I’m often surprised at how many developers have never used Perfmon before or think that it is purely a task that administrators need to care about. Performance is everyone’s responsibility and a tool like Perfmon is invaluable if you need to gain an accurate understanding of your application’s performance.

When you’ve needed to log counters in the past, you would have probably become familiar with this dialog:

In the Azure world, you can add counters to be collected in a variety of ways. A popular way is just to add the counters you want in your OnStart() event.

public override bool OnStart()
{
    DiagnosticMonitorConfiguration dmc = DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Add counter(s) to collect.
    dmc.PerformanceCounters.DataSources.Add(
        new PerformanceCounterConfiguration()
            {
                CounterSpecifier = @"\Processor(*)\*",
                SampleRate = TimeSpan.FromSeconds(5)
            });

    // Set transfer period and filter.
    dmc.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

    // Start logging, using the connection string in your config file.
    DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc);

    ...

    return base.OnStart();
}

This gets the job done quite nicely and even though we’re sampling our performance counter every 5 seconds, the data won’t get written to our persistent storage until we hit the 5 minute mark.

So, let’s fast forward a bit. You’ve added counters to monitor and now you want to view some of this data. Fortunately, there are a few tools at your disposal:

For example, using the Azure Management Tool, you can navigate to the Diagnostics/Analysis section to view the performance monitor counters you stored.

Herein lies one of the challenges you may encounter. Azure is kind of a different beast and it doesn’t store the data in storage using the typical BLG format. Instead, if you look at the raw data, it will be in an XML or tabular format and may look like:

While this is nice and clean, the problem you’ll hit is the visualization of the data. Most organizations use Perfmon to visualize the counter data you record. Alas, in this case you can’t use it that way right out of the box. Fortunately, my friend Tom Fuller and I realized this gap and he wrote a plug-in to the Azure Management Tool that will transform this data from the format above into a native BLG format. He wrote up a great blog entry with step-by-step instructions and the source code here. You can also download a compiled version of the extension here (along with the Windows Azure MMC) here. To install the plugin, just copy the DLL into the \WindowsAzureMMC\release folder:

Then, when you open up the tool, you’ll see another option in that dropdown in the Diagnostics > Analysis window:

Then, when you click show, you’ll get a nice popup telling you to click here to view your data in Perfmon:

Then, when you click on it, Perfmon will start with all of the counters added to the display:

A few things to note:

It will store the resulting BLG file at the following path:

c:\Users\<user name>\AppData\Local\Temp\<File Name>.blg

The BLG can be used in a variety of applications. For example, you can use it with the great PAL tool.

All of the counters will appear in the graph by default. Depending upon the amount of data you’re capturing, this could give you data overload. Just something to be aware of.

Until next time.

Comments [0] | Trackback |

Sunday, 19 September 2010

Don’t guess when it comes to performance…a RegEx story.

Sunday, 19 September 2010 13:30:02 (Central Daylight Time, UTC-05:00) ( Performance | RegEx )

Many times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help. Often, this falls into the application optimization area. For example, a few years ago, I had a customer that was developing a rather sophisticated SharePoint workflow that had some custom code that would process and merge two Excel spreadsheets together. They were using Excel 2007 so their merging was being done using the excellent Open XML SDK. To their credit, the application did what it needed to do – but it took about an hour to process these spreadsheets. The developers on the project knew about the performance problems but as so often happens, they thought they knew where the bottlenecks were and how they should approach optimizing it. So, they started injecting some tracing into their code and worked hard to optimize this lengthy process. After a while, though, they had only shaved a few seconds off of that 60 minute time and while they did show some improvement – they knew they needed to get the processing done even faster for this to become a viable solution for their organization. So, they sent me a simple repro of the code and together in just a few short days, we were able to get the processing from 60 minutes to under a minute. That’s a BIG win. Big like the Titanic big.

I love these types of engagements because, well, I like to make things faster. The biggest problem that I see, though, is that some people shy away from using the tools in optimization scenarios because they’ve been so invested in their code that they think they know why it’s not performing. This is the “psychic” effect and the mindset is usually something like this:

“I wrote this darn code and while I was writing it – I knew this method could be improved so now I’m going to finally optimize the darn thing.”

It sounds good in-theory, right? You wrote the code, so you should know how to improve it and really, what’s a tool going to tell you that you don’t already know? In truth, it can tell you quite a bit. In other situations, the developers will add some instrumentation (via tracing/debugging statements) to what they perceive as the critical code paths and the resultant timing points to one area when the problem really resides in a completely different section of code. That’s right, folks, your tracing statements may be lying to you. So what’s a developer to do? Well, use the tools, of course – the right tool for the right job, as they say. Let me expand upon that in the context of some code that yours truly wrote a little while ago.

The Problem

A few short weeks ago, I posted a entry about parsing your ASP.NET event log error messages. For those that didn’t read it, we just convert the EventLog messages to XML, parse the messages using RegEx.Match and then generate some statistics on them. The code I provided on that blog entry appeared to be very fast when it was processing a few records – but as the XML files grew, I started noticing that it seemed to take longer and longer to process the results. So, I did what any developer does initially. We’ll add some instrumentation into our code to see if we can figure out the problem. This usually takes the form of something like the following:

Console.WriteLine("Start:  Load XDocument." + DateTime.Now.ToString());
XDocument document = XDocument.Load(@"C:\Work\TestData\AllEVTX\MyData.xml");
Console.WriteLine("End:  Load XDocument." + DateTime.Now.ToString());

Console.WriteLine("Start:  Load Messages into object." + DateTime.Now.ToString());
var messages = from message in document.Descendants("Message")
               select EventLogMessage.Load(message.Value);
Console.WriteLine("End: Load Messages into object." + DateTime.Now.ToString());

Console.WriteLine("Start:  Query Objects." + DateTime.Now.ToString());
var results = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };
Console.WriteLine("End:  Query Objects." + DateTime.Now.ToString());

Console.WriteLine("Start:  Output Exception Type Details." + DateTime.Now.ToString());
foreach (var result in results)
{
    Console.WriteLine("{0} : {1} time(s)",
                      result.ExceptionType,
                      result.ExceptionCount);
}
Console.WriteLine("End:  Output Exception Type Details." + DateTime.Now.ToString());

As you can see, all I’ve done is taken the code and slapped some Console.WriteLine statements with a DateTime.Now call to find out when the operation starts and when it completes. If I run this code I get the following timings:

Start: Load XDocument.9/19/2010 11:28:08 AM
End: Load XDocument.9/19/2010 11:28:09 AM
Start: Load Messages into object.9/19/2010 11:28:09 AM
End: Load Messages into object.9/19/2010 11:28:09 AM
Start: Query Objects.9/19/2010 11:28:09 AM
End: Query Objects.9/19/2010 11:28:09 AM
Start: Output Exception Type Details.9/19/2010 11:28:09 AM
End: Output Exception Type Details.9/19/2010 11:28:49 AM

This is clearly a problem Now, the question is why?

The Research

This might lead you to believe that the simple loop I have to get the ExceptionType and the ExceptionCount was the root of all evil, so to speak. The problem, though, is that there’s not really a lot you can do to improve this:

foreach (var result in results)
{
    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);             
}

Oh sure, you could use the excellent Parallel functionality in .NET 4 to send off the work to alternate threads. So, you might do something like this:

Parallel.ForEach(results, result=> {

    Console.WriteLine("{0} : {1} times(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);

});

But if you re-run the test, you get the following results:

Start: Load XDocument.9/19/2010 11:46:20 AM
End: Load XDocument.9/19/2010 11:46:20 AM
Start: Load Messages into object.9/19/2010 11:46:20 AM
End: Load Messages into object.9/19/2010 11:46:20 AM
Start: Query Objects.9/19/2010 11:46:20 AM
End: Query Objects.9/19/2010 11:46:20 AM
Start: Output Exception Type Details.9/19/2010 11:46:20 AM
End: Output Exception Type Details.9/19/2010 11:46:54 AM

Wait a second…is it taking longer now??? That’s just not right. At this point, you should take a step back and tell yourself “STOP GUESSING AND JUST USE THE PROFILER.” So, always being one to listen to myself, I kick off the Visual Studio profiler and immediately it shows us the “hot path”. This was actually a nice improvement in the VS2010 profiler. Right on the summary page, it will show us our most expensive call paths:

But wait, it’s pointing to the RegEx.Match(…). Why is it pointing to that? From my own metrics above, I see that the loading of the Message strings into an object takes less than a second to execute. Well, the real reason is that LINQ uses kind of a lazy loading algorithm. Basically, it won’t necessarily process your queries until you try to do something with the data. Yes, that’s an over-simplification but in our case, it means that my EventLogMessage.Load(…) method won’t be called until I actually try to do something with the data. So, now, armed with this information, I can take a look at my Load() method and see what it’s actually doing and how it’s using the RegEx.Match(…) functionality:

Match myMatch = s_regex.Match(rawMessageText);

EventLogMessage message = new EventLogMessage();

message.Eventcode = myMatch.Groups["Eventcode"].Value;
message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
message.Eventtime = myMatch.Groups["Eventtime"].Value;
message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
...

return message;

So, basically we’re using the Match(…) to take that Message property and parse it out into the properties of the EventLogMessage object. The fact that this is the slowest part of the code shouldn’t necessarily shock you. Jeff Atwood wrote up a good blog entry a few years ago on something like this. If we look just at the RegEx.Match(…) in the profiler, we see that the problem isn’t necessarily with each Match call but the overall cost with 10,000+ calls:

Function Name

Number of Calls

Min Elapsed Inclusive Time

Avg Elapsed Inclusive Time

Max Elapsed Inclusive Time

System.Text.RegularExpressions.Regex.Match(string)

10,881

1.01

3.27

684.29

The Fix

So, now that we know the problem and the reason for it – what’s a Dev to do? Well, at this point, we should be thinking of an alternate way of performing this parsing without using RegEx. The simplest method is just to use some general string parsing, so we’ll start off with replacing our large RegEx pattern with a simple string array:

public static readonly string[] ParserString = new string[] {
            @"Event code:" ,
            @"Event message:" ,
            @"Event time:" ,
            @"Event time (UTC):" ,
            @"Event ID:" ,
            ...
            @"Stack trace:" ,
            @"Custom event details:"};

This array will be used in our fancy new GetValue method:

private static string GetValue(string rawMessageText, int Key)
{
    int startLoc = rawMessageText.IndexOf(ParserString[Key]);
    int endLoc;
    if (Key + 1 == ParserString.Length)
        endLoc = rawMessageText.Length;
    else
        endLoc = rawMessageText.IndexOf(ParserString[Key + 1], startLoc);

    return rawMessageText.Substring(startLoc + ParserString[Key].Length, endLoc - startLoc - ParserString[Key].Length);
}

This method accepts our raw message string and a key index. This method just finds the existence of a string like “Event message:” in our raw message and then finds the index of the next string, like “Event time:” and subtracts the two to get at the value of the field. For example, given the following string:

“… Event message: An unhandled exception has occurred. Event time: …”

The red text are the keys and the highlighted blue text is the string between them. The idea for the above GetValue(…) method was actually provided by a fellow PFE engineer, Richard Lang during a late night chat session.

The last step to this process is just to call the GetValue(…) method from our new Load method:

public static EventLogMessage Load(string rawMessageText)
{
    EventLogMessage message = new EventLogMessage();

    int Key = 0;
    message.Eventcode = GetValue(rawMessageText, Key++);
    message.Eventmessage = GetValue(rawMessageText, Key++); ;
    message.Eventtime = GetValue(rawMessageText, Key++); ;
    ...
    return message;

}

So, now we’ve essentially removed the need for RegEx by implementing our own string parsing algorithm. Once we compile the code and run our application again, we see some major improvements:

Start: Load XDocument.9/19/2010 12:55:31 PM
End: Load XDocument.9/19/2010 12:55:31 PM
Start: Load Messages into object.9/19/2010 12:55:31 PM
End: Load Messages into object.9/19/2010 12:55:31 PM
Start: Query Objects.9/19/2010 12:55:31 PM
End: Query Objects.9/19/2010 12:55:31 PM
Start: Output Exception Type Details.9/19/2010 12:55:31 PM
End: Output Exception Type Details.9/19/2010 12:55:33 PM

We essentially improved the processing of our records from 40 seconds to under 2 seconds. I’d say that’s a pretty big improvement. Even better, we can use our Parallel.ForEach(…) code from above to make this even faster since we’re no longer bound by the RegEx parser:

Start: Load XDocument.9/19/2010 1:00:13 PM
End: Load XDocument.9/19/2010 1:00:13 PM
Start: Load Messages into object.9/19/2010 1:00:13 PM
End: Load Messages into object.9/19/2010 1:00:13 PM
Start: Query Objects.9/19/2010 1:00:13 PM
End: Query Objects.9/19/2010 1:00:13 PM
Start: Output Exception Type Details.9/19/2010 1:00:13 PM
End: Output Exception Type Details.9/19/2010 1:00:14 PM

So now it takes just about a second to process these records. Considering it’s processing over 10,000 event log messages, I’d say this is acceptable performance, for now.

Closing Comments

I just want to say a few things real quick. RegEx is not inherently evil. It is still one of the fastest and easiest methods to consume and parse data. You should not feel the need to go back to your own applications that are working just fine and refactor all of your code to strip out the RegEx expressions. It just so happens that sometimes too much of a good thing can be bad for your health. In our case, with over 10,000 RegEx.Match(…) calls in rapid succession, the RegEx appeared to be our bottleneck. This may or may not be the root cause of your own performance problems. The key takeaway from this blog entry should be that you should NOT guess when it comes to optimizing code paths. Instead, you should use the tools available at your disposal to find the bottleneck.

Until next time.

Comments [0] | Trackback |

Saturday, 11 September 2010

Optimizing your life: E-Mail Edition

Saturday, 11 September 2010 20:43:24 (Central Daylight Time, UTC-05:00) ( Life | Productivity )

As with most people, I hate to do the same thing more than once. In-fact, I had a job in college where I audited expense reports and then had to enter them into an application so they could be paid. The auditing of the expense report was actually fun because I got to see some interesting things people would attempt to pass off as “business expenses” – like the guy that tried to expense an entire addition to his house claiming that it was for his new office. That said, I absolutely HATED the data entry part. It was repetitive and boring and I just couldn’t understand why the person submitting expense reports couldn’t do the data entry into some application (they were already using Excel) and then electronically submit the data into our system for review and payment. That led me to design and build a simple application (in my off-time) to do exactly that. It was fairly sophisticated (for a college student doing something in his down time). The project was a tremendous success and it increased efficiency on my team to the Nth degree because it cut out steps, reduced the volume of duplicate data entry and everything was online. It was just a better workflow. I had optimized a fairly simple business process and I was seeing the tremendous dividends that such a project could have in the business world. In-fact, it was so successful and my team became so efficient that I eventually got “let go” because there just wasn’t enough work to go around anymore and the other folks on the team were permanent employees and I was a lowly hourly worker.

So, while that situation didn’t turn out so well for me, I became addicted to optimization – be it in software or my own personal life. And I’ve carried that lesson forward into the years since. For example, a few years back, I wrote a Quick Link Plugin for Windows Live Writer. It was a very simple plugin that would open up a form in Windows Live Writer to do a web search and then quickly insert the link into your blog post. This was another way that I was optimizing my (digital) life in that I hated having to leave the WLW environment to look something up and then copying/pasting the link back into the WLW window. It’s painful, repetitive and just plain wasteful.

Scott Hanselman blogged a bit ago about whether recipients of your e-mails deserve the gift of your keystrokes. I found it an interesting read and I highly recommend it. As an alternative to reaching only one person, he recommended taking it an extra step and blogging about it or posting it to a wiki and then sending the link to the individual. It’s a great idea but in my role, I’m often asked similar questions over and over and the customer doesn’t want to wade through a list of links to get the answer they need. Or, taking it one step further – once you blog something once, how do you retain that link somewhere so you don’t have to go hunting around for the URL when you need it later on down the line.

My Solution (for now)

The solution that I’ve used for years is to put these resources and snippets in my Outlook Signature list. If you’re not aware of Outlook signatures, well, I can’t really help you. Kidding! Signatures have been in Outlook for years and have been used primarily to append contact information and what-not into your e-mail. It always struck me as kind of odd that Outlook allows you to add so many signatures when really, how many ways to say “I’m so and so and here is my information” could you possibly have? So, I’ve decided to repurpose that list for my own uses. I will essentially add a new signature for each important blurb/link/etc. to my list. Then, when I am deeply into writing an e-mail and I get a question I’ve answered before, like:

“How can I turn on ValidateIntegratedModeConfiguration for all of the sites on my server and not allow my applications to override it?”

I can simply go to my signature list and select that item and it will magically appear in my e-mail:

Less clicks and key presses for me, the person gets their response quicker – it’s a win-win all around. I’ve even prefixed my most frequently used signatures at the top with an “_” in the name since I don’t want to go hunting for the stuff I use frequently. You may have also noticed that I have one called “_Expense Report”. This is just a simple e-mail that I send to my boss when I have some receipts I need to send her so I can get my expense reports approved. Yes, it’s likely a micro-optimization but if it shaves about 2 minutes off each of those e-mails I send:

2 minute savings * 4 expense reports/month * 12 months/year = 96 minutes savings/year

This makes it totally worth it for me. I can’t wait until we put something like the Visual Studio “Snippet” functionality in Outlook or someone writes an Evernote add-in for Outlook to allow me to search and insert text from my notes. That would just make my life complete.

Until next time…

Comments [0] | Trackback |

Friday, 03 September 2010

Exposing any class as a WCF endpoint…

Friday, 03 September 2010 01:41:11 (Central Daylight Time, UTC-05:00) ( .NET 4 | WCF )

Often, I’ll have a customer that wants to utilize a new feature in the .NET framework but they don’t necessarily know where to start or they have a large investment in another area and they’re afraid of the amount of rework required to get something working. This week, for example, I was teaching a customer about WPF and they asked:

“We have several class libraries and we’d like to expose them as WCF services. How can we do it?”

Yes, it was a bit off-topic from WPF but not really. I’ve found that when a customer chooses to start investing in a newer technology, like WPF, I’ve found that they also want to look at ways to revamp their existing applications to use other new features. But I digress…

In pre-.NET 4.0, if you wanted to expose a class as a WCF endpoint and make it available in IIS as an SVC file, you would have to do the following:

Create a new WCF Service (in the same category as WebSite).
Add a bunch of new “WCF Service” classes (files with an extension of “.SVC”).
Write some code to call out to your existing class library from the WCF Service.

It’s not exactly a painful experience but would require a bit of work to get everything up and running. In many cases, if you have a huge investment in something like a class library, the amount of work may be incredibly daunting. Fortunately, using some of the features in WCF 4 (part of .NET 4), this type of story is ridiculously easy. The steps that we’ll follow are the following:

Decorate your existing class.
Add configuration.
Deploy to IIS

Then, I’ll discuss additional options for debugging your new service.

Step 1: Decorate your existing class

So, I’m sure we all have a class in some Class Library somewhere that looks like this:

public class MyBusinessLogic
{

    public string GetName(int id)
    {

        return "My ID is: " + id;

    }

    public MyDataClass GetEmpInformation(int id)
    {

        return new MyDataClass() 
                    { 
                        FirstName = "Greg", 
                        LastName = "Varveris", 
                        Occupation = "Developer" 
                    };


    }
}

This is just a simple little business logic class that exposes two public methods – GetName and GetEmpInformation. So, let’s say we wanted to make this class accessible as a WCF service. The first thing we need to do is add a reference to the .NET 4 version of System.ServiceModel in our Class Library project.

I’ve included a screenshot to our left just to further show that there is no special sauce here. All of the assemblies being referenced in this project are straight up class library defaults. Now that we have the assembly referenced, let’s decorate our class with two attributes:

ServiceContract – This decorates our class and tells .NET that we have a WCF Service.
OperationContract – This decorates our methods and tells .NET which methods/functions we want to expose in our service.

The decorated class will now look like:

[ServiceContract]
public class MyBusinessLogic
{

    [OperationContract]
    public string GetName(int id)
    {

        return "My ID is: " + id;

    }

    [OperationContract]
    public MyDataClass GetEmpInformation(int id)
    {

        return new MyDataClass() 
                    { 
                        FirstName = "Greg", 
                        LastName = "Varveris", 
                        Occupation = "Developer" 
                    };


    }
}

And really that’s all you have to do from the code side.

Step 2: Add Configuration

WCF configuration has always been synonymous with “WOAH” before. Even using the great configuration editor tool, you may end up with a large amount of XML in your .config file. This is mostly because WCF is just so darn extensible. Well, for our scenario, we’ll use some great new features in WCF 4 to make the configuration a breeze:

File-less Activation
Default Configuration model

You can read-up about each of these features in the nice MSDN documentation. Basically, WCF 4 exposes the ability to map a file-name to a class and then it can expose a default endpoint (using BasicHttpBinding). For our scenario my entire config file is:

<configuration>
  <system.serviceModel>
    <serviceHostingEnvironment>
      <serviceActivations>
        <add relativeAddress="MyBusinessLogic.svc" service="WCFSample.Library.MyBusinessLogic"/>
      </serviceActivations>
    </serviceHostingEnvironment>
      <behaviors>
        <serviceBehaviors>
          <behavior>
            <serviceMetadata httpGetEnabled="true"/>
          </behavior>
        </serviceBehaviors>
      </behaviors>
  </system.serviceModel>
</configuration>

That’s it. In the first snippet, we are adding a relative address of “MyBusinessLogic.svc” which will point to our MyBusinessLogic class we defined above. The second section along the bottom is just to enable the service metadata.

The only minor manual snafu you’ll need to handle is that Visual Studio won’t let you add a web.config to a Class Library project. This can be easily remedied in a variety of ways. I just added an app.config to my project and then changed the name from “app.config” to “web.config”. Then, you just need to tell it to Copy to the output directory. This makes our project look like the following:

Step 3: Deploy to IIS

Now that we have our minuscule configuration and trivial decorations to our class, we need to host our service somewhere. For this, we’ll just use IIS with an ASP.NET 4.0 application pool. The structure of this folder will need to have the assemblies in a “bin” folder and the web.config in the root.

The bin folder is required so that the IIS engine can find your assembly. Once you configure this in IIS, you can simply navigate to your service (make sure you include the SVC in your url) in a browser and you’ll see the glorious WCF Service start page. For example, on my machine, when I navigate to: http://localhost/SampleWCFService/MyBusinessLogic.svc, I’ll see:

Debugging your service

So you have created your service, added configuration to it and deployed it to your hosting environment and everything is running smoothly. What if you want to debug the service in Visual Studio? Well, you really have 3 different options:

Attach to the instance of w3wp hosting your WCF service.
Cassini…errr, the ASP.NET Development Server.
IIS Express

I personally prefer either options 2 or 3 as it provides that seamless debugging experience we all crave where we just hit the “Green Arrow” to start the debugging session. With either option you select, we’ll need to make one or two minor configuration changes.

First, we’ll need to tell the solution to start our process. For this, you can right-click on your solution file in the Solution explorer and select “Properties”. In this window, you’ll need to select the Multiple Startup Projects radio button and set both of your projects (the Client and the Service) to the “Start” option.

Second, you’ll need to add a pre/post-build step for your service to structure the output folder correctly. Since we will deleting the contents of the “Bin” folder wherever our assemblies are being output, the command we will use is:

if exist "$(TargetDir)bin" del /q "$(TargetDir)bin\*.*"

Then, for our post-build step, we will need to create the bin folder if it doesn’t exist and then copy over our DLL’s into that bin folder. Those commands will be:

if not exist "$(TargetDir)bin" mkdir "$(TargetDir)bin"
copy "$(TargetDir)\*.dll" "$(TargetDir)\bin"
copy "$(TargetDir)\*.pdb" "$(TargetDir)\bin"
del "$(TargetDir)\*.dll"
del "$(TargetDir)\*.pdb"

That will take our output directory from zero to hero quite nicely:

Now that the prerequisites are met, we just need to point our local hosting tool (either the Development Server or IIS Express) at that folder. We’ll do this in the Debug tab of the project properties. Set the “Start External Program” radio button and set the path to:

C:\Program Files (x86)\Common Files\microsoft shared\DevServer\10.0\WebDev.WebServer40.EXE

Then, in the Command Line Arguments, specify the following:

/port:12345
/path:"<path to folder containing bin & web.config>"
/vpath:/myservice

Then, you can set a Service Reference on whatever vpath you chose above. For example, my service reference points to: http://localhost:12345/myservice/MyBusinessLogic.svc

Once the service reference is set, you can just hit the fancy green arrow to start debugging:

Enjoy!

Comments [3] | Trackback |

Sunday, 29 August 2010

Parsing ASP.NET event log error messages for fun and profit

Sunday, 29 August 2010 17:10:07 (Central Daylight Time, UTC-05:00) ( ASP.NET | Best Practice | Development )

Sometimes a customer will ask me to look at their site and make some recommendations on what can be improved. One of the many things I’ll look at is their event logs. One of the nice things about ASP.NET is that when you encounter an unhandled exception, an event will be placed into your Application event log. The message of the event log entry will usually include lots of good stuff like the application, path, machine name, exception type, stack trace, etc. Loads of great stuff and all for free. For customers that don’t have a centralized exception logging strategy, this can be a gold mine.

The way it usually works is that they will provide me an EVTX from their servers. If you’re not aware, an EVTX is just an archive of the events from the event log you specify. By itself, looking at the raw event logs from a server can be quite daunting. There are usually thousands of entries in the event log and filtering down to what you actually care about can be exhausting. Even if you do find a series of ASP.NET event log messages, the problem has always been – how do you take all of this great information that’s just dumped into the Message property of the event log entry and put it into a format you can easily report on, generate statistics, etc. Fortunately, I have a non-painful solution.

I’ve broken this down into a relatively simple 4-step process:

Get the EVTX
Generate a useful XML file
Parse into an object model
Analyze and report on the data

Let’s get to it.

Step 1: Get the EVTX

This step is pretty short and sweet. In the Event Log manager, select the “Application” group and then select the “Save All Events As…” option.

That will produce an EVTX file with whatever name you specify. Once you have the file, transfer it to your machine as you generally do not want to install too many tools in your production environment.

Step 2: Generate a useful XML file

Now that we have the raw EVTX file, we can get just the data we care about using a great tool called LogParser. Jeff Atwood did a nice little write-up on the tool but simply put it’s the Swiss Army knife of parsing tools. It can do just about anything data related you would wish using a nice pseudo-SQL language. We’ll use the tool to pull out just the data from the event log we want and dump it into an XML file. The query that we can use for this task is daunting in its simplicity:

SELECT Message INTO MyData.xml
FROM ‘*.evtx’
WHERE EventID=1309

The only other thing we need to tell LogParser is the format in which it the data is coming in and the format to put it into. This makes our single command the following:

C:\>logparser -i:EVT -o:XML
        "SELECT Message INTO MyData.xml FROM ‘*.evtx’ WHERE EventID=1309"

This will produce a nice XML file that looks something like the following:

<?xml version="1.0" encoding="ISO-10646-UCS-2" standalone="yes" ?>
<ROOT DATE_CREATED="2010-08-29 06:04:20" CREATED_BY="Microsoft Log Parser V2.2">
 <ROW>
  <Message>Event code: 3005 Event message: An unhandled exception has occurred...  
  </Message>
 </ROW>
 ...
</ROOT>

One thing that you may notice is that all of the nicely formatted data from our original event log message is munged together into one unending string. This will actually work in our favor but more on that in the next step.

Step 3: Parse into an object model

So, now that we have an XML file with all of our event details, let’s do some parsing. Since all of our data is in one string, the simplest method is to apply a RegEx expression with grouping to grab the data we care about.

In a future post, I’ll talk about a much faster way of getting this type of data without a RegEx expression. After all, refactoring is a way of life for developers.

private const string LargeRegexString = @"Event code:(?<Eventcode>.+)" +
            @"Event message:(?<Eventmessage>.+)" +
            @"Event time:(?<Eventtime>.+)" +
            @"Event time \(UTC\):(?<EventtimeUTC>.+)" +
            @"Event ID:(?<EventID>.+)" +
            @"Event sequence:(?<Eventsequence>.+)" +
            @"Event occurrence:(?<Eventoccurrence>.+)" +
            @"Event detail code:(?<Eventdetailcode>.+)" +
            @"Application information:(?<Applicationinformation>.+)" +
            @"Application domain:(?<Applicationdomain>.+)" +
            @"Trust level:(?<Trustlevel>.+)" +
            @"Full Application Virtual Path:(?<FullApplicationVirtualPath>.+)" +
            @"Application Path:(?<ApplicationPath>.+)" +
            @"Machine name:(?<Machinename>.+)" +
            @"Process information:(?<Processinformation>.+)" +
            @"Process ID:(?<ProcessID>.+)" +
            @"Process name:(?<Processname>.+)" +
            @"Account name:(?<Accountname>.+)" +
            @"Exception information:(?<Exceptioninformation>.+)" +
            @"Exception type:(?<Exceptiontype>.+)" +
            @"Exception message:(?<Exceptionmessage>.+)" +
            @"Request information:(?<Requestinformation>.+)" +
            @"Request URL:(?<RequestURL>.+)" +
            @"Request path:(?<Requestpath>.+)" +
            @"User host address:(?<Userhostaddress>.+)" +
            @"User:(?<User>.+)" +
            @"Is authenticated:(?<Isauthenticated>.+)" +
            @"Authentication Type:(?<AuthenticationType>.+)" +
            @"Thread account name:(?<Threadaccountname>.+)" +
            @"Thread information:(?<Threadinformation>.+)" +
            @"Thread ID:(?<ThreadID>.+)" +
            @"Thread account name:(?<Threadaccountname>.+)" +
            @"Is impersonating:(?<Isimpersonating>.+)" +
            @"Stack trace:(?<Stacktrace>.+)" +
            @"Custom event details:(?<Customeventdetails>.+)";

Now that we have our RegEx, we’ll just write the code to match it against a string and populate our class. While I’ve included the entire regex above, I’ve only included a partial implementation of the class population below.

public class EventLogMessage
    {

        private static Regex s_regex = new Regex(LargeRegexString, RegexOptions.Compiled);

        public static EventLogMessage Load(string rawMessageText)
        {

            Match myMatch = s_regex.Match(rawMessageText);
            EventLogMessage message = new EventLogMessage();
            message.Eventcode = myMatch.Groups["Eventcode"].Value;
            message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
            message.Eventtime = myMatch.Groups["Eventtime"].Value;
            message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
            message.EventID = myMatch.Groups["EventID"].Value;
            message.Eventsequence = myMatch.Groups["Eventsequence"].Value;
            message.Eventoccurrence = myMatch.Groups["Eventoccurrence"].Value;
            ...
            return message;
        }

        public string Eventcode { get; set; }
        public string Eventmessage { get; set; }
        public string Eventtime { get; set; }
        public string EventtimeUTC { get; set; }
        public string EventID { get; set; }
        public string Eventsequence { get; set; }
        public string Eventoccurrence { get; set; }
        ...
    }

The last step is just to read in the XML file and instantiate these objects.

XDocument document = XDocument.Load(@"<path to data>\MyData.xml");

var messages = from message in document.Descendants("Message")
               select EventLogMessage.Load(message.Value);

Now that we have our objects and everything is parsed just right, we can finally get some statistics and make sense of the data.

Step 4: Analyze and report on the data

This last step is really the whole point of this exercise. Fortunately, now that all of the data is an easily query’able format using our old friend LINQ, the actual aggregates and statistics are trivial. Really, though, everyone’s needs are going to be different but I’ll provide a few queries that might be useful.

Query 1: Exception Type Summary

For example, let’s say you wanted to output a breakdown of the various Exception Types in your log file. The query you would use for that would be something like:

var results = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };

foreach (var result in results)
{

    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);

}

This would then output something like:

WebException : 15 time(s)
InvalidOperationException : 7 time(s)
NotImplementedException : 2 time(s)
InvalidCastException : 1 time(s)
MissingMethodException : 1 time(s)

Query 2: Exception Type and Request URL Summary

Let’s say that you wanted to go deeper and get the breakdown of which URL’s generated the most exceptions. You can just expand that second foreach loop in the above snippet to do the following:

foreach (var result in results)
{

    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);

    var requestUrls = from urls in messages
                       where urls.Exceptiontype == result.ExceptionType
                       group urls by urls.RequestURL.ToLower() into url
                       orderby url.Count() descending, url.Key
                       select new
                       {
                           RequestUrl = url.Key,
                           Count = url.Count()
                       };

    foreach (var url in requestUrls){

        Console.WriteLine("\t{0} : {1} times ",
                          url.RequestUrl,
                          url.Count);
    }                
}

This then would produce output like this:

WebException  : 15 time(s)
        http://localhost/menusample/default.aspx  : 11 times
        http://localhost:63188/menusample/default.aspx  : 4 times
InvalidOperationException  : 7 time(s)
        http://localhost:63188/menusample/default.aspx  : 6 times
        http://localhost/menusample/default.aspx  : 1 times
NotImplementedException  : 2 time(s)
        http://localhost/samplewebsiteerror/default.aspx  : 2 times
InvalidCastException  : 1 time(s)
        http://localhost:63188/menusample/default.aspx  : 1 times
MissingMethodException  : 1 time(s)
        http://localhost:63188/menusample/default.aspx  : 1 times

Query 3: Exception Type, Request URL and Method Name Summary

You can even go deeper, if you so desire, to find out which of your methods threw the most exceptions. For this to work, we need to make a slight change to our EventLogMessage class to parse the Stack Trace data into a class. First, we’ll start with our simple little StackTraceFrame class:

public class StackTraceFrame
{
    public string Method { get; set; }

}

Second, add a new property to our EventLogMessage class to hold a List<StackTraceFrame>:

public List<StackTraceFrame> StackTraceFrames { get; set; }

Lastly, add a method (and its caller) to parse out the stack frames and assign the resulting List to the StackTraceFrames property mentioned above:

public EventLogMessage(string rawMessageText)
{
    Match myMatch = s_regex.Match(rawMessageText);
    ...
    Stacktrace = myMatch.Groups["Stacktrace"].Value;
    ...
    StackTraceFrames = ParseStackTrace(Stacktrace);
}

private List<StackTraceFrame> ParseStackTrace(string stackTrace)
{
    List<StackTraceFrame> frames = new List<StackTraceFrame>();
    string[] stackTraceSplit = stackTrace.Split(new string[] { " at " }, 
                                        StringSplitOptions.RemoveEmptyEntries);
    foreach (string st in stackTraceSplit)
    {
        if (!string.IsNullOrEmpty(st))
        {
            frames.Add(new StackTraceFrame() { Method = st });
        }
    }
    return frames;
}

Please Note: You could enhance the ParseStackTrace(…) method to parse out the source files, line numbers, etc. I’ll leave this as an exercise for you, dear reader.

Now that we have the infrastructure in place, the query is just as simple. We’ll just nest this additional query inside of our URL query like so:

foreach (var url in requestUrls){

    Console.WriteLine("\t{0} : {1} times ",
                      url.RequestUrl,
                      url.Count);

    var methods = from method in messages
                  where string.Equals(method.RequestURL, 
                                      url.RequestUrl, 
                                      StringComparison.InvariantCultureIgnoreCase) 
                        &&
                        string.Equals(method.Exceptiontype, 
                                      result.ExceptionType, 
                                      StringComparison.InvariantCultureIgnoreCase)
                  group method by method.StackTraceFrames[0].Method into mt
                  orderby mt.Count() descending, mt.Key
                  select new
                  {
                      MethodName = mt.Key,
                      Count = mt.Count()
                  };

    foreach (var method in methods)
    {
        Console.WriteLine("\t\t{0} : {1} times ",
                          method.MethodName,
                          method.Count
                          );
    }
}

This would then produce output like the following:

WebException  : 15 time(s)
         http://localhost/menusample/default.aspx  : 11 times
                System.Net.HttpWebRequest.GetResponse() : 11 times
         http://localhost:63188/menusample/default.aspx  : 4 times
                System.Net.HttpWebRequest.GetResponse() : 4 times
InvalidOperationException  : 7 time(s)
         http://localhost:63188/menusample/default.aspx  : 6 times
                System.Web.UI.WebControls.Menu... : 6 times
         http://localhost/menusample/default.aspx  : 1 times
                System.Web.UI.WebControls.Menu... : 1 times

One last thing you may notice is that the in the example above, the first frame for each of those exceptions are somewhere in the bowels of the .NET BCL. You may want to filter this out even further to only return YOUR method. This can be accomplished very easily with the method below. It will simply loop through the StackTraceFrame List and return the first method it encounters that does not start with “System.” or “Microsoft.”.

private static string GetMyMethod(List<StackTraceFrame> frames)
{

    foreach (StackTraceFrame frame in frames)
    {

        if (!frame.Method.StartsWith("System.") &&
            !frame.Method.StartsWith("Microsoft."))
            return frame.Method;


    }

    return "No User Code detected.";
}

Then, you can just call that method from the new query we wrote above:

var methods = from method in messages
              where ...
              group method by 
                GetMyMethod(method.StackTraceFrames) into mt
              ...

Finally, with this new snippet in place, we’ll get output like this:

WebException  : 15 time(s)
         http://localhost/menusample/default.aspx  : 11 times
                _Default.Page_Load(Object sender, EventArgs e)...: 8 times
                No User Code detected. : 3 times

         http://localhost:63188/menusample/default.aspx  : 4 times
                _Default.Page_Load(Object sender, EventArgs e)... : 1 times
                No User Code detected. : 1 times
                WebControls.CustomXmlHierarchicalDataSourceView.Select()... : 2 times

As you can see, the sky’s the limit.

Enjoy!

Comments [0] | Trackback |

I know kung fu

Contact

Categories

On this page

Archive

Blogroll

Step 1: Change the extension from docx to zip

Step 2: Extract the zip file to a folder

Step 3: Find the file that’s causing the problem

Step 4: Open and format the file in Visual Studio

Step 5: Recreate the Word doc and get the additional information

Applying to a website

Event Log Header Collection Source

Event Log Collection Source Redux

Generate and test the collection

Closing Thoughts

Figure out what you want to Pivot

Find an API and code it

Generate and Test the collection

The Problem

The Research

The Fix

Closing Comments

My Solution (for now)

Step 1: Decorate your existing class

Step 2: Add Configuration

Step 3: Deploy to IIS

Debugging your service

Step 1: Get the EVTX

Step 2: Generate a useful XML file

Step 3: Parse into an object model

Step 4: Analyze and report on the data

Query 1: Exception Type Summary

Query 2: Exception Type and Request URL Summary

Query 3: Exception Type, Request URL and Method Name Summary