Samurai Programmer.com

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Sunday, 17 October 2010

Yield Return…a little known but incredible language feature

Sunday, 17 October 2010 17:54:49 (Central Daylight Time, UTC-05:00) ( .NET | Performance )

“We shall neither fail nor falter; we shall not weaken or tire…give us the tools and we will finish the job.” – Winston Churchill

I don’t often blog about specific language features but over the past few weeks I’ve spoken to a few folks that did not know of the “yield” keyword and the “yield return” and “yield break” statements, so I thought it might be a good opportunity to shed some light on this little known but extremely useful C# feature. Chances are, you’ve probably indirectly used this feature before and just never known it.

We’ll start with a problem I’ve seen plague many applications. Oftentimes, you’ll call a method that returns a List<T> or some other concrete collection. The method probably looks something like this:

public static List<string> GenerateMyList()
{
    List<string> myList = new List<string>();

    for (int i = 0; i < 100; i++)
    {

        myList.Add(i.ToString());    

    }

    return myList;

}

I’m sure your logic is going to be significantly more complex than what I have above but you get the idea. There are a few problems and inefficiencies with this method. Can you spot them?

The entire List<T> must be stored in memory.
Its caller must wait for the List<T> to be returned before it can process anything.
The method itself returns a List<T> back to its caller.

As an aside – with public methods, you should strive to not return a List<T> in your methods. Full details can be found here. The main idea here is that if you choose to change the method signature and return a different collection type in the future, this would be considered a breaking change to your callers.

In any case, I’ll focus on the first two items in the list above. If the List<T> that is returned from the GenerateMyList() method is large then that will be a lot of data that must be kept around in memory. In addition, if it takes a long time to generate the list, your caller is stuck until you’ve completely finished your processing.

Instead, you can use that nifty “yield” keyword. This allows the GenerateMyList() method to return items to its caller as they are being processed. This means that you no longer need to keep the entire list in memory and can just return one item at a time until you get to the end of your returned items. To illustrate my point, I’ll refactor the above method into the following:

private static IEnumerable<string> GenerateMyList()
{
    for (int i = 0; i < 100; i++)
    {
        string value = i.ToString();
        Console.WriteLine("Returning {0} to caller.", value);
        yield return value;
    }

    Console.WriteLine("Method done!");

    yield break;
}

A few things to note in this method. The return type has been changed to IEnumerable<string>, an interface that exposes an enumerator, which lets the caller cycle through the results in a foreach or while loop. In addition, the “yield return value” statement hands that item to the caller at that point, not when the entire method has completed its processing. The trailing “yield break” simply ends the sequence; at the very end of a method it is optional. This allows for a very interesting caller-callee relationship. For example, if I call this method like so:

IEnumerable<string> myList = GenerateMyList();

foreach (string listItem in myList)
{
    Console.WriteLine("Item: " + listItem);
}

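The output would be:

Returning 0 to caller.
Item: 0
Returning 1 to caller.
Item: 1
...
Returning 99 to caller.
Item: 99
Method done!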

This shows that each item is returned at the time it is processed. So, how does this work? At compile time, the C# compiler generates a class that implements the iterator’s behavior. Essentially, the body of the GenerateMyList() method gets placed into that class’s MoveNext() method. In fact, if you open the compiled assembly in Reflector, you can see that plumbing in place (comments are mine and some code was omitted for clarity’s sake):

private bool MoveNext()
{
    
            this.<>1__state = -1;
            this.<i>5__1 = 0;
            // My for loop has changed to a while loop.
            while (this.<i>5__1 < 100)
            {
                // Sets a local value
                this.<value>5__2 = this.<i>5__1.ToString();
                // Here is my Console.WriteLine(...)
                Console.WriteLine("Returning {0} to caller.", this.<value>5__2);
                // Here is where the current member variable
                // gets stored.
                this.<>2__current = this.<value>5__2;
                this.<>1__state = 1;
                // We return "true" to the caller so it knows
                // there is another record to be processed.
                return true;
                ...
            }
            // Here is my Console.WriteLine() at the bottom
            // when we've finished processing the loop.
            Console.WriteLine("Method done!");
            break;

    
}
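Because the decompiled listing leaves out the state dispatch, it can help to see the same idea written by hand. The following is only a minimal sketch, not what the compiler actually emits: the class name CountingEnumerator, its field names and the limit of 3 items are all made up for illustration, and the Console.WriteLine calls from the original method are left out.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written stand-in for a compiler-generated iterator class.
// It yields "0", "1", "2" and remembers where it was between MoveNext() calls.
sealed class CountingEnumerator : IEnumerator<string>
{
    private int _state;       // 0 = not started, 1 = inside the loop, -1 = finished
    private int _i;           // the loop variable, hoisted into a field
    private string _current;  // the value handed out by the last "yield return"

    public string Current { get { return _current; } }
    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:           // first call: run the loop initializer
                _i = 0;
                _state = 1;
                break;
            case 1:           // resuming after a previous "yield return"
                _i++;
                break;
            default:          // the sequence has already ended
                return false;
        }

        if (_i < 3)
        {
            _current = _i.ToString();
            return true;      // tell the caller there is an item to read
        }

        _state = -1;          // loop finished: behave like "yield break"
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}

static class Program
{
    static void Main()
    {
        using (var enumerator = new CountingEnumerator())
        {
            while (enumerator.MoveNext())
            {
                Console.WriteLine(enumerator.Current);
            }
        }
    }
}
```

The point of the sketch is that the loop variable and the "where was I" marker live in fields, so each MoveNext() call can pick up exactly where the previous one stopped.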

Pretty straightforward. Of course, the real power is that the compiler converts the “yield return <blah>” into a nice clean enumerator with a MoveNext(). In fact, if you’ve used LINQ, you’ve probably used this feature without even knowing it. Consider the following code:

private static void OutputLinqToXmlQuery()
{
    XDocument doc = XDocument.Parse
                        (@"<root><data>hello world</data><data>goodbye world</data></root>");

    var results = from data in doc.Descendants("data")
                  select data;

    foreach (var result in results)
    {
        Console.WriteLine(result);                
    }
}

The “results” object will, by default, be of type “WhereSelectEnumerableIterator”, which exposes a MoveNext() method. In fact, that is also why the results object doesn’t allow you to do something like this:

var results = from data in doc.Descendants("data")
              select data;

var bad = results[1];

IEnumerable<T> does not expose an indexer that lets you jump straight to a particular element, because the sequence is produced lazily and the full collection hasn’t been generated yet. Instead, you would do something like this:

var results = from data in doc.Descendants("data")
              select data;            

var good = results.ElementAt(1);

Under the covers, when the source is not already an indexable list, the ElementAt(int) method just keeps calling MoveNext() until it reaches the index you specified. Something like this:

Note: this is my own code and is NOT from the .NET Framework – it is merely meant to illustrate a point.

public static XElement MyElementAt(this IEnumerable<XElement> elements, 
                                   int index)
{
    int counter = 0;
    using (IEnumerator<XElement> enumerator = 
                                 elements.GetEnumerator())
    {            
        while(enumerator.MoveNext()){
            if (counter == index)
                return enumerator.Current;
            counter++;
        }
    }

    return null;
}
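To see that laziness in isolation, here is a small sketch that counts how many items an iterator really produces when the caller only asks for the third one. The LazyDemo class, its Produced counter and the Numbers method are names I made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    // Counts how many items the iterator body has actually produced.
    public static int Produced;

    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return i;
        }
    }
}

static class Program
{
    static void Main()
    {
        // Calling the method runs none of its body; it only creates the iterator.
        IEnumerable<int> sequence = LazyDemo.Numbers(1000000);
        Console.WriteLine(LazyDemo.Produced);   // prints 0

        // ElementAt(2) pulls items one at a time and stops at the third.
        int third = sequence.ElementAt(2);
        Console.WriteLine(third);               // prints 2
        Console.WriteLine(LazyDemo.Produced);   // prints 3
    }
}
```

Even though the sequence could produce a million items, only three are ever generated, because the consumer stops calling MoveNext() as soon as it has what it needs.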

Hope this helps to demystify some things and put another tool in your toolbox.

Until next time.


Monday, 11 October 2010

Linking your pivot collections the fun and easy way

Monday, 11 October 2010 00:09:42 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

In my previous post, I added to my series of entries on making sense of your ASP.NET event log error messages. Note that this is entry #4 in this series. The previous three entries can be found here:

Part 1: Parsing ASP.NET event log error messages for fun and profit
Part 2: Don’t guess when it comes to performance…
Part 3: Pivoting your ASP.NET event log error messages

In that last post, I walked through the PauthorLib.dll and showed you how to crawl through your event log error messages and create a pivot collection. The result of that initial effort was a nice view into our events:

While this certainly got the job done and is a very powerful and compelling view into our data, we need to realize that as our data grows, the number of entries a single collection can hold is limited. From the Developer Overview, we see that the maximum number of items we should have in a single collection is 3,000:

So, while a simple collection will get the job done for your smaller amounts of data, you will really run into some challenges with your larger datasets like our ASP.NET event log error messages. To combat this limitation you can create what’s called a Linked Collection. The idea is that it’s just a way for you to link together related collections in order to provide a seamless experience for your users. In our case, a natural break for our collections will be based upon the exception type with a summary collection and then a separate collection for each exception type. If I were to draw this out:

Event Log Header Collection Source

The idea behind this structure is that the Exception summary would simply link to each of these exception collections. First, we’ll create a collection source for our exception summary. As in our previous collection source (in my last blog post), we inherit from the AbstractCollectionSource class and use the LoadHeaderData() method to add our facet categories to the collection. In this case, we’ll create two categories – the number of Occurrences and the Urls where the exception occurred. Another difference is that we are going to pass the already parsed collection of messages into the constructor, so that we don’t have to parse the event log messages more than once.

class EventLogHeaderCollectionSource : AbstractCollectionSource
{

    private IEnumerable<EventLogMessage> m_messages = null;

    public EventLogHeaderCollectionSource(IEnumerable<EventLogMessage> messages, 
                                          string inputFile)
        : base(inputFile)
    {

        m_messages = messages;


    }

    #region Facets

    private const string OCCURRENCES = "Occurrences";
    private const string URLS = "Urls";

    #endregion

    protected override void LoadHeaderData()
    {
        this.CachedCollectionData.FacetCategories.
                        Add(new PivotFacetCategory(OCCURRENCES, PivotFacetType.Number));
        this.CachedCollectionData.FacetCategories.
                        Add(new PivotFacetCategory(URLS, PivotFacetType.String));

        this.CachedCollectionData.Name = 
                        "ASP.NET Error Messages - Summary";
        this.CachedCollectionData.Copyright = 
                        new PivotLink("Source", "http://www.samuraiprogrammer.com");

    }
}

Then, in the LoadItems() method, we provide the logic to generate the PivotItem collection. The one key item to make note of is the use of the Href property of the PivotItem object. This is where we specify the collection we wish to link to this item. Since each of the PivotItems will be a summary of the number of each exception type – we’ll name each sub-collection after its exception type. For example, NullReferenceException.cxml, SqlException.cxml, etc.

protected override IEnumerable<PivotItem> LoadItems()
{
    var results = from log in m_messages
                  group log by log.Exceptiontype into l
                  orderby l.Count() descending, l.Key
                  select new
                  {
                      ExceptionType = l.Key,
                      ExceptionCount = l.Count()
                  };


    int index = 0;
    foreach (var result in results)
    {
        PivotItem item = new PivotItem(index.ToString(), this);       
        item.Name = result.ExceptionType;
        item.Description = "# of Exceptions: " + result.ExceptionCount.ToString();
        item.AddFacetValues(OCCURRENCES, result.ExceptionCount);
        item.Href = result.ExceptionType + ".cxml";

        ...                

        index++;
        yield return item;
    }

    yield break;
}
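The grouping query is the heart of the summary, so here is a rough, stand-alone sketch of it running against a handful of fake records. SampleMessage is a hypothetical stand-in for the EventLogMessage class from Part 1 (only the Exceptiontype property is mimicked), and SummaryDemo.Summarize is a name I made up; no Pauthor types are involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the EventLogMessage class from Part 1.
class SampleMessage
{
    public string Exceptiontype { get; set; }
}

static class SummaryDemo
{
    // Returns one "<ExceptionType>.cxml (<count>)" line per exception type,
    // most frequent first, ties broken by name.
    public static List<string> Summarize(IEnumerable<SampleMessage> messages)
    {
        var results = from log in messages
                      group log by log.Exceptiontype into l
                      orderby l.Count() descending, l.Key
                      select new
                      {
                          ExceptionType = l.Key,
                          ExceptionCount = l.Count()
                      };

        var lines = new List<string>();
        foreach (var result in results)
        {
            lines.Add(result.ExceptionType + ".cxml (" + result.ExceptionCount + ")");
        }
        return lines;
    }
}

static class Program
{
    static void Main()
    {
        var messages = new List<SampleMessage>
        {
            new SampleMessage { Exceptiontype = "NullReferenceException" },
            new SampleMessage { Exceptiontype = "SqlException" },
            new SampleMessage { Exceptiontype = "NullReferenceException" },
            new SampleMessage { Exceptiontype = "ArgumentException" },
            new SampleMessage { Exceptiontype = "NullReferenceException" }
        };

        foreach (string line in SummaryDemo.Summarize(messages))
        {
            Console.WriteLine(line);
        }
    }
}
```

With this sample data, NullReferenceException (3 occurrences) comes first, followed by ArgumentException and SqlException with one occurrence each.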

Event Log Collection Source Redux

Previously, when we generated the pivot collections, we were outputting all of the records into a single collection. Now that we are generating a collection for each exception type, we will need to put a filter in our exception collection and then incorporate that filter into our item generation. Other than that, the code we wrote previously remains largely unchanged, so I left the majority of it out and only included the snippets that we care about below.

class EventLogCollectionSource : AbstractCollectionSource
{
    private IEnumerable<EventLogMessage> m_messages = null;
    private string m_exceptionType = string.Empty;

    public EventLogCollectionSource(
                    IEnumerable<EventLogMessage> messages, 
                    string exceptionType, 
                    string path)
        : base(path)
    {
        m_messages = messages;
        m_exceptionType = exceptionType;
    }

    protected override void LoadHeaderData()
    {
        ...
        this.CachedCollectionData.Name = 
                string.Format("{0} Error Messages", m_exceptionType);
        ...
    }

    protected override IEnumerable<PivotItem> LoadItems()
    {
        var results = (from message in m_messages
                       where message.Exceptiontype == m_exceptionType
                       select message);

        int index = 0;
        foreach (EventLogMessage message in results)
        {
            PivotItem item = 
                    new PivotItem(index.ToString(), this);
            item.Name = message.Exceptiontype;
            item.Description = message.Exceptionmessage;

            ...

            index++;
            yield return item;
        }
        yield break;
    }
}
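The reason for handing an already-parsed collection to each collection source is worth seeing in numbers. The sketch below is purely illustrative: ParseOnceDemo, its Parse method and the ParseCalls counter are hypothetical stand-ins for the real event log parsing. It shows that a lazy sequence is re-run every time it is enumerated, while a list built once with ToList() is not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ParseOnceDemo
{
    // Counts how often the (pretend) expensive parse step runs.
    public static int ParseCalls;

    // Hypothetical stand-in for parsing one raw event log entry.
    static string Parse(string raw)
    {
        ParseCalls++;
        return raw.Trim();
    }

    // Lazy: nothing is parsed until somebody enumerates the result.
    public static IEnumerable<string> LoadMessages(IEnumerable<string> rawEntries)
    {
        foreach (string raw in rawEntries)
        {
            yield return Parse(raw);
        }
    }
}

static class Program
{
    static void Main()
    {
        string[] raw = { " first ", " second " };

        // Enumerating the lazy sequence twice parses every entry twice.
        IEnumerable<string> lazy = ParseOnceDemo.LoadMessages(raw);
        lazy.Count();
        lazy.Count();
        Console.WriteLine(ParseOnceDemo.ParseCalls);   // prints 4

        // Materializing once with ToList() parses every entry exactly once.
        ParseOnceDemo.ParseCalls = 0;
        List<string> cached = ParseOnceDemo.LoadMessages(raw).ToList();
        cached.Count();
        cached.Count();
        Console.WriteLine(ParseOnceDemo.ParseCalls);   // prints 2
    }
}
```

Since the same messages feed the summary collection and every per-exception collection, materializing them once up front keeps the parsing cost from being multiplied by the number of collections.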

Generate and test the collection

Then, the only thing we have left to do is generate and test our linked collections. I won’t go into a lengthy explanation of how we generate the collections because I did that in the last blog entry. I will show the broad strokes required to tie this all together, though:

// Load the raw messages into a collection
IEnumerable<EventLogMessage> messages = 
                LoadEventLogMessages(inputFile).ToList();

// Generate summary pivot collection
EventLogHeaderCollectionSource sourceSummary = 
                new EventLogHeaderCollectionSource(messages, inputFile);
...
summaryTargetFilter1.Write(sourceSummaryFilter1);

// Get the aggregate results so we know the filters
// for our exception pivot collections
var summaryResults = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };

foreach (var resultItem in summaryResults)
{
    // Generate pivots for each exception type
    EventLogCollectionSource source = 
            new EventLogCollectionSource(messages, 
                                         resultItem.ExceptionType, 
                                        inputFile);
    ...
    targetFilter1.Write(sourceFilter1);
}

Once we have this code and everything has been generated, if we open the output folder, we’ll see the following structure:

We see our ExceptionSummary pivot collection and all of the deep zoom folders. So, when we open the Pivot tool, we’ll see a nice parent collection:

This gives us a nice breakdown of the number of occurrences for each exception in our source data. Immediately we see an outlier (more on that later) between the 6,000 and 7,000 item mark and when we select that tile, we see the following:

We also see a green “Open” box (outlined with a red rectangle, my emphasis) which links to our NullReferenceException.cxml. When we click that box, the tool will immediately open that collection in the UI for our perusal – providing a very similar look to what we saw in the last entry:

Closing Thoughts

Now, you may have noticed a contradiction above. I said that a collection should have no more than 3,000 items and yet, with the NullReferenceException collection, we saw in the summary that it had over 6,000 items. That is a very good point and will be a subject of a future blog post. I wanted to illustrate the simple collections and the linked collections before we got into that third type of collection from above – the Dynamic Collection. Stay tuned!


Monday, 04 October 2010

Pivoting ASP.NET event log error messages

Monday, 04 October 2010 10:43:49 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Pivot )

Unless you’ve been hiding under the proverbial rock, you’ve probably seen the recent Pivot hoopla. If you’re not familiar with it, it’s a way to visualize a large amount of data in a nice filterable format. The nice thing about it is that it’s really easy to put together a pivot collection and there are a ton of tools available for just this purpose. Just do a search on CodePlex for Pivot and you’ll get about 40’ish good results for tools you can use to create a Pivot Collection.

So, I was putting together a proof-of-concept for an internal project and thought I would continue on with my series of blog posts on ASP.NET Error Message event logs with a post on how to visualize this data using a pivot. You may wish to read parts 1 and 2 here:

So, when I put together my pivot, I worked out a three-step process:

Figure out what you want to Pivot
Find an API and convert the data
Generate and Test the collection

Let’s begin, shall we?

Figure out what you want to Pivot

The structure for the Pivot Collection is deceptively simple -

<?xml version="1.0"?> 
<Collection Name="Hello World Collection" …>
  <FacetCategories> 
    <FacetCategory Name="Hello World Facet Category One" Type="String"/> 
  </FacetCategories> 
  <Items ImgBase="helloworld.dzc"> 
    <Item Img="#0" Id="0" Href="http://www.getpivot.com" Name="Hello World!"> 
      <Description> This is the only item in the collection.</Description> 
      <Facets> 
        <Facet Name="Hello World Facet Category One"> 
         <String Value="Hello World Facet Value"/> 
        </Facet>       
      </Facets> 
    </Item> 
  </Items> 
</Collection>

I think about the Items in the Collection the same way you might think about an object. For example, a Car object might have the following properties:

Advertising blurb
Car and Driver Reviews
Color
Make
Model
Engine
0-60mph time
Max Speed

The common values like the Color, Make, Model, 0-60mph time and Max Speed become the facets, the attributes that describe your object in relation to other instances. Things like the advertising blurb and the Car and Driver reviews are free-form descriptions, so they belong directly on your Item in the Description element.

For our data, namely ASP.NET exceptions, we’re going to define an exception as the following:

Item
    Name = Exception Type
    Description = Exception Message
    Facets
        Request Path
        Stack Trace
        Event Time
        Top Method of Stack Trace
        Top My Code Method of Stack Trace

This should allow us to group and drill through the common properties that might link exceptions together and still provide detailed error information when necessary.

Find an API and code it

The second step here is to find some code/API/tool that we can enhance for our purposes. There are some great tools published by the Live Labs team – for example:

While both tools could be used in this instance, in Part 2 we found that some of the event logs we were parsing contained more than 10,000 items, and I wanted a bit more control over how I converted the data. “No touching” is the phrase of the day. Fortunately, the command line tool was published on CodePlex with an API we can use. Once you download the product, you see that it contains 3 assemblies:

The last item there is the PauthorLib.dll which encapsulates many of the extension points within this great tool. In fact, it exposes about 7 different namespaces for our purposes:

For our purposes, we are going to focus on the Streaming set of namespaces. Why? Mainly because we are going to be dealing with a lot of data and I didn’t want to load everything into memory before writing it to disk. If you look at the contents of the Streaming namespace, you’ll see a great class called “AbstractCollectionSource”. This looks fairly promising because it exposes two main methods for us to override:

class EventLogExceptionCollectionSource : AbstractCollectionSource
{
    protected override void LoadHeaderData()
    {
        throw new NotImplementedException();
    }

    protected override IEnumerable<PivotItem> LoadItems()
    {
        throw new NotImplementedException();
    }
}

Before we do anything, though, we need a constructor. The constructor will be responsible for taking a string representing the path to our data and passing it to our base class’s constructor.

public EventLogExceptionCollectionSource(string filePath)
    : base(filePath)
{

    // Do nothing else.

}

Then, the first method, LoadHeaderData, is where we define our facets – Request Path, Stack Trace, etc. – as well as the data type of each facet. So, our code will be fairly simple and straightforward:

protected override void LoadHeaderData()
{

    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(STACKTRACE, 
                                                       PivotFacetType.LongString));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(REQUESTPATH, 
                                                       PivotFacetType.String));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(EVENTTIME, 
                                                       PivotFacetType.DateTime));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(TOPMETHOD, 
                                                       PivotFacetType.String));
    this.CachedCollectionData.FacetCategories.Add(
                                new PivotFacetCategory(TOPAPPMETHOD, 
                                                       PivotFacetType.String));

    this.CachedCollectionData.Name = "Event Log Error Messages";


}

The second method, LoadItems(), is responsible for doing exactly what it suggests – this is where we load the data from whichever source we care about and then convert it into our PivotItem collection. For our purposes, we’re going to load the XML file we defined in Part 1 of this series into a list of EventLogMessage objects and then convert those EventLogMessage objects into PivotItem objects:

protected override IEnumerable<PivotItem> LoadItems()
{
    // Load XML file
    XDocument document = XDocument.Load(this.BasePath);

    // Populate collection of EventLogMessage objects
    var messages = from message in document.Descendants("Message")
                   select EventLogMessage.Load(message.Value);

    int index = 0;
    foreach (EventLogMessage message in messages)
    {

        PivotItem item = new PivotItem(index.ToString(), this);
        item.Name = message.Exceptiontype;
        item.Description = message.Exceptionmessage;
        item.AddFacetValues(REQUESTPATH, message.Requestpath);
        item.AddFacetValues(STACKTRACE, message.Stacktrace);
        item.AddFacetValues(EVENTTIME, message.Eventtime);
        item.AddFacetValues(TOPMETHOD, message.StackTraceFrames[0].Method);
        item.AddFacetValues(TOPAPPMETHOD, GetFirstNonMicrosoftMethod(message.StackTraceFrames));

        index++;
        yield return item;

    }
}
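LoadItems() above relies on a GetFirstNonMicrosoftMethod(…) helper that isn’t shown in this post. As a rough idea of what such a helper might do, here is a sketch under my own assumptions: the StackFrameInfo type, the StackTraceHelpers class and the rule “anything starting with System. or Microsoft. is framework code” are all hypothetical and may differ from the real implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one parsed stack frame; only the method name is modeled.
class StackFrameInfo
{
    public string Method { get; set; }
}

static class StackTraceHelpers
{
    // Walks the frames from the top of the stack and returns the first method
    // that does not look like framework code. Returns an empty string when
    // every frame is framework code or the list is empty.
    public static string GetFirstNonFrameworkMethod(IList<StackFrameInfo> frames)
    {
        foreach (StackFrameInfo frame in frames)
        {
            string method = frame.Method ?? string.Empty;

            // Assumption: framework methods live under System.* or Microsoft.*
            bool isFramework =
                method.StartsWith("System.", StringComparison.Ordinal) ||
                method.StartsWith("Microsoft.", StringComparison.Ordinal);

            if (!isFramework && method.Length > 0)
            {
                return method;
            }
        }

        return string.Empty;
    }
}

static class Program
{
    static void Main()
    {
        var frames = new List<StackFrameInfo>
        {
            new StackFrameInfo { Method = "System.Web.UI.Page.ProcessRequest" },
            new StackFrameInfo { Method = "MyApp.Orders.Checkout.Page_Load" },
            new StackFrameInfo { Method = "MyApp.Orders.Cart.GetTotal" }
        };

        Console.WriteLine(StackTraceHelpers.GetFirstNonFrameworkMethod(frames));
    }
}
```

For the sample frames, the helper skips the System.Web frame and returns MyApp.Orders.Checkout.Page_Load, which is the kind of value the “Top My Code Method” facet is meant to hold.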

The key method calls from above are the AddFacetValues(…) method calls. This method essentially sets the attributes we wish to have our data pivot upon. This, by itself, isn’t enough to generate our great pivot collections – we need to call our code from somewhere. Since this is a simple app, we’re going to make it a console app. For our Collection to get generated we need to use a few other objects included in this API:

EventLogExceptionCollectionSource – The class we created above.
HtmlImageCreationSourceFilter – This class will generate the tile in the Pivot based upon some HTML template we specify.
LocalCxmlCollectionTarget – Generates the Collection XML file at the path we specify.
DeepZoomTargetFilter – Generates the deep zoom files to support our collection XML file and also enables all of our fancy transitions.

In practice, the code is pretty simple and straightforward, and I applaud the people who wrote this library:

private static void GenerateExceptionPivot(string inputFile, string outputFolder)
{
    string collectionName = Path.Combine(outputFolder, "MyExceptions.cxml");

    EventLogExceptionCollectionSource source = 
                    new EventLogExceptionCollectionSource(inputFile);
    HtmlImageCreationSourceFilter sourceFilter1 = 
                    new HtmlImageCreationSourceFilter(source);
    sourceFilter1.HtmlTemplate = 
                    "<html><body><h1>{name}</h1>{description}</body></html>";
    sourceFilter1.Width = 600;
    sourceFilter1.Height = 600;

    LocalCxmlCollectionTarget target = 
                    new LocalCxmlCollectionTarget(collectionName);
    DeepZoomTargetFilter targetFilter1 = 
                    new DeepZoomTargetFilter(target);
    targetFilter1.Write(sourceFilter1);

}

That last statement, targetFilter1.Write(…) is what will actually execute everything and write our resultant files to disk.

Generate and Test the collection

So, now if we run our console application and call that GenerateExceptionPivot(…) method, we’ll get some great output.

What’s nice about the Library is that it provides progress as it iterates through your data (in the red rectangle) and also, in the blue rectangle, we see that it’s multi-threaded by default. This is primarily for the most intensive part of the operation – the creation of the deep zoom artifacts. If you have one of those newfangled machines with 2+ cores, you can tweak the number of threads that it will spawn for this operation by setting the ThreadCount property of the DeepZoomTargetFilter object. This may or may not improve your performance but it’s nice that the option is available.

...
DeepZoomTargetFilter targetFilter1 = 
                new DeepZoomTargetFilter(target);

targetFilter1.ThreadCount = 100;
targetFilter1.Write(sourceFilter1);
...
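Rather than hard-coding a number, one option is to derive the thread count from the machine the tool is running on. This is only a sketch of that idea: the ThreadCountDemo class, the ChooseThreadCount helper and the cap of 16 are my own assumptions, and whether it helps will depend on your data and hardware.

```csharp
using System;

static class ThreadCountDemo
{
    // Hypothetical helper: one worker per logical core, never fewer than 1
    // and never more than the given cap.
    public static int ChooseThreadCount(int logicalCores, int cap)
    {
        if (cap < 1)
        {
            cap = 1;
        }
        return Math.Max(1, Math.Min(logicalCores, cap));
    }
}

static class Program
{
    static void Main()
    {
        // Environment.ProcessorCount reports the number of logical processors.
        int threads = ThreadCountDemo.ChooseThreadCount(Environment.ProcessorCount, 16);
        Console.WriteLine("Would use {0} thread(s).", threads);

        // With the objects from the listing above, the value would then be assigned like so:
        // targetFilter1.ThreadCount = threads;
    }
}
```

Measuring a run or two with different values is still the only way to know what works best for a given collection.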

Once our collection has been generated, we can browse it in an explorer.exe window just to get an idea of what our code has wrought:

And then to test it, you can just point the Live Labs Pivot application at our “MyExceptions.cxml” file and view the wonderful data. For example, you can look at the Event Time in the histogram view to see how your exceptions broke down over time. You can also filter your data by things like the RequestPath (the page that threw the exception) or the Method that was at the top of the callstack.

Then, you can zoom in on a specific time slice you care about:

Then, if you want to view the details for a specific instance, just click the corresponding tile. Then, a new side bar will appear on the right hand side with all of the details we stored in this record:

We generated a single collection in this blog post. One thing to keep in mind is that each collection should have no more than 3,000 items. For collections in which you want to have more than 3,000 items, you should look at potentially creating a Linked Collection. That will be the subject of an upcoming blog post.

Until next time!

