Samurai Programmer.com

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Sunday, 19 September 2010

Don’t guess when it comes to performance…a RegEx story.

Sunday, 19 September 2010 13:30:02 (Central Daylight Time, UTC-05:00) ( Performance | RegEx )

Many times, when I work with a customer, it’s because they’ve tried to accomplish something and need a little extra help. Often, this falls into the application optimization area. For example, a few years ago, I had a customer that was developing a rather sophisticated SharePoint workflow that had some custom code that would process and merge two Excel spreadsheets together. They were using Excel 2007 so their merging was being done using the excellent Open XML SDK. To their credit, the application did what it needed to do – but it took about an hour to process these spreadsheets. The developers on the project knew about the performance problems but as so often happens, they thought they knew where the bottlenecks were and how they should approach optimizing it. So, they started injecting some tracing into their code and worked hard to optimize this lengthy process. After a while, though, they had only shaved a few seconds off of that 60 minute time and while they did show some improvement – they knew they needed to get the processing done even faster for this to become a viable solution for their organization. So, they sent me a simple repro of the code and together in just a few short days, we were able to get the processing from 60 minutes to under a minute. That’s a BIG win. Big like the Titanic big.

I love these types of engagements because, well, I like to make things faster. The biggest problem that I see, though, is that some people shy away from using the tools in optimization scenarios because they’ve been so invested in their code that they think they know why it’s not performing. This is the “psychic” effect and the mindset is usually something like this:

“I wrote this darn code and while I was writing it – I knew this method could be improved so now I’m going to finally optimize the darn thing.”

It sounds good in theory, right? You wrote the code, so you should know how to improve it and really, what’s a tool going to tell you that you don’t already know? In truth, it can tell you quite a bit. In other situations, the developers will add some instrumentation (via tracing/debugging statements) to what they perceive as the critical code paths, and the resulting timings point to one area when the problem really resides in a completely different section of code. That’s right, folks, your tracing statements may be lying to you. So what’s a developer to do? Well, use the tools, of course – the right tool for the right job, as they say. Let me expand upon that in the context of some code that yours truly wrote a little while ago.

The Problem

A few short weeks ago, I posted an entry about parsing your ASP.NET event log error messages. For those that didn’t read it, we just convert the EventLog messages to XML, parse the messages using RegEx.Match and then generate some statistics on them. The code I provided in that blog entry appeared to be very fast when it was processing a few records – but as the XML files grew, I noticed that it took longer and longer to process the results. So, I did what any developer does initially: I added some instrumentation to the code to see if I could figure out the problem. This usually takes the form of something like the following:

Console.WriteLine("Start:  Load XDocument." + DateTime.Now.ToString());
XDocument document = XDocument.Load(@"C:\Work\TestData\AllEVTX\MyData.xml");
Console.WriteLine("End:  Load XDocument." + DateTime.Now.ToString());

Console.WriteLine("Start:  Load Messages into object." + DateTime.Now.ToString());
var messages = from message in document.Descendants("Message")
               select EventLogMessage.Load(message.Value);
Console.WriteLine("End: Load Messages into object." + DateTime.Now.ToString());

Console.WriteLine("Start:  Query Objects." + DateTime.Now.ToString());
var results = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };
Console.WriteLine("End:  Query Objects." + DateTime.Now.ToString());

Console.WriteLine("Start:  Output Exception Type Details." + DateTime.Now.ToString());
foreach (var result in results)
{
    Console.WriteLine("{0} : {1} time(s)",
                      result.ExceptionType,
                      result.ExceptionCount);
}
Console.WriteLine("End:  Output Exception Type Details." + DateTime.Now.ToString());

As you can see, all I’ve done is take the code and slap in some Console.WriteLine statements with a DateTime.Now call to find out when each operation starts and when it completes. If I run this code I get the following timings:

Start: Load XDocument.9/19/2010 11:28:08 AM
End: Load XDocument.9/19/2010 11:28:09 AM
Start: Load Messages into object.9/19/2010 11:28:09 AM
End: Load Messages into object.9/19/2010 11:28:09 AM
Start: Query Objects.9/19/2010 11:28:09 AM
End: Query Objects.9/19/2010 11:28:09 AM
Start: Output Exception Type Details.9/19/2010 11:28:09 AM
End: Output Exception Type Details.9/19/2010 11:28:49 AM

This is clearly a problem: the output step alone takes 40 seconds. Now, the question is why?
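As an aside, DateTime.Now.ToString() only shows whole seconds, so short steps all look identical in the log. A minimal sketch of finer-grained timing with System.Diagnostics.Stopwatch follows; the TimingSketch class, the Time helper and the stand-in workload are hypothetical and not part of the original code. Finer resolution alone does not change which step the work gets attributed to.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Hypothetical helper: runs an action, prints the elapsed time and returns it.
    public static TimeSpan Time(string label, Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        Console.WriteLine("{0}: {1} ms", label, watch.ElapsedMilliseconds);
        return watch.Elapsed;
    }

    public static void Main()
    {
        // Stand-in workload; in the real program this would be one of the steps above.
        Time("Sum 1..1000000", () => Enumerable.Range(1, 1000000).Sum(i => (long)i));
    }
}
```

Each instrumented step would be wrapped in its own Time(...) call, which keeps the start/stop bookkeeping in one place.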

The Research

This might lead you to believe that the simple loop I have to get the ExceptionType and the ExceptionCount was the root of all evil, so to speak. The problem, though, is that there’s not really a lot you can do to improve this:

foreach (var result in results)
{
    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);             
}

Oh sure, you could use the excellent Parallel functionality in .NET 4 to send off the work to alternate threads. So, you might do something like this:

Parallel.ForEach(results, result =>
{
    Console.WriteLine("{0} : {1} time(s)",
                      result.ExceptionType,
                      result.ExceptionCount);
});

But if you re-run the test, you get the following results:

Start: Load XDocument.9/19/2010 11:46:20 AM
End: Load XDocument.9/19/2010 11:46:20 AM
Start: Load Messages into object.9/19/2010 11:46:20 AM
End: Load Messages into object.9/19/2010 11:46:20 AM
Start: Query Objects.9/19/2010 11:46:20 AM
End: Query Objects.9/19/2010 11:46:20 AM
Start: Output Exception Type Details.9/19/2010 11:46:20 AM
End: Output Exception Type Details.9/19/2010 11:46:54 AM

Wait a second… it still takes 34 seconds, barely better than the original 40? That’s just not right. At this point, you should take a step back and tell yourself “STOP GUESSING AND JUST USE THE PROFILER.” So, always being one to listen to myself, I kick off the Visual Studio profiler and immediately it shows us the “hot path”. This was actually a nice improvement in the VS2010 profiler. Right on the summary page, it will show us our most expensive call paths:

But wait, it’s pointing to the RegEx.Match(…). Why is it pointing to that? From my own metrics above, I see that the loading of the Message strings into an object takes less than a second to execute. Well, the real reason is that LINQ queries use deferred execution. Basically, a query isn’t processed when it is defined; it runs when you enumerate its results. Yes, that’s an over-simplification but in our case, it means that my EventLogMessage.Load(…) method won’t be called until the foreach loop actually asks for the data, so the cost of parsing shows up in the “Output” timing. So, now, armed with this information, I can take a look at my Load() method and see what it’s actually doing and how it’s using the RegEx.Match(…) functionality:

Match myMatch = s_regex.Match(rawMessageText);

EventLogMessage message = new EventLogMessage();

message.Eventcode = myMatch.Groups["Eventcode"].Value;
message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
message.Eventtime = myMatch.Groups["Eventtime"].Value;
message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
...

return message;

So, basically we’re using the Match(…) to take that Message property and parse it out into the properties of the EventLogMessage object. The fact that this is the slowest part of the code shouldn’t necessarily shock you. Jeff Atwood wrote up a good blog entry a few years ago on something like this. If we look just at the RegEx.Match(…) in the profiler, we see that the problem isn’t necessarily with each Match call but the overall cost with 10,000+ calls:

Function Name                                       Number of Calls  Min Elapsed Inclusive Time  Avg Elapsed Inclusive Time  Max Elapsed Inclusive Time
System.Text.RegularExpressions.Regex.Match(string)  10,881           1.01                        3.27                        684.29
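The deferred execution that hid this cost from the tracing statements can be seen in a small sketch. The DeferredExecutionSketch class and its Parse method are hypothetical stand-ins for EventLogMessage.Load; the only point is to count when the per-item work actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Counts how many times the stand-in "expensive" parse step runs.
    public static int ParseCalls;

    // Hypothetical stand-in for EventLogMessage.Load: counts the call and trims the text.
    public static string Parse(string raw)
    {
        ParseCalls++;
        return raw.Trim();
    }

    public static void Main()
    {
        string[] rawMessages = { " a ", " b ", " a " };

        // Defining the query does not call Parse.
        IEnumerable<string> messages = rawMessages.Select(raw => Parse(raw));
        Console.WriteLine("After defining the query: {0} calls", ParseCalls);

        // Enumerating the query is what runs Parse, once per element.
        List<string> materialized = messages.ToList();
        Console.WriteLine("After ToList(): {0} calls", ParseCalls);

        // Enumerating the deferred query a second time repeats the work...
        foreach (string message in messages) { }
        Console.WriteLine("After a second enumeration: {0} calls", ParseCalls);

        // ...while the materialized list can be queried again without re-parsing.
        int distinct = materialized.Distinct().Count();
        Console.WriteLine("Distinct values: {0}, calls: {1}", distinct, ParseCalls);
    }
}
```

If a timing statement needs to reflect the parsing cost, one option is to call ToList() inside the timed region so the work happens there rather than in a later loop.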

The Fix

So, now that we know the problem and the reason for it – what’s a Dev to do? Well, at this point, we should be thinking of an alternate way of performing this parsing without using RegEx. The simplest method is just to use some general string parsing, so we’ll start off with replacing our large RegEx pattern with a simple string array:

public static readonly string[] ParserString = new string[] {
            @"Event code:" ,
            @"Event message:" ,
            @"Event time:" ,
            @"Event time (UTC):" ,
            @"Event ID:" ,
            ...
            @"Stack trace:" ,
            @"Custom event details:"};

This array will be used in our fancy new GetValue method:

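private static string GetValue(string rawMessageText, int Key)
{
    // Locate the label for this field, e.g. "Event message:".
    int startLoc = rawMessageText.IndexOf(ParserString[Key]);
    if (startLoc < 0)
        return string.Empty; // Label not present in this message.

    // The value runs up to the next label, or to the end of the message for the last field.
    int endLoc;
    if (Key + 1 == ParserString.Length)
        endLoc = rawMessageText.Length;
    else
        endLoc = rawMessageText.IndexOf(ParserString[Key + 1], startLoc);
    if (endLoc < 0)
        endLoc = rawMessageText.Length; // Next label missing; take the rest of the message.

    int valueStart = startLoc + ParserString[Key].Length;
    return rawMessageText.Substring(valueStart, endLoc - valueStart);
}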

This method accepts our raw message string and a key index. It finds the position of a label like “Event message:” in the raw message, then the position of the next label, like “Event time:”, and takes the text between the two as the value of the field. For example, given the following string:

“… Event message: An unhandled exception has occurred. Event time: …”

The keys are “Event message:” and “Event time:”, and the value is the text between them, “An unhandled exception has occurred.” The idea for the above GetValue(…) method was actually provided by a fellow PFE engineer, Richard Lang, during a late night chat session.
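To make the label-to-label slicing concrete, here is a self-contained sketch with a reduced, three-entry label list and a made-up sample message. The LabelParserSketch class, the Labels array, the ordinal comparisons and the Trim() call are illustrative choices for this sketch and are not taken from the original code.

```csharp
using System;

public static class LabelParserSketch
{
    // Reduced, illustrative label list; the array in the post has many more entries.
    public static readonly string[] Labels = { "Event code:", "Event message:", "Event time:" };

    // Returns the text between Labels[key] and the next label (or the end of the message).
    public static string GetValue(string raw, int key)
    {
        int start = raw.IndexOf(Labels[key], StringComparison.Ordinal);
        if (start < 0) return string.Empty; // label not found

        int valueStart = start + Labels[key].Length;
        int end = key + 1 < Labels.Length
            ? raw.IndexOf(Labels[key + 1], valueStart, StringComparison.Ordinal)
            : raw.Length;
        if (end < 0) end = raw.Length; // next label missing; take the rest

        return raw.Substring(valueStart, end - valueStart).Trim();
    }

    public static void Main()
    {
        // Made-up sample in the same shape as the event log message text.
        string raw = "Event code: 3005 Event message: An unhandled exception has occurred. " +
                     "Event time: 9/19/2010 11:28:08 AM";
        for (int key = 0; key < Labels.Length; key++)
            Console.WriteLine("{0} [{1}]", Labels[key], GetValue(raw, key));
    }
}
```

One caveat with this approach: because each lookup searches for the first occurrence of a label, a label that appears more than once in a message would need extra handling, such as searching from the end of the previous field.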

The last step to this process is just to call the GetValue(…) method from our new Load method:

public static EventLogMessage Load(string rawMessageText)
{
    EventLogMessage message = new EventLogMessage();

    int Key = 0;
    message.Eventcode = GetValue(rawMessageText, Key++);
    message.Eventmessage = GetValue(rawMessageText, Key++);
    message.Eventtime = GetValue(rawMessageText, Key++);
    ...
    return message;

}

So, now we’ve essentially removed the need for RegEx by implementing our own string parsing algorithm. Once we compile the code and run our application again, we see some major improvements:

Start: Load XDocument.9/19/2010 12:55:31 PM
End: Load XDocument.9/19/2010 12:55:31 PM
Start: Load Messages into object.9/19/2010 12:55:31 PM
End: Load Messages into object.9/19/2010 12:55:31 PM
Start: Query Objects.9/19/2010 12:55:31 PM
End: Query Objects.9/19/2010 12:55:31 PM
Start: Output Exception Type Details.9/19/2010 12:55:31 PM
End: Output Exception Type Details.9/19/2010 12:55:33 PM

We essentially improved the processing of our records from 40 seconds to about 2 seconds. I’d say that’s a pretty big improvement. Even better, we can use our Parallel.ForEach(…) code from above to make this even faster since we’re no longer bound by the RegEx parser:

Start: Load XDocument.9/19/2010 1:00:13 PM
End: Load XDocument.9/19/2010 1:00:13 PM
Start: Load Messages into object.9/19/2010 1:00:13 PM
End: Load Messages into object.9/19/2010 1:00:13 PM
Start: Query Objects.9/19/2010 1:00:13 PM
End: Query Objects.9/19/2010 1:00:13 PM
Start: Output Exception Type Details.9/19/2010 1:00:13 PM
End: Output Exception Type Details.9/19/2010 1:00:14 PM

So now it takes just about a second to process these records. Considering it’s processing over 10,000 event log messages, I’d say this is acceptable performance, for now.

Closing Comments

I just want to say a few things real quick. RegEx is not inherently evil. It is still one of the fastest and easiest methods to consume and parse data. You should not feel the need to go back to your own applications that are working just fine and refactor all of your code to strip out the RegEx expressions. It just so happens that sometimes too much of a good thing can be bad for your health. In our case, with over 10,000 RegEx.Match(…) calls in rapid succession, the RegEx appeared to be our bottleneck. This may or may not be the root cause of your own performance problems. The key takeaway from this blog entry should be that you should NOT guess when it comes to optimizing code paths. Instead, you should use the tools available at your disposal to find the bottleneck.

Until next time.


Saturday, 11 September 2010

Optimizing your life: E-Mail Edition

Saturday, 11 September 2010 20:43:24 (Central Daylight Time, UTC-05:00) ( Life | Productivity )

As with most people, I hate to do the same thing more than once. In fact, I had a job in college where I audited expense reports and then had to enter them into an application so they could be paid. The auditing of the expense report was actually fun because I got to see some interesting things people would attempt to pass off as “business expenses” – like the guy that tried to expense an entire addition to his house claiming that it was for his new office. That said, I absolutely HATED the data entry part. It was repetitive and boring and I just couldn’t understand why the person submitting expense reports couldn’t do the data entry into some application (they were already using Excel) and then electronically submit the data into our system for review and payment. That led me to design and build a simple application (in my off-time) to do exactly that. It was fairly sophisticated (for a college student doing something in his down time). The project was a tremendous success and it increased efficiency on my team to the Nth degree because it cut out steps, reduced the volume of duplicate data entry and everything was online. It was just a better workflow. I had optimized a fairly simple business process and I was seeing the tremendous dividends that such a project could have in the business world. In fact, it was so successful and my team became so efficient that I eventually got “let go” because there just wasn’t enough work to go around anymore and the other folks on the team were permanent employees and I was a lowly hourly worker.

So, while that situation didn’t turn out so well for me, I became addicted to optimization – be it in software or my own personal life. And I’ve carried that lesson forward into the years since. For example, a few years back, I wrote a Quick Link Plugin for Windows Live Writer. It was a very simple plugin that would open up a form in Windows Live Writer to do a web search and then quickly insert the link into your blog post. This was another way that I was optimizing my (digital) life in that I hated having to leave the WLW environment to look something up and then copying/pasting the link back into the WLW window. It’s painful, repetitive and just plain wasteful.

Scott Hanselman blogged a bit ago about whether recipients of your e-mails deserve the gift of your keystrokes. I found it an interesting read and I highly recommend it. As an alternative to reaching only one person, he recommended taking it an extra step and blogging about it or posting it to a wiki and then sending the link to the individual. It’s a great idea but in my role, I’m often asked similar questions over and over and the customer doesn’t want to wade through a list of links to get the answer they need. Or, taking it one step further – once you blog something once, how do you retain that link somewhere so you don’t have to go hunting around for the URL when you need it later on down the line.

My Solution (for now)

The solution that I’ve used for years is to put these resources and snippets in my Outlook Signature list. If you’re not aware of Outlook signatures, well, I can’t really help you. Kidding! Signatures have been in Outlook for years and have been used primarily to append contact information and what-not into your e-mail. It always struck me as kind of odd that Outlook allows you to add so many signatures when really, how many ways to say “I’m so and so and here is my information” could you possibly have? So, I’ve decided to repurpose that list for my own uses. I will essentially add a new signature for each important blurb/link/etc. to my list. Then, when I am deeply into writing an e-mail and I get a question I’ve answered before, like:

“How can I turn on ValidateIntegratedModeConfiguration for all of the sites on my server and not allow my applications to override it?”

I can simply go to my signature list and select that item and it will magically appear in my e-mail:

Fewer clicks and key presses for me, and the person gets their response quicker – it’s a win-win all around. I’ve even prefixed my most frequently used signatures with an “_” in the name so they sort to the top, since I don’t want to go hunting for the stuff I use frequently. You may have also noticed that I have one called “_Expense Report”. This is just a simple e-mail that I send to my boss when I have some receipts I need to send her so I can get my expense reports approved. Yes, it’s likely a micro-optimization but if it shaves about 2 minutes off each of those e-mails I send:

2 minute savings * 4 expense reports/month * 12 months/year = 96 minutes savings/year

This makes it totally worth it for me. I can’t wait until we put something like the Visual Studio “Snippet” functionality in Outlook or someone writes an Evernote add-in for Outlook to allow me to search and insert text from my notes. That would just make my life complete.

Until next time…


Friday, 03 September 2010

Exposing any class as a WCF endpoint…

Friday, 03 September 2010 01:41:11 (Central Daylight Time, UTC-05:00) ( .NET 4 | WCF )

Often, I’ll have a customer that wants to utilize a new feature in the .NET framework but they don’t necessarily know where to start or they have a large investment in another area and they’re afraid of the amount of rework required to get something working. This week, for example, I was teaching a customer about WPF and they asked:

“We have several class libraries and we’d like to expose them as WCF services. How can we do it?”

Yes, it was a bit off-topic from WPF but not really. I’ve found that when a customer chooses to start investing in a newer technology, like WPF, they also want to look at ways to revamp their existing applications to use other new features. But I digress…

In pre-.NET 4.0, if you wanted to expose a class as a WCF endpoint and make it available in IIS as an SVC file, you would have to do the following:

1. Create a new WCF Service (in the same category as WebSite).
2. Add a bunch of new “WCF Service” classes (files with an extension of “.SVC”).
3. Write some code to call out to your existing class library from the WCF Service.

It’s not exactly a painful experience but would require a bit of work to get everything up and running. In many cases, if you have a huge investment in something like a class library, the amount of work may be incredibly daunting. Fortunately, using some of the features in WCF 4 (part of .NET 4), this type of story is ridiculously easy. The steps that we’ll follow are the following:

1. Decorate your existing class.
2. Add configuration.
3. Deploy to IIS.

Then, I’ll discuss additional options for debugging your new service.

Step 1: Decorate your existing class

So, I’m sure we all have a class in some Class Library somewhere that looks like this:

public class MyBusinessLogic
{

    public string GetName(int id)
    {

        return "My ID is: " + id;

    }

    public MyDataClass GetEmpInformation(int id)
    {

        return new MyDataClass() 
                    { 
                        FirstName = "Greg", 
                        LastName = "Varveris", 
                        Occupation = "Developer" 
                    };


    }
}

This is just a simple little business logic class that exposes two public methods – GetName and GetEmpInformation. So, let’s say we wanted to make this class accessible as a WCF service. The first thing we need to do is add a reference to the .NET 4 version of System.ServiceModel in our Class Library project.

I’ve included a screenshot to our left just to further show that there is no special sauce here. All of the assemblies being referenced in this project are straight up class library defaults. Now that we have the assembly referenced, let’s decorate our class with two attributes:

ServiceContract – This decorates our class and tells .NET that we have a WCF Service.
OperationContract – This decorates our methods and tells .NET which methods/functions we want to expose in our service.

The decorated class will now look like:

[ServiceContract]
public class MyBusinessLogic
{

    [OperationContract]
    public string GetName(int id)
    {

        return "My ID is: " + id;

    }

    [OperationContract]
    public MyDataClass GetEmpInformation(int id)
    {

        return new MyDataClass() 
                    { 
                        FirstName = "Greg", 
                        LastName = "Varveris", 
                        Occupation = "Developer" 
                    };


    }
}

And really that’s all you have to do from the code side.
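The post does not show MyDataClass itself. As far as I know, a plain class with public read/write properties and a parameterless constructor can be returned as-is, but the data contract can also be spelled out explicitly. The sketch below assumes a hypothetical shape for MyDataClass (three string properties, inferred from the initializer above) and uses DataContractSerializer, WCF's default serializer, to show the XML it produces. On the full .NET Framework this needs a reference to System.Runtime.Serialization.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical shape for MyDataClass; the original definition is not shown in the post.
[DataContract]
public class MyDataClass
{
    [DataMember] public string FirstName { get; set; }
    [DataMember] public string LastName { get; set; }
    [DataMember] public string Occupation { get; set; }
}

public static class DataContractSketch
{
    // Serializes the object with DataContractSerializer so the resulting XML can be inspected.
    public static string ToXml(MyDataClass data)
    {
        var serializer = new DataContractSerializer(typeof(MyDataClass));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, data);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        var data = new MyDataClass { FirstName = "Greg", LastName = "Varveris", Occupation = "Developer" };
        Console.WriteLine(ToXml(data));
    }
}
```

Marking the members explicitly is optional here; it mainly helps when only some properties should travel over the wire.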

Step 2: Add Configuration

WCF configuration has always been synonymous with “WOAH” before. Even using the great configuration editor tool, you may end up with a large amount of XML in your .config file. This is mostly because WCF is just so darn extensible. Well, for our scenario, we’ll use some great new features in WCF 4 to make the configuration a breeze:

File-less Activation
Default Configuration model

You can read-up about each of these features in the nice MSDN documentation. Basically, WCF 4 exposes the ability to map a file-name to a class and then it can expose a default endpoint (using BasicHttpBinding). For our scenario my entire config file is:

<configuration>
  <system.serviceModel>
    <serviceHostingEnvironment>
      <serviceActivations>
        <add relativeAddress="MyBusinessLogic.svc" service="WCFSample.Library.MyBusinessLogic"/>
      </serviceActivations>
    </serviceHostingEnvironment>
      <behaviors>
        <serviceBehaviors>
          <behavior>
            <serviceMetadata httpGetEnabled="true"/>
          </behavior>
        </serviceBehaviors>
      </behaviors>
  </system.serviceModel>
</configuration>

That’s it. In the first section, we are adding a relative address of “MyBusinessLogic.svc” which will point to our MyBusinessLogic class we defined above. The second section along the bottom just enables the service metadata.

The only minor manual snafu you’ll need to handle is that Visual Studio won’t let you add a web.config to a Class Library project. This can be easily remedied in a variety of ways. I just added an app.config to my project and then changed the name from “app.config” to “web.config”. Then, you just need to tell it to Copy to the output directory. This makes our project look like the following:

Step 3: Deploy to IIS

Now that we have our minuscule configuration and trivial decorations to our class, we need to host our service somewhere. For this, we’ll just use IIS with an ASP.NET 4.0 application pool. The structure of this folder will need to have the assemblies in a “bin” folder and the web.config in the root.

The bin folder is required so that the IIS engine can find your assembly. Once you configure this in IIS, you can simply navigate to your service (make sure you include the SVC in your url) in a browser and you’ll see the glorious WCF Service start page. For example, on my machine, when I navigate to: http://localhost/SampleWCFService/MyBusinessLogic.svc, I’ll see:

Debugging your service

So you have created your service, added configuration to it and deployed it to your hosting environment and everything is running smoothly. What if you want to debug the service in Visual Studio? Well, you really have 3 different options:

1. Attach to the instance of w3wp hosting your WCF service.
2. Cassini…errr, the ASP.NET Development Server.
3. IIS Express.

I personally prefer either option 2 or 3 as they provide that seamless debugging experience we all crave where we just hit the “Green Arrow” to start the debugging session. With either option you select, we’ll need to make one or two minor configuration changes.

First, we’ll need to tell the solution to start our process. For this, you can right-click on your solution file in the Solution explorer and select “Properties”. In this window, you’ll need to select the Multiple Startup Projects radio button and set both of your projects (the Client and the Service) to the “Start” option.

Second, you’ll need to add pre-build and post-build steps for your service to structure the output folder correctly. Since the pre-build step will be deleting the contents of the “bin” folder wherever our assemblies are being output, the command we will use is:

if exist "$(TargetDir)bin" del /q "$(TargetDir)bin\*.*"

Then, for our post-build step, we will need to create the bin folder if it doesn’t exist and then copy over our DLL’s into that bin folder. Those commands will be:

if not exist "$(TargetDir)bin" mkdir "$(TargetDir)bin"
copy "$(TargetDir)\*.dll" "$(TargetDir)\bin"
copy "$(TargetDir)\*.pdb" "$(TargetDir)\bin"
del "$(TargetDir)\*.dll"
del "$(TargetDir)\*.pdb"

That will take our output directory from zero to hero quite nicely:

Now that the prerequisites are met, we just need to point our local hosting tool (either the Development Server or IIS Express) at that folder. We’ll do this in the Debug tab of the project properties. Set the “Start External Program” radio button and set the path to:

C:\Program Files (x86)\Common Files\microsoft shared\DevServer\10.0\WebDev.WebServer40.EXE

Then, in the Command Line Arguments, specify the following:

/port:12345
/path:"<path to folder containing bin & web.config>"
/vpath:/myservice

Then, you can set a Service Reference on whatever vpath you chose above. For example, my service reference points to: http://localhost:12345/myservice/MyBusinessLogic.svc

Once the service reference is set, you can just hit the fancy green arrow to start debugging:

Enjoy!


Sunday, 29 August 2010

Parsing ASP.NET event log error messages for fun and profit

Sunday, 29 August 2010 17:10:07 (Central Daylight Time, UTC-05:00) ( ASP.NET | Best Practice | Development )

Sometimes a customer will ask me to look at their site and make some recommendations on what can be improved. One of the many things I’ll look at is their event logs. One of the nice things about ASP.NET is that when you encounter an unhandled exception, an event will be placed into your Application event log. The message of the event log entry will usually include lots of good stuff like the application, path, machine name, exception type, stack trace, etc. Loads of great stuff and all for free. For customers that don’t have a centralized exception logging strategy, this can be a gold mine.

The way it usually works is that they will provide me an EVTX from their servers. If you’re not aware, an EVTX is just an archive of the events from the event log you specify. By itself, looking at the raw event logs from a server can be quite daunting. There are usually thousands of entries in the event log and filtering down to what you actually care about can be exhausting. Even if you do find a series of ASP.NET event log messages, the problem has always been – how do you take all of this great information that’s just dumped into the Message property of the event log entry and put it into a format you can easily report on, generate statistics, etc. Fortunately, I have a non-painful solution.

I’ve broken this down into a relatively simple 4-step process:

1. Get the EVTX
2. Generate a useful XML file
3. Parse into an object model
4. Analyze and report on the data

Let’s get to it.

Step 1: Get the EVTX

This step is pretty short and sweet. In the Event Log manager, select the “Application” group and then select the “Save All Events As…” option.

That will produce an EVTX file with whatever name you specify. Once you have the file, transfer it to your machine as you generally do not want to install too many tools in your production environment.

Step 2: Generate a useful XML file

Now that we have the raw EVTX file, we can get just the data we care about using a great tool called LogParser. Jeff Atwood did a nice little write-up on the tool but simply put it’s the Swiss Army knife of parsing tools. It can do just about anything data related you would wish using a nice pseudo-SQL language. We’ll use the tool to pull out just the data from the event log we want and dump it into an XML file. The query that we can use for this task is daunting in its simplicity:

SELECT Message INTO MyData.xml
FROM '*.evtx'
WHERE EventID=1309

The only other thing we need to tell LogParser is the format in which the data is coming in and the format to put it into. This makes our single command the following:

C:\>logparser -i:EVT -o:XML "SELECT Message INTO MyData.xml FROM '*.evtx' WHERE EventID=1309"

This will produce a nice XML file that looks something like the following:

<?xml version="1.0" encoding="ISO-10646-UCS-2" standalone="yes" ?>
<ROOT DATE_CREATED="2010-08-29 06:04:20" CREATED_BY="Microsoft Log Parser V2.2">
 <ROW>
  <Message>Event code: 3005 Event message: An unhandled exception has occurred...  
  </Message>
 </ROW>
 ...
</ROOT>

One thing that you may notice is that all of the nicely formatted data from our original event log message is munged together into one unending string. This will actually work in our favor but more on that in the next step.
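Before any parsing, the Message strings have to be pulled out of that XML. A minimal sketch using LINQ to XML follows; the MessageReaderSketch class and ReadMessages helper are hypothetical names, and the inline XML is a shortened copy of the sample above (with the XML declaration dropped so it can be parsed from a string).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MessageReaderSketch
{
    // Returns the trimmed text of every <Message> element in the document.
    public static List<string> ReadMessages(XDocument document)
    {
        return document.Descendants("Message")
                       .Select(m => m.Value.Trim())
                       .ToList();
    }

    public static void Main()
    {
        // Inline sample shaped like the LogParser output.
        string xml =
            "<ROOT DATE_CREATED=\"2010-08-29 06:04:20\" CREATED_BY=\"Microsoft Log Parser V2.2\">" +
            "<ROW><Message>Event code: 3005 Event message: An unhandled exception has occurred...</Message></ROW>" +
            "</ROOT>";

        foreach (string message in ReadMessages(XDocument.Parse(xml)))
            Console.WriteLine(message);
    }
}
```

For a file on disk, XDocument.Load with the path to MyData.xml would take the place of XDocument.Parse.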

Step 3: Parse into an object model

So, now that we have an XML file with all of our event details, let’s do some parsing. Since all of our data is in one string, the simplest method is to apply a RegEx expression with grouping to grab the data we care about.

In a future post, I’ll talk about a much faster way of getting this type of data without a RegEx expression. After all, refactoring is a way of life for developers.

private const string LargeRegexString = @"Event code:(?<Eventcode>.+)" +
            @"Event message:(?<Eventmessage>.+)" +
            @"Event time:(?<Eventtime>.+)" +
            @"Event time \(UTC\):(?<EventtimeUTC>.+)" +
            @"Event ID:(?<EventID>.+)" +
            @"Event sequence:(?<Eventsequence>.+)" +
            @"Event occurrence:(?<Eventoccurrence>.+)" +
            @"Event detail code:(?<Eventdetailcode>.+)" +
            @"Application information:(?<Applicationinformation>.+)" +
            @"Application domain:(?<Applicationdomain>.+)" +
            @"Trust level:(?<Trustlevel>.+)" +
            @"Full Application Virtual Path:(?<FullApplicationVirtualPath>.+)" +
            @"Application Path:(?<ApplicationPath>.+)" +
            @"Machine name:(?<Machinename>.+)" +
            @"Process information:(?<Processinformation>.+)" +
            @"Process ID:(?<ProcessID>.+)" +
            @"Process name:(?<Processname>.+)" +
            @"Account name:(?<Accountname>.+)" +
            @"Exception information:(?<Exceptioninformation>.+)" +
            @"Exception type:(?<Exceptiontype>.+)" +
            @"Exception message:(?<Exceptionmessage>.+)" +
            @"Request information:(?<Requestinformation>.+)" +
            @"Request URL:(?<RequestURL>.+)" +
            @"Request path:(?<Requestpath>.+)" +
            @"User host address:(?<Userhostaddress>.+)" +
            @"User:(?<User>.+)" +
            @"Is authenticated:(?<Isauthenticated>.+)" +
            @"Authentication Type:(?<AuthenticationType>.+)" +
            @"Thread account name:(?<Threadaccountname>.+)" +
            @"Thread information:(?<Threadinformation>.+)" +
            @"Thread ID:(?<ThreadID>.+)" +
            @"Thread account name:(?<Threadaccountname>.+)" +
            @"Is impersonating:(?<Isimpersonating>.+)" +
            @"Stack trace:(?<Stacktrace>.+)" +
            @"Custom event details:(?<Customeventdetails>.+)";

Now that we have our RegEx, we’ll just write the code to match it against a string and populate our class. While I’ve included the entire regex above, I’ve only included a partial implementation of the class population below.

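// Requires: using System.Text.RegularExpressions;
public class EventLogMessage
    {

        // Compiled once and shared by every call to Load.
        private static readonly Regex s_regex = new Regex(LargeRegexString, RegexOptions.Compiled);

        // Matches the raw message text and copies each named group into a property.
        public static EventLogMessage Load(string rawMessageText)
        {

            Match myMatch = s_regex.Match(rawMessageText);
            EventLogMessage message = new EventLogMessage();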
            message.Eventcode = myMatch.Groups["Eventcode"].Value;
            message.Eventmessage = myMatch.Groups["Eventmessage"].Value;
            message.Eventtime = myMatch.Groups["Eventtime"].Value;
            message.EventtimeUTC = myMatch.Groups["EventtimeUTC"].Value;
            message.EventID = myMatch.Groups["EventID"].Value;
            message.Eventsequence = myMatch.Groups["Eventsequence"].Value;
            message.Eventoccurrence = myMatch.Groups["Eventoccurrence"].Value;
            ...
            return message;
        }

        public string Eventcode { get; set; }
        public string Eventmessage { get; set; }
        public string Eventtime { get; set; }
        public string EventtimeUTC { get; set; }
        public string EventID { get; set; }
        public string Eventsequence { get; set; }
        public string Eventoccurrence { get; set; }
        ...
    }
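
To see how the named groups behave on the flattened string, here is a minimal sketch that uses only the first three labels from the expression above. The sample message text and the NamedGroupDemo class are made up for illustration, and the Trim() call is an addition that the original code does not make.

```csharp
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Shortened version of the pattern above: three labels only.
    private static readonly Regex s_demoRegex = new Regex(
        @"Event code:(?<Eventcode>.+)" +
        @"Event message:(?<Eventmessage>.+)" +
        @"Event time:(?<Eventtime>.+)",
        RegexOptions.Compiled);

    // Returns the trimmed text captured by the named group,
    // or an empty string when the message does not match the pattern.
    public static string Extract(string rawMessageText, string groupName)
    {
        Match match = s_demoRegex.Match(rawMessageText);
        return match.Success
            ? match.Groups[groupName].Value.Trim()
            : string.Empty;
    }
}
```

Each group captures everything between its own label and the next one, which is why having the whole message on a single line works in our favor here.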

The last step is just to read in the XML file and instantiate these objects.

XDocument document = XDocument.Load(@"<path to data>\MyData.xml");

var messages = from message in document.Descendants("Message")
               select EventLogMessage.Load(message.Value);

Now that we have our objects and everything is parsed just right, we can finally get some statistics and make sense of the data.
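
One thing to keep in mind before running several reports over the same data: a LINQ query like the one above is evaluated lazily, so every enumeration of messages runs EventLogMessage.Load (and the regular expression) again for every row. The sketch below illustrates the effect with a hypothetical Parse stand-in that only counts its calls; the class and method names are made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Number of times the stand-in parse step has run.
    public static int ParseCalls;

    // Hypothetical stand-in for EventLogMessage.Load: counts the call
    // and hands the raw text back unchanged.
    private static string Parse(string raw)
    {
        ParseCalls++;
        return raw;
    }

    // Walks a sequence from start to finish, as a foreach in a report would.
    private static void Enumerate(IEnumerable<string> items)
    {
        foreach (string item in items) { }
    }

    // Enumerates the query twice without materializing it.
    public static int CallsWhenDeferred(IEnumerable<string> rawMessages)
    {
        ParseCalls = 0;
        var messages = from raw in rawMessages select Parse(raw);
        Enumerate(messages);   // first pass runs Parse for every item
        Enumerate(messages);   // second pass runs Parse for every item again
        return ParseCalls;
    }

    // Materializes the query once with ToList(), then enumerates it twice.
    public static int CallsWhenMaterialized(IEnumerable<string> rawMessages)
    {
        ParseCalls = 0;
        List<string> messages = (from raw in rawMessages select Parse(raw)).ToList();
        Enumerate(messages);   // reads the stored results
        Enumerate(messages);   // reads the stored results
        return ParseCalls;
    }
}
```

If the parsing cost matters for your log size, appending .ToList() to the query is one way to pay it only once.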

Step 4: Analyze and report on the data

This last step is really the whole point of this exercise. Fortunately, now that all of the data is in an easily queryable format using our old friend LINQ, the actual aggregates and statistics are trivial. Really, though, everyone’s needs are going to be different but I’ll provide a few queries that might be useful.

Query 1: Exception Type Summary

For example, let’s say you wanted to output a breakdown of the various Exception Types in your log file. The query you would use for that would be something like:

var results = from log in messages
              group log by log.Exceptiontype into l
              orderby l.Count() descending, l.Key
              select new
              {
                  ExceptionType = l.Key,
                  ExceptionCount = l.Count()
              };

foreach (var result in results)
{

    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);

}

This would then output something like:

WebException : 15 time(s)
InvalidOperationException : 7 time(s)
NotImplementedException : 2 time(s)
InvalidCastException : 1 time(s)
MissingMethodException : 1 time(s)

Query 2: Exception Type and Request URL Summary

Let’s say that you wanted to go deeper and get the breakdown of which URLs generated the most exceptions. You can just expand the foreach loop in the above snippet to do the following:

foreach (var result in results)
{

    Console.WriteLine("{0} : {1} time(s)", 
                      result.ExceptionType, 
                      result.ExceptionCount);

    var requestUrls = from urls in messages
                       where urls.Exceptiontype == result.ExceptionType
                       group urls by urls.RequestURL.ToLower() into url
                       orderby url.Count() descending, url.Key
                       select new
                       {
                           RequestUrl = url.Key,
                           Count = url.Count()
                       };

    foreach (var url in requestUrls){

        Console.WriteLine("\t{0} : {1} times ",
                          url.RequestUrl,
                          url.Count);
    }                
}

This then would produce output like this:

WebException  : 15 time(s)
        http://localhost/menusample/default.aspx  : 11 times
        http://localhost:63188/menusample/default.aspx  : 4 times
InvalidOperationException  : 7 time(s)
        http://localhost:63188/menusample/default.aspx  : 6 times
        http://localhost/menusample/default.aspx  : 1 times
NotImplementedException  : 2 time(s)
        http://localhost/samplewebsiteerror/default.aspx  : 2 times
InvalidCastException  : 1 time(s)
        http://localhost:63188/menusample/default.aspx  : 1 times
MissingMethodException  : 1 time(s)
        http://localhost:63188/menusample/default.aspx  : 1 times

Query 3: Exception Type, Request URL and Method Name Summary

You can even go deeper, if you so desire, to find out which of your methods threw the most exceptions. For this to work, we need to make a slight change to our EventLogMessage class to parse the Stack Trace data into a class. First, we’ll start with our simple little StackTraceFrame class:

public class StackTraceFrame
{
    public string Method { get; set; }

}

Second, add a new property to our EventLogMessage class to hold a List<StackTraceFrame>:

public List<StackTraceFrame> StackTraceFrames { get; set; }

Lastly, add a method (and its caller) to parse out the stack frames and assign the resulting List to the StackTraceFrames property mentioned above:

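public static EventLogMessage Load(string rawMessageText)
{
    Match myMatch = s_regex.Match(rawMessageText);
    EventLogMessage message = new EventLogMessage();
    ...
    message.Stacktrace = myMatch.Groups["Stacktrace"].Value;
    ...
    // Split the raw stack trace into frames once, at load time.
    message.StackTraceFrames = ParseStackTrace(message.Stacktrace);
    return message;
}

// Static so that it can be called from the static Load method.
private static List<StackTraceFrame> ParseStackTrace(string stackTrace)
{
    List<StackTraceFrame> frames = new List<StackTraceFrame>();
    // Each frame in the flattened message is introduced by " at ".
    string[] stackTraceSplit = stackTrace.Split(new string[] { " at " }, 
                                        StringSplitOptions.RemoveEmptyEntries);
    foreach (string st in stackTraceSplit)
    {
        if (!string.IsNullOrEmpty(st))
        {
            frames.Add(new StackTraceFrame() { Method = st });
        }
    }
    return frames;
}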

Please Note: You could enhance the ParseStackTrace(…) method to parse out the source files, line numbers, etc. I’ll leave this as an exercise for you, dear reader.
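
For anyone who wants a head start on that exercise, here is one possible sketch. It assumes each frame has the shape "<method> in <file>:line <number>", with the " in ..." part missing when no source information is available. The StackTraceFrameDetail and StackTraceFrameParser names are hypothetical and are not part of the code above.

```csharp
using System.Text.RegularExpressions;

public class StackTraceFrameDetail
{
    public string Method { get; set; }
    public string File { get; set; }   // null when the frame has no source info
    public int? Line { get; set; }     // null when the frame has no source info
}

public static class StackTraceFrameParser
{
    // Assumed frame shape: "<method> in <file>:line <number>",
    // where everything from " in " onwards is optional.
    private static readonly Regex s_frameRegex = new Regex(
        @"^(?<method>.+?)(?:\s+in\s+(?<file>.+):line\s+(?<line>\d+))?\s*$",
        RegexOptions.Compiled);

    public static StackTraceFrameDetail Parse(string frameText)
    {
        string trimmed = frameText.Trim();
        // Fall back to the whole text as the method if nothing matches.
        StackTraceFrameDetail detail = new StackTraceFrameDetail { Method = trimmed };

        Match m = s_frameRegex.Match(trimmed);
        if (!m.Success)
        {
            return detail;
        }

        detail.Method = m.Groups["method"].Value;
        if (m.Groups["file"].Success)
        {
            detail.File = m.Groups["file"].Value;
            detail.Line = int.Parse(m.Groups["line"].Value);
        }
        return detail;
    }
}
```

The queries that follow only need the Method property, so this extra detail is optional.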

Now that we have the infrastructure in place, the query is just as simple. We’ll just nest this additional query inside of our URL query like so:

foreach (var url in requestUrls){

    Console.WriteLine("\t{0} : {1} times ",
                      url.RequestUrl,
                      url.Count);

    var methods = from method in messages
                  where string.Equals(method.RequestURL, 
                                      url.RequestUrl, 
                                      StringComparison.InvariantCultureIgnoreCase) 
                        &&
                        string.Equals(method.Exceptiontype, 
                                      result.ExceptionType, 
                                      StringComparison.InvariantCultureIgnoreCase)
                  group method by method.StackTraceFrames[0].Method into mt
                  orderby mt.Count() descending, mt.Key
                  select new
                  {
                      MethodName = mt.Key,
                      Count = mt.Count()
                  };

    foreach (var method in methods)
    {
        Console.WriteLine("\t\t{0} : {1} times ",
                          method.MethodName,
                          method.Count
                          );
    }
}

This would then produce output like the following:

WebException  : 15 time(s)
         http://localhost/menusample/default.aspx  : 11 times
                System.Net.HttpWebRequest.GetResponse() : 11 times
         http://localhost:63188/menusample/default.aspx  : 4 times
                System.Net.HttpWebRequest.GetResponse() : 4 times
InvalidOperationException  : 7 time(s)
         http://localhost:63188/menusample/default.aspx  : 6 times
                System.Web.UI.WebControls.Menu... : 6 times
         http://localhost/menusample/default.aspx  : 1 times
                System.Web.UI.WebControls.Menu... : 1 times

One last thing you may notice is that, in the example above, the first frame for each of those exceptions is somewhere in the bowels of the .NET BCL. You may want to filter this out even further to only return YOUR method. This can be accomplished very easily with the method below. It will simply loop through the StackTraceFrame List and return the first method it encounters that does not start with “System.” or “Microsoft.”.

private static string GetMyMethod(List<StackTraceFrame> frames)
{

    foreach (StackTraceFrame frame in frames)
    {

        if (!frame.Method.StartsWith("System.") &&
            !frame.Method.StartsWith("Microsoft."))
            return frame.Method;


    }

    return "No User Code detected.";
}

Then, you can just call that method from the new query we wrote above:

var methods = from method in messages
              where ...
              group method by 
                GetMyMethod(method.StackTraceFrames) into mt
              ...

Finally, with this new snippet in place, we’ll get output like this:

WebException  : 15 time(s)
         http://localhost/menusample/default.aspx  : 11 times
                _Default.Page_Load(Object sender, EventArgs e)...: 8 times
                No User Code detected. : 3 times

         http://localhost:63188/menusample/default.aspx  : 4 times
                _Default.Page_Load(Object sender, EventArgs e)... : 1 times
                No User Code detected. : 1 times
                WebControls.CustomXmlHierarchicalDataSourceView.Select()... : 2 times

As you can see, the sky’s the limit.

Enjoy!


Sunday, 22 August 2010

PowerShell: Restoring a whole heap of SQL Databases

Sunday, 22 August 2010 11:14:37 (Central Daylight Time, UTC-05:00) ( PowerShell | Scripting )

PowerShell is one of those things that falls into my “other duties as assigned” repertoire. It’s something that I’ve used for years to get things done but it’s not often I encounter a Dev at a customer that has worked with it much. In my honest opinion, I think adoption would increase if the PowerShell studio had intellisense or if it was just another project type in Visual Studio. That’s purely my opinion and is not a reflection on my employer in any way.

In any case, when I get pinged to help someone on a PowerShell task, I’ll usually jump on it. A customer at a large company came to me with a problem. He wanted to restore about 500 databases from their production environment to a development/staging environment. He found this script that should theoretically get the job done but he was running into a few problems. The problems he was encountering were mostly related to different drive mappings and different logical database names. Of course, we had to use a trick to get to that point.

The Problem

He originally was receiving the following error:

“Exception calling "SqlRestore" with "1" argument(s): "Restore failed for Server '<Server2>'. "

At :line:97 char:20

+ $restore.SqlRestore <<<< ($instance)

The line the error above references is the line where the Restore action is called to be executed and the character is the 'e' on $restore.SQLRestore'”

While it’s not a very informative message, we can get more information by running the command:

$error[0] | Format-List -Force

This provides a lot more information on what the cause of the error is. For example, when I force an error on the SqlRestore method, I might get the above error message by default but once I execute the above command, I’ll see:

“Exception : System.Management.Automation.MethodInvocationException: Exception calling "SqlRestore" with "1" argument(s): "Restore failed for Server 'GREGVAR1\SQLEXPRESS2'. " ---> Microsoft.SqlServer.Management.Smo.FailedOperationException: Restore failed for Server 'GREGVAR1\SQLEXPRESS2'. ---> Microsoft.SqlServer.Management.Common.ConnectionFailureException: Failed to connect to server GREGVAR1\SQLEXPRESS2. ---> System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified)…”

So, after he ran the magic command, the errors he was actually experiencing were the following:

The script expects the database’s logical name to be the file name of the database.
Some of the databases had a full text catalog associated with them. The script was not handling the relocation of the full text catalog either.

Many thanks to Michiel Wories for initially introducing me to that really cool command a year or so ago.

The Research

So, once we knew the problem, we needed a method to get additional information from the backup file. Enter the Restore.ReadFileList(…) method. You can use this method like the following:

 # Assumes the SMO assemblies are already loaded and that $instance and
 # $restorefile are set earlier in the script.
 $server = New-Object ("Microsoft.SqlServer.Management.Smo.Server") $instance
 $backupDevice = New-Object ("Microsoft.SqlServer.Management.Smo.BackupDeviceItem") ($restorefile, "File")
 $smoRestore = New-Object ("Microsoft.SqlServer.Management.Smo.Restore")

 # Restore settings
 $smoRestore.NoRecovery = $false
 $smoRestore.ReplaceDatabase = $true
 $smoRestore.Action = "Database"
 $smoRestore.PercentCompleteNotification = 10
 $smoRestore.Devices.Add($backupDevice)

 # Get the logical file names stored in the backup
 $sourceLogicalNameDT = $smoRestore.ReadFileList($server)

The ReadFileList method gives us a DataTable which contains tons of additional information about the contents of the file we are attempting to restore. Once we have that object, we can clearly see that it contains all of the information we could possibly need:

LogicalName          : foodb
PhysicalName         : T:\…\DB\foo.mdf
Type                 : D
FileGroupName        : PRIMARY
Size                 : 104857600
MaxSize              : 35184372080640
FileId               : 1
CreateLSN            : 0
DropLSN              : 0
UniqueId             : 4992c6f2-2282-4391-851e-d1177ab03920
ReadOnlyLSN          : 0
ReadWriteLSN         : 0
BackupSizeInBytes    : 30081024
SourceBlockSize      : 512
FileGroupId          : 1
LogGroupGUID         :
DifferentialBaseLSN  : 1297000000068200037
DifferentialBaseGUID : 69e8b951-7db1-4a1b-b0fc-e2cb012b3bcf
IsReadOnly           : False
IsPresent            : True

LogicalName          : foodb_log
PhysicalName         : R:\…\Logs\foo_log.LDF
Type                 : L
…

LogicalName          : sysft_OtherTables
PhysicalName         : T:\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\OtherTables004e
Type                 : F
…

Now that we have all of this great information, we need to iterate through the rows and put the values we care about into some local variables. For that, we can use the foreach construct:

 $FileType = ""  
          
 foreach($Row in $sourceLogicalNameDT) {            
  # Put the file type into a local variable.            
  # This will be the variable that we use to find out which file            
  # we are working with.            
  $FileType = $Row["Type"].ToUpper()            
             
  # If Type = "D", then we are handling the Database File name.            
  If ($FileType.Equals("D")) {            
   $sourceDBLogicalName = $Row["LogicalName"]            
  }            
  # If Type = "L", then we are handling the Log File name.            
  elseif ($FileType.Equals("L")) {            
   $sourceLogLogicalName = $Row["LogicalName"]              
  }            
  # If Type = "F", then we are handling the Full Text Catalog File Name.
  elseif ($FileType.Equals("F")) {            
   $sourceFTSLogicalName = $Row["LogicalName"]            
   # We may also want to grab the full path of the Full Text catalog.            
   $sourceFTSPhysicalName = $Row["PhysicalName"]            
  }            
 }

The Solution

Now that we have all of the data we need, we can populate the RelocateFile objects so that the Restore object will know how to handle these additional files:

 #specify new data and log files (mdf and ldf)            
 $smoRestoreDBFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
 $smoRestoreLogFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
              
 #the logical file names should be the logical filename stored in the backup media                     
 $smoRestoreDBFile.LogicalFileName = $sourceDBLogicalName            
 $smoRestoreDBFile.PhysicalFileName = $mdfFilePath + "\" + $sourceDBLogicalName + ".mdf"            
 $smoRestoreLogFile.LogicalFileName = $sourceLogLogicalName            
 $smoRestoreLogFile.PhysicalFileName = $ldfFilePath + "\" + $sourceLogLogicalName + ".ldf"            
             
 $smoRestore.RelocateFiles.Add($smoRestoreDBFile)            
 $smoRestore.RelocateFiles.Add($smoRestoreLogFile)            
            
 # Check to see if the $SourceFTSLogicalName is empty or not. If it's not empty
 # then we do have a full text catalog present and thus we add the appropriate            
 # entries to restore those files. If the variable is empty we just continue with            
 # the restore.            
 if (![String]::IsNullOrEmpty($sourceFTSLogicalName)) {            
                
     "We DO have a Full Text Catalog in our Backup"            
                    
     # Adding full text catalog restore parameters.            
     $smoRestoreFTSFile = New-Object("Microsoft.SqlServer.Management.Smo.RelocateFile")            
     $smoRestoreFTSFile.LogicalFileName = $sourceFTSLogicalName            
                    
     # Here we specify the new location by truncating the first 45 characters in the path            
     # that is specified on the backup file. This could be done a better way.            
     $smoRestoreFTSFile.PhysicalFileName = $ftsFilePath + "\" + 
                                           $sourceFTSPhysicalName.Substring(45)            
     $smoRestore.RelocateFiles.Add($smoRestoreFTSFile)            
 }

Lastly, we just need to execute the Restore command and clear out the variables for the next iteration of the loop.

 # Restore Database            
 $smoRestore.SqlRestore($server)            
                            
 # We now clear the variables before the next loop starts                        
 Remove-Variable sourceDBLogicalName            
 Remove-Variable sourceLogLogicalName            
 Remove-Variable smoRestoreDBFile            
 Remove-Variable smoRestoreLogFile            
            
 # If a full text catalog was present, we clear those variables too                           
 if (![String]::IsNullOrEmpty($sourceFTSLogicalName)) {                       
     Remove-Variable sourceFTSLogicalName            
     Remove-Variable smoRestoreFTSFile            
 }

And that’s all there is to it. Keep in mind that these are the guts of the script and not the full contents. If there’s interest, I can post the whole script. Just drop a comment.

Enjoy!


Friday, 20 August 2010

Micro optimization or just good coding practice?

This is a common topic and I thought I’d write up some thoughts I have on it. In fact, I was just working with a customer on improving their code reviews and what they should be checking for and the question arose - “Should performance be targeted during a code review?” It’s an interesting question. I’m a big fan of performance testing early and often and not waiting until the end of a dev cycle but code reviews, IMO, should focus on logic, maintainability and best practices. I may be in the minority and if you look around the web, you’ll see varying opinions on the topic. For example, one of the PAG articles states:

“Code reviews should be a regular part of your development process. Performance and scalability code reviews focus on identifying coding techniques and design choices that could lead to performance and scalability issues. The review goal is to identify potential performance and scalability issues before the code is deployed. The cost and effort of fixing performance and scalability flaws at development time is far less than fixing them later in the product deployment cycle.

Avoid performance code reviews too early in the coding phase because this can restrict your design options. Also, bear in mind that performance decisions often involve tradeoffs. For example, it is easy to reduce maintainability and flexibility while striving to optimize code.”

As I mentioned above, I am a huge proponent of performance analysis and optimization many times throughout a typical product development cycle. I can say with a fair amount of certainty that if you don’t build performance reviews into your project plan at regular intervals, you will hit some problem (or multiple problems) in production and have to refactor some code.

Circling back to the original question, though, are code reviews the place for performance analysis? Typically, I’d recommend using them to squash little bits of bad code but maintainability and code-cleanliness should be first and foremost in your minds. That said, if you see a pattern that you know can be improved, by all means bring it up. What’s an example of that type of situation?

Let’s take a look at predicates, specifically their usage in the Find method of a List<T>. If you’re not aware, the Find() method performs a linear search through all of the items until it finds the first match – then it returns. This makes it an O(n) operation where “n” is the number of items in the list. Basically, this means that the more items you have in the list, the longer a Find() operation can potentially take. So, if we slam about 10,000 elements into a list:

private static List<Data> LoadList()
{
    List<Data> myList = new List<Data>();
    for (int i = 0; i < 10000; i++)
    {
        myList.Add(new Data() { Id = "Id" + i.ToString(), 
                                Value = "Value" + i.ToString() });
    }

    return myList;
}

Then, if someone wants to return the instance of the Data class that has an Id of, say, “Id9999” (the last item in the list), they might write the following code:

static Data Find1(List<Data> myList, string idToFind)
{
    Data data = myList.Find(s => 
                        s.Id.ToLower() == 
                        idToFind.ToLower());

    return data;
}

Now, keep in mind that the predicate is executed for each element in the List<T> until it finds the instance you care about. With that in mind, we would probably want to hoist the “idToFind.ToLower()” call out of the predicate since that value isn’t changing. So, you might end up with something like this:

static Data Find2(List<Data> myList, string idToFind)
{

    idToFind = idToFind.ToLower();

    Data data = myList.Find(s => 
                            s.Id.ToLower() == 
                            idToFind);

    return data;
}
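
The reason the hoist pays off is that the predicate runs once for every element Find() visits. A minimal sketch that counts the invocations makes this visible. It uses a plain List<string> rather than the Data class to keep things short, and the PredicateCountDemo name and its methods are made up for illustration.

```csharp
using System.Collections.Generic;

public static class PredicateCountDemo
{
    // Builds a list of ids "Id0" .. "Id{count-1}".
    public static List<string> BuildIds(int count)
    {
        List<string> ids = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            ids.Add("Id" + i);
        }
        return ids;
    }

    // Returns how many times List<T>.Find invoked the predicate before
    // it stopped, either on a match or at the end of the list.
    public static int CountPredicateCalls(List<string> ids, string idToFind)
    {
        int calls = 0;
        ids.Find(id =>
        {
            calls++;
            return id == idToFind;
        });
        return calls;
    }
}
```

Searching a 10,000 item list for its last id, or for an id that is not there at all, invokes the predicate 10,000 times, so anything inside the predicate that does not depend on the current element is repeated that many times.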

Another route you may want to go is just to use the string.Equals(…) method to perform the comparison. That would look like:

static Data Find3(List<Data> myList, string idToFind)
{

    Data data = myList.Find(s => 
                            string.Equals(
                                s.Id, 
                                idToFind, 
                                StringComparison.InvariantCultureIgnoreCase)
                           );

    return data;

}

Fact is, the last method IS the fastest of the three ways to perform the operation, since it never has to allocate a lowercased copy of each Id before comparing. I can say that without even needing to run it through a profiler. But if you don’t believe me…

Function Name	Elapsed Inclusive Time
...Find1(System.Collections.Generic.List`1<....Data>,string)	6.34
...Find2(System.Collections.Generic.List`1<....Data>,string)	4.47
...Find3(System.Collections.Generic.List`1<....Data>,string)	3.65
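
If you would rather measure on your own machine than take the table on faith, a rough Stopwatch harness along these lines is one way to do it. This is a sketch and not the profiler run that produced the numbers above: the FindTimingDemo class and its Time helper are made up for illustration, LoadList takes the size as a parameter, and the timings you get will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Data
{
    public string Id { get; set; }
    public string Value { get; set; }
}

public static class FindTimingDemo
{
    // Same shape as LoadList above, with the size passed in.
    public static List<Data> LoadList(int count)
    {
        List<Data> myList = new List<Data>();
        for (int i = 0; i < count; i++)
        {
            myList.Add(new Data() { Id = "Id" + i, Value = "Value" + i });
        }
        return myList;
    }

    // Lowercases both sides inside the predicate.
    public static Data Find1(List<Data> myList, string idToFind)
    {
        return myList.Find(s => s.Id.ToLower() == idToFind.ToLower());
    }

    // Lowercases the search value once, outside the predicate.
    public static Data Find2(List<Data> myList, string idToFind)
    {
        idToFind = idToFind.ToLower();
        return myList.Find(s => s.Id.ToLower() == idToFind);
    }

    // Compares without creating lowercased copies.
    public static Data Find3(List<Data> myList, string idToFind)
    {
        return myList.Find(s => string.Equals(
            s.Id, idToFind, StringComparison.InvariantCultureIgnoreCase));
    }

    // Calls 'find' once to warm up, then times 'iterations' further calls.
    // Returns the elapsed wall-clock time in milliseconds.
    public static long Time(Func<List<Data>, string, Data> find,
                            List<Data> list, string idToFind, int iterations)
    {
        find(list, idToFind);
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            find(list, idToFind);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

A call such as FindTimingDemo.Time(FindTimingDemo.Find3, list, "Id9999", 1000) returns the elapsed milliseconds for that variant. Either way, the change from Find1 to Find3 is a small one.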

That’s something I might put into the category of a micro-optimization AND just good coding practice. But is this something that should be caught during a code review? I’d say “yes” because logically it all makes sense and none of the solutions would really hurt maintainability or readability of the code.

So, I’d tag this as a good coding practice. Other thoughts on the topic?

Enjoy!


Saturday, 14 August 2010

Fixing the DynamicControlsPlaceholder control – Making the community better

Saturday, 14 August 2010 12:45:39 (Central Daylight Time, UTC-05:00) ( ASP.NET | Development | Premier Field Engineer (PFE) )

In my job as a PFE for Microsoft, I read, review and fix a lot of code. A lot of code. It’s a large part of what I love about my job. The code is generally written by large corporations or for public websites. Every now and again I’ll get pinged on an issue and after troubleshooting the issue, it’s pretty clear that the core issue is with some community code. When I say community code, in this instance, I don’t mean a CodeProject or CodePlex project. In this case, I am referring to a control that Denis Bauer created and then made available to the community on his website – the “DynamicControlsPlaceholder” control. This is a great little control that inherits from a PlaceHolder and allows you to create dynamic controls on the fly and then it will persist the controls you add on subsequent requests – like a postback.

The Problem

The customer was experiencing a problem that could only be replicated in a web farm when they don’t turn on sticky sessions. They found that when a request went from one server to another server in their farm they would get a FileNotFoundException with the following details:

Type Of Exception:FileNotFoundException
Message:Error on page http://blahblahblah.aspx
Exception Information:System.IO.FileNotFoundException: 
    Could not load file or assembly 'App_Web_myusercontrol.ascx.cc671b29.ypmqvhaw, 
    Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' or one of its dependencies. 
    The system cannot find the file specified.
File name: 'App_Web_myusercontrol.ascx.cc671b29.ypmqvhaw, 
            Version=0.0.0.0, Culture=neutral, PublicKeyToken=null'
      at System.RuntimeTypeHandle.GetTypeByName(String name, 
                                             Boolean throwOnError, 
                                             Boolean ignoreCase, 
                                             Boolean reflectionOnly, 
                                             StackCrawlMark& stackMark)
   ...
   at DynamicControlsPlaceholder.RestoreChildStructure(Pair persistInfo, 
            Control parent)
   at DynamicControlsPlaceholder.LoadViewState(Object savedState) 
   at System.Web.UI.Control.LoadViewStateRecursive(Object savedState)
   ...
   at System.Web.UI.Control.LoadViewStateRecursive(Object savedState)
   at System.Web.UI.Page.LoadAllState()
   at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, 
                                            Boolean includeStagesAfterAsyncPoint)

So, we can glean a few things from the error details:

They are using the ASP.NET website model (the “app_web_….dll” assembly is the clue).
The error is occurring in the RestoreChildStructure method of the DynamicControlsPlaceholder control.

The Research

The way that ASP.NET Websites work is that each component of your site can be compiled into a separate assembly. The assembly name is randomly generated. This also means that on two servers, the name of the assemblies can end up being different. So, an assumption to make is that something is trying to load an assembly by its name. If we look at the RestoreChildStructure method, we see the following:


    Type ucType = Type.GetType(typeName[1], true, true);

    try
    {
        MethodInfo mi = typeof(Page).GetMethod("LoadControl", 
                            new Type[2] { typeof(Type), typeof(object[]) });
        control = (Control) mi.Invoke(this.Page, new object[2] { ucType, null });
    }
    catch (Exception e)
    {
        throw new ArgumentException(String.Format("The type '{0}' …", 
                                                   ucType.ToString()), e);
    }

The important thing to look at here is the Type.GetType(…) call. Since the code for the control is in a separate assembly from everything else, the “typeName[1]” value MUST BE AN ASSEMBLY-QUALIFIED TYPE NAME. From the exception details, we can see that it is attempting to load the type from the following assembly:

App_Web_myusercontrol.ascx.cc671b29.ypmqvhaw, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null

The “typeName[1]” variable is loaded from ViewState because that’s where the control persists its child structure. So, for some reason the fully qualified assembly name is stored in ViewState. If we look at the code that inserts the value into ViewState (in the PersistChildStructure(…) method), we see:


typeName = "UC:" + control.GetType().AssemblyQualifiedName;

So, here we see the AssemblyQualifiedName is being stored into ViewState – which is then used to persist the controls across postback using the above code. As I mentioned, this won’t work with an ASP.NET website hosted in a web farm because the assembly qualified name will probably be different from server to server. We even have a KB article that discusses this issue somewhat.
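
The failure is easy to reproduce outside of ASP.NET. The sketch below calls the same Type.GetType overload with an assembly-qualified name and reports whether the assembly it points at could be found in the current process. The AssemblyQualifiedNameDemo class is made up for illustration, and any assembly name you pass that does not exist on the machine plays the role of the randomly named App_Web assembly from the other server.

```csharp
using System;
using System.IO;

public static class AssemblyQualifiedNameDemo
{
    // Returns true when the type named by an assembly-qualified name can be
    // resolved in the current process, and false when the assembly that the
    // name points at cannot be found.
    public static bool CanResolve(string assemblyQualifiedName)
    {
        try
        {
            // Same overload the control uses: throwOnError = true, ignoreCase = true.
            Type.GetType(assemblyQualifiedName, true, true);
            return true;
        }
        catch (FileNotFoundException)
        {
            // The assembly part of the name does not exist here, which is the
            // situation a second server in the farm runs into.
            return false;
        }
    }
}
```

A name taken from typeof(string).AssemblyQualifiedName resolves, while a name that refers to an assembly that exists only on another server does not.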

The Fix

Fortunately, the fix is pretty simple.

First, we need to store the path to the User Control instead of the AQN in ViewState. To do this, you can comment out the “typeName = ….” line from directly above and replace it with:


UserControl uc = control as UserControl;
typeName = "UC:" + uc.AppRelativeVirtualPath;

So, now we store the path to the UserControl in ViewState. Then, we need to fix the code that actually loads the control. Replace the code from above in the RestoreChildStructure(…) method with this code:


string path = typeName[1];

try
{
    control = Page.LoadControl(path);
}
catch (Exception e)
{
    throw new ArgumentException(String.Format(
                                "The type '{0}' cannot be recreated from ViewState", 
                                               path), e);
}

That’s all there is to it. Just load the user control from where it is being stored in the site and ASP.NET will take care of loading the appropriate assembly.

Enjoy!


Monday, 09 August 2010

OpenXML: How to refresh a field when the document is opened

Monday, 09 August 2010 01:09:32 (Central Daylight Time, UTC-05:00) ( Development | OpenXML )

I was working on an internal project a bit ago and one of the requirements was to implement a fancy Word document. The idea was that all of the editing of the text/code samples/etc. would be done in the application and then the user could just export it to Word to put any finishing touches and send off to the customer. The final report needed to include section headers, page breaks, a table of contents, etc. There are a number of ways we could have accomplished the task. There’s the Word automation stuff that relies upon a COM based API, there’s the method of just creating an HTML document and loading that into Word and then finally there’s the Open XML API. Now, someone had hacked up a version of this export functionality previously using the Word automation stuff but considering we’re often dealing with 1,000+ page documents – it turned out to be a little slow. Also, there are some restrictions around using the automation libraries in a server context. Lastly, since my OpenXML kung-fu is strong, I thought I would take the opportunity to implement a better, more flexible and much faster solution. For those just starting out, Brian and Zeyad’s excellent blog on the topic is invaluable.

One of the requirements for the export operation was to have Word automagically refresh the table of contents (and other fields) the first time the document is opened. This was something that took a bit of time to research but you really end up with 2 options:

w:updateFields Element

The “w:updateFields” element is a document-level element that is set in the document settings part and tells Word to update all of the fields in the document:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<w:settings …>
    <w:updateFields w:val="true" />
    …
</w:settings>

If you’re wondering what the document settings part is – just rename a Word doc from “blah.docx” to “blah.docx.zip” and extract it to a folder on your computer. In the new folder is a directory called “word”. In that directory, you should see a file called “settings.xml”.

In that file are all of the document level settings for your docx. There’s some really great stuff in here.

If you’d like to use the OpenXML SDK to set that value (and you’d be crazy not to), here’s some sample code:

using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument document = WordprocessingDocument.Open(path, true))
{
    // Assumes the document already has a settings part; First() throws if it does not.
    DocumentSettingsPart settingsPart = 
        document.MainDocumentPart.GetPartsOfType<DocumentSettingsPart>().First();

    // Create object to update fields on open
    UpdateFieldsOnOpen updateFields = new UpdateFieldsOnOpen();
    updateFields.Val = new DocumentFormat.OpenXml.OnOffValue(true);

    // Insert object into settings part.
    settingsPart.Settings.PrependChild<UpdateFieldsOnOpen>(updateFields);
    settingsPart.Settings.Save();
}

w:dirty Attribute

This attribute is applied to the field you would like to have refreshed when the document is opened in Word. It tells Word to only refresh this field the next time the document is opened. For example, if you want to apply it to a field like your table of contents, just find the w:fldChar and add that attribute:

<w:r>
  <w:fldChar w:fldCharType="begin" w:dirty="true"/>
</w:r>

For a simple field, like the document author, you’ll want to add it to the w:fldSimple element, like so:

<w:fldSimple w:instr="AUTHOR \* Upper \* MERGEFORMAT"
        w:dirty="true" >
   <w:r>
      ...
   </w:r>
</w:fldSimple>

A caveat or two

Both of these methods will work just fine in Word 2010.

In Word 2007, though, you need to clear out the contents of the field before the user opens the document. For example, with a table of contents, Word will normally cache the rendered contents of the TOC between the field’s begin and end fldChar elements. This is good, normally, but here it causes a problem.

For example, in a very simple test document, you would see the following cached data (i.e.: Heading 1, Heading 2, etc.):

<w:p w:rsidR="00563999" w:rsidRDefault="00050B09">
  ...
  <w:r>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r w:rsidR="00563999">
    <w:instrText xml:space="preserve"> TOC  \* MERGEFORMAT </w:instrText>
  </w:r>
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    ...
    <w:t>Heading 1</w:t>
  </w:r>
  ...
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    ...
    <w:t>Heading 2</w:t>
  </w:r>
  ...
</w:p>
<w:p w:rsidR="00F77370" w:rsidRDefault="00F77370">
  ...
  <w:r>
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
</w:p>

After you clear out the schmutz, you end up with just the begin element, the definition of the TOC and the end element:

<w:p w:rsidR="00563999" w:rsidRDefault="00563999">
  ...
  <w:r>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r>
    <w:instrText xml:space="preserve"> TOC  \* MERGEFORMAT </w:instrText>
  </w:r>
</w:p>
<w:p w:rsidR="00B63C3C" w:rsidRDefault="00563999" w:rsidP="00B63C3C">
  <w:r>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
  ...
</w:p>

Once you’ve made the updates, you can safely open up your file in Word 2007 and your fields will update when the document opens.

Big thanks to Zeyad for his tip on trimming out the schmutz.

Just to stress, this is improved in Word 2010 and you no longer need to clear out the cached data in your fields.

Enjoy!


Wednesday, 04 August 2010

Working with the Bing Translator API

Wednesday, 04 August 2010 01:23:21 (Central Daylight Time, UTC-05:00) ( Bing | Development | Google | Online Translator )

An online translator really isn’t all that new. They’ve been around for at least 8 years or so. I remember the days when I would use Babelfish for all of my fun translations. It was a great way to get an immediate translation for something non-critical. The problem in a lot of cases was grammatical correctness. Translating word for word isn’t particularly difficult but context and grammar varies so much between languages that it was always challenging to translate entire sentences, paragraphs, passages, etc. from one language to another.

Fortunately, the technology has improved a lot over the years. Now, you can somewhat reliably translate entire web pages from one language to another. I’m not saying it’s without fault – but I am saying that it’s gotten a lot better over time. These days there are a few big players in this space, notably Google Translate, Babelfish and the Bing Translator. The interesting thing I’ve found is that only Bing actually has a supported API into its translation service.

There are 3 primary ways to interact with the service: HTTP, SOAP and AJAX.

They all seem to expose the same methods but it’s just the way you call them that differs. For example, the sample code published for the HTTP method looks like:

string appId = "myAppId";
string text = "Translate this for me";
string from = "en";
string to = "fr";

string detectUri = "http://api.microsofttranslator.com/v2/Http.svc/Translate?appId=" + appId +
    "&text=" + text + "&from=" + from + "&to=" + to;
HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(detectUri);
WebResponse resp = httpWebRequest.GetResponse();
Stream strm = resp.GetResponseStream();
StreamReader reader = new System.IO.StreamReader(strm);
string translation = reader.ReadToEnd();

Response.Write("The translated text is: '" + translation + "'.");

Then, for the SOAP method:

string result;
TranslatorService.LanguageServiceClient client = 
                    new TranslatorService.LanguageServiceClient(); 
result = client.Translate("myAppId", 
                          "Translate this text into German", 
                          "en", "de"); 
Console.WriteLine(result);

And lastly for the AJAX method:

var languageFrom = "en";
var languageTo = "es";
var text = "translate this.";

function translate() {
    window.mycallback = function(response) { alert(response); }

    var s = document.createElement("script");
    s.src = "http://api.microsofttranslator.com/V2/Ajax.svc/Translate?oncomplete=mycallback&appId=myAppId&from=" 
                + languageFrom + "&to=" + languageTo + "&text=" + text;
    document.getElementsByTagName("head")[0].appendChild(s);
}
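
One thing the samples gloss over is that the text is concatenated straight into the query string, so spaces or an ampersand in the text could corrupt the request. Here is a minimal sketch of encoding each parameter first. Note that buildTranslateUrl is a hypothetical helper of mine and not part of the Translator API; the URL and parameter names are simply the ones used in the AJAX sample above.

```javascript
// Hypothetical helper: builds a query string with every name and value encoded.
function buildTranslateUrl(baseUrl, params) {
    var pairs = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            pairs.push(encodeURIComponent(name) + "=" + encodeURIComponent(params[name]));
        }
    }
    return baseUrl + "?" + pairs.join("&");
}

// The "&" inside the text is encoded as %26, so it cannot be mistaken
// for a parameter separator.
var url = buildTranslateUrl(
    "http://api.microsofttranslator.com/V2/Ajax.svc/Translate",
    {
        oncomplete: "mycallback",
        appId: "myAppId",
        from: "en",
        to: "es",
        text: "translate this & that."
    });
```

The resulting string could then be assigned to s.src in place of the hand-built one.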

Fortunately, it all works as you’d expect – cleanly and simply. The really nice thing about this (and the Google Translator) is that when faced with straight-up HTML like:

<p class="style">Hello World!</p>

They will both return the following:

<p class="style">¡Hola mundo!</p>

Both translators will keep the HTML tags intact and only translate the actual text. This undoubtedly comes in handy if you do any large bulk translations. For example, I’m working with another couple of guys here on an internal (one day external) tool that has a lot of data in XML files with markup. Essentially we need to translate something like the following:

<Article Id="this does not get translated" 
           Title="Title of the article" 
           Category="Category for the article"
           >
  <Content><![CDATA[<P>description for the article<BR/>another line </p>]]></Content>
</Article>

The cool thing is that if I just deserialize the above into an object and send the value of the Content member to the service like:

string value = client.Translate(APPID_TOKEN, 
                                content, "en", "es");

I get only the content of the HTML translated:

<p>Descripción del artículo<br>otra línea</p>

Pretty nice and easy. One thing all of the translator services have trouble with, though, is translating the entire XML element from above in one shot. Bing returns:

<article id="this does not get translated" 
         title="Title of the article" 
         category="Category for the article">
</article> 
    <content><![CDATA[<P>Descripción del artículo<br>otra línea]]</content> >

And Google returns:

<= Id artículo "esto no se traduce"
Título = "Título del artículo"
Categoría = "Categoría para el artículo">

<Content> <! [CDATA [descripción <P> para el artículo <BR/> otra línea </ p >]]>
</ contenido>
</> Artículo

Oh well – I guess no one’s perfect and for now we’ll be forced to deserialize and translate one element at a time.
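
The element-at-a-time approach can be sketched roughly as follows. This is only an illustration in JavaScript: translateArticle, TRANSLATABLE_FIELDS and fakeTranslate are hypothetical names, and fakeTranslate stands in for the real service call so the sketch runs offline. I'm also assuming Title and Category should be translated along with Content while Id is left alone.

```javascript
// Assumption: these fields hold human-readable text; Id is deliberately excluded.
var TRANSLATABLE_FIELDS = ["Title", "Category", "Content"];

// Returns a copy of the article with each translatable field passed
// through the supplied translate function, one field per call.
function translateArticle(article, translate, from, to) {
    var result = {};
    for (var key in article) {
        if (!Object.prototype.hasOwnProperty.call(article, key)) {
            continue;
        }
        result[key] = TRANSLATABLE_FIELDS.indexOf(key) !== -1
            ? translate(article[key], from, to)
            : article[key];
    }
    return result;
}

// Stand-in for the real translation call (hypothetical; no network access).
function fakeTranslate(text, from, to) {
    return "[" + from + "->" + to + "] " + text;
}

var translated = translateArticle({
    Id: "this does not get translated",
    Title: "Title of the article",
    Category: "Category for the article",
    Content: "<P>description for the article<BR/>another line </p>"
}, fakeTranslate, "en", "es");
```

Because each value is sent on its own, the service only ever sees plain text or simple HTML, which is the case it handles well.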

Enjoy!


Tuesday, 03 August 2010

IE9: DOM Traversal

Tuesday, 03 August 2010 01:50:24 (Central Daylight Time, UTC-05:00) ( Development )

Really interesting blog post by the IE team on some of the new DOM traversal features in IE9 (and other browsers). Oftentimes, you need to traverse the DOM to find a particular element or series of elements. In the past, you might have needed to write some recursive JavaScript functions to navigate through the HTML on your page and act upon the elements you care about.

Now, in IE9 (and other browsers that follow the W3C spec), you can use node iterators to get a flat list of the elements that you actually care about. For example:

// This would work fine with createTreeWalker, as well
var iter = document.createNodeIterator(elm, 
                                       NodeFilter.SHOW_ELEMENT, 
                                       null, 
                                       false); 

// The first nextNode() call returns the root (elm) itself, so this
// skips it and the loop below only hides its descendants.
var node = iter.nextNode();
while (node = iter.nextNode())
{
    node.style.display = "none";
}
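
For comparison, the recursive approach mentioned earlier might look something like the sketch below. It only relies on nodeType and childNodes, so it should work on real DOM nodes, but here it runs against a plain-object stand-in (hypothetical data) so it can be tried outside a browser. The name collectElements is mine.

```javascript
// Recursively collect element nodes (nodeType 1) beneath root, in document order.
// The root itself is not included.
function collectElements(root, out) {
    out = out || [];
    var children = root.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        if (child.nodeType === 1) {
            out.push(child);
        }
        collectElements(child, out);
    }
    return out;
}

// Plain-object stand-in for a small DOM subtree (hypothetical data).
var tree = {
    nodeType: 1, nodeName: "DIV", childNodes: [
        { nodeType: 3, nodeName: "#text", childNodes: [] },
        { nodeType: 1, nodeName: "P", childNodes: [
            { nodeType: 1, nodeName: "SPAN", childNodes: [] }
        ] },
        { nodeType: 1, nodeName: "IMG", childNodes: [] }
    ]
};

var names = collectElements(tree).map(function (n) { return n.nodeName; });
```

The iterator version gives you the same flat, document-order list without having to write and maintain the recursion yourself.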

The NodeFilter enum by default allows for the following values (from the w3c spec here - http://www.w3.org/TR/2000/REC-DOM-Level-2-Traversal-Range-20001113/traversal.html#Traversal-NodeFilter):

const unsigned long       SHOW_ALL                       = 0xFFFFFFFF;
const unsigned long       SHOW_ELEMENT                   = 0x00000001;
const unsigned long       SHOW_ATTRIBUTE                 = 0x00000002;
const unsigned long       SHOW_TEXT                      = 0x00000004;
const unsigned long       SHOW_CDATA_SECTION             = 0x00000008;
const unsigned long       SHOW_ENTITY_REFERENCE          = 0x00000010;
const unsigned long       SHOW_ENTITY                    = 0x00000020;
const unsigned long       SHOW_PROCESSING_INSTRUCTION    = 0x00000040;
const unsigned long       SHOW_COMMENT                   = 0x00000080;
const unsigned long       SHOW_DOCUMENT                  = 0x00000100;
const unsigned long       SHOW_DOCUMENT_TYPE             = 0x00000200;
const unsigned long       SHOW_DOCUMENT_FRAGMENT         = 0x00000400;
const unsigned long       SHOW_NOTATION                  = 0x00000800;
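
Since each of these values is a single bit, they can be combined with a bitwise OR to ask for more than one kind of node in a single pass. A small sketch, with the constants copied into local variables so it also runs outside a browser (in a page you would use NodeFilter.SHOW_ELEMENT and friends directly); the shows helper is a hypothetical name of mine:

```javascript
// Values copied from the W3C list above.
var SHOW_ELEMENT = 0x00000001;
var SHOW_TEXT    = 0x00000004;
var SHOW_COMMENT = 0x00000080;

// Ask for elements and text nodes, but nothing else.
var whatToShow = SHOW_ELEMENT | SHOW_TEXT;

// Test whether a given node category is included in a mask.
function shows(mask, flag) {
    return (mask & flag) !== 0;
}

var includesText = shows(whatToShow, SHOW_TEXT);
var includesComments = shows(whatToShow, SHOW_COMMENT);
```

The combined value would be passed as the second argument to createNodeIterator in place of a single constant.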

While this is great – you can also write your own NodeFilter callback function to filter the results even further:

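// SHOW_ELEMENT rather than SHOW_ALL: text and comment nodes have no
// getAttribute method, so the filter below would throw on them.
var iter = document.createNodeIterator(elm, 
                                       NodeFilter.SHOW_ELEMENT, 
                                       keywordFilter, 
                                       false);

function keywordFilter(node)
{
    // getAttribute returns null when there is no alt attribute,
    // so fall back to an empty string before lower-casing.
    var altStr = (node.getAttribute('alt') || '').toLowerCase();

    if (altStr.indexOf("flight") != -1 || altStr.indexOf("space") != -1)
        return NodeFilter.FILTER_ACCEPT;
    else
        return NodeFilter.FILTER_REJECT;
}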
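
A filter like this can be exercised without a browser by feeding it stand-in nodes. The sketch below is a defensive variant that copes with nodes that have no alt attribute or no getAttribute method at all. keywordFilterSafe and fakeElement are hypothetical names, and the two numeric constants are the values the DOM Traversal spec assigns to FILTER_ACCEPT and FILTER_REJECT.

```javascript
// In a browser, use NodeFilter.FILTER_ACCEPT / NodeFilter.FILTER_REJECT instead.
var FILTER_ACCEPT = 1;
var FILTER_REJECT = 2;

function keywordFilterSafe(node) {
    // Text and comment nodes have no getAttribute; elements without an
    // alt attribute return null. Reject both before touching the string.
    var alt = typeof node.getAttribute === "function" ? node.getAttribute("alt") : null;
    if (alt === null) {
        return FILTER_REJECT;
    }
    var altStr = alt.toLowerCase();
    return (altStr.indexOf("flight") !== -1 || altStr.indexOf("space") !== -1)
        ? FILTER_ACCEPT
        : FILTER_REJECT;
}

// Hypothetical stand-in for an element with an optional alt attribute.
function fakeElement(alt) {
    return { getAttribute: function (name) { return name === "alt" ? alt : null; } };
}

var results = [
    keywordFilterSafe(fakeElement("Space shuttle launch")), // matching alt text
    keywordFilterSafe(fakeElement(null)),                   // element with no alt
    keywordFilterSafe({ nodeValue: "just a text node" })    // no getAttribute at all
];
```

Only the first stand-in is accepted; the other two are rejected rather than throwing.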

Really nice and can help make your code simpler to read and faster too!

Enjoy!
