Still optimizing IL

by Tobias Hertkorn on June 28th, 2006

Here are some pointers to great documents I have been reading while still trying to find a faster way to perform arithmetic operations in safe C#:

Writing High-Performance Managed Applications : A Primer

Effective C#: 50 Specific Ways to Improve Your C#

Writing Faster Managed Code: Know What Things Cost

A Fast Serialization Technique

June 28th, 2006 3:32 pm | Comments (0)

A detailed look at hbm2net’s CodeDOMRenderer

by Tobias Hertkorn on June 24th, 2006

This is an addition to the previous post, Make hbm2net generated code look more like .NET 2.0 (Part 3).

I sent a message to the NHibernate developer mailing list, and Sergey Koshcheyev asked me to provide some insight into the code the CodeDOMRenderer generates. In particular, he was hoping I would point out the differences between our generated code and the default generated code. So here we go:

Main Goals for CodeDOMRenderer

  • Use partial classes in order to hide the autogenerated code and actually make it easier to update.
  • Make the generated code aware of Generics (NHibernate.Generics namespace), but don't force the usage of them.
  • Support all the benefits of CodeDOM, e.g. make it easy to switch to generating classes in a different language (e.g. VB.NET), fully .NET compliant syntax, open for new language features to come, etc.
  • Support all options provided by BasicRenderer and VelocityRenderer.

Right now, switching the generated language is not as simple as I wish it were. That's due to some hand-generated code that we will try to get rid of as soon as possible. Once that is done, a new config option will be introduced, making the generated language fully configurable.

Generated Code Example

To look at some generated code, please download this zip file:
hbm2net.CodeDOMRenderer.generatedCode.zip
It includes the renderer's config file, 2 fairly complex sample .hbm.xml files and the generated code files.

To generate them yourself, please download this zip:
hbm2net.CodeDOMRenderer.zip
It includes the binaries used to generate the classes provided in the zip above, the renderer's config files and 2 complex .hbm.xml files. Unzip the file, change into the directory and execute (all on one line):

NHibernate.Tool.hbm2net.Console --config=configRender.xml
    GenCodeExample.hbm.xml GenCodeGenericExample.hbm.xml

This binary runs on Mono as well, except for a bug in Generator.cs that produces weird filenames for the generated files. A simple rename of these generated files "fixes" that; the pattern is quite clear. ;) I didn't want to fix the actual bug, because I wanted to leave the original sources untouched.

Some interesting pointers

Notice that when there are no Generics used in a class, the WireUpEntities is not necessary and therefore not generated. So these classes can be used in .NET 1.1 as well.
I included the <meta attribute="gen-property">false</meta> to show you that we are not done yet. ;)
Notice how all comments, includes, extends, etc. may be defined inside the .hbm.xml.

XML:
  <codegen>
    <generate suffix=".hbm2net" renderer="NHibernate.Tool.hbm2net.CodeDOMRenderer">
      <param name="usePartial">true</param>
      <param name="internalPrefix">_</param>
    </generate>
    <generate renderer="NHibernate.Tool.hbm2net.CodeDOMRenderer">
      <param name="internalPrefix">_</param>
      <param name="generateEmptyPartialClass">true</param>
    </generate>
  </codegen>

This is the config file you need in order to generate classes using CodeDOMRenderer. The internalPrefix parameter defines the prefix that is used when generating the private variables. With this config, two renderers are run on every supplied .hbm.xml: one creates the base class, and the one with the generateEmptyPartialClass parameter set to true generates the user-space class.
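
To make the split between base class and user-space class more concrete, here is a rough sketch of the kind of file pair the two renderers are meant to produce. The class name Customer, the property Name, the method DisplayName and the file names in the comments are made up for illustration; the real output is in the generated-code zip above and will differ in detail.

```csharp
using System;

// Customer.hbm2net.cs (assumed file name): the generated half of the class.
public partial class Customer
{
    private string _name; // the "_" comes from the internalPrefix parameter

    public virtual string Name
    {
        get { return _name; }
        set { _name = value; }
    }
}

// Customer.cs (assumed file name): the user-space half, meant to be edited by hand.
public partial class Customer
{
    // Custom code goes here, separate from the generated members above.
    public string DisplayName()
    {
        return "Customer: " + Name;
    }
}
```

The compiler merges both halves into one class, so hand-written members can use the generated properties directly.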

June 24th, 2006 2:48 pm | Comments (0)

Make hbm2net generated code look more like .NET 2.0 (Part 3)

by Tobias Hertkorn on June 22nd, 2006

I just wanted to give you folks a quick update on the status of nhibernate's hbm2net that we are rewriting. The version I am posting here is production stable (at least we are using it ;) ), but does still lack some stuff:

  • Revise the use of scopes
  • Use Nullables information to do some basic checking in set ops
  • Override Equals and GetHashCode
  • Replace the part that generates the Generics
  • Support List and Set, not only Bag
  • Only regenerate the base class if there is already a generated class (to protect the user's modifications)
  • Rewrite the EntityWireUp sample

How to install the CodeDOM Renderer for hbm2net

First of all you need to download the complete source code of NHibernate, because you have to compile it from source in order to get CodeDOMRenderer support. This version is compiled against the svn trunk version of June 19th, 2006. Download the zip from this post and unzip it into a temp dir. Find the directory trunk\nhibernate\src\NHibernate.Tool.hbm2net inside the NHibernate source tree and copy the unzipped files there. Then open trunk\nhibernate\src\NHibernate.Everything-2.0.sln using Visual Studio 2005; that is the only setup we tested. For style, set NHibernate.Tool.hbm2net.Console as the startup project and hit Build. The compiler might complain about a missing AssemblyInfo.cs. That is because these files are autogenerated using NAnt. Please go to the NHibernate download page for instructions on how to use NAnt to generate the AssemblyInfo.cs files. After you have done that, reopen the sln and hit Build.

How to use the Renderer

CodeDOMRenderer uses hbm2net's built-in configuration file syntax. That means you should craft a config file similar to this one:

XML:
  <codegen>
    <generate suffix=".hmb2net" renderer="NHibernate.Tool.hbm2net.CodeDOMRenderer">
      <param name="usePartial">true</param>
      <param name="internalPrefix">_</param>
      <param name="baseclass-prefix">.hmb2net</param>
    </generate>
    <generate renderer="NHibernate.Tool.hbm2net.CodeDOMRenderer">
      <param name="internalPrefix">_</param>
      <param name="generateEmptyPartialClass">true</param>
    </generate>
  </codegen>

Then go to the directory that contains your .hbm.xml files and execute

NHibernate.Tool.hbm2net.Console.exe --config=config.cfg *.hbm.xml

Now you have all the generated classes in the subdirectory "generated".

Please keep in mind that this is beta quality. Use at your own risk.

Download:
Modified NHibernate.Tool.hbm2net

Quick update: There is a binary version and some samples of generated code available now:
hbm2net.CodeDOMRenderer.zip binary version
hbm2net.CodeDOMRenderer.generatedCode.zip

For instructions on how to use it, go to: A detailed look at hbm2net’s CodeDOMRenderer.

June 22nd, 2006 11:25 am | Comments (2)

Make hbm2net generated code look more like .NET 2.0 (Part 2)

by Tobias Hertkorn on June 19th, 2006

I did not pursue the plan I had in my first part about extending hbm2net. Instead Roland Eichler, a fellow programmer, and I decided to write a totally independent Renderer for hbm2net. We are using CodeDOM's CodeGenerator, and right now the Renderer utilizes partial classes and features Generics. Please bear with us until we can present the results. Hopefully next week.

Update:
There is a downloadable version in hbm2net extension Part 3

June 19th, 2006 8:16 pm | Comments (1)

Make hbm2net generated code look more like .NET 2.0

by Tobias Hertkorn on June 17th, 2006

NHibernate ships with a pretty simple but effective code generator console program named hbm2net. Unfortunately it does not use any .NET 2.0 features yet. So I just tried to at least get in partial classes, which are a great feature for separating the autogenerated code from any custom changes.

Go to www.hibernate.org and download the NHibernate framework. I strongly recommend getting the svn version, though, so that you have the latest hbm2net. The one shipped with the stable branch is not that great, and this little guide uses the svn trunk as of June 17, 2006.

What do we need to do? It should be fairly simple to introduce partial classes, since we only have to change the filename the class is stored under (from xyz.cs to xyz.base.cs or xyz.hbm2net.cs) and introduce an additional partial at the position of any class modifiers. Hopefully there will be a basic template or some programmatically introduced modifiers. In addition, if there is time, I want to introduce a mechanism that automatically generates the corresponding xyz.cs if it does not yet exist. So, let's dive in.

Introducing partial

Oh, great. The tool uses Velocity to generate the classes. Great, great, great. Well, that should make it fairly easy to do the modifications. A short runthrough: there is a convert.vm that is used as a template for the class generation. So one solution would be to edit this line:

$clazz.scope $clazz.modifiers $clazz.declarationType \
       $clazz.generatedName [...]

to something like this:

$clazz.scope #if(!$clazz.isInterface())partial#end $clazz.modifiers \
       $clazz.declarationType $clazz.generatedName [...]

That would be fine. Since I knew where to look once I found out that the generator uses Velocity, this solution gets 10 points for speed but only 2 points for elegance. :) Since "partial" is a class modifier, it should be inserted into the template by $clazz.modifiers and not hardcoded. Plus, I have some time on my hands, and I want to learn more about hbm2net. So those 2 points for elegance are more out of curiosity than out of spite. ;)

Uh-oh, I guess this should go on my ever-so-long todo list. Damn, this is a weird codebase. It is very, very obvious that this was "ported" from Hibernate. Hehehe. Some quite nasty quick hacks to port stuff as well. One class is still called JavaTool. ;) Oh, well.

The class we have to focus on is ClassMapping. This class is used to resolve everything prefixed with $clazz. in the convert.vm file. (Take a look at VelocityRenderer.render(...) to verify that). Therefore we need the property Modifiers to change the behaviour of $clazz.modifiers. The original one looks like this:

C#:
  virtual public string Modifiers
  {
    get
    {
      if (shouldBeAbstract() &&
          (Scope.IndexOf("abstract") == -1)
      )
      {
        return "abstract";
      }
      else
      {
        return "";
      }
    }
  }

So we simply create a new class PartialClassMapping that inherits from ClassMapping and overrides the property Modifiers with something like:

C#:
  public override string Modifiers
  {
    get
    {
      // Interfaces keep whatever modifiers the base class produces.
      if (Interface)
      {
        return base.Modifiers;
      }
      // For classes, put "partial" in front of the inherited modifiers.
      if (base.Modifiers.Trim() == string.Empty)
      {
        return "partial";
      }
      return "partial " + base.Modifiers;
    }
  }

Be sure to create the right Constructors that simply call base. Download the whole file: PartialClassMapping.cs

Well, now there should be some configuration system. Or a Factory or Strategy pattern that creates new instances of the appropriate ClassMapping... Hmpf. Well, there are only two files that contain "new ClassMapping", and those are ClassMapping itself and CodeGenerator. So we introduce a couple of ugly protected virtual methods there to make the CodeGenerator the real keeper of the "configuration" of which classes will be generated.
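
The idea of routing every "new ClassMapping" through a protected virtual method can be sketched roughly as follows. This is a simplified stand-in, not the code from the downloads: the real classes have constructors with arguments and many more members, and the names CreateClassMapping, ModifiersForNewClass and Net20CodeGenerator are hypothetical.

```csharp
using System;

// Simplified stand-in for hbm2net's ClassMapping (the real class has far more members).
public class ClassMapping
{
    public virtual string Modifiers
    {
        get { return ""; }
    }
}

// Same idea as the PartialClassMapping shown above, reduced to the essentials.
public class PartialClassMapping : ClassMapping
{
    public override string Modifiers
    {
        get
        {
            string inherited = base.Modifiers.Trim();
            return inherited.Length == 0 ? "partial" : "partial " + inherited;
        }
    }
}

public class CodeGenerator
{
    // Hypothetical factory method: the one place that decides which mapping type is created.
    protected virtual ClassMapping CreateClassMapping()
    {
        return new ClassMapping();
    }

    public string ModifiersForNewClass()
    {
        return CreateClassMapping().Modifiers;
    }
}

// Hypothetical generator that a switch such as --net20 could select.
public class Net20CodeGenerator : CodeGenerator
{
    protected override ClassMapping CreateClassMapping()
    {
        return new PartialClassMapping();
    }
}
```

With this shape, the rest of the generator never needs to know which mapping class it is working with; only the factory method changes.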

After some reordering of the constructors' arguments and some refactoring, things worked out - kinda. ;) Now there is a new command-line switch "--net20" that enables those first tiny .NET 2.0 features. But I guess Generics should get involved later when that switch is used, etc.

Here are all the downloads for this hack:
PartialClassMapping.cs
ClassMapping.cs
CodeGenerator.cs

And introducing the base class - that's tomorrow. ;)

Continue here for the second part. Third part.

June 17th, 2006 11:13 pm | Comments (6)

What’s the worst thing about Link-Spam?

by Tobias Hertkorn on June 15th, 2006

I personally think that it's not the fact that they post on my site. That's what the Akismet Spam plugin is for. By the way, great work guys! Your plugin already caught 830 spam messages in 7 weeks. But that's not the point. I am just always upset by the horrible, horrible English they use. Seriously - awful! Even those one-liners are full of mistakes and bad grammar. But I guess it is not worth putting more thought into those messages, since I delete them anyway. ;)

June 15th, 2006 12:13 pm | Comments (0)

I just learned two new things on Debian

by Tobias Hertkorn on June 14th, 2006

Hahaha, why is that such a big deal? Well, I have been using Debian for more than 10 years now and I thought I was well read, when it comes to the dpkg and apt commands. But today I actually learned not one, but two new things related to apt. Aaaaand the winners are: apt-build and volatile.

Well, as far as I am concerned there is no need for Gentoo anymore. ;) Let there be apt-build. Nice work on making recompiling performance critical packages as easy as directly installing them using apt-get. Nonono, please don't start a flame on Debian vs. Gentoo, or Gentoo vs. rest of the world. I was just joking. :)

The second cool thing is Debian Volatile, a project aimed at getting more recent software like AV scanners into the stable branch. So now there is less need to do some apt-pinning and/or upgrading to testing on your production servers. Nice.

June 14th, 2006 5:21 pm | Comments (1)

Sorry for the downtime

by Tobias Hertkorn on June 13th, 2006

Sorry about the downtime of my blog. The university that is hosting this site had some weird DNS trouble. At first we thought that it was our local LDAP that was acting up, but fortunately I discovered that the DNS entry for our LDAP server had simply vanished. But now everything is back online.

June 13th, 2006 1:56 pm | Comments (0)

On the hunt for safe optimizations for md5

by Tobias Hertkorn on June 10th, 2006

Well, right now I am testing some other optimizations that use strictly safe code. One of the things I came across:

On my fast machine it helps to do as much arithmetic as possible in one go. For example:

C#:
  b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  b = (b << 22) | (b >> 10);
  b += c;

This b += c is not smart to use in this situation and actually produces a performance hit. Better would be:

C#:
  b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  b = ((b << 22) | (b >> 10)) + c;

Altering the whole segment like this in the md5 algorithm adds up to a 10-15% performance increase!
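
For anyone who wants to convince themselves that the two forms really compute the same value, here is a minimal self-contained sketch. The class and method names are made up, and the K[7] and buff[7] lookups are replaced by plain parameters k and x so the snippet compiles on its own.

```csharp
using System;

public static class Md5Step
{
    // Original form: rotate, store b, then add c in a separate statement.
    public static uint ThreeStatements(uint a, uint b, uint c, uint d, uint k, uint x)
    {
        b += (((d ^ a) & c) ^ a) + k + x;
        b = (b << 22) | (b >> 10);
        b += c;
        return b;
    }

    // Fused form: the rotate and the addition of c share one expression.
    public static uint TwoStatements(uint a, uint b, uint c, uint d, uint k, uint x)
    {
        b += (((d ^ a) & c) ^ a) + k + x;
        b = ((b << 22) | (b >> 10)) + c;
        return b;
    }
}
```

Both methods rely on uint arithmetic wrapping around silently, which is the default in C# unless the code is compiled in a checked context.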

Let's look at the IL:

IL:
  ldloc.1
  ldloc.3
  ldloc.0
  xor
  ldloc.2
  and
  ldloc.0
  xor
  ldsfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::K
  ldc.i4.7
  ldelem.u4
  add
  ldarg.0
  ldfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::buff
  ldc.i4.7
  ldelem.u4
  add
  add
  stloc.1
  ldloc.1
  ldc.i4.s 22
  shl
  ldloc.1
  ldc.i4.s 10
  shr.un
  or
  stloc.1  // stores b
  ldloc.1  // loads b
  ldloc.2
  add
  stloc.1

And the second version's IL:

IL:
  ldloc.1
  ldloc.3
  ldloc.0
  xor
  ldloc.2
  and
  ldloc.0
  xor
  ldsfld unsigned int32[] MD5AddOptimization3::K
  ldc.i4.7
  ldelem.u4
  add
  ldarg.0
  ldfld unsigned int32[] MD5AddOptimization3::buff
  ldc.i4.7
  ldelem.u4
  add
  add
  stloc.1
  ldloc.1
  ldc.i4.s 22
  shl
  ldloc.1
  ldc.i4.s 10
  shr.un
  or
           // store, load eliminated
  ldloc.2
  add
  stloc.1

I thought this was weird, because I assumed that the compiler might be able to find these kinds of unnecessary operations, but I guess it is harder to locate them than it seems. Especially since a wrong optimization might end up with broken code.

A weird thing is: I just did some testing on my old, old i586, and the optimization I described here actually decreases performance under some circumstances! Obviously the lesson learned is: Never trust "optimizations" without testing them!
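
Testing such a change on each machine does not take much. Below is a minimal timing sketch using System.Diagnostics.Stopwatch. It is not the harness used for the numbers in this post; the names RotateAddBenchmark, RunSeparate, RunFused and TimeMilliseconds are made up, and the loop body is a reduced stand-in for a real md5 round.

```csharp
using System;
using System.Diagnostics;

public delegate uint Kernel(int iterations);

public static class RotateAddBenchmark
{
    // Three-statement form, repeated; returns b so the work cannot be optimized away.
    public static uint RunSeparate(int iterations)
    {
        uint b = 0x67452301, c = 0xefcdab89;
        for (int i = 0; i < iterations; i++)
        {
            b += (uint)i;
            b = (b << 22) | (b >> 10);
            b += c;
        }
        return b;
    }

    // Fused form of the same loop.
    public static uint RunFused(int iterations)
    {
        uint b = 0x67452301, c = 0xefcdab89;
        for (int i = 0; i < iterations; i++)
        {
            b += (uint)i;
            b = ((b << 22) | (b >> 10)) + c;
        }
        return b;
    }

    // Times one kernel in milliseconds after a short warm-up run.
    public static long TimeMilliseconds(Kernel kernel, int iterations)
    {
        kernel(iterations / 10 + 1); // warm-up, so JIT compilation is not part of the measurement
        Stopwatch watch = Stopwatch.StartNew();
        kernel(iterations);
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling RotateAddBenchmark.TimeMilliseconds(RotateAddBenchmark.RunSeparate, 100000000) and the same for RunFused on each target machine, several times over, gives a first impression of whether the change helps or hurts there.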

Well, I'll soon write about more safe optimizations I found. But one thing is certain once again: Optimizing is all about trial and error. There is almost no way to tell in advance whether something will perform better or worse.

June 10th, 2006 8:07 pm | Comments (2)

Md5 – no unsafe in crypto

by Tobias Hertkorn on June 9th, 2006

I got in contact with Sebastien Pouliot in order to contribute back to Mono. And I got a really nice email back. He suggested a couple of other optimizations, explained some of the performance gains I was experiencing and was looking forward to seeing more of my optimizations. But unfortunately he also wrote this:

For many reasons I don't want to add unsafe code into crypto. Beside being "unsafe" many projects use copies of Mono crypto source code directly into their apps and I don't want to require them to use unsafe code (and limit the condition were the code can be executed, e.g. FullTrust)

That makes it impossible to use the one optimization that gives the biggest performance boost on little endian machines. Since that optimization will not make it into mono, I just wanted to blog about it here - just in case somebody needs the extra edge.

Optimizing the md5 algo on little endian machines

This decoding step is a major part within the md5 algo. It translates a 64 byte array into a 16 uint array. The general endian-safe step looks like this:

C#:
  for (i = 0; i < 16; i++)
  {
    buff[i] = (uint)(inputBuffer[inputOffset + 4 * i])
      | (((uint)(inputBuffer[inputOffset + 4 * i + 1])) << 8)
      | (((uint)(inputBuffer[inputOffset + 4 * i + 2])) << 16)
      | (((uint)(inputBuffer[inputOffset + 4 * i + 3])) << 24);
  }

If this step is performed on a little endian machine, the bytes are already stored in memory in the right order and there is no need for all the bit logic. Therefore, optimizing it with unsafe code that directly copies 4 consecutive bytes as one uint gives a huge speed increase:

C#:
  unsafe
  {
    fixed (byte* bFixed = inputBuffer)
    {
      byte* inputPointer = bFixed + inputOffset;

      for (i = 0; i < 16; i++)
      {
        buff[i] = *(uint*)(inputPointer);
        inputPointer += 4;
      }
    }
  }

As we are already in an unsafe context, using an additional pointer for buff gives a further speed increase:

C#:
  unsafe
  {
    fixed (byte* bFixed = inputBuffer)
    {
      fixed (uint* aFixed = buff)
      {
        byte* inputPointer = bFixed + inputOffset;
        uint* bufferPointer = aFixed;

        for (i = 0; i < 16; i++)
        {
          *bufferPointer = *(uint*)(inputPointer);
          inputPointer += 4;
          bufferPointer++;
        }
      }
    }
  }

Be sure to keep this md5 implementation endian safe by introducing an if:

C#:
  if (BitConverter.IsLittleEndian)
  {
    // unsafe decode
  }
  else
  {
    // bitshifting decode
  }

I really can understand why this is important. There should not be any unsafe code in a crypto environment, and I agree with that. Unfortunately, I have not come up with an alternative solution for that particular decode step. Using

C#:
  BitConverter.ToUInt32(inputBuffer, inputOffset + 4 * i);

really is no good. Unfortunately, this method does some extensive checking before converting the bytes. That slows it down so much that it performs worse than the bit operations.
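
One safe-code candidate that was not tried or benchmarked for this post is System.Buffer.BlockCopy, which copies raw bytes between arrays of primitive types without needing an unsafe context. The sketch below is only an assumption about how it could be wired in; the names Md5Decode, DecodeShift and DecodeBlockCopy are made up, and whether it beats the shifting loop would have to be measured on each machine.

```csharp
using System;

public static class Md5Decode
{
    // Endian-safe reference decode, equivalent to the shifting loop shown above.
    public static void DecodeShift(byte[] inputBuffer, int inputOffset, uint[] buff)
    {
        for (int i = 0; i < 16; i++)
        {
            buff[i] = (uint)inputBuffer[inputOffset + 4 * i]
                | ((uint)inputBuffer[inputOffset + 4 * i + 1] << 8)
                | ((uint)inputBuffer[inputOffset + 4 * i + 2] << 16)
                | ((uint)inputBuffer[inputOffset + 4 * i + 3] << 24);
        }
    }

    // Safe-code candidate: raw byte copy on little endian machines, shifting otherwise.
    public static void DecodeBlockCopy(byte[] inputBuffer, int inputOffset, uint[] buff)
    {
        if (BitConverter.IsLittleEndian)
        {
            // 16 uints = 64 bytes, copied in native (little endian) byte order.
            Buffer.BlockCopy(inputBuffer, inputOffset, buff, 0, 64);
        }
        else
        {
            DecodeShift(inputBuffer, inputOffset, buff);
        }
    }
}
```

Both methods fill buff with the same 16 values; the only open question is which one is faster in practice.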

June 9th, 2006 2:14 pm | Comments (0)