Optimizing heavy arithmetics in Mono

by Tobias Hertkorn on June 29th, 2009

Sowhat as a tribute to this rather foolish letter.

---

This will be an article about how I optimized the md5 algorithm in Mono and the Microsoft .NET Framework. I haven't completed it yet, but there is quite a bit about optimization in my blog. Check out all the mentions of md5 in my blog for some pointers.

If you need my work on this right now, raw and unpolished - please feel free to email me at tobias at hertkorn dot com. Or to get some hand-on example, go to my post about the optimized Md5CryptoServiceProvider in mono.

Optimizing the md5 algo on little endian machines

This decoding step is a major part within the md5 algo. It translates a 64 byte array into a 16 uint array. The general endian-safe step looks like this:

C#:
  1. for (i = 0; i <16; i++)
  2. {
  3.   buff[i] = (uint)(inputBuffer[inputOffset + 4 * i])
  4.     | (((uint)(inputBuffer[inputOffset + 4 * i + 1])) <<8 )
  5.     | (((uint)(inputBuffer[inputOffset + 4 * i + 2])) <<16)
  6.     | (((uint)(inputBuffer[inputOffset + 4 * i + 3])) <<24);
  7. }

If this step is performed on a little endian machine, the bytes are already safed in memory in the right order and there is no need for all the bitlogic. Therefore optimizing it using unsafe code and direct copying 4 consecutive bytes as one uint gives a huge speed increase:

C#:
  1. unsafe
  2. {
  3.   fixed (byte* bFixed = inputBuffer)
  4.   {
  5.     byte* inputPointer = bFixed + inputOffset;
  6.  
  7.     for (i = 0; i <16; i++)
  8.     {
  9.       buff[i] = *(uint*)(inputPointer);
  10.       inputPointer += 4;
  11.     }
  12.   }
  13. }

As we are already in unsafe context using an additional pointer for buff gives an additional speed increase:

C#:
  1. unsafe
  2. {
  3.   fixed (byte* bFixed = inputBuffer)
  4.   {
  5.     fixed (uint* aFixed = buff)
  6.     {
  7.       byte* inputPointer = bFixed + inputOffset;
  8.       uint* bufferPointer = aFixed;
  9.  
  10.       for (i = 0; i <16; i++)
  11.       {
  12.         *bufferPointer = *(uint*)(inputPointer);
  13.         inputPointer += 4;
  14.         bufferPointer++;
  15.       }
  16.     }
  17.   }
  18. }

Be sure to still make this Md5 implementation endian safe by introducing a if:

C#:
  1. if (BitConverter.IsLittleEndian)
  2. {
  3.   // unsafe decode
  4. }
  5. else
  6. {
  7.   // bitshifting decode
  8. }

Avoiding unnecessary stloc and ldloc ops

If possible do as much arithmetic as possible in one go. For example:

C#:
  1. b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  2. b = (b <<22) | (b>> 10);
  3. b += c;

This b += c is not smart to use in this situation and actually produces a performance hit. Better would be:

C#:
  1. b += (((d ^ a) & c) ^ a) + (uint)K[7] + buff[7];
  2. b = ((b <<22) | (b>> 10)) + c;

Altering the whole segment like this in the md5 algorithm adds up to a 10-15% performance increase!

Let's look at the IL:

IL:
  1. ldloc.1
  2. ldloc.3
  3. ldloc.0
  4. xor
  5. ldloc.2
  6. and
  7. ldloc.0
  8. xor
  9. ldsfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::K
  10. ldc.i4.7
  11. ldelem.u4
  12. add
  13. ldarg.0
  14. ldfld unsigned int32[] MD5CryptoServiceProviderMonoOrig::buff
  15. ldc.i4.7
  16. ldelem.u4
  17. add
  18. add
  19. stloc.1
  20. ldloc.1
  21. ldc.i4.s 22
  22. shl
  23. ldloc.1
  24. ldc.i4.s 10
  25. shr.un
  26. or
  27. stloc.1  // stores b
  28. ldloc.1  // loads b
  29. ldloc.2
  30. add
  31. stloc.1

the second version's:

IL:
  1. ldloc.1
  2. ldloc.3
  3. ldloc.0
  4. xor
  5. ldloc.2
  6. and
  7. ldloc.0
  8. xor
  9. ldsfld unsigned int32[] MD5AddOptimization3::K
  10. ldc.i4.7
  11. ldelem.u4
  12. add
  13. ldarg.0
  14. ldfld unsigned int32[] MD5AddOptimization3::buff
  15. ldc.i4.7
  16. ldelem.u4
  17. add
  18. add
  19. stloc.1
  20. ldloc.1
  21. ldc.i4.s 22
  22. shl
  23. ldloc.1
  24. ldc.i4.s 10
  25. shr.un
  26. or
  27.                // store, load eliminated
  28. ldloc.2
  29. add
  30. stloc.1

Post to Twitter Tweet this

June 29th, 2009 8:38 pm | Comments (0)

When a great experience suddenly turns into … horror

by Tobias Hertkorn on February 24th, 2009

I was upgrading my server which is hosted by large provider from Debian etch to the current stable version lenny. And I was again amazed how simple upgrading a whole box can be with Debian. It seriously was a blast. A couple of questions about touched configurations, a bit of waiting and all was happily done.

At least I thought it was.

As usual I started doing the upgrade - even though I should have known better - at about 11 p.m. which I though would give ample time to finish up and go to bed. Well, it just was a good time to upgrade, traffic on the site was low, all services where not in use, so I was happy to not disturb too many people. And the best part was: I seriously was done at about half past twelve, spending most of the time doing some readjusting of the exim configuration after the upgrade. I seriously doubt that any other non-debian based system would have given me such a fast a smooth upgrade path. After verifying everything worked, I was eager to test - as a last thing before hopping into bed - if the server was still coming up without a glitch after a reboot/power-loss.

After giving ample time for the server to come up again, I tried to remote ssh to it. And got nothing but timeouts. Which was the point where I slightly started to panic. I immediately started thinking about kernel failures, data loss, mbr corruption, ... Oh the horror.

Lucky me, my hoster does provide terminal access even from remote, so I could check if it was disaster or if it was "just" connectivity - missing module in the new kernel or something similar. To make a long story short - somehow the dhcp client got deinstalled during the upgrade! D'oh! And of course I did not have the installation package locally. Chicken and egg problem, right? How should I fix the network connection without getting the dhcpclient package via the network connection. But trying the usual gateways like .1 or .254 via static routes did not seem to work out so well...

Fortunately I did find the correct settings for 1&1 (which are a bit exotic) eventually - so for anybody having the same problem here they are:


iface eth0 inet static
address 212.227.xxx.xxx
netmask 255.255.255.255
network 212.227.xxx.xxx
broadcast 212.227.xxx.xxx
up /sbin/route add -net 10.255.255.1 netmask 255.255.255.255 dev eth0
up /sbin/route add default gw 10.255.255.1

Substitute all the 212.227.xxx.xxx with your server address.

After setting the static interface information, reinstalling the dhcp client was not a bit of a problem. But how in hell could I have missed that the upgrade uninstalled such a vital package?
In the end I thought "All's well that ends well" and went to bed around 3 in the morning. Though one question remains: Will I be wiser in choosing the time to start an upgrade next time? Why do I seriously doubt that? :)

Post to Twitter Tweet this

February 24th, 2009 8:38 am | Comments (0)

Getting nvidia to run on lenny

by Tobias Hertkorn on July 22nd, 2007

I just did an upgrade from etch to lenny on my system. I just had the time and since this machine is not in production use right now (done with my diploma thesis!!! :) ) I thought "Why not?". Well, unfortunatelly after the upgrade X failed to start.

(EE) Failed to load module "nvidia" (module does not exist, 0)

Well, I thought, ok, new kernel 2.6.21 and I have not compiled the module yet. Not a problem. Unfortunatelly there was a problem, because they introduced a check, if a non-GPL driver used some strictly GPL interfaces... After fumbling around for a couple of minutes I was fed up with the console, because researching with lynx is getting harder and harder these days because of "site-designers". So I thought I would boot the old 2.6.18 kernel, no problemo.

Did that but the same error occured, which made me really wonder?!? why did that happen? Turns out, the shared version of the nvidia driver is missing in lenny right now. So if you run into the same problem I did, no nvidia driver found on lenny even if you boot the 2.6.18 kernel do this:


cd /usr/lib/xorg/modules/drivers/
gcc -shared -o nvidia_drv.so nvidia_drv.o

Of course you need to be root...

I will look into the Paravirt thing later today. If it's an easy fix, I will post it here - if not, well, I guess running the 2.6.18 kernel til that is fixed does not hurt too much in my case right now. In fact I did find some fixes on http://bugs.debian.org/ - we'll see.

Update: No, didn't do it. I could, but right now I am too lazy. Maybe it would be wise to complain to NVidia, maybe they will get their act together and release a GPL driver? I guess that's wishful thinking.

Post to Twitter Tweet this

July 22nd, 2007 5:41 pm | Comments (1)

Improving on yesterday’s mean value calculator

by Tobias Hertkorn on June 17th, 2007

There are still some bugs in yesterday's bash script. For example there is a nasty condition in the awk part. What happens if the mean value of an empty file is calculated? Awk of course complains "fatal: division by zero attempted".

Improvement:

BASH:
  1. awk '
  2. BEGIN {
  3.   FS=","
  4.   total = 0
  5.   count = 0
  6. }
  7. {
  8.   total = total + $2
  9.   count++
  10. }
  11. END {
  12.   if(count> 0) { print total/count } else { print "n/a" }
  13. }'

And there are of course some interesting side effects in the command parsing snippet. But that's for another day, I guess. Because improving that will require some real array voodoo. :-)

Post to Twitter Tweet this

June 17th, 2007 6:38 pm | Comments (2)

How to create a userfriendly bash script (command line parsing)

by Tobias Hertkorn on June 16th, 2007

Well, I had to do some very basic calculations based on simple csv files. I had to pick the second field and calculate the overall mean in a file of that field. There are some very good awk tutorials out there, but I was very disappointed about how little they talk about how to surround the awk command with a little bash sugar to produce a small reusable tool. So I just wanted to quickly point out how to something like that WELL. The basic awk algorithm is very simple, but you really should surround that simple awk command with some basic tests. It is pretty important, at least as far as I am concerned, that tools are always blessed with a nice command line parser, that it allows for direct file referencing and piping as well, if an error occurs please make it write to stderr and make it returns correct exit status codes. It's just good practice. Granted, bash command line parsing is not too much fun, but there are a couple of tricks, but the exit codes are so easy to deal with, just remember putting in one line.

I'll just throw out the code, so look at it and maybe you can identify all the things that went into the surrounding sugar:

BASH:
  1. #!/bin/bash
  2. do_usage(){
  3.   echo>&2 "Usage: `basename $0` [-|-f file|--file file] [--] [file ...]"
  4.   exit 1
  5. }
  6. test_if_last_argument() {
  7.   if [ $# -gt 0 ]; then
  8.     do_usage
  9.   fi
  10. }
  11.  
  12. filename=
  13. while [ $# -gt 0 ]
  14. do
  15.   case "$1" in
  16.     -f|--file) filename="$2"; shift;;
  17.     --) shift; break;;
  18.     -) filename="-"; shift; test_if_last_argument "$@"; break;;
  19.     -*) do_usage;;
  20.     *) break;; # terminate while loop
  21.   esac
  22.   shift
  23. done
  24.  
  25. awk '
  26. BEGIN {
  27.   FS=","
  28. }
  29. {
  30.   mean = mean + $2
  31.   count++
  32. }
  33. END {
  34.   mean = mean / count
  35.   print mean
  36. }
  37. ' "$filename" "$@"
  38.  
  39. exit $?

It is an excellent example of how there usually 56.7% of the lines are just there for maintenance and only 43.3% actually do something useful. But always remember: 42.7 percent of all statistics are made up on the spot.

I'll explain why I put those maintenance lines in, why it's "$@" and not $@, etc ... tomorrow.

Post to Twitter Tweet this

June 16th, 2007 11:43 pm | Comments (0)

Using a x86 chroot to run 32 bit apps on Debian AMD64

by Tobias Hertkorn on May 21st, 2007

Unfortunatelly today's internet sites usually depend heavily on Macromedia Flash to "enhance" the experience. Therefore it was not too much fun working on my 64 bit Linux system the last couple of days, because the browser was displaying a lot of "please download plugin" signs and after a while they got on my nerves. Since there is no 64bit Flash plugin for firefox I chose to resort to a 32bit chroot environment to run my iceweasel in. Here's what I did:

mkdir /chroot-sid-i386
debootstrap --arch i386 sid /chroot-sid-i386 http://ftp2.de.debian.org/debian/

That sets up a basic Debian i386 environment using sid (unstable).

To get transparent mount and environment settings there are a couple of things to set up.

Change /etc/fstab and append following lines:

# ia32 chroot
/home /chroot-sid-i386/home none bind 0 0
/tmp /chroot-sid-i386/tmp none bind 0 0
/proc /chroot-sid-i386/proc none bind 0 0
/dev /chroot-sid-i386/dev none bind 0 0
/media/cdrom0 /chroot-sid-i386/media/cdrom0 none bind 0 0
/usr/share/fonts /chroot-sid-i386/usr/share/fonts none bind 0 0

Now execute the following commands:

mkdir /chroot-sid-i386/media/cdrom0
mkdir /chroot-sid-i386/usr/share/fonts
mount -a
cp /etc/passwd /etc/shadow /etc/group /etc/environment /chroot-sid-i386/etc/
apt-get install dchroot

Now add your favourite Debian mirror to the chroot's apt source.list:

# vim /chroot-sid-i386/etc/apt/sources.list
deb http://ftp2.de.debian.org/debian/ sid main contrib non-free
deb-src http://ftp2.de.debian.org/debian/ sid main contrib

and finally go into the chroot for the first time:

chroot /chroot-sid-i386

Maybe it's a good thing to check your environment first:

dpkg-reconfigure locales

And usually it is a good idea to install some basic stuff:

tasksel --new-install

Make sure "Standard system" is selected. Usually it is a good idea to not select all the other stuff. Since usually those are already installed in your regular environment and might just be dead weight on your harddrive.
As an alternative you can install that task without an interface by using

tasksel install standard

Well, now let's update apt-get and install some stuff.

apt-get update
apt-get install firefox

And that's basically it.

To make it easier for normal user to access the chroot and execute 32 bit programs set up dchroot correctly:

(While in the non-chroot env):
Create /etc/dchroot.conf with the following content:

# sid386 chroot
sid386 /chroot-sid-i386

Create a file /usr/local/bin/do_chroot with the following content:

#!/bin/sh
ARGS=""
for i in "$@" ; do
ARGS="$ARGS '$i'"
done
exec dchroot -c sid386 -d -q "`basename $0`" "$ARGS"

Now linking a name in /usr/local/bin to do_chroot via logical symlink executes the program with the same name within the chroot. Without the need of super-user rights.

My symlinks look like this (I added an additional 86 at the end of each name to avoid name conflicts):

/usr/local/bin$ ls -l
total 4
lrwxrwxrwx 1 root staff 9 May 16 11:09 acroread86 -> do_chroot
-rwxr-xr-x 1 root staff 114 May 16 10:40 do_chroot
lrwxrwxrwx 1 root staff 9 May 16 10:44 iceweasel86 -> do_chroot

and

/chroot-sid-i386/usr/bin$ ls -l acroread* iceweasel*
lrwxrwxrwx 1 root root 40 May 16 11:08 acroread -> /usr/local/Adobe/Acrobat7.0/bin/acroread
lrwxrwxrwx 1 root root 8 May 16 11:11 acroread86 -> acroread
lrwxrwxrwx 1 root root 26 May 15 17:45 iceweasel -> ../lib/iceweasel/iceweasel
lrwxrwxrwx 1 root root 9 May 15 17:52 iceweasel86 -> iceweasel

Additionally any user can now execute a bash shell inside the chroot environment by using:

dchroot -c sid386 -d

This setup should work on any Debian setup. I personally tested it on etch and lenny.

Post to Twitter Tweet this

May 21st, 2007 10:41 pm | Comments (1)

Google toolbar on Debian’s Iceweasel

by Tobias Hertkorn on May 21st, 2007

Yeah, sometimes using Debian has some drawbacks. For example I can't understand why they had to change Firefox to Iceweasel. Oh, well... Usually I don't care about that kind of stuff, but unfortunatelly this time it affects me as well - since Google's toolbar does not install as the installer does not recognize Iceweasel as Firefox. And all just because of a different useragent string. How to change it back (and get full Google toolbar support):

  • In the navigation: about:config
  • Go to (or filter by): general.useragent.extra.firefox
  • Exchange Iceweasel with Firefox: Firefox/2.0.0.3

That's all folks. Now go to Google Toolbar's website and the installer will run as expected.

Post to Twitter Tweet this

May 21st, 2007 3:42 pm | Comments (7)

Writing a program’s stdout to a file and stdout in Linux

by Tobias Hertkorn on March 6th, 2007

I want to pipe stdout of a program to both a file to store the output of the program and to the console in order to be able to observe the progress of the program. Unfortunatelly bash does not allow splitting pipes out of the box. So I just hacked this little program in order to accomplish that. It is quite simple - all it does is take stdin and write it to stdout and stderr. That way one can be piped to a file and the other will be appearing on the console.

splitstdin.c:

C:
  1. #include <stdio.h>
  2. #define BUF_LEN 255
  3. main(int argc, char** argv)
  4. {
  5.     size_t numRead;
  6.     char buf[BUF_LEN + 1];
  7.     size_t charSize = sizeof(char);
  8.     while (!feof(stdin)) {
  9.         numRead = fread(buf, charSize, BUF_LEN, stdin);
  10.         buf[numRead] = '\0'/* append a nul character */
  11.         fwrite(buf, charSize, numRead, stdout);
  12.         fwrite(buf, charSize, numRead, stderr);
  13.     }
  14. }

For stdout AND stderr of the program "programtoobserve" to get parsed by splitstdin use this commandform in bash:
split to file and console:

programtoobserve 2>&1 | splitstdin 2>backupOutput

split to two files:

programtoobserve 2>&1 | splitstdin 1>backupOutput1 2>backupOutput2

Edit:
I found the proper built-in command: tee

Post to Twitter Tweet this

March 6th, 2007 5:01 pm | Comments (4)

Getting Debian MPICH and icc/ifort ready on Core2Duo/amd64 port

by Tobias Hertkorn on January 3rd, 2007

Sorry, today's post won't be pretty and certainly not for people who are not familiar with Debian and installing. It's basically just a short summary for me. But feel free to email me or post a comment if you need assistance.

Get the Intel Fortran and CC compiler. If you use them for non-commercial purposes it is free to download them.

http://registrationcenter.cps.intel.com/irc_nas/537/l_fc_c_9.1.036.tar.gz

http://registrationcenter.cps.intel.com/irc_nas/534/l_cc_c_9.1.042.tar.gz

Installing the stuff:
Make sure the ia32-libs package is installed, when running an amd64 port.
apt-get install ia32-libs
This is not the default on Debian!

Pointers on how to install the intel compilers:

http://ubuntuforums.org/showthread.php?t=149579

http://lunatics.at/index.php?op=Articles;article=2

Get MPICH2, unpack and configure it:
CC=icc CXX=icpc F77=ifort F90=ifort FFLAGS='-fomit-frame-pointer -xT -O3 -ip' CFLAGS='-fomit-frame-pointer -xT -O3 -ip' ./configure --prefix=/opt/mpich2 --with-device=ch3:shm --enable-fast --enable-f77

Make sure to include /opt/mpich2/bin in your path.
Get the fftw2, unpack and configure it:
CC=icc CXX=icpc F77=ifort F90=ifort FFLAGS='-fomit-frame-pointer -xT -O3 -ip' CFLAGS='-fomit-frame-pointer -xT -O3 -ip' ./configure --prefix=/opt/fftw-2.1.5 --enable-mpi --enable-sse2 --enable-i386-hacks

Post to Twitter Tweet this

January 3rd, 2007 1:13 pm | Comments (0)

I just learned two new things on Debian

by Tobias Hertkorn on June 14th, 2006

Hahaha, why is that such a big deal? Well, I have been using Debian for more than 10 years now and I thought I was well read, when it comes to the dpkg and apt commands. But today I actually learned not one, but two new things related to apt. Aaaaand the winners are: apt-build and volatile.

Well, as far as I am concerned there is no need for Gentoo anymore. ;) Let there be apt-build. Nice work on making recompiling performance critical packages as easy as directly installing them using apt-get. Nonono, please don't start a flame on Debian vs. Gentoo, or Gentoo vs. rest of the world. I was just joking. :)

Second cool thing is Debain Volatile, a project aimed at getting more recent software like AV-scanners into the stable branch. So now there is less need to do some apt-pinning and/or upgrading to testing on your production servers. Nice.

Post to Twitter Tweet this

June 14th, 2006 5:21 pm | Comments (1)
Tobi + C# = T# - Blogged blogoscoop