Chaos to Order?: 2007

Wednesday 12 December 2007

Not so Static Classes?

A colleague of mine noticed something interesting whilst working with C# the other day.
He had a breakpoint in a static constructor (on a non-static class) which seemed to be hit several times during execution of his program.
He discussed this with me and we both agreed that in C# it should not be possible for a static constructor to be invoked more than once in the same AppDomain, contrary to the behaviour he was experiencing.

I consulted MSDN, and found the following definitions:
Static Classes [C#]
"Static class members can be used to separate data and behavior that is independent of any object identity..."
"Static classes are loaded automatically by the .NET Framework common language runtime (CLR) when the program or namespace containing the class is loaded."

Static Constructors
"A static constructor is used to initialize any static data, or to perform a particular action that needs performed once only. It is called automatically before the first instance is created or any static members are referenced."

My colleague was working with .Net 3.0 using a class that derived from ClientBase in the System.ServiceModel namespace.
Basically, he added a static constructor to define a single EndPoint for all the instances of this derived ClientBase(T).

Upon further inspection, we saw that what was actually happening was that the Assembly containing the derived class was being loaded and instantiated by WCF using Reflection.

With this in mind, I decided to go back to basics and create a simple program to see whether there was a difference in the behaviour of static classes and static constructors when working with an Assembly loaded using Reflection.

I created a new C# solution that contained a Console Application called Demo and a Class Library called ReflectedAssembly.

1. Demo
In Demo, I added a static class called MyStaticClass.
To this class I added a static constructor and a public static method called Call which had a string parameter.
When Call is invoked, it will write the string parameter to the Console.
In the static constructor, I added a call to the Call method, passing "cctor" as a parameter.

2. ReflectedAssembly
I added a reference from ReflectedAssembly to Demo.
I added a class named ReflectedClass and a single method named DoIt which accepted a string as a parameter.
In the DoIt method, I added a call to MyStaticClass.Call, passing the string parameter.

3. Demo
Back in Demo, I wrote the following in the program Main:
---
Assembly TheAssembly;
Type TheType;

MyStaticClass.Call("main start");

TheAssembly = Assembly.LoadFile("**PATH**");
TheType = TheAssembly.GetType("ReflectedClass");

if (TheType != null)
TheType.GetMethod("DoIt").Invoke(Activator.CreateInstance(TheType), new object[] { "hello" });

MyStaticClass.Call("main end");
---

I added a breakpoint to the static constructor in MyStaticClass, but was disappointed to see it only get hit once. It seems I had not managed to reproduce the scenario that my colleague had experienced.

I spent some time trying to figure out precisely what would cause the static class to be constructed mulitple times.
In the end I discovered that it could be achieved in the following way:
1. In the program code, making a copy of the ReflectedAssembly and Demo Assemblies in a different folder.
2. Using Assembly.LoadFrom to load the ReflectedAssembly Assembly as opposed to Assembly.LoadFile.

Once my code was changed to use this method, I found that my static constructor was being invoked twice.

Next, I wanted to find out why this was happening.
I added some additional code to the Call method in MyStaticClass, such that the following information was also written out to the Console:

The process ID

The managed thread ID

The AppDomain ID

The handle of the current Call method

I was hoping to see that somewhere, one of these values would be different.
My initial concern was that there may be some kind of context based issue around multi-threading.

After running the program again, I discovered that the process, thread and AppDomain IDs were all the same.
The difference was in the MethodHandle for the Call method.
In the two calls from the program Main (where "main start" and "main end" were written), the handle is the same, but in the call from ReflectedClass, the handle was different.
This seems to indicate that the instance of MyStaticClass that is used by ReflectedClass differs from the instance used by the program Main, which would explain why the static constructor was being called twice.

By enumerating the Assemblies of the current AppDomain, I can see that there are actualy two copies of my Demo Assembly in memory.
It appears that each contains an instance of the static class MyStaticClass, and this is why the static constructor has been invoked twice.
It is interesting to note that if I use Assembly.LoadFile to load ReflectedAssembly, there will also be two copies of the Assembly in memory, but the reference to the Assembly has been resolved to the same instance as my program Main is running.
However, if I use Assembly.LoadFrom, there seems to be an issue with resolving the reference to the existing Assembly and the reference is resolved to the new Assembly in memory instead.

I decided to take my work a little further and see if I could produce multiple instances of the static class in memory.
I modified my program so that it loads the ReflectedAssembly Assembly three times, from a different location each time.
If I use Assembly.LoadFile, although the Assembly is actually loaded into memory an additional three times, the reference to Demo is always resolved to the first copy in memory, the static constructor only runs once, and the calls to the MyStaticClass.Call method are on the same instance of the static class.
However, using Assembly.LoadFrom (with the Assemblies in a different location each time) causes the same behaviour as before - this time the static constructor is invoked 4 times, and the calls to MyStaticClass.Call are on different instances of the static class.

I refer to the MSDN entry for the Assembly.LoadFrom method:
"If an assembly with the same identity is already loaded, LoadFrom returns the loaded assembly even if a different path was specified."
This does not appear to be true. The code I ran above demonstrates that actually, LoadFrom is returning the newly loaded Assembly each time, and that the Assembly references are being resolved to newly loaded Assemblies too.

Demo Program
A demo program that shows this in detail can be downloaded here.
You are welcome to the source code, which is written in C# 2.0. Please contact me if you would like it.

Conclusion
Static classes and static constructors may only static in the context of the loaded Assembly.
If using Reflection to load Assemblies, you may unknowingly end up with multiple instances of your static classes contrary to accepted wisdom.
Essentially, you cannot rely on there being a single instance of a static class in your AppDomain.

In addition, using Assembly.LoadFrom does not guarantee that any Assemblies you are loading are resolved to in-memory copies, despite what is asserted in MSDN.
This is particularly important because this is what it appears is happening when using WCF in .Net 3.0.

None of this necessarily poses a problem however: as long as we are aware that this is the case we can work with it.

Sunday 2 December 2007

Private Invoking Using Reflection

Whilst working on a product the other day, a thought occured to me with regard to Reflection in Microsoft.Net.
I wondered whether it would be possible to create an instance of a type with Internal or Private scope which was located in another assembly.

As I understand it:

An object declared with Internal scope should only be visible inside the assembly in which it is located.

An object with Private scope should only be visible to the object that contains it. Private objects can not be declared at namespace level, only within other objects.

To see whether it was possible, I created a test.
I created a Class Library, FormLib, which contained a Public class name FormWrapper.
FormWrapper contains three further classes: PublicForm, InternalForm and PrivateForm.

PublicForm is declared as a Public class with a Public constructor.
InternalForm is declared as an Internal class with an Internal constructor.
PrivateForm is declared as a Private class with a Private constructor.
All three classes inherit from Form and contain a single Label control that states "This is a ... form", with "..." replaced by the scope of the class.

I then created a Windows Forms project called Demo which contained a Form. Demo is the startup project.
In Demo, I added three buttons, which would attempt to create an instance of PublicForm, InternalForm and PrivateForm respecitvely.

To start, I needed to obtain the Assembly object representing FormLib. This was relatively easy and was achieved by referencing FormLib from my Demo project and then enumerating the Assembly objects contained in the current AppDomain:
---
...
Assembly _Target;

foreach (Assembly Item in AppDomain.CurrentDomain.GetAssemblies())
{
  if (Item.FullName.StartsWith("FormLib"))
  {
    _Target = Item;
    break;
  }
}
...
---

I assumed creating an instance of the Public class would be easiest, so I implemented this first.
The Assembly object has a method CreateInstance, which can be used to generate an instance of a type, as so:
---
...
object TheObject;

TheObject = _Target.CreateInstance("FormLib.FormWrapper+PublicForm", false, BindingFlags.Public | BindingFlags.Instance, null, null, null, null);
...
---

The first parameter of CreateInstance is the full name of the type I want to create.
"FormWrapper+PublicForm" indicates that the type PublicForm is inside the type FormWrapper, which is unlike the standard syntax for a type inside a namespace, which would simply be "Namespace.Type".
The third paremeter, which is a BindingFlags enumeration, indicates:

Public - The public declarations of the type should be searched

Instance - An instance should be created

The way that CreateInstance actually works is to generate the Type specified in the first parameter ("FormLib.FormWrapper+PublicForm" in this case) and then using the static CreateInstance method of the Activator object to actually generate the instance. This in turn uses RuntimeType.CreateInstanceImpl (RuntimeType is Internal to System) which actually invokes the constructor of the Type.
We could, of course, use Activator directly, but we would be unable to do this with Internal and Private types in a different Assembly because the types are not available to us directly.

Now if the above code works, I want to actually display the form I have just created. To achieve this, following the above code I carry out the following:
---
...
if (TheObject is Form)
{
  Form TheForm;

  TheForm = (Form)TheObject;
  TheForm.ShowDialog();
}
...
---

I ran my code to see what happened, and PublicForm was shown as expected.

Moving on, I decided to try and repeat the above with the Internal and Private classes.
To achieve this, I made a small modification to the call to CreateInstance, first to create the types of the InternalForm and PrivateForm types, and second to change the BindingFlags to Private instead of Public:
---
...
TheObject = _Target.CreateInstance("FormLib.FormWrapper+InternalForm", false, BindingFlags.Private | BindingFlags.Instance, null, null, null, null);
...
TheObject = _Target.CreateInstance("FormLib.FormWrapper+PrivateForm", false, BindingFlags.Private | BindingFlags.Instance, null, null, null, null);
...
---

Upon executing the code, in both cases the form was shown as expected.
To provide further proof that internal or private methods could be executed using reflection, I decided to add a method named "AMethod" to each of the classes.
The scope of the method was the same as the class itself, so there was a Public, Internal and Private method.
In each method, a messagebox will be shown displaying that the method had been invoked.

Directly following the line "TheForm.ShowDialog();" in each case I added the following code to attempt to invoke the method:
---
...
foreach (MethodBase Item in TheObject.GetType().GetMethods(BindingFlags.Public | BindingFlags.Private | BindingFlags.Instance)
{
  Item.Invoke(TheObject, null);
  break;
}
...
---

The objective of the above code is to enumerate all of the publically and privately available methods in the type represented by TheObject.
This reveals the "AMethod" method (which is of type MethodBase), which I can then invoke by calling Invoke and passing the instance of the type in as a parameter.
The second parameter for Invoke is the parameters for the method call (which can be passed as an array of objects - "object[]"). In this case, the method has no parameters so I simply pass in null.

The method could also be invoked by using the InvokeMember method of the Type. However, I found this unsuccessful with methods with Private access so I discounted it.

A Potential Solution
A method that you could use to add some protection to your internal or private code is to monitor the stack trace when your method is invoked. You could also do this upon object construction by adding a monitor to your class constructors.
Basically, you can retrieve a list of the current call stack at the start of each method and ensure that the call to your method came from within your own class or within your own assembly, depending on whether it is a private or internal method respectively.
The following code demonstrates the use of this to ensure the caller is in the current assembly, which will validate access to an internal method:
---
...
StackTrace Trace = new StackTrace();
if (Trace.FrameCount > 2)
  if (Trace.GetFrame(1).GetType().Assembly != Assembly.GetExecutingAssembly())
    throw new Exception("The call to this method is invalid");
...
---

This works by examining the second method on the call stack (frame 1), which will be the direct caller of the current method on the call stack (frame 0). If the caller is not in the same assembly as the current frame then you know that it is in another assembly and has not been made internally.

This works a treat but sadly adds complexity to your program and slows it down. However, if you are producing code that must only be executed internally or privately, this method may offer a reasonable way of protecting it.

Conclusion
You may create classes for internal use only in your assembly, which have internal or private members, properties or methods, but you cannot rely on the fact that these classes and members will not be created and consumed outside of your own code.
The power of .Net Reflection demands that you must not consider your internal and private code to be internal or private.

If you would like the demo program I produced to show this working, it is available here.

Until next time!

Sunday 18 November 2007

Comparing ISINST to CASTCLASS

I was reading through a corporate C# coding standard the other day for one of my clients, and noticed that one of the coding requirements in the standard related to casting of types in code.
Their standard stated that all type conversions in C# should be accomplished by using the 'as' keyword rather than using a direct cast (i.e. '(type)').
I quizzed my client on this to see what formed the basis of their choice to use 'as'. They had no explanation, but said that they felt it could be faster.

Following on from this, I was interested to see whether using 'as' or '(type)' was faster for type conversion.

In .Net IL, 'as' is compiled to 'isinst'. isinst performs a type conversion and returns null if the specified type conversion is not valid.
This differs from using '(type)', which compiles to 'castclass'. castclass also performs a type conversion but throws an Exception if the specified type conversion is not valid.
This is an important difference. For example, consider the following example using 'as' (isinst):
void Foo1(object ObjA)
{
  ClassDef ObjB;

  ObjB = ObjA as ClassDef;
  ObjB.Property = "12345";
}

In the example, we have to be certain that we know ObjA can be converted to the ClassDef type, ObjB, because if it cannot, a NullReferenceException will occur in the line where the property is set.
A better implementation of the above example could be:
void Foo2(object ObjA)
{
  ClassDef ObjB;

  ObjB = ObjA as ClassDef;
  if (ObjB != null)
    ObjB.Property = "12345";
}

If we consider the same example using '(type)' (castclass):
void Foo3(object ObjA)
{
  ClassDef ObjB;

  ObjB = (ClassDef)ObjA;
  ObjB.Property = "12345";
}

Again in this example we have to be sure that the conversion between the two types is valid, because else a TypeConversionException will occur on the line where ObjA is cast to ClassDef.
There could be two potential alternatives to the castclass example:
void Foo4(object ObjA)
{
  try
  {
    ClassDef ObjB;

    ObjB = (ClassDef)ObjA;
    ObjB.Property = "12345";
  }
  catch (Exception ex)
  {
    // Do something useful
    throw ex;
  }
}
void Foo5(object ObjA)
{
  ClassDef ObjB;

  if (ObjA is ClassDef)
  {
    ObjB = (ClassDef)ObjA;
    ObjB.Property = "12345";
  }
}

In the examples above, Foo1 and Foo3 would be appropriate to use if we knew that the object ObjA can definitely be converted to ClassDef. However, this may not always be likely, particularly in these examples becuase the parameter ObjA is an object type.
In these circumstances, it could be more appropriate to use Foo2, Foo4 or Foo5, because they offer security against failures during the conversion process.
In Foo2, we rely on isinst returning null if the conversion is not valid, and so we check for null after the conversion.
In Foo4, we handle an exception that could be raised during a failed castclass.
In Foo5, we first check to see whether ObjA is a ClassDef (using isinst - 'ObjA is ClassDef') and then perform the conversion in the knowledge that it will not cause an exception.

Now I had five potential examples for performing a type convesion, I thought it would be interesting to see whether there was a great deal of difference in the time taken to carry out the different examples.
I wrote a test project which used a high-precision StopWatch to time each of the procedures.
I timed each procedure as it performed one billion casts, repeated over 50 iterations to ensure that a good average time could be ascertained.
Following on from this I analysed the results to see which of the above five examples was fastest.

The total time for all 50 billion conversions for each test were:
Foo1 = 1,466,176 ms
Foo2 = 1,506,633 ms
Foo3 = 1,499,661 ms
Foo4 = 1,519,072 ms
Foo5 = 1,689,141 ms

And the average times per iteration (1 billion conversions) were:
Foo1 = 29,323.52 ms
Foo2 = 30,132.66 ms
Foo3 = 29,993.22 ms
Foo4 = 30,381.44 ms
Foo5 = 33,782.82 ms

Foo1 was the fastest, closely followed by Foo3 (which was around 2.284% slower).
However, as neither Foo1 or Foo3 contain any checks to ensure that the conversion was a success so they may be unsuitable for general use.

Of the remaining methods, Foo2 was fastest, which involved checking for a null value after using isinst, followed by Foo4 (the the try...catch block around castclass) and finally Foo5, which was a combination of isinst and castclass. Foo5 was significantly slower than all the other methods, being around 3.5 seconds\million conversions slower than the slowest of the other four methods.
In order to directly compare isinst and castclass, we can examine Foo1 and Foo3. Consider the IL for Foo1 and Foo3 respectively:
Foo1
...
IL_0014  ldloc.2
IL_0015  isinst    TestCastStyle.ClassDef
IL_001a  stloc.1
IL_001b  ldloc.1
IL_001c  ldstr     "12345"
IL_0021  callvirt  instance void TestCastStyle.ClassDef::set_Property(string)
...

Foo3
...
IL_0014  ldloc.2
IL_0015  castclass TestCastStyle.ClassDef
IL_001a  stloc.1
IL_001b  ldloc.1
IL_001c  ldstr     "12345"
IL_0021  callvirt  instance void TestCastStyle.ClassDef::set_Property(string)
...

The code for the methods is identical, other than IL_0015, which uses isinst in Foo1 and castclass in Foo3.

In the tests I ran, Foo1 took around 29.32ms\million conversions whereas Foo3 took 29.99ms\million, which is a difference of 0.67ms\million.
This confirms that using isinst is definitely quicker than castclass.

Considering the scenario where a check must be performed to ensure the conversion is valid, we can compare Foo2 (isinst) to Foo4 or Foo5 (castclass). The IL for these methods is:
Foo2
...
IL_0014  ldloc.2
IL_0015  isinst    TestCastStyle.ClassDef
IL_001a  stloc.1
IL_001b  ldloc.1
IL_001c  brfalse.s IL_0029
IL_001e  ldloc.1
IL_001f  ldstr     "12345"
IL_0024  callvirt  instance void TestCastStyle.ClassDef::set_Property(string)
...

Foo4
...
try
{
  IL_0014  ldloc.2
  IL_0015  castclass TestCastStyle.ClassDef
  IL_001a  stloc.1
  IL_001b  ldloc.1
  IL_001c  ldstr     "12345"
  IL_0021  callvirt  instance void TestCastStyle.ClassDef::set_Property(string)
  IL_0026  leave.s   IL_002b
}
catch [mscorlib]System.Exception
{
  IL_0028  pop
  IL_0029  rethrow
}
...

Foo5
...
IL_0014  ldloc.2
IL_0015  isinst    TestCastStyle.ClassDef
IL_001a  brfalse.s IL_002e
IL_001c  ldloc.2
IL_001d  castclass TestCastStyle.ClassDef
IL_0022  stloc.1
IL_0023  ldloc.1
IL_0024  ldstr     "12345"
IL_0029  callvirt  instance void TestCastStyle.ClassDef::set_Property(string)
...

At first glance, I had assumed that Foo4 would be slowest of these methods, but the results were:

Foo2 took 30.13ms\million conversions

Foo4 took 30.38ms\million conversions

Foo5 took 33.78ms\million conversions

I was surprised to see that Foo5 was by far the slowest. There does not seem to be a massive difference between Foo2 and Foo4 under the test conditions, but if an Exception was raised at IL_0015 in Foo4, this would slow the method down considerably, whereas Foo2 would continue to be very quick.

The conclusions from the tests I performed are:

In a direct comparison, isinst is quicker than castclass (although on slightly)

When having to perform checks to ensure the conversion was successful, isinst was significantly quicker than castclass

A combination of isinst and castclass should not be used as this was far slower than the quickest "safe" conversion (over 12% slower)

If you would like to run the tests yourself, drop me an e-mail and I will send you the test program I used. Additionally, if you would like the analysis files containing the timings and code dumps, I can also send you this.

Have fun!

Friday 2 November 2007

Using Win32 to Manipulate the UI

Whilst working on an application at a client's site the other day, I noticed that access to certain code was being controlled by disabling buttons on a form.
The code in question should not be invoked under certain conditions (in this case, under certain security conditions), and so the application developers had assumed that disabling the button that invoked the code would prevent the user from invoking the functionality behind the button.

Sadly, this is not the case.
I demonstrated a very simple attack using Win32 API calls that enabled the button, and allowed the user to click on it, whether or not the application had disabled it or not.

In this article, I will be discussing the technique I used, and how you can write code that is protected from this kind of attack.

To demonstrate the attack to my client, I implemented my code in Microsoft Excel. I did this because most desktop PCs tend to have Microsoft Excel and I wanted to show my client that you do not need development tools like Visual Studio in order to orchestrate a simple attack like this.
In addition, I used a standard network user account to to show that this attack can be performed by a standard user, with the only requirement (in this scenario) being access to VBA in an Office product and the ability to execute macros in the product.

In order to enable the disabled button, the user will need to carry out the following actions:
- Find the window handle (HWND) of the button
- Enable the button

To start new Microsoft Excel workbook should be created, and the Visual Basic Editor (Tools -> Macros -> Visual Basic Editor) should be open.
A new Module must be added to the current workbook. The following code must go into the Module - it will not work unless it is in a VBA code Module (not WorkSheet or WorkBook code).

Find the HWND
In order to enable the button, the user needs to know the window handle (which is often referred to as the HWND) of the button.
If the user can use a tool such as Spy++ this is very easy, but as most users will not have it we need a way of discovering the window handle from Excel.

An easy way to do this is to produce a list of window handles that exist in the current session.
This is done by making an API call to EnumWindows (which lives in user32).
EnumWindows requires a callback to enumerate the window list, which is simple to achieve in Excel; We simply declare a method that matches the callback specification:
---
Private Function EnumWindowsProc(ByVal hwnd As Long, ByVal lparam As Long) As Long
---

This method matches the requirement, so we can now make the API call as required. To do this, we need to make the API method declaration and call it:
---
' This is the API method declaration
Private Declare Function EnumWindows Lib "user32" (ByVal callback As Long, ByVal lparam As Long) As Boolean

' This method, when invoked, will make the API call
Public Sub EnumerateAllWindows()
  EnumWindows AddressOf EnumWindowsProc, 0
End Sub
---

When the EnumerateAllWindows method is invoked, the API call is made to EnumWindows, and our callback method (EnumWindowsProc) will be invoked.

In EnumWindowsProc, we get passed a window handle in hwnd. The lparam parameter serves no purpose in this example and can be ignored (it will be 0).
The value in hwnd will be the first enumerated window handle. To get the next one, the EnumWindowsProc must return a true value:
---
Private Function EnumWindowsProc(ByVal hwnd As Long, ByVal lparam As Long) As Long
  Debug.Print hwnd
  EnumWindowsProc = 1
End Function
---

This will then be repeatedly invoked until all the window have been enumerated, and the Immediate window will now contain a list of a couple of hundred numbers.

However, this is not very useful as far as the user is concerned, because they do not know which of these window handles are any use to them.
To help the user, the name of the window whose handle has been displayed can also be obtained.
To do this, an API call to GetWindowTextA (which also lives in user32) can be made, which has the following API method declaration:
---
Private Declare Function GetWindowTextA Lib "user32" (ByVal hwnd As Long, ByVal Text As String, ByVal maxcount As Long) As Long
---

And now in EnumWindowsProc, the following code is added before the Debug.Print statement:
---
  Dim Length As Long
  Dim Text As String

  ' Empty string to accept a name
  Text = String(100, " ")

  ' Get the text
  Length = GetWindowTextA(hwnd, Text, 100)
  If Length > 0 Then
    Text = Mid$(Text, 1, Length)
  Else
    Text = ""
  End If

  Debug.Print Text
---

There are some important points to note when using GetWindowTextA:
- A string (named Text) of length 100, containing spaces, was created for the string to be put into. This must be done, else the string will always be empty.
- The resulting string (from GetWindowTextA) was terminated using the length returned. If this was not done the Text would contain a string terminator (null, '\0') and then spaces up to the 100th character and would be unusable.

After including the above code, invoking the method will dump the name of the window and the window handle to the immediate window.

As event this output is not ideal, Excel can be used to dump the list to the current Sheet. A member variable, RowCounter, of type Integer is added to the Module, and then in EnumerateChildWindows method, is set this value to 1.
Now in EnumWindowsProc, the Debug.Print statements are changed to:
---
  Application.ActiveSheet.Cells(RowCounter, 1).Value = Str(hwnd)
  Application.ActiveSheet.Cells(RowCounter, 2).Value = Text

  RowCounter = RowCounter + 1
---

When this code is now invoked, it dumps a list of window handles and window names to the current spreadsheet - much more useful when looking for something.

Now that is done, a list of the window handles in the system can be obtained. This can be tested by running Calc (calc.exe) and then invoking EnumerateAllWindows. The text "Calculator" will be visible next to a window handle on the list somewhere.

That's a good start, but if the user is trying to enable a button on a form, at the moment they will only have visiblity of the form's window handle.

To have a look at what is on the form, another API call, EnumChildWindows. EnumChildWindows will be used to enumerate all the child windows of a specified window handle.
Each button (and indeed most forms components) will have their own window handle.
The declaration for EnumChildWindows is:
---
Private Declare Function EnumChildWindows Lib "user32" (ByVal hwndParent As Long, ByVal callback As Long, ByVal lparam As Long) As Boolean
---

In the declaration hwndParent refers to the parent window handle, and then the callback and lparam parameters are the same as before.
To use EnumChildWindows, another callback method is required, which is declared in the same was as with EnumWindowsProc:
---
Private Function EnumChildWindowsProc(ByVal hwnd As Long, ByVal lparam As Long) As Long
---

The method body is the same as EnumWindowsProc, so it will also produce a list of window handles and associated text.

A method called EnumerateChildWindows is declared, which will read the number (window handle) in the currently selected cell and enumerate child windows of it:
---
Public Sub EnumerateChildWindows()
  If IsNumeric(Application.Selection) Then
    Dim Number As Long

    Number = Application.Selection
    If Number > 0 Then
      Application.ActiveSheet.Cells.Clear
      RowCounter = 1
      EnumChildWindows Number, AddressOf EnumChildWindowsProc, 0
    End If
  End If
End Sub
---

If the user clicks on a window handle in the worksheet and then runs the method EnumerateChildWindows, the current list will be cleared and then the child windows of the selected window handle will be displayed.

This can be repeated if necessary to pass through multiple layers of child windows, but it is most likely that the button the user is looking for has been found the first time EnumerateChildWindows is invoked.

This can be tested this by:
- Run Calc (calc.exe)
- Invoking EnumerateAllWindows
- Selecting the window handle next to "Calculator" in the list
- Invoking EnumerateChildWindows
A list that contains all of the controls on the Calculator form should now be displayed.

The user will now be able to obtain the window handle of the button they wish to enable.

Enable The Button
In the above section, the user can obtained the window handle for a control that they wish to enable.

Now the user needs to enable the button.
This is achieved by making an API call to EnableWindow, which is declared in user32:
---
Private Declare Function EnableWindow Lib "user32" (ByVal hwnd As Long, ByVal state As Boolean) As Boolean
---

When making the API call, the window handle must be supplied in hwnd, and the enabled state of the target window in state.
A method can be declared to do this:
---
Public Sub EnableSelectedWindow()
  If IsNumeric(Application.Selection) Then
  Dim Number As Long

  Number = Application.Selection
  If Number > 0 Then
  EnableWindow Number, True
    End If
  End If
End Sub
---

Now, the user can simply select a window handle in the list in the spreadsheet and invoke EnableSelectedWindow, and the window will become enabled.

This can be tested by:
- Run Calc (calc.exe)
- Putting Calc into Scientific mode (View -> Scientific). Note that the Ave, Sum, s, Dat, A, B, C, D, E and F buttons are disabled
- Invoking EnumerateAllWindows
- Selecting the window handle next to "Calculator" in the list
- Invoking EnumerateChildWindows
- Select the window handle next to one of the disabled button names (i.e. "Ave") - Invoking EnableSelectedWindow

The button will now be enabled. This can repeated as many times as necessary with other controls on the form.

This method will work on anything that has a window handle in Win32.

Conclusion
It is very easy to modify the state of controls using Win32.
This is because almost all standard controls in Windows have a window handle. This is a fundamental part of the operation of Microsoft Windows, and cannot be prevented.

It is important when designing and developing UI-invoked code that preventative steps are always taken to double-check that code execution is permitted.
As this article has demonstrated, it cannot be assumed that code is protected by simply disabling or hiding UI components.
A much more secure method would be to disable a control but to verify that execution is permitted inside the control's Click event or inside the code that is being executed.

Additionally, it is important to realise that it is not only the Enabled state of controls that can be influenced. In fact, using the same principal described in this article a user can change any of the following aspects of a window:
- Colour
- Position
- Size
- Visibility

Following the demonstration of the above to my client, they readily agreed that a better approach to protecting code from unwanted execution was required, and they have taken steps to prevent code access in this way.

If you would like to download a copy of the code used in this article, it can be downloaded here. This code is written (as per this article) for use in Microsoft Excel but can easily be changed to work with other Office products.

Have fun!

Monday 10 September 2007

Trial Game Unlocking

The other day, I downloaded a trial version of a game from a website.
The game was a fairly simple one where you have a series of balls rolling along a track, and you have to shoot coloured balls at them to cause them to pop.

I was enjoying the game a lot - it was great fun - but then after 60 minutes the game turned itself off and displayed a message stating that my trial was over and that I would have to purchase the game to continue playing.
A lot of software these days is based on trial use systems, whereby the user is entitled to a certain amount of use of the program, and then after a this time has been used the program locks itself so that it can no longer function, or functions with severely reduced functionality.

The problem with this method of software protection though, is that the actual program is still on the user's PC - after the trial period the software "locks" but is actually still fully available.

With this in mind, I decided to have some fun seeing how easy it would be to simply bypass the trial protection and to continue playing the game.
Before I continue, I would like to say that I do believe people should pay a fair price for software, as I myself am aware of how much effort goes into producing applications and games, and that I did actually purchased the software in question.

To start, I installed the software on a new PC, thereby unlocking it for full use.
I then ran the software.
The first thing that happened was that a nag screen appeared which informed me that I had only 60 minutes of free gameplay left. I clicked on the Play button on this screen to start the game. The nag screen then disappeared and a couple of seconds later the game appeared on screen.

I closed the program, and then opened OllyDBG. I opened the program in OllyDBG and selected Run. The nag screen appeared.
At this point I Paused OllyDBG and noticed I was in kernel32. I selected "Execute till user code".
I now noticed I was in a large proc that seemed to control the trial window that had been displayed.
As the trial window had been kind enough to inform me how much time was left in my trial period, I knew that this value had to have been supplied from somewhre, so I decided to have a look through the proc.
I saw something that interested me at +5FAD, which was the following:
  PUSH   00533940   UNICODE "gTimeLeft"
Now time left could of course be for a number of different things, but it was a good place to start looking for what I was after. I set a breakpoint on this line and then let the program continue to run.
However, my breakpoint did not hit. So I tried a different method, I restarted the program, and then checked to ensure the breakpoint was still set on that line.
I then seleced Run.
A short while later, the breakpoint was hit - this was obviously used as part of the initialisation of the trial screen rather than during it's general running.

The first thing I did now that the breakpoint had hit was to have a look at the new few lines of code:
+5FAD  PUSH   00533940                    UNICODE "gTimeLeft"
+5FB2  LEA    ECX, DWORD PTR SS:[EBP-2C]
+5FB5  CALL   +87D3
+5FBA  MOV    DWORD PTR SS:[EBP-4], 1D
+5FC1  PUSH   DWORD PTR DS:[5BA530]
I made the initial assumption that at +5FB5, there was a call to a proc that retrieved this value "gTimeLeft". I quickly stepped through the proc and resulting sub-proc calls to see whether I could find anything obvious, but couldn't.
I decided to get back to the main proc and step through this to see whether there was anything else helpful.
I stepped through until I got to +5FC1, where the value at 5BA530 was pushed onto the stack.
The value at this location turned out to be 0x3C (60), which was identical to the 60 minutes displayed on the trail screen. However, this could be pure coincidence and I wanted to be sure.
I set a new breakpoint on this line and then let the program run. I played it for a couple of minutes and then closed it.
After OllyDBG had informed me that the process had terminated, I restarted and ran the program, and shortly afterwards my breakpoint was hit again. This time the value was 0x3A (58). I let the program run so that the trial screen would appear, and lo and behold the message displayed the fact that I had 58 minutes remaining in my trial.

Armed with an apparent location for the amount of time left in the game, I restarted the program, and then decided to perform an analysis on any code which accessed this memory location (5BA530).
In OllyDBG, I examined the location in memory and then used "Find References" to enumerate all the places where this memory location was referenced.
It turned out that there were 4 locations:
+37B4  MOV    DWORD PTR DS:[5BA530], EAX
+5FC1  PUSH   DWORD PTR DS:[5BA530]
+6391  CMP    DWORD PTR DS:[5BA530], EBC
+68DC  PUSH   DWORD PTR DS:[5BA530]
The most sensible way to proceed was to set a breakpoint on each of them, which I then did, and I then restarted and ran the program again.

The first breakpoint that was hit was at +37B4, which seemed to load the number of remaining minutes into the memory location. At this stage, I thought that a simple modification of this line would allow one to play indefinitely, but I wanted a slightly cleaner solution, so I continued execution.
The next breakpoint hit was at +5FC1, which is where the value seemed to be loaded for display on the trial screen. I again continued execution. The trial screen was displayed, so I clicked on the Play button to load the game itself.
The next breakpoint to be hit was at +68DC. This was used to generate a debug message output using OutputDebugStringA in Kernel32. I removed this breakpoint (as it was hit every second) and again continued execution.
However, the final breakpoint was not hit, and so I decided to exit the game and see whether that may trigger anything, but it did not. The process terminated.

So now I had a single point at which I could manipulate the game, +37B4, where the initial remaining time had been set. I restarted the program and ran it until this breakpoint was hit again.
I could easily of changed the remaining time value at this point, or simply modify the code so that instead of loading the remaining time into the address, it always loaded a high fixed value, but this didn't seem like a particularly elegant solution, so I wanted to actually find out where the trial screen was displayed so I could see whether there was a better way of influencing the program.
I stepped through the code until the proc returned (at +37C6). I stepped through the code here, and noticed that at +61FD, there was a call to +61BD, and that if I stepped over this, the trial screen was displayed and the game seemed to load. This is where I put my next breakpoint.
I restarted and ran the program, and then stepped into the proc that was called. I saw that the called proc only had a single call, so I stepped into this too.
Now I ended up in the large proc that contained the "gTimeLeft" line I found earlier, and a reference to the remaining time in 5BA530 (at +5FC1).

I don't want to go through this large proc. A quick glance at it tells me that this is where the trial registration screen is displayed, and I want to intercept the execution before it gets here.
With this in mind, I go back to the original call into the large proc, and back to the original call into the calling proc, which is at +61FD.
There is a small proc here, which includes the line at +61E8:
+61E8 PUSH 00533970 ASCII "TrialScreen"
Now this is starting to look more interesting. This proc seems to be concerned with launching the trial screen. I would like to see where it is called from, so I examine the start of this proc, at +61D7 and see that there is a single call from +7FE3, which I go to.
I put a breakpoint on +7FE3, and then restart and run the program. The breakpoint hits straight away, so I decide to step over the line - I am not interested in how the trial screen is displayed at the moment - and see what happens.
As expected, the trial screen is displayed. I click the Play button and see that the trial screen closes, and immediately I am back into my code, at +7FE8. The line I just stepped over is just responsible for displaying the trial screen.
The line I am now on (at +7FE8) contains the a JMP to +81A6.
If I run the program now the game will load.
So, what will happen if I replace the CALL in +7FE8 with NOP instructions? Will the game load without a trial screen?
I decide to try it, and as I hoped, the trial screen is not displayed - the game just loads. I have managed to find where the trial screen is displayed from.

I decide to have a look around to see what is going on with the code - I am particularly interested in the events leading up to the display of the trial screen.
I notice that there is a CMP at +7FD8 and a JNZ at +8FE1 just above the call to the trial screen. I set a breakpoint on these to see what is happening in them.
After restarting the program, I notice that the CMP is checking to see whether a value is 0 or not. If the value is 0, a JMP is made to the next line at +7FED, essentially this skips the trial screen.
I decide to force this jump by altering the Z flag. I then allow the program to run to see what has happened.
I actually end up being presented with a screen that tells me my trial period is over, and I can no longer play the game. I assume, from this, that this must be what gets called when there are no minutes left to use up and you cannot trial the game any more.
This isn't what I'm looking for though, so I have a look further up the proc to see what I can find. I notice that there is another CMP just above, at +7FC9. I put a breakpoint on this line, and notice that it has jumps from +7E9B and +7EB7.
I want to know whether either of these jumps are what leads to this instruction or whether the execution simply continues from the previous instruction (+7FC7) so I set a breakpoint on +7FC7.
I restart and run the code and see that the instruction +7FC9 is jumped to, because the breakpoint at +7FC7 is never triggered.
There is a JBE just after the CMP. This is always taken. If I force it not to be taken, I am presented with a screen asking for my order number for registration purposes. Again, not really what I am looking for.
So I decide to go back a bit further, to examine the jumps that led us to +7FC9, so I set breakpoints at +7E9B and +8EB7. Both jumps are actually JE instructions.

I restart and run the program, and the breakpoint on +7E9B is triggered, but I notice the JE instruction at this address is not followed. I continue execution.
The breakpoint on +7EB7 is then triggered, and this time the JE is taken, so this is where the program execution branches to the whole registration screen\trial screen\end of trial screen area.

Moving back down to look at the registration, trial and end of trial screen section in the proc, between +7FC7 and +7FF2, I notice that after each screen has been displayed (via a CALL), a JMP is made to +81A6.
Now this seems to be when the program is actually executed, so I go and examine the code located there.
Interestingly, the first things I notice are a ASCII reference to the program file, and a UNICODE reference containing the wors "Unable to extract game executable.".
This must be where the game is actually loaded from!
Now if the game is registered, then the entire section about registration screens and trial screens should be skipped, so I have a look and see if +81A6 is jumped to from anywhere else.
I can see the three jumps from the registration, trial and end of trial screens at +7FD6, +7FE8 and +7FF2, but there are also two additional jumps from +802D and +8049.
Both of these are located in some code that is just below the whole registration screen area, which starts at +7FF7.
I see that there are two jumps to +7FF7, one from +7E4F and one from +7E5B.
I go and have a look at these. They are both conditional JNZ based on CMP statements.
I set breakpoints, restart and run the code.

The breakpoint at +7E4F is triggered (as expected; it is first). I quickly notice that neither this or the second JNZ are taken, and then the execution quickly proceeds to the registration screen.
I thought it may be prudent to force one of the JNZ, so I pick the first one at +7E4F and restart the program.
When the breakpoint is triggered, I simply change the Z register so the jump is taken, and run the program. None of the screens are displayed, and the game loads and plays successfully.

I have found where I can bypass the registration and trial screens.
Simply changing the JNZ to a JMP (and filling the leftover byte with a NOP) will cause the program to execute every time, regardless of minutes remaining (0 or more). The software protection is, effectively, disabled.

Just to see for myself, I wanted to modify the actual EXE file to see whether this change could easily be made permanent.
I open the EXE using UltraEdit and quickly see that the code is not encrypted. I modify the few bytes so that the JNZ is a JMP, and then save the file.
I can now run it every time without any prompting what-so-ever. I have even tried patching it after the trial period ran out (after 60 minutes was up) and it still works.

This kind of technique of software protection is poor. It actually took me longer to write this article than it did for me to break the protection in this game.
The method used for this game relies on the fact that a wrapper has been produced around the game itself, and this wrapper provides all the registration and trial functionality. The actual game has no concept of this what-so-ever.
Also, the wrapper itself is poorly produced. The fact that I can bypass all the protection with a change to a single line of code is staggering. It would be better to have put a check into several places in the code, and also embedded further, regular checks into the game itself.
However, a lot of developers do not understand how vulnerable their code can be, particularly when they write their code in a language such as C++ and then compile it to assembler.
The actual assembler output from their C++ code may be very different from what they originally wrote, and could be susceptable to easy attack using methods like the ones I have used today.

I am not going to produce a patch for this particular bit of software, because I am reluctant to give away the ability to bypass the software protection on what is actually a very good game. Like I said at the start of my article, I have actually purchased this game because I like it so much, and the only reason I have produced this article is because of my academic interest in the protection methods used in the real world.

If you have any queries about this article, please do not hesitate to get in touch.

Until next time...

Thursday 6 September 2007

Good Password Generation?

Following on from my previous articles about password generation and the potential flaws in using password generation tools, I thought it might be useful to produce a tool that generates robust passwords based upon a generation code (such as an asset number).

I had the following objectives for the tool:
- Generate passwords that were random every time
- Generate a repeatable password based on a numeric seed using an algorithm that was not easy to break
I also wished for the generation to be configurable:
- Allow passwords between 1 and 15 characters
- Allow a character set to be specified
- For random passwords, allow a quantity of passwords to be specified
- For repeatable passwords, allow an asset number to be entered

RANDOM PASSWORD GENERATION
I started off by looking at producing passwords that were random every time. These would not be based on asset codes, but instead should generate passwords that appear truly random when compared to each other.

I decided to implement two types of random password generator, both based on Microsoft .Net random number generators.
The first was to be based on the simple subtractive random number generator which is implemented by the .Net class Random.
The second would be based on the cyptographic random number generator, RNGCryptServiceProvider (in System.Security.Cryptography).

To produce the passwords, the random number generator would be used to supply a byte array containing a sequence of character positions from the character set used to seed the passwords.

For simplicity, say I was using the character set 'ABCDE' and I wanted a 8 character password.
I request a sequence of 8 bytes from the random number generator, each with a value between 0 and 4, and receive the byte array '0, 3, 2, 4, 4, 2, 0, 1'.
I then use these positions to retrieve characters from the character set, as so:
0 = A, 3 = D, 2 = C, 4 = E, 4 = E, 2 = C, 0 = A, 1 = B
Which gives a password 'ADCEECAB'.

Passwords based on random number generators can be good because there should be no discernable pattern present in the generated sequence.

However, passwords generated in this way do result in some security problems:
- The passwords are not repeatable because they are not based on an asset number. This means they have to be stored somewhere for later retrieval, which may potentially represent a huge security risk.
- The passwords may not be truly random. Random number generators work from a predefined formula for which gives the appearance of randomness, however, they are not truly random (see below for more on this).

Before I move on, I would like to discusss the randomness of the passwords that can be generated using a random number generator.
A simple random number generator, such as the Random class in Microsoft.Net, is a pseudo-random number generator, that is, the numbers that are generated are not random at all, they are derived mathematically using a formula, and the numbers appear to be random.
The Random class (in .Net) uses a "seed" to generate a sequence of numbers. In .Net, by default this is based on the number of ticks so far in the day. However, a seed can also be supplied directly to the class Constructor.
This highlights an important limitation of simple random number generators - if the same seed is used again, the sequence of numbers generated will be the same.

Create a simple program that instantiates a Random class with a seed of 1, and then gets the first 10 integers (Random.Next()).
Execute the program a few times. Each time the numbers generated will be the same, in the same order.
It is very important to accept that if the seed used to create the Random class can be ascertained, the entire sequence of numbers used to generate a password can be recreated, and this may prove the undoing of any password generator based on pseudo-random number generation.

REPEATABLE PASSWORD GENERATION
For a repeatable password to be generated, an asset code will be required. To simplify matters, the asset code will be limited to numerics only.

I decided to implement two repeatable password generators:
- Simple. This will be based on the principals outlined above with regards to using a fixed seed for random number generation. The seed will be the asset code.
- Complex. This will be based on a method that uses a predefined formula to generate the asset code. I will not reveal the formula because I would like to see whether anyone can figure it out for themselves.

To generate the actual passwords, the repeatable password generation will work in the same way as the random number generator, that is a sequence of valid numbers will be generated that will indicate the positions of the character to be used in the password from a character set.

Repeatable password generators offer advantages over randomly generated passwords, because they do not need to be written down anywhere; They can be generated at any later time simply by supplying the asset code.

However, they do have limitations:
- The password that is generated is, ultimately, based on a formula of some sort and so can be broken given a large enough set to work with
- The tool must be properly protected from unauthorised access. Access to the tool is, essentially, access to the entire system.

THE IMPLEMENTATION
I decided to implement each of the different generation methods (both random and repeatable) in classes that support a common interface, IRNG. This way my generation program does not care which actual implementation is being used to generate the passwords.

I wrote a manager class called PwManager, which will be responsible for generating the passwords using the desired generator.

I then placed the manager class, the interface and the generator classes in an assembly, and referenced this from my Windows Forms project.

In my Windows Forms project, I added a form to allow the user to configure the generation method, password length and character set and also, depending upon the chosen generation method, the number of passwords to be generated or the asset code.
I then added a second form which would be used to display the results of the generation.

When a button is pushed on the first form, the PwManager is instantiated and configured with the appropriate parameters, and the generation of the password(s) takes place. The resulting configuration information and password list is displayed on the second form.

DEMO PROGRAM
If you would like to download and use the password generator program, it can be downloaded here.

There are two assemblies in the download, a library and an executable. The executable is simply a UI which presents the functionality from the library.
Feel free to reference the library from your own code and make use of the generation methods.

A final note: Please do not use the program to generate passwords for your organisation - it is freely available on the web (from my blog) and others could easily obtain it. Instead, the source code is (as always) available upon request - ask me for it and modify it to make it unique to your organisation.

CONCLUSION
I have demonstrated several techniques for generating passwords during this article.
I would be interested to hear if anyone can break the repeatable password generators, you are welcome to try!

Monday 3 September 2007

Bulk Password Generation Part 2

In my previous article I managed to demonstrate that a password generation technique implemented by a tool in an organisation could be easily broken with as few as 300 passwords.
In the article I broke the password generator for the BIOS passwords generated by the aforementioned tool.
In this article, I will discuss breaking the local administrator passwords generated by the tool, and then go on to discuss the tool required to reproduce the passwords.

It should be noted, before I continue, that I do not have the tool used to generate the passwords. Instead, I have been given a list of sequential passwords generated from the tool.
The tool itself generates passwords based on the asset tag of the PC they are for. The asset tags are a number.
The passwords I have been given are for the 500 assets numbered 0 - 499.

Referring to my previous article, the local administrator passwords are 8 characters long, and were structured like this:
UlUlUnns
Where U is Uppercase letter, l is lowercase letter, n is number and s was a symbol.

In the same way as before, I started by looking at the first few passwords:
0 JcVvI32(
1 KdWwJ43)
2 LeXxK54*
3 MfYyL65+
4 NgZzM06-
5 OhAaN17!

At this stage I began to get the same sinking feeling that I got when I worked on the BIOS passwords. The passwords are, again, sequential.
I briefly hoped that the producer of the password generator tool had the aptitude to at least use a different technique for the two different password types...
But fairly soon it became obvious that these passwords were as simple to break as the other ones.
In fact, using the same technique as demonstrated in my previous article, I broke the entire set of formulae for each letter in less than 20 minutes!
Although characters 2 and 4 of this password are lowercase, they are not mixed, so they are still as limited as the uppercase characters, i.e. 26 possibilities in the sequence.
The final formulae for each character turn out to be:
1) l = (n + 113) mod 137 mod 26
2) l = (n + 54) mod 153 mod 26
3) l = (n + 21) mod 113 mod 26
4) l = (n + 99) mod 117 mod 26
5) l = (n + 86) mod 193 mod 26
6) l = (n + 123) mod 127 mod 10
7) l = (n + 202) mod 239 mod 10
8) l = (n + 105) mod 129 mod 10

Where l is the final number of the character in the sequence, and n is the asset number.
The character sequence for positions 1, 3 and 5 is A - Z (0 - 25).
The character sequence for positions 2 and 4 is a - z (0 - 25).
The character sequence for positions 6 and 7 is 0 - 9 (0 - 9).
The character sequence for position 8 is '!#$%&()*+-' ('0123456789').

The only minor benefit that was present with these passwords is that a much larger sequence was needed in order to be sure of the generation techniques.
In fact, I needed the full 500 for character 7 and just under 400 for 3 of the others.

That said though, the generation technique is still extremely poor, for the same reasons that the BIOS password generator was poor, and also:

AS EASY TO BRUTE FORCE (Following on from previous article section)
The BIOS passwords could be relatively easily brute forced.
These passwords are as easy, despite the inclusion of the lowercase characters.
This is because, like the BIOS passwords, the entire sequence is strict in it's structure.
The lowercase characters are always at positions 2 and 4.
If, like with the BIOS passwords, the full character sequence could be used in each position (A-Z, a-z, 0-9, symbols) then there would be 72 possibilities at each position:
72^8 = 722,204,136,308,736
A whopping 722 trillion, over 60,784 times more than for the standard password, and still 36 times more than the strongest possible BIOS password (with 46 possibilities per position).

I had been asked if I could produce a tool that would reproduce the passwords, given the asset number.
I was now armed with the knowledge to do this, so I proceeded to produce a tool to do the work outlined in this and my previous article.

I decided to write the tool in C# (Microsoft.Net 2.0).
A screenshot of the tool is shown below:

To use the tool, enter the asset code into the 'Asset Code' textbox, and click the Go button.
The Administrator password will appear in the "Admin Password" box and the BIOS password will appear in the "BIOS Password" box.
The tool itself simply implements the formulae described in this and my previous article to generate the passwords.
If you would like the source code for the tool, please e-mail me and I will send it to you.

The tool can be downloaded here.

To conclude:
Using password generation tools is a good idea. Company administrators find such tools invaluable in producing random passwords for hundreds, or thousands, of computers at an organisation.
Generation tools can produce passwords that appear to be significantly more random than those that could be thought up by a human.
However, if poorly implemented, password generation tools can pose a significant security risk.
Consider some important facts:
- My friend was able to obtain a list of 500 passwords generated for sequential asset numbers without having any administrative access to his domain
- The password algorithm used was easy to break, being based on a sequential generation system
- The password implementation was extremely poor, using a fixed format and poor combinations of character sequences

The organisation in question would have had a significantly more secure system if:
- The tool was protected so that only administrators could access it
- The algorithms used were truly random
- The passwords were never sequential
- The password format was not predefined

I may explore a password generator program in my next article, to see how it could be done in a more secure way.

If you have any queries about this article then don't hesitate to contact me.

Have fun!

Friday 31 August 2007

Bulk Password Generation Part 1

I recently received a list of passwords from someone that had been generated from an administration tool in use at their organisation.
They were interested in knowing whether I could create a tool that would reproduce the passwords for them.

Apparently, the way the tool worked was to generate a password using the numeric asset identifier of the PC the password was to be used on.
The tool produced two passwords for each PC. I am told that one is the password for the local administrator account, and the other was a password for the system BIOS.

The list I received contained a total of 1000 passwords, two for each of the first 500 assets (0 to 499).

The passwords were similar in appearance.
The first set of passwords (which I believe were the local administrator passwords) were 8 characters long, and were structured like this:
UlUlUnns
Where U is Uppercase letter, l is lowercase letter, n is number and s was a symbol.
The second set of passwords (which, apparently, were the BIOS passwords) were also 8 characters long, and were structured like this:
UUUUUnns

I decided to start with the BIOS passwords (as they appeared simpler - they did not contain mixed case letters), and in this article I will discuss how the BIOS password could be broken.
In my next article I will examine the local administrator passwords.

To start with, I thought it might be helpful to look at the first few passwords. The file displayed each password next to the asset number used to generate it, thus:
0 KCPGB76#
1 LDQHC87$
2 MERID98%
3 NFSJE09&
4 OGTKF10(
5 PHULG21)

You will, as I did, notice immediately that there is a very obvious pattern to these passwords... With the exception of the symbols, each character in the password was simply moved on one place for each subsequent password!
Well I couldn't believe the scheme could be so simple, and to see whether this was correct, I calculated what the password would be for a few asset numbers (ignoring the symbol) and compared the results:
10 UMZQL76
15 ZREVQ21
20 EWJAV76
25 JBOFA21

I first compared my estimate for asset 10, UMZQL76, and this seems a good match for the generated code UMZQL76& (I'll talk about the symbols in a minute).
The second estimate was for asset 15, which I though was ZREVQ21. This again matches ZREVQ21-.
My third estimate, asset 20, matches the actual EWJAV76&.
The fourth estimate, asset 25, broke the rule. I had estimated a password of JBOFA21. The actual password turned out to be EBOFA21-. You will notice that what I had though was a J turned out to be an E. Although other than the first letter, the password was ok.

I decided to see where I had gone wrong. I knew that for asset 20, I had generated the correct password (ignoring the symbol), so to see where things went wrong, I estimated the next asset, 21.
According to the rule I had so far made, asset 21 should have produced FXKBW87. It actually produced AXKBW87.
Again only the first letter was out (I thought F, actual A).
So unlike the previous 21 assets (0-20), where the first letter had increased by one each time (with Z looping around to A), this time it had moved back from E (in asset 20) to A, a step of -4 instead of +1.
If I included this into my formula, that at step 21, the lettering would go back 4 but then continue adding one every time, I can successfully generate the correct first letter for assets 25, 30, 35, etc. up to asset 78.
At asset 78, the lettering again goes back 4, from E to A.
This happens again at assets 135 and 192.
Looking at these "special" assets together, I see that my formula breaks at these numbers:
21
78
135
192
At this point I decided to use some simple maths to help me along.
The difference between 0 and 21 is, of course, 21.
The difference between 21 and 78 is 57.
The difference between 78 and 135 is 57.
The difference between 135 and 192 is 57.
A pattern has emerged - every 57th asset, the letter goes back 4 instead of going forward one. In every case, this is going from E to A.

To help me work things out more easily, I decided to substitute letters for numbers.
The most obvious way to do this was to say A=0, B=1, etc until Z=25.
I then examined the first few assets from 21 onwards. I chose 21 because this is the first number of the "57" sequence.
21 A 0
22 B 1
...
72 Z 25
73 A 0
74 B 1
75 C 2
76 D 3
77 E 4
78 A 0 <--- Restart sequence
79 B 1
So at asset 78, the lettering reset to start from A again (which happened to be 4(E)-4).

I want a way to calculate which letter should be generated from the assset number. I know that 21 should give 0 (A), and 22 should give 1 (B).
I could do asset-21 modulus 26:
21-21 = 0 mod 26 = 0 (A)
22-21 = 1 mod 26 = 1 (B)
...
72-21 = 51 mod 26 = 25 (Z)
73-21 = 52 mod 26 = 0 (A)
...
77-21 = 56 mod 26 = 4 (E)
78-21 = 57 mod 26 = 5 (F) <--- Incorrect
But that would not work the as soon as we reach one of the "reset" assets...
I already know that the sequence resets every 57 assets.
What about is I tried asset modulus 57 first to get the position in the sequence?
20 mod 57 = 20
21 mod 57 = 21
22 mod 57 = 22
...
77 mod 57 = 20
78 mod 57 = 21
Letter 21 is a V though, and I know that the letter for asset 78 is an A.
Now what if I modify the asset number used to generate the code by the difference between 57 and the offset (the asset number of the password whose letter was reset to A first), which was 21?
So, looking at asset 21:
21 + (57 - 21) = 57
. 57 mod 57 = 0
. 0 = A
Now asset 22:
22 + (57 - 21) = 58
. 58 mod 57 = 1
. 1 = B
And a few more (replacing '(57 - 21)' with 36 for simplicity):
76 + 36 = 112 mod 57 = 55 (?)
77 + 36 = 113 mod 57 = 56 (?)
78 + 36 = 114 mod 57 = 0 (A)
However, the results for assets 76 and 77 (55 and 56) don't mean anything, but if I subsequently modulus these with 26 (the number of letters in the sequence):
76 + 36 = 112 mod 57 = 55 mod 26 = 3 (D)
77 + 36 = 113 mod 57 = 56 mod 26 = 4 (E)
78 + 36 = 114 mod 57 = 0 mod 26 = 0 (A)
That looks a bit better.
I tried it out on some random asset numbers between 78 and 499, and the letter was correct in all cases.
So the formula for the number of the first letter (remember 0=A) turns out to be:
l = (n + 36) mod 57 mod 26
Where l is the letter code, and n is the asset number.

Now I had figured out the first letter was generated, I tried for the next 4.
I took each letter and worked out what number is was in my sequence (where A=0 and Z=25), and then looked at where the sequence was disrupted (reset to A).
For letter 2, I noticed that the sequence was reset first at asset 39, and then subsequently at 106 and 173.
For letter 3, it was 30, 101 and 172.
For letter 4, it was 47, 126 and 205.
For letter 5, it was 30, 113 and 196.
So for letter 2, the sequence length was 67 and the modifier was 28 (67 minus the first offset 39).
For letter 3 the sequence length was 71 and the modifier was 41 (71 - 30).
For letter 4 the sequence length was 79 with a modifier of 32.
For letter 5 the sequence length was 83 with a modifier of 53.

This resulted in formulae for each of these letters of:
1) l = (n + 36) mod 57 mod 26
2) l = (n + 28) mod 67 mod 26
3) l = (n + 41) mod 71 mod 26
4) l = (n + 32) mod 79 mod 26
5) l = (n + 53) mod 83 mod 26
Using these formulae I was able to successfully calculate the first five characters of every single password for the assets up to 499.

Moving onto the two numbers located at positions 6 and 7 in the password, these working in much the same way as the letters, other than the character sequence is 0 - 9 and consists of 10 possibilities, instead of the 26 for the letters (A-Z).
Applying exactly the same method that I used above, I managed to work out a formula for the numbers of:
6) l = (n + 27) mod 99 mod 10
7) l = (n + 116) mod 197 mod 10

Finally, the symbols are all that is left to work out.
Through examining the symbols that are used for various sequential codes, I managed to work out the the symbol sequence was had 10 possibilities, in the order "!#$%&()*+-".
Again, using the same technique described above, I managed to work out the formula for the symbols as:
8) l = (n + 101) mod 107 mod 10

Using the formula for each character of the password, I can now generate the correct BIOS password for any asset number.
For example, for asset 101:
1) l = (101 + 36) mod 57 mod 26
. = 137 mod 57 mod 26
. = 23 mod 26
. = 23 'X'
2) l = (101 + 28) mod 67 mod 26
. = 129 mod 67 mod 26
. = 62 mod 26
. = 10 'K'
3) l = (101 + 41) mod 71 mod 26
. = 142 mod 71 mod 26
. = 0 mod 26
. = 0 'A'
4) l = (101 + 32) mod 79 mod 26
. = 133 mod 79 mod 26
. = 54 mod 26
. = 2 'C'
5) l = (101 + 53) mod 83 mod 26
. = 154 mod 83 mod 26
. = 71 mod 26
. = 19 'T'
6) l = (101 + 27) mod 99 mod 10
. = 128 mod 99 mod 10
. = 29 mod 10
. = 9 '9'
7) l = (101 + 116) mod 197 mod 10
. = 217 mod 197 mod 10
. = 20 mod 10
. = 0 '0'
8) l = (101 + 101) mod 107 mod 10
. = 202 mod 107 mod 10
. = 95 mod 10
. = 5 '('
So the final password for asset 101 is 'XKACT90('.

Some important things to note about this password generation method:
WEAK TECHNIQUE
The technique used to generate these passwords is weak.
If you saw two or three sequential passwords, you would immediately sense a flaw due to the sequential nature of the generated passwords.
To make a highly educated guess at the formula for each character using the technique I described above, you would need less than 300 sequential passwords.
You could actually make a confident guess at 5 out of 8 of the character formulae with less than 150 sequential passwords.

POOR IMPLEMENTATION
Whoever created the scheme had the forsight to use complex password characters, by including the numerics and symbols in the password.
However, because the 6th and 7th character are always numbers and the final character is always one of 10 symbols, this actually reduces the complexity of the password significantly!
You would only need to see two separate passwords (not even sequential) to surmise that the format was fixed, 5 letters followed by 2 numbers and a single symbol.
A better implementation would have been to include numbers and symbols in each character sequence for each letter, thereby increasing the complexity of the password to 46 possibilities per password character.

EASY TO BRUTE FORCE
Consider that to break the password using brute force, you would currently need to break 5 letters (26 possibilities each), 2 numbers (10 possibilities each) and a symbol (10 possibilities):
26^5 x 10^3 = 11,881,376,000
But if a fuller implementation (letters, numbers and symbols in each character set) had been implemented (46 possibilities each):
46^8 = 20,047,612,231,936
This is over 1,687 times the number of combination than for the current password.
The math is simple 11 billion vs 20 trillion combination!

In my next article I will look at the local administrator passwords that comprise the second half of the password list I was given and discuss the use of such methods to generate password lists.

Until then...

Tuesday 21 August 2007

Protecting .Net Assemblies Part 2

In my last article, we looked at techniques for obfuscating the string and meta data inside our assemblies so that they could not be easily understood.

In this article, I would like to explore another technique which I came across the other day, which basically involves hiding a compiled assembly from disassembler tools so that it cannot be disassembled easily.

The way it worked is, in my opinion, fairly clever and although the technique has been around since long before .Net was released, and is not specifically used to prevent disassembly, it is still a very effective way of deterring casuals attempts to reverse engineer one's code.
Basically, the way in which this works is to produce a program in two assemblies.
The first assembly consists of the public face to the program. The second assembly consists of all of the actual functionality for the program.
In the first assembly (lets call it A), we create an Interface, which I will call IRunApp for sake of argument. The second assembly (B) references the public assembly (A), and implements the IRunApp Interface in class RunApp
The IRunApp Interface has a single parameterless void method which I will call Start.

Now we compile B so that it becomes a .Net IL assembly file. We make a note of it's size (say 32K exactly for sake of argument).
In A, in the main startup code, we add some lines that will do the following:
- Get the currently executing Assembly (A)
- Get the loaded Module of this Assembly
- Read the bytes from the total length of the Module - 32K to the end of the Module
- Load these bytes into an Assembly (B)
- Using Reflection, create an instance of the IRunApp Interface via the class RunApp in B
- Execute the Start method of IRunApp in B
We then compile A so that it also becomes a .Net IL assembly. We then take the entire content of the file B and append it to the end of the file A (using a hex-editor or perhaps a tool you have written yourself).
Then we run A, and see that the the code in B has been successfully executed. If we examine the code using ILDASM, we see that only assmelby A is visible - we can see that A does something to load some code by Reflection but we do not actually have access to the code in B.

I have produced two demos that show this functionality:
Demo A
In Demo A, there are two assemblies - the main program executable (A) and the library that will eventually become the hidden assembly (B).
In this demo, A loads the file content of B directly, and then invokes the IRunApp.Start() method using Reflection. This sample is there just to give you an example of how reflection works.
You should also examine the assemblies using ILDASM or another disassembly tool, just to see how visible the IL is in both cases (particularly in B).

Demo B
In Demo B, there is a single assembly which is the program executable. This contains the main assembly and also contains the embedded assembly B, which has been embedded using the technique described above.
In this demo, A loads the assembly B using the technique I described in HIDING ASSEMBLIES above. B is then invoked using Reflection.
If you open the executable in ILDASM you will see the code disassembled that carries out the work to load and execute B but, importantly, you will not see the actual code for B.

The samples are available by clicking the links below:
- Demo A
- Demo B

ADDITIONAL TECHNIQUES
Well assuming that you have implemented an assembly hiding method now, you are probably wondering what is to stop someone who understands the technique from simply using a hex-editor and cutting and pasting your embedded assemblies out to separate files.
Well the truth of the matter is that it is highly unlikely that you will ever stop someone who knows what they are doing from accessing your .Net code in this way, but you can still use a few more methods to make it more difficult:
- Add fake libraries into the file
You can add some fake assemblies to the end of the file, and make one of these your valid assembly. This will not make it impossible to access the real assembly but it will make it more difficult.
- Encrypt your libraries
You can encrypt the libraries that are embedded in the file so that they cannot just be extracted using a hex-editor. A compiled assembly has a distinctive layout, which will be completely hidden by encryption.
- Use remoting or a WebService for sensitive business logic
If you are able to, you can use .Net Remoting or a WebService to host your business logic and other sensitive code. You can invoke the service from your application code.

There isn't a lot more to say on the technique for hiding assemblies, other than you should make the effort to protect your source code in every way possible. People are often flippant about security, but you should not forget that your source code is your intellectual property - do you really want others to see how you do things in your applications, particularly if you are providing an application that is unique or satisfies a niche demand.

If you have any suggestions or queries about this article, just drop me an e-mail.

Have fun!

Friday 17 August 2007

Protecting .Net Assemblies Part 1

Over the next two articles, I thought it would be interesting to look at a couple of techniques to make .Net assemblies less easy to disassemble.

Now I will presume that you already know that to disassemble a .Net assembly (EXE or DLL), all you need to do is open the assembly using the ILDASM tool, which comes as part of the Microsoft.Net SDK.

If you don't know this already, give it a try now - open ILDASM (through the VS command prompt) and then open the latest .Net assembly you worked on using it.

ILDASM can be used to reverse engineer most .Net projects if one so desired, but this would be a long and painstaking process. Converting IL back into VB.NET or C# is reasonably straight forward, but would require significant effort due to the sheer volume of work that would need to be done to achieve the result.

Another much more useful tool is the .Net Reflector, which is produced by Lutz Roeder. This tool provides the same functionality as ILDASM, but in addition it can disassemble the IL into c#, VB.NET, etc., and provides some extremely useful analysis capabilities too. You can download the .Net Reflector here.

It is worth noting that ALL .Net assemblies are vulnerable to disassembly. This is due to the nature in which they are produced. .Net source code, such as C# or VB.NET, is compiled by a .Net compiler (such as Visual Studio.Net) into IL (Intermediate Language).
This is a platform independant pseudo-assembler that is compiled at run-time by the Microsoft.Net Just-In-Time (JIT) compiler into native assembler for execution on the target processor environment (typically x86 in the case of Windows).

So, even if you were not aware that your compiled .Net code can be reversed before you red this article, I think we can all agree now that this is the case.

How then, do we protect our valuable source code from prying eyes?

A quick, simple and relatively effective method is to apply obfuscation to your code.
This involves using an additional tool to "scramble" the metadata and string portions of the code so that they are not readable by ILDASM and other such tools.

If, for instance, you had a MessageBox.Show in your code that display a message "Hello World", the obfuscation tool would replace "Hello World" with an encrypted byte sequence. When examined in a disassembly tool, the byte sequence would be unreadable and, thus, it becomes more difficult to understand what is happening in the program.

As well as strings, the metadata portion of the assembly is scrambled, so that the names of objects, methods, properties, etc. are replaced by a single unprintable character. This is most effective because if you had a method named something like "CheckPassword", which would be readable in the disassembly tool and clearly shows what the method does, it would be replaced by a single character which doesn't mean anything.

Unfortunately, obfuscation does not make it impossible to disassemble a .Net assembly. In fact, it doesn't really even make it difficult, because the raw code is still available in IL. It can be disassembled, and analysis of the code will, in time, reveal the purpose of most of the code portions.

Even the string data is not secure. Analysis of the code will reveal that each time a string (or obfuscated byte sequence) is used, a method is invoked which accepts the byte sequence (and sometimes an integer seed) as a parameter and returns a string. This is the byte decryption method, and the code for this is, sadly, freely available inside the .Net assembly that was obfuscated.
I have produced a tool myself that can break the byte sequence encryption for any string obfuscated by the Dotfuscator tool.

In conclusion, obfuscation adds another layer of protection to our assembly, but it doesn't protect the code in any way, so what else could be done to make the code more secure?

In my next article I will be exploring another tecnique, which should provide another layer of protection from prying eyes... assembly hiding.

Until next time...

Wednesday 15 August 2007

The Outlook PST Breaker

Well it's been too long since I last wrote, so my apologies for that.

Here is the promised PST password tool.

I attach a screenshot:

The tool has the following features:
Encrypt a password
Enter a password into the "Password" textbox and click the Encrypt button.

You will see in the main output window all of the steps taken to encrypt the password.

Attempt to cryptographically break a 32-bit password code
Enter a password code into the "Pw Code" textbox, and select options to help the break. Then click the Decrypt button.

As previously discussed (in my earlier article), using a password length limit, character substitution and a smaller character set will result in significantly faster breaking.

The password code already in the "Pw Code" textbox will break to "adad". If you turn on character substitution, it will break to "¹muleb", in about half the time of the "adad" break.

Show verbox output
Turn this on to see extended information generated during the break. This will slow down the break significantly - it will be in excess of 10 times slower!

The tool is located here.

Let me know if you have any suggestions for it.

Until next time!

Thursday 12 July 2007

Outlook PST Password Decryption

In my previous article I looked at how encryption worked for Microsoft Outlook PSTs. In this article I am going to see if the encryption is reversable and if coded passwords can be decrypted.

Lets look again at the encryption of 'adad' (based on the content of my previous article):

It	C	PV	BV	P	SV	M	NV
1	'a' 0x61 (97)	0x00000000	0x00	0x61 (97)	0x00000000	0x3AB551CE	0x3AB551CE
2	'd' 0x64 (100)	0x3AB551CE	0xCE	0xAA (170)	0x003AB551	0x36034AF6	0x3639FFA7
3	'a' 0x61 (97)	0x3639FFA7	0xA7	0xC6 (198)	0x003639FF	0x72076785	0x72315E7A
4	'd' 0x64 (100)	0x72315E7A	0x7A	0x1E (30)	0x0072315E	0xFA0F3D63	0xFA7D0C3D

As can be seen, 'adad' is encrypted to 0xFA7D0C3D in 4 steps.

So, how can we get from 0xFA7D0C3D to 'adad'? At first glance I think it isn't possible.

Consider the way the password is encrypted. After iteration 1, we have a NV of 0x3AB551CE, which is then right-shifted in iteration 2 so that the 0xCE component is lost. It simply isn't possible to get from the NV 0x3639FFA7 in iteration 2 back to a PV of 0x3AB551CE. The best we could possibly do is get to 0x3AB551??. In addition, there is no way to calculate BV or C at any iteration, because we need two parts of either P, BV or C.

Unfortunately, it appears the only way to decrypt the password is to break it. The encryption is not reversible.

To start the break, I have the value 0xFA7D0C3D, and a list of potential modifier (M) values. Now remember, as I am reversing and breaking the encryption, I am working backwards. The first value I am using is for the last character of the password.

The first thing I do is work out all the possible modifier values that could have been applied to the value I have. This is easy to work out - because my value (NV) starts with 0xFA, I just need to find any modifiers that start with 0xFA, which yields the following:

0xFA0F3D63 - 0x001E
0xFAF83322 - 0x0177

Now we already know that modifiers at positions higher than 0xFF cannot be generated with the ASCII character set (from my previous article on encryption), so I can discard modifier number 2, which leaves me with a single possible modifier, 0xFA0F3D63.

Right, so I now have a value, a modifier and a position. From these, I can calculate a shifted value, which is 0x0072315E (V XOR M), which will give me a PV of 0x72315E?? (left-shift the SV by 8 bits). This gives me the following:

It	C	PV	BV	P	SV	M	NV
?	'?' 0x??	0x72315e??	0x??	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

Now I still have no chance of calculating BV or C from P alone, but I can extrapolate potential values for these. I am going to substitute C for various different characters, and then XOR these with P to get an assumed BV. Technically, I should submit C for any valid character from the ASCII character set, but for the purposes of this article I am only going to work with the characters 'a' and 'd'. Any final decryption method would clearly need to implement a much larger character set.

Lets have a look at what results we get with each character:

'a' is character code 0x61, which would give BV = 0x61 XOR 0x1E (C XOR P), which gives a BV of 0x7F. This would result in a PV of 0x72315E7F.
'd' is character code 0x64, which would give BV = 0x7A. This would result in a PV of 0x72315E7A.

Because I have no way of knowing which PV is the correct value, I need to check each of them.

So, I use the first PV, 0x72315E7F (which came from character 'a'). I repeat the same process for this value as I did above for the first value (0xFA7D0C3D). This gives only one potential modifier, 0x72076785 from position 0xC6. I now calculate a potential SV and PV as so:

It	C	PV	BV	P	SV	M	NV
-1	'?' 0x??	0x3639fa??	0x??	0xc6	0x003639fa	0x72076785	0x72315e7f
?	'a' 0x61	0x72315e7f	0x7f	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

And again I repeat the same process of calculating the BV through character substitution, which gives the following:

C = 'a', BV = 0x61 XOR 0xC6, BV = 0xA7, PV = 0x3639FAA7
C = 'd', BV = 0x64 XOR 0xC6, BV = 0xA2, PV = 0x3639FAA2

Following this I now take the PV 0x3639FAA7 and run it through the same process. This results in the following modifiers:

0x36034AF6 - 0x00AA
0x36b8DEAF - 0x01B7

Now I know that modifier positions above 0xFF aren't permitted so I can discard modifier 2, but again modifer 1 is valid so I calculate SV and potential PV with this:

It	C	PV	BV	P	SV	M	NV
-2	'?' 0x??	0x3ab051??	0x??	0xaa	0x003ab051	0x36034af6	0x3639faa7
-1	'a' 0x61	0x3639faa7	0xa7	0xc6	0x003639fa	0x72076785	0x72315e7f
?	'a' 0x61	0x72315e7f	0x7f	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

And again I repeat the same process of calculating the BV, which gives the following possibilities:

C = 'a', BV = 0x61 XOR 0xAA, BV = 0xCB, PV = 0x3ab051CB
C = 'd', BV = 0x64 XOR 0xAA, BV = 0xCE, PV = 0x3ab051CE

So I take the PV ox 0x3AB051CB and run it again through the same process. This gives a single potential modifier of 0x3AB551ce at position 0x61.

If I repeat the SV and PV calculations with this modifier, it gives me the following:

It	C	PV	BV	P	SV	M	NV
-3	'?' 0x??	0x050005??	0x??	0x61	0x00050005	0x3ab551ce	0x3ab051cb
-2	'a' 0x61	0x3ab051cb	0xcb	0xaa	0x003ab051	0x36034af6	0x3639faa7
-1	'a' 0x61	0x3639faa7	0xa7	0xc6	0x003639fa	0x72076785	0x72315e7f
?	'a' 0x61	0x72315e7f	0x7f	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

I recalcaulte BV, which gives these possibilites:

C = 'a', BV = 0x61 XOR 0x61, BV = 0x00, PV = 0x05000500
C = 'd', BV = 0x64 XOR 0x61, BV = 0xCE, PV = 0x05000505

Lets again take the first PV, 0x05000500, and run it through the same process, which yields a single modifier of 0x05005713 at position 0xC7.

This time, if I calculate the SV and PV values, I get:

It	C	PV	BV	P	SV	M	NV
-4	'?' 0x??	0x005213??	0x??	0xc7	0x00005213	0x05005713	0x05000500
-3	'a' 0x61	0x05000500	0x00	0x61	0x00050005	0x3ab551ce	0x3ab051cb
-2	'a' 0x61	0x3ab051cb	0xcb	0xaa	0x003ab051	0x36034af6	0x3639faa7
-1	'a' 0x61	0x3639faa7	0xa7	0xc6	0x003639fa	0x72076785	0x72315e7f
?	'a' 0x61	0x72315e7f	0x7f	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

However, this time I know there aren't any possibilites. A PV of less than 0x01000000 will not yield any modifiers, because there are none that start with 0x00. This means that I know there are no valid letter combinations below where I am now, '?aaaa'.

So I can discard this particular path and move back up to the next possibility, which is at '?daaa'. This has an identical modifier list (because it also starts with a NV of 0x05) and I already know that this will not yield a valid PV, so I can also discard this possibility.

This means I now need to move on to the next possibility, which begins at '?daa'.

I repeat the same process over and over, moving through the other potentials, until I manage to dismiss all possibilites that start '?a'. So I know that the password must be '?d', that is it ends with a 'd'.

So now if I go back to where I started, but work with a 'd' instead of an 'a', I get the following:

It	C	PV	BV	P	SV	M	NV
-1	'?' 0x??	0x3639ff??	0x??	0xc6	0x003639ff	0x72076785	0x72315e7a
?	'd' 0x64	0x72315e7a	0x7a	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

I then reapply the same rules as before, which dismisses the combinations 'aaad' and 'daad', until I reach '?dad':

It	C	PV	BV	P	SV	M	NV
-3	'?' 0x??	0x000000??	0x??	0x61	0x00000000	0x3ab551ce	0x3ab551ce
-2	'd' 0x64	0x3ab551ce	0xce	0xaa	0x003ab551	0x36034af6	0x3639ffa7
-1	'a' 0x61	0x3639ffa7	0xa7	0xc6	0x003639ff	0x72076785	0x72315e7a
?	'd' 0x64	0x72315e7a	0x7a	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

Now you'll see we have a P of 0x61, and a SV of 0x0. Look what happens if I calculate the possible BV by using character substitution of C:

C = 'a', BV = 0x61 XOR 0x61, BV = 0x00, PV = 0x00000000
C = 'd', BV = 0x64 XOR 0x61, BV = 0x05, PV = 0x00000005

We know the 'd' cannot be valid because a PV of 0x5 is below the minimum allowed, which is 0x01000000, but look at the 'a' - it results in a PV of 0x0. This means there are no further calculations to do - we have broken the password - and given a final result of 'adad':

It	C	PV	BV	P	SV	M	NV
-3	'a' 0x61	0x0000000	0x00	0x61	0x00000000	0x3ab551ce	0x3ab551ce
-2	'd' 0x64	0x3ab551ce	0xce	0xaa	0x003ab551	0x36034af6	0x3639ffa7
-1	'a' 0x61	0x3639ffa7	0xa7	0xc6	0x003639ff	0x72076785	0x72315e7a
?	'd' 0x64	0x72315e7a	0x7a	0x1e	0x0072315e	0xfa0f3d63	0xfa7d0c3d

Now, I have demonstrated that the password encryption can be broken, and we can definitely get the valid password from the final coded value.

However, breaking the password in this way is very expensive. I have listed some important points that can affect the performance of the break:

CHARACTER SET
If you consider that at each level there are 223 (0xDF) ASCII character possibilities (assuming that control characters below 0x20 are ignored), and that there may be many possible modifiers at each level (up to 4), this means that at each level there could be 892 possibilities.

This is ok if the password is very short, but for a password of length 5 there are 564,708,431,199,232 combinations (over 564 trillion). At length 10 there are 318,895,612,267,497,741,289,677,389,824 (over 318 trillion trillion)!

We can reduce the cost of breaking passwords significantly if we make assumptions during the character substitution stage. We could work with specific character sets, which would reduce the cost as so:

Set	Poss (1)	Poss (length 5)	Poss (length 10)
All 223	892	564,708,431,199,232	318,895,612,267,497,741,289,677,389,824
0-9A-Za-z	248	938,120,019,968	880,069,171,864,760,718,721,024
a-z	104	12,166,529,024	148,024,428,491,834,392,576

As you can see, reducing the available character set would make a huge difference to the number of possibilities. The scenario listed above, however, assumes the worst case - that there are 4 valid modifiers at each level, which is unlikely. For most PV there are 1 or 2 modifier values.

CHARACTER POSITIONING
In the English alphabet, certain letters are much more likely to appear in words than others.

Using this information, we could order the characters so the most likely characters to be used come first, which means more likely letter combinations are broken earlier in the process.

A good example would be to put the letter 'e' before all others, because it is the most common letter used in English words. Letters such as 'q' could be placed at the end of the list or maybe excluded altogether - just taking 'j' and 'q' out of the character set 'a-z' would reduce the possibilities in a 10 letter password by 55% - 80,000 trillion.

PASSWORD LENGTH
We know that the maximum password length for a PST password is 15 characters.

Using this knowledge, we can ensure that we do not waste time calculating impossible passwords. As soon as we reach a 16th decryption step, we can assume that this is not a possible password.

Additionally, we can save significant time in processing if we make an assumption about the password length.

Consider the following, if I have to calculate all the possible passwords for the character set 'a-z' to a length of 15 characters with the maximum 4 modifiers at each level, there are ((26 * 4) to the 15) possibilities - an amazing 1,800,943,505,506,915,684,277,314,125,824!

However, if I assume the maximum password size is 8, then this becomes ((26 * 4) to the 8) possibilities - a mere 13,685,690,504,052,736. This is over 131 trillion times less possibilities.

Obviously this is several orders of magniture less, and will require significantly less effort to break.

LOWEST LEVEL LETTER SUBSTITUTION
At the bottom level of a decryption break, where there is a NV of 0, we might have a remaining character that is not part of our character set.

A good example is the password 'fred', which encrypts to 0x1B6D2E42.

This password can be broken using the standard method, but by allowing the lowest letter of the password to be substituted by any valid ASCII character, it is actually possible to break this faster. The result is the password '¹mrzab', which can be copied and pasted into Outlook.

If we have assumed our password length is 8 or less characters, this will result in a significantly faster break (over 50%) for passwords such as these, and there is no additional cost involved for us in allowing this.

The only issue is that you may not necessarily obtain the original password.

Well that concludes my work on Outlook PST password decryption. I believe I have successfully demonstrated that Outlook PST passwords are vulnerable.

Although, as with SourceSafe passwords, I wonder whether there is really a problem that the passwords are vulnerable. Anyone who is concerned about keeping the e-mail very secure will no doubt implement Microsoft Exchange with Active Directory integrated security - which is infinitely more secure than a PST on the desktop.

As always, if you want to keep your data secure use a long password with mixed case letters, numbers and symbols (if permitted). Using such a technique with Outlook PSTs would significantly increase the amount of work required to perform a break by several orders of magnitude. You have been warned...