Chaos to Order?: August 2007

Friday 31 August 2007

Bulk Password Generation Part 1

I recently received a list of passwords from someone that had been generated from an administration tool in use at their organisation.
They were interested in knowing whether I could create a tool that would reproduce the passwords for them.

Apparently, the way the tool worked was to generate a password using the numeric asset identifier of the PC the password was to be used on.
The tool produced two passwords for each PC. I am told that one is the password for the local administrator account, and the other was a password for the system BIOS.

The list I received contained a total of 1000 passwords, two for each of the first 500 assets (0 to 499).

The passwords were similar in appearance.
The first set of passwords (which I believe were the local administrator passwords) were 8 characters long, and were structured like this:
UlUlUnns
Where U is Uppercase letter, l is lowercase letter, n is number and s was a symbol.
The second set of passwords (which, apparently, were the BIOS passwords) were also 8 characters long, and were structured like this:
UUUUUnns

I decided to start with the BIOS passwords (as they appeared simpler - they did not contain mixed case letters), and in this article I will discuss how the BIOS password could be broken.
In my next article I will examine the local administrator passwords.

To start with, I thought it might be helpful to look at the first few passwords. The file displayed each password next to the asset number used to generate it, thus:
0 KCPGB76#
1 LDQHC87$
2 MERID98%
3 NFSJE09&
4 OGTKF10(
5 PHULG21)

You will, as I did, notice immediately that there is a very obvious pattern to these passwords... With the exception of the symbols, each character in the password was simply moved on one place for each subsequent password!
Well I couldn't believe the scheme could be so simple, and to see whether this was correct, I calculated what the password would be for a few asset numbers (ignoring the symbol) and compared the results:
10 UMZQL76
15 ZREVQ21
20 EWJAV76
25 JBOFA21

I first compared my estimate for asset 10, UMZQL76, and this seems a good match for the generated code UMZQL76& (I'll talk about the symbols in a minute).
The second estimate was for asset 15, which I though was ZREVQ21. This again matches ZREVQ21-.
My third estimate, asset 20, matches the actual EWJAV76&.
The fourth estimate, asset 25, broke the rule. I had estimated a password of JBOFA21. The actual password turned out to be EBOFA21-. You will notice that what I had though was a J turned out to be an E. Although other than the first letter, the password was ok.

I decided to see where I had gone wrong. I knew that for asset 20, I had generated the correct password (ignoring the symbol), so to see where things went wrong, I estimated the next asset, 21.
According to the rule I had so far made, asset 21 should have produced FXKBW87. It actually produced AXKBW87.
Again only the first letter was out (I thought F, actual A).
So unlike the previous 21 assets (0-20), where the first letter had increased by one each time (with Z looping around to A), this time it had moved back from E (in asset 20) to A, a step of -4 instead of +1.
If I included this into my formula, that at step 21, the lettering would go back 4 but then continue adding one every time, I can successfully generate the correct first letter for assets 25, 30, 35, etc. up to asset 78.
At asset 78, the lettering again goes back 4, from E to A.
This happens again at assets 135 and 192.
Looking at these "special" assets together, I see that my formula breaks at these numbers:
21
78
135
192
At this point I decided to use some simple maths to help me along.
The difference between 0 and 21 is, of course, 21.
The difference between 21 and 78 is 57.
The difference between 78 and 135 is 57.
The difference between 135 and 192 is 57.
A pattern has emerged - every 57th asset, the letter goes back 4 instead of going forward one. In every case, this is going from E to A.

To help me work things out more easily, I decided to substitute letters for numbers.
The most obvious way to do this was to say A=0, B=1, etc until Z=25.
I then examined the first few assets from 21 onwards. I chose 21 because this is the first number of the "57" sequence.
21 A 0
22 B 1
...
72 Z 25
73 A 0
74 B 1
75 C 2
76 D 3
77 E 4
78 A 0 <--- Restart sequence
79 B 1
So at asset 78, the lettering reset to start from A again (which happened to be 4(E)-4).

I want a way to calculate which letter should be generated from the assset number. I know that 21 should give 0 (A), and 22 should give 1 (B).
I could do asset-21 modulus 26:
21-21 = 0 mod 26 = 0 (A)
22-21 = 1 mod 26 = 1 (B)
...
72-21 = 51 mod 26 = 25 (Z)
73-21 = 52 mod 26 = 0 (A)
...
77-21 = 56 mod 26 = 4 (E)
78-21 = 57 mod 26 = 5 (F) <--- Incorrect
But that would not work the as soon as we reach one of the "reset" assets...
I already know that the sequence resets every 57 assets.
What about is I tried asset modulus 57 first to get the position in the sequence?
20 mod 57 = 20
21 mod 57 = 21
22 mod 57 = 22
...
77 mod 57 = 20
78 mod 57 = 21
Letter 21 is a V though, and I know that the letter for asset 78 is an A.
Now what if I modify the asset number used to generate the code by the difference between 57 and the offset (the asset number of the password whose letter was reset to A first), which was 21?
So, looking at asset 21:
21 + (57 - 21) = 57
. 57 mod 57 = 0
. 0 = A
Now asset 22:
22 + (57 - 21) = 58
. 58 mod 57 = 1
. 1 = B
And a few more (replacing '(57 - 21)' with 36 for simplicity):
76 + 36 = 112 mod 57 = 55 (?)
77 + 36 = 113 mod 57 = 56 (?)
78 + 36 = 114 mod 57 = 0 (A)
However, the results for assets 76 and 77 (55 and 56) don't mean anything, but if I subsequently modulus these with 26 (the number of letters in the sequence):
76 + 36 = 112 mod 57 = 55 mod 26 = 3 (D)
77 + 36 = 113 mod 57 = 56 mod 26 = 4 (E)
78 + 36 = 114 mod 57 = 0 mod 26 = 0 (A)
That looks a bit better.
I tried it out on some random asset numbers between 78 and 499, and the letter was correct in all cases.
So the formula for the number of the first letter (remember 0=A) turns out to be:
l = (n + 36) mod 57 mod 26
Where l is the letter code, and n is the asset number.

Now I had figured out the first letter was generated, I tried for the next 4.
I took each letter and worked out what number is was in my sequence (where A=0 and Z=25), and then looked at where the sequence was disrupted (reset to A).
For letter 2, I noticed that the sequence was reset first at asset 39, and then subsequently at 106 and 173.
For letter 3, it was 30, 101 and 172.
For letter 4, it was 47, 126 and 205.
For letter 5, it was 30, 113 and 196.
So for letter 2, the sequence length was 67 and the modifier was 28 (67 minus the first offset 39).
For letter 3 the sequence length was 71 and the modifier was 41 (71 - 30).
For letter 4 the sequence length was 79 with a modifier of 32.
For letter 5 the sequence length was 83 with a modifier of 53.

This resulted in formulae for each of these letters of:
1) l = (n + 36) mod 57 mod 26
2) l = (n + 28) mod 67 mod 26
3) l = (n + 41) mod 71 mod 26
4) l = (n + 32) mod 79 mod 26
5) l = (n + 53) mod 83 mod 26
Using these formulae I was able to successfully calculate the first five characters of every single password for the assets up to 499.

Moving onto the two numbers located at positions 6 and 7 in the password, these working in much the same way as the letters, other than the character sequence is 0 - 9 and consists of 10 possibilities, instead of the 26 for the letters (A-Z).
Applying exactly the same method that I used above, I managed to work out a formula for the numbers of:
6) l = (n + 27) mod 99 mod 10
7) l = (n + 116) mod 197 mod 10

Finally, the symbols are all that is left to work out.
Through examining the symbols that are used for various sequential codes, I managed to work out the the symbol sequence was had 10 possibilities, in the order "!#$%&()*+-".
Again, using the same technique described above, I managed to work out the formula for the symbols as:
8) l = (n + 101) mod 107 mod 10

Using the formula for each character of the password, I can now generate the correct BIOS password for any asset number.
For example, for asset 101:
1) l = (101 + 36) mod 57 mod 26
. = 137 mod 57 mod 26
. = 23 mod 26
. = 23 'X'
2) l = (101 + 28) mod 67 mod 26
. = 129 mod 67 mod 26
. = 62 mod 26
. = 10 'K'
3) l = (101 + 41) mod 71 mod 26
. = 142 mod 71 mod 26
. = 0 mod 26
. = 0 'A'
4) l = (101 + 32) mod 79 mod 26
. = 133 mod 79 mod 26
. = 54 mod 26
. = 2 'C'
5) l = (101 + 53) mod 83 mod 26
. = 154 mod 83 mod 26
. = 71 mod 26
. = 19 'T'
6) l = (101 + 27) mod 99 mod 10
. = 128 mod 99 mod 10
. = 29 mod 10
. = 9 '9'
7) l = (101 + 116) mod 197 mod 10
. = 217 mod 197 mod 10
. = 20 mod 10
. = 0 '0'
8) l = (101 + 101) mod 107 mod 10
. = 202 mod 107 mod 10
. = 95 mod 10
. = 5 '('
So the final password for asset 101 is 'XKACT90('.

Some important things to note about this password generation method:
WEAK TECHNIQUE
The technique used to generate these passwords is weak.
If you saw two or three sequential passwords, you would immediately sense a flaw due to the sequential nature of the generated passwords.
To make a highly educated guess at the formula for each character using the technique I described above, you would need less than 300 sequential passwords.
You could actually make a confident guess at 5 out of 8 of the character formulae with less than 150 sequential passwords.

POOR IMPLEMENTATION
Whoever created the scheme had the forsight to use complex password characters, by including the numerics and symbols in the password.
However, because the 6th and 7th character are always numbers and the final character is always one of 10 symbols, this actually reduces the complexity of the password significantly!
You would only need to see two separate passwords (not even sequential) to surmise that the format was fixed, 5 letters followed by 2 numbers and a single symbol.
A better implementation would have been to include numbers and symbols in each character sequence for each letter, thereby increasing the complexity of the password to 46 possibilities per password character.

EASY TO BRUTE FORCE
Consider that to break the password using brute force, you would currently need to break 5 letters (26 possibilities each), 2 numbers (10 possibilities each) and a symbol (10 possibilities):
26^5 x 10^3 = 11,881,376,000
But if a fuller implementation (letters, numbers and symbols in each character set) had been implemented (46 possibilities each):
46^8 = 20,047,612,231,936
This is over 1,687 times the number of combination than for the current password.
The math is simple 11 billion vs 20 trillion combination!

In my next article I will look at the local administrator passwords that comprise the second half of the password list I was given and discuss the use of such methods to generate password lists.

Until then...

Tuesday 21 August 2007

Protecting .Net Assemblies Part 2

In my last article, we looked at techniques for obfuscating the string and meta data inside our assemblies so that they could not be easily understood.

In this article, I would like to explore another technique which I came across the other day, which basically involves hiding a compiled assembly from disassembler tools so that it cannot be disassembled easily.

The way it worked is, in my opinion, fairly clever and although the technique has been around since long before .Net was released, and is not specifically used to prevent disassembly, it is still a very effective way of deterring casuals attempts to reverse engineer one's code.
Basically, the way in which this works is to produce a program in two assemblies.
The first assembly consists of the public face to the program. The second assembly consists of all of the actual functionality for the program.
In the first assembly (lets call it A), we create an Interface, which I will call IRunApp for sake of argument. The second assembly (B) references the public assembly (A), and implements the IRunApp Interface in class RunApp
The IRunApp Interface has a single parameterless void method which I will call Start.

Now we compile B so that it becomes a .Net IL assembly file. We make a note of it's size (say 32K exactly for sake of argument).
In A, in the main startup code, we add some lines that will do the following:
- Get the currently executing Assembly (A)
- Get the loaded Module of this Assembly
- Read the bytes from the total length of the Module - 32K to the end of the Module
- Load these bytes into an Assembly (B)
- Using Reflection, create an instance of the IRunApp Interface via the class RunApp in B
- Execute the Start method of IRunApp in B
We then compile A so that it also becomes a .Net IL assembly. We then take the entire content of the file B and append it to the end of the file A (using a hex-editor or perhaps a tool you have written yourself).
Then we run A, and see that the the code in B has been successfully executed. If we examine the code using ILDASM, we see that only assmelby A is visible - we can see that A does something to load some code by Reflection but we do not actually have access to the code in B.

I have produced two demos that show this functionality:
Demo A
In Demo A, there are two assemblies - the main program executable (A) and the library that will eventually become the hidden assembly (B).
In this demo, A loads the file content of B directly, and then invokes the IRunApp.Start() method using Reflection. This sample is there just to give you an example of how reflection works.
You should also examine the assemblies using ILDASM or another disassembly tool, just to see how visible the IL is in both cases (particularly in B).

Demo B
In Demo B, there is a single assembly which is the program executable. This contains the main assembly and also contains the embedded assembly B, which has been embedded using the technique described above.
In this demo, A loads the assembly B using the technique I described in HIDING ASSEMBLIES above. B is then invoked using Reflection.
If you open the executable in ILDASM you will see the code disassembled that carries out the work to load and execute B but, importantly, you will not see the actual code for B.

The samples are available by clicking the links below:
- Demo A
- Demo B

ADDITIONAL TECHNIQUES
Well assuming that you have implemented an assembly hiding method now, you are probably wondering what is to stop someone who understands the technique from simply using a hex-editor and cutting and pasting your embedded assemblies out to separate files.
Well the truth of the matter is that it is highly unlikely that you will ever stop someone who knows what they are doing from accessing your .Net code in this way, but you can still use a few more methods to make it more difficult:
- Add fake libraries into the file
You can add some fake assemblies to the end of the file, and make one of these your valid assembly. This will not make it impossible to access the real assembly but it will make it more difficult.
- Encrypt your libraries
You can encrypt the libraries that are embedded in the file so that they cannot just be extracted using a hex-editor. A compiled assembly has a distinctive layout, which will be completely hidden by encryption.
- Use remoting or a WebService for sensitive business logic
If you are able to, you can use .Net Remoting or a WebService to host your business logic and other sensitive code. You can invoke the service from your application code.

There isn't a lot more to say on the technique for hiding assemblies, other than you should make the effort to protect your source code in every way possible. People are often flippant about security, but you should not forget that your source code is your intellectual property - do you really want others to see how you do things in your applications, particularly if you are providing an application that is unique or satisfies a niche demand.

If you have any suggestions or queries about this article, just drop me an e-mail.

Have fun!

Friday 17 August 2007

Protecting .Net Assemblies Part 1

Over the next two articles, I thought it would be interesting to look at a couple of techniques to make .Net assemblies less easy to disassemble.

Now I will presume that you already know that to disassemble a .Net assembly (EXE or DLL), all you need to do is open the assembly using the ILDASM tool, which comes as part of the Microsoft.Net SDK.

If you don't know this already, give it a try now - open ILDASM (through the VS command prompt) and then open the latest .Net assembly you worked on using it.

ILDASM can be used to reverse engineer most .Net projects if one so desired, but this would be a long and painstaking process. Converting IL back into VB.NET or C# is reasonably straight forward, but would require significant effort due to the sheer volume of work that would need to be done to achieve the result.

Another much more useful tool is the .Net Reflector, which is produced by Lutz Roeder. This tool provides the same functionality as ILDASM, but in addition it can disassemble the IL into c#, VB.NET, etc., and provides some extremely useful analysis capabilities too. You can download the .Net Reflector here.

It is worth noting that ALL .Net assemblies are vulnerable to disassembly. This is due to the nature in which they are produced. .Net source code, such as C# or VB.NET, is compiled by a .Net compiler (such as Visual Studio.Net) into IL (Intermediate Language).
This is a platform independant pseudo-assembler that is compiled at run-time by the Microsoft.Net Just-In-Time (JIT) compiler into native assembler for execution on the target processor environment (typically x86 in the case of Windows).

So, even if you were not aware that your compiled .Net code can be reversed before you red this article, I think we can all agree now that this is the case.

How then, do we protect our valuable source code from prying eyes?

A quick, simple and relatively effective method is to apply obfuscation to your code.
This involves using an additional tool to "scramble" the metadata and string portions of the code so that they are not readable by ILDASM and other such tools.

If, for instance, you had a MessageBox.Show in your code that display a message "Hello World", the obfuscation tool would replace "Hello World" with an encrypted byte sequence. When examined in a disassembly tool, the byte sequence would be unreadable and, thus, it becomes more difficult to understand what is happening in the program.

As well as strings, the metadata portion of the assembly is scrambled, so that the names of objects, methods, properties, etc. are replaced by a single unprintable character. This is most effective because if you had a method named something like "CheckPassword", which would be readable in the disassembly tool and clearly shows what the method does, it would be replaced by a single character which doesn't mean anything.

Unfortunately, obfuscation does not make it impossible to disassemble a .Net assembly. In fact, it doesn't really even make it difficult, because the raw code is still available in IL. It can be disassembled, and analysis of the code will, in time, reveal the purpose of most of the code portions.

Even the string data is not secure. Analysis of the code will reveal that each time a string (or obfuscated byte sequence) is used, a method is invoked which accepts the byte sequence (and sometimes an integer seed) as a parameter and returns a string. This is the byte decryption method, and the code for this is, sadly, freely available inside the .Net assembly that was obfuscated.
I have produced a tool myself that can break the byte sequence encryption for any string obfuscated by the Dotfuscator tool.

In conclusion, obfuscation adds another layer of protection to our assembly, but it doesn't protect the code in any way, so what else could be done to make the code more secure?

In my next article I will be exploring another tecnique, which should provide another layer of protection from prying eyes... assembly hiding.

Until next time...

Wednesday 15 August 2007

The Outlook PST Breaker

Well it's been too long since I last wrote, so my apologies for that.

Here is the promised PST password tool.

I attach a screenshot:

The tool has the following features:
Encrypt a password
Enter a password into the "Password" textbox and click the Encrypt button.

You will see in the main output window all of the steps taken to encrypt the password.

Attempt to cryptographically break a 32-bit password code
Enter a password code into the "Pw Code" textbox, and select options to help the break. Then click the Decrypt button.

As previously discussed (in my earlier article), using a password length limit, character substitution and a smaller character set will result in significantly faster breaking.

The password code already in the "Pw Code" textbox will break to "adad". If you turn on character substitution, it will break to "¹muleb", in about half the time of the "adad" break.

Show verbox output
Turn this on to see extended information generated during the break. This will slow down the break significantly - it will be in excess of 10 times slower!

The tool is located here.

Let me know if you have any suggestions for it.

Until next time!