TLDR: We use a custom dictionary to crack Microsoft Office document encryption. Then we use the same custom dictionary for pwnage in the LinkedIn hash database.
I recently got a couple of questions about a better way to crack encrypted Excel files. They came from members of BHIS’s extended community who were using commercial password-recovery tools with distributed CPU and GPU processing power. The problem was that they were still getting ridiculously slow hashing speeds, making brute force impractical.
When we discussed our typical run-down of hashing with John the Ripper (JTR) and Hashcat, the user responded with, “I used that 15 years ago… people still do that?”
Yes… Yes, we still do that.
In fact, both JTR and Hashcat have active development to this day.
To be fair, I can’t say whether commercial software is faster (better, faster, stronger), but I will say that if it includes professional support and you’re dealing with something complicated, that’s always nice to have. There’s a mantra that Black Hills Information Security SysAdmins have: we are neither pro-proprietary nor pro-open source; we are pro-security awareness. Commercial software definitely can have its niche, and a quick survey of password-recovery software shows some interesting offerings, especially for distributed workloads, an area where the open-source community has struggled to find significant growth.
The slow hash-cracking is the result of the effort Microsoft Office puts into deriving the password hash and encrypting the document. The encryption is far more complex than it was in earlier Office versions: Office 2013 uses 128-bit AES with SHA-512 for hashing. The more processing power it takes to create each hash, the fewer candidate passwords an attacker can try per second while hunting for a match.
Interestingly, Microsoft also left a backdoor in all Office 2013 encrypted documents that allows the use of a Master Key. Microsoft even released the DocRecrypt tool, which lets an IT admin decrypt or re-encrypt an Office document without the original password by using certificate-signing services on the domain. These and other avenues have been researched by the community and could yield attacks that circumvent hash-cracking of encrypted documents altogether.
Set Up the Encrypted Document:
First, I’ve created an Excel document and filled it with some fictitious data.
Now, I’m going to “Encrypt with a Password”.
Let’s try to use a password I figure might be in a common dictionary somewhere: buckeye31.
(Side note: I picked it with “shuf -n 1 rockyou.txt”.)
After saving the document, I try to open it again to verify it’s encrypted.
We don’t want to actually crack the Excel file itself; we just want to crack the hash of the password that was used to encrypt it. To do this, we need a tool that will read the Excel file and hand us a plaintext hash of the password used in the encryption process. Now, typically I’d refer to Hashcat-Utils, but the tool I need isn’t there. Since we also have JTR compiled on the same cracking system, I’m going to use JTR’s office2john.py.
office2john.py [EXCEL FILE] > hash.txt
office2john.py identified the hash and determined it’s using MS Office 2013’s encryption method, so despite my using Office 2016, it looks like the hash mechanism is still the same. I could use JTR from here on out, but I’m still partial to Hashcat, even though I have to look up the hash-type code that I wouldn’t need with JTR. I’ll need to cut the JTR Office 2013 hash down to something Hashcat will understand, and I’ll need to find the hash-type code in Hashcat’s help output.
To convert this JTR-formatted string so Hashcat can read it properly, I need to remove the leading “EncryptedBook.xlsx” from the line created by office2john.py. We could use Hashcat’s --username flag, but I prefer to create a clean hash-list file. So I’ll use cut:
cut -d ":" -f 2 hash.txt > hashhc.txt
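To make the cut step concrete, here is a minimal sketch run against a made-up line. The sample below is hypothetical: the field layout is my assumption about what office2john.py prints for Office 2013, and SALT, VERIFIER, and VERIFIERHASH are placeholders, not real values.

```shell
# Hypothetical office2john.py output line. The layout is an assumption and
# SALT / VERIFIER / VERIFIERHASH are placeholders, not a real hash.
sample='EncryptedBook.xlsx:$office$*2013*100000*256*16*SALT*VERIFIER*VERIFIERHASH'

# Keep only the second colon-delimited field, dropping the leading filename.
hash_only=$(printf '%s\n' "$sample" | cut -d ':' -f 2)

printf '%s\n' "$hash_only"
```

Everything before the first colon is dropped, and what is left is the string that goes into the hash-list file.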
Now, let’s give Hashcat some context:
With hashcat64.bin --help I can find that the hash-type code for Office 2013 is 9600.
Real quick, I want to check the benchmark for the 9600 hashing method on our Hashcat rig:
hashcat64.bin -m 9600 -b
47,178 h/sec isn’t great, but it sure beats a few hundred.
Now, the password I used is in rockyou.txt (I did, in fact, pull it out randomly from that file). Let’s see how big our rockyou.txt is:
14,344,393 lines. At the benchmarked 47,178 h/sec, and not counting overhead, that’s somewhere around 5 minutes to run the whole list.
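The back-of-the-napkin math behind that estimate can be sketched like this. It only uses the two numbers above (the wordlist size and the benchmark rate) and assumes the rig sustains the benchmark speed for the whole run, which real runs rarely do.

```shell
words=14344393   # lines in rockyou.txt, from the count above
rate=47178       # h/sec from the -m 9600 benchmark

# Worst case: every word in the list gets hashed once.
seconds=$((words / rate))
minutes=$(LC_ALL=C awk -v s="$seconds" 'BEGIN { printf "%.1f", s / 60 }')

echo "$seconds seconds (~$minutes minutes)"
```

That lands right around the five-minute mark.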
Shoot, let’s go:
hashcat64.bin -m 9600 hashhc.txt /opt/wordlists/rockyou.txt -o hashes.pot
Four minutes later…
Not surprisingly, the password was found in rockyou.txt.
But what if we just knew it had some lowercase-letters followed by a couple of numbers?
hashcat64.bin -m 9600 -a 3 hashhc.txt ?l?l?l?l?l?l?d?d -o hashes.pot
SEVEN days. Wow, ouch.
Wait… what if we just knew it was 8 characters but knew nothing else?
hashcat64.bin -m 9600 -a 3 hashhc.txt ?a?a?a?a?a?a?a?a -o hashes.pot
Point is, you can save yourself about 4577 years if you use a dictionary. Or, put another way, a random 8-character password (letters, digits, and symbols) is pretty good for MS Office encryption, apparently.
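Those estimates fall straight out of the keyspace math. Here is a rough sketch, assuming the 47,178 h/sec benchmark holds for the whole run and using Hashcat's built-in charset sizes (?l is 26 lowercase letters, ?d is 10 digits, ?a is 95 printable ASCII characters). Hashcat bases its own estimate on the live speed, so its numbers will drift a bit from these.

```shell
rate=47178   # h/sec from the -m 9600 benchmark

# ?l?l?l?l?l?l?d?d -> 26^6 * 10^2 candidates, expressed in days
days=$(LC_ALL=C awk -v r="$rate" 'BEGIN { printf "%.1f", (26^6 * 10^2) / r / 86400 }')

# ?a?a?a?a?a?a?a?a -> 95^8 candidates, expressed in years
years=$(LC_ALL=C awk -v r="$rate" 'BEGIN { printf "%.0f", (95^8) / r / 86400 / 365 }')

echo "six lowercase + two digits: about $days days"
echo "eight printable characters: about $years years"
```

At the benchmark rate that works out to roughly a week for the first mask and several thousand years for the second, in the same ballpark as what Hashcat reported.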
What about a different approach?
I’m not a big football fan, but if I knew the author of the Excel file was, I might try to build a custom dictionary. I’ll use cewl to look for keywords about College Football on this Wikipedia page to help me build a dictionary file.
cewl --depth 0 -w customdict.txt https://en.wikipedia.org/wiki/List_of_college_team_nicknames_in_the_United_States
This generated a custom dictionary of 1626 words.
Let’s add all UPPER and lower in there too:
cp customdict.txt customdict.more.txt
cat customdict.txt | tr '[:upper:]' '[:lower:]' >> customdict.more.txt
cat customdict.txt | tr '[:lower:]' '[:upper:]' >> customdict.more.txt
Now we’re at 4878 words.
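The word count tripled because each pass appends a full copy of the list. Here is a small sketch of the same idea on a hypothetical three-word sample (the words and the temp directory are stand-ins for illustration, not the real customdict.txt):

```shell
# Hypothetical three-word sample standing in for customdict.txt
workdir=$(mktemp -d)
printf '%s\n' 'Buckeyes' 'Wolverines' 'Crimson' > "$workdir/customdict.txt"

# Original casing, then an all-lowercase copy, then an all-uppercase copy
cp "$workdir/customdict.txt" "$workdir/customdict.more.txt"
tr '[:upper:]' '[:lower:]' < "$workdir/customdict.txt" >> "$workdir/customdict.more.txt"
tr '[:lower:]' '[:upper:]' < "$workdir/customdict.txt" >> "$workdir/customdict.more.txt"

total=$(wc -l < "$workdir/customdict.more.txt" | tr -d ' ')
echo "$total words"
```

Any word that was already all lowercase or all uppercase ends up in the list twice. Piping the result through sort -u would trim those duplicates if you care about the wasted guesses.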
Let’s go a bit farther and run hashcat-utils expander to expand out all those words.
(Note, I had to recompile expander to expand out to 8 characters…)
cat customdict.more.txt | /opt/hashcat-utils/src/expander.bin > customdict.more.expanded.txt
392,322 words. Now what?
Now, let’s add a couple of numbers at the end of the wordlist using hashcat’s hybrid wordlist attack:
hashcat64.bin --session HashBlog1 -a 6 -m 9600 hashhc.txt customdict.more.expanded.txt ?d?d -o hash.pot
In 27 seconds we had a winner.
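Why so quick? The hybrid attack only has to try every dictionary word with 00 through 99 tacked on the end. Here is a rough emulation of that candidate set on a hypothetical two-word list. It shows which candidates get tried, not the order Hashcat actually walks them in, and the worst-case figure assumes the benchmark rate from earlier.

```shell
# Hypothetical two-word list standing in for customdict.more.expanded.txt
words='buckeye
wolverin'

# Emulate -a 6 with mask ?d?d: each word followed by 00 through 99
candidates=$(printf '%s\n' "$words" |
    awk '{ for (i = 0; i < 100; i++) printf "%s%02d\n", $0, i }')

count=$(printf '%s\n' "$candidates" | wc -l | tr -d ' ')
echo "$count candidates from 2 words"
printf '%s\n' "$candidates" | grep -x 'buckeye31'

# Full-size run: 392,322 words x 100 suffixes at the 47,178 h/sec benchmark
total=$((392322 * 100))
worst_case=$((total / 47178))
echo "$total candidates, about $worst_case seconds worst case"
```

Even the worst case for the full dictionary is under fifteen minutes at the benchmark rate, versus days for the mask attack.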
Dictionaries are where it is at for process-intensive hashes.
If you’re like most people and not using random alphanumerics and symbols, anything someone knows about you, including your sports preferences, could be used in a word list to cut down the time it takes to crack passwords only you (think you) know.
Hold on, this was all fictitious and you knew the password to begin with. No one would actually use those passwords…
Just for fun, let’s test our customdict.more.expanded.txt dictionary against the known LinkedIn release of ~60M hashes. Since it uses SHA1 and hashing will go insanely fast, we’re going to add a couple of extra characters (?a?a, so any printable character) at the end of each word in our dictionary too.
hashcat64.bin -a 6 -m 100 68_hash.txt customdict.more.expanded.txt ?a?a -o test.pot
We hit 1.27% of those ~60 million LinkedIn hashes with our College Football sourced dictionary and it took 22 seconds.
DigINinja’s CeWL: https://digi.ninja/projects/cewl.php
John the Ripper: http://www.openwall.com/john/
Microsoft Office Document Encryption: https://technet.microsoft.com/en-us/library/cc179125.aspx
Black Hills Information Security Hashcat Blogs:
Black Hills Information Security Password Cracking Rig Build:
Black Hills Information Security: How to Crack Passwords for Password Protected MS Office Documents: