Rockyou2024 analysis: Mega password list or just noise?  (2024)

Back in June 2021, a large data dump called ‘Rockyou2021’ was posted on a popular hacking forum. It was named after ‘Rockyou.txt’, the popular password list used in brute-force attacks – and it was a pretty big story at the time. You can see our team’s analysis on it here.

Fast forward to 2024 and you may have seen a new compilation doing the rounds: ‘Rockyou2024.’ The original forum post claimed the password list contained ‘over 9.9 billion passwords.’ It’s a serious claim – so our team have dug into the details to analyze whether this is something organizations should be concerned about.

What’s really in Rockyou2024?

As with ‘Rockyou2021’, there’s been a fair bit of news around ‘Rockyou2024.’ Some sources have repeated the claim that this is the largest password database leak in history, with over 10 billion records. However, our team’s analysis indicates this is far from true. The dataset is neither useful as a wordlist nor, as alleged, a list of passwords that can be used to attack potential targets. In all honesty, it’s mostly garbage data, and we wouldn’t recommend focusing energy or efforts on it.

Specops data analysis

The delta between ‘Rockyou2024’ and ‘Rockyou2021’ is 54GB of 146GB. This amounts to approximately 1.5 billion (1,489,515,500) new records. Hashes usually make up a large share of this kind of compilation, so the first step is to strip out a number of standard hash formats. The number of records removed per format:

  • 138 261 666 bcrypt.txt
  • 480 116 331 md5.txt
  • 202 577 427 sha1.txt
  • 45 676 538 sha256.txt
  • 0 sha512crypt.txt
  • 6 743 064 sha512.txt
  • 873 375 026 total

This leaves 616,140,474 additional records (~20GB). Note that this step does not catch every hash or truncated hash. It’s simply a blunt stripping of some common hash types to cut the data down.
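The stripping step described above could be sketched roughly as follows. This is a minimal illustration, not the tooling used for the analysis: the regular expressions, the function names `classify_hash` and `strip_hashes`, and the decision to label any 32-character hex string as "md5" are all assumptions. A 32-character hex string could just as easily be another hash type, or even a real password, which is why this kind of filter is blunt.

```python
import re
from collections import Counter

# Assumed patterns for a few common hash formats (illustrative only).
# Real dumps contain many more variants, salts and truncated forms.
HASH_PATTERNS = {
    "bcrypt": re.compile(r"\$2[abxy]?\$\d{2}\$[./A-Za-z0-9]{53}"),
    "sha512crypt": re.compile(
        r"\$6\$(rounds=\d+\$)?[./A-Za-z0-9]{1,16}\$[./A-Za-z0-9]{86}"
    ),
    "md5": re.compile(r"[0-9a-fA-F]{32}"),
    "sha1": re.compile(r"[0-9a-fA-F]{40}"),
    "sha256": re.compile(r"[0-9a-fA-F]{64}"),
    "sha512": re.compile(r"[0-9a-fA-F]{128}"),
}


def classify_hash(record):
    """Return the name of the first pattern matching the whole record, else None."""
    for name, pattern in HASH_PATTERNS.items():
        if pattern.fullmatch(record):
            return name
    return None


def strip_hashes(records):
    """Split records into (kept records, Counter of removed records per hash type)."""
    kept = []
    removed = Counter()
    for record in records:
        kind = classify_hash(record.strip())
        if kind is None:
            kept.append(record)
        else:
            removed[kind] += 1
    return kept, removed
```

The arithmetic in the text is consistent with this kind of split: 1,489,515,500 new records minus 873,375,026 removed hashes gives 616,140,474 remaining records.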

Of these, splitting the records by length gives the following character-length groups, in descending order of size:

  • 09  2.1G
  • 34  833M
  • 38  686M
  • 24  650M
  • 10  399M
  • 12  264M
  • 11  245M
  • 13  199M
  • 08  179M
  • 14  176M
  • 63  128M
  • 20  127M
  • 51  114M
  • 41  113M
  • 15  111M
  • 32   85M

With counts of:

  • 09  223 055 687
  • 34   24 940 213
  • 38   18 423 609
  • 24   27 229 401
  • 10   37 978 951
  • 12   21 259 190
  • 11   21 384 664
  • 13   14 833 594
  • 08   20 780 673
  • 14   12 288 359
  • 63    2 085 716
  • 20    6 322 637
  • 51    2 289 684
  • 41    2 814 447
  • 15    7 248 179
  • 32    2 689 279
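A split by length like the one above can be sketched in a few lines. This is an illustration under stated assumptions, not the method used for the analysis: the function names `length_groups` and `top_groups` are hypothetical, length is measured in characters, and size is estimated as UTF-8 bytes plus one newline per record.

```python
from collections import defaultdict


def length_groups(records):
    """Return {length: (record count, total bytes)} for an iterable of text records."""
    stats = defaultdict(lambda: [0, 0])
    for record in records:
        text = record.rstrip("\r\n")
        entry = stats[len(text)]
        entry[0] += 1
        # Assumption: size on disk is the UTF-8 encoding plus one newline per record.
        entry[1] += len(text.encode("utf-8")) + 1
    return {length: tuple(values) for length, values in stats.items()}


def top_groups(stats, n=16):
    """Return the n largest length groups, sorted by total bytes, largest first."""
    return sorted(stats.items(), key=lambda item: item[1][1], reverse=True)[:n]
```

As a sanity check on the lists above, the 34-character group has 24,940,213 records; at 35 bytes each including a newline, that is about 832 MiB, in line with the 833M listed for that group.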

These groups amount to a total of 445 624 283 records, or roughly 72% of the remaining 616,140,474. If we now break these lengths down further:

09

9-digit numbers and various complete and partial strings. As usual, this length sits in the middle of a password length range. There may be valid passwords in here, but they’re unlikely to differ from what is already in other wordlists. It’s unlikely much of the data is good.

34

A lot of Russian strings, poorly parsed strings, various hashes and truncated hashes.

38

Similar to 34, Russian language strings, various hashes and truncated hashes, for example truncated bcrypt hashes.

24

Largely base64-encoded strings, but they don’t decode to English text. They’re either unicode or wrapped in another layer of encoding before base64. Not worth investigating right now. As with the other longer classes, there are also foreign-language strings, which would be better served by other wordlists and rules or masks.
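One way to triage strings like these is to attempt a strict base64 decode and look at what comes out. The sketch below is illustrative only: `decode_base64` and `describe_base64` are hypothetical helper names, and the check has false positives, because any string made up of base64 alphabet characters with a length that is a multiple of four will decode without error.

```python
import base64
import binascii


def decode_base64(record):
    """Return the decoded bytes if record is strictly valid base64, else None."""
    try:
        return base64.b64decode(record, validate=True)
    except (binascii.Error, ValueError):
        return None


def describe_base64(record):
    """Give a rough label for what a candidate base64 string decodes to."""
    raw = decode_base64(record)
    if raw is None:
        return "not base64"
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Binary data, non-ASCII text, or a further layer of encoding.
        return "base64, non-ASCII payload"
    if text.isprintable():
        return "base64, printable ASCII"
    return "base64, non-printable"
```

A 24-character base64 string ending in two padding characters decodes to 16 bytes, so a large "non-ASCII payload" share in this group would fit the description above of data that is not plain English text.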

10

Similar to 9 – a large collection of 10-digit numbers and strings.

  • numeric: 18023570 (47.46%)
  • loweralphanum: 11489073 (30.25%)
  • mixedalphanum: 2423042 (6.38%)
  • loweralpha: 2063234 (5.43%)
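The character-set labels in breakdowns like this one can be reproduced with a simple classifier. The sketch below assumes the labels mean what their names suggest (for example, "loweralphanum" is lowercase letters plus digits) and treats anything that is not an ASCII letter or digit as "special". The function names `charset_class` and `charset_breakdown` are hypothetical, and this is not necessarily the tool that produced the figures above.

```python
import string
from collections import Counter


def charset_class(record):
    """Label a string by the character classes it contains."""
    has_lower = any(c in string.ascii_lowercase for c in record)
    has_upper = any(c in string.ascii_uppercase for c in record)
    has_digit = any(c in string.digits for c in record)
    # Assumption: anything that is not an ASCII letter or digit is "special".
    alnum = string.ascii_letters + string.digits
    has_special = any(c not in alnum for c in record)

    if has_lower and has_upper:
        alpha = "mixedalpha"
    elif has_lower:
        alpha = "loweralpha"
    elif has_upper:
        alpha = "upperalpha"
    else:
        alpha = ""

    label = alpha + ("special" if has_special else "") + ("num" if has_digit else "")
    if label == "num":
        return "numeric"
    return label or "empty"


def charset_breakdown(records):
    """Return (label, count, percentage) tuples, most common label first."""
    counts = Counter(charset_class(r) for r in records)
    total = sum(counts.values())
    return [(label, n, round(100 * n / total, 2)) for label, n in counts.most_common()]
```

The percentages above are relative to the whole length group: 18,023,570 numeric records out of the 37,978,951 records of length 10 is 47.46%.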

12

Collection of 12-character strings, numerics, and a lot of IPs.

  • loweralphanum: 6384403 (30.03%)
  • mixedalphaspecialnum: 4633689 (21.8%)
  • specialnum: 3040090 (14.3%)
  • mixedalphanum: 1546756 (7.28%)
  • loweralpha: 1407371 (6.62%)
  • mixedalphaspecial: 922710 (4.34%)
  • mixedalpha: 768977 (3.62%)
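Strings made only of digits and dots fall under "specialnum", which is where IPv4 addresses would land. A quick way to estimate how much of a group is IP addresses is to try parsing each record with Python's standard `ipaddress` module. The helper names `looks_like_ipv4` and `ip_share` are hypothetical, and this is only a sketch of the idea, not how the figures above were produced.

```python
import ipaddress


def looks_like_ipv4(record):
    """Return True if the string parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(record)
    except ValueError:
        return False
    return True


def ip_share(records):
    """Return the fraction of records that parse as IPv4 addresses."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for record in records if looks_like_ipv4(record.strip()))
    return hits / len(records)
```

For instance, `192.168.1.10` is exactly 12 characters long, so addresses of that shape end up in this length group.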

11

Collection of 11-character strings, numerics, and some IPs.

  • loweralphanum: 8871162 (41.48%)
  • numeric: 4863557 (22.74%)
  • mixedalphanum: 1839704 (8.6%)
  • loweralpha: 1614811 (7.55%)
  • specialnum: 1070368 (5.01%)
  • mixedalpha: 822882 (3.85%)

13

Collection of 13-character strings, numerics, and a lot of IPs.

08

What we would expect to see from 8 characters.

  • loweralphanum: 10880391 (52.36%)
  • loweralpha: 3873181 (18.64%)
  • mixedalphanum: 2651258 (12.76%)
  • mixedalpha: 906904 (4.36%)
  • upperalphanum: 627632 (3.02%)

14

A lot of unicode junk and stuff clipped out of what looks like ‘ncurses’ output.

63

Generally, just junk, such as poorly processed email addresses and strings from Telegram scraping.

20

A lot of Russian-language strings and junk. We see the same trend in the other long-length classes: people tend not to use passwords that long, and the way these compilations are collected leads to a lot of hashes or just plain garbage. The leaker in question will often just ram a bunch of data breaches together without any processing, purely for a large file size and media clout (screenshots of their name on the forum post, etc.).

We continue to go off the rails past this point with longer and longer strings of poorly processed junk that isn’t usable for a wordlist to either attack hashes or use as a password in a spray attack.

So, what’s the key takeaway?

What this really comes down to is that the person in question has taken ‘Rockyou2021’ (which received so much uproar for the number of records) and added more data collected from other, seemingly low-quality sources. They’ve then posted it with the claim of it being a huge new list in order to get clout and credit.

This dataset should pose minimal to no risk to existing Specops customers, and the value of this dataset as a wordlist in cracking or other attacks is extremely nebulous to nil. The dataset is too large to be of any realistic use as part of any effort to crack a given hash and there’s simply too much low-quality data to successfully use in attacks. The value of the data is negligible compared to good, prepared wordlists and rulesets in the hands of a capable actor.

This list does not in any way impact the threat model of any of our customers and should generally just be ignored as another clickbait compilation.

Security recommendations to deal with Rockyou2024

At the end of the day, Rockyou2024 was not a large dump of breached passwords as claimed (though it did contain some). However, it is still possible that some of the data it contains comes from other wordlists or was generated by other attack types. There’s no one-size-fits-all password policy recommendation for organizations looking to prevent attacks that make use of the Rockyou2021 and Rockyou2024 lists. Each organization will have different compliance needs and security concerns.

Using Specops Password Policy, or an equivalent password filter, to enforce sound password policies is the best defense against attacks with these types of datasets. The Breached Password Protection feature in Specops Password Policy continuously scans your Active Directory for breached and compromised passwords, notifying end users that they need to change their password immediately.

Organizations that want to rely on password length as a defense could simply require long passwords or passphrases – you can follow a best practice guide for helping end users create long passphrases here. Organizations can also choose to incentivize longer passwords with length-based password aging in Specops Password Policy.
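To give a rough sense of why length helps, the brute-force search space of a randomly chosen string grows with both its length and the size of its character set. The sketch below uses a hypothetical helper, `brute_force_bits`, and applies only to randomly generated strings. A passphrase built from dictionary words has less entropy than its character count alone would suggest, so treat the figures as an upper bound.

```python
import math


def brute_force_bits(length, charset_size):
    """Bits of entropy for a uniformly random string of the given length.

    Assumption: every character is chosen independently and uniformly
    from a set of charset_size symbols.
    """
    return length * math.log2(charset_size)
```

Under those assumptions, 8 random characters drawn from 94 printable ASCII symbols give about 52 bits, while 20 random lowercase letters give about 94 bits.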

Interested to know how Specops Password Policy could fit in with your organization? Get in touch and speak to an expert today.

Author: Msgr. Refugio Daniel