What are random numbers and how are they managed on Linux?


In this article, we will deep dive into the major concepts behind random numbers and learn how to work with them on a Linux system.

But first, let’s try to understand what random numbers are and why they are so important for computing (especially for cryptography).

What are random numbers?

Random numbers are values that occur in an unpredictable sequence or order. They lack any discernible pattern, and each value appears to be independent of the ones before it.


The quest for randomness in numbers goes back centuries. In the early days, randomness was often achieved through physical processes like dice throws or shuffling cards. However, as the need for randomness in various fields increased, the challenge became how to generate numbers that are truly unpredictable and unbiased.

The concept of randomness in mathematics and computing gained significant attention in the mid-20th century with the rise of computers. The creation of random number generators (RNGs) became essential for various applications, from simulations and cryptography to statistical sampling and gaming.

It is impressive how much we depend on such an “overlooked” feature. Even the most robustly designed encryption algorithm can be significantly compromised if it is fed weak random numbers.

So how is a random number generated?

Generating random numbers

Early attempts at generating random numbers on computers relied on algorithms that simulated randomness using mathematical formulas. However, these random numbers were not truly random; they were deterministic and produced sequences that were predictable if one knew the starting point (aka the seed). This is why they are called “pseudo-random numbers”.

The challenge is having an unpredictable seed. Given the same seed, a PRNG (Pseudo-Random Number Generator) will always produce the same sequence of numbers, so anyone who knows or guesses the seed can predict the whole sequence. PRNGs are widely used in applications where true randomness is not critical, such as simulations, gaming, and certain statistical analyses.

Now, if you have a good, unpredictable source of randomness to generate the seed, the output becomes unpredictable to anyone who does not know that seed. This is where the word entropy usually comes in.

In information theory, entropy is a measure of the uncertainty or unpredictability of information content. Different sources have different entropy, and for generating random numbers, more entropy is better. Inputs like mouse movements, keystroke timings, network traffic, electronic noise, and thermal noise can all contribute entropy and serve as good sources of randomness for generating the seed.
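A standard way to put a number on this is Shannon entropy. The formula is textbook information theory rather than something the original text spells out. For a source that emits symbol i with probability p_i, with three quick worked cases:

```latex
H(X) = -\sum_{i} p_i \log_2 p_i \quad \text{(bits)}

% Fair coin: p_heads = p_tails = 1/2
H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}

% Uniform byte: 256 values, each with p = 1/256
H = -256 \cdot \tfrac{1}{256}\log_2\tfrac{1}{256} = 8 \text{ bits}

% Stuck source that always emits 0: p = 1
H = -1 \cdot \log_2 1 = 0 \text{ bits}
```

In other words, a perfectly unpredictable byte carries 8 bits of entropy, while a source that always produces the same value carries none, no matter how many bytes it emits.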


TRNGs (True Random Number Generators) can harness these natural processes to produce truly random sequences. They are crucial in cryptography, security systems, and any application where high-quality randomness is essential.

Since there are trade-offs between PRNGs and TRNGs, modern systems often combine both approaches. PRNGs are used for many everyday applications because they are faster and generally adequate for tasks without strong security requirements. However, for cryptography and other sensitive applications, true randomness is preferred.

Let’s now learn about the most common use cases for random numbers.

Why are random numbers important?

Random numbers have a wide range of applications and are important in various fields due to their ability to provide unpredictability and fairness.

Random numbers are crucial in cryptography for generating secure keys, which are the foundation of cryptographic algorithms (encryption, signing, etc.). Fundamentally, a cryptographic key is a sequence of bits, and the randomness and unpredictability of this bit sequence are crucial for the security of the cryptographic system.

In scientific and engineering simulations, random numbers are used to model complex systems that have elements of unpredictability, like weather patterns, stock market fluctuations, or quantum physics phenomena.

Casinos and online games use random numbers to ensure fairness and unpredictability in games like roulette, dice games, and slot machines. The same goes for lotteries, which rely on random numbers to select winners. I will not comment on how random (and fair) they are, though. 😄

Random numbers are used in statistical sampling to ensure an unbiased selection of a sample from a larger population. This is important in surveys, opinion polls, and research studies.

Random numbers help to generate textures and realistic environments in computer graphics, especially for effects like smoke, fire, and water.

In distributed computing and networking, random numbers can be used for load balancing and ensuring system resilience by randomly distributing tasks or rerouting traffic.

And the list of use cases goes on and on!

Now, how does a random number generator work?

Random number generators and algorithms

Pseudo-random number generators (PRNGs) can be implemented in software. They just require an initial value (the seed) and an algorithm. Examples of algorithms include the Linear Congruential Generator (LCG), the Mersenne Twister, and Xorshift.

For example, the Xorshift algorithm generates a random number by repeatedly applying XOR operations to the bits of a number, combined with shifting the bits left or right. The simplicity of these operations allows for fast execution on digital computers.

There is a variant of PRNGs called Cryptographically Secure Pseudo-Random Number Generators (CSPRNGs). These are designed to meet the needs of cryptography. Even an attacker who observes part of the output should not be able to predict the next bits or recover the internal state in any practical amount of time. They are often seeded from external random events, like mouse movements or keystroke timings. Examples include Fortuna, Yarrow, and Blum Blum Shub.

For example, Fortuna collects entropy from various sources in the system (mouse movements, keyboard timings, system counters, network traffic, etc) and uses a generator based on a block cipher (AES is commonly used) in counter mode to produce the random output.

Though all of these algorithms can be implemented in software, we also have Hardware Random Number Generators (HRNGs).

HRNGs use dedicated chips capable of using physical processes (electronic noise, radioactive decay, thermal noise, etc) to generate true random numbers. They provide a higher degree of randomness compared to PRNGs and are often used in high-security applications like key generation in cryptography.

Now it’s time to learn how a Linux system deals with random numbers.

Random numbers on Linux


On Linux systems, random numbers are managed primarily through two special files: /dev/random and /dev/urandom. These files interface with the kernel’s random number generator and provide a means to access random data.

The /dev/random file provides random bytes, which are considered to be of cryptographic quality. It gathers environmental noise from device drivers and other sources into an entropy pool. Sources include timings of keyboard and mouse inputs, disk I/O operations, and other system activities. The randomness is derived from this entropy pool, and the output is considered to be very unpredictable.

A notable characteristic of /dev/random is that it can block. On older kernels (before Linux 5.6), if the entropy pool was low, reads would wait until enough environmental noise had been gathered to ensure the randomness of the output. This behavior made it the traditional choice for applications where very high-quality randomness is required, such as cryptographic key generation.

The /dev/urandom file also provides random bytes from the same entropy pool as /dev/random, but it does not block (the ‘u’ stands for unblocking). Once the entropy pool is initialized, /dev/urandom will keep producing random numbers even if the entropy estimate is low. It does this by using a cryptographically secure pseudo-random number generator (CSPRNG) seeded from the pool.

While /dev/urandom is generally considered “less random” than /dev/random, for most practical purposes, especially after the system has been running for a while, the randomness is sufficient and indistinguishable from true randomness. So /dev/urandom is suitable for most applications, including many cryptographic uses, where the blocking behavior of /dev/random would be undesirable or unnecessary.

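In newer Linux kernels, the distinction between /dev/random and /dev/urandom has been reduced. Since Linux 5.6, for example, /dev/random blocks only until the entropy pool has been initialized during boot, and then behaves like /dev/urandom. Improvements in the entropy-gathering and random number generation algorithms have made /dev/urandom secure and fast for virtually all purposes, including cryptography.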

This is why some Linux distributions have already moved towards using /dev/urandom for all random number needs due to its non-blocking nature and sufficient security.

In practice, applications that require random numbers can read from these files as if they were regular files. For example, this is how you would generate 32 bytes of random data in the command line:

# dd if=/dev/urandom of=random.bin bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 8,079e-05 s, 396 kB/s

# od -A n -t x1 random.bin
 0b ea 35 f3 d8 88 f7 b4 53 b6 08 e9 1c 7d e6 f7
 58 34 9e ee 73 a3 15 dd 67 9a e0 24 5e 2f 3e ce

If you are writing a C or C++ application and want random numbers, you can just read from /dev/random or /dev/urandom, or use the getrandom() system call:

#include <sys/random.h>

ssize_t getrandom(void *buf, size_t buflen, unsigned int flags);

There is also the random() function from the C library. It is a pseudo-random number generator and is not considered secure for cryptographic use cases.

#include <stdlib.h>

long random(void);

On top of that, there are user-space applications capable of generating random numbers, such as OpenSSL.

OpenSSL has its own algorithm for generating random numbers, which uses /dev/urandom (by default) as the entropy source for generating the seed.

One can use the openssl command-line tool, via its rand sub-command, to produce cryptographically strong pseudo-random bytes. Here’s an example of how to use OpenSSL to generate 32 random bytes:

# openssl rand -out random.bin 32

# od -A n -t x1 random.bin
 0b ea 35 f3 d8 88 f7 b4 53 b6 08 e9 1c 7d e6 f7
 58 34 9e ee 73 a3 15 dd 67 9a e0 24 5e 2f 3e ce

OpenSSL is very flexible and provides several mechanisms for generating random numbers. It can also be used in C and C++ programs via its provided libraries. More information is available on its wiki page.

Now, how do we produce random numbers with a hardware random number generator?

Hardware random number generators on Linux

As mentioned earlier, some random number generators are implemented in hardware. If you have one in your system (and the Linux kernel is properly configured), an interface to talk to the device will be exposed at /dev/hwrng:

# ls -l /dev/hwrng
crw------- 1 root root 10, 183 fev  2 09:46 /dev/hwrng

The /dev/hwrng file can be read like a regular file to obtain random data:

# dd if=/dev/hwrng of=random.bin bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 0,0212886 s, 1,5 kB/s

# od -A n -t x1 random.bin
 5b e0 e5 f2 d9 e0 5e a9 f9 5d 32 82 2b 6b 39 f8
 46 af 82 4b 37 49 2f c5 f7 e7 c5 9c 7b 93 85 5d

However, access to /dev/hwrng is usually restricted to the root user or members of specific groups because direct access to hardware random number generators could potentially be used to weaken cryptographic systems if misused or overused. That is why direct access to HRNGs is typically done only by system services or privileged daemons.

One such example is the rngd daemon. Its main purpose is feeding randomness from a hardware random number generator into the kernel’s entropy pool, enhancing the quality and quantity of entropy available to /dev/random and /dev/urandom.

A TPM device can also provide a hardware random number generator.

Generating random numbers with a TPM device

TPM (Trusted Platform Module) is an international standard that enables trust in computing platforms in general, providing several security-related features for computer systems, including hashing, encryption, signing, random number generation, and many more!

I have written about TPMs before, and in case you want to learn more, have a look at the following article: Introduction to TPM (Trusted Platform Module).

Generating random numbers with a TPM device on Linux involves leveraging the TPM’s hardware capabilities for generating random data, which is often seen as more secure and less predictable than software-based methods.

For that, you need a TPM chip available in the hardware and a Linux kernel configured with the corresponding TPM driver and subsystem enabled. If that is the case, you should see the /dev/tpm0 device file, used to communicate with the TPM:

# ls -l /dev/tpm0
crw-rw---- 1 tss root 10, 224 fev  2 09:46 /dev/tpm0

You also need a few user-space tools to communicate with the TPM device. The most commonly used software for managing TPM devices on Linux is tpm2-tools.

To generate random numbers, one can use the tpm2_getrandom command followed by the number of random bytes. For example, to generate 32 bytes of random data:

# tpm2_getrandom 32 > random.bin

# od -A n -t x1 random.bin
 a8 82 58 56 94 8a ac 10 75 39 a8 6b 34 de 04 a2
 0e 05 58 fd 83 ef 07 ea a7 ab 83 9a 55 31 ea 50

Generating random numbers with a TPM device can be slower than using software-based methods like /dev/urandom. TPMs are generally not designed for high-speed random number generation. On the other hand, TPMs are considered more secure as they are designed to be tamper-resistant hardware. They are less susceptible to attacks compared to software-based random number generators.

The last topic I want to cover in this article is related to the quality of random numbers.

So how random is a random number?

Testing random numbers

When testing random number generators for randomness, you might run into the “Is that really random?” question.


Is ‘0100100101001011’ random? And what about ‘0000000000000000’?

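You may say the latter is not a random number, but that assumption might not be true. A truly random 16-bit generator produces ‘0000000000000000’ with exactly the same probability (1 in 65,536) as ‘0100100101001011’ or any other specific 16-bit string. A number full of zeros might not be desirable, but from a single value alone you cannot say for sure whether it is random or not.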

It is a hard problem indeed. However, there are several techniques for measuring the quality of random numbers, usually by statistical tests, e.g. analyzing the distribution of a set of data to see whether it can be described as random (patternless) or not.

And there are a few tools available on Linux that implement these tests. One such tool is dieharder, a benchmarking tool for random number generators.

To use it, you have to create a file with random data, generated by the random number generator you want to test:

# dd if=/dev/urandom of=random.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0,189949 s, 552 MB/s

And then execute dieharder on it:

# dieharder -a -g 201 -f random.dat
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|                      random.dat|  6.33e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.14455175|  PASSED  
# The file file_input_raw was rewound 4 times
      diehard_operm5|   0|   1000000|     100|0.82299029|  PASSED  
# The file file_input_raw was rewound 9 times
  diehard_rank_32x32|   0|     40000|     100|0.31016821|  PASSED  
# The file file_input_raw was rewound 11 times
    diehard_rank_6x8|   0|    100000|     100|0.19003489|  PASSED  
# The file file_input_raw was rewound 12 times
   diehard_bitstream|   0|   2097152|     100|0.99701278|   WEAK
# The file file_input_raw was rewound 20 times
        diehard_opso|   0|   2097152|     100|0.21961518|  PASSED  
# The file file_input_raw was rewound 25 times
        diehard_oqso|   0|   2097152|     100|0.00459460|   WEAK   
# The file file_input_raw was rewound 28 times
         diehard_dna|   0|   2097152|     100|0.00013666|   WEAK   
# The file file_input_raw was rewound 28 times
diehard_count_1s_str|   0|    256000|     100|0.53245578|  PASSED
[...]

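Note that a few WEAK results are not, by themselves, a sign of a bad generator. dieharder runs many tests, and some p-values will land close to 0 or 1 purely by chance, even for good random data. Also note that the input file was rewound several times, meaning 100 MiB was not enough data for the full test suite and some of it was reused, which can affect the results.

Testing the quality of a random number generator is very important because low-quality RNGs can be a weak link in the secure use of cryptography.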

There are several documented cases of random number generators being exploited in cryptographic systems, including the papers “When Good Randomness Goes Bad: Virtual Machine Reset Vulnerabilities and Hedging Deployed Cryptography” and “Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices”.

I hope you had some fun reading this article (as I had while writing it), and now you know how critical a random number generator is in any computing system.

About the author: Sergio Prado has been working with embedded systems for more than 25 years. If you want to know more about his work, please visit the About Me page or Embedded Labworks website.

Please email your comments or questions to hello at sergioprado.blog, or sign up for the newsletter to receive updates.

