PHP Random Number Generator with Normal Distribution / Bell Curve

25. October 2013

For a project we needed some random test data, but we wanted the data to follow a bell curve to represent real-life scenarios. I couldn’t find anything, so after some digging I found some code in Java and translated it into PHP. It seems to work great.

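function purebell($min, $max, $std_deviation, $step = 1) {
    // Two uniform random numbers between 0 and 1
    $rand1 = (float)mt_rand() / (float)mt_getrandmax();
    $rand2 = (float)mt_rand() / (float)mt_getrandmax();
    // Box-Muller transform: turns them into one standard normal value
    $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
    $mean = ($max + $min) / 2;
    // Scale by the standard deviation and center on the mean
    $random_number = ($gaussian_number * $std_deviation) + $mean;
    // Snap to the nearest multiple of $step
    $random_number = round($random_number / $step) * $step;
    // Out of range: try again (pass $step along so the retry uses the same step)
    if ($random_number < $min || $random_number > $max) {
        $random_number = purebell($min, $max, $std_deviation, $step);
    }
    return $random_number;
}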

When I ran it 10,000 times with a min of 1, a max of 100, and a standard deviation of 3, I plotted the distribution I got, and I would say it looks like a bell curve.

20 thoughts on “PHP Random Number Generator with Normal Distribution / Bell Curve”

• Andy on March 20, 2014

Thanks for this it will be very helpful in creating test data.

• Laurent on March 26, 2014

Hi, thanks for the code, it seems to do exactly what I need! 🙂

But I’m still having trouble understanding how you got that graph… Could you tell me if it needs a loop or anything?

Thanks man! 🙂

• pitchinnate on March 26, 2014

Yes, I ran a loop that called this function 10,000 times, then took the numbers it generated and put them into Excel to produce the graph. You could generate the graph with a PHP library if you wanted.
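For anyone else wondering about the loop, a minimal sketch might look like the following. It repeats the post's purebell() so the snippet runs on its own; the helper name bell_histogram() and the fixed seed are made up for illustration and are not part of the original code.

```php
<?php
// Copy of the post's purebell() so this snippet is self-contained.
function purebell($min, $max, $std_deviation, $step = 1) {
    $rand1 = (float)mt_rand() / (float)mt_getrandmax();
    $rand2 = (float)mt_rand() / (float)mt_getrandmax();
    $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
    $mean = ($max + $min) / 2;
    $random_number = ($gaussian_number * $std_deviation) + $mean;
    $random_number = round($random_number / $step) * $step;
    if ($random_number < $min || $random_number > $max) {
        $random_number = purebell($min, $max, $std_deviation, $step);
    }
    return $random_number;
}

// Hypothetical helper: call purebell() $runs times and count each value.
function bell_histogram($runs, $min, $max, $std_deviation, $step = 1) {
    $counts = array();
    for ($i = 0; $i < $runs; $i++) {
        // Cast to string so fractional values are not truncated as array keys.
        $key = (string)purebell($min, $max, $std_deviation, $step);
        $counts[$key] = isset($counts[$key]) ? $counts[$key] + 1 : 1;
    }
    ksort($counts, SORT_NUMERIC); // order the buckets by value
    return $counts;
}

mt_srand(42); // fixed seed, only so repeated runs are comparable
$counts = bell_histogram(10000, 1, 100, 3);
foreach ($counts as $value => $count) {
    echo $value . "\t" . $count . "\n"; // tab-separated, pastes into a spreadsheet
}
```

The tab-separated output can be pasted into Excel (or any spreadsheet) and charted as a column graph.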

• Georgi on May 4, 2014

Hi Nate,

I think there is a serious flaw in your formula. Either that or I’ve screwed up really badly. I tried using the function but I kept getting skewed results. I tried it for min = 5, max = 15, sd = 3.5. I kept getting the values skewed as if 9.5 was the high point, not 10. It also seemed to be “cut” from the right.

Here are the results of a sample run with 500 000 generated numbers, split in buckets of 0.5 (5.0-5.5, 5.5-6.0, 6.0-6.5, etc.):

1368
2949
6069
11089
18161
27440
37074
45403
49994
49897
45445
36969
27585
18321
11070
6116
2986
1332
514
218

You don’t even need a graph to see the problem.

After modifying those two rows like so:
$half = ($max - $min) / 2;
$middle = $min + $half;

I was able to produce numbers in a good distribution:

238
666
1814
4307
8888
16415
26487
38290
48544
54627
54577
48427
37939
26517
16594
8787
4107
1837
688
251

I failed to understand why those two rows that calculate the half and the middle are the way you made them but fixing them like I describe seems to have fixed my problem. Please, let me know if I missed something important they are supposed to do in the way you’ve written them.

Best,
Georgi

• Nate on May 5, 2014

Yeah, I believe it has to do with whether you have an even or odd number of possible results. For example, in your case 5 to 15 = 11 possible results, while 1 to 100 = 100 possible results. I’ll update the function so that it checks for that. Thanks for the feedback.

• Georgi on May 5, 2014

I’m glad I helped, but I’m not sure I get your reasoning for even and odd numbers.

$half = ($max - $min + 1) / 2;
$middle = $min + $half - 1;

With values of 1 and 100 it would produce $half = 50 when it should be 49.5 (100 = 1 + 49.5 + 49.5) and $middle = 50 when it should be 50.5.
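The two midpoint formulas can be compared directly. This is only a sketch; the function names middle_plain() and middle_plus_one() are made up to label the plain midpoint and the "+ 1 / - 1" variant quoted above.

```php
<?php
// Hypothetical helper names, used only to compare the two formulas.
function middle_plain($min, $max) {
    $half = ($max - $min) / 2;
    return $min + $half;
}

function middle_plus_one($min, $max) {
    $half = ($max - $min + 1) / 2;
    return $min + $half - 1;
}

echo middle_plain(1, 100) . "\n";    // 50.5
echo middle_plus_one(1, 100) . "\n"; // 50
echo middle_plain(5, 15) . "\n";     // 10
echo middle_plus_one(5, 15) . "\n";  // 9.5
```

For min = 5 and max = 15 the "+ 1 / - 1" variant centers on 9.5 rather than 10, which would be consistent with the skew reported in the first sample run.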

• pitchinnate on May 5, 2014

Nope, that wasn’t it either. I just updated it again: I now use the mean and round() (instead of intval), and it is working great.

• pitchinnate on May 5, 2014

I also just added a new input variable for the step, so you can send in $step = .5 and it will give you back numbers rounded to the nearest .5 (defaults to $step = 1).

• Georgi on May 5, 2014

I did a trial run of the function with 0.5 for $step and it generated only integers.

I think this line does nothing:
$random_number = round($random_number / $step) * $step;
With $random_number = 8:
round(8 / 0.5) * 0.5 = round(16) * 0.5 = 16 * 0.5 = 8…

• pitchinnate on May 5, 2014

It works. If you are storing the results in an array, make sure you use quotes around the key (I had that issue the first time I tried). I did it and got the following results:

Array
(
[5] => 170
[5.5] => 210
[6] => 303
[6.5] => 358
[7] => 492
[7.5] => 464
[8] => 607
[8.5] => 601
[9] => 695
[9.5] => 654
[10] => 821
[10.5] => 688
[11] => 759
[11.5] => 617
[12] => 588
[12.5] => 447
[13] => 480
[13.5] => 328
[14] => 304
[14.5] => 219
[15] => 195
)

Before this line:
$random_number = round($random_number / $step) * $step;

$random_number is a float, so it would look something like 8.35 (with a lot more decimal places, though). When you divide by .5 you get 16.7, which rounds up to 17. Then 17 * .5 = 8.5.
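The quoting issue comes from how PHP treats array keys: a float key is truncated to an integer, so 5.5 and 5 land in the same slot unless the key is turned into a string first. A small sketch with made-up sample values and variable names:

```php
<?php
$values = array(5.0, 5.5, 5.5, 6.0); // sample outputs, as if $step = 0.5

$float_keys = array();
$string_keys = array();
foreach ($values as $v) {
    // Float key: PHP truncates 5.5 to 5, merging it with 5.0.
    // (Recent PHP versions also emit a deprecation notice here.)
    $float_keys[$v] = isset($float_keys[$v]) ? $float_keys[$v] + 1 : 1;

    // String key: "5.5" is kept as its own bucket.
    $k = (string)$v;
    $string_keys[$k] = isset($string_keys[$k]) ? $string_keys[$k] + 1 : 1;
}

print_r($float_keys);  // buckets 5 and 6 only
print_r($string_keys); // buckets 5, 5.5 and 6
```

With float keys the half-step values silently disappear into their integer neighbors, which would make a $step = .5 run look like it produced only whole numbers.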

• Georgi on May 5, 2014

Well, I’ve run the function and it doesn’t generate anything other than integers.

$random_number is always an integer, since you have:
$random_number = round(($gaussian_number * $std_deviation) + $mean);
just before
$random_number = round($random_number / $step) * $step;
So I would submit that any integer divided by 0.5 (multiplied by 2) will always result in an even integer. And then dividing that integer by 2… the line is pointless unless you remove the “round” function from the previous line.
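This is easy to confirm with a fixed input. The sketch below applies the step line to the same raw value with and without the extra round(); the value 8.35 is borrowed from the earlier comment, and the helper name snap_to_step() is made up for illustration.

```php
<?php
// Hypothetical helper wrapping the step line from the post.
function snap_to_step($number, $step) {
    return round($number / $step) * $step;
}

$raw = 8.35; // example unrounded value

$with_extra_round = snap_to_step(round($raw), 0.5); // round() first, then step
$without_round    = snap_to_step($raw, 0.5);        // step only

echo $with_extra_round . "\n"; // 8
echo $without_round . "\n";    // 8.5
```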

• pitchinnate on May 5, 2014

You’re correct, sorry. It was a copy-and-paste error: I forgot to remove the round() on the line before. Sorry for the confusion.

• Georgi on May 4, 2014

Forgot to mention – you might want to use mt_rand() & mt_getrandmax() instead of rand() and getrandmax().

• Nate on May 5, 2014

Yes, if you want better random numbers I would use those functions. However, they are much slower, so I guess it depends on whether you are looking for speed or precision.

• Georgi on May 5, 2014

I’ve actually read in several places that mt_rand uses a better algorithm and is up to 4x faster than rand. It turns out this information is outdated, but I would not agree that it is slower than rand(). Running a few tests with 10 million iterations showed both functions running at the same speed, about 4 seconds on my VPS.
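A rough way to repeat this comparison is a timing loop like the one below. It is only a sketch: the helper name time_calls() and the iteration count are arbitrary, and results depend on the machine and PHP version. In PHP 7.1 and later rand() is an alias of mt_rand(), so the two should perform the same there.

```php
<?php
// Hypothetical helper: time $iterations calls of a zero-argument function.
function time_calls($fn, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start; // elapsed seconds as a float
}

$iterations = 1000000; // smaller than the 10 million above, to keep it quick
printf("rand():    %.3f s\n", time_calls('rand', $iterations));
printf("mt_rand(): %.3f s\n", time_calls('mt_rand', $iterations));
```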

• pitchinnate on May 5, 2014

You’re right. I did some more digging, and according to the PHP documentation mt_rand() is faster. I went ahead and updated the function to use the mt_ functions. Thanks again.

• Juho on October 9, 2014

Thank you!

As has been said, this is a great tool for creating test data!

• ash on May 25, 2015

Hi, can you post a sample of the code you used to generate the graph? I would like to test it on my server.

• pitchinnate on May 26, 2015

I just used Excel to generate the graph, no code.

• Sal on February 11, 2016

Great work, thanks for sharing 🙂