# PHP Random Number Generator with Normal Distribution / Bell Curve

For a project we needed some random test data, but we wanted the data to follow a bell curve to represent real-life scenarios. I couldn’t find anything ready-made, so after some digging I found some code in Java and translated it into PHP. It seems to work great.

    function purebell($min, $max, $std_deviation, $step = 1) {
        // Two uniform random numbers in [0, 1]
        $rand1 = (float) mt_rand() / (float) mt_getrandmax();
        $rand2 = (float) mt_rand() / (float) mt_getrandmax();
        // Box-Muller transform: turns them into a standard normal value
        $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
        $mean = ($max + $min) / 2;
        $random_number = ($gaussian_number * $std_deviation) + $mean;
        // Snap to the nearest multiple of $step
        $random_number = round($random_number / $step) * $step;
        // Out of range: draw again, keeping the same $step
        if ($random_number < $min || $random_number > $max) {
            $random_number = purebell($min, $max, $std_deviation, $step);
        }
        return $random_number;
    }

When I ran it 10,000 times with a min of 1, a max of 100, and a standard deviation of 3, here is the distribution I got. I would say that looks like a bell curve.

Thanks for this it will be very helpful in creating test data.

Hi, thanks for the code, it seems to do exactly what I need ! 🙂

But I’m still having trouble understanding how you got that graph … Could you tell me if it needs a loop or anything ?

Thanks man ! 🙂

Glad it was helpful.

Yes, I ran a loop that called this function 10,000 times, then took the generated numbers and put them into Excel to generate the graph. You could also generate the graph with a PHP library if you wanted.
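For anyone who wants to reproduce that loop, here is a minimal sketch. It repeats the function from the post so it runs on its own; `bell_histogram()` is a hypothetical helper name, not something from the original code, and the tab-separated output is just one convenient format for pasting into a spreadsheet.

```php
<?php
// The function from the post, repeated so this sketch is self-contained.
function purebell($min, $max, $std_deviation, $step = 1) {
    $rand1 = mt_rand() / mt_getrandmax();
    $rand2 = mt_rand() / mt_getrandmax();
    $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
    $mean = ($max + $min) / 2;
    $random_number = round(($gaussian_number * $std_deviation + $mean) / $step) * $step;
    if ($random_number < $min || $random_number > $max) {
        $random_number = purebell($min, $max, $std_deviation, $step);
    }
    return $random_number;
}

// Hypothetical helper: call purebell() $runs times and count each value.
function bell_histogram($runs, $min, $max, $std_deviation, $step = 1) {
    $counts = array();
    for ($i = 0; $i < $runs; $i++) {
        // String key, so fractional steps are not truncated to integers.
        $key = (string) purebell($min, $max, $std_deviation, $step);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts, SORT_NUMERIC);
    return $counts;
}

// Same settings as in the post: 10,000 runs, min 1, max 100, sd 3.
$counts = bell_histogram(10000, 1, 100, 3);
foreach ($counts as $value => $count) {
    echo $value, "\t", $count, "\n";
}
```

Each output line is a value and how often it came up, which can be charted as a column graph.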

Hi Nate,

I think there is a serious flaw in your formula. Either that or I’ve screwed up really badly. I tried using the function but I kept getting skewed results. I tried it for min = 5, max = 15, sd = 3.5. I kept getting the values skewed as if 9.5 was the high point, not 10. It also seemed to be “cut” from the right.

Here are the results of a sample run with 500 000 generated numbers, split in buckets of 0.5 (5.0-5.5, 5.5-6.0, 6.0-6.5, etc.):

1368, 2949, 6069, 11089, 18161, 27440, 37074, 45403, 49994, 49897, 45445, 36969, 27585, 18321, 11070, 6116, 2986, 1332, 514, 218

You don’t even need a graph to see the problem.

After modifying those two rows like so:

$half = ($max - $min) / 2;

$middle = $min + $half;

I was able to produce numbers in a good distribution:

238, 666, 1814, 4307, 8888, 16415, 26487, 38290, 48544, 54627, 54577, 48427, 37939, 26517, 16594, 8787, 4107, 1837, 688, 251

I failed to understand why those two rows that calculate the half and the middle are the way you made them but fixing them like I describe seems to have fixed my problem. Please, let me know if I missed something important they are supposed to do in the way you’ve written them.

Best,

Georgi

Yeah, I believe it has to do with whether you have an even or odd number of possible results. For example, in your case 5 to 15 gives 11 possible results, while 1 to 100 gives 100 possible results. I’ll update the function so that it checks for that. Thanks for the feedback.

I’m glad I helped, but I’m not sure I get your reasoning for even and odd numbers.

$half = ($max - $min + 1) / 2;

$middle = $min + $half - 1;

With values of 1 and 100 it would produce $half = 50 when it should be 49.5 (100 = 1 + 49.5 + 49.5) and $middle = 50 when it should be 50.5.
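The two ways of computing the midpoint can be compared directly. This is only a sketch: `middle_plus_one()` and `middle_plain()` are hypothetical names for the two variants quoted in this thread. For 1 and 100 the "+ 1 / - 1" form evaluates to 1 + 50 - 1 = 50, half a unit below the true midpoint of 50.5, and for 5 and 15 it gives 9.5 instead of 10, which matches the shifted peak reported above.

```php
<?php
// Variant with the "+ 1" and "- 1" adjustments quoted in the thread.
function middle_plus_one($min, $max) {
    $half = ($max - $min + 1) / 2;
    return $min + $half - 1;
}

// Plain midpoint, equivalent to ($min + $max) / 2.
function middle_plain($min, $max) {
    $half = ($max - $min) / 2;
    return $min + $half;
}

// The two ranges discussed in the comments.
foreach (array(array(1, 100), array(5, 15)) as $range) {
    list($min, $max) = $range;
    printf(
        "%d..%d: plus-one = %.1f, plain = %.1f\n",
        $min,
        $max,
        middle_plus_one($min, $max),
        middle_plain($min, $max)
    );
}
```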

Nope, that wasn’t it either; I just updated it again. I now use the mean and round() (instead of intval()), and it is working great.

I also just added a new input variable for the step so you can send in $step = .5 and it will give you back the numbers rounded off to the closest .5 value (defaults to $step = 1).

I did a trial run of the function with 0.5 for $step and it generated only integers.

I think this line does nothing:

$random_number = round($random_number / $step) * $step;

Take $random_number = 8: round(8 / 0.5) * 0.5 = round(16) * 0.5 = 16 * 0.5 = 8.

It works. If you are storing the counts in an array, make sure you use quotes around the key so that it is a string; otherwise PHP truncates a float key such as 5.5 to the integer 5 (I had that issue the first time I tried). I did that and got the following results:

    Array
    (
        [5] => 170
        [5.5] => 210
        [6] => 303
        [6.5] => 358
        [7] => 492
        [7.5] => 464
        [8] => 607
        [8.5] => 601
        [9] => 695
        [9.5] => 654
        [10] => 821
        [10.5] => 688
        [11] => 759
        [11.5] => 617
        [12] => 588
        [12.5] => 447
        [13] => 480
        [13.5] => 328
        [14] => 304
        [14.5] => 219
        [15] => 195
    )
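The array-key issue mentioned above comes from how PHP handles keys: a float used as an array key is truncated to an integer, so 5.5 would be counted under 5. Here is a small sketch of the string-key approach; the sample values are made up for illustration and are not output from the function.

```php
<?php
// Made-up sample values, as the generator might return them with $step = 0.5.
$samples = array(5.5, 5.0, 5.5, 6.0, 5.5);

$counts = array();
foreach ($samples as $value) {
    // A float key such as 5.5 would be truncated to the integer key 5,
    // so cast to string first to keep the half steps apart.
    $key = (string) $value;
    $counts[$key] = ($counts[$key] ?? 0) + 1;
}

print_r($counts);
```

Whole numbers such as "5" still end up as integer keys, which is why they print without a decimal part.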

Before this line:

$random_number = round($random_number / $step) * $step;

$random_number is a float, so it would look something like 8.35 (with a lot more decimal places, though). When you divide by 0.5 you get 16.7, which rounds up to 17. Then 17 * 0.5 = 8.5.
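That rounding step can be tried in isolation. A minimal sketch, where `snap_to_step()` is a hypothetical helper wrapping the same expression and the input values are arbitrary examples:

```php
<?php
// Hypothetical helper: snap $value to the nearest multiple of $step.
function snap_to_step($value, $step) {
    return round($value / $step) * $step;
}

// Arbitrary example inputs, including the 8.35 used in the explanation.
foreach (array(8.35, 8.2, 8.0, 9.8) as $value) {
    printf("%.2f -> %.1f\n", $value, snap_to_step($value, 0.5));
}
```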

Well, I’ve run the function and it doesn’t generate anything other than integers.

$random_number is always an integer, since you have:

$random_number = round(($gaussian_number * $std_deviation) + $mean);

just before

$random_number = round($random_number / $step) * $step;

So I would submit that any integer divided by 0.5 (multiplied by 2) will always result in an even integer. And then dividing that integer by 2… the line is pointless unless you remove the “round” function from the previous line.
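The effect of that extra round() is easy to see with a single value. This is just an illustrative sketch; 8.35 is an arbitrary example of an unrounded result, and the variable names are made up.

```php
<?php
$step = 0.5;
$raw = 8.35; // arbitrary example of an unrounded value

// With the extra round() first: the value is already an integer,
// so the step rounding has nothing left to do.
$with_extra_round = round(round($raw) / $step) * $step;

// Without it: the step rounding sees the fractional value.
$without_extra_round = round($raw / $step) * $step;

var_dump($with_extra_round);    // float(8)
var_dump($without_extra_round); // float(8.5)
```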

You’re correct, sorry. It was a copy-and-paste error: I forgot to remove the round() on the line before. Sorry for the confusion.

Forgot to mention – you might want to use mt_rand() & mt_getrandmax() instead of rand() and getrandmax().

Yes, if you want better random numbers I would use those functions. However, they are much slower, so I guess it all depends on whether you are looking for speed or precision.

I’ve actually read in several places that mt_rand() uses a better algorithm and is up to 4x faster than rand(). It turns out this information is outdated, but I would not agree that it is slower than rand(). Running a few tests with 10 million iterations showed both functions running at the same speed, about 4 seconds on my VPS.
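A rough way to repeat that kind of timing test is sketched below. `time_calls()` is a hypothetical helper, the iteration count is reduced from the 10 million mentioned above so it finishes quickly, and the numbers will vary by machine and PHP version. Note that, as far as I know, since PHP 7.1 rand() is an alias of mt_rand(), so on current versions the two should time about the same.

```php
<?php
// Hypothetical helper: time $iterations calls of a zero-argument callable.
function time_calls(callable $fn, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$iterations = 1000000; // the comment above used 10 million
printf("rand():    %.3f s\n", time_calls('rand', $iterations));
printf("mt_rand(): %.3f s\n", time_calls('mt_rand', $iterations));
```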

You’re right. I did some more digging, and according to the PHP documentation mt_rand() is faster. I went ahead and updated the function to use the mt_ functions. Thanks again.

Thank you!

As has been said, this is a great tool for creating test data!

Hi, can you post a sample of the code you used to generate the graph? I would like to test it on my server.

I just used Excel to generate the graph, no code.

Great work, thanks for sharing 🙂