PHP Random Number Generator with Normal Distribution / Bell Curve

25 October 2013

So for a project we needed some random test data, but we wanted the data to fall on a bell curve to represent real-life scenarios. I couldn’t find anything, so after some digging I found some code in Java and translated it into PHP. It seems to work great.

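function purebell($min, $max, $std_deviation, $step = 1) {
  // The bell curve is centred on the midpoint of the range
  $mean = ($max + $min) / 2;
  do {
    // Two uniform random numbers; $rand1 is kept in (0, 1] so log() never sees 0
    $rand1 = 1 - mt_rand() / (mt_getrandmax() + 1);
    $rand2 = mt_rand() / mt_getrandmax();
    // Box-Muller transform: a standard normal value (mean 0, standard deviation 1)
    $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
    // Scale and shift to the requested mean and standard deviation
    $random_number = ($gaussian_number * $std_deviation) + $mean;
    // Round to the nearest multiple of $step (e.g. .5)
    $random_number = round($random_number / $step) * $step;
    // Redraw anything outside [$min, $max]; looping instead of recursing keeps
    // $step on every retry and cannot overflow the stack
  } while ($random_number < $min || $random_number > $max);
  return $random_number;
}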
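For readers who want the math behind the $gaussian_number line: as far as I can tell it is the cosine form of the Box-Muller transform. In the notation below (my own labels, not from the post), U1 and U2 are the two uniform draws ($rand1 and $rand2), σ is $std_deviation and s is $step:

```latex
\begin{aligned}
Z   &= \sqrt{-2\ln U_1}\,\cos(2\pi U_2), && U_1 \in (0,1],\ U_2 \in [0,1] \text{ uniform and independent}\\
\mu &= \frac{\min + \max}{2} \\
X   &= s \cdot \operatorname{round}\!\left(\frac{\mu + \sigma Z}{s}\right), && \text{redrawn until } \min \le X \le \max
\end{aligned}
```

Z is a standard normal value. Because out-of-range values are redrawn, X follows a truncated normal distribution snapped to multiples of s rather than an exact normal; when σ is small relative to the range (σ = 3 on 1 to 100 puts the bounds more than 16 standard deviations from the mean) the truncation has practically no effect.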

When I ran it 10,000 times with a min of 1, a max of 100 and a standard deviation of 3, this is the distribution I got. I would say that looks like a bell curve.

[Figure: distribution of 10,000 generated values (min 1, max 100, standard deviation 3)]


20 thoughts on “PHP Random Number Generator with Normal Distribution / Bell Curve”

  • 1
    Andy on March 20, 2014

    Thanks for this, it will be very helpful in creating test data.

  • 2
    Laurent on March 26, 2014

    Hi, thanks for the code, it seems to do exactly what I need! 🙂

    But I’m still having trouble understanding how you got that graph. Could you tell me if it needs a loop or anything?

    Thanks man! 🙂

  • 3
    pitchinnate on March 26, 2014

    Glad it was helpful.

    Yes, I ran a loop that called this function 10,000 times, then took the generated numbers and put them into Excel to create the graph. You could also generate the graph with a PHP library if you wanted.
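A minimal sketch of such a loop, assuming a quick text histogram is enough in place of Excel. It calls purebell() 10,000 times with the post’s parameters and counts how often each value appears. The function is repeated here so the block runs on its own (this copy redraws in a loop rather than recursing), and tally() is a hypothetical helper, not part of the original post.

```php
<?php
// Sketch: tally purebell() results into a text histogram instead of Excel.

// Copy of purebell() so this block runs on its own; it redraws in a loop
// rather than recursing, and passes $step through on every retry.
function purebell($min, $max, $std_deviation, $step = 1) {
    $mean = ($max + $min) / 2;
    do {
        $rand1 = 1 - mt_rand() / (mt_getrandmax() + 1); // (0, 1], so log() never sees 0
        $rand2 = mt_rand() / mt_getrandmax();
        $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
        $random_number = round((($gaussian_number * $std_deviation) + $mean) / $step) * $step;
    } while ($random_number < $min || $random_number > $max);
    return $random_number;
}

// Hypothetical helper: run purebell() $runs times and count each value.
// Keys are cast to strings so fractional values such as 5.5 stay distinct.
function tally($runs, $min, $max, $std_deviation, $step = 1) {
    $counts = [];
    for ($i = 0; $i < $runs; $i++) {
        $key = (string) purebell($min, $max, $std_deviation, $step);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts, SORT_NUMERIC);
    return $counts;
}

// Same parameters as the post: 10,000 runs, min 1, max 100, std deviation 3
$counts = tally(10000, 1, 100, 3);
foreach ($counts as $value => $count) {
    // One '#' per 20 occurrences
    printf("%6s %5d %s\n", $value, $count, str_repeat('#', intdiv($count, 20)));
}
```

Calling tally(10000, 5, 15, 3.5, .5) instead reproduces the kind of half-step buckets discussed further down the thread.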

  • 4
    Georgi on May 4, 2014

    Hi Nate,

    I think there is a serious flaw in your formula. Either that or I’ve screwed up really badly. I tried using the function but I kept getting skewed results. I tried it for min = 5, max = 15, sd = 3.5. I kept getting the values skewed as if 9.5 was the high point, not 10. It also seemed to be “cut” from the right.

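    Here are the results of a sample run with 500,000 generated numbers, split into buckets of 0.5:

    5.0-5.5: 1368
    5.5-6.0: 2949
    6.0-6.5: 6069
    6.5-7.0: 11089
    7.0-7.5: 18161
    7.5-8.0: 27440
    8.0-8.5: 37074
    8.5-9.0: 45403
    9.0-9.5: 49994
    9.5-10.0: 49897
    10.0-10.5: 45445
    10.5-11.0: 36969
    11.0-11.5: 27585
    11.5-12.0: 18321
    12.0-12.5: 11070
    12.5-13.0: 6116
    13.0-13.5: 2986
    13.5-14.0: 1332
    14.0-14.5: 514
    14.5-15.0: 218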

    You don’t even need a graph to see the problem.

    After modifying those two lines like so:
    $half = ($max - $min) / 2;
    $middle = $min + $half;

    I was able to produce numbers in a good distribution:

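    5.0-5.5: 238
    5.5-6.0: 666
    6.0-6.5: 1814
    6.5-7.0: 4307
    7.0-7.5: 8888
    7.5-8.0: 16415
    8.0-8.5: 26487
    8.5-9.0: 38290
    9.0-9.5: 48544
    9.5-10.0: 54627
    10.0-10.5: 54577
    10.5-11.0: 48427
    11.0-11.5: 37939
    11.5-12.0: 26517
    12.0-12.5: 16594
    12.5-13.0: 8787
    13.0-13.5: 4107
    13.5-14.0: 1837
    14.0-14.5: 688
    14.5-15.0: 251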

    I don’t understand why the two lines that calculate the half and the middle are written the way they are, but changing them as described seems to have fixed my problem. Please let me know if I missed something important that they are supposed to do the way you’ve written them.

    Best,
    Georgi

    • 5
      Nate on May 5, 2014

      Yeah, I believe it has to do with whether you have an even or odd number of possible results. For example, in your case 5 to 15 gives 11 possible results, while 1 to 100 gives 100 possible results. I’ll update the function so that it checks for that. Thanks for the feedback.

      • 6
        Georgi on May 5, 2014

        I’m glad I helped, but I’m not sure I get your reasoning for even and odd numbers.

        $half = ($max - $min + 1) / 2;
        $middle = $min + $half - 1;

        With values of 1 and 100 it would produce $half = 50 when it should be 49.5 (100 = 1 + 49.5 + 49.5) and $middle = 50 when it should be 50.5.
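To check the arithmetic in this exchange, here is a small sketch that evaluates the three centre formulas that come up in the thread, for the ranges 1..100 and 5..15. The function names are hypothetical labels for the formulas, not code from the post.

```php
<?php
// Formula quoted in comment 6: $half = ($max - $min + 1) / 2; $middle = $min + $half - 1
function middle_plus_one($min, $max) {
    $half = ($max - $min + 1) / 2;
    return $min + $half - 1;
}

// Georgi's fix from comment 4: $half = ($max - $min) / 2; $middle = $min + $half
function middle_georgi($min, $max) {
    $half = ($max - $min) / 2;
    return $min + $half;
}

// The mean used in the current version of purebell()
function middle_mean($min, $max) {
    return ($max + $min) / 2;
}

foreach ([[1, 100], [5, 15]] as [$min, $max]) {
    printf("%d..%d: plus_one=%s georgi=%s mean=%s\n", $min, $max,
        middle_plus_one($min, $max), middle_georgi($min, $max), middle_mean($min, $max));
}
```

Algebraically, the first formula equals (min + max) / 2 - 0.5, so it sits half a unit below the true midpoint whether the number of possible results is even or odd. For 5..15 it gives 9.5, which matches the peak Georgi observed.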

        • 7
          pitchinnate on May 5, 2014

          Nope, that wasn’t it either. I just updated again: I now use the mean with round() (instead of intval()), and it is working great.
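The switch from intval() to round() matters because intval() truncates: for positive numbers it always moves a value down, which shifts the whole distribution about half a unit toward zero. A small sketch, assuming an evenly spaced grid of values in [9, 11) as a stand-in for real draws:

```php
<?php
// Evenly spaced values 9.00, 9.01, ..., 10.99 (a stand-in for random draws)
$xs = [];
for ($i = 0; $i < 200; $i++) {
    $xs[] = 9 + $i / 100;
}

function mean(array $values) {
    return array_sum($values) / count($values);
}

$truncated = array_map('intval', $xs); // intval() drops the fractional part
$rounded   = array_map('round', $xs);  // round() goes to the nearest integer

printf("input mean:  %.3f\n", mean($xs));        // 9.995
printf("intval mean: %.3f\n", mean($truncated)); // 9.500
printf("round mean:  %.3f\n", mean($rounded));   // 10.000
```

Truncation pulls the average down by roughly 0.5, while round() stays centred on the input.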

    • 8
      pitchinnate on May 5, 2014

      I also just added a new parameter, $step, so you can pass in $step = .5 and get back numbers rounded to the nearest .5 (it defaults to $step = 1).

      • 9
        Georgi on May 5, 2014

        I did a trial run of the function with 0.5 for $step and it generated only integers.

        I think this line does nothing:
        $random_number = round($random_number / $step) * $step;
        $random_number = 8 =>
        round(8 / 0.5) * 0.5 = round (16) * 0.5 = 16 * 0.5 = 8…

        • 10
          pitchinnate on May 5, 2014

          It works. If you are storing the results in an array, make sure you put quotes around the key (I had that issue the first time I tried). I did that and got the following results:

          Array
          (
          [5] => 170
          [5.5] => 210
          [6] => 303
          [6.5] => 358
          [7] => 492
          [7.5] => 464
          [8] => 607
          [8.5] => 601
          [9] => 695
          [9.5] => 654
          [10] => 821
          [10.5] => 688
          [11] => 759
          [11.5] => 617
          [12] => 588
          [12.5] => 447
          [13] => 480
          [13.5] => 328
          [14] => 304
          [14.5] => 219
          [15] => 195
          )
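The tip about quoting the key comes down to how PHP handles float array keys: a float key is cast to an int, dropping the fractional part, so 5.5 would silently share a bucket with 5 (PHP 8.1 and later also emit a deprecation notice for this). A minimal sketch with a few hand-picked values, not real output, showing the string-key approach:

```php
<?php
$values = [5.0, 5.5, 5.5, 6.0, 6.5];

$counts = [];
foreach ($values as $v) {
    // Cast to string: "5", "5.5", "6", "6.5". A plain float key would be
    // truncated to an int, merging 5.5 into 5 and 6.5 into 6.
    $key = (string) $v;
    $counts[$key] = ($counts[$key] ?? 0) + 1;
}
print_r($counts);
```

Integer-valued strings such as "5" are still stored as int keys, so whole and half steps sit side by side, as in the output above.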

          Before this line:
          $random_number = round($random_number / $step) * $step;

          $random_number is a float, so it looks something like 8.35 (with a lot more decimal places). Dividing by .5 gives 16.7, which rounds up to 17, and 17 * .5 = 8.5.

          • 11
            Georgi on May 5, 2014

            Well, I’ve run the function and it doesn’t generate anything other than integers.

            $random_number is always an integer, since you have:
            $random_number = round(($gaussian_number * $std_deviation) + $mean);
            just before
            $random_number = round($random_number / $step) * $step;
            So I would submit that any integer divided by 0.5 (i.e. multiplied by 2) is an even integer, which round() leaves unchanged, and multiplying it by 0.5 just gives back the integer you started with. The line is pointless unless you remove the round() from the previous line.
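Both comments come down to the order of operations. A sketch using a hypothetical snap_to_step() helper, which performs the same arithmetic as the $step line, shows both orders for the 8.35 example:

```php
<?php
// Hypothetical helper: snap a value to the nearest multiple of $step,
// the same arithmetic as round($random_number / $step) * $step.
function snap_to_step($value, $step) {
    return round($value / $step) * $step;
}

$raw = 8.35;

// Snapping the raw float: 16.7 rounds to 17, and 17 * 0.5 = 8.5
echo snap_to_step($raw, 0.5), "\n";        // prints 8.5

// Rounding to an integer first: round(8.35) = 8, so the step has nothing to do
echo snap_to_step(round($raw), 0.5), "\n"; // prints 8
```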

          • 12
            pitchinnate on May 5, 2014

            You’re correct. It was a copy-and-paste error: I forgot to remove the round() on the line before. Sorry for the confusion.

  • 13
    Georgi on May 4, 2014

    Forgot to mention – you might want to use mt_rand() & mt_getrandmax() instead of rand() and getrandmax().

    • 14
      Nate on May 5, 2014

      Yes, if you want better random numbers I would use those functions. However, they are much slower, so I guess it all depends on whether you are looking for speed or precision.

      • 15
        Georgi on May 5, 2014

        I’ve actually read in several places that mt_rand() uses a better algorithm and is up to 4x faster than rand(). It turns out this information is outdated, but I would not agree that it is slower than rand(). A few tests with 10 million iterations showed both functions running at the same speed, about 4 seconds, on my VPS.
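For anyone who wants to repeat this comparison, here is a rough timing sketch. time_calls() is a hypothetical helper, the iteration count is reduced to 1 million to keep it quick, and timings will vary with the machine and PHP version. Per the PHP manual, rand() has been an alias of mt_rand() since PHP 7.1, so on current versions the two should time about the same.

```php
<?php
// Hypothetical helper: call $fn $iterations times and return elapsed seconds.
function time_calls(callable $fn, $iterations) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$iterations = 1000000;
$t_rand    = time_calls('rand', $iterations);
$t_mt_rand = time_calls('mt_rand', $iterations);

printf("rand():    %.3f s\n", $t_rand);
printf("mt_rand(): %.3f s\n", $t_mt_rand);
```

Both calls go through the same variable-function dispatch, so that overhead affects the two timings equally.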

        • 16
          pitchinnate on May 5, 2014

          You’re right. I did some more digging, and according to the PHP documentation mt_rand() is faster. I went ahead and updated the code to use the mt_ functions. Thanks again.

  • 17
    Juho on October 9, 2014

    Thank you!

    As has been said, this is a great tool for creating test data!

  • 18
    ash on May 25, 2015

    Hi, can you post a sample of the code you used to generate the graph? I would like to test it on my server.

    • 19
      pitchinnate on May 26, 2015

      I just used Excel to generate the graph, no code.

  • 20
    Sal on February 11, 2016

    Great work, thanks for sharing 🙂
