PHP Random Number Generator with Normal Distribution / Bell Curve

25 October 2013 | Blog | 27 comments

For a project we needed some random test data, but we wanted the data to follow a bell curve so it would represent real-life scenarios. I couldn’t find anything ready-made, but after some digging I found some code in Java and translated it into PHP. It seems to work great.

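function purebell($min, $max, $std_deviation, $step = 1) {
  // Two uniform samples; 1 is added to the first so it lies in (0, 1] and log() never receives 0
  $rand1 = ((float)mt_rand() + 1) / ((float)mt_getrandmax() + 1);
  $rand2 = (float)mt_rand() / (float)mt_getrandmax();
  // Box-Muller transform: turns the two uniform samples into one standard normal sample
  $gaussian_number = sqrt(-2 * log($rand1)) * cos(2 * M_PI * $rand2);
  // Center the curve on the middle of the range and scale it by the standard deviation
  $mean = ($max + $min) / 2;
  $random_number = ($gaussian_number * $std_deviation) + $mean;
  // Round to the nearest multiple of $step (e.g. 1 or 0.5)
  $random_number = round($random_number / $step) * $step;
  // Out of range: draw again, passing $step through so the rounding is kept
  if($random_number < $min || $random_number > $max) {
    $random_number = purebell($min, $max, $std_deviation, $step);
  }
  return $random_number;
}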
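For reference, the line that computes $gaussian_number is the Box-Muller transform. Written out, with $rand1 and $rand2 as the uniform draws U_1 and U_2, $mean as μ, $std_deviation as σ and $step as s:

$$Z = \sqrt{-2\ln U_1}\,\cos(2\pi U_2) \sim \mathcal{N}(0,1), \qquad U_1 \in (0,1],\; U_2 \in [0,1]$$

$$X = \mu + \sigma Z, \qquad \mu = \frac{\min + \max}{2}, \qquad X_s = s \cdot \operatorname{round}\!\left(\frac{X}{s}\right)$$

Any X_s outside [min, max] is discarded and redrawn, so strictly speaking the output follows a truncated, discretized normal distribution. For min = 1, max = 100 and σ = 3, the mean is μ = 50.5 and roughly 68% of draws land within one σ of it (about 47.5 to 53.5 before rounding); the cut-offs at 1 and 100 are more than 16σ away and have no visible effect.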

When I ran it 10,000 times with a min of 1, a max of 100 and a standard deviation of 3, here is the distribution I got. I would say that looks like a bell curve.

[Image: distribution of the 10,000 generated values]
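The graph above was made in Excel. If you would rather check the shape without leaving PHP, here is a minimal sketch that draws 10,000 values with the same settings (min 1, max 100, standard deviation 3) and prints a crude text histogram. bell_int() and bell_histogram() are hypothetical helper names; bell_int() re-implements the same Box-Muller draw as purebell() with $step = 1 so the block runs on its own, and you can swap in purebell() if it is already loaded.

```php
<?php
// Hypothetical helper: one Box-Muller draw, rounded to the nearest integer and
// redrawn until it falls inside [$min, $max] (same logic as purebell with $step = 1).
function bell_int(int $min, int $max, float $sd): int {
    $mean = ($min + $max) / 2;
    do {
        $u1 = (mt_rand() + 1) / (mt_getrandmax() + 1); // (0, 1], never 0
        $u2 = mt_rand() / mt_getrandmax();              // [0, 1]
        $z = sqrt(-2 * log($u1)) * cos(2 * M_PI * $u2);
        $x = (int) round($mean + $sd * $z);
    } while ($x < $min || $x > $max);
    return $x;
}

// Tally $n draws into one bucket per integer value from $min to $max.
function bell_histogram(int $min, int $max, float $sd, int $n): array {
    $counts = array_fill($min, $max - $min + 1, 0);
    for ($i = 0; $i < $n; $i++) {
        $counts[bell_int($min, $max, $sd)]++;
    }
    return $counts;
}

$counts = bell_histogram(1, 100, 3.0, 10000);

// Print the non-empty buckets as a text bar chart (one '#' per 20 hits).
foreach ($counts as $value => $count) {
    if ($count > 0) {
        printf("%3d %5d %s\n", $value, $count, str_repeat('#', intdiv($count, 20)));
    }
}
```

With a standard deviation of 3 almost every value falls between about 40 and 61, so only that stretch of the 1 to 100 range shows up in the output.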

Update, with credit to Stefan Muth for his comment below: he made some tweaks, split the function into integer and float versions, and added a $mid parameter that lets you skew the results alongside the standard deviation. Here is the code he created:

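// Generate a random integer between $min and $max on a bell curve
function intBell($min, $max, $mid=0.5, $std_deviation=1) {
  if ($min == $max) return $max;
  $n = floatBell($min, $max, $mid, $std_deviation);
  // Stretch [$min, $max] onto [$min, $max + 1) so every integer gets an equal slice;
  // min() guards the rare case $n == $max, which would otherwise give $max + 1
  return (int) min($max, floor((($n - $min) * ($max - $min + 1) / ($max - $min)) + $min));
}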

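// Generate a random float between $min and $max on a bell curve
// - Increasing $std_deviation flattens it
// - Increasing $mid shifts the peak to the right (0.5 = centered)
function floatBell($min, $max, $mid=0.5, $std_deviation=1) {
  if ($min == $max) return $max;
  $d = (float)mt_getrandmax();
  // Keep $r1 in (0, 1] so log($r1) never receives 0
  $r1 = ((float)mt_rand() + 1) / ($d + 1);
  $r2 = (float)mt_rand() / $d;
  // Box-Muller transform: one standard normal sample from two uniform ones
  $gaussian_number = sqrt(-2 * log($r1)) * cos(2 * M_PI * $r2);
  // Put the peak $mid of the way from $min to $max (equals ($max + $min) / 2 when $mid = 0.5)
  $mean = $min + ($max - $min) * $mid;
  $random_number = ($gaussian_number * $std_deviation) + $mean;
  // Reject values outside [$min, $max] and draw again
  if($random_number < $min || $random_number > $max) {
    $random_number = floatBell($min, $max, $mid, $std_deviation);
  }
  return $random_number;
}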

27 thoughts on “PHP Random Number Generator with Normal Distribution / Bell Curve”

  • 1
    Andy on March 20, 2014

    Thanks for this, it will be very helpful in creating test data.

  • 2
    Laurent on March 26, 2014

    Hi, thanks for the code, it seems to do exactly what I need! 🙂

    But I’m still having trouble understanding how you got that graph… Could you tell me if it needs a loop or anything?

    Thanks man ! 🙂

  • 3
    pitchinnate on March 26, 2014

    Glad it was helpful.

    Yes, I ran the function 10,000 times in a loop, then put the generated numbers into Excel to make the graph. You could also generate the graph with a PHP library if you wanted.

  • 4
    Georgi on May 4, 2014

    Hi Nate,

    I think there is a serious flaw in your formula. Either that or I’ve screwed up really badly. I tried using the function but I kept getting skewed results. I tried it for min = 5, max = 15, sd = 3.5. I kept getting the values skewed as if 9.5 was the high point, not 10. It also seemed to be “cut” from the right.

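    Here are the results of a sample run with 500,000 generated numbers, split into buckets of 0.5:

    5.0-5.5: 1368
    5.5-6.0: 2949
    6.0-6.5: 6069
    6.5-7.0: 11089
    7.0-7.5: 18161
    7.5-8.0: 27440
    8.0-8.5: 37074
    8.5-9.0: 45403
    9.0-9.5: 49994
    9.5-10.0: 49897
    10.0-10.5: 45445
    10.5-11.0: 36969
    11.0-11.5: 27585
    11.5-12.0: 18321
    12.0-12.5: 11070
    12.5-13.0: 6116
    13.0-13.5: 2986
    13.5-14.0: 1332
    14.0-14.5: 514
    14.5-15.0: 218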

    You don’t even need a graph to see the problem.

    After changing those two lines like so:
    $half = ($max - $min) / 2;
    $middle = $min + $half;

    I was able to produce numbers in a good distribution:

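    5.0-5.5: 238
    5.5-6.0: 666
    6.0-6.5: 1814
    6.5-7.0: 4307
    7.0-7.5: 8888
    7.5-8.0: 16415
    8.0-8.5: 26487
    8.5-9.0: 38290
    9.0-9.5: 48544
    9.5-10.0: 54627
    10.0-10.5: 54577
    10.5-11.0: 48427
    11.0-11.5: 37939
    11.5-12.0: 26517
    12.0-12.5: 16594
    12.5-13.0: 8787
    13.0-13.5: 4107
    13.5-14.0: 1837
    14.0-14.5: 688
    14.5-15.0: 251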

    I don’t understand why the two lines that calculate the half and the middle are written the way you made them, but changing them as described seems to have fixed my problem. Please let me know if I missed something important they are supposed to do the way you’ve written them.

    Best,
    Georgi

    • 5
      Nate on May 5, 2014

      Yeah, I believe it has to do with whether you have an even or odd number of possible results. For example, in your case 5 to 15 gives 11 possible results, while 1 to 100 gives 100. I’ll update the function so that it checks for that. Thanks for the feedback.

      • 6
        Georgi on May 5, 2014

        I’m glad I helped, but I’m not sure I get your reasoning for even and odd numbers.

        $half = ($max - $min + 1) / 2;
        $middle = $min + $half - 1;

        With values of 1 and 100 it would produce $half = 50 when it should be 49.5 (100 = 1 + 49.5 + 49.5), and $middle = 50 when it should be 50.5.
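        The arithmetic is easy to check in code. This sketch compares the two lines quoted above, Georgi’s earlier fix, and the plain mean that the function ended up using; the three function names are hypothetical. If the quoted lines are what the function used at the time, they put the center of 5 to 15 at 9.5, which matches the peak Georgi observed.

```php
<?php
// The two lines quoted in the comment above.
function midpoint_quoted(float $min, float $max): float {
    $half = ($max - $min + 1) / 2;
    return $min + $half - 1;
}

// Georgi's earlier fix.
function midpoint_fixed(float $min, float $max): float {
    $half = ($max - $min) / 2;
    return $min + $half;
}

// The plain mean.
function midpoint_mean(float $min, float $max): float {
    return ($max + $min) / 2;
}

foreach ([[1, 100], [5, 15]] as [$min, $max]) {
    printf("%d..%d  quoted: %.1f  fixed: %.1f  mean: %.1f\n",
        $min, $max, midpoint_quoted($min, $max), midpoint_fixed($min, $max), midpoint_mean($min, $max));
}
// 1..100  quoted: 50.0  fixed: 50.5  mean: 50.5
// 5..15  quoted: 9.5  fixed: 10.0  mean: 10.0
```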

        • 7
          pitchinnate on May 5, 2014

          Nope, that wasn’t it either, so I updated again. I now just use the mean and round() (instead of intval()), and it is working great.

    • 8
      pitchinnate on May 5, 2014

      I also just added a new $step parameter, so you can pass in $step = .5 and get the numbers back rounded to the nearest .5 (it defaults to $step = 1).

      • 9
        Georgi on May 5, 2014

        I did a trial run of the function with 0.5 for $step and it generated only integers.

        I think this line does nothing:
        $random_number = round($random_number / $step) * $step;
        For $random_number = 8:
        round(8 / 0.5) * 0.5 = round(16) * 0.5 = 16 * 0.5 = 8…

        • 10
          pitchinnate on May 5, 2014

          It works. If you are storing the results as array keys, make sure you put quotes around the key (I had that issue the first time I tried), because PHP truncates a float key such as 8.5 to the integer 8. I did that and got the following results:

          Array
          (
          [5] => 170
          [5.5] => 210
          [6] => 303
          [6.5] => 358
          [7] => 492
          [7.5] => 464
          [8] => 607
          [8.5] => 601
          [9] => 695
          [9.5] => 654
          [10] => 821
          [10.5] => 688
          [11] => 759
          [11.5] => 617
          [12] => 588
          [12.5] => 447
          [13] => 480
          [13.5] => 328
          [14] => 304
          [14.5] => 219
          [15] => 195
          )

          Before this line:
          $random_number = round($random_number / $step) * $step;

          $random_number is a float, so it looks something like 8.35 (with a lot more decimal places). Dividing by .5 gives 16.7, which rounds up to 17, and 17 * .5 = 8.5.
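          A minimal sketch of the tallying step pitchinnate describes, assuming the values have already been snapped to 0.5 steps. PHP array keys can only be integers or strings, so a float key such as 8.5 is truncated to 8 (PHP 8.1 and later also emit a deprecation notice); converting each value to a string first keeps the half steps apart. tally() is a hypothetical helper name.

```php
<?php
// Count how often each value occurs, keyed by the value itself.
function tally(array $values): array {
    $counts = [];
    foreach ($values as $v) {
        // 8.5 -> "8.5" (string key); 8.0 -> "8", which PHP stores as the int key 8
        $key = (string) $v;
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts, SORT_NUMERIC); // order buckets by numeric value
    return $counts;
}

print_r(tally([8.0, 8.5, 9.0, 8.5, 7.5]));
```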

          • 11
            Georgi on May 5, 2014

            Well, I’ve run the function and it doesn’t generate anything other than integers.

            $random_number is always an integer, since you have:
            $random_number = round(($gaussian_number * $std_deviation) + $mean);
            just before
            $random_number = round($random_number / $step) * $step;
            So I would submit that dividing any integer by 0.5 (i.e. multiplying it by 2) always gives an even integer, and multiplying that by 0.5 just gives back the original integer. The line is pointless unless you remove the “round” function from the previous line.

          • 12
            pitchinnate on May 5, 2014

            You’re correct, sorry. It was a copy-and-paste error: I forgot to remove the round() on the line before. Sorry for the confusion.
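            The difference Georgi spotted is easy to reproduce. In this sketch, snap_to_step() is a hypothetical helper that performs the same rounding as the $step line in purebell(); calling round() on the value first throws away the fraction, so the half steps can never appear.

```php
<?php
// Snap a value to the nearest multiple of $step, like the $step line in purebell().
function snap_to_step(float $value, float $step): float {
    return round($value / $step) * $step;
}

$raw = 8.35; // an unrounded Gaussian sample

echo snap_to_step($raw, 0.5), "\n";        // prints 8.5 (8.35 / 0.5 = 16.7 -> 17 -> 8.5)
echo snap_to_step(round($raw), 0.5), "\n"; // prints 8   (8 / 0.5 = 16 -> 16 -> 8)
```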

  • 13
    Georgi on May 4, 2014

    Forgot to mention: you might want to use mt_rand() and mt_getrandmax() instead of rand() and getrandmax().

    • 14
      Nate on May 5, 2014

      Yes, if you want better random numbers I would use those functions. However, they are much slower, so I guess it all depends on whether you are looking for speed or precision.

      • 15
        Georgi on May 5, 2014

        I’ve actually read in several places that mt_rand uses a better algorithm and is up to 4x faster than rand. It turns out this information is outdated, but I would not agree that it is slower than rand(): running a few tests with 10 million iterations showed both functions taking the same time, ~4 seconds on my VPS.

        • 16
          pitchinnate on May 5, 2014

          You’re right. I did some more digging, and according to the PHP documentation mt_rand() is faster. I went ahead and updated the code to use the mt_ functions. Thanks again.
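          If you want to repeat Georgi’s timing test on your own setup, here is a rough sketch (not a rigorous benchmark) using hrtime(), available since PHP 7.3; time_calls() is a hypothetical helper name. Note that since PHP 7.1, rand() uses the same Mersenne Twister generator as mt_rand(), so on current versions you should not expect a large gap between the two.

```php
<?php
// Call $fn $n times and return the elapsed wall-clock time in seconds.
function time_calls(callable $fn, int $n): float {
    $start = hrtime(true); // monotonic timestamp in nanoseconds
    for ($i = 0; $i < $n; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

$n = 1000000; // Georgi used 10 million; raise this for steadier numbers
printf("rand():    %.3f s\n", time_calls('rand', $n));
printf("mt_rand(): %.3f s\n", time_calls('mt_rand', $n));
```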

  • 17
    Juho on October 9, 2014

    Thank you!

    As has been said, this is a great tool for creating test data!

  • 18
    ash on May 25, 2015

    Hi, can you post a sample of the code you used to generate the graph? I would like to test it on my server.

    • 19
      pitchinnate on May 26, 2015

      I just used Excel to generate the graph, no code.

  • 20
    Sal on February 11, 2016

    Great work, thanks for sharing 🙂

  • 21
    peter f on April 17, 2020

    Wtf? It should not be a bell curve. It should be a straight horizontal line.

  • 22
    Ebuka on January 12, 2022

    Pls @Pitchinnate and @Georgi update the code, because this one isn’t generating a bell-shaped curve when the code is executed.

    • 23
      pitchinnate on April 7, 2022

      Sorry I haven’t looked at this in a long time. I just tested and it seems to still work without any issues for me. What issue are you having?

  • 24
    Stefan Muth on October 17, 2022

    Your post gave me the only practical solution to this problem that I could find after hours of searching, so THANK YOU!

    However, there is a problem with this function when it comes to generating random *integers* (as array indices for example) as others have noticed:

    Say you want to generate random integers between 1 and 7 on a bell curve.

    Your function will correctly give you FLOAT values ranging from 1.000… to 6.999…

    HOWEVER, for getting INTEGER values 1 to 7, this will not work because int(1.000…) = 1 but int(6.999…) = 6.

    Changing “$max” to “$max + 1” inside the function will lead to compound errors due to the recursion.

    Here is how to convert a float $n in the range $min to $max into an integer ranging from $min to $max:

    floor((($n - $min) * $max / ($max - 1)) + $min)

    I have modified your function to include a $mid value to allow for easy skewing, and turned it into two functions that generate either a random bell-curve integer or float as required. I also removed the $step parameter because this is easily handled outside the function. Feel free to take anything from it that you like… it works perfectly as-is though:

    // Generate a random integer between $min and $max on a bell curve
    function intbell($min, $max, $mid=0.5, $std_deviation=1) {
      $n = floatbell($min, $max, $mid, $std_deviation);
      return floor((($n - $min) * $max / ($max - 1)) + $min);
    }

    // Generate a random float between $min and $max on a bell curve
    // - Increasing $std_deviation flattens it
    // - Increasing $mid skews it to the right
    function floatbell($min, $max, $mid=0.5, $std_deviation=1) {
      $d = (float)mt_getrandmax();
      $r1 = (float)mt_rand() / $d;
      $r2 = (float)mt_rand() / $d;
      $gaussian_number = sqrt(-2 * log($r1)) * cos(2 * M_PI * $r2);
      $mean = ($max + $min) * $mid;
      $random_number = ($gaussian_number * $std_deviation) + $mean;
      if($random_number < $min || $random_number > $max) {
        $random_number = floatbell($min, $max, $mid, $std_deviation);
      }
      return $random_number;
    }

    • 25
      Stefan Muth on October 17, 2022

      * Regarding the above post, the trailing “}” is missing and, because of the ‘less than’ sign being interpreted (stripped) as an HTML tag in this comments section, some of the code is missing in the recursion test line. I am happy to provide the full code to anyone who emails me at diagnose (aat) gmail (doot) com.

    • 26
      Stefan Muth on October 17, 2022

      Apologies for a third post (perhaps Nate can edit/combine them):

      Upon further testing, the formula

      floor((($n - $min) * $max / ($max - 1)) + $min)

      … only works in the specific case when $min = 1.

      The correct code that works for any range of integers is:

      floor((($n - $min) * ($max - $min + 1) / ($max - $min)) + $min);
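      A small sketch of that general formula in isolation, so the bucket edges can be checked by hand. floatToIntBucket() is a hypothetical helper name, and the min() clamp is an addition of this sketch rather than part of Stefan’s formula: the formula stretches [$min, $max] onto [$min, $max + 1), so the single point $n == $max would otherwise map to $max + 1.

```php
<?php
// Map a float $n in [$min, $max] onto an integer in [$min, $max],
// giving every integer an equal-width slice of the float range.
function floatToIntBucket(float $n, int $min, int $max): int {
    if ($min === $max) {
        return $max;
    }
    $i = (int) floor((($n - $min) * ($max - $min + 1) / ($max - $min)) + $min);
    return min($i, $max); // clamp: only $n == $max would produce $max + 1
}

foreach ([1.0, 2.0, 4.0, 6.999, 7.0] as $n) {
    printf("%.3f -> %d\n", $n, floatToIntBucket($n, 1, 7));
}
// 1.000 -> 1, 2.000 -> 2, 4.000 -> 4, 6.999 -> 7, 7.000 -> 7
```

Because the formula subtracts $min before scaling, it works for any range, e.g. 10 to 12, and not only for ranges starting at 1.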

      • 27
        pitchinnate on May 22, 2023

        Good catch I will review and update the post.
