Building a probability model
Now that you understand the mechanics of how to use the Exponential distribution class, you're ready to see how it can be used to gain insight into a real-world problem.
And this will be a fun problem: developing a model for when, and how many, goals are likely to occur in World Cup soccer games. Here I want to focus on showing you how to use the Exponential distribution class I just discussed to derive some of the results reported in this article.
As you know, the Exponential distribution accepts a rate parameter, also referred to as lambda. To use the Exponential distribution to develop a probability model for World Cup soccer goals, you need to be able to derive this rate parameter from your measurements.
A rate is defined as the number of occurrences of some phenomenon over a unit of time or space. The rate of soccer goals in World Cup tournaments between 31-May-2002 and 30-June-2002 is equal to 575/232, or about 2.48 goals per game. This can be thought of as the mean number of goals scored in a 90-minute regulation game. It was computed as follows:
average goal rate = Total Number of Goals / Total Number of Games
In PHP, you represent this concept by setting up variables called $num_goals and $num_games that are placeholders for the evolving quantities that are needed to compute the goal rate. The code snippet that follows shows a fragment of the PHP-based probability model for World Cup soccer goals.
This fragment produces the following output:
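The rate computation itself is a single division. Here is a minimal stand-alone sketch that uses the two variable names from the text. The per-minute conversion, and the $goals_per_game and $goals_per_minute names, are added assumptions: the conversion presumes goals arrive at a constant rate across the 90 regulation minutes, which is what questions phrased in minutes require.

```php
<?php
// Minimal sketch: the goal rate as a ratio of two counts.
// Totals are the figures quoted in the text (575 goals, 232 games).
$num_goals = 575;
$num_games = 232;

// Mean goals per 90-minute regulation game (the rate parameter, lambda).
$goals_per_game = $num_goals / $num_games;

// Assumed extra step (not from the article): rescale to goals per minute,
// treating the scoring rate as constant over the 90 regulation minutes.
$goals_per_minute = $goals_per_game / 90;

printf("Goals per game:   %.4f\n", $goals_per_game);
printf("Goals per minute: %.5f\n", $goals_per_minute);
```

Run from the command line, this prints roughly 2.4784 goals per game and 0.02754 goals per minute.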
One question you might be curious about is the probability that a goal will be scored in the first 10 minutes of a soccer game. I am going to skip some earlier modeling steps in which you would have developed code to plot inter-goal intervals and to test whether the exponential distribution is the best-fitting distribution; instead, I will assume that this code has been developed in a separate script. That lets me proceed directly to the stage where you use the theoretical exponential distribution to calculate probabilities of interest. For example, the next code snippet extends the previous one to compute the probability of a goal in the first 10 minutes of play.
The output this fragment generates is:
In other words, a goal is scored in the first 10 minutes of play in about 24 percent of games. Over the next 100 games played, you can expect a goal in the first 10 minutes in roughly 24 of them.
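The 24 percent figure can be checked against the Exponential cumulative distribution function, P(T <= t) = 1 - e^(-λt), with λ expressed in goals per minute. This sketch is not the article's Exponential distribution class: exponential_cdf() is a hypothetical stand-alone helper, and the per-minute rate again assumes goals are spread evenly over 90 minutes.

```php
<?php
// Hypothetical stand-alone helper (not the article's Exponential class):
// P(T <= $t) for an Exponential waiting time with rate $lambda.
function exponential_cdf(float $t, float $lambda): float
{
    if ($t <= 0) {
        return 0.0; // no probability mass before time zero
    }
    return 1.0 - exp(-$lambda * $t);
}

$num_goals = 575;
$num_games = 232;

// Assumption: goals arrive at a constant rate over 90 regulation minutes.
$lambda_per_minute = ($num_goals / $num_games) / 90;

// Probability that the first goal arrives within the first 10 minutes.
$p_first_10 = exponential_cdf(10, $lambda_per_minute);

printf("P(goal in first 10 minutes) = %.4f\n", $p_first_10);
```

With these inputs the result is about 0.24, in line with the 24 percent figure.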
You might also be interested in the inverse question: in P percent of games, a goal occurs within how many minutes of play? You can answer this question for three different values of P using the following code snippet:
The output this fragment generates is:
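The inverse question has a closed-form answer. Solving P = 1 - e^(-λt) for t gives t = -ln(1 - P) / λ, the Exponential quantile function. The sketch below implements it as a hypothetical helper, exponential_inverse_cdf(); the three probability levels are illustrative choices and are not necessarily the values used in the original fragment.

```php
<?php
// Hypothetical helper: the Exponential inverse CDF (quantile function).
// Returns the time t such that P(T <= t) = $p, for 0 <= $p < 1.
function exponential_inverse_cdf(float $p, float $lambda): float
{
    if ($p < 0.0 || $p >= 1.0) {
        throw new InvalidArgumentException('p must be in [0, 1)');
    }
    return -log(1.0 - $p) / $lambda; // log() is the natural logarithm
}

// Assumption: constant scoring rate, expressed in goals per minute.
$lambda_per_minute = (575 / 232) / 90;

// Illustrative probability levels (chosen here, not taken from the article).
foreach ([0.25, 0.50, 0.75] as $p) {
    $minutes = exponential_inverse_cdf($p, $lambda_per_minute);
    printf("In %d%% of games, a goal occurs within %.1f minutes\n", (int) round($p * 100), $minutes);
}
```

For P = 0.5 this gives the median waiting time, ln 2 / λ, which comes to a little over 25 minutes.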
You might also want to know how likely it is that exactly X goals are scored in a game. The easiest way to compute this probability is to use a mathematically related distribution called the Poisson distribution, which is designed for discrete counting problems like this one.
I have also implemented a PHP-based version of the Poisson distribution functions. The following code fragment shows how to use the Poisson distribution to compute the probability of scoring various numbers of goals in a game.
The output this fragment generates is:
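For a count X with a mean of λ goals per game, the Poisson probability mass function is P(X = k) = e^(-λ) λ^k / k!. The sketch below is a stand-alone stand-in rather than a copy of the article's Poisson functions; poisson_pmf() is a hypothetical name. It builds λ^k / k! as a running product instead of computing k! directly, which avoids overflow for larger k.

```php
<?php
// Hypothetical stand-alone helper (not the article's Poisson functions):
// P(X = $k) for a Poisson count with mean $lambda.
function poisson_pmf(int $k, float $lambda): float
{
    // Accumulate lambda^k / k! term by term to avoid large factorials.
    $term = 1.0;
    for ($i = 1; $i <= $k; $i++) {
        $term *= $lambda / $i;
    }
    return exp(-$lambda) * $term;
}

$lambda_per_game = 575 / 232; // mean goals per 90-minute game

for ($k = 0; $k <= 6; $k++) {
    printf("P(%d goals) = %.4f\n", $k, poisson_pmf($k, $lambda_per_game));
}
```

Note that the rate here is per game, not per minute: the question is about the total count over a whole 90-minute game.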
The Poisson distribution differs from the Exponential distribution in that Poisson is for modeling discrete random variables and Exponential is for modeling continuous random variables. I used Exponential to calculate waiting-time probabilities because the time between goals is a continuous random variable, and the Exponential distribution is often a good candidate model for waiting times.
The Poisson distribution is for modeling discrete random variables that arise from a counting process (such as the number of times a certain event occurs in some period of time). A count takes values from the discrete set 0, 1, 2, and so on. In the case of World Cup soccer, the Poisson distribution can be used to compute the probability of a game ending with different goal counts.
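One way to see how the two distributions are related: if goals arrive as a Poisson process with a constant rate (the modeling assumption here, with λ_game = 575/232 goals per game and λ_min = λ_game / 90 goals per minute), then a game ends with zero goals exactly when the waiting time T to the first goal exceeds 90 minutes. The Poisson count probability and the Exponential waiting-time probability must therefore agree:

$$
P(X = 0) = e^{-\lambda_{\text{game}}} = e^{-90\,\lambda_{\text{min}}} = 1 - F_T(90) = P(T > 90) \approx e^{-2.478} \approx 0.084
$$

where F_T is the Exponential cumulative distribution function with rate λ_min.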
Space precludes a more complete discussion of this important probability distribution; however, I hope this brief discussion has raised your awareness of the distinction between discrete and continuous random variables (and their probability distributions), and of how different probability distributions can be applied to the same data to construct a more detailed probability model that answers different types of questions.