Determine the confidence interval of the mean value. Confidence interval for mathematical expectation

For the vast majority of simple measurements, the so-called normal law of random errors (Gauss's law) holds quite well. It is derived from the following empirical provisions.

1) measurement errors can take on a continuous series of values;

2) with a large number of measurements, errors of the same magnitude but of different signs occur equally often;

3) the greater the magnitude of the random error, the less likely it is to occur.

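The graph of the normal (Gaussian) distribution law is presented in Fig. 1. The equation of the curve is

$$y(\Delta x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(\Delta x)^2}{2\sigma^2}\right),$$

where $y(\Delta x)$ is the distribution function of random errors $\Delta x$, characterizing the probability of an error occurring, and σ is the mean square error.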

The quantity σ is not a random variable; it characterizes the measurement process. If the measurement conditions do not change, σ remains constant. The square of this quantity, σ², is called the variance (dispersion) of the measurements. The smaller the variance, the smaller the spread of individual values and the higher the accuracy of the measurements.

The exact value of the mean square error σ, like the true value of the measured quantity, is unknown. In practice it is replaced by a statistical estimate, the root mean square error of the arithmetic mean $S_{\bar{x}}$, which is determined by the formula

$$S_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n-1)}},$$

where $x_i$ is the result of the i-th measurement, $\bar{x}$ is the arithmetic mean of the obtained values, and n is the number of measurements.

The greater the number of measurements, the smaller $S_{\bar{x}}$ becomes, while the estimate of the error of a single measurement, $S = S_{\bar{x}}\sqrt{n}$, gets closer to σ. If the true value of the measured quantity is μ, its arithmetic mean obtained from the measurements is $\bar{x}$, and the random absolute error is $\Delta x$, then the measurement result is written in the form $\mu = \bar{x} \pm \Delta x$.
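The calculation of the arithmetic mean and of the root mean square error of the mean can be sketched in a few lines of Python. This is only an illustration: the function name and the five readings are hypothetical and do not come from the text.

```python
import math

def mean_and_standard_error(values):
    """Return (arithmetic mean, root mean square error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    # S_mean = sqrt( sum (x_i - mean)^2 / (n * (n - 1)) )
    std_error = math.sqrt(squared_deviations / (n * (n - 1)))
    return mean, std_error

# Hypothetical series of five readings (illustrative data only).
readings = [10.1, 9.9, 10.2, 10.0, 9.8]
mean, std_error = mean_and_standard_error(readings)
print(f"mean = {mean:.3f}, S_mean = {std_error:.3f}")
```

Dividing by n(n - 1) rather than by n - 1 is what turns the spread of single readings into the error of their mean.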

The interval of values from $\bar{x} - \Delta x$ to $\bar{x} + \Delta x$, which contains the true value of the measured quantity μ, is called the confidence interval. Since $\bar{x}$ is a random variable, the true value falls within the confidence interval with probability α, which is called the confidence probability, or the reliability of the measurements. This value is numerically equal to the area of the shaded curvilinear trapezoid (see figure).
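For a normal error law, the area under the curve between -kσ and +kσ has a closed form, erf(k/√2). The short sketch below evaluates it for a few interval half-widths; the function name is an assumption made for this illustration.

```python
import math

def confidence_probability(k):
    """Probability that a normally distributed error lies within +/- k*sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/-{k} sigma: {confidence_probability(k):.4f}")
```

The values come out close to 0.68, 0.95 and 0.997, which is why an interval of one σ on each side is considered not reliable enough for most purposes.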

All this is true for a sufficiently large number of measurements, when the statistical estimate of the mean square error is close to σ. To find the confidence interval and confidence probability for a small number of measurements, which is what we deal with in laboratory work, we use the Student probability distribution. This is the probability distribution of the random variable $t = (\bar{x} - \mu)/S_{\bar{x}}$, where $S_{\bar{x}}$ is the root mean square error of the arithmetic mean. Its value for a given confidence probability, called Student's coefficient, gives the half-width of the confidence interval in units of $S_{\bar{x}}$.


The probability distribution of this quantity does not depend on σ², but depends significantly on the number of experiments n. As the number of experiments n increases, the Student distribution tends to the Gaussian distribution.

The distribution function is tabulated (Table 1). The value of the Student coefficient is found at the intersection of the row corresponding to the number of measurements n and the column corresponding to the confidence probability α.

The probability that the true value of the measured quantity lies within a certain interval is called the confidence probability, or reliability factor, and the interval itself is called the confidence interval.

Each confidence probability has its own confidence interval. In particular, a confidence probability of 0.67 corresponds to a confidence interval from $\bar{y} - S_{\bar{y}}$ to $\bar{y} + S_{\bar{y}}$. However, this statement is true only for a sufficiently large number of measurements (more than 10), and a probability of 0.67 is not reliable enough: in roughly one out of every three series of measurements, y may fall outside the confidence interval. To be more confident that the value of the measured quantity lies within the confidence interval, a confidence probability of 0.95 to 0.99 is usually set. The confidence interval for a given confidence probability, taking into account the number of measurements n, can be found by multiplying the standard deviation of the arithmetic mean

$$S_{\bar{y}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n(n-1)}}$$

by the so-called Student coefficient. Student's coefficients for a range of confidence probabilities and values of n are given in the table.

Table - Student coefficients

Number of measurements n    Confidence probability
                            0.67    0.90    0.95    0.99
                            2.0     6.3     12.7    63.7
                            1.3     2.4     3.2     5.8
                            1.2     2.1     2.8     4.6
                            1.2     2.0     2.6     4.0
                            1.1     1.8     2.3     3.3
                            1.0     1.7     2.0     2.6
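When a printed table is not at hand, a Student coefficient can be computed numerically. The sketch below is one possible approach using only the Python standard library, not the method used to build the table: it integrates the Student density with Simpson's rule after the substitution t = √ν·tan φ and then finds the coefficient by bisection. The function names are hypothetical, and it assumes that n measurements correspond to ν = n - 1 degrees of freedom.

```python
import math

def two_sided_probability(t, dof, steps=2000):
    """P(|T| <= t) for Student's distribution with `dof` degrees of freedom."""
    # With t = sqrt(dof) * tan(phi) the density integral becomes
    # C * integral of cos(phi)**(dof - 1) over the bounded range [0, theta].
    theta = math.atan(t / math.sqrt(dof))
    c = math.exp(math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)) / math.sqrt(math.pi)
    h = theta / steps
    # Composite Simpson's rule (steps must be even).
    total = 1.0 + math.cos(theta) ** (dof - 1)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * math.cos(i * h) ** (dof - 1)
    return 2 * c * total * h / 3

def student_coefficient(confidence, n):
    """Student coefficient for n measurements (assumes n - 1 degrees of freedom)."""
    if n < 2 or not 0 < confidence < 1:
        raise ValueError("need n >= 2 and 0 < confidence < 1")
    dof = n - 1
    lo, hi = 0.0, math.pi / 2  # bisect on phi, where t = sqrt(dof) * tan(phi)
    for _ in range(60):
        mid = (lo + hi) / 2
        t = math.sqrt(dof) * math.tan(mid)
        if two_sided_probability(t, dof) < confidence:
            lo = mid
        else:
            hi = mid
    return math.sqrt(dof) * math.tan((lo + hi) / 2)

for n in (2, 5, 10):
    print(n, round(student_coefficient(0.95, n), 2))
```

For n = 2 and a confidence probability of 0.95 this gives about 12.7, the same value that appears in the table.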

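Finally, for the measured quantity y, given the confidence probability and the number of measurements n, we get the condition

$$\bar{y} - t\,S_{\bar{y}} \le y \le \bar{y} + t\,S_{\bar{y}},$$

where t is the Student coefficient and $S_{\bar{y}}$ is the standard deviation of the arithmetic mean. We will call the quantity $\Delta y = t\,S_{\bar{y}}$ the random error of the quantity y.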

Example: see lecture No. 5 – a series of numbers.

Let's define

With a number of measurements of 45 and a confidence probability of 0.95, we obtain that the Student coefficient is approximately equal to 2.15. Then the confidence interval for this series of measurements is 62.6.

Blunders (gross errors) are errors caused by operator mistakes or by unaccounted-for external influences. They are usually excluded from the measurement results. Blunders are usually caused by inattention, but they can also occur because of a malfunction of the device.

Write down the task. For example: The average weight of a male student at ABC University is 90 kg. You will test the accuracy of predicting the weight of male students at ABC University within a given confidence interval.

Select a suitable sample. You will use it to collect data to test your hypothesis. Let's say you have already randomly selected 1000 male students.

Calculate the mean and standard deviation of this sample. Select the statistical quantities (such as the mean and standard deviation) that you want to use to analyze your sample. Here's how to calculate the mean and standard deviation:

  • To calculate the sample mean, add up the weights of the 1,000 sampled men and divide the result by 1,000 (the number of men). Let's say we get an average weight of 93 kg.
  • To calculate the standard deviation of a sample, you need to find the mean. Then you need to calculate the variance of the data, or the average of the squares of the differences from the mean. Once you find this number, simply take the square root of it. Let's say that in our example the standard deviation is 15 kg (note that sometimes this information can be given along with the conditions of the statistical problem).
  • Select the desired confidence level. The most commonly used confidence levels are 90%, 95% and 99%. It can also be given along with the problem statement. Let's say you chose 95%.

  • Calculate the margin of error. You can find the margin of error using the following formula: $Z_{a/2} \cdot \sigma/\sqrt{n}$, where $Z_{a/2}$ is the confidence coefficient (critical value), a = 1 − confidence level, σ is the standard deviation, and n is the sample size. This formula shows that you should multiply the critical value by the standard error. Here's how you can work through this formula by breaking it into parts:

    • Calculate the critical value $Z_{a/2}$. The confidence level is 95%. Convert the percentage to a decimal, 0.95, and divide by 2 to get 0.475. Then look in the Z-score table for the value corresponding to 0.475. You will find a value of 1.96 (at the intersection of row 1.9 and column 0.06).
    • Take the standard deviation, 15, and divide it by the square root of the sample size, 1000, to get the standard error: 15/31.6, or 0.47 kg.
    • Multiply 1.96 by 0.47 (the critical value by the standard error) to get 0.92, the margin of error.
  • Write down the confidence interval. To formulate a confidence interval, simply record the mean (93) ± margin of error. Answer: 93 ± 0.92. You can find the upper and lower limits of the confidence interval by adding and subtracting the error to/from the mean. So the lower bound is 93 - 0.92 or 92.08 and the upper bound is 93 + 0.92 or 93.92.

    • You can use the following formula to calculate the confidence interval: $\bar{x} \pm Z_{a/2} \cdot \sigma/\sqrt{n}$, where $\bar{x}$ is the sample mean.
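The steps above can be sketched in Python using only the standard library. The function name is hypothetical, and `statistics.NormalDist().inv_cdf` is used here in place of the printed Z-score table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Return (lower, upper, margin) for a mean, using the normal critical value."""
    a = 1 - confidence
    z = NormalDist().inv_cdf(1 - a / 2)      # critical value Z_(a/2)
    standard_error = std_dev / math.sqrt(n)  # sigma / sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin, margin

# Numbers from the example: mean 93 kg, standard deviation 15 kg, 1000 students.
lower, upper, margin = z_confidence_interval(93, 15, 1000, 0.95)
print(f"93 +/- {margin:.2f} kg, from {lower:.2f} to {upper:.2f} kg")
```

Without intermediate rounding the margin comes out at about 0.93 rather than 0.92; the difference arises because the worked example rounds the standard error to 0.47 before multiplying.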
  • The confidence interval for the mathematical expectation is an interval calculated from data that, with a known probability, contains the mathematical expectation of the general population. A natural estimate of the mathematical expectation is the arithmetic mean of its observed values. Therefore, throughout the lesson we will use the terms “average” and “average value”. In problems on calculating a confidence interval, the required answer is most often something like “The confidence interval of the average [value in a particular problem] is from [smaller value] to [larger value].” Using a confidence interval, you can estimate not only average values but also the proportion of a particular characteristic in the general population. Average values, variance, standard deviation and standard error, through which we will arrive at new definitions and formulas, are discussed in the lesson Characteristics of the sample and population.

    Point and interval estimates of the mean

    If the average value of the population is estimated by a single number (a point), then a specific average, calculated from a sample of observations, is taken as the estimate of the unknown population average. In this case the value of the sample mean, which is a random variable, does not coincide with the mean of the general population. Therefore, when reporting the sample mean, you must also report the sampling error. The measure of sampling error is the standard error, which is expressed in the same units as the mean. Therefore, the following notation is often used: $\bar{x} \pm s_{\bar{x}}$, where $s_{\bar{x}}$ is the standard error.

    If the estimate of the average needs to be associated with a certain probability, then the parameter of interest in the population must be estimated not by one number but by an interval. A confidence interval is an interval in which, with a certain probability P, the value of the estimated population indicator is found. The confidence interval in which the population mean μ lies with probability P = 1 - α is calculated as follows:

    $$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}},$$

    where $z_{\alpha/2}$ is the critical value of the standard normal distribution for the significance level α = 1 - P, which can be found in the appendix to almost any book on statistics.

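    In practice, the population mean and variance are not known, so the population variance is replaced by the sample variance and the population mean by the sample mean. Thus, the confidence interval in most cases is calculated as follows:

    $$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}},$$

    where $\bar{x}$ is the sample mean, s is the sample standard deviation, and n is the sample size.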

    The confidence interval formula can be used to estimate the population mean if

    • the standard deviation of the population is known;
    • or the standard deviation of the population is unknown, but the sample size is greater than 30.

    The sample mean is an unbiased estimate of the population mean. The sample variance computed with the divisor n, by contrast, is a biased estimate of the population variance. To obtain an unbiased estimate of the population variance, the sample size n in the sample variance formula should be replaced by n - 1.
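The difference between the two divisors is easy to see on a small data set. In the Python standard library, `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1; the sample below is hypothetical and serves only as an illustration.

```python
from statistics import pvariance, variance

sample = [4, 7, 9, 10, 15]    # hypothetical observations
n = len(sample)

biased = pvariance(sample)    # sum of squared deviations divided by n
unbiased = variance(sample)   # sum of squared deviations divided by n - 1

print(f"divisor n:     {biased:.2f}")
print(f"divisor n - 1: {unbiased:.2f}")
print(f"ratio:         {unbiased / biased:.3f} (equals n / (n - 1) = {n / (n - 1):.3f})")
```

The correction matters most for small samples; as n grows, the ratio n / (n - 1) approaches 1.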

    Example 1. Data were collected from 100 randomly selected cafes in a certain city: the average number of employees is 10.5, with a standard deviation of 4.6. Determine the 95% confidence interval for the average number of cafe employees.

    $$10.5 - 1.96 \cdot \frac{4.6}{\sqrt{100}} \le \mu \le 10.5 + 1.96 \cdot \frac{4.6}{\sqrt{100}},$$

    where $z_{\alpha/2} = 1.96$ is the critical value of the standard normal distribution for the significance level α = 0.05.

    Thus, the 95% confidence interval for the average number of cafe employees ranged from 9.6 to 11.4.

    Example 2. For a random sample of 64 observations from the population, the following totals were calculated:

    sum of values ​​in observations,

    sum of squared deviations of values ​​from the mean .

    Calculate the 95% confidence interval for the mathematical expectation.

    Let's calculate the standard deviation:

    ,

    Let's calculate the average value:

    .

    We substitute the values into the expression for the confidence interval, where $z_{\alpha/2} = 1.96$ is the critical value of the standard normal distribution for the significance level α = 0.05.

    We get:

    Thus, the 95% confidence interval for the mathematical expectation of this sample ranged from 7.484 to 11.266.

    Example 3. For a random sample of 100 observations from the population, the calculated mean is 15.2 and the standard deviation is 3.2. Calculate the 95% confidence interval for the expected value, then the 99% confidence interval. If the sample size and its variation remain unchanged and the confidence coefficient increases, will the confidence interval narrow or widen?

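    We substitute these values into the expression for the confidence interval:

    $$15.2 - z_{\alpha/2} \cdot \frac{3.2}{\sqrt{100}} \le \mu \le 15.2 + z_{\alpha/2} \cdot \frac{3.2}{\sqrt{100}},$$

    where $z_{\alpha/2} = 1.96$ is the critical value of the standard normal distribution for the significance level α = 0.05.

    We get:

    $$15.2 - 0.63 \le \mu \le 15.2 + 0.63.$$

    Thus, the 95% confidence interval for the mean of this sample ranges from 14.57 to 15.83.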

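    We again substitute these values into the expression for the confidence interval:

    $$15.2 - z_{\alpha/2} \cdot \frac{3.2}{\sqrt{100}} \le \mu \le 15.2 + z_{\alpha/2} \cdot \frac{3.2}{\sqrt{100}},$$

    where $z_{\alpha/2} = 2.576$ is the critical value of the standard normal distribution for the significance level α = 0.01.

    We get:

    $$15.2 - 0.82 \le \mu \le 15.2 + 0.82.$$

    Thus, the 99% confidence interval for the mean of this sample ranges from 14.38 to 16.02.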

    As we see, as the confidence coefficient increases, the critical value of the standard normal distribution also increases. Consequently, the starting and ending points of the interval lie further from the mean, and the confidence interval for the mathematical expectation widens.
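The widening of the interval with the confidence level can be checked numerically. The sketch below recomputes the interval of Example 3 for several confidence levels; the function name is hypothetical, and the 90% level is added only for comparison.

```python
import math
from statistics import NormalDist

def interval_for_mean(mean, std_dev, n, confidence):
    """Confidence interval for a mean based on the normal critical value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_dev / math.sqrt(n)
    return mean - margin, mean + margin

# Example 3: n = 100, mean 15.2, standard deviation 3.2.
widths = []
for confidence in (0.90, 0.95, 0.99):
    lower, upper = interval_for_mean(15.2, 3.2, 100, confidence)
    widths.append(upper - lower)
    print(f"{confidence:.0%}: from {lower:.2f} to {upper:.2f}, width {upper - lower:.2f}")
```

Rounding the critical value or the margin before the final step can shift the last digit of the bounds by one unit.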

    Point and interval estimates of a proportion

    The proportion of some attribute in the sample, $\hat{p}$, can be interpreted as a point estimate of the proportion p of the same attribute in the general population. If this value needs to be associated with a probability, then the confidence interval for the proportion p of the attribute in the population should be calculated with probability P = 1 - α:

    $$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

    Example 4. In a certain city, two candidates, A and B, are running for mayor. 200 city residents were randomly surveyed: 46% responded that they would vote for candidate A, 26% for candidate B, and 28% did not know who they would vote for. Determine the 95% confidence interval for the proportion of city residents supporting candidate A.
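The worked solution to Example 4 is not included here, so the following is only a sketch of how it could be computed. It assumes the usual normal-approximation interval for a proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 4: 46% of 200 surveyed residents support candidate A.
lower, upper = proportion_confidence_interval(0.46, 200)
print(f"from {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from roughly 0.39 to 0.53, so with 95% confidence the survey does not establish that candidate A has the support of more than half of the residents.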

    Konstantin Kravchik clearly explains what a confidence interval is in medical research and how to use it

    "Katren-Style" continues the publication of Konstantin Kravchik's cycle about medical statistics. In two previous articles, the author dealt with the explanation of concepts such as and.

    Konstantin Kravchik

    Mathematician-analyst. Specialist in statistical research in medicine and humanities

    Moscow city

    Very often in articles on clinical studies you can find a mysterious phrase: “confidence interval” (95% CI). For example, an article might state: “To assess the significance of differences, Student’s t-test was used and the 95% confidence interval was calculated.”

    What is the value of the “95 % confidence interval” and why calculate it?

    What is a confidence interval? It is the range within which the true population mean lies. Are there “untrue” averages? In a sense, yes. In a previous article we explained that it is impossible to measure the parameter of interest in the entire population, so researchers are content with a limited sample. In this sample (for example, of body weight) there is one average value (a certain weight), from which we judge the average value in the entire population. However, it is unlikely that the average weight in a sample (especially a small one) will coincide with the average weight in the general population. Therefore, it is more correct to calculate and use a range of average values for the population.

    For example, imagine that the 95% confidence interval (95% CI) for hemoglobin is 110 to 122 g/L. This means that there is a 95% chance that the true mean hemoglobin value in the population lies between 110 and 122 g/L. In other words, we do not know the average hemoglobin value in the population, but we can, with 95% probability, indicate a range of values for this trait.

    Confidence intervals are particularly relevant for differences in means between groups, or effect sizes as they are called.

    Let's say we compared the effectiveness of two iron preparations: one that has been on the market for a long time and one that has just been registered. After the course of therapy, we assessed the hemoglobin concentration in the studied groups of patients, and the statistical program calculated that the difference between the average values of the two groups was, with 95% probability, in the range from 1.72 to 14.36 g/L (Table 1).

    Table 1. Test for independent samples
    (groups are compared by hemoglobin level)

    This should be interpreted as follows: in some patients in the general population who take a new drug, hemoglobin will be higher on average by 1.72–14.36 g/l than in those who took an already known drug.

    In other words, in the general population, the difference in average hemoglobin values between groups lies within these limits with 95% probability. It is up to the researcher to judge whether this is a lot or a little. The point of all this is that we are working not with one average value but with a range of values, and therefore we estimate the difference in a parameter between groups more reliably.

    In statistical packages, the researcher can narrow or expand the boundaries of the confidence interval at their own discretion. By lowering the confidence level, we narrow the range of means. For example, at 90% CI the range of means (or of the difference in means) will be narrower than at 95%.

    Conversely, increasing the probability to 99% expands the range of values. When comparing groups, the lower limit of the CI may cross the zero mark. For example, if we expanded the boundaries of the confidence interval to 99%, the interval might range from –1 to 16 g/L. This means that in the general population there are groups for which the difference in means for the characteristic being studied is equal to 0 (M = 0).

    Using a confidence interval, you can test statistical hypotheses. If the confidence interval includes zero, then the null hypothesis, which assumes that the groups do not differ on the parameter being studied, cannot be rejected. An example is described above, where we expanded the boundaries to 99%. Somewhere in the general population we found groups that did not differ in any way.

    95% confidence interval of the difference in hemoglobin, (g/l)


    The figure shows the 95% confidence interval for the difference in mean hemoglobin values between the two groups. The line passes through the zero mark, so a zero difference between the means is possible, which is consistent with the null hypothesis that the groups do not differ. The difference between groups ranges from –2 to 5 g/L. This means that hemoglobin may either decrease by 2 g/L or increase by 5 g/L.
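The check "does the interval for the difference include zero" can be sketched as follows. The summary statistics are hypothetical, and a normal approximation is assumed in place of Student's t-test, so this only illustrates the idea and does not reproduce the calculation of any statistical package.

```python
import math
from statistics import NormalDist

def difference_interval(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Normal-approximation interval for the difference of two group means."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z * standard_error, diff + z * standard_error

def contains_zero(interval):
    """True if zero lies inside the interval (null hypothesis not rejected)."""
    lower, upper = interval
    return lower <= 0 <= upper

# Hypothetical hemoglobin summaries: (mean in g/L, standard deviation, group size).
small = difference_interval(124.0, 12.0, 40, 122.5, 11.0, 40)
large = difference_interval(124.0, 12.0, 4000, 122.5, 11.0, 4000)

print("40 per group:  ", small, "includes zero:", contains_zero(small))
print("4000 per group:", large, "includes zero:", contains_zero(large))
```

With 40 patients per group the interval includes zero; with 4000 per group the same 1.5 g/L difference gives an interval that excludes zero, because the standard error shrinks as the groups grow.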

    The confidence interval is a very important indicator. Thanks to it, you can see whether the differences in the groups were really due to the difference in means or due to a large sample, since with a large sample the chances of finding differences are greater than with a small one.

    In practice it might look like this. We took a sample of 1000 people, measured hemoglobin levels and found that the confidence interval for the difference in means ranged from 1.2 to 1.5 g/l. The level of statistical significance in this case p

    We see that the hemoglobin concentration increased, but almost imperceptibly, therefore, statistical significance appeared precisely due to the sample size.

    Confidence intervals can be calculated not only for means, but also for proportions (and risk ratios). For example, we are interested in the confidence interval of the proportions of patients who achieved remission while taking a developed drug. Let us assume that the 95 % CI for the proportions, i.e., for the proportion of such patients, lies in the range of 0.60–0.80. Thus, we can say that our medicine has a therapeutic effect in 60 to 80 % of cases.
