MIDTERM EXAMINATION 
PART I 
Question 1: 
Some of the best answers:
====================================
There are 720 permutations. _{6}P_{6} = 6!/(6-6)! = (6*5*4*3*2*1)/1 = 720
====================================
_{m}P_{m} = m!
6! = 720 permutations of the six different symbols
_{m}C_{r} = m! / (r!(m-r)!)
====================================
Permutations: 6! = 6 * 5 * 4 * 3 * 2 * 1 = 720
====================================
How many permutations of these symbols exist?
_{m}P_{r} = m!/(m-r)!
_{m}P_{m} = m!
6! = 720
====================================
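The count quoted in these answers can be checked numerically. The following is a minimal sketch in Python; the six symbols themselves are not reproduced in this handout, so the letters used below are placeholders assumed only for illustration.

```python
import math
from itertools import permutations

# Placeholder symbols (an assumption): any six distinct items give the same count.
symbols = ["a", "b", "c", "d", "e", "f"]
m = len(symbols)

# mPm = m!: arrangements of all m distinct symbols
count_by_factorial = math.factorial(m)

# mPr = m!/(m-r)! evaluated at r = m (math.perm needs Python 3.8 or later)
count_by_perm = math.perm(m, m)

# Brute-force check: enumerate every ordering and count them
count_by_enumeration = sum(1 for _ in permutations(symbols))

print(count_by_factorial, count_by_perm, count_by_enumeration)
```

All three counts should agree with the 720 given in the answers.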
Question 2: Central limit theorem says: “Sufficiently large populations are always normally distributed”. Right? 
Some of the best answers:
=========================================
Wrong. The Central Limit Theorem states that for large samples, the sample mean is approximately normally distributed, and as a sample gets larger, the distribution of the sample mean becomes even closer to a normal distribution.
=========================================
Wrong! Not sufficiently large population.
=========================================
The central limit theorem does not say that "sufficiently large populations are always normally distributed". The theorem says that the distribution of a sample mean xbar should be close to normal if the sample size is large, and xbar should become closer to the normal distribution as the sample size becomes larger. If the sample size is large enough, the sample mean xbar should be normally distributed.
=========================================
The central limit theorem states that “for a relatively large sample size, the variable xbar is approximately normally distributed. It becomes better and better with increasing sample size.”
=========================================
THE CENTRAL LIMIT THEOREM DOES NOT SAY THAT SUFFICIENTLY LARGE POPULATIONS ARE NORMALLY DISTRIBUTED. IT SAYS THAT THE MEAN WILL BE NORMALLY DISTRIBUTED FOR A LARGE SAMPLE SIZE. THE APPROXIMATION BECOMES BETTER AND BETTER WITH A LARGER SAMPLE SIZE AS WELL.
=========================================
The Central Limit Theorem states that if a sample size is sufficiently large, the sample mean is roughly normally distributed, and as the size increases, the tendency toward a normal distribution increases.
=========================================
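The distinction drawn in these answers (the population need not be normal; the sample mean becomes approximately normal) can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the exam: the population is taken to be exponential with mean 1 (strongly skewed), the sample sizes 2 and 50 are arbitrary, and skewness is used as a rough indicator of how far a distribution is from the symmetric normal shape.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def skewness(values):
    """Mean cubed deviation divided by the cubed (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def sample_means(n, replications=5000):
    """Means of `replications` independent samples of size n from Exp(1)."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(replications)
    ]


means_small = sample_means(2)    # sample size 2: still clearly skewed
means_large = sample_means(50)   # sample size 50: much closer to symmetric

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print("skewness of sample means, n=2: ", round(skew_small, 2))
print("skewness of sample means, n=50:", round(skew_large, 2))
print("average of sample means, n=50: ", round(statistics.fmean(means_large), 3))
```

The population itself stays skewed no matter how large it is; only the distribution of the sample mean moves toward the normal shape as n grows.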
Question 3: What is the difference between Sample Space and Event Space? 
Some of the best answers:
============================================
Sample space is the collection of all elementary events.
Event space is the collection of all possible events.
Event is a subset of a sample space, a set of outcomes for the experiment, any subset of a sample space.
============================================
Sample space is a [set of all related elements from which a sample is selected], whereas an event space is the set of all events. An EVENT is a subset of the sample space. Therefore an event space is a set of subsets of the sample space.
============================================
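A small worked example may make the "set of subsets" idea concrete. The sketch below assumes a single coin toss as the experiment (not part of the exam question) and takes the event space to be the full power set, which is the usual choice for a finite sample space.

```python
from itertools import chain, combinations

# Assumed experiment: one toss of a coin
sample_space = {"H", "T"}


def event_space(outcomes):
    """Return every subset of the sample space (its power set) as frozensets."""
    items = sorted(outcomes)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}


events = event_space(sample_space)

# Print the events from smallest to largest
for event in sorted(events, key=lambda e: (len(e), sorted(e))):
    print(set(event) if event else "{} (the impossible event)")
```

With 2 elementary outcomes there are 2^2 = 4 events: the empty set, {H}, {T}, and the whole sample space {H, T}.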
Question 4: What is the difference between Discrete and Continuous Random Variable? 
Some of the best answers:
=====================================
Discrete random variables are countably infinite; this means that they can be listed even though they may go to infinity. EACH VALUE OF THE VARIABLE GETS A CERTAIN QUANTITY OF PROBABILITY MASS. Continuous random variables are those that are uncountably infinite, meaning that between two given numbers we could find an infinite amount of numbers. This implies that we cannot assign a CERTAIN AMOUNT OF PROBABILITY MASS TO EACH VALUE OF THE VARIABLE; instead we have to consider continuous variables in terms of intervals.
=====================================
A discrete random variable is a random variable which is in a "finite or countably infinite set." The possible values "can be listed" and it is usually a "collection of whole numbers." A continuous random variable is a random variable that is part of an uncountably infinite set, whose values "form some intervals of numbers." The graphs of the functions vary because a discrete random variable is made up of many separate points, whereas the continuous random variable is a continuous graph with infinite points.
=====================================
Discrete Random Variable is a random variable whose possible values can be listed: they form a finite or countably infinite set of numbers. Continuous Random Variable is a random variable whose possible values form some intervals of numbers.
=====================================
A discrete random variable is one which can only have certain values. There can be no value between the specified values. Integers as random variables are an example of this. There may be infinitely many values, but it is a countable infinity, because we can list the possible values. A continuous random variable can have any value on the number line, and for any 2 values there is always another value that falls between them. Continuous random variables therefore display the property of uncountable infinity.
=====================================
The main difference between discrete and continuous random variables is that one is countable and has definitive values (discrete) while the other has an infinite number of values and can simply be one of an infinite number of values between two points (continuous), ALSO REFERRED TO AS AN "INTERVAL OF NUMBERS". Discrete variables can go on infinitely, but they will be countable fairly easily and each possible outcome will have some probability to it that will make sense. When adding up all probabilities of outcomes, you will get 1. This is also true for continuous random variables, but in continuous random variables each individual value has a probability of zero of occurring because the number of outcomes is infinite and uncountable. It is like saying the probability is (1/∞), one over an infinitely large number, which is the same as zero.
=====================================

PART II  
Question 5: You are given two independent samples from a very large population. The size of the first sample is 10 and the size of the second is 25. You were told that it is well known that the population standard deviation is 38, but the population mean is unknown and your task is to estimate it. You calculated the mean from the first sample and it was 44. The mean calculated from the second sample was 56. What can you say about the population mean?  
Some of the best answers:
=======================================
Sample 1: n = 10, sample mean = 44
Sample 2: n = 25, sample mean = 56
Population standard deviation = 38. Population mean = ?
Weighted average of sample means: (10*44 + 25*56)/35 = 52.57
52.57 is the sample mean. We will assume that the sample means are normally distributed, with the mean of the distribution being equal to the population mean and with the standard deviation of sample means being σ/√n.
If n = 35 and σ = 38, then the standard deviation of the sample means is 38/√35 = 6.423
52.57 – 2(sample means σ) = 52.57 - 2*6.423 = 39.724
52.57 + 2(sample means σ) = 52.57 + 2*6.423 = 65.416
Based on the confidence probabilities, we can be 95.44% certain that the population mean will fall within 2 standard deviations of the sample mean, so there is a 95.44% chance that μ is between 39.724 and 65.416.
We can be 68.26% sure that μ will fall between 52.57 – (sample mean σ) and 52.57 + (sample mean σ), that is, between 46.147 and 58.993.
We can be 99.74% sure that μ will fall between 52.57 – 3(sample mean σ) and 52.57 + 3(sample mean σ), that is, between 33.301 and 71.839.
=======================================
The weighted mean we got was 52.57. This value was obtained by weighting the averages and summing them together. The standard deviation of the population is 38. The best approximation for the mean is 52.57. I will now calculate the standard deviation. We know that 95.44% of the probability lies within 2 standard deviations above and below the mean. We find that 2 standard deviations above the weighted mean would give us a value of 65.42 and 2 below would give a value of 39.72. This means that there is a 95.44% chance that the population mean is between 39.72 and 65.42.
=======================================
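The arithmetic in these answers can be reproduced with a few lines of Python. This is a minimal sketch that follows the approach of the first answer (pool the two samples, then use σ/√n with the known σ = 38); the variable names are illustrative only.

```python
import math

sigma = 38.0                          # known population standard deviation
samples = [(10, 44.0), (25, 56.0)]    # (sample size, sample mean)

# Pool the two samples: weight each mean by its sample size
n_total = sum(n for n, _ in samples)
pooled_mean = sum(n * mean for n, mean in samples) / n_total

# Standard deviation of the pooled sample mean
standard_error = sigma / math.sqrt(n_total)


def interval(k):
    """Interval of k standard errors on either side of the pooled mean."""
    return (pooled_mean - k * standard_error, pooled_mean + k * standard_error)


print("pooled mean:", round(pooled_mean, 2))
print("standard error:", round(standard_error, 3))
for k in (1, 2, 3):
    low, high = interval(k)
    print(f"{k} standard error(s): {low:.3f} to {high:.3f}")
```

The last digit may differ slightly from the hand calculation because the answers round the pooled mean to 52.57 before adding and subtracting.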
Question 6: Suppose that you are crossing Com. Avenue several times every day and always at the same spot. Sometimes when you come the light is green and you do not have to wait, but more frequently the light is red or yellow and you have to wait several minutes. Suppose further that at this spot the green is on for 3 minutes and the yellow and red for 7 minutes in each 10-minute interval. Unlike most of the other BU students you are well disciplined and never cross the street when the red or yellow light is on. (a) How long should you expect to wait before you come sufficiently close to see the light? (b) How long should you expect to wait when you already stand before the crossing and you see the light?
Some of the best answers:
=========================================
The light is green 30% of the time. All possible waiting times are {7,6,5,4,3,2,1,0,0,0} minutes. Note that 0 appears three times because its probability of occurring is 0.30. The mean waiting time (if you do not know what light is on) is (7+6+5+4+3+2+1+0+0+0)/10 = 2.8 minutes. When you are already there, if the light is green you know that you do not need to wait. If it is red, you will probably expect to wait around 4 minutes because your outcome will come from 1-7 minutes. THIS IS THE MEAN AND THE MEDIAN OF THE POSSIBLE WAITING TIMES AND WILL GIVE A GOOD GENERAL IDEA OF THE MIDDLE VALUE OF THE TIMES THAT YOU WOULD WAIT IF THE LIGHT WAS RED WHEN YOU GOT THERE. THIS IS ASSUMING THAT YOU CANNOT PICK ANY VALUE BETWEEN THE MINUTES AND WILL CHOOSE THE CLOSEST POSSIBLE MINUTE. OTHERWISE, YOU WOULD NEED TO INCLUDE EACH POSSIBLE SECOND, CREATING A CONTINUOUS SET OF POSSIBILITIES RATHER THAN A DISCRETE ONE.
=========================================
Expected time of waiting when you see green light: 0 minutes.
Expected time of waiting when you see red light: 3.5 minutes.
Expected time of waiting when you do not see the light: .3(0) + .7(3.5) = 2.45 minutes
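The two answers differ because one treats the waiting time as a whole number of minutes and the other as continuous. The sketch below computes both versions side by side; it assumes that you arrive at a uniformly random moment of the 10-minute cycle, which is the assumption both answers appear to make.

```python
from fractions import Fraction

cycle = 10              # minutes in one full light cycle
green = 3               # minutes of green
stop = cycle - green    # minutes of red or yellow

# Discrete model (first answer): waits are whole minutes 7, 6, ..., 1,
# plus a wait of 0 for each of the three green minutes.
waits = list(range(stop, 0, -1)) + [0] * green
discrete_unconditional = Fraction(sum(waits), cycle)
discrete_given_red = Fraction(sum(range(1, stop + 1)), stop)

# Continuous model (second answer): given red or yellow, the remaining
# wait is uniform between 0 and 7 minutes, so its mean is 7/2.
continuous_given_red = Fraction(stop, 2)
continuous_unconditional = (
    Fraction(green, cycle) * 0 + Fraction(stop, cycle) * continuous_given_red
)

print("discrete:   unseen light", float(discrete_unconditional),
      "| seen red", float(discrete_given_red))
print("continuous: unseen light", float(continuous_unconditional),
      "| seen red", float(continuous_given_red))
```

Exact fractions are used so that the results match the hand calculations without floating-point noise.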
Question 7: SAU is a country with a well educated population but also with high unemployment. Let us divide education into two categories: (A) Low: high school or less, and (B) High: college undergraduate and graduate. The recent census of population found the following division of the labor force: 30% had high education and were employed; 5% had high education and were unemployed; 50% had low education and were employed; 15% had low education and were unemployed. If you chose a person randomly from that labor force, the above shares can be interpreted as probabilities. What are the joint and marginal probabilities? What is the conditional probability of being unemployed if the person had low education? What is the conditional probability of having high education if the person is employed? Are employment and education statistically independent?
Some of the best answers:
======================================================
                     LABOR FORCE
EDUCATION      Employed   Unemployed   Total
Low              0.50        0.15       0.65
High             0.30        0.05       0.35
Total            0.80        0.20       1.00
The joint probabilities are the probabilities in the middle that depend on two different variables: education and employment. The marginal probabilities are the probabilities in the margin of the table. They are the total of the joint probabilities in their row or column and show the probability of only one variable. "JOINT PROBABILITIES" SHOW BIVARIATE DATA AND SHOW TWO VARIABLES WHILE THE "MARGINAL PROBABILITIES" SHOW UNIVARIATE DATA SHOWING THE EFFECT OF ONLY ONE VARIABLE.
The conditional probability of being unemployed if having a low education is: 0.15/0.65 = 0.231 or 23.1%
The conditional probability of having high education if employed is: 0.30/0.80 = .375 or 37.5%
No, unemployment and education are not statistically independent. THE OUTCOME WOULD NOT BE THE SAME DEPENDING ON THE DIFFERENT VARIABLES IN ANY CASE SO NOTHING IS STATISTICALLY INDEPENDENT IN THIS EXAMPLE.
=======================================================
              High Ed.   Low Ed.   Total
Employed        30%        50%      80%
Unemployed       5%        15%      20%
Total           35%        65%     100%
The joint probabilities are the ones in the middle that give amounts of categories combined (30%, 50%, 5%, 15%). The marginal probabilities are on the right side and the bottom row and are summed percentages that give totals for each individual category.
A IS LOW EDUCATION
B IS HIGH EDUCATION
P(unemployed | A) = P(unemployed and A) / P(A) = .15 / .65 = .23076 = 23.08%.
P(B | employed) = P(B and employed) / P(employed) = .30 / .80 = .375 = 37.5%.
IS P(EMPLOYED | HIGHER ED.)*P(HIGHER ED.) = P(EMPLOYED)*P(HIGHER ED.)? (.857) * (.35) = .30 IS NOT EQUAL TO (.8) * (.35) = .28, SO EMPLOYMENT AND EDUCATION ARE DEPENDENT.
===============================================
A: Low education
B: High education
E: Employed
U: Unemployed
· Joint probabilities
P(A&E) = .5
P(A&U) = .15
P(B&E) = .3
P(B&U) = .05
· Marginal probabilities
P(A) = P(A&E) + P(A&U) = .65
P(B) = P(B&E) + P(B&U) = .35
P(E) = P(A&E) + P(B&E) = .8
P(U) = P(A&U) + P(B&U) = .2
· Conditional probability of being unemployed given low education
P(U/A) = P(A&U)/P(A) = (.15)/(.65) = .23
· Conditional probability of high education given the person is employed
P(B/E) = P(B&E)/P(E) = (.3)/(.8) = .375
· Unemployment and education ARE NOT statistically independent, because if that WERE the case then P(U/A) = P(U): the probability of being unemployed given that a person has low education would not be influenced by low education. Therefore, this conditional probability would be equal to the marginal probability of event U. These two probabilities are NOT the same:
P(U/A) vs. P(U): .23 ≠ .2
P(B/E) vs. P(B): .375 ≠ .35
===============================================
MARGINAL PROBABILITIES:
35% of population – high education
65% of population – low education
80% of population – employed
20% of population – not employed
JOINT PROBABILITIES: GIVEN IN QUESTION
30% – HIGH EDUCATION AND EMPLOYED
5% – HIGH EDUCATION AND NOT EMPLOYED
50% – LOW EDUCATION AND EMPLOYED
15% – LOW EDUCATION AND NOT EMPLOYED
Conditional: low ed; unemployed
P(B|A) = P(A and B) / P(A)
65/100; 15/65 = .23 = 23%
Conditional: employed; high ed
80/100; 30/80 = .375 = 37.5%
Employment and education are not independent, as education seems to have an effect on the unemployment rate. FROM THE CONDITIONAL PROBABILITIES CALCULATED ABOVE, IT IS SHOWN THAT THE OCCURRENCE OF EITHER UNEMPLOYMENT OR EDUCATION CHANGES THE PROBABILITY OF THE OCCURRENCE OF THE OTHER.
===============================================
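The joint, marginal, and conditional probabilities in these answers can all be derived from the four shares given in the question. The following Python sketch shows one way to do it; the dictionary layout and the helper name `marginal` are illustrative choices, not part of the exam.

```python
import math

# Joint probabilities from the question: (education, employment status)
joint = {
    ("low", "employed"): 0.50,
    ("low", "unemployed"): 0.15,
    ("high", "employed"): 0.30,
    ("high", "unemployed"): 0.05,
}


def marginal(position, value):
    """Sum the joint probabilities whose key matches `value` at `position`
    (0 = education, 1 = employment status)."""
    return sum(p for key, p in joint.items() if key[position] == value)


p_low = marginal(0, "low")
p_high = marginal(0, "high")
p_employed = marginal(1, "employed")
p_unemployed = marginal(1, "unemployed")

# Conditional probability = joint probability / marginal of the condition
p_unemployed_given_low = joint[("low", "unemployed")] / p_low
p_high_given_employed = joint[("high", "employed")] / p_employed

# Independence would require every joint probability to equal the
# product of its two marginals.
independent = all(
    math.isclose(p, marginal(0, edu) * marginal(1, status))
    for (edu, status), p in joint.items()
)

print("P(unemployed | low)  =", round(p_unemployed_given_low, 4))
print("P(high | employed)   =", round(p_high_given_employed, 4))
print("independent?", independent)
```

For example, P(high and employed) = 0.30 while P(high) * P(employed) = 0.35 * 0.80 = 0.28, so the independence check fails.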
PART III  
Question 8: Random sampling resulted in the following sequence of numbers: (5, 2, 3, 8, 12, 0, 2, 5, 9, 14) Calculate the mean, median, mode, variance and standard deviation.  
Some of the best answers:
===========================================================
Mean = 60/10 = 6
Median (mean of middle two observations): set data in INCREASING ORDER: (5, 2, 3, 8, 12, 0, 2, 5, 9, 14) -> (0, 2, 2, 3, 5, 5, 8, 9, 12, 14). Calculate mean of middle two observations = 10/2 = 5
Median = 5
Mode(s) = 5, 2
Variance = s^2 = 192/9 = 21.33
Standard Deviation = s = (square root of) 192/9 = 4.619
===========================================================
Sample mean = Σx/n = 60/10 = 6 THIS IS FOUND BY TAKING THE SUM OF THE VARIABLES AND DIVIDING THAT BY THE TOTAL NUMBER OF VARIABLES.
{0,2,2,3,5,5,8,9,12,14}
Median = 5 THIS REPRESENTS THE CENTER NUMBER.
Mode = 2 and 5 THERE ARE TWO MODES BECAUSE THERE ARE TWO NUMBERS THAT SHOW UP THE MOST IN THE SAMPLE SPACE.
   x     x - (sample mean)   [x - (sample mean)]^2
   0           -6                    36
   2           -4                    16
   2           -4                    16
   3           -3                     9
   5           -1                     1
   5           -1                     1
   8            2                     4
   9            3                     9
  12            6                    36
  14            8                    64
Sum=60                            Sum=192
Variance = Σ(x - sample mean)^2/(n-1) = 192/9 = 21.3
Standard Deviation = √(Σ(x - sample mean)^2/(n-1)) = √21.3 = 4.619
CAN ALSO BE FOUND USING THE COMPUTING FORMULA: √[(552 - (60^2/10))/(10-1)] = 4.619
===========================================================
Mean = (5+2+3+8+12+0+2+5+9+14)/10 = (60)/10 = 6 = x1
Median = {0,2,2,3,5,5,8,9,12,14}
Median = 5
Mode = 2, 5
Variance: summation of ((x-x1)^2)/(n-1)
[(6-5)^2 + (6-2)^2 + (6-3)^2 + (6-8)^2 + (6-12)^2 + (6-0)^2 + (6-2)^2 + (6-5)^2 + (6-9)^2 + (6-14)^2]/(10-1)
s^2 = (1+16+9+4+36+36+16+1+9+64)/9
s^2 = 192/9
Standard Deviation: s = square root(192/9)
s = 4.6188
===========================================================
{5, 2, 3, 8, 12, 0, 2, 5, 9, 14}
Mean = (5+2+3+8+12+0+2+5+9+14) / 10 = 6
If we list the numbers in increasing order: {0,2,2,3,5,5,8,9,12,14}
Median = (5+5) / 2 = 5
Mode = 2 and 5
Standard deviation = (Σ(x-6)^2 / (10-1))^(1/2) = (192/9)^(1/2) = 4.62
Variance = 192 / 9 = 21.33
===========================================================
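These hand calculations can be cross-checked with Python's standard `statistics` module. This is a minimal sketch; note that `statistics.variance` and `statistics.stdev` use the sample (n-1) denominator, which matches the formula used in the answers, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

data = [5, 2, 3, 8, 12, 0, 2, 5, 9, 14]

mean = statistics.mean(data)
median = statistics.median(data)        # average of the two middle values
modes = statistics.multimode(data)      # every value tied for most frequent
variance = statistics.variance(data)    # sample variance, divides by n - 1
std_dev = statistics.stdev(data)        # square root of the sample variance

# The same variance via the computing formula: (Σx² - (Σx)²/n) / (n - 1)
n = len(data)
sum_x = sum(data)
sum_x_squared = sum(x * x for x in data)
variance_by_formula = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print("mean:", mean)
print("median:", median)
print("modes:", sorted(modes))
print("variance:", round(variance, 2))
print("standard deviation:", round(std_dev, 3))
```

Both routes to the variance should agree with 192/9, and the standard deviation with 4.619.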
Question 9: What statistics did you calculate for assignment #2 and what did they tell you?
 
Some of the best answers:
====================================================
I chose to analyze the CPIs of the Northeast and Southeast regions and their inflation. I mostly used the means of the CPIs and the rates of inflation and also the standard deviation. I also got the range over the 3-year period of inflation and found that over the whole period inflation rates were similar. The month-to-month inflation means for both regions were nearly the same; however, the South had a much higher standard deviation of inflation, and these results were reflected in the fact that the inflation rates in the South were always much more extreme than in the Northeast, but the mean was still similar for both regions. I thought it was interesting to see that even within a single country, rates of inflation vary even between two regions.
====================================================
For assignment 2, I calculated the mean, median, mode and standard deviation for two variables: 1) the net enrollment ratio for the world and the divisions of North America and Europe, Africa, and Latin America and the Caribbean DURING TWO YEARS, 1990 AND 1998, and 2) the GDP growth from 1990-2000 for the world and the same divisions mentioned above. These statistics told me that enrollment in Europe and North America was above the world mean, meaning that literacy was higher; however, GDP growth was below the world mean, meaning that DURING THE PAST YEARS THESE ECONOMIES HAVE GROWN AT A SLOWER RATE THAN OTHERS HAVE. For Africa, the enrollment ratio mean was below the world mean and so was GDP growth. For Latin America and the Caribbean, enrollment was around the world mean and GDP growth was above it. A relationship between GDP growth and enrollment in school seems more apparent in this region.
====================================================
I collected data from the two governments of West Germany and Great Britain from the years 1953-1963. I calculated that a certain trend is noticeable with this particular data.
I used descriptive statistics to analyze the particular distributions of this data, particularly the variables in the test. For instance, I noticed that the dispersion of the unemployment rate in Great Britain was lower than that of West Germany because its rate was relatively stable throughout the ten years I researched. I found a trend indicating the same for Great Britain’s GNP, Industrial Production, and output per man-hour. For West Germany, the variance was relatively larger compared to Britain’s because the unemployment decreased substantially throughout the ten years. In West Germany, the unemployment rate dropped at an annual rate of 3.6%. This number indicated a relationship to its mean and standard deviation. For instance, the higher the change in annual unemployment, the higher the dispersion. I mean to say GNP rather than GDP.
====================================================
In assignment #2, I recorded the data of the unemployment rates over the past 40 periods of the United States (country), Wisconsin (state), and Wauwatosa (city). I CAME TO SEVERAL CONCLUSIONS after calculating various descriptive statistics FOR EACH RATE. SOME OF THESE STATISTICS WERE THE mean (sum of x divided by n), median (middle number), mode (most used), range (max number – min number), variance ((X - Xbar)^2), as well as percentage growth over time. I noticed that the range HAD THE SMALLEST VALUE IN the sample with the smallest size. This also showed me that its variance TOO WAS TINY. This CALCULATION allowed me to conclude that the samples taken from the small town HAD VERY SIMILAR CHARACTERISTICS. The samples taken in the entire country, on the other hand, had many more differences. There was thus a larger range and greater deviation from the mean FOR THE UNITED STATES. I can also conclude that when the population is larger, the mean INCREASES. The United States’ mean was THUS much greater than that of the city of Wauwatosa.
The mean of the unemployment rate may have increased with the size of the sample, but I can further conclude that the larger the sample size, the more diverse the sample, AND THE MORE PRECISE THE MEAN MAY BE. IN TURN, I WAS LED TO BELIEVE THAT THERE IS A greater chance of calculating ACCURACY WITH the variables in a SAMPLE SIZE HAVING A larger range.
====================================================
The descriptive statistics I calculated for assignment number 2 are:
Mean -> the average of the data set
Mode -> the most frequent data value
Median -> the middle # of data set (ordered)
Range -> Max – Min
1st Quartile -> median of values less than or equal to median of entire data set
2nd Quartile -> median of entire data set
3rd Quartile -> median of values greater than or equal to median of entire data set
IQR (interquartile range) -> Q3 – Q1
Min -> smallest value in data set
Max -> largest value in data set
Growth rate of Gross Domestic Product -> (GDP(Yr2) – GDP(Yr1)) / GDP(Yr1)
…for the data, GDP vs. Unemployment Rate.
The GDP for both countries is an increasing data variable, that is, increasing approximately linearly. Therefore there is no mode in the data set for GDP. The Unemployment Rate is always positive, so the Min will never be 0. In terms of good data to explain and utilize descriptive statistics, the unemployment rate is a sufficient data set to use.
====================================================
My assignment included statistics on GDP and GDP growth rate, along with foreign exchange indices and agricultural production indices for various countries over a time span of approximately 1990-91 to 2000-01. While my analysis mainly included a comparison and an interpretation of the trend, I could calculate the mean, median, mode, and standard deviation for variables such as the GDP growth rate. The mean would then give me the average growth of the country over the span of 10 years, 1990-2000.
The standard deviation helps me understand the fluctuation patterns of the economy and thus analyze whether there have been unusual trends and the reasons that affect the economy. I drew line graphs of these variables as well, and these help compare, for example, the relation between the change in exchange rates and agricultural exports. The descriptive statistics (mean, median etc.) also helped understand at what rate the economy stabilizes and help predict the general growth trend in the next few years.
====================================================
The statistics that I calculated for my research were the descriptive statistics, in that I used the measures of central tendency to calculate the mean GDP and the standard deviation away from the GDP in years 1953-1963 for West Germany and Great Britain. Since the average of the 10 GDPs was relatively small, the dispersion or variance of these years was around the GDP mean and was not widely dispersed. I also calculated the same statistics for the rate of unemployment. The two were descriptive until I began to make inferences about the variables studied for both countries. I noticed that the unemployment rate was more dispersed for countries that had a low averaged GDP. Since unemployment rose drastically in West Germany, the variance was more widely dispersed according to the level at which its corresponding GDP was at.
====================================================
For assignment 2 I was examining the poverty rate in the United States and I was able to find the mean poverty rate over the past 40 years to be around 14%, while the 2000 poverty rate is at about 11%. The median and mode all occurred very close to the mean as well, showing that the data was pretty constant over time. The standard deviation from the mean was somewhere at about 2 or so, which also proves this constancy of the poverty rate.
No real outliers occurred; however, the maximum value of the poverty rate was probably 7 or so percentage pts from the mean, but this can be attributed to the Great Depression and not to mistakes in the data.
====================================================
For assignment #2 I calculated the mean, median, mode, range, variance, standard deviation, frequency, and relative frequency for the average personal income of the 50 states. My statistics told me the sample mean of personal income of the U.S. population as well as a range (mean +/- standard deviation) of what most Americans earn for a living on average. It also showed the price-adjusted level of income, providing insight into which states are relatively more or less expensive to live in. This could help people in the future if/when transportation becomes much faster, where people could live in inexpensive areas like Oklahoma and work in expensive areas like New Jersey where the average salary is higher, therefore maximizing their profits further. This is a long way off though. For now my data and statistics showed the trends of 2000 for both price levels, personal income (adjusted) and salaries (non-adjusted), for both the average American and for each individual state.
====================================================
I calculated the mean, median, mode, range, skewness, standard deviation, and variance. These statistics allowed me to make comparisons between two separate groups of data. Differences in the mean and median helped to prove my hypothesis that unemployment rates of blacks are greater than those of whites both nationally and in Massachusetts. However, it showed that the difference by which black unemployment was greater than white unemployment was much smaller for Massachusetts than the nation. Standard deviation and variance showed how, in the individual state, there were greater fluctuations in the unemployment rate.
Skewness measured the location of the mean in relation to the median, or rather the shape of the distribution.
====================================================
The statistics taken from the paper I'm analyzing that were put in my second assignment were the mean, standard deviation, maximum and minimum of the variables used in the regression equation that the author uses to predict the aperture or openness of the market in the states for the retail and services industries to new establishments. They told me how population, real income per capita, poverty, urbanization, density, people over 65 years of age, and unemployment were distributed for the pooling of the 49 states taken into consideration. This gave me an insight on what variables had the most effect on new establishment growth for the purpose of proving his hypothesis that unemployment has little or no effect on the growth in new establishments, as a cross-sectional analysis of the 49 states and a time series analysis from 1993 to 1997. He later finds that there is none.




