Q66. The means of two samples of sizes 50 and 100 are 54.1 and 50.3, and their standard deviations are 8 and 7, respectively. Then the mean and standard deviation of the combined sample of size 150 containing the two samples are
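A worked sketch using the standard combined-mean and combined-variance formulas:

\[ \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{50(54.1) + 100(50.3)}{150} = \frac{7735}{150} \approx 51.57 \]

\[ \sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}, \qquad d_i = \bar{x}_i - \bar{x} \]

With d₁ ≈ 2.53 and d₂ ≈ −1.27, this gives σ² ≈ [50(64 + 6.40) + 100(49 + 1.61)]/150 ≈ 57.21, so σ ≈ 7.56.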
Q53. The number of seeds in 10 fruits of a variety are counted and the first 9 are found to be 10, 9, 6, 11, 8, 9, 12, 10 and 7. If the number of seeds in the tenth fruit is x, then for the mean number of seeds to be at least 9, it is necessary and sufficient that
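A worked check of the arithmetic: the first nine counts sum to 82, so

\[ \frac{82 + x}{10} \ge 9 \iff 82 + x \ge 90 \iff x \ge 8. \]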
Since the values of the variable X are multiplied (or divided) by a constant, the arithmetic mean of the new observations can be obtained by multiplying (or dividing) the initial arithmetic mean by the same constant.
Therefore, when each entry in a data set is divided by a non-zero number a, the arithmetic mean of the new data is the original mean divided by a.
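A minimal Python sketch of this property (the data values are made up for illustration):

```python
data = [10, 9, 6, 11, 8]
a = 2  # any non-zero constant

mean = sum(data) / len(data)
scaled_mean = sum(x / a for x in data) / len(data)

# Dividing every entry by a divides the arithmetic mean by a.
assert abs(scaled_mean - mean / a) < 1e-12
print(mean, scaled_mean)  # 8.8 4.4
```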
Q33. The sum of deviations taken from the actual arithmetic mean is
Explanation: A stem-and-leaf plot does group the data but does not explicitly give the frequencies.
Q18. You have a summary table and a simple bar chart (like the ones at the beginning of the chapter) indicating where customers prefer to do their banking. How could you enhance the bar chart to provide both visual and actual information?
(a) Use vertical lines on the bar chart to show the values more precisely.
(b) Add values to the bar chart like what is commonly done on a pie chart.
(c) Only the summary table can show the actual values for the data.
(d) The bar chart and summary table must be presented together in order to represent this data.
Explanation: The values from the summary table can be added to the bar chart.
Q19. It might be said that the stem-and-leaf display is really a quick and easy way of creating a rudimentary chart or diagram for numerical data. If so, which chart used to describe categorical data does it most closely resemble?
(a) The stem-and-leaf display most closely resembles a rudimentary bar chart.
(b) The stem-and-leaf display most closely resembles a rudimentary pie chart.
(c) The stem-and-leaf display most closely resembles a rudimentary Pareto chart.
(d) The stem-and-leaf display does not resemble any of the above charts or diagrams.
Explanation: The number of classes is usually between 5 and 15.
The rule of thumb for creating a frequency distribution is to divide the data into 5 to 15 classes. While larger data sets allow for larger numbers of classes, how do you know exactly how many classes to use?
(a) If in doubt about the number of classes, select 10 since it is the midpoint between 5 and 15 classes.
(b) Any number of classes between 5 and 15 is sufficient.
(c) Determine the width of the class interval, then calculate the number of classes.
(d) Select the number of classes that provides definition to the shape of the data.
Explanation: An ogive is a graphical display of the cumulative percentages.
The table above shows the frequencies and relative frequencies for 7 groups of restaurant meal prices. How was the value of 0.36 obtained for the relative frequency of meals costing $32 but less than $40?
(a) The number of data points is 50, so divide 18 by 50.
(b) (18 x 2)/100 = 0.36.
(c) The midpoint of the class is $36, so divide 36 by 100.
Explanation: Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. The science of collecting, organizing, presenting, analyzing and interpreting data to assist in making more effective decisions is called Statistics.
Q2. Methods of organizing, summarizing, and presenting data in an informative way are called:
Explanation: Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data.
Q3. The methods used to determine something about a population on the basis of a sample are called:
Explanation: We use inferential statistics to try to infer from the sample data what the population might look like, or to make judgments about the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance in this study.
Q4. When the characteristic being studied is non-numeric, it is called a:
Explanation: A qualitative variable, also called a categorical variable, is a variable that is not numerical. It describes data that fit into categories. For example: eye color (values include blue, green, brown, hazel).
Q5. When the variable studied can be reported numerically, the variable is called a:
Explanation: Quantitative variables are variables measured on a numeric or quantitative scale. Ordinal, interval and ratio scales are quantitative. A country’s population, a person’s shoe size, or a car’s speed are all quantitative variables.
Q6. A specific characteristic of a population is called:
Explanation: A parameter is an important component of any statistical analysis. In simple words, a parameter is any numerical quantity that characterizes a given population or some aspect of it; it tells us something about the whole population.
Q7. A specific characteristic of a sample is called:
Explanation: In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure. Typically, the population is very large, making a census or a complete enumeration of all the values in the population impractical or impossible. A numerical characteristic computed from a sample is called a statistic.
Q10. Listing of the data in order of numerical magnitude is called:
Explanation: Data listed in order of numerical magnitude form an array (arrayed data), obtained by arranging the raw data in ascending or descending order. Raw data (sometimes called source data or atomic data) is data that has not been processed for use. A distinction is sometimes made between data and information, to the effect that information is the end product of data processing; raw data that has undergone processing is sometimes referred to as cooked data.
Q12. Data that are collected by somebody for a specific purpose and use are called:
Explanation: Primary data means original data that has been collected specially for the purpose in mind. It means someone collected the data from the original source first hand. Data collected this way is called primary data.
Q13. Data which have undergone any treatment previously are called:
Explanation: Secondary data refers to data that was collected by someone other than the user. Common sources of secondary data for social science include censuses, information collected by government departments, organisational records and data that was originally collected for other research purposes.
Q14. The data obtained by conducting a survey are called:
Explanation: Survey researchers employ a variety of techniques in the collection of survey data. People can be contacted and surveyed using several different modes: by an interviewer in-person or on the telephone (either a landline or cellphone), via the internet or by paper questionnaires (delivered in person or in the mail).
The choice of mode can affect who can be interviewed in the survey, the availability of an effective way to sample people in the population, how people can be contacted and selected to be respondents, and who responds to the survey. In addition, factors related to the mode, such as the presence of an interviewer and whether information is communicated aurally or visually, can influence how people respond. Surveyors are increasingly conducting mixed-mode surveys where respondents are contacted and interviewed using a variety of modes.
Q15. A survey in which information is collected from each and every individual of the population is known as:
Explanation: A census is a survey conducted on the full set of observation objects belonging to a given population or universe. A census is the complete enumeration of a population or groups at a point in time with respect to well-defined characteristics: for example, population, production, traffic on particular roads.
SSC CGL Tier 2 Paper 3 Study Material Day 7 [Statistics]
Different Types of Moments and Their Relationship
The roots of this theory lie in the 19th century, under the framework of group theory and the theory of algebraic invariants. The theory of algebraic invariants was thoroughly studied by the famous German mathematicians P.A. Gordan and D. Hilbert and was further developed in the 20th century by others.
Moment invariants were first introduced to the pattern recognition and image processing community in 1962, when Hu employed the results of the theory of algebraic invariants and derived his seven famous invariants to rotation of 2-D objects. Since that time, hundreds of papers have been devoted to various improvements, extensions and generalizations of moment invariants and also to their use in many areas of application. Moment invariants have become one of the most important and most frequently used shape descriptors. Even though they suffer from certain intrinsic limitations (the worst of which is their globalness, which prevents direct utilization for occluded object recognition), they frequently serve as "first-choice descriptors" and as a reference method for evaluating the performance of other shape descriptors. Despite a tremendous effort and a huge number of published papers, many open problems remain to be resolved.
Moments in mathematical statistics involve a basic calculation. These calculations can be used to find a probability distribution’s mean, variance and skewness.
Suppose that we have a set of data with a total of n discrete points. One important calculation, which is actually several numbers, is called the sth moment. The sth moment of the data set with values x1, x2, x3, . . . , xn is given by the formula:
\[ \frac{x_1^s + x_2^s + x_3^s + \cdots + x_n^s}{n} \]
Using this formula requires us to be careful with our order of operations. We need to do the exponents first, then add, then divide this sum by n, the total number of data values.
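A small Python sketch of this calculation (the function name raw_moment is my own):

```python
def raw_moment(data, s):
    """Return the s-th raw moment: (x1**s + x2**s + ... + xn**s) / n."""
    return sum(x ** s for x in data) / len(data)

print(raw_moment([1, 3, 6, 10], 1))  # 5.0, the sample mean
```

Note the order of operations inside the sum: each value is raised to the power s before the results are added and divided by n.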
The term moment has been taken from physics. In physics the moment of a system of point masses is calculated with a formula identical to that above, and this formula is used in finding the center of mass of the points. In statistics the values are no longer masses, but as we will see, moments in statistics still measure something relative to the center of the values.
Moments are scalar quantities that have been used for hundreds of years to characterize a function and to capture its significant features. They have been widely used in statistics for describing the shape of a probability density function and in classical rigid-body mechanics to measure the mass distribution of a body. From the mathematical point of view, moments are "projections" of a function onto a polynomial basis (similarly, the Fourier transform is a projection onto a basis of harmonic functions). For the sake of clarity, we introduce some basic terms and propositions, which we will use throughout this material.
Definition 1: By an image function (or image) we understand any piecewise continuous real function f(x, y) of two variables defined on a compact support D ⊂ R × R and having a finite nonzero integral.
Definition 2: The general moment M_pq^(f) of an image f(x, y), where p, q are non-negative integers and r = p + q is called the order of the moment, is defined as

\[ M_{pq}^{(f)} = \iint_D p_{pq}(x, y)\, f(x, y)\, \mathrm{d}x\, \mathrm{d}y, \]

where p_00(x, y), p_10(x, y), …, p_kj(x, y), … are polynomial basis functions defined on D. (We omit the superscript (f) if there is no danger of confusion.)
Depending on the polynomial basis used, we recognize various systems of moments.
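For instance, choosing the standard power basis p_pq(x, y) = x^p y^q gives the familiar geometric moments. A minimal numpy sketch under that assumption (the image array and function name are my own illustration):

```python
import numpy as np

def geometric_moment(img, p, q):
    # Discrete geometric moment: sum over pixels of x**p * y**q * f(x, y),
    # a grid approximation of the defining integral over D.
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    return np.sum((x ** p) * (y ** q) * img)

img = np.zeros((5, 5))
img[2, 3] = 1.0                          # single unit "mass" at (x=3, y=2)
m00 = geometric_moment(img, 0, 0)        # zeroth moment: total mass, 1.0
cx = geometric_moment(img, 1, 0) / m00   # centroid x-coordinate: 3.0
cy = geometric_moment(img, 0, 1) / m00   # centroid y-coordinate: 2.0
print(m00, cx, cy)
```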
The nth raw moment µ′ₙ (i.e., the moment about zero) of a distribution P(x) is defined by

\[ \mu'_n = \langle x^n \rangle = \sum x^n P(x). \]

µ′₁, the mean, is usually simply denoted µ = µ′₁. If the moment is instead taken about a point a,

\[ \mu_n(a) = \langle (x - a)^n \rangle = \sum (x - a)^n P(x). \]
A statistical distribution is not uniquely specified by its moments, although it is by its characteristic function.
The moments are most commonly taken about the mean. These so-called central moments are denoted µₙ and are defined by

\[ \mu_n = \langle (x - \mu)^n \rangle, \]

with µ₁ = 0. The second moment about the mean is equal to the variance,

\[ \mu_2 = \sigma^2, \]

where σ = √µ₂ is called the standard deviation.
The related characteristic function φ(t) satisfies

\[ \phi^{(n)}(0) = \left[\frac{d^n \phi}{dt^n}\right]_{t=0} = i^n \mu'_n, \]

and the moments may be simply computed using the moment-generating function M(t):

\[ \mu'_n = M^{(n)}(0). \]
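A quick sympy illustration of µ′ₙ = M⁽ⁿ⁾(0), using the moment-generating function of a fair six-sided die as a made-up example:

```python
import sympy as sp

t = sp.symbols('t')
# MGF of a fair die: M(t) = E[exp(t*X)] = (1/6) * sum of exp(k*t), k = 1..6
M = sp.Rational(1, 6) * sum(sp.exp(k * t) for k in range(1, 7))

mu1 = sp.diff(M, t, 1).subs(t, 0)  # first raw moment (mean): 7/2
mu2 = sp.diff(M, t, 2).subs(t, 0)  # second raw moment: 91/6
print(mu1, mu2, sp.simplify(mu2 - mu1**2))  # variance: 35/12
```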
Different Types of Moments
The types of moments are:
1. First Moment
For the first moment we set s = 1. The formula for the first moment is thus:
\[ \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]
This is identical to the formula for the sample mean.
The first moment of the values 1, 3, 6, 10 is (1 + 3 + 6 + 10) / 4 = 20/4 = 5.
2. Second Moment
For the second moment we set s = 2. The formula for the second moment is:
\[ \frac{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2}{n} \]
The second moment of the values 1, 3, 6, 10 is (1² + 3² + 6² + 10²)/4 = (1 + 9 + 36 + 100)/4 = 146/4 = 36.5.
3. Third Moment
For the third moment we set s = 3. The formula for the third moment is:
\[ \frac{x_1^3 + x_2^3 + x_3^3 + \cdots + x_n^3}{n} \]
The third moment of the values 1, 3, 6, 10 is (1³ + 3³ + 6³ + 10³)/4 = (1 + 27 + 216 + 1000)/4 = 1244/4 = 311.
Higher moments can be calculated in a similar way: just replace s in the above formula with the number denoting the desired moment.
4. Fourth Moment
For the fourth moment we set s = 4. The formula for the fourth moment is:
\[ \frac{x_1^4 + x_2^4 + x_3^4 + \cdots + x_n^4}{n} \]
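Using the raw_moment function sketched earlier, the worked examples can be verified:

```python
# Assumes raw_moment() from the earlier sketch is in scope.
data = [1, 3, 6, 10]
for s in (1, 2, 3, 4):
    print(s, raw_moment(data, s))
# 1 5.0
# 2 36.5
# 3 311.0
# 4 2844.5
```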
Moments about the Mean
A related idea is that of the sth moment about the mean. In this calculation we perform the following steps:
First calculate the mean of the values.
Next, subtract this mean from each value.
Then raise each of these differences to the sth power.
Now add the numbers from step #3 together.
Finally, divide this sum by the number of values we started with.
The formula for the sth moment about the mean m of the values x₁, x₂, x₃, …, xₙ is given by:
\[ m_s = \frac{(x_1 - m)^s + (x_2 - m)^s + \cdots + (x_n - m)^s}{n} \]
For s = 2, this formula is equivalent to that for the sample variance (in its population form, with divisor n).
For example, consider the set 1, 3, 6, 10. We have already calculated the mean of this set to be 5. Subtract this from each of the data values to obtain differences of:
1 – 5 = -4
3 – 5 = -2
6 – 5 = 1
10 – 5 = 5
We square each of these values and add them together: (−4)² + (−2)² + 1² + 5² = 16 + 4 + 1 + 25 = 46. Finally, divide this number by the number of data points: 46/4 = 11.5.
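The same steps in Python (the function name central_moment is my own):

```python
def central_moment(data, s):
    """Return the s-th moment about the mean: sum((x - m)**s) / n."""
    m = sum(data) / len(data)
    return sum((x - m) ** s for x in data) / len(data)

print(central_moment([1, 3, 6, 10], 2))  # 11.5, matching the worked example
```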
Applications of Moments
As mentioned above, the first moment is the mean and the second moment about the mean is the sample variance. Pearson introduced the use of the third moment about the mean in calculating skewness and the fourth moment about the mean in the calculation of kurtosis.
Uses of Moments in Statistics
The central question in statistics is: given a set of data, we would like to recover the random process that produced the data (that is, the probability law of the population). This question is extremely difficult in general, and in the absence of strong assumptions on the underlying random process you really can’t get very far (those who work in nonparametric statistics may disagree with me on this). A natural way to approach this problem would be to look for simple objects that do identify the population distribution if we make some reasonable assumptions.
The question then becomes what type of objects we should search for. The best arguments I know for why we should look at the Laplace (or Fourier; I’ll show you what this is in a second if you don’t know) transform of the probability measure are a bit complicated, but naively we can draw a good heuristic from elementary calculus: given all the derivatives of an analytic function evaluated at zero, we know everything there is to know about the function through its Taylor series.
Suppose for a moment that the function f(t) = E[e^{tX}] exists and is well behaved in a neighborhood of zero. It is a theorem that this function (when it exists and behaves nicely) uniquely identifies the probability law of the random variable X. If we do a Taylor expansion of what is inside the expectation, this becomes a power series in the moments of X, and so to completely identify the law of X we just need to know the population moments. In effect we reduce the question above, “identify the population law of X”, to the question “identify the population moments of X”.
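Concretely, the Taylor expansion referred to here (assuming expectation and sum may be interchanged) is

\[ E[e^{tX}] = E\!\left[\sum_{n=0}^{\infty} \frac{(tX)^n}{n!}\right] = \sum_{n=0}^{\infty} \frac{t^n}{n!}\, E[X^n], \]

a power series whose coefficients are the moments of X.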
It turns out that population moments are extremely well estimated by sample moments when they exist, and you can even get a good feel for how far off from the true moments you can be under some often realistic assumptions. Of course we can never get infinitely many moments with any degree of accuracy from a sample, so in practice we would really want to do another round of approximations, but that is the general idea. For nice random variables, the moments are sufficient to identify the population law.
I should mention that what I have said above is all heuristic and doesn’t work in most interesting modern examples. In truth, I think the right answer to your question is that we don’t need all moments, because for many relevant applications (particularly in economics) it seems unlikely that all moments even exist. The thing is that when you get rid of moment assumptions you lose an enormous amount of information and power: without at least the first two moments, the Central Limit Theorem fails, and with it go most of the elementary statistical tests. If you do not want to work with moments, there is a whole theory of nonparametric statistics that makes no assumptions at all on the random process.
SSC CGL Tier 2 JSO / Paper 3 Statistics Study Material Day 6
The standard deviation (often SD) is a measure of variability. When we calculate the standard deviation of a sample, we are using it as an estimate of the variability of the population from which the sample was drawn. For data with a normal distribution, about 95% of individuals will have values within 2 standard deviations of the mean, the other 5% being equally scattered above and below these limits. Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. About 95% of observations of any distribution usually fall within the 2 standard deviation limits, though those outside may all be at one end. We may choose a different summary statistic, however, when data have a skewed distribution.
The standard deviation is the most important and widely used measure of dispersion. It is also known as the root mean square deviation, since it is the square root of the mean of the squared deviations of the values from the arithmetic mean.
The standard deviation measures the absolute dispersion or variability of the distribution. The greater the amount of dispersion, the greater is the standard deviation.
Thus, the standard deviation is defined as the positive square root of the mean of the squares of the deviations of all the values from their arithmetic mean.
Standard Deviation for Ungrouped Data
For ungrouped data x₁, x₂, …, xₙ with arithmetic mean x̄, the standard deviation is

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}} \]

Standard Deviation for a Frequency Distribution
The formula for computing the standard deviation for a frequency distribution, where the value xᵢ occurs with frequency fᵢ and N = Σfᵢ, is

\[ \sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} \]
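A small Python sketch of both formulas (the data values and function names are illustrative):

```python
import math

def sd_ungrouped(values):
    """Population standard deviation of ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_frequency(values, freqs):
    """Standard deviation for a frequency distribution (value x_i, frequency f_i)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)

print(sd_ungrouped([1, 3, 6, 10]))               # ~3.391
print(sd_frequency([1, 3, 6, 10], [2, 3, 4, 1]))
```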
Merits and Demerits of Standard Deviation
Its most important merit is that it is free from the compulsion of taking only absolute values (as in estimating the mean deviation), so it is readily applicable in different algebraic operations.
It takes into account all individual observations, so any slight variation in any observation is automatically reflected in the standard deviation.
Through the variance, it easily reflects aberrations in a data series.
It is the basis of the relative measure of dispersion, the coefficient of variation (CV).
It is also an absolute measure of dispersion, so comparisons of data series in different units of measurement are not tenable.
Its value changes if the unit of measurement is changed.
In a normal distribution, data are symmetrically distributed around the mean (the mean, median and mode are all identical): mean ± σ covers 68.27 per cent of observations, mean ± 2σ covers 95.45 per cent, and mean ± 3σ covers 99.73 per cent. This property is useful in dividing a data series into suitable groups or classes.
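A quick empirical check of these percentages with simulated normal data (the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

for k in (1, 2, 3):
    pct = np.mean(np.abs(x) <= k) * 100
    print(f"within {k} sigma: {pct:.2f}%")  # ~68.27, ~95.45, ~99.73
```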
Measures of Relative Dispersion
The relative dispersion of a data set, more commonly referred to as its coefficient of variation, is the ratio of its standard deviation to its arithmetic mean. In effect, it is a measurement of the degree by which an observed variable deviates from its average value. It is a useful measurement in applications such as comparing stocks and other investment vehicles because it is a way to determine the risk involved with the holdings in your portfolio.
Determine the arithmetic mean of your data set by adding all of the individual values of the set together and dividing by the total number of values.
Square the difference between each individual value in the data set and the arithmetic mean.
Add all of the squares calculated in Step 2 together.
Divide your result from Step 3 by the total number of values in your data set. You now have the variance of your data set.
Calculate the square root of the variance calculated in Step 4. You now have the standard deviation of your data set.
Divide the standard deviation calculated in Step 5 by the absolute value of the arithmetic mean calculated in Step 1.
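These six steps in Python (the function name coefficient_of_variation is my own):

```python
import math

def coefficient_of_variation(values):
    """Ratio of the standard deviation to the absolute value of the mean."""
    n = len(values)
    mean = sum(values) / n                             # Step 1
    squared_diffs = [(x - mean) ** 2 for x in values]  # Step 2
    variance = sum(squared_diffs) / n                  # Steps 3 and 4
    sd = math.sqrt(variance)                           # Step 5
    return sd / abs(mean)                              # Step 6

print(coefficient_of_variation([1, 3, 6, 10]))  # ~0.678
```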