Tips And Treats
Statistics And Probability Tips
Statistics is a broad mathematical discipline which studies ways to collect, summarize and draw conclusions from data. It is applicable to a wide variety of academic disciplines from the physical and social sciences to the humanities, as well as to business, government, and industry.
Once data is collected, either through a formal sampling procedure or by recording responses to treatments in an experimental setting (cf experimental design), or by repeatedly observing a process over time (time series), graphical and numerical summaries may be obtained using descriptive statistics.
Patterns in the data are modeled to draw inferences about the larger population, using inferential statistics to account for randomness and uncertainty in the observations. These inferences may take the form of answers to essentially yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), prediction of future observations, descriptions of association (correlation), or modeling of relationships (regression).
The framework described above is sometimes referred to as applied statistics. In contrast, mathematical statistics (or simply statistical theory) is the subdiscipline of applied mathematics which uses probability theory and analysis to place statistical practice on a firm theoretical basis.
The word probability derives from the Latin probare (to prove, or to test). Informally, probable is one of several words applied to uncertain events or knowledge, being more or less interchangeable with likely, risky, hazardous, uncertain, and doubtful, depending on the context. Chance, odds, and bet are other words expressing similar notions. Just as the theory of mechanics assigns precise definitions to such everyday terms as work and force, so the theory of probability attempts to quantify the notion of probable.
Last Updated - 8th December 2005
Experimental and observational studies
- A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on a response or dependent variable. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of changes in an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types is in how the study is actually conducted.
- An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation may have modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead data is gathered and correlations between predictors and the response are investigated.
- An example of an experimental study is the famous Hawthorne studies, which attempted to test changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in whether increased illumination would increase the productivity of the assembly line workers. They first measured productivity in the plant, then modified the illumination in an area of the plant to see whether the change affected productivity. Because of errors in experimental procedure, specifically the lack of a control group, the researchers were unable to answer the question they had set out to study, but their work gave the world the Hawthorne effect.
- An example of an observational study is a study which explores the correlation between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers and then look at the number of cases of lung cancer in each group.
The basic steps for an experiment are to:
- plan the research including determining information sources, research subject selection, and ethical considerations for the proposed research and method,
- design the experiment concentrating on the system model and the interaction of independent and dependent variables,
- summarize a collection of observations to feature their commonality by suppressing details (descriptive statistics),
- reach consensus about what the observations tell us about the world we observe (statistical inference),
- document and present the results of the study.
Levels of measurement
There are four types of measurements or measurement scales used in statistics. The four types or levels of measurement (nominal, ordinal, interval, and ratio) have different degrees of usefulness in statistical research. Ratio measurement, where both a zero value and distances between different measurements are defined, provides the greatest flexibility in the statistical methods that can be used for analysing the data. Interval measurement, with meaningful distances between measurements but no meaningful zero value (such as IQ measurements or temperature measurements in degrees Celsius), is also used in statistical research.
Some well known statistical tests and procedures for research observations are:
- Student's t-test
- analysis of variance (ANOVA)
- Mann-Whitney U
- regression analysis
- Fisher's Least Significant Difference test
- Pearson product-moment correlation coefficient
- Spearman's rank correlation coefficient
The general idea of probability is often divided into two related concepts:
- Aleatory probability, which represents the likelihood of future events whose occurrence is governed by some random physical phenomenon. This concept can be further divided into physical phenomena that are predictable, in principle, with sufficient information, and phenomena which are essentially unpredictable. Examples of the first kind include tossing dice or spinning a roulette wheel, and an example of the second kind is radioactive decay.
- Epistemic probability, which represents our uncertainty about propositions when we lack complete knowledge of causative circumstances. Such propositions may be about past or future events, but need not be. Some examples of epistemic probability are to assign a probability to the proposition that a proposed law of physics is true, and to determine how "probable" it is that a suspect committed a crime, based on the evidence presented.
It is an open question whether aleatory probability is reducible to epistemic probability based on our inability to precisely predict every force that might affect the roll of a die, or whether such uncertainties exist in the nature of reality itself, particularly in quantum phenomena governed by Heisenberg's uncertainty principle. Although the same mathematical rules apply regardless of which interpretation is chosen, the choice has major implications for the way in which probability is used to model the real world.
Formalization of probability
Like other theories, the theory of probability is a representation of probabilistic concepts in formal terms -- that is, in terms that can be considered separately from their meaning. These formal terms are manipulated by the rules of mathematics and logic, and any results are then interpreted or translated back into the problem domain.
There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation, sets are interpreted as events and probability itself as a measure on a class of sets. In Cox's formulation, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions. In both cases, the laws of probability are the same, except for technical details:
- a probability is a number between 0 and 1;
- the probability of an event or proposition and its complement must add up to 1; and
- the joint probability of two events or propositions is the product of the probability of one of them and the probability of the second, conditional on the first.
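In symbols, the three laws above can be sketched as follows. The notation is an assumption made here for illustration: A and B stand for events or propositions, \bar{A} for the complement of A, and \Pr(B \mid A) for the probability of B conditional on A.

```latex
0 \le \Pr(A) \le 1, \qquad
\Pr(A) + \Pr(\bar{A}) = 1, \qquad
\Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A).
```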
Representation and interpretation of probability values
The probability of an event is generally represented as a real number between 0 and 1, inclusive. An impossible event has a probability of exactly 0, and a certain event has a probability of 1, but the converses are not always true: probability 0 events are not always impossible, nor probability 1 events certain.
Most probabilities that occur in practice are numbers between 0 and 1, indicating the event's position on the continuum between impossibility and certainty. The closer an event's probability is to 1, the more likely it is to occur.
For example, if two mutually exclusive events are assumed equally probable, such as a flipped coin landing heads-up or tails-up, we can express the probability of each event as "1 in 2", or, equivalently, "50%" or "1/2".
Probabilities are equivalently expressed as odds, which is the ratio of the probability of one event to the probability of all other events. The odds of heads-up, for the tossed coin, are (1/2)/(1 - 1/2), which is equal to 1/1. This is expressed as "1 to 1 odds" and often written "1:1".
Odds a:b for some event are equivalent to probability a/(a+b). For example, 1:1 odds are equivalent to probability 1/2, and 3:2 odds are equivalent to probability 3/5.
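The conversion between odds and probability described above can be checked with a short Python sketch. The function names are hypothetical, chosen only for this illustration, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def odds_to_probability(a, b):
    """Convert odds a:b for an event into the probability a/(a+b)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("odds must be non-negative and not both zero")
    return Fraction(a, a + b)


def probability_to_odds(p):
    """Convert a probability p (0 <= p < 1) into reduced odds (a, b), i.e. p/(1-p)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    ratio = p / (1 - p)
    return ratio.numerator, ratio.denominator


print(odds_to_probability(1, 1))            # 1/2
print(odds_to_probability(3, 2))            # 3/5
print(probability_to_odds(Fraction(1, 2)))  # (1, 1)
```

A probability of exactly 1 is excluded from `probability_to_odds` because the ratio p/(1-p) is then undefined.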
There remains the question of exactly what can be assigned probability, and how the numbers so assigned can be used; this is the question of probability interpretations. There are some who claim that probability can be assigned to any kind of an uncertain logical proposition; this is the Bayesian interpretation. There are others who argue that probability is properly applied only to random events as outcomes of some specified random experiment, for example sampling from a population; this is the frequentist interpretation. There are several other interpretations which are variations on one or the other of those, or which have less acceptance at present.
A probability distribution is a function that assigns probabilities to events or propositions. For any set of events or propositions there are many ways to assign probabilities, so the choice of one distribution or another is equivalent to making different assumptions about the events or propositions in question.
There are several equivalent ways to specify a probability distribution. Perhaps the most common is to specify a probability density function. Then the probability of an event or proposition is obtained by integrating the density function. The distribution function may also be specified directly. In one dimension, the distribution function is called the cumulative distribution function. Probability distributions can also be specified via moments or the characteristic function, or in still other ways.
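As an illustration of obtaining a probability by integrating a density, the sketch below uses the exponential distribution, whose density and cumulative distribution function have simple closed forms. The rate value, the interval, and the function names are assumptions made for the example, and the midpoint rule is just one simple way to approximate the integral.

```python
import math


def exponential_pdf(x, rate):
    """Density of the exponential distribution with the given rate."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exponential_cdf(x, rate):
    """Cumulative distribution function of the same distribution."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


rate = 2.0        # illustrative rate parameter
a, b = 0.5, 1.5   # illustrative interval

# Probability that the variable falls in [a, b], computed two ways.
by_density = integrate(lambda x: exponential_pdf(x, rate), a, b)
by_cdf = exponential_cdf(b, rate) - exponential_cdf(a, rate)
print(by_density, by_cdf)
```

The two printed values should agree to many decimal places, which reflects the fact that the density and the distribution function are two descriptions of the same distribution.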
A distribution is called a discrete distribution if it is defined on a countable, discrete set, such as a subset of the integers. A distribution is called a continuous distribution if it has a continuous distribution function, such as a polynomial or exponential function. Most distributions of practical importance are either discrete or continuous, but there are examples of distributions which are neither.
Important discrete distributions include the discrete uniform distribution, the Poisson distribution, the binomial distribution, the negative binomial distribution and the Maxwell-Boltzmann distribution.
Important continuous distributions include the normal distribution, the gamma distribution, the Student's t-distribution, and the exponential distribution.
Probability in mathematics
Probability axioms form the basis for mathematical probability theory. Probabilities can often be calculated using combinatorics or by applying the axioms directly. The applications of probability extend beyond statistics, which is usually based on the idea of probability distributions and the central limit theorem.
To give a mathematical meaning to probability, consider flipping a "fair" coin. Intuitively, the probability that heads will come up on any given coin toss is "obviously" 50%; but this statement alone lacks mathematical rigor - certainly, while we might expect that flipping such a coin 10 times will yield 5 heads and 5 tails, there is no guarantee that this will occur; it is possible for example to flip 10 heads in a row. What then does the number "50%" mean in this context?
One approach is to use the law of large numbers. In this case, we assume that we can perform any number of coin flips, with each coin flip being independent; that is to say, the outcome of each coin flip is unaffected by previous coin flips. If we perform N trials (coin flips), and let N_H be the number of times the coin lands heads, then we can, for any N, consider the ratio N_H/N.
As N gets larger and larger, we expect that in our example the ratio N_H/N will get closer and closer to 1/2. This allows us to "define" the probability Pr(H) of flipping heads as the limit, as N approaches infinity, of this sequence of ratios:
$$\Pr(H) = \lim_{N \to \infty} \frac{N_H}{N}$$
In actual practice, of course, we cannot flip a coin an infinite number of times; so in general, this formula most accurately applies to situations in which we have already assigned an a priori probability to a particular outcome (in this case, our assumption that the coin was a "fair" coin). The law of large numbers then says that, given Pr(H), and any arbitrarily small number ε > 0, there exists some number n such that for all N > n,
$$\left| \Pr(H) - \frac{N_H}{N} \right| < \varepsilon$$
In other words, by saying that "the probability of heads is 1/2", we mean that, if we flip our coin often enough, eventually the number of heads over the number of total flips will become arbitrarily close to 1/2; and will then stay at least as close to 1/2 for as long as we keep performing additional coin flips.
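The behaviour described above can be observed with a small simulation. This is a minimal sketch that assumes Python's pseudo-random generator is an adequate stand-in for a fair coin; the function name and the seed are arbitrary choices made for illustration.

```python
import random


def heads_ratio(num_flips, rng):
    """Flip a simulated fair coin num_flips times and return heads / flips."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


rng = random.Random(2005)  # fixed seed so that repeated runs give the same flips
for n in (10, 100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: ratio of heads = {heads_ratio(n, rng):.4f}")
```

For small N the ratio can be far from 1/2, while for large N it typically settles close to 1/2, although no finite run guarantees this.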
Note that a proper definition requires measure theory which provides means to cancel out those cases where the above limit does not provide the "right" result or is even undefined by showing that those cases have a measure of zero.
The a priori aspect of this approach to probability is sometimes troubling when applied to real world situations. For example, suppose you flip a coin and it comes up heads a hundred times in a row. You cannot decide whether this is just a random event (after all, it is possible, although unlikely, that a fair coin would give this result) or whether your assumption that the coin is fair is at fault.
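To put a number on "possible, although unlikely", the sketch below computes the probability that a fair coin lands heads on every one of a hundred independent flips. The function name is hypothetical, and independence of the flips is assumed as in the discussion above.

```python
from fractions import Fraction


def prob_all_heads(num_flips, p_heads=Fraction(1, 2)):
    """Probability that num_flips independent flips all land heads."""
    return p_heads ** num_flips


p = prob_all_heads(100)
print(p)         # exact value, 1 / 2**100
print(float(p))  # roughly 7.9e-31
```

A value this small does not prove the coin is unfair, but it shows why such a run would lead most observers to question the assumption of fairness.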
Remarks on probability calculations
The difficulty of probability calculations lies in determining the number of possible events, counting the occurrences of each event, and counting the total number of possible events. Especially difficult is drawing meaningful conclusions from the probabilities calculated. An amusing probability riddle, the Monty Hall problem, demonstrates the pitfalls nicely.
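The Monty Hall problem can be explored by simulation. The sketch below assumes the usual statement of the riddle, which is not spelled out in the text: three doors, a car behind one of them, a host who always opens an unchosen door that does not hide the car, and a contestant who may then stay or switch. The function names are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials, rng):
    """Fraction of rounds won under the given strategy."""
    return sum(play(switch, rng) for _ in range(trials)) / trials


rng = random.Random(0)
print("stay:  ", win_rate(False, 100_000, rng))
print("switch:", win_rate(True, 100_000, rng))
```

Under these assumptions the estimated win rate is close to 1/3 for staying and close to 2/3 for switching, which is the result many people find counterintuitive.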
Applications of probability theory to everyday life
A major effect of probability theory on everyday life is in risk assessment and in trade on commodity markets. Governments typically apply probability methods in environmental regulation, where it is called "pathway analysis". They also often measure well-being using methods that are stochastic in nature, and choose projects to undertake based on their perceived probable effect on the population as a whole, statistically. It is not correct to say that statistics are involved in the modelling itself, as typically the assessments of risk are one-time and thus require more fundamental probability models, e.g. "the probability of another 9/11". A law of small numbers tends to apply to all such choices and to the perception of their effects, which makes probability measures a political matter.
A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices, which have ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely or less likely sends prices up or down, and signals that opinion to other traders. Accordingly, the probabilities are not assessed independently, nor necessarily very rationally. The theory of behavioral finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace and conflict.
It can reasonably be said that the discovery of rigorous methods to assess and combine probability assessments has had a profound effect on modern society. A good example is the application of game theory, itself based strictly on probability, to the Cold War and the mutual assured destruction doctrine. Accordingly, it may be of some importance to most citizens to understand how odds and probability assessments are made, and how they contribute to reputations and to decisions, especially in a democracy.
Another significant application of probability theory in everyday life is reliability. Many consumer products, such as automobiles and consumer electronics, utilize reliability theory in the design of the product in order to reduce the probability of failure. The probability of failure is also closely associated with the product's warranty.
Some sciences use applied statistics so extensively that they have specialized terminology. These disciplines include:
- Business statistics
- Data mining (applying statistics and pattern recognition to discover knowledge from data)
- Economic statistics (Econometrics)
- Engineering statistics
- Statistical physics
- Psychological statistics
- Social statistics (for all the social sciences)
- Statistical literacy
- Process analysis and chemometrics (for analysis of data from analytical chemistry and chemical engineering)
- Reliability engineering
- Statistics in various sports, particularly baseball and cricket
Statistics is a key tool in business and manufacturing as well. It is used to understand variability in measurement systems, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles it is a key tool, and perhaps the only reliable tool.
- Modern statistics is supported by computers to perform some of the very large and complex calculations required.
- Whole branches of statistics have been made possible by computing, for example neural networks.
- The computer revolution has implications for the future of statistics, with a new emphasis on 'experimental' and 'empirical' statistics.
One of the most important applications of statistics and probability with computers is simulation.
A simulation is an imitation of some real device or state of affairs. Simulation attempts to represent certain features of the behavior of a physical or abstract system by the behavior of another system.
Simulation is used in many contexts, including the modeling of natural systems and human systems to gain insight into the operation of those systems, and simulation in technology and safety engineering, where the goal is to test some real-world practical scenario. Simulation, whether using a simulator or otherwise experimenting with a fictitious situation, can show the eventual real effects of some possible conditions.
Physical and Interactive simulation
- Physical simulation refers to simulation in which physical objects are substituted for the real thing. These physical objects are often chosen because they are smaller or cheaper than the actual object or system.
- Interactive simulation, a special kind of physical simulation often referred to as human-in-the-loop simulation, is a physical simulation that includes humans, such as the model used in a flight simulator.
Simulation in training
Simulation is often used in the training of civilian and military personnel. This usually occurs when it is prohibitively expensive or simply too dangerous to allow trainees to use the real equipment in the real world. In such situations they will spend time learning valuable lessons in a "safe" virtual environment. Often the benefit is that mistakes can be permitted during training for a safety-critical system.
Training simulations typically come in one of four categories:
- "live" simulation (where real people use simulated (or "dummy") equipment in the real world);
- "virtual" simulation (where real people use simulated equipment in a simulated world (or "virtual environment"));
- "constructive" simulation (where simulated people use simulated equipment in a simulated environment). Constructive simulation is often referred to as "wargaming" since it bears some resemblance to table-top war games in which players command armies of soldiers and equipment which move around a board; or
- "role play" simulation (where real people take on the persona of a virtual work).
Medical simulators are increasingly being developed and deployed to teach therapeutic and diagnostic procedures as well as medical concepts and decision making to personnel in the health professions. Simulators have been developed for training procedures ranging from the basics, such as blood draw, to laparoscopic surgery and trauma care. Many medical simulators involve a computer connected to a plastic simulation of the relevant anatomy. In others, computer graphics reproduce all visual components and tool handles reproduce haptic aspects of the task. Some contain computer graphics simulations of imagery such as X-ray or other medical images. Some patient simulators employ a life-size mannequin which responds to injected drugs and can be programmed to create simulations of life-threatening emergencies. Some medical simulations are disseminated via the web and can be interacted with using standard web browsers. They are currently limited to screen-based simulations where users interact with the simulation via standard pointing devices.
A flight simulator is used to train pilots on the ground. It permits a pilot to crash his simulated "aircraft" without being hurt. Flight simulators are often used to train pilots to operate aircraft in extremely hazardous situations, such as landings with no engines, or complete electrical or hydraulic failures. The most advanced simulators have high-fidelity visual systems and hydraulic motion systems. The simulator is normally cheaper to operate than a real trainer aircraft.
Simulation and games
Many video games are also simulators, implemented inexpensively. These are sometimes called "sim games". Such games can simulate various aspects of reality, from economics to piloting vehicles, such as flight simulators.
Simulation is an important tool when engineering systems. For example, in electrical engineering, delay lines may be used to simulate the propagation delay and phase shift caused by an actual transmission line. Similarly, dummy loads may be used to simulate impedance without simulating propagation, and are used in situations where propagation is unwanted. A simulator may imitate only a few of the operations and functions of the unit it simulates (contrast with emulation).
Most engineering simulations entail mathematical modeling and computer assisted investigation. There are many cases, however, where mathematical modeling is not reliable. Simulation of fluid dynamics problems often requires both mathematical and physical simulations. In these cases the physical models require dynamic similitude.
Computer simulation has become a useful part of modeling many natural systems in physics, chemistry and biology, and human systems in economics and social science (computational sociology), as well as in engineering, to gain insight into the operation of those systems. A good example of the usefulness of computer simulation can be found in the field of network traffic simulation. In such simulations the model behaviour changes from run to run according to the set of initial parameters assumed for the environment. Computer simulations are often considered to be human-out-of-the-loop simulations.
Traditionally, the formal modeling of systems has been via a mathematical model, which attempts to find analytical solutions that enable the prediction of the behaviour of the system from a set of parameters and initial conditions. Computer simulation is often used as an adjunct to, or substitute for, modeling systems for which simple closed-form analytic solutions are not possible. There are many different types of computer simulation; the common feature they all share is the attempt to generate a sample of representative scenarios for a model in which a complete enumeration of all possible states would be prohibitive or impossible.
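The idea of sampling representative scenarios instead of enumerating every state can be sketched with dice. The example below is an illustrative assumption, not a model taken from the text: it estimates the probability that the total of several fair dice reaches a threshold, first by exact enumeration (feasible only for a few dice) and then by random sampling (feasible for many dice). All function names and parameter values are hypothetical.

```python
import itertools
import random


def exact_probability(num_dice, threshold):
    """Enumerate every outcome; only feasible when 6**num_dice is small."""
    outcomes = itertools.product(range(1, 7), repeat=num_dice)
    hits = sum(sum(roll) >= threshold for roll in outcomes)
    return hits / 6 ** num_dice


def sampled_probability(num_dice, threshold, samples, rng):
    """Estimate the same probability from randomly generated rolls."""
    hits = 0
    for _ in range(samples):
        total = sum(rng.randint(1, 6) for _ in range(num_dice))
        hits += total >= threshold
    return hits / samples


rng = random.Random(42)
# Small case: both approaches are possible, so the estimate can be checked.
print(exact_probability(3, 15))
print(sampled_probability(3, 15, 100_000, rng))
# Large case: 6**30 outcomes cannot be enumerated, but sampling still works.
print(sampled_probability(30, 120, 20_000, rng))
```

The sampled estimate carries random error that shrinks as the number of samples grows, which is the price paid for avoiding the full enumeration.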
It is increasingly common to hear simulations of many kinds referred to as "synthetic environments". This label has been adopted to broaden the definition of "simulation" to encompass virtually any computer-based representation.
Simulation in computer science
In computer programming, a simulator is often used to execute a program that has to run on some inconvenient type of computer. For example, simulators are usually used to debug a microprogram. Since the operation of the computer is simulated, all of the information about the computer's operation is directly available to the programmer, and the speed and execution of the simulation can be varied at will.
Simulators may also be used to interpret fault trees, or test VLSI logic designs before they are constructed. In theoretical computer science the term simulation represents a relation between state transition systems. This is useful in the study of operational semantics.
Simulation in education
Simulations in education are somewhat like training simulations. They focus on specific tasks. In the past, video has been used for teachers and education students to observe, problem solve and role play; however, a more recent use of simulation in education is animated narrative vignettes (ANVs). ANVs are cartoon-like video narratives of hypothetical and reality-based stories involving classroom teaching and learning. ANVs have been used to assess the knowledge, problem-solving skills and dispositions of children and of pre-service and in-service teachers.
Another form of simulation has been finding favour in business education in recent years. Business simulations that incorporate a dynamic model enable experimentation with business strategies in a risk-free environment and provide a useful extension to case study discussions.
© Tips And Treats. An Information Based Website (2005-2015)