This page looks at linear regression, scatter diagrams and correlation.

We often wish to look at the relationship between two things (e.g. between a person's height and weight) by comparing data for each of these things. A good way of doing this is by drawing a scatter diagram. "Regression" is the process of finding the function satisfied by the points on the scatter diagram. Of course, the points might not fit the function exactly, but the aim is to get as close as possible. "Linear" means that the function we are looking for is a straight line (so our function $f$ will be of the form $f(x) = mx + c$ for constants $m$ and $c$).

Here is a scatter diagram with a regression line drawn in:

[Figure: scatter diagram with a regression line drawn in]

Correlation is a term used to describe how strong the relationship between the two variables appears to be. We say that there is a positive linear correlation if $y$ increases as $x$ increases, and a negative linear correlation if $y$ decreases as $x$ increases. There is no correlation if $x$ and $y$ do not appear to be related.

In many experiments, one of the variables is fixed or controlled, and the point of the experiment is to determine how the other variable varies with the first. The fixed or controlled variable is known as the explanatory or independent variable, and the other variable is known as the response or dependent variable. We shall use "$x$" for the explanatory variable and "$y$" for the response variable, but any letters would do.

If there is very little scatter (we say there is a strong correlation between the variables), a regression line can be drawn "by eye". Make sure that your line passes through the mean point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the mean of the data collected for the explanatory variable and $\bar{y}$ is the mean of the data collected for the response variable.

When there is a reasonable amount of scatter, we can draw two different regression lines, depending on which variable we consider to be the more accurate. The first is the line of regression of $y$ on $x$, which can be used to estimate $y$ given $x$. The other is the line of regression of $x$ on $y$, used to estimate $x$ given $y$. If there is a perfect correlation between the data (in other words, if all the points lie on a straight line), then the two regression lines will be the same.

The method of least squares finds a regression line without estimating by eye where the line should go. If the equation of the regression line is $y = ax + b$, we need to find $a$ and $b$. We find these by solving the "normal equations". For the line of regression of $y$ on $x$, with $n$ data points, the normal equations are:

$$\sum y = a\sum x + nb$$
$$\sum xy = a\sum x^2 + b\sum x$$

The values of $a$ and $b$ are found by solving these equations simultaneously. For the line of regression of $x$ on $y$, the normal equations are the same but with $x$ and $y$ swapped.

We have previously defined association between $X$ and $Y$ as meaning that the distribution of $Y$ varies with $X$. We discussed correlation as a type of association in which larger values of $Y$ are associated with larger values of $X$ (increasing trend) or smaller values of $X$ (decreasing trend) (ref. 2). If we suspect a trend, we may want to attempt to predict the values of one variable using the values of the other. One of the simplest prediction methods is linear regression, in which we attempt to find a 'best line' through the data points.

Correlation and linear regression are closely linked: they both quantify trends. Typically, in correlation we sample both variables randomly from a population (for example, height and weight), and in regression we fix the value of the independent variable (for example, dose) and observe the response. The predictor variable may also be randomly selected, but we treat it as fixed when making predictions (for example, predicted weight for someone of a given height).

We say there is a regression relationship between $X$ and $Y$ when the mean of $Y$ varies with $X$. In simple regression, there is one independent variable, $X$, and one dependent variable, $Y$. For a given value of $X$, we can estimate the average value of $Y$ and write this as a conditional expectation $E(Y \mid X)$, often written simply as $\mu(X)$. If $\mu(X)$ varies with $X$, then we say that $Y$ has a regression on $X$ (Fig. …). Regression is a specific kind of association and may be linear or nonlinear (Fig. …).

The most basic regression relationship is a simple linear regression. In this case, $E(Y \mid X) = \mu(X) = \beta_0 + \beta_1 X$, a line with intercept $\beta_0$ and slope $\beta_1$. We can interpret this as $Y$ having a distribution with mean $\mu(X)$ for any given value of $X$. Here we are not interested in the shape of this distribution; we care only about its mean. The deviation of $Y$ from $\mu(X)$ is often called the error, $\varepsilon = Y - \mu(X)$. It's important to realize that this term arises not because of any kind of error but because $Y$ has a distribution for a given value of $X$. In other words, in the expression $Y = \mu(X) + \varepsilon$, $\mu(X)$ specifies the location of the distribution, and $\varepsilon$ captures its shape.
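The normal equations for a least-squares line can be solved in closed form. The Python sketch below is illustrative only: the helper name `fit_line` and the data values are hypothetical, made up for this example. It fits the line of regression of y on x by solving the two normal equations simultaneously, gets the line of regression of x on y by swapping the arguments, checks that both lines pass through the mean point, and shows that the two lines coincide when the points lie exactly on a straight line.

```python
import math


def fit_line(xs, ys):
    """Least-squares line ys ~ a*xs + b (hypothetical helper for this example).

    Solves the normal equations
        sum(y)  = a*sum(x)   + n*b
        sum(xy) = a*sum(x^2) + b*sum(x)
    simultaneously for a and b.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    # Eliminating b between the two equations gives the slope a ...
    a = (n * sum_xy - sum_x * sum_y) / denom
    # ... and substituting a into the first equation gives b.
    b = (sum_y - a * sum_x) / n
    return a, b


xs = [1, 2, 3, 4, 5]  # made-up explanatory values
ys = [2, 4, 5, 4, 5]  # made-up responses, with some scatter

a, b = fit_line(xs, ys)  # regression of y on x: y = a*x + b
c, d = fit_line(ys, xs)  # regression of x on y (x and y swapped): x = c*y + d
print(f"y on x: y = {a:.2f}x {b:+.2f}")  # y on x: y = 0.60x +2.20
print(f"x on y: x = {c:.2f}y {d:+.2f}")  # x on y: x = 1.00y -1.00

# Both lines pass through the mean point (x_bar, y_bar) = (3, 4).
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(math.isclose(a * x_bar + b, y_bar), math.isclose(c * y_bar + d, x_bar))  # True True

# With perfect correlation (points exactly on y = 2x + 1) the two lines coincide:
# rearranging x = c*y + d gives y = x/c - d/c, the same as y = a*x + b.
px, py = [1, 2, 3, 4], [3, 5, 7, 9]
pa, pb = fit_line(px, py)
pc, pd = fit_line(py, px)
print(math.isclose(pa, 1 / pc), math.isclose(pb, -pd / pc))  # True True
```

With scattered data the two lines differ (here $y = 0.6x + 2.2$ versus $x = y - 1$, i.e. $y = x + 1$), yet both still pass through $(\bar{x}, \bar{y})$.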
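Correlation is described qualitatively in the text; a common way to quantify it is the Pearson correlation coefficient $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, built from sums of deviations about the means. The sketch below is a minimal illustration under that choice; the function names and data sets are hypothetical. The sign of $r$ matches positive or negative linear correlation, and $r = \pm 1$ means every point lies on a straight line. It also checks a standard identity: the product of the two regression slopes equals $r^2$, which is why the lines of regression of y on x and of x on y coincide only under perfect correlation.

```python
import math


def centred_sums(xs, ys):
    """Return (Sxx, Syy, Sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    s_xx, s_yy, s_xy = centred_sums(xs, ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
up = [2, 4, 5, 4, 5]      # made-up data: y tends to increase with x
down = [9, 7, 6, 4, 2]    # made-up data: y tends to decrease with x
exact = [3, 5, 7, 9, 11]  # every point lies on y = 2x + 1

print(round(pearson_r(xs, up), 3))    # 0.775  (positive correlation)
print(round(pearson_r(xs, down), 3))  # -0.995 (negative correlation)
print(pearson_r(xs, exact))           # 1.0    (perfect correlation)

# Slope of y on x is Sxy/Sxx; slope of x on y is Sxy/Syy. Their product is r**2,
# so the two regression lines share a slope (product 1) only when r = +1 or -1.
s_xx, s_yy, s_xy = centred_sums(xs, up)
b_yx, b_xy = s_xy / s_xx, s_xy / s_yy
print(math.isclose(b_yx * b_xy, pearson_r(xs, up) ** 2))  # True
```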
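To make $Y = \mu(X) + \varepsilon$ concrete, the sketch below simulates a simple linear regression with hypothetical parameters $\beta_0 = 1.5$ and $\beta_1 = 0.8$. The text deliberately says nothing about the shape of the distribution of $Y$; drawing $\varepsilon$ from a normal distribution with standard deviation 1 is purely an assumption made so the simulation has something to sample. As in a dose experiment, the $X$ values are fixed in advance and $Y$ is observed at each. The average of $Y$ at one value of $X$ estimates $\mu(X)$, and least squares recovers estimates close to $\beta_0$ and $\beta_1$. It uses only the standard library (`statistics.fmean` needs Python 3.8 or later).

```python
import random
import statistics

BETA0, BETA1, SIGMA = 1.5, 0.8, 1.0  # hypothetical true parameters
rng = random.Random(42)              # fixed seed so runs are repeatable


def mu(x):
    """Conditional mean E(Y | X = x) under the simple linear model."""
    return BETA0 + BETA1 * x


# Fix the independent variable (like choosing doses): 200 replicates per value.
# Normal errors are an assumption for this simulation only.
x_values = list(range(10))
data = [(x, mu(x) + rng.gauss(0.0, SIGMA)) for x in x_values for _ in range(200)]

# The average response at X = 3 estimates mu(3) = 3.9.
y_at_3 = [y for x, y in data if x == 3]
print(f"mean of Y at X=3: {statistics.fmean(y_at_3):.2f} (true mu(3) = {mu(3):.2f})")

# Least-squares estimates of the intercept and slope from centred sums.
xs = [x for x, _ in data]
ys = [y for _, y in data]
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
print(f"estimated line: {beta0_hat:.2f} + {beta1_hat:.2f} X")

# The "errors" are just deviations of Y from mu(X); on average they are close to 0.
residual_mean = statistics.fmean([y - mu(x) for x, y in data])
print(f"mean deviation from mu(X): {residual_mean:.3f}")
```

Exact printed values depend on the random draws, so they are not shown; with 2000 points the estimates land close to the true parameters.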