Simple Linear Regression and Data Analysis

09/11/2023

Today we went through some of the basics of data science.

We covered the concept of simple linear regression and how to carry out data analysis, and we are applying both to the CDC diabetes data to examine the relationships among its variables: OBESITY, INACTIVITY, DIABETES, FIPS and COUNTY.

So, it’s important to find the correlation among these variables.

Concept of Simple Linear Regression

We also learnt that simple linear regression is a statistical method for analysing two variables: an independent variable X and a dependent variable Y.

It helps to describe the relationship (correlation) between the dependent and independent variables.

Basically, there are two types of linear regression:

a) Simple Linear Regression

b) Multiple Linear Regression

Mathematically, the simple linear regression relationship can be represented by the following equation:

Y = mX + b + e

Here, Y is the dependent variable, the one we are trying to predict.

X is the independent variable, the one we use to make predictions.

m is the slope of the regression line, i.e. the effect X has on Y.

b is the intercept, the value of Y when X is 0.

e is the error term.
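
To make the equation concrete, here is a minimal sketch of how m and b can be estimated by ordinary least squares with NumPy. The function name and the numbers are our own illustration: the values are made up and are not taken from the CDC data.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (m, b) that minimise the squared error of y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b = y_mean - m * x_mean
    return m, b

# Made-up illustrative values (NOT the CDC data).
obesity = [18.0, 20.0, 22.0, 24.0, 26.0]   # X, independent variable
diabetes = [7.0, 7.5, 8.0, 8.5, 9.0]       # Y, dependent variable

m, b = fit_simple_linear_regression(obesity, diabetes)
# Residuals play the role of the error term e in Y = mX + b + e.
residuals = np.asarray(diabetes) - (m * np.asarray(obesity) + b)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
# prints: slope m = 0.250, intercept b = 2.500
```

Because the toy values lie exactly on a line, the residuals are all zero here; with real data they would not be.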

So, in the given dataset we need to predict diabetes using inactivity and obesity.

For simple linear regression, Y is diabetes, the dependent variable, and X is obesity, the independent variable.

A linear relationship can be positive or negative in nature.

1) Positive Linear Relationship: the dependent variable increases as the independent variable increases.

2) Negative Linear Relationship: the dependent variable decreases as the independent variable increases.
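
One way to check the direction of a linear relationship is the sign of the Pearson correlation coefficient. The sketch below uses NumPy's `np.corrcoef`; the helper name and the sample values are hypothetical and only serve as an illustration.

```python
import numpy as np

def relationship_direction(x, y):
    """Classify a linear relationship by the sign of Pearson's r."""
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry is r between x and y
    if r > 0:
        return "positive", r
    if r < 0:
        return "negative", r
    return "none", r

# Made-up illustrative values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_rising = [2.1, 3.9, 6.2, 7.8, 10.1]   # grows as x grows
y_falling = [9.8, 8.1, 6.0, 4.2, 1.9]   # shrinks as x grows

direction_up, r_up = relationship_direction(x, y_rising)
direction_down, r_down = relationship_direction(x, y_falling)
print(direction_up, direction_down)
```

The sign of r matches the sign of the slope m of the fitted regression line, so either one can be used to tell the two cases apart.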

When it comes to multiple linear regression, the mathematical statement is:

Y = β0 + β1X1 + β2X2 + … + βnXn + e

where,

Y is the dependent variable

X1, X2, …, Xn are the independent variables

β0 is the intercept.

β1, β2, …, βn are the regression coefficients

e is the error term.

We can also apply this method to our dataset, using two independent variables: inactivity and obesity.
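
A multiple regression with two predictors can be sketched with NumPy's least-squares solver `np.linalg.lstsq`. This is only an illustration: the function name is ours, and the values are made up (Y is generated from known coefficients so the fit can be checked), not taken from the CDC data.

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Return [intercept, coef_1, ..., coef_n] from a least-squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# Made-up illustrative values (NOT the CDC data).
inactivity = [14.0, 16.0, 15.0, 19.0, 21.0, 18.0]
obesity = [18.0, 19.0, 22.0, 23.0, 25.0, 27.0]
# Y built from known coefficients: intercept 1.0, slopes 0.2 and 0.15.
diabetes = [1.0 + 0.2 * i + 0.15 * o for i, o in zip(inactivity, obesity)]

X = np.column_stack([inactivity, obesity])
coeffs = fit_multiple_linear_regression(X, diabetes)
print(np.round(coeffs, 3))  # intercept, inactivity coefficient, obesity coefficient
```

Since the toy Y contains no error term, the fit recovers the coefficients used to generate it; real data would give estimates with non-zero residuals.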

We also learnt about statistical concepts such as heterogeneity of variance (heteroscedasticity), kurtosis, and asymmetric distributions (skewness).
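
As a rough illustration of two of these measures, skewness and kurtosis can be computed from the central moments of a sample. The sketch below uses the population (biased) moment definitions and reports excess kurtosis, which is 0 for a normal distribution; the function names and sample values are our own assumptions. It does not cover heterogeneity of variance.

```python
import numpy as np

def skewness(values):
    """Third standardised moment: 0 for a symmetric sample."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5

def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (the normal-distribution value)."""
    v = np.asarray(values, dtype=float)
    centered = v - v.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2 - 3.0

# Made-up illustrative samples.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 2.0, 2.0, 10.0]  # long tail to the right

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

A positive skewness indicates a longer right tail and a negative one a longer left tail, which is what an asymmetric distribution means in practice.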

 
