09/11/2023
Today we went through the basics of data science.
We covered the concept of simple linear regression and how to do data analysis, and we are applying both to the CDC diabetes data to study the relationships among the variables OBESITY, INACTIVITY and DIABETES, with FIPS and COUNTY identifying each county.
So, it’s important to find the correlation among these variables.
Concept of Simple Linear Regression
We also learnt that simple linear regression is a statistical method for analyzing the relationship between two variables: an independent variable X and a dependent variable Y.
This helps to find the correlation between the dependent and independent variables.
There are two types of linear regression:
a) Simple Linear Regression
b) Multiple Linear Regression
Mathematically, the simple linear regression relationship can be represented by the following equation:
Y = mX + b + e
Here, Y is the dependent variable, the quantity we are trying to predict.
X is the independent variable, the quantity we use to make predictions.
m is the slope of the regression line, which is the effect X has on Y.
b is the intercept, the value of Y when X is 0.
e is the error term.
So, in the given dataset we need to predict diabetes using inactivity and obesity.
For the simple regression, Y is diabetes, the dependent variable, and X is obesity, the independent variable.
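To make the equation Y = mX + b + e concrete, here is a minimal sketch of fitting the slope and intercept by ordinary least squares. The numbers are made up for illustration and are not CDC values, and the function name fit_simple_linear_regression is our own choice, not something from the course material.

```python
import numpy as np

# Hypothetical, made-up percentages for illustration only (not CDC values).
obesity = np.array([18.0, 20.5, 22.0, 24.5, 27.0, 30.0])   # X, independent
diabetes = np.array([6.1, 6.8, 7.4, 8.0, 8.9, 9.7])        # Y, dependent


def fit_simple_linear_regression(x, y):
    """Return (m, b) that minimise the sum of squared residuals of y = m*x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b = y_mean - m * x_mean
    return m, b


m, b = fit_simple_linear_regression(obesity, diabetes)
residuals = diabetes - (m * obesity + b)   # estimate of the error term e per point

print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
```

The residuals array holds the part of Y that the line does not explain, which is what the error term e stands for in the equation.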
A linear relationship can be positive or negative.
1) Positive Linear Relationship: the dependent variable increases as the independent variable increases.
2) Negative Linear Relationship: the dependent variable decreases as the independent variable increases.
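One way to check the direction of a linear relationship is the sign of the Pearson correlation coefficient. The sketch below uses small made-up arrays, and the helper name relationship_direction is hypothetical.

```python
import numpy as np


def relationship_direction(x, y):
    """Classify a linear relationship by the sign of the Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]   # off-diagonal entry of the 2x2 correlation matrix
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rising = 2.0 * x + 1.0      # y goes up as x goes up
falling = -3.0 * x + 10.0   # y goes down as x goes up

print(relationship_direction(x, rising))    # positive
print(relationship_direction(x, falling))   # negative
```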
For multiple linear regression, the equation is:
Y = β0 + β1X1 + β2X2 + … + βnXn + e
where,
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept.
β1, β2, …, βn are the regression coefficients
e is the error term.
We can apply this method to our dataset as well, using two independent variables: inactivity and obesity.
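A minimal sketch of the two-variable case, assuming we solve for β0, β1 and β2 by least squares with numpy.linalg.lstsq. The values are made up for illustration and are not taken from the CDC data.

```python
import numpy as np

# Hypothetical, made-up percentages for illustration only (not CDC values).
inactivity = np.array([15.0, 17.5, 16.0, 21.0, 23.5, 26.0])   # X1
obesity = np.array([18.0, 20.5, 22.0, 24.5, 27.0, 30.0])      # X2
diabetes = np.array([6.1, 6.8, 7.4, 8.0, 8.9, 9.7])           # Y

# Design matrix: a leading column of ones gives the intercept β0.
X = np.column_stack([np.ones_like(inactivity), inactivity, obesity])

# Least-squares solution of X @ beta ≈ diabetes.
beta, *_ = np.linalg.lstsq(X, diabetes, rcond=None)
b0, b1, b2 = beta

predicted = X @ beta
residuals = diabetes - predicted   # estimate of the error term e per point

print(f"β0 = {b0:.3f}, β1 (inactivity) = {b1:.3f}, β2 (obesity) = {b2:.3f}")
```

Each coefficient is read as the change in Y for a one-unit change in that variable while the other variable is held fixed.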
We also learnt about statistical concepts such as heterogeneity of variance (heteroscedasticity), kurtosis, and asymmetric (skewed) distributions.
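As a rough sketch of two of these measures, skewness and kurtosis can be computed from the central moments of a sample. This version uses the population (biased) formulas and Pearson kurtosis, for which a normal distribution scores 3. The function names and sample arrays are our own illustration.

```python
import numpy as np


def skewness(values):
    """Third standardised moment: 0 for symmetric data, > 0 for a long right tail."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def kurtosis(values):
    """Fourth standardised moment (Pearson kurtosis, normal distribution = 3)."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2


symmetric = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
right_skewed = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 10.0])   # one large value stretches the right tail

print(f"symmetric:    skew = {skewness(symmetric):.3f}, kurtosis = {kurtosis(symmetric):.3f}")
print(f"right skewed: skew = {skewness(right_skewed):.3f}, kurtosis = {kurtosis(right_skewed):.3f}")
```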