Here is a short version of exploratory data analysis (EDA):
1. Variable Identification (categorical, continuous, etc.)
2. Univariate Analysis
a. Categorical variable: frequency of occurrence (count). Bar chart for visualization.
b. Continuous variable: mean, median, mode, min and max. Histogram for visualization.
Ref: https://www.youtube.com/watch?v=wFabyCP54YA
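A minimal sketch of the univariate summaries above, using only the Python standard library. The `colors` and `ages` lists are made-up sample data, not from any real dataset.

```python
from collections import Counter
import statistics

# Hypothetical sample data, invented for illustration.
colors = ["red", "blue", "red", "green", "red", "blue"]  # categorical
ages = [23, 25, 25, 31, 40, 25, 36]                      # continuous

# Categorical: frequency of each category (the counts a bar chart would show).
color_counts = Counter(colors)

# Continuous: central tendency and range (a histogram would show the shape).
age_summary = {
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "min": min(ages),
    "max": max(ages),
}

print(color_counts.most_common())
print(age_summary)
```

On a real dataset, pandas offers the same summaries through `value_counts()` and `describe()`.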
3. Bi-variate Analysis
a. Continuous & Continuous: Scatter plot to see the relationship, and the correlation coefficient to measure its strength.
Correlation varies between -1 and +1.
-1: perfect negative linear correlation
+1: perfect positive linear correlation
0: no linear correlation
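To make the three reference values concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the data are my own, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])  # points on a rising straight line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])  # points on a falling straight line
r_zero = pearson_r(x, [4, 1, 0, 1, 4])  # symmetric U-shape around x = 3

print(r_pos, r_neg, r_zero)
```

The U-shaped case is worth noting: the coefficient is 0 even though y clearly depends on x, because correlation only measures the linear part of a relationship. That is one reason to look at the scatter plot as well as the number.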
b. Categorical & Categorical:
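i. Two-way table: shows count and count% for each combination of categories.
ii. Stacked Column Chart: a visual form of the two-way table.
iii. Chi-Square Test: I need to read more on this, but it returns a probability (p-value) for the computed chi-square statistic:
Probability near 0: the two categorical variables are likely dependent.
Probability near 1: the data are consistent with the two variables being independent.
A probability below 0.05 is commonly treated as a significant relationship.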
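As a rough sketch of how the two-way table and the chi-square test fit together, the snippet below works through a 2x2 table by hand. The counts are invented, and the closed-form p-value used here only holds for 1 degree of freedom (a 2x2 table).

```python
import math

# Hypothetical two-way table of counts
# (rows: two groups, columns: outcome yes / no).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Row percentages: the "count%" metric of the two-way table.
row_pct = [[100 * cell / rt for cell in row]
           for row, rt in zip(observed, row_totals)]

# Counts we would expect if the two variables were independent.
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]

# Pearson chi-square statistic (no continuity correction).
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# With 1 degree of freedom, the upper-tail probability of the chi-square
# distribution has the closed form erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(row_pct)
print(chi2, p_value)
```

For larger tables the degrees of freedom are (rows - 1) * (columns - 1) and the p-value needs the general chi-square distribution; `scipy.stats.chi2_contingency` handles that case, and by default it applies a continuity correction on 2x2 tables, so its result can differ slightly from this hand calculation.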
c. Categorical & Continuous:
i. Z-Test / T-Test: assesses whether the averages of two groups are statistically different.
ii. ANOVA: assesses whether the averages of more than two groups are statistically different.
Ref: https://www.youtube.com/watch?v=IA0unflfvQE
https://www.youtube.com/watch?v=zdU8C8QEHH0
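The test statistics behind the t-test and ANOVA can be sketched in a few lines. The helper names `pooled_t` and `anova_f` and the scores are hypothetical; the sketch assumes equal variances across groups and stops at the statistic, since turning it into a p-value needs the t or F distribution.

```python
import statistics

def pooled_t(a, b):
    """Two-sample t statistic, assuming equal variances (Student's t-test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    std_err = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / std_err

def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical scores for three groups.
group_a = [82, 85, 88, 90, 79]
group_b = [75, 78, 72, 80, 77]
group_c = [91, 94, 89, 96, 92]

t_ab = pooled_t(group_a, group_b)            # two groups: t-test
f_abc = anova_f(group_a, group_b, group_c)   # more than two groups: ANOVA

print(t_ab, f_abc)
```

With exactly two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is why ANOVA is usually described as the extension of the t-test to more than two groups. In practice, `scipy.stats.ttest_ind` and `scipy.stats.f_oneway` return both the statistic and the p-value.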
..To be continued...
Ref: http://www.analyticsvidhya.com/blog/2016/01/guide-data-exploration/