Authors
Author affiliations
\(\dagger\) Disclaimer: The opinions expressed in this article are the author’s own and don’t reflect their employer.
This project examines the relationships among individuals' height, weight, gender, age, and blood type to identify any significant correlations between these variables.
Linear models were fit using these variables to look for significant associations. Several tables and figures display these relationships.
There are five variables in this dataset. Height and weight are numerical values. Gender is a categorical value written as M, F, or O.
The Blood Type variable denotes the individual's blood type (A, B, AB, or O). The Age variable denotes the individual's age in years as a numeric value.
The raw Height, Weight, and Gender data were provided by our professor, Dr. Andreas Handel. The Age variable was generated with a random number generator over the range 18 to 80, and the Blood Type variable was assigned in a repeating A, B, AB, O cycle.
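The two added variables can be sketched in a few lines of R. This is a minimal illustration, not the project's actual script: the data frame `rawdata`, its values, and the seed are all hypothetical.

```r
# Hypothetical sketch: add Age and Blood Type columns to a small data frame.
set.seed(123)  # assumed seed, used only so the example is reproducible

# Toy stand-in for the raw data (illustrative values, not the real data)
rawdata <- data.frame(
  Height = c(180, 175, 166, 178, 156),
  Weight = c(80, 70, 55, 76, 50),
  Gender = c("M", "M", "F", "O", "F")
)

n <- nrow(rawdata)

# Age: integers drawn at random from 18 to 80, with replacement
rawdata$Age <- sample(18:80, size = n, replace = TRUE)

# Blood Type: cycle through A, B, AB, O until every row has a value
rawdata$`Blood Type` <- rep(c("A", "B", "AB", "O"), length.out = n)

print(rawdata)
```

Because `rep()` with `length.out` simply repeats the pattern, the fifth row starts the cycle again at A.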
The data were cleaned by removing missing (NA) values and converting each variable to an appropriate data type.
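A cleaning step of this kind might look like the sketch below. The toy data frame and its problems (a text entry in a numeric column, scattered NAs) are invented for illustration, and the factor level order is an assumption.

```r
# Toy data with the kinds of problems a cleaning step has to handle
rawdata <- data.frame(
  Height = c("180", "175", NA, "166", "sixty"),
  Weight = c(80, 70, 60, NA, 55),
  Gender = c("M", "M", "F", "O", "F"),
  stringsAsFactors = FALSE
)

cleaned <- rawdata

# Convert to numeric; non-numeric entries such as "sixty" become NA
cleaned$Height <- suppressWarnings(as.numeric(cleaned$Height))
cleaned$Weight <- as.numeric(cleaned$Weight)

# Treat Gender as a categorical variable (assumed level order)
cleaned$Gender <- factor(cleaned$Gender, levels = c("F", "M", "O"))

# Drop every row that still contains a missing value
cleaned <- na.omit(cleaned)

str(cleaned)
```

Converting types before calling `na.omit()` matters here: entries that cannot be parsed as numbers only turn into NA after the conversion.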
The analysis was done by fitting multiple linear models with height as the outcome and varying sets of predictor variables. The summaries of these models were recorded and examined.
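As a rough sketch of this approach, the base R function `lm()` can fit models with height as the outcome. The data below are illustrative and are not the project dataset, so the coefficients will not match the tables in this report.

```r
# Illustrative data only; these are not the values from the project dataset
dat <- data.frame(
  Height = c(180, 175, 166, 178, 156, 183, 160, 170, 150),
  Weight = c(80, 70, 55, 76, 50, 110, 62, 68, 45),
  Gender = factor(c("M", "M", "F", "O", "F", "M", "F", "M", "O"))
)

# Model 1: height explained by weight alone
fit1 <- lm(Height ~ Weight, data = dat)

# Model 2: height explained by weight and gender
fit2 <- lm(Height ~ Weight + Gender, data = dat)

# Coefficient tables: estimate, standard error, t statistic, p-value
print(coef(summary(fit1)))
print(coef(summary(fit2)))
```

With Gender stored as a factor, R uses the first level (F) as the reference category, which is why the second model reports separate terms for `GenderM` and `GenderO`.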
For exploratory analysis, we made plots and tables for the most interesting and important quantities in the data.
Table 1 shows a summary of the data.
| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | factor.ordered | factor.n_unique | factor.top_counts | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | Blood Type | 0 | 1 | 1 | 2 | 0 | 4 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| factor | Gender | 0 | 1 | NA | NA | NA | NA | NA | FALSE | 3 | M: 4, F: 3, O: 2 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | Height | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 165.66667 | 15.97655 | 133 | 156 | 166 | 178 | 183 | ▂▁▃▃▇ |
| numeric | Weight | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 70.11111 | 21.24526 | 45 | 55 | 70 | 80 | 110 | ▇▂▃▂▂ |
| numeric | Age | 0 | 1 | NA | NA | NA | NA | NA | NA | NA | NA | 43.55556 | 17.95906 | 21 | 25 | 44 | 61 | 63 | ▆▂▂▁▇ |
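The layout of Table 1 resembles output from `skimr::skim()`, although that is an assumption. The numeric columns can be approximated in base R as sketched below; the data frame and the helper `summarise_numeric()` are hypothetical.

```r
# Illustrative values only; not the project data
dat <- data.frame(
  Height = c(180, 175, 166, 178, 156),
  Weight = c(80, 70, 55, 76, 50)
)

# For one numeric column: mean, standard deviation, and the five quantiles
# that Table 1 reports (p0, p25, p50, p75, p100)
summarise_numeric <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(mean = mean(x), sd = sd(x),
    p0 = q[1], p25 = q[2], p50 = q[3], p75 = q[4], p100 = q[5])
}

# One row per variable, one column per statistic
num_summary <- t(sapply(dat, summarise_numeric))
print(round(num_summary, 2))
```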
Figure 1 shows a histogram of the variable Height.
Figure 2 shows a histogram of the variable Weight.
Figure 3 shows a scatterplot of height as a function of weight.
Figure 4 shows a scatterplot of height as a function of weight, stratified by gender.
Figure 5 is a boxplot with blood type on the x-axis and height on the y-axis.
Figure 6 is a scatterplot with weight on the x-axis and age on the y-axis.
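Plots of these kinds can be sketched with base R graphics. The plotting library used for the actual figures is not stated, so base graphics is a swapped-in choice that keeps the example free of extra packages. The data and the column name `BloodType` are hypothetical.

```r
# Illustrative data only; not the project dataset
dat <- data.frame(
  Height    = c(180, 175, 166, 178, 156, 183, 160, 170),
  Weight    = c(80, 70, 55, 76, 50, 110, 62, 68),
  Gender    = factor(c("M", "M", "F", "O", "F", "M", "F", "O")),
  BloodType = factor(rep(c("A", "B", "AB", "O"), length.out = 8))
)

# Histogram of height (hist() also returns the bin counts it drew)
h <- hist(dat$Height, main = "Histogram of height", xlab = "Height")

# Scatterplot of height against weight, one colour per gender
plot(dat$Weight, dat$Height,
     col = as.integer(dat$Gender), pch = 19,
     xlab = "Weight", ylab = "Height",
     main = "Height vs. weight by gender")
legend("topleft", legend = levels(dat$Gender),
       col = seq_along(levels(dat$Gender)), pch = 19)

# Boxplot of height for each blood type
boxplot(Height ~ BloodType, data = dat,
        xlab = "Blood type", ylab = "Height")
```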
Linear models were fit using height as the outcome and the other variables as predictors to look for significant associations.
Figure 4 shows a scatterplot produced by one of the R scripts.
Table 2 shows a summary of a linear model fit with height as the outcome and weight as the predictor.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 149.6997661 | 19.7518528 | 7.5790240 | 0.0001285 |
| Weight | 0.2277371 | 0.2708841 | 0.8407177 | 0.4282860 |
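The columns of Table 2 are related by simple arithmetic: the statistic is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution. The sketch below recomputes both for the Weight term. It assumes n = 9 observations, inferred from the gender counts in Table 1 (4 + 3 + 2), which leaves 7 residual degrees of freedom for a model with two coefficients.

```r
# Values copied from the Weight row of Table 2
estimate  <- 0.2277371
std_error <- 0.2708841

# Assumption: 9 observations (gender counts in Table 1) and 2 coefficients
n_obs    <- 9
n_coef   <- 2
df_resid <- n_obs - n_coef

# t statistic: how many standard errors the estimate lies from zero
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

print(c(statistic = t_stat, p.value = p_value))
```

The results can be compared with the `statistic` and `p.value` columns of the table. A p-value this large gives no evidence that weight is associated with height in these data.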
Table 3 shows a summary of a linear model fit with height as the outcome and weight and gender as predictors.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 149.2726967 | 23.3823360 | 6.3839942 | 0.0013962 |
| Weight | 0.2623972 | 0.3512436 | 0.7470519 | 0.4886517 |
| GenderM | -2.1244913 | 15.5488953 | -0.1366329 | 0.8966520 |
| GenderO | -4.7644739 | 19.0114155 | -0.2506112 | 0.8120871 |
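To read Table 3, note that the intercept applies to the reference gender (F, the level with no row of its own), and the `GenderM` and `GenderO` estimates are offsets from it. The sketch below turns the coefficients into predicted heights; the helper `predict_height()` and the example weight of 70 (in the dataset's units) are hypothetical.

```r
# Coefficient estimates copied from Table 3
coefs <- c("(Intercept)" = 149.2726967,
           Weight        = 0.2623972,
           GenderM       = -2.1244913,
           GenderO       = -4.7644739)

# Predicted height = intercept + slope * weight + gender offset
predict_height <- function(weight, gender) {
  offset <- switch(gender,
                   F = 0,                      # reference category
                   M = coefs[["GenderM"]],
                   O = coefs[["GenderO"]],
                   stop("unknown gender: ", gender))
  coefs[["(Intercept)"]] + coefs[["Weight"]] * weight + offset
}

# Predictions for the same weight across the three gender categories
for (g in c("F", "M", "O")) {
  cat(g, ":", predict_height(70, g), "\n")
}
```

Given the large standard errors in the table, these point predictions carry substantial uncertainty and the differences between categories should not be over-interpreted.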
Table 4 shows a summary of a linear model fit with height as the outcome and age and blood type as predictors.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 149.2726967 | 23.3823360 | 6.3839942 | 0.0013962 |
| Weight | 0.2623972 | 0.3512436 | 0.7470519 | 0.4886517 |
| GenderM | -2.1244913 | 15.5488953 | -0.1366329 | 0.8966520 |
| GenderO | -4.7644739 | 19.0114155 | -0.2506112 | 0.8120871 |