READy Exercise Part 3

Authors

Author affiliations

  1. University of Georgia, Athens, GA, USA.

\(\dagger\) Disclaimer: The opinions expressed in this article are the author’s own and don’t reflect their employer.

1 Summary/Abstract

This project examines the relationships among individuals' height, weight, gender, age, and blood type to identify any significant associations between these variables.

Linear models were fitted with these variables to look for significant associations. Several tables and figures display the relationships.

2 Introduction

2.1 Description of data and data source

There are five variables in this dataset. Height and weight are numerical values. Gender is a categorical value written as M, F, or O.

The Blood Type variable records the individual's blood type (A, B, AB, or O). The Age variable records the individual's age in years as a numeric value, randomly generated between 18 and 80.

3 Methods

3.1 Data acquisition

The raw Height, Weight, and Gender data were provided by our professor, Dr. Andreas Handel. The Age variable was produced with a random number generator, with values between 18 and 80, and the Blood Type variable was assigned by cycling through A, B, AB, and O in that order.
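The two added variables could be produced along the following lines. This is a minimal sketch rather than the script actually used: the seed, the use of `sample()`, and the object names are assumptions, and the row count of nine is taken from the gender counts in the summary table (4 + 3 + 2).

```r
# Hypothetical sketch of how the Age and Blood Type columns could be created.
set.seed(123)  # assumed seed, used only to make the sketch reproducible
n <- 9         # number of observations, per the gender counts in Table 1

# Age: whole numbers drawn uniformly at random from 18 to 80 (assumed scheme).
age <- sample(18:80, size = n, replace = TRUE)

# Blood Type: A, B, AB, O repeated in that order until n values are filled.
blood_type <- rep(c("A", "B", "AB", "O"), length.out = n)

new_columns <- data.frame(Age = age, `Blood Type` = blood_type,
                          check.names = FALSE)
print(new_columns)
```

Using `rep()` with `length.out` keeps the cycling order intact even when the number of rows is not a multiple of four.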

3.2 Data import and cleaning

The data were cleaned by removing observations with missing (NA) values and converting each variable to an appropriate data type.
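A cleaning step of this kind might look like the sketch below. The data frame `raw` and its values are made up for illustration, and the choice of base R functions is an assumption; the project's own cleaning script may differ.

```r
# Hypothetical raw data: numbers stored as text and one missing height.
raw <- data.frame(
  Height = c("180", "175", NA, "166"),
  Weight = c("80", "70", "60", "55"),
  Gender = c("M", "O", "F", "F"),
  stringsAsFactors = FALSE
)

clean <- raw
# Convert the measurements to numeric and Gender to a factor.
clean$Height <- as.numeric(clean$Height)
clean$Weight <- as.numeric(clean$Weight)
clean$Gender <- factor(clean$Gender, levels = c("F", "M", "O"))

# Drop every row that still contains a missing value.
clean <- na.omit(clean)
str(clean)
```

Converting types before calling `na.omit()` means that entries which cannot be parsed as numbers, and so become NA, are dropped in the same pass.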

3.3 Statistical analysis

The analysis fitted several linear models with height as the outcome and varying sets of predictors. The coefficient summaries of these models were recorded and examined.
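As an illustration of this kind of model fitting, the sketch below fits two linear models with base R's `lm()` and extracts their coefficient tables. The data values and object names are hypothetical and are not the project data.

```r
# Hypothetical data; the values are made up for illustration only.
d <- data.frame(
  Height = c(180, 175, 160, 166, 158, 183, 170, 155, 178),
  Weight = c(80, 70, 55, 60, 50, 95, 72, 48, 85),
  Gender = factor(c("M", "O", "F", "F", "F", "M", "M", "O", "M"))
)

# Height as outcome, weight as the only predictor.
fit_weight <- lm(Height ~ Weight, data = d)

# Height as outcome, weight and gender as predictors.
fit_weight_gender <- lm(Height ~ Weight + Gender, data = d)

# Coefficient tables: estimate, standard error, t statistic, p-value.
coef_weight <- summary(fit_weight)$coefficients
coef_weight_gender <- summary(fit_weight_gender)$coefficients
print(coef_weight)
print(coef_weight_gender)
```

Because `Gender` is a factor whose first level is F, the fitted model reports `GenderM` and `GenderO` as differences from the F group.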

4 Results

4.1 Exploratory/Descriptive analysis

For the exploratory analysis, we made plots and tables of the most important quantities in the data.

Table 1 shows a summary of the data.

Table 1: Data summary table.
skim_type skim_variable n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace factor.ordered factor.n_unique factor.top_counts numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
character Blood Type 0 1 1 2 0 4 0 NA NA NA NA NA NA NA NA NA NA NA
factor Gender 0 1 NA NA NA NA NA FALSE 3 M: 4, F: 3, O: 2 NA NA NA NA NA NA NA NA
numeric Height 0 1 NA NA NA NA NA NA NA NA 165.66667 15.97655 133 156 166 178 183 ▂▁▃▃▇
numeric Weight 0 1 NA NA NA NA NA NA NA NA 70.11111 21.24526 45 55 70 80 110 ▇▂▃▂▂
numeric Age 0 1 NA NA NA NA NA NA NA NA 43.55556 17.95906 21 25 44 61 63 ▆▂▂▁▇

Figure 1 shows a histogram of the variable Height.

Figure 1: Height distribution.

Figure 2 shows a histogram of the variable Weight.

Figure 2: Weight distribution.

Figure 3 shows a scatterplot of height as a function of weight.

Figure 3: Height as a function of weight.

Figure 4 shows a scatterplot of height as a function of weight, stratified by gender.

Figure 4: Height as a function of weight, stratified by gender.
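A stratified scatterplot like this one can be drawn by colouring each point by its gender level. The sketch below uses base R graphics with made-up data; the colours, object names, and plotting approach are assumptions and may not match the script that produced the figure.

```r
# Hypothetical data; the values are made up for illustration only.
d <- data.frame(
  Height = c(180, 175, 160, 166, 158, 183),
  Weight = c(80, 70, 55, 60, 50, 95),
  Gender = factor(c("M", "O", "F", "F", "O", "M"))
)

# One colour per gender level, looked up by each row's gender label.
palette_by_gender <- c(F = "darkorange", M = "steelblue", O = "darkgreen")
point_cols <- palette_by_gender[as.character(d$Gender)]

# Weight on the x-axis, height on the y-axis, points coloured by gender.
plot(d$Weight, d$Height, col = point_cols, pch = 19,
     xlab = "Weight", ylab = "Height",
     main = "Height as a function of weight, by gender")
legend("bottomright", legend = names(palette_by_gender),
       col = palette_by_gender, pch = 19, title = "Gender")
```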

Figure 5 is a boxplot with Blood Type on the x-axis and height on the y-axis.

Figure 5: Boxplot with Blood Type on the x-axis and height on the y-axis.

Figure 6 is a scatterplot with weight on the x-axis and age on the y-axis.

Figure 6: Scatterplot with weight on the x-axis and age on the y-axis.

4.2 Basic statistical analysis

Linear models were fitted with height as the outcome and the other variables as predictors to look for significant associations.

Figure 7 shows a scatterplot produced by one of the R scripts.

Figure 7: Height and weight stratified by gender.

4.3 Full analysis

Table 2 shows a summary of a linear model fit using height as outcome and weight as predictor.

Table 2: Linear model fit using height as outcome and weight as predictor.
term estimate std.error statistic p.value
(Intercept) 149.6997661 19.7518528 7.5790240 0.0001285
Weight 0.2277371 0.2708841 0.8407177 0.4282860
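The statistic and p-value columns follow directly from the estimate and its standard error. The sketch below reproduces the Weight row as a check, assuming the model was fitted to the nine observations implied by the gender counts in Table 1 and that the p-value is the usual two-sided t-test value; the object names are illustrative.

```r
# Values copied from the Weight row of the table above.
estimate  <- 0.2277371
std_error <- 0.2708841

# Assumed sample size (4 + 3 + 2 from Table 1) and number of coefficients.
n_obs  <- 9
n_coef <- 2

# t statistic: estimate divided by its standard error.
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution with n_obs - n_coef degrees of freedom.
df_resid <- n_obs - n_coef
p_value  <- 2 * pt(-abs(t_stat), df = df_resid)

print(c(statistic = t_stat, p.value = p_value))
```

With a p-value well above 0.05, the weight coefficient is not statistically distinguishable from zero in this small sample.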

Table 3 shows a summary of a linear model fit using height as outcome and weight and gender as predictors.

Table 3: Linear model fit using height as outcome and weight and gender as predictors.
term estimate std.error statistic p.value
(Intercept) 149.2726967 23.3823360 6.3839942 0.0013962
Weight 0.2623972 0.3512436 0.7470519 0.4886517
GenderM -2.1244913 15.5488953 -0.1366329 0.8966520
GenderO -4.7644739 19.0114155 -0.2506112 0.8120871

Table 4 shows a summary of a linear model fit using height as outcome and age and blood type as predictors.

Table 4: Linear model fit using height as outcome and age and blood type as predictors.
term estimate std.error statistic p.value
(Intercept) 149.2726967 23.3823360 6.3839942 0.0013962
Weight 0.2623972 0.3512436 0.7470519 0.4886517
GenderM -2.1244913 15.5488953 -0.1366329 0.8966520
GenderO -4.7644739 19.0114155 -0.2506112 0.8120871