In Project 1, we began by preprocessing and transforming a large dataset from the Centers for Disease Control and Prevention (CDC). The data report county-level rates of diabetes, obesity, and physical inactivity across the United States.
To analyze the three measures together, we merged the diabetes, obesity, and physical inactivity data using the FIPS code and year as join keys. The result was a single consolidated dataset that served as the basis for the rest of the analysis.
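The merge described above can be sketched with pandas. This is only an illustration under stated assumptions: the column names (`FIPS`, `YEAR`, `pct_diabetes`, `pct_obesity`, `pct_inactivity`), the inner join, and the toy values are placeholders, not the actual CDC schema or data.

```python
import pandas as pd

# Toy stand-ins for the three CDC tables. Column names and values are
# hypothetical placeholders, not the real data.
diabetes = pd.DataFrame({
    "FIPS": ["01001", "01003", "01005"],
    "YEAR": [2018, 2018, 2018],
    "pct_diabetes": [9.0, 8.5, 12.0],
})
obesity = pd.DataFrame({
    "FIPS": ["01001", "01003"],
    "YEAR": [2018, 2018],
    "pct_obesity": [33.0, 30.5],
})
inactivity = pd.DataFrame({
    "FIPS": ["01001", "01003", "01005"],
    "YEAR": [2018, 2018, 2018],
    "pct_inactivity": [26.0, 22.5, 31.0],
})

# Join on FIPS code and year. An inner join (an assumption here) keeps only
# the county-year pairs present in all three tables.
merged = (
    diabetes
    .merge(obesity, on=["FIPS", "YEAR"], how="inner")
    .merge(inactivity, on=["FIPS", "YEAR"], how="inner")
)

print(merged)
```

The FIPS codes are kept as strings in this sketch because they can begin with a zero, which is lost if the column is parsed as an integer.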
A central part of the analysis examined the relationship between the percentage of individuals with diabetes and the percentages with obesity and physical inactivity. We fit a linear regression model to capture how these health metrics relate to one another. We split the data into training and testing sets so that the model's performance could be assessed on predictions for the held-out test set.
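A minimal sketch of this workflow, assuming scikit-learn and assuming that diabetes percentage is the target with obesity and inactivity percentages as predictors. The data are synthetic and generated only so the example runs. The 80/20 split, the coefficients used to generate the data, and the choice of mean squared error and R² as metrics are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic county-level percentages, for illustration only.
rng = np.random.default_rng(0)
n = 200
pct_obesity = rng.uniform(20, 40, n)
pct_inactivity = rng.uniform(15, 35, n)
# Made-up linear relationship plus noise; not an estimate from CDC data.
pct_diabetes = (
    1.0 + 0.15 * pct_obesity + 0.10 * pct_inactivity + rng.normal(0, 0.3, n)
)

X = np.column_stack([pct_obesity, pct_inactivity])  # predictors
y = pct_diabetes                                    # assumed target

# Hold out 20% of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only, then predict on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("test MSE:", mse, "test R^2:", r2)
```

Evaluating on rows the model never saw during fitting is what makes the test-set metrics a fair estimate of how well the model generalizes.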
We also visualized the correlations among these variables. A three-dimensional scatter plot, with one axis per variable, showed how diabetes, obesity, and physical inactivity vary together across U.S. counties.
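A plot of this kind could be drawn with Matplotlib roughly as follows. The axis assignment, the color mapping, the output filename, and the synthetic data are all assumptions made for illustration. The original figure may have been built differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic percentages, for illustration only.
rng = np.random.default_rng(0)
n = 200
pct_obesity = rng.uniform(20, 40, n)
pct_inactivity = rng.uniform(15, 35, n)
pct_diabetes = (
    1.0 + 0.15 * pct_obesity + 0.10 * pct_inactivity + rng.normal(0, 0.3, n)
)

fig = plt.figure(figsize=(7, 6))
ax = fig.add_subplot(projection="3d")

# One point per county: two variables on the horizontal axes, diabetes on
# the vertical axis and also encoded as color.
points = ax.scatter(
    pct_obesity, pct_inactivity, pct_diabetes,
    c=pct_diabetes, cmap="viridis", s=15,
)
ax.set_xlabel("% obesity")
ax.set_ylabel("% physical inactivity")
ax.set_zlabel("% diabetes")
fig.colorbar(points, ax=ax, label="% diabetes", shrink=0.6)

# Hypothetical output filename.
fig.savefig("diabetes_3d_scatter.png", dpi=150)
```

In an interactive session, removing the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `savefig` lets the plot be rotated, which often makes the three-way relationship easier to read.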