How Accurately Can Cause of Death Be Predicted With Minimal Data?

  • Individual’s Age Group
  • Individual’s Race
  • Cause of Death
I was not able to beat my baseline accuracy score using a decision tree.
I was not able to beat my baseline using a linear model either.
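To make "beating the baseline" concrete, here is a minimal sketch. It assumes the baseline is the majority-class accuracy, meaning the score you get by always guessing the most common cause-of-death class; the post does not say which baseline was actually used. The class labels and predictions below are hypothetical placeholders, not values from the real dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and model predictions, for illustration only.
y_true = ["natural", "natural", "natural", "external", "natural", "external"]
y_pred = ["natural", "external", "natural", "external", "external", "natural"]

baseline = majority_baseline_accuracy(y_true)
model_acc = accuracy(y_true, y_pred)

# A model is only useful if it scores above the baseline.
beats_baseline = model_acc > baseline
print(f"baseline={baseline:.3f} model={model_acc:.3f} beats={beats_baseline}")
```

When one class dominates the data, this kind of baseline can be quite high, so a model trained on only a couple of weak features may fail to exceed it.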

About the Model

Our model’s three most important features are whether an individual is of Native descent, whether they are of African descent, and whether they are between the ages of 20 and 24.
We can see no abnormal density for any race, and therefore no clear trend linking an individual’s race to the classification of their cause of death.
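The features named above read like yes/no indicators, which is what you get when categorical columns such as age group and race are one-hot encoded. The sketch below shows that idea under that assumption; the post does not confirm which encoding was used, and the column names and category values here are hypothetical.

```python
def one_hot_encode(rows, columns):
    """Turn categorical columns into 0/1 indicator features.

    Returns the generated feature names and one encoded row per input row.
    """
    # Collect the distinct categories seen in each column, in sorted order.
    categories = {col: sorted({row[col] for row in rows}) for col in columns}
    feature_names = [
        f"{col}_{cat}" for col in columns for cat in categories[col]
    ]
    encoded = [
        [1 if row[col] == cat else 0 for col in columns for cat in categories[col]]
        for row in rows
    ]
    return feature_names, encoded


# Hypothetical records; "A" and "B" stand in for race categories.
rows = [
    {"age_group": "20-24", "race": "A"},
    {"age_group": "65-74", "race": "B"},
    {"age_group": "20-24", "race": "B"},
]

feature_names, encoded = one_hot_encode(rows, ["age_group", "race"])
print(feature_names)
for vector in encoded:
    print(vector)
```

Each category becomes its own column, so a feature like "is between the ages of 20 and 24" can be ranked on its own by a tree-based model.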

What I Would Do Differently Next Time

If I were to repeat this, I would want economic data at a minimum, and I would try to predict each individual’s specific cause of death rather than a broad classification. The latter is computationally expensive, however, given how many different causes of death there are. Another helpful feature might be the individual’s specific location, such as county of residence: someone who lives in a county where their race is disproportionately represented may face more difficulties on a social level, leading to higher stress and, in turn, health concerns. Even with these additions, some factors simply cannot be predicted from this kind of data: genetic, social, environmental, and so on.



Kenneth T. Barrett

Data Science / Machine Learning student at Lambda School with a passion for helping others.