Because of limits on time, bandwidth, and domain knowledge, we modeled only a subset of the complete variable set. We reduced the list to 54 variables, guided by the team's research into which factors were likely to be most relevant to predicting suicide attempts.
V6309 is a Yes/No question: "Have you ever attempted suicide?"
To make the integer-coded values usable and readable, we needed to translate them into category labels. The mappings were available in the study codebook and on the ICPSR website.
We wrote a Python script that uses BeautifulSoup to scrape the table of values for each variable and convert it into a dictionary. We then used the dictionary to map the study data values to their respective category labels.
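The scrape-and-map approach can be sketched roughly as follows. This is a minimal illustration rather than the script used in the project. The HTML fragment, the table layout, the use of the table `id` as the variable name, and the helper names `scrape_value_labels` and `apply_labels` are all assumptions. The integer codes shown for V6309 are placeholders and are not taken from the codebook.

```python
from bs4 import BeautifulSoup

# Hypothetical codebook fragment. The real ICPSR page layout may differ,
# and the codes below are placeholders, not the study's actual coding.
SAMPLE_HTML = """
<table id="V6309">
  <tr><th>Value</th><th>Label</th></tr>
  <tr><td>1</td><td>Yes</td></tr>
  <tr><td>2</td><td>No</td></tr>
</table>
"""


def scrape_value_labels(html):
    """Return {variable_name: {code: label}} for every table in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    codebook = {}
    for table in soup.find_all("table"):
        labels = {}
        for row in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            # Header rows use <th>, so they produce no <td> cells and are skipped.
            if len(cells) == 2:
                labels[int(cells[0])] = cells[1]
        # Assumption: the table's id attribute holds the variable name.
        codebook[table["id"]] = labels
    return codebook


def apply_labels(records, codebook):
    """Replace integer codes with labels; codes without a label stay unchanged."""
    return [
        {var: codebook.get(var, {}).get(value, value) for var, value in rec.items()}
        for rec in records
    ]


codebook = scrape_value_labels(SAMPLE_HTML)
records = [{"V6309": 1}, {"V6309": 2}, {"V6309": 9}]
labeled = apply_labels(records, codebook)
```

Leaving unmapped codes untouched, as `apply_labels` does here, is one possible design choice. It makes values missing from the codebook easy to spot before modeling.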
Final steps were performed to prepare the data for entry into the machine learning model: