Results & Application

In our model results, the strongest positive predictors were "Have you thought about suicide - Yes" (V6301_YES), "Have you made a suicide plan? - Yes" (V6305_YES), and "Irritability - Very" (V5112_VERY). This was expected given the subject matter of these variables.

The strongest negative predictors were both indicators of higher education: "EDCAT_13-15 Years" and "EDCAT_16+ Years".

For future use of the model, a user would need to choose the classifier's decision threshold according to the recall they need. With a threshold of -2, we reached 82% recall at 39% precision. Pushing recall higher generally lowers precision, so additional work should be done to optimize the threshold for the intended use.
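The trade-off described above can be illustrated with a minimal sketch. It assumes the threshold is compared against a per-case decision score and that a case is flagged positive when its score is at or above the threshold. The helper names (precision_recall_at, pick_threshold) and the toy scores and labels are hypothetical and are not taken from our dataset or results.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_recall):
    """Return (threshold, precision, recall) for the highest observed score
    whose recall is at least min_recall, or None if no threshold qualifies."""
    # Recall can only grow as the threshold drops, so scan from high to low
    # and stop at the first threshold that meets the target.
    for t in sorted(set(scores), reverse=True):
        precision, recall = precision_recall_at(scores, labels, t)
        if recall >= min_recall:
            return t, precision, recall
    return None


# Toy, made-up decision scores and true labels (1 = positive class).
toy_scores = [-3.1, -2.4, -1.9, -1.2, -0.5, 0.3, 0.8, 1.6]
toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]

threshold, precision, recall = pick_threshold(toy_scores, toy_labels, min_recall=0.75)
print(f"threshold={threshold}, precision={precision:.2f}, recall={recall:.2f}")

# Lowering the threshold further raises recall but lets in more false positives.
lower_precision, lower_recall = precision_recall_at(toy_scores, toy_labels, -1.9)
print(f"threshold=-1.9, precision={lower_precision:.2f}, recall={lower_recall:.2f}")
```

Choosing the highest threshold that still meets the recall target keeps as much precision as the target allows. In practice the threshold should be picked on held-out data rather than on the cases used to fit the model.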

Model Variable Coefficients

Coefficient Input Variable
1.405703812270731 V6301_YES
0.3352223221274035 V6305_YES
0.2671181245853438 V5112_VERY
0.22031080567473357 MANLT1_Present
0.19490744465606258 AGOLT2_Present
0.187354920012867 GADLT2_Present
0.18677025125436397 RACE_Other
0.16174506120081208 V5114_NOT AT ALL
0.16002889217051564 V4428_NOT TRUE AT ALL
0.1574067835723938 DEPLT1_Present
0.14607985652706781 V6143_YES
0.14306933488338724 V5114_VERY
0.11597610119923091 CDLT_Present
0.10503733279461921 V6649_YES
0.10190670082093853 V4428_SOMEWHAT TRUE
0.09376439874700598 EMP_Other
0.08748741857235821 DRGDLT_Present
0.08444230263392431 V4433_VERY TRUE
0.08291378285814878 MARSTAT_Marswd
0.06514503860507423 V5114_SOMEWHAT
0.06350514837099 V102_FAIR
0.06311551277404878 EMP_Working, incl. temp. laid off, matern./s
0.062011612294565255 DYSLT2_Present
0.05925450443116165 V4433_nan
0.05428055300577384 V5118_SOMEWHAT
0.05082971973728185 V6749_YES
0.048182940687969814 REL_Other
0.04603511160559177 V5118_NOT AT ALL
0.04050708922655088 V6649_INAP
0.04036439411734284 V102_POOR
0.028841834091414605 V4428_VERY TRUE
0.02769841821160808 V6215_ONLY ONE YES RESPONSE IN U1-U12
0.02345695425121209 DEPLT2_Present
0.021070168240737604 BP1LT1_Present
0.015240909997301032 V4433_SOMEWHAT TRUE
0.008586678359335631 V6126_NO
-0.031782700925540024 V101_VERY GOOD
-0.04331059762929013 V5113_nan
-0.04409823261316062 REL_No Preference, None
-0.07581536683103464 V5113_NOT AT ALL
-0.08598988794640758 V4433_NOT TRUE AT ALL
-0.08725683116114903 V5115_NOT AT ALL
-0.12914883253025136 V5225_NOT AT ALL
-0.17628401943347866 EDCAT_12 Years
-0.18033489401509195 V102_VERY GOOD
-0.4396296944457304 EDCAT_13-15 Years
-0.7817613297772439 EDCAT_16+ Years
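
A ranking like the one above can be produced by pairing each coefficient with its input variable and sorting from largest to smallest. The sketch below is only an illustration: it assumes the fitted model yields one coefficient per input column, the function name rank_coefficients is hypothetical, and the four rows are copied from the table rather than read from a model.

```python
def rank_coefficients(feature_names, coefficients):
    """Return (coefficient, feature_name) pairs sorted from most positive
    to most negative coefficient."""
    if len(feature_names) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature name")
    pairs = zip(coefficients, feature_names)
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


# Four rows copied from the table above, deliberately out of order.
names = ["EDCAT_16+ Years", "V5112_VERY", "V6301_YES", "EDCAT_13-15 Years"]
coefs = [-0.7817613297772439, 0.2671181245853438,
         1.405703812270731, -0.4396296944457304]

ranked = rank_coefficients(names, coefs)
for coef, name in ranked:
    print(f"{coef} {name}")
```

Sorting by the signed value puts the strongest positive predictors at the top and the strongest negative predictors at the bottom, which is the order used in the table.
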
Future Ideas for Extending the Analysis
  • More data! Include additional fields as well as additional cases to test/train
  • More current data. Our dataset comes from a study conducted in 1991
  • Additional data on demographics as inputs for the model
  • Error analysis to understand what is driving the classifier's false positives and false negatives.
  • Optimizing a way to scrape the category labels, given the volume involved.