Results & Application


From our model results, the strongest positive predictors were "Have you thought about suicide - Yes" (V6301_YES), "Have you made a suicide plan? - Yes" (V6305_YES), and "Irritability - Very" (V5112_VERY). V6301_YES stands out, with a coefficient (1.41) roughly four times that of the next-largest predictor. These results were expected given the subject matter of these variables.

The two strongest negative predictors were both indicators of higher education: "EDCAT_16+ Years" (-0.78) and "EDCAT_13-15 Years" (-0.44).
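The ranking of strongest positive and negative predictors can be reproduced by sorting the coefficients. Below is a minimal Python sketch that uses a handful of values copied from the coefficient table in this section. `strongest_predictors` is a hypothetical helper, not part of the original analysis. Ranking by raw coefficient size assumes the inputs are on comparable scales. That holds for the 0/1 indicator columns, but may not hold for a continuous input such as AGE.

```python
# Small subset of coefficients copied from the coefficient table in this section.
coefficients = {
    "V6301_YES": 1.405703812270731,
    "V6305_YES": 0.3352223221274035,
    "V5112_VERY": 0.2671181245853438,
    "MANLT1_Present": 0.22031080567473357,
    "V5225_INAP": -0.28704678553078117,
    "EDCAT_13-15 Years": -0.4396296944457304,
    "EDCAT_16+ Years": -0.7817613297772439,
}


def strongest_predictors(coefs, k=3):
    """Hypothetical helper: return the k most positive and k most negative (name, coefficient) pairs."""
    # Sort from largest to smallest coefficient.
    ranked = sorted(coefs.items(), key=lambda item: item[1], reverse=True)
    positive = [pair for pair in ranked if pair[1] > 0][:k]
    # Walk the ranking from the bottom to get the most negative coefficients first.
    negative = [pair for pair in reversed(ranked) if pair[1] < 0][:k]
    return positive, negative


top_pos, top_neg = strongest_predictors(coefficients, k=2)
print([name for name, _ in top_pos])  # ['V6301_YES', 'V6305_YES']
print([name for name, _ in top_neg])  # ['EDCAT_16+ Years', 'EDCAT_13-15 Years']
```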

For future use of the model, a user would need to choose a decision threshold, which determines the classifier's recall. With a threshold of -2, we reached 82% recall with 39% precision. In general, lowering the threshold increases recall at the cost of precision. Additional work should be done to optimize the threshold for future use.
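A threshold sweep like the one described above can be sketched in a few lines of plain Python. This assumes the -2 threshold is applied to a real-valued decision score, such as the one returned by scikit-learn's `decision_function`. A respondent is predicted positive when their score is at or above the threshold. The scores and labels are made-up toy values, not the study's data. The two helper names are hypothetical. scikit-learn's `precision_recall_curve` performs the same sweep over all thresholds at once.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 if nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def highest_threshold_for_recall(scores, labels, target_recall):
    """Largest observed score usable as a threshold that still meets the target recall."""
    for threshold in sorted(set(scores), reverse=True):
        _, recall = precision_recall_at(scores, labels, threshold)
        if recall >= target_recall:
            return threshold
    return None


# Toy decision scores and true labels (hypothetical, for illustration only).
scores = [-3.5, -2.5, -2.1, -1.8, -1.0, -0.5, 0.2, 1.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

for t in (0.0, -2.0, -3.0):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold {t:+.1f}: precision {p:.2f}, recall {r:.2f}")
# threshold +0.0: precision 1.00, recall 0.50
# threshold -2.0: precision 0.80, recall 1.00
# threshold -3.0: precision 0.57, recall 1.00
```

On the toy data, moving the threshold from 0 to -3 raises recall but lowers precision, which is the same trade-off seen with the real model.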

Model Variable Coefficients


Coefficient Input Variable
1.405703812270731 V6301_YES
0.3352223221274035 V6305_YES
0.2671181245853438 V5112_VERY
0.22031080567473357 MANLT1_Present
0.19490744465606258 AGOLT2_Present
0.187354920012867 GADLT2_Present
0.18677025125436397 RACE_Other
0.16174506120081208 V5114_NOT AT ALL
0.16002889217051564 V4428_NOT TRUE AT ALL
0.1574067835723938 DEPLT1_Present
0.14607985652706781 V6143_YES
0.14306933488338724 V5114_VERY
0.11597610119923091 CDLT_Present
0.10503733279461921 V6649_YES
0.10190670082093853 V4428_SOMEWHAT TRUE
0.09376439874700598 EMP_Other
0.08748741857235821 DRGDLT_Present
0.08444230263392431 V4433_VERY TRUE
0.08291378285814878 MARSTAT_Marswd
0.06514503860507423 V5114_SOMEWHAT
0.06350514837099 V102_FAIR
0.06311551277404878 EMP_Working, incl. temp. laid off, matern./s
0.062011612294565255 DYSLT2_Present
0.05925450443116165 V4433_nan
0.05428055300577384 V5118_SOMEWHAT
0.05082971973728185 V6749_YES
0.048182940687969814 REL_Other
0.04603511160559177 V5118_NOT AT ALL
0.04050708922655088 V6649_INAP
0.04036439411734284 V102_POOR
0.028841834091414605 V4428_VERY TRUE
0.02769841821160808 V6215_ONLY ONE YES RESPONSE IN U1-U12
0.02345695425121209 DEPLT2_Present
0.021070168240737604 BP1LT1_Present
0.015240909997301032 V4433_SOMEWHAT TRUE
0.008586678359335631 V6126_NO
0.005086174004186556 PTLT_Present
0.0016084761572354507 ALCALT2_Present
0.0013714895414490762 ASPLT1_Present
-4.3792614336451575e-09 ALCALT1_Present
-6.189127774912484e-05 PDLT_Present
-0.0022437033962745445 BP1LT2_Present
-0.0026680125780015793 GADLT1_Present
-0.0078017820139745335 V6749_NO
-0.008203806858502033 ALCDLT_Present
-0.012548959620373385 V5918_RARELY
-0.014238671801721343 DYSLT1_Present
-0.019657947337533187 V101_GOOD
-0.01971568015411971 REL_Protestant
-0.022318621559910382 V102_GOOD
-0.024668097014764495 V5115_nan
-0.027327036902896503 V5918_OFTEN
-0.028427557945102683 DRGALT2_Present
-0.031782700925540024 V101_VERY GOOD
-0.03180425889168251 V6649_nan
-0.04131970378340204 NAPLT_Present
-0.04317463541935023 V6749_INAP
-0.04331059762929013 V5113_nan
-0.04409823261316062 REL_No Preference, None
-0.045512862368752564 PTSDLT_Present
-0.06548163878321532 V5114_nan
-0.06558426381372447 V6126_nan
-0.072079539972884 V101_FAIR
-0.07365587155458184 V5115_VERY
-0.07554611205687609 V6649_NO
-0.07581536683103464 V5113_NOT AT ALL
-0.0763143332665447 V5115_SOMEWHAT
-0.07755640720234999 AGOLT1_Present
-0.08034999051940162 V101_POOR
-0.08598988794640758 V4433_NOT TRUE AT ALL
-0.08725683116114903 V5115_NOT AT ALL
-0.09134989171199648 V6143_nan
-0.10197774432451405 V5225_SOME
-0.10326360939887179 RACE_White
-0.10470480896369988 EMP_Student
-0.1084265363142451 AGE
-0.11341981834248227 V5113_SOMEWHAT
-0.11563431507087796 V5118_VERY
-0.12914883253025136 V5225_NOT AT ALL
-0.15262054762584326 MANLT2_Present
-0.15988760994935408 V5918_SOMETIMES
-0.1717779338449285 MARSTAT_Marnev
-0.17285336745134225 SEX_Male
-0.17403082831266203 SIMLT_Present
-0.17628401943347866 EDCAT_12 Years
-0.18033489401509195 V102_VERY GOOD
-0.20160868214576103 V6114_NO
-0.2142356450200656 V5113_VERY
-0.24344419753797647 V5118_nan
-0.2693705527928723 V5225_nan
-0.28279116592730885 V4428_nan
-0.28704678553078117 V5225_INAP
-0.4396296944457304 EDCAT_13-15 Years
-0.7817613297772439 EDCAT_16+ Years
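
The coefficients in the table combine into a single linear score for each respondent. That score is what a threshold such as -2 is compared against. Below is a minimal sketch that assumes a linear model with one-hot (0/1) input columns, as the column names suggest. It also assumes the -2 threshold applies to this score. The intercept is not reported in this section, so `INTERCEPT` below is a hypothetical placeholder. `decision_score` is likewise a hypothetical helper, and only a few coefficients from the table are included.

```python
# Subset of coefficients copied from the table above.
COEFFICIENTS = {
    "V6301_YES": 1.405703812270731,
    "V6305_YES": 0.3352223221274035,
    "EDCAT_16+ Years": -0.7817613297772439,
    "SEX_Male": -0.17285336745134225,
}

# HYPOTHETICAL placeholder: the model's intercept is not reported in this section.
INTERCEPT = -3.0


def decision_score(features, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Linear score: intercept + sum of coefficient * feature value.

    `features` lists only the columns that are non-zero for a respondent
    (value 1 for an active one-hot category); omitted columns count as 0.
    An unknown column name raises KeyError instead of being silently ignored.
    """
    return intercept + sum(coefficients[name] * value for name, value in features.items())


respondent = {"V6301_YES": 1, "EDCAT_16+ Years": 1, "SEX_Male": 1}
score = decision_score(respondent)  # about -2.549 with the placeholder intercept
flagged = score >= -2.0             # compare against the threshold discussed above
print(round(score, 3), flagged)     # -2.549 False
```

With the placeholder intercept, the negative education and sex coefficients pull this respondent just below the threshold. A respondent with only V6301_YES active would score about -1.59 and be flagged.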

Future Ideas for Extending the Analysis
  • More data! Include additional fields as well as additional cases for training and testing.
  • More current data: our dataset comes from a study conducted in 1991.
  • Additional demographic data as inputs for the model.
  • Error analysis to understand why the classifier produces false positives and false negatives.
  • A more efficient way to scrape the category labels, given the volume involved.
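A simple first step toward the error analysis proposed above is to separate false positives from false negatives. We can then count which indicator columns are most often active in each group. The sketch below uses a hypothetical helper, `error_feature_counts`, on made-up toy records. Each record is the list of one-hot columns equal to 1 for that respondent. It is one possible starting point, not a method used in this study.

```python
from collections import Counter


def error_feature_counts(records, labels, predictions, top_n=3):
    """Count active indicator columns among false positives and false negatives."""
    fp_counts, fn_counts = Counter(), Counter()
    for active_columns, y_true, y_pred in zip(records, labels, predictions):
        if y_pred == 1 and y_true == 0:    # false positive
            fp_counts.update(active_columns)
        elif y_pred == 0 and y_true == 1:  # false negative
            fn_counts.update(active_columns)
    return fp_counts.most_common(top_n), fn_counts.most_common(top_n)


# Toy records (hypothetical): active one-hot columns per respondent.
records = [
    ["V6301_YES", "SEX_Male"],     # true 0, predicted 1 -> false positive
    ["V6301_YES", "V5112_VERY"],   # true 0, predicted 1 -> false positive
    ["EDCAT_16+ Years"],           # true 1, predicted 0 -> false negative
    ["V6301_YES", "V6305_YES"],    # true 1, predicted 1 -> true positive
]
labels = [0, 0, 1, 1]
predictions = [1, 1, 0, 1]

fp_top, fn_top = error_feature_counts(records, labels, predictions)
print(fp_top)  # [('V6301_YES', 2), ('SEX_Male', 1), ('V5112_VERY', 1)]
print(fn_top)  # [('EDCAT_16+ Years', 1)]
```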