Model for elections in USA (DNN)
Storyboard
Book:
Allan J. Lichtman, Predicting the Next President, Rowman & Littlefield Publishers (2020)
Program
ID:(1789, 0)
Load Lichtman Model Data
Description
Load the data from the election.csv file into the election DataFrame:
# import pandas and numpy
import pandas as pd
import numpy as np

# load data
election = pd.read_csv('election.csv')
ID:(13856, 0)
Study data structure
Description
Show the first five records of election, with all their columns:
# show 5 records
election.head(5)
ID:(13857, 0)
Form data to train and evaluate
Description
Split the data into a training set (X_train, y_train) and an evaluation set (X_eval, y_eval), once for the popular-vote result and once for the electoral-college result:
# import train_test_split
from sklearn.model_selection import train_test_split

# separate the two target columns from the features
# (X refers to the same DataFrame as election, so pop() removes the targets from X too)
X = election
y_v = election.pop('vote-victory')
y_e = election.pop('electoral-victory')

# build train and evaluation data for the vote and electoral results
X_train_v, X_eval_v, y_train_v, y_eval_v = train_test_split(X, y_v, test_size=0.33, random_state=42)
X_train_e, X_eval_e, y_train_e, y_eval_e = train_test_split(X, y_e, test_size=0.33, random_state=42)
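As a quick sanity check on the split sizes, the sketch below mimics the rounding that train_test_split is assumed to apply when only test_size is given: the evaluation share is rounded up and the remaining records go to training. The helper name split_sizes and the record count of 39 are illustrative assumptions, not values read from election.csv.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_eval) for a fractional test_size.

    Assumption: the evaluation share is rounded up and the
    remainder is used for training.
    """
    n_eval = math.ceil(test_size * n_samples)
    return n_samples - n_eval, n_eval


# 39 is a hypothetical record count, not read from election.csv
n_train, n_eval = split_sizes(39, 0.33)
print(n_train, n_eval)
```

With so few records, each evaluation record moves the accuracy by several percentage points, so the metrics reported later should be read with that granularity in mind.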
ID:(13858, 0)
Create array of columns to define model
Description
To define the model, create the list feature_columns, which tells the classifier which input columns to use and how to read them:
import tensorflow as tf

# the thirteen numeric input columns
NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency',
                   'third-party', 'short-term-economy', 'long-term-economy',
                   'policy-change', 'social-unrest', 'scandal',
                   'foreign-military-failure', 'foreign-military-success',
                   'incumbent-charisma', 'challenger-charisma']

# one integer feature column per input column
feature_columns = []
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int32))
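The thirteen columns correspond to the thirteen Keys of Lichtman's book. For comparison with the network, the book's own decision rule can be sketched in a few lines: the incumbent party is predicted to lose when six or more keys are false. The encoding used here (1 = key true, meaning favorable to the incumbent party; 0 = key false) and the function name keys_forecast are assumptions, since the encoding in election.csv is not shown.

```python
KEYS = ['party-mandate', 'nomination-contest', 'incumbency',
        'third-party', 'short-term-economy', 'long-term-economy',
        'policy-change', 'social-unrest', 'scandal',
        'foreign-military-failure', 'foreign-military-success',
        'incumbent-charisma', 'challenger-charisma']


def keys_forecast(record):
    """Apply the six-false-keys rule to one record.

    Assumption: record maps each key name to 1 (true) or 0 (false).
    """
    false_keys = sum(1 for key in KEYS if record[key] == 0)
    return 'Challenger' if false_keys >= 6 else 'Incumbent'


# hypothetical record: the first six keys are false, the rest true
example = {key: 1 for key in KEYS}
for key in KEYS[:6]:
    example[key] = 0
print(keys_forecast(example))
```

A rule like this gives a simple baseline against which the DNN forecasts can be compared.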
ID:(13859, 0)
Define DNN models
Description
With feature_columns, use tf.estimator.DNNClassifier to define the models DNN_model_v for the popular vote and DNN_model_e for the electoral college:
# define the DNN model for voting
DNN_model_v = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32])

# define the DNN model for electorate
DNN_model_e = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32])
ID:(13860, 0)
Upload data to train and evaluate
Description
To run the training, the data must be supplied as tensors through input functions: train_input_fn shuffles, repeats, and batches the training data, while eval_input_fn only batches the evaluation or prediction data:
# input function for training
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # shuffle, repeat endlessly, and cut into batches
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset

# input function for evaluation or prediction
def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    # labels are omitted when predicting
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    assert batch_size is not None, 'batch_size must not be None'
    dataset = dataset.batch(batch_size)
    return dataset
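To see why training needs an explicit number of steps, it helps to mimic what repeat().batch(batch_size) does: the records are cycled endlessly and cut into fixed-size batches, so the stream never ends on its own. The following plain-Python sketch uses itertools in place of tf.data and leaves out the shuffling; the function name repeated_batches and the toy records are illustrative.

```python
from itertools import cycle, islice


def repeated_batches(records, batch_size):
    """Yield batches forever, wrapping around the end of the data."""
    stream = cycle(records)
    while True:
        yield list(islice(stream, batch_size))


# five toy records, batches of two: the third batch wraps around
batches = repeated_batches(['a', 'b', 'c', 'd', 'e'], batch_size=2)
first_four = list(islice(batches, 4))
print(first_four)
```

Because the repetition happens before batching, a batch can contain records from the end of one pass and the start of the next.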
ID:(13861, 0)
Train the DNN model for victory by votes
Description
Train the DNN_model_v model on the training data supplied by train_input_fn:
# train the DNN model v
batch_size = 10
train_steps = 40
for i in range(0, 100):
    DNN_model_v.train(
        input_fn=lambda: train_input_fn(X_train_v, y_train_v, batch_size),
        steps=train_steps)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.7023675, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...
INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...
INFO:tensorflow:Loss for final step: 0.5706555.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-3960
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...
INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...
INFO:tensorflow:loss = 0.19702096, step = 3960
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...
INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...
INFO:tensorflow:Loss for final step: 0.25778225.
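The loop calls train() 100 times with 40 steps each, which is why the log ends at global step 4000. A short calculation shows how many passes over the training data this amounts to. The training-set size of 26 is a hypothetical placeholder; in practice it would be len(X_train_v).

```python
calls = 100
steps_per_call = 40
batch_size = 10
n_train = 26  # hypothetical; use len(X_train_v) in practice

total_steps = calls * steps_per_call      # global step after the loop
records_seen = total_steps * batch_size   # records fed to the model
epochs = records_seen / n_train           # passes over the training set
print(total_steps, records_seen, round(epochs, 1))
```

With a training set this small, the model sees every record well over a thousand times, so overfitting is a plausible concern when reading the evaluation results.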
ID:(13862, 0)
Train the DNN model for victory by electorate
Description
Train the DNN_model_e model on the training data supplied by train_input_fn:
# train the DNN model e
batch_size = 10
train_steps = 40
for i in range(0, 100):
    DNN_model_e.train(
        input_fn=lambda: train_input_fn(X_train_e, y_train_e, batch_size),
        steps=train_steps)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.66710687, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...
INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...
INFO:tensorflow:Loss for final step: 0.74084073.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
...
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-3960
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...
INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...
INFO:tensorflow:loss = 0.32061976, step = 3960
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...
INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...
INFO:tensorflow:Loss for final step: 0.2767122.
ID:(13863, 0)
Evaluate the DNN model of victory by votes
Description
Evaluate the voting win model with the evaluation data supplied by eval_input_fn:
# evaluate the DNN model of victory by votes
eval_result_v = DNN_model_v.evaluate(
    input_fn=lambda: eval_input_fn(X_eval_v, y_eval_v, batch_size))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:31
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.37481s
INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:32
INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.46153846, accuracy_baseline = 0.61538464, auc = 0.9, auc_precision_recall = 0.9057735, average_loss = 0.81627464, global_step = 4000, label/mean = 0.3846154, loss = 0.74616987, precision = 0.41666666, prediction/mean = 0.7725815, recall = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
ID:(13864, 0)
Evaluate the DNN model of victory by college
Description
Evaluate the college win model with the evaluation data supplied by eval_input_fn:
# evaluate the DNN model of victory by college
# (DNN_model_e, matching the checkpoint restored in the log below)
eval_result_e = DNN_model_e.evaluate(
    input_fn=lambda: eval_input_fn(X_eval_e, y_eval_e, batch_size))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:53
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.37231s
INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:54
INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.61538464, accuracy_baseline = 0.53846157, auc = 0.78571427, auc_precision_recall = 0.77785635, average_loss = 0.7589465, global_step = 4000, label/mean = 0.46153846, loss = 0.80362004, precision = 0.54545456, prediction/mean = 0.7483164, recall = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
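The logged metrics can be cross-checked by rebuilding the confusion matrix behind them. The fractions in the two evaluation logs are consistent with 13 evaluation records (for example 0.4615 ≈ 6/13); that count is inferred from the numbers, not stated in the source. The sketch below takes the record count, the positive share (label/mean), the recall, and the precision, and derives the counts of true positives, false positives, false negatives, and true negatives. The function names are illustrative.

```python
def confusion_from_metrics(n, label_mean, recall, precision):
    """Rebuild (tp, fp, fn, tn) from summary metrics.

    Assumption: n is the number of evaluation records (13 is inferred).
    """
    positives = round(n * label_mean)
    tp = round(recall * positives)
    fn = positives - tp
    fp = round(tp / precision) - tp
    tn = n - positives - fp
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


# metric values copied from the two evaluation logs
vote = confusion_from_metrics(13, 0.3846154, 1.0, 0.41666666)
college = confusion_from_metrics(13, 0.46153846, 1.0, 0.54545456)
print(vote, round(accuracy(*vote), 4))
print(college, round(accuracy(*college), 4))
```

Under this assumption both models predict class 1 for almost every evaluation record: the recall is 1.0, but only one or two records are classified as class 0, which explains why the vote model's accuracy falls below its accuracy_baseline.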
ID:(13865, 0)
Perform forecast with the DNN model of victory by voting
Description
Forecast the outcome of the vote-victory model for the evaluation data supplied by eval_input_fn:
# forecast with the DNN model of victory by voting
predictions_v = DNN_model_v.predict(
    input_fn=lambda: eval_input_fn(X_eval_v, labels=None, batch_size=batch_size))
results_v = list(predictions_v)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ID:(13866, 0)
Perform forecast with the DNN model of victory by college
Description
Forecast the outcome of the college-victory model for the evaluation data supplied by eval_input_fn:
# forecast with the DNN model of victory by college
predictions_e = DNN_model_e.predict(
    input_fn=lambda: eval_input_fn(X_eval_e, labels=None, batch_size=batch_size))
results_e = list(predictions_e)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ID:(13868, 0)
Show histogram of the predicted probabilities
Description
Plot a histogram of the predicted class-1 (incumbent victory) probabilities of both models for the evaluation data:
# show histogram of probabilities
import numpy
from matplotlib import pyplot

bins = numpy.linspace(0, 1, 10)

# probability of class 1 for each evaluation record
prob_v = [pred['probabilities'][1] for pred in results_v]
prob_e = [pred['probabilities'][1] for pred in results_e]

pyplot.hist([prob_v, prob_e], bins, label=['vote', 'college'])
pyplot.title('predicted probabilities')
pyplot.xlabel('probability')
pyplot.ylabel('frequency')
pyplot.legend(loc='upper right')
pyplot.show()
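Note that numpy.linspace(0, 1, 10) returns 10 bin edges, which define 9 bins of width 1/9 rather than 10 bins of width 0.1. The sketch below reproduces the edge calculation in plain Python to make this explicit; the helper linspace is only a stand-in for numpy.linspace, written for illustration.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


edges_9_bins = linspace(0.0, 1.0, 10)    # as in the histogram code
edges_10_bins = linspace(0.0, 1.0, 11)   # one more edge gives 10 bins

# number of bins and width of the first bin
print(len(edges_9_bins) - 1, round(edges_9_bins[1], 4))
print(len(edges_10_bins) - 1, round(edges_10_bins[1], 4))
```

If bins of width 0.1 are wanted, passing 11 instead of 10 to numpy.linspace would be the corresponding change.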
ID:(13867, 0)
ROC Curves
Description
To use this forecast, a threshold on the predicted probability must be chosen: at or above it an incumbent victory is predicted, below it a challenger victory. To choose the threshold, compare the rate at which an incumbent victory is predicted when it actually occurs (true positive) with the rate at which it is predicted when it does not occur (false positive).
The true-positive rate is
$TPR=\displaystyle\frac{TP}{TP+FN}$
where $TP$ is the number of true positives and $FN$ the number of false negatives. The false-positive rate is
$FPR=\displaystyle\frac{FP}{FP+TN}$
where $FP$ is the number of false positives and $TN$ the number of true negatives.
from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt

# each model gets its own false-positive and true-positive rates,
# computed from the class-1 probabilities prob_v and prob_e
fpr_v, tpr_v, _ = roc_curve(y_eval_v, prob_v)
fpr_e, tpr_e, _ = roc_curve(y_eval_e, prob_e)

plt.plot(fpr_v, tpr_v, label='vote')
plt.plot(fpr_e, tpr_e, label='college')
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.legend(loc='lower right')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
Plotting the true-positive rate against the false-positive rate for every threshold gives the ROC (Receiver Operating Characteristic) curve.
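The two rates can also be computed by hand for a few thresholds, which makes the definitions concrete. The sketch below is a plain-Python illustration, not a replacement for roc_curve; the labels and probabilities are made-up toy values, and the function name rates_at is hypothetical.

```python
def rates_at(threshold, labels, probs):
    """Return (TPR, FPR) when class 1 is predicted for prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(labels, probs):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)


# toy data: two positive and three negative records
labels = [1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.6, 0.4, 0.2]

for threshold in (0.0, 0.5, 0.7, 1.0):
    tpr, fpr = rates_at(threshold, labels, probs)
    print(threshold, round(tpr, 3), round(fpr, 3))
```

Lowering the threshold raises both rates together; each threshold contributes one point to the curve.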
ID:(13869, 0)
Load and study data to forecast
Description
Load the scenarios to forecast into election_secnarios and show the first 6 records:
# load and show 6 scenarios
election_secnarios = pd.read_csv('election-secnarios.csv')
election_secnarios.head(6)
ID:(13870, 0)
Forecast
Description
Forecast the results of the scenarios contained in election_secnarios with both models:
# forecast the scenarios with both models
predictions_v = DNN_model_v.predict(
    input_fn=lambda: eval_input_fn(election_secnarios, labels=None, batch_size=batch_size))
predictions_e = DNN_model_e.predict(
    input_fn=lambda: eval_input_fn(election_secnarios, labels=None, batch_size=batch_size))
results_v = list(predictions_v)
results_e = list(predictions_e)

def describe(res, j):
    # class_ids holds a single entry: 0 = Challenger, 1 = Incumbent
    class_id = int(res[j]['class_ids'][0])
    probability = int(res[j]['probabilities'][class_id] * 100)
    winner = 'Challenger' if class_id == 0 else 'Incumbent'
    return '%s%% probability to %s' % (probability, winner)

# first value: popular vote, second value: electoral college
print('Predictions for the scenarios:')
for i in range(0, 6):
    print(describe(results_v, i) + ',' + describe(results_e, i))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Predictions for the scenarios:
68% probability to Incumbent,60% probability to Incumbent
80% probability to Incumbent,72% probability to Incumbent
86% probability to Incumbent,76% probability to Incumbent
90% probability to Incumbent,76% probability to Incumbent
89% probability to Incumbent,76% probability to Incumbent
93% probability to Incumbent,75% probability to Incumbent
ID:(13871, 0)