Druyd:Knowledge

Model for elections in USA (DNN)

Storyboard

Book:

Allan J. Lichtman - Predicting the Next President-Rowman & Littlefield Publishers (2020)

Program

USElection.ipynb

election.csv

election-secnarios.csv

>Model

ID:(1789, 0)

Load Lichtman Model Data

Description

>Top

Load the data from the election.csv file into election:

# import pandas and numpy
import pandas as pd
import numpy as np

# load data
election = pd.read_csv('election.csv')

ID:(13856, 0)

Study data structure

Description

Show all columns for the first rows of the data loaded in election:

# show 5 records
election.head(5)

ID:(13857, 0)

Form data to train and evaluate

Description

Split the data into a training set (X_train, y_train) and an evaluation set (X_eval, y_eval), once for each of the two targets:

# import train_test_split
from sklearn.model_selection import train_test_split

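# build train and evaluation data for the popular-vote and electoral-college results
# pop() removes each label column from election, so X keeps only the remaining columns
X = election
y_v = election.pop('vote-victory')
y_e = election.pop('electoral-victory')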
X_train_v, X_eval_v, y_train_v, y_eval_v = train_test_split(X, y_v, test_size=0.33, random_state=42)
X_train_e, X_eval_e, y_train_e, y_eval_e = train_test_split(X, y_e, test_size=0.33, random_state=42)
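
Both calls receive the same X, test_size and random_state, so they select the same rows: X_train_v and X_train_e hold identical features and only the labels differ. The following standard-library sketch illustrates that idea. It mimics a seeded split and is not the scikit-learn implementation; the helper seeded_split and the toy rows are hypothetical.

```python
import math
import random

def seeded_split(rows, test_size=0.33, seed=42):
    """Hypothetical stand-in for a seeded train/evaluation split."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)        # same seed -> same row order
    n_eval = math.ceil(test_size * len(rows))   # evaluation share, rounded up
    eval_idx, train_idx = indices[:n_eval], indices[n_eval:]
    return [rows[i] for i in train_idx], [rows[i] for i in eval_idx]

features = [f"row-{i}" for i in range(10)]      # toy feature rows
labels_v = [i % 2 for i in range(10)]           # toy 'vote-victory' labels
labels_e = [1 - (i % 2) for i in range(10)]     # toy 'electoral-victory' labels

train_x, eval_x = seeded_split(features)
train_v, eval_v = seeded_split(labels_v)
train_e, eval_e = seeded_split(labels_e)

# the evaluation labels still line up with the evaluation rows
print(list(zip(eval_x, eval_v, eval_e)))
```

Since the features are the same in both splits, a single split of X together with both label columns would give the same result.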

ID:(13858, 0)

Create array of columns to define model

Description

To define the model, create the list feature_columns, which tells the estimator which input columns to use and what type they have:

import tensorflow as tf

NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency', 'third-party', 'short-term-economy', 'long-term-economy', 'policy-change', 'social-unrest', 'scandal', 'foreign-military-failure', 'foreign-military-success', 'incumbent-charisma', 'challenger-charisma']

feature_columns = []
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int32))

ID:(13859, 0)

Define DNN models

Description

With feature_columns you can use tf.estimator.DNNClassifier to define the models DNN_model_v for the popular vote and DNN_model_e for the electoral college:

# define the DNN model for voting
DNN_model_v = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64,32])
# define the DNN model for the electoral college
DNN_model_e = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64,32])
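
To get a feeling for the size of these networks, the number of trainable parameters can be estimated by hand. The sketch below counts weights and biases of fully connected layers; it assumes one output logit for the two-class problem, and the helper count_dense_params is hypothetical, not part of TensorFlow.

```python
def count_dense_params(layer_sizes):
    """Weights plus biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

n_features = 13          # entries in NUMERIC_COLUMNS
hidden_units = [64, 32]  # as passed to DNNClassifier
n_logits = 1             # assumption: a single logit for two classes

n_params = count_dense_params([n_features, *hidden_units, n_logits])
print(n_params)
```

Under these assumptions each model has about three thousand parameters, which is worth keeping in mind when the training data set is small.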

ID:(13860, 0)

Load data to train and evaluate

Description

To run the training, define input functions that feed the data to the estimator as tensors: train_input_fn for training, which shuffles, repeats and batches the rows, and eval_input_fn for evaluation and prediction, which only batches them:

# input function for training
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset

# input function for evaluation or prediction
def eval_input_fn(features, labels, batch_size):
    features=dict(features)
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)

    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    
    assert batch_size is not None, 'batch_size must not be None'
    dataset = dataset.batch(batch_size)

    return dataset
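
The chain shuffle(10).repeat().batch(batch_size) in train_input_fn produces an endless stream of batches, so it is the steps argument of train() that ends a training call. The following standard-library sketch imitates that behaviour. It is a simplification: it reshuffles all rows on every pass, whereas shuffle(10) only mixes rows inside a buffer of 10 elements. The generator training_batches is hypothetical.

```python
import itertools
import random

def training_batches(examples, batch_size, seed=0):
    """Endless stream of shuffled batches, loosely like shuffle().repeat().batch()."""
    rng = random.Random(seed)

    def stream():
        while True:                  # repeat(): start over when the data is exhausted
            epoch = list(examples)
            rng.shuffle(epoch)       # shuffle(): new order on every pass
            yield from epoch

    rows = stream()
    while True:
        # batch(): group consecutive rows, possibly across two passes
        yield list(itertools.islice(rows, batch_size))

batches = training_batches(range(7), batch_size=3)
first_steps = list(itertools.islice(batches, 4))   # like steps=4
print(first_steps)
```

eval_input_fn omits shuffle() and repeat(), so evaluation and prediction see every row exactly once and in the original order.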

ID:(13861, 0)

Train the DNN model for victory by votes

Description

Train the DNN_model_v model with the training data supplied by train_input_fn:

# train the DNN model v
batch_size = 10
train_steps = 40

for i in range(0,100):    
    DNN_model_v.train(input_fn=lambda:train_input_fn(X_train_v, y_train_v,batch_size),steps=train_steps)

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 0.7023675, step = 0

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...

INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...

INFO:tensorflow:Loss for final step: 0.5706555.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

...

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-3960

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...

INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...

INFO:tensorflow:loss = 0.19702096, step = 3960

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...

INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...

INFO:tensorflow:Loss for final step: 0.25778225.
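
Each call to train() resumes from the latest checkpoint, which is why the log repeats and the step counter keeps growing. The arithmetic behind the final checkpoint number can be sketched as follows; the number of training rows is not stated here, so the value 26 passed to passes_over_data is purely hypothetical.

```python
calls = 100        # iterations of the for loop
train_steps = 40   # steps per train() call
batch_size = 10    # rows per step

global_steps = calls * train_steps          # matches the last checkpoint in the log
examples_seen = global_steps * batch_size   # rows fed to the model in total

def passes_over_data(n_train_rows):
    """Approximate number of passes over a training set of the given size."""
    return examples_seen / n_train_rows

print(global_steps, examples_seen)
print(round(passes_over_data(26)))   # 26 rows is an assumption
```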

ID:(13862, 0)

Train the DNN model for victory by college

Description

Train the DNN_model_e model with the training data supplied by train_input_fn:

# train the DNN model e
batch_size = 10
train_steps = 40

for i in range(0,100):    
    DNN_model_e.train(input_fn=lambda:train_input_fn(X_train_e, y_train_e,batch_size),steps=train_steps)

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 0.66710687, step = 0

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...

INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...

INFO:tensorflow:Loss for final step: 0.74084073.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

...

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-3960

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...

INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...

INFO:tensorflow:loss = 0.32061976, step = 3960

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...

INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...

INFO:tensorflow:Loss for final step: 0.2767122.

ID:(13863, 0)

Evaluate the DNN model of victory by votes

Description

Evaluate the model of victory by votes with the evaluation data supplied by eval_input_fn:

# evaluate the DNN model of victory by votes
eval_result_v = DNN_model_v.evaluate(input_fn=lambda:eval_input_fn(X_eval_v, y_eval_v,batch_size))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:31

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Inference Time : 0.37481s

INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:32

INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.46153846, accuracy_baseline = 0.61538464, auc = 0.9, auc_precision_recall = 0.9057735, average_loss = 0.81627464, global_step = 4000, label/mean = 0.3846154, loss = 0.74616987, precision = 0.41666666, prediction/mean = 0.7725815, recall = 1.0

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

ID:(13864, 0)

Evaluate the DNN model of victory by college

Description

Evaluate the model of victory by college with the evaluation data supplied by eval_input_fn:

# evaluate the DNN model of victory by college
eval_result_e = DNN_model_e.evaluate(input_fn=lambda:eval_input_fn(X_eval_e, y_eval_e,batch_size))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:53

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Inference Time : 0.37231s

INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:54

INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.61538464, accuracy_baseline = 0.53846157, auc = 0.78571427, auc_precision_recall = 0.77785635, average_loss = 0.7589465, global_step = 4000, label/mean = 0.46153846, loss = 0.80362004, precision = 0.54545456, prediction/mean = 0.7483164, recall = 1.0

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
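
The two evaluation summaries can be turned back into confusion counts, which makes them easier to read. The sketch below assumes 13 evaluation rows, the smallest count consistent with the logged fractions; the log does not state this number. The helper confusion_counts is hypothetical.

```python
def confusion_counts(n_rows, label_mean, precision, recall):
    """Recover (TP, FP, FN, TN) from summary metrics, rounded to whole rows."""
    positives = round(label_mean * n_rows)   # rows whose true label is 1
    tp = round(recall * positives)           # recall = TP / (TP + FN)
    fn = positives - tp
    fp = round(tp / precision) - tp          # precision = TP / (TP + FP)
    tn = n_rows - positives - fp
    return tp, fp, fn, tn

n_rows = 13  # assumption, not stated in the log

# (label/mean, precision, recall) copied from the two evaluation logs
logged = {
    "vote": (0.3846154, 0.41666666, 1.0),
    "college": (0.46153846, 0.54545456, 1.0),
}

for name, (label_mean, precision, recall) in logged.items():
    tp, fp, fn, tn = confusion_counts(n_rows, label_mean, precision, recall)
    accuracy = (tp + tn) / n_rows
    print(name, (tp, fp, fn, tn), round(accuracy, 4))
```

Under this assumption the recomputed accuracies agree with the logged values, and both models predict class 1 for nearly every evaluation row, which explains the recall of 1.0 combined with a low precision.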

ID:(13865, 0)

Perform forecast with the DNN model of victory by voting

Description

Forecast victory by votes for the evaluation data supplied by eval_input_fn:

# forecast with the DNN model of victory by voting
predictions_v = DNN_model_v.predict(
    input_fn=lambda:eval_input_fn(X_eval_v,labels=None,
    batch_size=batch_size))

results_v = list(predictions_v)

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

ID:(13866, 0)

Perform forecast with the DNN model of victory by college

Description

Forecast victory by college for the evaluation data supplied by eval_input_fn:

# forecast with the DNN model of victory by college
predictions_e = DNN_model_e.predict(
    input_fn=lambda:eval_input_fn(X_eval_e,labels=None,
    batch_size=batch_size))

results_e = list(predictions_e)

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

ID:(13868, 0)

Show histogram of the predicted probabilities

Description

Show the distribution of the predicted probabilities of class 1 (incumbent victory) for both models:

# show histogram of probabilities
import numpy
from matplotlib import pyplot

bins = numpy.linspace(0, 1, 10)
prob_v = [pred['probabilities'][1] for pred in results_v]
prob_e = [pred['probabilities'][1] for pred in results_e]

pyplot.hist([prob_v,prob_e], bins, label=['vote','college'])
pyplot.title('predicted probabilities')
pyplot.xlabel('probability')
pyplot.ylabel('frequency')
pyplot.legend(loc='upper right')
pyplot.show()
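
A sequence passed as bins is interpreted as bin edges, so the 10 points from linspace(0, 1, 10) produce 9 bins, not 10. The plain-Python sketch below reproduces the spacing to make this visible; the helper linspace here is a hypothetical stand-in for numpy.linspace with endpoints included.

```python
def linspace(start, stop, num):
    """Evenly spaced points from start to stop, both endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

edges = linspace(0, 1, 10)          # as used for the histogram
n_bins = len(edges) - 1             # edges delimit one bin fewer than points
bin_width = edges[1] - edges[0]

edges_tenths = linspace(0, 1, 11)   # alternative: bins of width 0.1
n_bins_tenths = len(edges_tenths) - 1

print(n_bins, round(bin_width, 4))
print(n_bins_tenths, round(edges_tenths[1], 4))
```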

ID:(13867, 0)

ROC Curves

Description

To use this forecast, a threshold on the predicted probability must be chosen: at or above it an incumbent victory is predicted, below it a challenger victory. To choose the threshold, compare the probability that an incumbent victory is predicted when it actually occurs (true positive) with the probability that an incumbent victory is predicted when it does not occur (false positive).

The true-positive rate TPR, or sensitivity, is defined as

$TPR=\displaystyle\frac{TP}{TP+FN}$

where the true positives TP are the cases correctly predicted as positive and the false negatives FN are the cases predicted as negative that are actually positive.

The false-positive rate FPR, or fall-out, is defined as

$FPR=\displaystyle\frac{FP}{FP+TN}$

where the false positives FP are the cases predicted as positive that are actually negative and the true negatives TN are the cases correctly predicted as negative.
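
Both rates can be computed by hand for a few thresholds, which shows how each threshold yields one point of the ROC curve. This is a minimal sketch with made-up labels and probabilities (1 stands for an incumbent victory); the helper rates_at_threshold is hypothetical.

```python
def rates_at_threshold(labels, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

# hypothetical true labels and predicted probabilities of class 1
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.7, 0.4, 0.2]

for threshold in (0.3, 0.5, 0.75):
    tpr, fpr = rates_at_threshold(labels, scores, threshold)
    print(threshold, round(tpr, 2), round(fpr, 2))
```

Raising the threshold lowers the false-positive rate but eventually also the true-positive rate; the ROC curve traces this trade-off over all thresholds.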

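from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt

# probability of class 1 (incumbent victory) for each evaluation row
prob_v = [pred['probabilities'][1] for pred in results_v]
prob_e = [pred['probabilities'][1] for pred in results_e]

# each model gets its own false-positive and true-positive rates
fpr_v, tpr_v, _ = roc_curve(y_eval_v, prob_v)
fpr_e, tpr_e, _ = roc_curve(y_eval_e, prob_e)
plt.plot(fpr_v, tpr_v, label='vote')
plt.plot(fpr_e, tpr_e, label='college')
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.legend(loc='lower right')
plt.xlim(0,)
plt.ylim(0,)
plt.show()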

The plot of the true-positive rate against the false-positive rate is called a ROC (Receiver Operating Characteristic) diagram.

ID:(13869, 0)

Load and study data to forecast

Description

Load the scenarios from election-secnarios.csv into election_secnarios and show the first 6 rows:

# load and show 6 scenarios
election_secnarios = pd.read_csv('election-secnarios.csv')
election_secnarios.head(6)

ID:(13870, 0)

Forecast

Description

Forecast the results of the scenarios contained in election_secnarios with both models:

predictions_v = DNN_model_v.predict(
    input_fn=lambda:eval_input_fn(election_secnarios,labels=None,
    batch_size=batch_size))

predictions_e = DNN_model_e.predict(
    input_fn=lambda:eval_input_fn(election_secnarios,labels=None,
    batch_size=batch_size))

results_v = list(predictions_v)
results_e = list(predictions_e)

def describe_prediction(res, j):
    # 'class_ids' holds the predicted class in a one-element array
    class_id = int(res[j]['class_ids'][0])
    probability = int(res[j]['probabilities'][class_id] * 100)

    # class 0 is a challenger victory, class 1 an incumbent victory
    if class_id == 0:
        return '%s%% probability to %s' % (probability, 'Challenger')
    else:
        return '%s%% probability to %s' % (probability, 'Incumbent')

print('Predictions for the scenarios:')

# first entry: victory by votes, second entry: victory by college
for i in range(6):
    print(describe_prediction(results_v, i) + ',' + describe_prediction(results_e, i))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

Predictions for the scenarios:

68% probability to Incumbent,60% probability to Incumbent

80% probability to Incumbent,72% probability to Incumbent

86% probability to Incumbent,76% probability to Incumbent

90% probability to Incumbent,76% probability to Incumbent

89% probability to Incumbent,76% probability to Incumbent

93% probability to Incumbent,75% probability to Incumbent

ID:(13871, 0)