Druyd:Knowledge

Model for US elections (DNN)

Storyboard

Book:

Allan J. Lichtman, Predicting the Next President, Rowman & Littlefield Publishers (2020)

Program

USElection.ipynb

election.csv

election-secnarios.csv

>Model

ID:(1789, 0)

Load the Lichtman model data

Description

Load the data from the file election.csv:

# import pandas and numpy
import pandas as pd
import numpy as np

# load data
election = pd.read_csv('election.csv')
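Before training, it can help to check that election.csv contains the columns the later steps expect. The following is a minimal standard-library sketch, assuming (this is not stated in the source) that the 13 keys and the two targets are all coded as 0/1; the two-row sample is hypothetical and check_election_rows is an illustrative helper name:

```python
import csv
import io

# Column names taken from NUMERIC_COLUMNS and the two targets used in this document.
KEY_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency', 'third-party',
               'short-term-economy', 'long-term-economy', 'policy-change',
               'social-unrest', 'scandal', 'foreign-military-failure',
               'foreign-military-success', 'incumbent-charisma', 'challenger-charisma']
TARGET_COLUMNS = ['vote-victory', 'electoral-victory']


def check_election_rows(rows):
    """Return a list of problems in rows read with csv.DictReader (empty list if none).

    Assumption: every key and target is coded as 0 or 1.
    """
    problems = []
    for i, row in enumerate(rows):
        for col in KEY_COLUMNS + TARGET_COLUMNS:
            value = row.get(col)
            if value is None:
                problems.append(f'row {i}: missing {col}')
            elif value.strip() not in ('0', '1'):
                problems.append(f'row {i}: {col}={value!r} is not 0/1')
    return problems


# Hypothetical sample; with the real file, pass csv.DictReader(open('election.csv', newline='')).
header = ','.join(KEY_COLUMNS + TARGET_COLUMNS)
sample = io.StringIO(header + '\n'
                     + ','.join(['1'] * 15) + '\n'
                     + ','.join(['0'] * 14 + ['2']) + '\n')
print(check_election_rows(csv.DictReader(sample)))
# ["row 1: electoral-victory='2' is not 0/1"]
```

Extra columns (for example a year) are simply ignored; only the columns used later are checked.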

ID:(13856, 0)

Study the data structure

Description

Show the first records of election with head, displaying all columns:

# show all columns of the first 5 records
pd.set_option('display.max_columns', None)
election.head(5)

ID:(13857, 0)

Build the training and evaluation data

Description

Split the data into training (X_train, y_train) and evaluation (X_eval, y_eval) sets, one pair for the popular-vote target and one for the Electoral College target:

# import train_test_split
from sklearn.model_selection import train_test_split

# separate the two targets (pop removes them from election); the remaining columns are the features
y_v = election.pop('vote-victory')       # popular-vote victory
y_e = election.pop('electoral-victory')  # Electoral College victory
X = election
X_train_v, X_eval_v, y_train_v, y_eval_v = train_test_split(X, y_v, test_size=0.33, random_state=42)
X_train_e, X_eval_e, y_train_e, y_eval_e = train_test_split(X, y_e, test_size=0.33, random_state=42)
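Both calls to train_test_split use the same X, the same test_size and the same random_state, so they select the same rows: the v and e training sets (and the two evaluation sets) hold the same elections and differ only in the label. A pure-Python sketch of that idea, assuming scikit-learn's sizing rule for a float test_size (ceil(test_size * n) evaluation rows); the shuffle uses Python's random module, so the chosen rows will not match scikit-learn's, and split_indices and n = 39 are hypothetical:

```python
import math
import random


def split_indices(n_samples, test_size=0.33, seed=42):
    """Return (train_idx, eval_idx): a seeded shuffle, then ceil(test_size * n) rows for evaluation."""
    n_test = math.ceil(test_size * n_samples)
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]


train_a, eval_a = split_indices(39)
train_b, eval_b = split_indices(39)   # same n and seed, as for the v and e splits
print(len(train_a), len(eval_a), train_a == train_b)   # 26 13 True
```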

ID:(13858, 0)

Create the list of feature columns for the model

Description

To define the model, build the list feature_columns with one numeric column for each of the 13 Lichtman keys:

import tensorflow as tf

NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency', 'third-party', 'short-term-economy', 'long-term-economy', 'policy-change', 'social-unrest', 'scandal', 'foreign-military-failure', 'foreign-military-success', 'incumbent-charisma', 'challenger-charisma']

feature_columns = []
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int32))

ID:(13859, 0)

Define the DNN models

Description

With feature_columns, tf.estimator.DNNClassifier defines the models DNN_model_v for the popular vote and DNN_model_e for the Electoral College, each with two hidden layers of 64 and 32 units:

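# define the DNN model for the popular vote
# (no model_dir is given, so checkpoints are written to a temporary directory)
DNN_model_v = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64,32])
# define the DNN model for the Electoral College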
DNN_model_e = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64,32])
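hidden_units=[64,32] gives two fully connected hidden layers. For a two-class DNNClassifier the head uses a single logit followed by a sigmoid, so each model maps 13 inputs to 64, then 32, then 1 unit. A short sketch counting the weights and biases of such a stack (dense_param_count is a hypothetical helper; it assumes each of the 13 numeric columns contributes one input):

```python
def dense_param_count(n_inputs, hidden_units, n_outputs=1):
    """Weights + biases of a fully connected stack n_inputs -> hidden_units... -> n_outputs."""
    sizes = [n_inputs] + list(hidden_units) + [n_outputs]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))


# 13 Lichtman keys -> 64 -> 32 -> 1 logit
print(dense_param_count(13, [64, 32]))  # 3009
```

About 3,000 trainable parameters is a lot for a dataset with (presumably) one row per election, a few dozen rows in total, so the evaluation metrics further on should be read with caution.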

ID:(13860, 0)

Input functions for training and evaluation

Description

To run training and evaluation, the data must be loaded into tensors: train_input_fn builds the training dataset (shuffled, repeated and batched) and eval_input_fn builds the evaluation or prediction dataset (batched, in the original order):

# input function for training
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset

# input function for evaluation or prediction
def eval_input_fn(features, labels, batch_size):
    features=dict(features)
    if labels is None:
        inputs = features
    else:
        inputs = (features, labels)

    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    
    assert batch_size is not None, 'batch_size must not be None'
    dataset = dataset.batch(batch_size)

    return dataset
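train_input_fn chains shuffle(10), repeat() and batch(batch_size). The pure-Python generator below is a rough model of those semantics, not TensorFlow's implementation: shuffle keeps a buffer of buffer_size elements and emits a random one while refilling from the input, so the shuffle is only local when the data have more rows than the buffer; by default a new order is drawn on each pass; repeat() restarts indefinitely, which is why train() is given a steps argument; and because batch comes after repeat, a batch can straddle two passes. shuffle_repeat_batch is a hypothetical name:

```python
import random


def shuffle_repeat_batch(items, buffer_size, batch_size, seed=0):
    """Rough model of Dataset.shuffle(buffer_size).repeat().batch(batch_size); yields lists."""
    if not items:
        return
    rng = random.Random(seed)
    end = object()  # marks an exhausted input iterator

    def one_pass():
        it = iter(items)
        buffer = []
        for item in it:                      # fill the shuffle buffer
            buffer.append(item)
            if len(buffer) == buffer_size:
                break
        while buffer:
            i = rng.randrange(len(buffer))   # emit a random buffered element
            yield buffer[i]
            nxt = next(it, end)
            if nxt is end:                   # input exhausted: shrink the buffer
                buffer[i] = buffer[-1]
                buffer.pop()
            else:                            # refill the freed slot
                buffer[i] = nxt

    batch = []
    while True:                              # repeat(): one pass after another
        for item in one_pass():
            batch.append(item)
            if len(batch) == batch_size:     # batch(): consecutive elements
                yield batch
                batch = []


gen = shuffle_repeat_batch(list(range(10)), buffer_size=10, batch_size=4)
print([next(gen) for _ in range(3)])
```

With 10 items and batches of 4, the third batch contains the last two items of the first pass and the first two of the second.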

ID:(13861, 0)

Train the DNN model for popular-vote victory

Description

Train DNN_model_v with train_input_fn, in 100 calls of 40 steps each (4000 steps in total):

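# train the DNN model v: 100 calls x 40 steps = 4000 steps, batches of 10 rows
# (each call resumes from the latest checkpoint)
batch_size = 10
train_steps = 40

for _ in range(100):
    DNN_model_v.train(input_fn=lambda: train_input_fn(X_train_v, y_train_v, batch_size), steps=train_steps)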

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 0.7023675, step = 0

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...

INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...

INFO:tensorflow:Loss for final step: 0.5706555.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

...

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-3960

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...

INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...

INFO:tensorflow:loss = 0.19702096, step = 3960

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...

INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...

INFO:tensorflow:Loss for final step: 0.25778225.
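For a two-class DNNClassifier, the loss in the log is the sigmoid cross-entropy (log loss) of the training batch, assumed here to be averaged over the batch. The first value, loss = 0.7023675 at step 0, is close to ln 2 ≈ 0.693, which is the loss of a model that predicts probability 0.5 for every example; the last values (between about 0.20 and 0.26) show that the fit to the training data improved. A minimal sketch of the formula (binary_cross_entropy is an illustrative helper, not a TensorFlow function):

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean of -[y*ln(p) + (1-y)*ln(1-p)]; p is clipped to [eps, 1-eps] to avoid ln(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


print(round(binary_cross_entropy([1, 0, 1], [0.5, 0.5, 0.5]), 4))  # 0.6931, i.e. ln 2
print(binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8]))  # lower: confident, correct predictions
```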

ID:(13862, 0)

Train the DNN model for Electoral College victory

Description

Train DNN_model_e with train_input_fn, in 100 calls of 40 steps each (4000 steps in total):

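# train the DNN model e: 100 calls x 40 steps = 4000 steps, batches of 10 rows
# (each call resumes from the latest checkpoint)
batch_size = 10
train_steps = 40

for _ in range(100):
    DNN_model_e.train(input_fn=lambda: train_input_fn(X_train_e, y_train_e, batch_size), steps=train_steps)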

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...

INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...

INFO:tensorflow:loss = 0.66710687, step = 0

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...

INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...

INFO:tensorflow:Loss for final step: 0.74084073.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Create CheckpointSaverHook.

INFO:tensorflow:Graph was finalized.

...

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-3960

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...

INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...

INFO:tensorflow:loss = 0.32061976, step = 3960

INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...

INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.

INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...

INFO:tensorflow:Loss for final step: 0.2767122.

ID:(13863, 0)

Evaluate the DNN model for popular-vote victory

Description

Evaluate the popular-vote model on the evaluation data built by eval_input_fn:

# evaluate the DNN model of victory by votes
eval_result_v = DNN_model_v.evaluate(input_fn=lambda:eval_input_fn(X_eval_v, y_eval_v,batch_size))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:31

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Inference Time : 0.37481s

INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:32

INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.46153846, accuracy_baseline = 0.61538464, auc = 0.9, auc_precision_recall = 0.9057735, average_loss = 0.81627464, global_step = 4000, label/mean = 0.3846154, loss = 0.74616987, precision = 0.41666666, prediction/mean = 0.7725815, recall = 1.0

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

ID:(13864, 0)

Evaluate the DNN model for Electoral College victory

Description

Evaluate the Electoral College model on the evaluation data built by eval_input_fn:

# evaluate the DNN model of victory by the Electoral College
eval_result_e = DNN_model_e.evaluate(input_fn=lambda:eval_input_fn(X_eval_e, y_eval_e,batch_size))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:53

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Inference Time : 0.37231s

INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:54

INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.61538464, accuracy_baseline = 0.53846157, auc = 0.78571427, auc_precision_recall = 0.77785635, average_loss = 0.7589465, global_step = 4000, label/mean = 0.46153846, loss = 0.80362004, precision = 0.54545456, prediction/mean = 0.7483164, recall = 1.0

INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
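The metrics in the two evaluation logs can be turned back into confusion matrices. Assuming 13 evaluation rows (label/mean = 0.3846154 ≈ 5/13 and 0.46153846 ≈ 6/13), the counts follow from label/mean, precision and recall; accuracy_baseline is the accuracy of always predicting the more frequent class. A sketch, where confusion_from_metrics is a hypothetical helper and the inputs are copied from the logs above:

```python
def confusion_from_metrics(n, label_mean, precision, recall):
    """Recover (TP, FP, TN, FN) from evaluate() metrics; requires at least one true positive."""
    positives = round(label_mean * n)        # rows whose true label is 1
    tp = round(recall * positives)
    if tp == 0:
        raise ValueError('cannot recover the counts when TP = 0')
    predicted_pos = round(tp / precision)    # rows predicted as 1
    fp = predicted_pos - tp
    fn = positives - tp
    tn = n - tp - fp - fn
    return tp, fp, tn, fn


logged = {'vote': (0.3846154, 0.41666666, 1.0),
          'college': (0.46153846, 0.54545456, 1.0)}
for name, (mean, prec, rec) in logged.items():
    tp, fp, tn, fn = confusion_from_metrics(13, mean, prec, rec)
    accuracy = (tp + tn) / 13
    baseline = max(tp + fn, fp + tn) / 13    # always predict the majority class
    print(name, (tp, fp, tn, fn), round(accuracy, 4), round(baseline, 4))
# vote (5, 7, 1, 0) 0.4615 0.6154
# college (6, 5, 2, 0) 0.6154 0.5385
```

Under that assumption, the popular-vote model predicts an incumbent victory for 12 of the 13 rows, so its accuracy (6/13 ≈ 0.46) is below the majority-class baseline (8/13 ≈ 0.62), while the Electoral College model reaches 8/13 ≈ 0.62 against a baseline of 7/13 ≈ 0.54. Recall = 1.0 only means that no actual incumbent victory was missed. The high auc of the popular-vote model (0.9) next to its low accuracy suggests that the default 0.5 threshold, rather than the ranking of the rows, is the weak point.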

ID:(13865, 0)

Forecast with the DNN model for popular-vote victory

Description

Forecast the outcome with the popular-vote model on the evaluation data built by eval_input_fn, without labels:

# forecast with the DNN model of victory by voting
predictions_v = DNN_model_v.predict(
    input_fn=lambda:eval_input_fn(X_eval_v,labels=None,
    batch_size=batch_size))

results_v = list(predictions_v)

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

ID:(13866, 0)

Forecast with the DNN model for Electoral College victory

Description

Forecast the outcome with the Electoral College model on the evaluation data built by eval_input_fn, without labels:

# forecast with the DNN model of victory by college
predictions_e = DNN_model_e.predict(
    input_fn=lambda:eval_input_fn(X_eval_e,labels=None,
    batch_size=batch_size))

results_e = list(predictions_e)
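Each element of results_v and results_e is a dict of arrays; this document reads its 'class_ids' and 'probabilities' entries, where probabilities[1] is the probability of class 1 (incumbent victory). A small sketch of turning such dicts into probabilities and into decisions at an adjustable threshold; the two mock dicts are hypothetical stand-ins for real predictions, and the helper names are illustrative. With the default threshold of 0.5 the decisions should roughly agree with class_ids:

```python
def incumbent_probabilities(results):
    """Probability of class 1 (incumbent victory) for each prediction dict."""
    return [float(pred['probabilities'][1]) for pred in results]


def classify(probs, threshold=0.5):
    """1 = incumbent, 0 = challenger; a higher threshold demands more confidence for 1."""
    return [1 if p > threshold else 0 for p in probs]


# Hypothetical stand-ins for list(DNN_model_v.predict(...)); real values are NumPy arrays.
mock_results = [{'class_ids': [1], 'probabilities': [0.3, 0.7]},
                {'class_ids': [0], 'probabilities': [0.8, 0.2]}]
mock_probs = incumbent_probabilities(mock_results)
print(mock_probs, classify(mock_probs), classify(mock_probs, threshold=0.75))
# [0.7, 0.2] [1, 0] [0, 0]
```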

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

ID:(13868, 0)

Histogram of victory probabilities

Description

Plot a histogram of the predicted probability of an incumbent victory (class 1) on the evaluation data, for both the popular-vote and the Electoral College models:

# show histogram of probabilities
import numpy
from matplotlib import pyplot

bins = numpy.linspace(0, 1, 11)  # 11 edges -> 10 bins of width 0.1
prob_v = [pred['probabilities'][1] for pred in results_v]
prob_e = [pred['probabilities'][1] for pred in results_e]

pyplot.hist([prob_v,prob_e], bins, label=['vote','college'])
pyplot.title('predicted probabilities')
pyplot.xlabel('probability')
pyplot.ylabel('frequency')
pyplot.legend(loc='upper right')
pyplot.show()
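pyplot.hist bins the data the way numpy.histogram does: numpy.linspace(0, 1, k) returns k edges and therefore k - 1 bins, each bin includes its left edge, and only the last bin also includes its right edge. A pure-Python sketch of that convention (the helper names are illustrative, and this linspace only covers num ≥ 2):

```python
def linspace(start, stop, num):
    """num evenly spaced points from start to stop inclusive (num >= 2), like numpy.linspace."""
    step = (stop - start) / (num - 1)
    points = [start + i * step for i in range(num)]
    points[-1] = stop  # avoid floating-point drift at the end
    return points


def histogram_counts(values, edges):
    """Counts per bin [edges[i], edges[i+1]); the last bin also includes edges[-1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            in_bin = edges[i] <= v < edges[i + 1]
            on_last_edge = i == len(edges) - 2 and v == edges[-1]
            if in_bin or on_last_edge:
                counts[i] += 1
                break
    return counts


print(len(linspace(0, 1, 10)) - 1, len(linspace(0, 1, 11)) - 1)  # 9 10
print(histogram_counts([0.05, 0.95, 1.0], linspace(0, 1, 11)))  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 2]
```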

ID:(13867, 0)

ROC curves

Description

To use these probabilities as a forecast, a threshold must be chosen: above it we predict that the incumbent party wins (class 1, the positive class), and below it that the challenger wins. To choose it, compare the rate at which an incumbent victory is predicted when it actually happens (true positives) with the rate at which it is predicted when it does not happen (false positives).

The true-positive rate TPR, or sensitivity, is defined as

$TPR=\displaystyle\frac{TP}{TP+FN}$

where TP (true positives) are the cases correctly predicted as positive and FN (false negatives) are the cases predicted as negative that are actually positive.

The false-positive rate FPR, equal to one minus the specificity, is defined as

$FPR=\displaystyle\frac{FP}{FP+TN}$

where FP (false positives) are the cases predicted as positive that are actually negative and TN (true negatives) are the cases predicted as negative that are actually negative.

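from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt

# prob_v and prob_e are the class-1 (incumbent) probabilities computed for the histogram
fpr_v, tpr_v, _ = roc_curve(y_eval_v, prob_v)
fpr_e, tpr_e, _ = roc_curve(y_eval_e, prob_e)
plt.plot(fpr_v, tpr_v, label='vote')
plt.plot(fpr_e, tpr_e, label='college')
plt.plot([0, 1], [0, 1], linestyle='--', label='chance')  # random-guess reference
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.legend(loc='lower right')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.show()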

The curve obtained by plotting the true-positive rate against the false-positive rate over all thresholds is called the ROC (Receiver Operating Characteristic) curve.
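roc_curve sweeps the threshold over the predicted probabilities and, for each value, predicts a positive whenever the probability is at least the threshold. The sketch below does the same in pure Python and integrates the curve with the trapezoidal rule to obtain the AUC (an auc value also appears in the evaluation logs, although TensorFlow computes it approximately). Unlike scikit-learn's default, it keeps every intermediate point; roc_points and trapezoid_auc are illustrative names and the four-example data are hypothetical:

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists, predicting positive when score >= threshold.

    Thresholds are the distinct scores in decreasing order, preceded by +inf (no positives).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    fpr, tpr = [0.0], [0.0]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:]))


example_fpr, example_tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_fpr)   # [0.0, 0.0, 0.5, 0.5, 1.0]
print(example_tpr)   # [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_auc(example_fpr, example_tpr))  # 0.75
```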

ID:(13869, 0)

Load and inspect the data to forecast

Description

Load the scenarios from election-secnarios.csv into election_scenarios and show the first 6 rows with head:

# load and show 6 scenarios
election_scenarios = pd.read_csv('election-secnarios.csv')
election_scenarios.head(6)

ID:(13870, 0)

Forecast

Description

Forecast the outcome of the scenarios in election_scenarios with both models:

predictions_v = DNN_model_v.predict(
    input_fn=lambda:eval_input_fn(election_scenarios,labels=None,
    batch_size=batch_size))

predictions_e = DNN_model_e.predict(
    input_fn=lambda:eval_input_fn(election_scenarios,labels=None,
    batch_size=batch_size))

results_v = list(predictions_v)
results_e = list(predictions_e)

def describe_prediction(res, j):
    # class_ids is a one-element array: 0 = challenger, 1 = incumbent
    class_id = int(res[j]['class_ids'][0])
    probability = int(res[j]['probabilities'][class_id] * 100)

    if class_id == 0:
        return '%s%% probability to %s' % (probability, 'Challenger')
    else:
        return '%s%% probability to %s' % (probability, 'Incumbent')

print('Predictions for the scenarios:')

# first value: popular-vote model, second value: Electoral College model
for i in range(6):
    print(describe_prediction(results_v, i) + ',' + describe_prediction(results_e, i))

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

INFO:tensorflow:Calling model_fn.

INFO:tensorflow:Done calling model_fn.

INFO:tensorflow:Graph was finalized.

INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000

INFO:tensorflow:Running local_init_op.

INFO:tensorflow:Done running local_init_op.

Predictions for the scenarios:

68% probability to Incumbent,60% probability to Incumbent

80% probability to Incumbent,72% probability to Incumbent

86% probability to Incumbent,76% probability to Incumbent

90% probability to Incumbent,76% probability to Incumbent

89% probability to Incumbent,76% probability to Incumbent

93% probability to Incumbent,75% probability to Incumbent
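Lichtman's own decision rule gives a simple baseline to set beside these DNN forecasts: the incumbent party is predicted to lose when six or more of the 13 keys are false. A sketch, assuming (this is not stated in the source) that each column is 1 when the corresponding key is true, i.e. favourable to the incumbent party, and 0 when it is false; applied row by row to the scenarios DataFrame (for example with iterrows()), it yields one prediction per scenario. lichtman_rule and the example scenario are hypothetical:

```python
NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency', 'third-party',
                   'short-term-economy', 'long-term-economy', 'policy-change',
                   'social-unrest', 'scandal', 'foreign-military-failure',
                   'foreign-military-success', 'incumbent-charisma', 'challenger-charisma']


def lichtman_rule(row, columns=NUMERIC_COLUMNS):
    """'Challenger' if 6 or more keys are false (0), else 'Incumbent'.

    Assumption: 1 = key true (favours the incumbent party), 0 = key false.
    """
    false_keys = sum(1 for col in columns if int(row[col]) == 0)
    return 'Challenger' if false_keys >= 6 else 'Incumbent'


# Hypothetical scenario with 5 false keys, then the same with a 6th key turned false.
row = {col: 1 for col in NUMERIC_COLUMNS}
for col in NUMERIC_COLUMNS[:5]:
    row[col] = 0
print(lichtman_rule(row))   # Incumbent
row['scandal'] = 0
print(lichtman_rule(row))   # Challenger
```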

ID:(13871, 0)