Model for US elections (DNN)
Storyboard
Book:
Allan J. Lichtman - Predicting the Next President-Rowman & Littlefield Publishers (2020)
Program
ID:(1789, 0)
Load the Lichtman model data
Description
Load the data from the file election.csv:
# import pandas and numpy
import pandas as pd
import numpy as np

# load data
election = pd.read_csv('election.csv')
ID:(13856, 0)
Study the structure of the data
Description
Show all columns of the data loaded into election with head:
# show 5 records
election.head(5)
ID:(13857, 0)
Build the data for training and evaluation
Description
Build the data for training (X_train, y_train) and evaluation (X_eval, y_eval), once for the popular vote (suffix _v) and once for the Electoral College (suffix _e):
# import train_test_split
from sklearn.model_selection import train_test_split

# separate the two targets from the features; pop removes each column from election
y_v = election.pop('vote-victory')
y_e = election.pop('electoral-victory')
X = election

# build train and evaluation data for the popular vote and Electoral College results
X_train_v, X_eval_v, y_train_v, y_eval_v = train_test_split(X, y_v, test_size=0.33, random_state=42)
X_train_e, X_eval_e, y_train_e, y_eval_e = train_test_split(X, y_e, test_size=0.33, random_state=42)
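As a rough sketch of what the split does, the pure-Python function below shuffles the row indices with a fixed seed and holds out a fraction for evaluation. The name split_indices and the 39-row example are hypothetical. scikit-learn's train_test_split uses its own random generator, so the concrete rows selected will differ, but its evaluation size is likewise rounded up from test_size times the number of rows.

```python
import math
import random

def split_indices(n_rows, test_size=0.33, seed=42):
    """Return (train_idx, eval_idx) for a shuffled hold-out split."""
    n_eval = math.ceil(test_size * n_rows)   # evaluation size, rounded up
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)     # reproducible shuffle
    return indices[n_eval:], indices[:n_eval]

# hypothetical data set with 39 rows
train_idx, eval_idx = split_indices(39)
print(len(train_idx), len(eval_idx))  # 26 13
```

Since both calls above use the same features and the same random_state, the _v and _e splits should contain the same rows; only the target column differs.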
ID:(13858, 0)
Create the list of columns to define the model
Description
To define the model, build the list feature_columns with the columns to be used:
import tensorflow as tf

NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency',
                   'third-party', 'short-term-economy', 'long-term-economy',
                   'policy-change', 'social-unrest', 'scandal',
                   'foreign-military-failure', 'foreign-military-success',
                   'incumbent-charisma', 'challenger-charisma']

# one numeric feature column per key
feature_columns = []
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.int32))
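Because the model only reads the columns named in NUMERIC_COLUMNS, it can help to check that a record supplies all of them before training or forecasting. The helper below is a hypothetical sketch (missing_features is not part of TensorFlow), and the example record with 0/1 values is an assumption about how the keys are encoded in election.csv.

```python
NUMERIC_COLUMNS = ['party-mandate', 'nomination-contest', 'incumbency',
                   'third-party', 'short-term-economy', 'long-term-economy',
                   'policy-change', 'social-unrest', 'scandal',
                   'foreign-military-failure', 'foreign-military-success',
                   'incumbent-charisma', 'challenger-charisma']

def missing_features(record, required=NUMERIC_COLUMNS):
    """Return the required feature names that the record does not contain."""
    return [name for name in required if name not in record]

# hypothetical record: all keys set to 0, with one column left out on purpose
record = {name: 0 for name in NUMERIC_COLUMNS}
del record['scandal']
print(missing_features(record))  # ['scandal']
```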
ID:(13859, 0)
Define the DNN models
Description
With the columns in feature_columns, tf.estimator.DNNClassifier is used to define the models DNN_model_v for the popular vote and DNN_model_e for the Electoral College:
# define the DNN model for the popular vote
DNN_model_v = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32])

# define the DNN model for the Electoral College
DNN_model_e = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 32])
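To get a feeling for the size of these networks, the sketch below counts the weights and biases of a fully connected network with 13 inputs and hidden layers of 64 and 32 units. It assumes a single output unit (one logit for the two-class case), which is an assumption about the classifier's internal layout and not something stated above; count_parameters is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix plus bias vector
    return total

# 13 input keys, hidden layers of 64 and 32 units, 1 assumed output unit
layers = [13, 64, 32, 1]
print(count_parameters(layers))  # 3009
```

With only a few dozen elections available, such a network has far more parameters than training examples, so the evaluation results are best read with caution.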
ID:(13860, 0)
Load the data for training and evaluation
Description
To run the training, the training and evaluation data must be loaded into tensors through the input functions train_input_fn and eval_input_fn; the training data is also shuffled:
# input function for training
def train_input_fn(features, labels, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # shuffle, repeat indefinitely and group into batches
    dataset = dataset.shuffle(10).repeat().batch(batch_size)
    return dataset

# input function for evaluation or prediction
def eval_input_fn(features, labels, batch_size):
    features = dict(features)
    if labels is None:
        # prediction: no labels available
        inputs = features
    else:
        inputs = (features, labels)
    dataset = tf.data.Dataset.from_tensor_slices(inputs)
    assert batch_size is not None, 'batch_size must not be None'
    dataset = dataset.batch(batch_size)
    return dataset
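The effect of repeat() and batch() in train_input_fn can be illustrated without TensorFlow. The generator below is only a sketch: it cycles endlessly over a list and yields fixed-size batches, and it leaves out the shuffling step. The name repeated_batches and the 26-row example are hypothetical.

```python
import itertools

def repeated_batches(rows, batch_size):
    """Yield batches forever, wrapping around at the end of the data."""
    stream = itertools.cycle(rows)          # endless repetition of the rows
    while True:
        yield list(itertools.islice(stream, batch_size))

# hypothetical training set with 26 rows, identified by their index
rows = list(range(26))
gen = repeated_batches(rows, batch_size=10)
first_three = [next(gen) for _ in range(3)]
print(first_three[2])  # [20, 21, 22, 23, 24, 25, 0, 1, 2, 3]
```

Because the stream never ends, training has to be bounded by the steps argument. The evaluation input function omits repeat(), so each record is seen once.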
ID:(13861, 0)
Train the DNN model for victory by popular vote
Description
With the input function train_input_fn, the model DNN_model_v is trained in 100 calls of 40 steps each:
# train the DNN model v
batch_size = 10
train_steps = 40
for i in range(0, 100):
    DNN_model_v.train(
        input_fn=lambda: train_input_fn(X_train_v, y_train_v, batch_size),
        steps=train_steps)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.7023675, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...
INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...
INFO:tensorflow:Loss for final step: 0.5706555.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
...
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-3960
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...
INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...
INFO:tensorflow:loss = 0.19702096, step = 3960
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...
INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...
INFO:tensorflow:Loss for final step: 0.25778225.
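The log can be read as follows: every call to train resumes from the last checkpoint and advances the global step by train_steps. The loop below reproduces that bookkeeping in plain Python; it only illustrates the arithmetic, not the training itself.

```python
train_steps = 40   # steps per call to train
n_calls = 100      # number of calls in the loop
batch_size = 10    # examples per step

global_step = 0
ranges = []
for _ in range(n_calls):
    start = global_step
    global_step += train_steps          # each call adds train_steps
    ranges.append((start, global_step))

print(ranges[0])                  # (0, 40)
print(ranges[-1])                 # (3960, 4000)
print(global_step * batch_size)   # 40000 examples processed in total
```

The last pair matches the checkpoints 3960 and 4000 that appear at the end of the log.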
ID:(13862, 0)
Train the DNN model for victory by Electoral College
Description
With the input function train_input_fn, the model DNN_model_e is trained in 100 calls of 40 steps each:
# train the DNN model e
batch_size = 10
train_steps = 40
for i in range(0, 100):
    DNN_model_e.train(
        input_fn=lambda: train_input_fn(X_train_e, y_train_e, batch_size),
        steps=train_steps)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 0.66710687, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 40...
INFO:tensorflow:Saving checkpoints for 40 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 40...
INFO:tensorflow:Loss for final step: 0.74084073.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
...
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-3960
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 3960...
INFO:tensorflow:Saving checkpoints for 3960 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 3960...
INFO:tensorflow:loss = 0.32061976, step = 3960
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4000...
INFO:tensorflow:Saving checkpoints for 4000 into C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4000...
INFO:tensorflow:Loss for final step: 0.2767122.
ID:(13863, 0)
Evaluate the DNN model of victory by popular vote
Description
Evaluate the popular-vote model with the evaluation data supplied by eval_input_fn:
# evaluate the DNN model of victory by votes
eval_result_v = DNN_model_v.evaluate(
    input_fn=lambda: eval_input_fn(X_eval_v, y_eval_v, batch_size))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:31
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.37481s
INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:32
INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.46153846, accuracy_baseline = 0.61538464, auc = 0.9, auc_precision_recall = 0.9057735, average_loss = 0.81627464, global_step = 4000, label/mean = 0.3846154, loss = 0.74616987, precision = 0.41666666, prediction/mean = 0.7725815, recall = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
ID:(13864, 0)
Evaluate the DNN model of victory by Electoral College
Description
Evaluate the Electoral College model with the evaluation data supplied by eval_input_fn:
# evaluate the DNN model of victory by college
eval_result_e = DNN_model_e.evaluate(
    input_fn=lambda: eval_input_fn(X_eval_e, y_eval_e, batch_size))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2021-07-27T13:08:53
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.37231s
INFO:tensorflow:Finished evaluation at 2021-07-27-13:08:54
INFO:tensorflow:Saving dict for global step 4000: accuracy = 0.61538464, accuracy_baseline = 0.53846157, auc = 0.78571427, auc_precision_recall = 0.77785635, average_loss = 0.7589465, global_step = 4000, label/mean = 0.46153846, loss = 0.80362004, precision = 0.54545456, prediction/mean = 0.7483164, recall = 1.0
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4000: C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
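The logged metrics are easier to interpret as counts. The sketch below recovers a confusion matrix from the label mean, recall and precision reported in the two evaluation logs, assuming 13 evaluation records. That number is an inference (it makes the logged ratios whole-number fractions) and is not stated in the source; confusion_from_metrics is a hypothetical helper.

```python
def confusion_from_metrics(n, label_mean, recall, precision):
    """Recover (TP, FP, FN, TN) from summary metrics and the sample size."""
    positives = round(label_mean * n)   # records whose true label is 1
    tp = round(recall * positives)      # positives that were found
    fn = positives - tp
    fp = round(tp / precision) - tp     # predicted positives minus TP
    tn = n - tp - fn - fp
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# values copied from the two evaluation logs; n = 13 is an assumption
vote = confusion_from_metrics(13, 0.3846154, 1.0, 0.41666666)
college = confusion_from_metrics(13, 0.46153846, 1.0, 0.54545456)
print(vote, round(accuracy(*vote), 4))        # (5, 7, 0, 1) 0.4615
print(college, round(accuracy(*college), 4))  # (6, 5, 0, 2) 0.6154
```

Under that assumption both models label almost every evaluation record as positive: no positive is missed (recall 1.0), but only 1 and 2 negatives, respectively, are recognised. The recomputed accuracies agree with the logged values.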
ID:(13865, 0)
Make a forecast with the DNN model of victory by popular vote
Description
Forecast the result of the popular-vote model with the evaluation data supplied by eval_input_fn:
# forecast with the DNN model of victory by voting
predictions_v = DNN_model_v.predict(
    input_fn=lambda: eval_input_fn(X_eval_v, labels=None, batch_size=batch_size))
results_v = list(predictions_v)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
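Each element of results_v is a dictionary; the later steps read its 'probabilities' and 'class_ids' entries. For a two-class model these can be thought of as derived from a single logit through the logistic function. The sketch below shows that relationship in plain Python. The function name and the example logit are hypothetical, and the real dictionaries hold NumPy arrays rather than lists.

```python
import math

def binary_prediction(logit):
    """Turn one logit into two class probabilities and a predicted class."""
    p1 = 1.0 / (1.0 + math.exp(-logit))   # logistic function: P(class 1)
    probabilities = [1.0 - p1, p1]        # [P(class 0), P(class 1)]
    class_id = 1 if p1 > 0.5 else 0       # most probable class
    return {'probabilities': probabilities, 'class_ids': [class_id]}

# hypothetical logit
pred = binary_prediction(1.2)
print(pred['class_ids'][0], round(pred['probabilities'][1], 3))
```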
ID:(13866, 0)
Make a forecast with the DNN model of victory by Electoral College
Description
Forecast the result of the Electoral College model with the evaluation data supplied by eval_input_fn:
# forecast with the DNN model of victory by college
predictions_e = DNN_model_e.predict(
    input_fn=lambda: eval_input_fn(X_eval_e, labels=None, batch_size=batch_size))
results_e = list(predictions_e)
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
ID:(13868, 0)
Histogram of the victory probabilities
Description
Show a histogram of the probabilities of class 1 predicted by both models on the evaluation data:
# show histogram of probabilities
import numpy
from matplotlib import pyplot

bins = numpy.linspace(0, 1, 10)
# probability of class 1 for each evaluation record
prob_v = [pred['probabilities'][1] for pred in results_v]
prob_e = [pred['probabilities'][1] for pred in results_e]

pyplot.hist([prob_v, prob_e], bins, label=['vote', 'college'])
pyplot.title('predicted probabilities')
pyplot.xlabel('probability')
pyplot.ylabel('frequency')
pyplot.legend(loc='upper right')
pyplot.show()
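One detail of the histogram call: numpy.linspace(0, 1, 10) returns 10 edges, which define 9 bins of width 1/9 rather than 10 bins of width 0.1. The plain-Python sketch below counts values per bin to make that visible; bin_counts and the sample probabilities are hypothetical.

```python
def bin_counts(values, edges):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

# the same 10 edges that linspace(0, 1, 10) produces
edges = [i / 9 for i in range(10)]
sample = [0.05, 0.5, 0.72, 0.75, 0.93, 1.0]   # hypothetical probabilities
print(len(edges) - 1, bin_counts(sample, edges))
```

If ten bins of width 0.1 are intended, 11 edges would be needed instead.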
ID:(13867, 0)
ROC curves
Description
To use this forecast, a threshold on the predicted probability must be chosen: above it a victory is predicted, below it a defeat. To choose it, compare the rate at which a victory is predicted and actually observed (true positive) with the rate at which a victory is predicted but does not occur (false positive).
The true positive rate is
$TPR=\displaystyle\frac{TP}{TP+FN}$
and the false positive rate is
$FPR=\displaystyle\frac{FP}{FP+TN}$
where $TP$, $FN$, $FP$ and $TN$ are the numbers of true positives, false negatives, false positives and true negatives.
from sklearn.metrics import roc_curve
from matplotlib import pyplot as plt

# each model gets its own ROC curve from its predicted probabilities of class 1
fpr_v, tpr_v, _ = roc_curve(y_eval_v, prob_v)
fpr_e, tpr_e, _ = roc_curve(y_eval_e, prob_e)

plt.plot(fpr_v, tpr_v, label='vote')
plt.plot(fpr_e, tpr_e, label='college')
plt.title('ROC curve')
plt.xlabel('false positive rate')
plt.ylabel('true positive rate')
plt.legend(loc='lower right')
plt.xlim(0,)
plt.ylim(0,)
plt.show()
Plotting the true positive rate against the false positive rate gives a ROC (Receiver Operating Characteristic) diagram.
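For intuition, the ROC points can also be computed by hand: take each distinct score as a threshold and evaluate TPR and FPR as defined above. The sketch below does this in plain Python with made-up labels and scores. It is not scikit-learn's implementation, and roc_points and auc are hypothetical helpers.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs for every distinct score used as threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # everything scored at or above the threshold is predicted positive
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# made-up example: 2 positives, 2 negatives
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3]
points = roc_points(labels, scores)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

The area of 0.75 equals the share of positive/negative pairs in which the positive record received the higher score (3 of 4 here).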
ID:(13869, 0)
Load and study the data to forecast
Description
Load the scenarios into election_scenarios and show 6 rows with head:
# load and show 6 scenarios
election_scenarios = pd.read_csv('election-secnarios.csv')
election_scenarios.head(6)
ID:(13870, 0)
Forecast
Description
Forecast the results of the scenarios contained in election_scenarios:
predictions_v = DNN_model_v.predict(
    input_fn=lambda: eval_input_fn(election_scenarios, labels=None, batch_size=batch_size))
predictions_e = DNN_model_e.predict(
    input_fn=lambda: eval_input_fn(election_scenarios, labels=None, batch_size=batch_size))
results_v = list(predictions_v)
results_e = list(predictions_e)

# format the predicted class of scenario j and its probability in percent
def describe(res, j):
    class_id = int(res[j]['class_ids'][0])
    probability = int(res[j]['probabilities'][class_id] * 100)
    winner = 'Challenger' if class_id == 0 else 'Incumbent'
    return '%s%% probability to %s' % (probability, winner)

print('Predictions for the scenarios:')
for i in range(0, 6):
    print(describe(results_v, i) + ',' + describe(results_e, i))
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmpuhex661k\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from C:\Users\KLAUSS~1\AppData\Local\Temp\tmp05qx1i4u\model.ckpt-4000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Predictions for the scenarios:
68% probability to Incumbent,60% probability to Incumbent
80% probability to Incumbent,72% probability to Incumbent
86% probability to Incumbent,76% probability to Incumbent
90% probability to Incumbent,76% probability to Incumbent
89% probability to Incumbent,76% probability to Incumbent
93% probability to Incumbent,75% probability to Incumbent
ID:(13871, 0)