Workshop 5 Solved
Uploaded by giovanny-franco
Workshop 5
The following table consists of training data from an employee database. The data have been generalized. For example, "31-35" for age represents the age range 31 to 35. For a given row entry, count represents the number of data tuples having the values for department, status, age and salary given in that row.
department  status  age    salary   count
sales       senior  31-35  46K-50K  30
sales       junior  26-30  26K-30K  40
sales       junior  31-35  31K-35K  40
systems     junior  21-25  46K-50K  20
systems     senior  31-35  66K-70K  5
systems     junior  26-30  46K-50K  3
systems     senior  41-45  66K-70K  3
marketing   senior  36-40  46K-50K  10
marketing   junior  31-35  41K-45K  4
secretary   senior  46-50  36K-40K  4
secretary   junior  26-30  26K-30K  6
Let status be the class label attribute.
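Because each row stands for several employees, a learner has to see the row repeated count times (or weighted by count). The following is a minimal Python sketch of that expansion; the names ROWS and expand are illustrative and do not come from the workshop.

```python
from collections import Counter

# Rows copied from the table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]


def expand(rows):
    """Repeat each generalized row `count` times, one tuple per employee."""
    return [row[:4] for row in rows for _ in range(row[4])]


tuples = expand(ROWS)
# Tally the class label (second field) over the expanded tuples.
class_totals = Counter(status for _, status, _, _ in tuples)
print(len(tuples), dict(class_totals))
```

The totals come to 165 tuples, 52 senior and 113 junior, which are the instance counts that appear in the Weka summaries and confusion matrices in the solution.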
1) Using Weka:
a. Build the tree using Id3, J48 and random forest. Compare the results.
b. Bayes net
c. Multilayer perceptron
d. LibSVM; try 4 different kernel types. Compare the results.
e.
2) Another method for solving Bayesian networks is belief propagation, also known as sum-product message passing. Briefly describe what it consists of.
3) Do exercise 9.1 from the book.
Solution
Id3
=== Classifier model (full training set) ===
Id3
salary = 46K-50k: senior
salary = 26K-30K: junior
salary = 31K-35K: junior
salary = 46K-50K
| department = sales: null
| department = systems: junior
| department = marketing: senior
| department = secretary: null
salary = 66K-70K: senior
salary = 41K-45k: junior
salary = 36K-40K: senior
Time taken to build model: 0.03 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 165 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 senior
1 0 1 1 1 1 junior
Weighted Avg. 1 0 1 1 1 1
=== Confusion Matrix ===
a b <-- classified as
52 0 | a = senior
0 113 | b = junior
J48
=== Classifier model (full training set) ===
J48 pruned tree
------------------
salary = 46K-50k: senior (30.0)
salary = 26K-30K: junior (46.0)
salary = 31K-35K: junior (40.0)
salary = 46K-50K
| department = sales: junior (0.0)
| department = systems: junior (23.0)
| department = marketing: senior (10.0)
| department = secretary: junior (0.0)
salary = 66K-70K: senior (8.0)
salary = 41K-45k: junior (4.0)
salary = 36K-40K: senior (4.0)
Number of Leaves : 10
Size of the tree : 12
Time taken to build model: 0.02 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 165 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 senior
1 0 1 1 1 1 junior
Weighted Avg. 1 0 1 1 1 1
=== Confusion Matrix ===
a b <-- classified as
52 0 | a = senior
0 113 | b = junior
Random forest
=== Classifier model (full training set) ===
Random forest of 10 trees, each constructed while considering 3 random features.
Out of bag error: 0
Time taken to build model: 0.04 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 165 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0029
Root mean squared error 0.0199
Relative absolute error 0.6805 %
Root relative squared error 4.2893 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 senior
1 0 1 1 1 1 junior
Weighted Avg. 1 0 1 1 1 1
=== Confusion Matrix ===
a b <-- classified as
52 0 | a = senior
0 113 | b = junior
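The Id3 and J48 outputs both show the tree built from the tuples above. J48 additionally reports, at each leaf, the number of tuples that reach it, along with the number of leaves (10) and the size of the tree (12), which Id3 does not. Both methods show zero error under cross-validation. Random forest also classifies all 165 instances correctly, but it reports small nonzero error measures (mean absolute error 0.0029, relative absolute error 0.6805 %), so by those measures J48 is marginally more precise than RandomForest.
Note that both trees list salary = 46K-50k and salary = 46K-50K as separate branches. The data file apparently spells the value with a lowercase k in one row, and Weka treats the two spellings as distinct nominal values.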
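Both trees split on salary at the root. As a rough check of why, the sketch below computes the information gain of each attribute directly from the table, which is the criterion ID3 uses to pick a split. It works from the table as printed, so it does not reproduce the separate 46K-50k branch seen in the Weka output; the function and variable names are illustrative.

```python
from collections import defaultdict
from math import log2

# Rows copied from the table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]
ATTRS = {"department": 0, "age": 2, "salary": 3}  # column index of each attribute


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain(rows, col):
    """Entropy of status minus the expected entropy after splitting on `col`."""
    class_counts = defaultdict(int)
    by_value = defaultdict(lambda: defaultdict(int))
    for row in rows:
        status, n = row[1], row[4]
        class_counts[status] += n
        by_value[row[col]][status] += n
    total = sum(class_counts.values())
    remainder = sum(
        sum(part.values()) / total * entropy(part.values())
        for part in by_value.values()
    )
    return entropy(class_counts.values()) - remainder


gains = {name: info_gain(ROWS, col) for name, col in ATTRS.items()}
best = max(gains, key=gains.get)
print(best, gains)
```

On this table salary has the largest gain, followed by age and then department, which is consistent with salary appearing at the root of both trees.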
b. Bayes Net
=== Classifier model (full training set) ===
Bayes Network Classifier
not using ADTree
#attributes=4 #classindex=1
Network structure (nodes followed by parents)
department(4): status
status(2):
age(6): status
salary(7): status
LogScore Bayes: -671.2178322167754
LogScore BDeu: -734.0007744477259
LogScore MDL: -741.4529682985715
LogScore ENTROPY: -667.416758927013
LogScore AIC: -696.416758927013
Time taken to build model: 0.01 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 161 97.5758 %
Incorrectly Classified Instances 4 2.4242 %
Kappa statistic 0.945
Mean absolute error 0.0273
Root mean squared error 0.0912
Relative absolute error 6.3016 %
Root relative squared error 19.6156 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0.035 0.929 1 0.963 1 senior
0.965 0 1 0.965 0.982 1 junior
Weighted Avg. 0.976 0.011 0.977 0.976 0.976 1
=== Confusion Matrix ===
a b <-- classified as
52 0 | a = senior
4 109 | b = junior
For this classifier, Weka reports the network structure and its log scores under several criteria: Bayes, BDeu, MDL, entropy and AIC. The summary also shows how many instances were classified correctly (161) and incorrectly (4); according to the confusion matrix, the 4 errors are juniors classified as senior.
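In the network structure above, status has no parents and every other attribute has status as its only parent, which is the shape of a naive Bayes model. The sketch below classifies one tuple with such a model estimated from the table. It is an illustration, not Weka's implementation: the smoothing constant alpha=0.5 and all function names are assumptions.

```python
from collections import defaultdict

# Rows copied from the table: (department, status, age, salary, count).
ROWS = [
    ("sales", "senior", "31-35", "46K-50K", 30),
    ("sales", "junior", "26-30", "26K-30K", 40),
    ("sales", "junior", "31-35", "31K-35K", 40),
    ("systems", "junior", "21-25", "46K-50K", 20),
    ("systems", "senior", "31-35", "66K-70K", 5),
    ("systems", "junior", "26-30", "46K-50K", 3),
    ("systems", "senior", "41-45", "66K-70K", 3),
    ("marketing", "senior", "36-40", "46K-50K", 10),
    ("marketing", "junior", "31-35", "41K-45K", 4),
    ("secretary", "senior", "46-50", "36K-40K", 4),
    ("secretary", "junior", "26-30", "26K-30K", 6),
]


def train(rows):
    """Collect class counts and per-class attribute-value counts."""
    prior = defaultdict(int)
    cond = defaultdict(int)     # (attribute index, value, status) -> count
    values = defaultdict(set)   # attribute index -> set of observed values
    for dept, status, age, salary, n in rows:
        prior[status] += n
        for i, v in enumerate((dept, age, salary)):
            cond[(i, v, status)] += n
            values[i].add(v)
    return prior, cond, values


def posterior(model, x, alpha=0.5):
    """P(status | x) for x = (department, age, salary); alpha is an assumed smoothing constant."""
    prior, cond, values = model
    total = sum(prior.values())
    scores = {}
    for status, n_c in prior.items():
        p = n_c / total
        for i, v in enumerate(x):
            p *= (cond[(i, v, status)] + alpha) / (n_c + alpha * len(values[i]))
        scores[status] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}


post = posterior(train(ROWS), ("sales", "26-30", "26K-30K"))
print(max(post, key=post.get), post)
```

Smoothing keeps a value never seen with one class (for example age 26-30 among seniors) from forcing that class's probability to exactly zero.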
c. Multilayer perceptron
=== Classifier model (full training set) ===
Sigmoid Node 0
Inputs Weights
Threshold 2.7394370243645954
Node 2 -1.8508751465691542
Node 3 -2.354821458534509
Node 4 -1.4870386211949875
Node 5 -1.3229943341191277
Node 6 1.5278889125307347
Node 7 -2.2131181020972868
Node 8 -0.5914245531409845
Node 9 -2.0587896015776685
Node 10 2.270779949723552
Sigmoid Node 1
Inputs Weights
Threshold -2.7825884536066616
Node 2 1.85191579937515
Node 3 2.323626175638421
Node 4 1.477966841443966
Node 5 1.304880433881045
Node 6 -1.4594876918432313
Node 7 2.232715726449767
Node 8 0.6801335831028799
Node 9 2.051171802934638
Node 10 -2.297941872862322
Sigmoid Node 2
Inputs Weights
Threshold -0.07755042992069021
Attrib department=sales 0.22690884175757925
Attrib department=systems 0.1545650553116968
Attrib department=marketing -0.028916340821306043
Attrib department=secretary -0.2113862149709065
Attrib age=31-35 -0.02750617633512445
Attrib age=26-30 0.7708789960391705
Attrib age=21-25 0.9031969460307472
Attrib age=41-45 -0.24524874175681483
Attrib age=36-40 -0.9372910085639369
Attrib age=46-50 -0.3173972567786108
Attrib salary=46K-50k -1.2721688131980107
Attrib salary=26K-30K 0.596029591410492
Attrib salary=31K-35K 1.073614573206162
Attrib salary=46K-50K 0.10622100546611281
Attrib salary=66K-70K -0.8931249639552377
Attrib salary=41K-45k 0.9840722037777665
Attrib salary=36K-40K -0.32173827499151414
Sigmoid Node 3
Inputs Weights
Threshold -0.09160443154315269
Attrib department=sales 0.19633050850394618
Attrib department=systems 0.14350673033770345
Attrib department=marketing -0.021132001738419306
Attrib department=secretary -0.22057503111112434
Attrib age=31-35 0.01617542844596184
Attrib age=26-30 0.9011003972162315
Attrib age=21-25 1.0622717118790497
Attrib age=41-45 -0.2752012547415663
Attrib age=36-40 -1.084971185421792
Attrib age=46-50 -0.41474256360460926
Attrib salary=46K-50k -1.448381199332202
Attrib salary=26K-30K 0.6690483312961746
Attrib salary=31K-35K 1.2615937981296461
Attrib salary=46K-50K 0.11767253931489512
Attrib salary=66K-70K -1.0619222218399373
Attrib salary=41K-45k 1.1155548070459484
Attrib salary=36K-40K -0.4197992341286937
Sigmoid Node 4
Inputs Weights
Threshold -0.020643481201846912
Attrib department=sales 0.15069976965985996
Attrib department=systems 0.12379838338640137
Attrib department=marketing -0.0943505094026181
Attrib department=secretary -0.14725072714787346
Attrib age=31-35 0.010512235705337582
Attrib age=26-30 0.7498866595807208
Attrib age=21-25 0.7873805984482601
Attrib age=41-45 -0.1464959547224344
Attrib age=36-40 -0.7580525890150757
Attrib age=46-50 -0.2556075853274041
Attrib salary=46K-50k -1.0701392832809675
Attrib salary=26K-30K 0.5473953000497558
Attrib salary=31K-35K 0.992235520740865
Attrib salary=46K-50K 0.13349935484850442
Attrib salary=66K-70K -0.7751334030115394
Attrib salary=41K-45k 0.7376105027424174
Attrib salary=36K-40K -0.31165331988873113
Sigmoid Node 5
Inputs Weights
Threshold -0.10504649400363189
Attrib department=sales 0.1888580478883902
Attrib department=systems 0.15300803282270037
Attrib department=marketing -0.02098097843654761
Attrib department=secretary -0.1280029758315231
Attrib age=31-35 -0.017009595286887676
Attrib age=26-30 0.6482292510400863
Attrib age=21-25 0.6915153763340572
Attrib age=41-45 -0.19586354097092593
Attrib age=36-40 -0.7359220880188584
Attrib age=46-50 -0.29334007131163364
Attrib salary=46K-50k -1.0108613445175572
Attrib salary=26K-30K 0.5262317715574415
Attrib salary=31K-35K 0.9182024924100928
Attrib salary=46K-50K 0.1188196030377099
Attrib salary=66K-70K -0.670937831354511
Attrib salary=41K-45k 0.7010453789252467
Attrib salary=36K-40K -0.22576085497477247
Sigmoid Node 6
Inputs Weights
Threshold 0.09557922327970046
Attrib department=sales -0.018139053459302966
Attrib department=systems -0.10209290440196074
Attrib department=marketing -0.1328719579969402
Attrib department=secretary 0.028572139713019157
Attrib age=31-35 -0.0739998323577446
Attrib age=26-30 -0.3457681026476964
Attrib age=21-25 -0.46939913970448427
Attrib age=41-45 -0.042763003576478054
Attrib age=36-40 0.39852936666748573
Attrib age=46-50 0.06929754238286273
Attrib salary=46K-50k 0.3741253602958736
Attrib salary=26K-30K -0.24297175284159717
Attrib salary=31K-35K -0.4818491178517223
Attrib salary=46K-50K -0.10589748590134765
Attrib salary=66K-70K 0.3434715967606044
Attrib salary=41K-45k -0.5752678422949488
Attrib salary=36K-40K 0.045120149940549435
Sigmoid Node 7
Inputs Weights
Threshold -0.06160115651344392
Attrib department=sales 0.2134113259865331
Attrib department=systems 0.172272506671537
Attrib department=marketing -0.029607224853917643
Attrib department=secretary -0.24106617744959774
Attrib age=31-35 0.029991616105111238
Attrib age=26-30 0.8651532630509001
Attrib age=21-25 0.9906708595270766
Attrib age=41-45 -0.20575332717275158
Attrib age=36-40 -1.047645476787963
Attrib age=46-50 -0.3535834941223248
Attrib salary=46K-50k -1.4332784615759104
Attrib salary=26K-30K 0.6493573607930859
Attrib salary=31K-35K 1.1819371137499333
Attrib salary=46K-50K 0.12596930399229297
Attrib salary=66K-70K -1.064741079546573
Attrib salary=41K-45k 1.1431299292474153
Attrib salary=36K-40K -0.3944593594329362
Sigmoid Node 8
Inputs Weights
Threshold -0.0472094583119319
Attrib department=sales 0.15217654915023798
Attrib department=systems 0.11722287931424957
Attrib department=marketing -0.06807690704999361
Attrib department=secretary -0.024785605611675223
Attrib age=31-35 -0.08736108359059333
Attrib age=26-30 0.4530072183024529
Attrib age=21-25 0.5003986272198906
Attrib age=41-45 -0.03799085042854391
Attrib age=36-40 -0.3920240148745457
Attrib age=46-50 -0.1205586069197979
Attrib salary=46K-50k -0.605742149160759
Attrib salary=26K-30K 0.3802185575469776
Attrib salary=31K-35K 0.6066272566318488
Attrib salary=46K-50K 0.10660422983302109
Attrib salary=66K-70K -0.3320427568391321
Attrib salary=41K-45k 0.42400945288403125
Attrib salary=36K-40K -0.10285393387441728
Sigmoid Node 9
Inputs Weights
Threshold -0.08424420101244516
Attrib department=sales 0.18487395033283963
Attrib department=systems 0.14441985599862148
Attrib department=marketing -0.014746998161577226
Attrib department=secretary -0.17691559953689145
Attrib age=31-35 0.02784723014273439
Attrib age=26-30 0.8489877575101271
Attrib age=21-25 0.9380267979491645
Attrib age=41-45 -0.2608835537805671
Attrib age=36-40 -1.0141091744602124
Attrib age=46-50 -0.39718995571738797
Attrib salary=46K-50k -1.3435954859187489
Attrib salary=26K-30K 0.6385529218748827
Attrib salary=31K-35K 1.186853891625651
Attrib salary=46K-50K 0.14603610059077612
Attrib salary=66K-70K -0.9588501691188384
Attrib salary=41K-45k 1.0099505342882553
Attrib salary=36K-40K -0.362947971136462
Sigmoid Node 10
Inputs Weights
Threshold 0.09908187740254618
Attrib department=sales -0.05342457992841399
Attrib department=systems -0.1192290423182317
Attrib department=marketing -0.05846524907628152
Attrib department=secretary 0.14338709613151754
Attrib age=31-35 -0.09957529167902422
Attrib age=26-30 -0.5969152237408825
Attrib age=21-25 -0.7313360476645168
Attrib age=41-45 0.11828014170484517
Attrib age=36-40 0.7424640467459025
Attrib age=46-50 0.2309416143733404
Attrib salary=46K-50k 0.854775905742494
Attrib salary=26K-30K -0.4117530818438961
Attrib salary=31K-35K -0.7689602300444687
Attrib salary=46K-50K -0.09902009869134437
Attrib salary=66K-70K 0.7183111474183733
Attrib salary=41K-45k -0.8460358832180466
Attrib salary=36K-40K 0.22686248383226118
Class senior
Input
Node 0
Class junior
Input
Node 1
Time taken to build model: 2.09 seconds
This output lists each node of the network together with its inputs and the weight attached to each input.
d. LibSVM, kernel type: Linear
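To make the listing more concrete, the sketch below evaluates one hidden node by hand, using the Threshold and three weights printed for Sigmoid Node 2. It assumes that the threshold is added to the weighted sum as a bias and that each nominal value is fed in as a 0/1 indicator. Weka may rescale its inputs before training, so the resulting number is illustrative rather than the activation Weka itself would compute.

```python
from math import exp


def sigmoid(z):
    """Logistic function used by each sigmoid node."""
    return 1.0 / (1.0 + exp(-z))


def node_output(threshold, weights, inputs):
    """Assumed node rule: sigmoid(threshold + sum of weight * input)."""
    return sigmoid(threshold + sum(w * x for w, x in zip(weights, inputs)))


# Values copied from "Sigmoid Node 2" for the tuple (sales, 26-30, 26K-30K).
# Under the 0/1 assumption only these three inputs are non-zero.
THRESHOLD = -0.07755042992069021
ACTIVE = {
    "department=sales": 0.22690884175757925,
    "age=26-30": 0.7708789960391705,
    "salary=26K-30K": 0.596029591410492,
}

z = THRESHOLD + sum(ACTIVE.values())
activation = node_output(THRESHOLD, list(ACTIVE.values()), [1.0] * len(ACTIVE))
print(z, activation)
```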
=== Classifier model (full training set) ===
LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)
Time taken to build model: 0.14 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 125 75.7576 %
Incorrectly Classified Instances 40 24.2424 %
Kappa statistic 0.3429
Mean absolute error 0.2424
Root mean squared error 0.4924
Relative absolute error 56.0304 %
Root relative squared error 105.9548 %
Coverage of cases (0.95 level) 75.7576 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0,346 0,053 0,750 0,346 0,474 0,386 0,647 0,466 senior
0,947 0,654 0,759 0,947 0,843 0,386 0,647 0,755 junior
Weighted Avg. 0,758 0,465 0,756 0,758 0,726 0,386 0,647 0,664
=== Confusion Matrix ===
a b <-- classified as
18 34 | a = senior
6 107 | b = junior
LibSVM, kernel type: Polynomial
=== Classifier model (full training set) ===
LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)
Time taken to build model: 0.06 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 132 80 %
Incorrectly Classified Instances 33 20 %
Kappa statistic 0.4409
Mean absolute error 0.2
Root mean squared error 0.4472
Relative absolute error 46.2251 %
Root relative squared error 96.2382 %
Coverage of cases (0.95 level) 80 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0,365 0,000 1,000 0,365 0,535 0,532 0,683 0,565 senior
1,000 0,635 0,774 1,000 0,873 0,532 0,683 0,774 junior
Weighted Avg. 0,800 0,435 0,845 0,800 0,766 0,532 0,683 0,708
=== Confusion Matrix ===
a b <-- classified as
19 33 | a = senior
0 113 | b = junior
LibSVM, kernel type: Radial basis function
=== Classifier model (full training set) ===
LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)
Time taken to build model: 0.03 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 165 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
1,000 0,000 1,000 1,000 1,000 1,000 1,000 1,000 senior
1,000 0,000 1,000 1,000 1,000 1,000 1,000 1,000 junior
Weighted Avg. 1,000 0,000 1,000 1,000 1,000 1,000 1,000 1,000
=== Confusion Matrix ===
a b <-- classified as
52 0 | a = senior
0 113 | b = junior
LibSVM, kernel type: Sigmoid
=== Classifier model (full training set) ===
LibSVM wrapper, original code by Yasser EL-Manzalawy (= WLSVM)
Time taken to build model: 0.04 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 143 86.6667 %
Incorrectly Classified Instances 22 13.3333 %
Kappa statistic 0.6513
Mean absolute error 0.1333
Root mean squared error 0.3651
Relative absolute error 30.8167 %
Root relative squared error 78.5782 %
Coverage of cases (0.95 level) 86.6667 %
Mean rel. region size (0.95 level) 50 %
Total Number of Instances 165
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0,577 0,000 1,000 0,577 0,732 0,695 0,788 0,710 senior
1,000 0,423 0,837 1,000 0,911 0,695 0,788 0,837 junior
Weighted Avg. 0,867 0,290 0,888 0,867 0,855 0,695 0,788 0,797
=== Confusion Matrix ===
a b <-- classified as
30 22 | a = senior
0 113 | b = junior
This classifier's output gives the number of correctly and incorrectly classified instances with their percentages, along with the mean absolute error, the root mean squared error, their relative counterparts, and the coverage of cases.
Weka also prints a table of detailed accuracy by class, with TP rate, FP rate, precision, recall, F-measure and MCC, all of which fall between 0 and 1 in these runs.
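The per-class figures and the kappa statistic can be recomputed from a confusion matrix. The sketch below does so for the linear-kernel matrix, taking senior as the positive class; the function name binary_metrics is illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summary statistics for a 2x2 confusion matrix, positive class first."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # same as the TP rate
    fp_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    observed = (tp + tn) / total         # accuracy
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "accuracy": observed,
        "precision": precision,
        "recall": recall,
        "fp_rate": fp_rate,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Linear kernel: 18 seniors correct, 34 seniors called junior,
# 6 juniors called senior, 107 juniors correct.
m = binary_metrics(tp=18, fn=34, fp=6, tn=107)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Rounded, these agree with the linear-kernel listing: accuracy 0.7576, precision 0.750, recall 0.346, FP rate 0.053, F-measure 0.474 and kappa 0.3429.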
With each kernel type the number of correctly and incorrectly classified instances changes, and all the derived figures and percentages change with it: 75.76 % correct with the linear kernel, 80 % with the polynomial kernel, 100 % with the radial basis function kernel and 86.67 % with the sigmoid kernel.
With the radial basis function kernel the percentage of incorrectly classified instances was 0, so the error percentages and the root mean squared error are 0, the TP rate is 1, the FP rate is 0, and precision, recall, F-measure and MCC are all 1.
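For reference, the four kernel types differ only in how they score the similarity of two input vectors. The sketch below writes them in the forms given in the LIBSVM documentation; the parameter values and the two example vectors are hypothetical and are not the settings used in the runs above.

```python
from math import exp, tanh


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    return dot(u, v)


def polynomial(u, v, gamma, coef0, degree):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma):
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * sq_dist)


def sigmoid_kernel(u, v, gamma, coef0):
    return tanh(gamma * dot(u, v) + coef0)


# Two hypothetical 0/1-encoded tuples sharing one attribute value.
u = [1, 0, 1]
v = [1, 1, 0]
print(linear(u, v))
print(polynomial(u, v, gamma=1.0, coef0=1.0, degree=2))
print(rbf(u, v, gamma=0.5))
print(sigmoid_kernel(u, v, gamma=0.5, coef0=0.0))
```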
2.
Belief propagation
It is an algorithm for performing inference on graphical models such as Bayesian networks and Markov random fields. It computes the marginal distribution of each unobserved node, conditional on any observed nodes, by passing messages between neighbouring nodes. It is used in artificial intelligence and information theory, and it has proved useful as an approximate algorithm on general graphs.
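A minimal sketch of sum-product message passing on a three-node chain A - B - C of binary variables follows. The potentials are hypothetical numbers chosen for illustration. On a tree-shaped graph like this one the messages give exact marginals, which the sketch checks against brute-force enumeration.

```python
from itertools import product

# Hypothetical unary and pairwise potentials for the chain A - B - C.
PHI = {"A": [0.6, 0.4], "B": [0.5, 0.5], "C": [0.2, 0.8]}
PSI_AB = [[0.9, 0.1], [0.2, 0.8]]  # PSI_AB[a][b]
PSI_BC = [[0.7, 0.3], [0.4, 0.6]]  # PSI_BC[b][c]


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def sum_product():
    """Marginals of A, B and C obtained from four messages."""
    r = range(2)
    # Messages from the two ends of the chain into B.
    m_a_b = [sum(PHI["A"][a] * PSI_AB[a][b] for a in r) for b in r]
    m_c_b = [sum(PHI["C"][c] * PSI_BC[b][c] for c in r) for b in r]
    # Messages from B outward; each leaves out the message from its recipient.
    m_b_a = [sum(PHI["B"][b] * m_c_b[b] * PSI_AB[a][b] for b in r) for a in r]
    m_b_c = [sum(PHI["B"][b] * m_a_b[b] * PSI_BC[b][c] for b in r) for c in r]
    # A node's belief is its own potential times all incoming messages.
    return {
        "A": normalize([PHI["A"][a] * m_b_a[a] for a in r]),
        "B": normalize([PHI["B"][b] * m_a_b[b] * m_c_b[b] for b in r]),
        "C": normalize([PHI["C"][c] * m_b_c[c] for c in r]),
    }


def brute_force():
    """Marginals obtained by summing the joint over all 8 assignments."""
    marg = {"A": [0.0, 0.0], "B": [0.0, 0.0], "C": [0.0, 0.0]}
    for a, b, c in product(range(2), repeat=3):
        p = PHI["A"][a] * PHI["B"][b] * PHI["C"][c] * PSI_AB[a][b] * PSI_BC[b][c]
        marg["A"][a] += p
        marg["B"][b] += p
        marg["C"][c] += p
    return {name: normalize(v) for name, v in marg.items()}


bp = sum_product()
bf = brute_force()
print(bp)
```

On graphs with cycles the same message updates can be iterated until they settle, in which case the resulting beliefs are approximations rather than exact marginals.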