
The gain ratio tells us to use gender as the splitting feature rather than birth month.
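As a rough sketch (not code from the course; the helper names and NumPy usage are mine), the gain ratio divides the information gain of a candidate split by its split information, so a many-valued attribute such as birth month pays a penalty that a two-valued attribute such as gender does not:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gain_ratio(labels, groups):
    """Information gain of a candidate split divided by its split information.

    `groups` assigns every row to a branch (e.g. 2 genders vs. 12 birth months);
    many-valued attributes are penalised by a large split information term.
    """
    labels, groups = np.asarray(labels), np.asarray(groups)
    n = len(labels)
    weighted_child_entropy = 0.0
    split_info = 0.0
    for g in np.unique(groups):
        w = (groups == g).sum() / n
        weighted_child_entropy += w * entropy(labels[groups == g])
        split_info -= w * np.log2(w)
    gain = entropy(labels) - weighted_child_entropy
    return gain / split_info if split_info > 0 else 0.0
```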


So we now ensure some information gain at the split and can therefore continue splitting to get down to our homogeneous nodes.

Misclassification Error: Does the split make the model more or less accurate?
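A minimal sketch of that check, under the usual definition that a node's error is one minus its majority-class proportion and a split's error is the size-weighted error of its children (the function names are illustrative, not from the course):

```python
import numpy as np

def node_error(labels):
    """Misclassification error of a node that predicts its majority class."""
    _, counts = np.unique(labels, return_counts=True)
    return 1.0 - counts.max() / counts.sum()

def split_error(labels, goes_left):
    """Size-weighted misclassification error of the two children of a split."""
    labels, goes_left = np.asarray(labels), np.asarray(goes_left)
    w_left = goes_left.mean()
    return w_left * node_error(labels[goes_left]) + \
           (1 - w_left) * node_error(labels[~goes_left])
```

A split only helps by this metric if split_error(labels, goes_left) comes out lower than node_error(labels).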

The three splitting criteria mentioned above are the basis for building a tree model.


So this is starting off with the graph of the classification error function.

There is another possible way of splitting the data, by major. Comparing the two: splitting on the variable gender reduces the SSE from 522.9 to 168, while splitting on the variable major reduces it from 522.9 only to 486.8.
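A minimal sketch of that comparison (my own helper names; the gender and major arrays in the comment are hypothetical columns, and the 522.9/168/486.8 figures come from the course's example data, which is not reproduced here):

```python
import numpy as np

def sse(y):
    """Sum of squared errors of y around its own mean."""
    y = np.asarray(y, dtype=float)
    return float(((y - y.mean()) ** 2).sum())

def sse_after_split(y, goes_left):
    """Total SSE of the two groups produced by a boolean split."""
    y, goes_left = np.asarray(y, dtype=float), np.asarray(goes_left)
    return sse(y[goes_left]) + sse(y[~goes_left])

# Pick the split with the larger reduction, e.g. compare
#   sse(y) - sse_after_split(y, gender == "female")
# against
#   sse(y) - sse_after_split(y, major == "engineering")
```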


In equation (11.1), \(\bar{y}_{1}\) and \(\bar{y}_{2}\) are the averages of the samples in \(S_{1}\) and \(S_{2}\).
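A minimal reconstruction, assuming equation (11.1) is the usual two-group SSE criterion for a regression-tree split:

\[
SSE = \sum_{i \in S_{1}} (y_{i} - \bar{y}_{1})^{2} + \sum_{i \in S_{2}} (y_{i} - \bar{y}_{2})^{2}
\]

The split that leaves the smallest total SSE (equivalently, the largest SSE reduction) is preferred.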


You will learn how to train predictive models to classify categorical outcomes and how to use error metrics to compare across different models. This background will be useful when you are presented with decision tree ensembles in the next module.


And the right node is a larger subset with a somewhat better split, so a lower classification error.

The equation is as simple as one minus the sum of the squared class proportions.

ANOVA: Used in regression trees.
- Use oversampling and undersampling as techniques to handle unbalanced classes in a data set
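As a rough sketch of those two resampling techniques for a binary-class data set (plain NumPy; the function names are illustrative and assume minority_label really is the rarer class):

```python
import numpy as np

def oversample_minority(X, y, minority_label, seed=None):
    """Randomly duplicate minority-class rows until the classes are balanced."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

def undersample_majority(X, y, minority_label, seed=None):
    """Randomly drop majority-class rows down to the minority-class count."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    keep = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([keep, minority])
    return X[idx], y[idx]
```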


Like the regression tree, the goal of the classification tree is to divide the data into smaller, more homogeneous groups.

If a surrogate split can't beat the "go with the majority" rule, it is not considered as a surrogate.
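A minimal sketch of that rule (my own naming, paraphrasing the CART-style surrogate test): a candidate surrogate is kept only if it predicts the primary split's direction better than simply sending every case to the primary split's majority side:

```python
import numpy as np

def keep_as_surrogate(primary_goes_left, candidate_goes_left):
    """True if the candidate agrees with the primary split more often than
    the 'go with the majority' baseline does."""
    primary = np.asarray(primary_goes_left, dtype=bool)
    candidate = np.asarray(candidate_goes_left, dtype=bool)
    agreement = (primary == candidate).mean()
    majority_baseline = max(primary.mean(), 1 - primary.mean())
    return agreement > majority_baseline
```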

Here an entropy of 1 indicates that the purity of the node is at its lowest, that is, each class takes up half of the samples.

Based on the SSE reduction, you should use gender to split the data.

For a two-class problem, the Gini impurity for a given node is \(1 - p_{1}^{2} - p_{2}^{2}\). It is easy to see that when the sample set is pure, one of the probabilities is 0 and the Gini score is at its smallest.

The entropy of the split using the variable gender can be calculated in three steps: compute the entropy of the first child node, compute the entropy of the second child node, and take the size-weighted average of the two. So entropy decreases from 1 to 0.39 after the split, and the IG for gender is 1 − 0.39 = 0.61.

Similarly, the entropy of a splitting node is the weighted average of the entropy of each child.
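Putting the last few paragraphs together, here is a small sketch (illustrative names and made-up counts, not the course's gender example) of the two-class Gini score and the three-step information-gain calculation:

```python
import math

def gini_two_class(p):
    """Gini impurity of a node whose first class has proportion p (two classes)."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)   # equivalently 2 * p * (1 - p)

def entropy_from_counts(counts):
    """Entropy (base 2) of a node given its class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    n = sum(parent_counts)
    # steps 1 and 2: entropy of each child, then the size-weighted average of those entropies
    split_entropy = sum(sum(c) / n * entropy_from_counts(c) for c in child_counts_list)
    # step 3: subtract from the parent entropy to get the information gain
    return entropy_from_counts(parent_counts) - split_entropy

print(gini_two_class(0.0))                            # 0.0 -> pure node, lowest Gini score
print(gini_two_class(0.5))                            # 0.5 -> 50/50 node, highest two-class Gini score
print(information_gain([10, 10], [[9, 1], [1, 9]]))   # about 0.53
```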


Looking at the samples in the following three nodes, which one is the easiest to describe?

So, as your subset increases in purity by 5%, your error goes down linearly by 5%. The previous two metrics are for classification trees.


I would like to give special thanks to the instructor (the one in the videos) for his great job.

The Gini impurity is widely used in classification trees.
