MMIS692 Quality Control¶
Items are produced in batches. A batch with more than one defective item is deemed a poor quality batch.
We shall develop a model to predict batch quality based on measurements from the 30 quality control tests. Our goal is to predict quality reliably based on as few rules as possible. Any test that is not used for determining batch quality may be eliminated, thereby reducing quality control costs.
For this classification task we shall use Scikit-Learn DecisionTreeClassifier:
Mount Drive¶
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Import libraries¶
! pip install pandasql # to use SQL
import pandasql # for SQL queries
import pandas as pd # for data analysis
from sklearn.tree import DecisionTreeClassifier, plot_tree # for decision tree
from sklearn.model_selection import cross_val_score # for cross validation
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report # to evaluate model
import matplotlib.pyplot as plt # for plots
Requirement already satisfied: pandasql in /usr/local/lib/python3.12/dist-packages (0.7.3) Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from pandasql) (2.0.2) Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (from pandasql) (2.2.2) Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.12/dist-packages (from pandasql) (2.0.44) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas->pandasql) (2.9.0.post0) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas->pandasql) (2025.2) Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas->pandasql) (2025.2) Requirement already satisfied: greenlet>=1 in /usr/local/lib/python3.12/dist-packages (from sqlalchemy->pandasql) (3.2.4) Requirement already satisfied: typing-extensions>=4.6.0 in /usr/local/lib/python3.12/dist-packages (from sqlalchemy->pandasql) (4.15.0) Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas->pandasql) (1.17.0)
Get data¶
We shall extract CSV files from the zip file with data and read the data we need for this task into pandas dataframes:
- defective: with data from 'quality_control.defective.csv'
- measurements: with data from 'quality_control.measurements.csv'
- new_batches: with data from 'quality_control.new_batches.csv'
! unzip '/content/drive/MyDrive/NSU - MS - Data Analytics and AI/MMIS 0692 - Data Analytics and AI Project/Task 2/data.MMIS692.Fall2025.zip'
defective = pd.read_csv('quality_control.defective.csv')
measurements = pd.read_csv('quality_control.measurements.csv')
new_batches = pd.read_csv('quality_control.new_batches.csv')
! rm *.csv
Archive: /content/drive/MyDrive/NSU - MS - Data Analytics and AI/MMIS 0692 - Data Analytics and AI Project/Task 2/data.MMIS692.Fall2025.zip inflating: quality_control.new_batches.csv inflating: quality_control.measurements.csv inflating: quality_control.defective.csv inflating: production_planning.resource.csv inflating: production_planning.product.csv inflating: customer_segmentation.unlabeled.csv inflating: customer_segmentation.valid.csv inflating: customer_segmentation.train.csv
Dataframe defective contains the batchID of batches in which defective items were produced.
print("Dataframe defective contains %d rows and %d columns" %defective.shape)
defective.head(10) # display first 10 rows of defective
Dataframe defective contains 11130 rows and 2 columns
| serial_nbr | batchID | |
|---|---|---|
| 0 | S007B022832 | B022832 |
| 1 | S001B048935 | B048935 |
| 2 | S002B021675 | B021675 |
| 3 | S002B020304 | B020304 |
| 4 | S000B009453 | B009453 |
| 5 | S002B001585 | B001585 |
| 6 | S002B003338 | B003338 |
| 7 | S007B023528 | B023528 |
| 8 | S002B038837 | B038837 |
| 9 | S005B014832 | B014832 |
Dataframe measurements contains measurements on 30 quality control tests for each batch.
print("Dataframe measurements contains %d rows and %d columns" %measurements.shape)
measurements.head() # display first 5 rows of measurements
Dataframe measurements contains 5000 rows and 31 columns
| batchID | test1 | test2 | test3 | test4 | test5 | test6 | test7 | test8 | test9 | ... | test21 | test22 | test23 | test24 | test25 | test26 | test27 | test28 | test29 | test30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B049738 | 396 | 460 | 816 | 125 | 79 | 513 | 576 | 259 | 369 | ... | 19 | 918 | 692 | 958 | 45 | 381 | 758 | 466 | 478 | 777 |
| 1 | B030478 | 792 | 474 | 779 | 109 | 843 | 985 | 946 | 644 | 839 | ... | 948 | 612 | 207 | 360 | 663 | 570 | 398 | 49 | 335 | 919 |
| 2 | B023741 | 715 | 929 | 462 | 718 | 977 | 271 | 479 | 785 | 231 | ... | 168 | 995 | 221 | 456 | 489 | 120 | 530 | 165 | 397 | 203 |
| 3 | B034393 | 92 | 39 | 359 | 625 | 354 | 292 | 672 | 749 | 119 | ... | 277 | 224 | 984 | 233 | 541 | 690 | 480 | 444 | 2 | 387 |
| 4 | B041622 | 93 | 804 | 641 | 580 | 771 | 231 | 163 | 922 | 800 | ... | 483 | 319 | 322 | 249 | 421 | 439 | 993 | 807 | 634 | 205 |
5 rows × 31 columns
Label batches¶
We consider a batch to be of poor quality if it contains more than one defective item; otherwise, the batch quality is considered good.
The dataframe defective contains batchID for batches in which defective items were produced.
We shall identify batches with more than one defective item using both SQL and pandas, and confirm that the two methods find identical sets of poor quality batches.
We shall then label each batch in the dataframe measurements as either a poor quality batch or a good quality batch.
Using SQL¶
SQL queries may be executed using pandasql; each dataframe is a table.
# define a string q with the SQL query to identify poor quality batches
q = """
select batchID
from defective
group by batchID
having count(*) > 1;"""
poor_batches_SQL = set(pandasql.sqldf(q).batchID) # evaluate query; a set gives fast membership tests
print("Number of poor quality batches = %d" %len(poor_batches_SQL))
# assign labels to each batch in the dataframe measurements
labels = ['poor' if b in poor_batches_SQL else 'good'
for b in measurements.batchID] # assign labels for batches
# create dataframe with quality of batches labeled 'poor' or 'good'
quality_SQL = pd.DataFrame(zip(measurements.batchID, labels), columns=['batchID', 'quality'])
quality_SQL.head(10).to_csv('quality_SQL.csv', index=False) # save results for first 10 batches
print("Quality of first 10 batches")
quality_SQL.head(10)
Number of poor quality batches = 2224 Quality of first 10 batches
| batchID | quality | |
|---|---|---|
| 0 | B049738 | poor |
| 1 | B030478 | good |
| 2 | B023741 | good |
| 3 | B034393 | good |
| 4 | B041622 | good |
| 5 | B016158 | poor |
| 6 | B005914 | good |
| 7 | B000418 | poor |
| 8 | B021404 | poor |
| 9 | B024090 | poor |
Using pandas¶
g = defective.groupby('batchID').size() # number of defective items per batch
poor_batches = set(g[g > 1].index) # batches with more than 1 defective item
print("Number of poor quality batches = %d" %len(poor_batches))
labels = ['poor' if b in poor_batches else 'good' for b in measurements.batchID] # labels for batches
# create dataframe with quality of batches labeled 'poor' or 'good'
quality = pd.DataFrame(zip(measurements.batchID, labels), columns=['batchID', 'quality'])
quality.head(10).to_csv('quality.csv', index=False)
print("Quality of first 10 batches")
quality.head(10)
Number of poor quality batches = 2224 Quality of first 10 batches
| batchID | quality | |
|---|---|---|
| 0 | B049738 | poor |
| 1 | B030478 | good |
| 2 | B023741 | good |
| 3 | B034393 | good |
| 4 | B041622 | good |
| 5 | B016158 | poor |
| 6 | B005914 | good |
| 7 | B000418 | poor |
| 8 | B021404 | poor |
| 9 | B024090 | poor |
Confirm that quality results are identical¶
quality.equals(quality_SQL) # element-wise comparison of the two dataframes
True
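Note that `all()` applied directly to a DataFrame is unreliable here: iterating a DataFrame yields its column labels (non-empty strings, hence truthy), so `all(df1 == df2)` is True regardless of the contents. `(df1 == df2).all().all()` or `DataFrame.equals` compares the actual values. A minimal sketch with two deliberately different frames:

```python
import pandas as pd

a = pd.DataFrame({'batchID': ['B1'], 'quality': ['good']})
b = pd.DataFrame({'batchID': ['B1'], 'quality': ['poor']})  # contents differ

# Iterating a DataFrame yields column labels, so this is True even
# though the contents differ:
print(all(a == b))           # True -- misleading!
# Reducing the boolean frame, or DataFrame.equals, compares the values:
print((a == b).all().all())  # False
print(a.equals(b))           # False
```

`DataFrame.equals` is the stricter check: it also requires matching indexes and dtypes, and treats NaNs in the same location as equal.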
Reserve data for validation¶
We shall use the first 4000 samples for training and reserve the last 1000 samples for validation.
X_train, y_train: contain input feature values and output labels for training data.
X_test, y_test: contain input feature values and output labels for validation data.
# all but the first column in dataframe measurements are input features
features = list(measurements)[1:] # column headers for input features
print("Number of input features = %d" %len(features))
print(', '.join(features)) # display features
k = 4000 # use first k samples for training
print("First %d samples used for training" %k)
X_train, y_train = measurements[features][:k], quality.quality[:k] # input, label for training
X_test, y_test = measurements[features][k:], quality.quality[k:] # input, label for validation
Number of input features = 30 test1, test2, test3, test4, test5, test6, test7, test8, test9, test10, test11, test12, test13, test14, test15, test16, test17, test18, test19, test20, test21, test22, test23, test24, test25, test26, test27, test28, test29, test30 First 4000 samples used for training
Cross validation accuracy¶
We shall use Scikit-Learn DecisionTreeClassifier to predict batch quality based on measurements from the 30 tests.
Since our goal is to achieve satisfactory classification using as few rules as possible, we shall investigate how classification accuracy varies as we increase the maximum number of leaf nodes (rules) in the decision tree.
We shall use 5-fold cross-validation with the training samples for this investigation.
Note that increasing the number of rules beyond what is needed leads to overfitting, visible as a decline in cross-validation accuracy.
for k in range(2, 20): # maximum number of leaf-nodes (rules)
model = DecisionTreeClassifier(max_leaf_nodes=k) # tree with k rules
acc = cross_val_score(model, X_train, y_train, cv=5).mean() # mean accuracy in 5-fold cross-validation
print("Cross validation accuracy with %d rules = %4.4f" %(k, acc))
Cross validation accuracy with 2 rules = 0.6165 Cross validation accuracy with 3 rules = 0.7688 Cross validation accuracy with 4 rules = 0.8820 Cross validation accuracy with 5 rules = 0.9615 Cross validation accuracy with 6 rules = 0.9613 Cross validation accuracy with 7 rules = 0.9605 Cross validation accuracy with 8 rules = 0.9603 Cross validation accuracy with 9 rules = 0.9585 Cross validation accuracy with 10 rules = 0.9580 Cross validation accuracy with 11 rules = 0.9587 Cross validation accuracy with 12 rules = 0.9577 Cross validation accuracy with 13 rules = 0.9583 Cross validation accuracy with 14 rules = 0.9580 Cross validation accuracy with 15 rules = 0.9565 Cross validation accuracy with 16 rules = 0.9570 Cross validation accuracy with 17 rules = 0.9557 Cross validation accuracy with 18 rules = 0.9555 Cross validation accuracy with 19 rules = 0.9553
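The choice of the number of rules can also be automated rather than read off the printout, e.g. by taking the smallest leaf count whose mean cross-validation accuracy is within a small tolerance of the best score. A sketch on synthetic data (the tolerance of 0.005 is an illustrative assumption, not part of the original analysis):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

scores = {}
for k in range(2, 20):                        # candidate leaf-node counts
    model = DecisionTreeClassifier(max_leaf_nodes=k, random_state=0)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best = max(scores.values())                   # best mean CV accuracy
tol = 0.005                                   # illustrative tolerance
# smallest k whose score is within tol of the best
k_chosen = min(k for k, s in scores.items() if s >= best - tol)
print("Chosen number of rules =", k_chosen)
```

For the results above, this rule would land on 5 leaf nodes, matching the choice made in the next section.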
Choose a good model¶
Choose the number of rules that results in satisfactory cross-validation accuracy.
k = 5 # chosen number of leaf nodes
model = DecisionTreeClassifier(max_leaf_nodes=k) # create model
model.fit(X_train, y_train) # train model
plt.figure(figsize=(10, 10)) # size of figure to be displayed
CL = model.classes_ # class labels (good / poor)
plot_tree(model, feature_names=features, class_names=CL,
filled=True, rounded=True, impurity=False) # plot tree
plt.savefig("decision_tree.png") # save as PNG file
plt.show() # show tree
Results with training data¶
pred_train = model.predict(X_train) # predicted labels for training samples
acc_train = accuracy_score(y_train, pred_train) # training accuracy
print("Training accuracy = %4.4f" %(acc_train))
# confusion matrix
cm = pd.DataFrame(confusion_matrix(y_train, pred_train), columns=[f'Predicted {c}' for c in CL], index=CL)
cm.to_csv('training_confusion_matrix.csv') # save confusion matrix
cm
Training accuracy = 0.9637
| Predicted good | Predicted poor | |
|---|---|---|
| good | 2157 | 68 |
| poor | 77 | 1698 |
print(classification_report(y_train, pred_train, digits=4))
precision recall f1-score support
good 0.9655 0.9694 0.9675 2225
poor 0.9615 0.9566 0.9591 1775
accuracy 0.9637 4000
macro avg 0.9635 0.9630 0.9633 4000
weighted avg 0.9637 0.9637 0.9637 4000
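The report values follow directly from the confusion matrix; e.g. precision for the 'good' class divides correct 'good' predictions by all 'good' predictions, while recall divides them by all actual 'good' batches. A sketch using the training counts above:

```python
# confusion matrix counts from the training results above
tp_good, fn_good = 2157, 68   # actual good: predicted good / predicted poor
fp_good, tn_good = 77, 1698   # actual poor: predicted good / predicted poor

precision_good = tp_good / (tp_good + fp_good)   # 2157 / 2234
recall_good = tp_good / (tp_good + fn_good)      # 2157 / 2225
f1_good = 2 * precision_good * recall_good / (precision_good + recall_good)
print("precision=%.4f recall=%.4f f1=%.4f"
      % (precision_good, recall_good, f1_good))  # matches the report's 'good' row
```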
Results with validation data¶
pred_test = model.predict(X_test) # predicted labels for validation samples
acc_test = accuracy_score(y_test, pred_test) # validation accuracy
print("Validation accuracy = %4.4f" %(acc_test))
cm = pd.DataFrame(confusion_matrix(y_test, pred_test), columns=[f'Predicted {c}' for c in CL], index=CL)
cm.to_csv('validation_confusion_matrix.csv')
cm
Validation accuracy = 0.9700
| Predicted good | Predicted poor | |
|---|---|---|
| good | 535 | 16 |
| poor | 14 | 435 |
print(classification_report(y_test, pred_test, digits=4))
precision recall f1-score support
good 0.9745 0.9710 0.9727 551
poor 0.9645 0.9688 0.9667 449
accuracy 0.9700 1000
macro avg 0.9695 0.9699 0.9697 1000
weighted avg 0.9700 0.9700 0.9700 1000
Dataframe new_batches contains test measurements for new batches whose quality is to be predicted.
new_batches.head() # display first 5 rows of new_batches
| batchID | test1 | test2 | test3 | test4 | test5 | test6 | test7 | test8 | test9 | ... | test21 | test22 | test23 | test24 | test25 | test26 | test27 | test28 | test29 | test30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B050257 | 933 | 898 | 591 | 719 | 639 | 522 | 321 | 420 | 245 | ... | 246 | 365 | 13 | 53 | 670 | 691 | 594 | 661 | 101 | 284 |
| 1 | B017350 | 346 | 888 | 469 | 821 | 443 | 351 | 839 | 3 | 990 | ... | 597 | 427 | 131 | 949 | 694 | 118 | 350 | 683 | 369 | 415 |
| 2 | B013761 | 479 | 860 | 229 | 357 | 560 | 708 | 153 | 654 | 552 | ... | 871 | 609 | 147 | 804 | 21 | 68 | 316 | 627 | 877 | 461 |
| 3 | B038024 | 308 | 711 | 786 | 327 | 672 | 767 | 715 | 155 | 260 | ... | 545 | 429 | 418 | 264 | 798 | 860 | 732 | 764 | 506 | 390 |
| 4 | B015511 | 302 | 383 | 590 | 211 | 746 | 224 | 833 | 481 | 286 | ... | 167 | 117 | 211 | 348 | 225 | 611 | 191 | 462 | 964 | 314 |
5 rows × 31 columns
Predict quality for unlabeled samples¶
new_batches
| batchID | test1 | test2 | test3 | test4 | test5 | test6 | test7 | test8 | test9 | ... | test21 | test22 | test23 | test24 | test25 | test26 | test27 | test28 | test29 | test30 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B050257 | 933 | 898 | 591 | 719 | 639 | 522 | 321 | 420 | 245 | ... | 246 | 365 | 13 | 53 | 670 | 691 | 594 | 661 | 101 | 284 |
| 1 | B017350 | 346 | 888 | 469 | 821 | 443 | 351 | 839 | 3 | 990 | ... | 597 | 427 | 131 | 949 | 694 | 118 | 350 | 683 | 369 | 415 |
| 2 | B013761 | 479 | 860 | 229 | 357 | 560 | 708 | 153 | 654 | 552 | ... | 871 | 609 | 147 | 804 | 21 | 68 | 316 | 627 | 877 | 461 |
| 3 | B038024 | 308 | 711 | 786 | 327 | 672 | 767 | 715 | 155 | 260 | ... | 545 | 429 | 418 | 264 | 798 | 860 | 732 | 764 | 506 | 390 |
| 4 | B015511 | 302 | 383 | 590 | 211 | 746 | 224 | 833 | 481 | 286 | ... | 167 | 117 | 211 | 348 | 225 | 611 | 191 | 462 | 964 | 314 |
| 5 | B049530 | 81 | 47 | 756 | 751 | 626 | 964 | 585 | 506 | 104 | ... | 153 | 388 | 590 | 118 | 200 | 262 | 590 | 338 | 184 | 443 |
| 6 | B039271 | 95 | 950 | 247 | 938 | 373 | 552 | 306 | 268 | 643 | ... | 784 | 513 | 56 | 5 | 66 | 797 | 518 | 364 | 492 | 309 |
| 7 | B047406 | 134 | 487 | 555 | 238 | 899 | 822 | 285 | 754 | 672 | ... | 157 | 134 | 21 | 686 | 99 | 532 | 104 | 344 | 35 | 487 |
| 8 | B042002 | 134 | 878 | 598 | 699 | 34 | 46 | 135 | 684 | 885 | ... | 784 | 257 | 736 | 711 | 870 | 630 | 418 | 233 | 20 | 202 |
| 9 | B034981 | 975 | 723 | 52 | 256 | 986 | 63 | 307 | 645 | 116 | ... | 394 | 467 | 103 | 464 | 59 | 584 | 117 | 611 | 769 | 545 |
| 10 | B017251 | 883 | 543 | 944 | 392 | 39 | 531 | 119 | 416 | 310 | ... | 179 | 85 | 591 | 553 | 728 | 380 | 542 | 723 | 419 | 950 |
| 11 | B008720 | 401 | 967 | 526 | 74 | 556 | 287 | 956 | 277 | 19 | ... | 581 | 950 | 371 | 632 | 423 | 843 | 998 | 0 | 411 | 284 |
| 12 | B018975 | 206 | 887 | 290 | 235 | 647 | 40 | 240 | 16 | 834 | ... | 589 | 400 | 918 | 805 | 609 | 926 | 629 | 79 | 444 | 561 |
| 13 | B044633 | 953 | 821 | 562 | 135 | 118 | 429 | 982 | 722 | 62 | ... | 151 | 608 | 546 | 485 | 766 | 935 | 360 | 469 | 888 | 156 |
| 14 | B031398 | 330 | 904 | 490 | 459 | 74 | 95 | 556 | 988 | 389 | ... | 588 | 707 | 153 | 479 | 571 | 784 | 332 | 724 | 993 | 375 |
| 15 | B036850 | 628 | 998 | 104 | 193 | 989 | 135 | 478 | 2 | 969 | ... | 38 | 910 | 36 | 80 | 929 | 612 | 940 | 10 | 36 | 227 |
| 16 | B045777 | 564 | 518 | 795 | 769 | 208 | 128 | 735 | 240 | 762 | ... | 871 | 476 | 400 | 441 | 881 | 158 | 753 | 234 | 613 | 304 |
| 17 | B007795 | 528 | 252 | 254 | 663 | 389 | 420 | 210 | 549 | 418 | ... | 157 | 446 | 965 | 128 | 83 | 142 | 4 | 163 | 780 | 633 |
| 18 | B014767 | 3 | 571 | 471 | 655 | 787 | 427 | 682 | 600 | 979 | ... | 966 | 791 | 469 | 884 | 507 | 541 | 465 | 843 | 310 | 586 |
| 19 | B048223 | 679 | 578 | 821 | 17 | 165 | 766 | 439 | 415 | 397 | ... | 955 | 455 | 629 | 788 | 751 | 468 | 979 | 241 | 318 | 153 |
| 20 | B035035 | 380 | 163 | 993 | 394 | 868 | 106 | 147 | 612 | 847 | ... | 886 | 34 | 37 | 971 | 747 | 922 | 613 | 468 | 203 | 506 |
| 21 | B014448 | 840 | 187 | 18 | 774 | 435 | 289 | 822 | 231 | 234 | ... | 899 | 187 | 226 | 206 | 924 | 124 | 523 | 535 | 333 | 184 |
| 22 | B012586 | 58 | 203 | 558 | 812 | 841 | 652 | 242 | 21 | 805 | ... | 232 | 499 | 209 | 662 | 922 | 86 | 295 | 508 | 187 | 97 |
| 23 | B037453 | 279 | 962 | 704 | 560 | 633 | 948 | 390 | 902 | 44 | ... | 728 | 155 | 344 | 287 | 993 | 926 | 51 | 306 | 537 | 273 |
| 24 | B041726 | 897 | 912 | 572 | 21 | 671 | 739 | 255 | 859 | 547 | ... | 376 | 461 | 129 | 720 | 138 | 365 | 834 | 911 | 471 | 913 |
| 25 | B003675 | 76 | 731 | 77 | 597 | 189 | 137 | 856 | 121 | 113 | ... | 327 | 55 | 539 | 597 | 19 | 292 | 698 | 560 | 365 | 602 |
| 26 | B018601 | 938 | 569 | 488 | 871 | 368 | 20 | 608 | 783 | 868 | ... | 686 | 830 | 583 | 84 | 87 | 334 | 371 | 741 | 773 | 421 |
| 27 | B034924 | 455 | 421 | 944 | 127 | 461 | 842 | 125 | 554 | 521 | ... | 37 | 740 | 480 | 615 | 111 | 926 | 444 | 306 | 242 | 722 |
| 28 | B038527 | 171 | 238 | 586 | 189 | 381 | 333 | 6 | 957 | 865 | ... | 590 | 510 | 560 | 876 | 983 | 589 | 811 | 25 | 413 | 526 |
| 29 | B045822 | 333 | 391 | 426 | 274 | 632 | 575 | 218 | 39 | 283 | ... | 331 | 326 | 290 | 710 | 288 | 318 | 552 | 830 | 911 | 44 |
| 30 | B005577 | 373 | 121 | 97 | 371 | 940 | 71 | 736 | 768 | 317 | ... | 496 | 160 | 187 | 613 | 778 | 112 | 418 | 175 | 571 | 871 |
| 31 | B027718 | 107 | 759 | 601 | 78 | 196 | 23 | 860 | 480 | 738 | ... | 818 | 130 | 143 | 216 | 666 | 395 | 857 | 799 | 793 | 828 |
| 32 | B004606 | 378 | 418 | 674 | 12 | 163 | 508 | 289 | 224 | 713 | ... | 476 | 253 | 587 | 232 | 804 | 664 | 219 | 49 | 804 | 647 |
| 33 | B038664 | 995 | 804 | 512 | 815 | 774 | 980 | 659 | 145 | 910 | ... | 415 | 116 | 19 | 598 | 806 | 105 | 225 | 612 | 547 | 399 |
| 34 | B043264 | 195 | 661 | 468 | 167 | 380 | 746 | 655 | 904 | 65 | ... | 422 | 445 | 19 | 223 | 78 | 960 | 266 | 33 | 895 | 720 |
| 35 | B027719 | 40 | 545 | 785 | 445 | 982 | 446 | 298 | 610 | 939 | ... | 592 | 746 | 339 | 300 | 604 | 859 | 135 | 866 | 911 | 720 |
| 36 | B042165 | 197 | 424 | 39 | 506 | 516 | 236 | 417 | 270 | 20 | ... | 522 | 76 | 786 | 143 | 871 | 426 | 435 | 997 | 315 | 486 |
| 37 | B019164 | 853 | 547 | 100 | 940 | 208 | 807 | 893 | 750 | 122 | ... | 558 | 389 | 382 | 262 | 261 | 413 | 42 | 836 | 662 | 933 |
| 38 | B048145 | 366 | 815 | 838 | 166 | 334 | 648 | 859 | 310 | 489 | ... | 176 | 915 | 363 | 81 | 936 | 793 | 937 | 124 | 388 | 994 |
| 39 | B036549 | 65 | 584 | 685 | 737 | 420 | 490 | 149 | 561 | 61 | ... | 429 | 239 | 634 | 595 | 414 | 785 | 41 | 753 | 7 | 804 |
40 rows × 31 columns
new_labels = pd.DataFrame()
new_labels['batchID'] = new_batches.batchID
new_labels['quality'] = model.predict(new_batches[features])
new_labels.to_csv("new_labels.csv", index=False)
new_labels
| batchID | quality | |
|---|---|---|
| 0 | B050257 | good |
| 1 | B017350 | good |
| 2 | B013761 | good |
| 3 | B038024 | poor |
| 4 | B015511 | good |
| 5 | B049530 | good |
| 6 | B039271 | good |
| 7 | B047406 | poor |
| 8 | B042002 | good |
| 9 | B034981 | good |
| 10 | B017251 | good |
| 11 | B008720 | good |
| 12 | B018975 | good |
| 13 | B044633 | good |
| 14 | B031398 | good |
| 15 | B036850 | good |
| 16 | B045777 | good |
| 17 | B007795 | good |
| 18 | B014767 | good |
| 19 | B048223 | good |
| 20 | B035035 | good |
| 21 | B014448 | good |
| 22 | B012586 | poor |
| 23 | B037453 | good |
| 24 | B041726 | good |
| 25 | B003675 | good |
| 26 | B018601 | poor |
| 27 | B034924 | poor |
| 28 | B038527 | poor |
| 29 | B045822 | poor |
| 30 | B005577 | good |
| 31 | B027718 | poor |
| 32 | B004606 | poor |
| 33 | B038664 | poor |
| 34 | B043264 | poor |
| 35 | B027719 | poor |
| 36 | B042165 | poor |
| 37 | B019164 | poor |
| 38 | B048145 | poor |
| 39 | B036549 | poor |
Feature importances¶
plt.figure(figsize=(16, 4)) # size of figure to be displayed
plt.bar(features, model.feature_importances_)
plt.xticks(rotation=90, fontsize=16)
plt.title('Feature importances', fontsize=24)
plt.show()
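Tests with zero importance never appear in a split of the chosen tree, so they are the candidates for elimination mentioned at the outset. A minimal sketch on synthetic data (the feature names are hypothetical stand-ins for test1..test30; a tree with 5 leaf nodes has at most 4 splits, so at most 4 tests can be used):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, n_informative=5,
                           random_state=0)
names = [f"test{i}" for i in range(1, 31)]   # stand-ins for the 30 QC tests

tree = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0).fit(X, y)

# tests that appear in at least one split vs. tests never used
used = [n for n, w in zip(names, tree.feature_importances_) if w > 0]
unused = [n for n, w in zip(names, tree.feature_importances_) if w == 0]
print("Tests used by the tree:", used)
print("Tests that may be eliminated:", len(unused))
```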