Decoding one-hot vectors into a vector of integer labels with numpy

Hi, たっきー here. This is just a memo.

Introduction
How to do it
Summary
Thoughts

Introduction

In Keras, the to_categorical() function in keras.utils.np_utils makes it easy to generate one-hot vectors.

With these, you can compute categorical_crossentropy for multi-class classification.
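For reference, here is a minimal numpy-only sketch of what one-hot encoding produces. It is a stand-in I wrote for illustration, not how to_categorical() is actually implemented, and the names `labels` and `num_classes` are made up for this example.

```python
import numpy as np

# Hypothetical integer class labels (3 classes: 0, 1, 2)
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels yields one one-hot row per sample.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot)
```

Each row has a single 1 in the column that matches the class label, which is the property the decoding step relies on.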

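However, when you want to display something like a confusion matrix from this, sklearn.metrics.confusion_matrix() apparently can't take one-hot vectors, and you have to pass it arrays of integer labels instead.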

(In other words, instead of one-hot vectors like

array([[ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       [ 1.,  0.,  0., ...,  0.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  0.,  1.]], dtype=float32)

it has to be an array of integer labels like

array([0, 0, 0, ..., 9, 9, 9])

In this case it is a 10-class classification problem with labels 0 to 9.)

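You might think, "then just use the array from before to_categorical() turned it into one-hot vectors," but I had saved the one-hot version with np.save() and had not kept the original array of integer class labels.

So I ended up wondering how on earth you convert "one-hot vector" → "integer class label", looked into it, and am noting it down here.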

How to do it

It looks like np.argmax(one_hot, axis=1) does the job (found on www.reddit.com).
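To see the decoding on something small, here is a toy sketch. The array `one_hot` below is made-up data for 4 samples and 3 classes, not the data from this post.

```python
import numpy as np

# Hypothetical one-hot labels for 4 samples and 3 classes
one_hot = np.array([[1., 0., 0.],
                    [0., 0., 1.],
                    [0., 1., 0.],
                    [0., 0., 1.]], dtype=np.float32)

# axis=1 searches along each row and returns the column index of the largest value,
# which for a one-hot row is the position of the 1, i.e. the class label.
labels = np.argmax(one_hot, axis=1)
print(labels)  # [0 2 1 2]
```

With axis=0 the search would run down each column instead, which is not what you want here.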

There also seems to be a way to do it with np.where() (found on stackoverflow.com), but np.argmax() is easier to read, so that is the one I recommend.
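For comparison, the np.where() route might look roughly like this. It is my own sketch on made-up data, and it assumes every row contains exactly one 1, so it does not carry over to softmax outputs the way np.argmax() does.

```python
import numpy as np

# Hypothetical one-hot labels for 4 samples and 3 classes
one_hot = np.array([[1., 0., 0.],
                    [0., 0., 1.],
                    [0., 1., 0.],
                    [0., 0., 1.]], dtype=np.float32)

# With only a condition, np.where returns the indices of the True entries
# as a tuple (row_indices, col_indices), ordered row by row.
rows, cols = np.where(one_hot == 1)

# Because each row holds exactly one 1, cols has one entry per sample
# and that entry is the class label.
labels_where = cols
print(labels_where)

# Same result as the argmax version
print(np.array_equal(labels_where, np.argmax(one_hot, axis=1)))
```

You have to unpack the tuple and rely on the one-1-per-row assumption, which is part of why np.argmax() reads more clearly.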

In other words, if you want to use this for something like the example in "Confusion matrix — scikit-learn 0.20.0 documentation", you can write it as shown below.

%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import itertools

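"""
Keras model definition and training are omitted
"""

# Newer Keras/TensorFlow versions no longer provide predict_classes(),
# so take the argmax of the predicted class probabilities instead.
labels_pred = np.argmax(model.predict(x_test, verbose=0), axis=1)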

def plot_confusion_matrix(cm, classes,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    Plot the confusion matrix.
    Cell colors use the row-normalized values; the text in each cell shows the raw counts.
    """
    cm_normalize = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
    plt.imshow(cm_normalize, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.yticks(tick_marks, classes)
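    # The cell colors come from the normalized matrix, so choose the text color from it too
    thresh = cm_normalize.max() / 2

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], "d"),
                 horizontalalignment="center",
                 color="white" if cm_normalize[i, j] > thresh else "black")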

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()


# Compute confusion matrix
cnf_matrix = confusion_matrix(np.argmax(y_test, axis=1), labels_pred)
np.set_printoptions(precision=2)

# Plot the confusion matrix (row-normalized colors, raw counts as text)
plt.figure()
plot_confusion_matrix(cnf_matrix, classes=class_name_modelnet,
                      title='Normalized_confusion_matrix')
plt.savefig('Normalized_confusion_matrix.svg')
plt.show()