machine learning - Sklearn for Python: Is there a way to see how close a prediction was?
I am using this code to perform predictions that classify text:

    predicted = clf.predict(X_new_tfidf)

My predictions either come out saying a text snippet belongs to subject A or to subject B. However, I want to do further analysis on predictions that are shaky, that is, cases where the model was unsure whether it was A or B but had to pick one for the sake of it. Is there a way to extract the relative confidence of the predictions?
Code:

X_train has ["A sentence I know belongs to subject A", "Another sentence that describes subject A", "A sentence about subject B", "Another sentence about subject B", ...], etc.

y_train contains the corresponding classes: ["Subject A", "Subject A", "Subject B", "Subject B", ...], etc.

predict_these_X is a list of sentences that I wish to classify: ["Some random sentence", "Another sentence", "Another sentence again", ...], etc.
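    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.naive_bayes import BernoulliNB

    # "classify" is a placeholder name; the original shows only the function body
    def classify(X_train, y_train, predict_these_X):
        # turn the training sentences into TF-IDF features
        count_vect = CountVectorizer()
        tfidf_transformer = TfidfTransformer()
        X_train_counts = count_vect.fit_transform(X_train)
        X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

        # apply the same fitted transformations to the new sentences
        X_new_counts = count_vect.transform(predict_these_X)
        X_new_tfidf = tfidf_transformer.transform(X_new_counts)

        estimator = BernoulliNB()
        estimator.fit(X_train_tfidf, y_train)
        predictions = estimator.predict(X_new_tfidf)
        print(estimator.predict_proba(X_new_tfidf))
        return predictions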
Result:

    [[ 9.97388646e-07  9.99999003e-01]
     [ 9.99996892e-01  3.10826824e-06]
     [ 9.40063326e-01  5.99366742e-02]
     [ 9.99999964e-01  3.59816546e-08]
     ...
     [ 1.95070084e-10  1.00000000e+00]
     [ 3.21721965e-15  1.00000000e+00]
     [ 1.00000000e+00  3.89012777e-10]]
    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import BernoulliNB

    # generate some artificial data
    X, y = make_classification(n_samples=1000, n_features=50, weights=[0.1, 0.9])

    # your estimator
    estimator = BernoulliNB()
    estimator.fit(X, y)

    # generate predictions
    estimator.predict(X)
    Out[164]: array([1, 1, 1, ..., 0, 1, 1])

    # confidence on the prediction
    estimator.predict_proba(X)
    Out[163]:
    array([[ 0.0043,  0.9957],
           [ 0.0046,  0.9954],
           [ 0.0071,  0.9929],
           ...,
           [ 0.8392,  0.1608],
           [ 0.0018,  0.9982],
           [ 0.0339,  0.9661]])
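As you can see, each of the first 3 observations has an over 99% probability of being the positive case. Each row of the predict_proba output holds one probability per class, with the columns in the order of estimator.classes_.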
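To pick out the shaky predictions the question asks about, one option is to take the largest probability in each row as the confidence and flag the rows where it falls below a cutoff. The following is only a minimal sketch on the same kind of artificial data: the THRESHOLD value of 0.75, the variable names, and the fixed random_state are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB

# Artificial data as in the answer; random_state is fixed here only
# so that repeated runs give the same numbers (an assumption).
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.1, 0.9], random_state=0)

estimator = BernoulliNB()
estimator.fit(X, y)

# One row per sample, one column per class (ordered as estimator.classes_).
proba = estimator.predict_proba(X)

# Confidence = probability assigned to the most likely class.
confidence = proba.max(axis=1)
predicted = estimator.classes_[proba.argmax(axis=1)]

# Hypothetical cutoff: anything below it is treated as "shaky".
THRESHOLD = 0.75
shaky = np.flatnonzero(confidence < THRESHOLD)

print("number of shaky predictions:", shaky.size)
for i in shaky[:5]:
    print(i, predicted[i], proba[i])
```

With two classes the confidence ranges from 0.5 (a coin flip) to 1.0, so the cutoff should sit somewhere in between. Note that the scikit-learn documentation warns that naive Bayes is a poor probability estimator, so these values are better read as a rough ranking of how sure the model is than as true probabilities.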