machine learning - Sklearn for Python: Is there a way to see how close a prediction was?


I am using this code to perform predictions that classify text:

predicted = clf.predict(x_new_tfidf) 

My predictions come out saying that a text snippet belongs to either subject A or subject B. However, I want to do further analysis on the predictions that are shaky, that is, cases where the model was unsure whether it was A or B but had to pick one for the sake of it. Is there a way to extract the relative confidence of the predictions?

Code:

x_train holds the training sentences: ["a sentence I know belongs to subject a", "another sentence that describes subject a", "a sentence about subject b", "another sentence about subject b"...], etc.

y_train contains the corresponding class labels: ["subject a", "subject a", "subject b", "subject b", ...], etc.

predict_these_x is a list of sentences I wish to classify: ["some random sentence", "another sentence", "another sentence again", ...], etc.

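    # Imports added so the snippet runs. The function wrapper and its name
    # are illustrative; the original showed only the body.
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.naive_bayes import BernoulliNB

    def classify(x_train, y_train, predict_these_x):
        count_vect = CountVectorizer()
        tfidf_transformer = TfidfTransformer()

        # Fit the vectorizer and the TF-IDF weights on the training text
        x_train_counts = count_vect.fit_transform(x_train)
        x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)

        # Reuse the fitted transformers on the sentences to classify
        x_new_counts = count_vect.transform(predict_these_x)
        x_new_tfidf = tfidf_transformer.transform(x_new_counts)

        estimator = BernoulliNB()
        estimator.fit(x_train_tfidf, y_train)
        predictions = estimator.predict(x_new_tfidf)

        # One row per sentence, one column per class
        print(estimator.predict_proba(x_new_tfidf))
        return predictions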

Result:

    [[  9.97388646e-07   9.99999003e-01]
     [  9.99996892e-01   3.10826824e-06]
     [  9.40063326e-01   5.99366742e-02]
     [  9.99999964e-01   3.59816546e-08]
     ...
     [  1.95070084e-10   1.00000000e+00]
     [  3.21721965e-15   1.00000000e+00]
     [  1.00000000e+00   3.89012777e-10]]

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import BernoulliNB

    # generate some artificial data
    X, y = make_classification(n_samples=1000, n_features=50, weights=[0.1, 0.9])

    # the estimator
    estimator = BernoulliNB()
    estimator.fit(X, y)

    # generate predictions
    estimator.predict(X)
    Out[164]: array([1, 1, 1, ..., 0, 1, 1])

    # confidence on the predictions
    estimator.predict_proba(X)
    Out[163]:
    array([[ 0.0043,  0.9957],
           [ 0.0046,  0.9954],
           [ 0.0071,  0.9929],
           ...,
           [ 0.8392,  0.1608],
           [ 0.0018,  0.9982],
           [ 0.0339,  0.9661]])

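Now you can see that each of the first 3 observations has over a 99% probability of being the positive case. Each row of the `predict_proba` output holds the estimated probability of each class for one observation.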
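To get from these probabilities to the "shaky" predictions the question asks about, one option is to take the largest probability in each row as the confidence and flag the rows that fall below a cutoff. The following is only a minimal sketch under a few assumptions: it reuses the artificial data from the answer, adds `random_state=0` purely for reproducibility, and the `THRESHOLD` value of 0.75 is a hypothetical cutoff, not something from the original post. The columns of `predict_proba` follow the order of `estimator.classes_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB

# Artificial data as in the answer; random_state is an addition so the
# sketch gives the same numbers on every run.
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.1, 0.9], random_state=0)

estimator = BernoulliNB()
estimator.fit(X, y)

# proba has shape (n_samples, n_classes); columns follow estimator.classes_
proba = estimator.predict_proba(X)

# Confidence of each prediction = probability of the most likely class
confidence = proba.max(axis=1)
predicted = estimator.classes_[proba.argmax(axis=1)]

# Hypothetical cutoff: anything below it is treated as a shaky prediction
THRESHOLD = 0.75
shaky = np.flatnonzero(confidence < THRESHOLD)

print("shaky predictions:", len(shaky), "of", len(confidence))
for i in shaky[:5]:
    print(i, predicted[i], proba[i])
```

The same idea should carry over to the text pipeline in the question by calling `predict_proba` on `x_new_tfidf` instead of `X`. Note that naive Bayes models often produce probabilities very close to 0 or 1, as the result in the question shows, so the cutoff may need tuning for a given dataset.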

