Model evaluation: quantifying the quality of predictions — scikit-learn 0.

Estimators have a score method providing a default evaluation criterion; this is not discussed on this page, but in each estimator's documentation. Cohen's kappa is a statistic that measures inter-annotator agreement.
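
To make the idea of inter-annotator agreement concrete, here is a minimal sketch that computes Cohen's kappa from its usual definition: observed agreement corrected for the agreement expected by chance. The helper name cohen_kappa and the annotation data are hypothetical and written only for illustration; in scikit-learn the corresponding function is sklearn.metrics.cohen_kappa_score, which should give the same value on this input.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative helper, not the library implementation. It divides by zero
    when chance agreement is exactly 1 (both annotators use one single label).
    """
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical labels given to six items by two annotators.
annotator_a = [2, 0, 2, 2, 0, 1]
annotator_b = [0, 0, 2, 2, 0, 2]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.4f}")
```

Here the annotators agree on 4 of 6 items, but 15/36 of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.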

The Hamming loss function computes the average Hamming loss. Log loss is also known as logistic loss or cross-entropy loss. In multilabel classification, the accuracy function returns the subset accuracy. An F-measure reaches its best value at 1 and its worst score at 0. The average precision (AP) value is between 0 and 1, and higher is better. Some references present variants of AP that interpolate the precision-recall curve. F-measures can be applied to each label independently.
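
The difference between Hamming loss, subset accuracy, and per-label F-measures is easiest to see on a small multilabel example. The sketch below computes all three by hand; the helper names and the indicator matrices are hypothetical, and returning 0.0 for a label with no positives is an assumed convention. The scikit-learn counterparts are sklearn.metrics.hamming_loss, sklearn.metrics.accuracy_score, and sklearn.metrics.f1_score with average=None.

```python
def hamming_loss_manual(y_true, y_pred):
    """Fraction of individual label entries that are predicted wrongly."""
    n_entries = len(y_true) * len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / n_entries


def subset_accuracy_manual(y_true, y_pred):
    """Fraction of samples whose entire label row is predicted exactly."""
    exact = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)


def f1_per_label(y_true, y_pred):
    """F1 score computed independently for each label (column)."""
    scores = []
    for j in range(len(y_true[0])):
        pairs = [(row_t[j], row_p[j]) for row_t, row_p in zip(y_true, y_pred)]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Assumed convention: report 0.0 when a label has no positives at all.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical multilabel indicator data: 4 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

h_loss = hamming_loss_manual(y_true, y_pred)
subset_acc = subset_accuracy_manual(y_true, y_pred)
f1_scores = f1_per_label(y_true, y_pred)
print(h_loss, subset_acc, f1_scores)
```

Only 2 of the 12 label entries are wrong, so the Hamming loss is small, yet those two errors fall in different samples, so only half of the samples count as correct under subset accuracy.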

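Similarly, labels not present in the data sample may be accounted for in macro-averaging. The log loss extends from the binary to the multiclass case by averaging, over samples, the negative log of the probability assigned to each sample's true class. The log loss is non-negative. For the Matthews correlation coefficient, a coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction. The statistic is also known as the phi coefficient. The false positive rate (FPR) is one minus the specificity, or true negative rate.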
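
The statements above about log loss, the Matthews correlation coefficient (MCC), and the false positive rate can be checked with a few lines of arithmetic. The following is a sketch from the standard definitions, not the library code: all helper names and data are hypothetical, the log loss helper assumes every probability is strictly positive (no clipping), and the MCC helper returns 0.0 when its denominator is zero as an assumed convention. The scikit-learn counterparts are sklearn.metrics.log_loss and sklearn.metrics.matthews_corrcoef.

```python
import math


def multiclass_log_loss(y_true, probs):
    """Mean negative log-probability assigned to each sample's true class."""
    return -sum(math.log(p[k]) for k, p in zip(y_true, probs)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (phi coefficient) from counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 3-class problem: true class index and predicted probabilities.
classes = [0, 1, 2]
probs = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.25, 0.25, 0.5]]
loss = multiclass_log_loss(classes, probs)

# Hypothetical binary problem with three different predictions.
y = [1, 1, 0, 0, 0, 0]
perfect = mcc(*confusion_counts(y, y))
inverse = mcc(*confusion_counts(y, [1 - label for label in y]))

tp, tn, fp, fn = confusion_counts(y, [1, 0, 1, 0, 0, 0])
fpr = fp / (fp + tn)          # false positive rate
specificity = tn / (tn + fp)  # true negative rate

print(loss, perfect, inverse, fpr, 1 - specificity)
```

Because each probability lies in (0, 1], every term -log(p) is non-negative, which is why the averaged loss can never drop below zero.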

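Compared to the F1 score, ROC doesn't require optimizing a threshold for each label. The label ranking average precision score is always strictly greater than 0, and the best value is 1. In its definition, |·| denotes the ℓ0 "norm" or the cardinality of the set. For regression scores such as the explained variance and R², the best possible score is 1.0, and lower values are worse. See also the example: Lasso and Elastic Net on sparse signals.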
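
The "best possible score is 1.0" statements read like descriptions of regression scores such as the explained variance and R². Assuming that is what is meant, the sketch below computes both from their textbook definitions to show why 1.0 is the ceiling and how a poor prediction pushes R² below zero. The helper names and data are hypothetical; the scikit-learn counterparts are sklearn.metrics.r2_score and sklearn.metrics.explained_variance_score.

```python
def mean(values):
    return sum(values) / len(values)


def r2_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    center = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - center) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def explained_variance_manual(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true), using population variances."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    res_center, true_center = mean(residuals), mean(y_true)
    var_res = mean([(r - res_center) ** 2 for r in residuals])
    var_true = mean([(t - true_center) ** 2 for t in y_true])
    return 1 - var_res / var_true


# Hypothetical regression targets and two sets of predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_good = [2.5, 0.0, 2.0, 8.0]
y_bad = [8.0, 7.0, -2.0, -3.0]

r2_good = r2_manual(y_true, y_good)
ev_good = explained_variance_manual(y_true, y_good)
r2_bad = r2_manual(y_true, y_bad)
print(r2_good, ev_good, r2_bad)
```

The two scores differ only in that the explained variance ignores a constant offset in the residuals, so it is never lower than R² on the same predictions.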

