Comparing to Human-Level Performance

Human-level performance refers to the best performance that a human, or a group of humans, can achieve on a given task.

As deep learning models improve, their performance often reaches, and sometimes surpasses, human-level performance. Once a model surpasses human-level performance, however, further gains in accuracy slow down, and performance plateaus toward a theoretical limit known as the Bayes optimal error. This is the lowest error achievable by any possible function mapping inputs to outputs, not just by a particular model.

The main reason ML models often do not get much better than humans is that they depend on human knowledge for labeled training data, hyperparameter tuning, etc. Once a model is better than humans, these sources of improvement become less useful, and it becomes difficult to increase its accuracy by a significant amount.

It is important to compare a model’s performance to human-level performance, because the comparison indicates what to work on next:

  • Say human error for a given task is 1% and the training set error of the model is 10%. Since there is a large gap (9 percentage points) between the two, we could focus on reducing bias. The difference between human error (used as an estimate of the Bayes optimal error) and the training set error is called avoidable bias.

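  • Say human error for a given task is 9% and the training set error of the model is 10%. Say the test/dev set error is 12%. The gap between human error and training set error is only 1 percentage point, while the gap between training and test/dev set errors (the variance) is 2 percentage points. Since the model is already performing well on the training set (in comparison with human performance), we could focus on reducing variance instead, so as to bridge the gap between training and test/dev set errors.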
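
The two scenarios above can be sketched as a small calculation. This is only an illustrative sketch: the function name diagnose and its parameters are hypothetical, errors are given in percentage points, and the 12% dev set error used for the first scenario is an assumption, since that scenario only states the human and training set errors.

```python
def diagnose(human_error, train_error, dev_error):
    """Compare the two error gaps and suggest what to reduce first.

    All errors are percentages (e.g. 10.0 means 10%).
    Human error is used here as a stand-in for the Bayes optimal error.
    """
    # Gap between human-level error and training set error.
    avoidable_bias = train_error - human_error
    # Gap between training set error and dev/test set error.
    variance = dev_error - train_error
    # Focus on whichever gap is larger (a tie falls back to variance).
    focus = "bias" if avoidable_bias > variance else "variance"
    return avoidable_bias, variance, focus


# Scenario 1: human 1%, training 10% (dev 12% is an assumed value).
print(diagnose(1.0, 10.0, 12.0))
# Scenario 2: human 9%, training 10%, dev 12%.
print(diagnose(9.0, 10.0, 12.0))
```

With the same training and dev set errors, the suggested focus changes only because the human-level reference point changes, which is why the comparison matters.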

There are some problems for which ML models achieve much better performance than humans:

  • Online Advertising

  • Product Recommendation

  • Loan Approval Prediction

  • Logistics, etc.

Note that models for all these problems learn from large amounts of structured data, and these tasks do not require natural perception, i.e. they are not computer vision or natural language processing tasks. Humans are typically very good at tasks that require natural perception, so it is harder for a model to surpass human-level performance on such tasks.
