While studying the Pima Indians Diabetes dataset with machine learning algorithms, I rediscovered something I had learned earlier but didn’t quite understand at the time. Now it’s crystal clear.
False positive = Type 1 error. Cases where the classifier predicted a positive, but the actual class is negative.
False negative = Type 2 error. Cases where the classifier predicted a negative, but the actual class is positive.
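To make the two definitions concrete, here is a minimal sketch that counts each outcome for a binary classifier. The function name `confusion_counts` and the toy labels are my own illustration, not taken from the diabetes dataset; 1 is the positive class and 0 is the negative class.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correctly predicted positive
        elif predicted == 1 and actual == 0:
            fp += 1  # Type 1 error: predicted positive, actually negative
        elif predicted == 0 and actual == 0:
            tn += 1  # correctly predicted negative
        else:
            fn += 1  # Type 2 error: predicted negative, actually positive
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Made-up labels, purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
print(counts)  # {'tp': 3, 'fp': 2, 'tn': 2, 'fn': 1}
```

Libraries such as scikit-learn provide the same counts through a confusion matrix, but writing it out by hand shows exactly which cell each error type lands in.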
I can think of this in a non-medical setting too.
1) Classification problem: assume a data scientist is working through a large dataset of companies (potentially millions of rows, covering companies formed since the 1990s). Features could include:
- the number of years the business has survived
- the number of founders
- time to profitability
- legal threats
- early stage funding
The response (the label to predict) would be either ‘still in business’ or ‘out of business’, with ‘still in business’ as the positive class.
A false positive would mean that, given a certain feature set, the model predicts the company is still surviving when it is actually out of business.
A false negative would mean that, given a certain feature set, the model predicts the company does not survive when it is actually still in business.
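The same idea as a small sketch, treating ‘still in business’ as the positive class. The company names and their statuses below are entirely hypothetical, made up only to show which rows count as which error.

```python
POSITIVE = "still in business"
NEGATIVE = "out of business"

# Hypothetical rows: (company name, actual status, predicted status)
companies = [
    ("Acme", POSITIVE, POSITIVE),
    ("Birch", NEGATIVE, POSITIVE),   # predicted to survive, actually closed
    ("Cobalt", POSITIVE, NEGATIVE),  # predicted to close, actually surviving
    ("Dune", NEGATIVE, NEGATIVE),
    ("Ember", NEGATIVE, POSITIVE),   # predicted to survive, actually closed
]

# Type 1 errors: the model says "still in business" but the company is gone
false_positives = [name for name, actual, predicted in companies
                   if predicted == POSITIVE and actual == NEGATIVE]

# Type 2 errors: the model says "out of business" but the company is alive
false_negatives = [name for name, actual, predicted in companies
                   if predicted == NEGATIVE and actual == POSITIVE]

print("False positives (Type 1):", false_positives)  # ['Birch', 'Ember']
print("False negatives (Type 2):", false_negatives)  # ['Cobalt']
```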
Having large Type 1 and Type 2 errors such as these could potentially hamper or even cripple the investment strategies of early stage investors or venture capital funds.
Other examples where a business objective determines which metric to minimize or maximize:
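2) In fraudulent transaction detection (1 is fraud, 0 is NOT fraud), the priority is usually to minimize false negatives. A false negative means a transaction is fraudulent, but the model predicts that it isn’t, so the fraud goes through undetected. The opposite error, a false positive, means a transaction isn’t fraudulent but is detected as one. In that case the website owner might lose a sale, because the system will flag the transaction and block it from happening.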
EDIT: obviously, we should never be too quick to flag or ban transactions. I believe the system should just ask users for a quick verification (as credit card companies normally do).
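One common way to trade one error type for the other is to move the decision threshold on the model’s predicted fraud probability. The sketch below is only an illustration: the scores, labels, and the helper name `errors_at_threshold` are assumptions of mine, not output from a real fraud model. Here a false negative is a missed fraud, and a false positive is a legitimate transaction flagged as fraud.

```python
def errors_at_threshold(y_true, scores, threshold):
    """Return (false_positives, false_negatives) when flagging scores >= threshold."""
    fp = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 0:
            fp += 1  # legitimate transaction flagged as fraud
        elif predicted == 0 and actual == 1:
            fn += 1  # fraudulent transaction that slipped through
    return fp, fn


# Made-up predicted fraud probabilities and true labels (1 = fraud)
scores = [0.05, 0.20, 0.35, 0.45, 0.60, 0.80, 0.95, 0.30]
y_true = [0, 0, 1, 0, 1, 1, 1, 0]

print(errors_at_threshold(y_true, scores, 0.5))  # (0, 1): one fraud missed
print(errors_at_threshold(y_true, scores, 0.3))  # (2, 0): no fraud missed, two false alarms
```

Lowering the threshold catches more fraud at the cost of more false alarms, which is one reason to respond to a flag with a quick verification step rather than an outright ban.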
3) In spam filters (1 is spam, 0 is NOT spam), we want to minimize false positives. A false positive means an email is marked as spam even though it isn’t, so the user might miss very important emails!
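Minimizing false positives is the same as pushing precision up, while minimizing false negatives pushes recall up. A minimal sketch with made-up counts for a hypothetical spam filter (the numbers are illustrative only):

```python
def precision(tp, fp):
    """Of everything flagged as positive, the fraction that really was positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything that really was positive, the fraction that was flagged."""
    return tp / (tp + fn)


# Hypothetical spam filter results
tp = 90  # spam correctly sent to the spam folder
fp = 10  # legitimate emails wrongly marked as spam
fn = 30  # spam that reached the inbox

spam_precision = precision(tp, fp)
spam_recall = recall(tp, fn)

print(f"precision = {spam_precision:.2f}")  # precision = 0.90
print(f"recall    = {spam_recall:.2f}")     # recall    = 0.75
```

For a spam filter, a lower recall (some spam in the inbox) is usually more tolerable than a lower precision (real emails lost), which is why false positives are the error to watch here.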