1 min read, from Machine Learning

[D] Training a classifier entirely in SQL (no iterative optimization)

I implemented SEFR, a lightweight linear classifier, entirely in SQL (on Google BigQuery) and benchmarked it against logistic regression.

On a 55k fraud detection dataset, SEFR achieves an AUC of 0.954 vs. 0.986 for logistic regression, but it is ~18× faster because its formulation is fully parallelizable (there is no iterative optimization).
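To make "no iterative optimization" concrete, here is a minimal Python sketch of SEFR training as I understand the published formulation. It is not the SQL from this post. The function names `sefr_fit` and `sefr_predict`, the `eps` smoothing term, and the toy data are all illustrative assumptions, and features are assumed to be non-negative (e.g. min-max scaled).

```python
def sefr_fit(X, y, eps=1e-7):
    """Fit SEFR in closed form. X: rows of non-negative floats, y: 0/1 labels.

    Sketch under stated assumptions, not the post's BigQuery implementation.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])

    # Per-feature class means: one aggregate per class, no gradient steps.
    mean_pos = [sum(r[j] for r in pos) / len(pos) for j in range(n_features)]
    mean_neg = [sum(r[j] for r in neg) / len(neg) for j in range(n_features)]

    # Weight = normalized difference of class means (eps avoids division by zero).
    weights = [(p - n) / (p + n + eps) for p, n in zip(mean_pos, mean_neg)]

    # Score every training row, then average the scores per class.
    scores = [sum(w * v for w, v in zip(weights, row)) for row in X]
    avg_pos = sum(s for s, label in zip(scores, y) if label == 1) / len(pos)
    avg_neg = sum(s for s, label in zip(scores, y) if label == 0) / len(neg)

    # Bias: negated average of the two class score means, each weighted
    # by the size of the opposite class.
    bias = -(len(neg) * avg_pos + len(pos) * avg_neg) / (len(pos) + len(neg))
    return weights, bias


def sefr_predict(weights, bias, X):
    """Predict 1 when the linear score plus bias is positive, else 0."""
    return [
        1 if sum(w * v for w, v in zip(weights, row)) + bias > 0 else 0
        for row in X
    ]


# Toy data (made up for illustration only).
X = [[1.0, 5.0], [2.0, 4.0], [8.0, 1.0], [9.0, 2.0]]
y = [0, 0, 1, 1]
weights, bias = sefr_fit(X, y)
predictions = sefr_predict(weights, bias, X)
```

Every quantity here is a per-class mean or count, which is why the whole fit can be expressed as grouped aggregates plus a little arithmetic rather than a training loop.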

submitted by /u/CriticalofReviewer2

