Fraud is thankfully rare, but that rarity means there are very few fraudulent records relative to the volume of legitimate transactions. Naively applying traditional machine learning techniques (along with standard metrics like accuracy, sensitivity, and specificity) tends to produce poor models. The data will need to be downsampled, and metrics that are less susceptible to class imbalance, such as precision and recall, will need to be selected.
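As a minimal sketch of the downsampling step, the snippet below builds a hypothetical dataset with a realistic imbalance (10,000 legitimate transactions, 50 frauds) and downsamples the majority class to a 5:1 ratio; the record fields and the ratio are illustrative assumptions, not prescriptions.

```python
import random

random.seed(0)

# Hypothetical toy data: 10,000 legitimate transactions, 50 frauds.
legit = [{"amount": random.uniform(1, 500), "label": 0} for _ in range(10_000)]
fraud = [{"amount": random.uniform(1, 500), "label": 1} for _ in range(50)]

# Downsample the majority (legitimate) class to a 5:1 ratio so the
# model sees enough fraud examples during training.
ratio = 5
legit_sample = random.sample(legit, ratio * len(fraud))
train = legit_sample + fraud
random.shuffle(train)

print(len(train))                                    # 300 records
print(sum(r["label"] for r in train) / len(train))   # fraud rate now ~17%
```

With the training set rebalanced this way, per-class metrics like precision and recall on a held-out (non-downsampled) set give a far more honest picture than accuracy, which a model can maximize simply by predicting "not fraud" every time.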
First, the fraud database will need to be matched to the transactional data. This information will then need to be enriched with broader fraud trends, derived from aggregations of the fraud database in conjunction with aggregations of the transaction database. Next, the customer's history needs to be added to the record (again from the transaction database). All of this data needs to be cleaned, joined, and transformed into valuable ML features before model training can begin. This pre-modeling prep process can be frustrating and time consuming. We are here to help.
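The three steps above can be sketched with pandas on toy tables. The table layouts and column names (`txn_id`, `customer_id`, `merchant_id`, `amount`, `is_fraud`) are assumptions for illustration; the merchant-level fraud rate stands in for "broader fraud trends" and prior customer spend stands in for "customer history."

```python
import pandas as pd

# Hypothetical toy tables; column names are illustrative assumptions.
transactions = pd.DataFrame({
    "txn_id": [1, 2, 3, 4],
    "customer_id": ["a", "a", "b", "b"],
    "merchant_id": ["m1", "m2", "m1", "m3"],
    "amount": [20.0, 500.0, 35.0, 60.0],
})
fraud_reports = pd.DataFrame({"txn_id": [2], "is_fraud": [1]})

# 1. Match the fraud database to the transactional data.
df = transactions.merge(fraud_reports, on="txn_id", how="left")
df["is_fraud"] = df["is_fraud"].fillna(0).astype(int)

# 2. Broader fraud trend feature: fraud rate per merchant, aggregated
#    across the joined fraud and transaction data.
df["merchant_fraud_rate"] = df.groupby("merchant_id")["is_fraud"].transform("mean")

# 3. Customer history feature: each customer's total spend on
#    earlier transactions (cumulative sum minus the current amount).
df = df.sort_values("txn_id")
df["customer_prior_spend"] = (
    df.groupby("customer_id")["amount"].cumsum() - df["amount"]
)

print(df[["txn_id", "is_fraud", "merchant_fraud_rate", "customer_prior_spend"]])
```

In a real pipeline each of these steps would run against much larger tables (often in SQL or Spark rather than in-memory pandas), but the structure, a left join followed by grouped aggregations, is the same.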