Data Analyst

RFM customer scoring segmentation - Customer value analysis

RFM (Recency, Frequency, Monetary) table, top customer scoring, scoring segmentation

Introduction
Customer value analysis (CVA) is an analytic method for discovering customers’ characteristics and on this basis making further analysis to extract useful knowledge from big data.

RFM models help us to understand how much is the client worth to the company. This is known as customer scoring. Based on such information, we are able to determine the efficiency of marketing campaigns and decide whether a customer is important enough for us to maintain advertising spendings.

The main advantage of RFM analysis is that it provides meaningful information about customers while using fewer (three-dimensional) criterions as cluster attributes. This reduces the complexity of customer value analysis model without compromising its accuracy.

Data
For current project used online retail dataset from UCI Machine Learning Repository. The libraries used pandas and datetime. The data and full code are available in the repository

I. Create a Recency Frequency Monetary table
• count of unique countries

df = pd.read_excel("online_retail.xlsx")

• countries with the largest number of consumers

customer_country=df[['Country','CustomerID']].drop_duplicates()
print(customer_country.groupby(['Country'])['CustomerID']\
      .agg('count').reset_index().sort_values('CustomerID', ascending=False))

• Since 92% are from UK, let’s limit analysis UK only

df_UK = df.loc[df['Country'] == 'United Kingdom']

• get rid of null values for CustomerID

print(df_UK.isnull().sum(axis=0))
df_UK = df_UK[pd.notnull(df['CustomerID'])]

• minimum values in UnitPrice and Quantity columns

price_min = df_UK.UnitPrice.min()
quantity_min = df_UK.Quantity.min()

• get rid of negative values in Quantity column

df_UK = df_UK[(df_UK['Quantity']>0)]
quantity_min = df_UK.Quantity.min()

• unique values for each column of df_UK

def unique_counts(df_UK):
   for column in df_UK.columns:
       count = df_UK[column].nunique()
       print(column, ": ", count)

unique_counts(df_UK)

• total price calculation

df_UK['TotalPrice'] = df_UK['Quantity'] * df_UK['UnitPrice']

• first and last order date in the data

FD = df_UK['InvoiceDate'].dt.date.min() # 2010-12-01
LD = df_UK['InvoiceDate'].dt.date.max() # 2011-12-09

• recency calculation

calc_time = LD + dt.timedelta(days=1)
df_UK['InvoiceDate'] = pd.to_datetime(df_UK['InvoiceDate'])

• RFM table

rfm_tab = df_UK.groupby('CustomerID').agg({'InvoiceDate': lambda x: (calc_time - x.dt.date.max()).days, # Recency
                                        'InvoiceNo': lambda x: len(x),    # Frequency
                                        'TotalPrice': lambda x: x.sum()}) # Monetary Value

rfm_tab['InvoiceDate'] = rfm_tab['InvoiceDate'].astype(int)
rfm_tab.rename(columns={'InvoiceDate': 'Recency', 
                        'InvoiceNo': 'Frequency', 
                        'TotalPrice': 'Monetary_value'}, inplace=True)

• RFM metrics for each customer

rfm_tab.head()

Interpretation:
– CustomerID has such frequency, monetary value and recency that many days

II. Find the best customers
• Split metrics into segments by using quartile (4 segments)

cust_quantiles = rfm_tab.quantile(q=[0.25,0.5,0.75])
cust_quantiles = cust_quantiles.to_dict()
print(cust_quantiles)

• Segmented RFM table

segmented_rfm = rfm_tab

• Functions to categorize customers, 111 is the best score

def RS_score(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
def FM_score(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1

• Function results and sort out by adding a new column to combine RFM score

segmented_rfm['r_quartile'] = segmented_rfm['Recency'].apply(RS_score,\
                              args=('Recency',cust_quantiles,))
segmented_rfm['f_quartile'] = segmented_rfm['Frequency'].apply(FM_score,\
                              args=('Frequency',cust_quantiles,))
segmented_rfm['m_quartile'] = segmented_rfm['Monetary_value'].apply(FM_score,\
                              args=('Monetary_value',cust_quantiles,))

segmented_rfm['RFM_score'] = segmented_rfm.r_quartile.map(str)\
                            + segmented_rfm.f_quartile.map(str)\
                            + segmented_rfm.m_quartile.map(str)

• top 10 customers by score

top_10_customers = segmented_rfm[segmented_rfm['RFM_score']=='111']\
                   .sort_values('Monetary_value', ascending=False).reset_index()

• top 10 customers only by Monetary_value (TotalPrice)

cust_UK_total = df_UK.groupby('CustomerID')['TotalPrice'].sum()
print(cust_UK_total.sort_values(ascending=False).head(10))

Conclusion:
– As we can see, in this case, the top 2 customers are also the customers with the highest monetary value compared to all