Applied ML · Finance & Economics

Bank Campaign Subscription Analysis

Apriori association rule mining on 41,188 bank client records to surface which attribute combinations most strongly predict term deposit non-subscription — delivering actionable targeting intelligence for future marketing campaigns.

Method: Apriori · Association Rule Mining
Tech Stack: R · arules · Apriori Algorithm
Source Code: GitHub

41,188

Client Records Analyzed

13

Association Rules Generated

0.602

Highest Standardized Lift

The Problem

Without a data-driven understanding of which client profiles are predisposed to reject a term deposit offer, bank direct marketing campaigns end up contacting clients indiscriminately.

Phone-based marketing campaigns are resource-intensive: every call costs time, money, and risks client attrition if the outreach is poorly targeted. A Portuguese bank ran a direct marketing campaign across 41,188 clients — yet the dataset reveals that 88.9% of those clients did not subscribe to the term deposit being offered. Without a way to identify which client attributes are associated with non-subscription, campaign managers are forced to contact clients who are unlikely to convert, diluting campaign efficiency and wasting outreach capacity. The core gap is the absence of any systematic, data-driven understanding of which combinations of client characteristics — marital status, loan history, contact method, prior engagement — reliably predict whether a client will reject the offer.

The Solution

Apriori association rule mining on 41,188 client records, ranked by standardized lift to surface the attribute combinations that most reliably predict non-subscription

The analysis applies the Apriori algorithm — via the arules package in R — to a reduced dataset of 10 categorical variables drawn from the full 21-variable campaign record. Variables with high proportions of missing data were excluded, and a complete-case approach was adopted given that missing values represented only 7% of the dataset. Minimum support and confidence thresholds of 0.40 and 0.80 were applied, with the right-hand side of every rule fixed to the subscription outcome variable. The algorithm generated 13 association rules, which were then ranked using standardized lift — a formulation introduced by McNicholas, Murphy, and O'Regan (2008) that accounts for the varying upper and lower bounds of lift across rules, enabling fair cross-rule comparison. Rules with standardized lift above 0.40 were retained as high-interest findings, yielding 6 actionable rules identifying the client profiles most likely to decline the term deposit offer.

Key Outcome

Six high-interest association rules were identified from 41,188 client records. Married clients with no prior contact history emerged as the strongest non-subscription signal (standardized lift 0.602). Together, the rules give the bank a data-driven profile of clients to deprioritize in future campaigns and a clear set of variables associated with campaign outcomes: marital status, loan history, and contact method.

Technical Deep Dive

Methodology & Analysis

Analytical Workflow

Stage 1 — Data Reduction & Preparation

Step 1

Variable Selection

21 variables reduced to 10 categorical · Numerical variables excluded · High-missingness variables dropped

Step 2

Missing Data Handling

7% missingness · Complete-case approach adopted · 41,188 instances retained

Step 3

Descriptive Profiling

Distribution of job, marital, education, loan, contact · 88.9% non-subscription rate confirmed · No significant correlations detected

Stage 2 — Apriori Rule Mining · arules (R)

Parameters

Threshold Configuration

Min support = 0.40 · Min confidence = 0.80 · Rule length 1–6 items

Constraint

Fixed RHS

Right-hand side fixed to subscription outcome · Focuses all rules on campaign target variable

Output

13 Association Rules Generated

Each rule characterized by support, confidence, lift, and standardized lift · LHS = client attribute combinations · RHS = subscription outcome

Stage 3 — Rule Evaluation & Ranking

Ranking Metric

Standardized Lift

McNicholas et al. (2008) formulation · Accounts for varying lift bounds across rules · Enables fair cross-rule comparison

Filter

Std. Lift > 0.40 Threshold

13 rules reduced to 6 high-interest rules · Lower-interest rules excluded from reporting

Findings

6 High-Interest Rules — Actionable Campaign Intelligence

Marital status, loan history & contact method identified as key variables associated with non-subscription · Job, education, contact timing absent from high-interest rules

Stage 1

Data Reduction & Preparation

The full campaign dataset contains 21 variables across 5 groups: client demographics, contact details, campaign information, socio-economic indicators, and the subscription outcome. Only the 10 categorical variables relevant to the subscription decision were retained: job, marital status, education, housing loan, personal loan, contact method, contact month, contact day, number of prior contacts, and the outcome. Numerical variables were excluded because Apriori operates on discrete items (attribute-value pairs), so numerical variables would first have to be discretized. Two categorical variables, credit default status and prior campaign outcome, were excluded due to high proportions of missing values. The remaining 7% missingness was handled via complete-case analysis, retaining all 41,188 instances.
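A minimal Python sketch of the encoding step, assuming each record arrives as a dictionary; the column names and values below are hypothetical and cover only a few of the 10 retained variables. Each record becomes a set of `variable=value` items, similar to the item labels arules produces when a data frame of factors is coerced to transactions, while numerical fields such as `age` are simply left out. This illustrates the idea and is not the analysis's actual R pipeline.

```python
# Hypothetical subset of the retained categorical variables (illustration only).
CATEGORICAL = ["marital", "loan", "contact", "y"]


def to_transactions(records, columns=CATEGORICAL):
    """Encode each record as a frozenset of 'variable=value' items.

    Only the listed categorical columns are kept; numerical fields
    (e.g. 'age') are dropped, since Apriori works on discrete items.
    """
    return [frozenset(f"{col}={rec[col]}" for col in columns) for rec in records]


records = [
    {"age": 41, "marital": "married", "loan": "no", "contact": "cellular", "y": "no"},
    {"age": 29, "marital": "single", "loan": "yes", "contact": "telephone", "y": "yes"},
]
transactions = to_transactions(records)
print(sorted(transactions[0]))
# ['contact=cellular', 'loan=no', 'marital=married', 'y=no']
```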

Stage 2

Apriori Rule Mining

The Apriori algorithm was applied via the arules package in R with minimum support and confidence thresholds of 0.40 and 0.80 respectively, and rule length bounded between 1 and 6 items. The right-hand side was fixed to the subscription outcome variable — constraining all generated rules to the campaign's primary business question. This configuration produced 13 association rules, each characterized by four interestingness measures: support (joint probability of LHS and RHS), confidence (conditional probability of RHS given LHS), lift (ratio of confidence to marginal probability of RHS), and standardized lift.
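The configuration above can be sketched in plain Python as a level-wise Apriori search restricted to rules whose right-hand side is a single outcome item. This is an illustrative re-implementation under stated assumptions (transactions as frozensets of `variable=value` items, a hypothetical outcome item `y=no`, hypothetical function names), not the arules `apriori()` call used in the analysis. Because every candidate itemset includes the outcome item, only LHS combinations whose joint support with the outcome clears the threshold are extended, using the Apriori property that every subset of a frequent itemset is itself frequent.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori_fixed_rhs(transactions, rhs, min_supp=0.40, min_conf=0.80, max_len=6):
    """Mine rules LHS => {rhs}; returns (lhs, support, confidence, lift) tuples.

    Rule length counts the RHS item, so an LHS has at most max_len - 1 items.
    """
    rhs_set = frozenset([rhs])
    p_rhs = support(rhs_set, transactions)
    rules = []
    # Length-1 rule {} => {rhs}: its support and confidence both equal P(rhs).
    if p_rhs >= min_supp and p_rhs >= min_conf:
        rules.append((frozenset(), p_rhs, p_rhs, 1.0))
    items = sorted({i for t in transactions for i in t} - rhs_set)
    # Level 1: single LHS items that co-occur with the RHS often enough.
    level = [frozenset([i]) for i in items
             if support(frozenset([i]) | rhs_set, transactions) >= min_supp]
    k = 1
    while level and k < max_len:
        for lhs in level:
            supp = support(lhs | rhs_set, transactions)
            conf = supp / support(lhs, transactions)
            if conf >= min_conf:  # confidence filters rules but does not prune the search
                rules.append((lhs, supp, conf, conf / p_rhs))
        frequent = set(level)
        # Join step: combine frequent k-item LHSs into (k+1)-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        level = [c for c in candidates
                 if support(c | rhs_set, transactions) >= min_supp]
        k += 1
    return rules


toy = [
    frozenset({"marital=married", "loan=no", "y=no"}),
    frozenset({"marital=married", "loan=no", "y=no"}),
    frozenset({"marital=married", "loan=yes", "y=no"}),
    frozenset({"marital=single", "loan=no", "y=yes"}),
    frozenset({"marital=married", "loan=no", "y=no"}),
]
for lhs, supp, conf, lift in apriori_fixed_rhs(toy, "y=no"):
    print(sorted(lhs), round(supp, 2), round(conf, 2), round(lift, 2))
# {loan=no} alone is not reported: its confidence (0.6 / 0.8 = 0.75) is below 0.80.
```

Including the empty-LHS rule mirrors a minimum rule length of 1; with an 88.9% non-subscription rate, that baseline rule would itself clear both thresholds.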

Stage 3

Rule Evaluation & Ranking

All 13 rules were ranked by standardized lift using the McNicholas, Murphy, and O'Regan (2008) formulation, which normalizes lift against its theoretical minimum and maximum given the rule's support and confidence thresholds — enabling meaningful comparison across rules with different characteristics. Rules with standardized lift below 0.40 were excluded as insufficiently interesting. The 6 retained rules form a coherent, interpretable set of client profiles associated with non-subscription, each grounded in attribute combinations that are both statistically notable and actionable for campaign managers.
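For reference, the McNicholas, Murphy, and O'Regan (2008) standardization can be written as follows, with A the LHS, B the RHS, and s and c the minimum support and confidence thresholds (0.40 and 0.80 here). This is the form in which the bounds are usually stated; it should be checked against the original paper before reuse.

```latex
\[
L(A \Rightarrow B) = \frac{P(A \cap B)}{P(A)\,P(B)} = \frac{\operatorname{conf}(A \Rightarrow B)}{P(B)},
\qquad
\upsilon = \frac{1}{\max\{P(A),\,P(B)\}}
\]
\[
\lambda = \max\left\{ \frac{P(A) + P(B) - 1}{P(A)\,P(B)},\; \frac{4s}{(1+s)^2},\; \frac{s}{P(A)\,P(B)},\; \frac{c}{P(B)} \right\},
\qquad
\text{standardized lift} = \frac{L - \lambda}{\upsilon - \lambda}
\]
```

Lift equals λ at the lowest value permitted by the marginals and thresholds and υ at the highest, so the standardized value places each rule on a [0, 1] scale.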

Key Methodological Choices

Standardized lift over raw lift for cross-rule comparison

Raw lift has different theoretical upper and lower bounds depending on the marginal probabilities of the LHS and RHS — meaning two rules with identical lift values may have very different levels of interestingness relative to their own feasible range. Standardized lift normalizes each rule's lift to a [0, 1] scale anchored to its own bounds, enabling direct comparison across rules with different support and confidence profiles. This is critical when ranking rules for business prioritization, where the goal is to surface the most genuinely surprising and actionable associations rather than those that happen to have favorable marginal probabilities.
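A minimal Python sketch of the computation and the 0.40 filter, assuming the bounds as usually stated for McNicholas et al. (2008). P(A) is recovered as support / confidence, P(B) is taken as 0.889 from the reported 88.9% non-subscription rate, and the inputs are the published (rounded) figures for Rules 1 and 3; the function name is hypothetical.

```python
def standardized_lift(supp, conf, p_rhs, min_supp=0.40, min_conf=0.80):
    """Standardized lift from a rule's support and confidence, the RHS
    marginal P(B), and the mining thresholds (bounds as usually stated)."""
    p_lhs = supp / conf  # P(A) = P(A and B) / P(B | A)
    lift = conf / p_rhs
    upper = 1 / max(p_lhs, p_rhs)
    lower = max(
        (p_lhs + p_rhs - 1) / (p_lhs * p_rhs),
        4 * min_supp / (1 + min_supp) ** 2,
        min_supp / (p_lhs * p_rhs),
        min_conf / p_rhs,
    )
    return (lift - lower) / (upper - lower)


P_NO = 0.889  # share of clients who did not subscribe (88.9%)
rules = {"Rule 1": (0.489, 0.920), "Rule 3": (0.545, 0.899)}  # (support, confidence)
scores = {name: standardized_lift(s, c, P_NO) for name, (s, c) in rules.items()}
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
high_interest = [(name, round(v, 3)) for name, v in ranked if v > 0.40]
print(high_interest)
```

With these inputs the sketch gives roughly 0.600 for Rule 1 and 0.448 for Rule 3, close to the reported 0.602 and 0.450; the small gaps presumably reflect rounding of the published support and confidence values.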

Complete-case analysis over imputation given low missingness

With only 7% of records containing missing values, complete-case analysis discards relatively little data, and it leaves support and confidence estimates unbiased provided the values are missing completely at random, that is, missingness is not systematically related to particular variables, subgroups, or the outcome. Imputing categorical variables also carries a specific risk in association rule mining: imputed values introduce artificial patterns. Mode imputation, for example, assigns every missing value to the most frequent category, which can inflate support or confidence for rules involving that category. Retaining only complete cases avoids this artifact while preserving the vast majority of the data's signal, making complete-case analysis the methodologically cleaner choice here.
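The imputation risk can be illustrated with a toy Python sketch, assuming missing categorical values are coded as the string `unknown` (an assumption; the encoding is not stated here). The records and helper names are hypothetical. The sketch compares one rule's support on complete cases with its support after mode imputation.

```python
from collections import Counter

MISSING = "unknown"  # assumed code for a missing categorical value


def complete_cases(records, columns):
    """Keep only records with no missing value in any of the given columns."""
    return [r for r in records if all(r[c] != MISSING for c in columns)]


def mode_impute(records, column):
    """Replace missing values in one column with that column's most frequent value."""
    mode = Counter(r[column] for r in records if r[column] != MISSING).most_common(1)[0][0]
    return [{**r, column: mode} if r[column] == MISSING else r for r in records]


def rule_support(records, lhs, rhs):
    """Share of records matching every (column, value) pair in lhs and rhs."""
    pairs = {**lhs, **rhs}
    return sum(all(r[c] == v for c, v in pairs.items()) for r in records) / len(records)


records = [
    {"education": "university.degree", "y": "no"},
    {"education": "university.degree", "y": "no"},
    {"education": "high.school", "y": "yes"},
    {"education": MISSING, "y": "no"},
]
lhs, rhs = {"education": "university.degree"}, {"y": "no"}
cc = complete_cases(records, ["education", "y"])
imputed = mode_impute(records, "education")
print(len(cc) / len(records))                 # 0.75 of records are complete
print(round(rule_support(cc, lhs, rhs), 3))   # 0.667 on complete cases
print(rule_support(imputed, lhs, rhs))        # 0.75 after filling the gap with the mode
```

The filled-in record is counted toward the modal category, so the rule's support rises from about 0.667 to 0.75 without any new evidence; complete-case analysis avoids this at the cost of discarding the incomplete record.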

Fixing the RHS to the subscription outcome variable

Association rule mining without a fixed consequent generates rules with any item, or combination of items, as the consequent, producing a large number of rules that have no bearing on the campaign's primary question. By constraining the right-hand side to the subscription outcome, every generated rule is directly relevant to the business problem: which client attributes predict whether a client will or will not subscribe. This constraint substantially reduces the rule space and removes the need to filter out irrelevant rules post hoc, making the analysis more focused, interpretable, and immediately actionable for campaign managers.
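The size of the reduction can be illustrated with a short Python sketch (hypothetical helper and items) that enumerates the rules obtainable from a single frequent itemset. With k items, any non-empty proper subset can serve as the consequent, giving 2^k - 2 candidate rules, whereas fixing the consequent to the outcome item leaves exactly one.

```python
from itertools import combinations


def candidate_rules(itemset, rhs=None):
    """Enumerate the rules X => Y obtainable from one frequent itemset.

    Without a constraint, any non-empty proper subset can be the consequent;
    with rhs fixed, only the rule (itemset - {rhs}) => {rhs} remains.
    """
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for cons in combinations(items, k):
            cons = frozenset(cons)
            if rhs is None or cons == {rhs}:
                rules.append((frozenset(itemset) - cons, cons))
    return rules


itemset = {"marital=married", "loan=no", "contact=cellular", "y=no"}
print(len(candidate_rules(itemset)))          # 2**4 - 2 = 14
print(len(candidate_rules(itemset, "y=no")))  # 1
```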

Tech Stack

Technology · Purpose
R · Statistical analysis environment and primary implementation language
arules (R package) · Apriori algorithm execution, rule generation, and interestingness measure computation
Apriori Algorithm · Frequent itemset mining and association rule discovery across categorical client attributes
Standardized Lift · McNicholas et al. (2008) rule interestingness metric for fair cross-rule ranking

Results & Metrics

What the analysis reveals

6

High-Interest Rules

Filtered from 13 generated rules using standardized lift threshold of 0.40

0.602

Highest Standardized Lift

Married clients with no prior contact — the strongest non-subscription signal in the dataset

4

Key Influential Variables

Marital status, loan status, contact method, and prior contact count are associated with non-subscription

🎯

Married clients with no prior contact are the strongest non-subscription profile

Rule 1 — married clients who had not been contacted before the campaign (support=0.489, confidence=0.920, standardized lift=0.602) — is the most interesting rule in the analysis. Rule 2 further refines this: married clients with no personal loan (standardized lift=0.492) are the second-strongest non-subscription signal. Together, these two rules suggest that marital status combined with low prior engagement or low financial exposure is a reliable predictor of campaign rejection.
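As a worked check, assuming the standardized-lift bounds of McNicholas et al. (2008) as usually stated, thresholds s = 0.40 and c = 0.80, and P(B) = 0.889 from the 88.9% non-subscription rate, Rule 1's reported figures give:

```latex
\[
P(A) = \frac{0.489}{0.920} \approx 0.532, \qquad
L = \frac{0.920}{0.889} \approx 1.035, \qquad
\upsilon = \frac{1}{\max\{0.532,\,0.889\}} \approx 1.125
\]
\[
\lambda = \max\{0.890,\; 0.816,\; 0.847,\; 0.900\} = \frac{0.80}{0.889} \approx 0.900
\]
\[
\text{standardized lift} = \frac{L - \lambda}{\upsilon - \lambda}
= \frac{(0.920 - 0.80)/0.889}{(1 - 0.80)/0.889}
= \frac{0.920 - 0.80}{1 - 0.80} = 0.60
\]
```

Because the confidence threshold sets the lower bound and P(B) sets the upper bound for this rule, its standardized lift reduces to (confidence - 0.80) / (1 - 0.80); the gap to the reported 0.602 is consistent with the confidence having been rounded to 0.920.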

💍

Marital status alone is a consistent non-subscription signal

Rule 3 shows that married clients as a group, regardless of loan or contact history, tend not to subscribe (support=0.545, confidence=0.899, standardized lift=0.450): 89.9% of married clients declined, slightly above the 88.9% baseline (lift ≈ 0.899 / 0.889 ≈ 1.01). With 60.6% of the dataset being married clients, this is a high-coverage finding. Rules 1 and 2 show that additional attributes (no prior contact, no personal loan) amplify this tendency, making marital status the single most consistent attribute across the top three rules.

🏠

Housing loan and personal loan status both appear in non-subscription rules

Rule 4 identifies clients with a housing loan as non-subscription-prone (standardized lift=0.428), while Rule 6 shows that clients with no personal loan and no prior contact are also a high-risk non-conversion group (standardized lift=0.417). These rules may appear to point in opposite directions on loan status, but they concern different loans: Rule 4 involves holding a housing loan, while Rule 6 involves not holding a personal loan, and only in combination with no prior contact. Each rule should therefore be read as a complete attribute profile rather than as a standalone loan effect.

📵

Job, education, contact timing, and day of week do not appear in any high-interest rule

Variables commonly assumed to drive campaign success, including job type, education level, the month of contact, and the day of the week, did not appear in any high-interest rule. This should be read alongside the 0.40 minimum support threshold: an attribute value can only appear in a rule if it co-occurs with the outcome in at least 40% of records, so variables whose values are spread across many categories rarely reach that level. Within that limit, it is still an actionable finding: segmenting or timing outreach by these variables is not supported by the mined rules, and future campaign design should redirect attention toward the variables the analysis does identify as meaningful.

📋

Six rules provide a directly deployable targeting exclusion framework

The six high-interest rules collectively define a set of client profiles that the bank can use to deprioritize outreach in future campaigns, with the aim of reducing wasted calls, improving conversion rates among the clients who are contacted, and lowering campaign costs. The rules are interpretable by non-technical campaign managers without additional transformation: each rule is a readable if-then statement about client attributes and their expected response to the term deposit offer.