Bank Campaign Subscription Analysis
Apriori association rule mining on 41,188 bank client records to surface which attribute combinations most strongly predict term deposit non-subscription — delivering actionable targeting intelligence for future marketing campaigns.
41,188
Client Records Analyzed
13
Association Rules Generated
0.602
Highest Standardized Lift
The Problem
The bank's direct marketing campaign contacted clients indiscriminately, with no data-driven understanding of which client profiles are predisposed to reject a term deposit subscription.
Phone-based marketing campaigns are resource-intensive: every call costs time, money, and risks client attrition if the outreach is poorly targeted. A Portuguese bank ran a direct marketing campaign across 41,188 clients — yet the dataset reveals that 88.9% of those clients did not subscribe to the term deposit being offered. Without a way to identify which client attributes are associated with non-subscription, campaign managers are forced to contact clients who are unlikely to convert, diluting campaign efficiency and wasting outreach capacity. The core gap is the absence of any systematic, data-driven understanding of which combinations of client characteristics — marital status, loan history, contact method, prior engagement — reliably predict whether a client will reject the offer.
The Solution
Apriori association rule mining on 41,188 client records, ranked by standardized lift to surface the attribute combinations that most reliably predict non-subscription
The analysis applies the Apriori algorithm — via the arules package in R — to a reduced dataset of 10 categorical variables drawn from the full 21-variable campaign record. Variables with high proportions of missing data were excluded, and a complete-case approach was adopted given that missing values represented only 7% of the dataset. Minimum support and confidence thresholds of 0.40 and 0.80 were applied, with the right-hand side of every rule fixed to the subscription outcome variable. The algorithm generated 13 association rules, which were then ranked using standardized lift — a formulation introduced by McNicholas, Murphy, and O'Regan (2008) that accounts for the varying upper and lower bounds of lift across rules, enabling fair cross-rule comparison. Rules with standardized lift above 0.40 were retained as high-interest findings, yielding 6 actionable rules identifying the client profiles most likely to decline the term deposit offer.
Key Outcome
Six high-interest association rules identified from 41,188 client records — with married clients who had no prior contact history emerging as the strongest non-subscription signal (standardized lift 0.602) — providing the bank with a data-driven profile of clients to deprioritize in future campaigns and a clear set of variables, including marital status, loan history, and contact method, that meaningfully influence campaign outcomes.
Technical Deep Dive
Methodology & Analysis
Analytical Workflow
Stage 1 — Data Reduction & Preparation
Step 1
Variable Selection
21 variables reduced to 10 categorical · Numerical variables excluded · High-missingness variables dropped
Step 2
Missing Data Handling
7% missingness · Complete-case approach adopted · Incomplete records excluded rather than imputed
Step 3
Descriptive Profiling
Distribution of job, marital, education, loan, contact · 88.9% non-subscription rate confirmed · No significant correlations detected
Stage 2 — Apriori Rule Mining · arules (R)
Parameters
Threshold Configuration
Min support = 0.40 · Min confidence = 0.80 · Rule length 1–6 items
Constraint
Fixed RHS
Right-hand side fixed to subscription outcome · Focuses all rules on campaign target variable
Output
13 Association Rules Generated
Each rule characterized by support, confidence, lift, and standardized lift · LHS = client attribute combinations · RHS = subscription outcome
Stage 3 — Rule Evaluation & Ranking
Ranking Metric
Standardized Lift
McNicholas et al. (2008) formulation · Accounts for varying lift bounds across rules · Enables fair cross-rule comparison
Filter
Std. Lift > 0.40 Threshold
13 rules reduced to 6 high-interest rules · Lower-interest rules excluded from reporting
Findings
6 High-Interest Rules — Actionable Campaign Intelligence
Marital status, loan history & contact method identified as key non-subscription signals · Job, education & contact timing absent from all high-interest rules
Stage 1
Data Reduction & Preparation
The full campaign dataset contains 21 variables across 5 groups: client demographics, contact details, campaign information, socio-economic indicators, and the subscription outcome. Only the 10 categorical variables relevant to the subscription decision were retained: job, marital status, education, housing loan, personal loan, contact method, contact month, contact day, number of prior contacts, and the outcome. Numerical variables were excluded because association rule mining operates on categorical data. Two categorical variables (credit default status and prior campaign outcome) were excluded due to high proportions of missing values. The remaining missingness, affecting about 7% of records, was handled via complete-case analysis: incomplete records were excluded from the 41,188-record dataset rather than imputed.
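The order of the reduction steps matters, and a small sketch makes that visible. This is a hypothetical Python illustration, not the R code used in the analysis: it assumes records are dictionaries, that missing values are coded as the string `"unknown"`, and all column values are invented.

```python
# Sketch of the data-reduction step. Assumptions: records are dicts, missing
# values are coded as the string "unknown", and all values are hypothetical.
MISSING = "unknown"

records = [
    {"job": "admin.", "marital": "married", "default": "unknown", "housing": "yes", "loan": "no", "y": "no"},
    {"job": "unknown", "marital": "single", "default": "no", "housing": "no", "loan": "no", "y": "yes"},
    {"job": "services", "marital": "married", "default": "unknown", "housing": "unknown", "loan": "unknown", "y": "no"},
    {"job": "technician", "marital": "divorced", "default": "no", "housing": "no", "loan": "yes", "y": "no"},
]

# High-missingness variable removed before any rows are filtered.
EXCLUDED = {"default"}


def reduce_and_complete(rows, excluded, missing=MISSING):
    """Drop excluded columns, then keep only rows with no missing values."""
    reduced = [{k: v for k, v in row.items() if k not in excluded} for row in rows]
    return [row for row in reduced if missing not in row.values()]


clean = reduce_and_complete(records, EXCLUDED)
print(f"{len(records)} records -> {len(clean)} complete cases")
```

Dropping `default` before filtering is what lets the first record survive: its only missing value was in the excluded column. Reversing the order would discard it as well.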
Stage 2
Apriori Rule Mining
The Apriori algorithm was applied via the arules package in R with minimum support and confidence thresholds of 0.40 and 0.80 respectively, and rule length bounded between 1 and 6 items. The right-hand side was fixed to the subscription outcome variable — constraining all generated rules to the campaign's primary business question. This configuration produced 13 association rules, each characterized by four interestingness measures: support (joint probability of LHS and RHS), confidence (conditional probability of RHS given LHS), lift (ratio of confidence to marginal probability of RHS), and standardized lift.
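The mechanics of this stage can be sketched in plain Python. This is a from-scratch illustration of Apriori with a fixed right-hand side, not the arules implementation the analysis ran. The `variable=value` item encoding, the `y=no` target label, and the ten transactions are hypothetical; the thresholds (0.40 support, 0.80 confidence, at most 6 items) match the ones reported for this analysis.

```python
from itertools import combinations

# Thresholds taken from the analysis; everything else is a toy illustration.
MIN_SUPPORT, MIN_CONF, MAX_LEN = 0.40, 0.80, 6
TARGET = "y=no"  # hypothetical item encoding of the non-subscription outcome

transactions = [
    {"marital=married", "loan=no", "y=no"},
    {"marital=married", "loan=no", "y=no"},
    {"marital=married", "loan=yes", "y=no"},
    {"marital=married", "loan=no", "y=no"},
    {"marital=single", "loan=no", "y=yes"},
    {"marital=married", "loan=no", "y=yes"},
    {"marital=single", "loan=no", "y=no"},
    {"marital=married", "loan=no", "y=no"},
    {"marital=single", "loan=yes", "y=no"},
    {"marital=married", "loan=no", "y=no"},
]


def apriori(rows, min_support, max_len):
    """Level-wise search: frequent k-itemsets are built only from frequent (k-1)-itemsets."""
    tsets = [frozenset(r) for r in rows]

    def support(itemset):
        return sum(itemset <= t for t in tsets) / len(tsets)

    items = {i for t in tsets for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in level}
    k = 2
    while level and k <= max_len:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in level})
        k += 1
    return frequent


def rules_for_target(frequent, target, min_conf):
    """Emit LHS => target rules; rules with an empty LHS are skipped in this sketch."""
    rhs = frozenset([target])
    rules = []
    for itemset, supp in frequent.items():
        if rhs < itemset:  # itemset holds the target plus at least one other item
            lhs = itemset - rhs
            conf = supp / frequent[lhs]  # P(RHS | LHS)
            lift = conf / frequent[rhs]  # confidence / P(RHS)
            if conf >= min_conf:
                rules.append((lhs, supp, conf, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)


frequent = apriori(transactions, MIN_SUPPORT, MAX_LEN)
rules = rules_for_target(frequent, TARGET, MIN_CONF)
for lhs, supp, conf, lift in rules:
    print(sorted(lhs), "=>", TARGET, f"support={supp:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

On this toy data, `{marital=married} => {y=no}` is kept with confidence 6/7 (about 0.857), while `{loan=no} => {y=no}` is frequent but falls below the 0.80 confidence bar at 0.75.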
Stage 3
Rule Evaluation & Ranking
All 13 rules were ranked by standardized lift using the McNicholas, Murphy, and O'Regan (2008) formulation, which rescales each rule's lift between its theoretical minimum and maximum (determined by the marginal probabilities of the rule's LHS and RHS and by the minimum support and confidence thresholds), so that rules with different characteristics can be compared meaningfully. Rules with standardized lift below 0.40 were excluded as insufficiently interesting. The 6 retained rules form a coherent, interpretable set of client profiles associated with non-subscription, each grounded in attribute combinations that are both statistically notable and actionable for campaign managers.
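Written out, the standardization can be sketched as follows, where A is the rule's LHS, B its RHS (the subscription outcome), and s and c are the minimum support and confidence (0.40 and 0.80 here). The upper bound follows from P(A ∩ B) ≤ min{P(A), P(B)}; the lower-bound terms follow from P(A ∩ B) ≥ P(A) + P(B) − 1, P(A ∩ B) ≥ s, and P(A ∩ B)/P(A) ≥ c, plus a term in s alone that holds regardless of the marginals. This reflects the McNicholas, Murphy, and O'Regan (2008) formulation as commonly stated; the paper remains the authoritative source.

```latex
\text{lift}(A \Rightarrow B) = \frac{P(A \cap B)}{P(A)\,P(B)}

\upsilon = \frac{1}{\max\{P(A),\, P(B)\}}

\lambda = \max\left\{ \frac{P(A) + P(B) - 1}{P(A)\,P(B)},\; \frac{4s}{(1+s)^2},\; \frac{s}{P(A)\,P(B)},\; \frac{c}{P(B)} \right\}

\text{standardized lift} = \frac{\text{lift}(A \Rightarrow B) - \lambda}{\upsilon - \lambda}
```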
Key Methodological Choices
Standardized lift over raw lift for cross-rule comparison
Raw lift has different theoretical upper and lower bounds depending on the marginal probabilities of the LHS and RHS — meaning two rules with identical lift values may have very different levels of interestingness relative to their own feasible range. Standardized lift normalizes each rule's lift to a [0, 1] scale anchored to its own bounds, enabling direct comparison across rules with different support and confidence profiles. This is critical when ranking rules for business prioritization, where the goal is to surface the most genuinely surprising and actionable associations rather than those that happen to have favorable marginal probabilities.
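The claim that identical lift values can mean different things is easy to check numerically. The sketch below computes standardized lift for two hypothetical rules that share the same RHS and the same raw lift of 1.05 but differ in how many clients their LHS covers. The bound terms mirror the McNicholas et al. (2008) formulation as commonly stated; apart from the 0.40 and 0.80 thresholds, all probabilities are invented.

```python
def standardized_lift(p_a, p_b, p_ab, s, c):
    """Return (lift, standardized lift) for A => B given marginals and thresholds."""
    lift = p_ab / (p_a * p_b)
    upper = 1.0 / max(p_a, p_b)              # P(A and B) <= min(P(A), P(B))
    lower = max(
        (p_a + p_b - 1.0) / (p_a * p_b),     # P(A and B) >= P(A) + P(B) - 1
        4.0 * s / (1.0 + s) ** 2,            # marginal-free bound implied by support >= s
        s / (p_a * p_b),                     # P(A and B) >= s
        c / p_b,                             # P(B | A) >= c
    )
    return lift, (lift - lower) / (upper - lower)


S, C = 0.40, 0.80  # thresholds used in the analysis
P_B = 0.90         # hypothetical RHS rate, close to the reported 88.9%

# Two hypothetical rules with the same raw lift (1.05) but different LHS coverage.
lift_x, std_x = standardized_lift(p_a=0.50, p_b=P_B, p_ab=0.4725, s=S, c=C)
lift_y, std_y = standardized_lift(p_a=0.80, p_b=P_B, p_ab=0.7560, s=S, c=C)
print(f"rule X: lift={lift_x:.3f}, standardized={std_x:.3f}")
print(f"rule Y: lift={lift_y:.3f}, standardized={std_y:.3f}")
```

Rule Y covers 80% of clients, so its lift floor is already close to 1 and the same raw lift of 1.05 sits lower within its feasible range (0.56 versus 0.725 for rule X).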
Complete-case analysis over imputation given low missingness
With only 7% of records containing missing values, complete-case analysis discards a small share of the data and is unlikely to introduce substantial bias, provided missingness is not systematically concentrated in a single variable or subgroup. Imputing categorical variables also carries a specific risk in association rule mining: imputed values create artificial patterns that can inflate the support or confidence of rules involving the imputed variable. Retaining only complete cases avoids this artifact while preserving most of the data's signal, making complete-case analysis the methodologically cleaner choice here.
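The support-inflation risk can be reproduced with a toy column. In this sketch, mode imputation (one common strategy, used here only as an example; the analysis did not impute) fills two missing marital values with the most frequent category. That alone pushes `married` from below the 0.40 support threshold to above it, so it could enter rules it would otherwise be excluded from. The column values are hypothetical.

```python
from collections import Counter

# Hypothetical 20-client marital column; "unknown" marks a missing value.
marital = ["married"] * 7 + ["single"] * 6 + ["divorced"] * 5 + ["unknown"] * 2
MIN_SUPPORT = 0.40

observed = [v for v in marital if v != "unknown"]

# Complete-case: support is computed over the 18 observed values only.
support_complete = observed.count("married") / len(observed)

# Mode imputation: every missing value becomes the most frequent category.
mode = Counter(observed).most_common(1)[0][0]
imputed = [mode if v == "unknown" else v for v in marital]
support_imputed = imputed.count("married") / len(imputed)

print(f"complete-case support: {support_complete:.3f} (frequent: {support_complete >= MIN_SUPPORT})")
print(f"mode-imputed support:  {support_imputed:.3f} (frequent: {support_imputed >= MIN_SUPPORT})")
```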
Fixing the RHS to the subscription outcome variable
Association rule mining without a fixed consequent generates rules among any combination of attribute values, with any item allowed on either side, producing a large number of rules that have no bearing on the campaign's primary question. Constraining the right-hand side to the subscription outcome guarantees that every generated rule is directly relevant to the business problem: which client attributes predict whether a client will or will not subscribe. This constraint substantially reduces the rule space and removes the need to filter out irrelevant rules after the fact, making the analysis more focused, interpretable, and immediately actionable for campaign managers.
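The size of the reduction is easy to quantify. A frequent itemset of k items admits 2^k − 2 candidate rules when any non-empty subset may serve as the consequent, but at most one rule when the consequent is fixed to the outcome item. The sketch below counts both; the item names are hypothetical.

```python
from itertools import combinations


def unconstrained_rules(itemset):
    """Every X => Y split of an itemset with X and Y non-empty and disjoint."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


def fixed_rhs_rules(itemset, target):
    """With the consequent fixed, an itemset yields at most one rule: the rest => target."""
    if target not in itemset or len(itemset) < 2:
        return []
    return [(tuple(sorted(i for i in itemset if i != target)), (target,))]


# Hypothetical frequent itemset of four items, one of them the outcome.
itemset = {"marital=married", "loan=no", "contact=cellular", "y=no"}
print(len(unconstrained_rules(itemset)), "unconstrained vs",
      len(fixed_rhs_rules(itemset, "y=no")), "with fixed RHS")

# Candidate rules per itemset of size k: 2**k - 2 unconstrained vs at most 1 fixed.
for k in range(2, 7):
    print(k, 2 ** k - 2, 1)
```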
Tech Stack
| Technology | Purpose |
|---|---|
| R | Statistical analysis environment and primary implementation language |
| arules (R package) | Apriori algorithm execution, rule generation, and interestingness measure computation |
| Apriori Algorithm | Frequent itemset mining and association rule discovery across categorical client attributes |
| Standardized Lift | McNicholas et al. (2008) rule interestingness metric for fair cross-rule ranking |
Results & Metrics
What the analysis reveals
6
High-Interest Rules
Filtered from 13 generated rules using a standardized lift threshold of 0.40
0.602
Highest Standardized Lift
Married clients with no prior contact — the strongest non-subscription signal in the dataset
4
Key Influential Variables
Marital status, loan status, contact method, and prior contact count are most strongly associated with non-subscription
Married clients with no prior contact are the strongest non-subscription profile
Rule 1, married clients who had not been contacted before the campaign (support=0.489, confidence=0.920, standardized lift=0.602), is the most interesting rule in the analysis. Rule 2 is a closely related profile: married clients with no personal loan (standardized lift=0.492) are the second-strongest non-subscription signal. Together, these two rules suggest that marital status combined with low prior engagement or low financial exposure is a consistent marker of campaign rejection.
Marital status alone is a consistent non-subscription predictor
Rule 3 shows that married clients as a group, regardless of loan or contact history, tend not to subscribe (support=0.545, confidence=0.899, standardized lift=0.450). With 60.6% of the dataset being married clients, this is a high-coverage finding. Rules 1 and 2, which both outrank Rule 3 on standardized lift, show that adding a second attribute (no prior contact, no personal loan) strengthens this tendency; for Rule 1, confidence rises from 0.899 to 0.920. Marital status is therefore the most consistent predictor across the top three rules.
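The reported figures can be cross-checked against the definitions of support, confidence, and lift. The arithmetic below recovers each rule's LHS coverage and raw lift from its support and confidence, assuming the 88.9% non-subscription rate serves as the RHS probability; if the complete-case subset's rate differs slightly, the lifts shift accordingly.

```python
P_NO = 0.889  # reported non-subscription rate, used here as P(RHS)

reported = {
    "Rule 1 (married, no prior contact)": (0.489, 0.920),  # (support, confidence)
    "Rule 3 (married)": (0.545, 0.899),
}

derived = {}
for name, (support, confidence) in reported.items():
    coverage = support / confidence  # P(LHS) = P(LHS and RHS) / P(RHS | LHS)
    raw_lift = confidence / P_NO     # lift = P(RHS | LHS) / P(RHS)
    derived[name] = (coverage, raw_lift)
    print(f"{name}: P(LHS) = {coverage:.3f}, raw lift = {raw_lift:.3f}")
```

Rule 3's implied coverage of about 0.606 matches the 60.6% share of married clients. Both raw lifts sit only slightly above 1, as expected when the outcome is already this common, which is one reason a bound-aware measure such as standardized lift is useful for ranking.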
Housing loan and personal loan status both appear in high-interest rules
Rule 4 identifies clients with a housing loan as non-subscription-prone (standardized lift=0.428), while Rule 6 shows that clients with no personal loan and no prior contact are also a high-risk non-conversion group (standardized lift=0.417). The two rules concern different loan products, so they are not contradictory: having a housing loan on its own, and lacking a personal loan in combination with no prior contact, are each associated with campaign rejection.
Job, education, contact month, and day of week do not appear in any high-interest rule
Variables commonly assumed to drive campaign success, including job type, education level, the month of contact, and the day of the week, did not appear in any high-interest rule. This absence should be read with the mining thresholds in mind: with a minimum support of 0.40, an attribute value can only enter a rule if at least 40% of clients share it, and variables spread across many categories rarely have any single value that common. The finding is still actionable: at these thresholds, the data provide no support for segmenting or timing outreach by these variables, and future campaign design should concentrate on the variables the analysis does identify as meaningful.
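The effect of the support threshold on many-category variables can be shown directly. Because Apriori only extends itemsets whose items are already frequent, an attribute value shared by fewer than 40% of clients is pruned in the first pass and can never appear in a rule at a 0.40 minimum support. The sketch uses invented counts, not the campaign data.

```python
from collections import Counter


def frequent_items(variable, values, min_support):
    """Return the 'variable=value' items whose support meets the threshold."""
    n = len(values)
    return {f"{variable}={v}": count / n
            for v, count in Counter(values).items() if count / n >= min_support}


# Hypothetical 10-client sample: a variable with few categories and one with many.
marital = ["married"] * 6 + ["single"] * 3 + ["divorced"]
month = ["may"] * 3 + ["jul"] * 2 + ["aug"] * 2 + ["jun"] * 2 + ["nov"]

print(frequent_items("marital", marital, 0.40))  # one item clears the bar
print(frequent_items("month", month, 0.40))      # no month is shared by 40% of clients
print(frequent_items("month", month, 0.25))      # a lower threshold would admit month=may
```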
Six rules provide a directly deployable targeting exclusion framework
The six high-interest rules collectively define a set of client profiles that the bank can use to deprioritize outreach in future campaigns, which could reduce wasted calls, improve conversion rates among the clients who are contacted, and lower campaign costs. The rules are interpretable by non-technical campaign managers without additional transformation: each rule is a readable if-then statement about client attributes and their expected response to the term deposit offer.
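To illustrate how the rules could serve as exclusion flags, the sketch below encodes five of the six rules as attribute conditions, renders them as if-then statements, and reports which ones a given client matches. It is a hypothetical illustration, not part of the original analysis: the attribute names and values (including `previous = "0"` for "no prior contact") are assumptions, and Rule 5 is left out because its attributes are not described here.

```python
# Hypothetical encodings of the described rules as LHS conditions (RHS: y=no).
# Attribute names, values, and previous="0" for "no prior contact" are assumptions;
# Rule 5 is omitted because its attributes are not described.
RULES = {
    "Rule 1": {"marital": "married", "previous": "0"},
    "Rule 2": {"marital": "married", "loan": "no"},
    "Rule 3": {"marital": "married"},
    "Rule 4": {"housing": "yes"},
    "Rule 6": {"loan": "no", "previous": "0"},
}


def as_if_then(conditions):
    """Render a rule's LHS as a readable if-then statement."""
    lhs = " AND ".join(f"{k} = {v}" for k, v in conditions.items())
    return f"IF {lhs} THEN expect y = no"


def matching_rules(client, rules):
    """Names of rules whose every LHS condition holds for this client."""
    return [name for name, cond in rules.items()
            if all(client.get(k) == v for k, v in cond.items())]


client = {"marital": "married", "housing": "no", "loan": "no", "previous": "0"}
for name in matching_rules(client, RULES):
    print(name, "|", as_if_then(RULES[name]))
```

A client matching one or more rules would be a candidate for deprioritization; how many matches should trigger that is a business decision the rules themselves do not settle.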