Introduction to Model Interpretability
As machine learning models become more complex and are deployed in critical applications like healthcare, finance, and criminal justice, understanding why a model makes specific predictions becomes as important as the predictions themselves. This is where Explainable AI (XAI) comes in.
SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) are two of the most widely used frameworks for explaining black-box machine learning models. They help answer questions such as:
- Why did the model predict that this patient is at high risk?
- Which features contributed most to this loan being denied?
- How reliable are the model's decisions?
- Are there biases in the model's predictions?
Why Model Interpretability Matters
- Trust and adoption: Stakeholders need to trust AI decisions before adopting them
- Debugging: Understand why models fail and improve them
- Regulatory compliance: Regulations such as the EU's GDPR call for explanations of automated decisions
- Fairness and bias detection: Identify and mitigate discriminatory patterns
- Domain knowledge validation: Verify that the model learns sensible patterns
- Feature engineering: Discover which features matter most
- Model comparison: Compare different models beyond just accuracy metrics
SHAP: SHapley Additive exPlanations
SHAP is based on Shapley values from cooperative game theory. It assigns each feature an importance value for a particular prediction, showing how much each feature contributes to pushing the prediction away from the base value (average prediction).
Key Concepts
- Shapley values: Fair allocation of contribution from game theory
- Base value: Average prediction across all training data
- SHAP value: How much a feature changes the prediction from the base value
- Model-agnostic: Works with any machine learning model
- Additive: SHAP values sum to the difference between the base value and the model's prediction
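To make the game-theory idea concrete, here is a small self-contained sketch that computes Shapley values by brute force for a hypothetical three-feature model. The feature names and coalition values below are invented for illustration; SHAP itself uses much more efficient algorithms rather than enumerating coalitions.
from itertools import combinations
from math import factorial
features = ['income', 'credit_score', 'age']  # hypothetical feature names
# Hypothetical "value" of each coalition: the model's prediction when only these
# features are known. All numbers are made up for illustration.
coalition_value = {
    frozenset(): 0.50,                                   # base value
    frozenset({'income'}): 0.60,
    frozenset({'credit_score'}): 0.70,
    frozenset({'age'}): 0.52,
    frozenset({'income', 'credit_score'}): 0.85,
    frozenset({'income', 'age'}): 0.62,
    frozenset({'credit_score', 'age'}): 0.74,
    frozenset({'income', 'credit_score', 'age'}): 0.90,  # full-model prediction
}
n = len(features)
for f in features:
    others = [x for x in features if x != f]
    phi = 0.0
    # Average the feature's marginal contribution over every coalition that excludes it
    for size in range(n):
        for subset in combinations(others, size):
            S = frozenset(subset)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (coalition_value[S | {f}] - coalition_value[S])
    print(f"Shapley value for {f}: {phi:+.3f}")
# The three Shapley values sum to 0.90 - 0.50 = 0.40, the gap between the full
# prediction and the base value -- exactly the additive property listed above.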
Installation and Setup
# Install SHAP
pip install shap
# Also install required dependencies
pip install matplotlib numpy pandas scikit-learn
Basic SHAP Example
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
# Load data
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Split and train model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Create SHAP explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Explain a single prediction
print("Base value (average prediction):", explainer.expected_value[1])
print("Prediction for first test sample:", model.predict_proba(X_test.iloc[[0]])[0])
print("\nTop 5 features for first prediction:")
feature_importance = pd.DataFrame({
'feature': X_test.columns,
'shap_value': shap_values[1][0]
}).sort_values('shap_value', key=abs, ascending=False).head()
print(feature_importance)
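As a quick sanity check on the additive property, you can confirm that the base value plus the SHAP values reproduces the model's predicted probabilities. The indexing below follows the list-per-class output used in the example above; recent SHAP releases may instead return a single array with an extra class dimension.
# Additivity check: base value + sum of SHAP values ≈ predicted probability of class 1
reconstructed = explainer.expected_value[1] + shap_values[1].sum(axis=1)
predicted = model.predict_proba(X_test)[:, 1]
print("Max reconstruction error:", np.abs(reconstructed - predicted).max())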
SHAP Visualizations
import shap
import matplotlib.pyplot as plt
# 1. Force Plot - Explain single prediction
# Shows how each feature pushes prediction from base value
shap.force_plot(
    explainer.expected_value[1],
    shap_values[1][0],
    X_test.iloc[0],
    matplotlib=True,
    show=False  # keep the figure open so it can be saved
)
plt.savefig('shap_force_plot.png', bbox_inches='tight', dpi=150)
plt.close()
# 2. Waterfall Plot - Alternative single prediction view
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[1][0],
        base_values=explainer.expected_value[1],
        data=X_test.iloc[0],
        feature_names=X_test.columns.tolist()
    )
)
# 3. Summary Plot - Feature importance across all predictions
shap.summary_plot(shap_values[1], X_test, plot_type="bar", show=False)
plt.title("Global Feature Importance")
plt.tight_layout()
plt.savefig('shap_summary_bar.png', dpi=150)
plt.close()
# 4. Beeswarm Plot - Shows feature values and impact
shap.summary_plot(shap_values[1], X_test, show=False)
plt.title("SHAP Summary Plot")
plt.savefig('shap_beeswarm.png', bbox_inches='tight', dpi=150)
plt.close()
# 5. Dependence Plot - How feature value affects prediction
shap.dependence_plot(
    "mean radius",  # Feature to analyze
    shap_values[1],
    X_test,
    interaction_index="mean texture",  # Color by interaction
    show=False
)
plt.savefig('shap_dependence.png', bbox_inches='tight', dpi=150)
plt.close()
LIME: Local Interpretable Model-agnostic Explanations
LIME explains individual predictions by fitting a simple, interpretable model (like linear regression) locally around the prediction. It perturbs the input data and observes how predictions change, then learns a simple model to approximate the complex model's behavior in that local region.
How LIME Works
- Select an instance to explain
- Generate perturbed samples around this instance
- Get predictions for these perturbed samples from the black-box model
- Fit a simple, interpretable model (like linear regression) on this local data
- Use the simple model's coefficients to explain the prediction (sketched below)
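The steps above can be sketched in a few lines with NumPy and scikit-learn. This is a simplified illustration rather than the real LIME implementation (which adds discretization, smarter sampling, kernel tuning, and feature selection), and the function name and parameters here are hypothetical:
import numpy as np
from sklearn.linear_model import Ridge
def lime_sketch(predict_proba, instance, X_train, num_samples=5000, kernel_width=0.75):
    """Simplified local surrogate explanation for a single instance (illustration only)."""
    rng = np.random.default_rng(0)
    instance = np.asarray(instance, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    scale = X_train.std(axis=0)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant features
    # Steps 1-2: generate perturbed samples around the instance
    noise = rng.normal(0.0, 1.0, size=(num_samples, instance.shape[0]))
    perturbed = instance + noise * scale
    # Step 3: query the black-box model on the perturbed samples
    preds = predict_proba(perturbed)[:, 1]  # probability of the positive class
    # Weight samples by proximity to the original instance (exponential kernel)
    distances = np.linalg.norm((perturbed - instance) / scale, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width * np.sqrt(instance.shape[0])) ** 2)
    # Step 4: fit a weighted, interpretable surrogate model on the local data
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(perturbed, preds, sample_weight=weights)
    # Step 5: the surrogate's coefficients are the local explanation
    return surrogate.coef_
# Hypothetical usage with a trained classifier `model` and feature matrix `X`:
# local_importance = lime_sketch(model.predict_proba, X.iloc[0].values, X.values)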
Installation
# Install LIME
pip install lime
LIME for Tabular Data
import lime
import lime.lime_tabular
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
# Load and prepare data
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Create LIME explainer
explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=np.array(X),
feature_names=X.columns,
class_names=['malignant', 'benign'],
mode='classification'
)
# Explain a prediction
idx = 0
explanation = explainer.explain_instance(
data_row=X.iloc[idx].values,
predict_fn=model.predict_proba,
num_features=10
)
# Display explanation
print("Prediction:", model.predict_proba(X.iloc[[idx]])[0])
print("\nFeature contributions:")
print(explanation.as_list())
# Visualize explanation
explanation.show_in_notebook()
# Or save to file
explanation.save_to_file('lime_explanation.html')
LIME for Text Classification
import lime
from lime.lime_text import LimeTextExplainer
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
# Sample text data
texts = [
"This movie is fantastic! Great acting and plot.",
"Terrible film. Waste of time and money.",
"Amazing cinematography and storytelling.",
"Boring and predictable. Very disappointing."
]
labels = [1, 0, 1, 0] # 1 = positive, 0 = negative
# Create and train pipeline
pipeline = make_pipeline(
TfidfVectorizer(),
LogisticRegression()
)
pipeline.fit(texts, labels)
# Create LIME explainer for text
explainer = LimeTextExplainer(class_names=['negative', 'positive'])
# Explain a prediction
text = "This film is absolutely wonderful and entertaining!"
explanation = explainer.explain_instance(
text,
pipeline.predict_proba,
num_features=6
)
# Show which words contributed to the prediction
print("Prediction probabilities:", pipeline.predict_proba([text])[0])
print("\nWord contributions:")
for word, weight in explanation.as_list():
print(f" {word}: {weight:.4f}")
# Visualize
explanation.show_in_notebook()
explanation.save_to_file('lime_text_explanation.html')
Comparing SHAP and LIME
SHAP Advantages
- Theoretical foundation: Based on solid game theory principles
- Consistency: Explanations are deterministic for exact explainers (such as TreeExplainer) and satisfy theoretical consistency guarantees
- Global view: Easy to aggregate explanations across dataset
- Fast for tree models: TreeExplainer is very efficient
- Additive property: SHAP values sum to the prediction difference
LIME Advantages
- Intuitive: Easier to understand conceptually
- Flexible: Works well with text and image data
- Fast explanations: Quick for individual predictions
- Model-agnostic: Works with any black-box model
- Local focus: Excellent for explaining specific instances
Side-by-Side Comparison
import shap
import lime.lime_tabular
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt
# Load data and train model
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)
# Instance to explain
instance_idx = 0
instance = X.iloc[instance_idx]
# SHAP Explanation
shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X.iloc[[instance_idx]])
shap_features = pd.DataFrame({
'feature': X.columns,
'shap_value': shap_values[0]
}).sort_values('shap_value', key=abs, ascending=False).head(10)
# LIME Explanation
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
training_data=np.array(X),
feature_names=X.columns.tolist(),
class_names=['malignant', 'benign'],
mode='classification'
)
lime_exp = lime_explainer.explain_instance(
instance.values,
model.predict_proba,
num_features=10
)
lime_features = pd.DataFrame(lime_exp.as_list(), columns=['feature', 'lime_value'])
# Compare
print("Model prediction:", model.predict_proba(X.iloc[[instance_idx]])[0])
print("\nTop features by SHAP:")
print(shap_features)
print("\nTop features by LIME:")
print(lime_features)
# Both methods often agree on the most important features
# but may differ in exact importance values
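A rough way to quantify that agreement is to check how many of SHAP's top features also appear in LIME's rules. The snippet below reuses shap_features and lime_features from the comparison above; because LIME returns rule strings such as "mean radius > 13.50", it matches by substring, which is only an approximation.
# Count SHAP top features that also appear somewhere in LIME's top rules
shap_top = set(shap_features['feature'])
lime_rules = lime_features['feature'].tolist()
overlap = [f for f in shap_top if any(f in rule for rule in lime_rules)]
print(f"Features in both top-10 lists: {len(overlap)} of {len(shap_top)}")
print(sorted(overlap))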
Practical Applications
1. Credit Risk Assessment
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
# Simulate credit data
np.random.seed(42)
n_samples = 1000
credit_data = pd.DataFrame({
'income': np.random.normal(50000, 20000, n_samples),
'age': np.random.randint(18, 70, n_samples),
'credit_score': np.random.randint(300, 850, n_samples),
'debt_to_income': np.random.uniform(0, 1, n_samples),
'employment_years': np.random.randint(0, 40, n_samples),
'num_credit_cards': np.random.randint(0, 10, n_samples)
})
# Create target (approved/denied)
credit_data['approved'] = (
(credit_data['credit_score'] > 650) &
(credit_data['debt_to_income'] < 0.5) &
(credit_data['income'] > 30000)
).astype(int)
# Train model
X = credit_data.drop('approved', axis=1)
y = credit_data['approved']
model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)
# Explain a denied application
denied_idx = y[y == 0].index[0]
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[denied_idx]])
print("Application Status: DENIED")
print(f"Probability of approval: {model.predict_proba(X.iloc[[denied_idx]])[0][1]:.2%}")
print("\nFactors contributing to denial:")
explanation_df = pd.DataFrame({
'Feature': X.columns,
'Value': X.iloc[denied_idx].values,
'Impact': shap_values[0]
}).sort_values('Impact', ascending=True)
for _, row in explanation_df.head(5).iterrows():
direction = "↓ decreases" if row['Impact'] < 0 else "↑ increases"
print(f" {row['Feature']}: {row['Value']:.2f} {direction} approval chance")
2. Medical Diagnosis Explanation
import shap
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
# Load medical data
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
# Explain diagnosis for a patient
patient_idx = 0
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[patient_idx]])
prediction = model.predict(X.iloc[[patient_idx]])[0]
confidence = model.predict_proba(X.iloc[[patient_idx]])[0][prediction]
print(f"Diagnosis: {'Benign' if prediction == 1 else 'Malignant'}")
print(f"Confidence: {confidence:.1%}")
print("\nKey diagnostic factors:")
# Get top contributing features
feature_impact = pd.DataFrame({
'Feature': X.columns,
'Value': X.iloc[patient_idx].values,
'SHAP': shap_values[prediction][0]
}).sort_values('SHAP', key=abs, ascending=False).head(5)
for _, row in feature_impact.iterrows():
direction = "supports" if row['SHAP'] > 0 else "contradicts"
print(f" {row['Feature']}: {row['Value']:.2f} {direction} diagnosis")
# Generate a detailed report
shap.waterfall_plot(
    shap.Explanation(
        values=shap_values[prediction][0],
        base_values=explainer.expected_value[prediction],
        data=X.iloc[patient_idx],
        feature_names=X.columns.tolist()
    ),
    show=False  # keep the figure open so it can be titled and saved
)
plt.title("Diagnostic Feature Contribution")
plt.savefig('medical_diagnosis_explanation.png', bbox_inches='tight', dpi=150)
3. Bias Detection
import shap
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
# Simulate hiring data with potential bias
np.random.seed(42)
n = 1000
hiring_data = pd.DataFrame({
'experience_years': np.random.randint(0, 20, n),
'education_level': np.random.randint(1, 5, n),
'technical_score': np.random.randint(50, 100, n),
'age': np.random.randint(22, 65, n),
'gender': np.random.choice([0, 1], n) # 0=female, 1=male
})
# Create biased target (unfairly favoring males)
hiring_data['hired'] = (
(hiring_data['technical_score'] > 70) &
(hiring_data['experience_years'] > 2) &
((hiring_data['gender'] == 1) | (np.random.random(n) > 0.3)) # Bias
).astype(int)
# Train model
X = hiring_data.drop('hired', axis=1)
y = hiring_data['hired']
model = GradientBoostingClassifier(random_state=42)
model.fit(X, y)
# Analyze feature importance
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
# Check if gender has inappropriate influence
print("Feature Importance (Mean Absolute SHAP):")
feature_importance = pd.DataFrame({
'feature': X.columns,
'importance': np.abs(shap_values).mean(axis=0)
}).sort_values('importance', ascending=False)
print(feature_importance)
# Investigate gender impact
gender_impact = np.abs(shap_values[:, X.columns.get_loc('gender')]).mean()
print(f"\nGender SHAP importance: {gender_impact:.4f}")
if gender_impact > 0.1:
print("⚠️ WARNING: Gender appears to have significant influence on hiring decisions!")
print(" This may indicate bias in the model.")
# Compare predictions for identical candidates with different genders
sample_candidate = X.iloc[[0]].copy()  # one-row DataFrame to preserve feature names
sample_candidate['gender'] = 0
pred_female = model.predict_proba(sample_candidate)[0][1]
sample_candidate['gender'] = 1
pred_male = model.predict_proba(sample_candidate)[0][1]
print(f"\nSame candidate, different gender:")
print(f" Female: {pred_female:.2%} hiring probability")
print(f" Male: {pred_male:.2%} hiring probability")
print(f" Difference: {abs(pred_male - pred_female):.2%}")
Advanced Techniques
SHAP Interaction Values
import shap
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
# Train model (reuses X_train, y_train, X_test from the basic SHAP example)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Calculate interaction values
explainer = shap.TreeExplainer(model)
shap_interaction_values = explainer.shap_interaction_values(X_test)
# Visualize interaction between two features
shap.dependence_plot(
    ("mean radius", "mean texture"),
    shap_interaction_values[1],
    X_test,
    display_features=X_test,
    show=False  # keep the figure open so it can be titled and saved
)
plt.title("Feature Interaction: Mean Radius × Mean Texture")
plt.savefig('shap_interaction.png', bbox_inches='tight', dpi=150)
Model Comparison with SHAP
import shap
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
# Train multiple models (reuses X_train, y_train, X_test from the basic SHAP example)
models = {
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
    'Logistic Regression': LogisticRegression(max_iter=5000, random_state=42)
}
for name, model in models.items():
model.fit(X_train, y_train)
# Compare feature importance across models
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for idx, (name, model) in enumerate(models.items()):
    if name == 'Logistic Regression':
        # Use KernelExplainer for non-tree models (explain a sample for speed)
        explainer = shap.KernelExplainer(
            model.predict_proba,
            shap.sample(X_train, 100)
        )
        X_plot = X_test.iloc[:100]
        shap_values = explainer.shap_values(X_plot)
    else:
        explainer = shap.TreeExplainer(model)
        X_plot = X_test
        shap_values = explainer.shap_values(X_plot)
    # Some explainers return one array per class, others a single array
    values = shap_values[1] if isinstance(shap_values, list) else shap_values
    plt.sca(axes[idx])
    shap.summary_plot(values, X_plot, plot_type="bar", show=False)
    plt.title(f"{name}\nFeature Importance")
plt.tight_layout()
plt.savefig('model_comparison.png', dpi=150)
Best Practices
- Choose the right tool: Use SHAP for global interpretability, LIME for quick local explanations
- Validate explanations: Check if explanations align with domain knowledge
- Explain to stakeholders: Tailor visualizations to your audience's technical level
- Use TreeExplainer for trees: Much faster than KernelExplainer for tree-based models
- Sample for large datasets: Use representative samples for faster computation
- Document assumptions: Be clear about what your explanations do and don't show
- Test stability: Check if explanations are consistent across similar instances (see the sketch after this list)
- Combine methods: Use multiple interpretation techniques for comprehensive understanding
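As a concrete example of the stability check above, you can run LIME several times on the same instance and compare the weights across runs. This is a sketch that assumes the explainer, model, and X from the "LIME for Tabular Data" example are still in scope; because LIME's perturbation sampling is random, the weights will vary somewhat between runs.
import numpy as np
# Explain the same instance several times and collect the rule weights
runs = []
for _ in range(5):
    exp = explainer.explain_instance(
        X.iloc[0].values,
        model.predict_proba,
        num_features=10
    )
    runs.append(dict(exp.as_list()))
# Rules that appear in every run, and how much their weights fluctuate
common = set.intersection(*(set(r) for r in runs))
print(f"Rules present in all {len(runs)} runs: {len(common)} of 10")
for rule in sorted(common):
    weights = [r[rule] for r in runs]
    print(f"  {rule}: mean={np.mean(weights):+.4f}, std={np.std(weights):.4f}")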
Common Pitfalls
- Over-interpreting local explanations: LIME explains one instance, not the whole model
- Ignoring feature correlation: Correlated features can have unreliable importance values
- Not checking stability: Some explanations can be unstable for similar inputs
- Confusing correlation with causation: Feature importance ≠ causal relationships
- Using wrong explainer: KernelExplainer is slow; use TreeExplainer for tree models
- Neglecting computational cost: SHAP can be expensive for large datasets
- Trusting explanations blindly: Explanations can be misleading; validate them
Master Explainable AI and Model Interpretation
Our Data Science program covers model interpretation in-depth, from fundamental techniques to advanced explainability methods. Learn to build trustworthy, interpretable AI systems with expert guidance and hands-on projects.
Explore Data Science Program