Build a Simple AI-Based Phishing Detector (Beginner Tutorial)
Train and evaluate a lightweight phishing classifier end to end with synthetic email data, TF-IDF text features, validation after each step, and anti-spoofing safeguards.
What You’ll Build
- A small TF-IDF + Logistic Regression text classifier for phishing vs benign emails.
- Reproducible dataset generation to avoid leaking real PII.
- Validation after each step plus cleanup.
Prerequisites
- macOS or Linux with Python 3.12+.
- pip available; ~200 MB of free disk space.
- No email access needed; we generate synthetic samples.
Safety and Legal
- Never train on real mailbox data without explicit approval and PII scrubbing.
- Avoid storing raw emails; keep hashes or redacted text when possible (see the sketch after this list).
- Keep humans in the loop for blocking decisions; start with “quarantine + review.”
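If you do need to persist samples, a minimal redaction-and-hashing sketch is below. The regex patterns, placeholder tokens, and SHA-256 choice are illustrative assumptions, not a complete PII scrubber.
python - <<'PY'
# Minimal sketch: mask obvious addresses/links and keep only a hash of the raw body.
import hashlib
import re

def redact(text: str) -> str:
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "<EMAIL>", text)  # mask email addresses
    return re.sub(r"https?://\S+", "<URL>", text)                # mask links

def fingerprint(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()      # store a hash for dedup, not the raw body

raw = "Contact jane.doe@example.com via http://fake-bank.com/login"
print(fingerprint(raw)[:12], "|", redact(raw))
PY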
Step 1) Create an isolated environment
python3 -m venv .venv-phish
source .venv-phish/bin/activate
pip install --upgrade pip
pip install pandas scikit-learn joblib
Common fix: if activation fails, make sure you are sourcing the script (source .venv-phish/bin/activate) rather than executing it directly, and that the virtual environment was created with the python3 shown above.
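To validate this step, a quick import check (a simple sketch; the version numbers printed will differ on your machine) confirms the environment is ready:
python - <<'PY'
# Environment sanity check: every package installed in Step 1 should import cleanly.
import joblib, pandas, sklearn
print("pandas", pandas.__version__, "| scikit-learn", sklearn.__version__, "| joblib", joblib.__version__)
PY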
Step 2) Generate a synthetic labeled dataset
cat > make_dataset.py <<'PY'
import pandas as pd
phish_samples = [
    ("Your account is locked. Verify immediately at http://fake-bank.com", 1),
    ("Urgent: update payroll info now or your pay is delayed", 1),
    ("Security alert: login from unknown device. Download the attached form", 1),
    ("Package held: pay customs fee via gift card", 1),
    ("Congrats, you won a prize! Click to claim", 1),
]
benign_samples = [
    ("Team meeting notes and next sprint goals", 0),
    ("Invoice attached for approved purchase order", 0),
    ("Reminder: security training scheduled next week", 0),
    ("Quarterly newsletter and product updates", 0),
    ("Welcome to the platform—getting started guide", 0),
]
df = pd.DataFrame(phish_samples + benign_samples, columns=["text", "label"])
df.to_csv("emails.csv", index=False)
print("Wrote emails.csv with", len(df), "rows")
PY
python make_dataset.py
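To validate this step before training, a quick sanity check on emails.csv (a sketch assuming the schema produced above) could be:
python - <<'PY'
# Sanity check for Step 2: expected columns, valid labels, and non-empty text.
import pandas as pd

df = pd.read_csv("emails.csv")
assert set(df.columns) == {"text", "label"}, "unexpected columns"
assert df["label"].isin([0, 1]).all(), "labels must be 0 (benign) or 1 (phish)"
assert df["text"].str.strip().str.len().gt(0).all(), "found empty email text"
print(df["label"].value_counts())  # expect 5 rows of each label
PY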
Step 3) Train and evaluate the classifier
cat > train_and_eval.py <<'PY'
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
df = pd.read_csv("emails.csv")
X_train, X_test, y_train, y_test = train_test_split(df["text"], df["label"], test_size=0.3, random_state=42, stratify=df["label"])
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=400, class_weight="balanced")),
])
pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)
report = classification_report(y_test, preds, target_names=["benign", "phish"], digits=3, output_dict=True)
cm = confusion_matrix(y_test, preds, labels=[0, 1])
with open("model.json", "w") as f:
json.dump({"params": pipeline.get_params(deep=False)}, f, indent=2)
print("Confusion matrix [[TN, FP], [FN, TP]]:", cm.tolist())
print("Precision/Recall/F1:", json.dumps(report, indent=2))
PY
python train_and_eval.py
Common fixes:
- ValueError: empty vocabulary => ensure emails.csv is not empty and min_df ≤ sample size.
- If class imbalance arises, keep class_weight="balanced" or add more phishing examples.
Step 4) Add a simple scoring script with safety checks
cat > score_email.py <<'PY'
import sys
import joblib
import pandas as pd
from sklearn.pipeline import Pipeline
MODEL_PATH = "model.pkl"
def load_model():
    return joblib.load(MODEL_PATH)

def main():
    if len(sys.argv) < 2:
        print("Usage: python score_email.py 'email text'")
        sys.exit(1)
    text = sys.argv[1]
    model: Pipeline = load_model()
    proba = model.predict_proba([text])[0][1]
    print(f"phish_probability={proba:.3f}")
    if proba > 0.7:
        print("Action: quarantine and send to human review")

if __name__ == "__main__":
    main()
PY
score_email.py loads a saved model, so first fit the pipeline on the full dataset and persist it as model.pkl with joblib, then score a sample email:
pip install joblib
python - <<'PY'
import joblib
from sklearn.pipeline import Pipeline
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("emails.csv")
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=400, class_weight="balanced")),
])
pipe.fit(df["text"], df["label"])
joblib.dump(pipe, "model.pkl")
print("Saved model.pkl")
PY
python score_email.py "Please reset your password at http://fake.com/reset"
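To score several messages at once, a small batch sketch is below; the 0.7 threshold mirrors score_email.py and is an assumption to tune against your own false-positive tolerance.
python - <<'PY'
# Batch-scoring sketch: route high-probability messages to quarantine for human review.
import joblib

model = joblib.load("model.pkl")
emails = [
    "Lunch menu for the all-hands next Friday",
    "Your mailbox is full. Re-validate your password here: http://fake.com/verify",
]
for text, proba in zip(emails, model.predict_proba(emails)[:, 1]):
    action = "quarantine + review" if proba > 0.7 else "deliver"
    print(f"{proba:.3f}  {action:<20}  {text[:48]}")
PY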
Step 5) Add non-ML controls (defense in depth)
- Enforce SPF/DKIM/DMARC on inbound mail; reject or quarantine failures.
- Strip or rewrite links; sandbox attachments separately.
- Log decisions and the top contributing features for analyst review (use pipeline["tfidf"].get_feature_names_out() and the model coefficients; a short sketch follows this list).
- Rate-limit the scoring API to prevent request flooding or model abuse.
Cleanup
deactivate || true
rm -rf .venv-phish emails.csv make_dataset.py train_and_eval.py score_email.py model.pkl model.json
Quick Reference
- Use synthetic/redacted data; keep humans in the decision loop.
- Validate with precision/recall; watch false positives before blocking.
- Pair ML with email-auth controls and attachment/link sandboxing.
- Keep models versioned (model.pkl) and log every scored message.