Build a Simple AI-Based Phishing Detector (Beginner Tutorial)
Train and test a lightweight phishing classifier end to end: build text features from synthetic email data, evaluate precision and recall, and add anti-spoofing safeguards and safety guardrails.
What You’ll Build
- A small TF-IDF + Logistic Regression text classifier for phishing vs benign emails.
- Reproducible dataset generation to avoid leaking real PII.
- Validation after each step plus cleanup.
Prerequisites
- macOS or Linux with Python 3.12+.
- pip available; ~200 MB free disk.
- No email access needed; we generate synthetic samples.
Safety and Legal
- Never train on real mailbox data without explicit approval and PII scrubbing.
- Avoid storing raw emails; keep hashes or redacted text when possible.
- Keep humans in the loop for blocking decisions; start with “quarantine + review.”
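As a rough illustration of the "hashes or redacted text" advice above, the sketch below replaces email addresses and URLs with placeholders and computes a SHA-256 fingerprint of a message. The function names (redact, fingerprint) and the two regular expressions are assumptions made for this example; the patterns are deliberately simple and are not a complete PII scrubber.

```python
import hashlib
import re

# Assumed patterns for illustration: simple on purpose, not a full PII scrubber.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://\S+")


def redact(text: str) -> str:
    """Replace URLs and email addresses with placeholder tokens."""
    text = URL_RE.sub("<URL>", text)
    text = EMAIL_RE.sub("<EMAIL>", text)
    return text


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest so a message can be referenced without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    raw = "Hi alice@example.com, verify at http://fake-bank.com/login now"
    print(redact(raw))
    print(fingerprint(raw))
```

An unsalted hash only lets you reference or deduplicate a message; short or predictable text can still be guessed from it, so treat it as an identifier rather than as anonymization.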
Step 1) Create an isolated environment
python3 -m venv .venv-phish
source .venv-phish/bin/activate
pip install --upgrade pip
pip install pandas scikit-learn joblib
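Common fix: If activation fails, check that .venv-phish/bin/activate exists (on Debian/Ubuntu, creating a venv may first require the python3-venv package) and that you are running bash or zsh. The script is sourced, not executed, so chmod +x does not help.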
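To validate this step, a small check like the one below can confirm that the interpreter is running inside a virtual environment and meets the version prerequisite. The helper names (in_virtualenv, python_ok) are hypothetical; the check relies on sys.prefix differing from sys.base_prefix inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != sys.base_prefix


def python_ok(minimum=(3, 12)) -> bool:
    """True when the running Python is at least the given (major, minor) version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("python version ok:", python_ok())
```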
Step 2) Generate a synthetic labeled dataset
cat > make_dataset.py <<'PY'
import pandas as pd
phish_samples = [
("Your account is locked. Verify immediately at http://fake-bank.com", 1),
("Urgent: update payroll info now or your pay is delayed", 1),
("Security alert: login from unknown device. Download the attached form", 1),
("Package held: pay customs fee via gift card", 1),
("Congrats, you won a prize! Click to claim", 1),
]
benign_samples = [
("Team meeting notes and next sprint goals", 0),
("Invoice attached for approved purchase order", 0),
("Reminder: security training scheduled next week", 0),
("Quarterly newsletter and product updates", 0),
("Welcome to the platform—getting started guide", 0),
]
df = pd.DataFrame(phish_samples + benign_samples, columns=["text", "label"])
df.to_csv("emails.csv", index=False)
print("Wrote emails.csv with", len(df), "rows")
PY
python make_dataset.py
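Before training, it can help to confirm that the file has the shape you expect. The following is a minimal sketch using only the standard library; check_rows and check_dataset are hypothetical helper names, and the rules they enforce (labels are "0" or "1", text is non-empty, both classes present) are assumptions about what counts as a valid dataset for this tutorial.

```python
import csv
from collections import Counter


def check_rows(rows) -> Counter:
    """Count rows per label; raise on empty text, unknown labels, or a missing class."""
    counts = Counter()
    for row in rows:
        label = row["label"]
        if label not in {"0", "1"}:
            raise ValueError(f"unexpected label: {label!r}")
        if not row["text"].strip():
            raise ValueError("empty text field")
        counts[label] += 1
    if len(counts) < 2:
        raise ValueError("need at least one example of each class")
    return counts


def check_dataset(path: str = "emails.csv") -> Counter:
    """Run check_rows over a CSV file with 'text' and 'label' columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return check_rows(csv.DictReader(f))
```

Calling check_dataset("emails.csv") on the file written by make_dataset.py should report five rows for each label.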
Step 3) Train and evaluate the classifier
cat > train_and_eval.py <<'PY'
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
df = pd.read_csv("emails.csv")
X_train, X_test, y_train, y_test = train_test_split(df["text"], df["label"], test_size=0.3, random_state=42, stratify=df["label"])
pipeline = Pipeline([
("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
("clf", LogisticRegression(max_iter=400, class_weight="balanced")),
])
pipeline.fit(X_train, y_train)
preds = pipeline.predict(X_test)
report = classification_report(y_test, preds, labels=[0, 1], target_names=["benign", "phish"], digits=3, output_dict=True, zero_division=0)
cm = confusion_matrix(y_test, preds, labels=[0, 1])
with open("model.json", "w") as f:
    # Estimator objects are not JSON-serializable, so fall back to their string form.
    json.dump({"params": pipeline.get_params(deep=False)}, f, indent=2, default=str)
print("Confusion matrix [[TN, FP], [FN, TP]]:", cm.tolist())
print("Precision/Recall/F1:", json.dumps(report, indent=2))
PY
python train_and_eval.py
Common fixes:
- ValueError: empty vocabulary => ensure emails.csv is not empty and min_df ≤ sample size.
- If class imbalance arises, keep class_weight="balanced" or add more phishing examples.
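To read the printed confusion matrix, it may help to see how the phishing-class precision, recall, and F1 follow from its four cells. This is a small worked sketch; metrics_from_confusion is a hypothetical helper, and the example counts are made up for illustration rather than taken from a run of this tutorial.

```python
def metrics_from_confusion(cm):
    """Compute phishing-class precision, recall, and F1 from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not output from this tutorial.
    example = [[1, 1], [0, 1]]
    print(metrics_from_confusion(example))
```

With a test split this small, a single misclassified message moves these numbers a lot, so treat them as a smoke test rather than a measure of real-world quality.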
Step 4) Add a simple scoring script with safety checks
cat > score_email.py <<'PY'
import sys
import joblib
import pandas as pd
from sklearn.pipeline import Pipeline
MODEL_PATH = "model.pkl"
def load_model():
    return joblib.load(MODEL_PATH)
def main():
    if len(sys.argv) < 2:
        print("Usage: python score_email.py 'email text'")
        sys.exit(1)
    text = sys.argv[1]
    model: Pipeline = load_model()
    proba = model.predict_proba([text])[0][1]
    print(f"phish_probability={proba:.3f}")
    if proba > 0.7:
        print("Action: quarantine and send to human review")
if __name__ == "__main__":
    main()
PY
Fit the pipeline on the full dataset, save it as model.pkl, then score a sample message:
pip install joblib
python - <<'PY'
import joblib
from sklearn.pipeline import Pipeline
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
df = pd.read_csv("emails.csv")
pipe = Pipeline([
("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
("clf", LogisticRegression(max_iter=400, class_weight="balanced")),
])
pipe.fit(df["text"], df["label"])
joblib.dump(pipe, "model.pkl")
print("Saved model.pkl")
PY
python score_email.py "Please reset your password at http://fake.com/reset"
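The 0.7 cutoff in score_email.py is a starting point, not a tuned value. One way to reason about it is to count, for several candidate thresholds, how many phishing messages would be caught and how many benign messages would be quarantined. The sketch below assumes you have a list of (phish_probability, true_label) pairs; quarantine_counts is a hypothetical helper and the example scores are invented for illustration.

```python
def quarantine_counts(scored, thresholds=(0.5, 0.7, 0.9)):
    """For each threshold, count phishing caught and benign messages quarantined.

    `scored` is a list of (phish_probability, true_label) pairs, where label 1 = phish.
    A message is quarantined when its probability is strictly above the threshold,
    matching the `proba > 0.7` comparison in score_email.py.
    """
    rows = []
    for t in thresholds:
        caught = sum(1 for p, y in scored if y == 1 and p > t)
        false_alarms = sum(1 for p, y in scored if y == 0 and p > t)
        rows.append({"threshold": t, "phish_caught": caught, "benign_quarantined": false_alarms})
    return rows


if __name__ == "__main__":
    # Hypothetical scores for illustration; real values come from model.predict_proba.
    scored = [(0.92, 1), (0.74, 1), (0.55, 1), (0.61, 0), (0.20, 0)]
    for row in quarantine_counts(scored):
        print(row)
```

Raising the threshold trades missed phishing for fewer benign messages in quarantine, which is why a human review queue is a safer first action than outright blocking.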
Step 5) Add non-ML controls (defense in depth)
- Enforce SPF/DKIM/DMARC on inbound mail; reject or quarantine failures.
- Strip or rewrite links; sandbox attachments separately.
- Log decisions and top contributing features for analyst review (use pipeline["tfidf"].get_feature_names_out() and the model coefficients).
- Rate-limit the scoring API to prevent request flooding or model abuse.
Cleanup
deactivate || true
rm -rf .venv-phish emails.csv make_dataset.py train_and_eval.py score_email.py model.pkl model.json
Quick Reference
- Use synthetic/redacted data; keep humans in the decision loop.
- Validate with precision/recall; watch false positives before blocking.
- Pair ML with email-auth controls and attachment/link sandboxing.
- Keep models versioned (model.pkl) and log every scored message.
FAQs
Can I use these labs in production?
No—treat them as educational. Adapt, review, and security-test before any production use.
What if I don't have test data or a lab?
Use synthetic data and local containers. Never point tools at networks or data you don't own or have written permission to test.
Can I share these materials?
Yes, but keep attribution and follow any licensing terms for included tools or datasets.