🤿 Lab 01: Deep Dive - Secure Pipeline

Mission Scenario

You are the new Data Guardian of Atlantyqa. You've been entrusted with a confidential document containing financial and personal data. Your mission: process it and extract intelligence without a single figure or real name touching the cloud.

1. 🗺️ Operations Map

Before touching a key, visualize the secure data flow.

graph TD
    Input[📄 Raw Document] -->|Ingest| Clean[🧹 Cleaning]
    Clean -->|Redaction with spaCy| Safe[🛡️ Secure Tokens]
    Safe -->|Analysis| Json[💎 Final JSON]

    style Input fill:#e7ae4c,stroke:#333,stroke-width:2px,color:#fff
    style Clean fill:#37a880,stroke:#333,stroke-width:2px,color:#fff
    style Safe fill:#e0e7ff,stroke:#333,stroke-width:2px,color:#182232
    style Json fill:#f1f5f9,stroke:#182232,stroke-width:2px,color:#182232
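
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation behind cogctl.py: the function names `clean`, `redact`, `analyze`, and `run_pipeline` are hypothetical, and a hand-supplied list of names stands in for spaCy's entity recognition so the example runs without extra dependencies. It covers names only, not figures.

```python
import json
import re


def clean(raw: str) -> str:
    """Cleaning stage: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def redact(text: str, names: list[str]) -> str:
    """Redaction stage: replace each known name with a placeholder.

    Stand-in for entity recognition (assumption): the names are passed
    in by hand instead of being detected by spaCy.
    """
    for name in names:
        text = text.replace(name, "[REDACTED]")
    return text


def analyze(safe_text: str) -> str:
    """Analysis stage: emit a small JSON summary of the redacted text."""
    summary = {
        "redactions": safe_text.count("[REDACTED]"),
        "text": safe_text,
    }
    return json.dumps(summary, ensure_ascii=False)


def run_pipeline(raw: str, names: list[str]) -> str:
    """Chain the three stages in the order shown in the diagram."""
    return analyze(redact(clean(raw), names))
```

The point of the ordering is that `analyze` only ever receives text that has already passed through `redact`, which mirrors the Secure Tokens stage sitting between Cleaning and Final JSON.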

2. ⚔️ Mission Execution

Follow the steps with surgical precision.

Step 1: Ingest
Step 2: Armored Analysis
Step 3: Verification

Step 1. Create a file named confidential.txt in data/input/ containing fake (but realistic) data, then ingest it.

python cogctl.py ingest confidential.txt
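
If you would rather generate the test file than type it by hand, something along these lines should work. It is only a sketch: the helper name `write_sample` and the record contents are made up for illustration, and every value in it is fictional.

```python
from pathlib import Path

# Entirely fictional record; the name and numbers are placeholders.
SAMPLE = (
    "Client: Juan Pérez\n"
    "IBAN: ES00 0000 0000 0000 0000 0000\n"
    "Annual income: 42,000 EUR\n"
)


def write_sample(root: Path = Path(".")) -> Path:
    """Create data/input/confidential.txt under root and return its path."""
    target = root / "data" / "input" / "confidential.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(SAMPLE, encoding="utf-8")
    return target
```

Calling `write_sample()` from the project root places the file where the ingest command expects to find it.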

Step 2. Activate the privacy shield: set the COGNITIVE_REDACT environment variable to 1 when you run the analysis.

# In PowerShell
$env:COGNITIVE_REDACT="1"; python cogctl.py analyze

# In Bash
COGNITIVE_REDACT=1 python cogctl.py analyze
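
The same step can be scripted in a shell-independent way from Python. The sketch below assumes cogctl.py sits in the current working directory, as in the commands above; the helper names `redacted_env` and `run_analyze` are hypothetical.

```python
import os
import subprocess
import sys


def redacted_env(base=None):
    """Return a copy of the environment with COGNITIVE_REDACT set to "1"."""
    env = dict(os.environ if base is None else base)
    env["COGNITIVE_REDACT"] = "1"
    return env


def run_analyze():
    """Run `python cogctl.py analyze` with redaction enabled.

    Assumes cogctl.py is in the current working directory.
    """
    return subprocess.run(
        [sys.executable, "cogctl.py", "analyze"],
        env=redacted_env(),
        check=False,
    )
```

Passing the variable through `env=` scopes it to that one child process, much like the inline Bash form, so it cannot be left unset by accident in a later shell session.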

Step 3. Check that the redaction worked: open the resulting JSON and inspect the PERSON entries.

Expected: "PERSON": "[REDACTED]"
Failed: "PERSON": "Juan Pérez"
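
Checking by eye is easy to get wrong on a large file, so a small script can do the same comparison. This sketch assumes only what the example above shows, namely that names appear under a "PERSON" key and that the placeholder is "[REDACTED]". The exact layout of the output JSON is not documented here, so the hypothetical `find_unredacted` helper walks the whole structure.

```python
import json


def find_unredacted(node, label="PERSON", placeholder="[REDACTED]"):
    """Return every value stored under `label` that is not the placeholder."""
    leaks = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == label and value != placeholder:
                leaks.append(value)
            # Keep descending in case entities are nested deeper.
            leaks.extend(find_unredacted(value, label, placeholder))
    elif isinstance(node, list):
        for item in node:
            leaks.extend(find_unredacted(item, label, placeholder))
    return leaks
```

Load the output with `json.load` and pass the result in. An empty list corresponds to the Expected case, and any returned value is a leaked name.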

3. 📸 Evidence Collection

To claim your reward (XP), you must present proof.

Delivery Checklist

[ ] Output JSON: Confirm there are no real names.
[ ] Audit Log: Verify that outputs/audit/ has a new entry.
[ ] Screenshot: Show your terminal with the "Success" message.
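
For the audit log item, one way to confirm a new entry is to compare file modification times against the moment you started the run. The sketch below assumes entries are stored as individual files inside outputs/audit/, which this lab does not state; `new_audit_entries` is a hypothetical helper.

```python
import time
from pathlib import Path


def new_audit_entries(audit_dir, since):
    """List files in audit_dir modified at or after the `since` timestamp."""
    folder = Path(audit_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.stat().st_mtime >= since
    )


# Usage sketch:
# start = time.time()
# ... run the analysis ...
# print(new_audit_entries("outputs/audit", start))
```

File timestamps have limited resolution on some file systems, so it is safer to record the start time a moment before launching the analysis.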

📝 Template for your Pull Request


## 🛡️ Lab 01 Mission Completed

- **File Hash:** [Insert Hash]
- **Redaction Status:** ✅ Activated
- **Incidents:** None

Evidence attached in /evidence folder.
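
The template asks for a file hash but does not say which algorithm or which file to use. As a working assumption, the sketch below computes a SHA-256 digest with Python's standard hashlib module; `sha256_of` is a hypothetical helper name, so check with your instructor if a different algorithm is expected.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is 64 hexadecimal characters and can be pasted into the File Hash field.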

🆘 Common Problems?

My document isn't processing

Is it in data/input?
Does it have a .txt or .pdf extension?
Do you have write permissions on outputs/?
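
The three questions above can be turned into a quick self-check. This is a sketch under assumptions: `diagnose` is a hypothetical helper, the directory names are taken from this lab, and treating extensions case-insensitively is a guess about how the tool behaves.

```python
import os
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".pdf"}


def diagnose(filename, input_dir="data/input", output_dir="outputs"):
    """Return a list of problems found; an empty list means all checks pass."""
    problems = []
    candidate = Path(input_dir) / filename
    if not candidate.is_file():
        problems.append(f"{filename} not found in {input_dir}")
    # Assumption: extension matching ignores case.
    if candidate.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {candidate.suffix or '(none)'}")
    # os.access also returns False when the directory does not exist.
    if not os.access(output_dir, os.W_OK):
        problems.append(f"cannot write to {output_dir} (missing or not writable)")
    return problems
```

Run `diagnose("confidential.txt")` from the project root and work through whatever it reports.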

I don't see redacted data

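Make sure the environment variable is set in the same shell session that runs the analysis. To check, run echo $env:COGNITIVE_REDACT in PowerShell or echo $COGNITIVE_REDACT in Bash. Note that the inline Bash form (COGNITIVE_REDACT=1 python cogctl.py analyze) applies only to that single command, so a later echo prints nothing unless you ran export COGNITIVE_REDACT=1 first.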