Incident Report: Git Recovery - December 7, 2025¶
Incident ID: IR-2025-12-07-001 Date: December 7-8, 2025 Severity: High (Data loss risk) Status: β Resolved Affected: v0.2.0 release, prompt_system implementation
Executive Summary¶
On December 7-8, 2025, approximately 39 files (4,508 lines of code) containing the prompt system implementation (ADR-PT04) were orphaned due to a git reset --hard operation that used a stale local branch reference. All work was successfully recovered from git reflog on December 7, 2025, and the v0.2.0 tag was corrected to point to the complete release.
Impact: None - All work recovered, no data lost
Root Cause: Reset to stale local branch reference - local version-0.2.0 not fetched after remote PR merge
Recovery Time: ~4 hours
Timeline¶
All times in PST (UTC-8):
December 6, 2025¶
- 04:56: Created
version-0.1.4branch on GitHub - 16:30: Created
prompt-system-refactorbranch on GitHub - 20:13: Created
version-0.2.0branch on GitHub - 21:01: Codex snapshot commit (WIP)
0b019ed
December 7, 2025¶
- 19:03: Completed prompt system implementation on
prompt-system-refactor(commitc6532f5) - 19:30: PR #11 merged
prompt-system-refactorβversion-0.1.4(merge commit87b6603) - 19:33: PR #12 merged
version-0.1.4βversion-0.2.0(merge commit5bf012d) - 19:37: INCIDENT: Reset main to stale branch reference
- Command:
git reset --hard version-0.2.0(session log line 396) - Problem: Local
version-0.2.0not fetched after remote PR #12 merge - Local branch pointed to
d51fc87(version bump), not5bf012d(merged state) - Missing step:
git fetch origin version-0.2.0before reset - This orphaned all prompt system work (commit
c6532f5and descendants)
December 8, 2025¶
- 03:30: PR #11 branch
prompt-system-refactordeleted from GitHub (automatic cleanup) - 03:33: PR #12 branch
version-0.1.4deleted from GitHub (automatic cleanup) - 03:38: Branch
version-0.2.0deleted from GitHub (automatic cleanup)
December 7, 2025 (Recovery)¶
- ~20:00: Discovered orphaned work
- ~20:30: Located commit
c6532f5in git reflog - ~21:00: Recovered all 39 files from reflog
- ~21:15: Created PR #13 with recovered work
- ~21:20: Merged PR #13 to main
- ~21:22: Corrected v0.2.0 tag to point to proper release
Root Cause Analysis¶
What Happened¶
The incident occurred when prompt system work (commit c6532f5 and descendants) became orphaned after a git operation on the main branch:
- Tag created:
v0.2.0tag was created at commit0d7d459(version bump commit) - Work done on branch: Prompt system work was merged into
version-0.2.0branch via PRs #11 and #12 (merge commit5bf012d) - Destructive operation on main: A git command was executed that caused main to not include the prompt system work
- Branches deleted: GitHub auto-deleted branches
prompt-system-refactor,version-0.1.4, andversion-0.2.0after PR merges (standard behavior) - Work orphaned: With branches deleted and main not including the work, commit
c6532f5and all descendants became unreachable from any branch
Why It Happened (Root Cause: Stale Branch Reference)¶
Root Cause Identified (via session log forensics):
The incident was caused by resetting main to a stale local branch reference that had not been updated after a remote PR merge.
The Fatal Sequence (from recovered session log):
- PR #12 created: Merging
version-0.1.4βversion-0.2.0(session log line 316) - PR #12 merged on GitHub: User confirms merge completion (session log line 321)
- Remote
origin/version-0.2.0now points to5bf012d(includes prompt system) - Local
version-0.2.0still points tod51fc87(version bump only) - Reset executed:
git reset --hard version-0.2.0(session log line 396) - Used local stale reference
d51fc87 - Should have used remote updated reference
5bf012d - Result: Reset to pre-merge state, orphaning all new work
The Missing Step:
Between the PR merge and the reset, this was required but omitted:
# REQUIRED: Update local branch after remote merge
git fetch origin version-0.2.0
git checkout version-0.2.0
git pull
# Verify prompt system work is present
git log --oneline -10
ls -la src/tnh_scholar/prompt_system/
# THEN safe to reset
git checkout main
git reset --hard version-0.2.0
Why This Mistake Happened:
- Git semantics gap: Branch names in git commands refer to local references, not remote state
- Human error: User merged PR on GitHub without checking out/updating local branch
- Agent assumption: Agent assumed
version-0.2.0would automatically point to merged state - No verification: No check that local branch matched remote branch before reset
- Workflow complexity: Multi-tier merge chain (feature β 0.1.4 β 0.2.0 β main) increased opportunity for staleness
Evidence:
Session log confirms reset moved to commit d51fc87:
git reset --hard version-0.2.0
HEAD is now at d51fc87 feat: Automate PyPI README frontmatter stripping
This is the version bump commit before PR #12 merge. The prompt system work was in merge commit 5bf012d, which existed on remote but not in the local branch reference.
Contributing Factors:
- No branch protection on main: Destructive operations were possible without review
- No post-merge fetch protocol: Safety rules didn't require fetch after remote operations
- No content verification: Didn't verify expected files present in reset target
- No pre-reset safety checks: No verification of unpushed commits before destructive operations
- Complex workflow: Using version branches as development location instead of feature branches merged to main
What Was At Risk¶
Code Changes (39 files, 4,508 insertions, 130 deletions)¶
Prompt System Package (28 files):
src/tnh_scholar/prompt_system/- Complete implementation (22 files)tests/prompt_system/- Test suite (6 files)- Architecture: config, domain, transport, adapters, mappers, service layers
Documentation (7 files):
docs/architecture/prompt-system/adr/adr-pt04-prompt-system-refactor.md(1,198 lines)docs/architecture/metadata/adr/adr-md02-metadata-object-service-integration.md(465 lines)docs/architecture/object-service/object-service-design-gaps.md(566 lines)docs/architecture/project-policies/adr/adr-pp01-rapid-prototype-versioning.md(299 lines)docs/architecture/project-policies/index.mddocs/architecture/project-policies/versioning-policy-implementation-summary.mdVERSIONING.md(176 lines)
Modified Files (4 files):
AGENTLOG.md- Session documentationsrc/tnh_scholar/gen_ai_service/pattern_catalog/adapters/prompts_adapter.py- Integration changessrc/tnh_scholar/metadata/metadata.py- Infrastructure updates- Various documentation updates
Recovery Process¶
Discovery¶
- User noticed ADR-PT04 was missing from repository
- Checked recent commits - prompt system not present
- Checked git reflog - found orphaned commit
c6532f5
Forensic Analysis¶
Comprehensive analysis performed:
# Found orphaned commits
git fsck --lost-found --no-reflogs
# Result: 173 orphaned commits total
# Located prompt system work
git reflog | grep -E "prompt-system|version-0"
# Found: c6532f5 (Dec 7, 19:03)
# Verified PR merge commits still in reflog
git show 87b6603 # PR #11 merge
git show 5bf012d # PR #12 merge
# Confirmed prompt_system code identical
git diff --stat 5bf012d c6532f5 -- src/tnh_scholar/prompt_system/
# Result: No differences
Recovery Steps¶
- Created recovery branch:
- Recovered files from orphaned commit:
git checkout c6532f5 -- src/tnh_scholar/prompt_system/
git checkout c6532f5 -- tests/prompt_system/
git checkout c6532f5 -- docs/architecture/metadata/
git checkout c6532f5 -- docs/architecture/prompt-system/
git checkout c6532f5 -- docs/architecture/project-policies/
git checkout c6532f5 -- docs/architecture/object-service/object-service-design-gaps.md
git checkout c6532f5 -- VERSIONING.md
git checkout c6532f5 -- AGENTLOG.md
git checkout c6532f5 -- src/tnh_scholar/gen_ai_service/pattern_catalog/adapters/prompts_adapter.py
git checkout c6532f5 -- src/tnh_scholar/metadata/metadata.py
- Verified recovery:
- Committed and pushed immediately:
git commit -m "feat: Recover prompt system implementation (ADR-PT04)..."
git push -u origin recovery/prompt-system-refactor
-
Created PR #13 with comprehensive recovery documentation
-
Merged to main (user performed merge)
-
Corrected v0.2.0 tag:
# Delete incorrect tag
git tag -d v0.2.0
git push origin :refs/tags/v0.2.0
# Create corrected tag with documentation
git tag -a v0.2.0 -m "Release v0.2.0 (Corrected)..."
git push origin v0.2.0
Forensic Findings¶
Orphaned Commits Analysis¶
Total orphaned commits: 173
Categories:
- Codex snapshots (most common): WIP commits from various dates
- Example:
0b019ed(Dec 6, 21:01) - Earlier version of prompt system - Development branches: Old feature branches never merged
- Experimental work: Test commits, prototypes
Most Recent Orphaned Work:
0b019ed- Codex snapshot with partial prompt system (Dec 6)- Superseded by
c6532f5(more complete implementation) - No unique work lost
GitHub Event Log¶
Retrieved from GitHub API (/repos/.../events):
{
"type": "DeleteEvent",
"ref": "version-0.2.0",
"created_at": "2025-12-08T03:39:43Z"
}
{
"type": "DeleteEvent",
"ref": "prompt-system-refactor",
"created_at": "2025-12-08T03:38:36Z"
}
{
"type": "DeleteEvent",
"ref": "version-0.1.4",
"created_at": "2025-12-08T03:38:19Z"
}
All branch deletions occurred automatically after PR merges (standard GitHub behavior).
Comparison: Original vs Recovered¶
Differences between PR #12 merge (5bf012d) and recovered main (a6523a9):
- Added:
docs/development/git-workflow.md(safety documentation - created during recovery) - Modified: Minor doc updates (README.md, TODO.md) - unrelated to incident
- prompt_system code: Identical (0 differences)
Conclusion: 100% of prompt system implementation recovered with no data loss.
Preventative Measures Implemented¶
1. Git Safety Rules (.claude/CLAUDE.md)¶
Added comprehensive safety rules for AI agents:
**CRITICAL - NEVER VIOLATE THESE RULES:**
1. **NEVER run `git reset --hard` without explicit user approval**
2. **NEVER run `git reset --soft/--mixed` on main branch without explicit approval**
3. **NEVER delete branches without explicit approval**
4. **NEVER force push (`git push --force`) without explicit approval**
5. **ALWAYS push feature branches to origin before switching away**
6. **ALWAYS verify unpushed commits before any reset operation**
7. **ASK FIRST** before any destructive git operation**
**Required checks before destructive operations:**
- Run `git status` and `git branch -vv` to check unpushed work
- Run `git log --branches --not --remotes` to see unpushed commits
- Explicitly confirm with user what will be lost
2. Post-Remote-Merge Safety Protocol (.claude/CLAUDE.md)¶
Added critical protocol for handling remote PR merges:
### Post-Merge Safety Protocol
**CRITICAL: After ANY GitHub PR merge, ALWAYS:**
1. **Fetch the branch**: `git fetch origin <branch>`
2. **Verify merge commit present**: `git log origin/<branch> --oneline -5`
3. **Check local vs remote**: Compare `git rev-parse <branch>` vs `git rev-parse origin/<branch>`
4. **Update local if stale**: `git checkout <branch> && git pull`
5. **Verify expected content**: List key files/dirs that should be present
### Pre-Reset Content Verification
**BEFORE `git reset --hard <target>`, ALWAYS:**
1. **Fetch target**: `git fetch origin <target>` (if branch exists on remote)
2. **Preview changes**: `git diff --stat HEAD..<target>`
3. **Verify expected work present**:
- If expecting new directories/files: `git ls-tree -r <target> --name-only | grep <expected-path>`
4. **Check unpushed work**: `git log --branches --not --remotes --oneline`
5. **Confirm with user**: Show exactly what will be lost/gained
3. Branch Staleness Detection Script¶
Created scripts/git-check-staleness.sh:
#!/bin/bash
# Detect if local branch reference is stale compared to remote
branch="${1:-HEAD}"
git fetch origin "$branch" 2>/dev/null
local_sha=$(git rev-parse "$branch" 2>/dev/null)
remote_sha=$(git rev-parse "origin/$branch" 2>/dev/null)
if [ -n "$remote_sha" ] && [ "$local_sha" != "$remote_sha" ]; then
echo "β οΈ WARNING: Local branch '$branch' is STALE!"
echo " Local: $local_sha"
echo " Remote: $remote_sha"
echo ""
echo " Update first: git checkout $branch && git pull"
exit 1
fi
echo "β Branch '$branch' is up-to-date"
4. Git Workflow Documentation¶
Created docs/development/git-workflow.md:
- Safe workflow patterns
- Recovery procedures
- Incident documentation
- Pre-checkout hooks
5. Git Safety Aliases¶
Configured globally:
git config --global alias.check-unpushed 'log --branches --not --remotes --oneline'
git config --global alias.safe-reset '!f() { echo "β οΈ DANGER..."; read -p "Type YES: " confirm; ...; }; f'
git config --global alias.safe-push '!f() { git log --branches --not --remotes && git push "$@"; }; f'
6. Pre-Checkout Hook¶
Installed at .git/hooks/pre-checkout:
#!/bin/bash
# Warn when switching branches with unpushed commits
current_branch=$(git symbolic-ref --short HEAD)
unpushed=$(git log --branches --not --remotes --oneline)
if [ -n "$unpushed" ]; then
echo "β οΈ WARNING: Unpushed commits on '$current_branch'"
read -p "Continue? (y/N) " -n 1 -r
[[ ! $REPLY =~ ^[Yy]$ ]] && exit 1
fi
7. Workflow Improvements¶
Recommendations for preventing similar incidents:
- Always fetch after remote PR merges before using branch references in local operations
- Always push branches immediately after creation with
git push -u origin <branch> - Use feature branches merged to main instead of version branches as development locations
- Prefer merge workflows over reset workflows for integrating changes to main
- Verify expected content present before destructive operations (check key files/directories)
- Check branch staleness using
scripts/git-check-staleness.sh <branch>before reset operations - Enable branch protection on main to prevent direct pushes and force pushes
Lessons Learned¶
What Went Well¶
β Git reflog preserved work: All commits still accessible locally β Quick identification: Issue discovered within hours β Complete recovery: 100% of work recovered with no data loss β Comprehensive forensics: Thorough analysis ensured no missed work β Documentation: Full incident documentation for future reference
What Could Be Improved¶
β No branch protection: Main branch had no safeguards β No post-merge fetch protocol: Local branches not updated after remote PR merges β No content verification: Did not verify expected files present before reset β Git semantics gap: Agent/user didn't understand branch names use local refs, not remote state β Confusing workflow: Using version branches as primary development location β No pre-push verification: Branches not pushed during development β No safety checks: No verification before destructive git operations
Action Items¶
Immediate (Completed):
- Add git safety rules to
.claude/CLAUDE.md - Create git workflow documentation
- Install pre-checkout hooks
- Configure git safety aliases
- Correct v0.2.0 tag
- Document incident
Short-term (Recommended):
- Enable branch protection on main (require PR reviews)
- Standardize release workflow (avoid version branches)
- Add pre-commit hook to check for unpushed work
- Document tag vs branch naming conventions
Long-term (Future consideration):
- Automated backups of git reflog
- CI checks for orphaned commits
- Team training on git recovery procedures
- Feature request to Claude Code team: Session audit logging (see Appendix B)
Appendix A: Claude Code Session Audit Logging (Feature Request)¶
Problem¶
This incident revealed a critical gap: no audit trail of commands executed by Claude Code sessions. Without shell command history, we cannot definitively determine: - What exact git command caused the incident - The sequence of operations leading to data loss - Whether the issue was human error, agent error, or tool misconfiguration
Proposed Solution¶
Add opt-in session audit logging to Claude Code with two simple log streams:
1. File Operation Log (~/.claude/audit/file-operations.jsonl):
{"timestamp": "2025-12-07T19:37:42-08:00", "session_id": "abc123", "operation": "write", "path": "/path/to/file.ts", "tool": "Write"}
{"timestamp": "2025-12-07T19:37:45-08:00", "session_id": "abc123", "operation": "edit", "path": "/path/to/file.ts", "tool": "Edit"}
2. Shell Command Log (~/.claude/audit/shell-commands.jsonl):
{"timestamp": "2025-12-07T19:37:50-08:00", "session_id": "abc123", "command": "git status", "cwd": "/path/to/repo", "exit_code": 0}
{"timestamp": "2025-12-07T19:37:51-08:00", "session_id": "abc123", "command": "git reset --hard version-0.2.0", "cwd": "/path/to/repo", "exit_code": 0}
Configuration¶
Add to claude-code-settings.json:
{
"auditLog": {
"enabled": true,
"path": "~/.claude/audit",
"logFileOperations": true,
"logShellCommands": true,
"maxSizeMB": 100,
"retentionDays": 30
}
}
Benefits¶
- Incident forensics: Reconstruct exact sequence of operations
- Security auditing: Track what AI agents are doing
- Debugging: Understand why operations failed
- Compliance: Meet audit requirements for regulated environments
- Privacy-preserving: No file contents logged, only metadata
- Opt-in: Users control whether logging is enabled
Implementation Notes¶
- Use newline-delimited JSON (JSONL) for easy parsing and streaming
- Log only metadata (timestamps, paths, commands) - no sensitive data
- Implement log rotation to prevent unbounded growth
- No performance impact (async logging)
- Compatible with existing
cleanupPeriodDayssetting
Why This Matters¶
This incident would have been immediately diagnosable with command logging. Instead, we spent hours on forensics and still cannot determine root cause. Simple audit logging would prevent this ambiguity for all future incidents.
Technical Details¶
Commands Used in Recovery¶
# Discovery
git reflog --all | head -n 50
git fsck --lost-found --no-reflogs
git log --all --oneline --grep="prompt"
# Analysis
git show c6532f5 --stat
git diff --stat HEAD c6532f5
git ls-tree -r c6532f5 --name-only | grep prompt_system
# Recovery
git checkout -b recovery/prompt-system-refactor
git checkout c6532f5 -- [files...]
git add .
git commit -m "feat: Recover prompt system..."
git push -u origin recovery/prompt-system-refactor
# Tag correction
git tag -d v0.2.0
git push origin :refs/tags/v0.2.0
git tag -a v0.2.0 -m "Release v0.2.0 (Corrected)..."
git push origin v0.2.0
Files Recovered¶
Complete list in PR #13: https://github.com/aaronksolomon/tnh-scholar/pull/13
Summary:
- 35 new files
- 4 modified files
- 4,508 insertions
- 130 deletions
References¶
Related Documents¶
- Git Workflow Guide - Safe git practices
- PR #13 - Recovery PR
- ADR-PT04 - Recovered work
- Tag Correction Plan (v0.2.0 tag fix, documentation pending)
Git Commits¶
c6532f5- Original prompt system implementation (orphaned)87b6603- PR #11 merge commit (version-0.1.4 branch)5bf012d- PR #12 merge commit (version-0.2.0 branch)e65ac7d- Recovery commita6523a9- PR #13 merge to main
GitHub Events¶
- PR #11: https://github.com/aaronksolomon/tnh-scholar/pull/11
- PR #12: https://github.com/aaronksolomon/tnh-scholar/pull/12
- PR #13: https://github.com/aaronksolomon/tnh-scholar/pull/13
Appendix: Orphaned Commits Sample¶
First 10 orphaned commits from git fsck:
0b019ed - codex snapshot (2025-12-06 21:01) - Partial prompt system
318192e - codex snapshot (2025-11-28 16:10)
8001092 - codex snapshot (2025-11-23 21:02)
5c0283d - codex snapshot (2025-11-20 15:07)
1c8498b - codex snapshot (2025-11-22 07:37)
72840c1 - [old development work]
8104125 - [old development work]
65050e9 - [old development work]
878664d - [old development work]
8886ed2 - [old development work]
Total: 173 orphaned commits (mostly historical, no unique work lost)
Report prepared by: Claude Sonnet 4.5 Reviewed by: Aaron Solomon (phapman) Date: December 7, 2025 Status: Final