GitHub data-sync (experimental)
Per-user backup of all server-held text + metadata into a GitHub
repository the user owns. Goal: a user can survive a complete server
wipe by running git clone to get their own data back. Static assets we
host (avatars, attachments, cover images) are referenced by URL by
default; their bytes stay on our CDN and are not committed unless a
push opts in to "Include assets".
This feature is experimental and gated behind a frontend "Experimental" card in editor settings. It is not wired into any automatic schedule; the user pushes manually for now.
Why a GitHub App, not a personal access token?
- Per-user install (installation_id) scoped to one repository.
- Tokens rotate ~hourly automatically.
- Revoked by the user from their GitHub settings, no server work needed on our side.
- Survives password rotation / 2FA changes on the user's side.
Required env vars
Drop these into the backend .env:
GITHUB_DATA_SYNC_APP_NAME=notechondria-data-sync
GITHUB_DATA_SYNC_APP_CLIENT_ID=<from GitHub App settings>
GITHUB_DATA_SYNC_APP_CLIENT_SECRET=<from GitHub App settings>
GITHUB_DATA_SYNC_APP_PRIVATE_KEY=<PEM, single-line with \n escapes>
GITHUB_DATA_SYNC_APP_INSTALL_URL=https://github.com/apps/notechondria-data-sync/installations/new
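A minimal sketch of how the backend might read these variables, mainly to show the "\n" un-escaping that the single-line PEM needs before it can be handed to a signer. GithubSyncConfig and load_github_sync_config are hypothetical names for illustration, not identifiers from the codebase.

```python
import os
from dataclasses import dataclass

REQUIRED_VARS = (
    "GITHUB_DATA_SYNC_APP_NAME",
    "GITHUB_DATA_SYNC_APP_CLIENT_ID",
    "GITHUB_DATA_SYNC_APP_CLIENT_SECRET",
    "GITHUB_DATA_SYNC_APP_PRIVATE_KEY",
    "GITHUB_DATA_SYNC_APP_INSTALL_URL",
)


@dataclass(frozen=True)
class GithubSyncConfig:
    """Hypothetical container for the five settings listed above."""
    app_name: str
    client_id: str
    client_secret: str
    private_key_pem: str
    install_url: str


def load_github_sync_config(env=None):
    """Read the settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing env vars: " + ", ".join(missing))
    # The .env value stores the PEM on one line with literal backslash-n
    # sequences; turn them back into real newlines.
    pem = env["GITHUB_DATA_SYNC_APP_PRIVATE_KEY"].replace("\\n", "\n")
    return GithubSyncConfig(
        app_name=env["GITHUB_DATA_SYNC_APP_NAME"],
        client_id=env["GITHUB_DATA_SYNC_APP_CLIENT_ID"],
        client_secret=env["GITHUB_DATA_SYNC_APP_CLIENT_SECRET"],
        private_key_pem=pem,
        install_url=env["GITHUB_DATA_SYNC_APP_INSTALL_URL"],
    )
```

Failing fast on a missing variable keeps a half-configured App from surfacing later as an opaque signing error.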
The App must request the following permissions:
- Repository: Contents read+write (write is what commit_and_push uses). Single repo per install.
- Metadata: read (default).
Repository layout
/
├── README.md brief pointer + last-sync timestamp
├── manifest.json schema version + per-section index
├── profile/
│ ├── creator.json Creator row (no api_key_hash, no avatar bytes)
│ ├── settings.json app_settings_json + updated_at
│ └── skill.md mcp_skill_md verbatim
├── courses/
│ └── <slug>.json
├── notes/
│ ├── <uuid>.md markdown body + YAML frontmatter
│ └── <uuid>.meta.json sidecar: system metadata + custom_meta + sharing_id
├── planner/
│ ├── events.json
│ └── feeds.json
└── recycle_bin.json
manifest.json carries the schema version for the whole tree.
schema_version=1 is the shape captured by
creators.services.github_sync.materialize. Bump it when you add
fields; do not silently break older clones.
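To make the layout and the schema_version rule concrete, here is a rough sketch that materializes a couple of note files plus a manifest, and a check that refuses an unknown schema version. The note dict shape, the "sections" key, and both function names are assumptions for illustration; the real shape is whatever creators.services.github_sync.materialize emits.

```python
import json

SCHEMA_VERSION = 1


def materialize_notes(notes):
    """Return {repo_path: text} for note dicts (hypothetical input shape)."""
    files = {}
    for note in notes:
        # json.dumps yields a double-quoted string, which YAML also accepts.
        frontmatter = "---\ntitle: " + json.dumps(note["title"]) + "\n---\n"
        files[f"notes/{note['uuid']}.md"] = frontmatter + note["body"]
        sidecar = {
            "custom_meta": note.get("custom_meta", {}),
            "sharing_id": note.get("sharing_id"),
        }
        files[f"notes/{note['uuid']}.meta.json"] = json.dumps(
            sidecar, indent=2, sort_keys=True
        )
    # Assumed manifest shape: schema version plus a per-section path index.
    manifest = {
        "schema_version": SCHEMA_VERSION,
        "sections": {"notes": sorted(files)},
    }
    files["manifest.json"] = json.dumps(manifest, indent=2, sort_keys=True)
    return files


def check_manifest(manifest_text, supported=(1,)):
    """Parse manifest.json and reject schema versions we cannot read."""
    manifest = json.loads(manifest_text)
    version = manifest.get("schema_version")
    if version not in supported:
        raise ValueError(f"unsupported schema_version: {version!r}")
    return manifest
```

A restore tool that calls something like check_manifest first can stop early on a newer clone instead of half-importing it.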
Restore (manual, future automated)
- git clone the user's repo.
- POST /api/v1/auth/register/ (or sign in via OAuth) to recreate the Creator row.
- PATCH /api/v1/settings/ with mcp_skill_md, theme_*, and the app_settings blob from profile/settings.json.
- Recreate every course via POST /api/v1/courses/ using its slug.
- Recreate every note via POST /api/v1/notes/, reading the markdown body + the sidecar *.meta.json for metadata_json and custom_meta.
- Recreate planner events + feeds from planner/*.json.
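This section dates from 0.1.90, which covered only the export half.
A scriptable restore has since shipped at
backend/scripts/github_sync_restore.py (see the closed gaps below).
Restore tooling is tracked under "Release / CI" in docs/TODO.md.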
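The restore order listed in this section can be sketched as a dry-run planner that walks a cloned repo and lists the calls in sequence without sending anything. plan_restore is a hypothetical helper, not the shipped script; this document does not name the planner endpoints, so they are left as None, and using profile/creator.json as the input for registration is an assumption.

```python
from pathlib import Path


def plan_restore(repo_root):
    """Return (method, endpoint, source_path) tuples in restore order."""
    root = Path(repo_root)
    steps = [
        # Assumption: the registration payload is derived from creator.json.
        ("POST", "/api/v1/auth/register/", "profile/creator.json"),
        ("PATCH", "/api/v1/settings/", "profile/settings.json"),
    ]
    for course in sorted(root.glob("courses/*.json")):
        steps.append(
            ("POST", "/api/v1/courses/", course.relative_to(root).as_posix())
        )
    for note in sorted(root.glob("notes/*.md")):
        # The matching <uuid>.meta.json sidecar is read alongside this file.
        steps.append(
            ("POST", "/api/v1/notes/", note.relative_to(root).as_posix())
        )
    for name in ("events.json", "feeds.json"):
        if (root / "planner" / name).exists():
            # Endpoint and method are not named in this document.
            steps.append((None, None, f"planner/{name}"))
    return steps
```

Courses come before notes on the assumption that notes may refer to a course; if that does not hold, the two loops can be swapped.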
Wire-up flow
- User clicks "Connect to GitHub" in editor settings → frontend redirects to GITHUB_DATA_SYNC_APP_INSTALL_URL.
- After install, GitHub redirects back to the editor with ?installation_id=...&setup_action=install.
- Frontend POSTs /api/v1/integrations/github/callback/ with the install id + chosen repo_full_name and repo_default_branch.
- Backend persists a GithubIntegration row keyed by Creator.
- User clicks "Push now" → POST /api/v1/integrations/github/push/. Backend materializes the file tree and PUTs each file via the GitHub Contents API using an installation-scoped access token.
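For the last step, a single Contents API write looks roughly like the request built below. This is a sketch under assumptions: build_contents_put is a hypothetical helper, and the request is only constructed, not sent. The endpoint shape, the base64 "content" field, and the "sha" field for updating an existing file follow GitHub's REST API.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_contents_put(repo_full_name, path, content, token, branch,
                       message, existing_sha=None):
    """Build (but do not send) a PUT for one file via the Contents API."""
    url = "https://api.github.com/repos/{}/contents/{}".format(
        repo_full_name, urllib.parse.quote(path)
    )
    body = {
        "message": message,
        "content": base64.b64encode(content).decode("ascii"),
        "branch": branch,
    }
    if existing_sha is not None:
        # GitHub requires the current blob sha when the file already exists.
        body["sha"] = existing_sha
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "Bearer " + token,
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Passing the request to urllib.request.urlopen would perform the write; one PUT per file means one commit per file, which is worth keeping in mind for large trees.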
Known gaps as of 0.1.94
The push and restore halves are now end-to-end functional including binary assets. The remaining work is around concurrent edits and long-term repo hygiene.
- Conflict resolution. The Contents API PUTs we use overwrite the remote blob. A user editing on two devices between syncs can lose changes. The next iteration should fetch the existing blob on each path, diff it against the materialized payload, and surface a "remote changed" warning before overwriting.
- Asset rotation / pruning. Repeated pushes with assets accumulate orphan files for notes that have been deleted client-side but whose old asset paths still live in the remote tree. A --prune-orphans mode on the push pipeline can walk the Trees API and delete unreferenced assets/notes/<uuid>/ subtrees in the same commit.
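Both gaps reduce to small comparisons, sketched here under stated assumptions. The Contents API reports each file's git blob sha, so a local payload can be compared against the remote without downloading it. The function names, the idea of remembering the sha from the last push, and the flat list of remote paths are all hypothetical.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Compute the git blob id: sha1 over 'blob <size>\\0' plus the bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def remote_changed(last_pushed_sha, remote_sha):
    """True when the remote blob is not the one we wrote on the last push."""
    return remote_sha is not None and remote_sha != last_pushed_sha


def orphan_asset_paths(remote_paths, live_note_uuids):
    """Return remote asset paths whose note uuid no longer exists."""
    prefix = "assets/notes/"
    orphans = []
    for path in remote_paths:
        if path.startswith(prefix):
            uuid = path[len(prefix):].split("/", 1)[0]
            if uuid not in live_note_uuids:
                orphans.append(path)
    return orphans
```

If git_blob_sha of the new payload equals the remote sha, the PUT can be skipped entirely; if remote_changed is true, that is the point to surface the "remote changed" warning.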
Closed gaps (0.1.94)
- Static-asset re-bundling for both push and restore.
  - Push: opt-in via the "Include assets" toggle on the GitHub Sync card (or include_assets=true on the /api/v1/integrations/github/push/ endpoint). Inlines avatar / cover / attachment bytes under assets/... paths. Per-file 50 MB and per-push 200 MB caps; oversized files are recorded in manifest.skipped_assets.
  - Restore: backend/scripts/github_sync_restore.py --include-assets walks the same paths and re-uploads via the existing multipart endpoints (PATCH /settings/ for avatar, POST /notes/<id>/cover/, POST /notes/<id>/attachments/).
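The size caps can be illustrated with a small selection pass. This is a sketch, not the shipped logic: it assumes binary megabytes, assumes assets are admitted greedily in input order, and uses a hypothetical shape for the skipped entries.

```python
MB = 1024 * 1024  # assumption: the caps are binary megabytes
PER_FILE_CAP = 50 * MB
PER_PUSH_CAP = 200 * MB


def select_assets(assets, per_file_cap=PER_FILE_CAP, per_push_cap=PER_PUSH_CAP):
    """Split (path, size_bytes) pairs into included paths and skipped records."""
    included, skipped, total = [], [], 0
    for path, size in assets:
        if size > per_file_cap:
            skipped.append({"path": path, "reason": "per_file_cap"})
        elif total + size > per_push_cap:
            skipped.append({"path": path, "reason": "per_push_cap"})
        else:
            included.append(path)
            total += size
    return included, skipped
```

The skipped list is the kind of data that would end up under manifest.skipped_assets, so a later restore can tell which assets were left on the CDN.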
Closed gaps (0.1.93)
- _refresh_installation_token is wired: pyjwt + cryptography ship in both backend/requirements.txt and backend/requirements-render.txt; the signer is covered by creators.tests.GithubSyncTests against a freshly generated test RSA keypair.
- The frontend repo picker shipped via the shared GithubSyncExperimentalCard. Editor / planner / portal each expose githubSyncStatus / githubSyncRepos / githubSyncCallback / githubSyncPush / githubSyncDisconnect on their NotechondriaClient; the card itself is callback-driven and works from any of the three apps.
- A scriptable restore lives at backend/scripts/github_sync_restore.py. Stdlib-only, supports --dry-run and --verbose, and uses client_draft_id to make reruns idempotent.
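For orientation, the token refresh follows GitHub's standard App flow: sign a short-lived JWT with the App's private key, then exchange it for an installation token. The sketch below covers only the stdlib parts (claims and request construction) and is not the body of _refresh_installation_token; both function names are hypothetical. Signing itself would be something like jwt.encode(claims, private_key_pem, algorithm="RS256") with pyjwt.

```python
import time
import urllib.request


def app_jwt_claims(issuer, now=None):
    """Claims for a GitHub App JWT; issuer is the App's client ID or app ID."""
    now = int(time.time()) if now is None else int(now)
    return {
        "iat": now - 60,      # backdated 60 s to tolerate clock drift
        "exp": now + 9 * 60,  # GitHub rejects lifetimes over 10 minutes
        "iss": issuer,
    }


def build_installation_token_request(installation_id, signed_jwt):
    """Build (but do not send) the POST that mints an installation token."""
    url = (
        "https://api.github.com/app/installations/"
        f"{installation_id}/access_tokens"
    )
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": "Bearer " + signed_jwt,
            "Accept": "application/vnd.github+json",
        },
    )
```

The JSON response carries token and expires_at; installation tokens expire after one hour, which is where the "~hourly" rotation comes from.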