Backup and Disaster Recovery
What to back up, how to restore, and how to plan for failure.
ObjectOS itself is stateless — the runtime can be rebuilt from the container image. Backups are about the data ObjectOS depends on, not ObjectOS itself. Plan recovery around those datasets.
What to back up
| Asset | Owned by | Backup strategy |
|---|---|---|
| Business database | Customer | Database-native backup (PITR for managed services); test restore quarterly |
| Compiled artifact (`objectstack.json`) | Application/release team | Store every published artifact in immutable storage; never overwrite |
| Secrets baseline | Customer secret manager | Vendor-native backup; rotation cadence per policy |
| Object/file storage (when the `storage` capability is enabled) | Customer | Bucket versioning + cross-region replication if required |
| Customer-managed identity provider | IdP vendor | Vendor-native |
ObjectOS does not require a separate backup of its container
filesystem. Cache directories (OS_CACHE_DIR) are rebuildable.
Database backups
Match the strategy to the driver:
| Driver | Recommended approach |
|---|---|
| PostgreSQL | Continuous WAL archiving + base backup; point-in-time restore |
| Turso / libSQL | Use the platform's backup feature; export a local SQLite copy as a cold backup |
| MongoDB | Replica-set snapshots; oplog for PITR |
| SQLite (single-node / desktop) | WAL mode + hot snapshots via VACUUM INTO or the Online Backup API; optionally Litestream for continuous replication — see SQLite deployments below |
Whatever the driver, validate that:
- backups complete within the customer's RPO;
- a database restored from backup can boot ObjectOS from the same artifact and serve traffic;
- restore tests are exercised end-to-end at least once a quarter.
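The first check lends itself to automation. Below is a minimal Python sketch, assuming file-based snapshots collected in one directory; the directory, the `*.sqlite` naming, and the helpers `newest_snapshot_age` / `within_rpo` are illustrative assumptions, not ObjectOS APIs. It treats "no snapshot at all" as a violation and compares the newest file's modification time against the RPO.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional


def newest_snapshot_age(
    snapshot_dir: Path, pattern: str = "*.sqlite", now: Optional[datetime] = None
) -> Optional[timedelta]:
    """Age of the most recently written snapshot, or None if there is none."""
    snapshots = list(snapshot_dir.glob(pattern))
    if not snapshots:
        return None
    newest_mtime = max(p.stat().st_mtime for p in snapshots)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - datetime.fromtimestamp(newest_mtime, tz=timezone.utc)


def within_rpo(snapshot_dir: Path, rpo: timedelta, now: Optional[datetime] = None) -> bool:
    """True if the newest snapshot is no older than the RPO (hypothetical helper)."""
    age = newest_snapshot_age(snapshot_dir, now=now)
    # A directory with no snapshots always violates the RPO.
    return age is not None and age <= rpo
```

Run it from the same scheduler that takes the snapshots and alert when it returns `False`. For managed PostgreSQL or MongoDB, the equivalent signal comes from the provider's backup status rather than local files.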
SQLite deployments
SQLite is the default driver and is a legitimate production choice for single-node ObjectOS — desktop apps, internal tools for a small team, edge / on-prem appliances, evaluation environments. The trade-off is structural: SQLite is single-writer, so it does not fit multi-node ObjectOS, high concurrent write throughput, or shared-database deployments. For those, use PostgreSQL.
When you do run SQLite in production, the configuration that makes it safe and the backup recipe go together:
Runtime configuration
- Enable WAL: `PRAGMA journal_mode=WAL` (readers don't block the writer; crash-safe). `PRAGMA synchronous=NORMAL` is the usual desktop/single-node default: with WAL it keeps the database consistent and is much faster than `FULL`, though the most recent commits can roll back after a power loss or OS crash.
- Keep the database file on local disk. Do not put it on NFS, SMB, or inside a folder synced by Dropbox / OneDrive / iCloud while the runtime is running. SQLite's locking is unreliable on those, and this is the most common way SQLite deployments corrupt.
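For tooling that opens the database file itself (a backup script, a verification job), the two pragmas can be applied at connection time. A minimal sketch with Python's standard `sqlite3` module; `open_objectos_db` is a hypothetical helper and says nothing about how ObjectOS configures its own driver internally.

```python
import sqlite3


def open_objectos_db(path: str) -> sqlite3.Connection:
    """Open a SQLite file with WAL and synchronous=NORMAL (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # journal_mode=WAL is persistent (stored in the database file), and the
    # pragma returns the mode actually in effect, so check it instead of assuming.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL; journal_mode is {mode!r}")
    # synchronous is per-connection, so it must be set on every open.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

The explicit check matters because some targets (an in-memory database, for example) cannot use WAL, and the pragma reports the fallback mode rather than raising an error.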
Hot snapshot (no downtime)
The runtime keeps serving traffic while you take a snapshot using
SQLite's Online Backup API or VACUUM INTO:
```sql
VACUUM INTO '/var/backups/objectos/db-2026-05-27T13-00Z.sqlite';
```

Schedule it (cron, systemd timer, or the in-process scheduler) at a cadence matching your RPO. Tag each snapshot with the artifact version that produced it so you can roll database + artifact back together (see Artifact versioning below).
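The same snapshot can be driven from a script. A minimal sketch with Python's standard `sqlite3` module, assuming SQLite 3.27 or newer (required for `VACUUM INTO`); `snapshot_name` and `hot_snapshot` are hypothetical helpers, and the naming scheme simply appends the artifact version to the timestamped file name used above.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_name(artifact_version: str, now: Optional[datetime] = None) -> str:
    """Timestamped snapshot file name tagged with the artifact version."""
    if now is None:
        now = datetime.now(timezone.utc)
    return f"db-{now:%Y-%m-%dT%H-%MZ}-{artifact_version}.sqlite"


def hot_snapshot(
    db_path: str, backup_dir: str, artifact_version: str, now: Optional[datetime] = None
) -> Path:
    """Write a consistent copy of db_path while other connections stay open."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    target = Path(backup_dir) / snapshot_name(artifact_version, now)
    if target.exists():
        # VACUUM INTO also refuses a non-empty target; fail with a clearer error.
        raise FileExistsError(target)
    src = sqlite3.connect(db_path)
    try:
        # The INTO argument is an SQL expression, so the path can be bound.
        # The copy is taken inside one read transaction, so it is consistent.
        src.execute("VACUUM INTO ?", (str(target),))
    finally:
        src.close()
    return target
```

Python's `sqlite3.Connection.backup()` exposes the Online Backup API if you prefer it; both approaches leave the runtime serving traffic.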
Offsite copy
A local snapshot doesn't survive disk loss. Push each snapshot to durable storage:
- Server / VM: push to S3 or any S3-compatible bucket (you likely already have one configured for the `storage` capability).
- Desktop app: write the snapshot to a user-controlled sync folder (OneDrive / iCloud / Google Drive). This is acceptable here because the snapshot file is closed and immutable, unlike the live database.
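Uploading to S3 needs a provider SDK, so this minimal sketch covers only the desktop case: copy a closed snapshot into the sync folder under a temporary name, verify its checksum, then rename it into place so the sync client never picks up a half-written file under the final name. The `.partial` suffix and the helper names are hypothetical. The rename is atomic because the temporary and final paths sit in the same directory, and therefore on the same filesystem.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_offsite(snapshot: Path, dest_dir: Path) -> Path:
    """Copy a closed snapshot into dest_dir and publish it atomically."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final = dest_dir / snapshot.name
    tmp = dest_dir / (snapshot.name + ".partial")
    shutil.copy2(snapshot, tmp)
    # Verify the copy before it becomes visible under its final name.
    if sha256_of(tmp) != sha256_of(snapshot):
        tmp.unlink()
        raise OSError(f"checksum mismatch while copying {snapshot}")
    os.replace(tmp, final)  # atomic rename within one directory
    return final
```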
Continuous replication (lower RPO)
For sub-minute RPO without giving up SQLite, run Litestream alongside the ObjectOS process. It streams the WAL to S3-compatible storage and supports point-in-time restore. This is the recommended path when a single-node SQLite deployment needs near-zero data loss.
Restore
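Stop the runtime, remove the old database file together with its `-wal` and `-shm` companions (a stale WAL left beside a restored file can corrupt it), put the chosen snapshot in its place (or run `litestream restore`), then start the runtime against the same artifact version that produced the snapshot.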
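A minimal sketch of the file swap for the snapshot case, assuming the runtime is already stopped and nothing holds the database open; `restore_sqlite` is a hypothetical helper. It copies rather than moves the snapshot so the original stays available for another attempt, and clears the old database's `-wal` and `-shm` files before the restored file takes its place.

```python
import os
import shutil
from pathlib import Path


def restore_sqlite(snapshot: Path, db_path: Path) -> None:
    """Replace a stopped runtime's database file with a snapshot (hypothetical helper)."""
    # Stage the copy next to the target so the final rename stays on one filesystem.
    staged = Path(str(db_path) + ".restoring")
    shutil.copy2(snapshot, staged)
    # The -wal/-shm files belong to the old database; left in place, the old
    # WAL could be applied to the restored file on the next open.
    for suffix in ("-wal", "-shm"):
        stale = Path(str(db_path) + suffix)
        if stale.exists():
            stale.unlink()
    os.replace(staged, db_path)
```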
Artifact versioning
Treat published artifacts as immutable. Tag each artifact with the
release id used to compile it (e.g. objectstack-2026-05-24.json).
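A minimal sketch of the never-overwrite rule, assuming a local directory stands in for immutable storage (in practice a bucket with versioning or write-once retention plays that role); `publish_artifact` is a hypothetical helper. Opening the target in exclusive-create mode makes a second publish under the same release id fail instead of silently replacing the file.

```python
import shutil
from pathlib import Path


def publish_artifact(compiled: Path, store: Path, release_id: str) -> Path:
    """Copy a compiled artifact to store/objectstack-<release_id>.json, never overwriting."""
    store.mkdir(parents=True, exist_ok=True)
    target = store / f"objectstack-{release_id}.json"
    # Mode "xb" raises FileExistsError if the target already exists.
    with open(compiled, "rb") as src, open(target, "xb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```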
Recovery to a known-good business state usually means:
- Restore the database to the chosen point in time.
- Re-point ObjectOS at the artifact version that was live at that time.
- Restart the runtime.
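The second step needs a record of which artifact was live when. A minimal sketch, assuming you keep a release log of `(went_live_at, artifact_name)` pairs; both the log and `artifact_live_at` are hypothetical. It returns the last release that went live at or before the restore point.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Tuple


def artifact_live_at(releases: List[Tuple[datetime, str]], point_in_time: datetime) -> str:
    """Artifact that was live at point_in_time, given (went_live_at, name) pairs."""
    ordered = sorted(releases)
    times = [t for t, _ in ordered]
    # bisect_right counts a release that went live exactly at point_in_time as live.
    i = bisect_right(times, point_in_time)
    if i == 0:
        raise LookupError("no artifact was live at that time")
    return ordered[i - 1][1]
```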
If you overwrite artifacts in place, you lose the ability to roll back cleanly even when the database backup is perfect.
RPO/RTO planning
A workable starting target for customer deployments:
| Class | RPO | RTO | Notes |
|---|---|---|---|
| Evaluation / demo | best-effort | best-effort | SQLite snapshot is fine |
| Desktop app / small-team single-node | ≤ 1 hour (snapshots) or ≤ seconds (Litestream) | minutes | SQLite + WAL + scheduled VACUUM INTO + offsite copy |
| Single-tenant production | ≤ 15 min | ≤ 1 hour | Managed PostgreSQL with PITR + warm image |
| Multi-tenant / regulated | ≤ 5 min | ≤ 30 min | HA database + multi-AZ ObjectOS + tested runbook |
Tighter targets require platform changes that are outside the ObjectOS container itself (HA database, multi-AZ ingress, warm replicas).
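When the RPO rests on scheduled snapshots, the worst case is larger than the schedule interval. If the disk is lost just before the next snapshot reaches offsite storage, everything written since the previous snapshot began is gone: one full interval plus the time a snapshot takes to be written and uploaded. A small Python sketch of that arithmetic; the helper names are hypothetical, and the upload duration is something you measure rather than a given.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, snapshot_and_upload: timedelta) -> timedelta:
    """Worst-case data loss for scheduled snapshots that are copied offsite."""
    # Failure just before snapshot N+1 is durable offsite: the newest durable
    # copy is snapshot N, taken one interval plus one upload duration earlier.
    return interval + snapshot_and_upload


def meets_target(interval: timedelta, snapshot_and_upload: timedelta, rpo_target: timedelta) -> bool:
    return worst_case_rpo(interval, snapshot_and_upload) <= rpo_target
```

With hourly snapshots and a five-minute upload, the worst case is 65 minutes, which misses a ≤ 1 hour target; a 30-minute interval brings it to 35 minutes.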
Failure modes worth rehearsing
- Database unavailable for a long window. Confirm the runtime surfaces a clear 503 and that probes correctly mark pods unhealthy.
- Artifact regression. Roll back the artifact pointer; data is unaffected.
- Secret rotation. Rotating `OS_AUTH_SECRET` invalidates every session. Run it during a maintenance window or stagger it across replicas.
- Region outage. If the customer requires region failover, the business database, secret manager, and ingress all need to be cross-region. ObjectOS itself can run anywhere the image is available.
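The database-outage rehearsal can be scripted against the runtime's health endpoint. A minimal sketch using only the Python standard library; the URL you pass in is an assumption, since this page does not name ObjectOS's probe routes, and `probe_status` / `rehearse_db_outage` are hypothetical helpers.

```python
import urllib.error
import urllib.request


def probe_status(url: str, timeout: float = 5.0) -> int:
    """HTTP status of a health endpoint, including 4xx/5xx statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for error statuses; the code is still the answer we want.
        return err.code


def rehearse_db_outage(url: str) -> None:
    """Fail loudly unless the endpoint reports 503 while the database is down."""
    status = probe_status(url)
    if status != 503:
        raise AssertionError(f"expected 503 while the database is down, got {status}")
```

Take the database away first (stop it or block its port), then run `rehearse_db_outage` against each replica. A connection error (`urllib.error.URLError`) instead of a status means the runtime itself stopped answering, which is a different failure from the clean 503 you want.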
What to capture in the runbook
- Backup schedule, retention, and on-call contact for each dataset.
- Step-by-step restore procedure (database first, then artifact, then start ObjectOS).
- Verification queries to confirm the restored database is consistent.
- Rollback plan for both ObjectOS image and artifact version (see Upgrade and Rollback).
- Communication template for customer-visible incidents.
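For SQLite deployments, the verification queries can be bundled into one check. A minimal sketch, assuming you list the tables whose row counts you want to compare against expectations; `verify_restored` is a hypothetical helper, and PostgreSQL or MongoDB need their own equivalents.

```python
import sqlite3
from typing import Dict, List


def verify_restored(db_path: str, expected_tables: List[str]) -> Dict[str, int]:
    """Consistency checks on a restored SQLite database; returns row counts."""
    conn = sqlite3.connect(db_path)
    try:
        # integrity_check returns the single row ("ok",) when no problems are found.
        result = conn.execute("PRAGMA integrity_check").fetchall()
        if result != [("ok",)]:
            raise RuntimeError(f"integrity_check failed: {result[:5]}")
        # integrity_check does not cover foreign keys; foreign_key_check returns
        # one row per violating row, and works even if enforcement is off.
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            raise RuntimeError(f"{len(violations)} foreign key violation(s)")
        counts = {}
        for table in expected_tables:
            # Identifiers cannot be bound as parameters, so quote them instead.
            # A missing table raises sqlite3.OperationalError.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = conn.execute(f"SELECT count(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```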