Backup and Disaster Recovery

What to back up, how to restore, and how to plan for failure.

ObjectOS is stateless: the runtime can be rebuilt from the container image. Backups therefore cover the data ObjectOS depends on, not the runtime. Plan recovery around those datasets.

What to back up

| Asset | Owned by | Backup strategy |
| --- | --- | --- |
| Business database | Customer | Database-native backup (PITR for managed services); test restore quarterly |
| Compiled artifact (objectstack.json) | Application/release team | Store every published artifact in immutable storage; never overwrite |
| Secrets baseline | Customer secret manager | Vendor-native backup; rotation cadence per policy |
| Object/file storage (when storage capability is enabled) | Customer | Bucket versioning + cross-region replication if required |
| Customer-managed identity provider | IdP vendor | Vendor-native |

ObjectOS does not require a separate backup of its container filesystem. Cache directories (OS_CACHE_DIR) are rebuildable.

Database backups

Match the strategy to the driver:

| Driver | Recommended approach |
| --- | --- |
| PostgreSQL | Continuous WAL archiving + base backup; point-in-time restore |
| Turso / libSQL | Use the platform's backup feature; export to SQLite locally for cold copy |
| MongoDB | Replica-set snapshots; oplog for PITR |
| SQLite (single-node / desktop) | WAL mode + hot snapshots via VACUUM INTO or the Online Backup API; optionally Litestream for continuous replication (see SQLite deployments below) |

Whatever the driver, validate that:

  • backups complete within the customer's RPO;
  • a fresh database can boot ObjectOS from the same artifact and serve traffic;
  • restore tests are exercised end-to-end at least once a quarter.
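The first check in the list above lends itself to automation. The following is a minimal sketch, assuming you already record when each dataset's most recent backup finished; the function name, dataset names, and inputs are hypothetical and not part of ObjectOS.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def datasets_outside_rpo(
    last_backup_at: Dict[str, datetime],
    rpo: Dict[str, timedelta],
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the datasets whose newest completed backup is older than its RPO.

    last_backup_at: completion time of the most recent backup per dataset.
    rpo: agreed recovery point objective per dataset.
    Both mappings are assumed inputs that your own tooling provides.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, finished in last_backup_at.items()
        if now - finished > rpo[name]
    )
```

A non-empty result is a reasonable trigger for an alert to the on-call contact for that dataset.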

SQLite deployments

SQLite is the default driver and is a legitimate production choice for single-node ObjectOS — desktop apps, internal tools for a small team, edge / on-prem appliances, evaluation environments. The trade-off is structural: SQLite is single-writer, so it does not fit multi-node ObjectOS, high concurrent write throughput, or shared-database deployments. For those, use PostgreSQL.

When you do run SQLite in production, the configuration that makes it safe and the backup recipe go together:

Runtime configuration

  • Enable WAL: PRAGMA journal_mode=WAL (readers don't block the writer; crash-safe).
  • PRAGMA synchronous=NORMAL is the usual desktop/single-node default — safe with WAL, much faster than FULL.
  • Keep the database file on local disk. Do not put it on NFS, SMB, or inside a folder synced by Dropbox / OneDrive / iCloud while the runtime is running: SQLite's file locking is unreliable there, and this is the most common way SQLite deployments get corrupted.
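To confirm the two pragmas on a given database file, a small script using Python's standard sqlite3 module is enough. This is a sketch only: how ObjectOS itself applies these settings is not covered here, and the function name is hypothetical.

```python
import sqlite3


def open_configured(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database with WAL and synchronous=NORMAL applied."""
    conn = sqlite3.connect(db_path)
    # journal_mode is persistent: once set, it is stored in the database file.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        # SQLite reports the mode actually in effect, for example "memory"
        # for an in-memory database, which cannot use WAL.
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode!r}")
    # synchronous is a per-connection setting and must be applied on every open.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

Because journal_mode persists in the file while synchronous does not, any tool that opens the database alongside the runtime needs to set synchronous itself.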

Hot snapshot (no downtime)

The runtime keeps serving traffic while you take a snapshot using SQLite's Online Backup API or VACUUM INTO:

VACUUM INTO '/var/backups/objectos/db-2026-05-27T13-00Z.sqlite';

Schedule it (cron, systemd timer, or the in-process scheduler) at a cadence matching your RPO. Tag each snapshot with the artifact version that produced it so you can roll database + artifact back together (see Artifact versioning below).
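As an alternative to issuing VACUUM INTO by hand, the same snapshot can be taken through the Online Backup API, which Python exposes as sqlite3.Connection.backup. The sketch below assumes a file naming scheme (timestamp plus artifact version) that is illustrative only; the function name and arguments are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot(db_path: str, backup_dir: str, artifact_version: str) -> Path:
    """Copy a live SQLite database to a new snapshot file without downtime."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%MZ")
    out_dir = Path(backup_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The artifact version in the name lets database and artifact roll back together.
    target = out_dir / f"db-{stamp}-{artifact_version}.sqlite"
    if target.exists():
        # Snapshots are treated as immutable: never overwrite an existing one.
        raise FileExistsError(target)
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(target)
    try:
        # Online Backup API: copies a consistent view while writers continue.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

Called from cron or a systemd timer, for example as snapshot("/var/lib/objectos/db.sqlite", "/var/backups/objectos", "objectstack-2026-05-24"), this produces one file per run. The database path in that call is an assumed location, not an ObjectOS default.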

Offsite copy

A local snapshot doesn't survive disk loss. Push each snapshot to durable storage:

  • Server / VM: push to S3 or any S3-compatible bucket (you likely already have one configured for the storage capability).
  • Desktop app: write the snapshot to a user-controlled sync folder (OneDrive / iCloud / Google Drive) — acceptable here because the snapshot file is closed and immutable, unlike the live database.
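For the desktop case, or any destination that is mounted as a directory, the copy step can be sketched with the standard library alone. The function names are hypothetical. Uploading to S3 would use your storage SDK instead and is not shown here.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large snapshots do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_offsite(snapshot: str, dest_dir: str) -> Path:
    """Copy a closed snapshot file into dest_dir and verify the copy."""
    src = Path(snapshot)
    out_dir = Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / src.name
    if dest.exists():
        raise FileExistsError(dest)  # existing snapshots are never overwritten
    # Write under a temporary name so a sync client never sees a half-copied file
    # under the final name.
    tmp = dest.with_name(dest.name + ".partial")
    shutil.copyfile(src, tmp)
    if sha256_of(tmp) != sha256_of(src):
        tmp.unlink()
        raise IOError(f"checksum mismatch copying {src} to {dest}")
    os.replace(tmp, dest)  # atomic rename within the same directory
    return dest
```

Only ever pass a finished snapshot to this function, never the live database file.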

Continuous replication (lower RPO)

For sub-minute RPO without giving up SQLite, run Litestream alongside the ObjectOS process. It streams the WAL to S3-compatible storage and supports point-in-time restore. This is the recommended path when a single-node SQLite deployment needs near-zero data loss.

Restore

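Stop the runtime, then replace the database file with the chosen snapshot (or run litestream restore). When replacing the file by hand, also delete any leftover -wal and -shm files next to it so that stale WAL content is not replayed into the restored database. Finally, start the runtime against the same artifact version that produced the snapshot.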
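A manual file-replacement restore can be sketched as follows. It assumes the runtime is already stopped. The function name is hypothetical, and the integrity check and the removal of old -wal and -shm sidecar files are precautions added in this sketch, not steps ObjectOS performs for you.

```python
import os
import shutil
import sqlite3
from pathlib import Path


def restore(snapshot: str, db_path: str) -> None:
    """Replace db_path with a verified copy of snapshot. Run only while stopped."""
    db = Path(db_path)
    staged = db.with_name(db.name + ".restoring")
    # Work on a copy so the snapshot itself stays untouched.
    shutil.copyfile(snapshot, staged)
    try:
        conn = sqlite3.connect(staged)
        try:
            result = conn.execute("PRAGMA integrity_check").fetchone()[0]
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        result = str(exc)  # for example, the file is not a database at all
    if result != "ok":
        staged.unlink()
        raise RuntimeError(f"snapshot failed integrity check: {result}")
    # Remove WAL and shared-memory sidecars of both the staged copy and the
    # old database so stale WAL frames are never applied to the restored file.
    for base in (staged, db):
        for suffix in ("-wal", "-shm"):
            Path(str(base) + suffix).unlink(missing_ok=True)
    os.replace(staged, db)  # atomic swap within the same directory
```

If the check fails, the existing database file is left as it was, so you can pick an older snapshot and retry.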

Artifact versioning

Treat published artifacts as immutable. Tag each artifact with the release id used to compile it (e.g. objectstack-2026-05-24.json). Recovery to a known-good business state usually means:

  1. Restore the database to the chosen point in time.
  2. Re-point ObjectOS at the artifact version that was live at that time.
  3. Restart the runtime.

If you overwrite artifacts in place, you lose the ability to roll back cleanly even when the database backup is perfect.
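Step 2 requires knowing which artifact was live at the restore point. A minimal sketch, assuming you keep a release history of (went-live time, artifact file name) pairs; that history, the function name, and the file names in the usage note other than objectstack-2026-05-24.json are hypothetical, not something ObjectOS is documented here to record.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Tuple

# (time the artifact went live, artifact file name)
Release = Tuple[datetime, str]


def artifact_live_at(releases: List[Release], restore_point: datetime) -> str:
    """Return the artifact that was live at restore_point."""
    ordered = sorted(releases)
    times = [went_live for went_live, _ in ordered]
    # Index of the first release that went live strictly after the restore point.
    i = bisect_right(times, restore_point)
    if i == 0:
        raise LookupError("no artifact was live at the restore point")
    return ordered[i - 1][1]
```

With releases on 2026-05-10, 2026-05-24, and 2026-06-02, a restore point of 2026-05-27 selects objectstack-2026-05-24.json. All timestamps should carry the same time zone, ideally UTC.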

RPO/RTO planning

Workable starting targets for customer deployments:

| Class | RPO | RTO | Notes |
| --- | --- | --- | --- |
| Evaluation / demo | best-effort | best-effort | SQLite snapshot is fine |
| Desktop app / small-team single-node | ≤ 1 hour (snapshots) or ≤ seconds (Litestream) | minutes | SQLite + WAL + scheduled VACUUM INTO + offsite copy |
| Single-tenant production | ≤ 15 min | ≤ 1 hour | Managed PostgreSQL with PITR + warm image |
| Multi-tenant / regulated | ≤ 5 min | ≤ 30 min | HA database + multi-AZ ObjectOS + tested runbook |

Tighter targets require platform changes that are outside the ObjectOS container itself (HA database, multi-AZ ingress, warm replicas).

Failure modes worth rehearsing

  • Database unavailable for a long window. Confirm the runtime surfaces a clear 503 and that probes correctly mark pods unhealthy.
  • Artifact regression. Roll back the artifact pointer; data is unaffected.
  • Secret rotation. Rotating OS_AUTH_SECRET invalidates every session. Run during a maintenance window or stagger across replicas.
  • Region outage. If the customer requires region failover, the business database, secret manager, and ingress all need to be cross-region. ObjectOS itself can run anywhere the image is available.

What to capture in the runbook

  • Backup schedule, retention, and on-call contact for each dataset.
  • Step-by-step restore procedure (database first, then artifact, then start ObjectOS).
  • Verification queries to confirm the restored database is consistent.
  • Rollback plan for both ObjectOS image and artifact version (see Upgrade and Rollback).
  • Communication template for customer-visible incidents.
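For SQLite deployments, the verification queries in the runbook can start from something like the sketch below. The function name and the table names and minimum row counts you pass in are hypothetical placeholders for your own schema. Other drivers have their own equivalents, which are not shown.

```python
import sqlite3
from typing import Dict, List


def verify_restored(db_path: str, min_rows: Dict[str, int]) -> List[str]:
    """Return a list of problems found in a restored SQLite database."""
    problems: List[str] = []
    conn = sqlite3.connect(db_path)
    try:
        # Structural check: a healthy database yields the single row "ok".
        for (message,) in conn.execute("PRAGMA integrity_check"):
            if message != "ok":
                problems.append(f"integrity: {message}")
        # Referential check: one row per violated foreign key.
        for row in conn.execute("PRAGMA foreign_key_check"):
            problems.append(f"foreign key violation in table {row[0]}")
        # Sanity check: key tables should not be unexpectedly small.
        for table, minimum in min_rows.items():
            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"' + table.replace('"', '""') + '"'
            count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            if count < minimum:
                problems.append(
                    f"table {table} has {count} rows, expected at least {minimum}"
                )
    finally:
        conn.close()
    return problems
```

An empty list means the checks passed. It does not prove the data is the data you expected, so pair it with an application-level smoke test.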
