restore
restore command¶
The restore
command is used to restore a database from a previously created dump. You can specify the dump to restore
by providing the dump ID or use the latest
keyword to restore the latest completed dump.
greenmask --config=config.yml restore DUMP_ID
Alternatively, to restore the latest completed dump, use the following command:
greenmask --config=config.yml restore latest
Note that the restore
command shares the same parameters and environment variables as pg_restore
,
allowing you to configure the restoration process as needed.
Mostly it supports the same flags as the pg_restore
utility, with some extra flags for Greenmask-specific features.
--batch-size int the number of rows to insert in a single batch during the COPY command (0 - all rows will be inserted in a single batch)
-c, --clean clean (drop) database objects before recreating
-C, --create create the target database
-a, --data-only restore only the data, no schema
-d, --dbname string connect to database name (default "postgres")
--disable-triggers disable triggers during data section restore
--enable-row-security enable row security
-N, --exclude-schema strings do not restore objects in this schema
-e, --exit-on-error exit on error, default is to continue
-f, --file string output file name (- for stdout)
-P, --function strings restore named function
-h, --host string database server host or socket directory (default "/var/run/postgres")
--if-exists use IF EXISTS when dropping objects
-i, --index strings restore named index
--inserts restore data as INSERT commands, rather than COPY
-j, --jobs int use this many parallel jobs to restore (default 1)
--list-format string use table of contents in format of text, json or yaml (default "text")
--no-comments do not restore comments
--no-data-for-failed-tables do not restore data of tables that could not be created
-O, --no-owner skip restoration of object ownership
-X, --no-privileges skip restoration of access privileges (grant/revoke)
--no-publications do not restore publications
--no-security-labels do not restore security labels
--no-subscriptions ddo not restore subscriptions
--no-table-access-method do not restore table access methods
--no-tablespaces do not restore tablespace assignments
--on-conflict-do-nothing add ON CONFLICT DO NOTHING to INSERT commands
--overriding-system-value use OVERRIDING SYSTEM VALUE clause for INSERTs
--pgzip use pgzip decompression instead of gzip
-p, --port int database server port number (default 5432)
--restore-in-order restore tables in topological order, ensuring that dependent tables are not restored until the tables they depend on have been restored
-n, --schema strings restore only objects in this schema
-s, --schema-only restore only the schema, no data
--section string restore named section (pre-data, data, or post-data)
-1, --single-transaction restore as a single transaction
--strict-names restore named section (pre-data, data, or post-data) match at least one entity each
-S, --superuser string superuser user name to use for disabling triggers
-t, --table strings restore named relation (table, view, etc.)
-T, --trigger strings restore named trigger
-L, --use-list string use table of contents from this file for selecting/ordering output
--use-session-replication-role-replica use SET session_replication_role = 'replica' to disable triggers during data section restore (alternative for --disable-triggers)
--use-set-session-authorization use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set ownership
-U, --username string connect as specified database user (default "postgres")
-v, --verbose string verbose mode
Extra features¶
Inserts and error handling¶
Warning
Insert commands are a lot slower than COPY
commands. Use this feature only when necessary.
By default, Greenmask restores data using the COPY
command. If you prefer to restore data using INSERT
commands, you
can
use the --inserts
flag. This flag allows you to manage errors that occur during the execution of INSERT commands. By
configuring an error and constraint exclusion list in the config,
you can skip certain errors and continue inserting subsequent rows from the dump.
This can be useful when adding new records to an existing dump, but you don't want the process to stop if some records already exist in the database or violate certain constraints.
By adding the --on-conflict-do-nothing
flag, it generates INSERT
statements with the ON CONFLICT DO NOTHING
clause, similar to the original pg_dump option. However, this approach only works for unique or exclusion constraints.
If a foreign key is missing in the referenced table or any other constraint is violated, the insertion will still fail.
To handle these issues, you can define
anexclusion list in the config.
```shell title="example with inserts and on conflict do nothing"
greenmask --config=config.yml restore DUMP_ID --inserts --on-conflict-do-nothing
By adding the --overriding-system-value
flag, it generates INSERT
statements with the OVERRIDING SYSTEM VALUE
clause, which allows you to insert data into identity columns.
CREATE TABLE people (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
generated text GENERATED ALWAYS AS (id || first_name) STORED,
first_name text
);
greenmask --config=config.yml restore DUMP_ID --inserts --overriding-system-value
Restoration in topological order¶
By default, Greenmask restores tables in the order they are listed in the dump file. To restore tables in topological
order, use the --restore-in-order
flag. This flag ensures that dependent tables are not restored until the tables they
depend on have been restored.
This is useful when you have the schema already created with foreign keys and other constraints, and you want to insert data into the tables in the correct order or catch-up the target database with the new data.
Warning
Greenmask cannot guarantee restoration in topological order when the schema contains cycles. The only way to restore tables with cyclic dependencies is to temporarily remove the foreign key constraint (to break the cycle), restore the data, and then re-add the foreign key constraint once the data restoration is complete.
If your database has cyclic dependencies you will be notified about it but the restoration will continue.
2024-08-16T21:39:50+03:00 WRN cycle between tables is detected: cannot guarantee the order of restoration within cycle cycle=["public.employees","public.departments","public.projects","public.employees"]
Pgzip decompression¶
By default, Greenmask uses gzip decompression to restore data. In mist cases it is quite slow and does not utilize all
available resources and is a bootleneck for IO operations. To speed up the restoration process, you can use
the --pgzip
flag to use pgzip decompression instead of gzip. This method splits the data into blocks, which are
decompressed in parallel, making it ideal for handling large volumes of data.
greenmask --config=config.yml restore latest --pgzip
Restore data batching¶
The COPY command returns the error only on transaction commit. This means that if you have a large dump and an error
occurs, you will have to wait until the end of the transaction to see the error message. To avoid this, you can use the
--batch-size
flag to specify the number of rows to insert in a single batch during the COPY command. If an error
occurs during the batch insertion, the error message will be displayed immediately. The data will be committed only
if all batches are inserted successfully.
This is useful when you want to be notified of errors as immediately as possible without waiting for the entire table to be restored.
Warning
The batch size should be chosen carefully. If the batch size is too small, the restoration process will be slow. If the batch size is too large, you may not be able to identify the error row.
In the example below, the batch size is set to 1000 rows. This means that 1000 rows will be inserted in a single batch, so you will be notified of any errors immediately after each batch is inserted.
greenmask --config=config.yml restore latest --batch-size 1000