Saving Time & Space: Simplifying DevOps with Fast Cloning
As I have written before, I think our fast cloning feature is pretty cool. Not only is it very fast (as in it took me barely a minute to clone a 2TB database!), it allows you to have multiple copies of your data without the additional cost of storage usually associated with replicating data.
That is all well and good, but out in the real world, folks want to see lots of examples on how to make good use of this feature. A common question from customers is how to utilize the Snowflake fast clone feature to support DevOps. Specifically, can it be used to support a promotion process from development to integration/testing to production?
The answer is yes! Following are a few examples of how that might work.
Getting Started: Cloning Your Production Database
Starting out, let’s assume you have one database in Snowflake (i.e. production by default) and it has a schema named PUBLIC. In this database you have two tables (A and B) and they are being populated daily by a load process.
Now you find you need to make changes and additions and realize you should not do those in production so now you need a development database. How do you do that in Snowflake?
You could do it the old-fashioned way and just create a new database then copy the existing tables and data over from production (using a "CREATE TABLE AS SELECT ..." or CTAS operation). If they are large tables that would take some time and cost extra as it also requires more space. However, using Snowflake's fast clone feature you can do it way faster and not incur the cost for extra space.
Here is how to do it:
CREATE DATABASE Dev CLONE Prod;
That’s it, one command! It creates the new database, schema, and tables and gives you a logical copy of the production data to work with almost instantly.
While you are at it, you can also create the integration testing database (INT database) too:
CREATE DATABASE Int CLONE Prod;
Now you are set to continue your data warehouse development efforts with a proper environment in place.
Scenario 2: Promoting New Development
Based on what we did above, your current Development database has tables A and B, cloned from Prod. As you begin your development iteration, you can add more tables. In this example, we added two new tables C and D, which are under development.
Initial State:
Prod database: Tables A, B
Int database: Tables A, B (previously cloned from Prod)
Dev database:
- Tables A, B (previously cloned from Prod)
- Tables C, D (created as part of new development)
Mini Release:
Now suppose you want to do a mini-release, wherein you:
- Promote only Table C to Int for testing
- Leave Table D as is in Dev (work-in-progress, not ready to deploy)
To do this mini-release you need to execute the following steps in a script, using the command-line client, or in the Snowflake worksheet:
1. Deploy Table C to Int by simple clone:
- If C already already contains production data (copy data):
USE Int.public;
CREATE TABLE C CLONE Dev.public.C; // Yes, that simple!- Load incremental new data into Table C
- Validate results
- If Table C has only test data and you do not want to copy the data:
USE Int.public;
CREATE TABLE C LIKE Dev.public.C; // copy the table structure only- Load new data in Table C
- Validate results
2. If the validation of Table C succeeds in Int, then deploy it to Prod:
USE Prod.public;
CREATE TABLE C LIKE Int.public.C;- Load production data into Table C
Now you have the new Table C live in your Prod database.
But you are not quite done yet. You need to complete the cycle.
Refresh Dev:
We now want to refresh Dev to be in sync with Prod. To do that we need to create a production clone of Table C and update Tables A and B with refreshed data. This will also serve to reduce your storage by replacing the original version of Table C in Dev with a cloned table. Likewise, you should sync Int as well.
If there are just a few tables in Prod (as in our example) then:
USE Dev.public;
CREATE OR REPLACE TABLE A CLONE Prod.public.A;
CREATE OR REPLACE TABLE B CLONE Prod.public.B;
CREATE OR REPLACE TABLE C CLONE Prod.public.C;
USE Int.public;
CREATE or REPLACE DATABASE Int CLONE Prod;
Note: CREATE OR REPLACE TABLE will drop and rebuild the table. This will release any space used in Dev (or Int) by those tables subsequent to the last clone operation.
Hint: If there are many tables in Prod, and only a few new tables in Dev, then this approach will be more efficient:
CREATE DATABASE Dev2 CLONE Prod;
USE Dev2.public;
CREATE TABLE D CLONE Dev.public.D; // preserving the WIP table in Dev
DROP DATABASE Dev;
ALTER DATABASE Dev2 RENAME To Dev;
This process removes all previous clones and allows you to preserve Table D as it is currently in development. When your are done with development on Table D, you follow the same process as we did for Table C to promote Table D to Int for testing then up to Prod for go live. After the promotion be sure to replace Table D in Dev and Int with a fresh clone from Prod in order to release the space.
Scenario 3: Promoting a Change to an Existing Table
Now let’s suppose you have three tables already in production and you need to alter one of those tables (i.e., add a new column) and then promote the revised table to Production. This is an important scenario, specially when rows are being added constantly to the data in production table. In this scenario you do not want to reload the modified table but rather change it then continue with incremental loads.
Initial State:
Prod database: Tables A, B, C
Int database: Tables A, B, C (cloned from Prod)
Dev database: Tables A, B, C (cloned from Prod)
Mini Release:
In this scenario, you want to do a mini-release, wherein you will:
- Alter Table C by adding one new column
- Adjust the load process to include that column
- Promote Table C to Int for testing
To do this mini-release you need to execute the following steps in a script or by hand in the Snowflake worksheet:
1. Change the table and the code (in Dev), then refresh the table
USE Dev.public;
ALTER TABLE C ADD COLUMN NEW_COL1 VARCHAR(200);- Modify load to include NEW_COL1
- Write a one-time update script to update existing rows to populate NEW_COL1
- Pull data from Production source if possible
- Review and validate results
- Execute modified load routine to load incremental new rows
- Review and validate results
2. Deploy Table C to Int by simple clone:
USE Int.public;
CREATE OR REPLACE TABLE C CLONE Dev.public.C;- Promote revised load process to Int environment
- Promote Update script to Int environment
- Execute Update script
- Review and validate results
- Execute load process to load incremental new data into Table C
- Review and validate results
3. If promotion to Int succeeds, then deploy to Prod:
USE Prod.public;
ALTER TABLE C ADD COLUMN NEW_COL1 VARCHAR(200);- Promote revised load process to Prod environment
- Promote Update script to Prod environment
- Execute Update script
- Review and validate results
- Execute load process to load incremental new data into Table C
- Review and validate results
- Start writing queries and reports that use the new column
Refresh Dev:
Once the promotion to Production is complete, you can refresh Dev and Int again using the same process as mentioned above.
USE Dev.public;
CREATE OR REPLACE TABLE C CLONE Prod.public.C;
USE Int.public;
CREATE OR REPLACE TABLE C CLONE Prod.public.C;
Now you have a production clone of Table C in Dev & Int with the new column added and updated with fresh production data.
Want to see cloning in action? Check out this video:
Conclusion
These are just a few examples of how you might use Snowflake's fast cloning to support an agile, light-weight DevOps process that does not require a lot of DBA management and does not take up extra space! Hopefully this gives you some ideas how you can use fast cloning in your development environment.
Thanks to Saqib Mustafa (@cloudsommelier) for his help on this post!
As always, keep an eye on this blog site, our Snowflake-related Twitter feeds (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.