Snowflake's Python connector has a great helper called write_pandas,
which takes a pandas DataFrame and uses Arrow for type safety and clever compression techniques to upload it to Snowflake for you.
But sadly this is quite a large package, and at a recent client using it wasn't an option for various reasons. Unfortunately I've always found the Snowflake docs... lacking, or maybe just confusing, on what your other options are.
So here’s what I found worked:
First create your destination table
Create your table in Snowflake. There are many ways to do this, but let's just do it in a Worksheet.
create or replace TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE (
ID NUMBER(38,0) NOT NULL,
COL1 VARCHAR(20),
COL2 VARCHAR(200),
COL3 VARCHAR(200) );
You should see a helpful notification saying that this worked.
Now create your file format
This bit took me some time to figure out.
You need a file format. This is an object telling Snowflake what to expect when it loads the data you upload with a PUT. I don't know how I feel about the fact this exists; I don't remember ever having to do anything like this for BigQuery, but I can appreciate that the flexibility might be helpful to some.
Again, this is a Snowflake thing: do it in your Worksheet and you should see a success message.
create or replace file format csv_format
type = csv
field_delimiter = ','
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
ESCAPE = '\\'
ESCAPE_UNENCLOSED_FIELD = NONE
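To make this concrete, here is a small sketch of producing a CSV that matches the format above with Python's standard csv module: a header row (which skip_header = 1 discards on load), comma delimiter, quoted fields where needed, and empty fields for NULLs. The column names are my assumption, following the table sketch earlier.

```python
import csv

# Sample rows matching the assumed MY_TABLE columns.
rows = [
    {"ID": 1, "COL1": "a", "COL2": "hello, world", "COL3": "x"},
    {"ID": 2, "COL1": "b", "COL2": "plain", "COL3": ""},  # empty field -> NULL on load
]

with open("my_file.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ID", "COL1", "COL2", "COL3"])
    writer.writeheader()  # skip_header = 1 tells Snowflake to ignore this row
    writer.writerows(rows)
```

You don't need to gzip the file yourself: PUT compresses uploads with gzip by default (AUTO_COMPRESS), which is why the format declares compression = gzip.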
Create your connection to Snowflake
First you will need a connection. You should be able to get most of this information from your admin.
import snowflake.connector

snowflake_connection = snowflake.connector.connect(
    user=SNOWFLAKE_USER,
    password=SNOWFLAKE_PASSWORD,
    account=SNOWFLAKE_ACCOUNT,
    role=SNOWFLAKE_ROLE,
    warehouse=SNOWFLAKE_WAREHOUSE,
    database="MY_DATABASE",
    schema="MY_SCHEMA",
)
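One common way to supply those credential variables without hard-coding them is environment variables. The variable names here are my assumption, not a Snowflake convention; the fallback defaults are placeholders for illustration.

```python
import os

# Pull connection details from the environment, with placeholder fallbacks.
SNOWFLAKE_USER = os.environ.get("SNOWFLAKE_USER", "my_user")
SNOWFLAKE_PASSWORD = os.environ.get("SNOWFLAKE_PASSWORD", "")
SNOWFLAKE_ACCOUNT = os.environ.get("SNOWFLAKE_ACCOUNT", "my_account")
SNOWFLAKE_ROLE = os.environ.get("SNOWFLAKE_ROLE", "MY_ROLE")
SNOWFLAKE_WAREHOUSE = os.environ.get("SNOWFLAKE_WAREHOUSE", "MY_WAREHOUSE")
```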
Once connected you only need to run the following:
snowflake_connection.cursor().execute("USE SCHEMA MY_SCHEMA")
snowflake_connection.cursor().execute("PUT file:///my_file.csv @%TEST_TABLE")
snowflake_connection.cursor().execute("COPY INTO TEST_TABLE file_format=(format_name = 'csv_format')")
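The PUT above hard-codes an absolute path (file:///my_file.csv). If your file lives relative to the script, one way to avoid working-directory surprises is to resolve it to an absolute path before building the command, a sketch of which is:

```python
from pathlib import Path

# Resolve the local file to an absolute path and build the PUT command string.
csv_path = Path("my_file.csv").resolve()
put_sql = f"PUT file://{csv_path} @%TEST_TABLE"
# Then run it: snowflake_connection.cursor().execute(put_sql)
```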
This will upload your file for you and append it to your TEST_TABLE. Note, I said append: this is a column-oriented database designed for large datasets, so it will take you some work to do something more akin to an UPSERT.
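If you do need upsert behaviour, the usual pattern is to COPY into a staging table first and then MERGE into the target. A hedged sketch of the MERGE statement; the staging table and column names here are assumptions for illustration, not from anything above.

```python
# MERGE from an assumed staging table into the target, keyed on an assumed ID column.
merge_sql = """
MERGE INTO TEST_TABLE AS t
USING TEST_TABLE_STAGING AS s
  ON t.ID = s.ID
WHEN MATCHED THEN UPDATE SET t.COL1 = s.COL1
WHEN NOT MATCHED THEN INSERT (ID, COL1) VALUES (s.ID, s.COL1)
"""
# Then run it: snowflake_connection.cursor().execute(merge_sql)
```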
Hope that helped, contact me on Twitter if you have any questions.