How to generate BigQuery schema to define column types manually?
Coupler.io allows defining schema for BigQuery destination table manually using JSON format. In the schema for each inserted column, we should define mode (required or not), the name of the column, and its type.
You still can use the BigQuery schema autodetection mechanism. For this keep the "Autodetect table schema" toggle turned ON:
To provide schema manually in the Wizard turn OFF the "Autodetect table schema" toggle and enter schema to the "Columns schema" field:
The simplest schema for 1 required column with the name "id" and type "Integer" looks like this:
[ { "mode":"REQUIRED", "name":"id", "type":"INT64" } ]
Possible values for each attribute:
- mode:
- REQUIRED - for columns where the value cannot be empty (null)
- NULLABLE - for columns where the value can be empty (null)
- name - valid column name. A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and it must start with a letter or underscore. The maximum column name length is 300 characters. Read more details about validation for the name.
- type - valid column type supported by BigQuery. Please read here the latest version of supported types. We also added below the most used types:
- STRING - Variable-length character (Unicode) data (i.e. "Some string data")
- INT64 - Numeric values without fractional components (i.e. 55)
- FLOAT64 - Approximate numeric values with fractional components (i.e. 12.64 (12,64) - depends on the radix character)
Please review our example below:
Source data: CSV
time_ref,"account","code","country_code","product_type",value,"status" 202012,"Exports","00","AE","Goods",174134527.0,"F" 202012,"Exports","00","AF","Goods",5473.0,"F" 202012,"Exports","00","AG","Goods",426470.0,"F" 202012,"Exports","00","AI","Goods",13083.0,"F"
Defined schema:
[ { "mode":"REQUIRED", "name":"time_ref", "type":"NUMERIC" }, { "mode":"NULLABLE", "name":"account", "type":"STRING" }, { "mode":"NULLABLE", "name":"code", "type":"STRING" }, { "mode":"NULLABLE", "name":"country_code", "type":"STRING" }, { "mode":"NULLABLE", "name":"product_type", "type":"STRING" }, { "mode":"NULLABLE", "name":"value", "type":"FLOAT64" }, { "mode":"NULLABLE", "name":"status", "type":"STRING" } ]
Results in the BigQuery:
Important notes:
1. Please control that number of columns returned by the data source is the same. In case the number of columns increases - importer execution fails as the schema is not appropriate. In case the number of columns decreases - importer execution can fail as schema as invalid or data can import, but to the invalid columns.
2. If you want to add the "Last updated at" column to the results of import: You need to define this column and put it first in the schema:
{ "mode":"REQUIRED", "name":"last_updated_at", "type":"STRING" }