How to export data to a file in Google BigQuery
Posted by: AJ Welch
As of the time of writing, exporting a file from BigQuery requires the use of Google Cloud Storage to receive the exported file. Once the file is stored in Google Cloud Storage you may, of course, download or move it elsewhere as needed.
Once you have Cloud Storage ready, you’ll also need to create a bucket, which can be easily accomplished by following the official quickstart guide.
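If you prefer to create the bucket programmatically rather than through the console, the google-cloud-storage Python library can do it in a few lines. This is only a minimal sketch, assuming application default credentials are configured; the project and bucket names below are placeholders, not values from this article.

from google.cloud import storage

# Placeholder names; substitute your own project and bucket.
client = storage.Client(project='my-project')
bucket = client.create_bucket('my-export-bucket')
print('Created bucket {}'.format(bucket.name))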
Cloud storage URI format
The Cloud Storage URI, which tells BigQuery where to export the file, uses a simple format: gs://bucket-name/file-name.
If you wish to place the file in a series of directories, simply add them to the URI path: gs://bucket-name/dir-1/dir-2/file-name.
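As a quick illustration, here is how those pieces might be assembled in Python; the bucket, directory, and file names are purely hypothetical.

# Hypothetical names used only to illustrate the URI layout.
bucket_name = 'my-bucket'
directories = ['exports', '2016']
file_name = 'books.json'

cloud_storage_uri = 'gs://{}/{}/{}'.format(
    bucket_name, '/'.join(directories), file_name)
# -> gs://my-bucket/exports/2016/books.json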
Exporting via the WebUI
To export a BigQuery table to a file via the WebUI, the process couldn’t be simpler.
- Go to the BigQuery WebUI.
- Select the table you wish to export.
- Click on Export Table in the top-right.
- Select the Export format and Compression, if necessary.
- Alter the Google Cloud Storage URI as necessary to match the bucket, optional directories, and file-name you wish to export to.
- Click OK and wait for the job to complete.
Exporting via the API
To export a BigQuery table using the BigQuery API, you’ll need to make a call to the Jobs.insert method with the appropriate configuration. The basic configuration structure is given below:
{
  'jobReference': {
    'projectId': projectId,
    'jobId': uniqueIdentifier
  },
  'configuration': {
    'extract': {
      'sourceTable': {
        'projectId': projectId,
        'datasetId': datasetId,
        'tableId': tableId
      },
      'destinationUris': [cloudStorageURI],
      'destinationFormat': 'CSV',
      'compression': 'NONE'
    }
  }
}
- uniqueIdentifier is simply a unique string that identifies this particular job, so there won’t be any duplication of data if the job fails during processing and must be retried.
- projectId is the BigQuery project ID.
- datasetId is the BigQuery dataset ID.
- tableId is, of course, the BigQuery table ID.
- destinationFormat defaults to CSV but can also be NEWLINE_DELIMITED_JSON or AVRO.
- compression defaults to NONE but can be GZIP as well.
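To give a sense of how this configuration is actually submitted, here is a rough sketch of calling Jobs.insert with the google-api-python-client library. It assumes application default credentials are available, and every value below is a placeholder you would replace with your own; treat it as an outline rather than a drop-in script.

from googleapiclient.discovery import build

# Placeholder values; substitute your own project, dataset, table, and bucket.
projectId = 'my-project'
datasetId = 'my_dataset'
tableId = 'my_table'
uniqueIdentifier = 'export-job-0001'
cloudStorageURI = 'gs://my-bucket/my_table.csv'

# Assumes application default credentials are available in the environment.
bigquery = build('bigquery', 'v2')

job_body = {
    'jobReference': {'projectId': projectId, 'jobId': uniqueIdentifier},
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': projectId,
                'datasetId': datasetId,
                'tableId': tableId
            },
            'destinationUris': [cloudStorageURI],
            'destinationFormat': 'CSV',
            'compression': 'NONE'
        }
    }
}

# Jobs.insert starts the extract job; BigQuery runs it asynchronously.
response = bigquery.jobs().insert(projectId=projectId, body=job_body).execute()
print(response['jobReference']['jobId'])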
As an example, if we want to export the melville table in our exports dataset, which is part of the bookstore-1382 project, we might use a configuration something like this:
{
  'jobReference': {
    'projectId': 'bookstore-1382',
    'jobId': 'bcd56153-b882-4f78-8a30-f509b583a568'
  },
  'configuration': {
    'extract': {
      'sourceTable': {
        'projectId': 'bookstore-1382',
        'datasetId': 'exports',
        'tableId': 'melville'
      },
      'destinationUris': ['gs://bookstore/melville.json'],
      'destinationFormat': 'NEWLINE_DELIMITED_JSON',
      'compression': 'NONE'
    }
  }
}
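Once that job body has been submitted via Jobs.insert, you can poll Jobs.get until the job reports DONE. A rough sketch, assuming the same google-api-python-client setup shown earlier:

import time

# Poll the extract job created above until BigQuery marks it DONE.
while True:
    job = bigquery.jobs().get(
        projectId='bookstore-1382',
        jobId='bcd56153-b882-4f78-8a30-f509b583a568').execute()
    if job['status']['state'] == 'DONE':
        # Any problems (e.g. a bad destination URI) appear in errorResult.
        if 'errorResult' in job['status']:
            raise RuntimeError(job['status']['errorResult'])
        break
    time.sleep(2)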
After a few moments for the job to process, refreshing the bookstore bucket in Cloud Storage reveals the melville.json file, as expected:
{"BookMeta_Title":"Typee, a Romance of the South Seas","BookMeta_Date":"1920","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"Harcourt, Brace and Howe"}
{"BookMeta_Title":"Typee: A Real Romance of the South Seas","BookMeta_Date":"1904","BookMeta_Creator":"Herman Melville , William Clark Russell , Marie Clothilde Balfour","BookMeta_Language":"English","BookMeta_Publisher":"John Lane, the BodleyHead"}
{"BookMeta_Title":"Typee: A Narrative of a Four Months' Residence Among the Natives of a Valley of the Marquesas ...","BookMeta_Date":"1893","BookMeta_Creator":"Herman Melville","BookMeta_Language":"English","BookMeta_Publisher":"J. Murray"}
...
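If you then want to pull the exported file out of Cloud Storage (as noted earlier, you can download or move it anywhere you like), the google-cloud-storage library makes that a one-liner. A minimal sketch, assuming default credentials and the bucket and object names from this example:

from google.cloud import storage

# Download the exported file from the bookstore bucket to the local disk.
client = storage.Client(project='bookstore-1382')
bucket = client.bucket('bookstore')
bucket.blob('melville.json').download_to_filename('melville.json')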
Using a wildcard URI for multiple file output
In some cases you may be exporting a table that exceeds the maximum output size of 1 GB per file. In such cases, you should take advantage of the wildcard URI option by adding an asterisk (*) somewhere in the file-name portion of your URI.
For example, a Cloud Storage URI of gs://bookstore/melville-*.json in the configuration will actually become an iterated series of incremental file names, like so:
gs://bookstore/melville-000000000000.json
gs://bookstore/melville-000000000001.json
gs://bookstore/melville-000000000002.json
...
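To see what a wildcard export actually produced, you can swap the wildcard URI into the same extract configuration and then list the resulting objects once the job finishes. A small sketch, assuming the google-cloud-storage setup shown earlier and the melville- prefix from this example:

from google.cloud import storage

# The extract configuration only changes in its destination URI:
#   'destinationUris': ['gs://bookstore/melville-*.json']
# After the job finishes, list the numbered shards it wrote.
client = storage.Client(project='bookstore-1382')
bucket = client.bucket('bookstore')
for blob in bucket.list_blobs(prefix='melville-'):
    print(blob.name)  # melville-000000000000.json, melville-000000000001.json, ...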