Automation-regel in Jira wanneer pull request wordt samengevoegd
Posted by: AJ Welch
Google BigQuery is capable of creating tables using a wide variety of methods, from directly loading existing CSV
or JSON
data to using the BigQuery Command-Line tool.
In some situations, it may be necessary to generate a table based on the results of an executed query. Below we’ll briefly explore two methods for accomplishing this table creation from a query.
Verder gaan dan agile
For these simple examples, we’ll use the one of the public datasets, specifically the GDELT Book Dataset that contains millions of public domain books.
The query we’ll be using as an example is intended to simply extract a few fields (title
, date
, creator
, etc) across all tables in the dataset in which the BookMeta_Creator
field CONTAINS
the name of our searched author (Herman Melville):
SELECT
BookMeta_Title,
BookMeta_Date,
BookMeta_Creator,
BookMeta_Language,
BookMeta_Publisher
FROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r"(\d{4})") BETWEEN "1800" AND "2020"'))
WHERE
BookMeta_Creator CONTAINS "Herman Melville"
The resulting data is as expected:
[
{
"BookMeta_Title": "Typee, a Romance of the South Seas",
"BookMeta_Date": "1920",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "Harcourt, Brace and Howe"
},
{
"BookMeta_Title": "Typee: A Real Romance of the South Sea",
"BookMeta_Date": "1892",
"BookMeta_Creator": "Herman Melville , Arthur Stedman",
"BookMeta_Language": "English",
"BookMeta_Publisher": "the Page companypublishers"
},
{
"BookMeta_Title": "Typee: A Peep at Polynesian Life, During a Four Months' Residence in the Valley of the Marquesas",
"BookMeta_Date": "1850",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "G. Routledge"
},
...
]
Verder gaan dan agile
No matter how you are engaging with the BigQuery API, the primary usage involves sending a JSON
-formatted configuration
string to the API of your choosing. The official documentation details all the potential resource fields and their use, but for our purposes we’re inserting
a new table, so we need to use the Jobs.insert API call.
We need to specify a few critical fields as part of our configuration
:
query
: The actual SQL query (properly formatted and escaped in this example).destinationTable
: A collection that contains a number of sub-fields to tell the API where the query results should be saved.projectId
: The unique identifier of the project to export to.datasetId
: The name of the (existing) dataset to export to.tableId
: The name of the table to export to. Can either be an existing table or a new table name.
createDisposition
: Determines how the API will handle creation of the table if it doesn’t exist.CREATE_IF_NEEDED
of course states we want the table generated.writeDisposition
: Determines how the API writes new data to the table.WRITE_APPEND
indicates that new data will be appended (added to) the already existing data.
Thus our full configuration
looks like this:
{
"configuration": {
"query": {
"query": "SELECT\n BookMeta_Title,\n BookMeta_Date,\n BookMeta_Creator,\n BookMeta_Language,\n BookMeta_Publisher\nFROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r\"(\\d{4})\") BETWEEN \"1800\" AND \"2020\"'))\nWHERE\n BookMeta_Creator CONTAINS \"Herman Melville\"",
"destinationTable": {
"datasetId": "exports",
"projectId": "bookstore-1382",
"tableId": "herman_melville_api"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND"
}
}
}
Sending this configuration
script to the Jobs.insert API
returns a 200
response:
{
"kind": "bigquery#job",
"etag": "\"T7tifokHjXTVbjJPVpstHNnQ7nk/v8boPuvGUl28hHMhK0vU2q7FGeU\"",
"id": "bookstore-1382:job_LjI4QO95OCvMHwyGeu4iiKFd8Vw",
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/bookstore-1382/jobs/job_LjI4QO95OCvMHwyGeu4iiKFd8Vw",
"jobReference": {
"projectId": "bookstore-1382",
"jobId": "job_LjI4QO95OCvMHwyGeu4iiKFd8Vw"
},
"configuration": {
"query": {
"query": "SELECT\n BookMeta_Title,\n BookMeta_Date,\n BookMeta_Creator,\n BookMeta_Language,\n BookMeta_Publisher\nFROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r\"(\\d{4})\") BETWEEN \"1800\" AND \"2020\"'))\nWHERE\n BookMeta_Creator CONTAINS \"Herman Melville\"",
"destinationTable": {
"projectId": "bookstore-1382",
"datasetId": "exports",
"tableId": "herman_melville_api"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND"
}
},
"status": {
"state": "RUNNING"
},
"statistics": {
"creationTime": "1469328945393",
"startTime": "1469328946001"
},
"user_email": "XXXXXXXXX"
}
And generates our new table, which can easily be queried as normal:
SELECT
*
FROM
exports.herman_melville_api
[
{
"BookMeta_Title": "Typee, a Romance of the South Seas",
"BookMeta_Date": "1920",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "Harcourt, Brace and Howe"
},
{
"BookMeta_Title": "Typee: A Real Romance of the South Seas",
"BookMeta_Date": "1904",
"BookMeta_Creator": "Herman Melville , William Clark Russell , Marie Clothilde Balfour",
"BookMeta_Language": "English",
"BookMeta_Publisher": "John Lane, the BodleyHead"
},
{
"BookMeta_Title": "Israel Potter: His Fifty Years of Exile",
"BookMeta_Date": "1855",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "Putnam"
},
...
]
Verder gaan dan agile
If you prefer to use the BigQuery WebUI to execute queries, specifying a destination table
for a query result is very simple.
First, you’ll need to ensure the Project
and Dataset
you wish to export to already exist.
Next, Compose a Query
just like normal, but before executing it via the Run Query
button, click the Show Options
button. From here, you’ll see the Destination Table
section: Simply click Select Table
and the popup will ask you to select the Project
, Dataset
, and specify the Table Name
to use as your destination table. For our purposes, we’ll use:
project
: Bookstoredataset
: exportstable
: herman_melville_web
Now click Run Query
as normal. Once the query executes, the results will be copied/appended to the table you specified above. This new table can then be queried as usual:
SELECT
*
FROM
exports.herman_melville_web
[
{
"BookMeta_Title": "Typee, a Romance of the South Seas",
"BookMeta_Date": "1920",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "Harcourt, Brace and Howe"
},
{
"BookMeta_Title": "Typee: A Real Romance of the South Seas",
"BookMeta_Date": "1904",
"BookMeta_Creator": "Herman Melville , William Clark Russell , Marie Clothilde Balfour",
"BookMeta_Language": "English",
"BookMeta_Publisher": "John Lane, the BodleyHead"
},
{
"BookMeta_Title": "Israel Potter: His Fifty Years of Exile",
"BookMeta_Date": "1855",
"BookMeta_Creator": "Herman Melville",
"BookMeta_Language": "English",
"BookMeta_Publisher": "Putnam"
},
...
]