Close

Правило автоматизации Jira при слиянии запроса pull

Posted by: AJ Welch

Google BigQuery is capable of creating tables using a wide variety of methods, from directly loading existing CSV or JSON data to using the BigQuery Command-Line tool.

In some situations, it may be necessary to generate a table based on the results of an executed query. Below we’ll briefly explore two methods for accomplishing this table creation from a query.


За рамками Agile


For these simple examples, we’ll use the one of the public datasets, specifically the GDELT Book Dataset that contains millions of public domain books.

The query we’ll be using as an example is intended to simply extract a few fields (titledatecreator, etc) across all tables in the dataset in which the BookMeta_Creator field CONTAINS the name of our searched author (Herman Melville):

SELECT
  BookMeta_Title,
  BookMeta_Date,
  BookMeta_Creator,
  BookMeta_Language,
  BookMeta_Publisher
FROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r"(\d{4})") BETWEEN "1800" AND "2020"'))
WHERE
  BookMeta_Creator CONTAINS "Herman Melville"

The resulting data is as expected:

[
  {
    "BookMeta_Title": "Typee, a Romance of the South Seas",
    "BookMeta_Date": "1920",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "Harcourt, Brace and Howe"
  },
  {
    "BookMeta_Title": "Typee: A Real Romance of the South Sea",
    "BookMeta_Date": "1892",
    "BookMeta_Creator": "Herman Melville ,  Arthur Stedman",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "the Page companypublishers"
  },
  {
    "BookMeta_Title": "Typee: A Peep at Polynesian Life, During a Four Months' Residence in the Valley of the Marquesas",
    "BookMeta_Date": "1850",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "G. Routledge"
  },
  ...
]

За рамками Agile


No matter how you are engaging with the BigQuery API, the primary usage involves sending a JSON-formatted configuration string to the API of your choosing. The official documentation details all the potential resource fields and their use, but for our purposes we’re inserting a new table, so we need to use the Jobs.insert API call.

We need to specify a few critical fields as part of our configuration:

  • query: The actual SQL query (properly formatted and escaped in this example).
  • destinationTable: A collection that contains a number of sub-fields to tell the API where the query results should be saved.
    • projectId: The unique identifier of the project to export to.
    • datasetId: The name of the (existing) dataset to export to.
    • tableId: The name of the table to export to. Can either be an existing table or a new table name.
  • createDisposition: Determines how the API will handle creation of the table if it doesn’t exist. CREATE_IF_NEEDED of course states we want the table generated.
  • writeDisposition: Determines how the API writes new data to the table. WRITE_APPEND indicates that new data will be appended (added to) the already existing data.

Thus our full configuration looks like this:

{
  "configuration": {
    "query": {
      "query": "SELECT\n  BookMeta_Title,\n  BookMeta_Date,\n  BookMeta_Creator,\n  BookMeta_Language,\n  BookMeta_Publisher\nFROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r\"(\\d{4})\") BETWEEN \"1800\" AND \"2020\"'))\nWHERE\n  BookMeta_Creator CONTAINS \"Herman Melville\"",
      "destinationTable": {
        "datasetId": "exports",
        "projectId": "bookstore-1382",
        "tableId": "herman_melville_api"
      },
      "createDisposition": "CREATE_IF_NEEDED",
      "writeDisposition": "WRITE_APPEND"
    }
  }
}

Sending this configuration script to the Jobs.insert API returns a 200 response:

{
 "kind": "bigquery#job",
 "etag": "\"T7tifokHjXTVbjJPVpstHNnQ7nk/v8boPuvGUl28hHMhK0vU2q7FGeU\"",
 "id": "bookstore-1382:job_LjI4QO95OCvMHwyGeu4iiKFd8Vw",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/bookstore-1382/jobs/job_LjI4QO95OCvMHwyGeu4iiKFd8Vw",
 "jobReference": {
  "projectId": "bookstore-1382",
  "jobId": "job_LjI4QO95OCvMHwyGeu4iiKFd8Vw"
 },
 "configuration": {
  "query": {
   "query": "SELECT\n  BookMeta_Title,\n  BookMeta_Date,\n  BookMeta_Creator,\n  BookMeta_Language,\n  BookMeta_Publisher\nFROM (TABLE_QUERY([gdelt-bq:internetarchivebooks], 'REGEXP_EXTRACT(table_id, r\"(\\d{4})\") BETWEEN \"1800\" AND \"2020\"'))\nWHERE\n  BookMeta_Creator CONTAINS \"Herman Melville\"",
   "destinationTable": {
    "projectId": "bookstore-1382",
    "datasetId": "exports",
    "tableId": "herman_melville_api"
   },
   "createDisposition": "CREATE_IF_NEEDED",
   "writeDisposition": "WRITE_APPEND"
  }
 },
 "status": {
  "state": "RUNNING"
 },
 "statistics": {
  "creationTime": "1469328945393",
  "startTime": "1469328946001"
 },
 "user_email": "XXXXXXXXX"
}

And generates our new table, which can easily be queried as normal:

SELECT
  *
FROM
  exports.herman_melville_api
[
  {
    "BookMeta_Title": "Typee, a Romance of the South Seas",
    "BookMeta_Date": "1920",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "Harcourt, Brace and Howe"
  },
  {
    "BookMeta_Title": "Typee: A Real Romance of the South Seas",
    "BookMeta_Date": "1904",
    "BookMeta_Creator": "Herman Melville ,  William Clark Russell ,  Marie Clothilde Balfour",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "John Lane, the BodleyHead"
  },
  {
    "BookMeta_Title": "Israel Potter: His Fifty Years of Exile",
    "BookMeta_Date": "1855",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "Putnam"
  },
  ...
]

За рамками Agile


If you prefer to use the BigQuery WebUI to execute queries, specifying a destination table for a query result is very simple.

First, you’ll need to ensure the Project and Dataset you wish to export to already exist.

Next, Compose a Query just like normal, but before executing it via the Run Query button, click the Show Options button. From here, you’ll see the Destination Table section: Simply click Select Table and the popup will ask you to select the ProjectDataset, and specify the Table Name to use as your destination table. For our purposes, we’ll use:

  • project: Bookstore
  • dataset: exports
  • table: herman_melville_web

Now click Run Query as normal. Once the query executes, the results will be copied/appended to the table you specified above. This new table can then be queried as usual:

SELECT
  *
FROM
  exports.herman_melville_web
[
  {
    "BookMeta_Title": "Typee, a Romance of the South Seas",
    "BookMeta_Date": "1920",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "Harcourt, Brace and Howe"
  },
  {
    "BookMeta_Title": "Typee: A Real Romance of the South Seas",
    "BookMeta_Date": "1904",
    "BookMeta_Creator": "Herman Melville ,  William Clark Russell ,  Marie Clothilde Balfour",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "John Lane, the BodleyHead"
  },
  {
    "BookMeta_Title": "Israel Potter: His Fifty Years of Exile",
    "BookMeta_Date": "1855",
    "BookMeta_Creator": "Herman Melville",
    "BookMeta_Language": "English",
    "BookMeta_Publisher": "Putnam"
  },
  ...
]

Следующая тема