Automation-regel in Jira wanneer pull request wordt samengevoegd
Posted by: AJ Welch
Most common SQL database engines implement the LIKE
operator – or something functionally similar – to allow queries the flexibility of finding string pattern matches between one column and another column (or between a column and a specific text string). Luckily, Google BigQuery is no exception and includes support for the common LIKE
operator. For situations that require something with more flexibility and power, BigQuery also allows for the use of regular xxpressions using the RE2 engine by Google.
Below we’ll explore using both methods for narrowing your queries down to the results you’re after.
Verder gaan dan agile
For anyone familiar with other common SQL engines, using the LIKE
operator in BigQuery should be a breeze. Simply add the LIKE
operator and a comparitor to your WHERE
clause and that’s all there is to it:
SELECT
repository_name,
repository_size,
repository_forks,
created_at
FROM
[bigquery-public-data:samples.github_timeline]
WHERE
repository_name LIKE 'node'
LIMIT
1000
The results, as expected, are the first 1000 records where repository_name
is equal to (i.e. LIKE
) ‘node’:
[
{
"repository_name": "node",
"repository_size": "23500",
"repository_forks": "1568",
"created_at": "2012-03-12 21:37:46"
},
{
"repository_name": "node",
"repository_size": "23500",
"repository_forks": "1568",
"created_at": "2012-03-12 21:45:17"
},
{
"repository_name": "node",
"repository_size": "23500",
"repository_forks": "1568",
"created_at": "2012-03-12 21:49:11"
},
...
]
As the official documentation indicates, we can also add special characters into our pattern, such as the %
sign, which will match any number of characters. Typically, a word or phrase is surrounded by two %
signs to retrieve records where that field contains a specific string:
SELECT
repository_name,
repository_size,
repository_forks,
created_at
FROM
[bigquery-public-data:samples.github_timeline]
WHERE
repository_name LIKE '%node%'
LIMIT
1000
Now we get results where the repository_name
simply has the letters 'node'
somewhere in there:
[
{
"repository_name": "node-geos",
"repository_size": "120",
"repository_forks": "4",
"created_at": "2012-03-12 21:29:10"
},
{
"repository_name": "nyc-nodejs-packages",
"repository_size": "928",
"repository_forks": "1",
"created_at": "2012-03-12 21:30:56"
},
{
"repository_name": "node-async-testing",
"repository_size": "420",
"repository_forks": "8",
"created_at": "2012-03-12 21:32:11"
},
...
]
The LIKE
operator works for both Legacy SQL
syntax (which is the default for BigQuery), as well as the updated SQL syntax (if enabled). Here’s the same example as above using the updated syntax:
SELECT
repository_name,
repository_size,
repository_forks,
created_at
FROM
`bigquery-public-data.samples.github_timeline`
WHERE
repository_name LIKE '%node%'
LIMIT
1000
Verder gaan dan agile
If you find yourself using LIKE
along with surrounding %
symbols as above, you’re better off using the CONTAINS string function, which behaves identically but typically has better readability.
Consequently, this WHERE
clause:
WHERE
repository_name LIKE '%node%'
is functionallity identical to this WHERE clause:
WHERE
repository_name CONTAINS 'node'
Verder gaan dan agile
For situations where you need to be very precise with your search pattern in your WHERE
clause, the best option is to use the REGEX_MATCH function.
By taking full advantage of the power of the RE2 regular expression engine and syntax, we can ensure the WHERE
clause we’re creating returns exactly the record matches we want.
To implement REGEX_MATCH
, just pass the column_name
as the first argument and the regex pattern as the second argument:
SELECT
repository_name,
repository_size,
repository_forks,
created_at
FROM
[bigquery-public-data:samples.github_timeline]
WHERE
REGEXP_MATCH (repository_name, r'^node-[[:alnum:]]+js$')
LIMIT
1000
Here we only want matches where repository_name
begins with the word node
followed by a hyphen, then contains at least one or more alphanumeric characters, followed by js
as the final two characters. The results are as expected:
[
{
"repository_name": "node-jinjs",
"repository_size": "104",
"repository_forks": "5",
"created_at": "2012-03-13 02:48:09"
},
{
"repository_name": "node-ldapjs",
"repository_size": "172",
"repository_forks": "17",
"created_at": "2012-04-23 16:29:19"
},
{
"repository_name": "node-xml2js",
"repository_size": "204",
"repository_forks": "39",
"created_at": "2012-04-23 14:37:43"
},
,,,
]