Automation-regel in Jira wanneer pull request wordt samengevoegd
Posted by: AJ Welch
For this tutorial we’ll be using the official dummy dataset, which contains numerous restaurant documents from around the New York area.
Here is an example of the basic document structure in this collection, using the .findOne()
method:
> db.restaurants.findOne()
{
"_id" : ObjectId("56c651e7d84ccfde319961af"),
"address" : {
"building" : "469",
"coord" : [
-73.961704,
40.662942
],
"street" : "Flatbush Avenue",
"zipcode" : "11225"
},
"borough" : "Brooklyn",
"cuisine" : "Hamburgers",
"grades" : [
{
"date" : ISODate("2014-12-30T00:00:00Z"),
"grade" : "A",
"score" : 8
},
{
"date" : ISODate("2014-07-01T00:00:00Z"),
"grade" : "B",
"score" : 23
},
{
"date" : ISODate("2013-04-30T00:00:00Z"),
"grade" : "A",
"score" : 12
},
{
"date" : ISODate("2012-05-08T00:00:00Z"),
"grade" : "A",
"score" : 12
}
],
"name" : "Wendy'S",
"restaurant_id" : "30112340"
}
Verder gaan dan agile
The most important piece of the puzzle when searching within a MongoDB collection is the simple yet flexible db.collection.find() method.
With .find()
, you can easily query a collection of documents, by passing a few simple parameters, and return a cursor. A cursor
is simply a result set and can be iterated through to manipulate or otherwise make use of the documents being pointed to by the cursor
.
As a simple example of the .find()
method in action, we’ll try to find all the restaurants in our collection that server Hamburgers
as their cuisine
:
>db.restaurants.find( { cuisine: "Hamburgers" } )
{ "_id" : ObjectId("56c651e7d84ccfde319961af"), "address" : { "building" : "469", "coord" : [ -73.961704, 40.662942 ], "street" : "Flatbush Avenue", "zipcode" : "11225" }, "borough" : "Brooklyn", "cuisine" : "Hamburgers", "grades" : [ { "date" : ISODate("2014-12-30T00:00:00Z"), "grade" : "A", "score" : 8 }, { "date" : ISODate("2014-07-01T00:00:00Z"), "grade" : "B", "score" : 23 }, { "date" : ISODate("2013-04-30T00:00:00Z"), "grade" : "A", "score" : 12 }, { "date" : ISODate("2012-05-08T00:00:00Z"), "grade" : "A", "score" : 12 } ], "name" : "Wendy'S", "restaurant_id" : "30112340" }
...
The result set is quite large, so a better measurement for our test examples would be to chain the .count()
method onto .find()
to simply see how many documents matched our query:
> db.restaurants.find( { cuisine: "Hamburgers" } ).count()
433
That’s a lot of burgers!
Verder gaan dan agile
Now that we are using .find()
to query our collection, we can actually modify our syntax ever so slightly and begin searching for matches based on a word or phrase that may be a partial match within a given field, similar to the LIKE operator for SQL engines.
The trick is to utilize regular expressions
(or regex
for short), which is basically a text string that defines a search pattern. There are a number of regex
engines that are written in slightly different syntax, but the fundamentals are all basically the same, and in this case, MongoDB uses the Perl Regex (PCRE) engine.
At the most basic level, a regex
expression is a string (series of characters) enclosed on both sides by a single slash (/
).
For example, if we want to use regex
to perform the same query as above and find out how many restaurants serve Hamburgers
, we can replace our string "Hamburgers
" with /Hamburgers/
instead:
> db.restaurants.find( { cuisine: /Hamburgers/ } ).count()
433
Keen observers may recognize that we’ve effectively changed nothing about the actual query we’re performing – we’re still simply looking up all documents where the cuisine
field is equal to the string "Hamburgers"
.
That said, by simply using regex
instead of a normal “quoted string”, we can start to look for partial word/phrase matches instead.
For example, let’s look at the borough
field to get a better idea of how this works. First we’ll notice that there are six boroughs in total within our collection:
> db.restaurants.distinct('borough')
[
"Brooklyn",
"Bronx",
"Manhattan",
"Queens",
"Staten Island",
"Missing"
]
Now let’s use regex
to find out how many restaurants are in the Bronx
borough:
> db.restaurants.find( { borough: /Bronx/ } ).count()
2338
But imagine we want to find the number of restaurants where borough
starts with the first three characters "Bro"
. We’d modify our regex
very slightly, like so:
> db.restaurants.find( { borough: /^Bro/ } ).count()
8424
We’re seeing over 6000 additional documents in this result set, which makes sense because not only are we getting results where the borough
is "Bronx"
, but also everything for "Brooklyn"
as well.
The caret character (^
) specifies the location in our string which should be the beginning, so if we had a document where those three letters were in the middle of the field, we wouldn’t get a match.
As another quick example, let’s search anywhere in the field for the characters "at"
, which should give us results for both "Manhattan"
and "Staten Island"
:
> db.restaurants.find( { borough: /Manhattan/ } ).count()
10259
> db.restaurants.find( { borough: /Staten Island/ } ).count()
969
> db.restaurants.find( { borough: /AT/i } ).count()
11228
Sure enough, our final query has combined the two result sets into one.
You may notice that even though our characters "AT"
are uppercase in our regex
string, but they are lowercase in the actual document records, we still returned results. This is because we also added the special i
flag following our regex closing slash (/
). This informs the regex
engine that we want to the search to be case insensitive
, matching regardless of upper or lowercase.