Querying in MongoDB - BunksAllowed

BunksAllowed is an effort to facilitate Self Learning process through the provision of quality tutorials.

Community

Querying in MongoDB

Share This
The find method is utilized to execute queries in MongoDB. Querying yields a subset of documents within a collection, ranging from none to the entirety of the collection. The documents returned are decided by the initial argument to the search function, which is a document outlining the query to be executed.

An empty query document (i.e., {}) corresponds to all entries in the collection. If find is not provided with a query document, it defaults to {}. For example, the following code returns everything in the collection c.
> db.c.find()
Adding key/value pairs to the query document initiates the restriction of our search. This operates effectively for the majority of categories. Integers correspond to integers, booleans correspond to booleans, and strings correspond to strings. Retrieving a basic type involves merely indicating the desired value. To locate all documents where the "age" value is 27, we can incorporate the key/value pair into the query document:
> db.users.find({"age" : 27})
If we have a string we want to match, such as a "username" key with the value "joe", we use that key/value pair instead:
> db.users.find({"username" : "joe"})
Multiple conditions can be strung together by adding more key/value pairs to the query document, which gets interpreted as “condition1 AND condition2 AND … AND conditionN.” For instance, to get all users who are 27-year-olds with the username “joe,” we can query for the following:
> db.users.find({"username" : "joe", "age" : 27})

Specifying Which Keys to Return

Sometimes, you do not need all of the key/value pairs in a document returned. If this is the case, you can pass a second argument to find (or findOne) specifying the keys you want. This reduces both the amount of data sent over the wire and the time and memory used to decode documents on the client side.
For example, if you have a user collection and you are interested only in the "user name" and "email" keys, you could return just those keys with the following query:
> db.users.find({}, {"username" : 1, "email" : 1})
As you can see from the previous output, the "_id" key is always returned, even if it isn’t specifically listed.
You can also use this second parameter to exclude specific key/value pairs from the results of a query. For instance, you may have documents with a variety of keys, and the only thing you know is that you never want to return the "fatal_weakness" key:
> db.users.find({}, {"fatal_weakness" : 0})
The following one can even prevent "_id" from being returned:
> db.users.find({}, {"username" : 1, "_id" : 0})
There are some restrictions on queries. The value of a query document must be a constant as far as the database is concerned. That is, it cannot refer to the value of another key in the document. For example, if we were keeping inventory and we had both "in_stock" and "num_sold" keys, we could compare their values by querying the following:
> db.stock.find({"in_stock" : "this.num_sold"}) // doesn't work

Conditional Queries

"$lt", "$lte", "$gt", and "$gte" are all comparison operators, corresponding to <, <=, >, and >=, respectively. They can be combined to look for a range of values. For example, to look for users who are between the ages of 18 and 30 inclusive, we can do this:
> db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})
These types of range queries are often useful for dates. For example, to find people who registered before January 1, 2007, we can do this:
> start = new Date("01/01/2007")
> db.users.find({"registered" : {"$lt" : start}})
An exact match on a date is less useful, because dates are only stored with millisecond precision. Often you want a whole day, week, or month, making a range query necessary.
To query for documents where a key’s value is not equal to a certain value, you must use another conditional operator, "$ne", which stands for “not equal.” If you want to find all users who do not have the username “joe,” you can query for them using this:
> db.users.find({"username" : {"$ne" : "joe"}})
"$ne" can be used with any type.

OR Queries

There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key. "$or" is more general; it can be used to query for any of the given values across multiple keys.
If you have more than one possible value to match for a single key, use an array of criteria with "$in". For instance, suppose we were running a raffle and the winning ticket numbers were 725, 542, and 390. To find all three of these documents, we can construct the following query:
> db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})
"$in" is very flexible and allows you to specify criteria of different types as well as values. For example, if we are gradually migrating our schema to use usernames instead of user ID numbers, we can query for either by using this:
> db.users.find({"user_id" : {"$in" : [12345, "joe"]})
This matches documents with a "user_id" equal to 12345, and documents with a "user_id" equal to "joe".
If "$in" is given an array with a single value, it behaves the same as directly matching the value. For instance, {ticket_no : {$in : [725]}} matches the same documents as {ticket_no : 725}.
The opposite of "$in" is "$nin", which returns documents that don’t match any of the criteria in the array. If we want to return all of the people who didn’t win anything in the raffle, we can query for them with this:
> db.raffle.find({"ticket_no" : {"$nin" : [725, 542, 390]}})
This query returns everyone who did not have tickets with those numbers.
The "$in" gives you an OR query for a single key, but what if we need to find documents where "ticket_no" is 725 or "winner" is true? For this type of query, we’ll need to use the "$or" conditional. "$or" takes an array of possible criteria. In the raffle case, using "$or" would look like this:
> db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})
"$or" can contain other conditionals. If, for example, we want to match any of the three "ticket_no" values or the "winner" key, we can use this:
> db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}},
												{"winner" : true}]})
With a normal AND-type query, you want to narrow your results down as far as possible in as few arguments as possible. OR-type queries are the opposite: they are most efficient if the first arguments match as many documents as possible.
"$not" is a metaconditional: it can be applied on top of any other criteria. As an example, let’s consider the modulus operator, "$mod". "$mod" queries for keys whose values, when divided by the first value given, have a remainder of the second value:
> db.users.find({"id_num" : {"$mod" : [5, 1]}})
The previous query returns users with "id_num"s of 1, 6, 11, 16, and so on. If we want, instead, to return users with "id_num"s of 2, 3, 4, 5, 7, 8, 9, 10, 12, and so on, we can use "$not":
> db.users.find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})
"$not" can be particularly useful in conjunction with regular expressions to find all documents that don’t match a given pattern.
If you look at the update modifiers in the previous chapter and previous query documents, you’ll notice that the $-prefixed keys are in different positions. In the query, "$lt" is in the inner document; in the update, "$inc" is the key for the outer document. This generally holds true: conditionals are an inner document key, and modifiers are always a key in the outer document.
Multiple conditions can be put on a single key. For example, to find all users between the ages of 20 and 30, we can query for both "$gt" and "$lt" on the "age" key:
> db.users.find({"age" : {"$lt" : 30, "$gt" : 20}})
Any number of conditionals can be used with a single key. Multiple update modifiers cannot be used on a single key, however. For example, you cannot have a modifier document such as {"$inc" : {"age" : 1}, "$set" : {age : 40}} because it modifies "age" twice. With query conditionals, no such rule applies.
Type-Specific Queries
MongoDB has a wide variety of types that can be used in a document. Some of these behave specially in queries.
null behaves a bit strangely. It does match itself, so if we have a collection with the following documents:
> db.c.find()
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }
we can query for documents whose "y" key is null in the expected way:
> db.c.find({"y" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
However, null not only matches itself but also matches “does not exist.” Thus, querying for a key with the value null will return all documents lacking that key:
> db.c.find({"z" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }
If we only want to find keys whose value is null, we can check that the key is null and exists using the "$exists" conditional:
> db.c.find({"z" : {"$in" : [null], "$exists" : true}})
Unfortunately, there is no "$eq" operator, which makes this a little awkward, but "$in" with one element is equivalent.

Regular Expressions

Regular expressions are useful for flexible string matching. For example, if we want to find all users with the name Joe or joe, we can use a regular expression to do caseinsensitive matching:
> db.users.find({"name" : /joe/i})
Regular expression flags (i) are allowed but not required. If we want to match not only various capitalizations of joe, but also joey, we can continue to improve our regular expression:
> db.users.find({"name" : /joey?/i})
MongoDB uses the Perl Compatible Regular Expression (PCRE) library to match regular expressions; any regular expression syntax allowed by PCRE is allowed in MongoDB. It is a good idea to check your syntax with the JavaScript shell before using it in a query to make sure it matches what you think it matches.
Regular expressions can also match themselves. Very few people insert regular expressions into the database, but if you insert one, you can match it with itself:
> db.foo.insert({"bar" : /baz/})
> db.foo.find({"bar" : /baz/})
{
"_id" : ObjectId("4b23c3ca7525f35f94b60a2d"),
"bar" : /baz/
}

Querying Arrays

Querying for elements of an array is simple. An array can mostly be treated as though each element is the value of the overall key. For example, if the array is a list of fruits, like this:
> db.food.insert({"fruit" : ["apple", "banana", "peach"]})
the following query:
> db.food.find({"fruit" : "banana"})
will successfully match the document. We can query for it in much the same way as though we had a document that looked like the (illegal) document: {"fruit" : "apple", "fruit" : "banana", "fruit" : "peach"}.

If you need to match arrays by more than one element, you can use "$all". This allows you to match a list of elements. For example, suppose we created a collection with three elements:
> db.food.insert({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
> db.food.insert({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
> db.food.insert({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})
Then we can find all documents with both "apple" and "banana" elements by querying with "$all":
> db.food.find({fruit : {$all : ["apple", "banana"]}})
{"_id" : 1, "fruit" : ["apple", "banana", "peach"]}
{"_id" : 3, "fruit" : ["cherry", "banana", "apple"]}
Order does not matter. Notice "banana" comes before "apple" in the second result. Using a one-element array with "$all" is equivalent to not using "$all". For instance, {fruit : {$all : ['apple']} will match the same documents as {fruit : 'apple'}.

You can also query by exact match using the entire array. However, exact match will not match a document if any elements are missing or superfluous. For example, this will match the first document shown previously:
> db.food.find({"fruit" : ["apple", "banana", "peach"]})
But this will not:
> db.food.find({"fruit" : ["apple", "banana"]})
and neither will this:
> db.food.find({"fruit" : ["banana", "apple", "peach"]})
If you want to query for a specific element of an array, you can specify an index using the syntax key.index:
> db.food.find({"fruit.2" : "peach"})
Arrays are always 0-indexed, so this would match the third array element against the string "peach".
A useful conditional for querying arrays is "$size", which allows you to query for arrays of a given size. Here’s an example:
> db.food.find({"fruit" : {"$size" : 3}})
One common query is to get a range of sizes. "$size" cannot be combined with another $ conditional (in this example, "$gt"), but this query can be accomplished by adding a "size" key to the document. Then, every time you add an element to the array, increment the value of "size". If the original update looked like this:
> db.food.update({"$push" : {"fruit" : "strawberry"}})
it can simply be changed to this:
> db.food.update({"$push" : {"fruit" : "strawberry"}, "$inc" : {"size" : 1}})
Incrementing is extremely fast, so any performance penalty is negligible. Storing documents like this allows you to do queries such as this:
> db.food.find({"size" : {"$gt" : 3}})
Unfortunately, this technique doesn’t work as well with the "$addToSet" operator.

As mentioned earlier in this chapter, the optional second argument to find specifies the keys to be returned. The special "$slice" operator can be used to return a subset of elements for an array key.
For example, suppose we had a blog post document and we wanted to return the first 10 comments:
> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}})
Alternatively, if we wanted the last 10 comments, we could use -10:
> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -10}})
"$slice" can also return pages in the middle of the results by taking an offset and the number of elements to return:
> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : [23, 10]}})
This would skip the first 23 elements and return the 24th through 34th. If there are fewer than 34 elements in the array, it will return as many as possible.

Unless otherwise specified, all keys in a document are returned when "$slice" is used. This is unlike the other key specifiers, which suppress unmentioned keys from being returned. For instance, if we had a blog post document that looked like this:
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"title" : "A blog post",
"content" : "...",
"comments" : [
{
"name" : "joe",
"email" : "joe@example.com",
"content" : "nice post."
},
{
"name" : "bob",
"email" : "bob@example.com",
"content" : "good post."
}
]
}
and we did a "$slice" to get the last comment, we’d get this:
> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -1}})
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"title" : "A blog post",
"content" : "...",
"comments" : [
{
"name" : "bob",
"email" : "bob@example.com",
"content" : "good post."
}
]
}
Both "title" and "content" are still returned, even though they weren’t explicitly included in the key specifier.

Querying on Embedded Documents

There are two ways of querying for an embedded document: querying for the whole document or querying for its individual key/value pairs. Querying for an entire embedded document works identically to a normal query. For example, if we have a document that looks like this:
{
"name" : {
"first" : "Joe",
"last" : "Schmoe"
},
"age" : 45
}
we can query for someone named Joe Schmoe with the following:
> db.people.find({"name" : {"first" : "Joe", "last" : "Schmoe"}})
However, if Joe decides to add a middle name key, suddenly this query won’t work anymore; it doesn’t match the entire embedded document! This type of query is also order-sensitive; {"last" : "Schmoe", "first" : "Joe"} would not be a match.

If possible, it’s usually a good idea to query for just a specific key or keys of an embedded document. Then, if your schema changes, all of your queries won’t suddenly break because they’re no longer exact matches. You can query for embedded keys using dotnotation:
> db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})
Now, if Joe adds more keys, this query will still match his first and last names.
 
This dot-notation is the main difference between query documents and other document types. Query documents can contain dots, which mean “reach into an embedded document.” Dot-notation is also the reason that documents to be inserted cannot contain the . character. Oftentimes people run into this limitation when trying to save URLs as keys. One way to get around it is to always perform a global replace before inserting or after retrieving, substituting a character that isn’t legal in URLs for the dot (.) character.

Embedded document matches can get a little tricky as the document structure gets more complicated. For example, suppose we are storing blog posts and we want to find comments by Joe that were scored at least a five. We could model the post as follows:
> db.blog.find()
{
"content" : "...",
"comments" : [
{
"author" : "joe",
"score" : 3,
"comment" : "nice post"
},
{
"author" : "mary",
"score" : 6,
"comment" : "terrible post"
}
]
}
Now, we can’t query using db.blog.find({"comments" : {"author" : "joe", "score" : {"$gte" : 5}}}). Embedded document matches have to match the whole document, and this doesn’t match the "comment" key. It also wouldn’t work to do db.blog.find({"comments.author" : "joe", "comments.score" : {"$gte" : 5}}), because the author criteria could match a different comment than the score criteria. That is, it would return the document shown earlier; it would match "author" : "joe" in the first comment and "score" : 6 in the second comment.

To correctly group criteria without needing to specify every key, use "$elemMatch". This vaguely named conditional allows you to partially specify criteria to match a single embedded document in an array. The correct query looks like this:
> db.blog.find({"comments" : {"$elemMatch" : {"author" : "joe",
"score" : {"$gte" : 5}}}})
"$elemMatch" allows us to “group” our criteria. As such, it’s only needed when you have more than one key you want to match on in an embedded document.

$where Queries

Key/value pairs are a fairly expressive way to query, but there are some queries that they cannot represent. For queries that cannot be done any other way, there are "$where" clauses, which allow you to execute arbitrary JavaScript as part of your query. This allows you to do (almost) anything within a query.
The most common case for this is wanting to compare the values for two keys in a document, for instance, if we had a list of items and wanted to return documents where any two of the values are equal. Here’s an example:
> db.foo.insert({"apple" : 1, "banana" : 6, "peach" : 3})
> db.foo.insert({"apple" : 8, "spinach" : 4, "watermelon" : 4})
In the second document, "spinach" and "watermelon" have the same value, so we’d like that document returned. It’s unlikely MongoDB will ever have a $ conditional for this, so we can use a "$where" clause to do it with JavaScript:
> db.foo.find({"$where" : function () {
... for (var current in this) {
... for (var other in this) {
... if (current != other && this[current] == this[other]) {
... return true;
... }
... }
... }
... return false;
... }});
If the function returns true, the document will be part of the result set; if it returns false, it won’t be.
We used a function earlier, but you can also use strings to specify a "$where" query; the following two "$where" queries are equivalent:
> db.foo.find({"$where" : "this.x + this.y == 10"})
> db.foo.find({"$where" : "function() { return this.x + this.y == 10; }"})
"$where" queries should not be used unless strictly necessary: they are much slower than regular queries. Each document has to be converted from BSON to a JavaScript object and then run through the "$where" expression. Indexes cannot be used to satisfy a "$where", either. Hence, you should use "$where" only when there is no other way of doing the query. You can cut down on the penalty by using other query filters in combination with "$where". If possible, an index will be used to filter based on the non-$where clauses; the "$where" expression will be used only to fine-tune the results.

Cursors

The database returns results from find using a cursor. The client-side implementations of cursors generally allow you to control a great deal about the eventual output of a query. You can limit the number of results, skip over some number of results, sort results by any combination of keys in any direction, and perform a number of other powerful operations.
To create a cursor with the shell, put some documents into a collection, do a query on them, and assign the results to a local variable (variables defined with "var" are local). Here, we create a very simple collection and query it, storing the results in the cursor variable:
> for(i=0; i<100; i++) {
... db.c.insert({x : i});
... }
> var cursor = db.collection.find();
The advantage of doing this is that you can look at one result at a time. If you store the results in a global variable or no variable at all, the MongoDB shell will automatically iterate through and display the first couple of documents. This is what we’ve been seeing up until this point, and it is often the behavior you want for seeing what’s in a collection but not for doing actual programming with the shell.
To iterate through the results, you can use the next method on the cursor. You can use hasNext to check whether there is another result. A typical loop through results looks like the following:
> while (cursor.hasNext()) {
... obj = cursor.next();
... // do stuff
... }
cursor.hasNext() checks that the next result exists, and cursor.next() fetches it. The cursor class also implements the iterator interface, so you can use it in a forEach loop:
> var cursor = db.people.find();
> cursor.forEach(function(x) {
... print(x.name);
... });
adam
matt
zak
When you call find, the shell does not query the database immediately. It waits until you actually start requesting results to send the query, which allows you to chain additional options onto a query before it is performed. Almost every method on a cursor object returns the cursor itself so that you can chain them in any order. For instance, all of the following are equivalent:
> var cursor = db.foo.find().sort({"x" : 1}).limit(1).skip(10);
> var cursor = db.foo.find().limit(1).sort({"x" : 1}).skip(10);
> var cursor = db.foo.find().skip(10).limit(1).sort({"x" : 1});
At this point, the query has not been executed yet. All of these functions merely build the query. Now, suppose we call the following:
> cursor.hasNext()
At this point, the query will be sent to the server. The shell fetches the first 100 results or first 4MB of results (whichever is smaller) at once so that the next calls to next or hasNext will not have to make trips to the server. After the client has run through the first set of results, the shell will again contact the database and ask for more results. This process continues until the cursor is exhausted and all results have been returned.
Limits, Skips, and Sorts
The most common query options are limiting the number of results returned, skipping a number of results, and sorting. All of these options must be added before a query is sent to the database.
To set a limit, chain the limit function onto your call to find. For example, to only return three results, use this:
> db.c.find().limit(3)
If there are fewer than three documents matching your query in the collection, only the number of matching documents will be returned; limit sets an upper limit, not a lower limit.
skip works similarly to limit:
> db.c.find().skip(3)
This will skip the first three matching documents and return the rest of the matches. If there are less than three documents in your collection, it will not return any documents.
sort takes an object: a set of key/value pairs where the keys are key names and the values are the sort directions. Sort direction can be 1 (ascending) or -1 (descending). If multiple keys are given, the results will be sorted in that order. For instance, to sort the results by "username" ascending and "age" descending, we do the following:
> db.c.find().sort({username : 1, age : -1})
These three methods can be combined. This is often handy for pagination. For example, suppose that you are running an online store and someone searches for mp3. If you want 50 results per page sorted by price from high to low, you can do the following:
> db.stock.find({"desc" : "mp3"}).limit(50).sort({"price" : -1})
If they click Next Page to see more results, you can simply add a skip to the query, which will skip over the first 50 matches (which the user already saw on page 1):
> db.stock.find({"desc" : "mp3"}).limit(50).skip(50).sort({"price" : -1})
However, large skips are not very performant, so there are suggestions on avoiding them in a moment.



Happy Exploring!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.