MongoDB $redact to filter out some elements of an array
I am trying to formulate a query on the sample bios collection (http://docs.mongodb.org/manual/reference/bios-example-collection/):
Retrieve all persons, along with the awards each one received before receiving the Turing Award.
I have come up with this query:
    db.bios.aggregate([
        {$match: {"awards.award" : "turing award"}},
        {$project: {"award1": "$awards", "award2": "$awards", "first_name": "$name.first", "last_name": "$name.last"}},
        {$unwind: "$award1"},
        {$match: {"award1.award" : "turing award"}},
        {$unwind: "$award2"},
        {$redact: {
            $cond: {
                if: { $eq: [ { $gt: [ "$award1.year", "$award2.year"] }, true]},
                then: "$$KEEP",
                else: "$$PRUNE"
            }
        }}
    ])
And this is the result:
    /* 0 */
    {
        "result" : [
            {
                "_id" : 1,
                "award1" : { "award" : "turing award", "year" : 1977, "by" : "acm" },
                "award2" : { "award" : "w.w. mcdowell award", "year" : 1967, "by" : "ieee computer society" },
                "first_name" : "john",
                "last_name" : "backus"
            },
            {
                "_id" : 1,
                "award1" : { "award" : "turing award", "year" : 1977, "by" : "acm" },
                "award2" : { "award" : "national medal of science", "year" : 1975, "by" : "national science foundation" },
                "first_name" : "john",
                "last_name" : "backus"
            },
            {
                "_id" : 4,
                "award1" : { "award" : "turing award", "year" : 2001, "by" : "acm" },
                "award2" : { "award" : "rosing prize", "year" : 1999, "by" : "norwegian data association" },
                "first_name" : "kristen",
                "last_name" : "nygaard"
            },
            {
                "_id" : 5,
                "award1" : { "award" : "turing award", "year" : 2001, "by" : "acm" },
                "award2" : { "award" : "rosing prize", "year" : 1999, "by" : "norwegian data association" },
                "first_name" : "ole-johan",
                "last_name" : "dahl"
            }
        ],
        "ok" : 1
    }
What I don't like about this solution is that I unwind $award2. Instead, I would be happy to keep award2 as an array and only remove the awards that were received after award1. So, for instance, the answer for John Backus should be:
    {
        "_id" : 1,
        "first_name" : "john",
        "last_name" : "backus",
        "award1" : { "award" : "turing award", "year" : 1977, "by" : "acm" },
        "award2" : [
            { "award" : "w.w. mcdowell award", "year" : 1967, "by" : "ieee computer society" },
            { "award" : "national medal of science", "year" : 1975, "by" : "national science foundation" }
        ]
    }
Is it possible to achieve this with $redact without doing $unwind: "$award2"?
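It might have been a little more helpful if you had included the original state of the document as an example in your question, since that shows "where you are coming from" as well as the "where you want to get to" goal, in addition to the desired output as given.
That's just a tip, but it seems you are starting with a document like this: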
    {
        "_id" : 1,
        "name": { "first" : "john", "last" : "backus" },
        "awards" : [
            { "award" : "w.w. mcdowell award", "year" : 1967, "by" : "ieee computer society" },
            { "award" : "national medal of science", "year" : 1975, "by" : "national science foundation" },
            { "award" : "turing award", "year" : 1977, "by" : "acm" },
            { "award" : "some other award", "year" : 1979, "by" : "someone else" }
        ]
    }
So the real point here is that while you may have reached for $redact (and it is a bit nicer than using $project for a logical condition and then using $match to filter on that logical match), it really isn't the best tool for the comparison you want to do here.
Before moving on, I just want to point out the main problem here with $redact. Whatever you do here, the logic (without an unwind) would have to compare "directly" on $$DESCEND in order to process the array elements on the value of "year" at whatever level.
That recursion is going to invalidate the "award1" condition, since it has the same field name. Even renaming that field kills the logic, since a projected value that is missing at a given level would not be greater than the tested value.
In a nutshell, $redact is ruled right out, since you cannot say "take from here only" with the logic that it applies.
The alternative is to use $map and $setDifference to filter the contents of the arrays as follows:
    db.bios.aggregate([
        { "$match": { "awards.award": "turing award" } },
        { "$project": {
            "first_name": "$name.first",
            "last_name": "$name.last",
            // Keep only the turing award entry (still an array at this point)
            "award1": { "$setDifference": [
                { "$map": {
                    "input": "$awards",
                    "as": "a",
                    "in": { "$cond": [
                        { "$eq": [ "$$a.award", "turing award" ] },
                        "$$a",
                        false
                    ]}
                }},
                [false]
            ]},
            // Keep every other award
            "award2": { "$setDifference": [
                { "$map": {
                    "input": "$awards",
                    "as": "a",
                    "in": { "$cond": [
                        { "$ne": [ "$$a.award", "turing award" ] },
                        "$$a",
                        false
                    ]}
                }},
                [false]
            ]}
        }},
        { "$unwind": "$award1" },
        { "$project": {
            "first_name": 1,
            "last_name": 1,
            "award1": 1,
            // Drop awards that were not received before award1
            "award2": { "$setDifference": [
                { "$map": {
                    "input": "$award2",
                    "as": "a",
                    "in": { "$cond": [
                        { "$gt": [ "$award1.year", "$$a.year" ] },
                        "$$a",
                        false
                    ]}
                }},
                [false]
            ]}
        }}
    ])
And there really is no "pretty" way of getting around either the usage of $unwind in the intermediate stage or the second $project here, since $map (and the $setDifference filter) returns "still an array". So the $unwind is necessary to turn that "array" into a singular entry (provided your condition only matches one element) that can be used in the comparison.
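To make the "still an array" point concrete, here is a minimal plain JavaScript sketch that mimics what the $map plus $setDifference combination does to one document. It does not talk to MongoDB at all; the helper name mapThenDropFalse is made up for illustration, and the sample document is the one shown earlier in this answer.

```javascript
// Sample document copied from the answer above.
const doc = {
  _id: 1,
  name: { first: "john", last: "backus" },
  awards: [
    { award: "w.w. mcdowell award", year: 1967, by: "ieee computer society" },
    { award: "national medal of science", year: 1975, by: "national science foundation" },
    { award: "turing award", year: 1977, by: "acm" },
    { award: "some other award", year: 1979, by: "someone else" }
  ]
};

// Hypothetical helper (not a MongoDB API): mimics a $map whose $cond yields
// either the element or false, followed by $setDifference against [false].
// Note: the real $setDifference also removes duplicates and does not
// guarantee element order; this sketch ignores both details.
function mapThenDropFalse(array, predicate) {
  const mapped = array.map(a => (predicate(a) ? a : false));
  return mapped.filter(entry => entry !== false);
}

// Even with exactly one match, the result is an array with one element,
// which is why the pipeline needs $unwind before comparing years.
const award1 = mapThenDropFalse(doc.awards, a => a.award === "turing award");

// After "unwinding" (taking the single element), the second filter can run.
const turing = award1[0];
const award2 = mapThenDropFalse(doc.awards, a => turing.year > a.year);

console.log(Array.isArray(award1), award1.length);
console.log(award2.map(a => a.award));
```

The first filter leaves a one-element array rather than a plain document, which is the reason the comparison against "$award1.year" cannot happen until that array has been unwound.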
Trying to "squish" all the logic into a single $project will only result in "arrays of arrays" in the second output, so some "unwinding" is still required. At least this way, unwinding the (hopefully) single match is not really that costly and keeps the output clean.
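As a side note that goes beyond the original answer: assuming a MongoDB release of 3.2 or newer, the $filter and $arrayElemAt operators can express the same idea without any $unwind. The sketch below only builds the pipeline as a plain JavaScript value; it has not been run against the bios collection here, so treat it as an untested outline rather than a drop-in replacement.

```javascript
// Assumption: MongoDB 3.2+ (for $filter and $arrayElemAt).
// The pipeline is only constructed here, not executed.
const pipeline = [
  { $match: { "awards.award": "turing award" } },
  { $project: {
      first_name: "$name.first",
      last_name: "$name.last",
      awards: 1,
      // $arrayElemAt pulls the single match out of the filtered array,
      // which takes over the role of the intermediate $unwind.
      award1: {
        $arrayElemAt: [
          { $filter: {
              input: "$awards",
              as: "a",
              cond: { $eq: ["$$a.award", "turing award"] }
          }},
          0
        ]
      }
  }},
  { $project: {
      first_name: 1,
      last_name: 1,
      award1: 1,
      // Keep only awards received before award1.
      award2: {
        $filter: {
          input: "$awards",
          as: "a",
          cond: { $gt: ["$award1.year", "$$a.year"] }
        }
      }
  }}
];

// In the mongo shell this would be passed as: db.bios.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

This still uses two $project stages, because the second one needs the award1 value computed by the first.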
But the other thing to really note here is that you are not really "aggregating" at all. This is just document manipulation, so you might well consider doing that manipulation directly in client code, as demonstrated with this shell example:
    db.bios.find(
        { "awards.award": "turing award" },
        { "name": 1, "awards": 1 }
    ).forEach(function(doc) {
        doc.first_name = doc.name.first;
        doc.last_name = doc.name.last;
        // The single turing award entry
        doc.award1 = doc.awards.filter(function(award) {
            return award.award == "turing award"
        })[0];
        // Every award received before it
        doc.award2 = doc.awards.filter(function(award) {
            return doc.award1.year > award.year;
        });
        delete doc.name;
        delete doc.awards;
        printjson(doc);
    })
At any rate, both approaches will output the same:
    {
        "_id" : 1,
        "first_name" : "john",
        "last_name" : "backus",
        "award1" : { "award" : "turing award", "year" : 1977, "by" : "acm" },
        "award2" : [
            { "award" : "w.w. mcdowell award", "year" : 1967, "by" : "ieee computer society" },
            { "award" : "national medal of science", "year" : 1975, "by" : "national science foundation" }
        ]
    }
The only real difference here is that with .aggregate() the content of "award2" will already be filtered when it is returned from the server. That probably isn't going to be much different from the client processing approach, unless the items that would be removed comprise a reasonably large list per document.
For the record, the only alteration to your existing aggregation pipeline really required here is to add a $group to the end to "re-combine" the array entries into a single document:
    db.bios.aggregate([
        { "$match": { "awards.award": "turing award" } },
        { "$project": {
            "first_name": "$name.first",
            "last_name": "$name.last",
            "award1": "$awards",
            "award2": "$awards"
        }},
        { "$unwind": "$award1" },
        { "$match": { "award1.award" : "turing award" }},
        { "$unwind": "$award2" },
        { "$redact": {
            "$cond": {
                "if": { "$gt": [ "$award1.year", "$award2.year"] },
                "then": "$$KEEP",
                "else": "$$PRUNE"
            }
        }},
        // Re-combine the surviving award2 entries per person
        { "$group": {
            "_id": "$_id",
            "first_name": { "$first": "$first_name" },
            "last_name": { "$first": "$last_name" },
            "award1": { "$first": "$award1" },
            "award2": { "$push": "$award2" }
        }}
    ])
But then again, there is all that "array duplication" and "cost of unwind" associated with the operations here. So either of the first two approaches is what you really want in order to avoid that.
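To give a rough feel for that cost, the following plain JavaScript sketch walks one document through the same steps as the $unwind / $redact / $group pipeline and counts the intermediate documents. It is only an in-memory imitation; the unwind helper is made up for illustration, and the sample data is the John Backus document from this answer.

```javascript
// Hypothetical helper (not a MongoDB API): mimics $unwind by emitting one
// copy of the document per element of the named array field.
function unwind(docs, field) {
  return docs.flatMap(d => d[field].map(el => ({ ...d, [field]: el })));
}

const awards = [
  { award: "w.w. mcdowell award", year: 1967, by: "ieee computer society" },
  { award: "national medal of science", year: 1975, by: "national science foundation" },
  { award: "turing award", year: 1977, by: "acm" },
  { award: "some other award", year: 1979, by: "someone else" }
];

// Equivalent of the first $project: the array is duplicated into two fields.
const projected = [
  { _id: 1, first_name: "john", last_name: "backus", award1: awards, award2: awards }
];

const afterUnwind1 = unwind(projected, "award1");
const afterMatch = afterUnwind1.filter(d => d.award1.award === "turing award");
const afterUnwind2 = unwind(afterMatch, "award2");
const afterRedact = afterUnwind2.filter(d => d.award1.year > d.award2.year);

// Equivalent of the final $group: push award2 entries back into one array.
const grouped = new Map();
for (const d of afterRedact) {
  if (!grouped.has(d._id)) {
    grouped.set(d._id, {
      _id: d._id,
      first_name: d.first_name,
      last_name: d.last_name,
      award1: d.award1,
      award2: []
    });
  }
  grouped.get(d._id).award2.push(d.award2);
}
const result = [...grouped.values()];

// Document counts after each step: 4, 1, 4, 2, 1
console.log(
  afterUnwind1.length, afterMatch.length, afterUnwind2.length,
  afterRedact.length, result.length
);
```

For this one person, four intermediate documents are produced twice over just to end up with a single document again, and the numbers grow with the length of the awards array.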