node.js - Getting the difference of two differently structured collections
Suppose I have two collections, a and b.
a contains simple documents of the following form:
{ _id: '...', value: 'a', data: '...' } { _id: '...', value: 'b', data: '...' } { _id: '...', value: 'c', data: '...' } …
b contains nested objects like this:
{ _id: '...', values: [ 'a', 'b' ]} { _id: '...', values: [ 'c' ]} …
Now it can happen that there are documents in a that are not referenced by any document in b, or that there are documents referenced in b that do not exist in a.
Let's call them "orphaned".
My question is: how do I find the orphaned documents in an efficient way? In the end, what I need is their _id field.
So far I have used $unwind to "flatten" b, and then calculated the difference using the differenceWith function of Ramda. This takes quite a long time and is surely not efficient, since I do all the work on the client instead of in the database.
I have seen that there is a $setDifference operator in MongoDB, but I did not get it to work.
Can anyone point me in the right direction on how to solve this using Node.js, while running most (all?) of the work in the database? Any hints are appreciated :-)
In MongoDB you can use the aggregation pipeline for what you are trying to do. If that doesn't work, you can use mapReduce, which is a bit more complicated.
For this example I named the two collections "tags" and "papers", where tags corresponds to "b" in your example, and papers to "a".
First, get the set of values that exist and are referencing documents. To do this, flatten each value in the tags collection and pack them together. Unwinding creates one document, carrying the original _id, for each value in the 'values' array. This flat list is then recollected into a single array, and the ids are ignored.
var referenced_tags = db.tags.aggregate([ {$unwind: '$values'}, {$group: { _id: '', tags: { $push: '$values'} } } ]).next();
This returns:
{ "_id" : "", "tags" : [ "a", "b", "c"] }
This list is the collection of all values in all documents.
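To make the two stages more concrete, here is a minimal plain-JavaScript sketch of what $unwind followed by $group with $push does, run on an in-memory copy of the sample data. The numeric _id values are placeholders I made up for illustration; nothing here talks to a database.

```javascript
// In-memory stand-in for the tags collection (sample data from the question;
// the _id values 1 and 2 are hypothetical placeholders).
const tags = [
  { _id: 1, values: ['a', 'b'] },
  { _id: 2, values: ['c'] },
];

// $unwind: one output document per element of `values`, keeping the original _id.
const unwound = tags.flatMap(doc =>
  doc.values.map(value => ({ _id: doc._id, values: value }))
);

// $group with _id: '' and $push: collect every value into a single array,
// dropping the per-document ids.
const grouped = { _id: '', tags: unwound.map(doc => doc.values) };

console.log(unwound.length); // 3
console.log(grouped.tags);   // [ 'a', 'b', 'c' ]
```

Note that $push keeps duplicates; if the same value can appear in several tags documents, $addToSet would give a true set instead.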
Then, create a similar collection containing the set of tags of the available documents. This doesn't need the unwind step, since value is a scalar (= not a list):
var papers = db.papers.aggregate([ {$group: { _id: '', tags: {$push: '$value'} } } ]).next();
yielding:
{ "_id" : "", "tags" : [ "a", "b", "c", "d"] }
As you can see, in the set I put into the database there is a document (paper) with the value "d", which is not referenced in the tags collection and is therefore an orphan.
You can now compute the difference set in any way you like. This might be slow, but it is suitable as an example:
var a = papers.tags; var b = referenced_tags.tags; var delta = a.filter(function (v) { return b.indexOf(v) < 0; });
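The filter/indexOf combination rescans the second array for every element, which gets slow for large lists. A minimal sketch of the same difference using a Set, computed in both directions, could look like this. The function and variable names are mine, and the arrays are just the sample values from above.

```javascript
// Returns the elements of `left` that do not occur in `right`.
function difference(left, right) {
  const lookup = new Set(right); // constant-time membership tests
  return left.filter(v => !lookup.has(v));
}

const referenced = ['a', 'b', 'c'];    // values found in tags
const existing = ['a', 'b', 'c', 'd']; // values found in papers

const unreferenced = difference(existing, referenced); // papers nobody points to
const dangling = difference(referenced, existing);     // tags pointing at nothing

console.log(unreferenced); // [ 'd' ]
console.log(dangling);     // []
```

This still runs on the client, so it only helps with the comparison itself, not with the amount of data pulled out of the database.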
As a next step, you can find the ids by looking up the values in delta and projecting only the ids:
db.papers.find({'value' : {'$in': delta}}, {'_id': 1})
returning:
{ "_id" : ObjectId("558bd2...44f6a") }
Edit: While this nicely shows how to approach the problem with the aggregation framework, it is not a feasible solution. One doesn't even need aggregation, since MongoDB is quite smart:
db.papers.find({'value' : {'$nin': tags.values }}, {'_id': 1})
where tags is
var cursor = db.tags.find(); var tags = cursor.hasNext() ? cursor.next() : null;
as pointed out by @karthick.k
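Since the question asks for Node.js with the work done in the database, here is a hedged sketch of two ways this might look with the official mongodb driver. It assumes `db` is an already connected Db handle, that the collections are named "papers" and "tags" as above, and that the server supports $lookup (MongoDB 3.2 or later). The function names are mine, not part of any API.

```javascript
// Pipeline run against `papers`: join each paper to the tags documents whose
// `values` array contains the paper's `value`, then keep papers with no match.
const orphanPipeline = [
  { $lookup: { from: 'tags', localField: 'value', foreignField: 'values', as: 'refs' } },
  { $match: { refs: { $size: 0 } } },
  { $project: { _id: 1 } },
];

// `db` is assumed to be a connected Db handle from the official `mongodb` driver.
async function findOrphanIds(db) {
  const docs = await db.collection('papers').aggregate(orphanPipeline).toArray();
  return docs.map(doc => doc._id);
}

// Alternative without aggregation: collect every referenced value across all
// tags documents with distinct(), then filter papers with $nin.
async function findOrphanIdsViaDistinct(db) {
  const referenced = await db.collection('tags').distinct('values');
  const docs = await db.collection('papers')
    .find({ value: { $nin: referenced } }, { projection: { _id: 1 } })
    .toArray();
  return docs.map(doc => doc._id);
}
```

The $lookup variant keeps everything on the server. The distinct variant ships the list of referenced values to the client and back, and the distinct result has to fit into a single BSON document, so it is probably only suitable while the number of distinct values stays moderate. An index on tags.values should help the $lookup variant considerably.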