node.js - Getting the difference of two differently structured collections


Suppose we have two collections, a and b.

a contains simple documents of the following form:

{ _id: '...', value: 'a', data: '...' }
{ _id: '...', value: 'b', data: '...' }
{ _id: '...', value: 'c', data: '...' }
…

b contains nested objects like this:

{ _id: '...', values: [ 'a', 'b' ] }
{ _id: '...', values: [ 'c' ] }
…

Now it can happen that there are documents in a that are not referenced by any document in b, or that documents in b reference values that do not exist in a.

Let's call them "orphaned".

My question is: how do I find those orphaned documents in an efficient way? In the end, what I need is their _id field.

So far I have used $unwind to "flatten" a, and then calculated the difference using the differenceWith function of Ramda. This takes quite a long time and is surely not efficient, since I do all the work on the client instead of in the database.

I have seen that there is a $setDifference operator in MongoDB, but I did not get it to work.

Can anyone point me in the right direction on how to solve this issue using Node.js, running most (all?) of the work in the database? Any hints are appreciated :-)

In MongoDB you can use the aggregation pipeline for what you are trying to do. If that doesn't work, you can use mapReduce, but that is a bit more complicated.

For this example I named the two collections "tags" and "papers", where tags is what you call "b" in your example, and papers is "a".

First, get the set of values that actually exist and are referenced by documents. For this, flatten each value in the tags collection and pack them together. Unwinding creates one document, carrying the original _id, for each value in the 'values' array. This flat list is then recollected, and the ids are ignored.

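var referenced_tags = db.tags.aggregate([
    // one document per entry of the 'values' array
    { $unwind: '$values' },
    // collect all values into a single list, ignoring the original ids
    { $group: { _id: '', tags: { $push: '$values' } } }
]).next();  // aggregate() returns a cursor; take its single result document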

This returns:

{ "_id" : "", "tags" : [ "a", "b", "c"] } 

This list is the collection of all values referenced in the tags documents.

Then create a similar collection, containing the set of values of the available papers. This doesn't need the unwind step, since value is a scalar (= not a list):
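To make the two stages concrete, here is a minimal in-memory sketch of what $unwind and $group with $push do to the documents. It runs in plain Node.js without a database; the sample documents and the variable names (`sampleTags`, `unwound`, `grouped`) are made up for this illustration.

```javascript
// Sample documents shaped like the tags collection (made up for this sketch).
const sampleTags = [
  { _id: 1, values: ['a', 'b'] },
  { _id: 2, values: ['c'] },
];

// $unwind: one output document per array element, keeping the original _id.
const unwound = sampleTags.flatMap((doc) =>
  doc.values.map((value) => ({ _id: doc._id, values: value }))
);

// $group with a constant _id and $push: collect every value into one list.
const grouped = { _id: '', tags: unwound.map((doc) => doc.values) };

console.log(unwound);
console.log(grouped);
```

Note that $push keeps duplicates: a value referenced by two tags documents shows up twice. If a real set is wanted, $addToSet can be used in place of $push.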

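var papers = db.papers.aggregate([
    // collect the value of every paper into a single list
    { $group: { _id: '', tags: { $push: '$value' } } }
]).next();  // aggregate() returns a cursor; take its single result document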

yielding:

{ "_id" : "", "tags" : [ "a", "b", "c", "d"] } 

As you can see from the set I put in the database, there is a document (paper) with the value "d" that is not referenced in the tags collection and is therefore an orphan.

You can compute the difference set in any way you like. The following might be slow, but it is suitable as an example:

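var a = papers.tags;           // values that exist in papers
var b = referenced_tags.tags;  // values referenced from tags
// keep the values that exist in papers but are not referenced from tags
var delta = a.filter(function (v) { return b.indexOf(v) < 0; });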
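If the lists get large, the indexOf scan inside filter becomes the slow part, because it walks the second array once per value. A possible alternative, sketched here with a hypothetical helper named `difference`, is to put the referenced values into a Set first:

```javascript
// Returns the values in `present` that do not occur in `referenced`.
// A Set gives constant-time membership tests on average, so the whole
// difference is roughly linear in the size of both lists.
function difference(present, referenced) {
  const referencedSet = new Set(referenced);
  return present.filter((v) => !referencedSet.has(v));
}

// Same sample values as in the example above.
const orphanedValues = difference(['a', 'b', 'c', 'd'], ['a', 'b', 'c']);
console.log(orphanedValues); // [ 'd' ]
```

This still does the comparison on the client, so it only helps with the CPU time, not with the amount of data transferred.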

As the next step, you can find the ids by looking for the values in delta and projecting only the ids:

db.papers.find({'value' : {'$in': delta}}, {'_id': 1}) 

returning:

{ "_id" : ObjectId("558bd2...44f6a") }
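A variant that keeps the whole comparison inside the database is to join the two collections with $lookup. The sketch below is not part of the original answer. It assumes MongoDB 3.2 or later, the collection names "papers" and "tags" from this example, and that the equality match of $lookup matches individual elements when the foreign field is an array, which is worth verifying on your server version. The function names are hypothetical, and `db` stands for a connected Db handle from the official Node.js driver.

```javascript
// Papers whose value is not contained in any tags.values array.
function orphanedPapersPipeline(tagsCollection = 'tags') {
  return [
    // Attach the tags documents whose values array contains this paper's value.
    { $lookup: { from: tagsCollection, localField: 'value', foreignField: 'values', as: 'refs' } },
    // Keep only papers that nothing refers to.
    { $match: { refs: { $size: 0 } } },
    // Only the _id is needed.
    { $project: { _id: 1 } },
  ];
}

// The other direction: tags documents that reference values missing from papers.
function danglingTagsPipeline(papersCollection = 'papers') {
  return [
    { $unwind: '$values' },
    { $lookup: { from: papersCollection, localField: 'values', foreignField: 'value', as: 'papers' } },
    { $match: { papers: { $size: 0 } } },
    // One result per tags document, listing its missing values.
    { $group: { _id: '$_id', missing: { $push: '$values' } } },
  ];
}

// Hypothetical helper showing how the first pipeline could be run.
async function findOrphanedPaperIds(db) {
  const docs = await db.collection('papers').aggregate(orphanedPapersPipeline()).toArray();
  return docs.map((doc) => doc._id);
}
```

An index on tags.values (and on papers.value for the second pipeline) is likely to matter a lot here, since the join performs one lookup per input document.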

Edit: While this nicely shows how to approach the problem with the aggregation framework, it is not a feasible solution. One doesn't even need aggregation, since MongoDB is quite smart:

db.papers.find({'value' : {'$nin': tags.values }}, {'_id': 1}) 

where tags is:

var cursor = db.tags.find();
var tags = cursor.hasNext() ? cursor.next() : null;

as pointed out by @karthick.k
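The snippet above reads only the first tags document. For use from Node.js, a minimal sketch of the same $nin idea that covers all tags documents could look like the following. It relies on distinct() returning the distinct array elements when it is applied to an array field. The function names are hypothetical, and `db` is assumed to be a connected Db handle from the official Node.js driver (3.x or later, where find() takes a `projection` option).

```javascript
// Filter for papers whose value is not among the referenced values.
function orphanFilter(referencedValues) {
  return { value: { $nin: referencedValues } };
}

// Hypothetical helper: resolves to the _id of every orphaned paper.
async function findOrphanedPaperIdsWithNin(db) {
  // All values referenced by any tags document, without duplicates.
  const referenced = await db.collection('tags').distinct('values');
  const docs = await db
    .collection('papers')
    .find(orphanFilter(referenced), { projection: { _id: 1 } })
    .toArray();
  return docs.map((doc) => doc._id);
}
```

The list of referenced values travels to the client and back inside the query, so this suits collections where that list stays reasonably small.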

