node.js - Getting the difference of two differently structured collections

- August 15, 2013

Suppose I have two collections, a and b.

a contains simple documents of the following form:

{ _id: '...', value: 'a', data: '...' }
{ _id: '...', value: 'b', data: '...' }
{ _id: '...', value: 'c', data: '...' }
…

b contains nested objects like this:

{ _id: '...', values: [ 'a', 'b' ] }
{ _id: '...', values: [ 'c' ] }
…

Now it can happen that there are documents in a that are not referenced by any document in b, or that there are values referenced in b that do not exist in a.

Let's call them "orphaned".

My question is: how do I find the orphaned documents in an efficient way? In the end, I need their _id field.

So far I have used $unwind to "flatten" b, and then calculated the difference using the differenceWith function of Ramda. This takes quite a long time and is surely not efficient, since I do the work on the client instead of in the database.

I have seen that there is a $setDifference operator in MongoDB, but I could not get it to work.

Can anyone point me in the right direction on how to solve this using Node.js, running most (all?) of the work in the database? Any hints are appreciated :-)

In MongoDB you can use the aggregation pipeline for what you are trying to do. If that doesn't work, you can use mapReduce, which is a bit more complicated.

For this example I named the two collections "tags" and "papers", where tags is named "b" in your example, and papers is "a".

First, get the set of values that actually exist and are referenced by documents. For this, flatten each value in the tags collection and pack them together. Unwinding creates a document with the original _id for each value in the 'values' array. This flat list is then recollected, and the ids are ignored.

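var referenced_tags = db.tags.aggregate([
    // one document per element of the 'values' array
    {$unwind: '$values'},
    // collect all values into a single document; the original ids are dropped
    {$group: {_id: '', tags: {$push: '$values'}}}
]).toArray()[0];  // aggregate() returns a cursor in current shells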

This returns:

{ "_id" : "", "tags" : [ "a", "b", "c"] }

This list is the collection of all values referenced in the tags documents.
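To make the effect of the two stages more concrete, here is a rough in-memory equivalent in plain JavaScript. It is only an illustration: the sample documents and the function name collectReferencedValues are made up for this sketch, and the real work is still meant to happen in the database.

```javascript
// Hypothetical sample of the "tags" collection (collection b in the question).
const tags = [
  { _id: 1, values: ['a', 'b'] },
  { _id: 2, values: ['c'] },
];

// Roughly mimics {$unwind: '$values'} followed by
// {$group: {_id: '', tags: {$push: '$values'}}}.
function collectReferencedValues(docs) {
  // "unwind": one entry per array element; a missing field contributes nothing
  const unwound = docs.flatMap((doc) => doc.values ?? []);
  // "group" with $push: gather everything under one constant key
  return { _id: '', tags: unwound };
}

const referencedTags = collectReferencedValues(tags);
console.log(referencedTags.tags);
```

Like $push, this keeps duplicates if the same value appears in several documents.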

Then create a similar document containing the set of values of all available documents. This doesn't need the unwind step, since value is a scalar (not a list):

var papers = db.papers.aggregate([
    // collect the scalar 'value' of every paper into a single document
    {$group: {_id: '', tags: {$push: '$value'}}}
]).toArray()[0];

yielding

{ "_id" : "", "tags" : [ "a", "b", "c", "d"] }

As you can see, in the set I put in the database there is a document (paper) with the value "d", which is not referenced in the tags collection and is therefore an orphan.

You can compute the difference of the two sets in any way you like; the following might be slow, but it is suitable as an example:

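var a = papers.tags;           // values that exist in papers
var b = referenced_tags.tags;  // values referenced from tags
// values that exist in papers but are not referenced anywhere
var delta = a.filter(function (v) { return b.indexOf(v) < 0; });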
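The indexOf-based filter rescans the second array for every element, which gets slow for large lists. A minimal sketch of a faster variant, assuming the values are primitives such as strings, uses a Set for the lookups. The helper name difference and the sample arrays are assumptions made for this illustration; it also shows both directions of "orphaned" from the question.

```javascript
// Returns the elements of `left` that do not occur in `right`.
function difference(left, right) {
  const seen = new Set(right); // constant-time membership checks
  return left.filter((v) => !seen.has(v));
}

const paperValues = ['a', 'b', 'c', 'd']; // as collected from papers
const referencedValues = ['a', 'b', 'c']; // as collected from tags

// papers that no tags document references
const unreferenced = difference(paperValues, referencedValues);
// values referenced in tags that have no paper
const dangling = difference(referencedValues, paperValues);

console.log(unreferenced, dangling);
```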

As the next step, you can find the ids by looking up the values in delta and projecting only the ids:

db.papers.find({'value' : {'$in': delta}}, {'_id': 1})

returning:

{ "_id" : ObjectId("558bd2...44f6a") }

Edit: While this nicely shows how to approach the problem with the aggregation framework, it is not a feasible solution. One doesn't need aggregation, since MongoDB is quite smart:

db.papers.find({'value' : {'$nin': tags.values }}, {'_id': 1})

where tags is:

// note: this reads only the first document of the tags collection
var cursor = db.tags.find();
var tags = cursor.hasNext() ? cursor.next() : null;

as pointed out by @karthick.k
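Since the question asks about Node.js, and the shell snippet reads only one tags document, here is a hedged sketch of how the same $nin idea might look with the official mongodb driver. It assumes db is an already connected Db handle and that the collections are called "tags" and "papers" as in this answer; the function name findOrphanedPaperIds is made up for illustration. It swaps in distinct() to gather the referenced values from all tags documents.

```javascript
// Sketch only: `db` is assumed to be a connected Db from the `mongodb` driver.
async function findOrphanedPaperIds(db) {
  // distinct() on an array field yields each distinct array element once
  const referenced = await db.collection('tags').distinct('values');

  // papers whose value is not referenced anywhere; fetch only the _id
  const orphans = await db
    .collection('papers')
    .find({ value: { $nin: referenced } }, { projection: { _id: 1 } })
    .toArray();

  return orphans.map((doc) => doc._id);
}
```

Both the filtering and the collection of referenced values run in the database here; only the list of distinct values and the resulting ids travel to the client. With a very large number of distinct values, the list itself may become the bottleneck, so treat this as a starting point rather than a tuned solution.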
