Final: Question 2
Please use the Enron dataset you imported for the previous problem. For this question, you will use the aggregation framework to figure out pairs of people that tend to communicate a lot. To do this, you will need to unwind the To list for each message.
This problem is a little tricky because a recipient may appear more than once in the To list for a message. You will need to fix that in a stage of the aggregation before doing your grouping and counting of (sender, recipient) pairs.
Which pair of people have the greatest number of messages in the dataset?
Solution: Query to run as per the question is below as per my understanding:
db.messages.aggregate([
{$project: {
from: "$headers.From",
to: "$headers.To"
}},
{$unwind: "$to"},
{$project: {
pair: {
from: "$from",
to: "$to"
},
count: {$add: [1]}
}},
{$group: {
_id: "$pair",
count: {$sum: 1}
}},
{$sort: {
count: -1
}},
{$limit: 2},
{$skip: 1}
])
Results that I found is below:
{ "_id" : { "from" : "susan.mara@enron.com", "to" : "richard.shapiro@enron.com" }, "count" : 974 }
Let me know if you guys find extra so that we can discuss it.
Thanks









