【MongoDB】MongoDB的聚合(Aggregate、Map Reduce)与管道(Pipline) 及索引详解(附详细案例)
更多相关内容可查看
MongoDB的聚合操作(Aggregate)
简单理解,其实本质跟sql一样,只不过写法不一样,仔细看以下示例
图例:
代码示例:
> db.orders.insertMany( [
{ _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
{ _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
{ _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
{ _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
{ _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.orders.find({})
{ "_id" : 1, "cust_id" : "abc1", "ord_date" : ISODate("2012-11-02T17:04:11.102Z"), "status" : "A", "amount" : 50 }
{ "_id" : 2, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-01T17:04:11.102Z"), "status" : "A", "amount" : 100 }
{ "_id" : 3, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-12T17:04:11.102Z"), "status" : "D", "amount" : 25 }
{ "_id" : 4, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-11T17:04:11.102Z"), "status" : "D", "amount" : 125 }
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
> db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
{ $sort: { total: -1 } }
])
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }
根据上述不难看出具体是怎么操作的,对sql有一定基础的应该可以很容易看懂
MongoDB的管道(Pipline操作)
MongoDB的聚合管道(Pipline)将MongoDB文档在一个阶段(Stage)处理完毕后将结果传递给下一个阶段(Stage)处理。阶段(Stage)操作是可以重复
的。
阶段 | 描述 | 类似于 SQL 中的 |
---|---|---|
$match | 用于过滤文档,只传递满足条件的文档到下一个阶段 | WHERE |
$group | 用于将文档分组,并可用于计算聚合值(如总和、平均值、计数等) | GROUP BY |
$project | 用于选择和重命名字段,或者创建计算字段 | SELECT |
$sort | 用于对文档进行排序 | ORDER BY |
$limit | 用于限制传递到下一个阶段的文档数量 | LIMIT |
$skip | 用于跳过指定数量的文档 | OFFSET |
$unwind | 用于将数组字段中的每个元素拆分为独立的文档 | N/A |
$bucket | 根据指定的边界将文档分组到不同的桶中 | N/A |
$facet | 允许在单个聚合管道中并行执行多个不同的子管道 | N/A |
代码示例:
$project
> db.orders.aggregate(
{ $project : {
_id : 0 , // 默认不显示_id
cust_id : 1 ,
status : 1
... }});
{ "cust_id" : "abc1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "abc1", "status" : "A" }
>
$skip
> db.orders.aggregate(
{ $skip : 4 });
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
$unwind
> db.inventory2.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
{ "acknowledged" : true, "insertedId" : 1 }
> db.inventory2.aggregate( [ { $unwind : "$sizes" } ] )
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
$bucket
> db.artwork.insertMany([
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
"price" : NumberDecimal("199.99") },
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
"price" : NumberDecimal("280.00") },
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
"price" : NumberDecimal("76.04") },
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
"price" : NumberDecimal("167.30") },
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
"price" : NumberDecimal("483.00") },
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
"price" : NumberDecimal("385.00") },
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 },
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
"price" : NumberDecimal("118.42") }
])
{
"acknowledged" : true,
"insertedIds" : [
1,
2,
3,
4,
5,
6,
7,
8
]
}
> db.artwork.find({})
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 } // 注意这里没有price,聚合结果中为Others
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
> db.artwork.aggregate( [
{
$bucket: {
groupBy: "$price",
boundaries: [ 0, 200, 400 ],
default: "Other",
output: {
"count": { $sum: 1 },
"titles" : { $push: "$title" }
}
}
}
] )
{ "_id" : 0, "count" : 4, "titles" : [ "The Pillars of Society", "Dancer", "The Great Wave off Kanagawa", "Blue Flower" ] }
{ "_id" : 200, "count" : 2, "titles" : [ "Melancholy III", "Composition VII" ] }
{ "_id" : "Other", "count" : 2, "titles" : [ "The Persistence of Memory", "The Scream" ] }
这里有很多朋友短时间内看不懂,其实bucket就是按照边界值进行分桶操作,以上案例就是价格字段在0-200放一个桶,200-400放一个桶,没有价格的数据放到other中
$bucket + $facet
db.artwork.aggregate( [
{
$facet: {
"price": [
{
$bucket: {
groupBy: "$price",
boundaries: [ 0, 200, 400 ],
default: "Other",
output: {
"count": { $sum: 1 },
"artwork" : { $push: { "title": "$title", "price": "$price" } }
}
}
}
],
"year": [
{
$bucket: {
groupBy: "$year",
boundaries: [ 1890, 1910, 1920, 1940 ],
default: "Unknown",
output: {
"count": { $sum: 1 },
"artwork": { $push: { "title": "$title", "year": "$year" } }
}
}
}
]
}
}
] )
// 输出
{
"year" : [
{
"_id" : 1890,
"count" : 2,
"artwork" : [
{
"title" : "Melancholy III",
"year" : 1902
},
{
"title" : "The Scream",
"year" : 1893
}
]
},
{
"_id" : 1910,
"count" : 2,
"artwork" : [
{
"title" : "Composition VII",
"year" : 1913
},
{
"title" : "Blue Flower",
"year" : 1918
}
]
},
{
"_id" : 1920,
"count" : 3,
"artwork" : [
{
"title" : "The Pillars of Society",
"year" : 1926
},
{
"title" : "Dancer",
"year" : 1925
},
{
"title" : "The Persistence of Memory",
"year" : 1931
}
]
},
{
// Includes the document without a year, e.g., _id: 4
"_id" : "Unknown",
"count" : 1,
"artwork" : [
{
"title" : "The Great Wave off Kanagawa"
}
]
}
],
"price" : [
{
"_id" : 0,
"count" : 4,
"artwork" : [
{
"title" : "The Pillars of Society",
"price" : NumberDecimal("199.99")
},
{
"title" : "Dancer",
"price" : NumberDecimal("76.04")
},
{
"title" : "The Great Wave off Kanagawa",
"price" : NumberDecimal("167.30")
},
{
"title" : "Blue Flower",
"price" : NumberDecimal("118.42")
}
]
},
{
"_id" : 200,
"count" : 2,
"artwork" : [
{
"title" : "Melancholy III",
"price" : NumberDecimal("280.00")
},
{
"title" : "Composition VII",
"price" : NumberDecimal("385.00")
}
]
},
{
// Includes the document without a price, e.g., _id: 7
"_id" : "Other",
"count" : 2,
"artwork" : [
{
"title" : "The Persistence of Memory",
"price" : NumberDecimal("483.00")
},
{
"title" : "The Scream"
}
]
}
]
}
这里代码太长,可能有朋友没有足够的耐心看完,$bucket + $facet是非常常用的场景,这里解释一下,就是将两组bucket跟组合到了一起进行返回,可以按我自己的理解一个bucket就是多个List数组,List<List>,而一个facet就是在这个bucket在套一层List
更多的聚合关键字可以查看官方文档:https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation-pipeline/
MongoDB的聚合(Map Reduce)
图例:
代码示例:
{ "_id": 1, "customerId": "A123", "amount": 100 }
{ "_id": 2, "customerId": "B456", "amount": 200 }
{ "_id": 3, "customerId": "A123", "amount": 150 }
{ "_id": 4, "customerId": "C789", "amount": 50 }
{ "_id": 5, "customerId": "B456", "amount": 300 }
使用 MapReduce 来计算每个 customerId
的总 amount
。
// Map function
var mapFunction = function() {
emit(this.customerId, this.amount);
};
// Reduce function
var reduceFunction = function(customerId, amounts) {
return Array.sum(amounts);
};
// Execute MapReduce
db.orders.mapReduce(
mapFunction,
reduceFunction,
{ out: "order_totals" }
);
// 查看结果
db.order_totals.find().forEach(printjson);
{ "_id": "A123", "value": 250 }
{ "_id": "B456", "value": 500 }
{ "_id": "C789", "value": 50 }
- Map Function: 对于每个文档,
emit
函数将customerId
作为键,amount
作为值发射出去。 - Reduce Function: 对于每个唯一的
customerId
,reduceFunction
接收一个键和与该键相关联的所有值的数组,并返回这些值的总和。 - Output: 结果存储在
order_totals
集合中,每个文档包含一个customerId
和该客户的总订单金额。
MongoDB的索引
图例:
类型:
- 单一索引
{ "_id": 1, "username": "alice", "age": 30 }
{ "_id": 2, "username": "bob", "age": 25 }
db.users.createIndex({ username: 1 });
这里的 1 表示升序索引。对于降序索引,可以使用 -1
- 复合索引
db.users.createIndex({ username: 1, age: -1 });
- 多键索引
{ "_id": 1, "title": "MongoDB Basics", "tags": ["database", "NoSQL"] }
{ "_id": 2, "title": "Advanced MongoDB", "tags": ["database", "performance"] }
db.posts.createIndex({ tags: 1 });
- 文字索引
支持文本搜索。它们允许对字符串字段进行全文搜索。
{ "_id": 1, "content": "MongoDB is a NoSQL database" }
{ "_id": 2, "content": "Text search in MongoDB" }
我们可以在 content
字段上创建文字索引:
db.articles.createIndex({ content: "text" });
然后,我们可以执行全文搜索:
db.articles.find({ $text: { $search: "NoSQL" } });
- 地理空间索引
引用于加速地理位置查询。MongoDB 支持 2D 和 2DSphere 索引
{ "_id": 1, "name": "Central Park", "coordinates": [40.785091, -73.968285] }
{ "_id": 2, "name": "Golden Gate Bridge", "coordinates": [37.819929, -122.478255] }
我们可以在 coordinates
字段上创建 2DSphere 索引:
db.locations.createIndex({ coordinates: "2dsphere" });
- 哈希索引
用于均匀分布数据,适合需要高效等值查询的场景
{ "_id": 1, "sku": "A123" }
{ "_id": 2, "sku": "B456" }
我们可以在 sku
字段上创建哈希索引:
db.products.createIndex({ sku: "hashed" });
索引的操作:
查看集合索引
db.col.getIndexes()
查看集合索引大小
db.col.totalIndexSize()
删除集合所有索引
db.col.dropIndexes()
删除集合指定索引
db.col.dropIndex("索引名称")
原文地址:https://blog.csdn.net/Aaaaaaatwl/article/details/143527726
免责声明:本站文章内容转载自网络资源,如本站内容侵犯了原著者的合法权益,可联系本站删除。更多内容请关注自学内容网(zxcms.com)!