[MongoDB] MongoDB Aggregation (Aggregate, Map-Reduce), the Pipeline, and Indexes Explained (with Detailed Examples)

Published 2024-11-10 05:34 | Tags: mongodb, database, Aggregate, mapreduce, Pipeline

MongoDB Aggregation Operations (Aggregate)

Put simply, aggregation is essentially the same idea as SQL, only written differently. Study the example below.

Code example:

> db.orders.insertMany( [
     { _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
     { _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
     { _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
     { _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
     { _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
 ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.orders.find({})
{ "_id" : 1, "cust_id" : "abc1", "ord_date" : ISODate("2012-11-02T17:04:11.102Z"), "status" : "A", "amount" : 50 }
{ "_id" : 2, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-01T17:04:11.102Z"), "status" : "A", "amount" : 100 }
{ "_id" : 3, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-12T17:04:11.102Z"), "status" : "D", "amount" : 25 }
{ "_id" : 4, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-11T17:04:11.102Z"), "status" : "D", "amount" : 125 }
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
> db.orders.aggregate([
                      { $match: { status: "A" } },
                      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
                      { $sort: { total: -1 } }
                   ])
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }

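The example above shows each step of the operation: filter, group, then sort. Anyone with some SQL background should be able to follow it easily.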
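To make the SQL analogy concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB server needed) that mimics what the three stages do to the sample orders. The variable names `matched`, `totals`, and `result` are illustrative and not part of any MongoDB API, and the real server executes stages quite differently.

```javascript
// In-memory copy of the sample orders (ord_date omitted for brevity).
const orders = [
  { _id: 1, cust_id: "abc1", status: "A", amount: 50 },
  { _id: 2, cust_id: "xyz1", status: "A", amount: 100 },
  { _id: 3, cust_id: "xyz1", status: "D", amount: 25 },
  { _id: 4, cust_id: "xyz1", status: "D", amount: 125 },
  { _id: 5, cust_id: "abc1", status: "A", amount: 25 },
];

// $match: { status: "A" }  ~  WHERE status = 'A'
const matched = orders.filter((doc) => doc.status === "A");

// $group: { _id: "$cust_id", total: { $sum: "$amount" } }  ~  GROUP BY cust_id
const totals = new Map();
for (const doc of matched) {
  totals.set(doc.cust_id, (totals.get(doc.cust_id) || 0) + doc.amount);
}

// $sort: { total: -1 }  ~  ORDER BY total DESC
const result = [...totals]
  .map(([custId, total]) => ({ _id: custId, total }))
  .sort((a, b) => b.total - a.total);

console.log(result);
```

Each step consumes the output of the previous one, which is the same hand-off a pipeline performs between stages.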

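MongoDB Pipeline Operations

An aggregation pipeline passes MongoDB documents through a sequence of stages: each stage processes its input and hands the result to the next stage. The same stage type can appear more than once in a pipeline.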

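Stage	Description	SQL equivalent
$match	Filters documents, passing only those that satisfy the condition to the next stage	WHERE
$group	Groups documents and computes aggregate values (sum, average, count, and so on)	GROUP BY
$project	Selects and renames fields, or creates computed fields	SELECT
$sort	Sorts documents	ORDER BY
$limit	Limits the number of documents passed to the next stage	LIMIT
$skip	Skips the specified number of documents	OFFSET
$unwind	Outputs one document for each element of an array field	N/A
$bucket	Groups documents into buckets according to the specified boundaries	N/A
$facet	Runs several sub-pipelines over the same input documents within a single stage	N/A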

Code examples:

$project

> db.orders.aggregate([
     { $project : {
         _id : 0 ,      // _id is included by default; 0 suppresses it
         cust_id : 1 ,
         status : 1
     } }
  ]);
{ "cust_id" : "abc1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "abc1", "status" : "A" }
>

$skip

> db.orders.aggregate([
   { $skip : 4 }   // skip the first 4 documents
  ]);
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
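A common use of $skip is pagination together with $sort and $limit. The sketch below only builds the stage objects; `buildPagePipeline` is a hypothetical helper name, and sorting on `_id` is an assumption made here so that pages come out in a stable order.

```javascript
// Hypothetical helper: builds the stages for one page of results.
function buildPagePipeline(pageNumber, pageSize) {
  return [
    { $sort: { _id: 1 } },                   // fix the order first so pages are stable
    { $skip: (pageNumber - 1) * pageSize },  // drop the documents of earlier pages
    { $limit: pageSize },                    // keep at most one page
  ];
}

const pipeline = buildPagePipeline(3, 2);
console.log(JSON.stringify(pipeline));
```

In the shell, the returned array could then be passed to db.orders.aggregate(...). With 5 orders and a page size of 2, page 3 skips 4 documents and returns the last one.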

$unwind

> db.inventory2.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
{ "acknowledged" : true, "insertedId" : 1 }
> db.inventory2.aggregate( [ { $unwind : "$sizes" } ] )
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
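As a rough mental model, $unwind behaves like a flat-map over the array field. The plain JavaScript sketch below reproduces the output above in memory; it assumes every document has a non-empty `sizes` array and does not model how the real stage treats missing or non-array values.

```javascript
const inventory = [{ _id: 1, item: "ABC1", sizes: ["S", "M", "L"] }];

// One output document per array element; all other fields are copied unchanged.
const unwound = inventory.flatMap((doc) =>
  doc.sizes.map((size) => ({ ...doc, sizes: size }))
);

console.log(unwound);
```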

$bucket

> db.artwork.insertMany([
 { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
     "price" : NumberDecimal("199.99") },
 { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
     "price" : NumberDecimal("280.00") },
 { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
     "price" : NumberDecimal("76.04") },
 { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
     "price" : NumberDecimal("167.30") },
 { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
     "price" : NumberDecimal("483.00") },
 { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
     "price" : NumberDecimal("385.00") },
 { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 },
 { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
     "price" : NumberDecimal("118.42") }
 ])
{
        "acknowledged" : true,
        "insertedIds" : [
                1,
                2,
                3,
                4,
                5,
                6,
                7,
                8
        ]
}
> db.artwork.find({})
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 } // Note: no price field, so the aggregation below puts this document in the "Other" bucket
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
> db.artwork.aggregate( [
   {
     $bucket: {
       groupBy: "$price",
       boundaries: [ 0, 200, 400 ],
       default: "Other",
       output: {
         "count": { $sum: 1 },
         "titles" : { $push: "$title" }
       }
     }
   }
 ] )
{ "_id" : 0, "count" : 4, "titles" : [ "The Pillars of Society", "Dancer", "The Great Wave off Kanagawa", "Blue Flower" ] }
{ "_id" : 200, "count" : 2, "titles" : [ "Melancholy III", "Composition VII" ] }
{ "_id" : "Other", "count" : 2, "titles" : [ "The Persistence of Memory", "The Scream" ] }

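Many readers find this hard to follow at first. $bucket simply sorts documents into buckets by boundary values. In the example above, prices in [0, 200) go into one bucket and prices in [200, 400) into another (each lower bound is inclusive, each upper bound exclusive). Documents that fit no bucket go to the default bucket "Other": here that is the artwork with no price, and also the one priced at 483.00, which lies above the last boundary.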
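The bucket assignment rule can be sketched in plain JavaScript. `bucketOf` is a hypothetical helper written for this illustration, and the prices are plain numbers here rather than NumberDecimal values.

```javascript
// Returns the lower boundary of the bucket that contains value,
// or the fallback when the value is missing or outside all boundaries.
function bucketOf(value, boundaries, fallback) {
  if (typeof value !== "number") return fallback;
  for (let i = 0; i < boundaries.length - 1; i++) {
    // Lower bound inclusive, upper bound exclusive.
    if (value >= boundaries[i] && value < boundaries[i + 1]) return boundaries[i];
  }
  return fallback;
}

// Prices of the eight artworks, in _id order (undefined = no price field).
const prices = [199.99, 280.0, 76.04, 167.3, 483.0, 385.0, undefined, 118.42];

const counts = {};
for (const price of prices) {
  const key = bucketOf(price, [0, 200, 400], "Other");
  counts[key] = (counts[key] || 0) + 1;
}

console.log(counts);
```

The counts match the shell output above: four documents in bucket 0, two in bucket 200, and two in "Other".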

$bucket + $facet

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucket: {
              groupBy: "$price",
              boundaries: [ 0, 200, 400 ],
              default: "Other",
              output: {
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } }
              }
          }
        }
      ],
      "year": [
        {
          $bucket: {
            groupBy: "$year",
            boundaries: [ 1890, 1910, 1920, 1940 ],
            default: "Unknown",
            output: {
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

// Output
{
  "year" : [
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "year" : 1902
        },
        {
          "title" : "The Scream",
          "year" : 1893
        }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Composition VII",
          "year" : 1913
        },
        {
          "title" : "Blue Flower",
          "year" : 1918
        }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "year" : 1926
        },
        {
          "title" : "Dancer",
          "year" : 1925
        },
        {
          "title" : "The Persistence of Memory",
          "year" : 1931
        }
      ]
    },
    {
      // Includes the document without a year, e.g., _id: 4
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        {
          "title" : "The Great Wave off Kanagawa"
        }
      ]
    }
  ],
  "price" : [
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "price" : NumberDecimal("199.99")
        },
        {
          "title" : "Dancer",
          "price" : NumberDecimal("76.04")
        },
        {
          "title" : "The Great Wave off Kanagawa",
          "price" : NumberDecimal("167.30")
        },
        {
          "title" : "Blue Flower",
          "price" : NumberDecimal("118.42")
        }
      ]
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "price" : NumberDecimal("280.00")
        },
        {
          "title" : "Composition VII",
          "price" : NumberDecimal("385.00")
        }
      ]
    },
    {
      // Includes the document without a price, e.g., _id: 7
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        {
          "title" : "The Persistence of Memory",
          "price" : NumberDecimal("483.00")
        },
        {
          "title" : "The Scream"
        }
      ]
    }
  ]
}

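The code above is long, so here is the short version. $bucket combined with $facet is a very common pattern: $facet runs several sub-pipelines (here, two $bucket stages) over the same input documents and returns their results together in a single document, one field per sub-pipeline. As a rough mental model, the output of one $bucket is a list of buckets, each holding a list of items (a List<List>), and $facet wraps those lists in one more layer.

For more aggregation stages, see the official documentation: https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation-pipeline/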
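The shape of a $facet result can be illustrated with a small in-memory sketch. The `facet` function and the two sub-pipelines `priced` and `years` are hypothetical stand-ins written for this example, and the two sample documents are a reduced copy of the artwork data.

```javascript
// Runs every named sub-pipeline over the same input and
// collects the results into one document, one field per sub-pipeline.
function facet(docs, subPipelines) {
  const out = {};
  for (const [name, run] of Object.entries(subPipelines)) {
    out[name] = run(docs); // each sub-pipeline sees the full input
  }
  return [out]; // a single output document
}

const artwork = [
  { title: "Dancer", year: 1925, price: 76.04 },
  { title: "The Scream", year: 1893 },
];

const [summary] = facet(artwork, {
  priced: (docs) => docs.filter((d) => d.price !== undefined).map((d) => d.title),
  years: (docs) => docs.map((d) => d.year),
});

console.log(summary);
```

The sub-pipelines do not see each other's output; they only share the input, which is why the price and year buckets in the example above are computed independently.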

MongoDB Aggregation (Map-Reduce)

Code example, starting from the following sample documents:

{ "_id": 1, "customerId": "A123", "amount": 100 }
{ "_id": 2, "customerId": "B456", "amount": 200 }
{ "_id": 3, "customerId": "A123", "amount": 150 }
{ "_id": 4, "customerId": "C789", "amount": 50 }
{ "_id": 5, "customerId": "B456", "amount": 300 }

Use map-reduce to compute the total amount for each customerId.

// Map function
var mapFunction = function() {
    emit(this.customerId, this.amount);
};

// Reduce function
var reduceFunction = function(customerId, amounts) {
    return Array.sum(amounts);
};

// Execute MapReduce
db.orders.mapReduce(
    mapFunction,
    reduceFunction,
    { out: "order_totals" }
);

// View the results
db.order_totals.find().forEach(printjson);

{ "_id": "A123", "value": 250 }
{ "_id": "B456", "value": 500 }
{ "_id": "C789", "value": 50 }

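Map function: for each document, emit sends out customerId as the key and amount as the value.
Reduce function: for each unique customerId, reduceFunction receives the key and an array of the values associated with that key, and returns their sum.
Output: the results are stored in the order_totals collection; each document contains a customerId (as _id) and that customer's total order amount (as value).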
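The three steps can be traced in plain JavaScript without a server. This is a simplified sketch: `emitted` and this local `emit` are stand-ins written for the illustration, and it calls the reduce function exactly once per key, which the real engine does not guarantee.

```javascript
const orders = [
  { _id: 1, customerId: "A123", amount: 100 },
  { _id: 2, customerId: "B456", amount: 200 },
  { _id: 3, customerId: "A123", amount: 150 },
  { _id: 4, customerId: "C789", amount: 50 },
  { _id: 5, customerId: "B456", amount: 300 },
];

// Map phase: collect every emitted value under its key.
const emitted = new Map();
const emit = (key, value) => {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
};
for (const doc of orders) emit(doc.customerId, doc.amount);

// Reduce phase: fold the values of each key into a single value.
const reduceFunction = (customerId, amounts) =>
  amounts.reduce((sum, amount) => sum + amount, 0);

const orderTotals = [...emitted].map(([key, values]) => ({
  _id: key,
  value: reduceFunction(key, values),
}));

console.log(orderTotals);
```

Because MongoDB may invoke the reduce function several times for the same key, a reduce function should return a value of the same shape as the values it receives, as the sum does here. Map-reduce has been deprecated since MongoDB 5.0; the same totals can be produced with a $group stage in an aggregation pipeline.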

MongoDB Indexes

Types:

Single-field index

{ "_id": 1, "username": "alice", "age": 30 }
{ "_id": 2, "username": "bob", "age": 25 }

db.users.createIndex({ username: 1 });

Here 1 means an ascending index. For a descending index, use -1.

Compound index

db.users.createIndex({ username: 1, age: -1 });
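The key order of this compound index can be pictured as a two-level sort: by username ascending, then by age descending within the same username. The sketch below sorts sample users in that order in plain JavaScript; the second "alice" record is made up for illustration so that the tie-break on age is visible.

```javascript
const users = [
  { username: "bob", age: 25 },
  { username: "alice", age: 30 },
  { username: "alice", age: 41 }, // illustrative extra record
];

// Compare usernames first (ascending, 1); on ties, compare ages (descending, -1).
const byIndexOrder = [...users].sort((a, b) => {
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return b.age - a.age;
});

console.log(byIndexOrder);
```

A query that filters or sorts on username alone can use the leading part of this ordering, while a query on age alone cannot benefit from it in the same way.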

Multikey index

{ "_id": 1, "title": "MongoDB Basics", "tags": ["database", "NoSQL"] }
{ "_id": 2, "title": "Advanced MongoDB", "tags": ["database", "performance"] }

db.posts.createIndex({ tags: 1 });
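When the indexed field holds an array, the index stores one entry per array element. The plain JavaScript sketch below builds a comparable lookup table in memory; `tagIndex` is an illustrative name, and a Map is only a stand-in for the real index structure.

```javascript
const posts = [
  { _id: 1, title: "MongoDB Basics", tags: ["database", "NoSQL"] },
  { _id: 2, title: "Advanced MongoDB", tags: ["database", "performance"] },
];

// One entry per (tag, document) pair, grouped by tag.
const tagIndex = new Map();
for (const post of posts) {
  for (const tag of post.tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, []);
    tagIndex.get(tag).push(post._id);
  }
}

console.log(tagIndex.get("database"));
```

A lookup for a single tag then finds every post whose array contains it, which is what a query such as db.posts.find({ tags: "database" }) relies on.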

Text index

Text indexes support text search: they allow full-text search over string fields.

{ "_id": 1, "content": "MongoDB is a NoSQL database" }
{ "_id": 2, "content": "Text search in MongoDB" }

We can create a text index on the content field:

db.articles.createIndex({ content: "text" });

Then we can run a full-text search:

db.articles.find({ $text: { $search: "NoSQL" } });

Geospatial index

Used to speed up location queries. MongoDB supports 2d and 2dsphere indexes.

{ "_id": 1, "name": "Central Park", "coordinates": [-73.968285, 40.785091] }
{ "_id": 2, "name": "Golden Gate Bridge", "coordinates": [-122.478255, 37.819929] }

Coordinate pairs are stored as [longitude, latitude]. We can create a 2dsphere index on the coordinates field:

db.locations.createIndex({ coordinates: "2dsphere" });
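A 2dsphere index also accepts GeoJSON objects, and in both forms longitude comes before latitude. The sketch below shows one way to guard against swapped coordinates; `toGeoJsonPoint` and the `location` field name are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical helper: converts a (latitude, longitude) pair to a GeoJSON Point.
function toGeoJsonPoint(latitude, longitude) {
  if (latitude < -90 || latitude > 90) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  if (longitude < -180 || longitude > 180) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  // GeoJSON order: longitude first, then latitude.
  return { type: "Point", coordinates: [longitude, latitude] };
}

const centralPark = {
  name: "Central Park",
  location: toGeoJsonPoint(40.785091, -73.968285),
};

console.log(JSON.stringify(centralPark));
```

Passing the Golden Gate Bridge values in the wrong order (latitude -122.478255) would throw here, which catches the mistake before the document reaches the database.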

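Hashed index

A hashed index stores hashes of the field's values, which spreads the data evenly. It suits workloads that need efficient equality lookups; it does not support range queries.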

{ "_id": 1, "sku": "A123" }
{ "_id": 2, "sku": "B456" }

We can create a hashed index on the sku field:

db.products.createIndex({ sku: "hashed" });

Index operations:

List a collection's indexes

db.col.getIndexes()

Show the total size of a collection's indexes

db.col.totalIndexSize()

Drop all indexes on a collection (the index on _id is always kept)

db.col.dropIndexes()

Drop one specific index on a collection

db.col.dropIndex("indexName")

Original article: https://blog.csdn.net/Aaaaaaatwl/article/details/143527726
