Date: 2020-09-28 Source: www.pcxitongcheng.com Author: 电脑系统城
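MapReduce can express very complex aggregation logic and is highly flexible. However, it is slow and should not be used for real-time data analysis. MapReduce can run in parallel across multiple servers: each server handles only part of the workload and then sends its partial result to the master server, which merges the results into the final result set and returns it to the client.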
The basic idea of MapReduce is shown in the figure below:

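This example computes a sum. The Map phase runs first: it splits one large task into several small tasks, each of which runs on a different node, which is what makes the computation distributed (the blue box in the figure). The outputs of the small tasks are then combined in a second round of computation to produce the final result, 55. This phase is called Reduce (the red box in the figure).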
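The split-then-combine idea can be sketched in plain JavaScript, outside MongoDB. The input (the integers 1 to 10) and the chunk size are assumptions, chosen only so that the total matches the 55 in the example; the names `mapPhase` and `reducePhase` are illustrative and not part of any MongoDB API.

```javascript
// Assumed input: the integers 1..10, whose sum is 55 as in the example.
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// "Map" phase: split the work into chunks and compute one partial sum per chunk.
// In a real cluster each chunk would be processed on a different node.
function mapPhase(values, chunkSize) {
  const partialSums = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    const chunk = values.slice(i, i + chunkSize);
    partialSums.push(chunk.reduce((acc, n) => acc + n, 0));
  }
  return partialSums;
}

// "Reduce" phase: combine the partial results into the final answer.
function reducePhase(partialSums) {
  return partialSums.reduce((acc, n) => acc + n, 0);
}

const partialSums = mapPhase(numbers, 3); // [6, 15, 24, 10]
const total = reducePhase(partialSums);   // 55
console.log(partialSums, total);
```

The partial sums can be computed independently of one another, which is what allows the first phase to be spread across machines.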
Computing an aggregate with MapReduce takes three steps: Map, Shuffle (grouping), and Reduce. Map and Reduce must be defined explicitly; Shuffle is handled by MongoDB.
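To make the three steps concrete, here is a minimal standalone simulation in plain JavaScript. It is only a sketch of the data flow, not how MongoDB implements it; `mapStep`, `shuffleStep`, and `reduceStep` are hypothetical names, and the sample documents use a small subset of the employee fields.

```javascript
// A few sample documents (only the fields needed for this sketch).
const docs = [
  { ename: "SMITH", job: "CLERK" },
  { ename: "ALLEN", job: "SALESMAN" },
  { ename: "WARD", job: "SALESMAN" },
  { ename: "ADAMS", job: "CLERK" },
  { ename: "KING", job: "PRESIDENT" },
];

// Map: emit one (key, value) pair per document.
function mapStep(documents) {
  return documents.map((doc) => ({ key: doc.job, value: 1 }));
}

// Shuffle: group all emitted values by key. MongoDB performs this step itself.
function shuffleStep(pairs) {
  const groups = new Map();
  for (const { key, value } of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: collapse each key's list of values into a single value.
function reduceStep(groups) {
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.reduce((acc, v) => acc + v, 0);
  }
  return result;
}

const counts = reduceStep(shuffleStep(mapStep(docs)));
console.log(counts); // CLERK: 2, SALESMAN: 2, PRESIDENT: 1
```

Only the bodies of the map and reduce steps correspond to code a user writes; the grouping in the middle is the part MongoDB takes care of.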
The demonstrations below use the following test data (employee records).
    db.emp.insert([
        {_id:7369, ename:'SMITH',  job:'CLERK',     mgr:7902, hiredate:'17-12-80', sal:800,  comm:0,    deptno:20},
        {_id:7499, ename:'ALLEN',  job:'SALESMAN',  mgr:7698, hiredate:'20-02-81', sal:1600, comm:300,  deptno:30},
        {_id:7521, ename:'WARD',   job:'SALESMAN',  mgr:7698, hiredate:'22-02-81', sal:1250, comm:500,  deptno:30},
        {_id:7566, ename:'JONES',  job:'MANAGER',   mgr:7839, hiredate:'02-04-81', sal:2975, comm:0,    deptno:20},
        {_id:7654, ename:'MARTIN', job:'SALESMAN',  mgr:7698, hiredate:'28-09-81', sal:1250, comm:1400, deptno:30},
        {_id:7698, ename:'BLAKE',  job:'MANAGER',   mgr:7839, hiredate:'01-05-81', sal:2850, comm:0,    deptno:30},
        {_id:7782, ename:'CLARK',  job:'MANAGER',   mgr:7839, hiredate:'09-06-81', sal:2450, comm:0,    deptno:10},
        {_id:7788, ename:'SCOTT',  job:'ANALYST',   mgr:7566, hiredate:'19-04-87', sal:3000, comm:0,    deptno:20},
        {_id:7839, ename:'KING',   job:'PRESIDENT', mgr:0,    hiredate:'17-11-81', sal:5000, comm:0,    deptno:10},
        {_id:7844, ename:'TURNER', job:'SALESMAN',  mgr:7698, hiredate:'08-09-81', sal:1500, comm:0,    deptno:30},
        {_id:7876, ename:'ADAMS',  job:'CLERK',     mgr:7788, hiredate:'23-05-87', sal:1100, comm:0,    deptno:20},
        {_id:7900, ename:'JAMES',  job:'CLERK',     mgr:7698, hiredate:'03-12-81', sal:950,  comm:0,    deptno:30},
        {_id:7902, ename:'FORD',   job:'ANALYST',   mgr:7566, hiredate:'03-12-81', sal:3000, comm:0,    deptno:20},
        {_id:7934, ename:'MILLER', job:'CLERK',     mgr:7782, hiredate:'23-01-82', sal:1300, comm:0,    deptno:10}
    ]);
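    // Count the employees in each job: emit 1 per document, keyed by job.
    var map1 = function() { emit(this.job, 1); };
    // Sum the emitted 1s for each job.
    var reduce1 = function(job, count) { return Array.sum(count); };
    // Run the job and store the result in the mrdemo1 collection.
    db.emp.mapReduce(map1, reduce1, {out: "mrdemo1"});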
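    // Total salary per department: emit each salary, keyed by department number.
    var map2 = function() { emit(this.deptno, this.sal); };
    // Sum the salaries emitted for each department.
    var reduce2 = function(deptno, sal) { return Array.sum(sal); };
    // Run the job and store the result in the mrdemo2 collection.
    db.emp.mapReduce(map2, reduce2, {out: "mrdemo2"});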
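As a sanity check on what `mrdemo2` should contain, the same grouping can be computed in plain JavaScript from the `deptno` and `sal` values in the test data. This is a standalone sketch rather than a MongoDB call. `Array.sum` is not part of standard JavaScript (MongoDB provides it), so the sketch uses `Array.prototype.reduce` instead.

```javascript
// (deptno, sal) pairs copied from the employee test data, in insertion order.
const employees = [
  { deptno: 20, sal: 800 },  { deptno: 30, sal: 1600 }, { deptno: 30, sal: 1250 },
  { deptno: 20, sal: 2975 }, { deptno: 30, sal: 1250 }, { deptno: 30, sal: 2850 },
  { deptno: 10, sal: 2450 }, { deptno: 20, sal: 3000 }, { deptno: 10, sal: 5000 },
  { deptno: 30, sal: 1500 }, { deptno: 20, sal: 1100 }, { deptno: 30, sal: 950 },
  { deptno: 20, sal: 3000 }, { deptno: 10, sal: 1300 },
];

// Map + shuffle: collect the salaries emitted for each department.
const salariesByDept = {};
for (const emp of employees) {
  if (!salariesByDept[emp.deptno]) salariesByDept[emp.deptno] = [];
  salariesByDept[emp.deptno].push(emp.sal);
}

// Reduce: sum each department's salaries.
const totals = {};
for (const deptno of Object.keys(salariesByDept)) {
  totals[deptno] = salariesByDept[deptno].reduce((acc, s) => acc + s, 0);
}

console.log(totals); // dept 10: 8750, dept 20: 10875, dept 30: 9400
```

In the shell, the stored output can be inspected with `db.mrdemo2.find()`; each result document has the shape `{ _id: <key>, value: <reduced value> }`.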
Define your own emit function:

    var emit = function(key, value) {
        print("emit");
        print("key: " + key + " value: " + tojson(value));
    }

Test a single document:

    var emp7839 = db.emp.findOne({_id: 7839})
    map2.apply(emp7839)

This prints the following:

    emit
    key: 10 value: 5000

Test multiple documents:

    var myCursor = db.emp.find()
    while (myCursor.hasNext()) {
        var doc = myCursor.next();
        print("document _id= " + tojson(doc._id));
        map2.apply(doc);
        print();
    }
A simple test case:

    var myTestValues = [ 5, 5, 10 ];
    var reduce1 = function(key, values) { return Array.sum(values); }
    reduce1("mykey", myTestValues)

Test: each value passed to reduce contains several fields.

Test data (salary and commission):

    var myTestObjects = [
        { sal: 1000, comm: 5 },
        { sal: 2000, comm: 10 },
        { sal: 3000, comm: 15 }
    ];

Write the reduce function:

    var reduce2 = function(key, values) {
        // Declared with var so it does not leak into the global scope.
        var reducedValue = { sal: 0, comm: 0 };
        for (var i = 0; i < values.length; i++) {
            reducedValue.sal += values[i].sal;
            reducedValue.comm += values[i].comm;
        }
        return reducedValue;
    }

Test:

    reduce2("aa", myTestObjects)
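One more property is worth testing. MongoDB may call the reduce function more than once for the same key, passing the output of an earlier call back in as one of the values, so the returned object needs the same shape as the emitted values, and reducing in stages should give the same answer as reducing all at once. The sketch below checks this for the salary and commission reducer in plain JavaScript; `reduceSalComm` is a hypothetical standalone copy of `reduce2`, written so it can run outside the shell.

```javascript
// Same logic as reduce2 in the article, written to run standalone.
function reduceSalComm(key, values) {
  const reduced = { sal: 0, comm: 0 };
  for (const v of values) {
    reduced.sal += v.sal;
    reduced.comm += v.comm;
  }
  return reduced;
}

const values = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 },
];

// Reduce everything in one call.
const allAtOnce = reduceSalComm("aa", values);

// Reduce the first two values, then feed that partial result back in
// together with the remaining value (a "re-reduce").
const partial = reduceSalComm("aa", values.slice(0, 2));
const reReduced = reduceSalComm("aa", [partial, values[2]]);

console.log(allAtOnce); // { sal: 6000, comm: 30 }
console.log(reReduced); // { sal: 6000, comm: 30 }
```

If the two results differed, the reducer would give inconsistent output whenever MongoDB chose to reduce a key in more than one pass.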