Big-data analytics for precision marketing: Spark technology

他入错了行 · Posted 2017-1-3 17:11:59

Chapter 1: Getting Started with Apache Spark
Introduction
Installing Spark from binaries
Building the Spark source code with Maven
Launching Spark on Amazon EC2
Deploying on a cluster in standalone mode
Deploying on a cluster with Mesos
Deploying on a cluster with YARN
Using Tachyon as an off-heap storage layer
Chapter 2: Developing Applications with Spark
Introduction
Exploring the Spark shell
Developing Spark applications in Eclipse with Maven
Developing Spark applications in Eclipse with SBT
Developing a Spark application in IntelliJ IDEA with Maven
Developing a Spark application in IntelliJ IDEA with SBT
Chapter 3: External Data Sources
Introduction
Loading data from the local filesystem
Loading data from HDFS
Loading data from HDFS using a custom InputFormat
Loading data from Amazon S3
Loading data from Apache Cassandra
Loading data from relational databases
Chapter 4: Spark SQL
Introduction
Understanding the Catalyst optimizer
Creating HiveContext
Inferring schema using case classes
Programmatically specifying the schema
Loading and saving data using the Parquet format
Loading and saving data using the JSON format
Loading and saving data from relational databases
Loading and saving data from an arbitrary source
Chapter 5: Spark Streaming
Introduction
Word count using Streaming
Streaming Twitter data
Streaming using Kafka
Chapter 6: Getting Started with Machine Learning Using MLlib
Introduction
Creating vectors
Creating a labeled point
Creating matrices
Calculating summary statistics
Calculating correlation
Doing hypothesis testing
Creating machine learning pipelines using ML
Chapter 7: Supervised Learning with MLlib - Regression
Introduction
Using linear regression
Understanding cost function
Doing linear regression with lasso
Doing ridge regression
Chapter 8: Supervised Learning with MLlib - Classification
Introduction
Doing classification using logistic regression
Doing binary classification using SVM
Doing classification using decision trees
Doing classification using Random Forests
Doing classification using Gradient Boosted Trees
Doing classification with Naïve Bayes
Chapter 9: Unsupervised Learning with MLlib
Introduction
Clustering using k-means
Dimensionality reduction with principal component analysis
Dimensionality reduction with singular value decomposition
Chapter 10: Recommender Systems
Introduction
Collaborative filtering using explicit feedback
Collaborative filtering using implicit feedback
Chapter 11: Graph Processing Using GraphX
Introduction
Fundamental operations on graphs
Using PageRank
Finding connected components
Performing neighborhood aggregation
Chapter 12: Optimizations and Performance Tuning
Introduction
Optimizing memory
Using compression to improve performance
Using serialization to improve performance
Optimizing garbage collection
Optimizing the level of parallelism
Understanding the future of optimization - project Tungsten
Index

goodyeah · Posted 2017-1-3 20:30:02

The document won't open.

百里登风 · Posted 2017-1-3 20:40:05

The document won't open.

wyhgood · Posted 2017-1-3 23:42:56

Would a technology like Spark actually get used in the affiliate (aff) community?

54clz · Posted 2017-1-4 08:09:38

What kind of technology is Spark?

他入错了行 · Posted 2017-1-4 09:15:10

OReilly.Learning.Spark.2015.1.pdf Look here! I don't know why the file I uploaded wouldn't open, so I've uploaded another copy here.

saoyang · Posted 2018-5-5 03:44:55

Downloading it to take a look.
