Blog

Ways to Speed Up Your Cloud Environment and Workload Performance on Intel® Architecture

Setting up a cloud environment is complicated: depending on your needs, it involves multiple elements such as databases, network infrastructure, and security. How do you increase the p...

Author: Thai Le (Intel) | Last updated: 2019/07/04 - 17:05
Blog

Use HiBench as a representative proxy for benchmarking Hadoop applications

As any good engineer knows, “if you cannot measure it, you cannot improve it.” A representative benchmark suite is therefore the key to measuring any computer system.

Author: Jason Dai (Intel) | Last updated: 2019/07/03 - 20:08
Blog

Code Analysis of the Hadoop-Based CF (Collaborative Filtering) Implementation in Mahout 0.5

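Mahout's Taste framework is an implementation of collaborative filtering algorithms. It supports DataModels backed by files, databases, NoSQL stores, and so on, and it also supports Hadoop MapReduce. This post mainly analyzes the MapReduce-based implementation in Mahout 0.5.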
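Mahout's distributed item-based recommender is commonly described as multiplying an item-item similarity matrix (for example, item co-occurrence counts) by each user's preference vector. The following is a minimal, in-memory Python sketch of that co-occurrence variant, split into the kinds of steps a MapReduce pipeline would run: group preferences into user vectors, count co-rated item pairs, then score the items a user has not rated. It is an illustration under stated assumptions, not Mahout's code: the function names and sample data are hypothetical, and the real job may use other similarity measures.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample data: (user_id, item_id, preference) triples,
# similar in shape to the userID,itemID,value lines of a file-based DataModel.
PREFS = [
    (1, 101, 5.0), (1, 102, 3.0), (1, 103, 2.5),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0), (2, 104, 2.0),
    (3, 101, 2.5), (3, 104, 4.0), (3, 105, 4.5),
]


def to_user_vectors(prefs):
    """Step 1 (map + group by user): build {user: {item: pref}}."""
    vectors = defaultdict(dict)
    for user, item, pref in prefs:
        vectors[user][item] = pref
    return vectors


def cooccurrence(user_vectors):
    """Step 2: for each user, emit every ordered pair of items they both
    rated, then sum the counts per pair (the 'reduce')."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_vectors.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def recommend(user, user_vectors, counts, how_many=3):
    """Step 3: score each unrated item as the sum of
    cooccurrence[rated_item][candidate] * preference(rated_item)."""
    rated = user_vectors[user]
    scores = defaultdict(float)
    for item, pref in rated.items():
        for other, n in counts[item].items():
            if other not in rated:
                scores[other] += n * pref
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:how_many]


if __name__ == "__main__":
    vectors = to_user_vectors(PREFS)
    counts = cooccurrence(vectors)
    print(recommend(1, vectors, counts))  # [(104, 15.5), (105, 5.0)]
```

In a real MapReduce job each step would be a separate pass over data on HDFS; keeping the steps as separate functions here is meant to mirror that decomposition, not to match Mahout's actual job boundaries.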

Last updated: 2019/01/24 - 16:00
Blog

Benefits of Intel® Enterprise-Class SSDs

In this blog post, I want to share with you the benefits of the Intel® Enterprise Class Solid-State Drive (SSD). I have compiled a list of articles, white papers, solution briefs, and blogs and provided l...

Author: Thai Le (Intel) | Last updated: 2019/07/04 - 10:36
Blog

Ceph Erasure Coding Introduction

Ceph introduction
Author: Yuan Zhou (Intel) | Last updated: 2017/06/14 - 15:45
Blog

Restudy SchemaRDD in SparkSQL

SchemaRDD was originally designed simply to make developers' daily code debugging and unit testing on the SparkSQL core module easier. The idea boils down to describing the data structures inside an RDD with a formal description similar to a relational database schema. On top of all the basic functions provided by the common RDD APIs, SchemaRDD also...
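The core idea, rows paired with a relational-style schema so that columns can be addressed by name and checked by type, can be sketched in plain Python. This is only a conceptual illustration under stated assumptions: `Field` and `SchemaRows` are hypothetical names, not Spark SQL APIs, and a real SchemaRDD is distributed and lazily evaluated, whereas this sketch is an eager, in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Field:
    """One column of the schema: a name plus the expected Python type."""
    name: str
    type: type


class SchemaRows:
    """Hypothetical stand-in for 'rows + schema': plain tuples whose
    structure is described by a list of Field objects."""

    def __init__(self, schema: List[Field], rows: Iterable[Tuple[Any, ...]]):
        self.schema = schema
        self.rows = list(rows)
        self._index = {f.name: i for i, f in enumerate(schema)}
        # Validate every row's arity and value types against the schema.
        for row in self.rows:
            if len(row) != len(schema):
                raise ValueError(f"row {row!r} does not match schema arity")
            for value, field in zip(row, schema):
                if not isinstance(value, field.type):
                    raise TypeError(f"{field.name}: expected {field.type.__name__}")

    def select(self, *names: str) -> "SchemaRows":
        """Project columns by name; the result carries a narrower schema."""
        idx = [self._index[n] for n in names]
        return SchemaRows([self.schema[i] for i in idx],
                          (tuple(r[i] for i in idx) for r in self.rows))

    def where(self, name: str, pred: Callable[[Any], bool]) -> "SchemaRows":
        """Keep rows whose named column satisfies pred; schema is unchanged."""
        i = self._index[name]
        return SchemaRows(self.schema, (r for r in self.rows if pred(r[i])))


people = SchemaRows([Field("name", str), Field("age", int)],
                    [("Alice", 30), ("Bob", 17)])
adults = people.where("age", lambda a: a >= 18).select("name")
print(adults.rows)  # [('Alice',)]
```

Because every operation returns a new object that carries its own schema, column names stay meaningful after each transformation, which is the property that makes schema-aware data easier to debug and test than anonymous tuples.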
Last updated: 2017/06/14 - 16:50
Blog

Hadoop RPC Mechanism + Source Code Analysis

1. Basic Principles of RPC

Last updated: 2019/07/03 - 20:08
Blog

Big Performance Gains for Big Data

Imagine two teams of data analysts working on the same goal: to extract usable business intelligence (BI) from massive, growing data sets.

Last updated: 2019/01/28 - 15:20
Blog

Getting Started with Tachyon by Use Cases

In-memory computing has become an irreversible trend in big data technology, as the wide popularity of Spark attests. Meanwhile, memory storage and management for large data sets still pose challenges. Among the numerous solutions, Tachyon, a memory-centric distributed storage system, solves the problems faced in many application scenarios well. For example, it avoids severe...
Last updated: 2019/06/07 - 16:01