
Blog post

Analysis of the Hadoop-based CF code in Mahout 0.5

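Mahout's Taste framework is an implementation of collaborative filtering algorithms. It supports DataModels backed by files, databases, NoSQL stores, and so on, and it also supports Hadoop MapReduce. This post focuses on the MapReduce-based implementation in Mahout 0.5.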

Last updated on 24/01/2019 - 16:00
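The Mahout post above concerns item-based collaborative filtering. As a rough illustration of the idea, the following minimal single-machine sketch builds an item co-occurrence matrix and scores a user's unseen items by multiplying that matrix by the user's preference vector. Mahout's distributed item-based recommender broadly follows this "item-item matrix times user vector" pattern, but this is not Mahout's code or API: the toy data and the names `prefs`, `cooccurrence`, and `recommend` are hypothetical, and raw co-occurrence counts stand in for whatever similarity measure a real job would be configured with.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy preference data: user -> {item: rating}
prefs = {
    "u1": {"i1": 5.0, "i2": 3.0},
    "u2": {"i1": 4.0, "i2": 2.0, "i3": 5.0},
    "u3": {"i2": 4.0, "i3": 1.0},
}


def cooccurrence(prefs):
    """Count how often each ordered pair of distinct items appears in one user's history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def recommend(user, prefs, counts):
    """Score unseen items as (co-occurrence matrix) x (user's preference vector)."""
    seen = prefs[user]
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, n in counts[item].items():
            if other not in seen:  # only recommend items the user has not rated
                scores[other] += n * rating
    # Highest score first; ties broken by item id for deterministic output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


counts = cooccurrence(prefs)
print(recommend("u1", prefs, counts))  # [('i3', 11.0)]
```

Here `i3` co-occurs once with `i1` (rated 5.0) and twice with `i2` (rated 3.0), giving 1 × 5.0 + 2 × 3.0 = 11.0. In a MapReduce setting, the co-occurrence counting and the per-user multiplication become separate jobs over partitioned data rather than in-memory loops.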
Article

Hadoop 0.22.0 and Its RAID Deployment

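I have been using the Hadoop 0.20.x series for almost a year, mainly focusing on HDFS. During that time I took part in deploying a Hadoop cluster (1 server + 20 PCs) and in analyzing the HDFS source code.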

Last updated on 24/01/2019 - 16:00
Blog post

Notes on Installing Hadoop on Ubuntu

Hadoop version: hadoop-1.2.1-bin.tar

JDK version: jdk-6u30-linux-i586

Last updated on 24/01/2019 - 16:00
Blog post

Optimizing Big Data processing with Haswell 256-bit Integer SIMD instructions

Big Data requires processing huge amounts of data. Intel Advanced Vector Extensions 2 (AVX2) promoted most of the 128-bit integer SIMD instructions in Intel AVX to 256 bits.

Created by gaston-hillar (Blackbelt). Last updated on 06/07/2019 - 17:00
Article

Big Data: Take It Seriously

This article was originally published in InformationWeek.

Created by Shen Zhou (Intel). Last updated on 05/07/2019 - 14:15
Blog post

Experimenting with OpenStack* Sahara* on Docker* Containers

Docker* is an emerging technology that has recently become very popular. It provides a flexible architecture for deploying applications. OpenStack* is another popular technology. It has been available for several years, has become more stable, and has added support for more features in recent releases.

Created by WEITING C. (Intel). Last updated on 06/07/2019 - 17:10
Article

Intel® Xeon® Processor E7 v3 Product Family

Created by Nguyen, Khang T (Intel). Last updated on 06/07/2019 - 16:40
Blog post

Restudy SchemaRDD in SparkSQL

SchemaRDD was originally designed to make life easier for developers in their daily code debugging and unit testing on the SparkSQL core module. The idea boils down to describing the data structures inside an RDD with a formal description similar to a relational database schema. On top of all the basic functions provided by the common RDD APIs, SchemaRDD also...

Last updated on 14/06/2017 - 16:50
Article

Intel® Parallel Computing Center at Georgia Institute of Technology

The Intel® Parallel Computing Center (Intel® PCC) on Big Data in Biosciences and Public Health is focused on developing and optimizing parallel algorithms and software on Intel® Xeon® Processor and Intel® Xeon Phi™ Coprocessor systems for handling high-throughput DNA sequencing data and gene expression data.

Created by administrar. Last updated on 14/11/2017 - 08:27
Blog post

The JITter Conundrum - Just in Time for Your Traffic Jam

In interpreted languages, it simply takes longer to get things done. I earlier gave the example where the Python source code a = b + c results in a BINARY_ADD bytecode that takes 78 machine instructions to perform the add, whereas the same statement compiles to a single native ADD instruction in a compiled language like C or C++. How can we speed this up? Or, as a performance expert would say, how do I decrease...

Created by David S. (Blackbelt). Last updated on 04/07/2019 - 20:00
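To see the bytecode the JITter post refers to, CPython's standard `dis` module can disassemble a function containing the statement a = b + c. This is a small sketch; the wrapper function `add` is a hypothetical name used only for illustration. Note that the opcode name depends on the interpreter version: CPython 3.10 and earlier emit `BINARY_ADD`, while 3.11 and later emit a generic `BINARY_OP` whose argument selects the add operation.

```python
import dis


def add(b, c):
    # The statement from the post: a = b + c
    a = b + c
    return a


# Collect the opcode names CPython compiled for add().
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# The add itself is a single bytecode instruction; its name varies by version.
add_ops = [name for name in opnames if name in ("BINARY_ADD", "BINARY_OP")]
print(add_ops)
```

Calling `dis.dis(add)` instead prints a full human-readable listing with offsets and arguments. The single bytecode instruction is what the interpreter dispatches; the machine-level cost of executing it (the 78 instructions cited in the post) is not visible at this level.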