Cascalog - Data processing on Hadoop
Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing Big Data on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.
OpenMobster is an open source Enterprise Backend for Mobile Apps. It provides a bi-directional data synchronization service that lets mobile apps synchronize their locally stored database with enterprise services in the cloud, such as server apps, CRM, and ERP systems. It also supports a platform-agnostic, cloud-initiated push notification system and includes a framework for creating end-to-end location-aware apps.
HandBrake is a tool for converting video from nearly any format to a selection of modern, widely supported codecs. It can process most common multimedia files and any DVD or Blu-ray source that does not contain copy protection.
hadoop4win - Hadoop for Windows using Cygwin. This software project is sponsored by the National Center for High-performance Computing (NCHC). Overview: hadoop4win, as the name suggests ("Hadoop for Windows"), provides a batch installer that makes Hadoop easy to set up on the Windows platform. The installer is adapted from the drbl-winroll work of 孫振凱, a member of NCHC's DRBL (企鵝龍) and Clonezilla (再生龍) team, with the installation steps extracted and rewritten for hadoop4win. hadoop4win currently bundles four major software components: Cygwin - provides
What is HBase-util? HBase-util is an open source module that lets you store bean classes directly into HBase tables (http://hbase.org/) running on the Hadoop Distributed File System (http://hadoop.apache.org/core/); the project builds on Apache HBase (http://wiki.apache.org/hadoop/Hbase). It is not a JDO (persistence API) implementation, just a simple module that makes HBase easier to work with and helps you run Java programs against it simply. http://code.google.com/p/simple-jav
pypig - a Python tool to manage Pig reports. Pig provides an amazing set of tools for building complex relational processes on top of Hadoop, but it has a few missing pieces: looping constructs for easily creating multiple similar reports; caching of intermediate calculations; data management and cleanup code; and easy testing for report correctness. pypig is an attempt to fill these holes by providing a Python module that knows how to talk to a Hadoop cluster and can create and manage complex reports.
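The "looping constructs" gap the blurb mentions can be sketched in plain Python; the Pig Latin template, paths, and field names below are hypothetical illustrations, not pypig's actual API:

```python
# Hypothetical sketch: generate several similar Pig reports from one
# template with a loop, the kind of repetition pypig aims to automate.
REPORT_TEMPLATE = """
logs = LOAD '/logs/{year}' AS (user:chararray, bytes:long);
by_user = GROUP logs BY user;
report = FOREACH by_user GENERATE group, SUM(logs.bytes);
STORE report INTO '/reports/{year}';
"""

def build_reports(years):
    """Return one Pig Latin script per year from the shared template."""
    return {year: REPORT_TEMPLATE.format(year=year) for year in years}

scripts = build_reports([2009, 2010, 2011])
```

Each generated script could then be submitted to the cluster; only the string templating is shown here.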
GoldenOrb is a cloud-based project for massive-scale graph analysis, built upon Apache Hadoop and modeled after Google's Pregel architecture. It provides solutions to complex data problems, removes limits to innovation, and contributes to the emerging ecosystem that spans all aspects of big data analysis. It enables users to run analytics on entire data sets instead of samples.
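The Pregel model that GoldenOrb follows can be sketched in a few lines of Python; this toy superstep loop propagates the maximum vertex value through a graph and is only an illustration of the vertex-centric idea, not GoldenOrb's API:

```python
# Toy Pregel-style computation: in each superstep, active vertices send
# their value to their neighbors; a vertex that receives a larger value
# adopts it and stays active, otherwise it "votes to halt". The run ends
# when no vertex is active.
def max_value(graph, values):
    """graph: {vertex: [out-neighbors]}, values: {vertex: int}."""
    active = set(graph)
    while active:
        # Message phase: each active vertex sends along its out-edges.
        inbox = {}
        for v in active:
            for n in graph[v]:
                inbox.setdefault(n, []).append(values[v])
        # Compute phase: update from received messages.
        active = set()
        for v, msgs in inbox.items():
            best = max(msgs)
            if best > values[v]:
                values[v] = best
                active.add(v)
    return values

vals = max_value({1: [2], 2: [3], 3: [1]}, {1: 5, 2: 1, 3: 9})
```

In a real Pregel system the same loop runs partitioned across machines, with messages shipped between workers at superstep boundaries.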
Mr.CL Project Overview: We combine the power of two major tools for data processing: Hadoop and NVIDIA CUDA. Hadoop provides scalability over multiple nodes, and CUDA speeds up block-level calculations. The goal of this project was to improve distributed matrix multiplication on Hadoop. Hadoop emphasizes data locality using HDFS, but matrix multiplication has to access remote data unless columns or rows are aligned to each machine. This makes matrix multiplication infeasible on MapReduce schemes.
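The block-level decomposition behind this approach can be sketched in pure Python; the block size and the inner pure-Python kernel are illustrative stand-ins for the per-block work a system like Mr.CL would hand to a CUDA kernel:

```python
def matmul_blocked(A, B, bs=2):
    """Multiply square matrices A and B (lists of lists) block by block.
    Each bs-by-bs block product is an independent unit of work; in a
    Hadoop+CUDA setup, these block products are what would be shipped to
    GPU kernels rather than computed inline as here."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, bs):
        for j0 in range(0, n, bs):
            for k0 in range(0, n, bs):
                # Multiply one pair of blocks and accumulate into C.
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, n)):
                        s = 0
                        for k in range(k0, min(k0 + bs, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```

Blocking also illustrates the data-locality problem the blurb describes: each block product needs a row strip of A and a column strip of B, which generally live on different machines.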
What is PrIter? PrIter is a modified version of the Hadoop MapReduce framework that supports prioritized iterative computation, covering a large collection of iterative algorithms, including PageRank and shortest paths. PrIter runs on a cluster of commodity PCs or in the Amazon EC2 cloud. It ensures faster convergence of the iterative process by reorganizing the update order of data items. PrIter also supports online queries and periodically generates top-k result snapshots. For details, please refer to the project documentation.
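The prioritized-update idea can be shown on a single machine with the shortest-path example; always processing the node with the smallest tentative distance is classic Dijkstra, used here purely as a sketch of the reordering principle, not PrIter's actual cluster API:

```python
import heapq

def shortest_paths(graph, src):
    """Prioritized iterative updates for shortest paths: a priority queue
    always yields the node with the smallest tentative distance, so each
    update is as 'useful' as possible and convergence is fast.
    graph: {node: [(neighbor, weight), ...]}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; a better distance was already processed
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
```

An unprioritized version would visit nodes in arbitrary order and may re-update the same node many times before settling.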
Parallel BASH (parbash) is a modified version of BASH intended for text processing on computer clusters. It enables the use of common UNIX text processing tools (e.g., awk, perl, grep) across multicore or distributed systems and is particularly suited to scalable processing of large (multi-GB or larger) files. parbash interprets scripts exactly as BASH does, except when it encounters a structure similar to the following: cat hdfs:/student_marks | grep ^A | sort | uniq -c > hdfs:/out. In this case, parbash runs the pipeline in parallel across the cluster.
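What that example pipeline computes can be reproduced locally in a few lines of Python (the sample data is made up); this shows the semantics of `grep ^A | sort | uniq -c`, not how parbash distributes it:

```python
from collections import Counter

def grep_sort_uniq_c(lines):
    """Local equivalent of `grep ^A | sort | uniq -c`: keep lines that
    start with 'A', count duplicates, and emit (count, line) pairs in
    sorted line order."""
    matches = [ln for ln in lines if ln.startswith("A")]
    counts = Counter(matches)
    return [(n, ln) for ln, n in sorted(counts.items())]

rows = grep_sort_uniq_c(["A 90", "B 75", "A 90", "A 81"])
```

parbash's contribution is running each pipeline stage across the cluster against HDFS files instead of local ones; the per-record logic is unchanged.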
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.