Cascalog - Data processing on Hadoop

Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing Big Data on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.



https://github.com/nathanmarz/cascalog



Related Products

OpenMobster - Open Source Mobile Cloud Platform

OpenMobster is an open source Enterprise Backend for Mobile Apps. It provides a bi-directional data synchronization service that lets mobile apps synchronize their locally stored database with Enterprise services in the Cloud such as server apps, CRM, ERP, etc. It supports a platform-agnostic, Cloud-initiated Push Notification System, and it includes a framework for creating end-to-end Location Aware Apps.


Handbrake - The open source video transcoder

HandBrake is a tool for converting video from nearly any format to a selection of modern, widely supported codecs. HandBrake can process most common multimedia files and any DVD or Blu-ray sources that do not contain any kind of copy protection.


Hadoop4win - Hadoop for Windows using Cygwin

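hadoop4win: Hadoop for Windows using Cygwin. This software project is sponsored by the National Center for High-performance Computing (NCHC). Overview: hadoop4win, as the name suggests, is "Hadoop for Windows". It mainly provides a batch installer that makes it easy to install Hadoop on the Windows platform. The installer is largely based on drbl-winroll, a work by Mr. 孫振凱, a member of the NCHC DRBL and Clonezilla teams; the installation portion of that program was extracted and rewritten into the steps hadoop4win needs. hadoop4win currently consists of four major software components: Cygwin - provides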


Hbase-jdo - Simple util with hbase

What is HBase-util? HBase-util is an open source module that makes it possible to store bean classes directly in HBase tables (http://hbase.org/) running on the Hadoop Distributed FileSystem (http://hadoop.apache.org/core/). The project is a contribution to Apache HBase (http://wiki.apache.org/hadoop/Hbase). It is not JDO (the persistence API), just a simple module that makes HBase easier to handle and helps you run Java programs simply. http://code.google.com/p/simple-jav


Pigpy - Pig report management tool

pypig - a Python tool to manage Pig reports. Pig provides an amazing set of tools to create complex relational processes on top of Hadoop, but it has a few missing pieces: looping constructs for easily creating multiple similar reports; caching of intermediate calculations; data management and cleanup code; and easy testing for report correctness. pypig is an attempt to fill in these holes by providing a Python module that knows how to talk to a Hadoop cluster and can create and manage complex re


GoldenOrb - Scalable Graph Analysis

GoldenOrb is a cloud-based project for massive-scale graph analysis, built upon Apache Hadoop and modeled after Google's Pregel architecture. It provides solutions to complex data problems, removes limits to innovation, and contributes to the emerging ecosystem that spans all aspects of big data analysis. It enables users to run analytics on entire data sets instead of samples.


Mrcl - Hadoop + CUBLAS

Mr.CL Project Overview: We combine the power of two major tools for data processing: Hadoop and NVIDIA CUDA. Hadoop is for scalability over multiple nodes and CUDA is for speeding up block-level calculations. The goal of this project was to improve distributed matrix multiplication on Hadoop. Hadoop emphasizes data locality using HDFS, but matrix multiplication has to access remote data unless columns or rows are aligned to each machine. This makes matrix multiplication infeasible on MapReduce schem


Priter - Distributed Computing Framework for Prioritized Iteration

What is PrIter? PrIter is a modified version of the Hadoop MapReduce framework that supports prioritized iterative computation, which covers a large collection of iterative algorithms, including PageRank and shortest path. PrIter runs on a cluster of commodity PCs or in the Amazon EC2 cloud. It ensures faster convergence of an iterative process by reorganizing the update order of data items. PrIter also supports online queries and generates a top-k result snapshot at regular intervals. For details, please re


Parbash - Parallel BASH

Parallel BASH is a modified version of BASH intended for text processing on computer clusters. It enables use of common UNIX text processing tools (e.g., awk, perl, grep) across multicore or distributed systems. It is particularly suited for scalable processing of large (multi-GB or larger) files. parbash interprets scripts in the same way as BASH does except when a structure similar to the following is encountered: cat hdfs:/student_marks | grep ^A | sort | uniq -c > hdfs:/out In this case, parb
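
As a rough local illustration of what the example pipeline computes (not of how parbash distributes the work, which the description above does not spell out), the following Python sketch mimics `grep ^A | sort | uniq -c` on an in-memory list of lines. The function name `count_sorted_matches` and the sample records are hypothetical, and Python's default string ordering is assumed in place of the locale-dependent ordering of `sort`.

```python
from collections import Counter


def count_sorted_matches(lines, prefix="A"):
    """Mimic `grep ^A | sort | uniq -c` on an in-memory list of lines.

    Hypothetical helper for illustration only; it is not part of parbash.
    """
    # grep ^A: keep only the lines that start with the prefix
    matches = [line for line in lines if line.startswith(prefix)]
    # uniq -c: count how often each distinct line occurs
    counts = Counter(matches)
    # sort: emit the distinct lines in sorted order with their counts
    return [(counts[line], line) for line in sorted(counts)]


# Made-up sample records standing in for the contents of hdfs:/student_marks
sample = ["A alice", "B bob", "A alice", "A adam"]
result = count_sorted_matches(sample)

for count, line in result:
    print(f"{count:7d} {line}")
```

On a cluster the filtering and counting would be spread across machines; the sketch only shows the result the pipeline is expected to produce.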


Sqoop - Transfers data between Hadoop and Datastores

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
