Sqoop - Transfers data between Hadoop and Datastores
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
http://incubator.apache.org/sqoop/
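The import and export directions described above can be sketched as a small Python helper that only assembles Sqoop command lines and does not run them. The helper names, JDBC URL, table names, and HDFS paths are hypothetical placeholders. The options used (`--connect`, `--table`, `--target-dir`, `--export-dir`, `--num-mappers`, `--username`, `-P`) are standard Sqoop 1 arguments, but check them against the Sqoop version you have installed.

```python
import shlex


def sqoop_import_cmd(jdbc_url, table, target_dir, username=None, num_mappers=1):
    """Build (not run) a Sqoop command that copies a database table into HDFS."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string of the source database
        "--table", table,             # table to read from
        "--target-dir", target_dir,   # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),
    ]
    if username:
        # -P makes Sqoop prompt for the password instead of taking it on the command line
        cmd += ["--username", username, "-P"]
    return cmd


def sqoop_export_cmd(jdbc_url, table, export_dir, username=None):
    """Build (not run) a Sqoop command that pushes HDFS files into a database table."""
    cmd = [
        "sqoop", "export",
        "--connect", jdbc_url,        # JDBC connection string of the target database
        "--table", table,             # existing table to write into
        "--export-dir", export_dir,   # HDFS directory holding the data to export
    ]
    if username:
        cmd += ["--username", username, "-P"]
    return cmd


if __name__ == "__main__":
    # Hypothetical database, tables, and paths, for illustration only.
    url = "jdbc:mysql://db.example.com/sales"
    print(shlex.join(sqoop_import_cmd(url, "orders", "/data/orders", username="etl")))
    print(shlex.join(sqoop_export_cmd(url, "order_totals", "/data/order_totals")))
```

The printed lines can be pasted into a shell on a machine where Sqoop and a suitable JDBC driver are installed.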
Related Products
Pigpy - Pig report management tool
pypig - a Python tool to manage Pig reports. Pig provides an amazing set of tools to create complex relational processes on top of Hadoop, but it has a few missing pieces: looping constructs for easily creating multiple similar reports; caching of intermediate calculations; data management and cleanup code; and easy testing for report correctness. pypig is an attempt to fill in these holes by providing a Python module that knows how to talk to a Hadoop cluster and can create and manage complex re
Mrcl - Hadoop + CUBLAS
Mr.CL Project Overview: We combine the power of two major tools for data processing: Hadoop and NVIDIA CUDA. Hadoop is for scalability over multiple nodes and CUDA is for speeding up block-level calculations. The goal of this project was improving distributed matrix multiplication on Hadoop. Hadoop emphasizes data locality using HDFS, but matrix multiplication has to access remote data unless columns or rows are aligned to each machine. This makes matrix multiplication infeasible on MapReduce schem
Hadoop4win - Hadoop for Windows using Cygwin
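hadoop4win: Hadoop for Windows using Cygwin. This software project is sponsored by the National Center for High-performance Computing (NCHC). Software introduction: hadoop4win, as its name suggests, is "Hadoop for Windows". It provides a batch installer that makes it easy to install Hadoop on the Windows platform. The installer is based mainly on drbl-winroll, a work by Mr. 孫振凱 of the NCHC DRBL and Clonezilla team; the installation portion of that program was extracted and rewritten into the steps hadoop4win requires. hadoop4win currently consists of four major software components: Cygwin - provides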
Parbash - Parallel BASH
Parallel BASH is a modified version of BASH intended for text processing on computer clusters. It enables use of common UNIX text processing tools (e.g., awk, perl, grep) across multicore or distributed systems. It is particularly suited for scalable processing of large (multi-GB or larger) files. parbash interprets scripts in the same way as BASH does except when a structure similar to the following is encountered: cat hdfs:/student_marks | grep ^A | sort | uniq -c > hdfs:/out In this case, parb
cc2svn is a tool that converts ClearCase view files with history and labels to SVN dump
The cc2svn tool converts ClearCase view files with all history and given labels to an SVN dump. The dump can be loaded by SVN using the 'cat svndump.txt | svnadmin load' command. Features: transfers history of changes for files, saving the date, author and comment for each revision; converts all/some/none branches (configurable); converts all/some/none labels (configurable); incremental dump mode; retry/ignore failed CC commands; cache for ClearCase files; tested on Linux/Solaris, python 2.5/2.6. Main points: The to
Cascading - Data Processing Workflows on Hadoop
Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.
Esutil - A variety of python utilities, with a focus on numerical data analysis.
This Python package includes a wide variety of utilities, focused primarily on numerical Python, statistics, and file input/output. News: 2012-04-27: Tagged version 0.5.1. 2012-01-15: Tagged version 0.5.0. 2012-01-15: The comoving volume returned by the cosmology class is now for the whole sky, whereas previously it was per steradian. To get the old behavior, divide by 4*pi. Getting the code: To use stable versions, get one of the dow
Transfer-entropy-toolbox - Tools for computing delayed and higher-order transfer entropy
Transfer Entropy Toolbox: A suite of MATLAB/C and C++ tools for computing standard and extended versions of Thomas Schreiber's transfer entropy on sparse, binary time series. What is Transfer Entropy (TE)? From Schreiber, 2000: An information theoretic measure is derived that quantifies the statistical coherence between systems evolving in time. The standard time delayed mutual information fails to distinguish information that is actually exchanged from shared information due to common history and
Reliabletransfer - a file transfer tool using reliable udp
Project Goal: Today, applications have two choices when it comes to the transport channel: TCP or UDP. While UDP is an unreliable protocol, TCP provides reliability, flow control, and congestion control. TCP has an explicit connection establishment phase, which some applications may not find desirable. Your job in this project is to design and implement a file transfer application which has all the good features of TCP without the connection establishment phase. Your application, reliable UDP, shou
Cascalog - Data processing on Hadoop
Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing Big Data on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.