Sqoop - Transfers data between Hadoop and Datastores

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into the Hadoop Distributed File System (HDFS) or related systems such as Hive and HBase. Conversely, you can use Sqoop to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.
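As a rough illustration of the two directions described above, the following Python sketch assembles command lines for the `sqoop import` and `sqoop export` tools of the Sqoop 1 command-line client. It only builds and prints the commands; it does not run them. The JDBC URL, table names, HDFS paths, and user name are hypothetical placeholders, and the helper functions `sqoop_import_args` and `sqoop_export_args` are illustrative names, not part of Sqoop.

```python
import shlex
from typing import List, Optional


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      username: Optional[str] = None,
                      num_mappers: int = 1) -> List[str]:
    """Build argv for `sqoop import`: database table -> HDFS directory."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string
        "--table", table,                   # source table in the database
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if username:
        args += ["--username", username]
    return args


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str,
                      username: Optional[str] = None) -> List[str]:
    """Build argv for `sqoop export`: HDFS directory -> database table."""
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # destination table (must already exist)
        "--export-dir", export_dir,  # HDFS directory holding the data
    ]
    if username:
        args += ["--username", username]
    return args


def render(args: List[str]) -> str:
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    url = "jdbc:mysql://db.example.com/sales"
    print(render(sqoop_import_args(url, "orders", "/user/demo/orders",
                                   username="demo", num_mappers=4)))
    print(render(sqoop_export_args(url, "order_totals",
                                   "/user/demo/order_totals",
                                   username="demo")))
```

The printed commands could be passed to a shell or to `subprocess.run` on a machine where Sqoop and a suitable JDBC driver are installed. Authentication options are left out of this sketch.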



http://incubator.apache.org/sqoop/



Related Products

Pigpy - Pig report management tool

pypig - a Python tool to manage Pig reports. Pig provides an amazing set of tools to create complex relational processes on top of Hadoop, but it has a few missing pieces: looping constructs for easily creating multiple similar reports; caching of intermediate calculations; data management and cleanup code; easy testing for report correctness. pypig is an attempt to fill in these holes by providing a Python module that knows how to talk to a Hadoop cluster and can create and manage complex re


Mrcl - Hadoop + CUBLAS

Mr.CL Project Overview: We combine the power of two major tools for data processing: Hadoop and NVIDIA CUDA. Hadoop is for scalability over multiple nodes and CUDA is for speeding up block-level calculations. The goal of this project was improving distributed matrix multiplication on Hadoop. Hadoop emphasizes data locality using HDFS, but matrix multiplication has to access remote data unless columns or rows are aligned to each machine. This makes matrix multiplication infeasible on MapReduce schem


Hadoop4win - Hadoop for Windows using Cygwin

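hadoop4win: Hadoop for Windows using Cygwin. This software project is sponsored by the National Center for High-performance Computing (NCHC). Software overview: hadoop4win, as its name suggests, means "Hadoop for Windows". It mainly provides a batch installer for easily installing Hadoop on the Windows platform. The contents of this batch installer are based mainly on drbl-winroll, a work by Mr. 孫振凱, a member of the NCHC DRBL and Clonezilla team, with the installation portions of the program extracted and rewritten into the steps hadoop4win requires. hadoop4win currently comprises four major software components: Cygwin - provides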


Parbash - Parallel BASH

Parallel BASH is a modified version of BASH intended for text processing on computer clusters. It enables use of common UNIX text processing tools (e.g., awk, perl, grep) across multicore or distributed systems. It is particularly suited for scalable processing of large (multi-GB or larger) files. parbash interprets scripts in the same way as BASH does except when a structure similar to the following is encountered:
cat hdfs:/student_marks | grep ^A | sort | uniq -c > hdfs:/out
In this case, parb


cc2svn is a tool that converts ClearCase view files with history and labels to SVN dump

The cc2svn tool converts ClearCase view files with all history and given labels to an SVN dump. The dump can be loaded by SVN using the 'cat svndump.txt | svnadmin load' command. Features: transfers history of changes for files, saving the date, author and comment for each revision; converts all/some/none branches (configurable); converts all/some/none labels (configurable); incremental dump mode; retry/ignore failed CC commands; cache for ClearCase files; tested on Linux/Solaris, python 2.5/2.6. Main points: The to


Cascading - Data Processing Workflows on Hadoop

Cascading is a Data Processing API, Process Planner, and Process Scheduler used for defining and executing complex, scale-free, and fault tolerant data processing workflows on an Apache Hadoop cluster. It is a thin Java library and API that sits on top of Hadoop's MapReduce layer and is executed from the command line like any other Hadoop application.


Esutil - A variety of python utilities, with a focus on numerical data analysis.

This Python package includes a wide variety of utilities, focused primarily on numerical Python, statistics, and file input/output. News: 2012-04-27: Tagged version 0.5.1. 2012-01-15: Tagged version 0.5.0. 2012-01-15: The comoving volume returned by the cosmology class is now for the whole sky, whereas previously it was per steradian. To get the old behavior, divide by 4*pi. Getting the code: To use stable versions, get one of the dow


Transfer-entropy-toolbox - Tools for computing delayed and higher-order transfer entropy

Transfer Entropy Toolbox: A suite of MATLAB/C and C++ tools for computing standard and extended versions of Thomas Schreiber's transfer entropy on sparse, binary time series. What is Transfer Entropy (TE)? From Schreiber, 2000: An information theoretic measure is derived that quantifies the statistical coherence between systems evolving in time. The standard time delayed mutual information fails to distinguish information that is actually exchanged from shared information due to common history and


Reliabletransfer - a file transfer tool using reliable udp

Project Goal: Today, applications have two choices when it comes to the transport channel: TCP or UDP. While UDP is an unreliable protocol, TCP provides reliability, flow control, and congestion control. TCP has an explicit connection establishment phase, which some applications may not find desirable. Your job in this project is to design and implement a file transfer application which has all the good features of TCP without the connection establishment phase. Your application, reliable UDP, shou


Cascalog - Data processing on Hadoop

Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing Big Data on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.

