hadoop - Move data from HDFS to RDS directly

- March 15, 2015

Background: we are working on a web project that exposes analytical data stored in our local MSSQL database. The database is updated regularly. An EMR cluster uses custom Hive scripts to process raw data from S3 and saves the analytical results back to S3. Every time we update the database, EMR is launched, the analytical files are downloaded from S3 to a local drive, and they are then imported into SQL Server tables. The current data flow is:

S3 -> HDFS/Hive -> S3 -> local drive -> DB tables

We are going to move the DB server to AWS and make this process automated and faster. We want the new data flow to be:

S3 -> HDFS/Hive -> RDS

The Hive script is complex and we have to keep using it. Hive cannot use RDS as a storage location for external tables. I looked at Data Pipeline, but it needs S3 as intermediate storage and requires a lot of setup. I have also read about creating a custom map/reduce job that connects to RDS via JDBC and does the import. I am willing to study custom mappers/reducers if that is a good solution.
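To make the JDBC-style import idea concrete, here is a minimal sketch in Python (a plain script, not a map/reduce job) of batch-loading Hive text output into a database table. It assumes the Hive results are text files using Hive's default Ctrl-A (`\x01`) field delimiter and `\N` NULL marker. An in-memory `sqlite3` database is used purely as a runnable stand-in for RDS, and the names `parse_hive_line`, `load_rows`, and `daily_stats` are hypothetical.

```python
import sqlite3
from itertools import islice

HIVE_FIELD_SEP = "\x01"  # Hive's default field delimiter in text output
HIVE_NULL = "\\N"        # Hive's default NULL marker in text output


def parse_hive_line(line):
    """Split one line of Hive text output into a tuple, mapping \\N to None."""
    fields = line.rstrip("\n").split(HIVE_FIELD_SEP)
    return tuple(None if f == HIVE_NULL else f for f in fields)


def load_rows(conn, table, columns, lines, batch_size=1000):
    """Insert parsed lines into `table` in batches; return the row count.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input. The `?`
    placeholder is sqlite3's style (an assumption; other drivers may differ).
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    # Skip blank lines; parse the rest lazily so large files are not held in memory.
    rows = (parse_hive_line(l) for l in lines if l.rstrip("\n"))
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany(sql, batch)
        conn.commit()  # one commit per batch
        total += len(batch)
    return total


if __name__ == "__main__":
    # Stand-in database; a real run would connect to the RDS instance instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_stats (day TEXT, metric TEXT, value TEXT)")
    sample = [
        "2015-03-14\x01visits\x01120\n",
        "2015-03-14\x01errors\x01\\N\n",
    ]
    inserted = load_rows(conn, "daily_stats", ["day", "metric", "value"], sample)
    print(inserted, "rows inserted")
```

Passing the table name and column list as arguments is one way to handle the "different tables according to dates" requirement. Against a real RDS SQL Server instance, the connection would come from a SQL Server driver rather than `sqlite3`, and the placeholder style may need to change to match that driver.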

My question is: which solution for HDFS/Hive -> RDS is easy to configure (we need to update different tables according to dates), has a small learning curve (we are a C#/SQL shop, new to EMR, and time-constrained), is efficient (no unnecessary data movement), and is able to scale?


harsh
