hadoop - Move data from HDFS to RDS directly
Background: I am working on a web project that exposes analytical data stored in a local MSSQL database. The database is updated regularly. An EMR cluster is responsible for running custom Hive scripts that process raw data from S3 and save the analytical results back to S3. Every time we update the database, EMR is launched, the analytical files are downloaded from S3 to a local drive, and they are then imported into SQL Server tables. The current data flow is:
S3 -> HDFS/Hive -> S3 -> local drive -> DB tables
So we are going to move the DB server to AWS and make this process automated and faster. We want the new data flow to be:
S3 -> HDFS/Hive -> RDS
The Hive script is complex and we have to use it. Hive cannot use RDS as the storage location for external tables. I looked at Data Pipeline, but it needs to use S3 as intermediate storage and needs lots of setup. I also read about creating a custom map/reduce job that connects to RDS via JDBC and does the import. I am willing to study a custom mapper/reducer if that is the solution.
My question: what solution for getting from HDFS/Hive to RDS is easy to configure (we need to update different tables according to dates), has a small learning curve (we are a C#/SQL shop, new to EMR, and time-constrained), is efficient (no unnecessary data movement), and is able to scale?
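To make the "connect to the database and import" idea mentioned above concrete, here is a minimal sketch of its core step, not a recommended solution. It assumes the Hive results are plain text files using Hive's default field delimiter (Ctrl-A) and default NULL marker (`\N`). The names `parse_hive_line`, `load_rows`, `daily_stats`, `day` and `clicks` are hypothetical. `sqlite3` stands in for the real database driver only so that the sketch runs on its own. For SQL Server on RDS you would pass a connection from a suitable DB-API driver and that driver's placeholder style.

```python
import sqlite3  # stand-in only; any DB-API 2.0 connection should work here

HIVE_FIELD_DELIMITER = "\x01"  # Hive's default text-file field separator (Ctrl-A)
HIVE_NULL = "\\N"              # Hive's default text representation of NULL


def parse_hive_line(line):
    """Split one line of Hive text output into a tuple, mapping \\N to None."""
    fields = line.rstrip("\n").split(HIVE_FIELD_DELIMITER)
    return tuple(None if f == HIVE_NULL else f for f in fields)


def load_rows(conn, table, columns, lines, batch_size=1000, placeholder="?"):
    """Insert parsed Hive rows into `table` in batches; return the row count.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted configuration, never from user input.
    """
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table,
        ", ".join(columns),
        ", ".join([placeholder] * len(columns)),
    )
    cur = conn.cursor()
    total, batch = 0, []
    for line in lines:
        batch.append(parse_hive_line(line))
        if len(batch) >= batch_size:
            cur.executemany(sql, batch)  # one round trip per batch
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        cur.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Because the target table and column list are parameters, the date-dependent choice of table could be made by whatever script launches the load. Whether this scales well enough compared with a purpose-built export tool is exactly the open question.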