
Makes sense; I'm used to physical servers with tons of RAM at work. If you don't mind (and I assume you're entirely on Amazon, then), what size instances, and how many, are you using for the ETL (Luigi) nodes? And is there any infrastructure involved besides S3 (MySQL dumps, logs) => Amazon instances (Luigi) => Redshift?



That is all there is: S3, one Amazon instance for Luigi (which also holds the MySQL read replica), and Amazon Redshift. There wasn't any heavy ETL in Luigi; Luigi mostly just extracted and dumped data. All the heavy lifting was done in EMR.
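To make the flow above a bit more concrete, here is a rough, stdlib-only Python sketch of the pipeline's shape. It is not Luigi's API; it only imitates the Luigi idea that each task declares its upstream dependencies via requires() and is skipped once its output already exists. The task classes, the in-memory dict standing in for S3 and Redshift, and the assumption that a single orchestrator drives every stage (including the EMR job and the Redshift load, which the comment above does not actually say) are all hypothetical.

```python
# Rough sketch of a dump -> S3 -> EMR -> Redshift dependency chain.
# NOT Luigi's API: it only mimics Luigi's requires()/complete() idea.
# All task names and the dict-based "store" are hypothetical stand-ins.
from typing import Dict, List


class Task:
    name: str = "task"

    def requires(self) -> List["Task"]:
        # Upstream tasks that must finish before this one runs.
        return []

    def complete(self, store: Dict[str, str]) -> bool:
        # A task counts as done once its output key exists in the store.
        return self.name in store

    def run(self, store: Dict[str, str]) -> None:
        raise NotImplementedError


class DumpMySQL(Task):
    name = "mysql_dump"

    def run(self, store):
        # Stand-in for dumping tables from the MySQL read replica.
        store[self.name] = "dumped rows"


class UploadToS3(Task):
    name = "s3_upload"

    def requires(self):
        return [DumpMySQL()]

    def run(self, store):
        # Stand-in for copying the dump files to S3.
        store[self.name] = "uploaded"


class EMRTransform(Task):
    name = "emr_transform"

    def requires(self):
        return [UploadToS3()]

    def run(self, store):
        # Stand-in for the heavy EMR job that reads the S3 data.
        store[self.name] = "transformed"


class LoadRedshift(Task):
    name = "redshift_load"

    def requires(self):
        return [EMRTransform()]

    def run(self, store):
        # Stand-in for loading the EMR output into Redshift.
        store[self.name] = "loaded"


def build(task: Task, store: Dict[str, str], log: List[str]) -> None:
    """Run dependencies depth-first, then the task, skipping completed ones."""
    if task.complete(store):
        return
    for dep in task.requires():
        build(dep, store, log)
    task.run(store)
    log.append(task.name)


if __name__ == "__main__":
    store: Dict[str, str] = {}
    first: List[str] = []
    build(LoadRedshift(), store, first)
    print(first)  # ['mysql_dump', 's3_upload', 'emr_transform', 'redshift_load']

    second: List[str] = []
    build(LoadRedshift(), store, second)
    print(second)  # [] because every output already exists
```

The second run doing nothing is the property that makes this style of pipeline cheap to re-run after a partial failure: only the stages whose outputs are missing get redone.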


Also, to add: logging too much should not be a problem if you are actually using the logs. Wish's Hadoop log store is over 40 TB compressed right now, and it's worth every penny.



