Why Cassandra is not good for ETL
I’ve been working with Cassandra recently, and we have been running it in production for 3 months now. The main project I’m dealing with is a simple event API that collects event logs in a central data store. For now we run 5 Cassandra nodes and 4 web servers. We have also had a few calls with DataStax for support and to get some questions answered.
In our project, the main aim is to incrementally dump data from the Cassandra datastore to Vertica, which supports aggregate queries with great query execution performance. We used to handle that problem in MySQL before migrating to Cassandra. That worked, but MySQL is not good enough for handling 2 billion+ rows in a single table.
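To make "incrementally dump" a bit more concrete, here is a minimal sketch of a watermark-based incremental extract. It is not our actual pipeline: it assumes events are grouped into hourly time buckets, and `read_bucket` and `load_batch` are hypothetical callables standing in for the Cassandra read and the Vertica load.

```python
from datetime import datetime, timedelta, timezone


def hour_buckets(start, end):
    """Yield the start of every hourly bucket that overlaps [start, end)."""
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        yield current
        current += timedelta(hours=1)


def incremental_dump(read_bucket, load_batch, watermark, now):
    """Copy events with watermark < ts <= now and return the new watermark.

    read_bucket(bucket_start) -> iterable of event dicts with a "ts" key
    (hypothetical stand-in for the source read).
    load_batch(events) -> loads a list of events into the target
    (hypothetical stand-in for the warehouse load).
    """
    new_watermark = watermark
    for bucket in hour_buckets(watermark, now):
        # Only keep events that were not covered by a previous run.
        batch = [e for e in read_bucket(bucket) if watermark < e["ts"] <= now]
        if batch:
            load_batch(batch)
            new_watermark = max(new_watermark, max(e["ts"] for e in batch))
    return new_watermark
```

Each run only touches the buckets between the last watermark and the current time, so the cost of a dump depends on how much new data arrived rather than on the total table size. The returned watermark has to be stored somewhere durable between runs.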
Workarounds for migrating to Cassandra
To make time series ETL work, you need to add 2-3 fields to each column