Is there a guide anyone could recommend for dealing with / analyzing large time series data (200 GB – 2 TB)? I have a large set of equity (stock) data I want to back-test strategies on. Currently I am using SQL on a SQLite database for this, but I am finding it inadequate in terms of speed and flexibility.

The raw data is in a CSV that is 13 columns by 2.3 billion rows.
I recommend watching this part of dotnetconf.
Personally, I’ve found that using a time series database (e.g. InfluxDB, TimescaleDB, etc.) helps with exactly the type of problem you face. In my experience, it makes it easy to perform many different aggregations while still being very performant.
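For example, with TimescaleDB (which is Postgres under the hood) you can convert a plain table into a hypertable and then aggregate with `time_bucket`. Here is a minimal sketch in Python, assuming a local Postgres instance with the TimescaleDB extension installed and a hypothetical `ticks` table holding the stock data (the column names are illustrative, not from the original post):

```python
import psycopg2  # assumes psycopg2-binary is installed

# Hypothetical connection details; adjust for your setup.
conn = psycopg2.connect("dbname=market user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

# Plain Postgres table for tick data (schema is made up for illustration).
cur.execute("""
    CREATE TABLE IF NOT EXISTS ticks (
        ts      TIMESTAMPTZ NOT NULL,
        symbol  TEXT        NOT NULL,
        price   DOUBLE PRECISION,
        volume  BIGINT
    );
""")

# Turn it into a TimescaleDB hypertable, partitioned on the timestamp column.
cur.execute("SELECT create_hypertable('ticks', 'ts', if_not_exists => TRUE);")

# Example aggregation: 1-minute OHLC-style buckets per symbol.
cur.execute("""
    SELECT time_bucket('1 minute', ts) AS bucket,
           symbol,
           first(price, ts) AS open,
           max(price)       AS high,
           min(price)       AS low,
           last(price, ts)  AS close,
           sum(volume)      AS total_volume
    FROM ticks
    GROUP BY bucket, symbol
    ORDER BY bucket
    LIMIT 10;
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```

The nice part is that the same query pattern works whether the bucket is one minute or one day, which is exactly the kind of flexibility that gets painful in plain SQLite.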
Also check out QuestDB, CrateDB, and ClickHouse.
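ClickHouse in particular tends to handle billions of rows comfortably because it is a columnar store. A rough sketch using the `clickhouse-connect` Python client, assuming a local server with default credentials (table name and schema are again made up for illustration):

```python
from datetime import datetime

import clickhouse_connect  # pip install clickhouse-connect

# Hypothetical local ClickHouse server with default credentials.
client = clickhouse_connect.get_client(host='localhost')

# Columnar table ordered by (symbol, ts) so per-symbol range scans stay cheap.
client.command("""
    CREATE TABLE IF NOT EXISTS ticks (
        ts     DateTime64(3),
        symbol LowCardinality(String),
        price  Float64,
        volume UInt64
    )
    ENGINE = MergeTree
    ORDER BY (symbol, ts)
""")

# Bulk-loading the big CSV would normally be done server-side, e.g.
#   clickhouse-client --query "INSERT INTO ticks FORMAT CSV" < data.csv
# Here is a tiny in-process insert just to show the client API.
client.insert(
    'ticks',
    [[datetime(2024, 1, 2, 9, 30), 'AAPL', 185.64, 1200]],
    column_names=['ts', 'symbol', 'price', 'volume'],
)

# Aggregate into 1-minute buckets per symbol.
result = client.query("""
    SELECT toStartOfMinute(ts) AS bucket,
           symbol,
           avg(price)  AS avg_price,
           sum(volume) AS total_volume
    FROM ticks
    GROUP BY bucket, symbol
    ORDER BY bucket
    LIMIT 10
""")
for row in result.result_rows:
    print(row)
```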
C# devs
null reference exceptions
