When planning the storage for a system, the one thing you need to know above all else is how the data is going to be accessed. I’m talking about things like 95% reads to 5% writes, 1.2Gb/minute average transfer, highly latency sensitive. You can make some assumptions based on the applications that’ll be accessing the storage, but if you really need to know, the only way to find out is to measure.
Once you know how that data is going to be accessed, you can build or provision its storage accordingly. How likely the dataset is to grow is also something you need to know, but that’s a luxury we often don’t get. And for the love of performance metrics, don’t forget peak loading and behavior under fault conditions.
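To make "measuring" a bit more concrete, here is a minimal Python sketch of turning raw I/O counters into the kind of profile described above: read/write mix, average transfer per minute, and the peak minute. Everything in it is an assumption for illustration. The `IOSample` record and the `summarize` function are hypothetical names, and the sketch assumes you have already collected per-interval counters from whatever monitoring tool you use. It does not cover latency or fault behavior, which need their own measurements.

```python
from dataclasses import dataclass


@dataclass
class IOSample:
    """One interval of I/O counters (hypothetical record layout)."""
    minute: int       # which minute of the trace this sample falls in
    read_bytes: int
    write_bytes: int
    read_ops: int
    write_ops: int


def summarize(samples):
    """Reduce a list of IOSample records to a simple access profile."""
    read_ops = sum(s.read_ops for s in samples)
    write_ops = sum(s.write_ops for s in samples)
    total_ops = read_ops + write_ops

    # Total bytes moved (reads plus writes) in each observed minute.
    per_minute = {}
    for s in samples:
        moved = s.read_bytes + s.write_bytes
        per_minute[s.minute] = per_minute.get(s.minute, 0) + moved

    total_bytes = sum(per_minute.values())
    return {
        "read_pct": 100.0 * read_ops / total_ops if total_ops else 0.0,
        "write_pct": 100.0 * write_ops / total_ops if total_ops else 0.0,
        # Averaged over minutes that have samples, not wall-clock time.
        "avg_bytes_per_min": total_bytes / len(per_minute) if per_minute else 0.0,
        "peak_bytes_per_min": max(per_minute.values(), default=0),
    }


if __name__ == "__main__":
    trace = [
        IOSample(minute=0, read_bytes=900, write_bytes=100, read_ops=95, write_ops=5),
        IOSample(minute=1, read_bytes=300, write_bytes=100, read_ops=19, write_ops=1),
    ]
    print(summarize(trace))
```

The gap between the average and the peak minute is the number to watch: storage sized for the average will struggle whenever the peak arrives.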