105 samples per voice set
105 is the outside, insanely optimistic number of samples.
Also, please note that you are making two mutually exclusive arguments:
1) there are too many samples, so it will take forever to get a complete set;
2) we will get too many complete sets, and the DL size will be too big.
In OGG format, this would be about 10MB (based on my observations from the music folder: see David01.ogg, almost 10MB for 2:41, though there are more efficient files in there).
While we will want to keep a high-quality, unprocessed version of the vocal samples with the project, we only need to ship reasonable-quality versions (perhaps with radio effects).
I don't use OGG a lot, but it is supposed to be roughly comparable to AAC and MP3. If so, 10MB per ~3 minutes is
audiophile quality, significantly better than what you would get from most music download services, which weigh in at around 1MB per minute. However, the human speaking voice can be compressed
further than music without a noticeable loss of quality.
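To put the file sizes above on the same scale as bitrates, here is a minimal sketch that converts a file's size and duration into an average bitrate. It assumes 1 MB = 1,000,000 bytes and ignores Ogg container overhead; `implied_kbps` is a hypothetical helper, not part of any project tooling. Since David01.ogg is "almost" 10MB, the 10MB figure gives an upper bound.

```python
def implied_kbps(size_mb: float, seconds: float) -> float:
    """Average bitrate in kilobits per second implied by a file's size and duration.

    Assumptions: 1 MB = 1,000,000 bytes; container/metadata overhead is ignored.
    """
    return size_mb * 1_000_000 * 8 / seconds / 1000


if __name__ == "__main__":
    # David01.ogg: "almost 10MB" for 2:41 (161 s), so 10 MB is an upper bound.
    print(f"David01.ogg: at most about {implied_kbps(10, 161):.0f} kbps")
    # The ~1MB-per-minute figure for typical download services.
    print(f"1 MB per minute: about {implied_kbps(1, 60):.0f} kbps")
```

This works out to roughly 497 kbps for David01.ogg versus roughly 133 kbps for the ~1MB-per-minute case, which is why the music file reads as audiophile quality.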
The professional-sounding audiobooks I listen to are generally recorded at 64 kbps. I whipped up a vocal OGG at that rate and it came in at 0.4 MB per minute. At this size, the blue-sky scenario of 12 full sets would weigh around 17 MB.
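A quick sketch of the estimate above, working in both directions. It assumes a constant bitrate, 1 MB = 1,000,000 bytes, and no container overhead; the function names are hypothetical. Note that 64 kbps is nominally 0.48 MB per minute, a bit above the 0.4 MB measured for the actual file (Vorbis encodes at a variable bitrate, so real files drift from the nominal figure). The per-set length is not stated above, so the sketch backs it out from the 17 MB total rather than assuming one.

```python
def mb_per_minute(bitrate_kbps: float) -> float:
    """Nominal megabytes per minute of audio at a constant bitrate (kbps)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


def total_size_mb(mb_per_min: float, minutes_per_set: float, sets: int) -> float:
    """Total download size for `sets` voice sets of `minutes_per_set` minutes each."""
    return mb_per_min * minutes_per_set * sets


def implied_minutes_per_set(total_mb: float, mb_per_min: float, sets: int) -> float:
    """Work backwards from a size budget to the audio length each set can have."""
    return total_mb / (mb_per_min * sets)


if __name__ == "__main__":
    print(f"64 kbps nominal: {mb_per_minute(64):.2f} MB/min")
    # Measured rate from the text (0.4 MB/min) and the 12-set, 17 MB scenario.
    minutes = implied_minutes_per_set(17, 0.4, 12)
    print(f"12 sets in 17 MB at 0.4 MB/min -> about {minutes:.1f} min per set")
    print(f"Check: {total_size_mb(0.4, minutes, 12):.1f} MB")
```

Run as-is, this reports 0.48 MB/min nominal and about 3.5 minutes of audio per set, so the 17 MB figure holds as long as each set stays near that length.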
If we ever get to the point of having sound sets in alternate languages, I think they could be downloaded separately, to avoid bloating the game with content most players won't use... but that won't be an issue for some time.