I'm playing around with various GC settings for the GHC RTS, and the results are quite weird. I expected performance to peak when the nursery size equals my CPU cache size (15 MB), but that is clearly not the case. What is going on here? Should the GHC allocator parameters be tweaked further?
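
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how the GC counters can be read from inside a program with GHC.Stats, so that runs with different nursery sizes can be compared. It is not the raytracer: the workload function is a hypothetical stand-in that should be replaced by the real computation, and the iteration count is arbitrary.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Control.Monad (unless)
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Exit (die)

-- Hypothetical stand-in workload; replace it with the real benchmark.
-- How much it allocates depends on optimisation level and fusion.
workload :: Int -> Double
workload n = foldl' (+) 0 [sqrt (fromIntegral i) | i <- [1 .. n]]

main :: IO ()
main = do
  -- RTS statistics are only collected when the program runs with +RTS -T (or -s).
  enabled <- getRTSStatsEnabled
  unless enabled $ die "Run with +RTS -T to enable RTS statistics."

  -- Force the result so the work happens before the counters are read.
  r <- evaluate (workload 5000000)

  stats <- getRTSStats
  putStrLn ("result:          " ++ show r)
  putStrLn ("total GCs:       " ++ show (gcs stats))
  putStrLn ("major GCs:       " ++ show (major_gcs stats))
  putStrLn ("allocated bytes: " ++ show (allocated_bytes stats))
  putStrLn ("mutator time ns: " ++ show (mutator_elapsed_ns stats))
  putStrLn ("GC time ns:      " ++ show (gc_elapsed_ns stats))
```

Assuming the file is saved as Main.hs, it can be built with ghc -O2 -rtsopts Main.hs and run as ./Main +RTS -T -A15m -RTS, changing the -A value (the nursery size) between runs. Comparing mutator time and GC time separately may help show whether a given nursery size is hurting the collector or the program's own cache behaviour.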

The code I used for the benchmark is this raytracer

