There are several points that should have been included in your article:

Jan 31, 2024

1) Instead of simply increasing the values for cores, memory, and so on, you should have focused on calculating the minimum number of executors needed for the cluster you have.

2) You didn't mention Adaptive Query Execution (AQE), a great feature introduced in the Spark 3.x versions. Its beauty is that if you are not sure how many shuffle partitions are needed, you can just set a higher number and AQE will adjust it at runtime, providing flexibility and improved performance.

3) When discussing executor memory, you could have explained the internal structure of an executor: off-heap memory, disk spilling, and so on. For example, what happens when you run out of off-heap memory?

4) Spark UI. You should check the resource consumption of each executor there; it is a key practice. It helps you identify data skew and other factors that are common causes of OOM errors.
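To make point 1 concrete, here is a rough sketch of the sizing arithmetic. It follows a commonly cited rule of thumb rather than anything from your article: reserve one core and 1 GB per node for the OS and daemons, use about five cores per executor, keep one executor slot for the driver or application master, and leave roughly 10% on top of the heap for memory overhead. The function name `plan_executors` and all default values are my own assumptions, and the result is only a starting point to tune from.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int


def plan_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Rule-of-thumb sizing (an assumption, not an official Spark formula)."""
    usable_cores = cores_per_node - 1      # leave 1 core for OS/daemons
    usable_mem_gb = mem_per_node_gb - 1    # leave 1 GB for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("node too small for the requested cores per executor")
    # Keep one slot free for the driver / application master.
    total_executors = max(nodes * executors_per_node - 1, 1)
    # Split node memory across executors, then carve out the overhead share.
    mem_per_executor_gb = usable_mem_gb / executors_per_node
    heap_gb = int(mem_per_executor_gb / (1 + overhead_fraction))
    return ExecutorPlan(total_executors, cores_per_executor, heap_gb)


# Hypothetical cluster: 10 nodes, 16 cores and 64 GB each.
plan = plan_executors(nodes=10, cores_per_node=16, mem_per_node_gb=64)
print(plan)
```

The three numbers map onto `--num-executors`, `--executor-cores`, and `--executor-memory` in `spark-submit`.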
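For point 2, the AQE settings involved can be sketched as below. The configuration keys are real Spark SQL properties, but the helper names `aqe_conf` and `to_submit_flags` and the example values are mine and only illustrative. The idea is to start with a deliberately high initial partition count and let AQE coalesce small shuffle partitions toward the advisory size at runtime.

```python
def aqe_conf(initial_partitions, advisory_size="64MB"):
    """Build AQE settings; helper name and values are illustrative assumptions."""
    return {
        # AQE itself (enabled by default since Spark 3.2).
        "spark.sql.adaptive.enabled": "true",
        # Let AQE merge small shuffle partitions after each stage.
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        # Start high; AQE coalesces down from this number.
        "spark.sql.adaptive.coalescePartitions.initialPartitionNum": str(initial_partitions),
        # Target size AQE aims for when coalescing.
        "spark.sql.adaptive.advisoryPartitionSizeInBytes": advisory_size,
    }


def to_submit_flags(conf):
    """Render the settings as spark-submit --conf arguments."""
    return [f"--conf {key}={value}" for key, value in conf.items()]


for flag in to_submit_flags(aqe_conf(initial_partitions=2000)):
    print(flag)
```

The same key/value pairs can be passed to `SparkSession.builder.config(...)` instead of the command line.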
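For point 3, a small calculator shows how the memory requested for one executor is divided. It is a sketch based on Spark's documented defaults as I understand them: 300 MB reserved, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, and a memory overhead of max(384 MB, 10% of the heap). The function name and the rounding are my own choices. The overhead and off-heap regions sit outside the JVM heap, so when they are exhausted the usual symptom is the resource manager killing the container rather than a Java heap OOM.

```python
def executor_memory_breakdown(heap_mb, memory_fraction=0.6, storage_fraction=0.5,
                              overhead_factor=0.10, offheap_mb=0):
    """Approximate per-executor memory regions in MB (defaults are assumptions)."""
    reserved_mb = 300                             # fixed reservation inside the heap
    usable_mb = heap_mb - reserved_mb
    unified_mb = usable_mb * memory_fraction      # execution + storage pool
    storage_mb = unified_mb * storage_fraction    # cached data; evictable
    execution_mb = unified_mb - storage_mb        # shuffles, joins, sorts; spills to disk
    user_mb = usable_mb - unified_mb              # user data structures, metadata
    overhead_mb = max(384, heap_mb * overhead_factor)  # non-heap: JVM/native overhead
    container_mb = heap_mb + overhead_mb + offheap_mb  # what the cluster manager sees
    return {
        "storage": round(storage_mb),
        "execution": round(execution_mb),
        "user": round(user_mb),
        "overhead": round(overhead_mb),
        "container_total": round(container_mb),
    }


# Hypothetical executor with a 10300 MB heap and no explicit off-heap size.
print(executor_memory_breakdown(10300))
```

Storage and execution borrow from each other within the unified pool, so treat the two figures as a soft split rather than hard limits.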
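For point 4, the per-task sizes shown on a stage page in the Spark UI can be checked for skew with a few lines. This sketch borrows the rule AQE's skew-join handling uses by default, as far as I know: a partition counts as skewed when it is larger than 5 times the median and also larger than 256 MB. The function name `find_skewed` and the sample sizes are made up for illustration.

```python
from statistics import median


def find_skewed(sizes_mb, factor=5.0, threshold_mb=256.0):
    """Return indexes of partitions that look skewed relative to the median."""
    med = median(sizes_mb)
    return [
        index
        for index, size in enumerate(sizes_mb)
        if size > factor * med and size > threshold_mb
    ]


# Hypothetical shuffle-read sizes (MB) copied from a stage's task table.
sizes = [100, 120, 110, 90, 2000]
print(find_skewed(sizes))
```

A non-empty result points at the tasks most likely to spill or run out of memory, which is where salting keys or enabling `spark.sql.adaptive.skewJoin.enabled` is worth considering.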

I see that you have covered these topics in your other articles, but they should have been included in this one, since it is more relevant to the points I highlighted.

I believe that merely setting properties and values without the right approach benefits neither oneself nor one's followers.

Written by Siddhesh K
