attempt_1537806816366_0010_1_05_000131_0:java.lang.RuntimeException: java.lang.IllegalArgumentException: Invalid task handle
Hive introduced Bloom filters for dynamic semi-join reduction to cut down the amount of data broadcast by joins. Unfortunately it did not work as intended in our case, so we set the property "hive.tez.dynamic.semijoin.reduction" to false.
For more details, please visit: https://issues.apache.org/jira/browse/HIVE-15269
This feature allows Hive to send Bloom filters between scans to reduce the amount of data broadcast as the result of a join. It is still a work in progress: when running through Okera, we saw many tasks make no progress for extended periods of time. These tasks were table scans waiting for the other scan to send a Bloom filter, and they timed out on that wait. The ongoing work is tracked here: https://issues.apache.org/jira/browse/HIVE-1721
To work around this issue, we disabled the feature by setting the property
hive.tez.dynamic.semijoin.reduction to false
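As a minimal sketch, assuming the setting is applied per session from a Hive client such as Beeline (rather than cluster-wide in hive-site.xml), the property can be turned off before running the affected query:

```sql
-- Disable dynamic semi-join reduction for the current Hive session only
SET hive.tez.dynamic.semijoin.reduction=false;
```

Running SET hive.tez.dynamic.semijoin.reduction; with no value prints the current setting, which is a quick way to confirm the change took effect.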
It is worth noting that this issue may also show up in other jobs on newer EMR versions, so if query plans show Bloom filters and tasks are failing with socket timeouts, please disable the property as stated above.