How does scaling in Okera work?
Do Okera data pulls scale with the size of the EMR cluster connecting to it? Would worker nodes in my EMR cluster parallelize connections to the Okera workers?
Yes. The effective number of parallel readers is the number of concurrent map tasks. EMR controls how many mappers run at once, typically as a function of the number of machines and the cores per machine. Each concurrent map task keeps one read in flight at a time to the ODAS cluster, so a larger EMR cluster produces proportionally more simultaneous reads.
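As a rough back-of-the-envelope aid, the relationship above can be sketched in Python. This is a simplified, hypothetical model, not Okera or EMR code: it assumes one map task per core (`tasks_per_core=1`), while the real mapper count depends on the EMR cluster's own configuration. The function name and parameters are illustrative only.

```python
def concurrent_reads(emr_nodes: int, cores_per_node: int, tasks_per_core: int = 1) -> int:
    """Estimate how many reads are in flight against ODAS at once.

    Hypothetical model: EMR runs `tasks_per_core` map tasks per core
    (assumed 1 by default; the real value comes from the cluster config),
    and each concurrent map task keeps exactly one read in flight.
    """
    map_tasks = emr_nodes * cores_per_node * tasks_per_core
    reads_per_task = 1  # one in-flight read per concurrent map task
    return map_tasks * reads_per_task


# Doubling the EMR cluster doubles the number of simultaneous reads.
for nodes in (5, 10, 20):
    print(nodes, concurrent_reads(nodes, cores_per_node=8))
```

Under these assumptions, 5, 10, and 20 eight-core nodes yield 40, 80, and 160 concurrent reads, which is why read parallelism tracks EMR cluster size.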
A worker rejects a task with SERVER_BUSY only when all of its connections are in use. Each worker supports 255 connections, so total connection capacity grows in proportion to the number of workers.
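The admission rule can be illustrated with a small simulation. This is a hypothetical sketch, not how ODAS actually schedules tasks: it assumes tasks are spread round-robin across workers and that every accepted connection stays open for the duration of the run (modeling simultaneous in-flight reads). Only the 255-connection limit and the SERVER_BUSY condition come from the answer above; `Worker`, `cluster_capacity`, and `dispatch` are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple

CONNECTIONS_PER_WORKER = 255  # per-worker limit stated above


@dataclass
class Worker:
    """Hypothetical model of one Okera worker's connection slots."""
    active: int = 0

    def accept(self) -> bool:
        # Reject (SERVER_BUSY) only when every connection slot is in use.
        if self.active >= CONNECTIONS_PER_WORKER:
            return False
        self.active += 1  # connection stays open; none are released here
        return True


def cluster_capacity(num_workers: int) -> int:
    """Total simultaneous connections the cluster can hold."""
    return num_workers * CONNECTIONS_PER_WORKER


def dispatch(num_workers: int, num_tasks: int) -> Tuple[int, int]:
    """Send tasks round-robin (an assumption) and count accepts vs. SERVER_BUSY."""
    workers = [Worker() for _ in range(num_workers)]
    busy = 0
    for i in range(num_tasks):
        if not workers[i % num_workers].accept():
            busy += 1
    return num_tasks - busy, busy


print(cluster_capacity(4))   # 4 workers x 255 connections
print(dispatch(2, 512))      # two workers are slightly oversubscribed
print(dispatch(4, 512))      # adding workers absorbs the same load
```

With two workers, 512 simultaneous tasks exceed the 510-connection capacity, so two are rejected; with four workers the same load fits comfortably, showing capacity scaling linearly with worker count.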