How does the Okera Platform handle connections from clients to its services? In a load-balanced cluster, we noticed a skew where certain services (such as one Planner out of several) sometimes handle many more connections than others. Why is that happening?
The ODAS client libraries supplied by Okera, combined with the server-side settings of each service, distribute connections randomly across all load-balanced endpoints. For example, the Planner exposes multiple services that clients use to obtain query preparation details, such as dataset schemas. When you run multiple Planner instances within an ODAS cluster, the settings on both sides spread the load as evenly as possible.
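To illustrate what a random distribution across load-balanced endpoints means, here is a minimal Python sketch. It is not the ODAS client library's implementation: the endpoint names `planner-1` to `planner-3` and the helpers `pick_endpoint` and `distribute` are hypothetical, and the sketch assumes that each new connection independently picks one endpoint uniformly at random.

```python
import random
from collections import Counter

# Hypothetical Planner endpoints; real host names and ports will differ.
PLANNERS = ["planner-1", "planner-2", "planner-3"]


def pick_endpoint(endpoints, rng):
    """Choose one endpoint uniformly at random for a new connection."""
    return rng.choice(endpoints)


def distribute(num_connections, endpoints, seed=42):
    """Open num_connections independent connections and count them per endpoint."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(endpoints, rng) for _ in range(num_connections))


if __name__ == "__main__":
    counts = distribute(30_000, PLANNERS)
    for endpoint in PLANNERS:
        print(endpoint, counts[endpoint])
```

With many independent connections, each Planner ends up with roughly one third of them.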
Some factors, however, work against this strategy, including sticky sessions. Sticky sessions are needed for technical reasons, as they reduce the overhead of opening new connections for longer-running clients. When the workload coming from the clients is skewed, sticky sessions may cause the connections to be skewed as well.
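Session stickiness can be sketched in the same way. In this hypothetical model (the `StickyClient` class and its methods are illustrative and not part of the Okera client libraries), a client picks an endpoint once, reuses it for every later request, and switches only after that endpoint has been marked as down.

```python
import random


class StickyClient:
    """Hypothetical client that keeps using the endpoint it picked first.

    A new endpoint is chosen only when the current one is reported as down,
    which mimics session stickiness with failover.
    """

    def __init__(self, endpoints, seed=None):
        self._endpoints = list(endpoints)
        self._rng = random.Random(seed)
        self._down = set()
        self._current = None

    def mark_down(self, endpoint):
        """Record an outage; the next request fails over if it used this endpoint."""
        self._down.add(endpoint)

    def endpoint_for_request(self):
        """Return the endpoint to use for the next request."""
        if self._current is None or self._current in self._down:
            healthy = [e for e in self._endpoints if e not in self._down]
            if not healthy:
                raise RuntimeError("no healthy endpoints available")
            self._current = self._rng.choice(healthy)
        return self._current


if __name__ == "__main__":
    client = StickyClient(["planner-1", "planner-2", "planner-3"], seed=7)
    first = client.endpoint_for_request()
    # Repeated requests keep going to the same Planner.
    assert all(client.endpoint_for_request() == first for _ in range(10))
    # Only an outage on that Planner makes the client switch.
    client.mark_down(first)
    print("switched from", first, "to", client.endpoint_for_request())
```

Reusing one endpoint avoids the cost of setting up new connections, but it also means that a busy client concentrates all of its load on a single endpoint.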
Continuing the example, let's assume you are using many EMR clusters to run queries against a shared ODAS cluster. Let's also assume that one of the EMR instances (called High-EMR here) runs as many queries as all the other EMR instances combined. Overall, the load on the ODAS services should still be roughly even.
Now assume you stop all EMR instances except High-EMR at the end of the business day. High-EMR continues to generate the same load as before, which was 50% of the daytime total, but because of session stickiness it talks to only one Planner instance, unless there is a reason to switch, such as a service outage on that particular node. As a result, the load on that one Planner will, over time, appear much higher than on the other load-balanced instances. This is expected behavior and no reason for concern.
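The end-of-day scenario can be reproduced with a small Python model. The query counts, the client names other than High-EMR, and the two-Planner sticky assignment are hypothetical values chosen for illustration: High-EMR runs 300 queries per hour, the same as EMR-A, EMR-B and EMR-C combined, and the assignment is assumed to balance the daytime load.

```python
from collections import Counter

# Hypothetical hourly query counts: High-EMR runs as many queries as the
# other EMR instances combined (300 = 100 + 100 + 100).
QUERIES_PER_HOUR = {"High-EMR": 300, "EMR-A": 100, "EMR-B": 100, "EMR-C": 100}

# Hypothetical sticky-session assignment: each client keeps talking to the
# single Planner it was first routed to.
STICKY_PLANNER = {
    "High-EMR": "planner-1",
    "EMR-A": "planner-2",
    "EMR-B": "planner-2",
    "EMR-C": "planner-2",
}

PLANNERS = ("planner-1", "planner-2")


def planner_load(active_clients):
    """Return queries per hour on each Planner for the given running clients."""
    load = Counter({planner: 0 for planner in PLANNERS})
    for client in active_clients:
        load[STICKY_PLANNER[client]] += QUERIES_PER_HOUR[client]
    return dict(load)


if __name__ == "__main__":
    daytime = planner_load(QUERIES_PER_HOUR)  # all EMR instances running
    evening = planner_load(["High-EMR"])      # only High-EMR still running
    print("daytime:", daytime)  # {'planner-1': 300, 'planner-2': 300}
    print("evening:", evening)  # {'planner-1': 300, 'planner-2': 0}
```

High-EMR's own load does not change in the evening; it simply becomes the only load left, and stickiness keeps all of it on `planner-1`.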