How does Okera handle JOINs? Can we run any type of JOIN inside the ODAS cluster?
Generally speaking, Okera is not a SQL engine, but uses SQL statements to define what is needed for its recommended schema design, which includes the preparation of the original data schema for consumption. On top of that, Okera uses SQL to define the access control views that are exposed to users. This may require JOINs that (for example, in the case of GDPR use-cases) filter records based on white- or blacklists.
Okera has two supported types of views, the default internal views and external views.
All views created in Okera are considered (by default) "internal" views. If the result of a JOIN is defined as an internal view, Okera will handle the JOIN at query time: The JOIN and all data registered in ODAS will be evaluated internally by the system, with the result being sent to the analytics layer for further use.
Initially, JOINs in external views are evaluated like all other views in ODAS: Like the internal views with JOINs, Okera managed data will still be accessed through ODAS (including access control, audit logging, etc.), but the evaluation of the join will be executed by analytics engine. External data will not have access control, UDFs, and other features that Okera can provide to managed datasets.
Okera will manage views differently depending on if the joined view is defined as internal or external. In both cases, data will reside in their source systems (ex: Okera data will continue to be managed in Okera while external views will continue to be managed by their non-Okera source).
The primary difference between internal and external views is that the underlying query is not evaluated in Okera. External data will not have access control, UDFs, and other features that Okera can provide to managed datasets.