Does Okera cache RDS data? We have encountered inconsistencies between what's in RDS and what the permissions endpoint returns.
For example, for a particular user, what was in RDS matched what the user was actually able to do. The discrepancy was that /api/permissions showed that a particular dataset wasn't granted to a particular AD, yet the user still had SELECT access on that dataset. So in this example, /api/permissions indicated that the user shouldn't have permissions on the dataset, while RDS showed that they did.
This observation makes sense. It is possible that the output from the /api/permissions endpoint was lagging behind what was in RDS (which is the ultimate source of truth).
Also, there are a few different models for how planners update their internal data, so the behavior depends on how the cluster is set up. For example:
- Is this running through CDH (Cloudera Distribution Including Apache Hadoop)?
- Does this cluster have one or multiple planners?
- Is it deployed across multiple availability zones (AZs)?
In a cross-AZ setup, permission updates can lag by up to 1 minute. The lag arises because each cluster polls its RDS instance for updates once a minute and caches that data for later use.
- If a permission change is issued via a GRANT command, it is cached immediately in the cluster it is issued against, as part of that update being written out to RDS for long-term persistence.
- If data is changed in RDS directly, then a cluster will only observe that change when its timer goes off and it polls RDS.
- In the cross-AZ scenario, the cluster where the command was *NOT* issued will not observe the change until it actively polls RDS, which will take up to a minute but on average will be ~30 seconds.
- The refresh rate for ODAS' local cache should be 60 seconds by default.
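
To make the timing concrete, here is a minimal Python sketch of the behavior described above: a write-through cache on the cluster that receives the GRANT, plus periodic polling on every cluster. It is only a toy model under stated assumptions, not Okera's implementation. The names `Store`, `Cluster`, `grant`, `tick`, and `can_select`, the user and dataset names, and the poll offsets are all hypothetical; only the 60-second interval comes from the answer above.

```python
from dataclasses import dataclass, field

POLL_INTERVAL_S = 60  # default refresh rate mentioned above


@dataclass
class Store:
    """Hypothetical stand-in for RDS, the source of truth."""
    grants: set = field(default_factory=set)


@dataclass
class Cluster:
    """Hypothetical model of one cluster with a local permission cache."""
    name: str
    store: Store
    next_poll_at: float  # simulated time (seconds) of the next poll
    cache: set = field(default_factory=set)

    def grant(self, user, dataset):
        # Write-through: cache locally and persist to the store at once.
        self.cache.add((user, dataset))
        self.store.grants.add((user, dataset))

    def tick(self, now):
        # Refresh the cache from the store once the poll timer has fired.
        if now >= self.next_poll_at:
            self.cache = set(self.store.grants)
            while self.next_poll_at <= now:
                self.next_poll_at += POLL_INTERVAL_S

    def can_select(self, user, dataset):
        # Decisions are made from the local cache, not from the store.
        return (user, dataset) in self.cache


store = Store()
az_a = Cluster("az-a", store, next_poll_at=60)
az_b = Cluster("az-b", store, next_poll_at=45)

# GRANT issued against az-a at t=0: az-a sees it immediately, az-b does not.
az_a.grant("alice", "sales.orders")
seen_by_a_at_0 = az_a.can_select("alice", "sales.orders")
seen_by_b_at_0 = az_b.can_select("alice", "sales.orders")

# az-b only picks the change up when its own timer fires at t=45.
az_b.tick(44)
seen_by_b_at_44 = az_b.can_select("alice", "sales.orders")
az_b.tick(45)
seen_by_b_at_45 = az_b.can_select("alice", "sales.orders")

# A change written directly to the store is invisible to every cluster
# until that cluster's next poll.
store.grants.add(("bob", "sales.orders"))
az_a.tick(59)
bob_seen_by_a_at_59 = az_a.can_select("bob", "sales.orders")
az_a.tick(60)
bob_seen_by_a_at_60 = az_a.can_select("bob", "sales.orders")
```

In this model, if a change lands at a uniformly random point within a cluster's poll interval, the expected wait until that cluster's next poll is half the interval, which matches the ~30 second average quoted above.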