Small files cause problems on both the storage side and the compute side. On the storage side, small files lead to more metadata, which slows down metadata operations. This is particularly problematic for the HDFS NameNode, which keeps the file system metadata in memory. On the compute side, most Hadoop compute frameworks run much slower over many small files because of the high per-file and per-task overhead.
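To make the per-task overhead concrete, here is a minimal cost model. It assumes, purely for illustration, that each file becomes one task and that every task pays a fixed startup overhead before scanning its bytes. The function name `total_task_time` and all numbers are hypothetical, not measurements of any specific framework.

```python
def total_task_time(num_files: int, bytes_per_file: float,
                    per_task_overhead_s: float, throughput_bps: float) -> float:
    """Summed task time in seconds under a hypothetical one-task-per-file model.

    Ignores parallelism; it only shows how fixed per-task cost scales with file count.
    """
    scan_time_s = bytes_per_file / throughput_bps  # time spent doing useful work
    return num_files * (per_task_overhead_s + scan_time_s)


MiB = 1024 * 1024
# Same 1 GiB of data, laid out two ways (assumed: 1 s overhead, 100 MiB/s scan rate).
small = total_task_time(1024, 1 * MiB, 1.0, 100 * MiB)   # 1024 files of 1 MiB
large = total_task_time(8, 128 * MiB, 1.0, 100 * MiB)    # 8 files of 128 MiB
print(f"small files: {small:.2f} s, large files: {large:.2f} s")
```

With these assumed parameters the small-file layout spends most of its time on overhead (about 1034 s versus about 18 s), even though both layouts hold the same data.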
For compute, ODAS automatically combines small files, which often provides significant performance benefits. Okera does not currently manage writing files, so it does not directly help with the storage-side issue. For non-HDFS storage managers, this may not be an issue at all. Users should run background compaction or ETL jobs to combine small files. Okera does help with this implicitly by decoupling the applications that read the data from the storage details (i.e. file paths), so the underlying files can be compacted without changing those applications.
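Both combining small files at read time and planning a background compaction job come down to grouping many small files into fewer, larger units. The sketch below shows one simple way to do that: a greedy next-fit packing of `(path, size)` pairs into groups of at most a target size, in input order. This is an illustrative technique only, not ODAS's actual algorithm; `combine_small_files` and `target_bytes` are hypothetical names.

```python
from typing import List, Tuple


def combine_small_files(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Greedily pack (path, size) pairs into groups of at most target_bytes.

    Next-fit in input order: start a new group when the next file would
    overflow the current one. A file larger than target_bytes ends up alone.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


files = [("part-0", 10), ("part-1", 20), ("part-2", 30), ("part-3", 50)]
print(combine_small_files(files, target_bytes=60))
# [['part-0', 'part-1', 'part-2'], ['part-3']]
```

Each group could then be read by a single task, or, in a compaction job, rewritten as one larger file. Next-fit keeps the logic trivial; sorting by size or using first-fit would pack more tightly at the cost of reordering files.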