Tabular classification remains a core workload in applied machine learning, yet ensemble integration in practice is often reduced to uniform probability averaging. This paper provides a focused comparison of three classical integration strategies over tabular expert pools: Mean Ensemble, Static Weighted Ensemble, and Stacking. We evaluate 14 datasets under two expert regimes, tree-only and hybrid. Across datasets, aggregated accuracy improves from 0.9064 ± 0.0791 (Mean Ensemble) to 0.9133 ± 0.0754 (Static Weighted Ensemble) and 0.9158 ± 0.0761 (Stacking). The average gains over the Mean Ensemble are +0.0069 ± 0.0080 for static weighting and +0.0094 ± 0.0113 for stacking. Wilcoxon signed-rank tests show significant improvements over the Mean Ensemble in the aggregated view (static weighting, p = 0.008775; stacking, p = 0.001871), with strong evidence in the hybrid pool. Stacking is top-ranked most often (9/14 datasets), compared with static weighting (3/14); mean averaging is top-ranked or tied in 2/14. Overall, classical stacking is a robust, low-complexity upgrade over uniform averaging, especially when expert diversity is higher.
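To make the three integration strategies concrete, the following is a minimal sketch over pre-computed expert probabilities. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the static weights are assumed to be proportional to each expert's validation accuracy, and the stacking meta-learner is assumed to be a ridge-regularised linear model fitted on held-out expert outputs (the abstract does not specify either choice).

```python
import numpy as np

def mean_ensemble(probs):
    """Uniform average. probs has shape (n_experts, n_samples, n_classes)."""
    return probs.mean(axis=0)

def static_weighted_ensemble(probs, val_acc):
    """Fixed per-expert weights; ASSUMPTION: proportional to validation accuracy."""
    w = np.asarray(val_acc, dtype=float)
    w = w / w.sum()                       # normalise weights to sum to 1
    return np.tensordot(w, probs, axes=1) # weighted sum over the expert axis

def _meta_features(probs):
    """Concatenate all experts' class probabilities per sample, plus a bias column."""
    n_samples = probs.shape[1]
    X = probs.transpose(1, 0, 2).reshape(n_samples, -1)
    return np.hstack([X, np.ones((n_samples, 1))])

def fit_stacking(probs_val, y_val, n_classes, ridge=1e-3):
    """ASSUMPTION: linear meta-learner fitted by ridge least squares on one-hot targets."""
    X = _meta_features(probs_val)
    Y = np.eye(n_classes)[np.asarray(y_val)]
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

def predict_stacking(W, probs):
    """Class scores from the fitted meta-learner; take argmax over axis 1 for labels."""
    return _meta_features(probs) @ W
```

In this sketch the meta-learner should be fitted on held-out (validation or out-of-fold) expert outputs rather than on the data the experts were trained on, so that its weights reflect generalisation rather than training fit.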
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.