Secret Ingredients Found in Kaggle Competitions from 2021 to 2023

Jan 23, 2024


In May 2023, Kaggle launched the AI Report competition. The aim was to summarize what the ML community has learned about working and experimenting with AI over the past two years. The competition covered seven categories: text data, image and video data, tabular and time series data, Kaggle competitions, Generative AI, ethics, and others. Overall, 220 teams and 279 competitors submitted reports, and a panel of seven Kaggle Grandmasters reviewed them to select the best in each category.

In this post, I summarize the top five reports in the Kaggle Competitions category, listed below. Let’s get started!

Towards Green AI: How to Make Deep Learning Models More Efficient in Production

Want to discover how Kaggle has aligned with Green AI, aiming for more sustainable AI solutions?

In this report, Leonie Monigatti contrasts the state of the art in the Green AI literature with the insights the Kaggle ML community has gathered over the past two years while competing for the Efficiency Prize. Kaggle introduced the Efficiency Prize as a dual-scoring method that evaluates participants on both predictive accuracy and inference runtime, with the aim of reducing the carbon footprint of deep learning models during inference. The report explores several techniques for creating lightweight models, including pruning, low-rank factorization, quantization, knowledge distillation, and model conversion to ONNX format. Finally, the author suggests integrating Kaggle’s Efficiency Prize as a primary metric across all Kaggle competitions.
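To make one of these techniques concrete, here is a minimal sketch of post-training quantization, assuming a simple symmetric per-tensor int8 scheme applied to a random weight matrix. It is not taken from the report; the function names and the choice of scheme are illustrative assumptions, and production toolchains typically use more refined variants.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (illustrative)."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale


# Hypothetical weight matrix standing in for one layer of a model.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_error = float(np.max(np.abs(w - w_hat)))

print("float32 bytes:", w.nbytes)
print("int8 bytes:", q.nbytes)
print("max absolute error:", max_error)
```

The int8 tensor takes a quarter of the memory of the float32 original, at the cost of a rounding error of at most about half the scale per weight.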

How to Win a Kaggle Competition

Would you like to learn six tips that can enhance your chances of winning a Kaggle competition?

In this report, Darek Kłeczek conducts a two-stage meta-analysis of Kaggle write-ups from the past decade, using LLMs to extract keywords related to Machine Learning methods. Based on the findings, the author discusses trends in model ensembling methods, data augmentation techniques, and popular deep learning architectures, from CNNs to Transformers. The author highlights the dominant role of the Adam family of optimizers in winning solutions, along with the use of cross-entropy (CE), binary cross-entropy (BCE), and mean squared error (MSE) loss functions. Lastly, the author encourages the audience to unveil the magic in their solutions by implementing labeling and post-processing techniques.
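For readers less familiar with the three loss functions named above, the following is a minimal NumPy sketch of their standard definitions. The function names and the `eps` clipping constant are my own choices for illustration, not something taken from the report.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error for regression targets."""
    return float(np.mean((y_true - y_pred) ** 2))


def bce(y_true, p, eps=1e-12):
    """Binary cross-entropy; p holds predicted probabilities of class 1."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def ce(labels, probs, eps=1e-12):
    """Multi-class cross-entropy; labels are integer class indices."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(-np.mean(np.log(picked)))


print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))  # 2.0
print(bce(np.array([1.0, 0.0]), np.array([0.5, 0.5])))
print(ce(np.array([0, 1]), np.array([[0.5, 0.5], [0.5, 0.5]])))
```

With uniform predictions over two classes, both BCE and CE reduce to ln 2, which is a handy sanity check when wiring up a training loop.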

Kaggle AI Report: Medical Imaging Competitions

Are you curious about the interplay between medical imaging modalities, deep learning, and Kaggle competitions?

In this report, Nghi Huynh provides a solid background suitable for all audiences on medical imaging-related tasks, including object detection, classification, and segmentation that use imaging modalities such as MRI, CT, and X-rays. Moreover, by conducting a meta-analysis of the top 10 write-ups of 11 Kaggle medical imaging competitions over the past five years, the author describes the evolution of deep learning models from traditional and reliable methods such as CNNs to newly emerging ones such as Vision Transformers.

A Journey Through Kaggle Text Data Competitions From 2021 to 2023

Have you ever wondered if the top 3 solutions in text-oriented Kaggle competitions share one or more secret ingredients? What do some Kagglers refer to as ‘the magic’?

In this report, Liliana Badillo and Salomon Marquez conduct an in-depth analysis of 27 write-ups from nine text-oriented Kaggle competitions. In a practical and results-driven manner, the authors explain four concepts widely used in text competitions that have improved the Private Leaderboard (PB) score of winning solutions: model architectures, pseudo labeling, adversarial weight perturbation, and masked language modeling. Furthermore, the authors analyze the main questions that Kagglers have when implementing these strategies in their solutions.
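As a rough illustration of pseudo labeling, the sketch below assumes the simplest variant: a trained model predicts class probabilities for unlabeled samples, and only confident predictions are kept as extra training labels. The function name, the 0.9 threshold, and the probability values are hypothetical; the analyzed solutions may well use other variants, such as soft labels.

```python
import numpy as np


def select_pseudo_labels(probs, threshold=0.9):
    """Keep unlabeled samples whose top predicted probability meets the threshold.

    probs: array of shape (n_samples, n_classes) with predicted probabilities.
    Returns the indices of the kept samples and their hard pseudo labels.
    """
    confidence = probs.max(axis=1)
    keep = np.flatnonzero(confidence >= threshold)
    return keep, probs[keep].argmax(axis=1)


# Hypothetical model predictions on four unlabeled samples.
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.08, 0.92],
    [0.50, 0.50],
])

idx, labels = select_pseudo_labels(probs)
print(idx)     # [0 2]
print(labels)  # [0 1]
```

The selected samples and their pseudo labels would then be added to the training set before retraining or fine-tuning the model.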

Myths Kaggle

Have you ever wondered how Kaggle competitions have evolved over the past 3 years?

Tim Riggins debunks the myths and stereotypes that have emerged during the evolution of Kaggle competitions. The author employs a highly conversational tone to discuss the following topics: level stacking vs. model averaging, solo vs. team-based solutions, PB score analysis for top solutions, overcoming data leaks, and the shifting participation dynamics between old-school and debutant Kagglers.

Conclusion

In summary, the Kaggle AI Report competition of 2023 provided valuable insights into the trends and innovations in Machine Learning. This post focused on the top five reports in the Kaggle Competitions category, which cover Green AI, winning strategies, medical imaging, best practices for handling text data, and the evolution of Kaggle competitions over time. Together they showcase the collaborative spirit and continuous innovation of the ML community. For further information about the top essays created for this competition, please refer to the Kaggle AI report 2023.