Vectorization of calculations for code optimization in the Python programming language

Oleksii Zemlianyi

ORCID: https://orcid.org/0009-0001-6157-8725

Oles Honchar Dnipro National University

Oleh Baibuz

ORCID: https://orcid.org/0000-0001-7489-6952

Oles Honchar Dnipro National University

Purpose. The purpose of this study is to explore vectorization as an engineering technique for improving the performance and readability of Python code, particularly in data processing tasks. We aim to demonstrate the benefits of vectorization through practical examples involving the handling of missing data.

Design / Method / Approach. To achieve the research goals, we performed a comparative analysis of loop-based and vectorized implementations. Specifically, we developed two versions of a function that identifies the columns of a dataset containing missing values. Both implementations were tested on two real-world datasets, and we compared their execution time and code readability.

Findings. Vectorization yielded substantial performance improvements, reducing execution time by a factor of hundreds compared with the traditional loop-based method. In addition, the vectorized code was more compact, which made it easier to read and maintain.

Theoretical Implications. Vectorization provides a higher level of abstraction for performing operations on data structures. It allows developers to focus on algorithmic logic rather than on managing iterative control structures, and it contributes to broader discussions on optimizing computational efficiency in Python.

Practical Implications. For data engineers and analysts, vectorization is a highly effective way to optimize Python code. It significantly accelerates data-intensive tasks such as missing data imputation, data analysis, and machine learning, making it an essential tool for productivity in data-driven environments.

Originality / Value. This study presents a practical approach to optimizing Python code through vectorization and is valuable for professionals seeking to improve the efficiency of their workflows.

Research Limitations / Future Research. The main limitation of this research is its focus on a single problem: missing data imputation. Future research should extend the scope to other computational areas, such as image processing and simulation modeling, or examine the combination of vectorization with Just-In-Time (JIT) compilation using tools such as Numba to further boost Python’s performance.
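The authors' two functions are not reproduced in this abstract. The following is only a minimal sketch, assuming pandas and NumPy, of what a loop-based and a vectorized version of "find the columns that contain missing values" might look like. The function names `missing_columns_loop` and `missing_columns_vectorized`, the sample DataFrame, and the synthetic timing frame are hypothetical illustrations, not the paper's code or datasets.

```python
import timeit

import numpy as np
import pandas as pd


def missing_columns_loop(df: pd.DataFrame) -> list:
    """Hypothetical loop-based version: checks cells one by one in Python."""
    result = []
    for col in df.columns:
        for value in df[col]:
            # pd.isna treats NaN, None and NaT as missing
            if pd.isna(value):
                result.append(col)
                break  # one missing value is enough to flag this column
    return result


def missing_columns_vectorized(df: pd.DataFrame) -> list:
    """Hypothetical vectorized version: one boolean mask for the whole frame."""
    # isna() builds a boolean DataFrame; any() reduces it per column (axis=0)
    has_missing = df.isna().any()
    return has_missing[has_missing].index.tolist()


# Small illustrative frame (made-up values)
df = pd.DataFrame({
    "age": [25, np.nan, 40],
    "city": ["Dnipro", "Kyiv", None],
    "score": [1.5, 2.0, 3.5],
})
print(missing_columns_loop(df))        # ['age', 'city']
print(missing_columns_vectorized(df))  # ['age', 'city']

# Synthetic frame for a rough timing comparison (not the paper's datasets):
# 20,000 rows x 10 float columns, with NaN only in the last row of even columns,
# so the loop has to scan those columns all the way to the end.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.random((20_000, 10)))
big.iloc[-1, ::2] = np.nan

t_loop = timeit.timeit(lambda: missing_columns_loop(big), number=1)
t_vec = timeit.timeit(lambda: missing_columns_vectorized(big), number=1)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```

In the vectorized version the per-cell test runs inside pandas' compiled routines behind `DataFrame.isna()` and `.any()`, so the interpreter executes a handful of calls instead of one call per cell. The measured ratio will depend on the data, the library versions, and the hardware, so the timing line above is illustrative only.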


