Implementing a DataFrame in C: Flattened Matrix vs. Traditional Matrix
Abstract
This article investigates how memory layout affects array traversal performance in machine learning contexts. A flattened 1D array was compared against a 2D array, both representing a 1,000,000 × 100 matrix. Over 6 hours, 53,046 paired measurements were collected and used to analyze execution times and perform statistical hypothesis testing. Results show that the 1D array consistently outperformed the 2D array by ~0.002734 seconds per traversal, a highly statistically significant difference (p-value of 1e-323). In deep learning workloads, where such traversals are repeated many times, these small per-traversal gains can accumulate into substantial long-term savings, potentially amounting to hours. This article highlights how memory layout affects computational efficiency in data science workflows.
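The two layouts compared above can be sketched in C as follows. This is a minimal sketch, assuming the "2D array" is an array of separately allocated row pointers (`double **`); the struct and function names (`FlatMatrix`, `NestedMatrix`, `flat_sum`, `nested_sum`, `fill_both`, and so on) are hypothetical and not taken from the article's implementation. Both traversals visit elements in the same row-major order, so in this sketch any timing difference is attributable mainly to layout: one contiguous block versus per-row allocations reached through an extra pointer load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flattened layout: one contiguous block, element (i, j) at data[i * cols + j]. */
typedef struct {
    size_t rows, cols;
    double *data;
} FlatMatrix;

/* Assumed "traditional" layout: an array of row pointers, each row allocated
   separately, so consecutive rows are not guaranteed to be adjacent in memory. */
typedef struct {
    size_t rows, cols;
    double **data;
} NestedMatrix;

int flat_init(FlatMatrix *m, size_t rows, size_t cols) {
    m->rows = rows;
    m->cols = cols;
    m->data = malloc(rows * cols * sizeof *m->data);
    return m->data != NULL;
}

void flat_free(FlatMatrix *m) {
    free(m->data);
    m->data = NULL;
}

int nested_init(NestedMatrix *m, size_t rows, size_t cols) {
    m->rows = rows;
    m->cols = cols;
    m->data = malloc(rows * sizeof *m->data);
    if (m->data == NULL) return 0;
    for (size_t i = 0; i < rows; i++) {
        m->data[i] = malloc(cols * sizeof *m->data[i]);
        if (m->data[i] == NULL) {
            /* Roll back the rows allocated so far. */
            while (i > 0) free(m->data[--i]);
            free(m->data);
            m->data = NULL;
            return 0;
        }
    }
    return 1;
}

void nested_free(NestedMatrix *m) {
    if (m->data == NULL) return;
    for (size_t i = 0; i < m->rows; i++) free(m->data[i]);
    free(m->data);
    m->data = NULL;
}

/* Write identical values into both layouts so their sums can be compared. */
void fill_both(FlatMatrix *f, NestedMatrix *n) {
    assert(f->rows == n->rows && f->cols == n->cols);
    for (size_t i = 0; i < f->rows; i++)
        for (size_t j = 0; j < f->cols; j++) {
            double v = (double)(i * f->cols + j);
            f->data[i * f->cols + j] = v;
            n->data[i][j] = v;
        }
}

/* Row-major traversal of the flat layout: a single linear sweep through memory. */
double flat_sum(const FlatMatrix *m) {
    double s = 0.0;
    size_t n = m->rows * m->cols;
    for (size_t k = 0; k < n; k++) s += m->data[k];
    return s;
}

/* Same visiting order, but each row first requires loading its row pointer. */
double nested_sum(const NestedMatrix *m) {
    double s = 0.0;
    for (size_t i = 0; i < m->rows; i++) {
        const double *row = m->data[i];
        for (size_t j = 0; j < m->cols; j++) s += row[j];
    }
    return s;
}
```

To take measurements similar to the article's, one could wrap each `*_sum` call in `clock()` timestamps (from `<time.h>`) inside `main` and keep the returned sums in use so the compiler cannot discard the loops. As a rough sense of scale, at the reported ~0.002734 s per traversal, 1,000,000 traversals would save about 2,734 s, or roughly 46 minutes.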