INTERNAL
Market Basket Analysis (Node.js Big Data Experiment)

Data Mining, Apriori Algorithm, Market Basket Analysis, Backend, Big Data, Algorithm Optimization
1. Problem
Following an academic study I conducted on Market Basket Analysis, I wanted to test the practical limits of Node.js for CPU-bound, heavy data processing. I had a massive dataset (>800,000 transactions) from the Sasuai App project sitting idle. The engineering question was: can a single-threaded JavaScript runtime efficiently execute the Apriori algorithm at Big Data scale without crashing?
2. Solution
Engineered a headless, experimental data mining engine written entirely in Node.js. It implements the Apriori algorithm to scan all 800k+ transactions, calculate Support and Confidence metrics, and extract hidden consumer purchasing patterns as association rules.
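For reference, these are the standard textbook definitions of the two metrics. Here $T$ is the set of all transactions (in this dataset $|T| = 809{,}571$) and $X$, $Y$ are disjoint itemsets:

$$
\operatorname{support}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert}
\qquad
\operatorname{confidence}(X \Rightarrow Y) = \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
$$

For example, a rule $X \Rightarrow Y$ with confidence $0.6$ means that 60% of the baskets containing $X$ also contain $Y$.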
3. Architecture
- Runtime Environment: Node.js (Focus on V8 Engine memory management)
- Dataset: 809,571 transactions (66,503 unique SKUs) exported from PostgreSQL
- Algorithm: Apriori (Market Basket Analysis)
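The sketch below shows one way exported rows might be loaded into memory before mining. It is a minimal example that assumes the PostgreSQL export yields one row per line item, with hypothetical `transactionId` and `sku` fields. It is not the project's actual loader.

```javascript
// Assumption: each exported row is one line item { transactionId, sku }.
// The field names are hypothetical; adapt them to the real export columns.

/**
 * Groups flat line-item rows into baskets: transactionId -> Set of SKUs.
 * A Set de-duplicates SKUs that appear more than once in a transaction.
 * @param {Array<{transactionId: string, sku: string}>} rows
 * @returns {Map<string, Set<string>>}
 */
function groupIntoBaskets(rows) {
  const baskets = new Map();
  for (const { transactionId, sku } of rows) {
    let basket = baskets.get(transactionId);
    if (basket === undefined) {
      basket = new Set();
      baskets.set(transactionId, basket);
    }
    basket.add(sku);
  }
  return baskets;
}

// Usage with a tiny, made-up sample
const rows = [
  { transactionId: "T1", sku: "SKU-A" },
  { transactionId: "T1", sku: "SKU-B" },
  { transactionId: "T1", sku: "SKU-A" }, // duplicate line item
  { transactionId: "T2", sku: "SKU-B" },
];
const baskets = groupIntoBaskets(rows);
console.log(baskets.size); // 2
console.log(baskets.get("T1").size); // 2
```

Representing each basket as a `Set` makes "does this transaction contain item X?" a hash lookup rather than a linear scan.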
4. Key Engineering Decisions
- Node.js for Heavy Compute: Purposefully chose Node.js over Python/Pandas as an architectural stress test. Because the workload is CPU-bound, it required carefully chunked asynchronous iteration to avoid blocking the event loop while enumerating massive numbers of candidate item combinations.
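- Aggressive Algorithmic Pruning: To keep the V8 engine from hitting Out-Of-Memory (OOM) limits during the combinatorial explosion of candidate itemsets, I applied strict Minimum Support Thresholds early in the pipeline. By the Apriori property, any itemset that contains an infrequent item is itself infrequent, so discarding low-support items before generating pairs and triplets removes entire branches of the search space.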
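The two decisions above can be sketched together as follows. The first pass counts single-item frequencies in fixed-size chunks and yields to the event loop between chunks via `setImmediate`. The result is then pruned by a minimum support threshold. This is an illustrative sketch, not the project's code: the function names, the chunk size, and the threshold in the usage are assumptions.

```javascript
// Sketch only: names, chunk size, and minSupport values are illustrative.

// Resolve on a later turn of the event loop so pending callbacks can run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

/**
 * Counts how many transactions contain each item, processing `chunkSize`
 * transactions per synchronous slice before yielding.
 * @param {Array<Set<string>>} transactions
 * @param {number} chunkSize
 * @returns {Promise<Map<string, number>>}
 */
async function countItemsChunked(transactions, chunkSize) {
  const counts = new Map();
  for (let start = 0; start < transactions.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, transactions.length);
    for (let i = start; i < end; i++) {
      for (const item of transactions[i]) {
        counts.set(item, (counts.get(item) ?? 0) + 1);
      }
    }
    await yieldToEventLoop(); // keep the process responsive between chunks
  }
  return counts;
}

/**
 * Keeps only items whose support (count / total) is at least minSupport.
 * By the Apriori property, no superset of a discarded item can be frequent.
 */
function pruneBySupport(counts, totalTransactions, minSupport) {
  const frequent = new Map();
  for (const [item, count] of counts) {
    if (count / totalTransactions >= minSupport) frequent.set(item, count);
  }
  return frequent;
}

// Usage with made-up baskets
const transactions = [
  new Set(["A", "B"]),
  new Set(["A", "C"]),
  new Set(["A", "B"]),
  new Set(["D"]),
];
countItemsChunked(transactions, 2).then((counts) => {
  const frequent = pruneBySupport(counts, transactions.length, 0.5);
  console.log([...frequent.keys()]); // [ 'A', 'B' ]
});
```

Yielding between chunks keeps timers and I/O responsive, but it does not parallelize the work: total CPU time stays the same, since everything still runs on the single main thread.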
5. Challenges
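- The sheer memory bottleneck. The number of candidate itemsets grows combinatorially: 66,503 unique items allow $\binom{66503}{2} \approx 2.2 \times 10^{9}$ possible pairs and $\binom{66503}{3} \approx 4.9 \times 10^{13}$ possible triplets. Without pruning, candidate generation threatened constant heap crashes.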
- Optimizing data structures (using Maps and Sets instead of standard Arrays) to achieve $O(1)$ average-case lookup times during the heavy Counting phase.
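As an illustration of the Counting phase, here is a minimal sketch of the second Apriori pass. It counts co-occurring pairs using a `Map` keyed by a sorted, separator-joined pair, and it only considers items held in a `Set` of frequent items. The function name and the `"\u0000"` separator are assumptions chosen for the example. Any character that never appears in a SKU would work as the separator.

```javascript
// Sketch only: countPairs and SEP are hypothetical names.
// SEP must be a character that never occurs inside a SKU.
const SEP = "\u0000";

/**
 * Counts co-occurrences of frequent-item pairs across all baskets.
 * @param {Array<Set<string>>} transactions baskets of SKUs
 * @param {Set<string>} frequentItems items that passed minimum support
 * @returns {Map<string, number>} key "a" + SEP + "b" (a < b) -> count
 */
function countPairs(transactions, frequentItems) {
  const pairCounts = new Map();
  for (const basket of transactions) {
    // Set.has is an average O(1) hash lookup, unlike Array.includes (O(n)).
    const items = [...basket].filter((item) => frequentItems.has(item)).sort();
    for (let i = 0; i < items.length; i++) {
      for (let j = i + 1; j < items.length; j++) {
        // Items are sorted, so {a, b} and {b, a} map to the same key.
        const key = items[i] + SEP + items[j];
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }
  return pairCounts;
}

// Usage with made-up baskets; only A and B are treated as frequent.
const transactions = [
  new Set(["A", "B", "C"]),
  new Set(["B", "A"]),
  new Set(["A", "C"]),
];
const pairs = countPairs(transactions, new Set(["A", "B"]));
console.log(pairs.get("A" + SEP + "B")); // 2
```

Filtering each basket against the frequent-item `Set` before enumerating pairs is where pruning pays off. Item C never produces a pair key, so its pairs consume no memory.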
6. Result
- Successfully processed the entire 800k+ transaction dataset in Node.js without out-of-memory crashes, demonstrating that the runtime can handle this kind of data mining workload when the algorithm and data structures are optimized correctly.
- Bonus Insight: The experiment generated cross-selling recommendations and high-demand product bundles backed by Support and Confidence metrics (e.g., specific item pairings with NASI PUTIH), ready for real-world marketing application.
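Turning counted pairs into cross-selling recommendations can be sketched as follows. Each frequent pair {a, b} yields two directional rules, a ⇒ b and b ⇒ a. Each rule is scored by Support and Confidence and kept only if it clears a confidence threshold. The input shapes, the `generateRules` name, and all counts in the usage are hypothetical.

```javascript
// Sketch only: input shapes, names, and the sample counts are made up.

/**
 * Derives directional rules lhs => rhs from pair co-occurrence counts.
 * @param {Map<string, number>} itemCounts transactions containing each item
 * @param {Array<{a: string, b: string, count: number}>} pairCounts
 * @param {number} totalTransactions
 * @param {number} minConfidence
 */
function generateRules(itemCounts, pairCounts, totalTransactions, minConfidence) {
  const rules = [];
  for (const { a, b, count } of pairCounts) {
    // A pair {a, b} yields two candidate rules: a => b and b => a.
    for (const [lhs, rhs] of [[a, b], [b, a]]) {
      // confidence = support(lhs ∪ rhs) / support(lhs); the totals cancel.
      const confidence = count / itemCounts.get(lhs);
      if (confidence >= minConfidence) {
        rules.push({ lhs, rhs, support: count / totalTransactions, confidence });
      }
    }
  }
  // Strongest rules first.
  return rules.sort((x, y) => y.confidence - x.confidence);
}

// Usage with made-up counts over 10 transactions
const itemCounts = new Map([["SKU-A", 4], ["SKU-B", 2]]);
const rules = generateRules(itemCounts, [{ a: "SKU-A", b: "SKU-B", count: 2 }], 10, 0.6);
console.log(rules.length); // 1 (SKU-B => SKU-A; SKU-A => SKU-B has confidence 0.5)
```

The asymmetry in the example matters in practice. Buyers of SKU-B always bought SKU-A, but only half of SKU-A's buyers bought SKU-B, so only one direction is worth recommending.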
7. Future Improvements
- Port the exact same algorithm to a multi-threaded, memory-safe language like Rust or Go to conduct a direct performance and memory-usage benchmark against the Node.js implementation.