INTERNAL

Market Basket Analysis (Node.js Big Data Experiment)

Data Mining, Apriori Algorithm, Market Basket Analysis, Backend, Big Data, Algorithm Optimization

1. Problem

Following an academic study I conducted on Market Basket Analysis, I wanted to test the practical limits of Node.js on CPU-bound, heavy data processing. I had a large dataset (more than 800,000 transactions) from the Sasuai App project sitting idle. The engineering question was: can a single-threaded JavaScript runtime run the Apriori algorithm at this scale efficiently and without crashing?

2. Solution

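Engineered a headless, experimental data mining engine written entirely in Node.js. It implements the Apriori algorithm to scan the 800k+ transaction rows, compute Support and Confidence for candidate itemsets, and extract recurring consumer purchasing patterns. Support is the share of all transactions that contain an itemset; Confidence for a rule A -> B is the share of transactions containing A that also contain B.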
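As a rough illustration of the two metrics, here is a minimal sketch on toy data. The item names, the `countContaining`, `support`, and `confidence` helpers, and the in-memory array layout are all assumptions made for this example; they are not taken from the project's code.

```javascript
// Toy transactions; the item names are placeholders, not the real dataset.
const transactions = [
  ["rice", "egg", "tea"],
  ["rice", "egg"],
  ["rice", "tea"],
  ["egg", "tea"],
  ["rice", "egg", "tea"],
];

// Number of transactions that contain every item in `itemset`.
function countContaining(transactions, itemset) {
  let hits = 0;
  for (const t of transactions) {
    const basket = new Set(t);
    if (itemset.every((item) => basket.has(item))) hits += 1;
  }
  return hits;
}

// Support: fraction of all transactions that contain the itemset.
function support(transactions, itemset) {
  return countContaining(transactions, itemset) / transactions.length;
}

// Confidence of the rule antecedent -> consequent:
// count(antecedent and consequent together) / count(antecedent).
function confidence(transactions, antecedent, consequent) {
  const base = countContaining(transactions, antecedent);
  if (base === 0) return 0;
  return countContaining(transactions, [...antecedent, ...consequent]) / base;
}

const supRiceEgg = support(transactions, ["rice", "egg"]); // 3 of 5 baskets
const confRiceToEgg = confidence(transactions, ["rice"], ["egg"]); // 3 of the 4 rice baskets
console.log({ supRiceEgg, confRiceToEgg });
```

This version rescans every transaction for each query, which is fine for showing the definitions but would be far too slow for 800k+ rows; a real pass would count all candidates in a single scan.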

3. Architecture

  • Runtime Environment: Node.js (Focus on V8 Engine memory management)
  • Dataset: 809,571 transactions (66,503 unique SKUs) from PostgreSQL export
  • Algorithm: Apriori (Market Basket Analysis)
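Since the architecture centers on V8 memory management, one way to watch heap headroom during a long run might look like the sketch below. `heapReport` is a hypothetical helper name; `process.memoryUsage()` and `v8.getHeapStatistics()` are standard Node.js APIs, but whether the project used them is not stated here.

```javascript
const v8 = require("v8");

// Hypothetical helper: compare current JS heap usage with V8's heap limit.
function heapReport() {
  const { heapUsed } = process.memoryUsage(); // bytes currently used by JS objects
  const { heap_size_limit: limit } = v8.getHeapStatistics(); // max size the heap may grow to
  const MB = 1024 * 1024;
  return {
    usedMB: heapUsed / MB,
    limitMB: limit / MB,
    usedFraction: heapUsed / limit,
  };
}

const report = heapReport();
console.log(
  `heap: ${report.usedMB.toFixed(1)} MB of ${report.limitMB.toFixed(0)} MB ` +
    `(${(report.usedFraction * 100).toFixed(1)}%)`
);
```

Logging a report like this between Apriori passes would show how close each pass comes to the limit. The limit itself can be raised with Node's `--max-old-space-size` flag, although that only postpones an out-of-memory error if the candidate set keeps growing.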

4. Key Engineering Decisions

  • Node.js for Heavy Compute: Purposefully chose Node.js over Python/Pandas as an architectural stress test. It required carefully optimized asynchronous iteration to avoid blocking the event loop while generating very large numbers of item combinations.
  • Aggressive Algorithmic Pruning: To keep the V8 heap from hitting Out-Of-Memory (OOM) limits during the combinatorial explosion of candidate itemsets, I applied a strict Minimum Support threshold early in the pipeline, so that infrequent items are discarded before larger candidates are generated.
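The two decisions above can be sketched together as follows. This is an illustrative outline under stated assumptions, not the project's implementation: the function names, the `chunkSize` parameter, the use of `setImmediate` for yielding, and the `"|"` key separator (which assumes item names never contain `|`) are all hypothetical choices.

```javascript
// Pass 1: count single items and keep only those that meet the minimum count.
function frequentItems(transactions, minCount) {
  const counts = new Map();
  for (const t of transactions) {
    for (const item of new Set(t)) {
      counts.set(item, (counts.get(item) || 0) + 1);
    }
  }
  const frequent = new Set();
  for (const [item, n] of counts) {
    if (n >= minCount) frequent.add(item);
  }
  return frequent;
}

// Resolves on a later turn of the event loop so pending I/O and timers can run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Pass 2: count pairs built only from frequent items,
// yielding to the event loop every `chunkSize` transactions.
async function countFrequentPairs(transactions, minCount, chunkSize = 10000) {
  const frequent = frequentItems(transactions, minCount);
  const pairCounts = new Map();
  for (let i = 0; i < transactions.length; i++) {
    // Pruning step: a pair can only be frequent if both of its items are.
    const items = [...new Set(transactions[i])]
      .filter((item) => frequent.has(item))
      .sort();
    for (let a = 0; a < items.length; a++) {
      for (let b = a + 1; b < items.length; b++) {
        const key = items[a] + "|" + items[b]; // sorted, so each pair has one key
        pairCounts.set(key, (pairCounts.get(key) || 0) + 1);
      }
    }
    if ((i + 1) % chunkSize === 0) await yieldToEventLoop();
  }
  // Drop pairs that do not reach the threshold themselves.
  for (const [key, n] of pairCounts) {
    if (n < minCount) pairCounts.delete(key);
  }
  return pairCounts;
}
```

Yielding does not make the counting faster; it only keeps the process responsive between chunks. The pruning step is what bounds memory, because pairs are only ever created from items that already passed the threshold.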

5. Challenges

  • The memory bottleneck. With 66,503 unique items there are roughly 2.2 billion possible pairs and tens of trillions of possible triplets, so naive candidate generation grows combinatorially and risks exhausting the heap.
  • Optimizing data structures: using Maps and Sets instead of plain Arrays to get average-case $O(1)$ lookups during the heavy counting phase.
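To illustrate why hash-based structures help in the counting phase, the sketch below counts triplets only when all three of their sub-pairs are already known to be frequent, using a `Set` for the membership checks and a `Map` for the counters. The `keyOf` and `countTriplets` names, the toy items, and the `"|"` separator are assumptions made for this example.

```javascript
// Canonical key for an unordered itemset: sorted items joined by a separator.
// Assumes item names never contain "|".
const keyOf = (items) => [...items].sort().join("|");

// Count triplets whose three sub-pairs are all present in `frequentPairs`.
function countTriplets(transactions, frequentPairs) {
  const counts = new Map();
  for (const t of transactions) {
    const items = [...new Set(t)].sort();
    for (let a = 0; a < items.length; a++) {
      for (let b = a + 1; b < items.length; b++) {
        // Set lookup instead of scanning an array of known pairs.
        if (!frequentPairs.has(items[a] + "|" + items[b])) continue;
        for (let c = b + 1; c < items.length; c++) {
          if (
            frequentPairs.has(items[a] + "|" + items[c]) &&
            frequentPairs.has(items[b] + "|" + items[c])
          ) {
            const key = items[a] + "|" + items[b] + "|" + items[c];
            counts.set(key, (counts.get(key) || 0) + 1);
          }
        }
      }
    }
  }
  return counts;
}

const frequentPairs = new Set([
  keyOf(["rice", "egg"]),
  keyOf(["rice", "tea"]),
  keyOf(["egg", "tea"]),
]);
const tripletCounts = countTriplets(
  [
    ["tea", "rice", "egg"],
    ["rice", "egg", "tea", "salt"],
    ["rice", "egg"],
  ],
  frequentPairs
);
console.log(tripletCounts);
```

With an array of frequent pairs, each membership check would scan the whole list; with a `Set`, the cost per check stays roughly constant no matter how many pairs survive pruning.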

6. Result

  • Processed the entire 800k+ transaction dataset in Node.js without memory crashes, showing that the runtime can handle this kind of data mining task when the implementation is optimized carefully.
  • Bonus Insight: The experiment produced cross-selling recommendations and high-demand product bundles backed by Support and Confidence metrics (e.g., specific item pairings with NASI PUTIH), ready for real-world marketing use.

7. Future Improvements

  • Port the same algorithm to a multi-threaded, memory-safe language such as Rust or Go, then run a direct performance and memory-usage benchmark against the Node.js implementation.
