How to analyze Amazon reviews at scale.
Most brands start with a spreadsheet. It works well enough for a few hundred reviews. Somewhere between 500 and 2,000 reviews per SKU, the spreadsheet becomes a bottleneck disguised as a dashboard. This post walks through the stack that scales — what each layer actually does, what to build versus buy, and the mistakes that look like shortcuts but aren't.
The short answer
Analyzing Amazon reviews at scale requires four layers: ingestion (pulling reviews from ASINs via API or partner feed), normalization (tying ASINs to internal Model# and UPC), classification (theme and sentiment tagging), and retrieval (letting analysts query the corpus by SKU, theme, or channel). Spreadsheets handle the first few hundred reviews; dedicated platforms handle the rest. The bottleneck at scale is almost always linking ASINs to your own SKU schema.
The spreadsheet is not wrong — it is a phase.
A product manager opens a CSV of 400 reviews on a flagship ASIN. Sorts by rating. Reads the one-star and two-star reviews. Writes down themes. Tags each review by theme. Builds a pivot table. This works. It is also an upper bound. Once that manager has 4,000 reviews across twelve ASINs, the read-and-tag loop collapses under its own weight — not because the tool is wrong, but because the workload has moved past a person's reading capacity.
The instinct at this point is to graduate to a bigger spreadsheet or Airtable. This is the detour. The real scaling problem is not data volume. It is that the next question after "what are reviews saying about Model 7?" is "what are reviews saying about Model 7 versus Model 6, on Amazon versus Walmart, this quarter versus last, by theme?" Spreadsheets cannot hold that shape.
The four layers of a production stack.
Layer 1 — Ingestion
Pulling Amazon reviews is more awkward than it should be. Amazon's public review APIs are limited, and the Product Advertising API does not give you review text directly. Practical options:
- Seller/Vendor Central data — if you are a registered seller or vendor, the Amazon APIs under Seller Central provide review-level data for your own ASINs. Rate-limited, but authoritative. This is the default for brands selling their own products.
- Partner programs — Bazaarvoice, PowerReviews, and similar review networks syndicate reviews across retailers including Amazon for some categories. If you already have access, turn it on.
- Licensed feeds — several data providers sell structured Amazon review data feeds. Useful for competitive analysis where you do not own the ASINs.
- Scraping — fragile, often against ToS, and rate-limit-hostile at scale. Do not build production on scraping; it breaks within a few weeks and the maintenance cost exceeds any buy-versus-build calculus.
For brands with their own ASINs, option 1 is the starting point. For competitive work, licensed feeds beat scraping by a wide margin.
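To make the ingestion layer concrete, here is a minimal sketch in Python assuming a licensed feed with a paginated JSON endpoint. The URL, field names, and the REVIEW_FEED_TOKEN variable are hypothetical placeholders, not any specific provider's API; the same loop shape applies to partner feeds once you swap in real auth, field names, and pagination details.

```python
import os
import time
import requests

FEED_URL = "https://example-review-feed.com/v1/reviews"  # hypothetical licensed-feed endpoint
API_TOKEN = os.environ["REVIEW_FEED_TOKEN"]              # hypothetical credential

def fetch_reviews(asin: str, since: str) -> list[dict]:
    """Pull all reviews for one ASIN since a given date, following pagination."""
    reviews, cursor = [], None
    while True:
        params = {"asin": asin, "since": since}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(
            FEED_URL,
            params=params,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        reviews.extend(payload["reviews"])
        cursor = payload.get("next_cursor")
        if not cursor:
            return reviews
        time.sleep(1)  # stay inside the provider's rate limits
```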
Layer 2 — Normalization
This is where most stacks fail. Amazon indexes reviews by ASIN. You index your catalog by Model# and UPC. The same physical product can have three or more ASINs — one for each pack size, one for a variation, sometimes one for an Amazon-specific exclusive. Reviews live under the ASIN, but your product decisions live at Model# level. If your stack cannot resolve ASIN → Model# cleanly, you will be reading reviews for "a product" rather than "the product you actually ship."
The normalization work is mechanical but tedious: map each ASIN to your UPC and Model#, handle the variations, flag the parent/child ASIN relationships, and maintain the map as the catalog changes. Do this once, keep it in a table, and every downstream layer benefits.
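A minimal sketch of what that map and lookup can look like, assuming the table lives in SQLite; the table and column names below are illustrative, not a required schema.

```python
import sqlite3

conn = sqlite3.connect("catalog_map.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS asin_map (
        asin        TEXT PRIMARY KEY,   -- Amazon identifier the reviews are keyed on
        parent_asin TEXT,               -- parent ASIN for variation children, if any
        model       TEXT,               -- your internal Model# (may live on the parent row)
        upc         TEXT,               -- your internal UPC
        pack_size   TEXT                -- why several ASINs collapse to one Model#
    )
""")

def resolve(asin: str) -> tuple[str, str] | None:
    """Resolve an ASIN to (Model#, UPC), walking up to the parent ASIN if needed."""
    row = conn.execute(
        "SELECT model, upc, parent_asin FROM asin_map WHERE asin = ?", (asin,)
    ).fetchone()
    if row is None:
        return None              # unmapped ASIN: surface it, don't silently drop the reviews
    model, upc, parent = row
    if model and upc:
        return (model, upc)
    if parent:
        return resolve(parent)   # child variation carries its mapping on the parent row
    return None
```

Every downstream layer then joins reviews to your catalog through this one table, which is also the thing you maintain as ASINs are added or retired.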
The Amazon review problem is not text analysis. It is the mapping between retailer identifiers and the catalog you actually manufacture.
Layer 3 — Classification
Once normalized, every review needs a theme and a sentiment label. At low volumes, human tagging works. At scale — tens of thousands of reviews a month — automated classification becomes necessary. Three approaches in practice:
- Keyword-and-rules — fastest to deploy, useful for narrow themes ("battery," "charger," "packaging"), bad at aspect-based nuance.
- Supervised ML — classical classifiers (SVM, logistic regression) trained on your own labeled corpus. Cheap at inference, require labeling investment upfront.
- LLM-assisted classification — use an LLM with a structured prompt to produce theme and aspect labels. Better at nuance, higher per-call cost, can hallucinate if not grounded.
The production pattern that works is hybrid. Route the predictable 80% of reviews through a deterministic classifier, send the ambiguous 20% through an LLM pass, and log everything so you can evaluate and retrain. This keeps per-review cost predictable and accuracy high. A naive LLM-only pipeline burns budget at volume.
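A minimal sketch of that routing logic in Python. The keyword rules, the confidence floor, and the classify_with_llm stub are illustrative placeholders for whatever deterministic classifier and LLM client you actually run.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.85  # below this, the deterministic answer is not trusted

# Illustrative keyword rules standing in for whatever deterministic classifier you run.
THEME_KEYWORDS = {
    "battery":   ["battery", "charge", "won't turn on"],
    "packaging": ["box", "packaging", "arrived damaged"],
}

@dataclass
class Label:
    theme: str
    sentiment: str
    source: str  # "rules" or "llm", kept so accuracy can be evaluated per path

def cheap_classifier(text: str, stars: int) -> tuple[str, str, float]:
    """Stand-in deterministic pass: keyword themes plus star-derived sentiment."""
    lowered = text.lower()
    sentiment = "negative" if stars <= 2 else "positive" if stars >= 4 else "neutral"
    for theme, words in THEME_KEYWORDS.items():
        if any(w in lowered for w in words):
            return theme, sentiment, 0.9  # a rule fired: treat as confident
    return "other", sentiment, 0.3        # nothing fired: low confidence, escalate

def classify_with_llm(text: str) -> tuple[str, str]:
    """Placeholder for a structured-prompt LLM call; wire in your own client and prompt."""
    raise NotImplementedError("call your LLM with a structured prompt and parse its JSON output")

def classify_review(text: str, stars: int) -> Label:
    """Route predictable reviews through the cheap path, ambiguous ones through the LLM."""
    theme, sentiment, confidence = cheap_classifier(text, stars)
    if confidence >= CONFIDENCE_FLOOR:
        return Label(theme, sentiment, source="rules")
    theme, sentiment = classify_with_llm(text)
    return Label(theme, sentiment, source="llm")
```

The `source` field is the cheap insurance here: it lets you measure accuracy and cost per path and move themes between paths as the rules mature.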
Layer 4 — Retrieval
The last layer is what analysts and product teams actually touch. It must answer SKU-and-theme questions quickly. Two surfaces matter:
- Structured queries — dashboards and filters that let a PM answer "negative reviews on Model 7, in the last 90 days, by theme" (a query of this shape is sketched after this list).
- Natural-language queries — "What do reviewers complain about on Model 7 compared to Model 6?" — answered with citations to actual reviews. This is where grounded LLM retrieval wins; free-text summaries without citations are not defensible in a product review with leadership.
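To make the structured-query surface concrete, here is a minimal sketch assuming the classified, normalized reviews land in a SQLite table keyed by Model#; the table and column names are illustrative. The natural-language surface sits on top of the same store: retrieve candidate reviews with a query like this, pass them to the LLM, and require the answer to cite the review IDs it was given.

```python
import sqlite3

conn = sqlite3.connect("reviews.db")

def negative_themes(model: str, days: int = 90) -> list[tuple[str, int]]:
    """Count negative reviews for one Model# over a trailing window, grouped by theme."""
    return conn.execute(
        """
        SELECT theme, COUNT(*) AS n
        FROM reviews
        WHERE model = ?
          AND sentiment = 'negative'
          AND review_date >= date('now', ?)
        GROUP BY theme
        ORDER BY n DESC
        """,
        (model, f"-{days} days"),
    ).fetchall()

# e.g. negative_themes("Model 7") -> [("battery", 42), ("packaging", 11), ...]
```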
Skip the stack. Indellia is the four layers above, native for Amazon, Walmart, Best Buy, Costco, Lowe's, Target, and more. Every review tied to your Model# and UPC.
Mistakes that look like shortcuts.
Analyzing star ratings instead of review text. Star ratings are a lossy summary. They tell you "how bad" but not "about what." A 1-star review complaining about shipping and a 1-star review reporting a safety defect look identical by rating and different in consequence. The text is where the signal lives.
Ignoring Bazaarvoice-powered retailers. A surprising share of your non-Amazon retail reviews live in Bazaarvoice. Walmart.com, Target, Home Depot, Lowe's, and many others syndicate through it. If you are only analyzing Amazon, you are missing a comparably sized dataset that is already yours for the asking.
Monthly batches instead of streams. A monthly batch analysis misses the two-week spike that tells you the Model 7 battery issue started with a specific production run. Continuous ingestion and anomaly detection beat monthly batches for anything that matters operationally.
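As a sketch of what continuous ingestion buys you, here is a simple spike check over daily negative-review counts per Model#, flagging days that sit well above a trailing baseline; the window and threshold are illustrative, not tuned values.

```python
import pandas as pd

def flag_spikes(daily_counts: pd.Series, window: int = 28, z_threshold: float = 3.0) -> pd.Series:
    """Flag days where negative-review volume jumps well above its trailing baseline.

    daily_counts: negative reviews per day for one Model#, indexed by date.
    """
    baseline = daily_counts.rolling(window, min_periods=7).mean().shift(1)
    spread = daily_counts.rolling(window, min_periods=7).std().shift(1)
    z = (daily_counts - baseline) / spread.replace(0, 1)  # avoid divide-by-zero on quiet SKUs
    return z > z_threshold

# A flagged two-week run of battery complaints on Model 7 is what points back to a production lot.
```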
When to build versus buy.
Build when you have unusual internal identifier logic that no vendor will support, or when your legal or compliance posture requires the pipeline to be in-house. Buy when none of that applies — which is most consumer brands. The initial build is engineering-intensive; the ongoing maintenance is worse. For most brands, the right answer is a dedicated platform that already has Amazon ingestion, SKU normalization, classification, and retrieval in one place, plus the same capability for Walmart, Best Buy, Costco, Bazaarvoice-powered retailers, Zendesk, Loop Returns, and the rest. That is the shape of Indellia, which exists because building this well the first time takes about three years and an NEC Labs' worth of research.
Have a specific question?
Indellia's AI agents answer with citations from real customer feedback across Amazon, Walmart, Best Buy, and 20+ retail channels.
Analyze your Amazon corpus in one place.
Ingestion, SKU normalization, theme and sentiment, natural-language retrieval. Every ASIN linked to your Model#.