The short answer
Sentiment analysis for product reviews is the computational classification of review text by polarity (positive, neutral, negative) and intensity. Five methods dominate in 2026: keyword-based (fast, brittle), classical ML (good baseline), deep learning with BERT-family models (reliable workhorse), LLM-based (flexible, variable), and hybrid systems that combine them. For consumer brands, aspect-based sentiment — polarity per product feature — matters more than document-level polarity alone.
What sentiment analysis is, as a discipline
Sentiment analysis is the computational task of assigning polarity and intensity scores to text. Document-level sentiment scores the whole review. Sentence-level sentiment scores individual sentences. Aspect-level sentiment scores polarity separately for each product feature mentioned in the review.
For consumer brands, aspect-level is the one that matters. A review reads: "The camera body is great but the battery drains way too fast." Document-level scoring flattens this to neutral. Aspect-level scoring captures what the reviewer actually said — positive on design, negative on battery life. Those two signals route to different teams.
Why product reviews are harder than general sentiment
Four factors. Product reviews use domain-specific vocabulary ("loosey-goosey," "OOB" for "out of the box") that generic sentiment models weren't trained for. They contain negation and sarcasm at higher rates than most text ("works great if you don't mind the battery dying"). They express mixed sentiment ("love the design, hate the app") in the same sentence. And they often include specific product features that sentiment models treat as noise unless trained otherwise.
A sentiment model that scores 92% accuracy on IMDB movie reviews can drop to 68% on camera reviews without retraining. The gap is domain adaptation, and any vendor pitch that claims "92% accuracy" without naming the test set is hiding it.
Method 1 — Keyword-based
The simplest approach. Maintain a list of positive words ("great," "love," "excellent") and negative words ("broken," "disappointed," "terrible"). Score each review as the difference between counts. Fast, interpretable, terrible at negation and sarcasm.
Where it works: product-category dashboards that only need rough aggregate signal. Where it fails: aspect-level classification, any case where nuance matters.
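A minimal sketch of the counting approach (the word lists here are toy examples, not a usable lexicon). Note that the sarcastic review from earlier scores as neutral, because the method cannot see negation:

```python
# Toy keyword-based polarity scorer. Word lists are illustrative only.
POSITIVE = {"great", "love", "excellent", "perfect"}
NEGATIVE = {"broken", "disappointed", "terrible", "awful"}

def keyword_polarity(review: str) -> str:
    """Score a review as positive-word count minus negative-word count."""
    tokens = review.lower().replace(",", " ").replace(".", " ").split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The third case below is exactly the failure mode the method is known for: "great" and "broken" cancel out, and the negation ("if you don't mind") is invisible to a bag of keywords.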
Method 2 — Classical ML (logistic regression, SVM, gradient boosting)
Train a classifier on labeled examples. Features are usually bag-of-words, n-grams, or TF-IDF. Better than keyword approaches because the model learns context (which words matter in combination, which are spurious). Requires a training corpus — for product reviews, usually 5,000–50,000 labeled examples to hit credible accuracy in a specific category.
Still the default for production systems where latency matters (classical ML inference is milliseconds, versus hundreds of milliseconds for neural approaches).
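A minimal sketch of this pipeline using scikit-learn, with a four-review toy corpus standing in for the thousands of labeled examples a real category model needs:

```python
# TF-IDF features + logistic regression (scikit-learn). The corpus is a toy
# stand-in; credible accuracy needs thousands of labeled reviews per category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_reviews = [
    "love the camera, excellent photos",
    "great battery life, very happy",
    "screen broke after a week, terrible",
    "disappointed, stopped working fast",
]
train_labels = ["positive", "positive", "negative", "negative"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_reviews, train_labels)

print(clf.predict(["terrible screen, very disappointed"])[0])
```

The n-gram features are what buy the improvement over keyword counting: the model learns which word combinations carry signal rather than relying on a fixed lexicon.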
Method 3 — Deep learning (BERT-era)
Fine-tuned BERT-family models (DistilBERT, RoBERTa, DeBERTa) on product-review corpora. Accuracy meaningfully beats classical ML on held-out test sets for review data. Handles negation, context, and multi-sentence reasoning reasonably well. Latency is in the 100–300ms range per review on GPU inference, more on CPU.
For the last five years, fine-tuned BERT has been the default "professional" choice for production sentiment systems. It is still the workhorse in 2026 for systems that need predictable accuracy on known domains.
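A sketch of what inference looks like with the Hugging Face `transformers` library. The checkpoint named here is a public SST-2 (movie review) model used purely for illustration; given the domain-adaptation gap discussed above, a production system would fine-tune on in-domain review data. The neutral-band helper is an assumption of ours, not part of the library:

```python
# BERT-family inference via the Hugging Face transformers pipeline. The model
# name is a public SST-2 checkpoint, shown for illustration only.
def classify_reviews(reviews, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    from transformers import pipeline  # heavyweight dependency, imported locally
    clf = pipeline("sentiment-analysis", model=model_name)
    return [(r["label"].lower(), r["score"]) for r in clf(reviews)]

def with_neutral_band(label: str, score: float, threshold: float = 0.7) -> str:
    # Binary SST-style models emit only positive/negative. Treating low-confidence
    # predictions as neutral avoids forcing mixed reviews to one pole.
    return label if score >= threshold else "neutral"
```

The threshold value is a tuning knob, not a standard: set it against a held-out test set for your category, not by intuition.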
Method 4 — LLM-based (zero-shot and few-shot)
Send the review text plus a prompt ("classify this review as positive, negative, or neutral") to a large language model (GPT-4-family, Claude-family, Gemini-family) and parse the output. Zero-shot means no training; few-shot means including 3–10 examples in the prompt to shape the classification behavior.
Advantages: no training data needed, flexible for new categories, handles mixed sentiment well when prompted properly. Disadvantages: per-record cost (fractions of a cent each, which adds up at hundreds of thousands of reviews), latency (typically hundreds of milliseconds to seconds per call), variability (the same review can get different scores across runs unless temperature is fixed), and the ever-present hallucination risk on queries more complex than simple polarity.
As of Q1 2026, LLM-based classification is the default for teams that prioritize flexibility over predictability, and for cases where per-aspect judgment is more valuable than raw throughput.
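The two pieces worth getting right in this setup are the prompt and the parser. A sketch, where `call_llm` stands in for whatever client you actually use (the prompt wording and the rejection behavior are our assumptions, not any provider's API):

```python
# Constrained prompt construction plus defensive output parsing for LLM-based
# polarity classification. An actual LLM call (a placeholder here) would sit
# between these two functions, ideally with temperature pinned to 0.
ALLOWED = {"positive", "negative", "neutral"}

def build_prompt(review: str, examples=()) -> str:
    """Zero-shot by default; pass (review, label) pairs for few-shot."""
    shots = "\n".join(f"Review: {r}\nLabel: {l}" for r, l in examples)
    return (
        "Classify the review as exactly one word: positive, negative, or neutral.\n"
        + (shots + "\n" if shots else "")
        + f"Review: {review}\nLabel:"
    )

def parse_label(raw: str) -> str:
    """Reject anything outside the allowed label set rather than guessing."""
    label = raw.strip().lower().rstrip(".")
    return label if label in ALLOWED else "unparseable"
```

Routing unparseable replies to a retry or a fallback classifier, rather than coercing them, is what keeps the variability problem measurable instead of silent.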
Method 5 — Hybrid
The architecture most production VoC systems actually run. Classical ML or BERT at ingestion for the high-volume polarity pass, LLMs for harder judgments (aspect-level sentiment on long reviews, sarcasm detection, edge cases flagged by the first-pass classifier). This is the architecture Indellia's Theme and Sentiment agents use internally — deterministic classical and BERT passes with LLM augmentation only where the first pass has low confidence.
The case for hybrid: most reviews are easy for classical methods (short, explicit sentiment, obvious polarity), so paying LLM cost on every one is wasteful. Save the LLM budget for the reviews that actually need it.
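The routing logic at the heart of this pattern is small. A sketch of confidence-gated escalation (the pattern in general, not any vendor's actual code), with stub classifiers standing in for real models:

```python
# Confidence-gated hybrid routing: cheap first pass on everything, LLM
# escalation only when the first pass is unsure.
def route(review, fast_classifier, llm_classifier, min_confidence=0.8):
    label, confidence = fast_classifier(review)
    if confidence >= min_confidence:
        return label, "fast"                  # confident: keep the cheap result
    return llm_classifier(review), "llm"      # uncertain: escalate to the LLM

# Stubs standing in for a trained classifier and an LLM call.
fast = lambda r: ("positive", 0.95) if "love" in r else ("neutral", 0.40)
llm = lambda r: "negative"
```

Returning the route taken alongside the label matters in production: it lets you track what fraction of traffic escalates, which is the number that decides whether the hybrid is actually saving money.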
Aspect-based sentiment analysis
The thing that actually matters for consumer brands. Aspect-Based Sentiment Analysis (ABSA) classifies polarity per aspect within a single review. A review like "The camera body is great but the battery drains way too fast" gets decomposed into (design, positive) + (battery life, negative).
For product feedback, aspects usually map to product features — battery, display, build quality, packaging, documentation, app, accessories, price-to-value. For services, aspects map to service touchpoints — checkout, shipping, returns, support, warranty.
ABSA requires either an explicit aspect taxonomy (useful for category work — all cameras share "battery," "lens," "sensor") or dynamic aspect extraction (useful when the review surfaces an aspect the taxonomy didn't know about). Modern systems do both: taxonomy-seeded with dynamic extraction for novel aspects.
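The taxonomy-seeded half of that design can be sketched as simple cue matching (the taxonomy entries here are illustrative; a per-aspect polarity pass would follow the extraction step):

```python
# Taxonomy-seeded aspect spotting: map surface mentions to canonical aspects.
# Dynamic extraction for novel aspects would run alongside this lookup.
ASPECT_TAXONOMY = {
    "battery life": ["battery", "charge", "drains"],
    "design": ["body", "build", "design", "looks"],
    "display": ["screen", "display"],
}

def extract_aspects(review: str) -> list[str]:
    text = review.lower()
    return sorted(
        aspect
        for aspect, cues in ASPECT_TAXONOMY.items()
        if any(cue in text for cue in cues)
    )
```

Run against the running example, "The camera body is great but the battery drains way too fast" surfaces both aspects, which is exactly what a per-aspect polarity pass needs as input.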
See the aspect-based sentiment analysis glossary entry for the technical definition.
Try sentiment analysis on your reviews. The free AI Sentiment Analysis Tool runs polarity and aspect-level sentiment on any review set.
Common pitfalls
- Scoring without a test set. "Our sentiment model is 87% accurate" is meaningless without the test set. Always ask: accurate on what data, measured how, against which ground truth?
- Document-level only. A review with mixed sentiment (one aspect positive, one negative) gets classified as "neutral" by document-level scoring, which hides both the positive and the negative signal. Aspect-level is the fix.
- No language handling. A model trained only on English reviews fails on reviews in Spanish, French, or German. For brands with multi-region presence, language detection and per-language models (or a multilingual model like XLM-RoBERTa) are required.
- No drift monitoring. Sentiment distributions shift as product categories evolve. A model deployed in 2023 and not re-trained or audited by 2026 is almost certainly producing meaningfully worse classifications than it did at launch.
- Confusing sentiment with satisfaction. A 5-star review with negative sentiment on two aspects is a real thing — the customer still recommended the product but had specific complaints. Treat sentiment and rating as complementary signals, not one as a shortcut to the other.
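The first pitfall has a concrete fix: never quote an accuracy number you did not compute yourself against a named, held-out test set. A minimal harness (the function name and the pair-based format are our conventions, not a standard):

```python
# Minimal held-out evaluation harness. `model` is any callable review -> label;
# `test_set` is a list of (review_text, ground_truth_label) pairs that the
# model never saw during training.
def held_out_accuracy(model, test_set) -> float:
    correct = sum(model(review) == truth for review, truth in test_set)
    return correct / len(test_set)
```

The same harness also answers the drift pitfall: re-run it on a fresh labeled sample each quarter, and a model quietly degrading since deployment shows up as a falling number rather than a surprise.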