
Sentiment analysis: definitions, methods, and tools.

Sentiment analysis is a discipline with 20+ years of academic and applied history, and a lot of vendor confusion. This reference guide covers the definitions that matter in practice, the methods that produce reliable classifications, evaluation protocols, domain adaptation, and the trade-offs you should understand before picking a tool.

Reading time · 10 min · Format · Reference · Updated · April 2026

The short answer

Sentiment analysis is the computational task of identifying polarity (positive, negative, neutral) and sometimes intensity in text. It operates at document, sentence, or aspect granularity. Methods range from keyword matching through classical machine learning and BERT-family deep learning to LLM-based classification. The right method depends on domain, latency budget, and whether aspect-level classification is needed. Production systems typically combine methods in a hybrid pipeline.

Definition

Sentiment analysis (sometimes called opinion mining) is the subfield of natural language processing concerned with identifying and extracting subjective information from text — primarily polarity (positive, negative, neutral) and sometimes intensity, emotion category, or subjectivity.

As a discipline, it emerged in the late 1990s from the information-retrieval community and was formalized in the early-to-mid 2000s by academic work from Pang, Lee, and others. The initial test beds were movie and product reviews; most commercial application has stayed close to that territory.

Sentiment analysis is distinct from thematic analysis (what themes appear) and topic modeling (unsupervised theme discovery), though modern feedback platforms run all three together. It is also distinct from emotion detection, which classifies more granular emotional categories (anger, disappointment, delight, relief) beyond simple polarity.

Granularity levels

Three common levels.

Document-level. One score per document. Fast, coarse, loses information when a document expresses mixed sentiment. Useful for aggregate review-score tracking.

Sentence-level. One score per sentence. Captures mixed-sentiment documents better. The default granularity for general-purpose social-listening tools.

Aspect-level. One score per aspect (feature) mentioned in the document. "The camera is great but the battery drains too fast" → (camera, positive) + (battery, negative). The granularity that matters for product feedback because aspects map to product features and route to different teams.

Aspect-level sentiment is what matters for product feedback. Aspects map to features; features map to teams; teams can act. Indellia — Granularity
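The camera/battery split above can be sketched as a toy rule-based classifier. The mini-lexicons, the aspect list, and the clause splitting here are illustrative assumptions only; a production system would use a trained aspect-based sentiment (ABSA) model rather than keyword rules:

```python
from dataclasses import dataclass

# Hypothetical mini-lexicons for illustration; a real ABSA model learns
# these associations from labeled in-domain data.
ASPECTS = {"camera", "battery", "screen"}
POSITIVE = {"great", "excellent", "sharp"}
NEGATIVE = {"drains", "slow", "dim"}

@dataclass
class AspectSentiment:
    aspect: str    # product feature, e.g. "camera"
    polarity: str  # "positive" | "negative" | "neutral"

def classify_aspects(review: str) -> list[AspectSentiment]:
    """Assign one polarity per aspect, scoped to the clause that mentions it."""
    results = []
    # Naive clause split on "but"; contrastive conjunctions are where
    # document-level scoring loses the mixed-sentiment signal.
    for clause in review.lower().split(" but "):
        words = set(clause.split())
        for aspect in ASPECTS & words:
            if words & POSITIVE:
                polarity = "positive"
            elif words & NEGATIVE:
                polarity = "negative"
            else:
                polarity = "neutral"
            results.append(AspectSentiment(aspect, polarity))
    return results
```

Running it on the example from the text yields (camera, positive) and (battery, negative), the two separate signals a document-level score would average away.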

Methods

Five dominant methods, in brief:

  • Keyword-based. Positive/negative word lists, arithmetic scoring. Very fast, very brittle, poor on negation and sarcasm.
  • Classical ML. Logistic regression, SVM, gradient boosting trained on labeled examples. Reliable baseline, predictable latency (milliseconds), interpretable.
  • Deep learning (BERT-family). Fine-tuned transformer models. Meaningfully better than classical ML on held-out test sets for review data. Latency 100–300ms on GPU.
  • LLM-based. Zero-shot or few-shot classification with GPT-4-family, Claude-family, or Gemini-family models. Flexible, no training required, handles nuance well with good prompting. Higher cost and variability.
  • Hybrid. Classical or BERT for high-volume polarity pass, LLM for hard cases flagged by the first pass. The production-default architecture as of Q1 2026.
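The hybrid routing described in the last bullet is a small amount of glue code. The confidence floor and the function shapes below are assumptions for this sketch, not a prescribed API; the first-pass model and the LLM fallback are stand-ins for whatever classifiers a team actually runs:

```python
from typing import Callable

# Tunable threshold; 0.80 is an assumption, not a recommendation.
CONFIDENCE_FLOOR = 0.80

def hybrid_classify(
    text: str,
    fast_model: Callable[[str], tuple[str, float]],  # returns (label, confidence)
    llm_fallback: Callable[[str], str],
) -> str:
    """Run the cheap high-volume model first; escalate only
    low-confidence cases to the slower, costlier LLM."""
    label, confidence = fast_model(text)
    if confidence >= CONFIDENCE_FLOOR:
        return label
    return llm_fallback(text)
```

The design point is economic: the LLM sees only the fraction of traffic the fast model is unsure about, so per-review cost stays close to the cheap model's while hard cases still get the stronger classifier.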

See the sentiment analysis for product reviews guide for implementation detail on each method.

Evaluation

The number that usually gets cited — "our model is 92% accurate" — is meaningless without three specifics. What test set? What metric? What baseline?

Test set. Accuracy on IMDB movie reviews has no bearing on accuracy on camera reviews. Production evaluation must use in-category data. Vendor claims that don't specify the test set should be treated as marketing.

Metric. Accuracy (correct / total) is a poor metric when classes are imbalanced. F1 score, precision/recall per class, and confusion matrices are more informative for production systems where the cost of false positives differs from the cost of false negatives.
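Per-class precision, recall, and F1 fall directly out of the confusion counts. A minimal from-scratch sketch (in practice a library routine such as scikit-learn's metrics would do this):

```python
def per_class_f1(y_true: list[str], y_pred: list[str], cls: str) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one class, from raw label lists."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Computing these per class is what exposes the failure mode accuracy hides: a model can post high overall accuracy while recall on the rare (and often most important) negative class is poor.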

Baseline. "92% accurate" relative to what? Random guessing on a 3-class problem is 33% accurate; a majority-class classifier may hit 60%+ if the class distribution is skewed. The lift over baseline is what actually matters.
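The lift calculation above is simple enough to make concrete. A short sketch of the majority-class baseline and the lift over it:

```python
from collections import Counter

def majority_baseline_accuracy(labels: list[str]) -> float:
    """Accuracy of a trivial model that always predicts the most common class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

def lift(model_accuracy: float, labels: list[str]) -> float:
    """Improvement over the majority-class baseline, in the same 0-1 units."""
    return model_accuracy - majority_baseline_accuracy(labels)
```

On a skewed set where 60% of labels are positive, a "92% accurate" model has a lift of 0.32, which is the number worth quoting; the same 92% on a 90%-positive set would be a lift of only 0.02.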

Production teams should run a monthly human audit — a random 200-record sample labeled by a domain expert, compared against the classifier. Drift beyond 5 percentage points usually triggers a retrain or re-evaluation.
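The monthly audit loop can be sketched as a comparison of model output against the human-labeled sample, with the drift check against a reference agreement level. The function name and the notion of a stored reference agreement are assumptions of this sketch:

```python
def audit_sample(
    human_labels: list[str],
    model_labels: list[str],
    reference_agreement_pct: float,
    drift_threshold_pp: float = 5.0,
) -> tuple[float, bool]:
    """Compare classifier output against an expert-labeled audit sample.
    Returns (agreement %, whether drift past the threshold triggers action)."""
    assert len(human_labels) == len(model_labels)
    agree = sum(h == m for h, m in zip(human_labels, model_labels))
    agreement_pct = 100.0 * agree / len(human_labels)
    drift_pp = reference_agreement_pct - agreement_pct
    return agreement_pct, drift_pp > drift_threshold_pp
```

With a 200-record sample, a 5-point drift is 10 records, large enough to be unlikely from sampling noise alone but small enough to catch degradation early.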

Domain adaptation

A general-purpose sentiment model trained on news or movie reviews will underperform on product-review data, often by 15–30 percentage points of accuracy. Closing that gap is the job of domain adaptation. Three ways to do it.

Full fine-tuning. Label 5,000–50,000 in-domain examples and fine-tune the model. Best accuracy, highest cost. Weeks to months of effort for a single category.

Few-shot with in-domain examples. For LLM-based methods, include 10–30 category-specific examples in the prompt. Lower accuracy than fine-tuning but faster to deploy and flexible across categories.
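Assembling the few-shot prompt is mechanical. The exact prompt format below is an assumption; adapt the instruction wording and delimiters to the conventions of the target LLM:

```python
def build_fewshot_prompt(examples: list[tuple[str, str]], review: str) -> str:
    """Build a few-shot sentiment-classification prompt from
    (review_text, label) pairs drawn from the target category."""
    lines = ["Classify each review as positive, negative, or neutral.\n"]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The unlabeled review goes last; the model completes the label.
    lines.append(f"Review: {review}\nSentiment:")
    return "\n".join(lines)
```

Swapping the example set per category is what makes this approach flexible: the same code serves cameras, cookware, and skincare with no retraining, at the cost of some accuracy versus a fine-tuned model.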

Domain-aware platforms. Use a VoC platform that has pre-trained on category-specific data. Indellia's sentiment models, built on NEC Labs foundations, are trained on product-feedback corpora rather than generic sentiment datasets.

Multilingual considerations

Brands with presence in non-English markets need multilingual sentiment classification. Three practical options.

Language detection plus per-language models. Detect the language on ingestion, route to the appropriate model. Highest accuracy per language, most moving parts.
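The routing step reduces to a dictionary dispatch. The detector and the per-language classifiers are stand-ins here; in practice they would be a language-ID library and fine-tuned models:

```python
from typing import Callable

Classifier = Callable[[str], str]

def route_review(
    review: str,
    detect_language: Callable[[str], str],  # returns an ISO code, e.g. "fr"
    models: dict[str, Classifier],          # per-language classifiers
    default: Classifier,                    # fallback for unsupported languages
) -> str:
    """Detect language on ingestion, then dispatch to that language's model."""
    lang = detect_language(review)
    classifier = models.get(lang, default)
    return classifier(review)
```

The "most moving parts" cost from the text shows up here: every supported language adds a model to train, evaluate, monitor for drift, and keep deployed.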

Multilingual models. XLM-RoBERTa, mBERT, and multilingual LLMs can classify sentiment across 50+ languages. Accuracy is lower than per-language fine-tuned models but significantly higher than translation-then-classify pipelines.

Translation-then-classify. Translate to English first, then classify. Meaningful accuracy loss from translation artifacts; not recommended as a primary approach.

Choosing a tool: trade-offs

Five questions to answer before committing.

  • Granularity. Document-level is cheap but loses aspect-level signal. For product feedback, insist on aspect-level.
  • Domain fit. Has the tool been evaluated on your category's review data? Or only on public benchmarks?
  • Latency. Can the tool handle your ingestion volume at your required freshness? A system that takes 2 days to classify new reviews is not a production system.
  • Explainability. When a review is classified as negative, can the system show you which text drove the classification? This matters for audit and for user trust.
  • Integration. Does the tool output work with your downstream systems — theme detection, anomaly detection, SKU-level rollups, exec reporting?

The Indellia sentiment analysis software landing page covers how Indellia handles each.

Try aspect-based sentiment on your reviews. The free AI Sentiment Analysis Tool runs polarity and aspect-level sentiment on any review set.

FAQ

Frequently asked questions

What is sentiment analysis?

Sentiment analysis is the subfield of natural language processing concerned with identifying subjective information in text — primarily polarity (positive, negative, neutral) and sometimes intensity, emotion category, or subjectivity. It operates at document, sentence, or aspect granularity, using methods ranging from keyword matching through classical machine learning and BERT-family deep learning to LLM-based classification.

How is sentiment analysis different from thematic analysis?

Sentiment analysis classifies polarity — how positive or negative the text is. Thematic analysis classifies topic — what the text is about. Both run together in a modern feedback platform: theme tells you the complaint is about battery; sentiment tells you the complaint is negative and its intensity. Neither substitutes for the other.

What's the difference between document-level and aspect-level sentiment?

Document-level sentiment assigns one polarity score to the whole document. Aspect-level sentiment assigns separate scores for each aspect (feature) mentioned. A review like "the camera is great but the battery drains fast" has mixed, roughly neutral document-level sentiment but (camera, positive) + (battery, negative) aspect-level sentiment. Aspect-level is more useful for product feedback because aspects map to features that route to different teams.

How do you evaluate a sentiment analysis tool?

Three specifics are required: what test set (in-category, not public benchmarks), what metric (F1 per class, not blunt accuracy), and what baseline (lift over majority-class baseline, not over random). Run a monthly human audit on a random 200-record sample from your own data. Drift beyond 5 percentage points should trigger a retrain or re-evaluation.

Do I need separate models for different languages?

Options. Per-language fine-tuned models give the highest accuracy but add operational complexity. Multilingual models (XLM-RoBERTa, mBERT, multilingual LLMs) give reasonable accuracy across 50+ languages in one system. Translation-then-classify has meaningful accuracy loss and isn't recommended. For brands with presence in 3+ non-English markets, multilingual models are usually the right fit.

Is LLM-based sentiment analysis better than BERT-based?

Better on some dimensions, worse on others. LLMs handle nuance and aspect-level judgments well, especially with good prompting. BERT-family models are faster and more predictable at high volume. The production default as of Q1 2026 is hybrid — BERT or classical ML for the high-volume pass, LLM for the edge cases where the first pass has low confidence.

Ask Indellia

Have a specific question?

Indellia's AI agents answer with citations from real customer feedback across Amazon, Walmart, Best Buy, and 20+ retail channels.

Get started

Sentiment analysis that routes to action.

Indellia runs aspect-based sentiment natively. Every ingested review gets polarity per aspect, mapped to your taxonomy, and routed by team ownership.