How to Build a Real-Time Customer Feedback Pipeline
A technical guide to designing a data pipeline that ingests, normalizes, embeds, and delivers customer feedback insights in real time.
Customer feedback is unstructured, messy, and scattered across a dozen tools. Support tickets live in Zendesk. Reviews sit on G2 and the App Store. Survey responses are trapped in Typeform or SurveyMonkey. NPS data lives in its own silo. And the most valuable feedback of all, the offhand comments in sales calls and Slack threads, is never captured systematically.
A customer feedback pipeline solves this by treating feedback as a data engineering problem. You build an architecture that continuously ingests data from every source, normalizes it into a consistent schema, enriches it with embeddings and metadata, stores it for fast retrieval, and surfaces patterns through queries and automated analysis. This article walks through each layer of that architecture.
Pipeline Architecture Overview
A production feedback pipeline has five stages: ingestion, normalization, enrichment, storage, and query. Data flows left to right, with each stage transforming the raw input into something progressively more useful.
Ingestion pulls data from source systems via APIs, webhooks, or file uploads. Normalization maps disparate data formats into a unified schema. Enrichment adds computed properties like sentiment scores, topic labels, and vector embeddings. Storage persists the enriched data in both a relational database for structured queries and a vector database for semantic search. The query layer exposes this data through APIs, dashboards, and conversational interfaces.
The critical design decision is whether to process synchronously or asynchronously. For most feedback pipelines, the answer is asynchronous. A webhook from Intercom should not block on embedding generation. Instead, ingestion should be fast and reliable, writing raw data to a queue, while enrichment workers process items at their own pace. This decoupling makes the system more resilient and easier to scale.
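The decoupling described above can be sketched in a few lines. The in-memory queue below is a stand-in for a durable broker (SQS, Redis Streams, Kafka); the handler and worker names are illustrative, not from any specific framework.

```python
import json
import queue

# Stand-in for a durable queue between ingestion and enrichment.
raw_queue = queue.Queue()

def handle_webhook(payload: dict) -> dict:
    """Fast path: enqueue the raw payload and return immediately.
    No embedding or enrichment happens here, so the source system
    never blocks on slow work."""
    raw_queue.put(json.dumps(payload))
    return {"status": "accepted"}

def enrichment_worker():
    """Slow path: drains the queue at its own pace."""
    while True:
        raw = raw_queue.get()
        if raw is None:  # sentinel to stop the worker
            break
        item = json.loads(raw)
        # ... normalize, embed, store ...
        raw_queue.task_done()

# The webhook handler returns in microseconds; workers catch up later.
resp = handle_webhook({"source": "intercom", "text": "Checkout is broken"})
```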
Data Ingestion Patterns
Feedback sources fall into three categories based on how you pull data from them. Push-based sources send data to you via webhooks. Zendesk, Intercom, and most modern SaaS tools support this pattern. You register a webhook URL, and the source system POSTs new data as it arrives. This is the ideal pattern because it is near-real-time with minimal overhead.
Poll-based sources require you to periodically query their API for new data. Review platforms like G2, social media APIs, and some legacy systems fall into this category. You run a scheduled job every few minutes or hours that fetches new items since the last poll. The tradeoff is latency: feedback may sit unprocessed for the duration of your polling interval.
File-based sources deliver data as CSV exports, email attachments, or bulk uploads. Survey platforms without API access, internal spreadsheets, and one-time data migrations use this pattern. You need a file parser that can handle varying column names, date formats, and encoding issues.
For each source, build an adapter that handles authentication, pagination, rate limiting, and error recovery. The adapter's job is to produce a stream of raw feedback items and hand them to the normalization layer. Keep adapters thin. They should not contain business logic, only data transport.
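One way to express the "thin adapter" idea is a minimal interface that yields raw items and nothing else. The class and method names below are illustrative, and the nested lists stand in for paginated API responses.

```python
from abc import ABC, abstractmethod
from typing import Iterator

class SourceAdapter(ABC):
    """Thin transport layer: auth, pagination, rate limiting.
    No business logic lives here."""

    @abstractmethod
    def fetch(self, since: str) -> Iterator[dict]:
        """Yield raw feedback items created after `since` (ISO date)."""

class ReviewPollAdapter(SourceAdapter):
    """Poll-based adapter sketch for a review platform."""

    def __init__(self, pages):
        self.pages = pages  # stand-in for paginated API responses

    def fetch(self, since):
        for page in self.pages:  # real code would follow cursor tokens
            for item in page:
                if item["created_at"] > since:
                    yield item

adapter = ReviewPollAdapter([[{"created_at": "2024-05-02", "text": "Great app"}]])
items = list(adapter.fetch(since="2024-05-01"))
```

Because the adapter only moves data, swapping a poll-based source for a webhook-based one changes the transport code but leaves normalization untouched.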
Normalization and Schema Design
Every feedback item, regardless of source, needs to conform to a common schema before it enters the enrichment pipeline. A practical schema includes: a unique identifier, the source system name, the original text content, a timestamp, an optional customer identifier, and a metadata object for source-specific fields like ticket priority, review rating, or survey question.
Normalization is where you handle the messy reality of real-world data. Timestamps come in different formats and timezones. Some sources provide structured fields; others give you a blob of text that needs parsing. Customer identifiers may be emails in one system and account IDs in another. You need a resolution layer that maps these disparate identifiers to a canonical customer record.
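A minimal sketch of the unified schema and one normalization function follows. The field names, and the Zendesk-style payload shape, are illustrative assumptions, not the actual Zendesk API format; the key point is forcing every timestamp to UTC and namespacing IDs by source.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class FeedbackItem:
    """One possible shape for the common schema described above."""
    id: str
    source: str
    text: str
    created_at: datetime
    customer_id: Optional[str] = None
    metadata: dict = field(default_factory=dict)

def normalize_zendesk(ticket: dict) -> FeedbackItem:
    """Map a raw ticket-style payload into the schema, converting the
    timestamp to UTC and namespacing the ID by source system."""
    return FeedbackItem(
        id=f"zendesk:{ticket['id']}",
        source="zendesk",
        text=ticket["description"],
        created_at=datetime.fromisoformat(ticket["created_at"]).astimezone(timezone.utc),
        customer_id=ticket.get("requester_email"),
        metadata={"priority": ticket.get("priority")},
    )

item = normalize_zendesk({
    "id": 42,
    "description": "Export to CSV fails on large accounts",
    "created_at": "2024-05-01T09:30:00+02:00",
    "priority": "high",
})
```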
Pay special attention to deduplication. A customer who emails support and also leaves a review may express the same feedback twice. Exact deduplication (same text, same customer) is straightforward. Near-deduplication (similar text, same customer, close timestamps) requires fuzzy matching. At minimum, implement exact deduplication at the normalization stage and flag potential near-duplicates for the enrichment layer to handle via embedding similarity.
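Exact deduplication can be as simple as a content-addressed set keyed on normalized text plus customer ID. The sketch below uses an in-memory set; a real pipeline would back this with a database unique constraint or a key-value store.

```python
import hashlib

seen: set = set()

def dedupe_key(customer_id: str, text: str) -> str:
    """Hash of customer ID plus whitespace-and-case-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(f"{customer_id}|{normalized}".encode()).hexdigest()

def is_duplicate(customer_id: str, text: str) -> bool:
    key = dedupe_key(customer_id, text)
    if key in seen:
        return True
    seen.add(key)
    return False

first = is_duplicate("cust_1", "The dashboard is slow")
second = is_duplicate("cust_1", "the dashboard   is slow")  # same after normalizing
```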
Embedding and Vectorization
Vector embeddings are what transform a feedback pipeline from a text search system into an intelligence platform. An embedding model converts each feedback item's text into a high-dimensional vector that captures its semantic meaning. Two pieces of feedback that describe the same problem in completely different words will have similar vectors, making them retrievable together.
The choice of embedding model matters significantly. General-purpose models trained primarily on academic text underperform on customer feedback, which tends to be informal, fragmented, and full of product-specific jargon. Models like Google's Gemini Embedding, which handle conversational text well and produce 3072-dimensional vectors, are a strong choice for feedback use cases.
Chunking strategy also matters. A short support ticket can be embedded as a single unit. A long survey response or call transcript should be split into semantically coherent chunks, typically 200 to 500 tokens each, with overlap between chunks to preserve context at boundaries. Each chunk gets its own embedding and inherits the metadata from the parent document.
Run embedding generation asynchronously. Embedding API calls add latency and can fail due to rate limits or transient errors. Use a job queue with retry logic and backoff. A worker picks items from the queue, calls the embedding API, writes the resulting vectors to your vector store, and marks the item as processed. If the embedding call fails, the item goes back in the queue with an incremented retry count.
Real-Time Processing vs. Batch Processing
Not everything in your pipeline needs to be real-time. The key is to make the right tradeoff between latency and computational cost for each operation.
Ingestion and normalization should be near-real-time. When a new support ticket is created, you want it in your system within seconds or minutes, not hours. This ensures that anyone querying the system gets results that include the latest feedback. Embedding generation should also be near-real-time for the same reason, though a delay of a few minutes is acceptable.
Trend detection, on the other hand, is better suited to batch processing. Running a clustering algorithm across all feedback from the past week to identify emerging themes is computationally expensive and does not need to happen every time a new item arrives. Schedule this as a daily or weekly job. Similarly, generating summary reports, computing sentiment trends over time, and training custom classification models are all batch operations.
A practical architecture uses an event-driven system for real-time operations and a cron-based scheduler for batch jobs. Tools like Inngest or Temporal handle the orchestration layer well, providing retry logic, rate limiting, and observability for both patterns. The key is that both real-time and batch operations write to the same storage layer, so queries always reflect the latest data plus the most recent analysis.
Storage Considerations
A feedback pipeline needs two storage systems that work together: a relational database for structured data and a vector database for semantic search.
The relational database (Postgres is the standard choice) stores the normalized feedback items with all their metadata: source, timestamp, customer, sentiment score, topic labels, and any computed fields. This is where you run structured queries like "show me all negative feedback from enterprise customers in the last 30 days" or "count feedback items by topic for Q4."
The vector database (Pinecone, Weaviate, or pgvector) stores the embedding vectors alongside a reference to the relational record. This is where you run semantic queries like "find feedback about slow onboarding" or "what are customers saying about our pricing?" The vector database returns the most semantically similar items, which you then hydrate with full metadata from the relational database.
Keep the vector database focused on retrieval and the relational database focused on filtering and aggregation. A common pattern is to use metadata filters in the vector query to narrow the search space (e.g., only search feedback from the last 90 days, or only from a specific customer segment) and then use the relational database to enrich the results with full context.
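The filter-then-search-then-hydrate pattern can be sketched with in-memory stand-ins for both stores. The record shapes and segment values are illustrative.

```python
import math

# Stand-in for the vector database: vectors plus filterable metadata.
vector_index = [
    {"id": "1", "vector": [0.9, 0.1], "segment": "enterprise"},
    {"id": "2", "vector": [0.1, 0.9], "segment": "smb"},
    {"id": "3", "vector": [0.8, 0.2], "segment": "enterprise"},
]
# Stand-in for the relational database: full records keyed by ID.
relational = {
    "1": {"text": "Onboarding took weeks"},
    "2": {"text": "Love the mobile app"},
    "3": {"text": "Setup was confusing"},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, segment, top_k: int = 2):
    # 1. Metadata filter narrows the search space.
    candidates = [r for r in vector_index if r["segment"] == segment]
    # 2. Rank the survivors by similarity.
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vector"]),
                    reverse=True)
    # 3. Hydrate the top hits with full records from the relational store.
    return [{"id": r["id"], **relational[r["id"]]} for r in ranked[:top_k]]

results = search([1.0, 0.0], segment="enterprise")
```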
Plan for growth. A mid-market company might generate 10,000 feedback items per month. At 3072 dimensions per embedding and four bytes per dimension, that is approximately 120 MB of vector data per month, which is trivially small. But if you are processing call transcripts chunked into 200-token segments, the volume multiplies quickly. Choose storage tiers and retention policies accordingly.
The Query Layer
The query layer is where your pipeline delivers value. It needs to support three access patterns: structured queries, semantic queries, and conversational queries.
Structured queries answer questions like "what is the average sentiment score this week compared to last week?" or "which topics have increased in volume by more than 20%?" These are SQL queries against your relational database, typically exposed through a REST API and visualized in a dashboard.
Semantic queries answer questions like "what are customers saying about our mobile app performance?" This involves embedding the query text, searching the vector database for similar items, and returning ranked results. The user does not need to know the exact keywords customers used; the embedding captures meaning rather than matching terms.
Conversational queries combine both patterns through a natural language interface. A product manager types "why are enterprise customers churning?" and the system retrieves relevant feedback via semantic search, filters it by customer segment via structured queries, synthesizes the results using an LLM, and returns a coherent answer with citations to specific feedback items. This is the most powerful access pattern and the one that drives the widest adoption across non-technical teams.
Scaling and Operational Considerations
As your pipeline matures, you will face scaling challenges in three areas: ingestion throughput, embedding compute, and query latency.
Ingestion throughput is rarely the bottleneck. Most feedback sources produce at most a few hundred items per hour, which any modern queue system handles trivially. The exception is a backfill of historical data from a new source, which can produce tens of thousands of items at once. Handle this by routing backfills to a separate queue with lower priority than real-time ingestion.
Embedding compute is typically the most expensive operation. Embedding API calls cost money per token and have rate limits. Cache embeddings aggressively; if the same text has been embedded before, reuse the existing vector. Implement exponential backoff for rate limit errors. Consider batching multiple items into a single API call where the embedding provider supports it.
Query latency depends on your vector database size and the complexity of your metadata filters. As your vector index grows, search times may increase. Use namespacing or index partitioning to keep query times under 200 milliseconds. Monitor p95 latency and add capacity before users notice degradation.
Finally, build observability into every stage. Log ingestion rates, processing times, error rates, and queue depths. Set up alerts for anomalies: a sudden drop in ingestion rate might mean a webhook stopped firing, and a spike in embedding errors might mean your API key hit its quota. The pipeline is only as valuable as its reliability.
Want to see how Sentivy can help? Get started for free.
Frequently Asked Questions
What is a customer feedback pipeline?
A customer feedback pipeline is a data architecture that continuously ingests feedback from multiple sources (support tickets, reviews, surveys, social media), normalizes it into a consistent format, enriches it with embeddings and metadata, stores it for retrieval, and surfaces insights through queries or automated alerts. It is essentially an ETL system purpose-built for unstructured customer text data.
Should I use real-time processing or batch processing for feedback data?
Most teams benefit from a hybrid approach. Use real-time processing for ingestion and embedding so new feedback is searchable within minutes. Use batch processing for expensive operations like trend analysis, weekly summaries, and cross-source correlation. Real-time gives you responsiveness; batch gives you depth.
What vector dimension should I use for feedback embeddings?
Modern embedding models like Google's Gemini Embedding produce 3072-dimensional vectors that capture nuanced semantic meaning. Higher dimensions generally improve retrieval quality but increase storage and compute costs. For most feedback use cases, 768 to 3072 dimensions work well. The more important decision is choosing an embedding model trained on conversational and review-style text.
How do I handle feedback in multiple languages?
Use a multilingual embedding model so that feedback in different languages maps to the same vector space. This allows a query in English to retrieve semantically similar feedback written in Spanish, French, or Japanese. Store the original language alongside a machine-translated English version for display. Most modern embedding models handle multilingual input natively.
How much feedback data do I need before a pipeline is useful?
A pipeline becomes useful with as little as a few hundred feedback items, especially if you are using semantic search rather than relying on statistical trends. The value increases with volume because patterns become clearer and trend detection becomes more reliable. Start with whatever data you have, and the pipeline's value compounds as more sources are connected.
Ready to hear what your customers are saying?
Join teams who use Sentivy to turn customer feedback into their biggest competitive advantage.
Get started free