Customer Data Platforms have quickly gained the attention of marketers and others who need a unified,
accessible view of their customer data. But the popularity of the concept has led to confusion as many
vendors with related systems seek to take advantage of buyer interest. The purpose of this paper is to
reduce that confusion by providing buyers with a simple checklist of items that are required for a system
to be considered a CDP. This checklist is called RealCDP.
The term Customer Data Platform was introduced in 2013 to describe what was then a new phenomenon:
packaged software that builds a unified customer database in addition to providing applications such as
predictive modeling, campaign management, or advertising audience creation. Previously, marketing
applications had largely connected to custom-built systems such as data warehouses or held only limited
information, such as CRM, marketing automation, and Data Management Platforms (DMPs). In
subsequent years, marketers became increasingly aware that previous solutions were not adequate for
assembling and sharing a comprehensive profile for each customer – often known as a Single Customer
View. By 2016, interest in CDPs as a new option began to rise sharply.
RealCDP Checklist
The RealCDP checklist consists of six items that cover key features needed to provide a unified, accessible
customer database that meets buyer expectations for such a system. A complete set of core CDP features
would be much larger, so meeting only the six checklist items does not ensure a CDP will meet your own
business needs. Even the six checklist items themselves can be met in different ways, some of which will
be more or less appropriate for a specific situation. The purpose of RealCDP is to set a minimum standard
for calling a system a CDP: a product that does not meet the RealCDP criteria is definitely not a CDP
according to the CDP Institute definition. Such products may still be suitable for your situation, but you
should consider carefully whether you really need any RealCDP feature they are missing.
Ingest all sources
The CDP promise to assemble all customer data requires that the CDP be able to
ingest that data as a start. This includes structured data, such as purchase transactions and customer
address details; semi-structured data, such as Web interaction logs; and unstructured data, such as chat
transcripts or social media comments. Ingestion may be via API connectors that push events into the CDP,
via queries from the CDP that pull data from source systems, via streaming connectors that load a constant stream of data, and via file imports.
It may happen in periodic batch processes or continuously in “real time”; when looking at real-time ingestion, it’s important to understand how long it takes before the new data is available for use by applications. Ingestion processes may also include checks related to privacy permissions, data quality, and governance. Users will also want to understand the effort needed to add a new source: in particular, whether data must be mapped in advance to a specific schema or can simply be ingested and stored as it arrives, with structures applied later.
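The last option above, storing data as it arrives and applying structure later, can be illustrated with a minimal sketch. The `RawStore` class, its method names, and the sample sources are hypothetical and are not taken from any particular CDP product; a real system would add persistence, privacy checks, and data quality checks.

```python
from datetime import datetime, timezone


class RawStore:
    """Hypothetical append-only store that keeps each record as it arrived."""

    def __init__(self):
        self.records = []

    def ingest(self, source, payload):
        # No schema is required in advance: structured, semi-structured,
        # and unstructured payloads are all stored unchanged.
        self.records.append({
            "source": source,
            "received_at": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
        })

    def read(self, source, fields):
        # Structure is applied at read time by selecting the requested fields.
        return [
            {name: record["payload"].get(name) for name in fields}
            for record in self.records
            if record["source"] == source and isinstance(record["payload"], dict)
        ]


store = RawStore()
store.ingest("pos", {"customer_id": "c1", "sku": "A-100", "amount": 19.99})
store.ingest("web_log", {"cookie": "xyz", "url": "/pricing"})
store.ingest("chat", "Hi, I have a question about my order")

purchases = store.read("pos", ["customer_id", "amount"])
```

Adding a new source in this sketch requires no mapping step, which is the trade-off the paragraph describes: less effort at ingestion, more work when the data is read.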
Capture full detail
The CDP promises to build a profile containing all customer data. This means it
must retain the full details of all ingested data, rather than summaries or selected attributes. For
regulatory reasons, the system may not be allowed to retain some items and may be required to capture
customer consent or constraints on how the data can be used.
The CDP may either retain the data in its original format, which then requires subsequent processing to extract useful attributes, or perform initial processing that adds some structure to simplify later access and analysis. Often the initial processing converts the data into “key-value pairs” that include an identifier (the “key”) and the data being stored (the “value”). Some systems may apply more elaborate structures, similar to a conventional multi-table relational database model.
The relational approach requires the most preparation and initial processing but requires the least
subsequent work when data is accessed. In practice, few companies will truly store all details from all
sources, but the CDP should have this capability so that system limits do not prevent users from storing
what they determine they need.
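As an illustration of the key-value approach described above, the sketch below flattens a nested record into pairs without discarding any detail. The function name `to_key_value_pairs` and the dotted key format are assumptions made for this example; actual CDPs use their own storage formats.

```python
def to_key_value_pairs(record, prefix=""):
    """Flatten a nested dict into (key, value) pairs, keeping every detail."""
    pairs = []
    for name, value in record.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Nested structures become dotted keys rather than being dropped.
            pairs.extend(to_key_value_pairs(value, key))
        else:
            pairs.append((key, value))
    return pairs


event = {"customer_id": "c1", "order": {"sku": "A-100", "amount": 19.99}}
pairs = to_key_value_pairs(event)
# pairs == [("customer_id", "c1"), ("order.sku", "A-100"), ("order.amount", 19.99)]
```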
Persist data indefinitely
The CDP needs to store the data it ingests. This is a key difference from
integration platforms and tag managers, which gather data but then pass it to other systems without
retaining a copy. It also differs from many approaches that query the original source system in real time.
Both of these approaches support many useful applications but they cannot create a complete customer
view because the source systems often don’t retain old data for as long as a CDP user might want it. In
other cases, retrieving data from source systems would take too long because significant processing is
required to find the right records or perform complex calculations, or the source system may simply not
allow direct real time access.
Specific uses for stored data include identity management, time-series analysis,
trends, aggregates, and change detection. Like initial data capture, data storage must comply with privacy
regulations. Similarly, few companies will store all data forever; rather, they will set limits on data types
and retention period based on business needs and costs. In practice, most companies will rely on a mix
of stored data and real-time queries against source systems.
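Retention limits by data type can be sketched as a simple policy table. The data types and retention periods below are hypothetical values chosen for illustration; actual limits depend on each company's business needs, costs, and regulatory obligations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data type.
RETENTION = {
    "purchase": timedelta(days=365 * 7),
    "page_view": timedelta(days=90),
}


def apply_retention(records, now, policy=RETENTION):
    """Keep records still inside their retention period.

    Types with no entry in the policy are kept indefinitely.
    """
    kept = []
    for record in records:
        limit = policy.get(record["type"])
        if limit is None or now - record["timestamp"] <= limit:
            kept.append(record)
    return kept


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"type": "purchase", "timestamp": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"type": "page_view", "timestamp": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"type": "page_view", "timestamp": datetime(2023, 12, 15, tzinfo=timezone.utc)},
]
kept = apply_retention(records, now)
```

Here the four-year-old purchase is kept while the page view from June 2023 is dropped, because the two types have different limits.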
Unified profiles
The unified customer profile is the fundamental purpose of the CDP. It needs to
associate all available data with an identified individual. The ability to work with identified individuals in
particular is an important contrast to systems like DMPs that are limited to anonymous profiles. It
requires the system to manage personal identifiers (PII), to link identifiers that belong to the same individual
(identity management), and to process the raw data in ways that make it usable.
Such processing may include extracting specific information from unstructured or semi-structured data sources; standardizing information from different sources; and creating derived values such as aggregates, segment assignments, and predictive model scores. Again, profiles need to comply with privacy regulations.
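Linking identifiers that belong to the same individual can be sketched with a union-find structure, one common way to group connected identifiers. This is an illustrative, deterministic-matching example only: the `IdentityGraph` class and the identifier formats are hypothetical, and production identity management typically adds probabilistic matching, match confidence, and the ability to unlink.

```python
from collections import defaultdict


class IdentityGraph:
    """Hypothetical identity store that groups linked identifiers (union-find)."""

    def __init__(self):
        self.parent = {}

    def _find(self, ident):
        # Register unseen identifiers as their own root, then walk to the root.
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]  # path halving
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        # Record that two identifiers were observed for the same individual.
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def profiles(self):
        # Each group of connected identifiers forms one unified profile.
        groups = defaultdict(set)
        for ident in list(self.parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


graph = IdentityGraph()
graph.link("email:ann@example.com", "cookie:xyz")
graph.link("cookie:xyz", "loyalty:123")
graph.link("email:bob@example.com", "phone:555-0100")
profiles = graph.profiles()
```

The three identifiers for the first customer end up in one profile even though the email address and loyalty number were never observed together; the shared cookie connects them.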
Open access
The CDP needs to make its data available to all external systems. This is an important
goal for many users, who want to avoid being locked into a particular set of tools. It also reduces the cost
of maintaining separate databases for individual systems, ensures that all systems work with the same
data, and makes it easier to orchestrate customer experience across systems. Open access is usually
achieved through a published API that lets any system query the CDP. In practice, many CDPs have
developed prebuilt connectors for specific systems.
Access may also be achieved by exporting CDP data into other formats, such as analytical data sets, flat files, or relational databases. Such exports are usually designed with a segmentation tool built into the CDP. Users often need to specify in advance which CDP data will be available, and some preparation may be needed to make that data available for use. A CDP should offer similar access capabilities to all target systems, rather than favoring access by its own applications.
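A flat-file export of a segment might look like the following minimal sketch. The `export_segment` function, the profile fields, and the segment rule are hypothetical; the user specifies in advance which fields are exposed, as described above.

```python
import csv
import io


def export_segment(profiles, predicate, fields):
    """Write the profiles matching a segment rule to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # fields not selected for export are omitted
        lineterminator="\n",
    )
    writer.writeheader()
    for profile in profiles:
        if predicate(profile):
            writer.writerow(profile)
    return buffer.getvalue()


profiles = [
    {"customer_id": "c1", "email": "ann@example.com", "lifetime_value": 1200},
    {"customer_id": "c2", "email": "bob@example.com", "lifetime_value": 80},
]
flat_file = export_segment(
    profiles,
    predicate=lambda p: p["lifetime_value"] >= 1000,
    fields=["customer_id", "email"],
)
```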
Real-time response
The CDP needs to respond immediately to certain events. The acceptable
response time varies but is usually under one second. Typical real-time use cases include immediate
reaction to events such as a dropped shopping cart or new customer sign-up, and immediate response to
a profile request from a personalization system or call center. Real-time reaction to events is often
achieved by parsing new data as it enters the CDP system, either in a continuous stream or through receipt of batch files.
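Reacting to events as they enter the system can be sketched as a small in-memory router. The `EventRouter` class, the event types, and the handler are hypothetical and show only the pattern of updating a profile and triggering an action in the same step; they say nothing about how a given CDP achieves sub-second response at scale.

```python
class EventRouter:
    """Hypothetical router that updates profiles and fires handlers on arrival."""

    def __init__(self):
        self.handlers = {}
        self.profiles = {}

    def on(self, event_type, handler):
        # Register a reaction for one type of event.
        self.handlers.setdefault(event_type, []).append(handler)

    def ingest(self, event):
        # Store the event on the customer's profile first, so handlers and
        # later profile requests see the most recent data.
        profile = self.profiles.setdefault(event["customer_id"], {"events": []})
        profile["events"].append(event)
        return [
            handler(event, profile)
            for handler in self.handlers.get(event["type"], [])
        ]


def cart_reminder(event, profile):
    return f"send cart reminder to {event['customer_id']}"


router = EventRouter()
router.on("cart_abandoned", cart_reminder)
actions = router.ingest({"type": "cart_abandoned", "customer_id": "c1"})
ignored = router.ingest({"type": "page_view", "customer_id": "c1"})
```

Events with no registered handler, such as the page view here, are still added to the profile but trigger no action.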