
Introducing deduplication and identity resolution

In this guide, we describe how Splio deduplicates and merges data coming from your various data sources.

First, here are the two main concepts behind this feature:

Deduplication: This process prevents duplicates in your database when you import new data into Splio, by updating existing users, products, or stores with the data provided in the new import.

Identity resolution: This process applies a set of rules that define how two or more users sharing the same values in specific fields are merged into one digital identity in Splio. It runs after technical deduplication.

Understanding technical deduplication

When new data is imported into Splio, the platform checks whether each record already exists.

There are two cases:

  • If the record (user, product, store...) does not exist in the database yet, we add it as a new line.

  • If the record already exists in the database, then the new data updates the existing one.

To determine whether a record already exists in the database, Splio relies on a specific field of the entity: an ID that you must provide for every imported record. This field is called the deduplication key, or unique key.
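The insert-or-update logic described above can be sketched in Python. This is a minimal illustration, not Splio's implementation: records are assumed to be plain dictionaries, the deduplication key is stored in a hypothetical `unique_id` field, and every field present in an import is assumed to overwrite the stored value.

```python
from typing import Any

Record = dict[str, Any]


def import_records(database: dict[str, Record], new_records: list[Record],
                   key: str = "unique_id") -> tuple[int, int]:
    """Upsert records into `database`, indexed by the deduplication key.

    Returns (inserted, updated) counts. "unique_id" is a hypothetical
    field name used for illustration only.
    """
    inserted = updated = 0
    for record in new_records:
        record_key = record.get(key)
        if record_key in (None, ""):
            # The deduplication key must be provided for every imported record.
            raise ValueError(f"record without a {key!r} value: {record}")
        if record_key in database:
            # Existing record: fields from the import overwrite stored values.
            database[record_key].update(record)
            updated += 1
        else:
            # New record: added as a new line.
            database[record_key] = dict(record)
            inserted += 1
    return inserted, updated
```

Importing `{"unique_id": "C001", "city": "Lyon"}` after `{"unique_id": "C001", "first_name": "Ann"}` leaves a single `C001` record holding both fields.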

Understanding identity resolution

Once user data has been deduplicated, the CDP can take the next step, called identity resolution: finding out whether two or more users with different unique ID values belong to the same "identity", that is, the same person.

Once an identity has been chosen for each user (whether merged or not), every user is enriched with a new field, the user_id. This makes the CDP the single source of truth in your data ecosystem for defining unique individuals.

All pre-computed attributes available in Custom Audience Filter are calculated per unique individual, that is, per distinct user_id.

Several identity resolution rules are available, and you choose one of them during setup:

  • Single field: the CDP merges users that have the same value in one chosen field.

  • Several fields, all matching: the CDP merges users only if they have the same value in every one of several chosen fields.

    If one of these fields is empty, the CDP assigns a user_id to users only when all of these fields have a value.

  • Several fields, any matching: the CDP merges users if any one of several chosen fields has the same value (for example, a rule of the kind "email OR cellphone").
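The three rules above can be sketched in Python with a union-find structure. This is an illustration under stated assumptions, not Splio's algorithm: the `unique_id` field name and the `uid-N` labels are hypothetical, matches under the "any" rule are assumed to chain transitively (if A matches B and B matches C, all three merge), and, following the note on empty fields, users with an empty matching field get no user_id under the "all" rule.

```python
def resolve_identities(users, fields, mode):
    """Assign an illustrative user_id to each user record.

    users:  list of dicts, each with a hypothetical "unique_id" key.
    fields: names of the fields used for matching (one field = single-field rule).
    mode:   "all" -> merge only when every field matches,
            "any" -> merge when at least one field matches.
    Returns a dict mapping unique_id -> user_id (or None).
    """
    parent = list(range(len(users)))  # union-find forest over record indexes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    def is_empty(value):
        return value in (None, "")

    if mode == "all":
        eligible = set()
        first_seen = {}
        for i, user in enumerate(users):
            key = tuple(user.get(f) for f in fields)
            if any(is_empty(v) for v in key):
                continue  # an empty matching field: no user_id under this rule
            eligible.add(i)
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i
    elif mode == "any":
        eligible = set(range(len(users)))
        for f in fields:
            first_seen = {}
            for i, user in enumerate(users):
                value = user.get(f)
                if is_empty(value):
                    continue
                if value in first_seen:
                    union(first_seen[value], i)
                else:
                    first_seen[value] = i
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    labels = {}
    result = {}
    for i, user in enumerate(users):
        if i not in eligible:
            result[user["unique_id"]] = None
            continue
        root = find(i)
        if root not in labels:
            labels[root] = f"uid-{len(labels) + 1}"
        result[user["unique_id"]] = labels[root]
    return result
```

For example, if `crm-1` and `shop-7` share an email address while `crm-1` and `pos-3` share a cellphone number (and `shop-7` has no cellphone), `mode="any"` over email and cellphone gives all three the same user_id, while `mode="all"` merges none of them.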


When users are merged under the chosen rule, they share the same “digital identity”, which has the following characteristics:

  • There is only one user per identity in the user table, identified by a unique user_id.

  • All events (purchases, email events...) are now attributed to this new user.
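As a rough sketch of this re-attribution, assuming each event carries the `unique_id` of the record it was collected on (a hypothetical field name) and that a mapping from unique_id to user_id is already available, events can be regrouped per identity like this:

```python
from collections import defaultdict


def attribute_events(events, identity_of):
    """Regroup events under the user_id of the identity they belong to.

    events:      list of dicts, each with a hypothetical "unique_id" key
                 pointing at the record the event was collected on.
    identity_of: dict mapping unique_id -> user_id (None if unassigned).
    """
    by_identity = defaultdict(list)
    for event in events:
        user_id = identity_of.get(event["unique_id"])
        if user_id is not None:  # events of records without a user_id are skipped
            by_identity[user_id].append(event)
    return dict(by_identity)
```

In this sketch, a purchase recorded on `crm-1` and an email click recorded on `shop-7` both end up under the same user_id once the two records are merged.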

What happens if one of two users, merged using the email field, updates the email address?

In this case, since the email is the field used to merge the two users, they no longer share a value and the CDP splits them again.
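The split can be pictured with a single-field rule, assuming identities are recomputed from the current field values after each update (the timing of that recomputation is an assumption here, not documented behavior). The function name and the `unique_id` keys are hypothetical:

```python
def merge_on_field(users, field):
    """Group unique_ids whose current `field` value is identical.

    users maps a hypothetical unique_id to its current profile dict.
    Records with an empty value are not grouped (omitted) in this sketch.
    """
    groups = {}
    for unique_id, profile in users.items():
        value = profile.get(field)
        if value:
            groups.setdefault(value, []).append(unique_id)
    # Sorted output so the result does not depend on dict ordering.
    return sorted(sorted(ids) for ids in groups.values())
```

If `crm-1` and `shop-7` share `ann@example.com`, they form one group; once `shop-7` changes its email, the same call returns two separate groups.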

Understanding user profile resolution

When several users are merged, Splio must choose which value to keep for each user attribute (such as first name, gender, or address) whenever the merged users have different values.

By default, the latest non-empty value is kept. "Latest" refers to the user update date, which must be provided in a dedicated field. For a given attribute, Splio first checks the most recently updated record:

  • If the field is not empty, its value is kept as the final value.

  • If it is empty, Splio checks the second most recently updated record, and so on.
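The default "latest non-empty value" rule can be sketched in Python as follows. The `updated_at` field name is hypothetical, and update dates are assumed to be ISO 8601 strings so that they sort chronologically:

```python
def resolve_attribute(records, attribute, updated_field="updated_at"):
    """Return the latest non-empty value of `attribute` among merged records.

    `updated_field` is a hypothetical name for the dedicated update-date
    field; values are ISO 8601 strings, which sort chronologically.
    """
    # Walk the records from the most recently updated to the oldest.
    for record in sorted(records, key=lambda r: r[updated_field], reverse=True):
        value = record.get(attribute)
        if value not in (None, ""):
            return value  # first non-empty value wins
    return None  # the attribute is empty on every merged record
```

With three merged records, an empty city on the two most recent records falls back to the city stored on the oldest one.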

However, you can define different rules for some fields during setup:

  • Splio can prioritize one source over another, overriding the "latest update" rule.

    If the field is empty in the prioritized source, the CDP keeps the latest non-empty value from the other sources for this field.

  • Fields can be grouped by source for profile resolution, so that the fields in a group all come from the same source. This is useful for data consistency when several fields are linked together, such as the parts of a postal address.

  • Splio can keep the oldest value instead of the latest, which is relevant for the user creation_date, for example.

  • These options can be combined.
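The optional rules can be combined per field or group of fields, as in the following Python sketch. It is not Splio's configuration format: the `source` and `updated_at` field names, the `(fields, options)` rule shape, and the choice that a group qualifies as soon as one of its fields is non-empty are all assumptions made for illustration. "Oldest" is read here as the earliest update date, and dates are ISO 8601 strings.

```python
def _non_empty(value):
    return value not in (None, "")


def pick_record(records, fields, *, oldest=False, priority_source=None,
                updated_field="updated_at"):
    """Choose the record that supplies the values of `fields`.

    `fields` holds one field, or a group of linked fields that must all be
    taken from the same record. "source" and "updated_at" are hypothetical
    field names.
    """
    # Latest update first by default; oldest first when oldest=True.
    ordered = sorted(records, key=lambda r: r[updated_field], reverse=not oldest)
    if priority_source is not None:
        # Stable sort: records from the prioritized source move to the front,
        # and the date order is kept within each part.
        ordered.sort(key=lambda r: r.get("source") != priority_source)
    for record in ordered:
        # Assumption: a record qualifies if any field of the group is non-empty.
        if any(_non_empty(record.get(f)) for f in fields):
            return record
    return None


def resolve_profile(records, rules):
    """Apply (fields, options) rules to build one merged profile."""
    profile = {}
    for fields, options in rules:
        chosen = pick_record(records, fields, **options)
        for f in fields:
            profile[f] = chosen.get(f) if chosen is not None else None
    return profile
```

For example, `(("street", "city"), {})` keeps both address fields from the same record, `(("first_name",), {"priority_source": "crm"})` prefers the CRM value, and `(("creation_date",), {"oldest": True})` keeps the value from the earliest-updated record.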

Understanding consent and channel specificities

In a multi-source context with identity resolution, a rule of the kind “email OR cellphone” can result in a merged user_id being linked to several email addresses or cellphone numbers.

The email address or phone number to keep is then chosen according to the profile resolution rules described in the previous section.

Once the email address or phone number is chosen, the CDP keeps, for the given user_id, the most recently collected or updated consent value associated with it, across all sources.

Note that at every step of the processing, Splio keeps the consent linked to its email address or cellphone number.
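The consent step can be sketched as follows, assuming the consent values collected across all sources for one user_id are available as records carrying the address they apply to. The field names `address`, `consent`, and `collected_at` are hypothetical, and dates are ISO 8601 strings:

```python
def resolve_consent(consent_records, chosen_address):
    """Return the latest consent value collected for `chosen_address`.

    consent_records: list of dicts with hypothetical keys "address",
    "consent" and "collected_at", gathered from all sources for one user_id.
    Returns None if no consent was ever collected for that address.
    """
    # Consent stays linked to its address: ignore values for other addresses.
    matching = [r for r in consent_records if r["address"] == chosen_address]
    if not matching:
        return None
    # Keep the most recently collected or updated value, whatever its source.
    return max(matching, key=lambda r: r["collected_at"])["consent"]
```

Because each consent value stays attached to its address, a more recent consent collected for a different email address does not affect the result.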