What’s wrong with data modelling

There’s something clearly wrong with data modelling.

First of all, there are exceptional books like Silverston’s « The Data Model Resource Book », and then there are « models » in the real world. Just look at your CRM or even (maybe) your ERP – why do they work with the very same data and yet have such different models, when « reference models » already exist?

Think about it: why do we need all those « dictionaries » when we could have a unified model?

A thought experiment

Let’s say we work in a B2B environment (to keep it easy!) and all we want is a reference model for a « Party ». Basically, the main « object » of this model is a « company » or, more generally, an « organization ». It is a clear object that can be identified through some kind of VAT/TVA/SIREN/INN or other local code. We will not go any deeper into this model (addresses, etc.).

Now, the CRM. We’ve decided to use the very same « reference » model. We’ve even developed our own CRM to do that!
OK, now Janette Doe « from IBM » contacts me to buy something (or just to get a quote). This is a new contact, so I have to create it in my CRM, and I want to attach it to the company (right?).

Which one? Should I ask for a VAT number? What if it is not « that same IBM » (yes, there are many companies with that same name)? What should I do with my model?

I’m in this strange state where I have to give up and say « I cannot really identify her company, because the only thing I know is the name of the company, which is not unique ».

This is actually nearly always the case at this step of the process.

So, I give up:

  • either I create my model around a « contact » (the name of the company is just a field and there is no identifiable entity like « company »)
  • …or I say « I have no way to model the companies, so I have CRM accounts » (whatever it is, because IRL there are no « accounts » – it’s a pure abstraction)
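To make the two fallbacks concrete, here is a minimal sketch. The class names, the fields and the second contact (John Roe) are my own illustration, not taken from any real CRM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    """Option 1: the contact is the entity; the company is only a label."""
    full_name: str
    company_name: Optional[str] = None  # free text, not unique, identifies nothing


@dataclass
class Account:
    """Option 2: a CRM-only abstraction that groups contacts.

    It carries no VAT/SIREN/INN, so it makes no claim about which
    legal entity (if any) stands behind it.
    """
    label: str
    contacts: list[Contact] = field(default_factory=list)


# Option 1: two contacts "from IBM" share a string and nothing more.
janette = Contact("Janette Doe", company_name="IBM")
john = Contact("John Roe", company_name="IBM")  # hypothetical second contact
same_label = janette.company_name == john.company_name  # equal strings prove nothing

# Option 2: the account is whatever the sales team decides it is.
account = Account("IBM (as stated by Janette)", contacts=[janette])
```

In both variants nothing in the model pretends to know which legal entity Janette works for, which is exactly the point.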

What can we learn from this experiment?

First of all, our rigid ER modelling has no simple way to cope with uncertainty. Either « we have everything » (but then I have no clients, ’cause everyone has to give me lots of technical details just so that I can use my mentally rigid data model) or « we have nothing » (i.e. we forget about companies in our B2B environment).

Yes, there are lots of different solutions, like:

  • create a company with a flag « incomplete » so that we can identify it… and complete or remove it later
  • keep the company name in a plain column and add a nullable FK to the company object (to be filled in when we have more information)
  • etc.

These only look like solutions. In fact, if you analyze them from the user’s point of view, they are all the same: if the user has no need to enter a VAT/DUNS/whatever, they will not.
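Here is a minimal sketch of the nullable-FK workaround, using Python’s built-in sqlite3 module. The table names, the column names and the VAT values are placeholders I made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE company (
        id     INTEGER PRIMARY KEY,
        vat_id TEXT NOT NULL UNIQUE,   -- the real identifier
        name   TEXT NOT NULL           -- not unique
    );
    CREATE TABLE contact (
        id           INTEGER PRIMARY KEY,
        full_name    TEXT NOT NULL,
        company_name TEXT,                           -- what the contact told us
        company_id   INTEGER REFERENCES company(id)  -- nullable: "complete later"
    );
""")

# Two distinct legal entities that share a name (placeholder VAT values).
conn.executemany(
    "INSERT INTO company (vat_id, name) VALUES (?, ?)",
    [("XX000000001", "IBM"), ("YY000000002", "IBM")],
)

# Janette only gives a company name, so the FK stays NULL.
conn.execute(
    "INSERT INTO contact (full_name, company_name) VALUES (?, ?)",
    ("Janette Doe", "IBM"),
)

# Matching by name is ambiguous...
candidates = conn.execute(
    "SELECT COUNT(*) FROM company WHERE name = ?", ("IBM",)
).fetchone()[0]

# ...and nothing in the schema ever forces the link to be completed.
unlinked = conn.execute(
    "SELECT COUNT(*) FROM contact WHERE company_id IS NULL"
).fetchone()[0]
```

The schema happily accepts the row, and that is all it does: whether `company_id` ever gets filled in depends on the user, not on the model.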

That’s the second observation – our model works well for a given population that has more or less homogeneous expectations and constraints. Otherwise, it does not work.

Too many words

Hey, I’m nearly there.

My overall conclusions:

  • if you acquire uncertain data and you want « certain » data further down the chain (CRM -> ERP -> FI), you will have more than one model: just add some BPM between them and it’ll work;
  • you will (probably) never ever have only one Master-Data-whatever;
  • models generally reflect not reality itself, but what we know about reality at a given step of the process;
  • don’t model an object if you cannot identify this object, just live with it.
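As a rough sketch of the first and last points taken together: two models, plus one process step that only lets identified objects through. All the names here (`CrmLead`, `ErpCustomer`, `promote`) and the VAT value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrmLead:
    """What sales knows early on: a person and a company name."""
    contact_name: str
    company_name: str
    vat_id: Optional[str] = None  # usually unknown at first contact


@dataclass(frozen=True)
class ErpCustomer:
    """What invoicing needs: an identified legal entity."""
    vat_id: str
    legal_name: str


def promote(lead: CrmLead) -> Optional[ErpCustomer]:
    """Process step between the two models (hypothetical name).

    Returns None while the company cannot be identified, so the ERP
    model never receives an object it cannot identify.
    """
    if not lead.vat_id:
        return None
    return ErpCustomer(vat_id=lead.vat_id, legal_name=lead.company_name)


lead = CrmLead("Janette Doe", "IBM")
before = promote(lead)        # not identifiable yet, so nothing reaches the ERP
lead.vat_id = "XX000000001"   # placeholder value, learned at quote time
after = promote(lead)         # now an identified customer
```

Neither class is « the » model of a company; each one holds what is known at its own step of the process.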

Hopefully that helps explain why you will (probably) not have a fully common data dictionary, a fully common MDM, or a common anything, except in very specific cases.

Good health to you and to your models.