Technology Musings

June 11, 2014

General / Why You Shouldn't Put Validations on Your Model


This post is about model validators in Ruby on Rails, although it probably applies to validators in most similar MVC systems that allow declarative model validation.

One of the biggest disasters, in my opinion, has been the trend for developers to put more and more validation in their models.  Putting validation in the model always *sounds* like a good idea.  "We want our objects to look like X, therefore, we will stick our validations in the place where we define the object."

However, this leads to many, many problems.

There are two problems, and the most general way of stating them is this:

  1. life never works the way you expect it, but validating on the model assumes that it will.  
  2. while it is cleaner to validate on the model for a given configuration, when that validation needs to change, you don't always have enough information to understand the purpose of the validation in the first place (i.e., if a new developer is making the change)

Example of both -

Let's say that we require all users to enter their first name, last name, birthdate, and email.  Okay, so we'll put validators on the model to make sure that this happens.  

Let's say that six months go by, and your company signed an agreement with XYZ corporation to load in new users.  However, XYZ corp didn't require that their users enter in their birthdate.  However, you don't know that, and the bit of data you looked at, it looks like they are all there.  So, you do a load through the database, and everything loads in cleanly, and you think that everything is good.  

In fact, *Everyone* can view their records.

However, the next week, people start getting errors when they try to update their profile.  Why?  Because the people who don't have a birthdate entered are getting validation errors when they try to update something else.

So here is problem #1 - if an existing record doesn't match the validation, nothing can be saved, and this causes headaches because you are getting errors in places where they were unexpected.  This doesn't just happen with dataloads.  It happens every time you add a validator as well.  So, every time you change policy, even if it seems benign, you have the potential of messing up your entire program, even if all of your tests succeed (because your tests will never have had the bad data in them, since it doesn't have the old version of the code!).

So, we go back and recode, and remove the validator from the model.  Great, but let's say this is a new programmer.  Now the programmer has to figure out where to put the validation in.  Presumably, we still want new users to enter in their birthdate, even if we allow non-birthdate users that we load from outside.  However, now, because we aren't validating on the model, we have to move it somewhere else.  But where?  Since the validator just floated out there declaratively, there is no direct link between the validator and *where* it was to be enforced.  Now the new programmer has to go through and recode and retest all of the situations to figure out the proper place(s) for the validator.  This is especially problematic if you have more than one path for entrance on your system.  It's not so bad if it is a system you wrote yourself, but when having to maintain someone else's code, this is a problem.

Therefore, the policy I follow is this - only use model validation when a failure of the validator would lead to a loss of data *integrity*, not just cleanliness.  The phone number field should be validated somewhere else.  You should validate a uniqueness constraint only if uniqueness is actually required for the successful running of the code.  Otherwise, validate in the controller.  Now, you can make this easier by putting the code to validate in the model, but just don't hook it in to the global validation sequence.  So, go ahead and define a method called "is_proper_record?" or something, but don't hook it up to the validation system - call it directly from your controller when you need it.

NOTE - if I don't get around to blogging about it, there are similar problems with State Machine, and a lot of other declarative-enforcement procedures in the model.  Basically, they are elegant ways of coding, but *terrible* when you need to modify them, especially if the person modifying the code wasn't the one who built it.  Untangling the interaction between declarative and manual processes leads to both delays (as the new developer tries to figure out the code) and bugs (when the new developer doesn't figure out the code perfectly the first time).  When your processes are coded *as processes* and not declarations, it is more visible to future programmers, and therefore more maintainable.