A comparison of validations with ActiveRecord in Ruby & Ecto in Elixir
Basics: Validations in ActiveRecord
Let’s say we’re building a simple online shop. We’ll assume that users can already sign in to the application. The next step is to let a user enter her address. In order to ship items to the user, we need all address details, so we’ll add a number of validations to the user model.
We validate whether the user has entered all the required information, and additionally check whether the address exists by using an external service. The validations are then used within the controller, where I also use Strong Parameters.
Note that <code>@user.update</code> calls <code>@user.valid?</code> under the hood, which returns whether the validation succeeded. As a side effect, it stores all errors in the user object's <code>errors</code> collection, which allows for displaying detailed error messages to the user when the <code>edit</code> view is rendered.
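To make this flow concrete without pulling in Rails, here is a stdlib-only Ruby sketch that imitates (rather than uses) ActiveRecord: <code>update</code> assigns the attributes, <code>valid?</code> runs every validation and records the errors on the object, and nothing is saved if any check fails. The class <code>SketchUser</code>, the field names and <code>FakeAddressService</code> (standing in for the external service behind the custom validator) are hypothetical.

```ruby
# Stdlib-only sketch that imitates (does not reproduce) ActiveRecord's
# whole-model validation. SketchUser and FakeAddressService are hypothetical.
class FakeAddressService
  # Pretend external lookup: only this one zip code "exists".
  def self.exists?(street, zip, city)
    zip == "10115"
  end
end

class SketchUser
  ADDRESS_FIELDS = %i[street zip city].freeze
  attr_accessor(*ADDRESS_FIELDS)
  attr_reader :errors

  def initialize
    @errors = {}
    @saved = false
  end

  # Every validation runs against the model as a whole; errors are kept
  # on the object so that an edit view could display them.
  def valid?
    @errors = {}
    ADDRESS_FIELDS.each do |field|
      value = public_send(field)
      (@errors[field] ||= []) << "can't be blank" if value.to_s.strip.empty?
    end
    unless FakeAddressService.exists?(street, zip, city)
      (@errors[:base] ||= []) << "address does not exist"
    end
    @errors.empty?
  end

  # Roughly what update does: assign attributes, validate, persist only if valid.
  def update(attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
    @saved = valid?
  end

  def saved?
    @saved
  end
end

user = SketchUser.new
user.update(street: "Main St 1", zip: "99999", city: "Berlin") # => false
user.errors.keys                                               # => [:base]
user.update(zip: "10115")                                      # => true
```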
The shortcomings of the ActiveRecord approach
Defining the validations per model is an explicit design choice of the authors of ActiveRecord, as stated within the official RailsGuides:
"Model-level validations are the best way to ensure that only valid data is saved into your database. They are database agnostic, cannot be bypassed by end users, and are convenient to test and maintain."
While this approach works well when a user edits the model as a whole, it can be insufficient in more complex scenarios. Let’s say we want to build a second page within our application that lets the user choose her preferred payment type from a predefined list. We add some more validations to our model.
However, this solution will not work: if a user tries to enter her address details first, the model will be invalid and therefore not be saved because of the missing payment type; if she enters her payment data first, it will not be saved because of the missing address data. Either way, she cannot save any data at all. Another weakness of the approach is that the possibly expensive <code>AddressServiceValidator</code> always runs, even if the user only entered her payment details. The reason for these problems is that ActiveRecord always validates and saves the model as a whole and has no notion of the action the user is currently performing.
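The following stdlib-only Ruby sketch reproduces the problem under simplified assumptions: each form starts from a user without any saved data, and every <code>update</code> runs the address and payment checks together. <code>WholeModelUser</code>, <code>PAYMENT_TYPES</code> and the <code>address_checks</code> counter (standing in for calls to the external address service) are hypothetical.

```ruby
# Stdlib-only sketch of the whole-model problem; field names, PAYMENT_TYPES
# and the check counter are hypothetical.
class WholeModelUser
  PAYMENT_TYPES = %w[invoice credit_card paypal].freeze
  attr_accessor :street, :zip, :city, :payment_type
  attr_reader :errors, :address_checks

  def initialize
    @errors = []
    @address_checks = 0
  end

  # Address AND payment validations always run together.
  def valid?
    @errors = []
    %i[street zip city].each do |field|
      @errors << "#{field} can't be blank" if public_send(field).to_s.strip.empty?
    end
    @errors << "payment_type is not included in the list" unless PAYMENT_TYPES.include?(payment_type)
    @address_checks += 1 # stands in for the expensive external address lookup
    @errors.empty?
  end

  def update(attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
    valid?
  end
end

address_form = WholeModelUser.new
address_form.update(street: "Main St 1", zip: "10115", city: "Berlin") # => false (payment_type missing)

payment_form = WholeModelUser.new
payment_form.update(payment_type: "invoice") # => false (address missing)
payment_form.address_checks                  # => 1 (the lookup ran anyway)
```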
Possible solutions within ActiveRecord
There are multiple ways to circumvent the problem mentioned above:
- Splitting the user model into separate models, each containing only the data that is present in one form.
- Adding an additional layer by using two form objects (e.g. with the reform gem). You would then move the validation logic to these two objects and they would validate user input before saving the model.
- Using validation contexts within the model to separate the validations for the two use cases.
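While I would tend to use form objects in a real-world scenario, I do not want to introduce another gem to this example code and will therefore go with validation contexts. In such an implementation, each validation is tagged with the context it belongs to via the <code>on:</code> option, and the corresponding controller code changes so that it passes the matching context when saving. Since <code>update</code> does not accept a context, this means assigning the attributes first and then calling <code>save(context: :address)</code> (or <code>:payment</code>).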
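As an illustration only, here is a stdlib-only Ruby imitation of validation contexts; it is not how ActiveModel implements them. Each rule is tagged with a context, mirroring <code>validates ..., on: :address</code>, and <code>save(context: ...)</code> runs only the matching rules. <code>ContextUser</code>, its rules and the payment types are hypothetical, and rules without a context (which ActiveRecord runs in every context) are left out for brevity.

```ruby
# Stdlib-only imitation of validation contexts; NOT ActiveModel's implementation.
# ContextUser, its rules and PAYMENT_TYPES are hypothetical.
class ContextUser
  PAYMENT_TYPES = %w[invoice credit_card paypal].freeze
  PRESENT = ->(value) { !value.to_s.strip.empty? }

  # [context, field, check, message], mirroring `validates :field, ..., on: :context`.
  RULES = [
    [:address, :street, PRESENT, "can't be blank"],
    [:address, :zip, PRESENT, "can't be blank"],
    [:address, :city, PRESENT, "can't be blank"],
    [:payment, :payment_type, ->(value) { PAYMENT_TYPES.include?(value) }, "is not included in the list"]
  ].freeze

  attr_accessor :street, :zip, :city, :payment_type
  attr_reader :errors

  def initialize
    @errors = {}
  end

  # Runs only the rules tagged with the given context.
  def valid?(context)
    @errors = {}
    RULES.each do |rule_context, field, check, message|
      next unless rule_context == context

      (@errors[field] ||= []) << message unless check.call(public_send(field))
    end
    @errors.empty?
  end

  def assign_attributes(attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
  end

  # A real save would also persist; here we only report the validation result.
  def save(context:)
    valid?(context)
  end
end

# Controller side: assign first, then save with the context of the current form.
user = ContextUser.new
user.assign_attributes(payment_type: "invoice")
user.save(context: :payment) # => true, the address rules are skipped
user.save(context: :address) # => false
user.errors.keys             # => [:street, :zip, :city]
```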
While this code solves our problem at hand, there are some new problems introduced with this approach:
- A custom context replaces the default <code>:create</code>/<code>:update</code> contexts, so you can no longer use contexts to distinguish between creating and updating a record. You can read more about this on the Arkency blog.
- Knowledge is duplicated between the controller and the model: the controller knows which attributes are allowed for each action in order to permit them via Strong Parameters, and the model needs the same knowledge to validate exactly those parameters within the contexts.
Introducing Ecto Changesets
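Ecto performs validation within changesets. Let’s look at them by applying them to the address example from above, using a function called <code>address_changeset</code> that builds a changeset from the current user data and the submitted params.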
The steps in this function work as follows:
- <code>cast/3</code> takes the current user data, the params provided by the user and a list of allowed parameters, and returns a changeset struct. This struct contains all relevant information, including the given data and any validation errors. Note that the list of allowed parameters takes the role of Strong Parameters in the ActiveRecord example.
- <code>validate_required/3</code> takes a changeset, a list of required keys and an optional keyword list of options, and returns a new changeset containing the content of the old changeset plus all validation errors of that step.
So the <code>address_changeset</code> function will return a changeset that has both sanitized the user input and performed the given validations.
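Since a real Ecto example needs Elixir and the <code>ecto</code> dependency, the following is a plain-Ruby model of the same idea, not Ecto's API: <code>cast</code> keeps only the permitted keys, and <code>validate_required</code> returns a new changeset with any additional errors. The <code>Changeset</code> struct, the hash-based user and the field names are assumptions made for the sketch; Ecto's real functions do more, such as casting params to the schema's types.

```ruby
# Plain-Ruby model of the changeset idea; NOT Ecto's API.
# The Changeset struct, the hash-based user and the field names are assumptions.
Changeset = Struct.new(:data, :changes, :errors, keyword_init: true) do
  def valid?
    errors.empty?
  end
end

# Roughly like cast/3: keep only permitted keys (the "strong parameters" role).
def cast(data, params, permitted)
  Changeset.new(data: data, changes: params.slice(*permitted), errors: [])
end

# Roughly like validate_required/3: return a NEW changeset with an extra error
# for every required field that is blank once the changes are applied.
def validate_required(changeset, fields)
  current = changeset.data.merge(changeset.changes)
  missing = fields.select { |field| current[field].to_s.strip.empty? }
  Changeset.new(data: changeset.data, changes: changeset.changes,
                errors: changeset.errors + missing.map { |field| [field, "can't be blank"] })
end

ADDRESS_FIELDS = %i[street zip city].freeze

# The pipeline: sanitize the params, then validate them.
def address_changeset(user, params)
  changeset = cast(user, params, ADDRESS_FIELDS)
  validate_required(changeset, ADDRESS_FIELDS)
end

user = { street: nil, zip: nil, city: nil }
changeset = address_changeset(user, { street: "Main St 1", zip: "10115", admin: true })
changeset.errors # => [[:city, "can't be blank"]]
changeset.valid? # => false (and :admin never made it into changeset.changes)
```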
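This function can then be used within a Phoenix controller, which passes the resulting changeset to <code>Repo.update</code>.
<code>Repo.update</code> then checks whether the given changeset is valid and only performs the update if it is. It returns <code>{:ok, user}</code> on success and <code>{:error, changeset}</code> otherwise, so the controller can, for example, re-render the form with the errors.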
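A plain-Ruby sketch of this contract, not Ecto's implementation: two-element arrays stand in for Elixir's <code>{:ok, ...}</code> and <code>{:error, ...}</code> tuples, and "persisting" just merges the changes into a hash. The function names <code>repo_update</code> and <code>update_action</code> are hypothetical; <code>update_action</code> shows how a controller might branch on the result.

```ruby
# Plain-Ruby sketch of the Repo.update contract; NOT Ecto's code.
Changeset = Struct.new(:data, :changes, :errors, keyword_init: true)

# Persist only if the changeset carries no errors; here "persisting" just merges.
def repo_update(changeset)
  return [:error, changeset] unless changeset.errors.empty?

  [:ok, changeset.data.merge(changeset.changes)]
end

# What a controller action might do with the result (hypothetical names).
def update_action(changeset)
  case repo_update(changeset)
  in [:ok, user] then [:redirect, user]
  in [:error, failed] then [:render_edit, failed.errors]
  end
end

valid = Changeset.new(data: { city: nil }, changes: { city: "Berlin" }, errors: [])
invalid = Changeset.new(data: { city: nil }, changes: {}, errors: [[:city, "can't be blank"]])
update_action(valid).first # => :redirect
update_action(invalid)     # => [:render_edit, [[:city, "can't be blank"]]]
```

The pattern match (Ruby 3.0 or later) plays the role of the <code>case</code> on <code>{:ok, _}</code>/<code>{:error, _}</code> that a Phoenix controller would typically use.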
Just like ActiveRecord, Ecto offers a number of predefined validations, which are listed in the documentation of the <code>Ecto.Changeset</code> module.
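As an example of a predefined check, the payment page from the ActiveRecord section could get its own changeset that only validates the payment type. The plain-Ruby sketch below imitates this with a hypothetical <code>validate_inclusion</code> helper, modeled loosely on Ecto's function of the same name; the helper, <code>payment_changeset</code> and <code>PAYMENT_TYPES</code> are assumptions. Note that the missing address does not affect the result.

```ruby
# Plain-Ruby sketch, NOT Ecto: a payment-only changeset with an inclusion check.
# validate_inclusion, payment_changeset and PAYMENT_TYPES are hypothetical.
Changeset = Struct.new(:data, :changes, :errors, keyword_init: true) do
  def valid?
    errors.empty?
  end
end

def cast(data, params, permitted)
  Changeset.new(data: data, changes: params.slice(*permitted), errors: [])
end

# Only checks a field the user is actually changing; returns a new changeset.
def validate_inclusion(changeset, field, allowed)
  return changeset unless changeset.changes.key?(field)
  return changeset if allowed.include?(changeset.changes[field])

  Changeset.new(data: changeset.data, changes: changeset.changes,
                errors: changeset.errors + [[field, "is invalid"]])
end

PAYMENT_TYPES = %w[invoice credit_card paypal].freeze

# Validates only what the payment form submits; address fields are never checked.
def payment_changeset(user, params)
  changeset = cast(user, params, [:payment_type])
  validate_inclusion(changeset, :payment_type, PAYMENT_TYPES)
end

user = { street: nil, zip: nil, city: nil, payment_type: nil }
payment_changeset(user, { payment_type: "invoice" }).valid? # => true, despite the missing address
payment_changeset(user, { payment_type: "bitcoin" }).errors # => [[:payment_type, "is invalid"]]
```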
To dive a little deeper, let’s look at how we would implement the custom address validation from the ActiveRecord example.
We added a private function called <code>validate_against_address_service</code> that takes a changeset. We do not want to make the possibly expensive call to the external service unless it is necessary, so we skip the validation if earlier validations have already attached errors or if the user did not change any data. This shows another strength of the Ecto approach: chaining the validations explicitly gives you control over the order in which they are executed, so you can run expensive validations only if the earlier ones have passed. Ecto itself relies on this behavior, as it distinguishes between in-memory validations and validations that need to hit the database (like uniqueness constraints).
If the given address data is not valid, we use <code>add_error/3</code>, which returns a new changeset containing the errors of the old changeset plus the new one; otherwise we return the given changeset unaltered.
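The skip logic can be sketched in plain Ruby as follows; this is an imitation, not Ecto code. <code>FakeAddressService</code> is a hypothetical stand-in that counts its calls, which makes it visible that the lookup is skipped when earlier errors exist or nothing changed. The error key <code>:address</code> and its message are assumptions.

```ruby
# Plain-Ruby sketch of the custom validation; NOT Ecto code.
Changeset = Struct.new(:data, :changes, :errors, keyword_init: true)

# Hypothetical stand-in for the external service; counts how often it is called.
module FakeAddressService
  @calls = 0

  class << self
    attr_reader :calls

    def exists?(address)
      @calls += 1
      address[:zip] == "10115"
    end
  end
end

# Like add_error/3: a new changeset with the old errors plus one more.
def add_error(changeset, key, message)
  Changeset.new(data: changeset.data, changes: changeset.changes,
                errors: changeset.errors + [[key, message]])
end

# Skip the expensive lookup if earlier validations failed or nothing changed.
def validate_against_address_service(changeset)
  return changeset unless changeset.errors.empty?
  return changeset if changeset.changes.empty?

  address = changeset.data.merge(changeset.changes)
  return changeset if FakeAddressService.exists?(address)

  add_error(changeset, :address, "does not exist")
end

data = { street: "Main St 1", zip: nil, city: "Berlin" }
already_invalid = Changeset.new(data: data, changes: {}, errors: [[:zip, "can't be blank"]])
validate_against_address_service(already_invalid) # lookup skipped
FakeAddressService.calls                          # => 0
unknown = Changeset.new(data: data, changes: { zip: "99999" }, errors: [])
validate_against_address_service(unknown).errors  # => [[:address, "does not exist"]]
FakeAddressService.calls                          # => 1
```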
Why do I prefer the changeset approach?
As shown above, the ActiveRecord approach validates the complete state of a model at once, independent of the action that the user is performing on the data. This approach shows its weaknesses as soon as you no longer want to display a single form that lets the user alter all fields of the model, but want to distinguish between finer-grained actions instead. While you can work around the problems by using contexts (as shown above) or form objects, I really like the approach that Ecto takes:
Changesets offer a way to validate each individual action the user is performing. This gives the same flexibility as form objects, with a very concise syntax that makes it easy to compose validations. It also eliminates the duplication of knowledge between sanitizing user input via Strong Parameters and validating that input within the model.