Separation of validator and service with external API calls

Asked 25/5, 2015 at 11:40 Answered 22/1, 2020 at 17:38

I'm currently building a web application and attempting to design it following good MVC and service-oriented architecture.

I have, however, hit a bit of a wall in connecting the presentation layer (i.e. my controllers) and the back-end services while still maintaining good error/validation reporting back to the user.

I read a really good SO post here about how to separate Validation logic from the service layer and for the most part it all made sense. However there was one "flaw", if you can call it that, in this model that niggled at me: How do you avoid duplicating effort when looking up objects that are required by both the validator and the service?

I think it'd be easier to explain with a reasonably simple example:

Let's say I have an application that allows users to share code snippets around. Now, I've decided to add a new feature which allows a user to attach their GitHub account to their account on my site (i.e. to build up a profile). For the purpose of this example I'm going to simply assume that all my users are trustworthy and would only attempt to add their own GitHub accounts, not anyone else's :)

Following the aforementioned SO article I've set up a basic GitHub service for retrieving GitHub user info.

interface IGitHubUserService {
    GitHubUser FindByUserName(string username);
}

The concrete implementation of GitHubUserService makes an expensive call to https://api.github.com/users/{0} in order to pull user information. Again, following the article's model I implemented the following command to link a user account to a GitHub user:

// Command for linking a GitHub account to an internal user account
public class GitHubLinkCommand {
    public int UserId { get; set; }
    public string GitHubUsername { get; set }
};

My validator needs to validate that the username entered by the user is a valid GitHub account. This is very straightforward: call FindByUserName on the GitHubUserService and make sure that the result isn't null:

public sealed class GitHubLinkCommandValidator : Validator<GitHubLinkCommand> {
    private readonly IGitHubUserService _userService;

    public GitHubLinkCommandValidator(IGitHubUserService userService) {
        this._userService = userService;
    }

    protected override IEnumerable<ValidationResult> Validate(GitHubLinkCommand command) {
        try {
            var user = this._userService.FindByUserName(command.GitHubUsername);
            if (user == null)
                yield return new ValidationResult("Username", string.Format("No user with the name '{0}' found on GitHub's servers."));
        }
        catch(Exception e) {
            yield return new ValidationResult("Username", "There was an error contacting GitHub's API.");
        }
    }
}

Okay that's great! The validator is really straightforward and makes sense. Now it's time to make the GitHubLinkCommandHandler:

public class GitHubLinkCommandHandler : ICommandHandler<GitHubLinkCommand>
{
    private readonly IGitHubUserService _userService;

    public GitHubLinkCommandHandler(IGitHubUserService userService)
    {
        this._userService = userService;
    }

    public void Handle(GitHubLinkCommand command)
    {
        // Get the user details from GitHub:
        var user = this._userService.FindByUserName(command.GitHubUsername);

        // implementation of this entity isn't really relevant, just assume it's a persistent entity to be stored in a backing database
        var entity = new GitHubUserEntity
        {
            Name = user.Login,
            AvatarUrl = user.AvatarUrl
            // etc.
        };

        // store the entity:
        this._someRepository.Save(entity);
    }
}

Again, this looks really neat and straightforward. However there's one glaring issue: The duplicate calls to IGitHubUserService::FindByUserName, one from the validator and one from the service. On a bad day such a call can take 1-2 seconds without server-side caching, making duplication far too expensive to use this architectural model.

Has anyone else encountered such an issue when writing validators/services around external APIs and how did you reduce the duplication of effort outside of implementing a cache in your concrete class?

Rohde answered 25/5, 2015 at 11:40 Comment(5)

I'd usually go decoupling IGitHubUserService from my application. There I would put a cache and even if it often acts like a Proxy (with some decoration) it may also become an Adapter (if GitHub interface changes) or even a Bridge (in case you want to make it generic enough to be used also with CodePlex, Google Code...) – Append 25/5, 2015 at 11:52

@AdrianoRepetti That's pretty much the only solution I've come up with, but it's not really much of a "solution". I could see it working for a large project, but it seems like a huge investment for such a trivially implemented feature such as the one in the example. – Rohde 25/5, 2015 at 13:6

Proxy with cache for IGitHubService isn't much more lines of code than GitHubLinkCommandHandler (assuming you need to expose just FindUserByName) but yes, I agree it would be nice if there was something to automate this boilerplate code (as we'd do with AOP). – Append 25/5, 2015 at 13:57

As far as I can see you have 2 options the cache option, is an annoying amount of boiler plate. But perhaps a better option is to not have a separate class for a validator. Or rather the validator only handles trivial issues, null sent, etc. But for anything requiring a real amount of time that validation be in the handle method... It's annoying to have to do, but might be the best option in terms of readable code and efficiency? – Threonine 27/5, 2015 at 12:57

@DanielSlater I agree. My current way of doing this was to build exactly that; an 'input validator' (essentially just verifies that the bare essentials of the user's request are conforming to the model's schematic; nullable fields, numeric fields...etc) and then passing off business-layer validation to my service as it's running the action. The main problem I had with this was coming up with a correct way of exposing errors back to the caller (i.e. the controller), which is why I investigated the SO article I mentioned in the original question. It's getting confusing! – Rohde 27/5, 2015 at 13:33

From my point of view, the problem is that neither the LinkCommandHandler nor the LinkCommandValidator should be retrieving the GitHub user in the first place. If you think about in terms of the Single Responsibility Principle, the Validator has a single job, to validate the existence of a user, and the LinkCommandHanlder has a single job to load the entity into the repository. Neither of them should have the job of pulling the Entity/User from GitHub.

I like to structure my code in the following pattern, each representing an attributable layer. Each layer can speak with the layer above and the layer below but it cant skip over a layer.

Data Layer -- this represents a datasource such as a database or a service usually you don't write code for this, you just consume it.
Access Layer -- this represents the code to interact with the datalayer
Peristence Layer -- this represents the code to get items ready for the call to the Access Layer such as data translations, building entities from the data, or grouping multiple calls to the access layer into a single request to retrieve data or store data. Also, the decision to cache, and the mechanisms for caching and clearing the cache would reside in this layer.
Processor Layer - this represents the code that performs the business logic. It is also where you would make use of validators, other processors, parsers, etc.

And then I keep all the above separate from my presentation layer. The concept being that the core code and functionality shouldn't know if it is being used from a website or a desktop app, or a WCF Service.

So in your example, I would have a GitHubLinkProcessor object a method called LinkUser(string username). Within that class, I would instantiate my GitHubPeristenceLayer class and call its FindUserByName(string username) method. Next, we proceed to instantiate a GitHubUserValidator class to validate the user is not null and all the necessary data is present. One validation is passed, a LinkRepositoryPersistence object is instantiated and passed the GitHubUser for persistence into the AccessLayer.

But I want to strongly note that this is just the way I would do it, and by no means do I want to imply that other methodologies are any less valid.

EDIT:

I was going for a simple answer because I was afraid my response was already too long and boring. =) I am going to split hairs here for a moment, so please bear with me. To me, you are not validating the user by calling Git. You are checking for the existence of a remote resource, which may or may not fail. An analogy may be that you can validate that (800) 555-1212 is a valid format for a US Phone number, but not that the phone number existing and belongs to the correct person. That is a separate process. Like I said, it is splitting hairs, but by doing so it allows for the overall code pattern I describe.

So let's assume your local user object has a property for UserName and Email that cannot be null. You would validate for those and only move on to checking for the resource if that validation was correct.

public class User 
{
    public string UserName { get; set; }
    public string Email { get; set; }

    //git related properties
    public string Login { get; set; }
    public string AvataUrl { get; set; }
}

//A processor class to model the process of linking a local system user
//to a remote GitHub User
public class GitHubLinkProcessor()
{
    public int LinkUser(string userName, string email, string gitLogin) 
    {
            //first create our local user instance
            var myUser = new LocalNamespace.User { UserName = userName, Email = email };

        var validator = new UserValidator(myUser);
        if (!validator.Validate())
            throw new Exception("Invalid or missing user data!");

        var GitPersistence = new GitHubPersistence();

        var myGitUser = GitPersistence.FindByUserName(gitLogin);
        if (myGitUser == null)
            throw new Exception("User doesnt exist in Git!");

        myUser.Login = myGitUser.Login;
        myUser.AvatorUrl = myGitUser.AvatarUrl;

        //assuming your persistence layer is returning the Identity
        //for this user added to the database
        var userPersistence = new UserPersistence();
        return userPersistence.SaveLocalUser(myUser);

        }
}

public class UserValidator
{
    private LocalNamespace.User _user;

    public UserValidator(User user)
    {
        this._user = user;
    }

    public bool Validate()
    {
        if (String.IsNullOrEmpty(this._user.UserName) ||
            String.IsNullOrEmpty(this._user.Email))
        {
            return false;
        }
    }
}

Prolusion answered 30/5, 2015 at 5:13 Comment(3)

The theory is sound, but I've tried such a solution before and it has one serious drawback: You are always retrieving the GitHub user, even before perform the most basic validation (i.e. are all form fields present?). Ignoring a caching layer, this is a lot of redundant work. There is a lot of "cheap" validation that can fail before even needing to make an API call out to GitHub and it would be far more efficient to run that first. – Rohde 2/6, 2015 at 7:20

The only way I can see around it (using your architecture) would be to break the validation process into multiple methods, so that the GitHubLinkProcessor can still be in charge of the flow, while still being able to fail early in the validation process. – Rohde 2/6, 2015 at 7:24

Hi Jason, you are correct in that assumption. That is why I added a bit more to my answer to help clarify it a bit. – Prolusion 3/6, 2015 at 23:56

I'm going to have a shorter answer than Peter Lange, but I think it's simply that your UserCommandValidator should be validating if the UserCommand is valid, not the User.

Think of the command validator as a stop gap measure to ensure that you're never spending the 1-2 seconds hitting up the github api for a blank username, or a username that has invalid characters. Once you are sure the command itself is valid, you can submit it. From there you can let the User itself, or a UserValidator decide if the user is valid.

You are 100% correct that the duplication was wrong, but it was a code smell that you're validating the wrong thing in that section.

Odelsting answered 22/1, 2020 at 17:38 Comment(0)

Recommended topics

Hot tags