Howdy all. I’m working on building a services infrastructure to rework a lot of in-house apps at my work, and I need some help.
I should note first, I’m out of my element here. I’ve worked on large software libraries for almost 20 years now, but I’ve never built a service bus like this, though I have spent several months doing research. I’m just trying to make my software at work better for my dev team and my users, so apologies upfront if I seem to have weird ideas.
I’ve been pulling my hair out over this for months, and I really just need to talk to someone that understands application service infrastructure.
Anyway:
I work for a hardware+software engineering enterprise with a lot of in-house apps and tech. We have a lot of winforms/wpf/console apps that run on end-user machines and directly connect to our databases to get work done. Databases are used for storing and fetching engineering data, test results, etc. I own everything – the client apps, the databases, and should I ask for it, a small server environment to run services.
We have a lot of complexity – lots of functions, queries, reports – and lots of data, etc. But we don’t have a huge amount of computational load – realistically I’m only looking at a few hundred queries a minute at worst. So our focus isn’t on scalable performance, but on a software and system architecture that leaves us free to add features easily.
Today:
Every app has to have a database communication layer in it, which makes maintenance on the software and the database very difficult.
Adding features means releasing new desktop apps and waiting for users to update installs.
We don’t have Windows Integrated Authentication at the database, so the apps log into the database with a separate username/password.
We’re in an enterprise, though, so Windows Integrated Authentication is available to apps and services; I even wrote a library for it some years ago in preparation for today’s project.
We have a lot of strict information separation requirements; as an employee, what information you’re allowed to access depends on what project you’re working on. Part of this means that we have many separate databases (all with the same schema) to separate out information per project. As part of implementing this services architecture, we may change the database to make it inherently “multi-tenant”.
Some of our tools have to integrate all of that information from multiple databases in a single application session, for example, to generate reports. But only for the databases you have permission to access, which is different per user. This means tools are logging into databases simultaneously and cross-correlating data between them.
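If the per-project databases were folded into one multi-tenant schema, the cross-project report described in the last point could become a single query filtered by the caller’s permitted projects, instead of several simultaneous logins. A minimal sketch using Python’s built-in `sqlite3`, assuming a hypothetical `test_results` table with a `project_id` tenant column; in a services design the permitted set would come from the caller’s authenticated identity on the service side, never from the client’s request:

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Build an in-memory demo of a multi-tenant table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test_results ("
        " id INTEGER PRIMARY KEY,"
        " project_id TEXT NOT NULL,"  # tenant key: which project owns the row
        " test_name TEXT NOT NULL,"
        " passed INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO test_results (project_id, test_name, passed) VALUES (?, ?, ?)",
        [
            ("alpha", "thermal", 1),
            ("alpha", "vibration", 0),
            ("bravo", "thermal", 1),
            ("charlie", "thermal", 0),  # a project the demo user may not see
        ],
    )
    return conn


def pass_rate_report(conn: sqlite3.Connection, allowed_projects) -> dict:
    """Cross-project report restricted to the projects the caller may access."""
    projects = sorted(allowed_projects)
    if not projects:
        return {}
    # One "?" per permitted project; the values are bound, never interpolated.
    placeholders = ",".join("?" for _ in projects)
    rows = conn.execute(
        "SELECT project_id, AVG(passed) FROM test_results "
        f"WHERE project_id IN ({placeholders}) "
        "GROUP BY project_id ORDER BY project_id",
        projects,
    ).fetchall()
    return {project: rate for project, rate in rows}


if __name__ == "__main__":
    db = open_demo_db()
    print(pass_rate_report(db, {"alpha", "bravo"}))
```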
This sucks. It sucks from a software maintenance perspective, and it sucks from a security perspective, and it sucks from an architecture perspective. It’s what I inherited back when our department was a looot smaller and simpler, and we just grew organically over the years. Time to fix things and make things better for users and devs alike.
So, of course, the standard attack plan is to take functionality and database access out of the myriad client apps and move it all into services, and then use an application service bus like MassTransit or Rebus (on top of RabbitMQ) to let the client apps talk to the services (and let the services talk to each other, though I think there’s very little of that).
Fundamentally, I want a messaging system with a particular set of properties, because they align very well with our business case; however, those properties seem to align poorly with MT/Rebus/etc., and I can’t tell if I’m fighting against the grain, or if I just have to bite the bullet and do things the hard way because of my requirements.
So what I started with is basically simple:
RabbitMQ forms the messaging nexus between all clients and services.
All code run by administrators (me) is trusted. All other code is untrusted. Which means:
Client apps are untrusted, and thus, clients can’t be allowed to connect to RabbitMQ directly. Services are trusted, and thus, indeed can be allowed to connect to RabbitMQ directly.
Services have to be able to connect directly to Rabbit because they have to be able to declare exchanges, named queues, etc. Clients don’t; the most they need is an anonymous queue created on their behalf that is used as the ReplyTo address in all of their outgoing messages. A gateway can handle that one tiny bit of privilege for them, thus preventing clients from being able to modify the RabbitMQ structure/queues/routing behavior.
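The “one tiny bit of privilege” can be made concrete: the gateway owns the reply queue for each client connection and overwrites whatever reply address a client puts on its messages, so a client can never direct replies to someone else’s queue. A minimal in-memory sketch; `Envelope` and `GatewaySession` are hypothetical names, and the queue name is only a stand-in for a broker-named, exclusive queue (with the pika client, for example, the gateway would obtain one via `channel.queue_declare(queue="", exclusive=True)`):

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Envelope:
    destination: str                             # routing slip used by the gateway
    body: bytes                                  # opaque payload, never decoded here
    headers: dict = field(default_factory=dict)


class GatewaySession:
    """Hypothetical per-connection state the gateway keeps for one client socket."""

    def __init__(self) -> None:
        # Stand-in for the exclusive queue the gateway would declare on the
        # broker for this connection; the client never holds broker credentials.
        self.reply_queue = f"client-reply-{uuid.uuid4().hex}"

    def accept_from_client(self, env: Envelope) -> Envelope:
        # Overwrite any reply address the client supplied, so replies can only
        # ever be routed back to this connection's own queue.
        headers = dict(env.headers)
        headers["reply_to"] = self.reply_queue
        return Envelope(env.destination, env.body, headers)


if __name__ == "__main__":
    session = GatewaySession()
    spoofed = Envelope("servicex", b"...", {"reply_to": "someone-elses-queue"})
    print(session.accept_from_client(spoofed).headers["reply_to"] == session.reply_queue)
```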
So my solution has a pipeline that looks something like this, where certain software is trusted and certain software is untrusted:
And code communicates in the following order:
Where ‘AppBus’ is some yet-to-be-determined technology: MassTransit? Rebus? Something custom?
1. Some ClientApp wants to call ServiceX to do something, so it invokes some method in the ServiceX.ClientLib library.
2. ServiceX.ClientLib needs to inject a message into the app service bus to get the message to ServiceX, so it calls AppBus.
3. AppBus sends a message over a socket to the gateway.
4. The AppBus library running in the gateway receives the message over the socket and passes it up to the gateway routing logic.
5. The Gateway receives the message and figures out how to forward it to the service through Rabbit.
6. The Gateway calls the RabbitMQ client code to inject the message into the Rabbit cluster.
7. Rabbit receives the message and forwards it to the appropriate Rabbit client.
8. The Rabbit ClientAPI running in the process for ServiceX receives the message.
9. The AppBus library running in the process for ServiceX receives the message.
10. ServiceX finally receives the message, processes it, and life goes on.
One consequence of the above design is that steps 4-7 have no knowledge of the type information for the message, only its routing slip. Additionally, AppBus must support both socket and RabbitMQ connections between its instances.
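Because the gateway hops only need the routing slip, the socket protocol between AppBus instances can keep the slip in a small header and carry the message body as opaque bytes. A minimal sketch of one possible length-prefixed frame format; the layout, the `destination` field, and the `publish` callback are all hypothetical and not something MassTransit or Rebus defines:

```python
import json
import struct

# Hypothetical frame layout (lengths are big-endian unsigned 32-bit ints):
#   [header length][body length][UTF-8 JSON routing slip][opaque body bytes]
_PREFIX = struct.Struct("!II")


def encode_frame(routing_slip: dict, body: bytes) -> bytes:
    header = json.dumps(routing_slip, separators=(",", ":")).encode("utf-8")
    return _PREFIX.pack(len(header), len(body)) + header + body


def decode_frame(frame: bytes):
    """Return (routing_slip, body); the body is sliced out, never parsed."""
    if len(frame) < _PREFIX.size:
        raise ValueError("truncated frame prefix")
    header_len, body_len = _PREFIX.unpack_from(frame, 0)
    start = _PREFIX.size
    if len(frame) != start + header_len + body_len:
        raise ValueError("frame length does not match its prefix")
    slip = json.loads(frame[start:start + header_len].decode("utf-8"))
    body = frame[start + header_len:]
    return slip, body


def gateway_forward(frame: bytes, publish) -> None:
    """Thin gateway: read only the destination, pass the body through untouched."""
    slip, body = decode_frame(frame)
    # No message types are imported here, so adding a new message to a
    # service does not require changing or redeploying the gateway.
    publish(slip["destination"], body)


if __name__ == "__main__":
    frame = encode_frame({"destination": "servicex.commands"}, b"\x00\x01opaque")
    gateway_forward(frame, lambda dest, body: print(dest, body))
```

On a real stream socket the reader has to keep calling `recv` until it has the 8-byte prefix and then the full frame, since TCP does not preserve message boundaries.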
One goal I have is keeping the Gateway thin; the software running the gateway should have no knowledge of what messages exist. Clients send messages to the Gateway bearing a routing slip. Services are required to validate, authenticate, and authorize every single message they receive. There is no authentication/authorization performed at the gateway, because each service could make decisions differently.
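With no authorization at the gateway, each service must refuse to run a handler until it has checked the caller. One way to make that hard to forget is a dispatcher in which every message type is registered together with its own authorization rule. A minimal sketch; `Caller`, `ServiceDispatcher`, and the `GetTestResults` message are hypothetical, and authenticating the caller in the first place (for example from a signed token carried in the message headers) is assumed to have happened already:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Caller:
    user: str             # identity the service has already authenticated
    projects: frozenset   # projects this user is cleared to see


class Unauthorized(Exception):
    pass


Handler = Callable[[dict], Any]
Policy = Callable[[Caller, dict], bool]


class ServiceDispatcher:
    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Handler, Policy]] = {}

    def register(self, message_type: str, handler: Handler, policy: Policy) -> None:
        # `policy` is a required argument: there is no default-allow path.
        self._routes[message_type] = (handler, policy)

    def dispatch(self, caller: Caller, message_type: str, payload: dict) -> Any:
        if message_type not in self._routes:
            raise ValueError(f"unknown message type {message_type!r}")
        handler, policy = self._routes[message_type]
        if not policy(caller, payload):
            raise Unauthorized(f"{caller.user} may not send {message_type}")
        return handler(payload)


if __name__ == "__main__":
    svc = ServiceDispatcher()
    svc.register(
        "GetTestResults",
        handler=lambda p: f"results for {p['project']}",
        policy=lambda caller, p: p.get("project") in caller.projects,
    )
    alice = Caller("alice", frozenset({"alpha"}))
    print(svc.dispatch(alice, "GetTestResults", {"project": "alpha"}))
```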
Why do I want to keep the Gateway thin? Well, one common pattern I see a lot, which I don’t like, is the following:
Clients make an http connection to a web API gateway. Gateway has code in it for each and every message that can go between a client and a service. The gateway then forwards the message (often using something like MassTransit) to the service. Somehow, replies wind their way back to the client.
The part I don’t like is that the gateway contains code in it for each specific message it understands; every time a new message is implemented between clients and services, the gateway has to be updated and redeployed, which seems silly. Said in dependency chain terms, I don’t want the gateway implementation to have to link against the library that defines my messages. Only the client library and the service implementation should do so.
Also, this architecture seems to indicate that it’s not possible for clients to simply listen for event-type messages; everything is request-reply. TBH, I’d rather everything just be Messages over Sockets, having nothing to do with web, if possible.
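Event delivery does not have to mean request/reply through the gateway: the gateway could also bind a client’s private queue to a topic exchange using a pattern the client asks for, and push whatever arrives down the client’s socket. RabbitMQ performs the pattern matching itself, so the function below only illustrates the AMQP topic rules a subscription would follow (`*` matches exactly one dot-separated word, `#` matches zero or more); the event naming scheme is hypothetical. Because subscribing is itself a way of reading data, something still has to decide which patterns a given client may bind, and this sketch leaves that open.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP topic-exchange style match on dot-separated words.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    return _match(pattern.split("."), routing_key.split("."))


def _match(pat: list, key: list) -> bool:
    if not pat:
        return not key  # pattern exhausted: match only if the key is too
    head, rest = pat[0], pat[1:]
    if head == "#":
        # Let '#' absorb 0, 1, ..., len(key) words and try the rest of the pattern.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False


if __name__ == "__main__":
    # Hypothetical event names: <project>.<source>.<event>
    subscription = "alpha.*.test_completed"
    for key in ("alpha.rig7.test_completed", "bravo.rig7.test_completed"):
        print(key, topic_matches(subscription, key))
```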
What I basically want is for clients to be almost directly attached to RabbitMQ (as if using MT) but with the security isolation that I need.
I’ve started to put together an experiment to demo this, and I was fairly successful. What I’m unsure about is the wisdom of continuing with this experiment instead of using more off-the-shelf parts like MT or Rebus. I’ve talked with Chris Patterson (author of MT) for a bit over Discord, and it seems MT doesn’t support this gatewaying scenario, at least not the way I’ve envisioned it. He is working on an automatic HTTP Gateway that I gather will be released in the next year or so, but I’m still not sure it really fits my concept, nor whether it’ll be available before I need to start this project.
Here are some answers to questions I anticipate getting:
1 – If you’re the author of the client apps, why are they untrusted?
Any software running on hardware you don’t own is software that can’t be trusted. In this environment, trust, information privilege, etc. matter. I work with data controlled under ITAR, to be specific.
If I allow client apps to directly connect to Rabbit, then they could use the Rabbit API to modify my Rabbit cluster and intercept messages they have no permission to view. For example, a low-priv user could write a custom app that connects to Rabbit, sets itself up as a subscriber on some queue, and starts receiving information meant for high-priv services. Really, really bad.
2 – Why not partition rabbit, the gateway etc so that you have separate instances for each ‘information container’ you have in your system?
Because that’s the hell we have today that we’re trying to get away from. Today, users have to configure database login creds for each and every database that they have permission to log in to, and it’s a giant pain in the neck that affects users and is measurably a problem for my department. I really want a system where clients have to configure one piece of information – the hostname of the gateway to connect to. From there, the system mediates everything else automatically for the user.
Post this in the dotnet subreddit too.
I haven’t read all your details but have you had a look at NServiceBus?
I’ve looked at it for design inspiration, but I won’t be able to use it (costs far too much). That’s why I was planning to roll my own or use open source.
But from what I understand, NSB has a design that is problematic for my needs. Really, what I’m trying to do is have NSB or MT implement a gateway to itself in front of RabbitMQ, so that I can isolate client-type producers/consumers from RabbitMQ.
Correct me if I’m wrong, but I saw no requirement for asynchronous operation between the clients and the services. In fact, it seems the opposite. Clients appear to be expecting, and waiting for, a response from the services (the reply-to queues). IF there’s no justifiable need for queued operation, I would get rid of the message bus altogether — it’s an unnecessary ball of complexity. That’s just my observation from 30,000 feet. If there’s in fact a justified reason for using a pub/sub model, then please disregard.
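To make the “ball of complexity” concrete: request/reply over one-way queues means the client has to generate correlation IDs, park a waiter per request, route each reply back to its waiter, time out, and drop late replies, all of which a plain HTTP request/response gives you per call. A minimal in-memory sketch with `asyncio` queues standing in for the broker; `RequestClient` and `upper_service` are hypothetical names:

```python
import asyncio
import itertools


class RequestClient:
    """Request/reply on top of one-way queues, matched by correlation ID."""

    def __init__(self, outbox: asyncio.Queue, inbox: asyncio.Queue) -> None:
        self._outbox = outbox           # where requests are published
        self._inbox = inbox             # this client's private reply queue
        self._pending = {}              # correlation ID -> Future awaiting a reply
        self._ids = itertools.count(1)

    async def request(self, body, timeout: float = 5.0):
        cid = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[cid] = future
        await self._outbox.put({"correlation_id": cid, "body": body})
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            # Forget the request whether it succeeded or timed out.
            self._pending.pop(cid, None)

    async def pump_replies(self) -> None:
        while True:
            reply = await self._inbox.get()
            future = self._pending.get(reply["correlation_id"])
            # A reply arriving after its request timed out has no waiter; drop it.
            if future is not None and not future.done():
                future.set_result(reply["body"])


async def upper_service(requests: asyncio.Queue, replies: asyncio.Queue) -> None:
    """Toy service: answers each request by upper-casing its body."""
    while True:
        msg = await requests.get()
        await replies.put({"correlation_id": msg["correlation_id"],
                           "body": msg["body"].upper()})


async def main():
    requests, replies = asyncio.Queue(), asyncio.Queue()
    client = RequestClient(requests, replies)
    background = [asyncio.create_task(client.pump_replies()),
                  asyncio.create_task(upper_service(requests, replies))]
    results = await asyncio.gather(client.request("a"), client.request("b"))
    for task in background:
        task.cancel()
    await asyncio.gather(*background, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['A', 'B']
```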
You have an API gateway in your design. That’s great. This gateway will proxy/aggregate calls to downstream services and respond to the clients with the result(s) returned from the services. This isolates your untrusted clients from the services and provides a place to handle cross-cutting tasks. And it works well enough regardless of whether you use pub/sub or not.
Your authentication & authorization requirements (untrusted clients, per-user dataset-permissions), are a textbook use case for OAuth/OIDC. Using an OAuth flow…
The Identity Provider (IdP) authenticates the user;
The IdP (eventually) issues an access token (for the authenticated user) that contains permission claims for the datasets the user can access;
The access token containing the claims is presented to the gateway (an OAuth Relying Party, or RP);
The gateway passes the user’s permissions on to the downstream services (in headers or in the request payload);
The services use the permissions to determine what data should be included for the user.
You could use an in-house IdP developed using IdentityServer4, or use Azure AD, Auth0, or any number of other identity providers.