Monday, July 17, 2023

Microservice Architecture and its 10 Most Important Design Patterns


Overview: Explains important terms and patterns used in microservice-based systems. A couple of patterns, like CQRS and BFF, are quite popular; some of the event-based patterns are well suited to only a few systems.

The article gives a good overview of the different patterns. ***

All of them used the age-old and proven technique to tackle the complexity of a large system: divide and conquer. Since the 2010s, those techniques proved insufficient to tackle the complexities of Web-Scale applications or modern large-scale Enterprise applications. As a result, Architects and Engineers developed a new approach to tackle the complexity of Software Systems in modern times: Microservice Architecture. It also uses the same old “Divide and Conquer” technique, albeit in a novel way.

Software Design Patterns are general, reusable solutions to commonly occurring problems in Software Design. Design Patterns help us share a common vocabulary and use a battle-tested solution instead of reinventing the wheel. In a previous article, Effective Microservices: 10 Best Practices, I described a set of best practices for developing Effective Microservices. Here, I will describe a set of Design Patterns to help you implement those best practices. If you are new to Microservice Architecture, then no worries; I will introduce you to Microservice Architecture.

By reading this article, you will learn:

  • Microservice Architecture
  • Advantages of Microservice Architecture
  • Disadvantages of Microservice Architecture
  • When to use Microservice Architecture
  • The Most important Microservice Architecture Design Patterns, including their advantages, disadvantages, use cases, Context, Tech Stack example, and useful resources.

Please note that most of the Design Patterns of this listing have several contexts and can be used in non-Microservice Architecture. But I will describe them in the context of Microservice Architecture.

Microservice Architecture

I have covered Microservice Architecture in detail in my previous Blog Posts: Microservice Architecture: A brief overview and why you should use it in your next project and Is Modular Monolithic Software Architecture Really Dead? If you are interested, you can read them for a deeper look.

What is Microservice Architecture? There are many definitions of Microservice Architecture. Here is mine:

Microservice Architecture is about splitting a large, complex system vertically (per functional or business requirements) into smaller sub-systems that are separate processes (and hence independently deployable). These sub-systems communicate with each other via lightweight, language-agnostic network calls, either synchronously (e.g., REST, gRPC) or asynchronously (via Messaging).

Here is the Component View of a Business Web Application with Microservice Architecture:

Microservice Architecture by Md Kamaruzzaman

Important Characteristics of Microservice Architecture:

  • The whole application is split into separate processes where each process can contain multiple internal modules.
  • Contrary to Modular Monoliths or SOA, a Microservice application is split vertically (according to business capabilities or domains).
  • The Microservice boundary is external. As a result, Microservices communicate with each other via network calls (RPC or messaging).
  • As Microservices are independent processes, they can be deployed independently.
  • They communicate in a lightweight way and don’t need any smart communication channel.

Advantages of Microservice Architecture:

  • Better development scaling.
  • Higher development velocity.
  • Supports iterative or incremental modernization.
  • Take advantage of the modern Software Development Ecosystem (Cloud, Containers, DevOps, Serverless).
  • Supports horizontal scaling and granular scaling.
  • It puts low cognitive complexity on the developer’s head thanks to its smaller size.

Disadvantages of Microservice Architecture:

  • A higher number of Moving parts (Services, Databases, Processes, Containers, Frameworks).
  • Complexity moves from Code to the Infrastructure.
  • The proliferation of RPC calls and network traffic.
  • Managing the security of the complete system is challenging.
  • Designing the entire system is harder.
  • Introduce complexities of Distributed Systems.

When to use Microservice Architecture:

  • Web-Scale Application development.
  • Enterprise Application development when multiple teams work on the application.
  • Long-term gain is preferred over short-term gain.
  • The team has Software Architects or Senior Engineers capable of designing Microservice Architecture.

Design Patterns for Microservice Architecture

Database per Microservice

Once a company replaces its large monolithic system with many smaller microservices, the most important decision it faces concerns the Database. In a monolithic architecture, a large, central database is used. Many architects favor keeping the database as it is, even when they move to microservice architecture. While this gives some short-term benefit, it is an anti-pattern, especially in a large-scale system, as the microservices will be tightly coupled in the database layer. The whole objective of moving to microservices (e.g., team empowerment, independent development) will be defeated.

A better approach is to give every Microservice its own Data store, so that there is no strong coupling between services in the database layer. Here I am using the term database to mean a logical separation of data; i.e., the Microservices can share the same physical database, but they should use separate Schemas/collections/tables. This will also ensure that the Microservices are correctly segregated according to Domain-Driven Design.

Database per Microservice by Md Kamaruzzaman
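To make the idea concrete, here is a minimal sketch of the pattern, assuming hypothetical Order and Customer services. Each service owns its own database, and data crosses the boundary only through the owning service's public API (here just a function), never via a cross-schema join.

```python
import sqlite3

# Two services, each owning its own logical database. Neither service can
# reach into the other's tables; data crosses the boundary only through
# each service's public API.
orders_db = sqlite3.connect(":memory:")     # owned by the Order service
customers_db = sqlite3.connect(":memory:")  # owned by the Customer service

orders_db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
customers_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

customers_db.execute("INSERT INTO customers VALUES (1, 'Alice')")
orders_db.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

def get_customer(customer_id):
    """Public API of the Customer service -- the only way in."""
    row = customers_db.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)).fetchone()
    return {"id": row[0], "name": row[1]}

# The Order service enriches its own data via the Customer service's API,
# not via a cross-database JOIN -- there is nothing to join across.
order = orders_db.execute("SELECT customer_id, total FROM orders WHERE id = 10").fetchone()
enriched = {"customer": get_customer(order[0])["name"], "total": order[1]}
print(enriched)  # {'customer': 'Alice', 'total': 99.5}
```

The extra network hop this simulates is exactly the "sharing data among services becomes challenging" trade-off listed below.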

Pros

  • Complete ownership of Data to a Service.
  • Loose coupling among teams developing the services.

Cons

  • Sharing data among services becomes challenging.
  • Giving application-wide ACID transactional guarantee becomes a lot harder.
  • Decomposing the Monolith database into smaller parts needs careful design and is a challenging task.

When to use Database per Microservice

  • In large-scale enterprise applications.
  • When the team needs complete ownership of their Microservices for development scaling and development velocity.

When not to use Database per Microservice

  • In small-scale applications.
  • If one team develops all the Microservices.

Enabling Technology Examples

All SQL and NoSQL databases offer logical separation of data (e.g., separate tables, collections, schemas, databases).

Event Sourcing

In a Microservice Architecture, especially with Database per Microservice, the Microservices need to exchange data. For resilient, highly scalable, and fault-tolerant systems, they should communicate asynchronously by exchanging Events. In such cases, you may want atomic operations, e.g., updating the Database and sending the message together. If you have SQL databases and want distributed transactions for a high volume of data, you cannot use two-phase locking (2PL), as it does not scale. If you use NoSQL databases and want distributed transactions, you cannot use 2PL either, as many NoSQL databases do not support it.

In such scenarios, use Event-based Architecture with Event Sourcing. In traditional databases, the Business Entity with its current “state” is stored directly. In Event Sourcing, any state-changing event (or other significant event) is stored instead of the entities. This means the modifications of a Business Entity are saved as a series of immutable events. The state of a Business Entity is deduced by reprocessing all the Events of that entity up to a given time. Because data is stored as a series of events rather than via direct updates to data stores, various services can replay events from the event store to compute the appropriate state of their respective data stores.

Event Sourcing by Md Kamaruzzaman
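The core mechanic can be sketched in a few lines, using an invented bank-account entity: the store holds only immutable events, and the current state (or any past state) is deduced by replaying them.

```python
from dataclasses import dataclass

# Minimal event-sourcing sketch with illustrative names: instead of storing
# the current balance of an account, we store the immutable events that
# changed it.

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "Deposited", "Withdrawn"
    amount: int

event_store = [
    Event("Deposited", 100),
    Event("Deposited", 50),
    Event("Withdrawn", 30),
]

def replay(events):
    """Deduce the current state of the entity by reprocessing all its events."""
    balance = 0
    for e in events:
        if e.kind == "Deposited":
            balance += e.amount
        elif e.kind == "Withdrawn":
            balance -= e.amount
    return balance

print(replay(event_store))      # 120
# "Time travel": the state as of the first two events only.
print(replay(event_store[:2]))  # 150
```

The time-travel call illustrates the "automatic history" advantage listed below; the growing cost of replaying every event is what motivates the CQRS read store discussed in the next section.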

Pros

  • Provide atomicity to highly scalable systems.
  • Automatic history of the entities, including time travel functionality.
  • Loosely coupled and event-driven Microservices.

Cons

  • Reading entities from the Event store becomes challenging and usually needs an additional data store (CQRS pattern).
  • The overall complexity of the system increases, and it usually needs Domain-Driven Design.
  • The system needs to handle duplicate events (idempotency) or missing events.
  • Migrating the Schema of events becomes challenging.

When to use Event Sourcing

  • Highly scalable transactional systems with SQL Databases.
  • Transactional systems with NoSQL Databases.
  • Highly scalable and resilient Microservice Architecture.
  • Typical Message Driven or Event-Driven systems (e-commerce, booking, and reservation systems).

When not to use Event Sourcing

  • Lowly scalable transactional systems with SQL Databases.
  • In simple Microservice Architecture where Microservices can exchange data synchronously (e.g., via API).

Enabling Technology Examples

Event Store: EventStoreDB, Apache Kafka, Confluent Cloud, AWS Kinesis, Azure Event Hub, GCP Pub/Sub, Azure Cosmos DB, MongoDB, Cassandra, Amazon DynamoDB

Frameworks: Lagom, Akka, Spring, akkatecture, Axon, Eventuate

Command Query Responsibility Segregation (CQRS)

If we use Event Sourcing, then reading data from the Event Store becomes challenging: to fetch an entity from the Data store, we need to process all of that entity's events. Also, we sometimes have different consistency and throughput requirements for read and write operations.

In such use cases, we can use the CQRS pattern. In the CQRS pattern, the system's data modification part (Command) is separated from the data read (Query) part. The CQRS pattern has two forms, simple and advanced, which leads to some confusion among software engineers.

In its simple form, distinct entity or ORM models are used for Read and Write operations, as shown below:

CQRS (simple) by Md Kamaruzzaman

It helps enforce the Single Responsibility Principle and Separation of Concerns, which leads to a cleaner design.

In its advanced form, different data stores are used for read and write operations. Advanced CQRS is used with Event Sourcing. Depending on the use case, different types of Write Data Store and Read Data Store are used. The Write Data Store is the “System of Record,” i.e., the entire system's golden source.

CQRS (advanced) by Md Kamaruzzaman

For read-heavy applications or Microservice Architecture, an OLTP database (any SQL or NoSQL database offering ACID transaction guarantees) or a Distributed Messaging Platform is used as the Write Store. For write-heavy applications (high write scalability and throughput), a horizontally write-scalable database is used (e.g., public cloud global Databases). The normalized data is saved in the Write Data Store.

A NoSQL database optimized for searching (e.g., Apache Solr, Elasticsearch) or reading (Key-Value data store, Document data store) is used as the Read Store. In many cases, read-scalable SQL databases are used where SQL querying is desired. The denormalized and optimized data is saved in the Read Store.

Data is copied from the Write Store to the Read Store asynchronously. As a result, the Read Store lags behind the Write Store and is eventually consistent.
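The shape of advanced CQRS can be sketched as follows, with invented names (`place_order`, `project`, and so on) and plain in-memory structures standing in for the two data stores. The point is the separation: commands append to the normalized write store, a projector copies data into a denormalized read store, and queries never touch the write store.

```python
# Illustrative CQRS sketch: commands append to a normalized write store;
# a projector copies data into a denormalized read store that queries hit.
# All names here are invented for the example.

write_store = []   # system of record: normalized
read_store = {}    # denormalized view, optimized for queries

def place_order(order_id, customer, total):
    """Command side: validate and record the change."""
    write_store.append({"order_id": order_id, "customer": customer, "total": total})

def project():
    """Runs asynchronously in a real system; the read store lags the write store."""
    for record in write_store:
        read_store[record["order_id"]] = f'{record["customer"]}: {record["total"]}'

def get_order_summary(order_id):
    """Query side: a single key lookup, no joins."""
    return read_store.get(order_id)

place_order(1, "Alice", 99.5)
print(get_order_summary(1))  # None -- read store not yet caught up (eventual consistency)
project()
print(get_order_summary(1))  # 'Alice: 99.5'
```

The `None` returned before `project()` runs is the eventual-consistency window called out in the Cons below.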

Pros

  • Faster reading of data in Event-driven Microservices.
  • High availability of the data.
  • Read and write systems can scale independently.

Cons

  • The Read data store is weakly consistent (eventual consistency).
  • The overall complexity of the system increases. Cargo-culting CQRS can significantly jeopardize the complete project.

When to use CQRS

  • In highly scalable Microservice Architecture where event sourcing is used.
  • In a complex domain model where reading data needs queries into multiple Data Stores.
  • In systems where read and write operations have a different load.

When not to use CQRS

  • In Microservice Architecture, where the volume of events is insignificant, taking the Event Store snapshot to compute the Entity state is a better choice.
  • In systems where read and write operations have a similar load.

Enabling Technology Examples

Write Store: EventStoreDB, Apache Kafka, Confluent Cloud, AWS Kinesis, Azure Event Hub, GCP Pub/Sub, Azure Cosmos DB, MongoDB, Cassandra, Amazon DynamoDB

Read Store: Elasticsearch, Solr, Cloud Spanner, Amazon Aurora, Azure Cosmos DB, Neo4j

Frameworks: Lagom, Akka, Spring, akkatecture, Axon, Eventuate

Saga

If you use Microservice Architecture with Database per Microservice, then managing consistency via distributed transactions is challenging. You cannot use the traditional Two-phase commit protocol as it either does not scale (SQL Databases) or is not supported (many NoSQL Databases).

You can use the Saga pattern for distributed transactions in Microservice Architecture. Saga is an old pattern, developed in 1987 as a conceptual alternative to long-running database transactions in SQL databases. But a modern variation of this pattern works amazingly well for distributed transactions as well. The Saga pattern is a sequence of local transactions, where each transaction updates data in the Data Store within a single Microservice and publishes an Event or Message. The first transaction in a saga is initiated by an external request (Event or Action). Once a local transaction is complete (data is stored in the Data Store, and a message or event is published), the published message/event triggers the next local transaction in the Saga.

Saga by Md Kamaruzzaman

If the local transaction fails, Saga executes a series of compensating transactions that undo the preceding local transactions' changes.

There are mainly two variations of Saga transaction coordination:

  • Choreography: Decentralized coordination, where each Microservice produces and listens to other Microservices’ events/messages and decides whether an action should be taken.
  • Orchestration: Centralized coordination, where an Orchestrator tells the participating Microservices which local transaction needs to be executed.
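An orchestration-style saga with compensating transactions can be sketched as below. The travel-booking steps are hypothetical; the orchestrator runs each local transaction in order and, on failure, runs the compensations for the already-committed steps in reverse.

```python
# Orchestration-style Saga sketch (hypothetical travel-booking steps).
# Each step is a local transaction; if one fails, the orchestrator runs the
# compensating transactions for the steps that already committed, in reverse.

log = []

def book_flight(ctx):   log.append("flight booked")
def cancel_flight(ctx): log.append("flight cancelled")
def book_hotel(ctx):    log.append("hotel booked")
def cancel_hotel(ctx):  log.append("hotel cancelled")
def charge_card(ctx):   raise RuntimeError("payment declined")  # simulated failure

def run_saga(steps):
    """steps: list of (transaction, compensation) pairs."""
    done = []
    for txn, compensate in steps:
        try:
            txn({})
            done.append(compensate)
        except Exception:
            for comp in reversed(done):  # undo the preceding local transactions
                comp({})
            return "rolled back"
    return "committed"

result = run_saga([
    (book_flight, cancel_flight),
    (book_hotel, cancel_hotel),
    (charge_card, lambda ctx: None),
])
print(result)  # rolled back
print(log)     # ['flight booked', 'hotel booked', 'hotel cancelled', 'flight cancelled']
```

In a real system the transactions and compensations would be triggered by published events or orchestrator messages, and each would need to be idempotent against redelivery, as the Cons below note.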

Pros

  • Provide consistency via transactions in a highly scalable or loosely coupled, event-driven Microservice Architecture.
  • Provide consistency via transactions in Microservice Architecture where NoSQL databases without 2PC support are used.

Cons

  • Needs to handle transient failures and should provide idempotency.
  • Hard to debug, and the complexity grows as the number of Microservices increases.

When to use Saga

  • In highly scalable, loosely coupled Microservice Architecture where event sourcing is used.
  • In systems where distributed NoSQL databases are used.

When not to use Saga

  • Lowly scalable transactional systems with SQL Databases.
  • In systems where cyclic dependency exists among services.

Enabling Technology Examples

Axon, Eventuate, Narayana

Backends for Frontends (BFF)

In modern business application development, and especially in Microservice Architecture, the Frontend and Backend applications are decoupled, separate Services. They are connected via API or GraphQL. If the application also has a Mobile App client, then using the same backend Microservice for both the Web and the Mobile client becomes problematic. The Mobile client's API requirements usually differ from the Web client's, as the two have different screen sizes, displays, performance, energy sources, and network bandwidth.

The Backends for Frontends pattern can be used in such scenarios: each UI gets a separate backend customized for that specific UI. It also provides other advantages, like acting as a Facade for downstream Microservices, thus reducing chatty communication between the UI and downstream Microservices. Also, in highly secured scenarios where downstream Microservices are deployed in a DMZ network, BFFs can be used to provide higher security.

Backends for Frontends by Md Kamaruzzaman
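A minimal sketch of the pattern, with a stubbed downstream product service and two hypothetical BFFs: the mobile BFF trims the payload for a small screen and constrained bandwidth, while the web BFF passes the richer data through. All names are invented for the example.

```python
# Two BFFs shaping the same downstream data differently for their clients.

def product_service(product_id):
    """Downstream Microservice (stubbed)."""
    return {
        "id": product_id,
        "name": "Laptop",
        "description": "A long marketing description ...",
        "price": 1200,
        "reviews": [{"user": "alice", "stars": 5}],
        "related_ids": [2, 3, 4],
    }

def web_bff(product_id):
    # The web UI wants the full payload.
    return product_service(product_id)

def mobile_bff(product_id):
    # Client-specific shaping only -- no business logic lives in a BFF.
    p = product_service(product_id)
    return {"id": p["id"], "name": p["name"], "price": p["price"]}

print(sorted(mobile_bff(1).keys()))  # ['id', 'name', 'price']
```

Note that both BFFs delegate all business decisions to the downstream service, which is exactly the design constraint listed in the Cons below.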

Pros

  • Separation of Concerns between the BFFs. We can optimize each one for a specific UI.
  • Provide higher security.
  • Provide less chatty communication between the UIs and downstream Microservices.

Cons

  • Code duplication among BFFs.
  • The proliferation of BFFs if many other UIs are used (e.g., Smart TV, Web, Mobile, Desktop).
  • Needs careful design and implementation, as BFFs should not contain any business logic and should only contain client-specific logic and behavior.

When to use Backends for Frontends

  • If the application has multiple UIs with different API requirements.
  • If an extra layer is needed between the UI and Downstream Microservices for Security reasons.
  • If Micro-frontends are used in UI development.

When not to use Backends for Frontends

  • If the application has multiple UIs, but they consume the same API.
  • If Core Microservices are not deployed in a DMZ.

Enabling Technology Examples

Any backend framework (Node.js, Spring, Django, Laravel, Flask, Play, …) supports it.

API Gateway

In Microservice Architecture, the UI usually connects with multiple Microservices. If the Microservices are finely grained (FaaS), the Client may need to connect with many Microservices, which becomes chatty and challenging. Also, the services, including their APIs, can evolve. Large enterprises would also like to handle other cross-cutting concerns (SSL termination, authentication, authorization, throttling, logging, etc.) centrally.

One possible way to solve these issues is to use an API Gateway. The API Gateway sits between the Client App and the Backend Microservices and acts as a facade. It can work as a reverse proxy, routing the Client request to the appropriate Backend Microservice. It can also fan a client request out to multiple Microservices and then return the aggregated responses to the Client. It additionally supports essential cross-cutting concerns.

API Gateway by Md Kamaruzzaman
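The two responsibilities just described, reverse-proxy routing and fan-out aggregation, can be sketched as follows. The service functions are illustrative stubs standing in for network calls to real Microservices.

```python
# API Gateway sketch: route by path, or fan a single client request out to
# several backend Microservices and aggregate the responses.

def order_service(order_id):    return {"order_id": order_id, "status": "shipped"}
def shipping_service(order_id): return {"eta_days": 2}
def billing_service(order_id):  return {"invoice": "INV-42"}

ROUTES = {"/orders": order_service}  # simple reverse-proxy style routing table

def gateway(path, order_id):
    if path == "/orders/details":
        # Fan-out: one client call becomes several service calls,
        # returned as a single aggregated response.
        response = {}
        for svc in (order_service, shipping_service, billing_service):
            response.update(svc(order_id))
        return response
    handler = ROUTES.get(path)
    return handler(order_id) if handler else {"error": 404}

print(gateway("/orders/details", 7))
# {'order_id': 7, 'status': 'shipped', 'eta_days': 2, 'invoice': 'INV-42'}
```

A production gateway would add the cross-cutting concerns listed above (SSL termination, auth, throttling, logging) around this routing core, which is also why an unscaled gateway becomes the bottleneck noted in the Cons.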

Pros

  • Offer loose coupling between Frontend and backend Microservices.
  • Reduce the number of round trip calls between Client and Microservices.
  • High security via SSL termination, Authentication, and Authorization.
  • Centrally managed cross-cutting concerns, e.g., Logging and Monitoring, Throttling, Load balancing.

Cons

  • Can lead to a single point of failure in Microservice Architecture.
  • Increased latency due to the extra network call.
  • If it is not scaled properly, it can easily become a bottleneck for the whole Enterprise.
  • Additional maintenance and development cost.

When to use API Gateway

  • In complex Microservice Architecture, it is almost mandatory.
  • In large Corporations, API Gateway is compulsory to centralize security and cross-cutting concerns.

When not to use API Gateway

  • In private projects or small companies where security and central management are not the highest priority.
  • If the number of Microservices is fairly small.

Enabling Technology Examples

Saturday, July 15, 2023

Introducing and Scaling a GraphQL BFF

ref: Introducing and Scaling a GraphQL BFF (infoq.com)

Interesting points:
  • We had to make multiple requests to get all the data that we needed; we were making around 15 different requests to different endpoints of the same Content API.
  • The JSON response for each endpoint often included hundreds of fields, and of these, we would only use a tiny fraction.
  • The REST API was not always intuitive to use.
  • Our React components were littered with API-related business logic.

Solution: 

We wrapped all the REST APIs that we were using in a GraphQL layer. The React application, instead of speaking directly to any REST APIs, would speak to this GraphQL layer. The GraphQL BFF was effectively shielding the React application from having to deal with any of the REST logic altogether.
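The shape of that solution can be sketched as a thin aggregation layer: one call for the client, several REST calls underneath, and only the requested fields returned. The endpoint names and fields here are invented for illustration; a real implementation would use a GraphQL server library rather than a hand-rolled resolver.

```python
# Sketch of a BFF-style aggregation layer over several REST endpoints.

def fetch_article(article_id):
    """Stub for the /article REST endpoint."""
    return {"id": article_id, "title": "Spring Trends", "body": "...", "tag_ids": [1]}

def fetch_tags(tag_ids):
    """Stub for the /tags REST endpoint."""
    return [{"id": 1, "name": "fashion"}]

def resolve_article(article_id, wanted_fields):
    """One call for the client; several REST calls underneath.

    Returns only the fields the client asked for, GraphQL-style.
    """
    article = fetch_article(article_id)
    article["tags"] = [t["name"] for t in fetch_tags(article["tag_ids"])]
    return {field: article[field] for field in wanted_fields}

print(resolve_article(7, ["title", "tags"]))  # {'title': 'Spring Trends', 'tags': ['fashion']}
```

The React application would then talk only to this layer, so the REST-specific combining and filtering logic no longer leaks into the components.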



Summary

Michelle Garrett talks about the journey of introducing and then scaling a GraphQL BFF to serve multiple applications. She covers the benefits of the Backend For Frontend pattern and why it's a popular way to introduce GraphQL. She talks about how to remain agile and support a production application throughout this process. 


Garrett: My talk is about introducing and scaling a GraphQL BFF. I'm hoping that this talk will have something for people who have already adopted GraphQL and might have been using it in production for a while. Hopefully, we'll help you think about what you want to do next with that architecture. I also hope that if you are thinking about using GraphQL, or you want to know more about it, this will give you some ideas about the benefits that might be in store for you if you do decide to use GraphQL. This is a talk about growing a GraphQL API much like a plant. We're going to talk about what it's like to introduce GraphQL to a code base. We're also going to talk about what it's like to use GraphQL in production for a year or two, and how your architecture might evolve. I'm going to take you on the journey of a GraphQL API that I've built with my team at Condé Nast. Ultimately, what's happened to it over the course of just over two years. I'm specifically talking about the Backend for Frontend pattern, which some of you may have heard of. If not, it's ok, I'm going to give a crash course. I'm talking about this not just because it's what I built at Condé Nast. Also, because in the industry, this is quite a common pattern for adopting GraphQL. There are a lot of people out there with GraphQL BFFs. I wanted to give a talk that goes a little bit beyond GraphQL 101 because GraphQL is no longer a bleeding edge technology. There are a lot of people out there who have already adopted GraphQL, and are starting to feel the need to scale their APIs. If you have GraphQL in production, there's a lot for you. If not, then there'll be something for you as well.

My talk is split into four sections. The first section will be a quick introduction to GraphQL, and the BFF design pattern. I know some of you will already be familiar with these concepts, but I'm going to get us all on the same page regardless. Secondly, we're going to talk about building a GraphQL BFF, how GraphQL and the BFF design pattern relate to each other. What it's like to actually build a GraphQL BFF. Then I'm going to talk about some problems or opportunities that might arise with this architecture, and how you ultimately might want to scale your GraphQL BFF. Finally, I'm going to talk about the future of GraphQL architecture, some industry trends, and where we ultimately might want to evolve our GraphQL architecture.

Who Am I? I'm Michelle. I'm a Software Engineer at Condé Nast in London.

What Is GraphQL?

Many of you will already know about GraphQL. I want to give a quick 101 regardless. How many people are using GraphQL in production here? It's like a quarter, a third. This is a good mix. Let's talk about what GraphQL is. Here are a few definitions. If you go to the official GraphQL website, the documentation will tell you that it is a query language for APIs. I also quite like the definition of a language for requesting remote data. Finally, you've probably heard GraphQL talked about as the cool, popular new alternative to REST. This is a definition that I think encompasses all of that from How to GraphQL, which is a great website for learning how to GraphQL. It's howtographql.com. They say GraphQL is a new API standard that provides a more efficient, powerful, and flexible alternative to REST.

Here are some things that GraphQL is not. GraphQL is not a database language. It's not related to graph databases in any way. It's not anything to do with Neo4j. It is a query language for APIs not for databases. It provides an interface to describe data, regardless of where that data is stored, whether that's a database or another API. GraphQL is not just for React or JavaScript developers. GraphQL may be extremely popular with web developers. JavaScript is not the only language in which you can write and work with GraphQL. GraphQL can be used anywhere that a client communicates with an API regardless of the language. There are libraries for working with GraphQL in Python, Go, Ruby, Java.

Here are some of the key differences between REST and GraphQL. In REST APIs, you'll usually have multiple endpoints. You have a Content API that gives content for a news website, you might have an endpoint, which is /article and one which is /video. The article endpoint will give you article data. The video endpoint will give you video data. In GraphQL, there is only one multipurpose endpoint. This single endpoint can provide all the data that that API is concerned with. If it's a Content API, then you can go to /GraphQL to find out about articles, videos, galleries, and whatever else that API cares about. In REST, you get the same set of fields from an endpoint every single time. If I'm looking up an article, I will always get the same 50 fields every single time. Even if I only want the title of an article, if I make a request to the article endpoint, I'll probably get the body, the contributors, the categories, the tags, a bunch of metadata, and probably a lot of stuff that I won't use. With GraphQL, you get exactly the fields that you ask for, no more and no less. You specify exactly the fields that you want, and GraphQL will give those to you.

Key Concepts in GraphQL

The key concepts in GraphQL are schema and query. The GraphQL schema tells you what fields you are able to request from the API. Then you write a GraphQL query describing the data fields that you want. It's like a shopping list for all the data that you want back from the API. You will get back exactly the data that you ask for in your query.

Let's do the BFF software design pattern, which does not stand for best friends forever. It stands for a Backend for Frontend API. This is a software design pattern that was first popularized by SoundCloud a number of years ago. It's a design pattern for internal APIs. I'm talking about APIs that are internal to a particular organization, rather than published as a third-party API for the public to use.

I want to talk about what the Backend for Frontend API pattern is offering an alternative to. That is the one size fits all API. This is a monolithic API that is shared between multiple applications or frontends inside of an organization. It most likely wraps the primary data source inside of that organization. It might look like this where you have multiple applications, all speaking to a single shared API. These applications might have different user experiences or use different parts of that shared API. Because it is a one size fits all API, the API must serve all of these clients equally. It is a monolith that is a common denominator between all platforms.

It's not necessarily a bad idea to share an API between different applications. There are some common pain points in this scenario. Firstly, different clients need different sets of data. The one size fits all has to try and serve all of these different needs. Unfortunately, it's almost impossible to provide the perfect endpoint for every single client. If it tries to do this, the monolith is probably going to become increasingly unruly as it tries to keep up with the demands of all the different clients.

Secondly, the shared API becomes a bottleneck when rolling out new features. Every time a new feature is required, the frontend team has to coordinate with the API team or the backend team, which is responsible for the single shared API. This API team then in turn has to balance the priorities of all of the different clients. Because of the nature of software, your priority might not always be the top of the list. This is where the Backend for Frontend design pattern comes in. In order to solve the pain points of having one API and multiple consumers, the BFF pattern recommends building one API per client. That means that each frontend essentially has its own custom API, which is built and maintained by the same team as the frontend.

It might look a bit like this. A BFF is created as an interface between each API consumer and a shared API or data resource. The BFF implements API logic that is specific to that particular application. It's essentially a translation layer that ensures data is transformed specifically to suit the needs of that particular client. Clients might be using multiple APIs, in which case, the BFF can also act as an API gateway that is defined just for a single application. It will perform the task of aggregating and combining all the data from a set of APIs into a common format that is convenient for the client.

What are the benefits of this? Firstly, it's easier to adapt the API as UI requirements change. As a frontend developer, instead of having to wait for the API team to create a particular endpoint for you or integrate a new field or a new set of data, you now have the power to just go ahead and make those changes yourself. It also simplifies the process of lining up server and client releases. Now that one team manages both the UI and the API, there's no longer coordination that has to happen between a frontend and a backend team. Thirdly, because it's focused, the BFF API will be smaller than the shared general-purpose API. It'll probably be easier to understand and navigate and will probably have smaller payloads if it's a REST API. Finally, you're able to aggregate multiple calls to downstream APIs into a single call to the BFF, which is a lot simpler and often more performant.

GraphQL + Backend for Frontend API

Now that we're on the same page about GraphQL and BFFs, let's talk about GraphQL BFFs. I want to talk about how GraphQL and BFFs relate to each other. If you're a GraphQL enthusiast, or you listened really well in my crash course, you might be thinking that a lot of the benefits that I just mentioned of a BFF actually sound quite familiar to you. You would be right, because a lot of the benefits that I just mentioned actually come for free with GraphQL. There's a lot that is shared in terms of benefits between GraphQL and BFFs. For example, just as a BFF provides only the data that is needed by a client, GraphQL allows clients the flexibility to define their own data needs. BFFs reduce over-fetching since a BFF returns only the data for one particular client rather than many clients. GraphQL reduces over-fetching because GraphQL returns only the data that you actually requested in the query. BFFs allow you to combine multiple data sources into one single BFF interface. GraphQL allows you to combine multiple data sources into one single GraphQL interface.

Even though there's a lot of shared ground between GraphQL and BFFs, GraphQL is not necessarily equivalent to a BFF API. Backend for Frontend is a design pattern that was invented before GraphQL. It might leverage either REST or GraphQL. GraphQL APIs are not always designed to be a BFF. Sometimes a GraphQL API is built to be shared by multiple applications. It's built to be generic. In my experience, GraphQL BFFs are actually very common, and they are a very common way for people to introduce GraphQL to their code base for the first time. I've heard of many teams introducing GraphQL to their company or project in the form of a BFF that will frequently wrap legacy APIs. It will frequently aggregate multiple data sources, possibly ease the transition to microservices, or help with migrations between different APIs. This is my theory about why GraphQL BFFs are so popular. A BFF is a low-stakes way to introduce GraphQL to an organization. Its surface area is limited to just one application. It does not require you to actually rewrite any downstream services or APIs in GraphQL.

With this in mind, I'm going to tell you about the GraphQL BFF that I built with my team at Condé Nast. I'll tell you where we started out. Then I will take you on the journey of where this BFF API has ended up after over two years in production. I work for Condé Nast, which is the parent company and publisher of some magazine brands that you probably have heard of, like "Vogue" and "GQ." I always feel the need to clarify when I give a GraphQL talk that "GQ" is not a magazine about GraphQL.

For three years, I've been working on a team that builds the international websites for "Vogue" and "GQ" magazine, which are all served by a single multi-tenant React application. These sites are currently fully powered by a GraphQL API in production and have been since early 2018. That's just over two years. The content for a Vogue website needs to come from somewhere. If you've ever worked for a media company or built any product that needs content, then you will know that the Vogue website must be backed by a content management system, or a CMS. The content for our websites comes from an internally built content management system called Copilot. Copilot has a REST API, which allows you to request data for content such as articles, videos, and galleries. When we first started working on the website application in 2017, GraphQL existed but it wasn't really a thing yet. It wasn't on our radar that much and it wasn't really that popular. Of course, Copilot did not have a GraphQL API for us to use. We didn't even really think about it. We used the REST API.

Before GraphQL, our API architecture used to look something like this in a very simplified form. A website application, which is written in React, was speaking pretty much directly to multiple REST APIs. The most important of those APIs was the Content API from Copilot, the CMS. We worked with this architecture for about a year, and re-launched the first Vogue site successfully with it. Overall, it was fine, but there were some pain points that really started to make themselves known the more we started to scale and add more websites to our multi-tenant application.

Problems We Had

Here are the main problems that we had with the REST API. First of all, the REST API was not always intuitive to use. Some of the fields had really confusing names that we had to explain whenever we onboarded a new developer to the code base. These abstract names sometimes made our React components really difficult for people to understand. Secondly, we had to make multiple requests to get all the data that we needed. To render an article, we didn't just need to get the data from the /article endpoint; we were making around 15 different requests to different endpoints of the same Content API. We were fetching a wide range of related data such as categories, tags, and contributor information from different endpoints and then trying to combine all this data together. Number three, we were massively over-fetching from all these APIs. The JSON response for each endpoint often included hundreds of fields, and of these, we would only use a tiny fraction. Unfortunately, we weren't great at filtering out the data that we didn't need. A lot of this data did end up in the client. Finally, our React components were littered with API-related business logic, rather than being mostly presentational as we would prefer. This isn't necessarily the fault of the REST API, because we're the ones who put the logic there. Something about our architecture was encouraging a quite poor separation of concerns, and we incurred a lot of tech debt because of this. We needed a clear distinction between our data layer and our view layer, and we didn't really have that.
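To make the second pain point concrete, here is a minimal sketch of the kind of glue code that ends up in view code without a BFF. The endpoint paths and shapes are hypothetical (`/article`, `/tags`, `/contributors` stand in for the real Copilot endpoints), and `get` stands in for an HTTP call: one request per related resource, then manual merging.

```javascript
// Hypothetical in-memory stand-in for several REST endpoints.
const restApi = {
  '/article/a1': { id: 'a1', headline: 'Fashion Week', tagIds: ['t1'], authorId: 'u1' },
  '/tags/t1': { id: 't1', name: 'runway' },
  '/contributors/u1': { id: 'u1', name: 'A. Writer' },
};

// Stands in for an HTTP GET; in reality each call is a network round trip.
const get = (path) => restApi[path];

// The kind of aggregation logic that used to live inside components:
// fetch the article, then fan out to each related endpoint and merge.
function loadArticle(id) {
  const article = get(`/article/${id}`);
  return {
    headline: article.headline,
    tags: article.tagIds.map((t) => get(`/tags/${t}`).name),
    author: get(`/contributors/${article.authorId}`).name,
  };
}
```

Multiply this by many content types and ~15 endpoints per page, and it becomes clear why the components stopped being presentational.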

Solution

To solve these problems, in early 2018, we introduced GraphQL to our project in the form of a Backend for Frontend API. Although I don't think we knew that we were creating a Backend for Frontend API at the time. We wrapped all the REST APIs that we were using in a GraphQL layer. The React application, instead of speaking directly to any REST APIs, would speak to this GraphQL layer. The GraphQL BFF was effectively shielding the React application from having to deal with any of the REST logic at all. Our React application didn't have to glue together 15 different requests to make an article. It would just request the article data from GraphQL.

Challenges and Duration to Switch to GraphQL

People ask how long it takes to switch over to GraphQL. Of course, this depends on the size and complexity of your application. For us, it took about three months. We integrated our new BFF API incrementally, making the switch literally page by page. We started with a category page, and we ended with the homepage. By the end, our websites were 100% powered by GraphQL. All data that is required for the site is accessed through a single GraphQL interface. People also ask what the most challenging part of switching to GraphQL was. For us, the most time-consuming part of this whole process was ensuring that all pages were covered by really comprehensive acceptance tests, so that we could ensure that there were no regressions when we turned on the GraphQL feature flag. We did break the article page about three times in this process. We learned from our mistakes and we got there in the end. Now we're fully powered by GraphQL. We have very comprehensive acceptance tests to make sure that we don't break things.
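The page-by-page rollout described above can be sketched as a per-page feature flag that selects the data source, so each page can be switched over (and reverted) independently. The flag names and functions here are hypothetical, not the team's actual implementation.

```javascript
// Hypothetical per-page rollout flags: flip one page at a time.
const graphqlEnabled = {
  category: true,   // migrated first
  article: true,
  homepage: false,  // migrated last
};

function dataSourceFor(page) {
  return graphqlEnabled[page] ? 'graphql' : 'rest';
}

// Dispatch at the top of each page's data loading, so the legacy REST
// path stays available as an instant rollback until the flag is removed.
function fetchPageData(page) {
  return dataSourceFor(page) === 'graphql'
    ? { via: 'graphql', page } // one query to the BFF
    : { via: 'rest', page };   // legacy multi-request path
}
```

A setup like this is also what makes the acceptance-test strategy workable: the same test suite can run against both sides of the flag before each page is switched.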

What GraphQL Gave Us

Here's what GraphQL gave us. Firstly, it gave us a really nice, intuitively designed API. No more confusing names or confusing deeply nested relationships. We designed it in a way that actually made sense to us as product developers and was human readable. Secondly, we gained the ability to fetch only the data that we actually needed. No more over-fetching because we could specify exactly the fields that we wanted. We didn't have to filter through hundreds of lines of unnecessary data to get what we actually wanted. We could do all of this in one request instead of 15 requests. Thirdly, we now have a single interface to interact with multiple downstream services. Our React application now does not have to care about the distinction between multiple REST APIs and request data from them separately. Components can now request the data that they need regardless of where that data comes from. It can focus on being the presentational layer. Finally, we gained components free of API related business logic. GraphQL really encouraged us to clean up a lot of our tech debt around this. The data transformation that we were previously doing inside of our components was now delegated to the GraphQL layer, leaving the components to focus solely on presentational logic.

We loved our new GraphQL API. Honestly, we were a bit obsessed with it. It was a really successful development in our architecture that made everyone working in the code base extremely happy. It encouraged us to clean up a lot of our tech debt. We felt really confident in our application. Newcomers to the code base were way less confused because instead of us following internal patterns that we had created ourselves, we'd used a pattern which was open source and something that they could easily familiarize themselves with. For now, the API that we've built is just being used for a single application. We'd created our own personal Backend for Frontend API. We always had a slight feeling that perhaps someday, this API would come in useful beyond the confines of just our application. We did keep this in the back of our minds while we were designing our initial schema.

We have this GraphQL BFF API, which we love. What happens next? This section is going to focus on some of the dilemmas that you might find yourself having, once you've actually built a GraphQL BFF. I'll tell you how we scaled our GraphQL BFF, and how we've ultimately evolved it into something that goes a little beyond the Backend for Frontend design pattern.

Duplication and Abstraction

One of the most common criticisms of the BFF pattern is duplication. Already, there's some risk of duplication in the GraphQL BFF that I just described. We're essentially re-implementing a shared Content API. If another application in the same company wrapped the same API in GraphQL, or built a GraphQL BFF wrapping this API, then our code would probably look pretty similar. I think that as developers, we are naturally averse to duplication. We are sometimes obsessed, often to a fault, with making everything as DRY and abstract as possible all of the time. We really love abstraction and will often aim to create a shared component, function, or service to avoid doing something more than once. For many of us, BFFs will feel counterintuitive, because duplication is inherent to the BFF pattern.

Sam Newman said, "One of the concerns of having a single BFF per user interface is that you can end up with lots of duplication between the BFFs themselves. They may end up performing the same types of aggregation or have the same or similar code for interfacing with downstream services. Some people react to this by wanting to merge these back together and have a general-purpose aggregating Edge API service. This model has proven time and again to lead to highly bloated code with multiple concerns squashed together." That's a damning review of creating abstractions out of a BFF. If you Google the Backend for Frontend pattern, you will find Sam Newman's iconic blog post from which this quote was taken.

I want to share another quote, by Sandi Metz, which is, "Duplication is cheaper than the wrong abstraction." I really love this quote. Ever since I heard it, I think about it every time I go to abstract something out into a function, or abstract something in some way. I've been burned by creating the wrong abstraction too many times. After being burned, I really believe that when we design code and services, we should always start from a place of duplication wherever possible. We should optimize later down the line, once we've validated the actual need for a particular abstraction, and validated that it is the correct abstraction. I'm talking about this philosophy because on our GraphQL BFF journey, we did ultimately create an abstraction to try and fix some of the duplication that was inherent to our BFF. I don't suggest that you do this without first really questioning if your abstraction is really necessary, or if it is the correct one. Don't create a shared microservice before you actually need one.

Costs of Shared Abstraction

There are costs associated with creating a shared abstraction, or in the case of what I'm going to talk about, a shared service. Once you start to share something with other developers, the stakes have risen and you have acquired essentially a burden. The first cost is that you now have less ability to take risks and try out new things whenever you want. Your decisions now affect more than just your own application. You might not be able to change things as frequently as you would like. Other people are now depending on your API to be stable and not break their application. Which leads me to the second cost, which is, you actually have to be extra careful not to break other people's applications. Five minutes of downtime might be fine for you and your team's application, but it might be critical for another application in the organization. When you start to have a shared abstraction, you really need to create a strategy around breaking changes.

Advantages of Shared Abstraction

That's not to say that you should absolutely never create a shared abstraction. Assuming that you have done all of your due diligence, and you've decided that the costs of sharing a service or an abstraction are worth it in your particular case, then there are some compelling advantages. The first one is, of course, less duplication of effort. Now that multiple teams aren't building the exact same thing, hopefully, developers will have more time to work on other, perhaps more impactful, projects rather than duplicating the same efforts. Fix once, benefit everywhere. If there's a bug, it can be fixed all in one go. The same goes for features and improvements. You might benefit from an edge case found by another application, or a feature that you would not have thought to actually build yourself. Finally, you'll gain organizational alignment. More teams in the organization will now be doing things in the same way. That opens up opportunities for collaboration. In the case of GraphQL, having a shared schema is really valuable because it means that teams will now have alignment on naming, which as we all know, is a very difficult thing.

Scaling GraphQL Backend for Frontend API at Condé Nast

With that in mind, I'm going to talk to you about how and why we scaled our GraphQL BFF at Condé Nast. I'll tell you about the shared abstraction that we ultimately created. When the question came up early last year of what was next for our GraphQL architecture, we had been using our GraphQL API in production for about a year. We were using our API to power a platform that is serving 21 different websites in 10 countries across the world. That's international Vogue and GQ websites. Our GraphQL API was serving around 200 million users each month, which is a big number. By all accounts, our GraphQL API was quite successful. Why did we feel the need to actually scale beyond what we were already doing?

When we first built our GraphQL API, we were rebels. Because it was 2018 and GraphQL was not as popular as it is now. What we were doing was considered experimental. We really had to prove the technical value of what we were doing. We had to fight against the perception that we were introducing a new technology for vanity reasons that didn't deliver any actual business value. Now in 2020, it's a very different world for GraphQL. GraphQL is skyrocketing in popularity, and everybody wants to work with GraphQL APIs. We started hearing from other engineering teams at Condé Nast that they were thinking of introducing GraphQL to their projects.

We had wrapped the REST Content API that we used in GraphQL. We'd spent time designing a schema that we really liked and that made sense to us. We had validated the schema in production for over a year. It did seem a shame to keep this to ourselves because other teams in the business were still using the REST API that we had found difficult to work with. We even discovered that there was another team building their own GraphQL wrapper around the exact same REST API, producing remarkably and eerily similar results. With this in mind, it came time to think seriously about abstracting out the part of our BFF that actually made sense to share with other people and would be useful to many teams in the organization.

The New Architecture

This is the new architecture that we introduced. We decided to split our GraphQL API into two: the core Content API and the BFF. It was clear to us that the content-related fields in our schema would be useful to other teams, so we decided to create a core Content API, making GraphQL a first-class citizen for any application requesting data from our content management system, Copilot. We'd already written a schema and resolvers wrapping that REST API for ourselves. Now we wanted to make this available for other teams to use as well. We still felt that there was a need for the BFF. The core Content API cannot serve all of the data needs of our React application alone. There are some parts of our schema that were not content related. They were quite specific to our particular application, things like configuration, or integrations with some other third-party APIs that went via our BFF layer. We decided that resolvers for these things would remain inside the BFF schema for now.

Here's what our architecture looked like before we split out the Content API into its own service. The GraphQL BFF is speaking directly to the REST Content API, as well as some other APIs. Here's the change that took place. Our GraphQL BFF will now consume a GraphQL Content API directly. The logic required to wrap the REST API has been moved into its own service. The BFF will consume pure GraphQL content data. That GraphQL service can be used by other applications in place of the REST API. It's now been elevated to the status of being the official, published GraphQL API for the content management system itself. A big part of the dream of this new architecture is that we would be consuming GraphQL all the way down. Instead of carrying the complexity of wrapping the REST API inside of our BFF, we could just consume pure, organic GraphQL direct from the source, from our new service.

Here's what we need to do. We need to build a new content microservice. We need to integrate this microservice back into the existing BFF. We need to do this incrementally and crucially, without breaking 21 production websites. First of all, the building. I was part of the team that built this new microservice. Because this microservice was now going to serve the entire organization, instead of just our particular team, we assembled a team with engineers from across multiple areas in the business to build this together. The team included myself and some other people from the team that had built the GraphQL BFF. The team also included some engineers who were working on the other GraphQL BFF that was wrapping the same REST API. The team also included some people who had worked on the REST API itself. It was a pretty solid mix of experiences for this team.

Product + Hindsight = Better Product

As a developer, it is quite a rare opportunity when you're given the time and space to improve on something that you've built. This time, the challenge was different. The challenge this time was to take the schema that we had already used in production for over a year, and we needed to make it generic enough to be shared by applications other than our own. Although, this new GraphQL service is representing pretty much the same content data as what was in the BFF, we now have the license to make improvements with hindsight in mind. We can take all the learnings and pain points of using our schema for over a year in production and actually make something better to be used by other people. We also have the added insight of engineers who worked on other relevant teams and have experienced use cases that we did not experience on our team.

Core Content API

We built it. This is the core Content API. We spent a few months building it, and then there was the final challenge, which was to integrate this new Content API back into the existing BFF schema. As a reminder, here's what our application architecture now looks like. The new GraphQL Content API may replace parts of the BFF schema. The BFF is still very much the GraphQL point of entry for our application. There is data from other sources that is still represented in the BFF, as you see from the other REST APIs down the bottom. Here's what we wanted in order to integrate with the new microservice. We needed a way to forward or proxy content queries through to our new Content API via the BFF. We needed the ability to integrate with the new Content API incrementally.

Apollo Federation

If you're familiar with the GraphQL space, you might know that there is a new open source tool from Apollo, the dominant player in the GraphQL space. This tool is called Apollo Federation. Federation is Apollo's answer to implementing GraphQL in a microservices architecture. It's replacing schema stitching, which has now been deprecated by Apollo. I'm personally quite excited about Apollo Federation, and I would like to use it. When we looked into using Apollo Federation for our purposes here, it didn't seem quite right for our use case right now, while we still have the BFF schema. It would probably require a bit of an architectural shift for us to use it. Even though we'd like to use it eventually, what we really wanted right now was a minimal way to get started querying this new Content API without having to rearchitect any further. We were happy to go with something imperfect and iterate on it later.

Schema Delegation

We ultimately chose something called schema delegation, which is a method made available by Apollo. Basically, it's a way to forward GraphQL queries from one GraphQL service to another. You set up what's called a remote schema, and you delegate queries to that schema to be resolved. In this diagram, you'll see the content GraphQL microservice is our remote schema. We delegate all content related queries to this service to be resolved. Any non-content related queries to the BFF are resolved locally by the BFF. This option was a little bit messy for various reasons that are too niche to go into. This approach got us forwarding content queries through to our new service with minimal effort and pretty minimal code changes. Most importantly for us, we were now able to integrate incrementally. Much like how we initially released GraphQL, we set up schema delegation per content type so we could switch the feature flag on, page by page.
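The routing idea behind schema delegation can be sketched without any GraphQL libraries. This is a simplified, dependency-free illustration of the concept (the real implementation would use Apollo's remote schema and delegation utilities; the field names and `execute` function here are hypothetical): content fields are forwarded to the remote service, everything else resolves locally in the BFF.

```javascript
// Stands in for the remote content GraphQL microservice.
const remoteContentApi = {
  execute: (fieldName, args) =>
    fieldName === 'article' ? { id: args.id, headline: 'Fashion Week' } : null,
};

// BFF-specific fields (config, third-party integrations) stay local.
const localResolvers = {
  siteConfig: () => ({ locale: 'de-DE' }),
};

// Content types are delegated one by one, which is what made the
// per-page, feature-flagged rollout possible.
const delegatedFields = new Set(['article', 'gallery', 'video']);

function resolveField(fieldName, args) {
  if (delegatedFields.has(fieldName)) {
    // Forward content queries to the remote schema to be resolved there.
    return remoteContentApi.execute(fieldName, args);
  }
  return localResolvers[fieldName](args);
}
```

The BFF remains the single point of entry for the application, but a growing subset of its schema is answered by the new microservice.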

The Results

What are the results of all of this? We now have a Vogue article that looks exactly the same. The plight of a backend developer. Except, underneath the surface, unbeknownst to all the readers of Vogue Germany who want to catch up on the German fashion week news, the article data is now coming from a different place. Instead of the article data coming from the BFF, that data is now coming from a beautiful content microservice somewhere else. The developers that work on this application might now have some more free time to work on other features that readers actually will see and notice on the front page. They're much happier because now they don't have to maintain all of that local resolver code.

That fantasy isn't quite real just yet. This is actually still a work in progress. We're still in the process of migrating over to using the new content microservice. In the meantime, several other engineering teams at Condé Nast have also started to use this new service. It's already validating the model that we were going for. The other team that had also wrapped the same REST API is now using this GraphQL service.

The Future of GraphQL Architecture

I've shown you how my GraphQL architecture has evolved. I have some ideas about how it might evolve even further in the future. I want to talk about where GraphQL is headed this year, starting with some industry trends.

Number one, GraphQL adoption is growing, and many more APIs will be built with GraphQL in 2020. These are the results of the state of JavaScript survey in 2019, which has 21,000 respondents. In 2016, 5% of JavaScript devs used GraphQL, compare that to 2019, 39% of JavaScript developers have used GraphQL. In 2016, 36% had never heard of it. In 2019, only 5% had never heard of GraphQL. I expect these numbers to continue growing in 2020.

Number two, GraphQL architecture is moving towards federated microservices with Apollo Federation. When I was at GraphQL Summit last year, this was a really clear theme and a really clear takeaway from the conference. Nearly every speaker from a large organization using GraphQL spoke about their architecture evolving in this way. GraphQL APIs are becoming a set of federated microservices. The ideal GraphQL API of the future will be an API gateway that allows you to query seamlessly for data from multiple different microservices. I think that it's a natural progression as organizations are starting to scale their GraphQL APIs. Engineers are finding that a monolithic GraphQL API needs to be split out into microservices to be worked on more efficiently, or they're finding that people have built GraphQL services all around the organization and it would be convenient if they could share a schema. Apollo Federation has been out for about a year now. Given Apollo's enormous influence in the GraphQL space, I expect that this will become industry standard. Personally, I would really like to start using Apollo Federation.

The final industry trend of note is that GraphQL architecture is moving towards one universal data graph per organization. This is related to the former trend. Organizations will seek to combine all of their GraphQL services into one common data graph shared by everyone in the organization. It will look something like this. A data graph for the entire organization. A gateway to every GraphQL microservice that is owned by that company. Teams able to query for any data that the organization owns, and which is in GraphQL in one place, and seamlessly combine data from multiple different places.

Where Does This Leave The Backend for Frontend Pattern?

Where does this leave our humble BFF pattern? With GraphQL becoming more widespread across organizations, GraphQL architecture patterns are ultimately evolving beyond the BFF pattern. Regardless of this, I still think that if you were introducing GraphQL to an organization for the very first time or you're a very small organization, then the BFF pattern is still a good place to start. You may ultimately evolve it into something bigger. When you do, you will be informed by real world usage. Thinking back to the similarities between GraphQL and BFFs, here's what I think now. GraphQL is not a like-for-like replacement for a BFF. It solves enough of the same problems that if you evolve your architecture towards consuming pure GraphQL microservices, you might find that you no longer actually need a BFF.

What do I want our architecture to end up like? With these three trends in mind, this is the architecture that I personally would really like to move towards. I'd like to get to a point where we could remove the existing BFF and speak directly to a single universal Condé Nast graph that is backed by GraphQL microservices. If we have the ability to get all of our application data via pure GraphQL from a single endpoint, then there's no need to keep our application-level BFF. I think that the BFF will be useful for a while, and it will probably be a while before we remove it. If we convert the wider organization to building APIs GraphQL-first behind a gateway, then ultimately, I think we might grow out of this pattern. Much like this turtle, I don't want to rush the process of evolving our architecture. I want to be thoughtful about how we scale our GraphQL. I would like to continue validating all of our abstractions before we make them.

Closing Thoughts and Takeaways from the GraphQL Journey

I really believe in focusing on incremental change. We've done this throughout our entire GraphQL journey and I think that it's proven to be very important. You may dream of having an ultimate GraphQL gateway of microservices. If you're evolving the architecture of a production application, then this isn't necessarily realistic. You need to start smaller. Big bang releases are risky. If you build something for too long in isolation, you might build the wrong thing. Try and get in production as soon as possible and keep going, even if it's not your ideal solution or architecture. Related to this, don't optimize too early. We could have built our Content API as the generic, official, organization-wide API in the first place. We built it first for one application. I think that, ultimately, we created a better API because of this, because by the time we went to design a replacement for the REST API, we had a year of learnings and product experience to inform it. We were also able to take more risks along the way, as we didn't yet have the burden of supporting other clients with a published API.
