Combatting Transaction Slowdowns in Cloud-Ready Applications

Your app’s transaction time is five to six seconds on the LAN, so why is it 30 seconds for cloud users? As you’ll find out in this blog post, there are several reasons why your remote users may be experiencing such dramatic slowdowns. In many cases, you might have simply moved an application into the cloud without first making it “cloud-ready.”

When you decide to move an application to the cloud, your primary concern should be the architecture you have in place for it. For instance, are you using a standard Three-Tier architecture, or are all three tiers hosted on a single server?

Also, where are your users located? Are they remote, or in the same place as the application? Lastly, how much bandwidth is required to support the traffic, be it user-to-app, user-to-cloud, or user-to-server?

Considering Application Architecture

When it comes to application architecture, the Three-Tier approach is by far the most traditional. As you may know, it consists of a Web Tier, an Application Tier, and a Database Tier. The Web Tier hosts the front-end servers and facilitates communication between the users and the Application Tier. The Application Tier, of course, hosts the application itself and facilitates communication between the Web Tier and the Database Tier. Lastly, the Database Tier communicates with the Application Tier while also hosting all of the app’s data.

Not all applications follow this structure. Some use one server to host all application components, while others collapse the architecture entirely. One option is to eliminate the Web and Application Tiers completely, allowing the client to communicate directly with the database. We consider this a poor practice, and it is the subject of the case study we’ll be reviewing today.

A Study in Application Slowdowns

Our team recently collaborated with one of our main clients on a common workflow automation application issue. Their app was designed to streamline the delivery of work to users, helping to improve critical business processes and automate manual tasks. Like many other companies, our client saw the 2020 pandemic force all of its users to begin working remotely, accessing the application via Citrix Virtual Desktops.

Now, the Citrix servers were contained in the same onsite data center as the workflow application database server. However, the client had decided to stop using Citrix and instead leverage cloud-based virtual desktops – the long-term goal being to move the entire workflow application into the cloud.

In pursuing this goal, they ended up at a point where the users’ virtual desktops were in the cloud, but the workflow application server remained onsite at their data center. During this time, the app worked well for onsite and Citrix users but performed very poorly for cloud users – despite having the same thick client installed. For reference, a transaction that took an onsite user five to six seconds to complete took upwards of 30 seconds for a cloud user.

Now, why would this happen? Let’s take a look.

There Were Approximately 3000 Application Turns per Transaction

Even if you’ve never heard of an application turn, it’s never good to have a single transaction create 3,000 of anything, right? For those who don’t know, an application turn is counted when a client sends a request to a server and the server responds with data to meet that request. You can visualize this with a network capture, which usually shows the server setting the TCP PSH (push) flag on its response. The PSH flag tells the receiving stack to deliver the buffered data to the application immediately, and in practice it often marks the end of the data for that request.

A well-written app minimizes the number of application turns per request. That was not the case here: these transactions were full of nested queries, where the client sends a request to the server and the server’s response is used to build the next request. In our case study, this was happening 3,000 times! On a local network, all of these extra turns are masked by the very low response time. But as the RTT (round trip time) grows, the slowdown becomes more and more evident.
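To make the nested-query pattern concrete, here is a minimal sketch (with toy, hypothetical names) of why these turns can’t be batched: each response is needed to form the next request, so every turn serializes into its own round trip.

```python
# Toy model of a "chatty" nested-query transaction. Each query's result
# feeds the next query, so the 3,000 turns must run one after another --
# each one paying a full network round trip.
class TurnCountingClient:
    def __init__(self):
        self.turns = 0

    def query(self, request):
        self.turns += 1      # one request/response pair = one application turn
        return request + 1   # toy "result" that drives the next request

def nested_transaction(client, depth):
    value = 0
    for _ in range(depth):
        value = client.query(value)  # response becomes the next request
    return value

client = TurnCountingClient()
nested_transaction(client, 3000)
print(client.turns)  # 3000 sequential round trips for ONE transaction
```

Because each turn depends on the previous answer, no amount of bandwidth helps; only reducing the turn count or the per-turn latency will.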

For instance:

If we have a 1 millisecond (msec) RTT, then the best-case scenario for this application transaction is 3 seconds:

3,000 application turns × 1 msec RTT = 3 seconds

But what happens when the average RTT to the cloud is 11 msecs?

3,000 application turns × 11 msec RTT = 33 seconds!

Now, in analyzing this data, you might think that the problem is the 11 msec RTT to the cloud. However, that is a typical response time for any cloud provider. The variable actually causing the issue is the 3,000 application turns.
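The back-of-the-envelope math above can be captured in a few lines (a sketch with hypothetical names, using the case study’s numbers):

```python
def transaction_time_seconds(turns, rtt_ms):
    """Best-case transaction time: every application turn waits one full RTT."""
    return turns * rtt_ms / 1000.0

lan_time = transaction_time_seconds(3000, 1)     # LAN: 1 msec RTT
cloud_time = transaction_time_seconds(3000, 11)  # cloud: 11 msec RTT
print(lan_time, cloud_time)  # 3.0 seconds vs. 33.0 seconds
```

Note that this is a best case: it ignores server processing time and any packet loss, which only make the cloud numbers worse.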

The Thick Client Communicated Directly with the SYBASE Database

This particular application utilized a collapsed architecture wherein the thick client communicates directly with the database. In our experience, this should be avoided for both security and performance reasons. Instead, users should communicate with applications via HTTPS over TCP port 443. Such a connection is both encrypted and well-established as a suitable port for user-to-server communication. SYBASE communication, on the other hand, is not encrypted and, therefore, not secure. On top of this, database protocols are designed for low-latency networks, not for high-latency WAN links.

Providing Solutions

So how can we go about fixing this issue? First, we need to optimize the poor-performing transactions in order to minimize the number of application turns. As we stated above, this is the variable causing the actual slowdown. The fewer application turns there are, the less time it will take for users to receive a response.

Next, we need to introduce an application tier to the architecture. This would be responsible for handling communication between the thick client and the database. Specifically, the application tier should sit close to the database server, so the chatty database traffic stays on a low-latency link. If both of these solutions are implemented together, they should solve the problem.

That said, we should keep in mind that both local users and Citrix users have long accepted five-to-six-second transaction response times. In most cases, they are simply told that this is the best they can get. However, this is absolutely not true. For example, if application developers optimize the code so the transaction uses only 100 application turns, that transaction time could potentially be reduced to 1.1 seconds for the cloud user. That’s 27 times faster!
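That projection is simple to verify with the same turns-times-RTT arithmetic used earlier:

```python
# If developers cut the transaction from 3,000 turns down to 100:
turns, rtt_ms = 100, 11
optimized = turns * rtt_ms / 1000.0
print(optimized)                # 1.1 seconds for the cloud user
print(round(30 / optimized))    # roughly 27x faster than the observed 30 s
```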

As we continue to implement solutions for our client, I will be keeping you informed of this case via additional blog posts. However, I want to leave you with this: don’t settle for slow transactions! Instead, reach out to us. Trust me, we have the tools, experience, and knowledge to make application slowdowns a problem of the past.

Ready to get to know your network?

Creating dynamic network maps is easy with bitB.