Payment Gateway Timings

HomeAway has relationships with multiple Payment Gateways. By collecting timing data, we are able to compare gateways with each other and perform other explorations. We have a centralized service (called “HAPI”) to manage gateway connections. This service presents a common interface to clients, converts the common data into gateway-specific requests, makes the request and responds back to the client. It also indexes three main timestamps associated with the payment gateways:

HAPI Sequence Diagram

  1. When did HAPI get the initial request
  2. When did HAPI send the request to the gateway
  3. When did HAPI get the response from the gateway
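
The three timestamps above translate into two durations: the time a request spends inside HAPI before it is transmitted, and the time spent waiting on the gateway. A minimal sketch in R, using invented timestamp values in the same ISO-8601 format that the analysis further down parses; the variable names and the `parse_ts` helper are illustrative only and are not part of HAPI:

```r
# Hypothetical timestamps for a single request (values invented for illustration)
request_received    <- "2011-10-01T12:00:00.100Z"  # 1. HAPI gets the initial request
request_transmitted <- "2011-10-01T12:00:00.350Z"  # 2. HAPI sends the request to the gateway
response_received   <- "2011-10-01T12:00:02.850Z"  # 3. HAPI gets the gateway's response

# Illustrative helper: parse an ISO-8601 UTC timestamp with fractional seconds
parse_ts <- function(x) {
  as.POSIXct(strptime(x, format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"))
}

# Asking for units = "secs" explicitly stops difftime from choosing minutes or hours
hapi_overhead <- as.numeric(difftime(parse_ts(request_transmitted),
                                     parse_ts(request_received), units = "secs"))
gateway_time  <- as.numeric(difftime(parse_ts(response_received),
                                     parse_ts(request_transmitted), units = "secs"))

print(round(c(hapi_overhead = hapi_overhead, gateway_time = gateway_time), 3))
```

The gateway time is timestamp 3 minus timestamp 2, which is the quantity analyzed in the rest of this post.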

By subtracting #2 from #3, we can see how long a gateway takes to process a request, and we can break the timings out by gateway:

HAPI Sequence Diagram

From top to bottom, the gateways are “Braspag,” “Cybersource,” “Intuit,” “PayflowPro” and “Yapstone.” These gateways cover different business lines at HomeAway and provide a rare opportunity to compare gateway response times. Many reasons exist to choose between gateways; response time is just one facet. In tabular form, the summary statistics (in seconds) look like:

Gateway       Minimum   Median   Mean    Maximum   Geometric Mean
Braspag       0.010     2.456    2.959   540.6     2.438
Cybersource   0.210     1.477    1.787   204.9     1.591
Intuit        0.003     2.710    3.130   143.2     2.765
PayflowPro    0.004     0.643    0.776   60.24     0.684
Yapstone      0.007     2.084    2.196   1627      1.978

The geometric mean and other geometric statistics are used because this is a timing analysis, and response times are well-modeled as right-skewed, log-normal curves. The minimum and maximum values show a wide disparity in the observations, so I trimmed the data to keep only the points within two (geometric) standard deviations of the (geometric) mean:

Gateways

Essentially, the middle data has the same “shape” as the entire dataset, but without the egregious tails.
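
Because the geometric standard deviation is a multiplicative factor rather than a number of seconds, "within two geometric standard deviations of the geometric mean" corresponds to the range from gm / gsd^2 to gm * gsd^2. A minimal sketch on simulated log-normal data (not the gateway data; the distribution parameters are arbitrary assumptions), which keeps roughly 95% of the points:

```r
set.seed(42)  # reproducible simulated data, not real gateway timings
x <- rlnorm(10000, meanlog = log(2), sdlog = 0.5)

gm  <- exp(mean(log(x)))  # geometric mean
gsd <- exp(sd(log(x)))    # geometric standard deviation (a unitless factor)

# "Within two geometric SDs" is multiplicative: between gm / gsd^2 and gm * gsd^2
lower <- gm / gsd^2
upper <- gm * gsd^2
trimmed <- x[x > lower & x < upper]

# Fraction of observations that survive the trim
kept <- length(trimmed) / length(x)
print(c(geomean = gm, geosd = gsd, lower = lower, upper = upper, kept = kept))
```

The same band can be written on the log scale as abs(log(x) - log(gm)) < 2 * log(gsd), which is the usual two-standard-deviation rule applied to log(x).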

The data and charts above were calculated via the following R statements. The “geomean” and “geosd” functions calculate the geometric mean and standard deviation; R really should have geometric statistics built-in.

Source code:

library(ggplot2)  # plotting
library(RCurl)    # getURL() for fetching the CSV export

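# geometric mean: exponentiate the mean of the logs (x must be positive)
geomean <- function(x, na.rm = FALSE, trim = 0) {
	exp(mean(log(x), na.rm = na.rm, trim = trim))
}

# geometric standard deviation: exponentiate the sd of the logs
geosd <- function(x, na.rm = FALSE) {
	exp(sd(log(x), na.rm = na.rm))
}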

# no empty gateway selector codes and no _TEST ones either
myCSV = getURL("https://tsar/solr/select/?q=!requestType:TOKENIZE%20AND%20gatewaySelectorCode:[*%20TO%20*]%20AND%20-gatewaySelectorCode:*_TEST%20AND%20requestReceived:[2011-10-01T00:00:00.000Z%20TO%20NOW/DAY]&fl=requestType,success,gatewaySelectorCode,requestReceived,requestTransmitted,responseReceived&sort=requestReceived%20desc&wt=csv&rows=1000000", userpwd="uid:andpassword") 

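t=read.csv(textConnection(myCSV), stringsAsFactors=F)
# the trailing Z means UTC; convert to POSIXct, which is safe to store in a data frame column
t$responseReceived = as.POSIXct(strptime(t$responseReceived, format="%Y-%m-%dT%H:%M:%OSZ", tz="UTC"))
t$requestTransmitted = as.POSIXct(strptime(t$requestTransmitted, format="%Y-%m-%dT%H:%M:%OSZ", tz="UTC"))
t$requestReceived = as.POSIXct(strptime(t$requestReceived, format="%Y-%m-%dT%H:%M:%OSZ", tz="UTC"))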

# clean up an annoying problem: drop everything from the first comma that follows an uppercase letter
t$gatewaySelectorCode = gsub("([A-Z]),.*", "\\1", t$gatewaySelectorCode)

# calculate the gateway timings, and remove entries with no timing 
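# request explicit units so the result is always in seconds, never minutes or hours
t$gw = as.numeric(difftime(t$responseReceived, t$requestTransmitted, units="secs"))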
ht=t[!is.na(t$gw),] 
tapply(ht$gw, ht$gatewaySelectorCode, FUN=function(x)quantile(x,probs=c(0.025, 0.975))) 
tapply(ht$gw, ht$gatewaySelectorCode, summary) 

# Geometric mean!
tapply(ht$gw, ht$gatewaySelectorCode, geomean)

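# just get the middle 95% of each gw's time: within two geometric SDs of the geometric mean
# the geometric SD is a multiplicative factor, so the comparison is done on the log scale
mids = by(ht, ht$gatewaySelectorCode, function(z) z[abs(log(z$gw) - log(geomean(z$gw))) < 2*log(geosd(z$gw)),])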

# mids is now a list of the rows of the ht dataframe; want to combine this list into one big dataframe 

ht_mid = do.call(rbind.data.frame, mids) 

#now plot these timings 

ggplot(ht_mid, aes(x=gw, fill=gatewaySelectorCode)) +
	theme_bw() + geom_density(alpha=0.75) +
	ggtitle("Time Taken by Gateways (95%)") +
	labs(x="seconds") + xlim(0,7.5)

Overlaying all the gateway timings on one plot is hard to interpret, but breaking them out into one panel per gateway with facet_grid() is easier to comprehend.

Gateways Gridded

Source Code:

# one density panel per gateway, stacked vertically
ggplot(ht_mid, aes(x=gw, fill=gatewaySelectorCode)) +
	geom_density(alpha=0.75) + ggtitle("Time Taken by Gateways (95%)") +
	labs(x="seconds") + xlim(0,10) + theme_bw() +
	facet_grid(gatewaySelectorCode~.) + theme(legend.position="none")

This plot illustrates that, across all types of requests made to gateways, PayflowPro is quickest to respond and (since it has a relatively narrow peak) the most predictable. The other gateways respond more slowly and have differently shaped curves; the bimodal nature of the Yapstone response caught my eye.

Each gateway provides several different functions:

Authorize - Ensures that the credit card is valid and has an open-to-buy limit of a given amount

Capture - Moves the money that was initially authorized

Sale - A one-call “Authorize-and-Capture”

Credit - Moves money from the merchant to the customer, without a referenced Sale or Capture

Refund - Moves money from the merchant to the customer with a referenced Sale or Capture

Verification - A quick “card check”

Void - Cancels the Sale or Capture

Void-or-Refund - Cancels the Sale, or refunds it if the “void” time limit has expired

We can visualize the speed of each operation in many ways. Here we’re going to use an empirical cumulative distribution function, or “ECDF”, and plot it for each operation and gateway.

Gateway ECDF

Source Code:

# one ECDF panel per operation, one coloured line per gateway
ggplot(ht_mid, aes(x=gw, colour=gatewaySelectorCode)) +
	theme_bw() + stat_ecdf(size=1.5) +
	facet_grid(requestType~.) + xlim(0,7.5)

The x-axis is the amount of time (in seconds) before the operation completes. This ECDF visualizes the time needed to complete ALL of the observations of a given operation - the steeper the slope, the smaller the time-spread between the fastest and slowest observation and the more predictable the response. The Yapstone (purple line) responses for Capture, Credit and Refund are very steep and tend towards the left, which means that Yapstone is able to process these operations quickly and predictably. Likewise, PayflowPro is very fast on Credits, Refunds and Sales.
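
To make the "steeper is more predictable" reading concrete, here is a small sketch using base R's ecdf() on two simulated samples. The data and distribution parameters are invented for illustration and do not represent any of the gateways:

```r
set.seed(1)  # simulated timings, for illustration only
fast_steady <- rlnorm(5000, meanlog = log(0.7), sdlog = 0.2)  # narrow spread
slow_varied <- rlnorm(5000, meanlog = log(2.5), sdlog = 0.6)  # wide spread

# ecdf() returns a function: F(t) = fraction of observations completed by time t
F_fast <- ecdf(fast_steady)
F_slow <- ecdf(slow_varied)

# Fraction of requests finished within 1 second
done_in_1s <- c(fast = F_fast(1), slow = F_slow(1))

# Width of the middle 80% of each sample: a steep ECDF means a small width
spread <- c(fast = unname(diff(quantile(fast_steady, c(0.1, 0.9)))),
            slow = unname(diff(quantile(slow_varied, c(0.1, 0.9)))))

print(done_in_1s)
print(spread)
```

A curve that sits to the left means fast responses, and a curve that rises over a short horizontal distance means the responses are tightly clustered.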

Operation ECDF

Source Code:

# one ECDF panel per gateway, one coloured line per operation
ggplot(ht_mid, aes(x=gw, colour=requestType)) +
	theme_bw() + stat_ecdf(size=1.5) +
	facet_grid(gatewaySelectorCode~.) +
	xlim(0,7.5)

Constructing different ECDFs allows us to see how each gateway processes all of the operations. Intuit and Braspag are very predictable for all of their operations, whereas the other gateways have varying operational speeds. Yapstone again shows the most bimodal behavior and invites a closer look.

Yapstone Operations

Source Code:

# one violin per Yapstone operation; the x-axis shows operations, so label it accordingly
ggplot(ht_mid[ht_mid$gatewaySelectorCode=="YAPSTONE",], aes(y=gw, x=requestType)) +
	theme_bw() + geom_violin(trim=TRUE, aes(fill=requestType)) +
	ggtitle("Time Taken by Yapstone Operations") +
	labs(y="seconds", x="Operation") +
	theme(axis.text.x=element_text(angle=-45)) +
	scale_fill_brewer(type="qual", palette="Dark2") +
	ylim(0,5) + theme(legend.position="none")

Here a violin plot compares the various operations. “Authorize” and “Sale” take around 2 seconds and the other operations are in the subsecond range (except for “Void” - its shape is indicative of a small number of operations - the clients who used “Void” switched to “Void_or_Refund” once it was available). These operations - “Capture,” “Credit,” “Refund” and “Void_or_Refund” - all require a referenced transaction. One conclusion suggested by this chart is that Yapstone is able to process follow-on operations quickly and predictably.

Daily Yapstone

Source Code:

# each point is one Yapstone request, coloured by operation; enlarge the legend keys for readability
ggplot(ht_mid[ht_mid$gatewaySelectorCode=="YAPSTONE",], aes(y=gw, x=requestReceived, colour=requestType)) +
	theme_bw() + geom_point(size=0.5) +
	scale_colour_brewer(type="qual") +
	guides(colour = guide_legend(override.aes = list(size=3))) +
	theme(axis.text.x=element_text(angle=-45))

Plotting each Yapstone operation as a point over time gives the chart above. The bimodal nature again stands out strongly, with most “Sale” operations taking more than 1.5 seconds and most “Credit” operations taking less than 1 second. However, this chart also reveals a line of blue near the bottom. We can plot successful and unsuccessful sales side by side with a violin plot and see the following.

Yapstone Sales

Source Code:

# Yapstone sales only, with separate violins for successful and failed attempts
ggplot(ht_mid[ht_mid$gatewaySelectorCode=="YAPSTONE"&ht_mid$requestType=="SALE",], aes(y=gw, x=requestType)) +
	theme_bw() + geom_violin(trim=TRUE, aes(fill=success)) +
	ggtitle("Time Taken by Yapstone Sale") + labs(y="seconds", x="Operation") +
	theme(axis.text.x=element_text(angle=-45)) +
	scale_fill_brewer(type="qual", palette="Dark2") + ylim(0,5)

Yapstone is able to quickly fail certain “Sale” attempts, but others take about the same amount of time as the successful ones. The theory is that Yapstone fails some sales quickly, but needs to reach out to the payment processors for the bulk. It is simple enough to create this visualization for all the gateways:

Yapstone Sale Disposition

Source Code:

# sales for every gateway, one panel per gateway, split by success or failure
ggplot(ht_mid[ht_mid$requestType=="SALE",], aes(y=gw, x=requestType)) +
	theme_bw() + geom_violin(trim=TRUE, aes(fill=success)) +
	ggtitle("Gateway Sale Operations by Disposition") +
	labs(y="seconds", x="Operation") +
	theme(axis.text.x=element_text(angle=-45)) +
	scale_fill_brewer(type="qual", palette="Dark2") +
	ylim(0,5) + facet_grid(gatewaySelectorCode~.)

All the gateways are able to fail fast in certain instances.
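
A numeric summary can complement the violin plots. The sketch below is only an illustration: it uses a toy data frame with the same column names as ht_mid, every value is invented, the "PAYFLOWPRO" code and the logical type of the success column are assumptions, and the half-second cutoff for a "fast" failure is arbitrary.

```r
# Toy data frame using the same column names as ht_mid; all values are invented
sales <- data.frame(
  gatewaySelectorCode = c("YAPSTONE", "YAPSTONE", "YAPSTONE", "YAPSTONE",
                          "PAYFLOWPRO", "PAYFLOWPRO", "PAYFLOWPRO", "PAYFLOWPRO"),
  success = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),  # assumed logical
  gw      = c(2.1, 2.3, 0.2, 2.0, 0.7, 0.6, 0.1, 0.65),             # seconds
  stringsAsFactors = FALSE
)

# Median gateway time for each gateway (rows) and disposition (columns)
medians <- tapply(sales$gw, list(sales$gatewaySelectorCode, sales$success), median)
print(medians)

# Share of failed sales that returned in under half a second (arbitrary cutoff)
failed <- sales[!sales$success, ]
fast_fail_share <- tapply(failed$gw < 0.5, failed$gatewaySelectorCode, mean)
print(fast_fail_share)
```

On real data, a failed-sale median well below the successful-sale median, or a large fast-failure share, would point to the same fail-fast behavior that the violins show.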

These visualizations and data transformations are very easy to perform using the R environment. The hardest part was getting the data into a format that R can work with.