Production Outage on January 27, 2026

On January 27, 2026, the production environment was disrupted or degraded from approximately 8:30am to 10:30am ET. The disruption resulted in three specific, known issues.

Unstyled web pages: The web application was not displayed correctly, typically looking like a list of unformatted text rather than the web application users expect to experience.
Single Sign-On (SSO) Authorization failures: Organizations that use Okta or Duo for SSO received an error response with a 403 status.
"Request Approval" and certain other buttons did not work.

What happened?
Along with a routine deployment, the infrastructure supporting the ScopeStack application was being updated. Specifically, ScopeStack added an Amazon CloudFront distribution in front of the load balancers that send requests to the ScopeStack application servers. In essence, this added a layer of routing logic within AWS that determines where a request to the ScopeStack application is sent. The additional routing layer provides two primary benefits:

Improved security: Requests routed in this way benefit from AWS security measures against things like Denial of Service attacks. While the application infrastructure already includes Web Application Firewall (WAF) tools to assist with this, the application can always benefit from the security experts at one of the leading infrastructure companies in the world.
Improved performance: A specific goal for this routing layer is to permit ScopeStack to provide regionally based service to users outside the US. For example, our user community in Australia would be able to access ScopeStack from servers located in Sydney, with a reduction in response times of at least 80%.

These updates were intended to be introduced over the weekend of January 24-25, 2026. They were delayed, however, by inclement weather that threatened power outages within our primary work area. We did not feel safe introducing changes while we were uncertain whether we would be able to support them. Having tested the changes in our staging environment over the preceding week, we believed that they could be introduced safely during the week.

What Went Wrong?

Unstyled Web Pages

This was caused by a sequencing issue in the deployment cycle. The website pages were ready for use before digital assets such as JavaScript files and stylesheets were available from the content delivery network (CDN). The CDN then cached the "file not found" error, with the result that users got unstyled pages even after the assets were synchronized. This was not identified in the staging environment because its significantly lower volume of use meant that no website requests arrived during the deployments.
This was resolved approximately 10 minutes after it happened by forcing the CDN to clear the cached responses. The deployment steps are being resequenced to ensure that this cannot happen in the future.
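
The resequencing described above can be sketched roughly as follows. This is an illustration only, not the actual ScopeStack deployment code: the function name deploy_release and the three callables (publish_assets, asset_is_available, activate_pages) are hypothetical stand-ins for whatever the real pipeline uses. The point is the ordering: assets are published and confirmed reachable before the pages that reference them go live, so the CDN never has a "file not found" response to cache.

```python
from typing import Callable, Iterable


def deploy_release(
    asset_urls: Iterable[str],
    publish_assets: Callable[[], None],
    asset_is_available: Callable[[str], bool],
    activate_pages: Callable[[], None],
) -> None:
    """Publish static assets and confirm they resolve before pages go live.

    All four arguments are hypothetical hooks supplied by the caller.
    """
    # Step 1: push JavaScript files and stylesheets to the asset store.
    publish_assets()

    # Step 2: confirm every asset can be fetched. Stopping here means no
    # page is served that points at an asset the CDN cannot find yet.
    missing = [url for url in asset_urls if not asset_is_available(url)]
    if missing:
        raise RuntimeError(f"assets not yet available: {missing}")

    # Step 3: only now switch traffic to the new pages.
    activate_pages()
```

If the availability check fails, the deployment stops before any user can request a page that depends on the missing files.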

SSO Authorization Failures

The CloudFront distribution layer limits the amount of data that can be sent to a server. The restrictions apply both to the data sent to a server from a form on a web page (e.g., the list of services in the project editor) and to the "headers" that accompany each request (things like cookies and security protocols).
With respect to SSO, this limit can be exceeded by "verbose systems": identity providers that include as much user information ("claims") as possible. Okta, in particular, frequently includes many such claims, with the result that its requests are rejected with a 403 (Forbidden) status before they are allowed to reach the server. This was not identified in the staging environment because the two identity provider systems to which ScopeStack has access (Google, Microsoft Active Directory) restrict themselves to minimal claims and are rarely rejected.
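
To make the header limit concrete, here is a minimal sketch of how a test could estimate the size of a request's headers and compare it against a limit. The function names and the limit_bytes parameter are assumptions for illustration; the actual limit should be taken from the AWS documentation for the services in use, and the calculation only approximates the HTTP/1.1 wire format.

```python
from typing import Mapping


def header_bytes(headers: Mapping[str, str]) -> int:
    """Approximate header size, counting each header as 'Name: value' plus CRLF."""
    return sum(
        len(f"{name}: {value}\r\n".encode("utf-8"))
        for name, value in headers.items()
    )


def exceeds_limit(headers: Mapping[str, str], limit_bytes: int) -> bool:
    """Return True when the estimated header size is above limit_bytes.

    limit_bytes is supplied by the caller; no real quota is assumed here.
    """
    return header_bytes(headers) > limit_bytes
```

A staging check along these lines, fed with headers as large as those produced by a claim-heavy identity provider, might have surfaced the rejection before production did.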

"Request Approval" and certain other buttons did not work

These requests never reached the API servers. The ScopeStack API server logs contain no record of their rejection, which indicates that they were rejected by the new routing logic. We are still trying to trace the root cause of the problem.

[UPDATE 2:55pm]
Further AI-assisted research has shown that specific endpoints in the API trigger new URL validation heuristic rules, and those violations led to the authorization failures on these specific routes. Fortunately, there are very few of those endpoints in the application. Unfortunately, they are commonly used: they request or cancel approval, mark a project won or lost, and so on. The fix for these paths is straightforward but will ultimately require us to deprecate them.

Going Forward
The primary lesson from this incident for the ScopeStack team is very simple: no infrastructure changes are permitted outside the window from 8:00am ET Saturdays through 4:00pm ET Sundays. This deployment window falls within weekend hours for all customers and permits the ScopeStack team to validate any infrastructure changes with minimal impact to the user community.
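
As an illustration, the window rule could be enforced by a small guard in the deployment tooling. The sketch below is hypothetical and makes two assumptions: the caller passes a timestamp already expressed in Eastern Time, and the window closes at exactly 4:00pm on Sunday (that minute is treated as outside the window).

```python
from datetime import datetime, time

WINDOW_OPEN = time(8, 0)    # Saturday 8:00am ET
WINDOW_CLOSE = time(16, 0)  # Sunday 4:00pm ET

SATURDAY = 5  # datetime.weekday(): Monday is 0, Sunday is 6
SUNDAY = 6


def in_change_window(et_now: datetime) -> bool:
    """Return True if et_now (assumed to be Eastern Time) is inside the window."""
    if et_now.weekday() == SATURDAY:
        return et_now.time() >= WINDOW_OPEN
    if et_now.weekday() == SUNDAY:
        return et_now.time() < WINDOW_CLOSE
    # Monday through Friday: infrastructure changes are not permitted.
    return False
```

For example, a change attempted on Tuesday, January 27, 2026 at 8:30am ET would be refused by this check, while one started on Saturday, January 24 at 9:00am ET would be allowed.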
