Amazon is holding a crucial retail technology meeting today to address recent service outages, some linked to errors from AI-assisted coding. Dave Treadwell, the senior vice president overseeing the technical foundations of Amazon’s website, informed employees that the “This Week in Stores Tech” (TWiST) meeting will deeply examine “some of the issues that got us here.” The meeting began at 12:30 p.m. ET.
Treadwell openly acknowledged the recent problems. He wrote in a memo to employees, “Folks – as you likely know, the availability of the site and related infrastructure has not been good recently.” He shifted the meeting’s focus due to the rise of “Sev 1s,” which are severe incidents causing outages or performance issues in critical systems.
Amazon experienced four such high-severity incidents within a week, Treadwell revealed. He emphasized the necessity of this deep dive to “regain our strong availability posture.” An Amazon spokesperson confirmed that TWiST is a regular meeting where retail tech leaders review store operations and performance, including website and app availability.
The meeting follows a significant malfunction on Thursday of last week, when Amazon’s online store had problems for about six hours. Users couldn’t check out, access account information, or see product prices. Amazon attributed these issues to a “software code deployment.”
Amazon, along with its major cloud rivals, is heavily investing in infrastructure to handle the growing demand for artificial intelligence services, which require substantial computing power. Last month, Amazon announced plans for $200 billion in capital expenditures this year, surpassing all its tech competitors.
Despite these massive AI investments, Amazon continues to cut jobs. The company laid off approximately 16,000 corporate workers in January, following an earlier round of roughly 14,000 job cuts in October. Those cuts come on top of the more than 27,000 roles Amazon eliminated between 2022 and 2023.
In a separate memo, Treadwell specifically pointed to “genAI-assisted changes” as a factor contributing to recent incidents, dating back to the third quarter of 2025. He noted that “GenAI tools supplementing or accelerating production change instructions” led to “unsafe practices,” among other issues. Treadwell admitted that the “best practices and safeguards” for generative AI use are still developing.
Amazon plans to strengthen various safeguards to prevent future problems. This includes requiring that “GenAI-assisted” production changes made by less experienced staff be reviewed by more senior engineers. Treadwell stated, “We are implementing temporary safety practices which will introduce controlled friction to changes in the most important parts of the Retail experience.” He added that they will also invest in “more durable solutions including both deterministic and agentic safeguards.”
Amazon Web Services (AWS) has also faced several outages recently, though Amazon clarified that AWS was not involved in the specific incidents Treadwell referenced. In December, for example, AWS experienced an outage that affected a cost management feature for an extended period, reportedly after engineers allowed the company’s Kiro AI coding tool to make changes. At the time, Amazon stated the outage was due to “user error,” not AI.










