Five Ways to Mitigate the Risks of TiP By @Neotys | @DevOpsSummit [#DevOps]

Best Practices

Don’t Freak Out! Five Ways to Mitigate the Risks of TiP

If there is one thing we know for sure, it is that it's extremely difficult to accurately reproduce a production environment for QA purposes. That's why there is such a natural pull in the direction of Testing in Production (TiP), in which testing is done within the live environment where real users are actively engaged in the product.

There are many benefits to using the production system as a means to conduct QA, but it can be stressful for an organization that isn't well-versed in the practice. Oftentimes, the risks of TiP prevent software testers from even trying it out. And in the worst-case scenario, a poorly managed TiP initiative can have dangerous consequences, impacting real users and revenue.

However, TiP is not something to be afraid of. The more you understand what may go wrong and how to take the proper precautions to prevent that, the more successful and efficient your overall app development process will be.

Today's post covers the major risks of Testing in Production and how to mitigate them, so you can be sure you're getting your best end product to users.

The Benefits of TiP

When you rely solely on a dedicated QA environment to test your app before launching it in production, you open yourself up to the risk that minor differences in how the QA and Production environments are implemented actually have a big impact on app quality and performance. Testing in Production practices can lead to great new insights - potentially avoiding catastrophic system failures.

The major benefit of TiP is the ability to test aspects of production against real users, providing a controlled way of learning how the live environment - not to mention the people operating it - behaves under specific conditions of usage, failure, and stress. Take the example of Netflix's Chaos Monkey, a tool that operates in the production environment and introduces random failures such as VM crashes and network interruptions. These manufactured failures force developers to address significant error conditions and code around them. They also keep an organization familiar with otherwise rare disaster situations that require system recovery and other operational solutions.
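To make the idea of random failure injection concrete, here is a minimal sketch in the spirit of Chaos Monkey. It is not Netflix's implementation: the `Instance` class, the `inject_failure` function, and the instance names are all hypothetical, and "terminating" an instance here only flips a flag in memory.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical stand-in for a production VM; not a real cloud API."""
    name: str
    running: bool = True


def inject_failure(instances, rng):
    """Pick one running instance at random and mark it as terminated.

    Returns the chosen instance, or None if nothing is running.
    """
    candidates = [i for i in instances if i.running]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.running = False  # a real tool would call the platform's API here
    return victim


# Seeded generator so a test run can be replayed and audited later.
rng = random.Random(42)
fleet = [Instance("web-1"), Instance("web-2"), Instance("web-3")]
victim = inject_failure(fleet, rng)
survivors = [i.name for i in fleet if i.running]
print(f"terminated {victim.name}; still serving: {survivors}")
```

Seeding the random generator is a design choice for this sketch: it keeps the "random" failure reproducible, which helps when a run has to be explained afterwards.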

Other methods of TiP are used in different situations to allow you to see how small quantities of users react to a change, or how users react to two totally different products. Overall, the ability to test user and system behavior in real time is what is so attractive about Testing in Production.

What's Holding You Back?

It's natural to be cautious, though - we understand. If you are new to TiP, you may be concerned about some of its downsides. So here are five ways to temper various forms of risk that TiP introduces into your development process.

User Impact

When it comes to SaaS, users are your lifeblood. Without them there is no usage, no revenue, and no business. Obviously their experience matters above all else, so any testing we do in production simply cannot break the production environment.

In our last post about TiP techniques, we summarized a few methods that can be used to control the impact of testing on users: canary testing (introducing small amounts of code change and seeing whether they work) and controlled test flights (seeing how users interact with intended changes in UI) are two key examples. Synthetic users also play a huge part in TiP, as they capture metrics that show what real users would experience when executing specific transactions in your product, without requiring real users to go through those user paths.
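One common way to limit a canary to a small share of users is to hash each user ID into a bucket. The sketch below assumes that approach; the `in_canary` function, the `salt` value, and the user IDs are hypothetical and not taken from any particular product.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this user should receive the canary build.

    Hashing the user ID gives a stable assignment: the same user always
    lands in the same group for a given salt.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100          # percent=5 -> buckets 0..499


users = [f"user-{n}" for n in range(10_000)]
canary_users = [u for u in users if in_canary(u, percent=5)]
print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```

Because the assignment depends only on the user ID and the salt, a user does not flip between the old and new code from one request to the next, and changing the salt reshuffles the groups for the next release.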

Another way to mitigate the effects of testing on real users is an old standby - the scheduled maintenance window. It's common to conduct load tests on a production system during low usage periods. However, even in these situations you are still impacting the users that are on the system at that time. Take this example we encountered recently: an educational software company was conducting a 10,000 virtual user load test. They scheduled it for off-hours when only 500 real users were on the system. However, those 500 users were still exposed to the product at its worst. This is a case where notifying users that the system would be down for a short time could've protected everyone from a poor experience.
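Choosing the off-hours slot can be as simple as scanning historical usage for the quietest stretch. The sketch below assumes you have 24 hourly user counts; the `quietest_window` function and the numbers in `hourly` are invented for illustration.

```python
def quietest_window(hourly_users, window_hours):
    """Return (start_hour, total_users) for the least busy contiguous window.

    hourly_users holds 24 counts, index 0 = 00:00-01:00. Windows may wrap
    past midnight.
    """
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hourly_users[(start + h) % 24] for h in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Made-up hourly user counts for one day.
hourly = [520, 310, 180, 150, 240, 600, 1500, 3200, 5400, 6100, 6300, 6000,
          5800, 6200, 6400, 6100, 5500, 4800, 4000, 3300, 2600, 1900, 1200, 800]

start, affected = quietest_window(hourly, window_hours=2)
print(f"quietest 2-hour window starts at {start:02d}:00, "
      f"with {affected} users still on the system")
```

Note that even the best window in this made-up data is not empty, which is the point of the example above: a quiet period reduces the number of affected users but does not remove the need to notify them.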

Security

Another common concern of testing in production has to do with security. Imagine introducing a vulnerability into a system because the code you deploy wasn't properly vetted. Or running a separate instance of your application for testing purposes on production equipment, only to discover that proper security steps weren't followed because the operations team wasn't completely aware of this dedicated space.

The best way to mitigate this risk is to begin the TiP process with a cross-functional mindset. You need input from your data security and operations teams to make sure you are running your tests in a safe way. As part of a mature QA process, TiP can't be relegated solely to the domain of the QA group - instead, it must be implemented with the entire team in mind, so that it benefits the entire team. Over time, security can become a normal part of how the entire team approaches and implements the TiP process.

Accountability

One of the reasons that modern operations teams comprising many people can effectively manage a complex production environment is a strong system for accountability. Changes are controlled and documented, and records are kept to make sure that if any problems arise, the root cause can be identified and fixed to prevent that mistake from happening again. However, this is not necessarily as common or as rigid a practice in QA as it is in Operations.

When it comes down to TiP, you need to merge common QA practices with common Operations practices. This means putting in place systems for accountability: keeping detailed notes, names, dates and case tracking. Work with your Operations teams to find an easy, non-intrusive way of introducing appropriate change control processes into the QA procedure. As an overall rule of thumb, treat the TiP environment as the production environment and you won't let your guard down in terms of accountability.
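As one possible shape for those records, the sketch below appends each TiP run to a log as a JSON line carrying a case ID, names, and a timestamp. The `TipRunRecord` fields, the `append_record` helper, and the sample values are assumptions for illustration; your Operations team's change control system will have its own format.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class TipRunRecord:
    """One entry in a hypothetical TiP change log."""
    case_id: str
    tester: str
    description: str
    approved_by: str
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(log_lines, record):
    """Serialize the record as one JSON line and append it to the log."""
    log_lines.append(json.dumps(asdict(record), sort_keys=True))
    return log_lines


log = []
append_record(log, TipRunRecord(
    case_id="TIP-001",                       # invented case number
    tester="qa.engineer",                    # invented name
    description="Load test against checkout service",
    approved_by="ops.lead",                  # invented name
))
print(log[0])
```

One JSON object per line keeps the log easy to append to and easy to search when a root cause has to be traced back to a specific test run.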

Ownership

The issue of ownership is a hot topic in Testing in Production because both the QA and the Operations groups may claim to own the environment. This can be further complicated if an issue comes up during a TiP run and someone has to relay it back to the development team. Now you have code that needs to be created and deployed in production for the purposes of testing. It can be an ownership mess.

To address these concerns, build up good practices for communication and coordination across the whole team. Ownership is typically less of an issue with the TiP process itself and more commonly an organizational issue. When the organizational roles and procedures are clear, you can begin to bridge the gap between QA and production teams for a trusting and productive working relationship.

Cross-Contamination

Lastly, testing in production can lead to cross-contamination problems. The nature of shared web services is that one may impact others, even if the applications are virtually separated, because of the infrastructure components they have in common. Put simply, conducting testing on application 1 could cause unexpected problems on application 2 for which there is no obvious root cause.

This makes it important to monitor changes and be aware of the entire back-end. The easiest way to mitigate this problem is to isolate each app during testing and to alert everyone involved whenever testing that may impact other applications is occurring. This also brings us back to how important it is to work alongside an operations team and improve site maintenance procedures in order to recover from a problem with cross-contamination.
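Knowing whom to alert starts with knowing which applications share infrastructure with the one under test. The sketch below assumes you maintain an inventory mapping each application to its components; the `apps_at_risk` function and the inventory contents are hypothetical.

```python
def apps_at_risk(app_under_test, components_by_app):
    """Map each other app to the components it shares with the app under test."""
    tested = components_by_app[app_under_test]
    shared = {}
    for app, components in components_by_app.items():
        if app == app_under_test:
            continue
        overlap = tested & components
        if overlap:
            shared[app] = sorted(overlap)
    return shared


# Invented inventory: application name -> set of infrastructure components.
inventory = {
    "application-1": {"db-cluster-a", "load-balancer", "cache"},
    "application-2": {"db-cluster-a", "message-queue"},
    "application-3": {"db-cluster-b"},
}
print(apps_at_risk("application-1", inventory))
# prints {'application-2': ['db-cluster-a']}
```

In this made-up inventory, a test on application 1 would call for alerting the owners of application 2, since both depend on the same database cluster, while application 3 shares nothing with it.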

Test Safely

There are many clear benefits to Testing in Production, and if you manage the process properly you can counteract the major downsides of TiP without too much effort. It's clear that user impact, security, accountability, ownership and cross-contamination can pose serious risks to the process, but using sound organization, tracking and procedures during the ever-vulnerable test period should do the trick. Happy testing!

More Stories By Tim Hinds

Tim Hinds is the Product Marketing Manager for NeoLoad at Neotys. He has a background in Agile software development, Scrum, Kanban, Continuous Integration, Continuous Delivery, and Continuous Testing practices.

Previously, Tim was Product Marketing Manager at AccuRev, a company acquired by Micro Focus, where he worked with software configuration management, issue tracking, Agile project management, continuous integration, workflow automation, and distributed version control systems.
