Introduction:

In today's rapidly evolving technological landscape, problem-solving skills are the cornerstone of success in the field of engineering. From debugging complex code to enhancing system performance and robustness, the ability to efficiently identify, analyze, and resolve issues is a hallmark of a proficient tech engineer. One company that exemplifies this ethos is Rubrik, known for its innovative cyber resilience solutions. In this blog post, we will explore several key mindsets that can help tech engineers excel in problem-solving, drawing inspiration from Rubrik's commitment to excellence.

Be Curious: Unleash the Power of Inquiry

The innate drive to understand the 'why' and 'how' behind a problem often leads engineers to explore unconventional paths and unearth hidden insights. This mindset not only deepens the overall understanding of the problem and its related components but also fosters a culture of innovation, facilitates more effective troubleshooting, and promotes continuous learning.

Furthermore, a curious mindset encourages collaboration and knowledge-sharing, as it prompts engineers to question existing practices and seek diverse perspectives. This collective curiosity can lead to groundbreaking discoveries, driving the team and organization towards continual improvement and excellence.

An example of curiosity in action is when an engineer is asked, "How do I do X?" Instead of offering a direct solution, they dig deeper by asking, "Why do you need to do X?" Understanding the motivation behind the request often reveals that the question addresses a symptom, not the root cause. Rather than applying a quick fix, this approach can lead to a more effective solution, such as redesigning a workflow or automating a task, that solves the true problem and improves long-term outcomes.

Another example arises when working with existing code: by seeking a deeper understanding of the workflows involved and thoroughly exploring dependencies, engineers can identify and eliminate redundancies or bottlenecks. This leads to streamlined processes and noticeably better overall efficiency.

Document: Capturing Insights for Future Triumphs

Effective documentation is a foundation of efficient problem-solving in engineering. By thoroughly documenting your troubleshooting and testing processes, you create a knowledge repository that enables quicker resolution of similar issues in the future. Well-maintained documentation also fosters collaboration, ensuring that insights are shared across teams and strengthening collective problem-solving throughout the organization.

At Rubrik, every code change is accompanied by detailed validation notes that outline the exact steps taken to confirm the change works as intended. This precision not only helps validate the code but also serves as a blueprint for anyone making future changes in the same code path. With these validation steps on record, engineers can confidently iterate on existing work, relying on proven testing methods.

In addition, Rubrik employs Closed Loop Analysis (CLA) to capture the root cause of any issues in depth. This analysis goes beyond simply identifying what went wrong; it provides actionable steps to prevent the issue from occurring again. CLA is instrumental in improving the overall reliability of our systems, as it closes the feedback loop, ensuring that lessons learned from one incident lead to lasting improvements across the board.

Together, these practices build a culture of precision and accountability, ensuring that every problem not only gets solved but also leaves a trail of knowledge that strengthens our codebase and team for the future.

Reproducing the Problem Is 50% of the Solution: Pinpointing the Culprit

As engineers, our ultimate goal is to find the root cause of a problem and devise a robust solution. In line with Rubrik's commitment to precision, the adage "reproducing the problem is 50% of the solution" rings true. By recreating the issue in a controlled environment, you gain valuable insights into its mechanics, making it easier to devise targeted fixes.

In one of our testing pipelines, we noticed a gradual slowdown despite having sufficient parallelization in place. After reproducing the issue in our development environment, it became clear that the root cause was tied to certain tests failing to clean up the databases used during execution. This oversight caused the databases to accumulate over time, placing an increasing load on the database server, which, in turn, slowed down subsequent test runs.

Once we identified this as a resource leak issue, we implemented a framework-level change to ensure proper database cleanup after every test. Additionally, a detection mechanism was introduced to automatically fail any test that did not clean up resources, preventing the issue from recurring. This approach not only restored pipeline efficiency but also enhanced overall system stability by catching potential leaks early in the testing process. This experience reinforced the value of reproducing problems in a controlled environment, allowing us to devise a targeted and scalable solution.
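The two safeguards described above, framework-level cleanup plus a check that fails leaky tests, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual framework: `DatabaseServer` is a hypothetical in-memory stand-in for the shared test database server, and `leak_checked` and `ResourceLeakError` are hypothetical names for a wrapper a test framework could place around each test. The wrapper snapshots the existing databases before the test, drops anything new afterward so leftovers cannot pile up, and then fails the test so the missing cleanup gets fixed at its source.

```python
from contextlib import contextmanager
from typing import Iterator, Set


class DatabaseServer:
    """Hypothetical in-memory stand-in for a shared test database server."""

    def __init__(self) -> None:
        self.databases: Set[str] = set()

    def create_database(self, name: str) -> None:
        self.databases.add(name)

    def drop_database(self, name: str) -> None:
        self.databases.discard(name)


class ResourceLeakError(AssertionError):
    """Raised when a test leaves databases behind; runners report it as a failure."""


@contextmanager
def leak_checked(server: DatabaseServer, test_name: str) -> Iterator[DatabaseServer]:
    """Wrap one test: drop any databases it leaked, then fail the test."""
    baseline = set(server.databases)  # databases that existed before the test
    try:
        yield server
    finally:
        leaked = server.databases - baseline
        # Framework-level cleanup: always drop leftovers, even if the test
        # raised, so they cannot accumulate and slow down later runs.
        for name in sorted(leaked):
            server.drop_database(name)
    # Detection: only reached when the test body itself succeeded.
    if leaked:
        raise ResourceLeakError(f"{test_name} leaked databases: {sorted(leaked)}")


# Usage: a well-behaved test passes; a leaky one is cleaned up and fails.
server = DatabaseServer()
with leak_checked(server, "test_well_behaved"):
    server.create_database("tmp_a")
    server.drop_database("tmp_a")

try:
    with leak_checked(server, "test_leaky"):
        server.create_database("tmp_b")  # never dropped by the test
except ResourceLeakError as err:
    print(err)  # prints: test_leaky leaked databases: ['tmp_b']
```

Cleaning up before failing matters: the failure flags the buggy test, while the cleanup keeps the shared server healthy for every other run. In a pytest-based suite, the same logic would more naturally live in an autouse fixture; the context-manager form just keeps the sketch self-contained.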

Aim to Eliminate the Problem: Going Beyond Quick Fixes

Focus on eliminating issues at their core rather than merely patching them. Quick fixes offer only temporary relief; approach each problem with the intention of eradicating it completely.

This requires a mindset centered on long-term planning and strategic thinking, rather than getting lost in tactical responses. For instance, when a vulnerability is discovered, the goal should not be just to apply a patch but to conduct a thorough investigation to identify the underlying cause. Understanding why the issue occurred enables teams to implement lasting changes that prevent similar vulnerabilities from arising in the future.

Eliminate Human Intervention: Automation for Excellence

Automation lies at the heart of Rubrik's innovation, and it can significantly elevate problem-solving in engineering. Strive to minimize human intervention by implementing automated processes wherever possible. Whether it's deploying mitigation steps, scaling resources, or running routine maintenance, automation reduces the risk of human error and frees up valuable time for engineers to focus on more strategic tasks.

In our cloud backend architecture, whenever an issue in the database server triggers an alert in our monitoring system, an automated debug information collector springs into action. It gathers critical diagnostic data, such as error logs, system metrics, and configuration settings, giving developers the insights they need for effective root cause analysis.

By automating this data collection process, we significantly reduce the time and effort required to diagnose problems, allowing engineers to focus on identifying and addressing the underlying issues rather than spending valuable time gathering information. This proactive approach not only streamlines troubleshooting but also enhances our ability to implement robust, long-term solutions that prevent similar issues from occurring in the future. Ultimately, this practice reinforces our commitment to maintaining a reliable and resilient cloud infrastructure.
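The collector's internals aren't described here, so the following is only a sketch of the general pattern, assuming a hook that the monitoring system calls when an alert fires. `Alert`, `collect_debug_bundle`, `on_alert`, and the placeholder collectors are hypothetical names; in a real deployment each collector would read actual logs, metrics, or configuration from the affected server. A failing collector is recorded rather than aborting the run, so engineers still receive whatever data could be gathered.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass
class Alert:
    """Hypothetical alert payload from the monitoring system."""
    source: str  # the database server that fired the alert
    rule: str    # name of the alerting rule


# A collector gathers one kind of diagnostic data for the alerting source.
Collector = Callable[[Alert], object]


def collect_debug_bundle(alert: Alert, collectors: Dict[str, Collector]) -> dict:
    """Run every collector and package the results for root cause analysis."""
    bundle: dict = {
        "alert": {"source": alert.source, "rule": alert.rule},
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "data": {},
        "errors": {},
    }
    for name, collect in collectors.items():
        try:
            bundle["data"][name] = collect(alert)
        except Exception as exc:  # record the failure and keep collecting
            bundle["errors"][name] = repr(exc)
    return bundle


def on_alert(alert: Alert, collectors: Dict[str, Collector]) -> str:
    """Alert hook: build the bundle and serialize it for developers."""
    return json.dumps(collect_debug_bundle(alert, collectors), indent=2)


# Usage with placeholder collectors; real ones would query the affected server.
placeholder_collectors: Dict[str, Collector] = {
    "error_logs": lambda a: [f"placeholder log line from {a.source}"],
    "system_metrics": lambda a: {"cpu_percent": None, "active_connections": None},
    "configuration": lambda a: {"settings_snapshot": "placeholder"},
}
print(on_alert(Alert(source="db-server-1", rule="example-rule"), placeholder_collectors))
```

Keeping collectors as independent callables makes it easy to add a new kind of diagnostic data without touching the alert hook itself.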

Shift Left: Early Detection for Excellence in Development

Incorporating a "shift left" mentality can have a transformative impact on problem-solving. This approach involves catching and addressing potential issues early in the development lifecycle. By identifying and mitigating risks before they escalate, engineers can save time, resources, and headaches down the line.

As an example, at Rubrik, new database queries are reviewed by an AI engine during development, specifically when code changes are submitted for peer review. The system detects inefficient queries and offers automated suggestions for improvement, so performance issues are addressed from the very beginning rather than discovered later.
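The AI engine itself is beyond the scope of a short example. As a much simpler, rule-based stand-in, the sketch below shows where such a check sits in a shift-left workflow: a function run against new queries at review time that reports suggestions and returns a non-zero exit code when it finds something. `review_query`, `review_changed_queries`, and the three heuristics in `RULES` are illustrative assumptions, not the checks Rubrik's system performs.

```python
import re
from typing import List, Tuple

# Illustrative heuristics only: (pattern, suggestion) pairs.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\bSELECT\s+\*", re.IGNORECASE),
     "Select only the columns you need instead of SELECT *."),
    (re.compile(r"\bLIKE\s+'%", re.IGNORECASE),
     "A leading wildcard in LIKE usually prevents index use."),
    (re.compile(r"^\s*(UPDATE|DELETE)\b(?!.*\bWHERE\b)", re.IGNORECASE | re.DOTALL),
     "UPDATE/DELETE without WHERE touches every row."),
]


def review_query(sql: str) -> List[str]:
    """Return suggestions for one query; an empty list means no findings."""
    return [message for pattern, message in RULES if pattern.search(sql)]


def review_changed_queries(queries: List[str]) -> int:
    """Report findings for queries in a change; return a CI-style exit code."""
    findings = 0
    for sql in queries:
        for message in review_query(sql):
            print(f"{sql.strip()!r}: {message}")
            findings += 1
    return 1 if findings else 0


# Usage: the first query is flagged twice, the second passes.
exit_code = review_changed_queries([
    "SELECT * FROM jobs WHERE name LIKE '%backup'",
    "SELECT id, status FROM jobs WHERE id = 42",
])
print("exit code:", exit_code)
```

Returning an exit code lets a CI job block or annotate a review when findings appear. Regular expressions miss many cases; a production-grade checker would parse the SQL or inspect query plans instead.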

Explore Different Perspectives: Holistic Problem Understanding

Exploring different perspectives is crucial for comprehending the full scope of a problem. Consider how the issue impacts various stakeholders, systems, and processes. This multifaceted understanding not only facilitates more effective solutions but also fosters a collaborative work environment.

Consider a framework developed by one team and used by many other teams across the organization. When an issue arises at the framework level, adopting a user-centric perspective can significantly improve the resolution process. For instance, by understanding the challenges users face while debugging, the framework team can identify opportunities to improve monitoring and build better tools to support them. This empathetic approach leads to more robust solutions and ensures that the needs of all stakeholders are taken into account.

Leave It Better Than You Found It: Continuous Improvement

Rubrik's commitment to leaving a positive impact is mirrored in the engineering world through the principle of continuous improvement. Treat every problem-solving endeavor as an opportunity to enhance systems, processes, and practices. Strive to not only resolve the immediate issue but to leave the overall environment in a better state than you found it.

Refactoring into common libraries is an excellent example of this mindset. If a developer modifying a private method discovers that the same logic is needed across different modules, moving it to a shared library can significantly improve maintainability. Once the method lives in a common library, other developers can reuse it instead of duplicating code, leading to a cleaner and more efficient codebase.

Another example is building tools for common operations. Automating regularly performed tasks with tools or scripts leads to substantial efficiency gains, saving time and reducing the risk of human error.

Another important aspect is addressing issues that may cross team boundaries. Even if the problem lies in code owned by another team, taking the initiative to collaborate and contribute to the solution fosters a culture of shared ownership and holistic improvement. This approach ensures that the entire system benefits from enhancements, rather than just isolated components.

Conclusion:

In the dynamic realm of engineering, mastering problem-solving is a skill that sets remarkable engineers apart. By embracing and cultivating mindsets such as curiosity, thorough documentation, proactive elimination of problems, automation, and a commitment to constant improvement, engineers can follow in the footsteps of trailblazing companies like Rubrik. Just as Rubrik revolutionizes cyber resilience, engineers armed with these mindsets can revolutionize the way problems are approached, solved, and ultimately prevented, shaping a more efficient and innovative future for the tech industry.