Luke Marsden, CEO and founder of Dotscience, the pioneer in DevOps for machine learning (ML), is an industry veteran and an expert in DevOps and container storage. Marsden founded Dotscience in 2017 to signal the rise of a new paradigm in which ML engineering is just as easy, fast and safe as modern software engineering through the use of DevOps techniques and practices. By giving teams the unique ability to collaboratively track runs (a record of the data, code and parameters used when training an AI model), Dotscience empowers ML and data science teams in industries including fintech, autonomous vehicles, healthcare and consultancies to achieve reproducibility, accountability, collaboration and continuous delivery across the AI model lifecycle.
A thought leader in his field, Luke has built and promoted software products and shaped open source development. Luke graduated from Oxford University with a BA in Computer Science in 2008.
M.R. Rangaswami: What market need drove you to create Dotscience and what are some of the biggest challenges you’re looking to solve?
Luke Marsden: We started Dotscience to address the current crisis in data science and AI, in which companies are investing large amounts of money but rarely seeing adequate results. Building effective AI initiatives requires an end-to-end approach that many fledgling teams overlook. Getting useful data to data science and ML teams is often cited as a major challenge, and while this is true, it is only half the picture. Equally important is that these teams are able to deliver new models to production quickly and efficiently, while remaining in control of quality and respecting regulatory concerns.
The typical response to the data challenge is to create a data engineering team, and while this is an important step, leaders must also consider how data engineering teams can interact efficiently with data science and ML teams. It is also critical that leadership teams focus not just on hiring talent for the actual creation of ML models, but also on the creation and maintenance of a delivery pipeline infrastructure that can quickly deliver new models to production and monitor them once deployed.
And, because ML models can only respond to the training they’ve been given, the ultimate goal is to equip teams across the whole delivery pipeline, from data engineering, to model deployment and monitoring, with tooling that allows them to detect and respond to changes as quickly as possible.
Those who are familiar with the DevOps practices of Continuous Integration and Continuous Delivery will find our solution to these issues rather intuitive. This is why we launched Dotscience in July 2019: to solve the pain points experienced when operationalizing AI. We also introduced our DevOps for ML manifesto, which sets out our vision that ML engineering should be just as easy, fast and safe as modern software engineering when using DevOps techniques. AI and data science today is like software engineering was in the 1990s. We believe ML and data science teams should apply the same DevOps techniques that transformed the way software engineers deliver software by making it possible to collaborate, test and deliver software continuously.
AI has the potential to alter the global economy, but ultimately AI is an immature discipline. The lessons learned from DevOps are sorely needed in ML. The means by which DevOps addressed the end-to-end challenges of software development show great promise in solving the ML model development difficulties of today.
M.R.: You’ve said that AI and data science today is like software engineering was in the 1990s. Can you elaborate?
Luke: In the 1990s, software engineering work was split across development, testing and operations silos. Developers would work on a feature until it was done, often finding out too late that somebody else had been working on another part of the code that clashed with theirs. It was a brutal cycle. Without version control and continuous integration, software engineering was difficult. When DevOps was introduced in the late 2000s, it transformed software development. DevOps emphasized the importance of a collaborative and experimental culture, and by focusing on the end-to-end optimization of delivering value to users, it allowed software engineers to work without the hindrance of silos.
Version control and the workflows that it enables now allow software teams to iterate quickly because they can easily reproduce application code and collaborate with each other. We saw the potential benefits that this could bring to ML and created tooling that does for the entire model development lifecycle what DevOps did for coding. This DevOps approach to ML provides a fundamentally better and more collaborative work environment for data engineers, data scientists and AI teams.
The Dotscience platform allows teams to achieve all four of the DevOps for ML manifesto’s pillars: Reproducible, Accountable, Collaborative and Continuous AI. The key to the entire process is the last of these, Continuous. In order to continually deliver value to the business, models must be trained, re-trained and statistically monitored to keep up with constant changes in a business environment. We help organizations achieve a continuous, usable stream of their ML data and models and optimize the end-to-end flow of working models, as opposed to just the speed at which any model can be deployed. In addition, Dotscience is open and interoperable, allowing users to connect to external data sources, bring their own compute to our SaaS, or deploy in their own cloud and easily connect to a separate CI system, container registry and Kubernetes clusters.
M.R.: Are companies doing enough to stabilize and scale their AI initiatives? If not, what more can they do that isn’t already being done?
Luke: Despite significant investment in AI, many companies are still struggling to stabilize and scale their AI initiatives. We recently surveyed ML professionals for our State of Development and Operations of AI Applications report. The top three challenges respondents reported with their AI workloads were duplicating work, rewriting a model after a team member leaves and difficulty justifying the value of AI initiatives to the wider business. We looked at how businesses are deploying AI today and investigated the need for accountability and collaboration when building, deploying and iterating on AI.
While over two-thirds of businesses reported they are spending between $500,000 and $10 million on their AI efforts, the majority of respondents say they continue to experience a variety of operational challenges. AI deployments today are slow and inefficient. Moreover, the manual tools and processes predominantly in use to operationalize ML and AI do not support the scaling and governance demanded by many AI initiatives. Manual processes can be cumbersome, discourage collaboration and create knowledge silos within teams. When model provenance, a complete record of all the steps taken to create an AI model, is tracked manually, AI and ML teams often turn to spreadsheets for lack of a more effective means of recording their work. This is inflexible, risky, slow and complicated.
To simplify, scale, accelerate and control every stage of the AI model lifecycle, the same DevOps-like principles of collaboration, fast feedback and continuous delivery should be applied to AI today. Just as with normal software engineering teams, it is imperative that leaders consider these challenges up front. We see many teams who proudly present their first trained model to the business, only for the business to assume that the teams can then churn out production-ready trained models on demand. This is rarely the case when leaders haven’t made sure to build a continuous end-to-end delivery infrastructure as well. Displaying promising early results that cannot then be scaled strains the trust between the wider business and these often very costly AI projects. Successful AI initiatives require cooperation across the entire business, and because of this, building trust from the outset is paramount.
M.R. Rangaswami is the Co-Founder of Sand Hill Group.