Straddling the lines of both academia and business, Dr. Hao Zhong is a passionate technology innovator and entrepreneur who has for years contributed numerous cutting-edge technology innovations and driven product development in data storage and in the computing industry as a whole. Not only a successful startup that spans the globe, ScaleFlux has also left a lasting impression on Rensselaer Polytechnic Institute (RPI), the oldest technology research institute in the nation. Hao earned his Ph.D. from RPI, and ScaleFlux co-founder and CTO Tong Zhang is currently a professor there.
With ScaleFlux, Hao drives the development of key techniques and algorithms for Computational Storage products, exploring their optimal use in mainstream application domains such as databases. Founded in 2014, ScaleFlux is a well-funded startup with a team proven to deploy complex computing and solid-state storage solutions in volume.
M.R. Rangaswami: Can you tell us about what products or services ScaleFlux offers that are unique in the market?
Hao Zhong: ScaleFlux is the pioneer in deploying Computational Storage at scale. Computational Storage is the foundation for modern, data-driven infrastructure that enables responsive performance, affordable scaling, and agile platforms for compute and storage I/O intensive applications.
This is important because storage architecture has remained mostly unchanged over the decades, mainly because the duty/function of data storage hardware has remained unchanged (i.e., store data and serve I/O requests). By fundamentally expanding the duty/function of data storage hardware, computational storage will commence a new chapter of the data storage industry with many exciting new opportunities ahead.
The ever-increasing gap between data volume and CPU processing power is exactly the reason why computational storage has attracted so much attention over recent years. The slow-down of Moore’s Law forces the computing industry to transition from traditional CPU-only homogeneous computing towards domain-specific, heterogeneous computing. This inevitable paradigm transition brings an unprecedented opportunity to re-think and innovate the design of future data storage devices/systems (especially solid-state data storage). Hence, the future of storage lies in the trend of integrating computational capability into storage devices.
M.R.: One area we have been hearing a lot about lately is the importance of Edge Storage. How do you gauge an enterprise’s edge storage needs, and is that process significantly different from the one for traditional data center storage?
Hao: We believe enterprises’ edge storage needs will continue to grow, driven by 5G, IoT, autonomous vehicles, etc. Due to power and environmental constraints, enterprises’ edge infrastructure will mainly deploy solid-state data storage devices. In comparison to data in the traditional data center, data at the Edge is much more likely to be ephemeral or raw data, that is, data that is only going to stay where it is temporarily. Edge data either needs to be passed en masse to the central data center or processed locally, with only the analytical summary being sent to the central data center and the original raw data then discarded.
Due to cost, power and space constraints, enterprise edge infrastructure typically has limited CPU processing capability, which makes it increasingly challenging to meet the growing demands of edge-based data pre-processing/filtering. This makes it almost inevitable that edge infrastructure will transform from traditional CPU-only homogeneous computing to heterogeneous computing. By simultaneously meeting the storage and computing demands at the edge, the emerging computational storage drive could play a critical role in architecting future enterprise edge infrastructure.
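The "process locally, send only the summary, discard the raw data" pattern described in this answer can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ScaleFlux's implementation: the names `Summary`, `summarize` and `process_at_edge` are hypothetical, the readings are made-up sample values, and the reduction runs on an ordinary CPU rather than inside a computational storage drive.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List


@dataclass
class Summary:
    """Compact analytical summary sent upstream in place of the raw readings."""
    count: int
    minimum: float
    maximum: float
    average: float


def summarize(readings: Iterable[float]) -> Summary:
    """Reduce a batch of raw readings to a fixed-size summary."""
    values = list(readings)
    if not values:
        raise ValueError("cannot summarize an empty batch")
    return Summary(len(values), min(values), max(values), mean(values))


def process_at_edge(raw_batch: List[float]) -> Summary:
    """Summarize a batch locally, then discard the raw data (hypothetical helper)."""
    summary = summarize(raw_batch)
    raw_batch.clear()  # the raw data is ephemeral once the summary exists
    return summary


# Made-up sample readings, for illustration only.
batch = [20.5, 21.0, 19.5, 23.0]
summary = process_at_edge(batch)
print(summary)
```

Only the fixed-size `Summary` would need to travel to the central data center, however many raw readings the batch held. Moving a reduction step like `summarize` closer to where the data is stored is the kind of work the answer suggests a computational storage drive could take on.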
M.R.: How will AI and ML continue to affect modern storage?
Hao: AI/ML will be one of the most important (if not the most important) drivers for data storage from both the demand and innovation perspectives. With an ever-increasing amount of data being generated every day, AI/ML provides a means to make effective use of that data. As a result, people have a stronger incentive to store data at least temporarily, which directly leads to a growing demand for data storage capacity.
AI/ML training platforms mainly contain three components: training kernel computation, data preprocessing, and data storage. Most prior and current R&D activities focus on improving the efficiency of the first component (i.e., kernel computation), and as a result the efficiency of AI/ML training kernel computation has significantly improved over the years. This, however, leaves the entire AI/ML training system increasingly bottlenecked by the performance/efficiency of the other two components (i.e., data preprocessing and data storage). For example, as recently reported by Facebook at Hot Chips 2021, data preprocessing/storage accounts for over 50% of the total AI/ML training energy consumption. This demands re-thinking the design and implementation of data preprocessing/storage in AI/ML training platforms, for which computational storage could be a very appealing solution.
M.R. Rangaswami is the Co-Founder of Sandhill.com.