Data lakes – what they are and why they matter
What are data lakes?
A data lake is, in essence, a storage repository for all your business data, held until it’s needed. The best way to understand data lakes is by comparing them with the more traditional data warehousing approach to storing and managing data.
While data warehouses have their uses, data lakes are the more flexible option for many businesses. A data warehouse requires all the data it holds to conform to a strict, predefined structure. By contrast, data lakes can handle a wide range of unstructured data types: everything from word processing documents to voice recordings, images, video, pages, social media posts, transaction logs and so on.
|Data Lake|Data Warehouse|
|---|---|
|More flexible structure|Highly structured|
|Easily changeable/accessible data|More difficult/costly to alter data|
|Keeps original data format|Transformed for specific applications ahead of its usage|
Why is this important for agribusinesses?
More than 90% of all business data is in those unstructured formats. Data held in a lake isn’t selected or transformed to suit a particular purpose. Instead, the idea is to store everything unchanged, in its original format, and allow applications to curate it based on metadata at the point of consumption.
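To make the idea of storing everything unchanged and curating by metadata at the point of consumption more concrete, here is a minimal sketch in Python. It is an illustration only: `TinyLake`, `RawObject`, the metadata keys (`format`, `source`) and the sample payloads are all hypothetical and not taken from any real data lake product.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RawObject:
    """One item stored exactly as received, plus descriptive metadata."""
    payload: bytes                       # original bytes, never transformed
    metadata: dict = field(default_factory=dict)


class TinyLake:
    """Hypothetical in-memory stand-in for a data lake."""

    def __init__(self):
        self._objects = []

    def put(self, payload: bytes, **metadata) -> None:
        # Store the payload unchanged; only metadata is attached to it.
        self._objects.append(RawObject(payload, dict(metadata)))

    def find(self, **criteria) -> list:
        # Return objects whose metadata matches every given key/value pair.
        return [
            obj for obj in self._objects
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


# Made-up sample data in three different original formats.
lake = TinyLake()
lake.put(b'{"field": "north", "moisture": 0.31}', format="json", source="soil-sensor")
lake.put(b"field,yield_t\nnorth,8.2\n", format="csv", source="harvester")
lake.put(b"\x89PNG...", format="png", source="drone")


def soil_readings(lake: TinyLake) -> list:
    # Structure is applied here, at read time, by the consuming application.
    matches = lake.find(format="json", source="soil-sensor")
    return [json.loads(obj.payload) for obj in matches]


readings = soil_readings(lake)
```

Nothing is parsed or reshaped when data goes in. The application that wants soil readings picks them out by metadata and decides how to interpret them, while the CSV and image data stay available, untouched, for other uses.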
Why do data lakes matter?
Data lakes emerged as the cost of high-capacity storage fell. Previously, storage was expensive and data took a long time to source, so it made sense to extract only the data identified as ‘business critical’ and keep it in a central data warehouse, even if that meant losing potential insights along with the data discarded during extraction.
In recent years, the cost of storage has plummeted, capacity has skyrocketed and data volumes have gone off the scale. Using cloud technology, storage can be sourced on demand, scaled up and down to suit business needs and exploited with minimal management overhead. All of this makes it possible for businesses of all sizes not only to store everything but also to keep data for longer, improving the chances of finding actionable business insights that might otherwise be overlooked.
Data lake technology makes it easier to manage and exploit data at scale, regardless of its format, quality or location. Given the rapidly escalating volumes businesses have to cope with, that has to be a good thing. Take precision farming, for example, where terabytes of data need to be collected from smart field and machinery sensors and fed into Big Data analytics, AI and process automation apps to decide how best to maximize yields. Doing all that in real time with a data warehouse would be difficult and costly enough to make precision farming a non-starter, once you consider the different data sources, machinery manufacturers, farm management systems and regulations involved. With a data lake, however, it becomes a much easier and more viable proposition.
In essence, data lakes enable the development of more scalable and cost-effective data-driven applications. But that isn’t the only benefit. Here are a few more:
- Data lakes don’t have to be centralized. They can be distributed and located closer to where data is collected. This, in turn, allows applications to be moved to the edge of the network for faster processing and lower latency, which can be hugely important, especially for automated precision farming.
- Data lakes are designed to be more open and accessible than a traditional data warehouse. That widens the scope of analytics teams and encourages the development of line-of-business applications and self-service access to valuable business insights and associated data-driven tools.
- With data lakes, applications and analytics can be developed to fit the data, rather than the other way around. Applications and their users can also discover and adapt to new data sources, and mix, match and blend data sources much more easily, both internally and externally, for example following a business acquisition or spin-off.
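The last point, fitting applications to the data and blending sources, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the vendor names, field names, sample values and the `ADAPTERS` mapping are invented for the example and do not describe any real machinery manufacturer's format.

```python
import csv
import io
import json

# Raw payloads kept in their original formats (hypothetical vendors, made-up values).
RAW_SOURCES = {
    "vendor_a": '[{"fieldId": "north", "yieldTonnes": 8.2}]',  # JSON, tonnes
    "vendor_b": "plot,yield_kg\nsouth,7900\n",                 # CSV, kilograms
}


def read_vendor_a(text: str) -> list:
    # Interpret vendor A's JSON at read time.
    return [
        {"field": r["fieldId"], "yield_t": float(r["yieldTonnes"])}
        for r in json.loads(text)
    ]


def read_vendor_b(text: str) -> list:
    # Interpret vendor B's CSV at read time, converting kilograms to tonnes.
    return [
        {"field": r["plot"], "yield_t": float(r["yield_kg"]) / 1000}
        for r in csv.DictReader(io.StringIO(text))
    ]


# One small adapter per source; the stored data itself is never rewritten.
ADAPTERS = {"vendor_a": read_vendor_a, "vendor_b": read_vendor_b}


def blended_yields(sources: dict) -> list:
    # Build a single combined view across all sources, tagging each row's origin.
    rows = []
    for name, text in sources.items():
        for row in ADAPTERS[name](text):
            rows.append({**row, "source": name})
    return rows


blended = blended_yields(RAW_SOURCES)
```

Under these assumptions, bringing in a new source, after an acquisition for instance, means adding one adapter function rather than reshaping and reloading the data already held.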
How can Data Lakes benefit your agribusiness? Download the full report at https://secureforms.proagrica.com/datalakes