Data lakes – tips to avoid a data swamp


As discussed in other blogs (and the longer white paper), data lakes can be a significant source of untapped value lying dormant within your business. With a smart data solution, you can extract that value without having to sift through the potentially vast quantities of data in your system.

However, don’t jump straight into the data lake. Navigating your data and deriving maximum value from it requires a little more forethought. Here are a few tips for agribusinesses that are ready to make full use of their data lake – and avoid creating an inert and valueless data swamp:

Content curation

Content curation – gathering and organizing information relevant to a particular area – is key to avoiding data swamps.

When it comes to data lakes, curation starts with strict controls over what can go into the lake, who owns it and who manages it. It then continues with the development of processes for sorting, describing and cataloguing data as it is ingested. The resulting metadata is essential for finding relevant information quickly when building applications and searching for insights. Automation is also hugely important here.
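As a rough illustration of curation at the point of ingestion, the Python sketch below rejects any dataset that arrives without an owner, a manager (called a steward here) and a source, and records tags so the data can be found later. The class names, field names and example values are all hypothetical and are not taken from any particular data lake product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Metadata kept for one dataset in the lake (hypothetical schema)."""
    name: str
    owner: str      # who owns the data
    steward: str    # who manages it day to day
    source: str     # where it came from
    tags: set = field(default_factory=set)
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class Catalog:
    """A toy in-memory catalogue that acts as the gate into the lake."""

    REQUIRED = ("name", "owner", "steward", "source")

    def __init__(self):
        self._entries = {}

    def ingest(self, **metadata):
        # Strict control: data without basic ownership metadata stays out.
        missing = [key for key in self.REQUIRED if not metadata.get(key)]
        if missing:
            raise ValueError(f"rejected, missing metadata: {missing}")
        entry = CatalogEntry(
            name=metadata["name"],
            owner=metadata["owner"],
            steward=metadata["steward"],
            source=metadata["source"],
            tags=set(metadata.get("tags", ())),
        )
        self._entries[entry.name] = entry
        return entry

    def find(self, tag):
        # Tags recorded at ingestion are what make the data findable later.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)
```

With this in place, a call such as catalog.ingest(name="yield_2023", owner="agronomy", steward="data-team", source="combine telemetry", tags=["yield"]) is accepted and can later be found by its tag, while a dataset submitted without an owner is refused before it reaches the lake.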

Seek the best tools for the job

Software that automates and speeds up the process of describing and tagging information as it enters a data lake is becoming widely available. The leading solutions all provide such metadata creation and management tools. However, their capabilities vary, so look specifically for technology that brings different metadata sources together, accelerates the preparation, integration and analysis of metadata, and supports self-service across the entire data landscape. Companies should also invest in acquiring, training and developing data curation and metadata management skills.

Set the right balance between speed and quality

Speed in locating and processing data is always important, but different users can work with varying levels of quality. For example, while a robotic farming process might need data to be both complete and accurate, business analysts may be less demanding: they can often work with incomplete datasets, using machine learning to cluster information and discard outliers to compensate for poor quality. Zoning data by quality and user is worth considering.
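To make the idea of zoning by quality more concrete, here is a minimal Python sketch. It assumes a hypothetical set of required fields, routes complete records to a "trusted" zone and incomplete ones to an "exploratory" zone, and uses a simple median-based rule as a stand-in for the machine-learning clustering mentioned above. The field names, zone names and cutoff are illustrative assumptions only.

```python
from statistics import median

# Hypothetical fields a record needs before an automated process may use it.
REQUIRED_FIELDS = ("field_id", "moisture", "yield_t_ha")


def completeness(record):
    """Fraction of required fields that are present and not None."""
    present = sum(record.get(name) is not None for name in REQUIRED_FIELDS)
    return present / len(REQUIRED_FIELDS)


def assign_zone(record, threshold=1.0):
    """Send complete records to 'trusted', everything else to 'exploratory'."""
    return "trusted" if completeness(record) >= threshold else "exploratory"


def drop_outliers(values, cutoff=3.5):
    """Discard values far from the median.

    Uses the median absolute deviation (MAD), a simple statistical rule,
    rather than a clustering model. The 0.6745 factor scales the MAD so the
    score is comparable to a standard z-score for normally distributed data.
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return values  # no spread to judge against, keep everything
    return [v for v in values if 0.6745 * abs(v - med) / mad <= cutoff]
```

In this sketch a robotic process would read only from the trusted zone, while analysts could work in the exploratory zone and call drop_outliers on a column of readings before drawing conclusions from it.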

Be agile enough to take advantage of new insights

It used to be that you submitted a request for a new data analysis to IT and, possibly, got results a few weeks or even months later. Now, it’s possible to find new insights at the press of a button. But (and it’s a big but) companies must be agile enough to take advantage of those rapid insights, and far-sighted enough to build the automated apps and processes needed to underpin that agility.


How can Data Lakes benefit your agribusiness? Download the full report at

