Studying Unstructured Data to Generate ML Models

This study uncovered hidden insights that informed the development of GCP's unstructured data storage capabilities. It showed how well-crafted research can turn ambiguity into clarity.

"Unstructured data" is data that does not follow a predefined schema or format. Examples include phone images, PDFs, videos, music, and handwritten notes.

Why:
The BigQuery Machine Learning (BQML) team aimed to understand how insights could be derived from unstructured data to inform product decisions. BigQuery clients had asked for unstructured data processing within Google Cloud Storage, so the team wanted to understand the use cases and potential implementation before introducing the capability.

How:
The research approach involved 60-minute user interviews with industry experts who work with unstructured data. These experts came from the following fields: medical, vaccine trials, government bodies, social media, music, and surveillance. Participants shared insights on the competitor tools they used, the challenges they faced, and their best practices for handling unstructured data.

Findings:
The study highlighted key issues such as difficulties with processing unstructured data, selecting appropriate storage solutions, and metadata management. Confusion around interchangeable terms and brand names used across regions and industries presented significant roadblocks.

Conclusion:
The research provided a detailed understanding of unstructured data use cases, helping the BQML team identify metadata management and tagging as crucial areas for further exploration. These insights offer valuable guidance for improving unstructured data processing in BigQuery.
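
To make the idea of metadata management and tagging concrete, here is a minimal Python sketch. It is an illustration only, not the BQML team's design: the `FileRecord` type, the `describe` and `find_by_tag` helpers, the extension map, and the example paths and tags are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Illustrative extension-to-category map (an assumption for this sketch);
# a real catalog would cover far more formats.
CATEGORY_BY_EXTENSION = {
    ".jpg": "image",
    ".png": "image",
    ".pdf": "document",
    ".mp4": "video",
    ".mp3": "audio",
}


@dataclass
class FileRecord:
    """Hypothetical metadata record for one unstructured file."""
    path: str
    category: str
    tags: set = field(default_factory=set)


def describe(path, extra_tags=()):
    """Build a record from the file extension plus any caller-supplied tags."""
    suffix = PurePosixPath(path).suffix.lower()
    category = CATEGORY_BY_EXTENSION.get(suffix, "unknown")
    # The coarse category doubles as a tag so it can be searched the same way.
    return FileRecord(path=path, category=category, tags={category, *extra_tags})


def find_by_tag(records, tag):
    """Return the paths of all records that carry the given tag."""
    return [r.path for r in records if tag in r.tags]


# Made-up example files, loosely echoing the fields interviewed.
records = [
    describe("trials/consent_form.pdf", ["vaccine-trial"]),
    describe("cameras/lobby.mp4", ["surveillance"]),
    describe("notes/page1.jpg", ["handwritten"]),
]
```

With these records, `find_by_tag(records, "video")` returns only the surveillance clip. The point of the sketch is that consistent tags, more than the raw files themselves, are what make unstructured data searchable.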

*Diagram created in Lucidchart to map the terms and products that external data experts mentioned in the UX research study. The ETL process is drawn as a diagram, with each tool placed according to its function in the process.*