Datadobi launches DQL to scan, interrogate petabyte-scale data lakes

According to the latest research, there will be about 175 zettabytes (ZB) of data worldwide by 2025, up from 64.2 ZB in 2020. Not surprisingly, 95% of businesses cite managing unstructured data as a problem for their business.

Both of Datadobi’s products, DobiMigrate and DobiProtect, are designed to scan large file systems containing billions of files to help organizations harness the power of unstructured data. Each scan produces a huge list of file paths and their metadata, stored in a proprietary format that lets the files be handled, analyzed, and compared quickly and with little storage overhead.

Historically, these scan files were used only to carry out data migration or protection for customers… until now.

What Is Datadobi Query Language (DQL)?

Over the last several months, as the COVID-19 pandemic drove digital transformation and increased the amount of unstructured data on corporate networks, enterprises began asking us for access to the scan files so they could analyze and reorganize their unstructured data lakes.

Dissecting the composition of a set of billions of files, however, requires serious data reduction and aggregation. This created the need for a tool that can query, aggregate, and reduce the information about a data lake until it is consumable by an IT administrator.
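To make the idea of data reduction concrete, the following Python sketch rolls a list of per-file scan records up into per-directory totals (file count and bytes), so that one row per file becomes one row per subtree. It is only an illustration of the concept, not DQL and not Datadobi's proprietary scan format; the `FileRecord` type, its fields, and the `summarize_by_directory` function are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


# Hypothetical stand-in for one entry in a scan file; the real
# Datadobi scan format is proprietary and not modeled here.
@dataclass(frozen=True)
class FileRecord:
    path: str  # absolute POSIX-style path of the file
    size: int  # size in bytes


def summarize_by_directory(records, depth=2):
    """Reduce per-file records to (file_count, total_bytes) per subtree.

    Each file is grouped under the first `depth` directory components
    of its parent path, so many rows collapse into one row per subtree.
    """
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        parts = PurePosixPath(rec.path).parent.parts  # e.g. ('/', 'eng', 'src')
        key = str(PurePosixPath(*parts[: depth + 1]))  # root plus `depth` dirs
        totals[key][0] += 1
        totals[key][1] += rec.size
    return {k: tuple(v) for k, v in totals.items()}


records = [
    FileRecord("/eng/src/a.c", 100),
    FileRecord("/eng/src/lib/b.c", 50),
    FileRecord("/hr/docs/c.pdf", 400),
]
print(summarize_by_directory(records, depth=1))
# {'/eng': (2, 150), '/hr': (1, 400)}
```

Choosing a smaller `depth` trades detail for a smaller result, which is the same trade-off an administrator faces when summarizing a multi-petabyte data lake.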

Datadobi developed Datadobi Query Language (DQL) to enhance its file system assessment service, helping customers optimize and organize their data lakes. Within the assessment service, DQL offers complete flexibility in how the software interrogates a customer’s data set, and it reduces the data dramatically so that a multi-petabyte data lake becomes manageable for the customer.

DQL is a query framework that can examine many aspects of a data lake, such as:

Identifying cold data sets: data that is infrequently accessed
Identifying old data sets: data that was created or last modified before a given point in time
Identifying data sets owned by a specific user or group, e.g., by users who no longer work at the company
Identifying shares, exports, or directory trees that are homogeneous (in access age, modification age, owner, or file type) and can be handled as one data set, e.g., to apply specific lifecycle actions to them
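The following Python sketch illustrates how such questions might be answered over scan metadata. It is not DQL syntax: it labels each file as cold, old, or owned by a departed user by comparing its access time, modification time, and owner against thresholds, then reports the subtrees in which every file received the same labels and could therefore be treated as one data set. The record fields, the 365-day thresholds, the label names, and the set of departed users are all hypothetical assumptions, and file type is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath


# Hypothetical scan record; field names are assumptions, not Datadobi's format.
@dataclass(frozen=True)
class FileRecord:
    path: str
    owner: str
    atime: datetime  # last access time
    mtime: datetime  # last modification time


def classify(rec, now, departed_users,
             cold_after=timedelta(days=365), old_after=timedelta(days=365)):
    """Return the set of labels that apply to one file (thresholds assumed)."""
    labels = set()
    if now - rec.atime > cold_after:
        labels.add("cold")            # not accessed recently
    if now - rec.mtime > old_after:
        labels.add("old")             # not modified recently
    if rec.owner in departed_users:
        labels.add("departed_owner")  # owner no longer at the company
    return frozenset(labels)


def homogeneous_subtrees(records, now, departed_users, depth=1):
    """Map each subtree whose files all share one label set to that set."""
    seen = defaultdict(set)
    for rec in records:
        parts = PurePosixPath(rec.path).parent.parts
        key = str(PurePosixPath(*parts[: depth + 1]))  # root plus `depth` dirs
        seen[key].add(classify(rec, now, departed_users))
    # Mixed subtrees (more than one distinct label set) are dropped.
    return {k: next(iter(v)) for k, v in seen.items() if len(v) == 1}


now = datetime(2021, 6, 1)
recs = [
    FileRecord("/archive/2015/a.dat", "alice", datetime(2016, 1, 1), datetime(2015, 1, 1)),
    FileRecord("/archive/2016/b.dat", "alice", datetime(2017, 1, 1), datetime(2016, 1, 1)),
    FileRecord("/projects/x/c.py", "bob", datetime(2021, 5, 30), datetime(2016, 1, 1)),
    FileRecord("/projects/y/d.py", "carol", datetime(2021, 5, 30), datetime(2021, 5, 1)),
]
result = homogeneous_subtrees(recs, now, departed_users={"alice"})
```

Here `/archive` comes back as a single cold, old data set owned by a departed user, a natural candidate for one lifecycle action, while `/projects` is excluded because its files carry different labels.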

How DQL Fits into Datadobi’s Existing Products and Services

As mentioned above, DQL is used to customize Datadobi’s file system assessment service.

For background, Datadobi introduced the file system assessment service last year for customers to use before they plan a data migration or reorganization.

DQL is now an essential part of the file system assessment service because it makes assessments customizable. Using this DQL-enhanced pre-migration service, customers can understand what is on their storage system, partition it into data sets, and plan which data set to migrate where.

DQL is also an essential part of Datadobi’s vendor-neutral data mobility engine, the technology that scans file systems and moves data. Within the engine, DQL analyzes the file metadata of large data lakes and simplifies how IT administrators look at their data and identify logical subsets of it.

The volume of data is only expected to grow over the next few years. IT administrators need a data management solution that can turn that data into digestible information, so they can make informed decisions about storage options for migration and protection.