The Enterprise Data Catalog

Combing the web is simple, but how do you search for data at work? It's difficult and time-consuming, and can sometimes seem impossible. This book introduces a practical solution: the data catalog. Data analysts, data scientists, and data engineers will learn how to create true data discovery in their organizations, making the catalog a key enabler for data-driven innovation and data governance. Author Ole Olesen-Bagneux explains the benefits of implementing a data catalog. You'll learn how to organize data for your catalog, search for what you need, and manage data within the catalog. Written from a data management perspective and from a library and information science perspective, this book helps

First published July 1, 2023

29 people are currently reading
82 people want to read

About the author

Ole Olesen-Bagneux

2 books, 8 followers

Ratings & Reviews

Community Reviews

5 stars
22 (40%)
4 stars
20 (36%)
3 stars
10 (18%)
2 stars
1 (1%)
1 star
2 (3%)
Displaying 1 - 10 of 10 reviews
Pernille Helene Kjeldsen
1 review
March 20, 2023
“It’s the unlocking of data that is becoming the do-or-die of companies in the 2020s”.

This is my favourite sentence from the book ‘The Enterprise Data Catalog’ and it’s one of the strong points made in this well-structured and well-written book by Ole Olesen-Bagneux.
Reading the book made me realise what I was missing from Domain-Driven Design when doing data governance, and it gave me great inspiration for how to organise and improve information retrieval in a data catalogue. It considers various types of users and stakeholders of a catalogue, and it brings in the aspect of using your data catalogue to discover data, which I find has been overlooked by some vendors on the market.

The book gives you excellent background knowledge if you are just starting to choose a data catalogue provider, but it can also inspire the advanced user/provider who wants to improve the use and value of a data catalogue.

As a reader with a background in Information Science, it is powerful to see our theories and methods translated into the modern context of cataloguing and mapping data, and you should definitely give it a read if you work with metadata, cataloguing, knowledge systems construction, or if you’re just a fan of domain thinking and domain studies in general.
14 reviews, 3 followers
June 18, 2023
I was in the middle of a project with DataHub, and this book provided interesting guidance that is general enough to apply to different technologies. It has allowed me to see the strengths and weaknesses of DataHub in a different light. However, you won't find any concrete technical advice here. Most of the book is useful if you are planning to evaluate data catalogs or to do metadata modeling with a tool that has already been chosen.

If you are a data engineer and you have enjoyed books like "Release It!" by Michael T. Nygard, then you will also like this book. However, my advice is to first have concrete experience with data governance tools. Following a quick start could be a great way of doing so: https://datahubproject.io/docs/quicks...

If you are interested in the governance of projects in enterprise cloud architectures in general, not just data governance, I recommend the following: https://www.goodreads.com/book/show/5...

If you want to read a similar book, though, I recommend: "Data Governance: The Definitive Guide"

https://www.goodreads.com/book/show/5...
Giulio Ciacchini
393 reviews, 15 followers
July 14, 2023
A very good journey into Data Catalogs.
The approach is very theoretical and agnostic in terms of tools or technologies.
The concepts are well explained and the whole structure is easy to follow.
Also, the core idea is not that hard to grasp, even for someone new to the world of data: having a place to look for data assets.
I very much appreciated the fact that the book is short and not redundant: the topic is narrower than other data topics, so it makes sense to have a short but dense book.

NOTES
A Data Catalog is an organized inventory of the data of a company: it provides an overview at a metadata level only, and since no actual data values are exposed, everyone can see it.
It is a DB with metadata that has been pushed or pulled from the data sources.

It is organized in domains that contain assets, which are metadata representations of data in source systems.
It allows Data Discovery and Data Governance.
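
The structure described in these notes (domains that contain assets, with assets holding metadata only) can be sketched roughly as follows. This is only an illustration: the class and field names (`Asset`, `Domain`, `Catalog`, `source_system`) are hypothetical and are not taken from the book or from any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Metadata representation of data in a source system (no data values)."""
    name: str
    source_system: str
    description: str = ""


@dataclass
class Domain:
    """A named grouping of assets."""
    name: str
    assets: list[Asset] = field(default_factory=list)


@dataclass
class Catalog:
    """Organized inventory: domains that contain assets."""
    domains: dict[str, Domain] = field(default_factory=dict)

    def register(self, domain_name: str, asset: Asset) -> None:
        # Create the domain on first use, then attach the asset to it.
        domain = self.domains.setdefault(domain_name, Domain(domain_name))
        domain.assets.append(asset)

    def find(self, term: str) -> list[Asset]:
        # Match on metadata only (name and description), never on data values.
        term = term.lower()
        return [
            asset
            for domain in self.domains.values()
            for asset in domain.assets
            if term in asset.name.lower() or term in asset.description.lower()
        ]


catalog = Catalog()
catalog.register("Sales", Asset("orders", "erp_db", "Customer orders table"))
catalog.register("HR", Asset("employees", "hr_db", "Employee master data"))
hits = catalog.find("orders")
```

Because only names and descriptions are stored, the whole catalog can be shown to everyone without exposing any data values.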

Organize Data
Domain-Driven Design (DDD) is an approach to software development that focuses on creating software systems that closely align with the business domain they are designed to serve. It provides a set of principles and practices for managing complexity and organizing code in a way that reflects the domain's concepts, logic, and relationships.
DDD emphasizes collaboration between domain experts, software developers, and other stakeholders to ensure that the software accurately models the problem domain and captures the essential business knowledge. The goal is to create a shared understanding of the domain and use that understanding to drive the design and implementation of the software.
The domain represents the core concepts, rules, and processes that define the problem space and drive the behavior of the software. It includes the entities, relationships, workflows, and business logic that are relevant to the specific problem or industry being addressed.

Processes describe how a company performs a task.
Capabilities describe what tasks a company performs.
The first step to organising your domains is to choose between creating the domains as processes or capabilities.
A process domain is put together based on how things are done; a capability domain is based on what things are done.

Getting Assets into the Data Catalog
- Pull: using standard, built-in connectors (crawlers); API; RDS (read-only data store)
- Push: mainly streaming; the catalog only listens for and receives data, without influencing the source
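
The difference between the two directions can be sketched in a few lines. This is a simplified illustration, not how any particular catalog implements ingestion: `fake_source`, `pull_metadata`, and `MetadataListener` are hypothetical names, and a plain list stands in for the catalog's metadata store.

```python
from typing import Callable, Iterable


def fake_source() -> Iterable[dict]:
    # Hypothetical stand-in for a source system that exposes its metadata.
    yield {"asset": "orders", "columns": ["id", "amount"]}
    yield {"asset": "customers", "columns": ["id", "email"]}


def pull_metadata(crawl: Callable[[], Iterable[dict]], store: list[dict]) -> None:
    """Pull: the catalog initiates the crawl and fetches metadata itself."""
    store.extend(crawl())


class MetadataListener:
    """Push: the catalog only listens; the source decides when to send."""

    def __init__(self, store: list[dict]) -> None:
        self.store = store

    def on_event(self, event: dict) -> None:
        # Receive whatever arrives; the listener never queries the source.
        self.store.append(event)


catalog_store: list[dict] = []
pull_metadata(fake_source, catalog_store)

listener = MetadataListener(catalog_store)
listener.on_event({"asset": "payments", "columns": ["id", "status"]})
```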

Classification of content; confidentiality (secrecy); sensitivity (PII).

Understand Search
There are different users: everyday end users; governance end users; data analytics end users.
They can search in data (actual data, specific answers) or for data (data sources).
Searching in data relies on a DQL (Database Query Language), the most popular being SQL.
In a Data Catalog we can use IRQL (Information Retrieval Query Language).
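
To make the "in data" versus "for data" distinction concrete, here is a small sketch using Python's built-in `sqlite3` module. It is an analogy only: SQLite's schema table `sqlite_master` stands in for a catalog's metadata store, the second query is ordinary SQL rather than an IRQL, and the table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# Searching IN data: SQL returns actual values, i.e. a specific answer.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Searching FOR data: query metadata to learn which sources exist.
# SQLite's built-in schema table stands in for a catalog here.
sources = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE ?",
        ("%order%",),
    )
]
```

The first query answers a question about the orders themselves; the second only tells you where order data lives.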

- Why do you search a data catalog? Because it enables data discovery. Data discovery starts with finding the best data sources.
- What do you search in a data catalog? In a data catalog, you are not searching in data, but for data.
- How do you search a data catalog? You use a combination of query language commands, operators, and clickable filters. You can use simple search, browsing, and complex search:
- Simple search is simple for you, but complex behind the scenes. It provides search results based on how you have previously searched. It also corrects your queries and makes suggestions.
- Browse search can be vertical, based on domains; horizontal, based on data lineage and displaying how data travels across systems; or relational, based on graph technology.
- When searching for data, you need to apply the mindset of a librarian, not a data scientist. Searching for data is a discipline that relies on search mechanics, but it also takes experience and understanding your company's data and language.
- Basic simple search is the way of searching that most end users will apply. A well-structured data catalog will deliver precise simple search, especially if it's based on a knowledge graph. But expect a lot of mess deeper down in the search results also.
- Detailed simple search requires you to know the syntax of the IRQL in your data catalog. So it takes a little time to write, or just experience, but you get super-precise hits in return.
- Flexible simple search also depends on understanding IRQL, but it opens up the search to give more results, increasing your recall and decreasing your precision, while at the same time still being a better way to target a well-defined topic than basic simple search.
- Range search is searching in intervals, e.g., a time span. This kind of search will result in high precision and low recall.
- Block search is a structured way to search for a complex topic using IRQL. It works best if your glossaries are exhaustive and used with great specificity.
- Statement search is a way to search for a complex topic; it simply puts a lot of things together in a search. It's not unstructured, but it's haphazard.
- Glossary browsing is searching in which you go exploring to get informed and enlightened about business terminology.
- Domain browsing, lineage browsing, and graph browsing are ways of searching vertically, horizontally, and relationally, respectively, by clicking through the data landscape.
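
Several of the search types above are described in terms of precision and recall. As a reminder of what those terms measure, here is a small worked sketch using the standard definitions (precision: share of returned results that are relevant; recall: share of relevant items that were returned). The asset names and result sets are invented for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Assets that would actually answer the searcher's question (made-up names).
relevant = {"orders_2022", "orders_2023", "order_returns", "order_items"}

# A narrow query (e.g. a range search) returns few, mostly relevant assets.
narrow = {"orders_2023"}
# A broader query (e.g. a flexible search) returns more assets, some off-topic.
broad = {"orders_2022", "orders_2023", "order_returns",
         "hr_payroll", "web_logs", "crm_leads"}

narrow_scores = precision_recall(narrow, relevant)  # (1.0, 0.25)
broad_scores = precision_recall(broad, relevant)    # (0.5, 0.75)
```

The narrow query is precise but misses most relevant assets; the broad one finds more of them at the cost of extra noise.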

Discover Data
- Data catalogs must activate metadata so that a data catalog is not just a dead repository of data but a machine that pushes relevant data in contexts where it can provide value and increased results.
- Data governance leaders are engaged by the fact that they can apply sensitivity and confidentiality classification directly on data. They are furthermore r ted by the fact that they can join forces in mapping the IT landscape and concentrate on more strategic priorities.
- Data analytics leaders are engaged naturally, but an extra selling point is data lineage, which allows these leaders to understand changes upstream or the causes of broken reporting.
- Domain leaders are engaged by the potential of seeing data from other business units that they are in need of in their daily tasks.
- All leaders are in fact connected. They work with data for different purposes, either on the operational backbone or on the data platform. But the data they work with is the same, and they need to align on how to describe it and manage it in a data catalog.

Access Data
- There will be cases where the implementation of a data catalog is in fact a catalog of catalogs. Even though this is a difficult way to implement a data catalog, it can be both a necessary and relevant approach.
- A centralized approach, which uses one global solution to make data accessible across the company
- A decentralized approach, where each domain is capable of choosing their own solutions to make data accessible
- A combined approach, where some data is accessible via a central solution, while certain domains in the company act more freely and have made data accessible themselves
- Questionnaires are a way to unlock the descriptions of domains and the assets in them, complete with glossary terminology.

The data asset lifecycle is very similar to the data lifecycle: plan; obtain; store and share; maintain; apply; dispose.

- All data in IT systems has a lifecycle. This lifecycle can be short, long, or eternal, depending on the nature of the organization it pertains to. The data catalog enables companies to gain a complete overview of their data earlier in the data lifecycle.
- The data catalog enables you to mirror all the data in the IT landscape of a firm, giving global control of the data lifecycle, which solves issues such as how and with whom data must be shared and when it must be deleted, or if it must not be deleted.
- The data assets inside the data catalog also have their own lifecycle, and to keep the data catalog well curated and searchable, the lifecycle of the asset must be taken into consideration when managing the assets, for example when data sources are sunsetted.
- All lifecycles, inside and outside the data catalog, are connected. Data source lifecycles and data lifecycles influence the data asset lifecycles and terminology lifecycles, whereas the two latter support the first two. You can manage your data source and your data lifecycles via the data catalog.
- Lifecycles enable applied search that takes the dimension of time into account: via lifecycles, searches can be carried out that go back in time as far as the organizational memory allows. This is the key element to be in compliance with privacy regulations and other, industry-specific regulations that require organizations to store data for a certain period of time.
- Lifecycles can be treated as a maintenance framework that can be enacted by using a data catalog.
- Data observability proposes to manage the data lifecycle in the Obtain phase, before it is stored in solutions and shared with the rest of the company.
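
As a rough sketch of how the lifecycle stages listed in these notes could drive catalog maintenance, the snippet below flags assets whose underlying source has reached the dispose stage (for example, a sunsetted data source). The names `Stage`, `CatalogAsset`, and `assets_to_retire` are hypothetical, and treating the stages as a simple ordered enumeration is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages in the order listed in the notes."""
    PLAN = 1
    OBTAIN = 2
    STORE_AND_SHARE = 3
    MAINTAIN = 4
    APPLY = 5
    DISPOSE = 6


@dataclass
class CatalogAsset:
    name: str
    source_stage: Stage  # lifecycle stage of the underlying data source


def assets_to_retire(assets: list[CatalogAsset]) -> list[str]:
    """Names of assets whose source has been disposed (sunsetted)."""
    return [a.name for a in assets if a.source_stage is Stage.DISPOSE]


assets = [
    CatalogAsset("orders", Stage.APPLY),
    CatalogAsset("legacy_crm", Stage.DISPOSE),
]
stale = assets_to_retire(assets)
```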
1 review
July 8, 2024
So, what is a data catalog? Is it just another data tool for the data geeks?

Nope. The data catalog is the foundation for data democratization within an organization. It addresses core structural issues in data that affect how we discover and access information. It addresses issues such as data organization, metadata quality, and ecosystem interactions. By addressing these fundamental challenges, we can significantly improve our ability to efficiently find and use relevant data across domains.

This is what 'The Enterprise Data Catalog' is all about. Ole Olesen-Bagneux has written a solid, dense book that is rich in experience, drawing on his knowledge of Information & Library Science combined with Data Management and IT.

Olesen-Bagneux's book goes to the heart of data cataloging, addressing structural data issues related to data discovery. It's not just a manual, but an integration of information and library science with data management, promoting a holistic view of data and information across multiple disciplines. This multidisciplinary approach emphasizes the need to search and find data regardless of its structure, format, or medium.

The book is a versatile resource for anyone in the world of data. It helps in planning a data catalog implementation, offering insights on foundational considerations, the purposes of what data cataloging really is, and what is required to create a functional data catalog. Olesen-Bagneux also provides a deep understanding of search mechanisms, differentiating between searching for data and searching in data, which enhances the data discovery processes.

It provides guidance on how to articulate data catalog value to stakeholders, identify key stakeholders, and integrate data catalogs into system environments. It also explores how data catalogs support data lifecycle management, emphasizing the importance of time and the interconnectedness of data lifecycles.

Olesen-Bagneux presents a clear definition of a data catalog as an organized inventory of an organization's data. He emphasizes its role in providing visibility, organizing data, and enabling search functionality. His academic yet approachable style simplifies complex topics and makes them understandable to anyone.

A highlight of the book is the discussion of search, in which Olesen-Bagneux differentiates between "searching for data" and "searching in data" and guides readers through the entire value stream from initial question to discovery. He argues for adopting a librarian's mindset to effectively find relevant data sources.

Metadata is another key topic, and Olesen-Bagneux highlights its role in making data searchable, ensuring accessibility, and providing security. The book also addresses the contemporary concept of the data mesh, presenting it as essential for scalable and operational data organizations, and discussing various approaches to data catalogs, including knowledge graph-based methods.

The importance of data lifecycle management is often overlooked, but Olesen-Bagneux highlights the complexity of managing data over time and the interaction between different lifecycle stages, and calls for more attention to this critical aspect.

Toward the end of the book, Olesen-Bagneux looks to the future of data management, envisioning data catalogs as potential internal search engines within organizations. This visionary perspective serves as a fitting conclusion to the book.

In the Afterword, he touches on the ethical implications of data catalogs, reminding the reader of the responsibilities that come with increased data accessibility. This ethical discussion is both timely and essential.

'The Enterprise Data Catalog' is more than a guide to data catalogs. It's a comprehensive exploration of data discovery, access, and organization. It's a must-read for data professionals (and any business professional, for that matter) that seamlessly blends theory and practical advice. I highly recommend this book.
1 review
November 10, 2023
This book has opened my eyes to the seemingly infinite world of possibilities in searching for data. Although I had no deep prior understanding of data search and data catalogues, The Enterprise Data Catalog has opened a world of opportunities for me.

Ole writes about high-level topics in an understandable way, and even though our business is too small for this kind of technology at full scale, this is truly inspirational in terms of what we have to do and how else we have to think about data management when our business expands during the coming years.

Before reading the book, the subject was kind of a black box to me, but now I understand the interplay between the different layers of structuring and organising complex data, making it available to all.

The last chapter (I highly recommend reading the entire book) finishes it all off with a perspective on data search that really took me by surprise… it sort of put it all into place.

If you are serious about understanding searching for and in data this is a must-read.

I highly recommend it!
1 review, 1 follower
April 15, 2023
This book does a great job of explaining the nuts and bolts of data catalogs, but there's also a larger vision running through it. This comes to fruition in chapter 8 with the notion of the "company search engine," a central hub for discovering data, people, organizational structure, etc. Just as search revolutionized the internet — and indeed the landscape of human knowledge — Olesen-Bagneux's notion of searchable organizations could revolutionize work in enterprises, government and the nonprofit sector.

Data practitioners are the target audience for this book, and they will gain a great deal by reading it. But I highly recommend it to business leaders, even those not explicitly responsible for technology. Whether they work in century-old Fortune 500 companies or fast-growing startups, they will discover transformative possibilities.
1 review, 1 follower
July 8, 2024
The author has a great insight: that enterprise data catalogs are about search. Yes, there are other uses (governance, etc.), but the primary design criterion for catalogs should be the search experience. The issue is that most catalog vendors do not have this focus. It would be great if enterprise data catalog vendors read this book and took a hard look at how their products are designed and what purpose they serve!
Xavier
Author, 1 book, 5 followers
January 23, 2025
I enjoyed reading the book. Seeing someone with a Library and Information Science (LIS) background writing for a data audience is refreshing. This perspective is much needed in the data community.

The book is structured to be easy to follow, and the concepts are explained clearly and concisely.

While the approach is theoretical and avoids getting bogged down in specific tools or technologies (which I like), the author does mention some data catalog platforms I hadn't encountered before, which was a nice bonus.

The book's real strength lies in its thought-provoking guiding principles, which offer takeaways for anyone considering building a data catalog within their organization. I'll definitely consider these insights for future data catalog projects.

The book is a valuable read if you're involved in data governance or management.
Tine Andersen
1 review
May 12, 2025
Ole Olesen-Bagneux uses his background in Library and Information Science to bring a fresh take on data catalogues in The Enterprise Data Catalog. He draws connections between classic cataloguing methods and today’s metadata practices, showing how data catalogues can be more than just technical tools: they can also play a key role in aligning strategy across an organization.

He makes a strong case for treating metadata as a vital business asset, not just a concern for IT.

Throughout the book, Olesen-Bagneux shows how smart data catalogue design can boost data literacy, build trust, and create real value for the whole business.
11 reviews
April 22, 2024
Good overview of data catalogs and the use cases that might be implemented with them.