Good Graph DB Starter Despite Challenges for Non-Programmers - Robinson and Webber’s “Graph Databases” offers helpful context and an overview of this increasingly important topic. It seems to fulfill its purpose to “introduce graphs and graph databases to technology practitioners, including developers, database professionals, and technology decision makers.” However, readers without this kind of background will need additional help to understand and learn the material more completely.
There is a foreword by Emil Eifrem, the founder of Neo4j (the most prominent of these applications), that provides some historical background, and a preface that offers initial explanation and puts the emergence of graph databases (DBs) into perspective. Seven chapters follow: (1) Introduction, (2) Options for Storing Connected Data, (3) Data Modeling with Graphs, (4) Building a Graph Database Application, (5) Graphs in the Real World, (6) Graph Database Internals, and (7) Predictive Analysis with Graph Theory.
By way of perspective, passages from the ‘Preface’ and ‘Introduction’ are particularly informative. For instance, the authors make the case for the relevance of graph DBs as follows: “Graph databases address one of the great macroscopic business trends of today: leveraging complex and dynamic relationships in highly connected data to generate insight and competitive advantage.” They go on to say that “Graph theory was pioneered by Euler in the 18th century, and has been actively researched and improved by mathematicians, sociologists, anthropologists, and other practitioners ever since. However, it is only in the past few years that graph theory and graph thinking have been applied to information management . . . [due to] commercial success of companies such as Facebook, Google, and Twitter [which incorporate such approaches].” The authors explain that “A graph is . . . a set of nodes and the relationships that connect them.” (Since this submission, I discovered Barabasi's Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life, which provides additional history and context that others may find useful in this regard; see my review.)
Within ‘Data Modeling with Graphs,’ the authors write, “Before we dig deeper into modeling with graphs, a word on models in general. Modeling is an abstracting activity motivated by a particular need or goal. . . There are no natural representations of the world the way it “really is,” just many purposeful selections, abstractions, and simplifications, some of which are more useful than others . . .” They continue later, “Graph modeling naturally fits with the way we tend to abstract details. . . using circles and boxes, and then describe the connections between these things by joining them with arrows and lines. Today’s graph databases, more than any other database technologies, are “whiteboard friendly.” The typical whiteboard view of a problem is a graph. What we sketch in our creative and analytical modes maps closely to the data model we implement inside the database.” The authors go on to say, “The interesting thing about graph diagrams is that they tend to contain specific instances of nodes and relationships, rather than classes or archetypes. In other words, we tend to describe graphs using specification by example.”
Much of the rest of the book is devoted to Cypher, the query language used with Neo4j, and its use in constructing and querying graph DBs. As the authors say, “Cypher enables a user (or an application acting on behalf of a user) to ask the database to find data that matches a specific pattern. Colloquially, we ask the database to “find things like this.” And the way we describe what “things like this” look like is to draw them, using ASCII art.” This visual aspect makes graph DBs especially appealing and revealing. While the authors emphasize how easy Cypher is to use, this is the part that becomes more challenging for those of us who are not programmers.
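For readers who do code, the “find things like this” idea can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions, not Cypher itself and not how Neo4j works internally; the people, the company, the relationship names, and the `match` helper are all hypothetical and made up for the example.

```python
# Toy in-memory graph. All names below are hypothetical, for illustration only.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "acme": {"label": "Company"},
}

# Each relationship is (start node, relationship type, end node).
relationships = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "WORKS_AT", "acme"),
]


def match(start_label, rel_type, end_label):
    """Return (start, end) pairs that fit the pattern
    (start_label)-[rel_type]->(end_label)."""
    return [
        (start, end)
        for start, rel, end in relationships
        if rel == rel_type
        and nodes[start]["label"] == start_label
        and nodes[end]["label"] == end_label
    ]


# Roughly the idea behind a Cypher pattern such as
# (a:Person)-[:KNOWS]->(b:Person), written here as a plain function call.
friends = match("Person", "KNOWS", "Person")
print(friends)
```

The point of the sketch is only that the query describes the shape of the data being sought (a person, a KNOWS arrow, another person) rather than the steps for finding it.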
Other parts of the book step through how graph DB models are built, scaled up, and used in a production or operating environment with large amounts of data and/or real-time feeds. The authors also discuss the real-world application of graph DBs and their use in prediction. Namely, graph DBs are being utilized in such areas as information or talent search, product recommendations, fraud detection, and new drug discovery. No doubt there can be other uses not only in business, but also in the sciences, arts, and humanities. (See my reviews of books including not only Morrison’s Data-driven Organization Design: Sustaining the Competitive Edge Through Organizational Analytics, where I got my first hint about this topic and Neo4j, and Serious Play: How the World's Best Companies Simulate to Innovate, but also Mitteldorf’s Cracking the Aging Code: The New Science of Growing Old - And What It Means for Staying Young, Genosko’s Remodelling Communication: From WWII to the WWW (Toronto Studies in Semiotics and Communication), Archer and Jockers’ The Bestseller Code: Anatomy of the Blockbuster Novel, and Staley’s Computers, Visualization, and History: How New Technology Will Transform Our Understanding of the Past for a few ideas in these veins.)
As a non-programmer interested in exploring such wider applications of graph DBs, my path has been to download Neo4j and work through the company website and YouTube tutorials with my own data in order to construct a few graphs and learn some graph DB basics. Likely, I will also take a formal course at some point. While these steps were necessary in my case (and I would recommend them to others so interested), one can also use this book as a starter along these lines, as there does not yet seem to be a simpler and more painless way to go.