Using OpenRefine

With this book on OpenRefine, managing and cleaning your large datasets suddenly got a lot easier! With a cookbook approach and free datasets included, you'll quickly and painlessly improve your data managing capabilities.

In Detail

Data is supposed to be the new gold, but how can you unlock the value in your data? Managing large datasets used to be a task for specialists, but you don't have to worry about inconsistencies or errors anymore. OpenRefine lets you clean, link, and publish your dataset in a breeze. Using OpenRefine takes you on a practical tour of all the handy features of this well-known data transformation tool. It is a hands-on recipe book that teaches you data techniques by example. Starting from the basics, it gradually transforms you into an OpenRefine expert.

This book will teach you all the necessary skills to handle any large dataset and to turn it into high-quality data for the Web. After you learn how to analyze data and spot issues, we'll see how we can solve them to obtain a clean dataset. Messy and inconsistent data is recovered through advanced techniques such as automated clustering. We'll then show how to extract links from keyword and full-text fields using reconciliation and named-entity extraction. Using OpenRefine is a guide stuffed with tips and tricks to get the best out of your data.

Approach

The book is styled as a cookbook, containing recipes, combined with free datasets, which will turn readers into proficient OpenRefine users in the fastest possible way.

Who this book is written for

This book is targeted at anyone who works on or handles a large amount of data. No prior knowledge of OpenRefine is required, as we start from the very beginning and gradually reveal more advanced features. You don't even need your own dataset, as we provide example data to try out the book's recipes.

114 pages, Paperback

First published January 1, 2013

Community Reviews

5 stars: 2 (7%)
4 stars: 19 (67%)
3 stars: 6 (21%)
2 stars: 1 (3%)
1 star: 0 (0%)
Displaying 1 - 6 of 6 reviews
Miss Susan
2,732 reviews, 62 followers
March 20, 2025
ayyyyyyy it's me, ya girl, once again up half the night reading a book on a topic i'm teaching soon because there's no such thing as being too prepared!!!

anyways this is both an excellent intro to openrefine for a beginner and comprehensive enough that i learned a few tricks despite being moderately experienced. if you're also in the intermediate 'yeah, i clean data pretty frequently at work and taught a workshop on this like three times now' stage i'd jump ahead to the last few chapters on reconciliation and grel. some of the reconciliation services he mentions are no longer maintained (freebase we barely knew thee!) but the principles hold

4.5 stars

Margaret Heller
Author of 2 books, 36 followers
November 19, 2013
Disclosure: the publisher of this book provided me with a free copy in exchange for a review. The opinions expressed in the review are my own.

While OpenRefine is an extremely useful "power tool for messy data", its power can be difficult to master without a great deal of trial and error on the part of the user. Part of this stems from the evolving nature of the tool. It began life as Freebase Gridworks, with the purpose of cleaning up data in order to run it against linked data in Freebase. When the Freebase parent organization was acquired by Google, they rebranded the tool as Google Refine, but as Google's priorities shifted, they stopped working on the tool and it became the open source OpenRefine. This legacy means that the tool has many pieces created by different people for different purposes. While there is quite a lot of good documentation out there on the OpenRefine site and elsewhere, this book puts it together in an easy-to-follow format. Like a lot of OpenRefine documentation, it is a series of "recipes" that each explain how to do one specific task, but it is written with the cover-to-cover reader in mind as well. The Google-produced tutorial videos have similar coverage, but the book is more in-depth, and for readers coming from the cultural institution side it has the advantage of using a museum data set for its examples. Another advantage is that the authors of the book have a particular interest in named entity recognition (part of the book covers the tool that one of them produced), which is particularly helpful for more abstract data sets with cultural data.

Using OpenRefine is useful for beginner or intermediate users of OpenRefine. As someone who has used OpenRefine for a while and written about its use in libraries, I found this more helpful than I initially expected, since there were pieces of functionality I had not yet encountered in experimentation or documentation. My one criticism is that much of the book promises that the appendix will give a complete explanation of regular expressions and the Google Refine Expression Language (GREL) that powers the software, but I found that the GREL documentation was less useful than I hoped, though I still learned from it. I would have preferred that section to come earlier in the book. That aside, I would recommend this book to anyone who has been using OpenRefine or is thinking about using it, and additionally for library and museum professional development collections.

Emir Muñoz
1 review
November 26, 2013
[The publisher of this book provided me with a free e-copy for review.]

I first became familiar with OpenRefine when it was still Google Refine. The main idea has since evolved, and the tool is now open to contributions from the community.
This book is a nice cookbook, easy to follow and written in friendly language. It shows how to perform both simple and complex operations over semi-structured data such as HTML tables, spreadsheets, and CSV files, among others, exploiting the linkability of your datasets with the Linked Open Data (LOD) cloud. Even without previous knowledge, any user can pick up this book and start using OpenRefine from scratch.

Organization:
The book is divided into four chapters plus an appendix.
Chapter 1: Presents the first set of easy-to-follow recipes to get hands-on with loading and preparing your data.
Chapter 2: Shows how to sort data and create facets that select (or isolate) rows based on regular expressions over the cell values, with the main goal of fixing the dataset.
Chapter 3: Covers more advanced operations on your data and gives a brief introduction to GREL, the language defined to manipulate cell values. GREL is simple yet powerful enough to match and replace cell content for different purposes, which is why the book includes an appendix with a deeper explanation.
Chapter 4: Once all the data has been normalized, it can be reconciled with an external knowledge base or linked open dataset such as Freebase (or whatever knowledge base is appropriate for the data at hand) to enrich the semantics of the data.
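
To give a feel for the kind of operations Chapters 2 and 3 describe (faceting on cell values, selecting rows with a regular expression, and match-and-replace cleanup), here is a minimal sketch in plain Python. It is not GREL and not taken from the book; the sample values and the helper names `text_facet`, `regex_filter`, and `clean` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical column of messy cell values (not from the book's dataset).
cells = ["Sydney ", "sydney", "Melbourne", "Sydney", "melbourne  ", "Perth?"]

def text_facet(values):
    """Count how often each distinct value occurs, similar in spirit to a text facet."""
    return Counter(values)

def regex_filter(values, pattern):
    """Keep only the values in which the regular expression finds a match."""
    rx = re.compile(pattern)
    return [v for v in values if rx.search(v)]

def clean(value):
    """Trim whitespace, drop trailing question marks, and title-case the value."""
    value = value.strip()
    value = re.sub(r"\?+$", "", value)
    return value.title()

# Facet on the raw values: every spelling variant shows up as its own entry.
facet_before = text_facet(cells)

# Isolate values with trailing whitespace or a trailing question mark.
suspicious = regex_filter(cells, r"\s$|\?$")

# Facet again after cleaning: the variants collapse into one entry per city.
facet_after = text_facet(clean(v) for v in cells)
```

In OpenRefine itself the same steps are done interactively through facets and cell transformations; the sketch only mirrors the idea of inspecting value counts, isolating problem cells, and normalizing them.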

What I like about this book:
The cookbook format and the easy reading are the most positive aspects of this book. The authors did a good job of explaining all the use cases with a real data example.

What I dislike about this book:
The few small things I dislike concern the book's format. It would be better to have numbered figures and sections to make references effortless. That said, the figures are self-explanatory and well matched to the explanations. Also, for easy referencing, it would be nice to number the recipes consecutively rather than restarting the numbering in each chapter.

Wrapping up, I would strongly recommend this book to anyone interested in Semantic Web/Linked Data, Linked Open Data publication, Data Integration, or Named Entity Recognition, whether as a student, lecturer, or practitioner, or just as a hobby.

Liath Appleton
20 reviews
October 31, 2013
Finally! A much-needed introduction to OpenRefine. In my line of work I clean and organize data nearly every day. I use OpenRefine for the bulk of this work, and find myself training new students on a regular basis. This book and its tutorials have freed up my time, allowing students to learn the basics on their own. I am then able to focus more advanced training on the specifics of our particular data.
This book assumes no prior knowledge of OpenRefine, but even as an advanced user I learned a few tricks I hadn't previously discovered. OpenRefine itself is an essential tool for anyone who works with large amounts of data, and anyone who needs to learn or teach OpenRefine will find this book to be a valuable addition to their library.

Jose Manuel
241 reviews, 4 followers
June 22, 2015
The tool being extraordinary is enough for the book, with only a little effort from the authors, to get you cleaning datasets quickly.
On top of that, the book is instructive enough if you want to start using this data cleaning tool.
It is not an advanced book, but if you know nothing about the tool, it is good enough to get started.