(?)
Quotes are added by the Goodreads community and are not verified by Goodreads. (Learn more)

“Use manual sanity checks in data pipelines. When optimizing data processing systems, it’s easy to stay in the “binary mindset” mode, using tight pipelines, efficient binary data formats, and compressed I/O. As the data passes through the system unseen, unchecked (except for perhaps its type), it remains invisible until something outright blows up. Then debugging commences. I advocate sprinkling a few simple log messages throughout the code, showing what the data looks like at various internal points of processing, as good practice — nothing fancy, just an analogy to the Unix head command, picking and visualizing a few data points. Not only does this help during the aforementioned debugging, but seeing the data in a human-readable format leads to “aha!” moments surprisingly often, even when all seems to be going well. Strange tokenization! They promised input would always be encoded in latin1! How did a document in this language get in there? Image files leaked into a pipeline that expects and parses text files! These are often insights that go way beyond those offered by automatic type checking or a fixed unit test, hinting at issues beyond component boundaries. Real-world data is messy. Catch early even things that wouldn’t necessarily lead to exceptions or glaring errors. Err on the side of too much verbosity.”

Micha Gorelick, High Performance Python: Practical Performant Programming for Humans
Read more quotes from Micha Gorelick


Share this quote:
Share on Twitter

Friends Who Liked This Quote

To see what your friends thought of this quote, please sign up!

0 likes
All Members Who Liked This Quote

None yet!


This Quote Is From


Browse By Tag