Jake VanderPlas's Blog

September 13, 2018

The Waiting Time Paradox, or, Why Is My Bus Always Late?

[img: Rapid Ride Bus. Image source: Wikipedia, license CC-BY-SA 3.0]

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.

This is what is sometimes known as the waiting time paradox.

I've encountered this idea before, and always wondered whether it is actually true... how well do those "reasonable assumptions" match reality? This post will explore the waiting time paradox from the standpoint of both simulation and probabilistic arguments, and then take a look at some real bus arrival time data from the city of Seattle to (hopefully) settle the paradox once and for all.
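As a rough preview of the simulation side, here is a minimal sketch. It assumes that bus arrivals form a Poisson process (exponentially distributed gaps averaging 10 minutes), which is one common reading of the "reasonable assumptions" mentioned above. The function name `simulate_mean_wait` and its parameters are illustrative and not taken from the post.

```python
import bisect
import random


def simulate_mean_wait(n_buses=200_000, n_passengers=50_000, tau=10.0, seed=42):
    """Estimate the mean wait when gaps between buses are exponential with mean tau."""
    rng = random.Random(seed)

    # Cumulative arrival times of a Poisson process with mean gap tau (assumption).
    arrivals = []
    t = 0.0
    for _ in range(n_buses):
        t += rng.expovariate(1.0 / tau)
        arrivals.append(t)

    # Passengers show up uniformly at random, no later than the last bus.
    last = arrivals[-1]
    total_wait = 0.0
    for _ in range(n_passengers):
        p = rng.uniform(0.0, last)
        # Index of the first bus arriving at or after time p.
        i = bisect.bisect_left(arrivals, p)
        total_wait += arrivals[i] - p
    return total_wait / n_passengers


mean_wait = simulate_mean_wait()
print(f"mean wait: {mean_wait:.2f} minutes")
```

Under this assumption the estimate should land near the mean gap of 10 minutes rather than near 5, because a randomly arriving passenger is more likely to fall into a long gap than a short one. Whether real buses behave this way is a separate question.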

Published on September 13, 2018 10:00

December 18, 2017

Simulating Chutes & Ladders in Python

[img: Chutes and Ladders animated simulation]

This weekend I found myself in a particularly drawn-out game of Chutes and Ladders with my four-year-old. If you've not had the pleasure of playing it, Chutes and Ladders (also sometimes known as Snakes and Ladders) is a classic kids board game wherein players roll a six-sided die to advance forward through 100 squares, using "ladders" to jump ahead, and avoiding "chutes" that send you backward. It's basically a glorified random walk with visual aids to help you build a narrative. Thrilling. But she's having fun practicing counting, learning to win and lose gracefully, and developing the requisite skills to be a passionate sports fan, so I play along.

On approximately the twenty-third game of the morning, as we found ourselves in a near-endless cycle of climbing ladders and sliding down chutes, never quite reaching that final square to end the game, I started wondering how much longer the game could last: what is the expected length of a game? How heavy are the tails of the game length distribution? How succinctly could I answer those questions in Python? And then, at some point, it clicked: Chutes and Ladders is memoryless (the effect of a roll depends only on where you are, not where you've been), and so it can be modeled as a Markov process! By the time we (finally) hit square 100, I basically had this blog post written, at least in my head.

When I tweeted about this, people pointed me to a number of similar treatments of Chutes & Ladders, so I'm under no illusion that this idea is original. Think of this as a blog post version of a dad joke: my primary goal is not originality, but self-entertainment, and if anyone else finds it entertaining that's just an added bonus.
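To make the Markov-process idea concrete, here is a small sketch that pushes a probability distribution over squares forward one roll at a time and sums the probability that the game is still running. The board layout in `CHUTES_LADDERS` and the rule that an overshooting roll is forfeited are assumptions for illustration; check them against your own board and house rules. The function names are hypothetical.

```python
# Assumed layout of the common 100-square board; substitute your own if it differs.
CHUTES_LADDERS = {
    # ladders
    1: 38, 4: 14, 9: 31, 21: 42, 28: 84, 36: 44, 51: 67, 71: 91, 80: 100,
    # chutes
    16: 6, 47: 26, 49: 11, 56: 53, 62: 19, 64: 60, 87: 24, 93: 73, 95: 75, 98: 78,
}
N_SQUARES = 100


def step(dist):
    """Advance the probability distribution over squares 0..100 by one roll."""
    new = [0.0] * (N_SQUARES + 1)
    new[N_SQUARES] = dist[N_SQUARES]  # square 100 is absorbing
    for square in range(N_SQUARES):
        p = dist[square]
        if p == 0.0:
            continue
        for roll in range(1, 7):
            target = square + roll
            if target > N_SQUARES:
                target = square  # assumed rule: overshooting rolls are forfeited
            else:
                target = CHUTES_LADDERS.get(target, target)
            new[target] += p / 6
    return new


def expected_game_length(max_turns=2000):
    """Sum P(game still running after n turns) for n = 0, 1, 2, ..."""
    dist = [0.0] * (N_SQUARES + 1)
    dist[0] = 1.0  # start off the board, on "square 0"
    expected = 0.0
    for _ in range(max_turns):
        expected += 1.0 - dist[N_SQUARES]
        dist = step(dist)
    return expected


print(f"expected number of turns: {expected_game_length():.1f}")
```

Because the next distribution depends only on the current one, no game history needs to be stored. Truncating the sum at `max_turns` is harmless here since the chance of a game lasting that long is vanishingly small.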

Published on December 18, 2017 10:00

December 11, 2017

Optimization of Scientific Code with Cython: Ising Model

Python is quick and easy to code, but can be slow when doing intensive numerical operations. Translating code to Cython can be helpful, but in most cases requires a bit of trial and error to achieve the optimal result. Cython's tutorials contain a lot of information, but for iterative workflows like optimization with Cython, it's often useful to see it done "live".

For that reason, I decided to record some screencasts showing this iterative optimization process, using an Ising model as an example application.
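For orientation, the kind of pure-Python starting point that such an optimization exercise might begin from looks roughly like the sketch below. It is not the code from the screencasts: the function name `metropolis_sweep`, the random-site update order, and the coupling constant J = 1 are assumptions made for illustration.

```python
import math
import random


def metropolis_sweep(lattice, beta, rng):
    """One Metropolis sweep over a square lattice of +1/-1 spins (periodic boundaries)."""
    n = len(lattice)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        spin = lattice[i][j]
        neighbors = (lattice[(i + 1) % n][j] + lattice[(i - 1) % n][j]
                     + lattice[i][(j + 1) % n] + lattice[i][(j - 1) % n])
        delta_e = 2 * spin * neighbors  # energy cost of flipping this spin (J = 1)
        # Always accept flips that lower the energy; otherwise accept with
        # the Boltzmann probability exp(-beta * delta_e).
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            lattice[i][j] = -spin
    return lattice


rng = random.Random(0)
lattice = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(16)]
for _ in range(10):
    metropolis_sweep(lattice, beta=0.4, rng=rng)
```

The nested indexing and per-site Python arithmetic in the inner loop are the sort of hot spots that typed Cython code is meant to speed up.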

Published on December 11, 2017 12:00

December 5, 2017

Installing Python Packages from a Jupyter Notebook

In software, it's said that all abstractions are leaky, and this is as true for the Jupyter notebook as it is for any other software. I most often see this manifest itself with the following issue:

I installed package X and now I can't import it in the notebook. Help!

This issue is a perennial source of StackOverflow questions (e.g. this, there, another, this one, and this... etc.).

Fundamentally the problem is usually rooted in the fact that the Jupyter kernels are disconnected from Jupyter's shell; in other words, the installer points to a different Python version than is being used in the notebook. In the simplest contexts this issue does not arise, but when it does, debugging the problem requires knowledge of the intricacies of the operating system, the intricacies of Python package installation, and the intricacies of Jupyter itself. In other words, the Jupyter notebook, like all abstractions, is leaky.

In the wake of several discussions on this topic with colleagues, some online (exhibit A, exhibit B) and some off, I decided to treat this issue in depth here. This post will address a couple things:

First, I'll provide a quick, bare-bones answer to the general question: how can I install a Python package so it works with my Jupyter notebook, using pip and/or conda?

Second, I'll dive into some of the background of exactly what the Jupyter notebook abstraction is doing, how it interacts with the complexities of the operating system, and how you can think about where the "leaks" are, and thus better understand what's happening when things stop working.

Third, I'll talk about some ideas the community might consider to help smooth over these issues, including some changes that the Jupyter, Pip, and Conda developers might consider to ease the cognitive load on users.

This post will focus on two approaches to installing Python packages: pip and conda. Other package managers exist (including platform-specific tools like yum, apt, homebrew, etc., as well as cross-platform tools like enstaller), but I'm less familiar with them and won't be remarking on them further.
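One way to sidestep the mismatch described above, at least for pip, is to invoke pip through the very interpreter that is running the code, so the package lands in that interpreter's environment. The sketch below shows the idea. The helper names `pip_install_command` and `install` are hypothetical, and `numpy` is only a placeholder package name.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    # sys.executable is the path of the current Python, e.g. the notebook kernel's.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Install `package` into the environment of the current interpreter."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it; call install("numpy") to execute it.
print(pip_install_command("numpy"))
```

Running `python -m pip` via `sys.executable` avoids relying on whichever `pip` happens to be first on the shell's path. It does not address conda environments, which have their own considerations.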

Published on December 05, 2017 09:00

November 9, 2017

Exploring Line Lengths in Python Packages

This week, Twitter upped their single-tweet character limit from 140 to 280, purportedly based on this interesting analysis of tweet lengths published on Twitter's engineering blog. The gist of the analysis is this: English language tweets display a roughly log-normal distribution of character counts, except near the 140-character limit, at which the distribution spikes.

The analysis takes this as evidence that Twitter users often "cram" their longer thoughts into the 140-character limit, and suggests that a 280-character limit would more naturally accommodate the distribution of people's desired tweet lengths.

This immediately brought to mind another character limit that many Python programmers face in their day-to-day lives: the 79-character line limit suggested by Python's PEP8 style guide:

Limit all lines to a maximum of 79 characters.

I began to wonder whether popular Python packages (e.g. NumPy, SciPy, Pandas, Scikit-Learn, Matplotlib, AstroPy) display anything similar to what is seen in the distribution of tweet lengths.

Spoiler alert: they do! And the details of the distribution reveal some insights into the programming habits and stylistic conventions of the communities who write them.
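A minimal sketch of how such a line-length distribution could be gathered is shown below, using only the standard library. The function name `line_length_counts` is hypothetical, and the standard-library `json` package is used purely as a stand-in for the larger packages named above.

```python
import json  # stand-in package; any importable pure-Python package would do
from collections import Counter
from pathlib import Path


def line_length_counts(package_dir):
    """Count how many lines of each length appear in the .py files under package_dir."""
    counts = Counter()
    for path in Path(package_dir).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(len(line) for line in text.splitlines())
    return counts


counts = line_length_counts(Path(json.__file__).parent)
over_limit = sum(n for length, n in counts.items() if length > 79)
print(f"{sum(counts.values())} lines, {over_limit} longer than 79 characters")
```

Plotting `counts` as a histogram would then show whether lines pile up just under the 79-character limit, analogous to the spike seen in tweet lengths.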

Published on November 09, 2017 14:00

May 26, 2017

Exposing Python 3.6's Private Dict Version


I just got home from my sixth PyCon, and it was wonderful as usual. If you weren't able to attend, or even if you were, you'll find a wealth of entertaining and informative talks on the PyCon 2017 YouTube channel.

Two of my favorites this year were a complementary pair of talks on Python dictionaries by two PyCon regulars: Raymond Hettinger's Modern Python Dictionaries: A confluence of a dozen great ideas, and Brandon Rhodes' The Dictionary Even Mightier (a follow-up to his PyCon 2010 talk, The Mighty Dictionary).

Raymond's is a fascinating dive into the guts of the CPython dict implementation, while Brandon's focuses more on recent improvements in the dict's user-facing API. One piece both mention is the addition in Python 3.6 of a private dictionary version to aid CPython optimization efforts. In Brandon's words:

"PEP509 added a private version number... every dictionary has a version number, and elsewhere in memory a master version counter. And when you go and change a dictionary the master counter is incremented from a million to a million and one, and that value a million and one is written into the version number of that dictionary. So what this means is that you can come back later and know if it's been modified, without reading maybe its hundreds of keys and values: you just look and see if the version has increased since the last time you were there."

He later went on to say,

"[The version number] is internal; I haven't seen an interface for users to get to it..."

which, of course, I saw as an implicit challenge. So let's expose it!
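To make the mechanism in Brandon's description concrete, here is a toy pure-Python model of the version-number idea. It is only a sketch under stated assumptions: `VersionedDict` and `_master_version` are hypothetical names invented for illustration, the class only tracks item assignment and deletion, and CPython implements the real version tag in C inside the dict object rather than anything like this.

```python
import itertools

# Hypothetical stand-in for the global "master version counter".
_master_version = itertools.count(1)


class VersionedDict(dict):
    """Toy dict that stamps itself with a fresh global version on each change.

    Illustration only: methods such as update() or pop() are not tracked here.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_master_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_master_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_master_version)


d = VersionedDict(a=1)
seen = d.version            # remember the version we last observed
d["b"] = 2                  # a modification stamps the dict with a newer version
modified = d.version > seen # cheap check, no need to compare keys and values
```

The point of the design is that the "has it changed?" check is a single integer comparison, regardless of how many keys the dictionary holds.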

Published on May 26, 2017 09:00

March 30, 2017

A Practical Guide to the Lomb-Scargle Periodogram

This week I published the preprint of a manuscript that started as a blog post, but quickly outgrew this medium: Understanding the Lomb-Scargle Periodogram.

Figure 24 from Understanding the Lomb-Scargle Periodogram. The figure shows the true period vs the periodogram peak for a simulated dataset with an observing cadence typical of ground-based optical astronomy. The simulation reveals common patterns of failure of the Lomb-Scargle method that are not often discussed explicitly, but are straightforward to explain based on the intuition developed in the paper; see Section 7.2 for a detailed discussion.

Over the last couple years I've written a number of Python implementations of the Lomb-Scargle periodogram (I'd recommend AstroPy's LombScargle in most cases today), and also wrote a marginally popular blog post and somewhat pedagogical paper on the subject. This all has led to a steady trickle of emails from students and researchers asking for advice on applying and interpreting the Lomb-Scargle algorithm, particularly for astronomical data. I noticed that these queries tended to repeat many of the same questions and express some similar misconceptions, and this paper is my attempt to address those once and for all, in a "mere" 55 pages (which includes 26 figures and 4 full pages of references, so it's not all that bad).

Published on March 30, 2017 06:00

March 22, 2017

Group-by From Scratch

I've found one of the best ways to grow in my scientific coding is to spend time comparing the efficiency of various approaches to implementing particular algorithms that I find useful, in order to build an intuition of the performance of the building blocks of the scientific Python ecosystem.

In this vein, today I want to take a look at an operation that is in many ways fundamental to data-driven exploration: the group-by, otherwise known as the split-apply-combine pattern. An archetypal example of a summation group-by is shown in this figure, borrowed from the Aggregation and Grouping section of the Python Data Science Handbook:

[Figure: a summation group-by, illustrated as split, apply, and combine steps]

The basic idea is to split the data into groups based on some value, apply a particular operation to the subset of data within each group (often an aggregation), and then combine the results into an output dataframe. Python users generally turn to the Pandas library for this type of operation, where it is implemented efficiently via a concise object-oriented API.
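To make the split-apply-combine steps concrete, here is a minimal from-scratch sketch of a summation group-by in plain Python. The `keys` and `values` lists are made-up sample data and `groupby_sum` is a hypothetical helper name, not part of any library; in Pandas the same aggregation would typically be written along the lines of `df.groupby('key').sum()`.

```python
from collections import defaultdict


def groupby_sum(keys, values):
    """Sum `values` within each group defined by the matching entry of `keys`."""
    totals = defaultdict(int)          # unseen groups start at 0
    for key, value in zip(keys, values):
        totals[key] += value           # split by key and apply the sum in one pass
    return dict(totals)                # combine: one entry per group


# Made-up sample data for illustration
keys = ["A", "B", "C", "A", "B", "C"]
values = [1, 2, 3, 4, 5, 6]
result = groupby_sum(keys, values)
print(result)  # {'A': 5, 'B': 7, 'C': 9}
```

A dictionary-based loop like this is easy to read but runs at Python speed; it is best thought of as a baseline against which library implementations can be compared.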

Published on March 22, 2017 10:00

March 8, 2017

Triple Pendulum CHAOS!

Earlier this week a tweet made the rounds which features a video that nicely demonstrates chaotic dynamical systems in action:


A visualization of chaos: 41 triple pendulums with very slightly different initial conditions pic.twitter.com/CTiABFVWHW

— Fermat's Library (@fermatslibrary) March 5, 2017

Edit: a reader pointed out that the original creator of this animation posted it on reddit in 2016.

Naturally, I immediately wondered whether I could reproduce this simulation in Python. This post is the result.

Published on March 08, 2017 07:00

March 3, 2017

Reproducible Data Analysis in Jupyter

Jupyter notebooks provide a useful environment for interactive exploration of data. A common question I get, though, is how you can progress from this nonlinear, interactive, trial-and-error style of exploration to a more linear and reproducible analysis based on organized, packaged, and tested code. This series of videos presents a case study in how I personally approach reproducible data analysis within the Jupyter notebook.

Each video is approximately 5-8 minutes; the videos are available in a YouTube Playlist. Alternatively, below you can find the videos with some description and links to relevant resources.

Published on March 03, 2017 07:00
