<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>Searching Gradients with Huy Nguyen</title>
 <link href="https://everyhue.me/atom.xml" rel="self"/>
 <link href="https://everyhue.me"/>
 <updated>2020-06-08T11:06:27-07:00</updated>
 <id>https://everyhue.me</id>
 <author>
   <name>Huy Nguyen</name>
 </author>

 
 <entry>
   <title>Why Uncertainty Matters in Deep Learning and How to Estimate It</title>
   <link href="https://everyhue.me/posts/why-uncertainty-matters/"/>
   <updated>2020-01-20T00:00:00-08:00</updated>
   <id>https://everyhue.me/posts/why-uncertainty-matters</id>
   

   <content type="html">&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#more-trustworthy-models&quot; id=&quot;markdown-toc-more-trustworthy-models&quot;&gt;More trustworthy models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#uncertainty-estimates-what-are-they-good-for&quot; id=&quot;markdown-toc-uncertainty-estimates-what-are-they-good-for&quot;&gt;Uncertainty estimates, what are they good for?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#sources-of-uncertainty&quot; id=&quot;markdown-toc-sources-of-uncertainty&quot;&gt;Sources of uncertainty&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#how-to-estimate-uncertainty-overview&quot; id=&quot;markdown-toc-how-to-estimate-uncertainty-overview&quot;&gt;How to estimate uncertainty (overview)&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#monte-carlo-dropout&quot; id=&quot;markdown-toc-monte-carlo-dropout&quot;&gt;Monte Carlo Dropout&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#deep-ensembles&quot; id=&quot;markdown-toc-deep-ensembles&quot;&gt;Deep Ensembles&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#regression&quot; id=&quot;markdown-toc-regression&quot;&gt;Estimating uncertainty for regression&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#classification&quot; id=&quot;markdown-toc-classification&quot;&gt;Estimating uncertainty for classification&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#sample-code&quot; id=&quot;markdown-toc-sample-code&quot;&gt;Sample code&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#evaluation&quot; id=&quot;markdown-toc-evaluation&quot;&gt;How good are these uncertainty estimates?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#moving-beyond-traditional-leaderboard-metrics&quot; id=&quot;markdown-toc-moving-beyond-traditional-leaderboard-metrics&quot;&gt;Moving beyond traditional leaderboard metrics&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#references&quot; id=&quot;markdown-toc-references&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;more-trustworthy-models&quot;&gt;More trustworthy models&lt;/h3&gt;

&lt;p&gt;For &lt;a href=&quot;https://arxiv.org/abs/1606.06565&quot;&gt;safety&lt;/a&gt; critical systems and infrastructure, you need to know when to trust a model’s prediction and when you should be more cautious about its output.&lt;/p&gt;


&lt;p&gt;The problem with conventional deep neural networks is that they only provide &lt;a href=&quot;https://en.wikipedia.org/wiki/Point_estimation&quot;&gt;point estimates&lt;/a&gt;: a single predicted value for a given input. What you don’t get with these models is a measure of how uncertain they are about any given prediction.&lt;/p&gt;

&lt;!-- For classification, these predictions take the form of a single class label (and a softmax score), and in the case of regression, a single output number. --&gt;

&lt;p&gt;While the softmax “probability” of a neural network classifier is commonly used as a heuristic for uncertainty, researchers have shown that it is often &lt;a href=&quot;https://arxiv.org/abs/1412.1897&quot;&gt;overconfident&lt;/a&gt;, even on inputs that are unrecognizable compared to the model’s training set.&lt;/p&gt;

&lt;p&gt;What we really need is an &lt;em&gt;accurate&lt;/em&gt; measure of uncertainty. Having one lets your model say “I don’t know” when it encounters data from a distribution it wasn’t trained on, or when it evaluates an example that can be interpreted in multiple ways.&lt;/p&gt;

&lt;!-- If you had access to such a measure, you'd be able to take different and more [useful](#what_you_can_do) actions based upon how confident the model was of its prediction. --&gt;

&lt;h3 id=&quot;uncertainty-estimates-what-are-they-good-for&quot;&gt;Uncertainty estimates, what are they good for?&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Accurate&lt;/em&gt; uncertainty estimates give you more flexibility when dealing with your model. You don’t have to always assign the same amount of trust to every model prediction, and likewise, you don’t always have to take the same action based upon those predictions.&lt;/p&gt;

&lt;p&gt;For example, when uncertainty is high you can decide to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Refrain from taking any action on the prediction at all.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cs.ox.ac.uk/people/angelos.filos/publications/diabetic_retinopathy_diagnosis.pdf&quot;&gt;Refer&lt;/a&gt; the particular piece of data to a human to make a final call.&lt;/li&gt;
  &lt;li&gt;Collect more examples that cause high uncertainty to retrain your model.&lt;/li&gt;
  &lt;li&gt;Implement a tiered prediction strategy by sending the data to a slower, but more accurate model, when uncertainty is high.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Blindly assigning the same amount of trust and action for every model prediction can lead to &lt;a href=&quot;https://www.wired.com/story/when-it-comes-to-gorillas-google-photos-remains-blind/&quot;&gt;embarrassing&lt;/a&gt;, &lt;a href=&quot;https://mpra.ub.uni-muenchen.de/69383/1/MPRA_paper_69383.pdf&quot;&gt;serious&lt;/a&gt;, or even &lt;a href=&quot;https://www.ntsb.gov/news/events/Pages/2019-HWY18MH010-BMG.aspx&quot;&gt;fatal&lt;/a&gt; mistakes in your autonomous systems.&lt;/p&gt;

&lt;h3 id=&quot;sources-of-uncertainty&quot;&gt;Sources of uncertainty&lt;/h3&gt;

&lt;p&gt;We have to understand where uncertainty comes from to know how we can account for it in our models. Let’s look at two major sources of uncertainty in predictive modeling below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Aleatoric uncertainty&lt;/strong&gt; is uncertainty arising from noisy data. The noise can come from the observations or from the underlying process itself. High aleatoric uncertainty indicates that small changes in the input data lead to large variance in the target data.&lt;/p&gt;


&lt;figure style=&quot;float:left; width: 45%; margin-right:1em;&quot; class=&quot;fig&quot;&gt;
    &lt;img src=&quot;/media/3017/aleatoric_uncertainty_obs.png&quot; height=&quot;200&quot; /&gt;
    &lt;figcaption&gt;
    Noisy measurements of an underlying process lead to high aleatoric uncertainty.
    &lt;/figcaption&gt;
&lt;/figure&gt;
&lt;figure style=&quot;float:left; width:45%;&quot; class=&quot;fig&quot;&gt;
    &lt;img src=&quot;/media/3017/aleatoric_uncertainty_prc.png&quot; height=&quot;200&quot; /&gt;
    &lt;figcaption&gt;
    Even with error-free observations, a noisy underlying process can give rise to high aleatoric uncertainty.
    &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;&lt;strong&gt;Epistemic uncertainty&lt;/strong&gt; is uncertainty arising from a noisy model. For a given input, high epistemic uncertainty indicates that small changes in model parameters lead to large changes in the model’s predictions. This type of uncertainty commonly occurs when models are evaluated on data whose distribution differs from that of the training data.&lt;/p&gt;

&lt;figure class=&quot;fig&quot;&gt;
    &lt;img src=&quot;/media/3017/epistemic_uncertainty.png&quot; width=&quot;100%&quot; /&gt;
    &lt;figcaption&gt;
    Epistemic uncertainty can arise in regions of the data space where there are few observations for training. This is because when there is little training data, many plausible model parameters may suffice for explaining the underlying ground truth phenomenon.
    &lt;/figcaption&gt;
&lt;/figure&gt;


&lt;h3 id=&quot;how-to-estimate-uncertainty-overview&quot;&gt;How to estimate uncertainty (overview)&lt;/h3&gt;

&lt;p&gt;How best to estimate uncertainty is an active area of research and not yet a solved problem. &lt;a href=&quot;http://proceedings.mlr.press/v48/gal16.pdf&quot;&gt;Monte Carlo Dropout&lt;/a&gt; and &lt;a href=&quot;https://arxiv.org/abs/1612.01474&quot;&gt;Model Ensembling&lt;/a&gt; are two methods that have garnered recent attention because they mesh well with existing neural network architectures and are less computationally demanding than other methods.&lt;/p&gt;

&lt;p&gt;These methods try to account for both aleatoric and epistemic uncertainty.&lt;/p&gt;

&lt;p&gt;Although the authors behind these methods give different motivations for their work, both approach the problem of estimating uncertainty using a similar strategy:&lt;/p&gt;

&lt;p&gt;First, they propose modifying your neural network to &lt;em&gt;estimate probability distributions&lt;/em&gt; rather than point estimates (more on this for &lt;a href=&quot;#regression&quot;&gt;regression&lt;/a&gt; and &lt;a href=&quot;#classification&quot;&gt;classification&lt;/a&gt; below). This lets the model capture aleatoric uncertainty during training.&lt;/p&gt;

&lt;p&gt;Second, they propose that at prediction time, you sample and combine predictions from multiple realizations of your neural network to get a final prediction and its uncertainty. This procedure helps the network capture epistemic uncertainty.&lt;/p&gt;

&lt;h4 id=&quot;monte-carlo-dropout&quot;&gt;Monte Carlo Dropout&lt;/h4&gt;

&lt;p&gt;The deep learning community often uses &lt;a href=&quot;https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf&quot;&gt;Dropout&lt;/a&gt; to prevent models from overfitting. The idea is easy to implement: randomly zero out activations in your neural network with probability &lt;script type=&quot;math/tex&quot;&gt;p&lt;/script&gt; at training time, and scale your activations by the keep probability &lt;script type=&quot;math/tex&quot;&gt;1 - p&lt;/script&gt; at test time.&lt;/p&gt;
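&lt;p&gt;As a rough illustration of the standard Dropout recipe, here is a minimal NumPy sketch. It treats the rate as the probability of dropping a unit, so the test-time scale is the keep probability. The function names &lt;code&gt;dropout_train&lt;/code&gt; and &lt;code&gt;dropout_test&lt;/code&gt; are made up for this example and are not part of any library.&lt;/p&gt;

```python
import numpy as np

def dropout_train(activations, drop_rate, rng):
    """Training time: zero each activation independently with probability drop_rate."""
    keep_mask = rng.random(activations.shape) >= drop_rate
    return activations * keep_mask

def dropout_test(activations, drop_rate):
    """Test time: no masking, scale by the keep probability (1 - drop_rate)
    so the expected activation matches what the next layer saw in training."""
    return activations * (1.0 - drop_rate)

rng = np.random.default_rng(0)
h = np.ones((10000, 8))                        # toy activations, all equal to 1
train_mean = dropout_train(h, 0.5, rng).mean() # average over many random masks
test_mean = dropout_test(h, 0.5).mean()        # deterministic scaled activations
print(train_mean, test_mean)
```

&lt;p&gt;The two means should roughly agree, which is the point of the test-time scaling. Many frameworks instead rescale at training time (“inverted dropout”) and leave test-time activations untouched; the expected values match either way.&lt;/p&gt;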

&lt;p&gt;It turns out Dropout with some modifications is also useful for estimating uncertainty as described by &lt;a href=&quot;http://proceedings.mlr.press/v48/gal16.pdf&quot;&gt;Gal and Ghahramani&lt;/a&gt;. As long as your neural networks are trained with a few Dropout layers, you can use this method at prediction-time to obtain an estimate of uncertainty for your model.&lt;/p&gt;

&lt;p&gt;The approach works by combining the predictions of several “realizations” of your neural network, which are essentially multiple forward passes of the same data point &lt;script type=&quot;math/tex&quot;&gt;\mathbf{x}&lt;/script&gt; through your network while applying different dropout masks &lt;script type=&quot;math/tex&quot;&gt;\mathbf{w}_t&lt;/script&gt;.&lt;/p&gt;

&lt;p&gt;Unlike traditional Dropout networks, Monte Carlo Dropout (MC Dropout) networks apply dropout at both train time and test time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Algorithm:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Train a neural network &lt;script type=&quot;math/tex&quot;&gt;f_\theta(\mathbf{x})&lt;/script&gt; containing Dropout layers, using a probabilistic loss appropriate for your &lt;a href=&quot;#regression&quot;&gt;regression&lt;/a&gt; or &lt;a href=&quot;#classification&quot;&gt;classification&lt;/a&gt; task (see below).&lt;/li&gt;
  &lt;li&gt;At test time, perform &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; stochastic forward passes through &lt;script type=&quot;math/tex&quot;&gt;f_\theta(\mathbf{x})&lt;/script&gt; to obtain predictions for input &lt;script type=&quot;math/tex&quot;&gt;\mathbf{x}&lt;/script&gt;.&lt;/li&gt;
  &lt;li&gt;Depending on whether you are doing &lt;a href=&quot;#regression_pred&quot;&gt;regression&lt;/a&gt; or &lt;a href=&quot;#classification_pred&quot;&gt;classification&lt;/a&gt;, “combine” predictions as described below to obtain an Expectation-based prediction and uncertainty estimate.&lt;/li&gt;
&lt;/ol&gt;
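&lt;p&gt;Step 2 of the algorithm can be sketched in plain NumPy as follows. This is only a toy: the two-layer network, the random stand-in weights, and the helper names &lt;code&gt;stochastic_forward&lt;/code&gt; and &lt;code&gt;mc_dropout_samples&lt;/code&gt; are assumptions made for illustration, not the code from the papers.&lt;/p&gt;

```python
import numpy as np

def stochastic_forward(x, weights, drop_rate, rng):
    """One forward pass of a toy two-layer network with a freshly sampled dropout mask."""
    w1, w2 = weights
    hidden = np.maximum(x @ w1, 0.0)              # ReLU hidden layer
    mask = rng.random(hidden.shape) >= drop_rate  # sample a new dropout mask w_t
    hidden = hidden * mask / (1.0 - drop_rate)    # inverted-dropout scaling
    return hidden @ w2

def mc_dropout_samples(x, weights, drop_rate=0.5, T=20, seed=0):
    """Run T stochastic passes on the same input; result has shape [T, N, out_dim]."""
    rng = np.random.default_rng(seed)
    return np.stack([stochastic_forward(x, weights, drop_rate, rng) for _ in range(T)])

init = np.random.default_rng(1)
# Random weights standing in for trained parameters (hypothetical).
weights = (init.normal(size=(3, 16)), init.normal(size=(16, 1)))
x = init.normal(size=(5, 3))                      # 5 inputs with 3 features each
samples = mc_dropout_samples(x, weights, T=20)
print(samples.shape)                              # one prediction per pass and input
```

&lt;p&gt;Because every pass draws a different mask, the &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; predictions for the same input differ, and that spread is what step 3 turns into an uncertainty estimate.&lt;/p&gt;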

&lt;h4 id=&quot;deep-ensembles&quot;&gt;Deep Ensembles&lt;/h4&gt;

&lt;p&gt;Another way to estimate uncertainty is by using model ensembling as described in the paper &lt;a href=&quot;https://arxiv.org/abs/1612.01474&quot;&gt;Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The approach is quite similar to MC Dropout, and in fact, one way to interpret MC Dropout is to view it as a form of model ensembling.&lt;/p&gt;

&lt;p&gt;The major difference with this approach is that rather than using a &lt;em&gt;single&lt;/em&gt; trained network to make predictions with several randomly sampled dropout masks, we use &lt;script type=&quot;math/tex&quot;&gt;M&lt;/script&gt; trained models initialized from random starting points to collect our Monte Carlo samples.&lt;/p&gt;

&lt;h4 id=&quot;regression&quot;&gt;Estimating uncertainty for regression&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Defining the Probabilistic Loss Function:&lt;/strong&gt; When training models for the regression task, we usually minimize the error between some target values &lt;script type=&quot;math/tex&quot;&gt;y&lt;/script&gt; and predicted values &lt;script type=&quot;math/tex&quot;&gt;\hat{y}&lt;/script&gt; using the &lt;a href=&quot;https://en.wikipedia.org/wiki/Mean_squared_error&quot;&gt;Mean Squared Error&lt;/a&gt; loss.&lt;/p&gt;

&lt;p&gt;To obtain uncertainty estimates with MC Dropout or Model Ensembling, however, we must take a more probabilistic view. Rather than predicting a single scalar value &lt;script type=&quot;math/tex&quot;&gt;\hat{y}&lt;/script&gt;, we assume our target data is normally distributed and predict a &lt;a href=&quot;https://en.wikipedia.org/wiki/Normal_distribution&quot;&gt;Gaussian distribution&lt;/a&gt; &lt;script type=&quot;math/tex&quot;&gt;\mathcal{N}&lt;/script&gt; parameterized by mean &lt;script type=&quot;math/tex&quot;&gt;\hat{\mu}&lt;/script&gt; and variance &lt;script type=&quot;math/tex&quot;&gt;\hat{\sigma}^2&lt;/script&gt;.&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \hat{\mu}, \hat{\sigma}^2 = f_\theta(\mathbf{x})
\end{equation}&lt;/p&gt;

&lt;p&gt;\begin{equation}
    p_{\theta}(y | \mathbf{x}) = \mathcal{N}(\hat{\mu}, \hat{\sigma}^2)
\end{equation}&lt;/p&gt;

&lt;p&gt;For our loss, instead of minimizing the difference between the predicted and target values, we minimize the difference between our predictive distribution and the target distribution using the &lt;a href=&quot;https://www.cs.princeton.edu/courses/archive/spring08/cos424/scribe_notes/0214.pdf&quot;&gt;Negative Log Likelihood&lt;/a&gt; loss (dropping the constant term that does not depend on the parameters):&lt;/p&gt;

&lt;p&gt;\begin{equation}
    - \log p_{\theta}(y | \mathbf{x}) = \frac{\log{\hat{\sigma}^2}}{2} + \frac{(y - \hat{\mu})^2}{2\hat{\sigma}^2} + \text{const}
\end{equation}&lt;/p&gt;

&lt;p&gt;(As an aside, I find this loss quite fascinating. Notice that we never explicitly provide the network with an “uncertainty label” or target &lt;script type=&quot;math/tex&quot;&gt;\sigma^2&lt;/script&gt;. The network implicitly learns to capture the variance through the balance of the &lt;script type=&quot;math/tex&quot;&gt;\hat{\sigma}^2&lt;/script&gt; terms in the numerator and denominator.)&lt;/p&gt;
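&lt;p&gt;To see this balance at work, here is a small NumPy sketch (an illustration only; the numbers are made up). For a fixed residual, it evaluates the loss over a few candidate variances and picks the one with the lowest loss.&lt;/p&gt;

```python
import numpy as np

def gaussian_nll(y, mu, var):
    """Per-example Gaussian negative log likelihood, dropping the constant 0.5*log(2*pi)."""
    return 0.5 * np.log(var) + (y - mu) ** 2 / (2.0 * var)

y, mu = 3.0, 1.0                                   # squared residual is 4.0
candidate_vars = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
losses = gaussian_nll(y, mu, candidate_vars)
best_var = candidate_vars[np.argmin(losses)]       # variance with the lowest loss
print(best_var)
```

&lt;p&gt;The loss is smallest when the predicted variance equals the squared residual. A variance that is too small is punished by the second term, and one that is too large is punished by the log term, which is how the network can learn a variance without ever seeing a variance label.&lt;/p&gt;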

&lt;p id=&quot;regression_pred&quot;&gt;&lt;strong&gt;Making Predictions and Quantifying Uncertainty:&lt;/strong&gt; Once we’ve trained our model, if we’re performing Monte Carlo Dropout, we sample dropout masks &lt;script type=&quot;math/tex&quot;&gt;\mathbf{w}_t&lt;/script&gt; and perform forward passes through &lt;script type=&quot;math/tex&quot;&gt;f_\theta(\mathbf{x}; \mathbf{w}_t)&lt;/script&gt; to obtain &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; samples:&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \hat{\mu}_{t}, \hat{\sigma}_{t}^2 = f_\theta(\mathbf{x}; \mathbf{w}_t)
\end{equation}&lt;/p&gt;

&lt;p&gt;With these Monte Carlo samples &lt;script type=&quot;math/tex&quot;&gt;\hat{\mu}_{t}&lt;/script&gt; and &lt;script type=&quot;math/tex&quot;&gt;\hat{\sigma}_{t}^2&lt;/script&gt; in hand, we can now compute our final regression prediction &lt;script type=&quot;math/tex&quot;&gt;\hat{y}_{\ast}&lt;/script&gt; and its uncertainty &lt;script type=&quot;math/tex&quot;&gt;\hat{\sigma}_{\ast}^{2}&lt;/script&gt;:&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \hat{y}_{\ast} = \frac{1}{T}\sum_{t=1}^{T}{\hat{\mu}_{t}}
\end{equation}&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \hat{\sigma}_{\ast}^{2} = \frac{1}{T}\sum_{t=1}^{T}{(\hat{\sigma}_{t}^2 + \hat{\mu}_{t}^2)} - \hat{y}_{\ast}^2
\end{equation}&lt;/p&gt;
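&lt;p&gt;The two formulas above translate almost directly into NumPy. The sketch below assumes the per-pass means and variances have already been collected into arrays of shape [T, N]; the helper name &lt;code&gt;combine_gaussian_samples&lt;/code&gt; and the toy numbers are invented for this example.&lt;/p&gt;

```python
import numpy as np

def combine_gaussian_samples(mu_samples, var_samples):
    """Combine T per-pass (mean, variance) pairs into one predictive mean and variance.

    Both inputs have shape [T, N]: T Monte Carlo passes over N inputs.
    """
    y_star = mu_samples.mean(axis=0)
    var_star = (var_samples + mu_samples ** 2).mean(axis=0) - y_star ** 2
    return y_star, var_star

# Two passes over a single input: the means disagree (1.0 vs 3.0),
# and each pass reports a variance of 0.5.
mu_samples = np.array([[1.0], [3.0]])
var_samples = np.array([[0.5], [0.5]])
y_star, var_star = combine_gaussian_samples(mu_samples, var_samples)
print(y_star, var_star)
```

&lt;p&gt;In this toy case the combined variance works out to the average predicted variance (0.5) plus the variance of the predicted means (1.0). One common reading is that the first part reflects aleatoric uncertainty and the second part reflects epistemic uncertainty.&lt;/p&gt;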

&lt;p&gt;If we’re using Model Ensembling, rather than performing &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; forward passes through a single network with randomly sampled &lt;script type=&quot;math/tex&quot;&gt;\mathbf{w}_t&lt;/script&gt; dropout masks, we instead get our predictions from &lt;script type=&quot;math/tex&quot;&gt;M&lt;/script&gt; trained models whose parameters &lt;script type=&quot;math/tex&quot;&gt;\mathbf{\theta}_m&lt;/script&gt; are initialized to random starting points. Everything else remains the same.&lt;/p&gt;

&lt;p&gt;Note: These formulations for regression are described in a follow-on paper to the MC Dropout paper by &lt;a href=&quot;https://arxiv.org/abs/1703.04977&quot;&gt;Kendall and Gal&lt;/a&gt;; &lt;a href=&quot;https://arxiv.org/abs/1612.01474&quot;&gt;Lakshminarayanan et al.&lt;/a&gt; also derive the same formulation using &lt;a href=&quot;https://en.wikipedia.org/wiki/Scoring_rule&quot;&gt;Proper Scoring Rules&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;classification&quot;&gt;Estimating uncertainty for classification&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Defining the Probabilistic Loss Function:&lt;/strong&gt; The great news is that for classification, we do not need to modify the loss in order to obtain meaningful uncertainty estimates using Monte Carlo Dropout or Model Ensembling. This is because a conventional neural network classifier uses the &lt;a href=&quot;https://en.wikipedia.org/wiki/Softmax_function&quot;&gt;Softmax&lt;/a&gt; function, which already parameterizes a discrete probability distribution.&lt;/p&gt;

&lt;p&gt;Likewise, the &lt;a href=&quot;https://en.wikipedia.org/wiki/Cross_entropy&quot;&gt;Cross Entropy loss&lt;/a&gt; used to optimize neural network classifiers already minimizes the difference between our target and predictive distributions (it’s basically another name for the negative log likelihood). For these reasons, we can keep our classification network’s loss mechanics exactly the same.&lt;/p&gt;

&lt;p id=&quot;classification_pred&quot;&gt;&lt;strong&gt;Making Predictions and Quantifying Uncertainty:&lt;/strong&gt; Once we’ve trained a standard network for classification, it is simple to obtain an expectation-based prediction and uncertainty estimate for our model.&lt;/p&gt;

&lt;p&gt;If we are performing MC Dropout, to get a final prediction &lt;script type=&quot;math/tex&quot;&gt;\mathbf{\hat{y}}_\ast&lt;/script&gt;, we average the predicted softmax probabilities over &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; stochastic forward passes of the data &lt;script type=&quot;math/tex&quot;&gt;\mathbf{x}&lt;/script&gt; through our network &lt;script type=&quot;math/tex&quot;&gt;\mathbf{f}_{\theta}(\mathbf{x}; \mathbf{w}_t)&lt;/script&gt;, sampling a random dropout mask &lt;script type=&quot;math/tex&quot;&gt;\mathbf{w}_t&lt;/script&gt; for each pass:&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \mathbf{\hat{y}}_t = \mathit{Softmax}(\mathbf{f}_{\theta}(\mathbf{x}; \mathbf{w}_t))
\end{equation}&lt;/p&gt;

&lt;p&gt;\begin{equation}
    \mathbf{\hat{y}}_\ast = \frac{1}{T} \sum_{t}{\mathbf{\hat{y}}_t}
\end{equation}&lt;/p&gt;

&lt;p&gt;If we’re using Model Ensembling, rather than performing &lt;script type=&quot;math/tex&quot;&gt;T&lt;/script&gt; forward passes through a single network with randomly sampled &lt;script type=&quot;math/tex&quot;&gt;\mathbf{w}_t&lt;/script&gt; dropout masks, we instead get our predictions from &lt;script type=&quot;math/tex&quot;&gt;M&lt;/script&gt; trained models whose parameters &lt;script type=&quot;math/tex&quot;&gt;\mathbf{\theta}_m&lt;/script&gt; are initialized to random starting points. Everything else stays the same.&lt;/p&gt;

&lt;p&gt;We measure the uncertainty of our probabilistic prediction &lt;script type=&quot;math/tex&quot;&gt;\mathbf{\hat{y}_\ast}&lt;/script&gt; by computing the entropy over its per-class elements &lt;script type=&quot;math/tex&quot;&gt;\hat{y}_{\ast,c}&lt;/script&gt;:&lt;/p&gt;

&lt;p&gt;\begin{equation}
    H(\mathbf{\hat{y}_\ast}) = - \sum_{c=1}^{C} \hat{y}_{\ast,c} \log{\hat{y}_{\ast,c}}
\end{equation}&lt;/p&gt;
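&lt;p&gt;As a minimal sketch of the averaging and entropy steps, assume the softmax outputs of each pass are stacked into an array of shape [T, N, C]. The function name &lt;code&gt;predictive_entropy&lt;/code&gt; and the small &lt;code&gt;eps&lt;/code&gt; added inside the log (to avoid taking the log of zero) are choices made for this example, not part of the original method.&lt;/p&gt;

```python
import numpy as np

def predictive_entropy(prob_samples, eps=1e-12):
    """Average softmax outputs over T passes and compute the entropy of the result.

    prob_samples has shape [T, N, C]: T passes, N inputs, C classes.
    eps is a numerical-stability assumption, not part of the formula.
    """
    y_star = prob_samples.mean(axis=0)                       # shape [N, C]
    entropy = -(y_star * np.log(y_star + eps)).sum(axis=-1)  # shape [N]
    return y_star, entropy

# Two passes over one input with two classes.
agree = np.array([[[0.9, 0.1]], [[0.9, 0.1]]])      # both passes favor class 0
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])   # the passes contradict each other
_, h_agree = predictive_entropy(agree)
_, h_disagree = predictive_entropy(disagree)
print(h_agree, h_disagree)
```

&lt;p&gt;When the passes contradict each other, the averaged prediction becomes uniform and the entropy reaches its maximum for two classes, &lt;script type=&quot;math/tex&quot;&gt;\log 2&lt;/script&gt;. When they agree, the entropy stays lower.&lt;/p&gt;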

&lt;h4 id=&quot;sample-code&quot;&gt;Sample code&lt;/h4&gt;

&lt;p&gt;We provide &lt;a href=&quot;https://www.tensorflow.org/&quot;&gt;TensorFlow 2.0&lt;/a&gt; code that performs Monte Carlo Dropout and Model Ensembling for both classification and regression in the following repository:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/huyng/incertae&quot;&gt;https://github.com/huyng/incertae&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s take a quick look at what the code does.&lt;/p&gt;

&lt;p&gt;We’ll first define our model below:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dense&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;
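&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tensorflow.keras&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequential&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tensorflow&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# used by the loss function below&lt;/span&gt;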

&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sequential&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Dense&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'relu'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Dense&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'relu'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Dropout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Dense&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Notice that rather than outputting a single unit, we’re outputting 2 units at the end of the network for the parameters of our estimated Gaussian distribution, &lt;script type=&quot;math/tex&quot;&gt;\hat{\mu}&lt;/script&gt; and &lt;script type=&quot;math/tex&quot;&gt;\hat{\sigma}^2&lt;/script&gt;.&lt;/p&gt;

&lt;p&gt;We’ll now define the loss for the regression task based on the equations above:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;gaussian_nll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    Gaussian negative log likelihood

    Note: to make training more stable, we optimize
    a modified loss by having our model predict log(sigma^2)
    rather than sigma^2.
    &quot;&quot;&quot;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;y_true&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reshape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;si&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;squared_difference&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_true&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;2.0&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;reduce_mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;compile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loss&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gaussian_nll&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;optimizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'sgd'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Finally, we’ll define our prediction function, which provides both an uncertainty estimate and an expectation-based prediction from our model.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'''
    Args:
        model: The trained keras model
        x: the input tensor with shape [N, M]
        T: the number of monte carlo trials to sample
    Returns:
        y_mean: The expected value of our prediction
        y_std: The standard deviation of our prediction
    '''&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;si_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;training&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;si&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_pred&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;si_arr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;si_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si_arr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;var_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;si_arr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;y_mean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y_variance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var_arr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mu_arr&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;**&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;y_std&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;y_variance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_std&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I know it might seem odd that we’re setting &lt;code class=&quot;highlighter-rouge&quot;&gt;training=True&lt;/code&gt; when using the model to predict, but this is how the Keras framework determines that it needs to sample a random dropout mask when making a forward pass.&lt;/p&gt;

&lt;p&gt;Let’s use our function now:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;y_mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y_std&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;predict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;model&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Here, &lt;code class=&quot;highlighter-rouge&quot;&gt;y_mean&lt;/code&gt; is the expected value of our estimated distribution and &lt;code class=&quot;highlighter-rouge&quot;&gt;y_std&lt;/code&gt; is its standard deviation, which we can use as our uncertainty estimate.&lt;/p&gt;
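
&lt;p&gt;To see what the variance line in &lt;code class=&quot;highlighter-rouge&quot;&gt;predict&lt;/code&gt; is doing, it can help to split it into two terms: the average of the predicted variances plus the spread of the predicted means across the Monte Carlo passes. This decomposition is commonly described as aleatoric plus epistemic uncertainty. The sketch below checks that identity numerically; the random arrays are made-up stand-ins for real model outputs, not results from a trained network.&lt;/p&gt;

```python
import numpy as np

# Hypothetical MC Dropout outputs: T=4 stochastic passes over N=3 inputs.
rng = np.random.default_rng(0)
mu_arr = rng.normal(size=(4, 3))            # per-pass predicted means
var_arr = np.exp(rng.normal(size=(4, 3)))   # per-pass predicted variances

y_mean = mu_arr.mean(axis=0)

# Total predictive variance, written the same way as in predict().
total = np.mean(var_arr + mu_arr**2, axis=0) - y_mean**2

# The same quantity split into two interpretable terms.
aleatoric = var_arr.mean(axis=0)   # average predicted data noise
epistemic = mu_arr.var(axis=0)     # disagreement between passes

assert np.allclose(total, aleatoric + epistemic)
```

&lt;p&gt;If the second term dominates, the dropout samples disagree with each other, which suggests the model itself is unsure rather than the data being noisy.&lt;/p&gt;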


&lt;h3 id=&quot;evaluation&quot;&gt;How good are these uncertainty estimates?&lt;/h3&gt;

&lt;p&gt;We can review the experiments in &lt;a href=&quot;https://arxiv.org/abs/1912.10481v1&quot;&gt;A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks&lt;/a&gt; to get a sense of how well the proposed uncertainty estimates capture the concept of uncertainty.&lt;/p&gt;

&lt;p&gt;In this paper, the authors train an image classifier to predict whether a patient suffers from &lt;a href=&quot;https://en.wikipedia.org/wiki/Diabetic_retinopathy&quot;&gt;Diabetic Retinopathy&lt;/a&gt; based on pictures of the patient’s retina. They train models using both techniques discussed in this blog post (in addition to a few other techniques used for estimating uncertainty).&lt;/p&gt;

&lt;p&gt;To test whether their uncertainty estimates meaningfully capture uncertainty, they propose a simple evaluation protocol:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Refer a fixed percentage of the test dataset to an expert human oracle by sweeping a threshold over the uncertainty estimates provided by their models.&lt;/li&gt;
  &lt;li&gt;Report their model’s accuracy on the remaining &lt;em&gt;retained&lt;/em&gt; data split (i.e. the samples that were not referred).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The idea behind this protocol is that an &lt;em&gt;accurate&lt;/em&gt; measure of uncertainty would prioritize referring out the examples with high uncertainty, and as a result, the retained data would theoretically only contain examples that the model can predict with higher accuracy.&lt;/p&gt;
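
&lt;p&gt;The protocol above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors’ evaluation code: the function name &lt;code class=&quot;highlighter-rouge&quot;&gt;retained_accuracy&lt;/code&gt; and the toy arrays are made up for this example.&lt;/p&gt;

```python
import numpy as np

def retained_accuracy(correct, uncertainty, retained_fraction):
    """Accuracy on the samples kept after referring the most uncertain ones.

    correct: boolean array, True where the model's prediction was right
    uncertainty: per-sample uncertainty scores (higher means less sure)
    retained_fraction: fraction of samples the model keeps, in (0, 1]
    """
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(uncertainty)  # most certain samples first
    n_keep = max(1, int(round(retained_fraction * len(order))))
    return correct[order[:n_keep]].mean()

# Toy data (invented): the two most uncertain samples are the wrong ones.
correct = np.array([True, True, True, False, True, False])
uncertainty = np.array([0.1, 0.2, 0.3, 0.9, 0.4, 0.8])

for frac in (1.0, 0.5):
    print(frac, retained_accuracy(correct, uncertainty, frac))
```

&lt;p&gt;With an informative uncertainty score, accuracy on the retained half is higher than accuracy on the full set, which is the trend the protocol looks for.&lt;/p&gt;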

&lt;p&gt;Here are the results of running this protocol on the Diabetic Retinopathy dataset:&lt;/p&gt;

&lt;figure class=&quot;fig&quot;&gt;
    &lt;img src=&quot;/media/3017/evaluation.jpg&quot; width=&quot;100%&quot; /&gt;
    &lt;figcaption&gt;
    Plot of model accuracy on test data as the model refers fewer samples (based on the model's estimate of uncertainty) to a human oracle. For meaningful measures of uncertainty, we see accuracy increase as we decrease the amount of retained data. Left: test images come from the same machine type used for training data. Right: test images come from a different machine type than the one used for training data. [&lt;a href=&quot;https://arxiv.org/abs/1912.10481v1&quot;&gt;Filos et al.&lt;/a&gt;]
    &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;In the plot above, the authors evaluate their ability to estimate uncertainty both when the test data comes from the same distribution as the training data &lt;em&gt;and&lt;/em&gt; when the test data comes from a different distribution.&lt;/p&gt;

&lt;p&gt;As a sanity check, we can look at the “Random Referral” baseline. Here, samples are chosen for referral at random, regardless of their estimated uncertainty. As expected, randomly choosing samples neither increases nor decreases accuracy as we refer more samples to the human oracle. In contrast, both the MC Dropout and Model Ensembling methods increase accuracy as they refer more examples, meaning their measures of uncertainty are finding the examples they are likely to get wrong.&lt;/p&gt;

&lt;p&gt;The “Deterministic” baseline uses a single neural network to make point-estimate predictions and computes its estimate of uncertainty using entropy. In other words, it’s the conventional neural network that everyone is used to working with. It does much worse than MC Dropout and Model Ensembling on both in-distribution and out-of-distribution test sets. Interestingly, for out-of-distribution data, its uncertainty estimates are &lt;em&gt;no better than random referral&lt;/em&gt;.&lt;/p&gt;
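
&lt;p&gt;For reference, an entropy-based uncertainty score for a single network can be computed directly from its predicted class probabilities. The snippet below is a minimal sketch, not the paper’s code: the helper name &lt;code class=&quot;highlighter-rouge&quot;&gt;predictive_entropy&lt;/code&gt; and the example probabilities are assumptions for illustration.&lt;/p&gt;

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy (in nats) of each row of an [N, C] array of class probabilities."""
    probs = np.asarray(probs, dtype=float)
    # eps guards against log(0) when a class probability is exactly zero.
    return -np.sum(probs * np.log(probs + eps), axis=1)

probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction: low entropy
    [1 / 3, 1 / 3, 1 / 3],  # maximally unsure: entropy close to log(3)
])
entropy = predictive_entropy(probs)
```

&lt;p&gt;Higher entropy means the probability mass is spread over more classes, so those samples would be referred first under the protocol above.&lt;/p&gt;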

&lt;p&gt;These experiments make a strong case for implementing either MC Dropout or Model Ensembling to obtain more accurate uncertainty estimates. If you have the computational budget, combining both approaches (i.e. “Ensemble MC Dropout”) could yield the best results.&lt;/p&gt;

&lt;h3 id=&quot;moving-beyond-traditional-leaderboard-metrics&quot;&gt;Moving beyond traditional leaderboard metrics&lt;/h3&gt;

&lt;p&gt;I wrote this article because in a world driven by leaderboard AUC and AP metrics, it was worth pointing out that there are other measures, specifically the quality of your model’s uncertainty estimates, that matter for production environments.&lt;/p&gt;

&lt;p&gt;We have to know when to trust our models as much as we have to find models with the highest possible accuracy. Uncertainty quantification gives us this ability and it gives us more flexibility for deciding what to do with our model predictions.&lt;/p&gt;

&lt;p&gt;While the article only scratches the surface of the field, hopefully you now have some basic tools to quantify uncertainty and you understand why uncertainty is so critical for developing trustworthy models.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1612.01474&quot;&gt;Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://proceedings.mlr.press/v48/gal16.pdf&quot;&gt;Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1412.1897&quot;&gt;Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1703.04977&quot;&gt;What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1606.06565&quot;&gt;Concrete Problems in AI Safety&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1912.10481v1&quot;&gt;A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1906.02530&quot;&gt;Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1706.04599&quot;&gt;On Calibration of Modern Neural Networks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf&quot;&gt;Dropout: A Simple Way to Prevent Neural Networks from Overfitting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cs.ox.ac.uk/people/yarin.gal/website/thesis/1_introduction.pdf#page=7&quot;&gt;Yarin Gal’s Thesis - The Importance of Knowing What We Don’t Know&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://www.cs.ox.ac.uk/people/angelos.filos/publications/diabetic_retinopathy_diagnosis.pdf&quot;&gt;Benchmarking Bayesian Deep Learning with Diabetic Retinopathy Diagnosis&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.nature.com/articles/nature21056&quot;&gt;Dermatologist-level classification of skin cancer with deep neural networks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/pdf/1802.09127.pdf&quot;&gt;Deep Bayesian Bandits Showdown&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.cs.princeton.edu/courses/archive/spring08/cos424/scribe_notes/0214.pdf&quot;&gt;Princeton COS424: Maximum Likelihood Estimation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.ntsb.gov/news/events/Pages/2019-HWY18MH010-BMG.aspx&quot;&gt;Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.ntsb.gov/news/events/Pages/2019-HWY18MH010-BMG.aspx&quot;&gt;Fukushima: The Failure of Predictive Models&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.wired.com/story/when-it-comes-to-gorillas-google-photos-remains-blind/&quot;&gt;When It Comes to Gorillas, Google Photos Remains Blind&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</content>

 </entry>
 
 <entry>
   <title>Are your models calibrated?</title>
   <link href="https://everyhue.me/posts/multiclass-calibration/"/>
   <updated>2019-02-08T00:00:00-08:00</updated>
   <id>https://everyhue.me/posts/multiclass-calibration</id>
   

   <content type="html">&lt;p&gt;A calibration curve (sometimes called a “reliability diagram”) tells you whether your model’s predicted probabilities accurately reflect the real chance of your model being right.&lt;/p&gt;

&lt;p&gt;It’s very common for neural networks to &lt;a href=&quot;https://arxiv.org/abs/1706.04599&quot;&gt;overestimate&lt;/a&gt; the confidence in their predictions, and this type of diagram helps us detect when this phenomenon occurs. Here’s an example:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3018/deterministic_classification.svg&quot; style=&quot;box-sizing: border-box; padding: 1em 2em ; width:100%;&quot; /&gt;&lt;/p&gt;

&lt;p&gt;On the x-axis we have our model’s predicted confidence. On the y-axis we plot the model accuracy given its predicted confidence. We can see from this particular diagram that the model is “overconfident” when it makes a prediction in the range between &lt;code class=&quot;highlighter-rouge&quot;&gt;0.5&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;0.7&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;calibration-curves-for-multiclass-classifiers&quot;&gt;Calibration curves for multiclass classifiers&lt;/h3&gt;

&lt;p&gt;scikit-learn provides a &lt;a href=&quot;https://scikit-learn.org/stable/modules/generated/sklearn.calibration.calibration_curve.html&quot;&gt;function&lt;/a&gt; to compute calibration curves for binary classification problems. However, in many cases we want to obtain the calibration curve for a model that makes predictions for more than 2 classes.&lt;/p&gt;

&lt;p&gt;We can look to &lt;a href=&quot;https://arxiv.org/abs/1706.04599&quot;&gt;Guo et al.&lt;/a&gt; to see how &lt;em&gt;they&lt;/em&gt; generate their calibration curve plots.&lt;/p&gt;

&lt;p&gt;They propose “binning” all predicted confidences into &lt;script type=&quot;math/tex&quot;&gt;M&lt;/script&gt; equally wide bins, where &lt;script type=&quot;math/tex&quot;&gt;B_m&lt;/script&gt; is the set of indices of samples whose predicted confidence falls into the interval &lt;script type=&quot;math/tex&quot;&gt;I_m = (\frac{m-1}{M}, \frac{m}{M}]&lt;/script&gt;.&lt;/p&gt;

&lt;p&gt;For each bin we can compute the bin accuracy (which is the y-axis on our graph) using the following formula:&lt;/p&gt;

&lt;script type=&quot;math/tex; mode=display&quot;&gt;acc(B_m) = \frac{1}{\vert B_m \vert} \sum_{i \in B_m}{\unicode{x1D7D9}(\hat{y}_i = y_i)}&lt;/script&gt;

&lt;p&gt;Here, &lt;script type=&quot;math/tex&quot;&gt;\unicode{x1D7D9}(\hat{y}_i = y_i)&lt;/script&gt; is &lt;script type=&quot;math/tex&quot;&gt;1&lt;/script&gt; if the example label &lt;script type=&quot;math/tex&quot;&gt;y_i&lt;/script&gt; belongs to the same class as the prediction &lt;script type=&quot;math/tex&quot;&gt;\hat{y}_i&lt;/script&gt;, and &lt;script type=&quot;math/tex&quot;&gt;0&lt;/script&gt; otherwise.&lt;/p&gt;

&lt;p&gt;To sum up, we compute the y-axis of our plot by first segmenting all predicted confidence scores into M bins. Each of these scores is associated with a class &lt;script type=&quot;math/tex&quot;&gt;c&lt;/script&gt;. For each bin, we count the number of examples whose labels match the class &lt;script type=&quot;math/tex&quot;&gt;c&lt;/script&gt; associated with the predicted score and divide by the total number of items in the bin.&lt;/p&gt;

&lt;h3 id=&quot;code&quot;&gt;Code&lt;/h3&gt;

&lt;p&gt;Here is the code to compute and plot the calibration curves for your models in matplotlib.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;multiclass_calibration_curve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'''
    Args:
        probs (ndarray):
            NxM predicted probabilities for N examples and M classes.
        labels (ndarray):
            Vector of size N where each entry is an integer class label.
        bins (int):
            Number of bins to divide the prediction probabilities into.
    Returns:
        midpoints (ndarray):
            Midpoint value of each bin
        accuracies (ndarray):
            Fraction of examples that are positive in bin
        mean_confidences:
            Average predicted confidences in each bin
    '''&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;step_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;n_classes&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;labels_ohe&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;eye&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n_classes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mean_confidences&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;beg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;step_size&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;step_size&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;bin_mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;beg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;bin_cnt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bin_mask&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;astype&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;bin_confs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bin_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;bin_acc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels_ohe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bin_mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bin_cnt&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;beg&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;2.&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;mean_confidences&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bin_confs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bin_acc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_confidences&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;plot_multiclass_calibration_curve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;'''
    Plot calibration curve
    '''&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'Reliability Diagram'&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_confidences&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;multiclass_calibration_curve&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;probs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;labels&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bar&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bins&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;align&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'center'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lw&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'#000000'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'#2233aa'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;label&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Model'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zorder&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scatter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lw&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ec&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'black'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#ffffff&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zorder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'--'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lw&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alpha&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;.7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'gray'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;label&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'Perfectly calibrated'&lt;/span&gt;&lt;span 
class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;zorder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ylim&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xlabel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;confidence'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ylabel&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'accuracy&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;title&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xticks&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rotation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;legend&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'upper left'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tight_layout&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;midpoints&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;accuracies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean_confidences&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;http://people.cs.pitt.edu/~milos/research/AAAI_Calibration.pdf&quot;&gt;Obtaining Well Calibrated Probabilities Using Bayesian Binning&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arxiv.org/abs/1706.04599&quot;&gt;On Calibration of Modern Neural Networks&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://scikit-learn.org/stable/modules/generated/sklearn.calibration.calibration_curve.html&quot;&gt;Scikit Learn Calibration Curve&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</content>

 </entry>
 
 <entry>
   <title>Similarity search 101 - Part 2 (Fast retrieval with vp-trees)</title>
   <link href="https://everyhue.me/posts/similarity-search-101-with-vantage-point-trees/"/>
   <updated>2014-03-26T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/similarity-search-101-with-vantage-point-trees</id>
   

   <content type="html">&lt;blockquote&gt;
  &lt;p&gt;This is the 2nd article in a two-part series on similarity search. See &lt;a href=&quot;/posts/similarity-search-101-part-1/&quot;&gt;&lt;strong&gt;part 1&lt;/strong&gt;&lt;/a&gt; for an overview of the subject.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this second installment of my series on similarity search we’ll figure out how to improve on the speed and efficiency of querying our database for nearest neighbors using a data structure known as a “vantage point tree”.&lt;/p&gt;

&lt;p&gt;We previously used a brute force approach by computing pairwise distances between our query and all points in our dataset so that we could find items that were close to it.&lt;/p&gt;

&lt;p&gt;Unfortunately, with this technique every query takes &lt;script type=&quot;math/tex&quot;&gt;O( \# examples \times \# features)&lt;/script&gt; time, which is prohibitively expensive on even modestly sized datasets.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/K-d_tree&quot;&gt;Kd-trees&lt;/a&gt;, and more recently &lt;a href=&quot;https://en.wikipedia.org/wiki/Vantage-point_tree&quot;&gt;vantage point trees&lt;/a&gt; (a.k.a. vp-trees), have gained popularity within the machine learning community for their efficacy in reducing the computational cost of similarity search over large datasets.&lt;/p&gt;

&lt;p&gt;For this article, we’ll focus on examining how a vp-tree works.&lt;/p&gt;

&lt;h3 id=&quot;what-is-a-vantage-point-tree-and-how-do-we-construct-one&quot;&gt;What is a vantage point tree and how do we construct one?&lt;/h3&gt;

&lt;p&gt;In a nutshell, a vantage point tree structure allows us to store the elements of our dataset in such a way that at query time we can quickly exclude large portions of our data from examination, without having to perform any distance computations on the elements of those excluded portions.&lt;/p&gt;

&lt;p&gt;Let’s take a look at the basic structure of a vp-tree because it will allow us to understand how we can prune data from a search at query time.&lt;/p&gt;

&lt;p&gt;By definition, each node in a vp-tree stores at a minimum 5 pieces of information:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A list of elements sampled from our dataset&lt;/li&gt;
  &lt;li&gt;A vantage point element chosen randomly from the list of elements above&lt;/li&gt;
  &lt;li&gt;A distance called &lt;em&gt;mu&lt;/em&gt;&lt;/li&gt;
  &lt;li&gt;A “left” child node&lt;/li&gt;
  &lt;li&gt;A “right” child node&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I’ll explain soon how all of these components relate, but in the meantime here’s an illustration of the vp-tree concept:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/vptree.png&quot; alt=&quot;vantage point tree&quot; /&gt;&lt;/p&gt;

&lt;p&gt;At the root node of our tree, the list of elements consists of &lt;em&gt;every&lt;/em&gt; single item in our data set. From this list of items, we choose one element and designate it as our &lt;em&gt;vantage point&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;To choose &lt;script type=&quot;math/tex&quot;&gt;mu&lt;/script&gt;, we compute the median distance between our vantage point &lt;script type=&quot;math/tex&quot;&gt;vp&lt;/script&gt; and all other elements &lt;script type=&quot;math/tex&quot;&gt;P&lt;/script&gt; in the current node.&lt;/p&gt;

&lt;script type=&quot;math/tex; mode=display&quot;&gt;mu = median(\ dist(vp, p)\ )\ \ \ \forall p \in P&lt;/script&gt;

&lt;p&gt;All points within a distance &lt;script type=&quot;math/tex&quot;&gt;mu&lt;/script&gt; of the vantage point are assigned to the &lt;em&gt;left child node&lt;/em&gt;. Similarly, all points outside of &lt;script type=&quot;math/tex&quot;&gt;mu&lt;/script&gt; are assigned to the &lt;em&gt;right child node&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/left_right_child.png&quot; alt=&quot;elements for left and right child&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since &lt;script type=&quot;math/tex&quot;&gt;mu&lt;/script&gt; is the median distance between the vantage point and all other points, this procedure splits the elements roughly in half between the left and right child nodes.&lt;/p&gt;
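
&lt;p&gt;As a small illustration of the median split, here is a sketch on one-dimensional points. The data, the choice of vantage point, and the helper name split_by_median are made up for this example. Points that sit exactly at distance mu are sent to the right child, which is one reasonable convention rather than the only one.&lt;/p&gt;

```python
from statistics import median

def split_by_median(vp, others, distance):
    """Split `others` into (mu, left, right) around the vantage point `vp`."""
    dists = [distance(vp, e) for e in others]
    mu = median(dists)
    left = [e for e, d in zip(others, dists) if mu > d]    # strictly inside mu
    right = [e for e, d in zip(others, dists) if d >= mu]  # on or outside mu
    return mu, left, right

# Hypothetical 1-D data: vantage point 10, distance is the absolute difference.
mu, left, right = split_by_median(10, [7, 9, 12, 14, 18, 20], lambda a, b: abs(a - b))
print(mu, left, right)
```

&lt;p&gt;With six other points, three land on each side of the median distance, which is what keeps the tree balanced.&lt;/p&gt;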

&lt;p&gt;Finally, to construct the rest of the tree, we recursively follow this same procedure for each child node, until there are no more elements to assign to child nodes.&lt;/p&gt;

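&lt;p&gt;Here is a short Python sketch that builds the tree. It assumes a distance(a, b) function, such as the Euclidean distance from part 1, is already defined:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;random&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;statistics&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;VPNode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# all elements assigned to this node&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;            &lt;span class=&quot;c1&quot;&gt;# vantage point&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;             &lt;span class=&quot;c1&quot;&gt;# median distance from vp to the other elements&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left_child&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;    &lt;span class=&quot;c1&quot;&gt;# elements closer to vp than mu&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right_child&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# elements at distance mu or farther&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;build_vp_tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# no more elements to assign&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;VPNode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;randrange&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;others&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:]&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# everything except the vantage point&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;others&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;median&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;others&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;left_elements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;others&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;right_elements&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;others&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;e&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left_child&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;build_vp_tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left_elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right_child&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;build_vp_tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right_elements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;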

&lt;h3 id=&quot;nearest-neighbor-search-with-the-vantage-point-tree&quot;&gt;Nearest neighbor search with the vantage point tree&lt;/h3&gt;

&lt;p&gt;For a dataset encoded as a vantage point tree and a query point &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt;, how can we find the closest &lt;script type=&quot;math/tex&quot;&gt;k&lt;/script&gt; points in our dataset without running distance computations for every single element?&lt;/p&gt;

&lt;p&gt;One approach we could take is to say that for every &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt; there is some threshold distance &lt;script type=&quot;math/tex&quot;&gt;tau&lt;/script&gt; where &lt;em&gt;all&lt;/em&gt; of its closest &lt;script type=&quot;math/tex&quot;&gt;k&lt;/script&gt; neighbors are contained within this threshold. You can imagine this area as an enclosed circle as depicted below:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/querypoint.png&quot; alt=&quot;query point and tau&quot; /&gt;&lt;/p&gt;

&lt;p&gt;There are three scenarios for how this query-tau area can relate to any node within our vantage point tree.&lt;/p&gt;

&lt;h4 id=&quot;pruning-the-left-child-node&quot;&gt;Pruning the left child node&lt;/h4&gt;

&lt;p&gt;The first scenario is if the area lies &lt;em&gt;completely&lt;/em&gt; outside of our vantage-point-mu radius as depicted below. If this is the case, we can safely assume that if we are to find &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt;’s nearest neighbors we can forego looking in our node’s left child, which contains all elements within the mu radius of this vantage point.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/disjoint_vp_q.png&quot; alt=&quot;query-tau and vp-mu areas are disjoint&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;pruning-the-right-child-node&quot;&gt;Pruning the right child node&lt;/h4&gt;

&lt;p&gt;The next scenario is when the query-tau area lies &lt;em&gt;completely&lt;/em&gt; inside the bounds of the vantage point’s mu-radius (see below). In this case, we can ignore all points outside of &lt;script type=&quot;math/tex&quot;&gt;mu&lt;/script&gt; which we had conveniently assigned to the right child node.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/union_vp_q.png&quot; alt=&quot;query-tau area lies inside the vp-mu area&quot; /&gt;&lt;/p&gt;

&lt;h4 id=&quot;worst-case-we-check-both-left-and-right-child-nodes&quot;&gt;Worst case, we check both left and right child nodes&lt;/h4&gt;

&lt;p&gt;What happens when the query-tau area partially intersects with our node’s vantage point’s mu-radius?&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3016/intersect_vp_q.png&quot; alt=&quot;query-tau and vp-mu areas partially intersect&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In this scenario, we can’t say whether the right or left child contains the nearest neighbors, so we have to search both nodes.&lt;/p&gt;

&lt;h4 id=&quot;traversing-the-tree-to-find-nearest-neighbors&quot;&gt;Traversing the tree to find nearest neighbors&lt;/h4&gt;

&lt;p&gt;To summarize, when the query threshold area is &lt;em&gt;completely&lt;/em&gt; outside the bounds of our node’s vantage-point mu boundary, we can exclude or “prune” the left child node from our search space. When the query threshold area is &lt;em&gt;completely&lt;/em&gt; inside the bounds of the vantage-point mu boundary, we can safely ignore the right child node. And finally, when neither is the case, we must search both left and right child nodes.&lt;/p&gt;
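
&lt;p&gt;The three scenarios can be written as two comparisons on the distance d from the query to the vantage point. This is a minimal sketch that assumes the distance function obeys the triangle inequality; the helper name children_to_search is hypothetical, and it treats the left child as holding points strictly inside mu and the right child as holding the rest.&lt;/p&gt;

```python
def children_to_search(d, mu, tau):
    """Return which children may still hold a point within tau of the query.

    d   = distance from the query q to the node's vantage point
    mu  = the node's median radius
    tau = current search radius around q
    """
    search_left = mu + tau > d    # the q-tau area reaches inside the mu radius
    search_right = d >= mu - tau  # the q-tau area reaches outside the mu radius
    return search_left, search_right

print(children_to_search(d=10.0, mu=4.0, tau=3.0))  # q-tau area completely outside mu
print(children_to_search(d=1.0, mu=4.0, tau=2.0))   # q-tau area completely inside mu
print(children_to_search(d=4.5, mu=4.0, tau=2.0))   # partial intersection
```

&lt;p&gt;The first call prunes the left child, the second prunes the right child, and the third keeps both.&lt;/p&gt;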

&lt;p&gt;Now that we know how to behave when examining a single node, we can use this knowledge to find &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt;’s nearest neighbors by recursively shrinking &lt;script type=&quot;math/tex&quot;&gt;tau&lt;/script&gt; as we search down the vantage point tree.&lt;/p&gt;

&lt;p&gt;More concretely, we initialize &lt;script type=&quot;math/tex&quot;&gt;tau&lt;/script&gt; to be infinity. As we traverse from the root node down to each child node of the vp-tree, we shrink &lt;script type=&quot;math/tex&quot;&gt;tau&lt;/script&gt; to the distance from &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt; to the farthest of the &lt;script type=&quot;math/tex&quot;&gt;k&lt;/script&gt; nearest neighbors found so far. For a single nearest neighbor, this is simply the lesser of the distance from &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt; to &lt;script type=&quot;math/tex&quot;&gt;vp&lt;/script&gt; and any previously seen &lt;script type=&quot;math/tex&quot;&gt;tau&lt;/script&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;heapq&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;collections&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deque&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;find_nearest_neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;
    tree = the VP tree
    k    = # of nearest neighbors you want to find
    q    = query point
    &quot;&quot;&quot;&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;tau&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'inf'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;nodes_to_visit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;deque&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tree&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# max-heap of (-distance, tiebreaker, element) holding the k nearest&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# neighbors seen so far; neighbors[0] is the farthest of them&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nodes_to_visit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nodes_to_visit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;popleft&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# empty child, nothing to examine&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tau&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# store node.vp as a neighbor if it's closer than the k-th&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# nearest point seen so far&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;heapq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;heappush&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;heapq&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;heappop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# drop the farthest candidate&lt;/span&gt;

            &lt;span class=&quot;c1&quot;&gt;# shrink tau, but only once we hold k candidates&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;tau&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;][&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;# check for intersection between q-tau and vp-mu regions&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;# and see which branches we absolutely must search&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tau&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nodes_to_visit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;left_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mu&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tau&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nodes_to_visit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;right_child&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;# sorted from closest to farthest neighbor&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vp&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;sorted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;neighbors&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;reverse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
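
&lt;p&gt;One way to gain confidence in a vp-tree search is to compare its answers against the brute force scan from part 1. The sketch below is a compact, self-contained variant written for that purpose, not the implementation linked from this article: it stores each node as a nested tuple, searches recursively instead of using a queue, uses math.dist (Python 3.8+) as the distance, and assumes k is at least 1. The names build and knn, and the random test data, are hypothetical.&lt;/p&gt;

```python
import heapq
import math
import random
from statistics import median

def build(points):
    """Return a nested tuple (vp, mu, left, right), or None for no points."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(random.randrange(len(points)))
    if not points:
        return (vp, 0.0, None, None)
    dists = [math.dist(vp, p) for p in points]
    mu = median(dists)
    left = [p for p, d in zip(points, dists) if mu > d]    # inside the mu radius
    right = [p for p, d in zip(points, dists) if d >= mu]  # on or outside it
    return (vp, mu, build(left), build(right))

def knn(tree, q, k):
    """Return the k points closest to q, nearest first (assumes k >= 1)."""
    heap = []  # max-heap via negated distances: (-distance, point)

    def tau():
        # Search radius: distance to the k-th best so far, infinite until we hold k.
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node):
        if node is None:
            return
        vp, mu, left, right = node
        d = math.dist(q, vp)
        if tau() > d:
            heapq.heappush(heap, (-d, vp))
            if len(heap) > k:
                heapq.heappop(heap)  # drop the farthest candidate
        if mu + tau() > d:           # query area reaches inside the mu radius
            visit(left)
        if d >= mu - tau():          # query area reaches outside the mu radius
            visit(right)

    visit(tree)
    return [p for _, p in sorted(heap, reverse=True)]

# Compare against a brute force scan on random 2-D points.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(300)]
tree = build(points)
q = (0.5, 0.5)
brute_force = sorted(points, key=lambda p: math.dist(q, p))[:5]
print(knn(tree, q, 5) == brute_force)
```

&lt;p&gt;Because tau is re-read before each branch decision, results found in the left subtree can tighten the radius before the right subtree is considered.&lt;/p&gt;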

&lt;p&gt;Here is the full source code for my &lt;a href=&quot;https://github.com/huyng/algorithms/tree/master/vptree&quot;&gt;python implementation of a vantage point tree&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://www.cs.iastate.edu/~honavar/nndatastructures.pdf&quot;&gt;Data Structures and Algorithms for Nearest Neighbor Search in Metric Space&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://www1.cs.columbia.edu/CAVE/publications/pdfs/Kumar_ECCV08_2.pdf&quot;&gt;What is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images?&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

</content>

 </entry>
 
 <entry>
   <title>Similarity search 101 - Part 1 (overview)</title>
   <link href="https://everyhue.me/posts/similarity-search-101-part-1/"/>
   <updated>2014-03-19T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/similarity-search-101-part-1</id>
   

   <content type="html">&lt;p&gt;In my work applying machine learning technologies, finding “similar” items is  one of the most common challenges that people come to my team with:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;“Can you find me the image that looks like this other image?”&lt;/li&gt;
  &lt;li&gt;“Which webpage is similar to this webpage?”&lt;/li&gt;
  &lt;li&gt;“Which song matches this audio clip I’ve recorded?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To more formally define the &lt;em&gt;similarity task&lt;/em&gt;, let’s assume we have a big database of items. Solving this task means that when we are given a new item – let’s call it the &lt;em&gt;query&lt;/em&gt; – our algorithm is able to locate within the database the closest or most similar items to the query.&lt;/p&gt;

&lt;p&gt;Whether we’re dealing with pictures, audio clips, faces, documents, or DNA sequences, we can approach solving this problem using a common framework which I briefly summarize below.&lt;/p&gt;

&lt;h3 id=&quot;feature-extraction&quot;&gt;Feature extraction&lt;/h3&gt;

&lt;p&gt;First we need to know how to “represent” each item in our data set as a series of numbers. This process is often called “feature extraction” or “feature representation”.&lt;/p&gt;

&lt;p&gt;There is a large field of research focused on finding good feature representations for all kinds of data. As a first step in building our similarity search engine, though, we can simply take the raw signals themselves and convert them verbatim into our “feature vector”, even if this may not yield the best results in terms of discounting noise in our data.&lt;/p&gt;

&lt;p&gt;So for example, if our dataset is made up of 16 x 16 grayscale images, we would take the pixel values of each row and store them as a single array of 256 numbers.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3015/extract-features.png&quot; alt=&quot;feature extraction&quot; /&gt;&lt;/p&gt;
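
&lt;p&gt;As a minimal sketch of this row-by-row flattening, the snippet below turns a grid of pixel values into a flat list. The function name &lt;code class=&quot;highlighter-rouge&quot;&gt;extract_features&lt;/code&gt; and the synthetic image are assumptions made for illustration; a real pipeline would load the pixel data from an image file.&lt;/p&gt;

```python
def extract_features(image):
    """Flatten a 2-D grid of pixel values into a 1-D feature vector, row by row."""
    return [pixel for row in image for pixel in row]


# Hypothetical 16 x 16 grayscale image: the pixel at (row, col) has value row * 16 + col.
image = [[row * 16 + col for col in range(16)] for row in range(16)]

features = extract_features(image)
print(len(features))  # 256
```

&lt;p&gt;Every item in the dataset has to go through the same routine so that all feature vectors end up with the same length and ordering.&lt;/p&gt;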

&lt;h3 id=&quot;defining-a-similarity-metric&quot;&gt;Defining a similarity metric&lt;/h3&gt;

&lt;p&gt;Once we have a feature vector for every item in our dataset, we need a way of comparing one feature vector to another. If we view each feature vector as a point in some n-dimensional space, we can use the Euclidean distance between two points &lt;script type=&quot;math/tex&quot;&gt;p&lt;/script&gt; and &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt; as a similarity metric, where a smaller distance means the two items are more similar:&lt;/p&gt;

&lt;script type=&quot;math/tex; mode=display&quot;&gt;distance = \sqrt{\sum_{i=1}^n (q_i-p_i)^2}&lt;/script&gt;
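
&lt;p&gt;One way to write this formula in Python is sketched below. The name &lt;code class=&quot;highlighter-rouge&quot;&gt;compute_distance&lt;/code&gt; is borrowed from the search routine in the next section; the length check is an added assumption, since the formula only makes sense when both vectors have the same number of dimensions.&lt;/p&gt;

```python
import math


def compute_distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


print(compute_distance([0, 0], [3, 4]))  # 5.0
```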

&lt;h3 id=&quot;brute-force-similarity-search&quot;&gt;Brute force similarity search&lt;/h3&gt;

&lt;p&gt;With the similarity metric and a feature extraction routine in place, we pretty much have a complete working similarity search system, albeit an inefficient one. Given a query &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt;, we can find the closest item to &lt;script type=&quot;math/tex&quot;&gt;q&lt;/script&gt; in dataset &lt;script type=&quot;math/tex&quot;&gt;D&lt;/script&gt; using the following routine:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;find_nearest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;nearest_neighbor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;
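    &lt;span class=&quot;n&quot;&gt;min_distance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;inf&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;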
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;D&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compute_distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;q&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;min_distance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;min_distance&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;distance&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;nearest_neighbor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;nearest_neighbor&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The approach above runs in &lt;script type=&quot;math/tex&quot;&gt;O(m \times n)&lt;/script&gt; time, where &lt;script type=&quot;math/tex&quot;&gt;m&lt;/script&gt; is the number of items in our dataset &lt;script type=&quot;math/tex&quot;&gt;D&lt;/script&gt; and &lt;script type=&quot;math/tex&quot;&gt;n&lt;/script&gt; is the number of dimensions in our feature vector. So even for a modest database of 1000 items and a feature vector of 256 dimensions, every single query costs on the order of 256 thousand arithmetic operations!&lt;/p&gt;
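
&lt;p&gt;To make the &lt;script type=&quot;math/tex&quot;&gt;m \times n&lt;/script&gt; cost concrete, here is a hedged sketch that runs the same linear scan over random data and counts one operation per dimension per comparison. The counter, the random dataset, and the use of &lt;code class=&quot;highlighter-rouge&quot;&gt;min()&lt;/code&gt; in place of the explicit loop are illustrative assumptions, not part of the routine above.&lt;/p&gt;

```python
import math
import random

coordinate_ops = 0  # one subtract-and-square per dimension per comparison


def compute_distance(p, q):
    """Euclidean distance that also tallies how many dimensions it touched."""
    global coordinate_ops
    coordinate_ops += len(p)
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def find_nearest(q, D):
    # min() still visits every item in D once, so this is the same linear scan.
    return min(D, key=lambda p: compute_distance(p, q))


random.seed(0)
m, n = 1000, 256  # items in the dataset, dimensions per feature vector
D = [[random.random() for _ in range(n)] for _ in range(m)]
q = [random.random() for _ in range(n)]

nearest = find_nearest(q, D)
print(coordinate_ops)  # 256000, i.e. m * n
```

&lt;p&gt;Doubling either the number of items or the number of dimensions doubles the count, which is why a brute force scan becomes expensive as the database grows.&lt;/p&gt;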

&lt;h3 id=&quot;to-be-continued-&quot;&gt;To be continued …&lt;/h3&gt;

&lt;p&gt;In the next part of this series, we’ll take a look at using &lt;a href=&quot;https://en.wikipedia.org/wiki/Vantage-point_tree&quot;&gt;vantage point trees&lt;/a&gt; to see how we can more efficiently store our data set so that it’s less computationally intensive to search over.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/posts/similarity-search-101-with-vantage-point-trees/&quot;&gt;&lt;strong&gt;continue to part 2 →&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</content>

 </entry>
 
 <entry>
   <title>Faster Numpy Dot Product for Multi-dimensional Arrays</title>
   <link href="https://everyhue.me/posts/faster-numpy-dot-product/"/>
   <updated>2013-12-14T00:00:00-08:00</updated>
   <id>https://everyhue.me/posts/faster-numpy-dot-product</id>
   

   <content type="html">&lt;p&gt;When installed and linked correctly against a BLAS implementation like ATLAS or
OpenBLAS, numpy’s &lt;code class=&quot;highlighter-rouge&quot;&gt;dot&lt;/code&gt; product can run incredibly fast. What I hadn’t known until recently is that for some special cases, you can perform dot products an order of magnitude faster just by calling the underlying BLAS routines directly.&lt;/p&gt;

&lt;p&gt;In order to do so, I’ve written a module called &lt;a href=&quot;https://gist.github.com/huyng/7969327#file-fastdot-py&quot;&gt;fastdot&lt;/a&gt;, whose source code is located in a github &lt;a href=&quot;https://gist.github.com/huyng/7969327#file-fastdot-py&quot;&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The code, in a nutshell, is a wrapper around BLAS’s general matrix-matrix multiplication routines (a.k.a. “gemm”). Its main job is to ensure that the arrays passed into it are in Fortran-contiguous order before handing off the bulk of the work to the underlying BLAS routines.&lt;/p&gt;

&lt;h3 id=&quot;performance-comparison-between-dot-product-implementations&quot;&gt;Performance comparison between dot product implementations&lt;/h3&gt;

&lt;p&gt;I ran a few benchmarks and plotted below the speed differences between &lt;code class=&quot;highlighter-rouge&quot;&gt;fastdot.dot&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;np.dot&lt;/code&gt; operating on matrices of varying dimensions and sizes.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3014/dot_time.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;A shape            B shape     array dims    function       time (s)
-----------------  ----------  ------------  -----------  ----------
(8, 55, 55, 1024)  (1024, 92)  4d * 2d       np.dot           2.0168
(440, 55, 1024)    (1024, 92)  3d * 2d       np.dot           1.9883
(24200, 1024)      (1024, 92)  2d * 2d       np.dot           0.0944

(8, 55, 55, 1024)  (1024, 92)  4d * 2d       fastdot.dot      0.1187
(440, 55, 1024)    (1024, 92)  3d * 2d       fastdot.dot      0.0940
(24200, 1024)      (1024, 92)  2d * 2d       fastdot.dot      0.1342&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;As you can see, &lt;code class=&quot;highlighter-rouge&quot;&gt;fastdot.dot&lt;/code&gt; runs roughly &lt;strong&gt;20X&lt;/strong&gt; faster than &lt;code class=&quot;highlighter-rouge&quot;&gt;np.dot&lt;/code&gt; when one of your arrays has 3 or more dimensions. But before you start using it everywhere you need a dot product, note that in this benchmark it was about 40% slower than numpy’s implementation on plain 2-D matrices (0.1342 s versus 0.0944 s).&lt;/p&gt;

&lt;p&gt;As a rule of thumb, use fastdot when you know ahead of time that at least one of your arrays will have 3 or more dimensions. It could increase your program’s performance by 20x. Otherwise stick with numpy’s implementation.&lt;/p&gt;

</content>

 </entry>
 
 <entry>
   <title>The JSON Streaming Record (JSRec) data format</title>
   <link href="https://everyhue.me/posts/json-streaming-record-data-format/"/>
   <updated>2013-10-30T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/json-streaming-record-data-format</id>
   
   <category term="programming" />
   
   <category term="python" />
   

   <content type="html">&lt;p&gt;I’m inventing a new data format (as blasphemous as it sounds).
It’s flexible, human readable, easy to produce, and best of
 all, nearly impossible to screw up parsing.&lt;/p&gt;

&lt;p&gt;I’m doing it to replace CSV files because you shouldn’t have to
worry about quoting, escaping, or discovering that your “comma” separated
values are really delimited by semicolons or, even worse, tabs.&lt;/p&gt;

&lt;p&gt;The new data format is called &lt;strong&gt;json streaming record&lt;/strong&gt; or &lt;strong&gt;JSRec&lt;/strong&gt; for short.
While I say it’s “new”, I’m sure many of you have either produced or
consumed this type of data already at some point in your career.&lt;/p&gt;

&lt;p&gt;Here’s how it’s defined:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Files of this format have &lt;strong&gt;.jsrec&lt;/strong&gt; as their file extension&lt;/li&gt;
  &lt;li&gt;Each line in the file is a single JSON object (hash map)&lt;/li&gt;
  &lt;li&gt;Empty lines and lines beginning with ‘&lt;strong&gt;#&lt;/strong&gt;’ are considered comments and ignored during parsing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s an example file, foobar.jsrec:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;{&quot;foo&quot;:1, &quot;bar&quot;: &quot;marry&quot;}
{&quot;foo&quot;:11, &quot;bar&quot;: &quot;had a&quot;}
{&quot;foo&quot;:21, &quot;bar&quot;: &quot;little lamb&quot;}

# some comments
{&quot;foo&quot;:33, &quot;bar&quot;: &quot;more data&quot;}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Here’s the code to parse and encode this data format:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;json&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;load_jsrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;loads a .jsrec file&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;# skip empty lines and comments&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;startswith&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;#&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;continue&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;loads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;line&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;dump_jsrec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;writes a .jsrec file&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filename&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;w&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rec&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;records&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;fh&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;json&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dumps&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rec&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;a-new-use-case-building-data-processing-pipelines-with-jsrec&quot;&gt;A new use case: building data processing pipelines with JSRec&lt;/h3&gt;

&lt;p&gt;Because each line in a JSRec file contains all the information necessary
to parse a record, you can use it to pipe output from one program to another:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cat foobar.jsrec | progA | progB
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As long as the programs you’re using understand JSRec, you can start chaining
them together. This is HUGE because it makes building data processing pipelines
on the command line a simple, modular task.&lt;/p&gt;
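
&lt;p&gt;As a rough sketch of what one stage in such a pipeline might look like, the snippet below reads JSRec lines from one stream, transforms each record, and writes JSRec lines to another. The function name &lt;code class=&quot;highlighter-rouge&quot;&gt;uppercase_bar&lt;/code&gt; and the transformation itself are made up for illustration, and in-memory streams stand in for the pipe.&lt;/p&gt;

```python
import io
import json


def uppercase_bar(infile, outfile):
    """Hypothetical pipeline stage: upper-case the "bar" field of every record."""
    for line in infile:
        line = line.strip()
        # Empty lines and comments carry no record, so they are dropped.
        if line == "" or line.startswith("#"):
            continue
        record = json.loads(line)
        record["bar"] = str(record.get("bar", "")).upper()
        outfile.write(json.dumps(record) + "\n")


# In-memory stand-ins for the incoming and outgoing pipe.
source = io.StringIO(
    '{"foo": 1, "bar": "marry"}\n'
    "# some comments\n"
    "\n"
    '{"foo": 11, "bar": "had a"}\n'
)
sink = io.StringIO()
uppercase_bar(source, sink)
print(sink.getvalue(), end="")
```

&lt;p&gt;Passing &lt;code class=&quot;highlighter-rouge&quot;&gt;sys.stdin&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;sys.stdout&lt;/code&gt; instead of the in-memory streams would let a script like this sit in the middle of a shell pipeline. Because it handles one line at a time, it never needs to hold the whole file in memory.&lt;/p&gt;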

&lt;h3 id=&quot;when-to-use-it&quot;&gt;When to use it&lt;/h3&gt;

&lt;p&gt;JSON Streaming Record is an ideal replacement for CSV files. Use it when you want a
data format that can store “streams” of records that are human readable yet easily
parsed by a machine.&lt;/p&gt;

&lt;p&gt;Because each line is a completely self-contained JSON object, JSRec
allows you to produce and consume data incrementally. I encourage you to start
using it as the format you pass around on the command line when you’re
building those data processing pipelines.&lt;/p&gt;

</content>

 </entry>
 
 <entry>
   <title>A Guide to Analyzing Python Performance</title>
   <link href="https://everyhue.me/posts/python-performance-analysis/"/>
   <updated>2013-09-03T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/python-performance-analysis</id>
   
   <category term="programming" />
   
   <category term="python" />
   

   <content type="html">&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#introduction&quot; id=&quot;markdown-toc-introduction&quot;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#coarse-grain-timing-with-time&quot; id=&quot;markdown-toc-coarse-grain-timing-with-time&quot;&gt;Coarse grain timing with time&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#fine-grain-timing-with-a-timing-context-manager&quot; id=&quot;markdown-toc-fine-grain-timing-with-a-timing-context-manager&quot;&gt;Fine grain timing with a timing context manager&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#line-by-line-timing-and-execution-frequency-with-a-profiler&quot; id=&quot;markdown-toc-line-by-line-timing-and-execution-frequency-with-a-profiler&quot;&gt;Line-by-line timing and execution frequency with a profiler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#how-much-memory-does-it-use&quot; id=&quot;markdown-toc-how-much-memory-does-it-use&quot;&gt;How much memory does it use?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#ipython-shortcuts-for-line_profiler-and-memory_profiler&quot; id=&quot;markdown-toc-ipython-shortcuts-for-line_profiler-and-memory_profiler&quot;&gt;IPython shortcuts for line_profiler and memory_profiler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#wheres-the-memory-leak&quot; id=&quot;markdown-toc-wheres-the-memory-leak&quot;&gt;Where’s the memory leak?&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#which-objects-are-the-most-common&quot; id=&quot;markdown-toc-which-objects-are-the-most-common&quot;&gt;Which objects are the most common?&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#which-objects-have-been-added-or-deleted&quot; id=&quot;markdown-toc-which-objects-have-been-added-or-deleted&quot;&gt;Which objects have been added or deleted?&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#what-is-referencing-this-leaky-object&quot; id=&quot;markdown-toc-what-is-referencing-this-leaky-object&quot;&gt;What is referencing this leaky object?&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#effort-vs-precision&quot; id=&quot;markdown-toc-effort-vs-precision&quot;&gt;Effort vs precision&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#refrences&quot; id=&quot;markdown-toc-refrences&quot;&gt;References&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;introduction&quot;&gt;Introduction&lt;/h3&gt;

&lt;p&gt;Not every Python program you write will require a rigorous performance analysis, but it is reassuring to know that Python’s ecosystem offers a wide variety of tools to turn to when the need arises.&lt;/p&gt;

&lt;p&gt;Analyzing a program’s performance boils down to answering 4 basic questions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;How fast is it running?&lt;/li&gt;
  &lt;li&gt;Where are the speed bottlenecks?&lt;/li&gt;
  &lt;li&gt;How much memory is it using?&lt;/li&gt;
  &lt;li&gt;Where is memory leaking?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below, we’ll dive into the details of answering these questions using some awesome tools.&lt;/p&gt;

&lt;h3 id=&quot;coarse-grain-timing-with-time&quot;&gt;Coarse grain timing with time&lt;/h3&gt;

&lt;p&gt;Let’s begin by using a quick and dirty method of timing our code: the good old unix utility &lt;code class=&quot;highlighter-rouge&quot;&gt;time&lt;/code&gt;.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;time &lt;/span&gt;python yourprogram.py

real    0m1.028s
user    0m0.001s
sys     0m0.003s&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The meanings of the three output measurements are detailed in this &lt;a href=&quot;http://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1&quot;&gt;Stack Overflow answer&lt;/a&gt;, but in short:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;real - refers to the actual elapsed (wall clock) time&lt;/li&gt;
  &lt;li&gt;user - refers to the amount of CPU time spent outside of the kernel&lt;/li&gt;
  &lt;li&gt;sys - refers to the amount of CPU time spent inside kernel-specific functions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can get a sense of how much CPU time your program used, regardless of other programs running on the system, by adding together the &lt;em&gt;sys&lt;/em&gt; and &lt;em&gt;user&lt;/em&gt; times.&lt;/p&gt;

&lt;p&gt;If the sum of &lt;em&gt;sys&lt;/em&gt; and &lt;em&gt;user&lt;/em&gt; times is much less than &lt;em&gt;real&lt;/em&gt; time, then most of your program’s performance issues are likely related to IO waits.&lt;/p&gt;
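
&lt;p&gt;The same comparison can be made from inside Python. The sketch below assumes Python 3, where &lt;code class=&quot;highlighter-rouge&quot;&gt;time.process_time()&lt;/code&gt; reports the combined user and system CPU time of the current process and &lt;code class=&quot;highlighter-rouge&quot;&gt;time.perf_counter()&lt;/code&gt; reports wall clock time. The helper name &lt;code class=&quot;highlighter-rouge&quot;&gt;measure&lt;/code&gt; is hypothetical, and a call to &lt;code class=&quot;highlighter-rouge&quot;&gt;time.sleep&lt;/code&gt; stands in for a program that spends its time waiting.&lt;/p&gt;

```python
import time


def measure(func):
    """Return (wall clock seconds, CPU seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping uses almost no CPU, much like a program blocked on IO.
wall, cpu = measure(lambda: time.sleep(0.2))
print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

&lt;p&gt;Here the wall clock time comes out far larger than the CPU time, which is the same signature that points to IO waits in the output of &lt;code class=&quot;highlighter-rouge&quot;&gt;time&lt;/code&gt;.&lt;/p&gt;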

&lt;h3 id=&quot;fine-grain-timing-with-a-timing-context-manager&quot;&gt;Fine grain timing with a timing context manager&lt;/h3&gt;

&lt;p&gt;Our next technique involves direct instrumentation of the code to get access to finer grain timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;timer.py&lt;/code&gt;&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;time&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Timer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;False&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__enter__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__exit__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;
        &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msecs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# millisecs
&lt;/span&gt;        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;verbose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'elapsed time: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;f ms'&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;msecs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;In order to use it, wrap blocks of code that you want to time with Python’s &lt;code class=&quot;highlighter-rouge&quot;&gt;with&lt;/code&gt; keyword and this &lt;code class=&quot;highlighter-rouge&quot;&gt;Timer&lt;/code&gt; context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.&lt;/p&gt;

&lt;p&gt;Here’s an example use of the snippet:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;timer&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Timer&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;redis&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Redis&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rdb&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Redis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Timer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rdb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lpush&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;foo&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;bar&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;=&amp;gt; elapsed lpush: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s s&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Timer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;rdb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lpop&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;foo&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;=&amp;gt; elapsed lpop: &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;s s&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;secs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.&lt;/p&gt;
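
&lt;p&gt;One possible way to set up that kind of logging is sketched below using the standard &lt;code class=&quot;highlighter-rouge&quot;&gt;logging&lt;/code&gt; module. The helper &lt;code class=&quot;highlighter-rouge&quot;&gt;logged_timer&lt;/code&gt;, the logger name, and the log file location are all assumptions chosen for illustration, and the timer is written with &lt;code class=&quot;highlighter-rouge&quot;&gt;contextlib.contextmanager&lt;/code&gt; rather than a class to keep the example short.&lt;/p&gt;

```python
import contextlib
import logging
import os
import tempfile
import time


@contextlib.contextmanager
def logged_timer(label, logger):
    """Time the enclosed block and write the result to the given logger."""
    start = time.time()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed runs are still recorded.
        logger.info("%s took %.3f ms", label, (time.time() - start) * 1000)


# Hypothetical log location; point this wherever your project keeps its logs.
log_path = os.path.join(tempfile.gettempdir(), "timings.log")
logger = logging.getLogger("timings")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
logger.addHandler(handler)

with logged_timer("sum of squares", logger):
    total = sum(i * i for i in range(100000))
handler.flush()
```

&lt;p&gt;Each run appends a timestamped line to the file, so comparing entries across runs shows whether a given block is getting faster or slower.&lt;/p&gt;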

&lt;h3 id=&quot;line-by-line-timing-and-execution-frequency-with-a-profiler&quot;&gt;Line-by-line timing and execution frequency with a profiler&lt;/h3&gt;

&lt;p&gt;Robert Kern has a nice project called &lt;a href=&quot;http://packages.python.org/line_profiler/&quot;&gt;line_profiler&lt;/a&gt; which I often use to see how fast and how often each line of code is running in my scripts.&lt;/p&gt;

&lt;p&gt;To use it, you’ll need to install the python package via pip:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;line_profiler&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Once installed you’ll have access to a new module called “line_profiler” as well as an executable script “kernprof.py”.&lt;/p&gt;

&lt;p&gt;To use this tool, first modify your source code by decorating the function you want to measure with the &lt;code class=&quot;highlighter-rouge&quot;&gt;@profile&lt;/code&gt; decorator. Don’t worry, you don’t have to import anything in order to use this decorator. The &lt;code class=&quot;highlighter-rouge&quot;&gt;kernprof.py&lt;/code&gt; script automatically injects it into your script’s runtime during execution.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;primes.py&lt;/code&gt;&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;profile&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;primes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;mroot&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;**&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.5&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;half&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mroot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;//&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;half&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;primes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Once you’ve set up your code with the &lt;code class=&quot;highlighter-rouge&quot;&gt;@profile&lt;/code&gt; decorator, use &lt;code class=&quot;highlighter-rouge&quot;&gt;kernprof.py&lt;/code&gt; to run your script.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;kernprof.py &lt;span class=&quot;nt&quot;&gt;-l&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-v&lt;/span&gt; primes.py&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;The &lt;code class=&quot;highlighter-rouge&quot;&gt;-l&lt;/code&gt; option tells kernprof to inject the &lt;code class=&quot;highlighter-rouge&quot;&gt;@profile&lt;/code&gt; decorator into your script’s builtins, and &lt;code class=&quot;highlighter-rouge&quot;&gt;-v&lt;/code&gt; tells kernprof to display timing information once your script finishes. Here’s what the output should look like for the above script:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Wrote profile results to primes.py.lprof
Timer unit: 1e-06 s

File: primes.py
Function: primes at line 2
Total time: 0.00019 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @profile
     3                                           def primes(n):
     4         1            2      2.0      1.1      if n==2:
     5                                                   return [2]
     6         1            1      1.0      0.5      elif n&amp;lt;2:
     7                                                   return []
     8         1            4      4.0      2.1      s=range(3,n+1,2)
     9         1           10     10.0      5.3      mroot = n ** 0.5
    10         1            2      2.0      1.1      half=(n+1)/2-1
    11         1            1      1.0      0.5      i=0
    12         1            1      1.0      0.5      m=3
    13         5            7      1.4      3.7      while m &amp;lt;= mroot:
    14         4            4      1.0      2.1          if s[i]:
    15         3            4      1.3      2.1              j=(m*m-3)/2
    16         3            4      1.3      2.1              s[j]=0
    17        31           31      1.0     16.3              while j&amp;lt;half:
    18        28           28      1.0     14.7                  s[j]=0
    19        28           29      1.0     15.3                  j+=m
    20         4            4      1.0      2.1          i=i+1
    21         4            4      1.0      2.1          m=2*i+3
    22        50           54      1.1     28.4      return [2]+[x for x in s if x]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Look for lines with a high hit count or a large share of the total time (the &lt;code class=&quot;highlighter-rouge&quot;&gt;% Time&lt;/code&gt; column). These are the areas where optimizations can yield the greatest improvements.&lt;/p&gt;

&lt;h3 id=&quot;how-much-memory-does-it-use&quot;&gt;How much memory does it use?&lt;/h3&gt;

&lt;p&gt;Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice &lt;a href=&quot;https://github.com/fabianp/memory_profiler&quot;&gt;memory profiler&lt;/a&gt; modeled after Robert Kern’s line_profiler.&lt;/p&gt;

&lt;p&gt;First install it via pip:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip &lt;span class=&quot;nb&quot;&gt;install&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-U&lt;/span&gt; memory_profiler
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;psutil&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;(Installing the &lt;code class=&quot;highlighter-rouge&quot;&gt;psutil&lt;/code&gt; package here is recommended because it greatly improves the performance of the memory_profiler).&lt;/p&gt;

&lt;p&gt;Like line_profiler, memory_profiler requires that you decorate your function of interest with an &lt;code class=&quot;highlighter-rouge&quot;&gt;@profile&lt;/code&gt; decorator like so:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;profile&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;primes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To see how much memory your function uses run the following:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python &lt;span class=&quot;nt&quot;&gt;-m&lt;/span&gt; memory_profiler primes.py&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;You should see output that looks like this once your program exits:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Filename: primes.py

Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3    7.9219 MB  0.0000 MB   def primes(n):
     4    7.9219 MB  0.0000 MB       if n==2:
     5                                   return [2]
     6    7.9219 MB  0.0000 MB       elif n&amp;lt;2:
     7                                   return []
     8    7.9219 MB  0.0000 MB       s=range(3,n+1,2)
     9    7.9258 MB  0.0039 MB       mroot = n ** 0.5
    10    7.9258 MB  0.0000 MB       half=(n+1)/2-1
    11    7.9258 MB  0.0000 MB       i=0
    12    7.9258 MB  0.0000 MB       m=3
    13    7.9297 MB  0.0039 MB       while m &amp;lt;= mroot:
    14    7.9297 MB  0.0000 MB           if s[i]:
    15    7.9297 MB  0.0000 MB               j=(m*m-3)/2
    16    7.9258 MB -0.0039 MB               s[j]=0
    17    7.9297 MB  0.0039 MB               while j&amp;lt;half:
    18    7.9297 MB  0.0000 MB                   s[j]=0
    19    7.9297 MB  0.0000 MB                   j+=m
    20    7.9297 MB  0.0000 MB           i=i+1
    21    7.9297 MB  0.0000 MB           m=2*i+3
    22    7.9297 MB  0.0000 MB       return [2]+[x for x in s if x]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;ipython-shortcuts-for-line_profiler-and-memory_profiler&quot;&gt;IPython shortcuts for line_profiler and memory_profiler&lt;/h3&gt;

&lt;p&gt;A little known feature of &lt;code class=&quot;highlighter-rouge&quot;&gt;line_profiler&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;memory_profiler&lt;/code&gt; is that both programs have shortcut commands accessible from within IPython. All you have to do is type the following within an IPython session:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;%load_ext memory_profiler
%load_ext line_profiler
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Upon doing so you’ll have access to the magic commands &lt;code class=&quot;highlighter-rouge&quot;&gt;%lprun&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;%mprun&lt;/code&gt; which behave similarly to their command-line counterparts. The major difference here is that you won’t need to decorate your to-be-profiled functions with the &lt;code class=&quot;highlighter-rouge&quot;&gt;@profile&lt;/code&gt; decorator. Just go ahead and run the profiling directly within your IPython session like so:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;In [1]: from primes import primes
In [2]: %mprun -f primes primes(1000)
In [3]: %lprun -f primes primes(1000)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This can save you a lot of time and effort since none of your source code needs to be modified in order to use these profiling commands.&lt;/p&gt;

&lt;h3 id=&quot;wheres-the-memory-leak&quot;&gt;Where’s the memory leak?&lt;/h3&gt;

&lt;p&gt;The CPython interpreter uses reference counting as its main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere, and decremented when a reference to it is deleted. When the counter reaches zero, the CPython interpreter knows that the object is no longer in use, so it deletes the object and deallocates the occupied memory.&lt;/p&gt;
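
&lt;p&gt;Here is a minimal sketch of that counter at work, using the standard library’s &lt;code class=&quot;highlighter-rouge&quot;&gt;sys.getrefcount()&lt;/code&gt;. The &lt;code class=&quot;highlighter-rouge&quot;&gt;Payload&lt;/code&gt; class is a made-up stand-in, and the absolute numbers vary by interpreter, so only the differences between the readings matter:&lt;/p&gt;

```python
import sys

class Payload(object):
    """Hypothetical stand-in for any object whose lifetime we want to watch."""

obj = Payload()
base = sys.getrefcount(obj)          # includes the temporary reference held by the call itself

holder = [obj]                       # storing a reference increments the counter
after_store = sys.getrefcount(obj)

del holder                           # dropping that reference decrements it again
after_delete = sys.getrefcount(obj)

print(base, after_store, after_delete)
```

&lt;p&gt;On CPython the middle reading should be exactly one higher than the other two.&lt;/p&gt;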

&lt;p&gt;A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.&lt;/p&gt;
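
&lt;p&gt;As a rough illustration of such a lingering reference, the sketch below parks an object in a module-level list and uses a weak reference to check whether it is still alive. The names &lt;code class=&quot;highlighter-rouge&quot;&gt;Report&lt;/code&gt;, &lt;code class=&quot;highlighter-rouge&quot;&gt;_cache&lt;/code&gt; and &lt;code class=&quot;highlighter-rouge&quot;&gt;process&lt;/code&gt; are invented for this example:&lt;/p&gt;

```python
import gc
import weakref

class Report(object):
    """Hypothetical object we expect to be freed once process() returns."""

_cache = []                          # hypothetical module-level list that quietly keeps references

def process():
    report = Report()
    _cache.append(report)            # the "leak": this reference outlives the function call
    return weakref.ref(report)       # a weak reference does not keep the object alive

probe = process()
gc.collect()
still_alive = probe() is not None    # _cache still holds the object

_cache[:] = []                       # release the lingering reference
gc.collect()
freed = probe() is None              # nothing references the object any more

print(still_alive, freed)
```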

&lt;p&gt;The quickest way to find these “memory leaks” is to use an awesome tool called &lt;a href=&quot;http://mg.pov.lt/objgraph/&quot;&gt;objgraph&lt;/a&gt; written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate all the different places in your code that hold references to these objects.&lt;/p&gt;

&lt;p&gt;To get started, first install &lt;code class=&quot;highlighter-rouge&quot;&gt;objgraph&lt;/code&gt;:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;pip &lt;span class=&quot;nb&quot;&gt;install &lt;/span&gt;objgraph&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;Once you have this tool installed, insert into your code a statement to invoke the debugger:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pdb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pdb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_trace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h5 id=&quot;which-objects-are-the-most-common&quot;&gt;Which objects are the most common?&lt;/h5&gt;

&lt;p&gt;At run time, you can inspect the most prevalent object types in your program by running:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(pdb) import objgraph
(pdb) objgraph.show_most_common_types()

MyBigFatObject             20000
tuple                      16938
function                   4310
dict                       2790
wrapper_descriptor         1181
builtin_function_or_method 934
weakref                    764
list                       634
method_descriptor          507
getset_descriptor          451
type                       439
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h5 id=&quot;which-objects-have-been-added-or-deleted&quot;&gt;Which objects have been added or deleted?&lt;/h5&gt;

&lt;p&gt;We can also see which objects have been added or deleted between two points in time:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth()   # this only shows objects that have been added or deleted since the last show_growth() call

traceback                4        +2
KeyboardInterrupt        1        +1
frame                   24        +1
list                   667        +1
tuple                16969        +1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
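
&lt;p&gt;If you cannot install objgraph, a similar before-and-after comparison can be approximated with the standard library alone. This is only a sketch of the idea, not how objgraph itself is implemented; &lt;code class=&quot;highlighter-rouge&quot;&gt;type_counts&lt;/code&gt; is a helper invented here, and it only sees objects tracked by the garbage collector:&lt;/p&gt;

```python
import gc
from collections import Counter

def type_counts():
    """Count objects tracked by the garbage collector, grouped by type name."""
    return Counter(type(o).__name__ for o in gc.get_objects())

class MyBigFatObject(object):
    pass

before = type_counts()
hoard = [MyBigFatObject() for _ in range(1000)]   # simulate objects piling up
after = type_counts()

growth = after - before              # Counter subtraction keeps only positive differences
for name, delta in growth.most_common(5):
    print("%-20s %+d" % (name, delta))
```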

&lt;h5 id=&quot;what-is-referencing-this-leaky-object&quot;&gt;What is referencing this leaky object?&lt;/h5&gt;

&lt;p&gt;Continuing down this route, we can also see where references to any given object are being held. Let’s take as an example the simple program below:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;a&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}]&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pdb&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pdb&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_trace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;To see what is holding a reference to the variable &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt;, run the &lt;code class=&quot;highlighter-rouge&quot;&gt;objgraph.show_backrefs()&lt;/code&gt; function:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;(pdb) import objgraph
(pdb) objgraph.show_backrefs([x], filename=&quot;/tmp/backrefs.png&quot;)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The output of that command should be a PNG image stored at &lt;code class=&quot;highlighter-rouge&quot;&gt;/tmp/backrefs.png&lt;/code&gt; and it should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3011/backrefs.png&quot; alt=&quot;back references&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt; once and by the list &lt;code class=&quot;highlighter-rouge&quot;&gt;y&lt;/code&gt; three times. If &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt; is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.&lt;/p&gt;
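
&lt;p&gt;The same relationships can be checked without drawing a graph by asking the garbage collector directly. The sketch below uses the standard library’s &lt;code class=&quot;highlighter-rouge&quot;&gt;gc.get_referrers()&lt;/code&gt; on the example program; the exact list it returns can include extra entries (such as the namespace that holds the name &lt;code class=&quot;highlighter-rouge&quot;&gt;x&lt;/code&gt;), depending on how the code is run:&lt;/p&gt;

```python
import gc

x = [1]
y = [x, [x], {"a": x}]

# Every container that currently holds a reference to x
referrers = gc.get_referrers(x)

held_by_y = any(r is y for r in referrers)               # y holds x directly
held_by_inner_list = any(r is y[1] for r in referrers)   # the nested list [x]
held_by_inner_dict = any(r is y[2] for r in referrers)   # the nested dict {"a": x}

print(sorted(type(r).__name__ for r in referrers))
print(held_by_y, held_by_inner_list, held_by_inner_dict)
```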

&lt;p&gt;So to review, &lt;a href=&quot;http://mg.pov.lt/objgraph/&quot;&gt;objgraph&lt;/a&gt; allows us to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;show the top N objects occupying our Python program’s memory&lt;/li&gt;
  &lt;li&gt;show what objects have been deleted or added over a period of time&lt;/li&gt;
  &lt;li&gt;show all references to a given object in our script&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;effort-vs-precision&quot;&gt;Effort vs precision&lt;/h3&gt;

&lt;p&gt;In this post, I’ve shown you how to use several tools to analyze a Python program’s performance. Armed with these tools and techniques, you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python program.&lt;/p&gt;

&lt;p&gt;As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.&lt;/p&gt;

&lt;h3 id=&quot;refrences&quot;&gt;References&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;http://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1&quot;&gt;stack overflow - time explained&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://packages.python.org/line_profiler/&quot;&gt;line_profiler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/fabianp/memory_profiler&quot;&gt;memory_profiler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;http://mg.pov.lt/objgraph/&quot;&gt;objgraph&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</content>

 </entry>
 
 <entry>
   <title>My Tmux Setup</title>
   <link href="https://everyhue.me/posts/my-tmux-setup/"/>
   <updated>2013-08-21T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/my-tmux-setup</id>
   

   <content type="html">&lt;p&gt;If you do a lot of context switching between projects, recreating your terminal
environment &amp;amp; window layout can easily eat up hours of your day. Here’s a quick tip
to help you create and manage persistent terminal workspaces so that with a
few keystrokes, you can jump straight back into whatever you were working on
as quickly as possible.&lt;/p&gt;

&lt;h3 id=&quot;how-it-works&quot;&gt;How it works&lt;/h3&gt;

&lt;p&gt;The whole idea behind this setup is to give you 1) the ability to name your
terminal workspaces with easily memorizable names and 2) the ability to keep persistent terminal workspaces running
even if you’ve closed your terminal window.&lt;/p&gt;

&lt;p&gt;All of this is achieved with the following shell script which we’ll configure &lt;a href=&quot;http://www.iterm2.com/&quot;&gt;iTerm2&lt;/a&gt; to execute
anytime you open a new window.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&lt;span class=&quot;c&quot;&gt;#!/bin/bash&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;PATH&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$PATH&lt;/span&gt;:/usr/local/bin

&lt;span class=&quot;c&quot;&gt;# abort if we're already inside a TMUX session&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$TMUX&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;exit &lt;/span&gt;0

&lt;span class=&quot;c&quot;&gt;# startup a &quot;default&quot; session if none currently exists&lt;/span&gt;
tmux has-session &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; _default &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; tmux new-session &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt; _default &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# present menu for user to choose which workspace to open&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;PS3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;Please choose your session: &quot;&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;tmux list-sessions &lt;span class=&quot;nt&quot;&gt;-F&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;#S&quot;&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;NEW SESSION&quot;&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;BASH&quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Available sessions&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;------------------&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot; &quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;select &lt;/span&gt;opt &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;options&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[@]&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;do
    case&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$opt&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt;
        &lt;span class=&quot;s2&quot;&gt;&quot;NEW SESSION&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;nb&quot;&gt;read&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;Enter new session name: &quot;&lt;/span&gt; SESSION_NAME
            tmux new &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$SESSION_NAME&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
            &lt;span class=&quot;nb&quot;&gt;break&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt;
        &lt;span class=&quot;s2&quot;&gt;&quot;BASH&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            bash &lt;span class=&quot;nt&quot;&gt;--login&lt;/span&gt;
            &lt;span class=&quot;nb&quot;&gt;break&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
            tmux attach-session &lt;span class=&quot;nt&quot;&gt;-t&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$opt&lt;/span&gt;
            &lt;span class=&quot;nb&quot;&gt;break&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;esac&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;setting-it-up&quot;&gt;Setting it up&lt;/h3&gt;

&lt;p&gt;First off, place the above script in a location that’s accessible to &lt;a href=&quot;http://www.iterm2.com/&quot;&gt;iTerm2&lt;/a&gt; (I usually place it in ~/.dotfiles/tmux.start.sh).&lt;/p&gt;

&lt;p&gt;Then open up iTerm2’s terminal preferences and have it execute this script anytime you open a new window:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3012/iterm2config.jpg&quot; alt=&quot;iterm2config&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;usage&quot;&gt;Usage&lt;/h3&gt;

&lt;p&gt;Once you’ve done all the above, every time you open a new window, you’ll be prompted to choose which previous
workspace you want to join.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/media/3012/newsession.jpg&quot; alt=&quot;newsession&quot; /&gt;&lt;/p&gt;

&lt;p&gt;You’ll also have the opportunity to create new workspaces by choosing &lt;em&gt;NEW SESSION&lt;/em&gt;. Or if you don’t want to open a
full-blown tmux session, you can choose to just open up a BASH prompt. All of the tmux sessions are persistent, so if you
decide to close your terminal window, they will remain active in the background, ready for you to rejoin at a later
point in time.&lt;/p&gt;

</content>

 </entry>
 
 <entry>
   <title>GGPlot2 theme for Matplotlib</title>
   <link href="https://everyhue.me/posts/sane-color-scheme-for-matplotlib/"/>
   <updated>2011-02-08T00:00:00-08:00</updated>
   <id>https://everyhue.me/posts/sane-color-scheme-for-matplotlib</id>
   
   <category term="python" />
   
   <category term="programming" />
   

   <content type="html">&lt;p&gt;
    John Hunter, creator of &lt;a href=&quot;http://matplotlib.sourceforge.net/&quot;&gt;Matplotlib&lt;/a&gt;, originally designed its color scheme to be familiar to MATLAB users. As it turns out, the color scheme works well for publication material but doesn't work as well for viewing visualizations on the web.
&lt;/p&gt;
&lt;p&gt;
    I find the default styling for graphs produced using &lt;a href=&quot;http://had.co.nz/ggplot2/&quot;&gt;ggplot2&lt;/a&gt; aesthetically pleasing for this purpose, so I spent some time over the weekend to refine the default colors and settings for my matplotlib installation. The result of this work is embodied in this &lt;a href=&quot;https://gist.github.com/816622&quot;&gt;.matplotlibrc color theme&lt;/a&gt; file. If you want graphs that look like the ones below &lt;em&gt;by default&lt;/em&gt;, download it and place the file under &lt;code&gt;~/.matplotlib/matplotlibrc&lt;/code&gt;.
&lt;/p&gt;
&lt;p&gt;&lt;img alt=&quot;scatter plot&quot; src=&quot;http://imgur.com/oPvCE.png&quot; /&gt;
&lt;img alt=&quot;bar charts&quot; src=&quot;http://imgur.com/063Hs.png&quot; /&gt;
&lt;img alt=&quot;line plot&quot; src=&quot;http://imgur.com/kjVuA.png&quot; /&gt;
&lt;img alt=&quot;time series&quot; src=&quot;http://imgur.com/Idqs4.png&quot; /&gt;
&lt;img alt=&quot;histogram&quot; src=&quot;http://imgur.com/Z0GWx.png&quot; /&gt;&lt;/p&gt;
</content>

 </entry>
 
 <entry>
   <title>Don't Hash Your Secrets</title>
   <link href="https://everyhue.me/posts/dont-hash-your-secrets-heres-why-in-python/"/>
   <updated>2010-02-01T00:00:00-08:00</updated>
   <id>https://everyhue.me/posts/dont-hash-your-secrets-heres-why-in-python</id>
   
   <category term="programming" />
   
   <category term="python" />
   

   <content type="html">&lt;!-- End Meta --&gt;
&lt;p&gt;Ben Adida suggests that you &lt;a href=&quot;http://benlog.com/articles/2008/06/19/dont-hash-secrets/&quot;&gt;don't hash your secrets&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;That means that if you know SHA1(secret || message), then you can compute SHA1(secret || message  || ANYTHING), which is a valid signature for message || ANYTHING. So to break this system, you just need to see one signature.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Not being a cryptography expert, I was blown away by his article. At the core of his post is the idea that given a hash digest of a &lt;strong&gt;message&lt;/strong&gt;, one could compute the hash of &lt;strong&gt;message + appended_message&lt;/strong&gt; without even knowing the original message. &lt;/p&gt;
&lt;p&gt;I had to see this for myself. Was it &lt;em&gt;that&lt;/em&gt; easy to extend an MD5 or SHA1 hash?
Below, you'll find working &lt;a href=&quot;/media/3006/spoof_md5.py.txt&quot;&gt;Python code&lt;/a&gt; and an explanation for spoofing signatures signed with the MD5 algorithm. &lt;/p&gt;
&lt;!--more--&gt;
&lt;h3&gt;Implementation&lt;/h3&gt;
&lt;p&gt;To generate a hash from a message, algorithms like MD5 and SHA1 iterate through the message block by block. For each block, the algorithm runs a &lt;a href=&quot;http://en.wikipedia.org/wiki/Cryptographic_hash_function#Merkle-Damg.C3.A5rd_construction&quot;&gt;transformation function&lt;/a&gt; where the input is a &lt;strong&gt;seed state&lt;/strong&gt; and a &lt;strong&gt;message block&lt;/strong&gt;. The output of this transformation is then fed back as the &lt;strong&gt;seed state&lt;/strong&gt; for the transformation of the next message block (see the diagram below).&lt;/p&gt;

&lt;center&gt;&lt;img alt=&quot;md5.png&quot; src=&quot;/media/3006/md5.png&quot; width=&quot;100%&quot;&gt;&lt;/center&gt;
&lt;p&gt;After the hashing function has digested the entire message, it then appends some padding and runs the transformation function one more time. The &lt;strong&gt;final state&lt;/strong&gt; of this transformation becomes the digest. &lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;hashlib&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;signature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;secret&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;repr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;signature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;O&amp;#39;Q&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xa8\xb8\x9d\x81&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xd7\x13&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xe0\xfb&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\xde&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the code above, &lt;strong&gt;the signature represents the state output of the final transformation function&lt;/strong&gt;. &lt;/p&gt;
&lt;p&gt;AHA! We now have a strategy to extend the hash. If we can seed the transformation function with the state (a.k.a. the signature) of the original message, we can extend the hash without even knowing the original message.&lt;/p&gt;
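&lt;p&gt;To make that strategy concrete, here is a toy sketch of the chaining idea, written for Python 3. It is not MD5: the &lt;code&gt;compress&lt;/code&gt; function is a stand-in built from SHA-256 purely for illustration, the names &lt;code&gt;toy_hash&lt;/code&gt; and &lt;code&gt;BLOCK&lt;/code&gt; are hypothetical, and padding is skipped by using messages that are exact multiples of the block size.&lt;/p&gt;

```python
import hashlib

BLOCK = 64  # bytes per block, the same block size MD5 uses


def compress(state, block):
    # Stand-in compression function (NOT MD5's real one): it only needs to
    # mix the previous state with the current block.
    return hashlib.sha256(state + block).digest()[:16]


def toy_hash(message, state=b"\x00" * 16):
    # Chain the compression function over each 64-byte block.
    # No padding is applied in this toy version.
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state


secret_and_message = b"s" * 64  # one full block, unknown to the attacker
extension = b"e" * 64           # data the attacker wants to append

# The only thing the attacker sees is the published digest.
digest = toy_hash(secret_and_message)

# Resume the chain from the digest instead of the initial state.
forged = toy_hash(extension, state=digest)

assert forged == toy_hash(secret_and_message + extension)
```

&lt;p&gt;The final assertion passes because the digest is nothing more than the chain's last state, so anyone holding it can keep feeding blocks.&lt;/p&gt;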
&lt;p&gt;There is one problem, however. I mentioned before that the MD5 algorithm adds a piece of padding to the original message before it gives us the hash. That means whenever we see a signature, it's really the hash of the &lt;strong&gt;message + padding&lt;/strong&gt;. Fortunately, the padding depends only on the length of the original message. With that in mind, we can easily generate both the new signature and the padding. Here's some pseudocode:&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;signature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;calculate_padding&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;original_message_len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;new_signature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;appended message&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# This should be True&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;new_signature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;original_message&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;appended message&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
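&lt;p&gt;As a rough illustration of the &lt;code&gt;calculate_padding&lt;/code&gt; step, here is a minimal Python 3 sketch. It assumes the standard MD5 padding rules (a single 0x80 byte, zero bytes until the length is 56 mod 64, then the original length in bits as a 64-bit little-endian integer). The function name &lt;code&gt;md5_padding&lt;/code&gt; is my own and does not come from the downloadable source.&lt;/p&gt;

```python
def md5_padding(message_len):
    # message_len is the length of the original message in bytes.
    # 0x80 marker byte, then zero bytes until the length is 56 mod 64.
    zeros = (55 - message_len) % 64
    # Original length in bits, kept to 64 bits, little-endian.
    bit_len = (message_len * 8) % (2 ** 64)
    return b"\x80" + b"\x00" * zeros + bit_len.to_bytes(8, "little")


# "secret" + "my message" is 16 bytes long, so the padded message
# fills exactly one 64-byte block.
padding = md5_padding(16)
assert (16 + len(padding)) % 64 == 0
```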
&lt;p&gt;Now here is the real code.&lt;br /&gt;
&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;spoof_digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalDigest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;originalLen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spoofMessage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# first decode digest back into state tuples&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Decode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalDigest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# generate a seed md5 object&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# seed the count variable for calculation of index, padLen, and bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;originalLen&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# calculate some variables to generate the original padding&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x3f&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;padLen&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;56&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;56&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;120&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;index&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Encode&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xffffffffL&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# construct the original padding&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PADDING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;padLen&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# augment the count with the new padding and trailing bits&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;3&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# run an update&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spoofMessage&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# We now have a digest of the original secret + message + padding + spoofMessage&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;spoof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padding&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The code depends on a pure-Python implementation of the MD5 algorithm, which I've packaged together with &lt;a href=&quot;/media/3006/spoof_md5.py.txt&quot;&gt;the source code&lt;/a&gt;. If you want to try it out, download the file and run this test function (also included in the file):&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;test_spoofing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;originalMsg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;secret&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;my message&amp;quot;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;appendedMsg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;my message extension&amp;quot;&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# This is the signature that a legitimate user sends&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# over the wire in clear text. &lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;originalSignature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalMsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# This is how an attacker would spoof the signature, where&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# the message == originalMsg + padbits + appendedMsg.&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Notice that this method implies that the attacker&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# knows the original length of the &amp;quot;secret&amp;quot; ... &lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# Most APIs, such as Flickr&amp;#39;s, assign secrets that are of&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# uniform length for all of their API users.&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;spoofSignature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padbits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spoof_digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalSignature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalMsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appendedMsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# This is how a legitimate user would construct&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# a signature when message == originalMsg + padbits + appendedMsg&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;testSignature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;md5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;originalMsg&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;padbits&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;appendedMsg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;digest&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

    &lt;span class=&quot;c&quot;&gt;# Make sure the spoof signature and the test signature match.&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# If this passes, we&amp;#39;ve successfully constructed a spoofed message&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# of the form: secret + original_message + padding + appended_message&lt;/span&gt;
    &lt;span class=&quot;c&quot;&gt;# without actually knowing the secret.&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testSignature&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;spoofSignature&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
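&lt;p&gt;If you do need to sign messages with a shared secret, the usual alternative to hashing &lt;code&gt;secret + message&lt;/code&gt; is an HMAC, which is built so that knowing one signature does not let you extend it. Below is a minimal Python 3 sketch using the standard library's &lt;code&gt;hmac&lt;/code&gt; module. The key, the message, and the &lt;code&gt;verify&lt;/code&gt; helper are hypothetical, and choosing SHA-256 is my assumption rather than anything from the original code.&lt;/p&gt;

```python
import hashlib
import hmac

secret = b"secret"        # hypothetical shared key
message = b"my message"   # hypothetical message to sign

# Sign with HMAC instead of md5(secret + message).
signature = hmac.new(secret, message, hashlib.sha256).digest()


def verify(key, msg, candidate):
    # Recompute the HMAC and compare in constant time.
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, candidate)


assert verify(secret, message, signature)
# The old signature does not validate an extended message.
assert not verify(secret, message + b"my message extension", signature)
```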
&lt;p&gt;&lt;strong&gt; Information in this blog is meant for educational purposes only! &lt;/strong&gt;&lt;/p&gt;
</content>

 </entry>
 
 <entry>
   <title>Bashmarks: Directory Bookmarks in the Shell</title>
   <link href="https://everyhue.me/posts/bashmarks-directory-bookmarks/"/>
   <updated>2009-09-10T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/bashmarks-directory-bookmarks</id>
   
   <category term="programming" />
   
   <category term="bash" />
   

   <content type="html">&lt;!-- End Meta --&gt;
&lt;p&gt;&lt;strong&gt;EDIT 2010-07-01&lt;/strong&gt; : This post is left up for historical context. I've packaged up a shell script that lets you save and jump to commonly used directories. It's called &lt;a href=&quot;https://github.com/huyng/bashmarks&quot;&gt;bashmarks&lt;/a&gt; and it has tab completion built in. Visit the link below to learn more.&lt;/p&gt;

&lt;h3&gt;&lt;a href=&quot;https://github.com/huyng/bashmarks&quot;&gt;Bashmarks&lt;/a&gt;&lt;/h3&gt;
&lt;br&gt;
&lt;p&gt;Do not use the stuff below&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Before I wrote this script, it felt like I spent half of my time in the terminal cd-ing around to various directories. If you're like me, placing this snippet into your .bashrc file will save you tons of time every single day:&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;# Bash Directory Bookmarks
alias m1='alias g1=&amp;quot;cd `pwd`&amp;quot;'
alias m2='alias g2=&amp;quot;cd `pwd`&amp;quot;'
alias m3='alias g3=&amp;quot;cd `pwd`&amp;quot;'
alias m4='alias g4=&amp;quot;cd `pwd`&amp;quot;'
alias m5='alias g5=&amp;quot;cd `pwd`&amp;quot;'
alias m6='alias g6=&amp;quot;cd `pwd`&amp;quot;'
alias m7='alias g7=&amp;quot;cd `pwd`&amp;quot;'
alias m8='alias g8=&amp;quot;cd `pwd`&amp;quot;'
alias m9='alias g9=&amp;quot;cd `pwd`&amp;quot;'
alias mdump='alias|grep -e &amp;quot;alias g[0-9]&amp;quot;|grep -v &amp;quot;alias m&amp;quot; &amp;gt; ~/.bookmarks'
alias lma='alias | grep -e &amp;quot;alias g[0-9]&amp;quot;|grep -v &amp;quot;alias m&amp;quot;|sed &amp;quot;s/alias //&amp;quot;'
touch ~/.bookmarks
source ~/.bookmarks&lt;/pre&gt;&lt;/div&gt;
&lt;!--more--&gt;
&lt;h4&gt;Directory Bookmark Usage&lt;/h4&gt;
&lt;p&gt;With this in place, your bash shell will be able to set and retrieve directory bookmarks. Let's say you're in a folder that you visit hundreds of times per day. Run one of the &quot;m&quot; (a.k.a. &lt;em&gt;mark&lt;/em&gt;) commands inside the directory to create a bookmark. Here's an example:&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;# This will create a bookmark for the /var/www directory
user@host[/var/www/] : m1&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now whenever you want to &lt;em&gt;cd&lt;/em&gt; into that directory, you can run the corresponding &quot;g&quot; (a.k.a &lt;em&gt;goto mark&lt;/em&gt;) command. &lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;# This will cd into /var/www
user@host[/etc/apache2] : g1&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In other words, the &lt;em&gt;m1&lt;/em&gt; command sets the &lt;em&gt;g1&lt;/em&gt; bookmark, the &lt;em&gt;m2&lt;/em&gt; command sets the &lt;em&gt;g2&lt;/em&gt; bookmark, and so on. If you don't want to keep track of these bookmarks in your head, the &quot;lma&quot; (a.k.a. &quot;list marks&quot;) command shows all of your current bookmarks, like so:&lt;/p&gt;
&lt;div class=&quot;codehilite&quot;&gt;&lt;pre&gt;user@host[/usr/local/] : lma
g1='cd /var/www/'
g2='cd /etc/'&lt;/pre&gt;&lt;/div&gt;
&lt;h4&gt;Persisting the Bookmarks&lt;/h4&gt;
&lt;p&gt;If you want to preserve your bookmarks for the next time you log in, run the &lt;em&gt;mdump&lt;/em&gt; command, which stores the bookmarks in a file called &lt;em&gt;.bookmarks&lt;/em&gt; under your HOME directory. Keep in mind that if you do not run this command, your bookmarks will be forgotten once you log out of the shell.&lt;/p&gt;
</content>

 </entry>
 
 <entry>
   <title>Toward an Ergonomic Vim Setup</title>
   <link href="https://everyhue.me/posts/toward-an-ergonomic-vim-setup/"/>
   <updated>2009-07-21T00:00:00-07:00</updated>
   <id>https://everyhue.me/posts/toward-an-ergonomic-vim-setup</id>
   
   <category term="python" />
   

   <content type="html">&lt;span class=&quot;markdownOutput&quot;&gt;
&lt;p&gt;Here&amp;#8217;s a brief tip: &lt;strong&gt;Rebind your ctrl key to capslock&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Save yourself from the pain. The contorted positions we developers strain our hands into will eventually break them - Emacs &amp;amp; Textmate users, you know what I&amp;#8217;m talking about. Do yourself a favor and rebind your control key to capslock. You can do this under Mac OS X by going to &lt;em&gt;System Preferences&lt;/em&gt; &gt; &lt;em&gt;Keyboard &amp;amp; Mouse&lt;/em&gt; &gt; &lt;em&gt;Keyboard&lt;/em&gt; &gt; &lt;em&gt;Modifier Keys&lt;/em&gt;. Change the settings to look like the following and you&amp;#8217;re set.&lt;/p&gt;
&lt;p&gt;&lt;img id=&quot;keyboardandmouse&quot; src=&quot;/media/3008/keyboardandmouse.png&quot; alt=&quot;keyboardandmouse&quot; title=&quot;&quot; /&gt;&lt;/p&gt;
&lt;/span&gt;
</content>

 </entry>
 

</feed>
