Regions in infinite random grids

Here is a puzzle I thought of:

What is the average area of a contiguous region in an infinite, random binary grid?

Unfortunately, I can’t solve it. The 1d case has been solved by others, and is a good warmup: the trick is to write the average as an infinite sum, which evaluates to 2. In 2d it’s much harder to find such a sum, so I tried it numerically.
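For concreteness, here is one way to set up the 1d sum, assuming each cell is an independent fair coin flip. A maximal run of identical cells extends one more step with probability 1/2, so its length is geometrically distributed:

```latex
P(L = k) = \left(\tfrac{1}{2}\right)^{k-1} \cdot \tfrac{1}{2} = 2^{-k},
\qquad
\mathbb{E}[L] = \sum_{k=1}^{\infty} k \, 2^{-k} = 2.
```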

The idea is to sample many random n-by-n grids for increasing values of n, and count the average size of a region. For example, here is a binary grid of side-length n = 5:


This example has 6 regions, so the average area in this sample is 25/6. Perhaps it’s clearer if I color them:

For several values of n, I’m going to generate many such grids and plot the results. The idea is that the average should converge to the correct answer in the limit n→∞.
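The sampling procedure can be sketched in a few dozen lines of Python. This is a minimal version under the assumptions implied above: cells are fair coin flips, regions are 4-connected runs of equal values, both colors count as regions, and each sample contributes n²/(number of regions) as its average region area.

```python
import random
from collections import deque

def count_regions(grid):
    """Count 4-connected regions of equal values in a square grid (BFS flood fill)."""
    n = len(grid)
    seen = [[False] * n for _ in range(n)]
    regions = 0
    for i in range(n):
        for j in range(n):
            if seen[i][j]:
                continue
            regions += 1
            value = grid[i][j]
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                x, y = queue.popleft()
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nx, ny = x + dx, y + dy
                    if 0 <= nx < n and 0 <= ny < n and not seen[nx][ny] \
                            and grid[nx][ny] == value:
                        seen[nx][ny] = True
                        queue.append((nx, ny))
    return regions

def mean_region_area(n, samples=200, seed=0):
    """Average of n*n / (number of regions) over many random binary grids."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        grid = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        total += n * n / count_regions(grid)
    return total / samples

for n in (5, 10, 20, 40):
    print(n, round(mean_region_area(n), 3))
```

On the finite grids, boundary effects shrink as n grows, which is why the estimate is plotted against increasing n rather than computed once.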

I’ve also plotted statistical errors. From this plot, it appears that the answer is about 7, or perhaps it diverges logarithmically? Let me know if you have any ideas.

Betting on opinion polls

You may have heard of PredictIt, the political futures market. It allows users to make and take bets on a wide variety of political outcomes. Several markets ask you to predict the presidential job approval ratings at a future date. The site typically relies on a poll aggregator such as FiveThirtyEight or (more commonly) RealClearPolitics. You can bet anywhere from 1 to 99 cents on the outcome falling inside or outside a certain range, and you win a dollar if you’re right.

Actually, you don’t have to bet on the most likely outcome to make money in the long term. You only have to find people offering bets with the wrong odds. For example, say I’m offered the following game: Pay 1 cent to play, then shuffle a deck of cards and pick one. If it’s the ace of spades, I win a dollar. I can play as many times as I want. Even though I expect to lose on a given round, in the long run I’ll profit because it only takes me about 52 cents to win a dollar. In other words, the game is underpriced relative to its odds.
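The arithmetic behind that claim, in a short sketch (prices in cents):

```python
p_win = 1 / 52          # chance of drawing the ace of spades
cost = 1                # cents paid per play
payout = 100            # cents received on a win

ev_per_play = p_win * payout - cost   # expected profit per round, in cents
cost_per_dollar = cost / p_win        # expected cents spent per dollar won

print(f"EV per play: {ev_per_play:.2f} cents")
print(f"Expected cost per dollar won: {cost_per_dollar:.0f} cents")
```

Each round loses with probability 51/52, yet the expected value per round is positive, which is the whole point: mispriced odds beat "most likely outcome" reasoning.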

Thus, the way to play the game is not to predict the most likely outcome, but rather to calculate the probabilities of all outcomes. Start with historical data scraped from RealClearPolitics, covering about a year ending on February 13, 2018.

The distribution of daily changes appears approximately normal. Here I’ve plotted a normal fit over the histogram of steps.

Actually, a random walk with normally distributed steps (also called a “Gaussian random walk”) has some nice properties. If the steps are independently sampled from N(\mu,\sigma^2), then the total change T steps later is sampled from N(\mu T,\sigma^2\ T). That is, it’s the variance which grows linearly with time, not the standard deviation. The mean also grows linearly, as you might expect for a drift process. (Here we assume Markovian behavior; that is, that the system has no memory other than its current state.) Armed with this knowledge, we can plot a distribution on outcomes a fixed time later, say 100 days.
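The linear growth of the variance is easy to check by simulation. The μ and σ below are illustrative placeholders, not the values fitted to the approval data:

```python
import random
import statistics

mu, sigma = 0.02, 0.3   # illustrative daily drift and step size, not fitted values
T, walks = 100, 5000    # horizon in days, number of simulated walks
rng = random.Random(1)

# Total change after T i.i.d. N(mu, sigma^2) steps, for each simulated walk
totals = [sum(rng.gauss(mu, sigma) for _ in range(T)) for _ in range(walks)]

print(f"mean:     {statistics.mean(totals):.3f}  (theory: {mu * T:.3f})")
print(f"variance: {statistics.variance(totals):.3f}  (theory: {sigma**2 * T:.3f})")
```

The sample mean lands near μT and the sample variance near σ²T, so the spread of outcomes grows like √T, not T.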

If you’ve made it this far, you’re probably wondering whether each step is really sampled independently from the rest. To test this assumption, we should calculate the autocorrelation function,
\mathbb{E}\big[\Delta(t)\,\Delta(t+\tau)\big]
as a function of τ, where Δ is a daily change in approval rating.

It drops an order of magnitude between τ = 0 and τ = 1, and stays there. What this tells us is that the underlying system has little memory other than its current value. In fact, this defines the Markov assumption.
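A minimal sketch of that autocorrelation estimate, using the same raw form as the formula above (no mean subtraction). Since the scraped data isn’t included here, the Δ series below is a synthetic i.i.d. stand-in:

```python
import random

def autocorr(deltas, tau):
    """Sample mean of delta(t) * delta(t + tau) over the series."""
    pairs = [deltas[t] * deltas[t + tau] for t in range(len(deltas) - tau)]
    return sum(pairs) / len(pairs)

# Synthetic stand-in for a year of daily approval changes (i.i.d. steps)
rng = random.Random(0)
deltas = [rng.gauss(0, 0.3) for _ in range(365)]

for tau in range(4):
    print(tau, round(autocorr(deltas, tau), 4))
```

For an i.i.d. series the τ = 0 value is just the mean-square step size, and everything else hovers near zero, which is the signature described in the plot.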

Finally, the question is how to place your bet. On PredictIt, the outcomes are binned, so we should integrate the normal distribution over each bin width to get probabilities. From there we can choose the event with the most favorable odds, and use the Kelly strategy to decide how much to bet.
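The last two steps can be sketched as follows. The forecast parameters, bin edges, and contract price are all hypothetical stand-ins for whatever the market actually offers:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Hypothetical forecast: approval rating in 100 days ~ N(mu, sigma^2)
mu, sigma = 42.0, 3.0

# Hypothetical contract bins (approval-rating ranges), as on PredictIt
bins = [(-float("inf"), 39), (39, 41), (41, 43), (43, 45), (45, float("inf"))]
probs = [norm_cdf(hi, mu, sigma) - norm_cdf(lo, mu, sigma) for lo, hi in bins]

def kelly_fraction(p, price):
    """Kelly bet fraction for a contract costing `price` dollars that pays $1."""
    b = (1 - price) / price                 # net odds received on a win
    return max(0.0, (p * b - (1 - p)) / b)  # never bet on a negative edge

for (lo, hi), p in zip(bins, probs):
    print(f"({lo}, {hi}]: p = {p:.3f}, Kelly at 25c: {kelly_fraction(p, 0.25):.3f}")
```

Integrating the density over each bin turns the forecast into a probability per contract; comparing that to the market price gives the edge, and Kelly converts the edge into a bet size.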

In a later post, I plan to backtest this approach against historical data.

A joke which requires lack of knowledge to understand

Usually “getting” a joke requires some subject knowledge. For example, you have to know who Michael Jackson is in order to understand Michael Jackson jokes. I came up with a joke which requires partial knowledge as well as partial ignorance. The joke is:

What emoji is missing on Chinese phones?

Answer: 少

The character is “shǎo,” meaning “to lack.” To me it also looks like a smiley face. But to someone who knows the language, it doesn’t look like a smiley face—it looks like shǎo.

I think you can see the predicament. In order to get the joke, you have to know the character, but not well enough to sight-read it. This reminds me of stories about catching Soviet spies with the Stroop effect. Supposedly, suspects were presented with Russian words for colors, but typeset in different colors than what the words said. Then they were asked to go through the list and say what color each word was printed in. If you can’t read Russian, it’s very easy. But if you can read the word, your brain gets confused. In this way, you can test somebody for a lack of knowledge.