I’ve invented a way to break bad news gradually, instead of all at once. Suppose there is a big question—“Are you breaking up with me?” or “Do I have hepatitis, Doc?” The traditional algorithm is decidedly O(1); the girlfriend or the doctor simply says “yes” or “no,” and the news is broken. It would be nice if we could delay the news, so that the answer became gradually more clear as time passed. Here’s a procedure to do just that.
At each timestep, the doctor (say) flips a coin and hides the outcome from the patient. If it is heads, he simply says “heads.” If it is tails and the patient has hepatitis, he says “heads.” If it is tails and the patient does not have hepatitis, he says “tails.”
Let’s analyze this from the patient’s point of view, supposing that both answers start out equally likely in his mind. That is, $P(\text{hepatitis}) = P(\text{no hepatitis}) = \tfrac{1}{2}$. Suppose there have been N timesteps. If the doctor ever says “tails,” then the patient knows he’s in the clear. So the interesting question is how the patient’s degree of belief changes when the doctor has said “heads” every time for N timesteps.
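Using Bayes’s theorem and some algebra, you can show that
$$P(\text{hepatitis} \mid N \text{ heads}) = \frac{1}{1 + P(N \text{ heads} \mid \text{no hepatitis})}.$$
In order to get N “heads” responses given no hepatitis, the coin would have to land heads-up N times. And we know that has probability
$$P(N \text{ heads} \mid \text{no hepatitis}) = 2^{-N}.$$
After a line of algebra, we get
$$P(\text{hepatitis} \mid N \text{ heads}) = \frac{2^N}{2^N + 1}.$$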
This approaches 100% as N tends toward infinity, which is what we expected. On the other hand, if the patient doesn’t have hepatitis then we expect a “tails” to come up after only 2 timesteps.
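To make the numbers concrete, here is a minimal sketch of the procedure and of the patient's belief after N straight "heads" answers, assuming the 50/50 prior used above. The function names `doctor_says` and `posterior_hepatitis` are made up for this illustration.

```python
from fractions import Fraction
import random

def doctor_says(has_hepatitis, rng=random):
    """One timestep: flip a hidden fair coin and report per the rules above."""
    coin = rng.choice(["heads", "tails"])
    if coin == "tails" and not has_hepatitis:
        return "tails"   # the only case in which the patient hears "tails"
    return "heads"

def posterior_hepatitis(n, prior=Fraction(1, 2)):
    """P(hepatitis | doctor said 'heads' n times in a row), via Bayes's theorem."""
    like_sick = Fraction(1)               # a sick patient always hears "heads"
    like_healthy = Fraction(1, 2) ** n    # a healthy one needs n heads-up flips
    evidence = like_sick * prior + like_healthy * (1 - prior)
    return like_sick * prior / evidence

# Belief climbs from 1/2 toward 1 as the "heads" answers pile up.
for n in (0, 1, 2, 5, 10):
    p = posterior_hepatitis(n)
    print(n, p, round(float(p), 4))
```

Using exact fractions keeps the output directly comparable with the closed form: after one "heads" the belief is 2/3, after two it is 4/5, and so on.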
WikiLeaks recently published documents detailing CIA attacks on common smartphone operating systems, including those on Samsung, Google, and Apple devices. WhatsApp and other “secure messaging apps,” it turns out, are not secure at all. The reason is that if the phone OS is itself compromised, nothing on the phone can be secure. It doesn’t matter how good your app’s encryption is: if the CIA can read everything on your phone, then that includes unencrypted plaintext and private encryption keys.
The situation appears hopeless. For one thing, our devices are not secure. Furthermore, who’s to say the cell phone companies won’t voluntarily give the government your data, even if the operating systems themselves are fixed? If you want secure communication, it seems the only way to do it is to build your own cell phone, and then build your own network of cell towers.
Sending secure messages over untrusted channels is basically a solved problem. Encrypt the message with RSA and sign it with DSA, for example. (These, and similar algorithms, will probably be broken by quantum computers. But as far as we know, that hasn’t happened yet.)
What makes the situation more interesting is that the broadcasting device itself can no longer be trusted. So, where does that leave us? Assuming I can’t patch the phone, I need a new device.
The new device would:
randomly generate keys
accept plaintext input from a keypad
encrypt outgoing messages with the recipient’s public key
decrypt incoming messages with its own private key
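As a rough illustration of which key does what, here is a toy version of textbook RSA. It assumes tiny fixed primes and no padding, so it is not secure and is not a proposal for the device's actual firmware; a real device would generate large random primes. The names `generate_keys`, `encrypt`, and `decrypt` are hypothetical.

```python
# Toy textbook RSA: illustration only, NOT secure (tiny primes, no padding).

def generate_keys(p=61, q=53, e=17):
    """Return (public_key, private_key) built from two primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)

def encrypt(message, public_key):
    """Anyone holding the recipient's public key can encrypt."""
    n, e = public_key
    return pow(message, e, n)

def decrypt(ciphertext, private_key):
    """Only the holder of the matching private key can decrypt."""
    n, d = private_key
    return pow(ciphertext, d, n)

# The recipient's device makes the key pair and hands out only the public half.
public_key, private_key = generate_keys()
ciphertext = encrypt(65, public_key)       # the message is a small integer here
print(ciphertext, decrypt(ciphertext, private_key))
```

The point of the split is that only the public key and the ciphertext ever need to pass through the phone; the private key never leaves the device.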
Since we still want to use the phone to send and receive public keys and encrypted messages (ciphertext), we would need to transmit these back and forth between the phone and the device. I drew a schematic:
There are a few points you’d have to consider to make this thing work. After all, if they can hack my smartphone, why can’t they hack this thing?
First, it would have to be completely open-source from the circuitboard up. That way, anybody could check for security vulnerabilities on their own.
Second, it would have to be simple. This way, a single user could understand the entire system and convince himself that there were no holes. Security flaws emerge when a product becomes too complicated for an individual engineer to understand in its entirety. You can’t hack a toaster! But you can hack an operating system, because it’s got a lot of “moving parts.”
Third, you would have to ensure that the encrypting device could not be infected via the connection to the smartphone. I would be wary of complex protocols like USB. In fact, one could go so far as to send and receive information over the headphone jack. (Microphone for output; left or right channel for input; ground for ground.) That way, you know that only data bits are being sent, not commands or metadata or any other crap. Unless I am mistaken, the only potential vulnerability would be buffer overflow attacks—and you can avoid those if you’re careful.
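Being careful here mostly means never trusting a length that arrives over the wire. The sketch below assumes a made-up frame format (one length byte followed by the payload) and a made-up size limit. Python is memory-safe, so this only illustrates the checks that firmware in a language like C would need before copying bytes into a fixed buffer.

```python
MAX_PAYLOAD = 64  # hypothetical buffer size, chosen only for this sketch

def parse_frame(raw: bytes) -> bytes:
    """Parse one frame: a length byte, then exactly that many payload bytes.

    Anything that does not match is rejected outright rather than truncated.
    """
    if len(raw) == 0:
        raise ValueError("empty frame")
    declared = raw[0]
    payload = raw[1:]
    if declared > MAX_PAYLOAD:
        raise ValueError("declared length exceeds buffer size")
    if len(payload) != declared:
        raise ValueError("declared length does not match payload")
    return payload

print(parse_frame(b"\x03abc"))
```

Every received byte is treated as data to be length-checked and then handed to the decryption step; nothing in the frame is interpreted as a command.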
…so I decided to enumerate a subset of them. The particular form of pun I’m interested in is:
What’s the difference between a _____ and a _____? One is a xA B; the other is a A xB.
For example, what’s the difference between a skinny Spaniard and a skinny Russian? One is a slight Iberian; the other is a light Siberian.
To generate these, I took a word list of about 40,000 English words. It’s straightforward to find pairs of words matching the (xA, B, A, xB) template, as long as you choose the right data structures and pay attention to efficiency. But then we need to narrow those pairs down so that
xA and A are adjectives; xB and B are nouns, or
xA and A are nouns; xB and B are adjectives.
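On the data-structure point, one option is to index the word list by first letter, so that for each xA only words starting with x are examined as candidates for xB. This is a sketch under that assumption, not the script I actually ran; `find_pairs` is a hypothetical name, and skipping the degenerate case B == A is my own choice here.

```python
from collections import defaultdict

def find_pairs(words):
    """Return sorted (xA, B, A, xB) tuples in which all four are in the list."""
    wordset = set(words)

    # Map each first letter x to the tails B of every word xB starting with x.
    tails_by_first = defaultdict(list)
    for w in wordset:
        if len(w) > 1:
            tails_by_first[w[0]].append(w[1:])

    pairs = []
    for xA in wordset:
        if len(xA) < 2:
            continue
        x, A = xA[0], xA[1:]
        if A not in wordset:
            continue
        for B in tails_by_first.get(x, ()):
            # xB = x + B is a word by construction; B must be a word too.
            if B in wordset and B != A:
                pairs.append((xA, B, A, x + B))
    return sorted(pairs)

for pair in find_pairs(["slight", "light", "iberian", "siberian"]):
    print(pair)
```

Each template match shows up twice, once from each side (slight/Iberian and Siberian/light), so a later step may want to deduplicate.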
For this second task, I used WordNet, a lexical database from Princeton University. The Natural Language Toolkit (NLTK) provides helpful Python bindings to the necessary WordNet functions. Here is my Python script:
from nltk.corpus import wordnet as wn

f = open('wlist_match10.txt')
allpuns = open('all_puns.txt', 'w')

# Keep the words both as a list (for looping) and a set (for fast lookups).
wordlist = []
wordset = set()
for line in f:
    word = line.strip()
    if word:  # skip blank lines (check added in reconstruction)
        wordlist.append(word)
        wordset.add(word)
f.close()
N = len(wordlist)

# 22,282 words
# 113,135 permutation pairs with xA, B compatibility
# means 0.02% of permutations have xA,B compatibility
# fewer have part-of-speech compatibility as well

def isAdj(w):
    return len(wn.synsets(w, pos=wn.ADJ)) > 0

def isNoun(w):
    return len(wn.synsets(w, pos=wn.NOUN)) > 0

def candidate(xA, B):
    # Write the pun to the output file if the parts of speech line up.
    x = xA[0]
    A = xA[1:]
    xB = x + B
    if (isNoun(B) and isNoun(xB) and isAdj(A) and isAdj(xA)):
        pun = xA + ' ' + B + ', ' + A + ' ' + xB
        allpuns.write(pun + '\n')
    if (isAdj(B) and isAdj(xB) and isNoun(A) and isNoun(xA)):
        pun = xB + ' ' + A + ', ' + B + ' ' + xA
        allpuns.write(pun + '\n')

for j in range(N):
    xA = wordlist[j]
    x = xA[0]
    A = xA[1:]
    if (A not in wordset):
        continue
    for k in range(N):
        B = wordlist[k]
        if (x + B in wordset):
            candidate(xA, B)

allpuns.close()
# 6,696 results!
I didn’t bother to generate the full version of each pun using synonyms of the four words. It shouldn’t be too hard to do so, however. The script outputs a text file containing all 6,696 possible puns of the desired form.