
Notes on quantitative trading

I’ve been meaning to learn the math behind stock trading for a while, but I’ve found it’s hard to find quality information. Most of the stuff online is (1) non-technical, (2) trying to sell you something, or (3) both. So I decided to collect my own notes on modern portfolio theory (MPT). Here’s the pdf: Notes on Itô calculus and quantitative trading.

The information comes from various lecture slides and articles. I didn’t put specific references in there, since it’s standard, textbook stuff. Just search for any piece you’d like more information about.

Here is a rough outline:

  • How to select stocks, given their risk and return statistics
  • How to model risk and return in the first place
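The first item has a closed form in Markowitz's setup. Suppose the expected returns μ and the covariance matrix Σ are already estimated. The fully invested portfolio with the smallest variance then has weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1). Adding a target expected return gives another closed form via Lagrange multipliers. Below is a minimal NumPy sketch. It assumes short selling is allowed (there is no w ≥ 0 constraint), and the function names are hypothetical.

```python
import numpy as np

def global_min_variance(cov):
    """Weights summing to 1 that minimize w^T Sigma w: Sigma^{-1} 1 / (1^T Sigma^{-1} 1).

    Negative weights (short positions) are allowed.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # Sigma^{-1} 1, without forming the inverse
    return x / x.sum()

def min_variance_for_target(mu, cov, target):
    """Minimum-variance weights with w^T mu = target and weights summing to 1.

    Lagrange multipliers give w = Sigma^{-1} (lam * mu + gam * 1), where
    A = 1^T Sigma^{-1} 1, B = 1^T Sigma^{-1} mu, C = mu^T Sigma^{-1} mu, D = A*C - B**2,
    lam = (A*target - B) / D and gam = (C - B*target) / D.
    Requires that not all expected returns are equal (otherwise D = 0).
    """
    ones = np.ones(len(mu))
    inv_ones = np.linalg.solve(cov, ones)
    inv_mu = np.linalg.solve(cov, mu)
    A = ones @ inv_ones
    B = ones @ inv_mu
    C = mu @ inv_mu
    D = A * C - B**2
    lam = (A * target - B) / D
    gam = (C - B * target) / D
    return lam * inv_mu + gam * inv_ones
```

With a diagonal Σ, the global minimum-variance weights are proportional to 1/σᵢ², so riskier stocks get less weight. Sweeping `target` over a range of returns traces out the efficient frontier.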

The first part takes the “Minimum Variance” approach due to Markowitz. To model stock prices, I give an overview of Itô calculus (one form of stochastic calculus) and geometric Brownian motion (GBM). This is the model used by the Black-Scholes formula for pricing derivatives.
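Under GBM the price follows dS = μS dt + σS dW, where W is a standard Brownian motion. Applying Itô's lemma to log S gives the closed form S_t = S_0 exp((μ − σ²/2)t + σW_t). Paths can therefore be sampled exactly on a time grid, with no discretization error. Below is a minimal NumPy sketch, assuming constant drift μ and volatility σ. The function name and its parameters are hypothetical.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, T, n_steps, n_paths, seed=None):
    """Sample GBM paths exactly on an evenly spaced grid over [0, T].

    Uses S_{t+dt} = S_t * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z), Z ~ N(0, 1).
    Returns an array of shape (n_paths, n_steps + 1) whose first column is s0.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Log-price increments: Ito-corrected drift plus a Brownian increment
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(log_increments, axis=1)
    # Prepend log(S_0 / S_0) = 0 so every path starts at s0
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))
```

The −σ²/2 term is the Itô correction. With it included, the sample mean of S_T should be close to S_0 e^{μT}.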

I suspect that a simple index fund might beat a portfolio selected with this recipe. In the future, I’d like to test on historical data and find out if there really is an advantage to picking your own stocks.

Secure communication through CIA-infected devices

The problem

WikiLeaks recently published documents detailing CIA attacks on common smartphone operating systems, including those on Samsung, Google, and Apple devices. WhatsApp and other “secure messaging apps,” it turns out, are not secure on these devices. If the phone’s OS is itself compromised, nothing on the phone can be secure. It doesn’t matter how good your app’s encryption is: if the CIA can read everything on your phone, that includes the unencrypted plaintext and your private encryption keys.

The situation appears hopeless. For one thing, our devices are not secure. Furthermore, who’s to say the cell phone companies won’t voluntarily give the government your data, even if the operating systems themselves are fixed? If you want secure communication, it seems the only way to do it is to build your own cell phone, and then build your own network of cell towers.

My solution

Sending secure messages over untrusted channels is basically a solved problem. Encrypt the message with RSA and sign it with DSA, for example. (These, and similar algorithms, will probably be broken by quantum computers. But as far as we know, that hasn’t happened yet.)

What makes the situation more interesting is that the broadcasting device itself can no longer be trusted. So, where does that leave us? Assuming I can’t patch the phone, I need a new device.

The new device would:

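  • randomly generate keys
  • accept plaintext input from a keypad
  • encrypt outgoing messages with the recipient’s public key
  • decrypt incoming messages with its own private key
  • sign outgoing messages with its private key, and verify incoming signatures with the sender’s public key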
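Below is a toy sketch of the device's key handling, written in Python with only the standard library. It uses textbook RSA (no padding) for both encryption and signatures, purely to show which key does what. Note that the post suggests DSA for signatures. Textbook RSA is not secure in practice, so a real device should use a vetted library with OAEP encryption and a standard signature scheme. All function names are hypothetical.

```python
import hashlib
import secrets

def is_probable_prime(n, rounds=40):
    """Miller-Rabin primality test with random bases."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:  # write n - 1 = d * 2**s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True

def random_prime(bits):
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1  # exact size, odd
        if is_probable_prime(candidate):
            return candidate

def generate_keypair(bits=2048, e=65537):
    """Randomly generate keys on the device. Returns (public, private) = ((n, e), (n, d))."""
    while True:
        p, q = random_prime(bits // 2), random_prime(bits // 2)
        phi = (p - 1) * (q - 1)
        if p != q and phi % e != 0:  # e is prime, so this means gcd(e, phi) == 1
            break
    return (p * q, e), (p * q, pow(e, -1, phi))

def encrypt(public_key, message):
    """Encrypt with the recipient's public key. The ciphertext can go to the phone."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message too long for this key")
    return pow(m, e, n)

def decrypt(private_key, ciphertext):
    """Decrypt with our own private key, which never leaves the device."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

def _digest(message, n):
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message):
    n, d = private_key
    return pow(_digest(message, n), d, n)

def verify(public_key, message, signature):
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)
```

Only the public key, the ciphertext, and the signature ever cross over to the phone. The plaintext and `d` stay on the device.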

Since we still want to use the phone to send and receive public keys and encrypted messages (ciphertext), we would need to transmit these back and forth between the phone and the device. I drew a schematic:

Fig. 1. By hiding the plaintext and private key from CIA malware on Samsung and Apple devices, we can broadcast messages securely from these infected devices.

There are a few points you’d have to consider to make this thing work. After all, if they can hack my smartphone, why can’t they hack this thing?

First, it would have to be completely open-source from the circuit board up. That way, anybody could check for security vulnerabilities on their own.

Second, it would have to be simple. This way, a single user could understand the entire system and convince himself that there were no holes. Security flaws emerge when a product becomes too complicated for an individual engineer to understand in its entirety. You can’t hack a toaster! But you can hack an operating system, because it’s got a lot of “moving parts.”

Third, you would have to ensure that the encrypting device could not be infected via the connection to the smartphone. I would be wary of complex protocols like USB. In fact, one could go so far as to send and receive information over the headphone jack. The device’s output would feed the phone’s microphone line, the phone’s left or right audio channel would feed the device’s input, and the two would share a ground. That way, you know that only data bits are being sent, not commands or metadata or any other crap. Unless I am mistaken, the only potential vulnerability would be buffer overflow attacks. You can avoid those if you’re careful, for example by checking every incoming message’s length against the size of the buffer it will be copied into.
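The post doesn't say how bits would travel over the audio lines. One hypothetical choice is frequency-shift keying with the Bell 202 tones used by old telephone modems: 1200 Hz for a 1 bit and 2200 Hz for a 0 bit, at 1200 baud. The sketch below also wraps each message in a length-prefixed frame. The receiver rejects any frame whose declared length exceeds its fixed buffer (the hypothetical `MAX_PAYLOAD`) or the data actually received. In C firmware, that is the check that prevents buffer overflows. Python is memory-safe, so here the check only illustrates the logic.

```python
import numpy as np

# Hypothetical link parameters (not from the post): Bell 202-style FSK.
SAMPLE_RATE = 48_000                  # audio samples per second
BAUD = 1_200                          # bits per second
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD
MARK_HZ, SPACE_HZ = 1_200.0, 2_200.0  # tone for a 1 bit, tone for a 0 bit
MAX_PAYLOAD = 200                     # receiver's fixed buffer size, in bytes

def bytes_to_bits(data):
    """Most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]

def bits_to_bytes(bits):
    out = bytearray()
    for i in range(0, len(bits) // 8 * 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def modulate(bits):
    """One tone burst per bit, keeping the phase continuous between bits."""
    n = np.arange(SAMPLES_PER_BIT)
    phase, chunks = 0.0, []
    for bit in bits:
        step = 2 * np.pi * (MARK_HZ if bit else SPACE_HZ) / SAMPLE_RATE
        chunks.append(np.sin(phase + step * n))
        phase = (phase + step * SAMPLES_PER_BIT) % (2 * np.pi)
    return np.concatenate(chunks) if chunks else np.zeros(0)

def demodulate(signal):
    """Non-coherent detection: compare correlation magnitudes with the two tones per bit window."""
    t = np.arange(SAMPLES_PER_BIT) / SAMPLE_RATE
    mark = np.exp(-2j * np.pi * MARK_HZ * t)
    space = np.exp(-2j * np.pi * SPACE_HZ * t)
    bits = []
    for start in range(0, len(signal) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = signal[start:start + SAMPLES_PER_BIT]
        bits.append(1 if abs(window @ mark) > abs(window @ space) else 0)
    return bits

def frame(payload):
    """A length byte followed by the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long")
    return bytes_to_bits(bytes([len(payload)]) + payload)

def unframe(bits):
    """Validate the length field before copying anything out."""
    data = bits_to_bytes(bits)
    if not data:
        raise ValueError("empty frame")
    length = data[0]
    if length > MAX_PAYLOAD or length > len(data) - 1:
        raise ValueError("bad length field")
    return data[1:1 + length]
```

Only ciphertext, public keys, and signatures would ever be framed this way. The device never interprets anything it receives beyond the length byte.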

The set of English puns is finite and enumerable

…so I decided to enumerate a subset of them. The particular form of pun I’m interested in is:

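What’s the difference between a _____ and a _____? One is a xA B; the other is a A xB.

Here x is a single letter, A and B are words, and xA and xB are the words formed by putting x in front of them.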

For example, what’s the difference between a skinny Spaniard and a skinny Russian? One is a slight Iberian; the other is a light Siberian.

To generate these, I took a word list of about 40,000 English words. It’s straightforward to find pairs of words matching the (xA, B, A, xB) template. Keep the words in a hash set, so that each membership test takes constant time, and skip any xA whose tail A is not itself a word. But then we need to narrow those pairs down so that

  • xA and A are adjectives; xB and B are nouns, or
  • xA and A are nouns; xB and B are adjectives.
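Since x is a single letter, (xA, A) and (xB, B) are drawn from the same set: for each letter x, the words A such that A and xA are both in the list. Every template match is a pair from that set. The matches can therefore come from one pass over the word list plus a Cartesian product per letter, rather than the nested loop over all word pairs used in the script. Below is a minimal sketch, before any part-of-speech filtering. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def prefix_pairs(words):
    """Map each leading letter x to the words A such that both A and x + A are words."""
    wordset = set(words)
    by_letter = defaultdict(list)
    for w in words:
        if len(w) > 1 and w[1:] in wordset:
            by_letter[w[0]].append(w[1:])
    return by_letter

def template_matches(words):
    """Yield (xA, B, A, xB) for every x, A, B with xA, A, B and xB all in the list."""
    for x, tails in prefix_pairs(words).items():
        for A, B in product(tails, repeat=2):
            yield x + A, B, A, x + B
```

The number of matches is the sum over letters of |tails|², so the cost is dominated by the output size rather than by N².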

For this second task, I used WordNet, a lexical database from Princeton University. The Natural Language Toolkit (NLTK) provides helpful Python bindings to the necessary WordNet functions. Here is my Python script:

from functools import lru_cache

from nltk.corpus import wordnet as wn  # needs the corpus: nltk.download('wordnet')

# Read the word list, one word per line, skipping blank lines.
# (strip() also handles a last line with no trailing newline.)
with open('wlist_match10.txt') as f:
    wordlist = [line.strip() for line in f if line.strip()]
wordset = set(wordlist)  # constant-time membership tests
N = len(wordlist)
# 22,282 words
# 113,135 permutation pairs with xA, B compatibility
# means 0.02% of permutations have xA,B compatibility
# fewer have part-of-speech compatibility as well

@lru_cache(maxsize=None)
def isAdj(w):
    # True if WordNet lists at least one adjective sense of w
    return len(wn.synsets(w, pos=wn.ADJ)) > 0

@lru_cache(maxsize=None)
def isNoun(w):
    # True if WordNet lists at least one noun sense of w
    return len(wn.synsets(w, pos=wn.NOUN)) > 0

def candidate(xA, B, out):
    # Write the pun (adjective first) if the parts of speech line up either way
    x = xA[0]
    A = xA[1:]
    xB = x + B
    if isNoun(B) and isNoun(xB) and isAdj(A) and isAdj(xA):
        out.write(xA + ' ' + B + ', ' + A + ' ' + xB + '\n')
    if isAdj(B) and isAdj(xB) and isNoun(A) and isNoun(xA):
        out.write(xB + ' ' + A + ', ' + B + ' ' + xA + '\n')

with open('all_puns.txt', 'w') as allpuns:
    for xA in wordlist:
        x = xA[0]
        A = xA[1:]
        if A not in wordset:  # the tail of xA must itself be a word
            continue
        for B in wordlist:
            if x + B in wordset:
                candidate(xA, B, allpuns)

# 6,696 results!

The script writes all 6,696 candidate puns of the desired form to a text file. I didn’t bother to generate the question half of each pun from synonyms of the four words, though it shouldn’t be too hard to do.
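Below is a rough sketch of that missing step. It turns a line of `all_puns.txt` into a question and an answer. `wordnet_synonyms` uses NLTK's `wn.synsets` and `Synset.lemma_names()`. The synonym lookup is passed in as a parameter, so any other source can be swapped in. Taking the first synonym, and choosing "a" or "an" by spelling, are naive assumptions. The function names are hypothetical.

```python
def article(word):
    """'an' before a vowel letter, else 'a' (a spelling heuristic that misses 'hour', 'unicorn')."""
    return 'an' if word[:1].lower() in ('a', 'e', 'i', 'o', 'u') else 'a'

def wordnet_synonyms(word, pos):
    """Synonyms of word from WordNet; requires nltk and its 'wordnet' corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily so riddle() works without NLTK
    names = []
    for synset in wn.synsets(word, pos=pos):
        for name in synset.lemma_names():
            name = name.replace('_', ' ')
            if name.lower() != word.lower() and name not in names:
                names.append(name)
    return names

def riddle(pun_line, synonyms=wordnet_synonyms):
    """Turn 'xA B, A xB' (adjective first in each half) into (question, answer)."""
    first, second = pun_line.strip().split(', ')
    adj1, noun1 = first.split(' ')
    adj2, noun2 = second.split(' ')

    def pick(word, pos):
        options = synonyms(word, pos)
        return options[0] if options else word  # fall back to the word itself

    # 'a' and 'n' are WordNet's part-of-speech tags for adjectives and nouns
    thing1 = pick(adj1, 'a') + ' ' + pick(noun1, 'n')
    thing2 = pick(adj2, 'a') + ' ' + pick(noun2, 'n')
    question = ("What's the difference between " + article(thing1) + ' ' + thing1
                + ' and ' + article(thing2) + ' ' + thing2 + '?')
    answer = ('One is ' + article(first) + ' ' + first + '; the other is '
              + article(second) + ' ' + second + '.')
    return question, answer
```

The first WordNet synonym is often a poor choice. Getting from "Iberian" to "Spaniard" probably needs related forms or hypernyms, not just lemma names from the same synset.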

Here are some highlights:

  • residential preparations, presidential reparations
  • dinky rain, inky drain
  • revolutionary ages, evolutionary rages