Ahoy!

An Android app for local message broadcasts.

Berlin Hack and Tell #23

Michael Specht
http://ahoy.nhcham.org

Political statements via SSID

http://opensignal.com/reports/politics-of-wifi-names.php

Ahoy - what's it about?

  • you enter a message into your Android phone:
    Look behind you, a three-headed monkey!
  • the app sets up a WiFi hotspot, using the SSID to broadcast the message:
    aR6Sg0VTH8VaDwcl9uHzNDFhvAfROR9
  • everybody using the app within a range of ~30 meters receives your message

Possible scenario: Demonstrations

http://www.zeit.de/politik/ausland/2013-06/tuerkei-proteste-berichte

Possible scenario: Flood volunteers

http://www.taz.de/!117861/

Possible scenario: Sports events

http://de.wikipedia.org/wiki/La_Ola

Possible scenario: Live events

http://www.in-your-face.de/festival/wacken-open-air-2012

Ahoy SSID format

  • SSID: 31 printable ASCII characters ABCDEFGHIJKLMNOPQRSTUVWXYZ
    abcdefghijklmnopqrstuvwxyz
    0123456789 -_\|/,."'`(){}[]<>+#@%^!~*=$&?:;
  • better use alphanumerical characters only
  • SSID format: ^a[0-9a-zA-Z]{30}$
  • Capacity:
    • log262 = 5.95 bits
    • 30 ⋅ log262 = 178 bits

Ahoy SSID format

How many different messages can we encode using 178 bits?

383,123,885,216,472,214,589,586,756,787,577,295,904,684,780,545,900,544

  • 178 bits ≈ 22 bytes → using UTF-8 or UTF-16:
  • ~21 characters (Latin scripts)
  • ~11 characters (Cyrillic, Arabic, Hebrew, Devanagari, Hangul, Hiragana, Katakana, Han, ...)

We're wasting a lot of bits there...

  • each of these messages has the same cost (104 bits):
    • Hello, world!
    • Hello, rmwqp/
    • HellLwPrMwQp#
    • Qq0dd9HLmsuhB
  • the fraction of meaningful messages is minimal
  • skew the message space in a way that makes:
    • meaningful messages cheaper
    • random noise more expensive

Huffman trees to the rescue!

Huffman coding

  • use character probabilities to:
    • assign short bit strings to frequent characters
    • assign longer bit strings to non-frequent characters
  • we need a lot of text samples
  • the Leipzig Corpora Collection provides real-world sample text in 80+ languages (10,000 or 100,000 sentences each)

Character probabilities

English

Character probabilities

English, first letter in word

Character probabilities

English, following Q

Character probabilities

English, following QU

Context-sensitive Huffman coding

  • natural language is highly context-sensitive
  • create a lot of Huffman trees for various contexts:
    • default tree
    • word offset trees
    • monogram prefix trees
    • bigram prefix trees
  • → 200 – 500 of the most used trees are chosen and stored as a language pack
  • always use the most appropriate Huffman tree depending on the context during encoding/dedocing

Language packs

LanguageTreesSizeUTFAhoy
EnglishEnglish20030.3 kB2243 – 58229%
GermanDeutsch20031.9 kB2143 – 58240%
Hebrewעברית25028.2 kB1242 – 51387%
Russianрусский30041.4 kB1241 – 55400%
Persianفارسی40048.6 kB1242 – 55404%
Arabicالعربية30044.7 kB1243 – 55408%
Hindiहिन्दी40049.5 kB1141 – 56440%
Korean한국어50151.4 kB1023 – 30265%

Ahoy Demo

http://ahoy.nhcham.org/try.html

Typos: bad

Sometimes, typos make a message more expensive:

...even if the corrected message is longer!

Proper writing: good

Camel casing produces more expensive messages than proper writing:

Katzenfutter billig. Nur 1 € bei Kaisers. → 171 bits KatzenfutterBillig.Nur1€BeiKaisers. → 194 bits

Thank you for listening!

http://ahoy.nhcham.org