sharkticon: betting on transformers before the ai hype

Posted on Mar 10, 2020

june 12, 2017. a paper landed that would quietly reshape AI — “Attention Is All You Need.” it introduced the Transformer architecture, soon to displace RNNs in language modeling, then in vision, then in just about everything. i didn’t know it yet, but that paper would define my first big technical bet.

at the time, i was in my first year of computer science at PoC, the student innovation center. there was already a project floating around the lab called SmartShark, an LSTM-based system that could spot simple DDoS or man-in-the-middle attacks. it worked on a narrow slice of attacks, the way student projects often do — promising, but a long way from production.

so i asked the obvious question. what if we replaced the LSTM with a Transformer? not to learn what an attack looks like (there are too many shapes for that) but to learn what a healthy network looks like, and treat anything that drifted from it as suspicious. anomaly detection, powered by attention.

we wrote a Transformer from scratch in Python with TensorFlow. network packets were vectorized with Packet2Vec, a variant of Word2Vec, so we could do arithmetic on traffic the way you do on words. the model predicted the network at t+1, and an anomaly layer compared the prediction to reality. drift past a threshold, and an alert went up.
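
a rough sketch of that last comparison step, not the original Sharkticon code. it assumes the prediction and the observed traffic are both plain embedding vectors and that drift is measured as cosine distance. the function names, the choice of metric, the toy vectors and the 0.2 threshold are all made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector counts as maximally different.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_anomalous(predicted, observed, threshold=0.2):
    """Flag the observation when it drifts past the (hypothetical) threshold."""
    return cosine_distance(predicted, observed) > threshold


# Toy vectors standing in for the model's t+1 prediction and real traffic.
predicted = [0.9, 0.1, 0.0]
normal_observed = [0.8, 0.2, 0.0]   # close to the prediction
odd_observed = [0.0, 0.1, 0.9]      # points in a very different direction

print(is_anomalous(predicted, normal_observed))
print(is_anomalous(predicted, odd_observed))
```

in a real system the threshold would presumably be tuned on healthy traffic rather than picked by hand, since it sets the trade-off between missed attacks and false alerts.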

doing this as a first-year student was the hardest thing i’d ever attempted. reference implementations barely existed. half my nights went to rereading the paper, the other half to debugging tensor shapes. but it worked — not perfectly, not production-ready, but it worked.

i ended up presenting Sharkticon at AI meetups, conferences, and small trade fairs. the project didn’t change the world. it changed me. i understood Transformers from the inside out before they became the thing everyone built on, and i learned that the right time to bet on a new architecture is exactly when nobody else is.

Slides

GitHub