6 Lessons from Scaling Data Science from 0 to 4 Million COVID Tests


On March 16th I showed up at a lab that was actively being used to test athletes for performance-enhancing drugs - a startup of a few people, Curative, had moved down the week prior to turn the facility into a COVID testing lab. As so often happens, I saw they were looking for volunteers via a tweet from an old coworker of mine who was roommates with the girlfriend of the CEO.

The first few days we swapped doping equipment for PCR machines and bench space. God that shit is heavy. That first week our progress broke the first of many graphs.

At the conclusion of the first few weeks, management and I came to an agreement that I was not fit for lab work (something I had suggested rather effusively up to that point) and that if I wasn’t going to sit still and process samples I’d need to deliver something else of value or get lost. “I’m a data scientist!” I’d remind them occasionally and so they gave me a day to deliver something with the pile of data we had collected thus far. By the end of the day I was Curative’s first data scientist, and by the end of my time I was leading a team of 5 with nearly 4 million tests under our belt.

Here’s a non-exhaustive list of things I learned in no particular order of importance. (I should have taken more notes during my time.) Some of this is hyper specific to the context of scaling-at-all-costs during a pandemic; hopefully some is broadly applicable.


Performative Rituals of Data Integrity

My biggest misstep in the early days was not doing enough internal marketing for what I was doing. My work product in previous jobs was always directed towards “software” or “data” people — those with PhDs in the hard sciences, I not-so-quickly learned, do not think of the world in similar ways.

We were tracking the position of tens of thousands of samples in the lab at any given time. A sample's position depended on lab workers remembering to scan it as it arrived at each station, and in the early days the location error rate was probably in the 3% range. These samples were always processed, but their real-time location would be off.

Issues arose when, every now and again, someone would go and count samples at a station and the number on the screen would differ from what they had counted. Sometimes it was a bug - inevitable, as we were rapidly iterating on both the software and the process. Oftentimes, though, it was because people are bad at counting hundreds of identical items. My explanation that these error rates were in acceptable ranges and rapidly improving was received more as an excuse than as a description of a process of iteration. People became contemptuous of the numbers, writing them off as mostly useless.

As the numbers of samples in the lab exploded past what any person could physically see and mentally keep track of, this became a problem. For several weeks I was left pulling my hair out as I heard reports of lab personnel writ large discounting our stats and recommendations.

The solution? Put on a show. So, along with the CIO, I went and scanned every sample in the lab. It took all day. I was delirious. But you know what we found?

A) Even trying our best, we sucked and missed scanning ~10% of samples (because, I can't reiterate this enough: humans are bad)

B) After going back with a list of what we missed, we found everything where we expected to

We made a big song and dance of the whole thing, and the show reset expectations. Along with continued improvement of alert systems for catching human mistakes, we had buy-in from then on.
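The kind of reconciliation check those alerts grew into can be sketched roughly like this. Everything here is invented for illustration - station names, counts, and the 3% tolerance are mine, not Curative's actual system:

```python
# Hypothetical reconciliation of system counts vs. a physical spot check,
# flagging stations whose discrepancy exceeds the accepted error rate.
TOLERANCE = 0.03  # early-days scan-miss rate was roughly 3%

system_counts   = {"accessioning": 412, "extraction": 230, "pcr": 198}
physical_counts = {"accessioning": 405, "extraction": 251, "pcr": 197}

def flag_discrepancies(system, physical, tol=TOLERANCE):
    """Return {station: relative_error} for stations outside tolerance."""
    flagged = {}
    for station, expected in system.items():
        counted = physical[station]
        error = abs(counted - expected) / max(expected, 1)
        if error > tol:
            flagged[station] = round(error, 3)
    return flagged

print(flag_discrepancies(system_counts, physical_counts))
```

The point is the framing: instead of arguing that a 3% error rate is fine, you alert only on stations outside the agreed tolerance, which turns "the numbers are wrong" into "this one station needs a recount."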

Testing labs are factories, not institutions of discovery

In order to do useful data science it's probably important to know what problems you are trying to solve. When I first joined the lab, friends and family asked a lot of questions about what cure I was helping research, and I scoffed, trying to explain that we were more of a processing facility. I wish I had taken my own scoffing more to heart.

The questions a manufacturer and factory care about are obvious (Where are things? Are we following a FIFO order? How long does each step take? What's the variance in employee speed? What's our estimated max throughput? etc). The questions of a COVID-19 testing lab mid-pandemic are less so. Had I taken the former to heart earlier, I would have found richer veins of inquiry sooner and the unknown unknowns would have resolved much more quickly to known unknowns. It's only a matter of perspective into the same problem, but it's the perspective that makes all the difference.

Factories are probably best served by business analysts

I’m not sure I understand the difference between a business analyst and a data scientist in the general sense but I now know what it entails for a factory.

85% of questions for a factory can be answered with 3 tools: mean, median, and standard deviation. A business analyst's primary value is to interface with non-technical folks, formulate their problems as statistical ones, and provide some insight. E.g.: How many samples from customer X did we get? What's the average age from site Y? What percentage of people in County Z are insured? Most of these tasks are 1-3 hours from start to completion.
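Concretely, most of these questions reduce to a filter plus a summary statistic. A minimal sketch with Python's standard library, on invented records (the site names, ages, and insurance flags are made up):

```python
import statistics

# Hypothetical daily intake records: (site, patient_age, insured)
samples = [
    ("site_x", 34, True),
    ("site_x", 51, False),
    ("site_y", 29, True),
    ("site_y", 62, True),
    ("site_y", 45, False),
]

# "How many samples from customer X did we get?"
n_site_x = sum(1 for site, _, _ in samples if site == "site_x")

# "What's the average age from site Y?" (plus median and spread)
ages_y = [age for site, age, _ in samples if site == "site_y"]
mean_age_y = statistics.mean(ages_y)
median_age_y = statistics.median(ages_y)
stdev_age_y = statistics.stdev(ages_y)

# "What percentage of people are insured?"
pct_insured = 100 * sum(1 for *_, ins in samples if ins) / len(samples)

print(n_site_x, mean_age_y, median_age_y, pct_insured)
```

Nothing here is beyond a competent analyst with SQL or a spreadsheet, which is exactly the point.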

15% of questions require a regression, a decision tree, a manifold, some math, or insert chapter title of “The Elements of Statistical Learning” here. These tasks took about 6 hours per iteration, where the total number of iterations was a judgment call by the data scientist as to when they had hit diminishing returns. E.g.: Given the number of appointments scheduled and the day of the week, how many samples should we expect at the LA lab?

I would probably hire business analysts rather than data scientists in the beginning if I were to do it again. The low-hanging fruit is low - no need to hire someone proficient in Stan at the start. On the other hand, the nice thing is that 6 months down the road, when that low-hanging fruit is picked, a data scientist can climb the proverbial fruit tree. Perhaps that is a counterargument against hiring those with a lower ceiling in the beginning. I rate my epistemic confidence in Business Analyst vs Data Scientist hot takes at 30%.

We made a lot of maps.

Hiring quickly over quality (and the futility of pedigree)

This advice might be terrible in general and extremely particular to the specific context of Curative and COVID-19 testing: hire those who can start working as fast as possible and cull as you go.

Maybe it’s the word “cull” that feels terrible to write - but I stand by it. Given the rate of growth, a mediocre ass in a seat today was worth a lot more than a perfect ass in a month. I did a technical interview and a 30-day work trial with each candidate. I generally believe interviews are voodoo - you’re kidding yourself if you think you can discern “fine” from “fantastic” in a short amount of time. I know I relied too much on pedigree: FAANG and Ivy League schools were taken to mean much more than they should have, and I’ll probably discount them to near-zero signal in the future.

Similar to the advice of never going to the hospital in July, when all the freshly minted MDs start, a freshly minted manager carries his own warning label. I can’t help but feel guilty about the hirings and firings I butchered, but I’m not sure there’s a way to learn otherwise. I found the book An Elegant Puzzle: Systems of Engineering Management to be indispensable.

North Star Metrics

Every night I (and eventually my team) would make slides for the executive team to give context on how the various aspects of the lab were operating. The number 1 metric we cared about was how long it took for someone to take a test and get a result back (when I left, nearly all patients were given a result within 48 hours - impressive given the shipping time from Texas and Florida).
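The nightly stat itself is simple to compute; something like the sketch below, with invented timestamps (the real pipeline pulled collection and result times from the lab system):

```python
from datetime import datetime

# Hypothetical (collected_at, resulted_at) pairs for three tests.
tests = [
    (datetime(2020, 8, 1, 9),  datetime(2020, 8, 2, 15)),  # 30 h
    (datetime(2020, 8, 1, 12), datetime(2020, 8, 3, 10)),  # 46 h
    (datetime(2020, 8, 1, 8),  datetime(2020, 8, 3, 20)),  # 60 h
]

# Hours from collection to result, and the share landing within 48 hours.
turnaround_hours = [(done - taken).total_seconds() / 3600 for taken, done in tests]
pct_within_48h = 100 * sum(1 for t in turnaround_hours if t <= 48) / len(tests)
print(f"{pct_within_48h:.1f}% of results within 48 hours")
```

The hard part was never the arithmetic - it was what came after the number hit the slide, as the next paragraph describes.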

I think one of the things I did worst was to just lob this stat into the nightly sync, pat myself on the back, and declare it a job well done. This, unsurprisingly, was not particularly effective. In retrospect the whole job was an optimization between speed and quality, and figuring out who to push to fix things in the desired direction. When I transitioned to leading the team I was surprised to find most of my time went to politics, not programming. I initially thought of this as bureaucratic cruft, but in reality politics is how you take your results from the spreadsheet into the lab. Without someone to do this deftly, the team's value is severely limited.

Breaking the Engine

One of the most useful things I learned was that my ceiling for work was probably an order of magnitude higher than I had thought. When mission, purpose, and opportunity to learn aligned, output was shockingly high. But entropy comes for us all. After about 5 months things started to break down for me - I had thrown my health out the window and the burnout was flaring hard.

I helped scale a startup into a real company and built a great team, but learning went from stepwise to flat and new challenges became less obvious. When I realized this, my sense of purpose declined and I was swamped by exhaustion. Curative was always meant to be temporary for me; at the first all-hands, when we all fit into a single conference room, we were told to think in weeks, not months. Now, 1,000 employees later and 6 months to the day after seeing that tweet, I had my final day.

Thanks to Curative for the opportunity and the friends I’ll never forget - a weird pandemic experience in a weird time.

Thanks to Aubree for her careful read.