Broad In Focus: Tom Green, Software Engineering Manager

For the past seven years, software engineering manager Tom Green has guided the development and maintenance of software tools that support the Genetic Perturbation Platform at the Broad Institute, where he can be found working with a team of software engineers or consulting with scientists conducting experimental screens. Two decades ago, however, Green was living without electricity or running water in the jungles of Nicaragua, a houseguest of locals in the remote village of Karawala on the Caribbean coast, doing a very different kind of research.

To earn his Ph.D. in linguistics at MIT, Green spent many four-month stints in Nicaragua, where he studied Ulwa, an “endangered language” spoken by only a few hundred people. To create an Ulwa dictionary and grammar reference, Green embedded himself within the community, analyzing the language’s structure and usage. Software engineering had been a hobby of Green’s since junior high, so he was able to apply his programming skills to generate reports and perform analyses using his custom-built Ulwa database.

Green enjoyed linguistics, but kept circling back to his long-standing interest in programming. After completing his doctorate and working as a software engineer in industry for several years, Green joined the Harvard Initiative for Innovative Computing, a now-defunct effort to get scientists to work together with programmers to improve the quality of scientific software. Although he had never studied biology, he now worked closely with life scientists and wanted to learn more. Every morning on his way to work, Green listened to OpenCourseWare lectures from MIT’s Introduction to Biology class, co-taught by Broad director Eric Lander. “It was amazing,” said Green. “By the end of that, I was thinking, if I had taken that course earlier, I probably would have gone into genetics instead.”

He learned about the Broad Institute only after passing by the lobby on Main Street on his way to a meeting at MIT. Intrigued, he applied for a job. Genetic Perturbation Platform (GPP) director David Root later called him about a different position. “I came to the Broad for my interview, and I didn’t want to leave,” said Green.

He joined GPP (then known as the RNAi Platform) in 2008 to oversee a team of four other software engineers who support the software needs of this busy platform. To enable studies of genetic function at the Broad and elsewhere, platform scientists design, build, and distribute libraries of reagents that “perturb” gene expression, turning activity up, down, or off, in addition to maintaining an online library database of these reagents. At that time, the platform was primarily focused on RNA interference using short hairpin RNAs to knock genes down, although it now also offers gene over-expression (ORF, or open reading frame) reagents to dial up gene activity and CRISPR-Cas9 reagents to turn it off.

All of these efforts require a significant investment in software infrastructure to keep this high-throughput platform running. Green and his team of engineers build and maintain software tools to track millions of reagent samples in the database; to functionally annotate each construct, describing each perturbagen and which gene it targets in the genome; to help scientists design new constructs to perturb genes; to track the current releases of the reference human and mouse genome sequences from NCBI; and to support the analysis of data generated during screening studies.

The fast pace of advancing technology in the field is one of the biggest challenges faced by Green and his team. “Our bread and butter for a long time had been shRNAs,” said Green. “Then ORFs came along, and we spent a lot of time upgrading a system that used to think the only thing in the world was a shRNA hairpin.” When the platform ventured into ORF technology for genetic overexpression, the software still had to track the samples in the same way, but much of the process needed to be modified. “That really caused a big upheaval in the way our data was structured and handled,” he said.

The platform’s newest offering is CRISPR constructs for genome editing. These reagents are incredibly valuable research tools: they allow scientists to fully “knock out” a gene, effectively turning it off. Researchers are eager to use more CRISPR tools in the lab, so GPP has been in a high-throughput production phase to build its CRISPR library. Green and the other software engineers have been just as busy building software tools to design, search for, and find annotations for all the CRISPR constructs. As always, the programmers have to remain flexible, since GPP often shifts from a mode of high-throughput library production to one in which it is running many screening projects.

A major challenge faced by Green and his fellow engineers is maintaining a system that is always up and running. Scientists at the Broad and beyond interact with the GPP library of reagents and database through a web-based system that is frequently being updated with new constructs and new annotations. Unlike other software packages or web portals that release new versions periodically, the GPP library database can’t wait for the next release, but must be constantly updated. “It’s like e-commerce. Everything is all online, all the time, all live,” said Green.

Research and development is another big focus of the platform, so keeping up with the pace of new developments is also challenging for the software team. “We’re never able to rest on our laurels, because then the scientists will have changed their methods or used some different enzyme,” said Green. “There’s an education aspect to our role. We’re not bench scientists or even biologists, so we always need to be educated to keep up with the constant pace of R&D in the platform.”

Green and his team are looking forward to completing the CRISPR database and moving out of production mode soon. At the Broad, he enjoys seeing the platform’s tools support cutting-edge science in such a vital place. “I’m really excited for this production phase to be done, so we can spruce up our existing tools,” Green said. “There’s lots of cool stuff to do, and I think we’re finally getting the chance to do it.”