But consider the scope of the challenge. There are some 20,000 protein-coding genes in the human genome, each of which can come in four or five versions, which adds up to about 100,000 possible protein sequences. Once a protein is created, it can be chemically modified by one or more of some 200 different tags. Add to that the fact that these tags can attach to a protein molecule at multiple positions, and the result is a virtually limitless number of possible tagging combinations and, consequently, of differently modified proteins.
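To give a rough sense of this arithmetic, here is a minimal Python sketch. The gene, version and tag counts come from the figures above. The counting model is a deliberate simplification and an assumption of this sketch, not a biological claim: each modifiable position is either left untouched or carries exactly one of the 200 tags, and any tag may attach at any position. The function names and the example site counts are hypothetical.

```python
# Figures quoted in the text above.
GENES = 20_000          # protein-coding genes in the human genome
VERSIONS_PER_GENE = 5   # upper end of "four or five versions"
TAG_TYPES = 200         # distinct chemical tags


def sequence_count(genes=GENES, versions=VERSIONS_PER_GENE):
    """Approximate number of distinct protein sequences."""
    return genes * versions


def tagged_forms(sites, tag_types=TAG_TYPES):
    """Number of tagging states for one protein under the simplified model.

    Assumption (illustrative only): each of `sites` positions is either
    unmodified or carries exactly one of `tag_types` tags, independently
    of the other positions.
    """
    return (tag_types + 1) ** sites


if __name__ == "__main__":
    print("protein sequences:", sequence_count())
    # The site counts below are arbitrary examples, not measured values.
    for sites in (1, 3, 10):
        print(sites, "sites:", tagged_forms(sites), "possible forms")
```

Even under this crude model, a protein with only three modifiable positions already has more than eight million possible forms, and the count grows exponentially with each additional position.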
Standard methods cannot recognize these modified proteins. Each protein has a unique sequence of amino acid building blocks, which allows it to be identified by comparing this sequence to protein blueprints in reference libraries, but modifications get in the way of this recognition because they alter the properties of the amino acids. Because analyzing large numbers of such alterations simultaneously has been exceedingly difficult, a broad study of modified proteins – one that would ask open-ended questions – has been too challenging. Thus most studies have, until now, tended to focus narrowly on just a few specific modifications.
“Looking at the effects of proteins without the changes that happen in them after they have been created limits our understanding of the true complexity of biological processes,” explains Merbl, of Weizmann’s Systems Immunology Department. “My lab set itself the goal of creating a novel computational tool that can track down dozens of protein modification types in an unbiased manner.”