Introduction to Proteomics: From Mass Spectrometry to Data Analysis
What is Proteomics?
Proteomics is the large-scale study of proteins — their structures, functions, interactions, and modifications. While the genome provides the blueprint, proteins are the molecular machines that execute cellular functions. The proteome is far more complex than the genome: alternative splicing, post-translational modifications (PTMs), and protein-protein interactions create a dynamic molecular landscape that varies across cell types, developmental stages, and disease states.
Understanding the proteome is critical because proteins are the primary drug targets, biomarkers, and functional effectors in biological systems. Mass spectrometry (MS)-based proteomics has emerged as the dominant technology for comprehensive protein analysis, enabling the identification and quantification of thousands of proteins from complex biological samples.
Mass Spectrometry Fundamentals
How Mass Spectrometry Works
A mass spectrometer measures the mass-to-charge ratio (m/z) of ionized molecules. In proteomics, the typical workflow involves three key steps:
- Ionization: Peptides are converted to gas-phase ions, typically using electrospray ionization (ESI) or matrix-assisted laser desorption/ionization (MALDI).
- Mass Analysis: Ions are separated based on their m/z ratio using instruments such as quadrupoles, time-of-flight (TOF) analyzers, Orbitrap mass analyzers, or ion traps.
- Detection: The abundance and m/z of each ion species are recorded, generating a mass spectrum.
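To make the m/z value concrete, here is a minimal sketch that computes the monoisotopic m/z of a protonated, unmodified peptide from its sequence and charge state. The function name `peptide_mz` and the example peptide are illustrative assumptions, and the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids,
# rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # mass of H2O added for the free N- and C-termini
PROTON = 1.007276   # mass of one proton


def peptide_mz(sequence: str, charge: int) -> float:
    """Return the monoisotopic m/z of an unmodified peptide with `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    # The same peptide appears at a different m/z for each charge state.
    for z in (1, 2, 3):
        print(f"PEPTIDE, z={z}: m/z = {peptide_mz('PEPTIDE', z):.4f}")
```

Because ESI typically produces multiply charged peptide ions, one peptide can show up at several m/z values in the same spectrum, which is why the charge state has to be known before a mass can be inferred.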
Tandem Mass Spectrometry (MS/MS)
In tandem MS, precursor ions are selected and fragmented to generate product ions. The fragmentation pattern provides sequence information that enables peptide identification. The most common fragmentation methods include:
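- CID (Collision-Induced Dissociation): Peptides collide with inert gas molecules, primarily producing b and y ions.
- HCD (Higher-energy Collisional Dissociation): Beam-type collisional dissociation performed in a dedicated collision cell. Like CID, it produces mainly b and y ions, and because it is not subject to the low-mass cutoff of ion-trap CID, low-m/z fragments such as TMT reporter ions are readily detected.
- ETD (Electron Transfer Dissociation): Fragments the peptide backbone by electron transfer, producing c and z ions while largely preserving labile PTMs such as phosphorylation.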
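As a rough illustration of how a fragmentation pattern encodes sequence, the sketch below lists the singly charged b and y ion m/z values for an unmodified peptide. The function name `fragment_ions` is hypothetical, only b and y ions are modeled (not the c and z ions from ETD), and neutral losses and higher charge states are ignored.

```python
# Monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def fragment_ions(sequence: str) -> tuple[list[float], list[float]]:
    """Return (b_ions, y_ions) as singly charged m/z values.

    b_ions[i-1] is b_i (the first i residues); y_ions[i-1] is y_i (the last i
    residues). The full-length ions are omitted because they correspond to the
    intact peptide rather than a backbone cleavage.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    # b ion: N-terminal fragment, residue masses plus one proton.
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y ion: C-terminal fragment, residue masses plus water plus one proton.
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:9.4f}   y{i} = {yi:9.4f}")
```

Consecutive b ions (or y ions) differ by the mass of one residue, which is what allows a search engine to read sequence information from the spectrum.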
Modern Mass Spectrometry Platforms
Current state-of-the-art instruments include:
- Thermo Orbitrap Astral: Combines an Orbitrap analyzer with a novel Astral analyzer, achieving unprecedented speed (200+ Hz MS/MS) and depth (>10,000 proteins from a single cell line).
- Bruker timsTOF Ultra: Uses trapped ion mobility spectrometry (TIMS) for an additional dimension of separation, with PASEF (Parallel Accumulation Serial Fragmentation) for high sensitivity.
- SCIEX ZenoTOF 7600: Features Zeno trapping for near-100% duty cycle, dramatically improving sensitivity for data-independent acquisition.
Sample Preparation
Bottom-Up Proteomics
The most widely used approach in proteomics is bottom-up (shotgun) proteomics, where proteins are digested into peptides before MS analysis:
- Cell Lysis: Cells are lysed using detergents (SDS, SDC), urea, or physical disruption to release proteins.
- Reduction and Alkylation: Disulfide bonds are reduced (typically with DTT or TCEP) and the free cysteines are alkylated (with iodoacetamide or chloroacetamide) to prevent the bonds from re-forming.
- Digestion: Proteins are digested with trypsin (which cleaves on the C-terminal side of K and R residues) or a combination of Lys-C and trypsin.
- Cleanup: Peptides are desalted using C18 solid-phase extraction (SPE) cartridges or StageTips.
- Fractionation (optional): High-pH reversed-phase fractionation or strong cation exchange (SCX) increases proteome depth.
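The digestion step can be sketched in silico as follows. This is a minimal example, not how any particular search engine implements it: the function name `tryptic_digest` is hypothetical, and the "no cleavage before proline" rule is a commonly applied convention that is an assumption here rather than something stated above.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Split a protein sequence into tryptic peptides.

    Cleaves after K or R unless the next residue is P (assumed rule), and
    also returns peptides spanning up to `missed_cleavages` skipped sites.
    """
    # Positions where the chain is cut (index of the residue after K/R).
    sites = [
        i + 1
        for i, aa in enumerate(sequence[:-1])
        if aa in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(sequence)]

    peptides = []
    for start in range(len(bounds) - 1):
        # Extend the peptide across 0..missed_cleavages uncut sites.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    protein = "AAKPLLRGGKMM"  # toy sequence, not a real protein
    print(tryptic_digest(protein))
    print(tryptic_digest(protein, missed_cleavages=1))
```

In the toy sequence the K followed by P is not cleaved, so the first peptide keeps an internal lysine. Allowing missed cleavages enlarges the search space, which is one reason search engines expose it as a parameter.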
Sample Preparation Methods
Several standardized workflows are available:
- SP3 (Single-Pot, Solid-Phase-enhanced Sample Preparation): Uses magnetic beads for protein capture and cleanup. Fast, flexible, and compatible with low input amounts.
- FASP (Filter-Aided Sample Preparation): Uses molecular weight cutoff filters for buffer exchange and digestion. Excellent for removing detergents like SDS.
- In-solution digestion: The simplest approach: proteins are digested directly in urea buffer after dilution.
- S-Trap: Combines the benefits of FASP with simpler handling and faster processing times.
Quantification Strategies
Label-Free Quantification (LFQ)
LFQ compares peptide intensities or spectral counts across runs without chemical labels. MaxLFQ (implemented in MaxQuant) uses a delayed normalization approach that is robust and widely adopted. The main advantage is that the number of samples that can be compared is not capped by a labeling reagent; the disadvantage is that each sample is measured in a separate run, so run-to-run variability can reduce quantitative accuracy.
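To illustrate why run-to-run variability matters, here is a deliberately simple sketch of per-run median normalization on log2 intensities. It is not the MaxLFQ algorithm; the function name `median_normalize`, the run and protein labels, and the intensities are made up for illustration.

```python
import math
import statistics


def median_normalize(runs: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Log2-transform intensities and subtract each run's median.

    `runs` maps run name -> {protein: raw intensity}. Zero or negative
    intensities are treated as missing and dropped.
    """
    normalized = {}
    for run, intensities in runs.items():
        logs = {p: math.log2(v) for p, v in intensities.items() if v > 0}
        run_median = statistics.median(logs.values())
        normalized[run] = {p: x - run_median for p, x in logs.items()}
    return normalized


if __name__ == "__main__":
    # Hypothetical data: run2 was loaded with twice as much material as run1.
    runs = {
        "run1": {"P1": 100.0, "P2": 200.0, "P3": 400.0},
        "run2": {"P1": 200.0, "P2": 400.0, "P3": 800.0},
    }
    for run, values in median_normalize(runs).items():
        print(run, {p: round(x, 3) for p, x in values.items()})
```

Median centering assumes that most proteins do not change between runs, so a global shift reflects loading or instrument differences rather than biology. That assumption can fail, for example in enrichment experiments.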
Isobaric Labeling (TMT/iTRAQ)
Tandem mass tags (TMT) allow multiplexing of up to 18 samples in a single MS run with TMTpro 18-plex reagents. Peptides from different conditions are labeled with isobaric tags: the tags have the same total mass, so a labeled peptide from all samples appears as a single precursor peak, but on fragmentation each tag releases a reporter ion with a distinct mass. Because all samples are measured in the same run, multiplexing reduces missing values and supports high-throughput quantitative proteomics. However, ratio compression caused by co-isolation interference remains a challenge; it is commonly mitigated by MS3-level quantification or FAIMS gas-phase fractionation.
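Ratio compression can be illustrated with a toy model. Assume, purely for illustration, that co-isolated background peptides add the same reporter signal to both channels; the function name `observed_ratio` and all numbers are hypothetical.

```python
def observed_ratio(signal_a: float, signal_b: float, interference: float) -> float:
    """Reporter-ion ratio A/B when each channel carries extra background signal.

    Toy model: `interference` is the reporter intensity contributed to each
    channel by co-isolated peptides that do not change between conditions.
    """
    return (signal_a + interference) / (signal_b + interference)


if __name__ == "__main__":
    true_a, true_b = 400.0, 100.0  # true 4:1 ratio between two channels
    for background in (0.0, 50.0, 100.0, 400.0):
        ratio = observed_ratio(true_a, true_b, background)
        print(f"background={background:6.1f}  observed ratio={ratio:.2f}")
```

In this model the measured ratio moves toward 1:1 as the background grows, which is the sense in which fold changes are "compressed". Reducing the co-isolated background, for instance with MS3-level quantification, moves the observed ratio back toward the true value.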
SILAC (Stable Isotope Labeling by Amino Acids in Cell Culture)
SILAC incorporates heavy isotope-labeled amino acids (typically 13C/15N lysine and arginine) into proteins during cell culture. Labeled and unlabeled samples are mixed early in the workflow, minimizing technical variability. SILAC is considered the gold standard for quantitative accuracy but is limited to cell culture systems.
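As a small worked example, the sketch below computes how far apart the light and heavy forms of a peptide appear in m/z. It assumes the common Lys8 (13C6 15N2 lysine, +8.0142 Da) and Arg10 (13C6 15N4 arginine, +10.0083 Da) labels with complete incorporation; other label combinations exist, and the function name `silac_mz_shift` is hypothetical.

```python
# Mass added per labeled residue (Da), assuming Lys8 and Arg10 labels.
HEAVY_SHIFT = {"K": 8.014199, "R": 10.008269}


def silac_mz_shift(sequence: str, charge: int) -> float:
    """Return the m/z difference between heavy and light forms of a peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    mass_shift = sum(HEAVY_SHIFT.get(aa, 0.0) for aa in sequence)
    return mass_shift / charge


if __name__ == "__main__":
    for peptide, z in [("PEPTIDEK", 2), ("AAKPLLR", 2), ("PEPTIDE", 2)]:
        print(f"{peptide} (z={z}): heavy - light = {silac_mz_shift(peptide, z):.4f} m/z")
```

Because trypsin cleaves after K and R, most tryptic peptides carry at least one labeled residue, which is why lysine and arginine are the usual choice. A peptide without K or R (the last example) gives a shift of zero and cannot be quantified this way.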
Data Analysis Pipeline
Database Search
Raw MS/MS spectra are searched against protein sequence databases to identify peptides. Popular search engines include:
- MaxQuant/Andromeda: The most widely used platform for label-free and SILAC proteomics.
- Proteome Discoverer/SEQUEST/Mascot: Thermo's commercial platform with multiple search engine options.
- MSFragger/FragPipe: Ultrafast search engine enabling open modification searches and large-scale analyses.
- DIA-NN: State-of-the-art tool for data-independent acquisition (DIA) data analysis.
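One early step in a database search, selecting candidate peptides whose theoretical mass matches the measured precursor mass within a tolerance, can be sketched as below. This is a simplified illustration and does not reproduce any of the engines listed above: real tools also score the fragment spectrum. The function names, candidate labels, and masses are hypothetical.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Return the mass error of `observed` relative to `theoretical` in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match_candidates(
    observed_mass: float,
    candidates: dict[str, float],
    tolerance_ppm: float = 10.0,
) -> list[str]:
    """Return candidate names whose mass lies within `tolerance_ppm` of the observation."""
    return [
        name
        for name, mass in candidates.items()
        if abs(ppm_error(observed_mass, mass)) <= tolerance_ppm
    ]


if __name__ == "__main__":
    # Hypothetical neutral monoisotopic masses (Da) for three database peptides.
    candidates = {
        "candidate_A": 799.3599,
        "candidate_B": 799.4000,
        "candidate_C": 1200.6000,
    }
    observed = 799.3620
    for name, mass in candidates.items():
        print(f"{name}: {ppm_error(observed, mass):+.1f} ppm")
    print("within 10 ppm:", match_candidates(observed, candidates))
```

A narrow tolerance keeps the candidate list short when the instrument has high mass accuracy, whereas open modification searches deliberately widen the window to allow for unexpected mass shifts.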
Statistical Analysis
After identification and quantification, statistical analysis identifies differentially abundant proteins. Common tools and approaches include:
- Perseus: MaxQuant's companion software for statistical analysis and visualization. Provides imputation, normalization, t-tests, ANOVA, and clustering.
- limma: Originally developed for microarrays, limma's moderated t-statistics (based on empirical Bayes variance shrinkage) are widely applied to log-transformed protein intensities, optionally with the limma-trend option to model intensity-dependent variance. The voom method was designed for RNA-seq count data and is generally not needed for intensity-based proteomics.
- MSstats: A dedicated R package for statistical analysis of quantitative proteomics data, supporting multiple experimental designs and quantification methods.
- DEqMS: Extends limma with peptide count-based variance estimation, improving statistical power for proteomics data.
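The basic quantities behind a differential abundance test can be sketched with the standard library alone. The example computes a log2 fold change and an ordinary Welch t-statistic for one protein; it is not the moderated statistic used by limma or DEqMS, it does not compute p-values or correct for multiple testing, and the function names and data are hypothetical.

```python
import math
import statistics


def log2_fold_change(group_a: list[float], group_b: list[float]) -> float:
    """Difference of mean log2 intensities (group A minus group B)."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch t-statistic for two groups of log2 intensities (unequal variances)."""
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    return log2_fold_change(group_a, group_b) / standard_error


if __name__ == "__main__":
    # Hypothetical log2 intensities for one protein, three replicates per group.
    treated = [10.0, 10.2, 9.8]
    control = [8.0, 8.2, 7.8]
    print(f"log2 fold change = {log2_fold_change(treated, control):.2f}")
    print(f"Welch t = {welch_t(treated, control):.2f}")
```

With only a few replicates per group, the per-protein variance estimate is unstable. Variance-moderating approaches address this by borrowing information across proteins.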
Functional Analysis
Differentially abundant proteins are interpreted using pathway and gene ontology enrichment analysis. Tools include:
- STRING: Protein-protein interaction network analysis and functional enrichment.
- Enrichr/g:Profiler: Gene ontology, pathway, and regulatory enrichment analysis.
- GSEA: Gene set enrichment analysis that uses ranked protein lists rather than arbitrary cutoffs.
- Cytoscape: Network visualization with plugins like ClueGO for functional analysis.
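Over-representation analysis is commonly based on a hypergeometric (one-sided Fisher's exact) test, which the sketch below implements. It is not the exact procedure of any tool listed above, and the function name, parameter names, and counts are hypothetical. It requires Python 3.8 or later for `math.comb`.

```python
import math


def enrichment_p_value(background: int, annotated: int, selected: int, overlap: int) -> float:
    """P(overlap or more annotated proteins among the selected ones) by chance.

    background: proteins quantified in the experiment
    annotated:  background proteins annotated to the term or pathway
    selected:   differentially abundant proteins
    overlap:    selected proteins that carry the annotation
    """
    total = math.comb(background, selected)
    upper = min(annotated, selected)
    # Sum the hypergeometric probabilities for k = overlap .. upper.
    favorable = sum(
        math.comb(annotated, k) * math.comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # Hypothetical counts: 5,000 quantified proteins, 100 in the pathway,
    # 200 differentially abundant, 12 of which are in the pathway.
    p = enrichment_p_value(background=5000, annotated=100, selected=200, overlap=12)
    print(f"enrichment p-value = {p:.3g}")
```

Using the set of quantified proteins as the background, rather than the whole genome, avoids spurious enrichment for terms that are simply easier to detect by mass spectrometry. When many terms are tested, the p-values also need multiple-testing correction.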
Emerging Trends in Proteomics
Single-Cell Proteomics
Recent advances in MS sensitivity enable proteome profiling of individual cells. Methods like SCoPE2/plexDIA, nanoPOTS, and cellenONE-based sample preparation achieve coverage of 1,000-5,000 proteins per single cell. This opens new possibilities for understanding cellular heterogeneity in tissues and tumors.
Clinical Proteomics
High-throughput clinical proteomics platforms like Proximity Extension Assay (Olink), SomaScan, and DIA-MS are enabling large-scale biomarker discovery studies with thousands of patient samples. The UK Biobank Pharma Proteomics Project profiled ~3,000 proteins in 54,000 participants, providing an unprecedented resource for disease biomarker research.
Conclusion
Proteomics has evolved from a specialized technique to a mainstream analytical platform capable of deep, quantitative, and high-throughput protein analysis. The combination of advanced mass spectrometry instrumentation, optimized sample preparation workflows, and sophisticated computational tools makes proteomics an essential component of modern biological research. Whether you're studying disease mechanisms, discovering biomarkers, or characterizing drug targets, proteomics provides molecular insights that no other technology can match.