Portfolio
Featured Projects
1. HPCToolkit: Advanced GPU Performance Analysis
High-Performance Computing, GPU Optimization, Instruction Sampling
Extended HPCToolkit, a performance analysis tool for high-performance computing systems:
- Implemented multi-tile and multi-card program counter sampling on Intel GPUs
- Extended support for vendor-neutral instruction sampling across NVIDIA, AMD, and Intel GPUs
- Developed a versatile dual-mode data collection framework for optimized performance measurement
2. Accelerating LLM Inference with Ragged Tensor Optimization
CUDA, CUTLASS, cuBLAS, C++ Programming
Optimized a grouped GEMM library for LLM inference:
- Developed a CUDA library for ragged tensor operations, minimizing padding and improving performance
- Implemented and benchmarked various execution scenarios for comprehensive performance analysis
- Created an efficient CUDA kernel for expert computation in MoE models
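To illustrate the idea behind ragged grouped GEMM, here is a minimal CPU sketch in Python rather than CUDA. It assumes token rows are packed back to back and that an offsets array marks where each expert's group begins. The names `grouped_gemm`, `offsets`, and `expert_weights` are hypothetical and are not taken from the project.

```python
def matmul(a, b):
    """Plain row-major matrix product of two lists of lists."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def grouped_gemm(tokens, offsets, expert_weights):
    """Multiply each ragged group of rows by its own weight matrix.

    tokens: rows of all groups packed contiguously (no padding rows).
    offsets: rows offsets[g] .. offsets[g + 1] - 1 belong to expert g.
    expert_weights: one weight matrix per expert (hypothetical layout).
    """
    out = []
    for g, weight in enumerate(expert_weights):
        rows = tokens[offsets[g]:offsets[g + 1]]
        if rows:  # an expert may receive no tokens at all
            out.extend(matmul(rows, weight))
    return out


tokens = [[1, 2], [3, 4], [5, 6]]
offsets = [0, 1, 3]  # expert 0 gets 1 row, expert 1 gets 2 rows
weights = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
result = grouped_gemm(tokens, offsets, weights)
```

Because each group is sliced by its offsets, no group is padded up to the size of the largest one. A GPU kernel would typically process the groups in parallel instead of in a loop.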
3. LightNeuron: GEMM-Optimized Framework for ML Inference
GEMM Optimization, SIMD Programming, x86-64 Architecture
Developed an educational neural network library optimized for x86-64 architecture:
- Implemented core CNN components in C, including convolutional layers and matrix multiplication
- Profiled with Intel VTune Profiler and applied optimizations such as blocking and SIMD instructions
- Extended support for PyTorch and TensorFlow pre-trained models via HDF5
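The blocking technique mentioned above can be sketched as follows. This is an illustrative Python version, not the library's C code, and the function name and `block` parameter are assumptions. It splits the three matrix-multiplication loops into tiles so that each tile of the operands is reused while it is still in cache.

```python
def blocked_matmul(a, b, block=2):
    """Compute a @ b with loop blocking (tiling) over all three dimensions."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile over rows of a
        for k0 in range(0, k, block):      # tile over the shared dimension
            for j0 in range(0, m, block):  # tile over columns of b
                for i in range(i0, min(i0 + block, n)):
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c


product = blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

In a C implementation the innermost loop over `j` is the natural candidate for SIMD instructions, since it walks contiguous elements of a row.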
4. Mastodon Insights: Large-Scale NLP Analysis with Apache Spark
Distributed Computing, Apache Spark, Big Data, Natural Language Processing
Designed and implemented a distributed NLP processing pipeline for analyzing Mastodon content:
- Developed a PySpark application for dynamic TF-IDF matrix computation and updates
- Created a RESTful API integrating complex NLP functionalities, including nearest neighbor computations
- Optimized pipeline performance through pre-computation and caching, coupled with Apache Spark’s distributed processing
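For reference, a single-machine sketch of TF-IDF weighting is shown below. It assumes one common variant (term frequency normalized by document length, and inverse document frequency as log(N / df)). The project's PySpark pipeline may use a different formula, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


scores = tf_idf([["spark", "nlp"], ["spark", "api"]])
```

A term that appears in every document, such as "spark" here, gets a weight of zero under this variant, while terms unique to one document are weighted highest.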
5. LeakCheck: Memory Leak Detection Library for C Applications
Graph Search, Memory Leak Detection, Software Reliability, C Programming
Designed a comprehensive memory leak detection library for C applications:
- Implemented a depth-first search to analyze a directed graph of allocated objects, which may contain cycles
- Developed a lightweight interception mechanism for memory management functions
- Created an intuitive reporting system for detailed leak information, facilitating efficient debugging
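One way to picture the graph search is as a reachability check, sketched here in Python rather than C. It assumes that an allocation counts as leaked when it cannot be reached from any root object. That criterion and the names `find_leaks`, `roots`, and `references` are assumptions for illustration, not details from the library.

```python
def find_leaks(allocations, roots, references):
    """Return allocations not reachable from any root, in sorted order.

    allocations: set of live allocation ids.
    roots: ids the application still holds directly.
    references: dict mapping an id to the ids it points to.
    """
    visited = set()
    stack = [r for r in roots if r in allocations]
    while stack:  # iterative depth-first search
        node = stack.pop()
        if node in visited:
            continue  # already seen, so cycles cannot loop forever
        visited.add(node)
        for child in references.get(node, ()):
            if child in allocations and child not in visited:
                stack.append(child)
    return sorted(allocations - visited)


leaks = find_leaks(
    allocations={"a", "b", "c", "d"},
    roots=["a"],
    references={"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]},
)
```

The visited set is what makes the search safe on cyclic graphs: `c` and `d` point at each other, yet both are reported because nothing reachable from a root refers to them.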
6. Awesome GEMM: Comprehensive Matrix Multiplication Resource
Matrix Multiplication, High-Performance Computing, Software Optimization
Curated an extensive collection of GEMM frameworks and libraries:
- Focused on computational efficiency in matrix operations
- Regularly updated to include the latest advancements in GEMM optimization
- Serves as a valuable resource for researchers and developers in high-performance computing
7. Hands-on SIMD Programming with C++
SIMD Programming, Parallel Processing, C++, Performance Engineering
Created a practical guide to SIMD programming in C++:
- Focused on performance engineering and parallel processing techniques
- Provided hands-on examples and exercises for SIMD optimization
- Aimed at helping developers leverage SIMD instructions for improved application performance
8. Stanford CS149: Parallel Computing Coursework
Parallel Computing, OpenMP, CUDA Programming
Completed coursework in parallel computing, showcasing:
- Implementations of various parallel algorithms using OpenMP and CUDA
- Optimization of parallel programs for multi-core CPUs and GPUs
9. LZ77 Compression Algorithm Implementation
Data Compression, LZ77, C Programming, Algorithm Design
Implemented the LZ77 compression algorithm in C:
- Developed a sliding window and look-ahead buffer for efficient compression
- Optimized for both compression ratio and speed
- Demonstrated understanding of fundamental compression techniques
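The sliding window and look-ahead buffer can be sketched as below. This is a minimal Python illustration of textbook LZ77 that emits (offset, length, next character) triples. The window sizes, the function names, and the brute-force match search are assumptions, and the C implementation may differ.

```python
def lz77_compress(data, window=16, lookahead=8):
    """Encode a string as a list of (offset, length, next_char) triples."""
    i = 0
    out = []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - window)
        # Reserve one character for the literal that follows each match.
        max_len = min(lookahead, len(data) - i - 1)
        for j in range(start, i):  # brute-force search of the window
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the original string from the triples."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # copy one symbol at a time (allows overlap)
        out.append(ch)
    return "".join(out)


text = "abracadabra abracadabra"
encoded = lz77_compress(text)
decoded = lz77_decompress(encoded)
```

Copying one symbol at a time in the decoder matters because a match may overlap the text it is producing, as with a long run of the same character.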
10. Eva: A Modern Functional Programming Language
Language Design, Interpreter Development, Functional Programming, Object-Oriented Programming
Designed and implemented a modern programming language combining functional and object-oriented paradigms:
- Developed a complete interpreter in JavaScript, focusing on lexical analysis, parsing (using syntax-cli), and execution
- Incorporated functional programming features including closures and higher-order functions
- Implemented syntactic sugar and OOP support, enabling versatile coding approaches