Awesome GEMM: A Curated List of GEMM Frameworks, Libraries and Software

Keywords: Matrix Multiplication, High-Performance Computing, Numerical Analysis, Software Optimization, Computational Efficiency.

Awesome GEMM is a curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software. It serves as a comprehensive resource for developers and researchers interested in high-performance computing, numerical analysis, and optimization of matrix operations.
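As a reference point for what these libraries optimize, here is a minimal sketch of the plain triple-loop product A * B = C. It assumes row-major storage and uses a hypothetical function name, `gemm_naive`. BLAS-style libraries generalize the same operation to C ← αAB + βC and make it much faster.

```c
#include <assert.h>
#include <stddef.h>

/* Naive GEMM: C = A * B for row-major matrices.
 * A is M x K, B is K x N, C is M x N. (Hypothetical helper, not a library API.) */
void gemm_naive(size_t M, size_t N, size_t K,
                const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double acc = 0.0;  /* dot product of row i of A and column j of B */
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    }
}
```

The innermost loop walks B down a column with stride N. That poor cache locality is one of the main things optimized GEMM kernels fix.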

LightNeuron: An Educational Inference Framework

Keywords: Neural Networks, GEMM Optimization, x86-64 Architecture, C Programming, CPU Efficiency.

LightNeuron is a highly efficient, educational neural network library written in C for x86-64 architectures. It aims to provide insight into neural network mechanics, profiling, and optimization, with a particular focus on the efficiency of General Matrix Multiply (GEMM) operations.

Targeted primarily at students, researchers, and developers, LightNeuron offers a CNN inference framework that can process HDF5 model files, which makes it straightforward to run models trained in frameworks such as PyTorch and TensorFlow. Key features include:

Figure: LightNeuron CNN Inference Framework.

LightNeuron places a strong emphasis on optimizing GEMM operations. These optimizations yield substantial performance gains across a range of matrix dimensions, measured in GFLOPS (billions of floating-point operations per second). Key strategies employed in this optimization include:

Together, these enhancements substantially improve CPU computational efficiency for matrix multiplication.

Figure: Performance of matrix multiplication operations in LightNeuron, measured in GFLOPS.
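The specific optimization strategies are not listed in this section. Two techniques widely used in CPU GEMM kernels are loop reordering and cache blocking (tiling). The sketch below combines them and shows how a GFLOPS figure is conventionally derived from a timing: an M x K by K x N product performs 2MNK floating-point operations, one multiply and one add per inner-loop step. The tile size and the names `gemm_blocked` and `gemm_gflops` are assumptions for illustration, not LightNeuron's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 64  /* hypothetical tile size; tune to the target's cache sizes */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked GEMM, C = A * B (row-major; A: M x K, B: K x N, C: M x N).
 * Inside each tile the loops run in i-k-j order, so the innermost loop
 * streams contiguously through rows of B and C. */
void gemm_blocked(size_t M, size_t N, size_t K,
                  const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    memset(C, 0, M * N * sizeof *C);  /* all-zero bits == 0.0 in IEEE 754 */
    for (size_t ii = 0; ii < M; ii += TILE)
        for (size_t kk = 0; kk < K; kk += TILE)
            for (size_t jj = 0; jj < N; jj += TILE) {
                size_t i_end = min_sz(ii + TILE, M);
                size_t k_end = min_sz(kk + TILE, K);
                size_t j_end = min_sz(jj + TILE, N);
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * K + k];  /* reused across the whole j loop */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}

/* GFLOPS for one GEMM: 2*M*N*K operations divided by elapsed seconds, in units of 1e9. */
double gemm_gflops(size_t M, size_t N, size_t K, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)M * (double)N * (double)K / (seconds * 1e9);
}
```

For example, a 1000 x 1000 x 1000 multiplication that takes one second corresponds to 2 GFLOPS.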

Stanford CS149: Parallel Computing

Keywords: Parallel Computing, Performance Analysis, SIMD, OpenMP, CUDA Programming.

Stanford CS149 aims to build a fundamental understanding of the principles and trade-offs in modern parallel computing system design, along with the parallel programming skills needed to use these systems effectively. Because understanding machine performance characteristics is integral to the course, it covers both parallel hardware and software design.
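One of the programming models named in the keywords is OpenMP. As a small illustration (not code from the assignments), the function below parallelizes a dot product with an OpenMP `reduction` clause. `dot_omp` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product parallelized with an OpenMP reduction: each thread accumulates
 * a private partial sum, and OpenMP combines the partial sums after the loop.
 * Compile with -fopenmp (GCC/Clang); without it the pragma is ignored and
 * the loop simply runs serially. */
double dot_omp(const double *x, const double *y, long n)
{
    assert(x != NULL && y != NULL && n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because the partial sums are combined in an unspecified order, a multithreaded run can differ from a serial one in the last bits of a floating-point result.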

This repository contains my solutions to the programming assignments for CS149, which include:

Insight Mastodon: NLP Analysis with Spark

Keywords: Distributed Computing, Natural Language Processing, Apache Spark, Network Analysis, Big Data.

Insight Mastodon implements a robust data pipeline for analyzing Mastodon toots using Apache Spark (via PySpark) and Apache Hadoop. It is designed to run both locally and in the cloud (AWS Lambda), with Docker providing a consistent runtime environment.

Figure: Architecture of the Mastodon toot analysis pipeline.

Hands-on SIMD Programming with C++: A Practical Guide

Keywords: Performance Engineering, SIMD Programming, C++ Programming, Parallel Processing, Instruction Sets.

Hands-on SIMD Programming with C++ is a practical, step-by-step guide to SIMD (Single Instruction, Multiple Data) programming in C++, aimed at beginners. Its progressive, example-driven approach introduces the fundamental SIMD techniques and uses minimal code to cover a broad range of methods, giving newcomers to SIMD a gentle learning curve.
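To show the core idea, here is a minimal sketch of element-wise float addition using x86 SSE intrinsics. These intrinsics have the same C interface whether called from C or C++. The function name `add_f32` is hypothetical. The SSE path is compiled only when the compiler defines `__SSE__` (the default on x86-64), and a scalar loop covers the remainder and other targets.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* out[i] = a[i] + b[i]. With SSE, one instruction adds four floats at once;
 * the scalar loop handles the leftover elements (or everything, without SSE). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; ++i)                     /* scalar tail */
        out[i] = a[i] + b[i];
}
```

The unaligned load and store variants avoid any alignment requirement on the arrays. Aligned versions exist but need 16-byte-aligned pointers.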

LZ77 Compression: A C Implementation

Keywords: Data Compression, LZ77, Sliding Window, C Programming.

A C implementation of the LZ77 compression algorithm, featuring a sliding window and a look-ahead buffer.

Figure: LZ77 is a lossless data compression algorithm that encodes data by referencing earlier occurrences of the same data within a sliding window.
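The sliding window and look-ahead buffer can be sketched with the classic (offset, length, next) token form of LZ77. This is an illustration under stated assumptions, not this project's implementation: the window and look-ahead sizes, the token layout, and the names `lz77_encode`/`lz77_decode` are all hypothetical, and the match search is brute force.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 255    /* hypothetical search-window size */
#define LOOKAHEAD 15  /* hypothetical look-ahead buffer size */

typedef struct {
    unsigned short offset; /* distance back to the match start (0 = no match) */
    unsigned char length;  /* match length */
    unsigned char next;    /* literal byte that follows the match */
} lz77_token;

/* Encodes n bytes into tokens[] (room for n tokens needed); returns token count. */
size_t lz77_encode(const unsigned char *in, size_t n, lz77_token *tokens)
{
    assert(in != NULL || n == 0);
    size_t pos = 0, count = 0;
    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;
        size_t max_len = n - pos - 1;          /* keep one byte for 'next' */
        if (max_len > LOOKAHEAD) max_len = LOOKAHEAD;
        for (size_t cand = start; cand < pos; ++cand) {
            size_t len = 0;                    /* matches may overlap pos */
            while (len < max_len && in[cand + len] == in[pos + len])
                ++len;
            if (len > best_len) { best_len = len; best_off = pos - cand; }
        }
        tokens[count].offset = (unsigned short)best_off;
        tokens[count].length = (unsigned char)best_len;
        tokens[count].next = in[pos + best_len];
        ++count;
        pos += best_len + 1;
    }
    return count;
}

/* Decodes tokens into out[]; returns the number of bytes written. */
size_t lz77_decode(const lz77_token *tokens, size_t count, unsigned char *out)
{
    size_t pos = 0;
    for (size_t t = 0; t < count; ++t) {
        /* Byte-by-byte copy so overlapping matches replicate correctly. */
        for (size_t k = 0; k < tokens[t].length; ++k, ++pos)
            out[pos] = out[pos - tokens[t].offset];
        out[pos++] = tokens[t].next;
    }
    return pos;
}
```

Production encoders usually replace the linear window scan with hashing of recent positions and pack tokens into a compact bitstream.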

LeakCheck: A Memory Leak Detector (MLD) for C

Keywords: Memory Leak Detection, C Programming, Directed Cyclic Graphs, Software Reliability, Automated Analysis.

LeakCheck is a dedicated memory leak detection library for C applications. It automatically identifies memory leaks that would otherwise often go undetected, improving the performance and reliability of applications.

Key Features:

Figure: Visualization of memory allocation state using Directed Cyclic Graphs (DCG) in LeakCheck, depicting (a) an ideal scenario without any leaked objects, and (b) a scenario with detected memory leaks.
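The graph view in the figure suggests a reachability check. Allocations are nodes and pointers between them are edges. An object counts as leaked when no path reaches it from a root, such as a global or in-scope pointer. The sketch below implements only that marking step over a hypothetical `obj_t` registry. How LeakCheck actually discovers roots and pointer fields is not described here, so treat every name and limit as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 64  /* hypothetical registry capacity */
#define MAX_REFS 4   /* hypothetical max outgoing pointers per object */

/* Hypothetical record for one tracked allocation and the objects it points to. */
typedef struct {
    int refs[MAX_REFS];  /* indices of referenced objects (graph edges) */
    int nrefs;
    bool is_root;        /* referenced from a global or in-scope variable */
    bool marked;
} obj_t;

/* Depth-first marking; the 'marked' flag also terminates on cycles. */
static void mark(obj_t *objs, int i)
{
    if (objs[i].marked) return;
    objs[i].marked = true;
    for (int r = 0; r < objs[i].nrefs; ++r)
        mark(objs, objs[i].refs[r]);
}

/* Marks everything reachable from a root, then writes the indices of
 * unreachable (leaked) objects to leaked[]. Returns the number of leaks. */
int find_leaks(obj_t *objs, int n, int *leaked)
{
    assert(n >= 0 && n <= MAX_OBJS);
    for (int i = 0; i < n; ++i) objs[i].marked = false;
    for (int i = 0; i < n; ++i)
        if (objs[i].is_root) mark(objs, i);
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (!objs[i].marked) leaked[count++] = i;
    return count;
}
```

Reachability catches leaked cycles as well, which reference counting alone would miss.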

Eva: A Functional Programming Language

Keywords: Functional Programming, Language Design, Interpreter Workflow, Syntax Analysis, Lexical Scoping.

Eva is a modern functional programming language designed to make software development intuitive and expressive. It combines core functional programming concepts with the flexibility of object-oriented paradigms.

Figure: Interpreter workflow diagram, depicting the lexing, parsing, and interpretation phases that transform source code into executable output in the Eva language interpreter.
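The three phases in the figure can be sketched for a deliberately tiny language. Eva's actual syntax and features are not shown here, so this example assumes a hypothetical prefix, S-expression-style arithmetic such as `(+ 1 (* 2 3))`. All names (`next_token`, `parse_expr`, `eval`, `interp_run`) are illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* ---- Phase 1: lexing (text -> tokens) ---- */
typedef enum { T_LPAREN, T_RPAREN, T_NUM, T_OP, T_END } tok_kind;
typedef struct { tok_kind kind; long num; char op; } token;

static const char *src;  /* cursor into the source text */

static token next_token(void)
{
    token t = { T_END, 0, 0 };
    while (isspace((unsigned char)*src)) ++src;
    if (*src == '\0') return t;
    if (*src == '(') { ++src; t.kind = T_LPAREN; return t; }
    if (*src == ')') { ++src; t.kind = T_RPAREN; return t; }
    if (isdigit((unsigned char)*src)) {
        char *end;
        t.kind = T_NUM;
        t.num = strtol(src, &end, 10);
        src = end;
        return t;
    }
    t.kind = T_OP;  /* any other character is treated as an operator */
    t.op = *src++;
    return t;
}

/* ---- Phase 2: parsing (tokens -> AST) ---- */
typedef struct node {
    char op;  /* 0 marks a number literal */
    long num;
    struct node *lhs, *rhs;
} node;

static token cur;  /* one-token lookahead */
static void advance(void) { cur = next_token(); }

/* expr := NUM | '(' OP expr expr ')'; asserts stand in for error reporting. */
static node *parse_expr(void)
{
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    if (cur.kind == T_NUM) { n->num = cur.num; advance(); return n; }
    assert(cur.kind == T_LPAREN); advance();
    assert(cur.kind == T_OP); n->op = cur.op; advance();
    n->lhs = parse_expr();
    n->rhs = parse_expr();
    assert(cur.kind == T_RPAREN); advance();
    return n;
}

/* ---- Phase 3: interpretation (AST -> value) ---- */
static long eval(const node *n)
{
    if (n->op == 0) return n->num;
    long a = eval(n->lhs), b = eval(n->rhs);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default: assert(!"unknown operator"); return 0;
    }
}

static void free_node(node *n)
{
    if (n == NULL) return;
    free_node(n->lhs);
    free_node(n->rhs);
    free(n);
}

/* Runs lexing, parsing, and evaluation on one source string. */
long interp_run(const char *text)
{
    src = text;
    advance();
    node *ast = parse_expr();
    assert(cur.kind == T_END);
    long result = eval(ast);
    free_node(ast);
    return result;
}
```

With this sketch, `interp_run("(+ 1 (* 2 3))")` evaluates to 7. A real language adds environments for variables and lexical scoping to the evaluation phase.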

Features

Algo Playground: A Collection of DS/A Implementations in Python/Java

Keywords: Data Structures, Algorithms.

Algo Playground is a collection of algorithms and data structures implemented in Python and Java. It features data structures such as trees, graphs, stacks, and linked lists, as well as algorithms such as DFS and BFS. It serves as a playground for refining my skills in algorithmic design and data structure optimization.
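As an example of one of the listed algorithms, here is breadth-first search over an adjacency matrix, sketched in C. The repository's own versions are in Python and Java. The name `bfs` and the fixed `MAX_V` limit are assumptions for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_V 32  /* hypothetical maximum number of vertices */

/* Breadth-first search from src. Writes the hop distance to every vertex
 * into dist[] (-1 if unreachable). Each vertex is enqueued at most once. */
void bfs(int n, bool adj[][MAX_V], int src, int dist[])
{
    assert(n > 0 && n <= MAX_V && src >= 0 && src < n);
    int queue[MAX_V], head = 0, tail = 0;
    for (int v = 0; v < n; ++v) dist[v] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];            /* dequeue the oldest frontier vertex */
        for (int v = 0; v < n; ++v)
            if (adj[u][v] && dist[v] == -1) {
                dist[v] = dist[u] + 1;    /* first visit gives the shortest hop count */
                queue[tail++] = v;
            }
    }
}
```

Because vertices are dequeued in order of discovery, BFS visits them level by level. On an unweighted graph, the first distance assigned to a vertex is therefore its shortest-path length.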
