A case study of designing and fine-tuning the Llama 3 model to support function calling
By Michael Hu · 20 min read

This post discusses how we designed and fine-tuned the Llama 3 model to support function calling, a capability with immense potential for adapting LLMs to real-world business cases. We also provide Llama3-FunctionCalling, an open-source project that anyone can access.
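For readers new to the idea, here is a minimal sketch of what function calling looks like at the message level. The tool schema and the output format shown are illustrative assumptions, not the project's actual prompt format:

```python
import json

# A hypothetical tool definition the host application advertises to the model.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# A model fine-tuned for function calling answers "What's the weather in Paris?"
# with a structured call instead of free-form text, which the host app executes:
model_output = '{"function": "get_weather", "arguments": {"city": "Paris"}}'
call = json.loads(model_output)
assert call["function"] == "get_weather"
print(call["arguments"])  # {'city': 'Paris'}
```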
A simple and clean implementation of Retrieval Augmented Generation (RAG) with open-source embedding models and LLaMA
By Michael Hu · 15 min read

In this post, we delve into the workings of Retrieval Augmented Generation (RAG) and introduce RAG-LLaMA, an open-source project that showcases a straightforward RAG implementation built on open-source embedding models and the LLaMA chat model. We build a simple chatbot that answers questions about Tesla cars, demonstrating RAG's potential for a variety of private knowledge-base tasks.
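At its core, the RAG loop is: embed the knowledge base once, retrieve the passages most similar to the question, and prepend them to the chat prompt. A minimal sketch follows; the embedding model name and the `llama_chat` call are illustrative assumptions, not the exact code from RAG-LLaMA:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # open-source embedder

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

# 1. Index: embed the private knowledge base once (toy Tesla manual snippets).
docs = [
    "To open the charge port, press the button on the charge cable handle.",
    "Sentry Mode records activity around the vehicle while it is parked.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

# 2. Retrieve: embed the question and take the most similar passages.
question = "How do I open the charge port?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
top_k = np.argsort(doc_vecs @ q_vec)[::-1][:1]  # cosine similarity, top-1

# 3. Generate: prepend the retrieved context to the prompt for the chat model.
context = "\n".join(docs[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = llama_chat(prompt)  # hypothetical call to the LLaMA chat model
```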
A clean PyTorch implementation of Direct Preference Optimization (DPO) to fine-tune LLaMA models
By Michael Hu · 2 min read

Introducing DPO-LLaMA. This open-source project features a clean PyTorch implementation of Direct Preference Optimization (DPO) for fine-tuning LLaMA models to follow human preferences.
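The appeal of DPO is that the whole objective fits in a few lines: no reward model, no RL loop, just a classification-style loss over preference pairs. A minimal sketch of the loss (argument names are mine, not DPO-LLaMA's):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is the summed log-probability of a whole response under
    the trainable policy or the frozen reference model.
    """
    # Implicit rewards are the policy-vs-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the preferred response's implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```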
4-bit Quantized Low-Rank Adaptation (LoRA) LLM fine-tuning decoupled from Hugging Face
By Michael Hu · 10 min read

In this post, we discuss quantized LoRA and introduce QLoRA-LLM, my new open-source project. The project showcases a straightforward custom implementation of Quantized LoRA (QLoRA) for fine-tuning a large language model (LLM), using fundamental tools like PyTorch and bitsandbytes, independent of Hugging Face. Such a tailored implementation of QLoRA proves highly beneficial for fine-tuning a personalized LLM.
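The heart of LoRA is a frozen base weight plus a trainable low-rank update; QLoRA additionally stores the frozen base in 4-bit. The sketch below keeps the base as a plain frozen `nn.Linear` so it runs anywhere; in the real setup that layer would be a bitsandbytes 4-bit quantized linear. This is an illustrative sketch, not QLoRA-LLM's actual code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W0 x + (B A) x * (alpha / r), with W0 frozen; only A, B train."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen (4-bit in real QLoRA)
        self.lora_a = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, r))  # zero init:
        self.scale = alpha / r                  # the update starts as a no-op

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Because only `lora_a` and `lora_b` receive gradients, optimizer state and memory use shrink dramatically, which is what makes single-GPU fine-tuning of 7B-class models practical.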
A PyTorch implementation of OpenAI's InstructGPT paper to train and fine-tune the LLaMA model to align with human preferences, with support for k-bit quantization and Low-Rank Adaptation (LoRA) fine-tuning
By Michael Hu · 10 min read

In this post, we discuss how reinforcement learning from human feedback (RLHF) works, and we introduce InstructLLaMA. This open-source project showcases a PyTorch implementation of OpenAI's InstructGPT paper, completely independent of third-party tools like Hugging Face, with Meta's pre-trained LLaMA model substituted for GPT. InstructLLaMA covers every stage of the pipeline, including dataset preparation, pre-training, supervised fine-tuning (SFT), and RLHF, to train and fine-tune the LLaMA2 model so that it adheres to human instructions, akin to InstructGPT or ChatGPT but on a smaller scale.
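To make the middle of the pipeline concrete: the reward model in the InstructGPT recipe is trained with a pairwise ranking loss over human preference data, and its scalar scores later drive the PPO stage. A minimal sketch of that loss (names are mine, not InstructLLaMA's):

```python
import torch.nn.functional as F

def reward_model_loss(chosen_rewards, rejected_rewards):
    """Pairwise ranking loss from the InstructGPT paper.

    chosen_rewards / rejected_rewards: scalar scores the reward model
    assigns to the human-preferred and rejected completions for the
    same prompt.
    """
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```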
A PyTorch implementation of pre-training and fine-tuning scripts to train and fine-tune GPT-2 models
By Michael Hu · 2 min read

Introducing MiniGPT. This open-source project features a PyTorch implementation of OpenAI's GPT-2 model. It supports dataset preparation, pre-training, fine-tuning, and distributed training with PyTorch FSDP.
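As a rough illustration of the distributed-training piece, wrapping a model in PyTorch FSDP shards its parameters across ranks so each GPU holds only a slice of the weights. A minimal sketch using a toy stand-in module (not MiniGPT's actual training script); run it under `torchrun`:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launch with e.g.: torchrun --nproc_per_node=2 train.py
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Toy stand-in for a GPT-2 block; FSDP shards its parameters across ranks
# and gathers full weights only for the duration of forward/backward.
model = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
model = FSDP(model.cuda())

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
x = torch.randn(8, 768, device="cuda")
loss = model(x).pow(2).mean()  # dummy loss, just to exercise a step
loss.backward()
optimizer.step()
```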
A PyTorch implementation of DeepMind's MuZero agent, which performs planning with a learned model
By Michael Hu · 2 min read

Introducing MuZero. This open-source project features a PyTorch implementation of DeepMind's MuZero agent. The agent supports turn-based, two-player, zero-sum games, as well as single-player games like Atari games.
A PyTorch implementation of DeepMind's AlphaZero agent to play two-player, zero-sum strategy board games like Go and Gomoku
By Michael Hu · 2 min read

Introducing AlphaZero. This open-source project features a PyTorch implementation of DeepMind's famous AlphaZero agent, which plays Go and free-style Gomoku board games.
A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar
By Michael Hu · 2 min read

Introducing Deep-RL-Zoo. This open-source project features a collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar. It includes state-of-the-art algorithms such as DQN, Rainbow, PPO, RND, R2D2, and Agent57.