Introduction
Learn how to preview changes locally
FuncMaster: A Step Towards On-Device Function Calling
This model is designed to bring the power of function calling language models directly to your devices. FuncMaster represents our first attempt into creating highly accessible, efficient, and capable AI models that can operate independently of cloud computing resources.
Overview
FuncMaster-v0.1-Mistral-7B is the inaugural release of our function calling language model (LLM). It is engineered to run locally on computers and iPhones, marking a significant step towards achieving efficient on-device AI. This model is capable of interpreting commands and executing function calls directly from your device, offering a new level of interactivity and utility in AI applications. We fine-tuned this model on top of teknium/OpenHermes-2.5-Mistral-7B due to it’s performance.
Key Features
- Local Execution: Runs directly on your device, eliminating the need for constant cloud connectivity.
- Function Calling Capability: Can interpret and execute function calls, such as checking stock prices or weather conditions.
- Compatibility: Designed to operate on various devices, including PCs and iPhones.
- Speed: Achieves ~9 tokens/second on an iPhone 15 Pro, with a notable performance of ~24 tokens/second on a Mac M1 (Q_4_K_M).
Current Limitations
FuncMaster-v0.1-Mistral-7B is not yet production-ready. Its accuracy and functionality are areas of active development. Current limitations include:
- Accuracy: While promising, the model’s accuracy in executing function calls and delivering reliable information is still under refinement.
- Performance Variability: Speed and efficiency can vary depending on the device and the complexity of the function call.
Performance Benchmarks
- iPhone 15 Pro: Approximately 9 tokens/second.
- Mac M1 (16GB RAM):
- Time to first token: 0.23 seconds.
- Speed: ~24 tokens/second.
- Note: Performance metrics are based on a single function in the system prompt and are subject to change with the complexity of tasks.
Downloads
FuncMaster is available for download and experimentation. Below are links to the model and the dataset used for its training:
Chat Model - Original: FuncMaster-v0.1-Mistral-7B - GGUF: FuncMaster-v0.1-Mistral-7B-GGUF - Lora: FuncMaster-v0.1-Mistral-7B-Lora - Fine-Tuning Notebook: FuncMaster-v0.1-Mistral-7B Google Colab - Dataset: lilacai/glaive-function-calling-v2-sharegpt
Instruct Model - Original: FuncMaster-v0.1-Mistral-7B-Instruct - GGUF: FuncMaster-v0.1-Mistral-7B-Instruct-GGUF - Lora: FuncMaster-v0.1-Mistral-7B-Instruct-Lora - Fine-Tuning Notebook: FuncMaster-v0.1-Mistral-7B-Instruct Google Colab - Dataset: allyson-ai/instruct-function-calling
Repo: https://github.com/Allyson-AI/FuncMaster
How To Run The Model
RNExample (Run LLMs Locally on iPhone)
The RNExample
folder contains a React Native app designed to demonstrate the capabilities of FuncMaster. To get it up and running, follow these steps:
Setup
- Clone the Repo: Ensure you have the FuncMaster repository cloned to your local machine.
- Navigate to RNExample: Change directory into the
RNExample
- Install Dependencies: Run the following commands to install the necessary npm packages and Cocoapods dependencies.
- Open Xcode: To run the app on an iOS device, you’ll need to open the project in Xcode and sign the application with your developer account. This step is crucial for deploying the app to your device.
Running the App
- Download the Model: Ensure you download the desired model version from Hugging Face, specifically the
GGUF
variant. - Configure the App: Depending on the model you’re using (Instruct vs. Chat), adjust the
instruct
variable inApp.jsx
.- For Instruct: Set
instruct
totrue
. - For Chat: Ensure
instruct
is set tofalse
.
- For Instruct: Set
- Launch the App: Use Xcode to build and run the app on your device. To test the model, tap the file button within the app interface.
Model and Accuracy
- The app currently supports two query types:
Q_2
andQ_4_K_M
. For better accuracy, it’s recommended to useQ_4_K_M
, although it’s still under improvement.
Python Scripts and LM Studio
The repository also includes infer.py
, a script for running inference through the LM Studio server.
LM Studio Server
- LM Studio Presets: Load the presets from the
LM Studio
folder corresponding to your model version. The chat model preset is recommended for best performance. - Start the server: Start the server on LM Studio.
- Running the Script: Execute
infer.py
to send a request to the LM Studio server. This script is set up for testing purposes, such as retrieving the stock price of AMZN using theyahoo_fin
package for the function callget_stock_price
.
Example Response:
LM Studio Presets
The presets provided in the LM Studio
folder are designed to simplify the setup process for different model versions. Ensure you select the appropriate preset for your testing scenario.
Future Directions
Our vision for FuncMaster extends far beyond its current capabilities. We are committed to enhancing its accuracy, expanding its functionality, and ensuring it becomes a tool that can seamlessly integrate into everyday tasks. Our roadmap includes:
- Dataset Expansion: Incorporating more diverse, complex datasets focusing on multi-step and multi-turn conversations, along with planning and reasoning datasets.
- Accuracy Improvement: Continuous refinement of the model to improve its reliability and accuracy in function calling.
- Performance Optimization: Ensuring the model remains efficient and effective across a wider range of devices.
Current Limitations
While FuncMaster-v0.1-Mistral-7B marks a significant milestone in our journey towards creating efficient, on-device function-calling language models, it’s important to recognize the areas where the model currently falls short. Our commitment to transparency means we want users and contributors to be fully aware of these limitations as we work together towards improvements.
-
Accuracy and Reliability: The model’s ability to accurately interpret and execute function calls is still evolving. Users may encounter inaccuracies or unexpected responses, particularly with complex queries or less common functions. The model’s current state reflects our initial steps in understanding and developing more sophisticated AI interactions. When using the instruct version the message doesn’t relate to the function call which will be fixed in future versions.
-
Device Performance Variability: While designed to run on a variety of devices, including PCs and iPhones, performance can significantly vary. Factors such as device hardware, available memory, and the complexity of the requested function call can affect execution speed and model responsiveness. For example, while testing on an iPhone yields around 9 tokens per second, this performance is contingent upon the specific task and device conditions.
-
Context and Prompt Size Sensitivity: The model’s performance, particularly in terms of speed and accuracy, is sensitive to the size of the system prompt and the context provided for function calls. Larger contexts or more complex prompts can lead to slower response times and can sometimes challenge the model’s ability to provide accurate and relevant results.
-
Energy Consumption: Running advanced AI models on devices, especially those with limited computing resources like smartphones, can lead to significant energy consumption. Users may notice reduced battery life during extended use, which is an important consideration for mobile and portable applications.
-
Developmental Stage: It’s crucial to understand that FuncMaster-v0.1-Mistral-7B is in an early developmental stage. As such, the model is not yet suitable for production environments or critical applications where reliability is paramount. We are actively working on enhancing the model’s robustness and reliability for future releases.
Looking Ahead
Acknowledging these limitations is not only part of our process of continuous improvement but also an invitation to the community to join us in this development journey. We are exploring various avenues to address these challenges, including optimizing model architecture, expanding training datasets with a wider array of function calls, and refining our training methodologies to enhance model performance across all devices.
We value your feedback and contributions as we work towards creating a more accurate, efficient, and universally accessible model. Together, we can overcome these limitations and achieve our goal of bringing powerful, on-device AI to everyone, everywhere.