
llama-jni


Android JNI for the port of Facebook's LLaMA model in C/C++

Scenarios:

  • Input modules

  • Voice control modules

Table of Contents

  • Background
  • Install
  • Usage
  • Examples
  • Related Efforts
  • Maintainers
  • Contributing
  • License

Background

llama.cpp provides a pure C/C++ port of LLaMA and, through 4-bit quantization, runs the model on MacBook and Android devices.

To better support running large language models (LLMs) locally on mobile devices, llama-jni further encapsulates llama.cpp and exposes several common functions before the C/C++ code is compiled, so that Android applications can call them directly and use an LLM stored on the device.

Because llama-jni runs entirely on the device, it gives mobile applications AI capabilities without a network connection, which maximizes privacy and security.

The goals of llama-jni include:

  1. Refactoring the code of main.cpp in llama.cpp so that Android projects produce text output equivalent to running it from the system command line.
  2. Adding logging to the C/C++ code so that program behavior can be observed in the logs while debugging Android projects.
  3. Rewriting several CMakeLists.txt files so that the project compiles smoothly in Android Studio.
  4. Providing a typical project structure and usage examples for Native C++ projects in Android Studio, where running MainActivity.java shows the LLM's inference results in the logs.
  5. Supporting the same models, input parameters, and prompt mode options as llama.cpp.
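
To make the shape of this encapsulation concrete, the JNI entry points used in the Usage section below might be declared in MainActivity.java roughly as follows. This is a sketch only: the native library name and the exact signatures are assumptions inferred from the calls shown later in this README, not code copied from the repository.

// Sketch only: JNI declarations inferred from the Usage examples below.
// The library name "llama-jni" and all signatures are assumptions.
static {
    System.loadLibrary("llama-jni");
}

// Single complete return
private native long createIOLLModel(String modelPath, int nPredict);
private native String runIOLLModel(long modelPtr, String userPrompt);
private native void releaseIOLLModel(long modelPtr);

// Continuous stream printing
private native long createLLModel(String modelPath, int nPredict);
private native void initLLModel(long modelPtr, String promptPath, String userPrompt);
private native boolean whileLLModel(long modelPtr);           // true while generation continues
private native int[] embdLLModel(long modelPtr);              // tokens produced in this step
private native boolean printLLModel(long modelPtr);           // true when tokens should be printed
private native byte[] textLLModel(long modelPtr, int token);  // UTF-8 bytes of one token
private native boolean breakLLModel(long modelPtr);           // true when generation should stop
private native void releaseLLModel(long modelPtr);
private native String stringFromJNI();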

Install

The tool configuration of llama-jni is shown in the module-level build.gradle below. The build requires NDK and CMake support, so please make sure both are installed locally.

plugins {
    id 'com.android.application'
}

android {
    namespace 'com.sx.llama.jni'
    compileSdk 33

    defaultConfig {
        applicationId "com.sx.llama.jni"
        minSdk 24
        targetSdk 33
        versionCode 1
        versionName "1.0"

        testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
        }
    }
    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    externalNativeBuild {
        cmake {
            path file('src/main/cpp/CMakeLists.txt')
            version '3.22.1'
        }
    }
    buildFeatures {
        viewBinding true
    }
}

dependencies {
    implementation 'androidx.appcompat:appcompat:1.6.1'
    implementation 'com.google.android.material:material:1.8.0'
    implementation 'androidx.constraintlayout:constraintlayout:2.1.4'
    testImplementation 'junit:junit:4.13.2'
    androidTestImplementation 'androidx.test.ext:junit:1.1.5'
    androidTestImplementation 'androidx.test.espresso:espresso-core:3.5.1'
}
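
Note that the externalNativeBuild block points Gradle at the rewritten CMakeLists.txt (goal 3 in Background), so the C/C++ sources under src/main/cpp are compiled by CMake 3.22.1 as part of the normal Android Studio build.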

Usage

Preparations

llama-jni does not include a language model. Please prepare the LLM yourself; it must be supported by the specific version of llama.cpp that this project builds on.

In the application-specific folder on the Android device's external storage, store the required LLM (e.g. GPT4All) and prompt text file (e.g. chat-with-bob.txt). Assuming their paths are:

/storage/emulated/0/Android/data/com.sx.llama.jni/ggml-vic7b-q5_0.bin
/storage/emulated/0/Android/data/com.sx.llama.jni/chat-with-bob.txt
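
These paths are inside the application-specific directory for the applicationId com.sx.llama.jni, which is the directory returned by getExternalFilesDir(null).getParent() in the code below. The files can be copied there with Android Studio's Device File Explorer or adb push.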

Then the following two fields in MainActivity.java need to match those file names:

private final String modelName = "ggml-vic7b-q5_0.bin";
private final String txtName = "chat-with-bob.txt";
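
As a sketch only (this check is not part of the repository's code), the presence of both files can be verified before calling into native code. modelName and txtName are the fields above, the directory matches the one used by the JNI calls below, and java.io.File and android.util.Log are assumed to be imported.

// Sketch only: verify the model and prompt files exist before inference.
File baseDir = new File(getExternalFilesDir(null).getParent());
File modelFile = new File(baseDir, modelName);
File promptFile = new File(baseDir, txtName);
if (!modelFile.exists() || !promptFile.exists()) {
    Log.e("llama-jni", "Missing model or prompt file in " + baseDir);
}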

Once the files are in place, call the llamaInteractive function in MainActivity.java and set the second parameter to the user's dialog input; this completes the preparation for LLM inference.

llamaInteractive(tv, "Please tell me the largest city in China.");

Run

Select an AVD in Android Studio and click the Run icon to execute llama-jni against the local model file.

Arguments

llama-jni provides two refactored versions of main.cpp: single complete return and continuous stream printing within the Android project.

For each mode, MainActivity.java shows a typical call and its encapsulation:

  • Single complete return
// call
llamaIOPrompt(tv, "Please tell me the largest city in China.");

// encapsulation
private void llamaIOPrompt(TextView tv, String userPrompt) {
    // Load the model from external storage; 256 is the number of tokens to predict.
    modelPtr = createIOLLModel(String.format("%s/%s", getExternalFilesDir(null).getParent(), modelName), 256);
    // Run a single blocking inference pass that returns the complete output.
    String output = runIOLLModel(modelPtr, userPrompt);
    tv.setText(output);
    // Release the native model resources.
    releaseIOLLModel(modelPtr);
}

The equivalent llama.cpp command for this mode is

./main -m "/storage/emulated/0/Android/data/com.sx.llama.jni/ggml-vic7b-q5_0.bin" -p "Please tell me the largest city in China." -n 256
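
Here, -m sets the model path, -p the prompt, and -n the number of tokens to predict (256, matching the second argument passed to createIOLLModel).
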
  • Continuous stream printing
// call
llamaInteractive(tv, "Please tell me the largest city in China.");

// encapsulation
private void llamaInteractive(TextView tv, String userPrompt) {
    // Load the model; 256 is the number of tokens to predict.
    modelPtr = createLLModel(String.format("%s/%s", getExternalFilesDir(null).getParent(), modelName), 256);
    // Initialize the session with the prompt file and the user's input.
    initLLModel(modelPtr, String.format("%s/%s", getExternalFilesDir(null).getParent(), txtName), userPrompt);
    // Generation loop: each iteration yields the tokens produced in that step.
    while (whileLLModel(modelPtr)) {
        int[] tokenList = embdLLModel(modelPtr);
        if (printLLModel(modelPtr)) {
            // Decode and print each token as UTF-8 text.
            for (int t : tokenList) {
                System.out.println(new String(textLLModel(modelPtr, t), StandardCharsets.UTF_8));
            }
        }
        // Stop when the model signals that generation has finished.
        if (breakLLModel(modelPtr)) {
            System.out.println("break");
            break;
        }
    }
    tv.setText(stringFromJNI());
    // Release the native model resources.
    releaseLLModel(modelPtr);
}
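
In this mode, whileLLModel drives the generation loop, embdLLModel returns the tokens produced in the current step, and textLLModel converts each token to UTF-8 bytes, so the output can be rendered token by token instead of waiting for the full completion.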

The equivalent llama.cpp command for this mode (some parameters are not exposed to MainActivity.java) is

./main -m "/storage/emulated/0/Android/data/com.sx.llama.jni/ggml-vic7b-q5_0.bin" -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f "/storage/emulated/0/Android/data/com.sx.llama.jni/chat-with-bob.txt"
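
Here, -i enables interactive mode, -r "User:" sets the reverse prompt that hands control back to the user, -f reads the initial prompt from the given file, and --repeat_penalty 1.0 effectively disables the repetition penalty.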

Examples

Single Complete Return

After running successfully, llama-jni displays the complete inference result of the LLM on the emulator screen, based on the following code segment in MainActivity.java.

tv.setText(output);

Continuous Stream Printing

After running successfully, llama-jni continuously prints each token of the large language model's inference result in the Logcat panel of Android Studio, based on the following code segment in MainActivity.java.

for (int t : tokenList) {
    System.out.println(new String(textLLModel(modelPtr, t), StandardCharsets.UTF_8));
}

Demo video: streaming.mp4

Related Efforts

  • LLaMA — Inference code for LLaMA models.
  • llama.cpp — Port of Facebook's LLaMA model in C/C++.

Maintainers

@shixiangcap

Contributing

Feel free to dive in! Open an issue or submit PRs.

Contributors

This project exists thanks to all the people who contribute.

License

MIT © shixiangcap
