Trainer Node Guide
How to train the model defined by the Task Creator/Admin
This guide provides step-by-step instructions for setting up SoraEngine's trainer node and automating the training process. By the end, you will have trained a Hugging Face model in a privacy-preserving fashion and contributed to a global model that is ready for inference.
1. Connect Wallet with SoraChain Dashboard
Connect your MetaMask wallet and log in to the dashboard.
You will see the API keys for the authenticated user. You will pass these API keys as arguments to the client automation module.
2. Set up client repository
Before proceeding, make sure you have cloned the client repository described in the "AI Layer Repo" section.
a. Preprocess Data
Several predefined scripts are available for preprocessing datasets to train different models.
In the dev environment, we use a micro version of nano-mistral, which is a text-completion model. Use preprocess_nanoArticles.py to convert the first 1000 lines of the dataset into the required format.
The dataset being processed comes from Hugging Face.
# Assumes you are in the cloned repository directory
python ./src/preprocess_nanoArticles.py --output_dir ./data/Output
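This guide does not show the internals of the preprocessing script. As a rough illustration of the "first 1000 lines" step, the sketch below assumes each record is a plain text string and that the output is a JSON Lines file. The function name write_first_n, the file name train.jsonl, and the {"text": ...} layout are hypothetical and may not match what preprocess_nanoArticles.py actually produces.

```python
import itertools
import json
from pathlib import Path


def write_first_n(records, output_dir, n=1000, filename="train.jsonl"):
    """Write the first n text records to output_dir/filename as JSON Lines.

    Hypothetical helper: the real output format is defined by
    preprocess_nanoArticles.py, not by this sketch.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create ./data/Output if missing
    out_path = out_dir / filename
    count = 0
    with out_path.open("w", encoding="utf-8") as fh:
        # islice stops after n items without loading the whole dataset
        for text in itertools.islice(records, n):
            fh.write(json.dumps({"text": text}) + "\n")
            count += 1
    return out_path, count
```

Because the function accepts any iterable, it works the same way with a list, a generator, or a streamed dataset, and it never reads more than n records.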
3. Start Trainer node Automation
We have developed a streamlined automation script that handles all necessary configurations and initiates the trainer node, enabling it to connect with the aggregator and request tasks. The script requires the following parameters as arguments:
- Client ID 
- Model name or path 
- Data path (location of the dataset) 
- Workspace directory (where configuration files will be retrieved) 
- Training mode (defined by the task creator) – SoraEngine supports standard SFT training, as well as efficient LoRA PEFT training and quantization 
- TrainingServer (the aggregator node endpoint to connect to) 
- SoraAccess keys (for authentication) 
- SoraBucketName (specifies the directory from which the client’s configuration files will be fetched) 
This automation simplifies the process of setting up and running trainer nodes within the SoraEngine ecosystem. 🚀
python AutomateClient.py --client_id Client1 --model_name_or_path crumb/nano-mistral \
 --data_path ${PWD}/data/Output --workspace_dir ${PWD}/workspace/SoraWorkspace/example_project/prod_00/Client1 --training_server localhost \
 --train_mode PEFT --SORA_ACCESS_KEY_ID ***** --SORA_SECRET_ACCESS_KEY ******* --SORA_BUCKET_NAME sorachaintestnode
In the dev/test environment, automation can be enabled without defining access keys. By default, configuration files are stored in the workspace/SoraWorkspace directory.
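If you launch the trainer node from another script, the command can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the client repository. It uses only the flags from the command shown in this guide and leaves out the access-key flags when no keys are supplied, as in the dev/test case. Reading the keys from environment variables named SORA_ACCESS_KEY_ID and SORA_SECRET_ACCESS_KEY is an assumption made here to keep secrets out of shell history. Whether the bucket flag is still needed without keys is not stated in this guide, so it is treated as optional.

```python
import os
import shlex


def build_trainer_command(client_id, model_name_or_path, data_path,
                          workspace_dir, training_server, train_mode,
                          access_key=None, secret_key=None, bucket=None):
    """Return the AutomateClient.py invocation as an argument list."""
    cmd = [
        "python", "AutomateClient.py",
        "--client_id", client_id,
        "--model_name_or_path", model_name_or_path,
        "--data_path", data_path,
        "--workspace_dir", workspace_dir,
        "--training_server", training_server,
        "--train_mode", train_mode,
    ]
    # Access keys are only added when both are present (dev/test may omit them).
    if access_key and secret_key:
        cmd += ["--SORA_ACCESS_KEY_ID", access_key,
                "--SORA_SECRET_ACCESS_KEY", secret_key]
    if bucket:
        cmd += ["--SORA_BUCKET_NAME", bucket]
    return cmd


if __name__ == "__main__":
    cwd = os.getcwd()
    cmd = build_trainer_command(
        client_id="Client1",
        model_name_or_path="crumb/nano-mistral",
        data_path=os.path.join(cwd, "data", "Output"),
        workspace_dir=os.path.join(
            cwd, "workspace", "SoraWorkspace", "example_project", "prod_00", "Client1"),
        training_server="localhost",
        train_mode="PEFT",
        # Assumed environment variable names; unset variables yield None.
        access_key=os.environ.get("SORA_ACCESS_KEY_ID"),
        secret_key=os.environ.get("SORA_SECRET_ACCESS_KEY"),
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

To actually start the node, the returned list can be passed to subprocess.run(cmd, check=True) from the cloned repository directory.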

