Nodes
An LLMOS node can be either a virtual or a physical machine, depending on your cluster. The first node in the cluster is designated as the cluster-init node by default. Additional nodes can be configured as either server nodes or worker nodes.
Machine Learning Clusters
A Machine Learning (ML) Cluster provides a distributed computing environment for running machine learning workloads. Built on top of Ray, a unified framework for scaling AI and Python applications, it offers a distributed runtime, parallel processing, and a suite of AI libraries to accelerate your machine learning tasks.
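As a minimal sketch of the Ray programming model such a cluster exposes, the example below parallelizes a trivial task with `@ray.remote`. The cluster address shown in the comment is a placeholder, not an LLMOS-specific value.

```python
import ray

# Connect to Ray. With no arguments, Ray starts a local instance; against an
# ML Cluster you would typically pass the cluster's address instead, e.g.
# ray.init(address="ray://<head-node>:10001")  # placeholder address
ray.init()

@ray.remote
def square(x: int) -> int:
    """A trivial task that Ray schedules across the cluster's workers."""
    return x * x

# Launch tasks in parallel and gather the results.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```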
Notebooks
Notebooks provide lightweight, web-based development environments (JupyterLab, RStudio, and VS Code) that run inside your LLMOS cluster, where you can write interactive code and carry out data analysis and machine learning tasks.
Model Service
The LLMOS platform makes it easy to serve machine learning models through the ModelService resource. It provides a simple way to set up and manage model serving, powered by the vLLM engine. You can configure the model name, Hugging Face settings, resource requirements, and more to deploy models efficiently and at scale.
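Because serving is powered by vLLM, a deployed model can typically be reached through vLLM's OpenAI-compatible API. The sketch below queries such an endpoint with the openai Python client; the endpoint address, API key, and model name are placeholders to replace with whatever your ModelService actually exposes.

```python
from openai import OpenAI

# Placeholder endpoint and credentials: substitute the address and key that
# your ModelService exposes (vLLM serves an OpenAI-compatible API).
client = OpenAI(
    base_url="http://<model-service-address>/v1",  # hypothetical address
    api_key="placeholder-key",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # example Hugging Face model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```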
GPU Management
Monitoring & Alerting
Storage
Advanced