Setup: LangChain¶
In this RAG tutorial, we'll be working with LangChain, a framework for building applications with language models. LangChain provides utilities for working with various language model providers, integrating embeddings, and composing chains for more complex applications. Below are the imports needed for this notebook:
import os
import random
import glob
import numpy as np
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.memory import ConversationSummaryMemory
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from chromadb.utils import embedding_functions
from langchain_community.embeddings import HuggingFaceEmbeddings
import warnings
warnings.filterwarnings('ignore')
Part 1: Retrieval¶
- In this section, we'll focus on the retrieval aspect of RAG. We'll start by understanding vectorization, followed by storing and retrieving vectors efficiently.
Vectorizing¶
- Vectorization is the process of converting text into vectors in an embedding space. These vectors capture the semantic meaning of the text, enabling operations such as similarity calculations. We'll use HuggingFaceEmbeddings for this task; see the LangChain documentation for this object for more details.
vectorizer = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
This vectorizer converts text into vectors in the embedding space. Let's see how to use it.
vectorizer.embed_query("dog")[0:10]
[-0.05314699560403824, 0.014194400049746037, 0.007145748008042574, 0.06860868632793427, -0.07848034799098969, 0.01016747672110796, 0.10228313505649567, -0.01206485740840435, 0.09521342068910599, -0.030350159853696823]
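Each embedding is a plain Python list of floats. As a quick sanity check, here is a minimal sketch inspecting the vector (the all-MiniLM-L6-v2 model should produce 384-dimensional embeddings):
# Minimal sketch: inspect the embedding returned by embed_query.
# all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings.
vec = vectorizer.embed_query("dog")
print(type(vec), len(vec))   # expected: <class 'list'> 384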
def similarity_two_queries(word1, word2):
    # HINT:
    # Use vectorizer.embed_query(<text>) to embed text.
    # Use np.dot to find the cosine similarity/dot product of 2 vectors
    # TODO
    return None
# SOLUTION
def similarity_two_queries(word1, word2):
    word1_vec = vectorizer.embed_query(word1)
    word2_vec = vectorizer.embed_query(word2)
    return np.dot(word1_vec, word2_vec)
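A note on why the dot product works here: the all-MiniLM-L6-v2 sentence-transformers model normalizes its embeddings, so they should be (approximately) unit length and the dot product coincides with cosine similarity. A minimal sketch to verify that assumption:
# Minimal sketch: embeddings from this model should be (near) unit-norm,
# so np.dot matches the explicit cosine similarity.
v1 = np.array(vectorizer.embed_query("kitten"))
v2 = np.array(vectorizer.embed_query("cat"))
print(np.linalg.norm(v1), np.linalg.norm(v2))                      # both expected ~1.0
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))  # should match similarity_two_queries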
- Observe the similarity scores of both 'cat' and 'dog' to the word 'kitten'
print("Similarity of 'kitten' and 'cat': ",similarity_two_queries("kitten","cat"))
print("Similarity of 'kitten' and 'dog': ",similarity_two_queries("kitten","dog"))
Similarity of 'kitten' and 'cat':  0.7882107650783932
Similarity of 'kitten' and 'dog':  0.5205050628917984
- By using the previously defined function, we can take pairs of texts and quantify how similar they are.
Task 2¶
Which of the words in the list words are most related to the word 'color'? The function similarity_list takes a word and a list of words, and outputs each word paired with its similarity score to that word, sorted from highest to lowest.
def similarity_list(word, words):
    similarity_list = [(i, similarity_two_queries(word, i)) for i in words]
    sorted_similarity_list = sorted(similarity_list, key=lambda x: x[1], reverse=True)
    return sorted_similarity_list
words = ["rainbow","car","black","red","cat","tree"]
# TODO: Which words are most similar to color?
# SOLUTION
similarity_list("color",words)
[('black', 0.7855720240216937),
('red', 0.7491879885567825),
('rainbow', 0.5601087575383563),
('car', 0.40406271589984705),
('cat', 0.35839049605789064),
('tree', 0.35735516274371915)]
Task 3¶
Each query below has a corresponding text that answers it. The function match_queries_with_texts matches each query with its most related text. Come up with 3 more questions and 3 suitable answers and add them to the lists below.
def match_queries_with_texts(queries, texts):
    # Calculate similarities between each query and text
    similarities = np.zeros((len(queries), len(texts)))
    for i, query in enumerate(queries):
        for j, text in enumerate(texts):
            similarities[i, j] = similarity_two_queries(query, text)

    # Match each query to the text with the highest similarity
    matches = {}
    for i, query in enumerate(queries):
        best_match_idx = np.argmax(similarities[i])
        matches[query] = texts[best_match_idx]

    return matches
# TODO: Fill in the list to make suitable question-text pairs.
queries = ["What are the 7 colors of the rainbow?",
"What does Elsie do for work?",
"Which country has the largest population?",
"-- INSERT QUERY 1 HERE--",
"-- INSERT QUERY 2 HERE--",
"-- INSERT QUERY 3 HERE--"]
texts = ["China has 1.4 billion people.",
"Elsie works the register at Arby's.",
"The colors of the rainbow are ROYGBIV.",
"-- INSERT TEXT 1 HERE--",
"-- INSERT TEXT 2 HERE--",
"-- INSERT TEXT 3 HERE--"]
#SOLUTION
queries = ["What are the 7 colors of the rainbow?",
"What does Elsie do for work?",
"Which country has the largest population?",
"What time is it?",
"What is the largest continent?",
"Who is the greatest Football player?"]
texts = ["China has 1.4 billion people.",
"Elsie works the register at Arby's.",
"The colors of the rainbow are ROYGBIV.",
"The time is 3:14.",
"The largest continent is Asia.",
"Christiano Ronaldo"]
- Now we shuffle the queries and texts. Let's see if we can match them!
import random
random.shuffle(queries)
random.shuffle(texts)
match_queries_with_texts(queries, texts)
{'Who is the greatest Football player?': 'Cristiano Ronaldo',
'What are the 7 colors of the rainbow?': 'The colors of the rainbow are ROYGBIV.',
'What does Elsie do for work?': "Elsie works the register at Arby's.",
'Which country has the largest population?': 'China has 1.4 billion people.',
'What is the largest continent?': 'The largest continent is Asia.',
'What time is it?': 'The time is 3:14.'}
Vector Store¶
- Now let's look at how we can store these vectors for efficient retrieval. There are many storage options, but in this exercise we use ChromaDB, an open-source vector database. Through LangChain, we can expose the database as a LangChain *retriever* object, which lets us run queries much like we did before.
- Taking the texts and queries that you defined before, we can load them into ChromaDB and perform the same operations.
ids = list(range(len(texts)))
random_id = random.randint(100000, 999999)
db = Chroma.from_texts(texts, vectorizer, metadatas=[{"id": id} for id in ids],collection_name=f"temp_{random_id}")
retriever = db.as_retriever(search_kwargs={"k": 1})
texts
['The time is 3:14.', "Elsie works the register at Arby's.", 'Christiano Ronaldo', 'China has 1.4 billion people.', 'The colors of the rainbow are ROYGBIV.', 'The largest continent is Asia.']
retriever.invoke("Which country has the largest population?")
[Document(metadata={'id': 3}, page_content='China has 1.4 billion people.')]
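Besides the retriever interface, the Chroma vector store itself exposes a similarity search that also returns scores, which can be handy for inspecting retrieval quality. A minimal sketch using the db object created above (with Chroma's default settings, lower scores mean closer matches):
# Minimal sketch: query the vector store directly and inspect the distance scores.
results = db.similarity_search_with_score("Which country has the largest population?", k=2)
for doc, score in results:
    print(score, doc.page_content)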
workplaces.txt contains names and workplaces of several people. Now let's apply the same retrieval process to a file we read in.
with open("workplaces.txt", 'r') as file:
lines = file.readlines()
lines = [line.strip() for line in lines]
print(lines[0:4])
["Aaron works at McDonald's", 'Beth works at Starbucks', 'Charlie works at Walmart', 'Daisy works at Amazon']
workplace_retriever is a function that reads the workplaces.txt file and returns the database as a retriever that you can use to find out the workplaces of people in the file. You can specify the top-k number of results via the function's argument.
def workplace_retriever(k=3):
    with open("workplaces.txt", 'r') as file:
        lines = file.readlines()
    lines = [line.strip() for line in lines]
    db = Chroma.from_texts(
        lines,
        vectorizer,
        metadatas=[{"id": id} for id in range(len(lines))],
        collection_name=f"temp_{id(lines)}"
    )
    retriever = db.as_retriever(search_kwargs={"k": k})
    return retriever
Task 4¶
Using workplace_retriever, find out who works at Starbucks and McDonald's.
# TODO: Find out who works at Starbucks and McDonald's. Use the workplace_retriever(k=3).invoke(<query>) method to do this
# Remember to experiment with the value of k to make sure you find all people that work in one place.
# SOLUTION
workplace_retriever(3).invoke("Who works at starbucks")
[Document(metadata={'id': 27}, page_content='Brian works at Starbucks'),
Document(metadata={'id': 1}, page_content='Beth works at Starbucks'),
Document(metadata={'id': 0}, page_content="Aaron works at McDonald's")]
# SOLUTION
workplace_retriever(3).invoke("Who works at McDonald's")
[Document(metadata={'id': 0}, page_content="Aaron works at McDonald's"),
Document(metadata={'id': 26}, page_content="Alice works at McDonald's"),
Document(metadata={'id': 22}, page_content='Wendy works at Reddit')]
Chunking¶
The workplaces.txt data we just looked at was conveniently split into lines, with each line representing a distinct and meaningful chunk of information. This straightforward structure makes it easier to process and analyze the text data.
However, it is usually not so straightforward:
- When dealing with text data, especially from large or complex documents, it's essential to handle the formatting and structure efficiently.
- If we get a file that isn't so simply formatted, we can break it down into manageable chunks using LangChain's TextLoader and RecursiveCharacterTextSplitter (a minimal sketch of the splitter is shown below).
- This allows us to preprocess and chunk the data effectively for further use in our RAG pipeline.
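To make the splitter's behavior concrete before applying it to documents, here is a minimal sketch of RecursiveCharacterTextSplitter on a toy string (the sample sentence and parameter values are just illustrative):
# Minimal sketch: split a raw string into overlapping chunks.
# chunk_size is the maximum characters per chunk; chunk_overlap repeats a few
# characters between consecutive chunks so context isn't cut off mid-thought.
sample_text = "Expanse is a supercomputer at SDSC. It has CPU nodes, GPU nodes, and large-memory nodes."
splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)
for chunk in splitter.split_text(sample_text):
    print(repr(chunk))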
Let's take a look at some of the Expanse documentation. We have downloaded the contents of this webpage into two text files named expanse_doc_1.txt and expanse_doc_2.txt.
with open("expanse_doc_1.txt", 'r') as file:
lines = file.readlines()
lines = [line.strip() for line in lines]
print(lines[20:35])
['Job Charging', 'Compiling', 'Running Jobs', 'GPU Nodes', 'Data Movement', 'Storage', 'Composable Systems', 'Software Packages', 'Publications', 'Expanse User Guide', 'Technical Summary', '', '', 'Expanse is a dedicated Advanced Cyberinfrastructure Coordination Ecosystem: Services and Support (ACCESS) cluster designed by Dell and SDSC delivering 5.16 peak petaflops, and will offer Composable Systems and Cloud Bursting.', '']
- We see that the text is not split into meaningful chunks of information by default, so we need to do our best to format it in a way that is useful. This is why we use chunks, which group together local and neighboring text.
- When using the RecursiveCharacterTextSplitter, the chunk size determines the maximum size of each text chunk. This is particularly useful when dealing with large documents that need to be split into smaller, manageable pieces for better retrieval and analysis.
def expanse_retriever(chunk_size):
    loader = TextLoader('expanse_doc_1.txt')
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=10, separators=[" ", ",", "\n"])
    texts = text_splitter.split_documents(documents)
    db = Chroma(embedding_function=vectorizer, collection_name=f"expanse_temp_{id(texts)}")
    db.add_documents(texts)
    retriever = db.as_retriever(search_kwargs={"k": 3})
    return retriever
Task 5¶
A function that chunks expanse_doc_1.txt is provided above. Experiment with different chunk sizes and pick one that captures enough information to answer the question: *"How do you run jobs on Expanse?"* Try sizes 10, 100, and 1000, and observe what information is returned.
# TODO: Think about how many characters would be needed to contain useful information for such a complex task
# SOLUTION
expanse_retriever(1000).invoke("How do you run jobs on expanse?")
[Document(metadata={'source': 'expanse_doc_1.txt'}, page_content='up to 30M core-hours.\nJob Scheduling Policies\nThe maximum allowable job size on Expanse is 4,096 cores – a limit that helps shorten wait times since there are fewer nodes in idle state waiting for large number of nodes to become free.\nExpanse supports long-running jobs - run times can be extended to one week. Users requests will be evaluated based on number of jobs and job size. \nExpanse supports shared-node jobs (more than one job on a single node). Many applications are serial or can only scale to a few cores. Allowing shared nodes improves job throughput, provides higher overall system utilization, and allows more users to run on Expanse.\nTechnical Details\nSystem Component\tConfiguration\nCompute Nodes\nCPU Type\tAMD EPYC 7742\nNodes\t728\nSockets\t2\nCores/socket\t64\nClock speed\t2.25 GHz\nFlop speed\t4608 GFlop/s\nMemory capacity\t\n* 256 GB DDR4 DRAM\n\nLocal Storage\t\n1TB Intel P4510 NVMe PCIe SSD\n\nMax CPU Memory bandwidth\t409.5 GB/s\nGPU Nodes\nGPU Type\tNVIDIA V100 SMX2\nNodes\t52\nGPUs/node\t4\nCPU'),
Document(metadata={'source': 'expanse_doc_1.txt'}, page_content='Compute Units (SSCUs), comprising 728 standard nodes, 54 GPU nodes and 4 large-memory nodes. Every Expanse node has access to a 12 PB Lustre parallel file system (provided by Aeon Computing) and a 7 PB Ceph Object Store system. Expanse uses the Bright Computing HPC Cluster management system and the SLURM workload manager for job scheduling.\n\nExpanse Portal Login\n\nExpanse supports the ACCESS core software stack, which includes remote login, remote computation, data movement, science workflow support, and science gateway support toolkits.\n\nExpanse is an NSF-funded system operated by the San Diego Supercomputer Center at UC San Diego, and is available through the ACCESS program.\n\nResource Allocation Policies\nThe maximum allocation for a Principle Investigator on Expanse is 15M core-hours and 75K GPU hours. Limiting the allocation size means that Expanse can support more projects, since the average size of each is smaller.\nScience Gateways requesting in the Maximize tier can request up to'),
Document(metadata={'source': 'expanse_doc_1.txt'}, page_content="script provides additional details regarding project availability and usage. The script is located at:\n\n/cm/shared/apps/sdsc/current/bin/expanse-client\n\nThe script uses the 'sdsc' module, which is loaded by default. \n\n[user@login01 ~]$ module load sdsc\n \nTo review your available projects on Expanse resource use the 'user' parameter and '-r' to desginate a resource. If no resouce is designated expanse data will be shown by default.\n\nuser@login01 ~]$ expanse-client user -r expanse\n\nResource expanse\n\n╭───┬─────────────┬─────────┬────────────┬──────┬───────────┬─────────────────╮\n│ │ NAME │ PROJECT │ TG PROJECT │ USED │ AVAILABLE │ USED BY PROJECT │\n├───┼─────────────┼─────────┼────────────┼──────┼───────────┼─────────────────┤\n│ 1 │ user │ ddp386 │ │ 0 │ 110000 │ 8318 │\n╰───┴─────────────┴─────────┴────────────┴──────┴───────────┴─────────────────╯\n\nTo see full list of available resources, use the 'resource' command:\n\n[user@login02 ~]$")]
Multiple Document Chunking¶
When we have more than one document we want to use in our database, we can simply chunk each one in turn. Metadata recording the source file is added by default, and we can attach our own metadata as well, such as chunk IDs (see the sketch below).
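As a quick illustration of the default metadata, here is a minimal sketch (reusing expanse_doc_1.txt from above):
# Minimal sketch: TextLoader records the source file in each Document's metadata,
# and split_documents carries that metadata over to every chunk.
docs = TextLoader('expanse_doc_1.txt').load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10).split_documents(docs)
print(chunks[0].metadata)   # expected: {'source': 'expanse_doc_1.txt'}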
Task 6¶
expanse_all_retriever, provided below, is a function that chunks both expanse_doc_1.txt and expanse_doc_2.txt. Using a chunk size of 1000 characters, find which document the information on *"Compiling Codes"* is most likely located in. Hint: look at the metadata.
def expanse_all_retriever(chunk_size):
    random_id = random.randint(100000, 999999)  # random 6-digit ID for uniqueness
    db = Chroma(
        embedding_function=vectorizer,
        collection_name=f"expanse_all_temp_{random_id}"
    )
    pattern = 'expanse_doc_*.txt'
    file_list = glob.glob(pattern)
    for file_name in file_list:
        loader = TextLoader(file_name)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=chunk_size,
            chunk_overlap=10,
            separators=[" ", ",", "\n"]
        )
        texts = text_splitter.split_documents(documents)
        for i, text in enumerate(texts):
            text.metadata["chunk_number"] = i
        db.add_documents(texts)
    retriever = db.as_retriever(search_kwargs={"k": 3})
    return retriever
# TODO: Find the relevant source for the query "Compiling Codes"
chunks = expanse_all_retriever(1000).invoke("Compiling Codes")
for chunk in chunks:
    print(chunk.metadata)
# The answer is expanse_doc_2.txt
{'source': 'expanse_doc_2.txt', 'chunk_number': 0}
{'chunk_number': 1, 'source': 'expanse_doc_2.txt'}
{'source': 'expanse_doc_2.txt', 'chunk_number': 5}
Part 2: Basic RAG¶
Ollama is an open-source platform that lets us run a wide range of different LLMs.
We first need to launch the Ollama server. In a terminal window in the JupyterLab instance, run:
export OLLAMA_HOST="0.0.0.0:$(shuf -i 3000-65000 -n 1)"; echo ${OLLAMA_HOST##*:} > ~/.ollama_port
ollama serve
import os
from ollama import Client
# Read the port from the file
with open(os.path.expanduser('~/.ollama_port')) as f:
    port = f.read().strip()
# Connect to 127.0.0.1:<port>
host = f"http://127.0.0.1:{port}"
client = Client(host=host)
# Get LLM
client.pull("gemma3:4b")
ProgressResponse(status='success', completed=None, total=None, digest=None)
llm = Ollama(
model="gemma3:4b",
base_url=f"http://127.0.0.1:{port}", # CRITICAL: Use your custom port
temperature=0
)
llm.invoke("How are you?")
'I’m doing well, thanks for asking! As a large language model, I don’t really *feel* in the way humans do, but my systems are running smoothly and I’m ready to chat and help you with whatever you need. 😊 \n\nHow about you? How’s your day going so far?'
Task 7¶
Write a function that takes your question, retrieves the relevant entries using workplace_retriever, and then sends this context to the LLM so that it can answer your question in natural language. Fill in workplace_question to accomplish this.
#SOLUTION
def workplace_question(question):
    retriever = workplace_retriever()
    context = retriever.invoke(question)
    llm = Ollama(model="gemma3:4b", base_url=f"http://127.0.0.1:{port}", temperature=0.2)
    prompt = f"Based on the following context: {context}, answer the question: "
    response = llm.invoke(prompt + question)
    return response
print(workplace_question("Who are the people that work at Starbucks?"))
Based on the provided documents, the people who work at Starbucks are: * Brian * Beth
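The manual prompt assembly above works, but LangChain can also wrap the retriever and LLM into a single chain. A minimal sketch using RetrievalQA (which we build on in the next part), reusing workplace_retriever and the llm defined earlier; output not shown:
# Minimal sketch: wrap the retriever and LLM in a RetrievalQA chain.
# The default "stuff" chain type inserts the retrieved documents into a prompt for us.
workplace_qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=workplace_retriever(k=3),
)
print(workplace_qa.invoke("Who are the people that work at Starbucks?")["result"])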
Part 3: LangChain RAG¶
The above is a very simple example of RAG. Now, using LangChain, we can put everything together in a cleaner, all-in-one way. Let's combine everything we've learned into the function generate_rag.
- The implementation below includes a custom class that lets us see which chunks are being used for our queries.
def generate_rag(verbose=False, chunk_info=False):
    random_id = random.randint(100000, 999999)
    vectorizer = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    db = Chroma(embedding_function=vectorizer, collection_name=f"expanse_all_temp_{random_id}")

    # Chunk every expanse_doc_*.txt file and load the chunks into the vector store
    pattern = 'expanse_doc_*.txt'
    file_list = glob.glob(pattern)
    for file_name in file_list:
        loader = TextLoader(file_name)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=10, separators=[" ", ",", "\n"])
        texts = text_splitter.split_documents(documents)
        for i, text in enumerate(texts):
            text.metadata["chunk_number"] = i
        db.add_documents(texts)

    template = """<s>[INST] Given the context - {context} </s>[INST] [INST] Answer the following question - {question}[/INST]"""
    pt = PromptTemplate(
        template=template, input_variables=["context", "question"]
    )

    # Let's retrieve the top 3 chunks for our results
    retriever = db.as_retriever(search_kwargs={"k": 3})

    class CustomRetrievalQA(RetrievalQA):
        def invoke(self, *args, **kwargs):
            result = super().invoke(*args, **kwargs)
            if chunk_info:
                # Print out the chunks that were retrieved
                print("Chunks being looked at:")
                chunks = retriever.invoke(*args, **kwargs)
                for chunk in chunks:
                    print(f"Source: {chunk.metadata['source']}, Chunk number: {chunk.metadata['chunk_number']}")
                    print(f"Text snippet: {chunk.page_content[:200]}...\n")  # Print the first 200 characters
            return result

    rag = CustomRetrievalQA.from_chain_type(
        llm=Ollama(model="gemma3:4b", base_url=f"http://127.0.0.1:{port}", temperature=0),
        retriever=retriever,
        memory=ConversationSummaryMemory(llm=Ollama(model="gemma3:4b", base_url=f"http://127.0.0.1:{port}")),
        chain_type_kwargs={"prompt": pt, "verbose": verbose},
    )
    return rag
Task 8¶
Compare how Gemma performs without context and with context, that is, without RAG and with RAG.
print(llm.invoke("How can a user check their resource allocations and the resources they can use on the Expanse supercomputer"))
#Try "How can a user check their resource allocations and the resources they can use on the Expanse supercomputer"
Checking your resource allocations and available resources on the Expanse supercomputer (specifically, the Fugaku supercomputer at RIKEN) involves several steps and utilizing different tools. Here's a breakdown of the process, combining information from the Fugaku documentation and common practices:
**1. Understanding the Resource Landscape:**
* **Jobs and Queues:** Fugaku organizes resources through jobs and queues. A *job* is a specific computation you submit. A *queue* is a collection of jobs waiting to be executed.
* **Resource Types:** Fugaku offers a wide range of resources, including:
* **CPU:** Different core counts and speeds.
* **GPU:** NVIDIA GPUs for accelerated computing.
* **Memory:** Various memory sizes (e.g., 64GB, 128GB, 256GB).
* **Network:** High-speed interconnects for efficient communication.
* **Storage:** Access to shared and private storage.
* **Resource Units (RU):** Fugaku uses a concept called Resource Units (RU) to represent the amount of resources a job can consume. This is a key element in scheduling.
**2. Tools for Checking Your Allocations & Available Resources:**
* **`sge` (Scikit Grid Engine):** This is the primary command-line tool for managing jobs and resources on Fugaku.
* **`sge show`:** This is your *most important* command. It displays information about your current job, including:
* **Job ID:** A unique identifier for your job.
* **Job State:** The current status of your job (e.g., `PD`, `RUN`, `COMPLETED`).
* **Resource Usage:** The amount of CPU cores, memory, and GPU resources currently being used by your job.
* **RU Consumption:** The number of Resource Units (RU) your job is consuming.
* **`sge qstat`:** Lists all jobs in the current queue and their status. Useful for seeing what others are running.
* **`sge show qstat`:** Combines the output of `sge qstat` and `sge show qstat` for a more comprehensive view.
* **Fugaku WebUI (Recommended):** The Fugaku WebUI provides a graphical interface for managing your jobs and resources. This is generally the easiest way to get started.
* **Login:** You'll need your Fugaku account credentials to log in.
* **Dashboard:** The dashboard provides an overview of your job status, resource usage, and available resources.
* **Job Management:** You can submit new jobs, modify existing ones, and monitor their progress.
* **Resource Explorer:** This section allows you to browse available resources (CPU, GPU, memory) and their characteristics.
* **`sge show -j <job_id>`:** If you know the ID of a job you've submitted, you can use this command to get detailed information about that specific job.
**3. Steps to Check Your Resources:**
1. **Log in to the Fugaku system:** Typically via SSH.
2. **Open a terminal.**
3. **Run `sge show`:** This will display your current job's resource usage.
4. **(Recommended) Access the Fugaku WebUI:** Navigate to the WebUI URL (provided by the Fugaku support team) and log in. Use the dashboard to see your job status and available resources.
**4. Understanding RU Consumption & Scheduling:**
* **RU Allocation:** When you submit a job, you specify the number of RU you need. The scheduler then attempts to find a node with enough available RU to run your job.
* **RU Units:** 1 RU typically represents a certain amount of CPU cores, memory, and GPU resources. The exact mapping depends on the node configuration.
* **Resource Limits:** You can set resource limits for your jobs to prevent them from consuming excessive resources.
**5. Resources for Further Information:**
* **Fugaku Documentation:** [https://www.fukusu.riken.jp/en/docs/](https://www.fukusu.riken.jp/en/docs/) – This is the official source for all things Fugaku.
* **Fugaku Support Team:** Contact the Fugaku support team for assistance. Their contact information is available on the Fugaku website.
* **Fugaku Tutorials:** The Fugaku website offers tutorials and guides for new users.
**Important Notes:**
* **Account Setup:** You'll need a Fugaku account to submit jobs and access resources. The process for obtaining an account is typically handled by your institution or research group.
* **Queue Selection:** You'll need to submit your jobs to a specific queue. The choice of queue depends on the type of computation you're running and the priority you need.
* **Resource Limits:** Be mindful of resource limits to avoid being penalized or having your jobs rejected.
To help me give you even more tailored advice, could you tell me:
* What is your role in using the Expanse supercomputer (e.g., researcher, student)?
* What kind of computations are you planning to run? (e.g., simulations, data analysis, machine learning)
expanse_rag = generate_rag()
result = expanse_rag.invoke("How can a user check their resource allocations and the resources they can use on the Expanse supercomputer")
print(result["result"])
Here’s how a user can check their resource allocations and available resources on the Expanse supercomputer, based on the provided text:
1. **Using the `expanse-client` command:**
* The user can use the `expanse-client` command with the `user` parameter and `-r` to designate a resource.
* For example: `expanse-client user -r expanse` will display the user's available projects and their resource usage.
2. **Using the `resource` command:**
* The user can use the `resource` command to see a full list of available resources. This command will display a table with columns for NAME, PROJECT, TG PROJECT, USED, and AVAILABLE.
- We can see exactly what is being passed into the LLM, highlighted in green, when we set verbose to True.
expanse_rag = generate_rag(verbose=True)
result = expanse_rag.invoke("How can a user check their resource allocations and the resources they can use on the Expanse supercomputer")
print(result["result"])
> Entering new StuffDocumentsChain chain... > Entering new LLMChain chain... Prompt after formatting: <s>[INST] Given the context - Compute Units (SSCUs), comprising 728 standard nodes, 54 GPU nodes and 4 large-memory nodes. Every Expanse node has access to a 12 PB Lustre parallel file system (provided by Aeon Computing) and a 7 PB Ceph Object Store system. Expanse uses the Bright Computing HPC Cluster management system and the SLURM workload manager for job scheduling. Expanse Portal Login Expanse supports the ACCESS core software stack, which includes remote login, remote computation, data movement, science workflow support, and science gateway support toolkits. Expanse is an NSF-funded system operated by the San Diego Supercomputer Center at UC San Diego, and is available through the ACCESS program. Resource Allocation Policies The maximum allocation for a Principle Investigator on Expanse is 15M core-hours and 75K GPU hours. Limiting the allocation size means that Expanse can support more projects, since the average size of each is smaller. Science Gateways requesting in the Maximize tier can request up to up to 30M core-hours. Job Scheduling Policies The maximum allowable job size on Expanse is 4,096 cores – a limit that helps shorten wait times since there are fewer nodes in idle state waiting for large number of nodes to become free. Expanse supports long-running jobs - run times can be extended to one week. Users requests will be evaluated based on number of jobs and job size. Expanse supports shared-node jobs (more than one job on a single node). Many applications are serial or can only scale to a few cores. Allowing shared nodes improves job throughput, provides higher overall system utilization, and allows more users to run on Expanse. Technical Details System Component Configuration Compute Nodes CPU Type AMD EPYC 7742 Nodes 728 Sockets 2 Cores/socket 64 Clock speed 2.25 GHz Flop speed 4608 GFlop/s Memory capacity * 256 GB DDR4 DRAM Local Storage 1TB Intel P4510 NVMe PCIe SSD Max CPU Memory bandwidth 409.5 GB/s GPU Nodes GPU Type NVIDIA V100 SMX2 Nodes 52 GPUs/node 4 CPU script provides additional details regarding project availability and usage. The script is located at: /cm/shared/apps/sdsc/current/bin/expanse-client The script uses the 'sdsc' module, which is loaded by default. [user@login01 ~]$ module load sdsc To review your available projects on Expanse resource use the 'user' parameter and '-r' to desginate a resource. If no resouce is designated expanse data will be shown by default. user@login01 ~]$ expanse-client user -r expanse Resource expanse ╭───┬─────────────┬─────────┬────────────┬──────┬───────────┬─────────────────╮ │ │ NAME │ PROJECT │ TG PROJECT │ USED │ AVAILABLE │ USED BY PROJECT │ ├───┼─────────────┼─────────┼────────────┼──────┼───────────┼─────────────────┤ │ 1 │ user │ ddp386 │ │ 0 │ 110000 │ 8318 │ ╰───┴─────────────┴─────────┴────────────┴──────┴───────────┴─────────────────╯ To see full list of available resources, use the 'resource' command: [user@login02 ~]$ </s>[INST] [INST] Answer the following question - How can a user check their resource allocations and the resources they can use on the Expanse supercomputer[/INST] > Finished chain. > Finished chain. Here’s how a user can check their resource allocations and available resources on the Expanse supercomputer, based on the provided text: 1. **Using the `expanse-client` command:** * The user can use the `expanse-client` command with the `user` parameter and `-r` to designate a resource. 
* For example: `expanse-client user -r expanse` will display the user's current allocations and available resources. 2. **Using the `resource` command:** * The user can use the `resource` command to see a full list of available resources. This command will display a table with columns for NAME, PROJECT, TG PROJECT, USED, and AVAILABLE.
- For more concise information, the chunk_info option of the function we defined lets us see the individual chunk details as well as their source.
expanse_rag = generate_rag(chunk_info=True)
result = expanse_rag.invoke("How can a user check their resource allocations and the resources they can use on the Expanse supercomputer")
Chunks being looked at: Source: expanse_doc_1.txt, Chunk number: 1 Text snippet: Compute Units (SSCUs), comprising 728 standard nodes, 54 GPU nodes and 4 large-memory nodes. Every Expanse node has access to a 12 PB Lustre parallel file system (provided by Aeon Computing) and a 7 P... Source: expanse_doc_1.txt, Chunk number: 2 Text snippet: up to 30M core-hours. Job Scheduling Policies The maximum allowable job size on Expanse is 4,096 cores – a limit that helps shorten wait times since there are fewer nodes in idle state waiting for lar... Source: expanse_doc_1.txt, Chunk number: 12 Text snippet: script provides additional details regarding project availability and usage. The script is located at: /cm/shared/apps/sdsc/current/bin/expanse-client The script uses the 'sdsc' module, which is lo...
print(result["result"])
Here’s how a user can check their resource allocations and available resources on the Expanse supercomputer, based on the provided text:
1. **Using the `expanse-client` command:**
* The user can use the `expanse-client` command with the `user` parameter and `-r` to designate a resource.
* For example: `expanse-client user -r expanse` will display the user's current allocations and available resources.
2. **Using the `resource` command:**
* The user can use the `resource` command to see a full list of available resources. This command will display a table with columns for NAME, PROJECT, TG PROJECT, USED, and AVAILABLE.
Great work! We've officially built a chatbot that can help us out with all things Expanse, at least according to the two .txt files we have access to!