From e9d546434aad5040a740598726c3885883d673f9 Mon Sep 17 00:00:00 2001
From: rainw <1368350749@qq.com>
Date: Fri, 7 Nov 2025 10:17:52 +0800
Subject: [PATCH] fix problem desc & add time limit
---
2025/Global_Memory_Planning_for_LLM.en.md | 49 +-
2025/Global_Memory_Planning_for_LLM.md | 32 +-
2025/checker/Makefile | 10 +
2025/checker/README.md | 8 +
2025/checker/checker.cc | 153 +
2025/checker/example/infile.txt | 3 +
2025/checker/example/outfile.txt | 6 +
2025/checker/testlib.h | 6299 +++++++++++++++++++++
8 files changed, 6533 insertions(+), 27 deletions(-)
create mode 100644 2025/checker/Makefile
create mode 100644 2025/checker/README.md
create mode 100644 2025/checker/checker.cc
create mode 100644 2025/checker/example/infile.txt
create mode 100644 2025/checker/example/outfile.txt
create mode 100644 2025/checker/testlib.h
diff --git a/2025/Global_Memory_Planning_for_LLM.en.md b/2025/Global_Memory_Planning_for_LLM.en.md
index 9582a11..1b464c7 100644
--- a/2025/Global_Memory_Planning_for_LLM.en.md
+++ b/2025/Global_Memory_Planning_for_LLM.en.md
@@ -1,4 +1,4 @@
-#
2025 competition:Global Memory Planning for Large Model Training and Inference
+# 2025 Competition: Global Memory Planning for Large Model Training and Inference
## **Ⅰ. Background**
With the development of AI, the parameter count and supported sequence length of large models have increased significantly. Against this backdrop, the memory requirements for training and inference of large models have also grown increasingly high, leading to frequent occurrences of out-of-memory issues. To address the challenge of insufficient memory, the industry has adopted memory offloading as one of the mainstream solutions. For instance, during large model inference, if the KVCache cannot be fully stored in the memory of NPUs/GPUs, a portion of the KVCache can be offloaded to the host memory and retrieved back to the NPU/GPU when needed, thereby alleviating memory pressure. However, memory offloading solutions usually introduce considerable overhead (depending on the size of the data to be offloaded and bandwidth). Generally, it is necessary to overlap the memory offloading process with the operator computation process, reducing overhead through mutual masking.
@@ -6,11 +6,11 @@ With the development of AI, the parameter count and supported sequence length of
The competition focuses on the entire training/inference process of large models. It abstracts the memory access requirements throughout the full training-inference workflow and derives the optimal timing for memory offloading/loading by solving global memory access problems. This approach not only resolves the out-of-memory issue but also minimizes the impact on end-to-end latency as much as possible. The competition requires participants to meet the basic functional requirements of the problem; those with high completion rates or excellent algorithm performance will win. Additionally, participants are expected to maintain good coding style, and excellent code quality may be considered for extra points.
## **Ⅱ. Problem Description**
-For the training/inference process of large models, the abstraction of global memory access can be expressed using a sequence of 4-tuples as follows:[addr_0, size_0, start_0, time_0], [addr_1, size_1, start_1, time_1], … [addr_n, size_n, start_n, time_n]
+For the training/inference process of large models, global memory accesses can be abstracted as a sequence of 4-tuples: [$addr_0$, $size_0$, $start_0$, $time_0$], [$addr_1$, $size_1$, $start_1$, $time_1$], …, [$addr_n$, $size_n$, $start_n$, $time_n$]
Addr and size define a memory segment, representing the starting address and size of this segment.Start and time define the time period during which this memory segment is accessed, representing the start time of access and the duration of access.
The following points require attention:
1. The same memory segment (or its subset) may be accessed repeatedly, and there may be overlaps between different memory segments.
-2. The memory access times satisfy the condition start_0 ≤ start_1 ≤ … ≤ start_n, which also means that access to different memory segments may occur simultaneously.
+2. The memory access times satisfy the condition $start_0 \le start_1 \le \dots \le start_n$, which also means that accesses to different memory segments may occur simultaneously.
3. Each 4-tuple in the memory access sequence is regarded as a computation process. This process can be parallelized with the memory offloading/loading process, and neither affects the execution time of the other.
4. The memory offloading and loading processes cannot be parallelized with each other.
5. In the initial state (before the execution of the first memory sequence), all memory is idle, and no data has been loaded into the memory.
@@ -22,29 +22,29 @@ For a given memory access sequence, appropriate memory offloading/loading operat
## **Ⅳ. Use Cases**
**Input:**
-The first line inputs three integers L, M, and N. Among them: L represents the total virtual address size of the process; M represents the total capacity of HBM (High-Bandwidth Memory); N represents the length of the memory access sequence.The constraints are 1 ≤ M ≤ L ≤ 100000 and 1 ≤ N ≤ 10000.The next N lines each contain four integers: addrᵢ, sizeᵢ, startᵢ, timeᵢ (with constraints 1 ≤ addrᵢ, sizeᵢ ≤ L and 1 ≤ startᵢ, timeᵢ ≤ 10⁹). addrᵢ denotes the starting virtual address of the memory segment;
-sizeᵢ denotes the size of the virtual address segment;
-startᵢ denotes the earliest accessible time of the memory segment; timeᵢ denotes the duration for which the memory segment is accessed. The input guarantees that startᵢ is monotonically increasing.
+The first line contains three integers $L$, $M$, and $N$: $L$ is the total virtual address size of the process, $M$ is the total capacity of HBM (High-Bandwidth Memory), and $N$ is the length of the memory access sequence, with $1 \le M \le L \le 100000$ and $1 \le N \le 10000$. Each of the next $N$ lines contains four integers $addr_i$, $size_i$, $start_i$, $time_i$ describing one memory access, with $0 \le addr_i \lt L$, $1 \le size_i \le L$, $addr_i + size_i \le L$, and $0 \le start_i, time_i \le 10^9$. $addr_i$ denotes the starting virtual address of the memory segment;
+$size_i$ denotes the size of the virtual address segment;
+$start_i$ denotes the earliest time at which the segment may be accessed; $time_i$ denotes the duration of the access. The input guarantees that $start_i$ is **monotonically non-decreasing**.
**Output:**
-Output a maximum of 5*N lines. Each line represents an execution operation, which is divided into four types:
-1. Reload Ti Ai SiIndicates a prefetch operation. It means loading the memory with starting virtual address Ai and size Si into HBM at time Ti, which takes a total of 40 * Si time. The constraints are 1 ≤ Ai, Si and Ai + Si ≤ L.
-2. Visit Ti AiIndicates a memory access operation. It means starting the Ai-th memory access request at time Ti, which takes a total of time[Ai] (the access duration of the Ai-th memory segment) time. The constraints are 0 ≤ Ai < N and Ai (the index of memory access requests) must be monotonically increasing.
-3. Offload Ti Ai SiIndicates an offload operation. It means releasing the memory with starting virtual address Ai and size Si from HBM at time Ti, which takes a total of 40 * Si time. The constraints are 1 ≤ Ai, Si and Ai + Si ≤ L.
-4. Fin T Indicates the end of the task. It means the time when all computing tasks are completely finished is T. This operation must be output only once in the last line.
+Output at most $5 \times N$ lines. Each line describes one operation, of one of the following four types:
+1. *Reload* $T_i$ $A_i$ $S_i$: a prefetch operation. Starting at time $T_i$, the memory with starting virtual address $A_i$ and size $S_i$ is loaded into HBM, which takes $40 \times S_i$ time units in total. Portions that already reside in HBM are not reloaded and incur no time overhead, so the actual cost is $40$ times the number of addresses in the range that are not yet resident. The constraints are $0 \le A_i \lt L$, $1 \le S_i \le L$, and $A_i + S_i \le L$.
+2. *Visit* $T_i$ $A_i$: a memory access operation. Starting at time $T_i$, the $A_i$-th memory access request is executed, which takes $time_{A_i}$ time units (the access duration of the $A_i$-th request). The constraints are $0 \le A_i \lt N$, and the indices $A_i$ of successive *Visit* operations must be monotonically increasing (every request is visited exactly once, in index order).
+3. *Offload* $T_i$ $A_i$ $S_i$: an offload operation. Starting at time $T_i$, the memory with starting virtual address $A_i$ and size $S_i$ is released from HBM, which takes $40 \times S_i$ time units in total. Portions that have already been released are not released again and incur no time overhead, so the actual cost is $40$ times the number of addresses in the range that are currently resident. The constraints are $0 \le A_i \lt L$, $1 \le S_i \le L$, and $A_i + S_i \le L$.
+4. *Fin* $T$: marks the end of the task; $T$ is the time at which all computation has completely finished. This operation must be output exactly once, as the last line.
Among the operations:
- Operations 1 and 3 are read-write type operations;
- Operation 2 is a computation type operation.
-Read-write type operations and computation type operations can be completed in parallel, but operations of the same type can only be completed in parallel (i.e., read-write operations cannot be parallelized with other read-write operations, and computation operations cannot be parallelized with other computation operations).
+Read-write operations and computation operations can run in parallel with each other, but operations of the same type must run serially (i.e., read-write operations cannot be parallelized with other read-write operations, and computation operations cannot be parallelized with other computation operations).
-**Input Case 1:**
+**Input Case 1:**
200 100 2
0 100 0 30
100 100 50 10
-**Output Case 1:**
+**Output Case 1:**
Reload 0 0 100
Visit 4000 0
Offload 4030 0 100
@@ -52,13 +52,23 @@ Reload 8030 100 100
Visit 12030 1
Fin 12040
-**Input Case 2:**
+
+**Case 1 Explanation:**
+1. At the initial time $T = 0$, no virtual address space resides in the GPU/NPU memory. The memory capacity is $100$.
+2. Since memory access #0 requires the virtual address range $[0, 100)$, a *Reload* operation is performed to load this range into memory. The *Reload* operation loads $100$ units of memory and costs $4000$ units of time. At this point, $T = 4000$, and the data in $[0, 100)$ is now resident in memory.
+3. The earliest start time of memory access #0 is $0$, which is no later than the current time $T = 4000$, so the access can begin immediately. Output a *Visit* operation, which takes $30$ units of time. Now $T = 4030$.
+4. Since the memory is full, an *Offload* operation must be executed before access #1. Offloading the virtual address range $[0, 100)$ takes $4000$ units of time, so $T = 8030$ afterward.
+5. The virtual address range $[100, 200)$ is then loaded into memory, which takes $4000$ units of time. Now $T = 12030$.
+6. The earliest start time of memory access #1 is $50$, which is no later than $T = 12030$, so the access can begin immediately. Output a *Visit* operation, which takes $10$ units of time. Now $T = 12040$.
+7. All memory access operations have been completed. Output *Fin*; the total elapsed time is $T = 12040$.
+
+**Input Case 2:**
300 200 3
0 100 0 50
100 100 4000 30
150 100 4001 20
-**Output Case 2:**
+**Output Case 2:**
Reload 0 0 100
Visit 4000 0
Reload 4000 100 100
@@ -68,7 +78,12 @@ Reload 10000 200 50
Visit 12000 2
Fin 12020
+**Case 2 Explanation:**
+Note that read-write operations (*Reload*/*Offload*) and computation operations can proceed in parallel. Therefore, at time $T = 4000$, the *Visit* and *Reload* operations can start simultaneously.
+Since the virtual address ranges of memory accesses #1 ($[100, 200)$) and #2 ($[150, 250)$) overlap on $[150, 200)$, only $[200, 250)$ has to be loaded for access #2, so the *Offload* operation does not need to evict the entire memory region accessed by access #1.
+
+
## **Ⅴ. Evaluation**
-- Basic Requirements (Level 0): The algorithm's output must meet Objective 1 and Objective
+- Basic Requirements (Level 0): The algorithm's output must meet Objective 1 and Objective 2 and conform to the output format. The algorithm must complete execution within **10 seconds** and use no more than **1 GB of memory**.
- Advanced Requirements (Level 1): On the premise of meeting the basic requirements, the total time to complete the computation of the entire input sequence (i.e., the end time of the last memory access operation) should be as short as possible.
- Additional Bonus Items: Providing detailed algorithm description documents, etc.
\ No newline at end of file
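The *Reload*/*Offload* cost rule in the English statement charges only the addresses whose residency actually changes, 40 time units each; `get_score()` in `checker.cc` counts `cnt_tomem`/`cnt_offmem` the same way. A minimal sketch of that accounting, assuming a per-address residency bitmap; `Hbm`, `reload`, and `offload` are hypothetical names used only for illustration, not part of the required output format:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (illustration only): tracks which virtual addresses are
// resident in HBM and returns the time each Reload/Offload would take.
struct Hbm {
    static constexpr std::uint64_t kCostPerUnit = 40;  // time per address moved
    std::vector<bool> resident;                        // resident[a]: address a is in HBM
    int used = 0;                                      // number of resident addresses

    explicit Hbm(int L) : resident(L, false) {}

    // Loads [addr, addr + size); already-resident addresses cost nothing.
    std::uint64_t reload(int addr, int size) {
        assert(addr >= 0 && size >= 1 && addr + size <= static_cast<int>(resident.size()));
        std::uint64_t changed = 0;
        for (int a = addr; a < addr + size; ++a) {
            if (!resident[a]) {
                resident[a] = true;
                ++changed;
            }
        }
        used += static_cast<int>(changed);
        return kCostPerUnit * changed;
    }

    // Releases [addr, addr + size); already-released addresses cost nothing.
    std::uint64_t offload(int addr, int size) {
        assert(addr >= 0 && size >= 1 && addr + size <= static_cast<int>(resident.size()));
        std::uint64_t changed = 0;
        for (int a = addr; a < addr + size; ++a) {
            if (resident[a]) {
                resident[a] = false;
                ++changed;
            }
        }
        used -= static_cast<int>(changed);
        return kCostPerUnit * changed;
    }
};
```

For example, with L = 300, `reload(0, 100)` and `reload(100, 100)` cost 4000 each and `offload(0, 50)` costs 2000; a following `reload(150, 100)` then costs only 2000 because [150, 200) is still resident, the same 50 new units that Output Case 2 loads with `Reload 10000 200 50`.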
diff --git a/2025/Global_Memory_Planning_for_LLM.md b/2025/Global_Memory_Planning_for_LLM.md
index ccc579f..ec3acbe 100644
--- a/2025/Global_Memory_Planning_for_LLM.md
+++ b/2025/Global_Memory_Planning_for_LLM.md
@@ -7,12 +7,12 @@
## **二、问题描述**
对于大模型训练/推理的过程,对于全局内存访问的抽象可以用以下四元组的序列来表达:
-[addr_0, size_0, start_0, time_0], [addr_1, size_1, start_1, time_1], … [addr_n, size_n, start_n, time_n]
+[$addr_0$, $size_0$, $start_0$, $time_0$], [$addr_1$, $size_1$, $start_1$, $time_1$], … [$addr_n$, $size_n$, $start_n$, $time_n$]
addr和size定义了一段内存,表示这段内存的起始地址和大小。
start和time定义了这段内存被访问的时间段,表示开始访问的起始时间和持续时间。
有以下几点需要注意:
1. 同一段内存(的子集)可能被反复访问,并且不同的内存段之间可能存在交集。
-2. 内存访问时间满足start_0 <= start_1 <= … <= start_n,这也意味着不同段内存的访问可能是同时发生的。
+2. 内存访问时间满足$start_0 \le start_1 \le \dots \le start_n$,这也意味着不同段内存的访问可能是同时发生的。
3. 内存访问序列的每个四元组都被认为是一个计算过程,这个过程可以和内存卸载/加载过程并行,并且不影响彼此的执行时间。
4. 内存卸载和加载过程,不能并行。
5. 在初始状态时(第一条内存序列执行前),内存都是空闲的,所有的数据都未加载到内存。
@@ -25,17 +25,17 @@ start和time定义了这段内存被访问的时间段,表示开始访问的
## **四、用例描述**
**输入格式:**
-第一行输入三个整数L,M,N,其中L表示进程总虚拟地址大小,M表示HBM总容量,N表示内存访问序列长度(1<= M <= L <=100000,1 <= N <= 10000)。接下来N行,每行4个整数addri, sizei, starti, timei (1 <= addri, sizei, <= L, 1 <= starti, timei <= 109)。其中addri表示该段内存的起始虚拟地址,sizei表示该段虚拟地址大小,starti表示该段内存的最早可访问时间,timei表示该段内存被访问的持续时间。输入保证starti是单调递增的。
+第一行输入三个整数$L$,$M$,$N$,其中$L$表示进程总虚拟地址大小,$M$表示HBM总容量,$N$表示内存访问序列长度($1 \le M \le L \le 100000$; $1 \le N \le 10000$)。接下来$N$行,每行4个整数$addr_i$, $size_i$, $start_i$, $time_i$ ($0 \le addr_i \lt L$; $1 \le size_i \le L$; $addr_i + size_i \le L$; $0 \le start_i, time_i \le 10^9$)描述一次内存访问。其中$addr_i$表示该段内存的起始虚拟地址,$size_i$表示该段虚拟地址大小,$start_i$表示该段内存的最早可访问时间,$time_i$表示该段内存被访问的持续时间。输入保证$start_i$是**单调不减**的。
**输出格式:**
-输出最多5*N行,每一行表示一个执行操作,分为四种类型:
-1. Reload Ti Ai Si,表示预取操作,意为在Ti时刻把起始虚拟地址为Ai,大小为Si的内存载入HBM,总共花费时间40* Si。需要保证1<=Ai, Si,且Ai + Si <= L。
-2. Visit Ti Ai,表示访存操作,意为在Ti时刻开始进行第Ai个内存访问诉求,总共花费时间为timeAi。需要保证0 <= Ai < N,且访存操作的Ai单调递增。
-3. Offload Ti Ai Si,表示卸载操作,意为在Ti时刻把起始虚拟地址为Ai,大小为Si的内存从HBM中释放,总共花费时间40* Si。需要保证1<=Ai, Si,且Ai + Si <= L。
-4. Fin T,表示任务结束,意为所有计算任务全部完成的时间为T。需要保证该操作只在最后一行输出一次。
+输出最多$5 \times N$行,每一行表示一个执行操作,分为四种类型:
+1. *Reload* $T_i$ $A_i$ $S_i$,表示预取操作,意为在$T_i$时刻把起始虚拟地址为$A_i$,大小为$S_i$的内存载入HBM,总共花费时间$40 \times S_i$(其中已存在HBM的部分不会重复载入,不计时间开销)。需要保证$0 \le A_i \lt L, 1 \le S_i \le L$,且$A_i + S_i \le L$。
+2. *Visit* $T_i$ $A_i$,表示访存操作,意为在$T_i$时刻开始进行第$A_i$个内存访问诉求,总共花费时间为$time_{A_i}$。需要保证$0 \le A_i \lt N$,且访存操作的$A_i$单调递增。
+3. *Offload* $T_i$ $A_i$ $S_i$,表示卸载操作,意为在$T_i$时刻把起始虚拟地址为$A_i$,大小为$S_i$的内存从HBM中释放,总共花费时间$40 \times S_i$(其中已释放的部分不会重复释放,不计时间开销)。需要保证$0 \le A_i \lt L, 1 \le S_i \le L$,且$A_i + S_i \le L$。
+4. *Fin* $T$,表示任务结束,意为所有计算任务全部完成的时间为$T$。需要保证该操作只在最后一行输出一次。
-其中操作1和操作3为读写类型操作,操作2为计算类型操作。读写类型操作和计算类型操作可以并行完成,但是同种类型的操作只能并行完成。
+其中操作1和操作3为读写类型操作,操作2为计算类型操作。读写类型操作和计算类型操作**可以并行**完成,但是同种类型的操作**只能串行**完成。
**输入示例一:**
200 100 2
@@ -50,6 +50,15 @@ Reload 8030 100 100
Visit 12030 1
Fin 12040
+**示例一说明:**
+1. 初始时间$T=0$,所有虚拟地址空间均不在GPU/NPU内存中,内存容量为$100$。
+2. 由于序号$0$的内存访问需要用到$[0,100)$范围的虚拟地址,所以输出*Reload*操作把它们载入内存。*Reload*操作加载$100$单位内存,花费$4000$单位时间,此时$T=4000$,虚拟地址$[0,100)$的数据已经在内存中。
+3. 序号$0$的内存访问最早开始时间为$0 \le T$,可以直接开始序号$0$的内存访问,输出*Visit*操作,花费$30$个单位的时间,此时$T=4030$。
+4. 由于此时内存已满,执行序号1的访问前需要先做*Offload*操作。虚拟地址空间$[0,100)$完成*Offload*,此时$T=8030$。
+5. 把虚拟地址$[100,200)$的数据载入内存,花费$4000$个单位时间,此时$T=12030$。
+6. 序号$1$的内存访问最早开始时间为$50 \le T$,可以直接开始序号$1$的内存访问,输出*Visit*操作,花费$10$个单位的时间,此时$T=12040$。
+7. 所有内存访问操作都已完成,输出*Fin*,花费总时间$T=12040$。
+
**输入示例二:**
300 200 3
0 100 0 50
@@ -66,7 +75,10 @@ Reload 10000 200 50
Visit 12000 2
Fin 12020
+**示例二说明:**
+注意,读写操作和计算操作是可以并行的,因此在时刻$T=4000$时,可以同时开始*Visit*和*Reload*操作。由于内存访问序号1、2的虚拟地址存在重合部分,做*Offload*操作时不需要把序号1所访问的内存全部*Offload*。
+
## **五、评选标准**
-- 基础要求(Level 0):算法的输出满足目标一和目标二。
+- 基础要求(Level 0):算法的输出满足目标一和目标二,并且符合输出格式要求;算法执行时间不超过10秒,使用内存不超过1GB。
- 进阶要求(Level 1):在满足基础要求的前提下,完成整个输入序列计算的总时间(即最后一次访存操作结束的时间)越短越好。
- 其他加分项:能提供详细的算法说明文档等。
\ No newline at end of file
diff --git a/2025/checker/Makefile b/2025/checker/Makefile
new file mode 100644
index 0000000..afb1b94
--- /dev/null
+++ b/2025/checker/Makefile
@@ -0,0 +1,10 @@
+TARGET := checker
+SRC := checker.cc
+
+all: $(TARGET)
+
+$(TARGET): $(SRC)
+	g++ -O2 -o $@ $<
+
+clean:
+	rm -f $(TARGET)
diff --git a/2025/checker/README.md b/2025/checker/README.md
new file mode 100644
index 0000000..ccf9a4c
--- /dev/null
+++ b/2025/checker/README.md
@@ -0,0 +1,8 @@
+## Build
+Just `make`
+
+## Run Example
+
+`./checker example/infile.txt example/outfile.txt example/outfile.txt`
+
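+Here `infile.txt` is the input file and `outfile.txt` is your output file. As with any testlib checker, the third argument is the answer file; this checker never reads it, so the output file is simply passed again.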
\ No newline at end of file
diff --git a/2025/checker/checker.cc b/2025/checker/checker.cc
new file mode 100644
index 0000000..1c358f1
--- /dev/null
+++ b/2025/checker/checker.cc
@@ -0,0 +1,153 @@
+#include "testlib.h"
+
+struct Input {
+    int L, M, N;
+    struct Ops {
+        int addr, size, start, tim;
+    };
+    std::vector<Ops> ops;
+} input;
+
+struct Output {
+    struct Ops {
+        std::string opName;
+        uint64_t T;
+        int A, S;
+    };
+    std::vector<Ops> ops;
+    void sort() {
+        std::stable_sort(ops.begin(), ops.end(), [] (const Ops &lhs, const Ops &rhs) {  // ops with equal T keep output order
+            return lhs.T < rhs.T;
+        });
+    }
+} output;
+
+void check_input()
+{
+    input.L = inf.readInt(1, 100000, "L");
+    input.M = inf.readInt(1, input.L, "M");
+    input.N = inf.readInt(1, 10000, "N");
+    int last_start = 0;
+    int N = input.N, L = input.L;
+    for (int i = 0; i < N; i++) {
+        int addr = inf.readInt(0, L - 1, "addr_i");
+        int size = inf.readInt(1, L, "size_i");
+        int start = inf.readInt(0, 1e9, "start");
+        int tim = inf.readInt(0, 1e9, "time");
+        quitif(addr + size > L, _fail, "[Invalid Input] addr + size > L, where addr = %d, size = %d, L = %d", addr, size, L);
+        quitif(start < last_start, _fail, "[Invalid Input] start[%d] > start[%d] (%d > %d)", i - 1, i, last_start, start);
+        last_start = start;
+        input.ops.push_back({addr, size, start, tim});
+    }
+    inf.seekEof();
+    inf.readEof();
+}
+
+void check_output()
+{
+    int last_visit = -1;
+    int totalReadLines = 0;
+    bool finish = false;
+
+    while (!finish) {
+        std::string opName = ouf.readWord();
+        uint64_t T = 0;
+        int A = 0, S = 0;
+        totalReadLines++;
+        if (totalReadLines > 5 * input.N) {
+            quitf(_wa, "[Invalid Output] Too many lines. %d > 5n", totalReadLines);
+        }
+        if (opName == "Reload") {
+            T = ouf.readLong(0, (long long)1e18, "T");
+            A = ouf.readInt(0, input.L - 1, "A");
+            S = ouf.readInt(1, input.L, "S");
+            quitif(A + S > input.L, _wa, "[Invalid Output] Reload addr + size > L, where addr = %d, size = %d, L = %d", A, S, input.L);
+        } else if (opName == "Visit") {
+            T = ouf.readLong(0, (long long)1e18, "T");
+            A = ouf.readInt(0, input.N - 1, "A");
+            quitif(last_visit >= A, _wa, "[Invalid Output] Visit sequence not ascending. %d >= %d", last_visit, A);
+            quitif(last_visit + 1 != A, _wa, "[Invalid Output] Output did not finish all Visit tasks. Jump from task %d to %d", last_visit, A);
+            last_visit = A;
+        } else if (opName == "Offload") {
+            T = ouf.readLong(0, (long long)1e18, "T");
+            A = ouf.readInt(0, input.L - 1, "A");
+            S = ouf.readInt(1, input.L, "S");
+            quitif(A + S > input.L, _wa, "[Invalid Output] Offload addr + size > L, where addr = %d, size = %d, L = %d", A, S, input.L);
+        } else if (opName == "Fin") {
+            T = ouf.readLong(0, (long long)1e18, "T");
+            finish = true;
+        } else {
+            quitf(_wa, "Unknown op %s", opName.c_str());
+        }
+        output.ops.push_back({opName, T, A, S});
+    }
+    quitif(last_visit != input.N - 1, _wa, "[Invalid Output] Output did not finish all Visit tasks. %d != N (which is %d)", last_visit + 1, input.N);
+    ouf.seekEof();
+    ouf.readEof();
+}
+
+uint64_t get_score()
+{
+    const uint64_t multiple_IO = 40;
+    std::vector<bool> in_mem(input.L, false);
+    int use_mem = 0;
+    int nr_output = output.ops.size();
+    uint64_t score = 0;
+    uint64_t io_time = 0, npu_time = 0;
+    output.sort();
+    for (int i = 0; i < nr_output; i++) {
+        const auto &curr = output.ops[i];
+        if (curr.opName == "Reload") {
+            quitif(io_time > curr.T, _wa, "[Invalid Output] IO resource is busy. Last IO task finish at %llu. Op[%d] = (%s %llu %d %d)",
+                io_time, i, curr.opName.c_str(), curr.T, curr.A, curr.S);
+            int cnt_tomem = 0;
+            for (int j = curr.A; j < curr.A + curr.S; j++) {
+                if (!in_mem[j]) {
+                    cnt_tomem++;
+                }
+                in_mem[j] = true;
+            }
+            use_mem += cnt_tomem;
+            quitif(use_mem > input.M, _wa, "[Invalid Output] Out of Memory. use_mem = %d, M = %d. Op[%d] = (%s %llu %d %d)",
+                use_mem, input.M, i, curr.opName.c_str(), curr.T, curr.A, curr.S);
+            io_time = curr.T + multiple_IO * cnt_tomem;
+        } else if (curr.opName == "Visit") {
+            quitif(npu_time > curr.T, _wa, "[Invalid Output] NPU resource is busy. Last NPU task finish at %llu. Op[%d] = (%s %llu %d)",
+                npu_time, i, curr.opName.c_str(), curr.T, curr.A);
+            quitif(curr.T < input.ops[curr.A].start, _wa, "[Invalid Output] Task %d is not ready. Start time = %d. Op[%d] = (%s %llu %d)",
+                curr.A, input.ops[curr.A].start, i, curr.opName.c_str(), curr.T, curr.A);
+            for (int j = input.ops[curr.A].addr; j < input.ops[curr.A].addr + input.ops[curr.A].size; j++) {
+                quitif(in_mem[j] == false, _wa, "[Invalid Output] Addr %d is not in memory. Op[%d] = (%s %llu %d)",
+                    j, i, curr.opName.c_str(), curr.T, curr.A);
+            }
+            npu_time = curr.T + input.ops[curr.A].tim;
+        } else if (curr.opName == "Offload") {
+            quitif(io_time > curr.T, _wa, "[Invalid Output] IO resource is busy. Last IO task finish at %llu. Op[%d] = (%s %llu %d %d)",
+                io_time, i, curr.opName.c_str(), curr.T, curr.A, curr.S);
+            int cnt_offmem = 0;
+            for (int j = curr.A; j < curr.A + curr.S; j++) {
+                if (in_mem[j]) {
+                    cnt_offmem++;
+                }
+                in_mem[j] = false;
+            }
+            use_mem -= cnt_offmem;
+            io_time = curr.T + multiple_IO * cnt_offmem;
+        }
+    }
+    // Check Fin only after replaying every op, so ops timestamped after Fin are still counted.
+    const auto fin = std::find_if(output.ops.begin(), output.ops.end(), [](const Output::Ops &op) { return op.opName == "Fin"; });
+    score = std::max(io_time, npu_time);
+    quitif(score > fin->T, _wa, "[Invalid Output] Fin %llu is earlier than the end of all tasks. Last IO task finish at %llu, Last NPU task finish at %llu", fin->T, io_time, npu_time);
+    return score;
+}
+
+int main(int argc, char *argv[])
+{
+    setName("Global_Memory_Planning_for_LLM");
+    registerTestlibCmd(argc, argv);
+    check_input();
+    check_output();
+    auto score = get_score();
+    quitf(_ok, "All tasks finish at %llu", score);
+}
\ No newline at end of file
diff --git a/2025/checker/example/infile.txt b/2025/checker/example/infile.txt
new file mode 100644
index 0000000..63b4f3c
--- /dev/null
+++ b/2025/checker/example/infile.txt
@@ -0,0 +1,3 @@
+200 100 2
+0 100 0 30
+100 100 50 10
\ No newline at end of file
diff --git a/2025/checker/example/outfile.txt b/2025/checker/example/outfile.txt
new file mode 100644
index 0000000..72fae22
--- /dev/null
+++ b/2025/checker/example/outfile.txt
@@ -0,0 +1,6 @@
+Reload 0 0 100
+Visit 4000 0
+Offload 4030 0 100
+Reload 8030 100 100
+Visit 12030 1
+Fin 12040
\ No newline at end of file
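`example/outfile.txt` above is the sample answer for Input Case 1. One naive way to produce such files is sketched below, under stated assumptions: requests are served strictly in index order with no prefetching, and when HBM is full the lowest resident addresses outside the current segment are offloaded once their last *Visit* has ended. `Request` and `plan` are hypothetical names; this is a baseline for orientation only, not the reference solution or a competitive strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One memory access request: [addr, addr + size) is needed from `start` on, for `time` units.
struct Request {
    int addr, size;
    std::int64_t start, time;
};

// Hypothetical naive baseline, NOT the reference solution: requests are served strictly
// in order with no prefetching; when HBM is full, the lowest resident addresses outside
// the current segment are offloaded after their last Visit has ended.
std::vector<std::string> plan(int L, int M, const std::vector<Request> &reqs) {
    const std::int64_t kCostPerUnit = 40;     // time per address moved
    std::vector<bool> resident(L, false);
    std::vector<std::int64_t> lastUse(L, 0);  // end of the latest Visit touching each address
    std::int64_t ioFree = 0, npuFree = 0;     // when the IO / compute channel becomes idle
    int used = 0;                             // number of resident addresses
    std::vector<std::string> ops;

    auto emit = [&](const std::string &name, std::int64_t t, int a, int s) {
        ops.push_back(name + " " + std::to_string(t) + " " + std::to_string(a) + " " + std::to_string(s));
    };
    // Offload one range [lo, end), end <= hi, holding up to `need` resident addresses.
    auto evict = [&](int lo, int hi, int &need) {
        int end = lo, freed = 0;
        std::int64_t t = ioFree;  // IO must be idle and every Visit on the range finished
        while (end < hi && freed < need) {
            freed += resident[end] ? 1 : 0;
            t = std::max(t, lastUse[end]);
            ++end;
        }
        if (freed == 0) return;
        for (int a = lo; a < end; ++a) resident[a] = false;
        emit("Offload", t, lo, end - lo);
        ioFree = t + kCostPerUnit * freed;  // only resident addresses are charged
        used -= freed;
        need -= freed;
    };

    for (std::size_t i = 0; i < reqs.size(); ++i) {
        const Request &r = reqs[i];
        assert(r.size <= M);  // otherwise no valid schedule exists
        // Count missing addresses and find the smallest range [first, last) covering them.
        int missing = 0, first = r.addr + r.size, last = r.addr;
        for (int a = r.addr; a < r.addr + r.size; ++a) {
            if (!resident[a]) {
                ++missing;
                first = std::min(first, a);
                last = a + 1;
            }
        }
        int need = used + missing - M;  // units that must leave HBM first
        if (need > 0) evict(0, r.addr, need);
        if (need > 0) evict(r.addr + r.size, L, need);
        assert(need <= 0);
        if (missing > 0) {
            for (int a = first; a < last; ++a) resident[a] = true;
            emit("Reload", ioFree, first, last - first);
            ioFree += kCostPerUnit * missing;  // resident addresses inside [first, last) are free
            used += missing;
        }
        // Visit once the compute channel is idle, the request is ready, and its data is loaded.
        std::int64_t t = std::max({npuFree, r.start, missing > 0 ? ioFree : std::int64_t{0}});
        ops.push_back("Visit " + std::to_string(t) + " " + std::to_string(i));
        npuFree = t + r.time;
        for (int a = r.addr; a < r.addr + r.size; ++a) lastUse[a] = npuFree;
    }
    ops.push_back("Fin " + std::to_string(std::max(ioFree, npuFree)));
    return ops;
}
```

For Input Case 1, `plan(200, 100, {{0, 100, 0, 30}, {100, 100, 50, 10}})` returns exactly the six lines of `example/outfile.txt`; for Input Case 2 it ends with `Reload 10000 200 50`, `Visit 12000 2`, `Fin 12020`, the same tail as Output Case 2. Each request emits at most two *Offload* lines, one *Reload* and one *Visit*, so the output stays within the 5N line limit.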
diff --git a/2025/checker/testlib.h b/2025/checker/testlib.h
new file mode 100644
index 0000000..89a7669
--- /dev/null
+++ b/2025/checker/testlib.h
@@ -0,0 +1,6299 @@
+/*
+ * It is strictly recommended to include "testlib.h" before any other include
+ * in your code. In this case testlib overrides compiler specific "random()".
+ *
+ * If you can't compile your code and compiler outputs something about
+ * ambiguous call of "random_shuffle", "rand" or "srand" it means that
+ * you shouldn't use them. Use "shuffle", and "rnd.next()" instead of them
+ * because these calls produce stable result for any C++ compiler. Read
+ * sample generator sources for clarification.
+ *
+ * Please read the documentation for class "random_t" and use "rnd" instance in
+ * generators. Probably, these sample calls will be useful for you:
+ * rnd.next(); rnd.next(100); rnd.next(1, 2);
+ * rnd.next(3.14); rnd.next("[a-z]{1,100}").
+ *
+ * Also read about wnext() to generate off-center random distribution.
+ *
+ * See https://github.com/MikeMirzayanov/testlib/ to get latest version or bug tracker.
+ */
+
+#ifndef _TESTLIB_H_
+#define _TESTLIB_H_
+
+/*
+ * Copyright (c) 2005-2024
+ */
+
+#define VERSION "0.9.44"
+
+/*
+ * Mike Mirzayanov
+ *
+ * This material is provided "as is", with absolutely no warranty expressed
+ * or implied. Any use is at your own risk.
+ *
+ * Permission to use or copy this software for any purpose is hereby granted
+ * without fee, provided the above notices are retained on all copies.
+ * Permission to modify the code and to distribute modified code is granted,
+ * provided the above notices are retained, and a notice that the code was
+ * modified is included with the above copyright notice.
+ *
+ */
+
+/* NOTE: This file contains testlib library for C++.
+ *
+ * Check, using testlib running format:
+ * check.exe <Input_File> <Output_File> <Answer_File> [<Result_File> [-appes]],
+ * If result file is specified it will contain results.
+ *
+ * Validator, using testlib running format:
+ * validator.exe < input.txt,
+ * It will return non-zero exit code and writes message to standard output.
+ *
+ * Generator, using testlib running format:
+ * gen.exe [parameter-1] [parameter-2] [... paramerter-n]
+ * You can write generated test(s) into standard output or into the file(s).
+ *
+ * Interactor, using testlib running format:
+ * interactor.exe <Input_File> <Output_File> [<Answer_File> [<Result_File> [-appes]]],
+ * Reads test from inf (mapped to args[1]), writes result to tout (mapped to argv[2],
+ * can be judged by checker later), reads program output from ouf (mapped to stdin),
+ * writes output to program via stdout (use cout, printf, etc).
+ */
+
+const char *latestFeatures[] = {
+ "Added ConstantBoundsLog, VariablesLog to validator testOverviewLogFile",
+ "Use setAppesModeEncoding to change xml encoding from windows-1251 to other",
+ "rnd.any/wany use distance/advance instead of -/+: now they support sets/multisets",
+ "Use syntax `int t = inf.readInt(1, 3, \"~t\");` to skip the lower bound check. Tildes can be used on either side or both: ~t, t~, ~t~",
+ "Supported EJUDGE support in registerTestlibCmd",
+ "Supported '--testMarkupFileName fn' and '--testCase tc/--testCaseFileName fn' for validators",
+ "Added opt defaults via opt(key/index, default_val); check unused opts when using has_opt or default opt (turn off this check with suppressEnsureNoUnusedOpt()).",
+ "For checker added --group and --testset command line params (like for validator), use checker.group() or checker.testset() to get values",
+ "Added quitpi(points_info, message) function to return with _points exit code 7 and given points_info",
+ "rnd.partition(size, sum[, min_part=1]) returns random (unsorted) partition which is a representation of the given `sum` as a sum of `size` positive integers (or >=min_part if specified)",
+ "rnd.distinct(size, n) and rnd.distinct(size, from, to)",
+ "opt(\"some_missing_key\") returns false now",
+ "has_opt(key)",
+ "Abort validator on validator.testset()/validator.group() if registered without using command line",
+ "Print integer range violations in a human readable way like `violates the range [1, 10^9]`",
+ "Opts supported: use them like n = opt(\"n\"), in a command line you can use an exponential notation",
+ "Reformatted",
+ "Use setTestCase(i) or unsetTestCase() to support test cases (you can use it in any type of program: generator, interactor, validator or checker)",
+ "Fixed issue #87: readStrictDouble accepts \"-0.00\"",
+ "Fixed issue #83: added InStream::quitif(condition, ...)",
+ "Fixed issue #79: fixed missed guard against repeated header include",
+ "Fixed issue #80: fixed UB in case of huge quitf message",
+ "Fixed issue #84: added readXs(size, indexBase = 1)",
+ "Fixed stringstream repeated usage issue",
+ "Fixed compilation in g++ (for std=c++03)",
+ "Batch of println functions (support collections, iterator ranges)",
+ "Introduced rnd.perm(size, first = 0) to generate a `first`-indexed permutation",
+ "Allow any whitespace in readInts-like functions for non-validators",
+ "Ignore 4+ command line arguments ifdef EJUDGE",
+ "Speed up of vtos",
+ "Show line number in validators in case of incorrect format",
+ "Truncate huge checker/validator/interactor message",
+ "Fixed issue with readTokenTo of very long tokens, now aborts with _pe/_fail depending of a stream type",
+ "Introduced InStream::ensure/ensuref checking a condition, returns wa/fail depending of a stream type",
+ "Fixed compilation in VS 2015+",
+ "Introduced space-separated read functions: readWords/readTokens, multilines read functions: readStrings/readLines",
+ "Introduced space-separated read functions: readInts/readIntegers/readLongs/readUnsignedLongs/readDoubles/readReals/readStrictDoubles/readStrictReals",
+ "Introduced split/tokenize functions to separate string by given char",
+ "Introduced InStream::readUnsignedLong and InStream::readLong with unsigned long long parameters",
+ "Supported --testOverviewLogFileName for validator: bounds hits + features",
+ "Fixed UB (sequence points) in random_t",
+ "POINTS_EXIT_CODE returned back to 7 (instead of 0)",
+ "Removed disable buffers for interactive problems, because it works unexpectedly in wine",
+ "InStream over string: constructor of InStream from base InStream to inherit policies and std::string",
+ "Added expectedButFound quit function, examples: expectedButFound(_wa, 10, 20), expectedButFound(_fail, ja, pa, \"[n=%d,m=%d]\", n, m)",
+ "Fixed incorrect interval parsing in patterns",
+ "Use registerGen(argc, argv, 1) to develop new generator, use registerGen(argc, argv, 0) to compile old generators (originally created for testlib under 0.8.7)",
+ "Introduced disableFinalizeGuard() to switch off finalization checkings",
+ "Use join() functions to format a range of items as a single string (separated by spaces or other separators)",
+ "Use -DENABLE_UNEXPECTED_EOF to enable special exit code (by default, 8) in case of unexpected eof. It is good idea to use it in interactors",
+ "Use -DUSE_RND_AS_BEFORE_087 to compile in compatibility mode with random behavior of versions before 0.8.7",
+ "Fixed bug with nan in stringToDouble",
+ "Fixed issue around overloads for size_t on x64",
+ "Added attribute 'points' to the XML output in case of result=_points",
+ "Exit codes can be customized via macros, e.g. -DPE_EXIT_CODE=14",
+ "Introduced InStream function readWordTo/readTokenTo/readStringTo/readLineTo for faster reading",
+ "Introduced global functions: format(), englishEnding(), upperCase(), lowerCase(), compress()",
+ "Manual buffer in InStreams, some IO speed improvements",
+ "Introduced quitif(bool, const char* pattern, ...) which delegates to quitf() in case of first argument is true",
+ "Introduced guard against missed quitf() in checker or readEof() in validators",
+ "Supported readStrictReal/readStrictDouble - to use in validators to check strictly float numbers",
+ "Supported registerInteraction(argc, argv)",
+ "Print checker message to the stderr instead of stdout",
+ "Supported TResult _points to output calculated score, use quitp(...) functions",
+ "Fixed to be compilable on Mac",
+ "PC_BASE_EXIT_CODE=50 in case of defined TESTSYS",
+ "Fixed issues 19-21, added __attribute__ format printf",
+ "Some bug fixes",
+ "ouf.readInt(1, 100) and similar calls return WA",
+ "Modified random_t to avoid integer overflow",
+ "Truncated checker output [patch by Stepan Gatilov]",
+ "Renamed class random -> class random_t",
+ "Supported name parameter for read-and-validation methods, like readInt(1, 2, \"n\")",
+ "Fixed bug in readDouble()",
+ "Improved ensuref(), fixed nextLine to work in case of EOF, added startTest()",
+ "Supported \"partially correct\", example: quitf(_pc(13), \"result=%d\", result)",
+ "Added shuffle(begin, end), use it instead of random_shuffle(begin, end)",
+ "Added readLine(const string& ptrn), fixed the logic of readLine() in the validation mode",
+ "Package extended with samples of generators and validators",
+ "Written the documentation for classes and public methods in testlib.h",
+ "Implemented random routine to support generators, use registerGen() to switch it on",
+ "Implemented strict mode to validate tests, use registerValidation() to switch it on",
+ "Now ncmp.cpp and wcmp.cpp are return WA if answer is suffix or prefix of the output",
+ "Added InStream::readLong() and removed InStream::readLongint()",
+ "Now no footer added to each report by default (use directive FOOTER to switch on)",
+ "Now every checker has a name, use setName(const char* format, ...) to set it",
+ "Now it is compatible with TTS (by Kittens Computing)",
+ "Added \'ensure(condition, message = \"\")\' feature, it works like assert()",
+ "Fixed compatibility with MS C++ 7.1",
+ "Added footer with exit code information",
+ "Added compatibility with EJUDGE (compile with EJUDGE directive)",
+ "Added compatibility with Contester (compile with CONTESTER directive)"
+};
+
+#ifdef _MSC_VER
+#define _CRT_SECURE_NO_DEPRECATE
+#define _CRT_SECURE_NO_WARNINGS
+#define _CRT_NO_VA_START_VALIDATION
+#endif
+
+/* Overrides random() for Borland C++. */
+#define random __random_deprecated
+#include <stdlib.h>
+#include <cstdlib>
+#include <climits>
+#include <algorithm>
+#undef random
+
+#include <cstdio>
+#include <cctype>
+#include <string>
+#include <vector>
+#include <map>