Training an AI on Zurich Timetables: When Models Invent Trains That Do Not Exist
I gave an AI a slice of Zurich’s Sunday train schedules and asked it to become a pocket timetable assistant.
Armed with LoRA, GGUF, and a GPU that sounds like a jet engine when training, I set out to teach a machine when trains really leave Zürich HB.
What happened? The model learned to sound exactly like a timetable… and then it invented trains that do not exist.
Aim
The goal of this experiment was to fine-tune a small language model on Zurich train schedules and run it locally.
I wanted to see if a lightweight AI could serve as a timetable assistant, answering natural questions like “When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?”
Beyond the functionality, the aim was to explore the full pipeline: fine-tuning with LoRA, compressing into GGUF, and serving the result with Ollama.
Data
The data came from the SBB Open Data GTFS feed.
To keep things manageable, I filtered to Zurich City stations and Sundays only.
This subset was small, irregular, and perfect for a first iteration.
It gave me a few thousand departure–arrival pairs to train on.
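For reference, the filtering step can be sketched in a few lines of pandas. The file and column names follow the standard GTFS spec; the folder path and the station-name filter below are illustrative, not my exact script.

```python
# Sketch: filter the SBB GTFS feed to Zurich city stations and Sunday services.
# File/column names follow the GTFS spec; the path and name filter are illustrative.
import pandas as pd

GTFS_DIR = "gtfs_sbb"  # placeholder folder holding the unzipped SBB feed

stops = pd.read_csv(f"{GTFS_DIR}/stops.txt")
stop_times = pd.read_csv(f"{GTFS_DIR}/stop_times.txt")
trips = pd.read_csv(f"{GTFS_DIR}/trips.txt")
calendar = pd.read_csv(f"{GTFS_DIR}/calendar.txt")

# Keep only stops whose name starts with "Zürich"
zurich_stops = stops[stops["stop_name"].str.startswith("Zürich")]

# Keep only services that run on Sundays
sunday_services = calendar[calendar["sunday"] == 1]["service_id"]
sunday_trips = trips[trips["service_id"].isin(sunday_services)]

# Departure/arrival events at Zurich stops on Sunday trips
subset = stop_times[
    stop_times["trip_id"].isin(sunday_trips["trip_id"])
    & stop_times["stop_id"].isin(zurich_stops["stop_id"])
].merge(zurich_stops[["stop_id", "stop_name"]], on="stop_id")

print(len(subset), "stop events in the Sunday Zurich subset")
```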
Preparing Data
The data was reshaped into instruction and answer pairs.
Each entry asked a question in natural language and provided a structured timetable answer.
Example
- Q: When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?
- A: 08:07 Zürich HB → Oerlikon 08:22
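Turning the filtered GTFS rows into pairs like this was a small scripting step. The sketch below shows the general idea; the JSONL field names and the pairing logic are simplified placeholders rather than the exact code.

```python
# Sketch: reshape departure/arrival pairs into instruction-answer JSONL for fine-tuning.
# The "instruction"/"output" field names and the pairing logic are illustrative.
import json

def make_example(origin, destination, dep_time, arr_time, day="Sunday"):
    """One natural-language question plus a structured timetable answer."""
    question = (
        f"When is the next train from {origin} to {destination} "
        f"after {dep_time[:5]} on {day}?"
    )
    answer = f"{dep_time[:5]} {origin} → {destination} {arr_time[:5]}"
    return {"instruction": question, "output": answer}

# `pairs` is assumed to hold (origin, destination, departure, arrival) tuples
# built from the filtered GTFS subset above.
pairs = [("Zürich HB", "Oerlikon", "08:07:00", "08:22:00")]

with open("timetable_sft.jsonl", "w", encoding="utf-8") as f:
    for origin, dest, dep, arr in pairs:
        f.write(json.dumps(make_example(origin, dest, dep, arr), ensure_ascii=False) + "\n")
```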
LoRA Fine-Tuning
Base model: Microsoft Phi-2 (2.7 billion parameters, MIT license).
LoRA configuration
- Rank r = 8
- Alpha = 16
- Dropout = 0.1
- Target modules: q_proj, v_proj
Training parameters
- Batch size = 4
- Gradient accumulation = 2 (effective batch size 8)
- Learning rate = 2e-4
- Max steps = 1000 (around 40 minutes on an RTX 4070 Ti Super with 16 GB VRAM)
Only about 2.6 million parameters (0.09 percent of the model) were updated.
This efficiency is the key strength of LoRA: instead of retraining billions of weights, it adapts only a thin adapter layer.
The model quickly picked up the timetable format.
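For completeness, the setup above maps onto Hugging Face transformers and peft roughly as in the sketch below. This is a minimal reconstruction: the dataset plumbing and the actual trainer loop are left out.

```python
# Sketch: the LoRA configuration described above, using transformers + peft.
# Dataset loading and the Trainer/SFTTrainer call are omitted for brevity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model

base_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

lora_config = LoraConfig(
    r=8,                                  # rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # query and value projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# With Phi-2 (hidden size 2560, 32 layers), rank-8 adapters on two projections give
# 8 * (2560 + 2560) * 2 * 32 ≈ 2.6M trainable parameters, about 0.09% of 2.7B.
model.print_trainable_parameters()

args = TrainingArguments(
    output_dir="phi2-zurich-lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch size 8
    learning_rate=2e-4,
    max_steps=1000,
    fp16=True,
    logging_steps=50,
)
# `model`, `args`, and the JSONL dataset are then handed to a trainer.
```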
GGUF Quantization
After merging the LoRA adapter into the base model, the checkpoint was converted to GGUF.
I chose Q4_K_M quantization, a 4-bit scheme that balances file size and output quality.
This reduced the model from roughly 10 GB for the merged full-precision checkpoint to around 2 GB as GGUF Q4.
Quantization made the model portable to different machines.
While it introduced slightly more noise, the structure of the output remained intact.
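The merge itself is a few lines with peft; the conversion then goes through llama.cpp tooling, whose script and binary names shift between versions, so treat the commands in the comments as approximate.

```python
# Sketch: merge the LoRA adapter back into the base weights before GGUF conversion.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Loaded in full precision (fp32) by default, which is why the merged
# checkpoint lands around 10 GB on disk.
base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
merged = PeftModel.from_pretrained(base, "phi2-zurich-lora").merge_and_unload()
merged.save_pretrained("phi2-zurich-merged")
AutoTokenizer.from_pretrained("microsoft/phi-2").save_pretrained("phi2-zurich-merged")

# The merged checkpoint is then converted and quantized with llama.cpp tooling,
# roughly (exact script/binary names depend on the llama.cpp version):
#   python convert_hf_to_gguf.py phi2-zurich-merged --outfile phi2-zurich-f16.gguf
#   ./llama-quantize phi2-zurich-f16.gguf phi2-zurich-q4_k_m.gguf Q4_K_M
```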
Run with Ollama
With Ollama, the quantized model could be served locally. I used conservative sampling parameters (a minimal query sketch follows the list):
- Temperature = 0.6
- Top_p = 0.9
- Repeat penalty = 1.2
- Num_predict = 120
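Against Ollama's local HTTP API, a query with those parameters looks roughly like this; the model name is a placeholder for whatever name the Modelfile was registered under.

```python
# Sketch: query the locally served model through Ollama's HTTP API
# with the conservative sampling parameters listed above.
# "zurich-timetable" is a placeholder for the name used in `ollama create`.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "zurich-timetable",
        "prompt": "When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?",
        "stream": False,
        "options": {
            "temperature": 0.6,
            "top_p": 0.9,
            "repeat_penalty": 1.2,
            "num_predict": 120,
        },
    },
    timeout=120,
)
print(response.json()["response"])
```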
At first, the results were strong. The model returned rows in the expected format.
- 08:07 Zürich HB → Oerlikon 08:22
- 08:09 Zürich HB → Oerlikon 08:23
- 08:15 Zürich HB → Oerlikon 08:36
But with longer generations, it drifted into hallucination, inventing impossible times such as:
- 25:99 Zürich HB → Oerlikon 26:10
- 26:22 Zürich HB → Oerlikon 27:05
The model had learned the style of timetable answers, but not the reality of the schedules.
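One thing this makes obvious: such rows are trivial to catch mechanically. A minimal sanity check on the HH:MM fields, sketched below, already rejects them.

```python
# Sketch: sanity-check a generated timetable row; flags hallucinated times like 25:99.
# (GTFS itself allows hours >= 24 for after-midnight runs, but times shown to a
# user should stay within 00-23 / 00-59.)
import re

ROW = re.compile(r"^(\d{2}):(\d{2}) .+ → .+ (\d{2}):(\d{2})$")

def is_plausible(row: str) -> bool:
    m = ROW.match(row.strip())
    if not m:
        return False
    dep_h, dep_m, arr_h, arr_m = (int(g) for g in m.groups())
    return dep_h < 24 and arr_h < 24 and dep_m < 60 and arr_m < 60

print(is_plausible("08:07 Zürich HB → Oerlikon 08:22"))  # True
print(is_plausible("25:99 Zürich HB → Oerlikon 26:10"))  # False
```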
Deployment and Performance
The most striking part of this experiment is portability.
The same fine-tuned GGUF model that ran on Linux with an RTX 4070 Ti Super also ran smoothly on a MacBook Air M4 with 16 GB of RAM, using Ollama and Apple’s Metal backend.
Performance metrics from the MacBook Air run
- Total duration: about 2.9 seconds
- Load duration: 33 ms
- Prompt evaluation rate: around 151 tokens per second
- Generation rate: around 45 tokens per second
This is interactive performance on a fanless ultraportable.
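Those figures come from the timing fields Ollama reports with each response, so they are easy to recompute; a small sketch follows (durations are returned in nanoseconds, and the prompt fields may be absent when the prompt is cached).

```python
# Sketch: derive the tokens-per-second figures from the timing fields Ollama
# returns alongside each /api/generate response (all durations in nanoseconds).
# "zurich-timetable" is again a placeholder model name.
import requests

stats = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "zurich-timetable",
        "prompt": "When is the next train from Zürich HB to Oerlikon after 08:00 on Sunday?",
        "stream": False,
    },
    timeout=120,
).json()

prompt_tps = stats["prompt_eval_count"] / stats["prompt_eval_duration"] * 1e9
gen_tps = stats["eval_count"] / stats["eval_duration"] * 1e9
print(
    f"load: {stats['load_duration'] / 1e6:.0f} ms, "
    f"total: {stats['total_duration'] / 1e9:.1f} s, "
    f"prompt eval: {prompt_tps:.0f} tok/s, generation: {gen_tps:.0f} tok/s"
)
```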
The quantized model still represents a 2.7 billion parameter base, but thanks to LoRA only 2.6 million parameters were actually trained and merged.
On the MacBook Air, all parameters are loaded in 4-bit precision, making the model efficient enough to serve locally.
Output on MacBook Air M4
- 08:00Z Zürich HB → Oerlikon 09:02 (91-13Y-j25-1)
- 08:07Z Zürich HB → Oerlikon 08:28 (91-15B-I-j25-1)
- 08:17Z Zürich HB → Oerlikon 08:36 (91-16D-T-j24-2)
- 09:00Z Zürich HB → Oerlikon 09:07 (91-3C-Y-j15-4)
This shows that the style transfer worked, although hallucinations remain.
I also plan to test deployment on a Jetson Nano 8 GB.
Running a 2.7B parameter model on the Nano will require llama.cpp built with CUDA for ARM and aggressive quantization (Q4 or even Q2).
It will be slower than Mac or PC, but still functional. This would prove the principle: train once, run anywhere.
Observation
The model mastered the format but not the facts.
It produced realistic looking timetable rows but invented times and continued generating far beyond the expected number of results.
This demonstrates a core limitation of generative AI: it predicts plausible sequences, but it does not guarantee factual correctness.
Conclusion
This project confirmed that with LoRA and GGUF, it is possible to fine-tune, compress, and run a model locally on consumer hardware.
From the RTX 4070 Ti Super to the MacBook Air M4, and soon a Jetson Nano, the same model runs seamlessly.
The experiment also highlighted the importance of matching the tool to the task. LoRA is excellent at style, but timetables require factual recall, better suited to retrieval methods.
Next Steps
From a gaming GPU to a MacBook Air, and soon a Jetson Nano, the same 2.7B-parameter brain now runs wherever I take it.
Next on the list: the Jetson Nano deployment, and grounding the answers in the real GTFS data with a retrieval layer so the model stops inventing trains.
It still hallucinates, but that is part of the charm: this project is not just about trains, it is about showing how far a hobbyist can go with modern AI tools.
Train once, run anywhere.
Experiment Summary
- Base model: Microsoft Phi-2 (2.7B parameters, MIT license)
- Fine-tuning: LoRA adapters
- Trainable parameters: about 2.6M (0.09 percent of 2.7B)
- Rank r = 8, alpha = 16, dropout = 0.1
- Target modules = q_proj, v_proj
Training setup
- Device: NVIDIA RTX 4070 Ti Super (16 GB VRAM)
- Batch size = 4
- Gradient accumulation = 2 (effective batch size 8)
- Learning rate = 2e-4
- Max steps = 1000 (about 40 minutes)
Quantization
- Format: GGUF
- Type: Q4_K_M (4-bit, balanced size and quality)
- Size: around 2 GB (down from about 10 GB full precision)
Ollama run parameters
- Temperature = 0.6
- Top_p = 0.9
- Repeat penalty = 1.2
- Num_predict = 120
Performance
MacBook Air M4 16 GB with Metal backend
- Load: 33 ms
- Prompt evaluation: about 151 tokens per second
- Generation: about 45 tokens per second
- Total runtime: about 2.9 seconds per query
Jetson Nano 8 GB (planned test)
- Requires llama.cpp built with CUDA for ARM
- Needs Q4 or Q2 quantization
- Expected slower generation but functional
Final model size
- 2.7B total parameters (base)
- LoRA trained about 2.6M parameters merged into final checkpoint
- Runs fully quantized (Q4) on consumer devices