Fotheidil System Architecture

A high-level map of how the Fotheidil subtitling/transcription product fits together: the frontend, the media-processing API, the recognition pipeline, the database, and the end-to-end flow of a request.


Components

| Component | Repo | Runs on | Port | Role |
| --- | --- | --- | --- | --- |
| Frontend | fotheidil | services VM 10.0.0.2 (host-mode container) | 3003 | Next.js 14 (App Router) UI. Uploads media, renders progress, transcript editor. |
| API | fotheidil-api | fotheidil VM 10.0.0.3 (container) | 4040 | Express. Receives uploads, runs ffmpeg (extract audio / compress video), calls recognition, writes state to Supabase. |
| Recognition frontend | fotheidil-transcribe | recognition VM 10.0.0.8 (systemd, bare metal) | 6060 | FastAPI. Preprocessing + entry point; tunnels to Banba for the GPU pipeline. |
| Recognition GPU pipeline | | Banba phoneticsrv3.lcs.tcd.ie (134.226.98.116) | 8000 | NeMo ASR + Pyannote diarization + MarianMT capitalisation/punctuation. |
| Database & Auth | (managed) | Supabase Cloud pdntukcptgktuzpynlsv.supabase.co | | Postgres (fot_video_uploads), Auth, Realtime, Storage. |
| Central Auth (SSO) | auth (auth-system) | services VM 10.0.0.2 (static SPA via NGINX) | | auth.abair.ie. ABAIR's shared Vite/React SSO. Logs the user in against Supabase Auth and redirects back with the access/refresh tokens the frontend forwards to the API. |

Network flow


Request lifecycle (upload → transcript)

  1. Upload — browser POSTs to fotapi.abair.ie/upload (multipart: file, fileName, userId, plus the Supabase accessToken/refreshToken from auth.abair.ie). multer writes the raw bytes to uploads/ under a random hash name.
  2. Persist — the API sets the Supabase session, rejects duplicates, inserts a fot_video_uploads row, then renames the upload to uploads/{id}-{name}. It responds 200 immediately and does the rest asynchronously.
  3. Audio extraction — ffmpeg reads uploads/{id}-{name} and writes processed-wav/{id}-{name}.wav (16 kHz mono; plus a .webm in processed-webm/ for audio-only uploads). A live progress log in tmp/ is polled every ~3 s to update audio_extraction_progress.
  4. Recognition — the API sends processed-wav/{id}-{name}.wav to http://10.0.0.8:6060/generate_transcripts/ (tunnelled to Banba's NeMo/Pyannote/MarianMT pipeline, which holds its own short-lived temp copy). The diarized, punctuated transcript is stored in ASR_output; the .wav stays on disk.
  5. Video compression (video only) — ffmpeg compresses uploads/{id}-{name} into processed-webm/{id}-{name}.webm for playback; its tmp/ progress log feeds video_compression_progress and is deleted on completion.
  6. Result delivery — the browser never re-polls; a Supabase Realtime subscription pushes each progress field and the final transcript live. The compressed .webm is streamed on demand from fotapi.abair.ie/videos/{id}-{name}.webm.
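
As an illustration of step 1, the multipart request could be assembled roughly as follows. This is a sketch, not the frontend's actual code: the field names come from the step above, while the helper names (buildUploadForm, uploadMedia), the https scheme, and the error handling are assumptions made for the example.

```javascript
// Hypothetical helper: builds the multipart body described in step 1.
// Field names are taken from the text; everything else is illustrative.
function buildUploadForm({ file, fileName, userId, accessToken, refreshToken }) {
  const form = new FormData();
  form.append("file", file, fileName); // raw media bytes (Blob or File)
  form.append("fileName", fileName);
  form.append("userId", userId);
  form.append("accessToken", accessToken); // Supabase tokens from auth.abair.ie
  form.append("refreshToken", refreshToken);
  return form;
}

// Hypothetical sender. ASSUMPTION: the endpoint is served over https.
// fetch sets the multipart boundary itself, so no Content-Type header is set here.
async function uploadMedia(form, endpoint = "https://fotapi.abair.ie/upload") {
  const res = await fetch(endpoint, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return res; // the API answers 200 before processing has finished
}
```

Because the API responds before extraction and recognition have run, a caller in this sketch would treat a 200 only as "accepted" and rely on the Realtime updates from step 6 for everything after that.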

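Apart from the video-compression progress log, which step 5 deletes on completion, files in the four working directories (uploads/, tmp/, processed-wav/, processed-webm/) persist after processing until POST /upload/delete removes them and kills any still-running ffmpeg processes by PID.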
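
The progress polling in steps 3 and 5 can be pictured with a small parser. The sketch below assumes the tmp/ logs are the key=value text that ffmpeg emits when run with its -progress option; the source does not state the log format, so treat that as an assumption. The function name parseFfmpegProgress is hypothetical, and media_length is assumed to be in seconds.

```javascript
// Turn an ffmpeg `-progress` log into a 0-100 percentage.
// ASSUMPTION: the log holds repeated blocks of key=value lines such as
//   out_time_us=15000000
//   progress=continue
// with a final block ending in progress=end.
function parseFfmpegProgress(logText, mediaLengthSeconds) {
  let outTimeUs = 0;
  let finished = false;
  for (const line of logText.split(/\r?\n/)) {
    const eq = line.indexOf("=");
    if (eq === -1) continue; // skip blank or malformed lines
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === "out_time_us") {
      const n = Number(value); // may be "N/A" early on, which yields NaN
      if (Number.isFinite(n)) outTimeUs = n; // later blocks overwrite earlier ones
    } else if (key === "progress") {
      finished = value === "end";
    }
  }
  if (finished) return 100;
  if (!(mediaLengthSeconds > 0)) return 0; // duration unknown: report nothing yet
  const pct = (outTimeUs / 1e6 / mediaLengthSeconds) * 100;
  // Hold at 99 until ffmpeg itself reports the end of the run.
  return Math.max(0, Math.min(99, Math.floor(pct)));
}
```

In this sketch a poller would re-read the log every ~3 s, call the parser, and write the result to audio_extraction_progress or video_compression_progress. For a 60-second file whose latest block reports out_time_us=15000000, the function returns 25.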


Data store

The single source of truth is the fot_video_uploads table in Supabase Cloud. Key columns:

  • Identity: id, user_id, name, original_filetype
  • Progress / lifecycle: upload_state, audio_extraction_progress, video_compression_progress, recognition_progress, *_start / *_end timestamps, media_length
  • Process control: ffmpeg_extract_audio_process_id, ffmpeg_compress_video_process_id (used by cancellation)
  • Output: ASR_output (raw transcript), edited_ASR_output (editable copy, seeded from ASR_output), transcript_percentages, permission_given (consent to reuse data for ABAIR ASR/TTS).
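
The progress and output columns listed above are the fields a browser would receive over Supabase Realtime. A minimal sketch of such a subscription follows, assuming a supabase-js v2 client is passed in and that the table lives in the public schema (the source does not say which schema is used). The helper name watchUpload and the shape of the callback argument are hypothetical.

```javascript
// Hypothetical helper: listen for UPDATEs on a single fot_video_uploads row.
// `supabase` is expected to be a supabase-js v2 client.
// ASSUMPTION: the table is in the `public` schema.
function watchUpload(supabase, uploadId, onChange) {
  const channel = supabase
    .channel(`upload-${uploadId}`)
    .on(
      "postgres_changes",
      {
        event: "UPDATE",
        schema: "public",
        table: "fot_video_uploads",
        filter: `id=eq.${uploadId}`,
      },
      (payload) => {
        const row = payload.new; // the row as it looks after the update
        onChange({
          uploadState: row.upload_state,
          audioExtraction: row.audio_extraction_progress,
          videoCompression: row.video_compression_progress,
          recognition: row.recognition_progress,
          transcript: row.ASR_output ?? null, // null until recognition finishes
        });
      }
    )
    .subscribe();

  // Return a cleanup function, e.g. for a React effect's teardown.
  return () => supabase.removeChannel(channel);
}
```

Filtering on the row id keeps the channel limited to one upload, so the UI in this sketch only re-renders for the file the user is looking at.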

Row-Level Security scopes rows to the authenticated user_id, which is why every API operation re-establishes the Supabase session from the forwarded tokens.
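
One way to picture that session step is the sketch below. It assumes a supabase-js v2 client, whose auth.setSession method accepts an access_token / refresh_token pair; the wrapper name withUserSession and its error message are hypothetical, and the API's real code may be organised differently.

```javascript
// Hypothetical helper: scope a Supabase client to the uploading user so that
// Row-Level Security sees the right user_id on every following query.
// `supabase` is expected to be a supabase-js v2 client.
async function withUserSession(supabase, { accessToken, refreshToken }) {
  const { data, error } = await supabase.auth.setSession({
    access_token: accessToken,
    refresh_token: refreshToken,
  });
  if (error) {
    throw new Error(`Could not establish Supabase session: ${error.message}`);
  }
  return data.session;
}
```

A request handler in this sketch would call withUserSession with the tokens forwarded by the frontend before touching fot_video_uploads, and reject the request if it throws.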

Media file storage

Supabase holds only metadata and transcripts; the media files themselves live on disk on the fotheidil VM (10.0.0.3). A dedicated ext4 disk (/dev/sdb1, ~503 GB) is mounted at /mnt/fotheidil-data and bind-mounted into the API container at /app/src/data (the paths in src/config/paths.js). It contains four working directories:

| Directory | Contents | Maps to (paths.js) |
| --- | --- | --- |
| uploads/ | Raw uploaded files as received by multer (random hash names, then renamed to {id}-{name}). | uploadMediaPath |
| tmp/ | ffmpeg progress logs ({id}-{name}.audio_extraction.log, .video_compression.log) polled to update progress. | progressPath |
| processed-wav/ | Extracted 16 kHz mono WAVs ({id}-{name}.wav); the file sent to recognition. | processedWavPath |
| processed-webm/ | Compressed WebM ({id}-{name}.webm); served back for in-browser playback. | processedWebmPath |
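
To make the mapping concrete, the sketch below shows what a module like paths.js might export and how the per-upload file names follow from it. Only the four export names, the container mount point, and the {id}-{name} patterns come from this page; the DATA_ROOT constant and the filesForUpload helper are hypothetical.

```javascript
const path = require("node:path");

// ASSUMPTION: mirrors the four names in the table above; the real
// src/config/paths.js may be structured differently.
const DATA_ROOT = "/app/src/data"; // bind mount of /mnt/fotheidil-data
const paths = {
  uploadMediaPath: path.posix.join(DATA_ROOT, "uploads"),
  progressPath: path.posix.join(DATA_ROOT, "tmp"),
  processedWavPath: path.posix.join(DATA_ROOT, "processed-wav"),
  processedWebmPath: path.posix.join(DATA_ROOT, "processed-webm"),
};

// Hypothetical helper: every file that can exist for one fot_video_uploads row.
function filesForUpload(id, name, p = paths) {
  const base = `${id}-${name}`;
  return {
    upload: path.posix.join(p.uploadMediaPath, base),
    audioLog: path.posix.join(p.progressPath, `${base}.audio_extraction.log`),
    videoLog: path.posix.join(p.progressPath, `${base}.video_compression.log`),
    wav: path.posix.join(p.processedWavPath, `${base}.wav`),
    webm: path.posix.join(p.processedWebmPath, `${base}.webm`),
  };
}
```

Keeping the naming rule in one place like this means extraction, compression, playback, and cleanup all agree on where a given row's files live.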

Because storage is a host bind mount, these files persist across container restarts and redeploys. Cleanup is explicit: POST /upload/delete removes a row's files from all four directories and kills any ffmpeg processes still running for that row, as noted at the end of the request lifecycle.
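
A rough sketch of what that cleanup involves is given below. It is not the handler behind POST /upload/delete: the function names, the choice of SIGTERM, and the decision to ignore files that are already missing are all assumptions made for illustration.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical cleanup: delete one upload's files from the four working directories.
// `dataRoot` would be /app/src/data inside the API container.
async function removeUploadFiles(dataRoot, id, name) {
  const base = `${id}-${name}`;
  const targets = [
    path.join(dataRoot, "uploads", base),
    path.join(dataRoot, "tmp", `${base}.audio_extraction.log`),
    path.join(dataRoot, "tmp", `${base}.video_compression.log`),
    path.join(dataRoot, "processed-wav", `${base}.wav`),
    path.join(dataRoot, "processed-webm", `${base}.webm`),
  ];
  // force: true turns a missing file into a no-op, so partial runs clean up safely.
  await Promise.all(targets.map((target) => fs.rm(target, { force: true })));
  return targets;
}

// Hypothetical: stop an ffmpeg process recorded in ffmpeg_*_process_id.
// ASSUMPTION: SIGTERM is used; the source only says the PIDs are killed.
function killIfRunning(pid) {
  if (!pid) return false; // column is empty: nothing was started
  try {
    process.kill(pid, "SIGTERM");
    return true;
  } catch (err) {
    if (err.code === "ESRCH") return false; // process already exited
    throw err;
  }
}
```

Stopping ffmpeg before removing the files would avoid the process recreating an output file mid-delete; the ordering in the real endpoint is not documented here.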