# Gemini Prompt: Minimal Frontend for `yt_video_ai` (Light Mode + Responsive + TTS)

You are a senior full-stack frontend engineer. Build a minimalistic, responsive frontend for a web app that:
1) Takes a YouTube URL
2) Uses the backend to fetch transcript + generate a rewritten story
3) Generates a video using a photo slideshow (use Unsplash for images)
4) Burns captions into the video
5) Adds narration audio using TTS (TTS is required in this version)

The frontend should be modern, clean, and light-mode only (no dark mode).

## User decisions (must follow)
- Images: use **Unsplash (stock photos)** for the slideshow.
- Audio: **TTS narration must be generated and included now** (a silent, captions-only v1 is not acceptable).

## Tech Stack (choose and implement)
- Use **Next.js (App Router)** + **React**
- Use **TypeScript**
- Use **Tailwind CSS** for styling
- Prefer simple, reusable components and minimal dependencies

If Tailwind setup feels heavy, you may use plain CSS, but Tailwind is preferred.

## Project Location / Output
- Create the frontend inside: `yt_video_ai/frontend`
- Include all necessary files (e.g. `package.json`, `next.config.js`, `tailwind.config.*`, etc.)
- Do not modify backend Python code unless you need to align with the API contract.

## Backend Integration (API Contract)
Assume the backend provides the following endpoints (your UI should call these):

### 1) Start a generation job
`POST {BACKEND_BASE_URL}/generate`

Request JSON:
```json
{
  "url": "https://www.youtube.com/watch?v=VIDEO_ID",
  "scenes": 8,
  "lang": "en",
  "scene_duration_sec": 4.0,
  "tts": {
    "enabled": true,
    "voice": "default",
    "lang": "en"
  }
}
```

Response JSON:
```json
{
  "job_id": "job_123",
  "status": "queued"
}
```

### 2) Poll job status and results
`GET {BACKEND_BASE_URL}/jobs/{job_id}`

Response JSON (one of):
```json
{ "job_id": "job_123", "status": "running" }
```
or (when finished):
```json
{
  "job_id": "job_123",
  "status": "completed",
  "results": {
    "video_url": "https://.../video.mp4",
    "audio_url": "https://.../narration.mp3",
    "srt_url": "https://.../subtitles.srt",
    "assets": {
      "images": ["https://.../scene_01.png", "https://.../scene_02.png"]
    }
  }
}
```

or (failed):
```json
{ "job_id": "job_123", "status": "failed", "error": "..." }
```
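
The three response shapes above could be modelled as a discriminated union, with a small parser that narrows untyped JSON before the UI touches it. This is only a sketch: the field names come from the JSON examples, while the type names, the `parseJobResponse` helper, and the choice to treat `audio_url`, `srt_url`, and `assets` as optional are assumptions.

```typescript
// Field names follow the contract's JSON examples; type and function names are hypothetical.
type JobResults = {
  video_url: string;
  audio_url?: string; // assumed optional: the MP3 button is shown only when present
  srt_url?: string; // assumed optional
  assets?: { images?: string[] }; // assumed optional: the UI needs an empty state anyway
};

type JobResponse =
  | { job_id: string; status: "queued" | "running" }
  | { job_id: string; status: "completed"; results: JobResults }
  | { job_id: string; status: "failed"; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows an untyped JSON payload to JobResponse.
// Throws on shapes the contract does not describe so the UI can show an inline error.
function parseJobResponse(payload: unknown): JobResponse {
  if (!isRecord(payload)) {
    throw new Error("Malformed job response: not an object");
  }
  const { job_id, status, results, error } = payload;
  if (typeof job_id !== "string") {
    throw new Error("Malformed job response: missing job_id");
  }
  if (status === "queued" || status === "running") {
    return { job_id, status };
  }
  if (status === "completed") {
    if (!isRecord(results) || typeof results.video_url !== "string") {
      throw new Error("Malformed job response: completed without video_url");
    }
    return { job_id, status, results: results as JobResults };
  }
  if (status === "failed") {
    return {
      job_id,
      status,
      error: typeof error === "string" ? error : "Unknown error",
    };
  }
  throw new Error(`Unexpected job status: ${String(status)}`);
}
```

Because `status` is the discriminant, a check such as `job.status === "completed"` gives type-safe access to `job.results` without further casts.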

### Frontend expectations
- Use `fetch` to call the backend
- Use the env var `NEXT_PUBLIC_BACKEND_BASE_URL` (fall back to `http://localhost:8000` when it is unset).
- Handle errors gracefully with a clear message.
- Poll with a reasonable interval (e.g. every 2–3 seconds) and stop on completion/failure.

If the backend endpoints do not exist yet, implement a small frontend adapter layer that can be easily updated later, and clearly indicate the expected endpoints in a comment.
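
One possible shape for that adapter layer, for example in `src/lib/api.ts`, is sketched below. Only the endpoint paths, the env var name, and the fallback URL come from this prompt. The function names, the injected `fetchImpl` parameter (there so the code can be tested with a stub), and the `maxAttempts` cap are assumptions.

```typescript
// Expected endpoints: POST {base}/generate and GET {base}/jobs/{job_id}.
type JobStatus = { job_id: string; status: string; error?: string; results?: unknown };

// Minimal subset of the fetch signature this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const DEFAULT_BACKEND_BASE_URL = "http://localhost:8000";

// In Next.js, call this with process.env.NEXT_PUBLIC_BACKEND_BASE_URL.
function resolveBaseUrl(envValue: string | undefined): string {
  const trimmed = (envValue ?? "").trim();
  // Drop trailing slashes so `${base}/generate` never contains "//".
  return (trimmed || DEFAULT_BACKEND_BASE_URL).replace(/\/+$/, "");
}

async function requestJson(
  fetchImpl: FetchLike,
  url: string,
  init?: Parameters<FetchLike>[1],
): Promise<JobStatus> {
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Backend responded with HTTP ${response.status}`);
  }
  return (await response.json()) as JobStatus;
}

// Starts a generation job and returns the initial job record.
function startGeneration(fetchImpl: FetchLike, baseUrl: string, body: unknown): Promise<JobStatus> {
  return requestJson(fetchImpl, `${baseUrl}/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

// Polls the job until it reaches "completed" or "failed", then resolves with it.
async function pollJob(
  fetchImpl: FetchLike,
  baseUrl: string,
  jobId: string,
  options: { intervalMs?: number; maxAttempts?: number; onUpdate?: (job: JobStatus) => void } = {},
): Promise<JobStatus> {
  // maxAttempts is an assumed safety cap (about 10 minutes at the default interval).
  const { intervalMs = 2500, maxAttempts = 240, onUpdate } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await requestJson(fetchImpl, `${baseUrl}/jobs/${encodeURIComponent(jobId)}`);
    onUpdate?.(job);
    if (job.status === "completed" || job.status === "failed") {
      return job;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the job to finish");
}
```

In the browser, `fetchImpl` can be `(url, init) => fetch(url, init)`. If the real backend differs from the contract, only this file should need to change.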

## UI / UX Requirements (minimalistic)
General:
- Light mode only: background should be white/light gray; text dark gray/black.
- Typography: clean and readable (system font + Tailwind).
- Responsive layout: mobile-first; form fields stack on small screens, align nicely on larger screens.

Pages:
1) Single page app (home) is enough.
2) Use a minimal header: app name + small status badge (e.g. “TTS on”).

Core components on the home page:
- Input card:
  - `YouTube URL` text field (required)
  - `Scenes` slider or numeric input (e.g. 6–12, default 8)
  - `Language` dropdown (default `en`)
  - `Scene duration` numeric input (default `4.0`)
  - `TTS voice` dropdown (TTS is enabled by default)
  - “Generate video” primary button
  - Disable button while running, show loading spinner

- Status / progress:
  - A simple step indicator or progress text:
    - Fetching transcript
    - Rewriting story (LLM)
    - Generating captions (SRT)
    - Fetching Unsplash images
    - Generating TTS audio
    - Rendering final video
  - Poll backend job state and update status accordingly (if backend returns more granularity, use it; otherwise map `running` to “Rendering…”).

- Results section (when completed):
  - Video player (`video_url`) with controls
  - Download buttons:
    - `Download MP4`
    - `Download SRT`
    - `Download Narration (MP3)` (if `audio_url` exists)
  - Thumbnails grid for images used (from `assets.images`)
  - Provide a “Copy captions” button: fetch the file at `srt_url` and copy its text to the clipboard, or show the caption text in a text area if the backend provides it directly.
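
For the “Copy captions” button, the fetched SRT text could be reduced to plain caption text before copying. The helper below is a minimal sketch that assumes standard SRT cues (an index line, a `start --> end` timestamp line, then text, with cues separated by blank lines). The name `srtToPlainText` is hypothetical.

```typescript
// Converts SRT text into plain caption text, one cue per output line.
// Cue numbers and timestamp lines are dropped; multi-line cues are joined with spaces.
function srtToPlainText(srt: string): string {
  const timestampLine =
    /^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}/;

  return srt
    .replace(/^\uFEFF/, "") // strip a leading byte-order mark if present
    .replace(/\r\n?/g, "\n") // normalise Windows and old Mac line endings
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => {
      const lines = block
        .split("\n")
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
      const timestampIndex = lines.findIndex((line) => timestampLine.test(line));
      // Blocks without a timestamp line are not cues; skip them.
      if (timestampIndex === -1) return "";
      return lines.slice(timestampIndex + 1).join(" ");
    })
    .filter((text) => text.length > 0)
    .join("\n");
}
```

The result can then be passed to `navigator.clipboard.writeText(...)` in the button's click handler, with an inline message shown if the write is rejected.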

Accessibility + polish:
- Form labels must be associated with inputs.
- Keyboard-friendly.
- Show errors inline (not only alerts).
- Use `aria-live="polite"` for status updates.
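
The text shown in the `aria-live` region could come from a small pure function that maps job state to one of the step labels listed earlier. In this sketch the `step` field and its key names are hypothetical (the contract above does not define granular progress), and the “Queued…”, “Done”, and “Working…” strings are placeholder copy.

```typescript
// Step labels come from the progress list in this prompt; the keys are assumed names.
const STEP_LABELS = {
  transcript: "Fetching transcript",
  rewrite: "Rewriting story (LLM)",
  captions: "Generating captions (SRT)",
  images: "Fetching Unsplash images",
  tts: "Generating TTS audio",
  render: "Rendering final video",
} as const;

type StepKey = keyof typeof STEP_LABELS;

function isStepKey(value: string): value is StepKey {
  // Own-property check so inherited names like "toString" are not treated as steps.
  return Object.prototype.hasOwnProperty.call(STEP_LABELS, value);
}

// Returns the status text to render for a job.
function describeJob(job: { status: string; step?: string; error?: string }): string {
  switch (job.status) {
    case "queued":
      return "Queued…";
    case "running":
      // Use granular progress when the backend sends a known step; otherwise fall back.
      return job.step !== undefined && isStepKey(job.step)
        ? STEP_LABELS[job.step]
        : "Rendering…";
    case "completed":
      return "Done";
    case "failed":
      return `Failed: ${job.error ?? "unknown error"}`;
    default:
      return "Working…";
  }
}
```

Keeping this mapping outside the component means that if the backend later adds a progress field, only the lookup changes.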

## Unsplash Images (frontend behavior)
- Frontend does not need to implement Unsplash fetching logic.
- It should display the resulting `assets.images` returned by backend.
- Provide a fallback empty state if images are not present.

## TTS (frontend behavior)
- TTS is required: UI must always send `"tts": { "enabled": true, ... }`.
- Expose voice selection via dropdown (even if backend only supports “default” now).
- Show an “Audio is included” label in the result section.
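
The rule that the UI always sends `"tts": { "enabled": true, ... }` could be enforced where the request body is built, together with the inline validation the form needs. The sketch below makes several assumptions: the function and type names are hypothetical, the 6 to 12 scene range is taken from the example in the input card, `tts.lang` simply mirrors the selected language, and the URL check is deliberately narrow (only `youtube.com/watch?v=...` and `youtu.be/...` links).

```typescript
type GenerateFormValues = {
  url: string;
  scenes: number;
  lang: string;
  sceneDurationSec: number;
  voice: string;
};

type GenerateRequest = {
  url: string;
  scenes: number;
  lang: string;
  scene_duration_sec: number;
  tts: { enabled: true; voice: string; lang: string };
};

type FieldErrors = Partial<Record<keyof GenerateFormValues, string>>;

type BuildResult =
  | { ok: true; request: GenerateRequest }
  | { ok: false; errors: FieldErrors };

// Returns the video ID for the two URL shapes this sketch accepts, or null.
function extractYouTubeVideoId(rawUrl: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl.trim());
  } catch {
    return null;
  }
  const host = parsed.hostname.replace(/^(www|m)\./, "");
  if (host === "youtu.be") {
    return parsed.pathname.split("/")[1] || null;
  }
  if (host === "youtube.com" && parsed.pathname === "/watch") {
    return parsed.searchParams.get("v") || null;
  }
  return null;
}

// Validates form values and builds the POST /generate body.
// On failure, returns per-field messages suitable for inline display.
function buildGenerateRequest(values: GenerateFormValues): BuildResult {
  const errors: FieldErrors = {};
  if (extractYouTubeVideoId(values.url) === null) {
    errors.url = "Enter a valid YouTube URL.";
  }
  if (!Number.isInteger(values.scenes) || values.scenes < 6 || values.scenes > 12) {
    errors.scenes = "Scenes must be a whole number from 6 to 12.";
  }
  if (!Number.isFinite(values.sceneDurationSec) || values.sceneDurationSec <= 0) {
    errors.sceneDurationSec = "Scene duration must be greater than 0.";
  }
  if (Object.keys(errors).length > 0) {
    return { ok: false, errors };
  }
  return {
    ok: true,
    request: {
      url: values.url.trim(),
      scenes: values.scenes,
      lang: values.lang,
      scene_duration_sec: values.sceneDurationSec,
      // TTS is required in this version, so `enabled` is hard-coded to true.
      tts: { enabled: true, voice: values.voice, lang: values.lang },
    },
  };
}
```

Typing `enabled` as the literal `true` makes it a compile-time error to build a request with TTS turned off.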

## Deliverables checklist
You should produce:
- A working Next.js frontend in `yt_video_ai/frontend`
- A clean UI matching requirements
- API client functions (e.g. `src/lib/api.ts`) and polling logic
- Minimal styling using Tailwind
- Clear instructions for running locally:
  - how to set `NEXT_PUBLIC_BACKEND_BASE_URL`
  - `npm install` + `npm run dev`

## Coding Style Constraints
- Keep components small and readable
- Avoid unnecessary animation (no auto-advancing image sliders or auto-rotating carousels)
- Use descriptive variable/function names

### Built-in defaults (so you can code immediately)
- `NEXT_PUBLIC_BACKEND_BASE_URL` default to `http://localhost:8000`
- TTS voices: show only a single option: `"default"` (send whatever the UI selects)
- Backend endpoints: implement against the contract above; if the backend differs, create an adapter file and map fields instead of rewriting the UI.