Project Gorge - Using Offline AI to Analyse Video files
- Jim Clover

- Sep 14
- 2 min read

I've always been interested in the power of LLMs (Large Language Models) and especially their ability to process the living world around us. From video to still images, watching LLMs review footage and output text describing what they "saw" has been a fascinating journey over the last 2 years.
I imposed two challenges on myself - I wanted the video analysis and transcription to run offline, for maximum privacy, on retail hardware, and to have a high enough degree of accuracy that analysts (humans!) would respect the output from the LLM.
Which led me to create Gorge. The name comes from seeing "Gorgeous" on a social media post and taking Gorge from it. Sorry, nothing more than that! :)
So what does Gorge do?
Allows the user to upload video files of the most popular types, up to 6 GB in size.
Offers either single-LLM video analysis with text summarisation or Multi Mode Model analysis, whereby two handpicked LLMs, proven to be effective at the task, analyse the video.
After video processing, produces a text summary of what was "seen" in the video for the user.
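As an illustration of the upload step, here is a minimal sketch of a pre-upload check. The extension list and the reading of the size limit as 6 GiB are assumptions, as is the function name `check_upload`; the post only says "most popular types" and gives the size limit.

```python
from pathlib import Path

# Assumed values: the post does not list the exact file types.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}
MAX_UPLOAD_BYTES = 6 * 1024**3  # one reading of the stated size limit


def check_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (accepted, reason) for a candidate video upload."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {suffix or '(none)'}"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file exceeds the upload size limit"
    return True, "ok"
```

Checking the extension and size before any model is loaded keeps obviously unusable files away from the expensive analysis stage.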



Extended Functionality
Allows analysts (say, in Law Enforcement or other investigative work) to search by Category. Current categories include: People, Aircraft, Vehicles, Weapons and more.
Users can create their own Categories, adding even more specific functionality to the analysis flow.
BETA: Chatbot to allow users to chat to the video about specific aspects they want to know more about, for example "Do you see a white car in the video?"
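To make the category search concrete, here is a minimal sketch that matches keywords against the text summaries. This is a simplification and not necessarily how Gorge does it; the `VideoSummary` type, the keyword lists and the function names are all hypothetical. Only the category names come from the list above.

```python
import re
from dataclasses import dataclass


@dataclass
class VideoSummary:
    video: str  # file name of the analysed video
    text: str   # text summary produced by the LLM


# Hypothetical keyword lists for some of the built-in categories.
DEFAULT_CATEGORIES = {
    "People": {"person", "man", "woman", "crowd"},
    "Vehicles": {"car", "van", "truck", "motorbike"},
    "Aircraft": {"plane", "helicopter", "drone"},
}


def with_custom_category(categories, name, keywords):
    """Return a copy of the categories with a user-defined one added."""
    updated = dict(categories)
    updated[name] = {k.lower() for k in keywords}
    return updated


def search_by_category(summaries, categories, category):
    """Return the videos whose summary mentions any keyword of the category."""
    keywords = categories[category]
    hits = []
    for summary in summaries:
        words = set(re.findall(r"[a-z]+", summary.text.lower()))
        if words & keywords:
            hits.append(summary.video)
    return hits
```

A user-created category is then just another entry in the mapping, for example `with_custom_category(DEFAULT_CATEGORIES, "Boats", ["boat", "yacht"])`.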
Technicals
Employs vLLM to manage local LLM interaction.
BETA: Intelligent VRAM detection and GPU status, as well as assessment for Apple Silicon. Data from this first stage tunes the model loading and optimisations for vLLM.
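The hardware detection stage could look something like the sketch below, which reads GPU memory via `nvidia-smi` and maps it to loading settings. The keys `gpu_memory_utilization` and `max_model_len` mirror vLLM engine arguments; the thresholds, the `device` label and the function names are illustrative assumptions, not Gorge's actual logic.

```python
import platform
import shutil
import subprocess


def parse_nvidia_smi(csv_text: str) -> list[dict]:
    """Parse 'name, memory.total, memory.free' rows (MiB, no header or units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        try:
            name, total, free = [part.strip() for part in line.rsplit(",", 2)]
            gpus.append({"name": name, "total_mib": int(total), "free_mib": int(free)})
        except ValueError:
            continue  # skip rows that do not report numeric memory values
    return gpus


def query_gpus() -> list[dict]:
    """Ask nvidia-smi for GPU memory; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_nvidia_smi(result.stdout) if result.returncode == 0 else []


def is_apple_silicon() -> bool:
    """True on an ARM-based Mac."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"


def suggest_settings(gpus: list[dict]) -> dict:
    """Map detected hardware to model loading settings (thresholds are illustrative)."""
    if not gpus:
        return {"device": "cpu", "max_model_len": 2048}
    best = max(gpus, key=lambda g: g["free_mib"])
    # Claim less of the card when other processes already hold some VRAM.
    utilization = 0.90 if best["free_mib"] / best["total_mib"] > 0.95 else 0.80
    max_len = 8192 if best["total_mib"] >= 16000 else 4096
    return {"device": "cuda", "gpu_memory_utilization": utilization,
            "max_model_len": max_len}
```

Calling `suggest_settings(query_gpus())` at startup would give a first set of values to pass on when loading the model, with `is_apple_silicon()` available as a separate branch for Macs.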
Next Steps
Currently, video projects are single-video entities. The next step is to add the ability for the user to upload more videos to a new or existing project and perform batch analysis.
The ability to analyse a directory of video files, including any subdirectories, and produce a master summary of all videos, with content prioritised according to user/analyst priorities.
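A rough sketch of how the planned directory analysis could work: walk the directory tree for video files, then order the results by analyst priorities. The extension set, the function names and the assumption that each video has a list of detected categories are all hypothetical, since this feature is not built yet.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv"}  # assumed set


def find_videos(root) -> list[Path]:
    """Recursively collect video files under root, including subdirectories."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


def prioritise(detected: dict[str, list[str]], priorities: list[str]) -> list[str]:
    """Order videos so those matching higher-priority categories come first.

    detected maps a video name to the categories found in it;
    priorities lists categories from most to least important.
    """
    def rank(item):
        video, categories = item
        positions = [priorities.index(c) for c in categories if c in priorities]
        # Videos with no prioritised category sort last; ties break by name.
        return (min(positions, default=len(priorities)), video)

    return [video for video, _ in sorted(detected.items(), key=rank)]
```

For example, with priorities `["Weapons", "Vehicles", "People"]`, a video containing Weapons would be listed in the master summary ahead of one containing only People.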
