8,236 Files, Indexed By AI: How We Cut Asset Search From 30 Minutes To Seconds

Jun 11, 2026

My social media manager for Recommend.my was frustrated. He was spending half an hour just searching for b-roll of our branded stock photos and videos to add to his short reels.

And he had every right to be upset. All our usable images and footage were scattered across our organisation’s Google Drive folders.

Google Drive can help you search by filename, file type, and size, but it can’t tell you what the photo or video contains (e.g. “a man cleaning the inside of an aircon split unit”).

Now multiply that by 8,236 image and video files spread across 42 folders. The folders have names like “Site visits Mar”, “FB ad raw”, and “Final FINAL v2”. The images have filenames like “IMG_2041.jpg”. You can probably see the problem. The clip or image you want exists. You just can’t find it easily.

So sometimes he gives up and uses generic stock from Pixabay or Pexels of some blue-eyed, blond-haired man in overalls and a hard hat, when he could have used footage we already own. This hurts our credibility.

This is the system I built to fix that. It indexes all 8,236 files with AI, so the hunt that used to eat most of that half hour now takes seconds.

Above: Tired video editor looking for stock footage

The folder you can’t search costs you twice

In our organisation, we had different teams each producing their own footage. The interior design team had site photos of completed projects in their Google Drive. The operations team had videos of stuff they shot around the office. The marketing team had recordings of interviews with service providers. Each team treated their media library as storage. Upload it, forget it, move on.

First, the hunt itself. When the time comes to produce social media content, our Slack channels fill up with people asking things like “anybody know where to get an image of carpet cleaning that we did recently?” The time taken to hunt for assets kills the creative momentum.

Second, the duplication. When the search takes too long, people give up and recreate the thing. They re-shoot the footage. They buy stock you already own a better version of. Or they choose a generic clip from the internet.

For us, with 8,236 files across 42 folders and some videos running several gigabytes each, the duplication was the bigger cost. A media library nobody can search isn’t an asset. It’s a sunk cost that keeps charging you.

Above: Stock images that you definitely don’t want to use in your marketing

Folders organise by who uploaded, not by what’s inside

Folders never solve this, and the reason is structural. A folder tree reflects whoever set it up. By date, by campaign, by the staff member who dumped the files in. None of that tells you what is actually inside the video.

It’s a bit like a library that shelves books by which donor gave them, rather than by subject. Technically organised. Completely useless if what you want is “a book about plumbing”.

You could fix it by tagging every file by hand. Nobody does. Tagging 8,236 clips, scene by scene, is the kind of task that gets assigned, started, and quietly abandoned by week two. Too much work for a person, and never actually anyone’s job.

So the assets pile up. Safe, and lost.

The unlock: let the AI watch the footage

What makes this solvable now is that modern AI models can watch a video and tell you what’s in it. Not just “this is a video”, but “a technician in uniform servicing a wall-mounted aircon unit, indoors, daytime, two people in shot”.

And it can do this scene by scene. A five-minute clip stops being one blob. It becomes a list of moments, each described and tagged. That four-second shot someone gave up looking for is now its own searchable entry, with the timestamp attached.

This is the same shape as a workflow I built to write case studies from job site photos for a facilities maintenance company. Point AI at a pile of visual assets that would take a person hours to read through, and let it do the mechanical part. The pattern keeps showing up because it keeps working.

What I actually built

Stripped of the technical detail, the system has four parts.

One home for everything. Every image and video lives in one place, not scattered across personal Drives and laptops. I give the system a list of Google Drive folder URLs; it crawls them, processes each file, and sends it to the media library.
An AI that watches each file. As files come in, the AI looks at every one, writes a description, picks the service category, notes whether there are people in shot, whether staff are in uniform, and whether the setting is indoor or outdoor, and adds a set of tags. For videos, it does this for each scene.
A searchable index. All those descriptions and tags go into a database built for search, so a query comes back in milliseconds instead of a folder crawl. Each media file now carries metadata: a description, service-category tags, whether the setting is indoor or outdoor, and whether the person is wearing our uniform.
A way to actually search it. A web app with filters, plus a Slack command. Someone types a few words in Slack and gets back matching clips with thumbnails and links, without leaving the chat.
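To make the four parts concrete, here is a minimal sketch of what a scene-level entry and a filtered search could look like. It is an illustration only: the field names, the `search` function, and the sample data are assumptions, and a plain in-memory list stands in for the real search database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SceneEntry:
    """One searchable entry: a whole image, or a single scene of a video."""
    file_name: str
    description: str
    service_category: str
    has_people: bool
    staff_in_uniform: bool
    setting: str                           # "indoor" or "outdoor"
    tags: list = field(default_factory=list)
    start_seconds: Optional[float] = None  # None for still images


def search(entries, query, *, setting=None, staff_in_uniform=None):
    """Return entries whose description or tags contain every query word,
    narrowed by the optional filters."""
    words = query.lower().split()
    results = []
    for entry in entries:
        haystack = " ".join([entry.description, *entry.tags]).lower()
        if not all(word in haystack for word in words):
            continue
        if setting is not None and entry.setting != setting:
            continue
        if staff_in_uniform is not None and entry.staff_in_uniform != staff_in_uniform:
            continue
        results.append(entry)
    return results


# Made-up sample data, shaped like the metadata described above.
LIBRARY = [
    SceneEntry("IMG_2041.jpg", "Technician servicing a wall-mounted aircon unit",
               "aircon servicing", True, True, "indoor", ["aircon", "technician"]),
    SceneEntry("site_visit.mp4", "Two staff cleaning a living room carpet",
               "carpet cleaning", True, True, "indoor", ["carpet", "cleaning"], 95.0),
    SceneEntry("site_visit.mp4", "Exterior shot of a house in daytime",
               "home inspection", False, False, "outdoor", ["house", "exterior"], 140.0),
]

hits = search(LIBRARY, "carpet cleaning", setting="indoor")
for hit in hits:
    print(hit.file_name, hit.start_seconds, hit.description)
```

Because each video scene is its own entry with a start time, the query returns the moment inside `site_visit.mp4` rather than the whole file. A Slack command or web filter would simply be another caller of the same search.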

The payoff: finding the right clip drops from a half-hour hunt to a few seconds. And because results come back at the scene level, you land on the exact moment inside a long clip, not just the clip.

Above: Videos and images are now catalogued, searchable and downloadable for easy import into video editors

Above: A media file that has been processed by AI. The metadata now includes a description, service-category tags, whether the setting was indoor or outdoor, and whether the person was wearing our uniform

What I deliberately left out

I built the system and deployed it in a week. So it’s a v0.1 with limitations such as:

Some classification is wrong. In some cases, the AI confused a man doing a home inspection with a pest control person doing spraying (they were both holding a long stick). But users can correct these tags manually in the system, and once a human edits it, the AI leaves that one alone and never overwrites it. The machine handles the volume, the human keeps the final say.
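The "human keeps the final say" rule can be sketched as a small merge step. This is a hedged illustration, not the actual implementation: it assumes the lock is kept per field in a hypothetical `human_edited` list, whereas the real system might lock the whole record instead.

```python
def apply_human_edit(record, key, value):
    """Store a manual correction and lock that field against future AI runs."""
    updated = dict(record)
    updated[key] = value
    locked = set(record.get("human_edited", [])) | {key}
    updated["human_edited"] = sorted(locked)
    return updated


def merge_ai_tags(record, ai_output):
    """Apply fresh AI output, skipping any field a human has corrected."""
    locked = set(record.get("human_edited", []))
    merged = dict(record)
    for key, value in ai_output.items():
        if key not in locked:
            merged[key] = value
    return merged


# The AI mislabels a home inspection as pest control; a person fixes it.
record = {"service_category": "pest control", "setting": "outdoor"}
record = apply_human_edit(record, "service_category", "home inspection")

# A later AI pass repeats the mistake, but only the unlocked field changes.
record = merge_ai_tags(record, {"service_category": "pest control", "setting": "indoor"})
print(record)
```

After the second pass, `service_category` is still "home inspection" while `setting` has been refreshed to "indoor", so re-running the AI over the library never undoes a manual fix.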

Very large videos were not processed. The model has an upload size limit, and a handful of our longest, heaviest clips sit above it. Those have to be cut down or pre-processed before the AI can watch them. That part isn’t built yet. I don’t think I will build it, since most of that footage is recordings of long-form interviews and Zoom meetings, which aren’t useful for our marketing.
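Skipping oversized files can be as simple as a size check before upload. The sketch below assumes a hypothetical 2 GiB limit and made-up file names; the real limit depends on the model being used and is not stated here.

```python
# Hypothetical limit for illustration; check the model's documented limit.
MAX_UPLOAD_BYTES = 2 * 1024**3


def split_by_upload_limit(files, limit=MAX_UPLOAD_BYTES):
    """Split (name, size_in_bytes) pairs into processable and skipped names."""
    processable, skipped = [], []
    for name, size in files:
        if size <= limit:
            processable.append(name)
        else:
            skipped.append(name)
    return processable, skipped


# Made-up sample files.
files = [
    ("IMG_2041.jpg", 4_200_000),
    ("carpet_clean.mp4", 350_000_000),
    ("provider_interview_full.mp4", 6 * 1024**3),
]

processable, skipped = split_by_upload_limit(files)
print("process:", processable)
print("skip for now:", skipped)
```

Keeping the skipped list around, rather than silently dropping those files, leaves a record of what would need trimming if that footage ever becomes worth indexing.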

It’s internal only, on purpose. I deliberately kept customer and service-provider photos out of it. Using people’s images is a consent question, not a search question, and mixing the two would be a mistake.

It only pays off past a certain size. If your whole media library is 200 files in five folders, you don’t need any of this. You need ten minutes of renaming. This earns its keep when manual tagging has genuinely become impossible, which is also the point where the problem starts costing real money.

That last point is the same lesson from a lead-gen system I built that rejects most of my cold-email prospects before I ever see them: build the machine for the part that’s genuinely too big to do by hand, and not a step before.

Immediate benefits

The version I built handles a library of 8,236 files and growing. And now my social media editor can find what he needs in seconds. The quality of our reels has improved, and so has the output.

And all it took was an AI that describes your media, an index you can search, and a search box.

So how does your team find its media today? Drive folders? Across many laptops? Stashed away on a portable HDD? Let me know if you need me to build this system for you.

Alex Tan

Digital Marketing Technologist and Industry Practitioner with over 20 years of experience building high-growth tech products and marketing content teams. Co-founded Recommend.my and Sejasa.com. Built SGPayNowQR.com and Skinshare.sg.

digital asset management social media video editing