Tips for Batching Requests in GitHub Copilot's Agent Mode
Recently I had to analyze a bunch of Python libraries we're using to determine which ones will need to be upgraded before we can bump our app's Django version. Our repo is huge and we use a lot of packages, but since all of them are open source and easily accessible via GitHub, I figured this would be a good task to outsource to AI. I still consider myself a novice when it comes to effectively prompting and directing LLM agents, so I learned a lot over the course of the last week and ultimately got results that I'm pretty satisfied with. Some of these may be common knowledge, but here are a few things I picked up along the way that may help you too:
When trying to generate human-readable documents (i.e. not structured data), you will get more consistent results by pointing the LLM to a reference file than by trying to explain exactly which sections you want in a document. This is especially useful if you are trying to generate a large number of documents that all need to look the same.
Understand how billing for premium requests works. Only your prompts eat into your budget - all of the work that the agent does between your prompt and when it decides it's finished counts as a single request. Note that clicking "Continue" when VS Code asks you if you want the agent to keep working does not count as an additional request. So asking Opus 4.5 to analyze 30 documents at once only counts as one request, no matter how long it takes or how many tool calls it makes along the way.
It pays to refine your prompt with a lower-cost model before feeding it to a more expensive one, especially if you are able to more effectively batch multiple asks into one prompt. This keeps you in control without having to burn premium requests by stopping and adjusting your prompt when the expensive model strays from what you actually wanted.
When trying to batch larger jobs together into a single request, make sure your prompt explains how to divide the task into multiple independent units, and suggest that the agent use subagents to process the work. Each task will still get executed serially, so this doesn't speed up your request at all (and in fact may make it a bit slower, depending on what information each subagent looks up itself), but each subagent operates with its own context window. This keeps the global context from getting too polluted with chain-of-thought responses, lookups, tool calls, and so on, which ultimately sharpens the focus of the global agent's output (or the subagents', if each one is instructed to give its own output). Subagents don't eat into your premium request budget either.
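As a rough illustration of the "independent units" idea, a batched prompt can be assembled from a plain list of packages. This is only a sketch: the function name, the wording of the instructions, and the example file and package names are all hypothetical, and the exact phrasing an agent responds to best may differ.

```python
def build_batch_prompt(packages, reference_doc):
    """Build one prompt that asks the agent to treat each package as its own unit.

    `packages` is a list of package names; `reference_doc` is the path to an
    example report the generated documents should imitate. Both are
    illustrative inputs, not part of any Copilot API.
    """
    lines = [
        "Analyze each package listed below for compatibility with our target Django version.",
        "Treat every package as an independent unit of work.",
        "Delegate each package to its own subagent so that lookups stay out of the main context.",
        f"Each subagent should produce one report that follows the structure of {reference_doc}.",
        "",
        "Packages:",
    ]
    # Number the packages so the agent can track which units are done.
    lines += [f"{index}. {name}" for index, name in enumerate(packages, start=1)]
    return "\n".join(lines)


# Hypothetical package names and reference file, for illustration only.
prompt = build_batch_prompt(["example-package-a", "example-package-b"], "docs/reference-report.md")
print(prompt)
```

Because the whole list goes out in a single prompt, the job still counts as one premium request however many packages it names.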
Consider writing some scripts that can batch several operations together to save some time. In my case, I used tox.ini files as a heuristic for determining which versions of Python and Django a library supported. Left on its own, the agent was able to figure out how to find this information, but it kept making a request to the GitHub API to fetch the repo's tags, then individually requested each tox.ini and parsed it for Django-related information. So I wrote (cough, generated) a script which did this in one step and included a reference to it in my prompt. This way the agent could just make one request to fetch all of the things it needed, which saved a good amount of time when I asked Opus to validate the information produced by a lesser model. An added bonus of this approach is that different models won't try to reinvent the wheel to get the information you want - calling a simple script leaves less room for error than letting the agent chain together the output of several individual web requests.
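For a sense of what such a helper could look like, here is a minimal sketch (not the actual script). It assumes the public GitHub REST endpoint for listing a repo's tags and raw.githubusercontent.com for file contents. The function names, the `max_tags` limit, and the shape of the summary are all made up for illustration. It sends unauthenticated requests, so it would hit rate limits quickly on a large package list.

```python
import configparser
import json
import urllib.error
import urllib.request

TAGS_URL = "https://api.github.com/repos/{repo}/tags"
RAW_URL = "https://raw.githubusercontent.com/{repo}/{ref}/tox.ini"


def fetch(url):
    """Return the response body as text, or None if the resource does not exist."""
    request = urllib.request.Request(url, headers={"User-Agent": "tox-summary-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None
        raise


def summarize_tox(tox_text):
    """Pull the envlist entries and every Django-related line out of a tox.ini."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(tox_text)
        envlist = parser.get("tox", "envlist", fallback="")
    except configparser.Error:
        envlist = ""  # Unparseable file: fall back to the line scan below.
    return {
        "envlist": [entry.strip() for entry in envlist.split("\n") if entry.strip()],
        "django_lines": [
            line.strip() for line in tox_text.splitlines() if "django" in line.lower()
        ],
    }


def collect(repo, max_tags=5):
    """Summarize tox.ini for the first `max_tags` tags of `repo` ("owner/name")."""
    tags = json.loads(fetch(TAGS_URL.format(repo=repo)) or "[]")
    report = {}
    for tag in tags[:max_tags]:
        text = fetch(RAW_URL.format(repo=repo, ref=tag["name"]))
        report[tag["name"]] = summarize_tox(text) if text else None
    return report
```

Wrapping `collect` in a small command-line entry point that prints the report as JSON would give the agent a single command to run per package, instead of a chain of separate web requests.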