When developing against the Microsoft Graph you may find yourself experiencing HTTP 429 Error Codes now that resource throttling is being implemented in different areas of the Graph.
I came up against a strange and somewhat misleading one this week which is worth being aware of if you are using the Graph to access SharePoint lists and libraries using the /sites/ area of the Graph.
I had a service running which started reporting HTTP 429 error codes. I read through all the latest published documentation to try a figure out how the throttling has been implemented and what the limitations are to see what part of the code could be triggering the throttling. As you’ll find the documentation is very non-committal and mostly serves to justify why there are no specific limits and rather algorithms that dynamically determine the throttling based on a large number of dynamic criteria. All of this sounds really fancy and advanced but is not very helpful when trying to identify what could be causing the throttling issue, or what limit your code is hitting.
Here’s the Microsoft documentation links which are well worth the read:
Microsoft Graph throttling guidance
Updated guidance around SharePoint web service identification and throttling
Avoid getting throttled or blocked in SharePoint Online
Most of the above advice is summarised in this section I took from one of those official documents on handling throttling with the Graph API (Feb 2018)
Best practices to handle throttling
The following are best practices for handling throttling:
- Reduce the number of operations per request.
- Reduce the frequency of calls.
- Avoid immediate retries, because all requests accrue against your usage limits.
When you implement error handling, use the HTTP error code 429 to detect throttling. The failed response includes the Retry-After field in the response header. Backing off requests using the Retry-After delay is the fastest way to recover from throttling because Microsoft Graph continues to log resource usage while a client is being throttled.
-
Wait the number of seconds specified in the Retry-After field.
-
Retry the request.
-
If the request fails again with a 429 error code, you are still being throttled. Continue to use the recommended Retry-After delay and retry the request until it succeeds
This advise all makes sense so that if your code is making a lot of calls (think migrating SharePoint items or doing bulk updates) that the Graph may tell you to slow down. When I was investigating my scenario however, it just didn’t make sense that the code was generating enough traffic to worry the Graph (Office 365 service). The telemetry was telling me the code had made around 2,500 Graph calls spread over a period of 24 hours and this was also spread across more than 100 users from a number of different Office 365 tenants.
Diving deeper into the telemetry a pattern quickly emerged, the 429 errors were being returned in response to a Graph call to get a list item based on a column value. Something along these lines:
https://graph.microsoft.com/v1.0/sites/{site-id}/lists/{list-id}/items?filter=Fields/Title eq 'testitem'
This call didn’t fail all the time, if fact it only seemed to get the 429 error in less than 10% of the cases.
Having spend many hours over the past few years ‘working with’ SharePoint thresholds and query limitations on large lists and libraries, my mind started to move towards thinking that maybe the 429 error was a bit misleading and was actually failing due to the Graph API hitting SharePoint threshold limitations.
Off to prove my theory, I’ve got a library with just under 5000 items (which is the SharePoint list threshold)
Using the Graph API Explorer I can make a call that queries this SharePoint library for a specific item matching on the Title column value being equal to “upload.log” (a file which I know exists in the SharePoint library).
As expected the item is found and a Success code 200 is returned along with the JSON payload in the response body shown above. Time to prove the theory, what if I now add 2 more files to the same document library and repeat the process?
After uploading 2 more files, the library settings now indicate that we have exceeded the list view threshold.
Now executing the same query in the Graph API explorer gives us the 429 error code. Inspecting the response body we can see the additional error code of “activityLimitReached” and message of “The application or user has been throttled”
Why was this error misleading? Neither the error code or message specifically indicate the issue being related to SharePoint thresholds. The documentation and best practice articles (linked to at the start of this article) regarding this 429 response are written on the premise that the volume and frequency of calls is responsible for the error and hence the guidance to handle the error should be to incrementally back-off and keep trying until you get success. This guidance is totally misguided in the case of hitting the underlying SharePoint threshold limitation as the call will always fail and has nothing to do with the volume or frequency of calls you are making. It will fail if it’s the only call you make all day and no matter how many times you retry, it will always fail.
Leave a Reply