I’m writing a simple web scraper (console app) for fun, and because it’s a cool way to learn the ins and outs of a language.
I have it more or less working, but my problem is that after building my initial list of URLs to visit, I await Task.WhenAll() on around 60k requests (the “scrape” method is async). Everything runs smoothly for the first 100-1000 requests, but after that I just get timeout errors. I’m not blocked by the websites, and they’re accessible via a browser; it’s just my program that seems to die at some point. I’ve searched a bit for reasons and solutions, and I’ve stumbled upon the concept of socket exhaustion, which seems like my issue.
Now, I don’t spawn HttpClients left and right; in theory I do things correctly, i.e. I use a single instance of HttpClient and reuse it as much as possible. Still, the problem persists.
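Roughly, the pattern looks like this (a simplified sketch, not my exact code; ScrapeAsync and BuildUrlList are stand-ins):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class Scraper
{
    // One shared HttpClient, reused for every request.
    private static readonly HttpClient Client = new HttpClient();

    static async Task Main()
    {
        List<string> urls = BuildUrlList(); // ~60k URLs in my case

        // Task.WhenAll kicks off all the requests with nothing limiting concurrency.
        IEnumerable<Task<string>> tasks = urls.Select(ScrapeAsync);
        string[] pages = await Task.WhenAll(tasks);

        Console.WriteLine($"Fetched {pages.Length} pages");
    }

    static async Task<string> ScrapeAsync(string url)
    {
        using HttpResponseMessage response = await Client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }

    static List<string> BuildUrlList() => new List<string>(); // placeholder
}
```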
I wanted to use HttpClientFactory, but it seems like it’s basically made for ASP.NET and everything that comes with it (dependency injection and so on), and I think that’s shooting a fly with a cannon.
My question is: how can I ensure that my connections are used correctly, so I don’t unnecessarily block sockets waiting to be released? Or, how can I use HttpClientFactory in a console app without having to sacrifice my firstborn?
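From what I’ve read, the console-app route would be something like the sketch below (untested; it needs the Microsoft.Extensions.Http NuGet package, and the names are my own), which is already more machinery than I was hoping for:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

class Program
{
    static async Task Main()
    {
        // A bare ServiceCollection instead of the full ASP.NET host.
        ServiceProvider services = new ServiceCollection()
            .AddHttpClient() // registers IHttpClientFactory
            .BuildServiceProvider();

        IHttpClientFactory factory = services.GetRequiredService<IHttpClientFactory>();

        // Clients from the factory share pooled handlers, so creating one per
        // request doesn't open a fresh connection pool each time.
        HttpClient client = factory.CreateClient();
        string html = await client.GetStringAsync("https://example.com");
        Console.WriteLine(html.Length);
    }
}
```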
Any tips super appreciated!
Stack trace, if that helps: stacktrace4reddit – Pastebin.com
Could it be that your HttpClient is being disposed of before you expect it to be (i.e. it reaches the end of a using statement)?
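For example, the classic anti-pattern looks like this (just a sketch of what I mean, not your code):

```csharp
using System.Net.Http;
using System.Threading.Tasks;

static class Example
{
    // Anti-pattern sketch: a new HttpClient created and disposed per request.
    // Each disposed client leaves its sockets lingering in TIME_WAIT,
    // which is the usual cause of socket exhaustion.
    static async Task<string> ScrapeAsync(string url)
    {
        using (var client = new HttpClient())
        {
            return await client.GetStringAsync(url);
        }
    }
}
```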
Do you have a timeout and max connections per server configured?
If the connection count reaches the maximum, the request sits in a queue until a connection becomes available (another request completes).
But the timeout clock starts from the moment you call SendAsync(), so if a request sits in the queue long enough, you’ll get timeout exceptions (see the sketch below).
If not this, share your code 🙂
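To make that concrete, something along these lines caps connections per server, sets an explicit timeout, and throttles how many requests are in flight at once (a sketch with made-up numbers; SocketsHttpHandler needs .NET Core 2.1+):

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ThrottledScraper
{
    // MaxConnectionsPerServer limits how many sockets a single host can consume;
    // PooledConnectionLifetime recycles connections so DNS changes are picked up.
    private static readonly HttpClient Client = new HttpClient(new SocketsHttpHandler
    {
        MaxConnectionsPerServer = 20,
        PooledConnectionLifetime = TimeSpan.FromMinutes(5)
    })
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    // Allow at most 100 requests in flight across all hosts.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(100);

    static async Task<string> ScrapeAsync(string url)
    {
        await Throttle.WaitAsync();
        try
        {
            // The Timeout clock starts when GetAsync is called; the semaphore keeps
            // the other ~60k requests from starting that clock all at once.
            using HttpResponseMessage response = await Client.GetAsync(url);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
        finally
        {
            Throttle.Release();
        }
    }
}
```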
It would help if you included the code you’re having issues with; otherwise you’ll only get guesses for answers.
C# devs
null reference exceptions