How to deal with a stuck or infinitely queued build

Team Foundation Build can be a difficult beast, but usually not through any fault of its own. People do run into problems when they add solutions that have gone a long time without a build, but on the whole it does a fantastic job.

On occasion I have seen what I call a "Stuck Build": a build that never completes. This tends to happen only with builds set up on large codebases that have never been built before. It also usually occurs late in the build, after everything has been compiled, when Team Foundation Build is trying to upload the data it has collected to Team Foundation Server.


Developers should always make sure that any builds they queue complete in a timely fashion.

I queued a build at 3:51. It is still there at 4:21. It seems stuck.

-George Gong, SSW

If you see a queued build that never completes, then there is probably a stuck build somewhere. George's build has not even run yet; it is still waiting in the queue.

I created a new build for SSW.Website and now it stays in the queue for over 30 minutes.

But the TFS build controller is ok.

I have just installed the VS2010 RTM and Team Explorer. Did I miss something?

-Brite Cheng, SSW

Again, this build is queued; it never started. It is queued because another build is running. Even though we now have more than one build server, queues will always exist: all it takes is for one more build than we have capacity for to be sent to the controller at the same time, and someone will have to wait. But they should not have to wait for long. We should probably set up something like http://buildmonitor.codeplex.com/ so we can quickly see which builds are running, but in the meantime you can run the "Build Notifications" application to see what builds are running.

clip_image002
Figure: Where to find the build monitor on the Start menu
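To see why one stuck build can hold up everyone else, here is a toy model of the queue in Python. It is only a sketch under simplifying assumptions of my own: all builds are queued at the same moment, each agent runs one build at a time, builds start in first-in-first-out order, and a stuck build is modelled as one whose duration is infinite. `queue_waits` is a hypothetical helper, not part of Team Foundation Build.

```python
import heapq
import math


def queue_waits(durations, agents):
    """Toy model of a build controller queue (hypothetical helper).

    durations: run time of each build in minutes, in the order the builds
               were queued (all queued at time 0). Use math.inf to model a
               stuck build that never finishes.
    agents:    number of build agents available to the controller.
    Returns the number of minutes each build waits before it starts.
    """
    # Min-heap holding the time at which each agent next becomes free.
    free_at = [0.0] * agents
    waits = []
    for duration in durations:
        # First-in-first-out: the next queued build takes the earliest free agent.
        start = heapq.heappop(free_at)
        waits.append(start)
        # That agent is busy until the build finishes (never, if duration is inf).
        heapq.heappush(free_at, start + duration)
    return waits


# One agent, stuck build first: the builds behind it never start.
print(queue_waits([math.inf, 10, 10], agents=1))  # [0.0, inf, inf]
```

With a single agent, a stuck build at the front means every build behind it waits forever, which is the symptom George saw. With two agents, `queue_waits([math.inf, 10, 10, 10], agents=2)` returns `[0.0, 0.0, 10.0, 20.0]`: the queue still moves, but at half speed, because the stuck build has permanently taken one agent out of the pool.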

Sometimes a build can get stuck. Builds usually get stuck while they are uploading assets into TFS. This could be a communication problem, purely a size problem, or a little of both. One thing to check is how many data collectors are running during the build; reducing them may get you a consistent pass.

You can log on to the server and kill the running build, but make sure it is actually stuck first. No build should take longer than 30 minutes (unless it always does).
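The 30-minute rule of thumb can be written down as a small check. This is a sketch, not a TFS feature: it assumes you can read how long the current build has been running and how long earlier runs of the same build definition took (for example from the build list), and the `looks_stuck` function and its `slack` headroom multiplier are hypothetical choices of mine.

```python
from datetime import timedelta

# Rule of thumb from the text: no build should take longer than 30 minutes...
DEFAULT_LIMIT = timedelta(minutes=30)


def looks_stuck(elapsed, previous_durations, slack=1.5):
    """Return True if a running build has run well past its expected time.

    elapsed:            how long the current build has been running (timedelta).
    previous_durations: run times of earlier builds of the same definition.
    slack:              hypothetical headroom multiplier for slow-but-normal runs.
    """
    limit = DEFAULT_LIMIT
    if previous_durations:
        # ...unless it always does: widen the limit for builds that normally
        # take longer than 30 minutes, based on their longest previous run.
        limit = max(limit, max(previous_durations) * slack)
    return elapsed > limit


print(looks_stuck(timedelta(minutes=31), []))                       # True
print(looks_stuck(timedelta(minutes=45), [timedelta(minutes=50)]))  # False
```

A build that has been running for 31 minutes with no history counts as stuck, while a build whose previous runs took 50 minutes is given until 75 minutes before the check flags it.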

I would expect any developer to follow this process:

How to Debug a stuck build:

In this case there is a stuck build of SSW.Framework running:

Figure: Build list

I can tell that it is stuck by looking at the running time:

Figure: A very stuck build

Cancelling the build did not free up the build service (it was stuck too), so I needed to restart the build service.

Launch the Admin Console on the build server and restart the build service.

Figure: Restart the build service

Then we need to queue another build.

Figure: The build is now successful

Done

 

  • Pedro

    Hi, nice post. I have seen stuck builds several times and I am trying to find out why they get stuck. Is there a way to diagnose the reason? I configured the timeout on the build template for 6 hours and the build continues to run for as long as I let it. I have a large code base. Thanks.

    • Are you using legacy XAML builds or the new build system? Do you know where in the process it gets stuck? You should see from the log where it gets to.

      • Pedro

        Hi, thanks for taking the time to answer. I am using the TFS 2012 build controller and agent with a TFS 2013 server. I think I have figured out what happened, but I don't know why the build remains stuck. In the build agent's log I found an exception from the TFS server: communication between the TFS server and its database failed while publishing the unit test results. In that case, though, the build should fail.

        Exception Message: TF246017: Team Foundation Server could not connect to the database. Verify that the server that is hosting the database is operational, and that network problems are not blocking communication with the …