Active Directory Groups not Syncing with Team Foundation Server 2010

Topic(s)
Audience

Everyone

For a little while now I had been investigating an odd occurrence in Team Foundation Server. Users added to Active Directory groups have not been filtering back into the Team Foundation Server groups cache. The meant that we had to add users directly to Team Foundation Server in order to give them permission. While this was not ideal, it did not really inconvenience us that much, but we are now trying to streamline our security and need it fixed.

 

Updated 27th July 2010 – SOLUTION – Craig Harry spoke to a couple of the product team guys for both TFS and Active Directory and they came up with a temporary solution.


Although we do not have a high turnover of core staff, we take on a lot of developers for Work Experience and we now have three guys in the root Project Collection Administrators when we already have an Active Directory group the are in added at this level.

clip_image002

Figure: These are all work experience, and are in the AD group, but don’t show in the AD group SSW2000Developers on the TFS server

Snapshots from the SSW2000Developers groups (as records on the TFS server)

clip_image004

clip_image006

clip_image008

Figure: TFS does not show these guys even though they are in AD

Now, don’t get me started as to why all developers get Project Collection Admin! It better than in TFS 2008 when they all have Server admin, but we have not yet tackled the security model update that we need. We have over 170 Team Projects and I have not yet had the time to go through and add them all in. Yes I know I could build a tool for that, but again, I have not yet had the time.

The first thing to look at is the Event Log, but as you can see there are rather a lot of entries, over 13000 of them so applying a filter to just the TFS events is a good idea.

image
Figure: Check the event log for common errors

You can see the hourly “TFS Services” errors, and in fact they reoccur every 24 hours. If you check the 3071 error you will see that the core error is TF53010 that is caused by a timeout in the “Team Foundation Server Identity Synchronization job”.

The description for Event ID 3071 from source TFS Services cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

TF53010: The following error has occurred in a Team Foundation component or extension:
Date (UTC): 7/07/2010 1:38:49 PM
Machine: BASALISK
Application Domain: TfsJobAgent.exe
Assembly: Microsoft.TeamFoundation.Framework.Server, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a; v2.0.50727
Service Host: fba54aae-87d6-47bf-a192-0e58693b9ade (TEAM FOUNDATION)
Process Details:
Process Name: TFSJobAgent
Process Id: 7976
Thread Id: 9136
Account name: NT AUTHORITYNETWORK SERVICE

Detailed Message: The Team Foundation Server Identity Synchronization job has timed out. Please restart the job service.

the message resource is present but the message is not found in the string/message table
-Event Log entry from TFS Server

The next thing you want to look for is the job definition. Is it there and is it configured correctly. To do this you need to run some SQL on your TFS server. Please remember that you loose support if you make changes to the data without the aid of MSFT Support. Note that I am not doing this alone, Mr Craig Harry MSFT has my back on this one.

USING tfs_Configuration 
SELECT TOP 1000 *  
FROM [Tfs_Configuration].[dbo].[tbl_JobDefinition] 
WHERE JobId='544DD581-F72A-45A9-8DE0-8CD3A5F29DFE'

You will likely get the following single entry returned by SQL Server.

JobId: 544DD581-F72A-45A9-8DE0-8CD3A5F29DFE
JobName: Team Foundation Server Periodic Identity Synchronization
ExtensionName: Microsoft.TeamFoundation.JobService.Extensions.Core.IdentitySyncJobExtension
Data: NULL
EnabledState: 0
Flags: 0
LastExecutionTime: NULL

Figure: Job Definition returned from SQL for the AD Sync job

Looks OK to me, and as I understand it is normal for the LastExecution to be NULL.

The next thing to check is the history for the Job runs.

USING tfs_Configuration 
SELECT TOP 1000 *
FROM [Tfs_Configuration].[dbo].[tbl_JobHistory] 
WHERE JobId='544DD581-F72A-45A9-8DE0-8CD3A5F29DFE'

You will hopefully not get the same result as me:

HistoryId: 519100
JobSource: FBA54AAE-87D6-47BF-A192-0E58693B9ADE
JobId: 544DD581-F72A-45A9-8DE0-8CD3A5F29DFE
QueueTime: 2010-04-21 01:00:00.000
StartTime: 2010-04-21 01:00:00.597
EndTime: 2010-07-02 16:34:13.653
AgentId: 5D5939AB-4F76-4BDC-A518-FC09EC352152
Result: 7
ResultMessage: NULL
QueuedReasons: 1
QueueFlags: 0

Figure: Data from the Job History table shows something interesting

Because of the timeout in the error message above the TFS Job Agent has marked the job as “Inactive” with a 7 above.

So, how do we figure out what can we do to try and identify where the problem is. Well its time to enable the built in tracing functionality and look at what is going on and where the error is occurring.

The job agent is located in “C:Program FilesMicrosoft Team Foundation Server 2010Application TierTFSJobAgent” and you need to edit the “TfsJobAgent.exe.config” file to enable the trace. This file is stored in a location that needs admin access to the folders, so start notepad in Administrator mode and find the trace section and uncomment the Listners.

<trace autoflush="false" indentsize="4">
  <!--To enable tracing to file, simply uncomment listeners section and set trace switch(es) below.
      Directory specified for TextWriterTraceListener output must exist, and job agent service account must have write permissions. -->
  <!--<listeners>
    <add name="myListener" 
      type="System.Diagnostics.TextWriterTraceListener" 
      initializeData="C:tempjobagent.log" />
    <remove name="Default" />
  </listeners>-->
</trace>

You can see on lines 4 and 9 there is the standard XML/HTML comments and you just need to remove them and set the “initializeData” attribute to a location that you can write to; this is usually “c:temp”.

Once you have that there is one more thing to do. Usually traces are broken up into categories and we want to enable verbose tracing on at least the “General”, “API” and “Authentication” switches.

<switches>
  <!--  Trace Switches
        Each of the trace switches should be set to a value between 0 and 4, inclusive.
          0: No trace output
          1-4: Increasing levels of trace output; see Systems.Diagnostics.TraceLevel-->
  <add name="API" value="0" />
  <add name="Authentication" value="0" />
  <add name="Authorization" value="0" />
  <add name="Database" value="0" />
  <add name="General" value="0" />
  <add name="traceLevel" value="0" />
</switches>

Set the “value” attribute to a 4 for “Verbose” for those switches and save the file.

After examining the logs Mr Harry sent me a little bit of code to run on the server… Rather innocuous looking in its terseness, but it helped isolate the issue.

using System;
using System.Collections.Generic;
using System.DirectoryServices;
using System.Linq;
using System.Text;

namespace ReadLocal
{
    class Program
    {
        static void Main(string[] args)
        {
            using (DirectoryEntry de = new DirectoryEntry())
            {
                de.Path = string.Concat("WinNT://./Administrators");
                Console.WriteLine(de.SchemaClassName);
                de.Path = string.Concat("WinNT://BUILTIN/Administrators");
                Console.WriteLine(de.SchemaClassName);
            }
        }
    }
}

This code have been running on the TFS server for nearly an hour with no sign of returning. The code is trying to resolve accounts against the local machine so as not to have to call out to Active Directory and make things a little faster. I am amazed that Craig was able to diagnose this over IM (Yes the code came through IM as well) and all credit should be sent in his direction.

This is an edge case code problem on our domain and the TFS product team is going to have to speak to the Active Directory guys and see if it is a domain setting change or a hotfix that is needed.

Craig spoke to the relevant guys and it seams that although they will be fixing the code and releasing a hotfix, there is a work around for the problem.

If you remove the “BUILTINAdministrators” group from the “Team Foundation Administrators” group the code takes a different path and does not encounter the hang.

image
Figure: Open “Team Foundation Server Administration Console | Application Tier | Group Membership | [TEAM FOUNDATION]Team Foundation Administrators” and remove the “BUILTINAdministrators” group.

I then restarted the “Team Foundation Server Job Agent” service and after a few minutes the problem above had resolved itself correctly.

Technorati Tags: ,

Create a conversation around this article

Share on Facebook
Share on Twitter
Share on Linkdin

Read more

Martin Hinshelwood
For a long time now I have been searching for that perfect domain that epitomised the vision, the why, of what I am trying to achieve with my customers and the industry at large. Now I have found it in http://nkdagility.com
Martin Hinshelwood
At the MVP Summit I was appalled by the number of people who asked questions about new features for supporting hierarchical tasks! I shared a disgusted look with Peter Provost and we had a quick (and I mean really quick) conversation that resulted in this post. it really comes down …
Martin Hinshelwood
In my journey of delivering an immersive Product Development Mentor Program over the last eight weeks, a compelling narrative unfolded that beautifully illustrates the essence and true strength of Scrum. This story, rooted in the practical application of Scrum through Minecraft, unveils the depth of adaptability and resilience that Scrum …
Martin Hinshelwood
The Boards in Azure DevOps are a powerful tool that your teams can leverage to enable transparent visualization of the current state of value delivery.  However, the inclusion of Blocked columns can stealthily erode the very foundations of efficiency these boards are meant to uphold. By obfuscating the state of …