One part of Active Directory (AD) that can cause all sorts of problems is replication. Replication is crucial when dealing with one or more domains or domain controllers (DCs), no matter whether they’re in the same site or different sites. Replication failures can lead to authentication errors and to trouble accessing resources on the network.
AD object updates are replicated between DCs to ensure all partitions are synchronized. In large companies, having multiple domains and multiple sites is common. Replication must occur within the local site as well as the additional sites to keep domain and forest data the same between all DCs.
I’ll show you how to identify AD replication problems. I’ll also show you how to troubleshoot and resolve four of the most common AD replication errors:
- Error -2146893022 (The target principal name is incorrect)
- Error 1908 (Could not find the domain controller)
- Error 8606 (Insufficient attributes were given to create an object)
- Error 8453 (Replication access was denied)
For this discussion, I’ll use the Contoso forest shown in Figure 1. Table 1 contains the roles, IP addresses, and DNS client settings for the machines in that forest.
Identifying AD Replication Problems
To identify the AD replication problems, you can run the AD Replication Status Tool from your administration workstation in the forest’s root domain. For this example, you’d open this tool from the Win8Client machine, then click the Refresh Replication Status button to ensure you’re communicating properly with all the DCs. On the Discovery Missing Domain Controllers tab of the tool’s Configuration/Scope Settings page, you can see two DCs are missing, as Figure 2 shows.
On the Replication Status Collection Details tab, you can see the replication status of the DCs that aren’t missing, as shown in Figure 3.
By going to the Replication Status Viewer page, you can see any replication errors that are occurring. As you can see in Figure 4, there are quite a few replication errors occurring in the Contoso forest. Note that out of the five DCs, two of them can’t see the other DCs, which means replication isn’t going to occur on the DCs that can’t be seen. Therefore, users connecting to the child DCs aren’t going to have the most up-to-date information, which can lead to problems.
Because there are replication errors, it’s helpful to use RepAdmin.exe to get a forest-wide replication health report. To create the file, you can run the following command from Cmd.exe:
Because there are problems with two of the DCs, you’ll see two occurrences of LDAP error 81 (Server Down) Win32 Err 58 on the screen when the command executes. We’ll deal with those errors later on. For now, open up the ShowRepl.csv in Excel and follow these steps:
- From the Home menu, click Format as table and choose one of the styles.
- While holding down the Ctrl key, click both column A (Showrepl_COLUMNS) and column G (Transport Type). Right-click somewhere in those columns and select Hide.
- Reduce the width of the remaining columns (if needed) so that column K (Last Failure Status) is visible.
- For column I (Last Failure Time), click the down arrow and deselect 0.
- Look at the date in column J (Last Success Time). This is the last time that replication was successful.
- Look at the errors in column K (Last Failure Status). These errors will be the same as what you saw in the AD Replication Status Tool.
- Go to a PowerShell prompt and run the command:
- In the grid window that appears, select Add Criteria, select Last Failure Status, and press Add.
- Select the blue underlined word contains in the filter and select does not equal.
- As shown in Figure 5, type a 0 in the box so that it filters out everything with a 0 (success) and shows only the errors.
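The same filter can be scripted when you want to check the report repeatedly. What follows is a minimal sketch, not part of the original procedure. It assumes the report was produced with Repadmin's /showrepl option and CSV output, and that the header row uses the column names mentioned in the steps above (Last Failure Status and so on). The other column names and the two sample rows are made up for illustration.

```python
import csv
import io

def failing_links(csv_text):
    """Return the rows whose 'Last Failure Status' is anything other than 0."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        status = (row.get("Last Failure Status") or "").strip()
        if status not in ("", "0"):
            failures.append(row)
    return failures

# Hypothetical sample data; the real file comes from your own forest.
SAMPLE = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Boulder,DC1,"DC=root,DC=contoso,DC=com",Boulder,DC2,RPC,0,0,2014-01-12 11:00:00,0
showrepl_INFO,Boulder,DC1,"DC=child,DC=root,DC=contoso,DC=com",Boulder,CHILDDC1,RPC,5,2014-01-12 11:24:30,2014-01-10 09:00:00,1908
"""

for row in failing_links(SAMPLE):
    print(row["Destination DSA"], "<-", row["Source DSA"], "error", row["Last Failure Status"])
```

To use it against a real report, read the file's text (for example with open("ShowRepl.csv", newline="")) and pass it to failing_links.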
Now that you know how to check the replication status and discover any errors, let’s look at how to troubleshoot and resolve the four most common errors.
Troubleshooting and Resolving AD Replication Error -2146893022
Let’s start with resolving error -2146893022, where DC2 is failing to replicate to DC1. From DC1, run the following Repadmin command to check the replication status of DC2:
Figure 6 shows the results, which indicate that replication is failing because DC2’s target principal name is incorrect. However, error descriptions like this can be misleading, so you need to dig deeper.
First, you should determine whether there’s basic LDAP connectivity between the machines. To check this, run the following command from DC2:
As Figure 6 shows, you’re getting an LDAP error. Next, try to initiate AD replication from DC2 to DC1:
Once again, you see the same principal name error, as shown in Figure 6.
If you open the Event Viewer on DC2, you’ll see Event 4, as shown in Figure 7.
The highlighted text in the event indicates the reason for the error. What this means is that DC1’s computer account password is different from the password stored in AD for DC1 on the Key Distribution Center (KDC), which, in this case, is running on DC2. So, the next task is to determine whether DC1’s computer account password matches what is stored on DC2. From a command prompt on DC1, run the following two commands:
Afterward, open the dc1objmeta1.txt and dc1objmeta2.txt files that were created and look at the version differences for dBCSPwd, UnicodePWD, NtPwdHistory, PwdLastSet, and lmPwdHistory. In this case, the dc1objmeta1.txt file lists the version as 19, whereas the version in the dc1objmeta2.txt file is 11. So, comparing these two files reveals that DC2 has old password information for DC1. The Kerberos operation failed because DC1 was unable to decrypt the service ticket presented by DC2.
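If you'd rather not compare the two files by eye, the version check can be approximated in a few lines. This is only a sketch: it assumes each metadata row in the Repadmin /showobjmeta output ends with a timestamp, the version number, and the attribute name, and the sample rows below (USNs, dates, and site name) are invented to mirror the 19-versus-11 difference described above.

```python
import re

# Assumed row shape: "... YYYY-MM-DD HH:MM:SS <Ver> <Attribute>" at the end of the line.
ROW = re.compile(r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+(\d+)\s+(\S+)\s*$")

# The password-related attributes named in the article, lowercased for matching.
PASSWORD_ATTRS = {"dbcspwd", "unicodepwd", "ntpwdhistory", "pwdlastset", "lmpwdhistory"}

def attribute_versions(text):
    """Map attribute name (lowercase) to its version number."""
    versions = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            versions[match.group(2).lower()] = int(match.group(1))
    return versions

def stale_password_attrs(reference_text, other_text):
    """Return {attribute: (reference_version, other_version)} where the two differ."""
    ref = attribute_versions(reference_text)
    other = attribute_versions(other_text)
    return {
        attr: (ref[attr], other.get(attr))
        for attr in sorted(PASSWORD_ATTRS)
        if attr in ref and other.get(attr) != ref[attr]
    }

# Hypothetical excerpts standing in for dc1objmeta1.txt and dc1objmeta2.txt.
DC1_META = r"""
  24601  Boulder\DC1  24601  2014-01-12 10:15:02   19 unicodePwd
  24601  Boulder\DC1  24601  2014-01-12 10:15:02   19 pwdLastSet
"""
DC2_META = r"""
  18733  Boulder\DC1  13520  2013-12-01 08:30:44   11 unicodePwd
  18733  Boulder\DC1  13520  2013-12-01 08:30:44   11 pwdLastSet
"""

print(stale_password_attrs(DC1_META, DC2_META))
```

Any attribute reported here has a different version on the two DCs, which is the same signal you'd get from reading the files side by side.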
The KDC running on DC2 can’t be used for Kerberos with DC1 because DC2 has the old password information. To resolve this problem, you must force DC2 to use the KDC on DC1 so the replication will complete. To do so, you first need to stop the KDC service on DC2:
Then, you need to initiate replication of the Root partition:
Next, you should run the two Repadmin /showobjmeta commands again to verify the versions are the same. If all is well, you can restart the KDC service:
Troubleshooting and Resolving AD Replication Error 1908
Now that the -2146893022 error is fixed, let’s move on to AD replication error 1908, where DC1, DC2, and TRDC1 failed to replicate from ChildDC1. To troubleshoot this problem, you can use Nltest.exe to create a Netlogon.log file to determine the cause of error 1908. First, enable verbose logging on DC1 by running the command:
Now that logging is enabled, you need to initiate replication on the DCs so that any errors are logged. It’s helpful to run three commands to reproduce the errors. First, run the following command on DC1:
As you can see in Figure 8, the results indicate that replication is failing because the domain’s DC couldn’t be found.
Second, from DC1, try to locate the KDC in the child.root.contoso.com domain using the command:
The results in Figure 8 indicate that there’s no such domain.
Third, because you can’t find the KDC, try to reach any DC in the child domain using the command:
Once again, the results indicate that there’s no such domain, as shown in Figure 8.
Now that you reproduced the errors, you need to review the Netlogon.log file that has been created in the C:\Windows\debug folder. Open the file in Notepad and look for the entry that begins with “DSGetDcName function called”. Note that there will be multiple entries with this call. You need to find the entry that has the same parameters you specified in the Nltest command (Dom:child and Flags:KDC). The entry you’re looking for will look like:
You should review the initial entry as well as subsequent entries in that thread. Table 2 shows a sample 3372 thread.
In Table 2, you can see the DNS lookup failure for the KDC SRV record in the child domain. Error 1355 indicates that the specified domain either doesn’t exist or couldn’t be contacted.
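Because Netlogon.log can be long, it can help to pull out just the thread you care about. The sketch below is one way to do that, under the assumption that each log line has the shape "MM/DD HH:MM:SS [CATEGORY] [thread-id] message". The sample lines are simplified and made up; only the thread number 3372, the Dom:child and KDC parameters, and error 1355 come from the discussion above.

```python
import re
from collections import defaultdict

# Assumed line shape: "MM/DD HH:MM:SS [CATEGORY] [thread-id] message".
LINE = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[\w+\] \[(\d+)\] (.*)$")

def entries_by_thread(log_text):
    """Group log messages by the thread ID in the second bracketed field."""
    threads = defaultdict(list)
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            threads[match.group(1)].append(match.group(2))
    return threads

def matching_threads(log_text, domain, flag):
    """Return {thread_id: messages} for threads that called DsGetDcName
    with the given domain and flag."""
    call = re.compile(
        r"DsGetDcName function called.*\bDom:" + re.escape(domain)
        + r"(?=\s|$).*\bFlags:\s*(.*)$",
        re.IGNORECASE,
    )
    result = {}
    for thread_id, messages in entries_by_thread(log_text).items():
        for message in messages:
            match = call.search(message)
            if match and flag.upper() in match.group(1).upper().split():
                result[thread_id] = messages
                break
    return result

# Hypothetical, simplified log lines for illustration only.
SAMPLE = """\
01/12 11:30:01 [MISC] [3372] DsGetDcName function called: Dom:child Acct:(null) Flags: KDC
01/12 11:30:01 [MISC] [3372] DsGetDcName function returns 1355: Dom:child
01/12 11:30:02 [MISC] [4100] DsGetDcName function called: Dom:root Acct:(null) Flags: DS
"""

for thread_id, messages in matching_threads(SAMPLE, "child", "KDC").items():
    print(thread_id, messages)
```

If your log lines are laid out differently, only the LINE pattern should need adjusting.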
Because you’re trying to contact Child.root.contoso.com, the next step is to try pinging it from DC1. You’ll likely get an error stating that it can’t find the host.
The information from the Netlogon.log file and the ping test points to a possible problem in DNS delegation. Because you suspect this is the problem, you can test the DNS delegation by running the following command on DC1:
Figure 9 shows a sample Dnstest.txt file. As you can see, there’s a DNS problem. The IP address 192.168.10.1 is supposed to be the address for DC1.
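The kind of mismatch shown here can also be caught by checking delegation records against an inventory of known name servers. The following is a minimal sketch of that idea, not a replacement for the DNS test itself. The inventory is a hand-built dictionary: the child DC's name and address are the ones used in the fix in this section, while DC1's fully qualified name is my assumption based on the root.contoso.com zone.

```python
# Known name servers and their addresses. DC1's FQDN is an assumption.
EXPECTED = {
    "dc1.root.contoso.com": "192.168.10.1",
    "childdc1.child.root.contoso.com": "192.168.10.11",
}

def check_delegation(ns_records, expected=EXPECTED):
    """Return a list of problems found in (fqdn, ip) delegation records."""
    problems = []
    owners = {ip: host for host, ip in expected.items()}
    for fqdn, ip in ns_records:
        name = fqdn.lower().rstrip(".")
        if name not in expected:
            problems.append(f"{name}: not a known name server")
            if ip in owners:
                problems.append(f"{name}: glue address {ip} belongs to {owners[ip]}")
        elif expected[name] != ip:
            problems.append(f"{name}: glue address {ip}, expected {expected[name]}")
    return problems

# The bad delegation described in this section.
for problem in check_delegation([("lamedc1.child.contoso.com", "192.168.10.1")]):
    print(problem)
```

A delegation that lists childdc1.child.root.contoso.com at 192.168.10.11 produces no problems with this inventory.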
To resolve the DNS problem, follow these steps:
- On DC1, open up the DNS Management console.
- Expand Forward Lookup Zones, expand root.contoso.com, and select child.
- Right-click the (same as parent folder) Name Server record and choose Properties.
- Select lamedc1.child.contoso.com and click the Remove button.
- Select Add so that you can add the valid child domain DNS server to the delegation settings.
- In the Server fully qualified domain name (FQDN) box, type the correct server of childdc1.child.root.contoso.com.
- In the IP Addresses of this NS record box, input the proper IP address of 192.168.10.11.
- Click the OK button twice.
- Select Yes in the dialog box that opens asking if you want to delete the glue record lamedc1.child.contoso.com [192.168.10.1]. (A glue record is a DNS A record for the name server authoritative for the delegated zone.)
- Use Nltest.exe to verify you can locate the KDC in the child domain. Use the /force option so that the Netlogon cache is not used:
- Test AD replication from ChildDC1 to DC1 and DC2. This can be done two different ways. The first approach is to run the command:
The other approach is to use the Microsoft Management Console (MMC) Active Directory Sites and Services snap-in, in which case you right-click the DC and choose Replicate Now, as shown in Figure 10. You need to do this for DC1, DC2, and TRDC1.
When doing this, you’ll receive the dialog box shown in Figure 11. Ignore it and click OK. (I’ll discuss this error shortly.)
After completing these steps, go back to the AD Replication Status Tool and refresh the forest-wide replication status. Error 1908 should no longer be present. The error you’ll see is error 8606 (Insufficient attributes were given to create an object), as noted in Figure 11. This is the next problem to resolve.
Troubleshooting and Resolving AD Replication Error 8606
A lingering object is an object that’s present on one DC but has been deleted (and garbage collected) on one or more other DCs. AD replication error 8606 and Directory Service event 1988 are good indicators of lingering objects. It’s important to note that AD replication might complete successfully and not log an error from a DC containing lingering objects because replication is based on changes. If there are no changes to any of these objects, there’s no reason to replicate them. For this reason, when cleaning up lingering objects, you should assume that all DCs have them, not just the DCs logging errors.
To troubleshoot this problem, you first need to confirm the error by running the following Repadmin command on DC1:
You should see an error message like that in Figure 12.
You’ll also see event 1988 logged in DC1’s Event Viewer, as shown in Figure 13. Note that event 1988 only reports the first lingering object that was encountered. There usually are many more of these objects present.
You need to copy down three items from the event 1988 information: the lingering object’s globally unique identifier (GUID), the source DC, and the partition’s distinguished name (DN). With this information, you can determine which DCs have this object.
First, use the object’s GUID (in this case, 5ca6ebca-d34c-4f60-b79c-e8bd5af127d8) in the following Repadmin command, which sends its results to the Objects.txt file:
If you open the Objects.txt file, you’ll see that any DC that returns replication metadata for this object contains one or more lingering objects. DCs that don’t have a copy of this object report the status 8439 (The distinguished name specified for this replication operation is invalid).
Next, you need to obtain DC1’s Directory System Agent (DSA) object GUID and identify all lingering objects in the Root partition on DC2. (The DSA provides access to the physical store of directory information that’s located on a hard disk. In AD, the DSA is part of the Local Security Authority process.) To do this, run the command:
In Showrepl.txt, DC1’s DSA object GUID will appear at the top of the file and look like this:
DSA object GUID: 70ff33ce-2f41-4bf4-b7ca-7fa71d4ca13e
With this information, you can use the following command to verify the existence of lingering objects on DC2 by comparing its copy of the Root partition with DC1’s Root partition.
You can then review the Directory Service event log on DC2 to see if there are any lingering objects. If there are, each one will be reported in its own event 1946 entry. The total count of lingering objects for the partition that was checked will be reported in an event 1942 entry.
You can remove lingering objects in a couple of ways. The preferred method is to use ReplDiag.exe. Alternatively, you can use RepAdmin.exe.
Using ReplDiag.exe. From your administration workstation in the forest root domain (in this case, Win8Client), you should run the following two commands:
The first command removes the objects. The second command verifies that the replication completed successfully (i.e., error 8606 is no longer logged). You can rerun the Repadmin /showobjmeta commands discussed previously to ensure the object was removed from all the DCs. If you have a read-only domain controller (RODC) and it contained this lingering object, you’ll notice it’s still there. The reason is that the current version of ReplDiag.exe doesn’t remove objects from RODCs. To clean up on the RODC (in this example, ChildDC2), you can run the command:
You should then review the Directory Service event log on ChildDC2 and look for event 1939. As Figure 14 shows, it notifies you that the lingering objects have been removed.
Using RepAdmin.exe. Another way to remove lingering objects is to use only RepAdmin.exe. You first need to remove the lingering objects from the reference DCs using the code shown in Listing 1.
Afterward, you must remove the lingering objects from all the remaining DCs. (Lingering objects might be referenced, or shown, on multiple DCs, so you need to make sure you remove them as well.) Listing 2 shows the commands to use for this purpose.
As you can see, ReplDiag.exe is much easier to use than RepAdmin.exe because you have far fewer commands to run. The more commands you need to run, the more chances there are for typos, missed commands, or command-line errors.
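If you do go the RepAdmin.exe route, generating the command lines rather than typing them is one way to cut down on typos. The sketch below only builds the command strings and doesn't run anything, and it isn't a reproduction of Listing 1 or Listing 2. It assumes the Repadmin /removelingeringobjects syntax of destination DC, reference DC's DSA object GUID, and partition DN, with the optional /advisory_mode switch for a report-only pass. The DC names and partition DNs are assumptions based on the Contoso forest in this article; the GUID is DC1's DSA object GUID from earlier.

```python
def removal_commands(dest_dcs, reference_guid, partitions, advisory=True):
    """Build one Repadmin /removelingeringobjects command per DC and partition.

    With advisory=True, each command only reports lingering objects.
    """
    commands = []
    for dc in dest_dcs:
        for partition in partitions:
            command = f'repadmin /removelingeringobjects {dc} {reference_guid} "{partition}"'
            if advisory:
                command += " /advisory_mode"
            commands.append(command)
    return commands

# Assumed names for illustration; substitute the DCs and partitions in your forest.
DCS = ["dc2.root.contoso.com", "childdc1.child.root.contoso.com"]
PARTITIONS = ["DC=root,DC=contoso,DC=com", "CN=Configuration,DC=root,DC=contoso,DC=com"]
DC1_DSA_GUID = "70ff33ce-2f41-4bf4-b7ca-7fa71d4ca13e"

for line in removal_commands(DCS, DC1_DSA_GUID, PARTITIONS):
    print(line)
```

Reviewing the advisory-mode output first, then regenerating the list with advisory=False, keeps the destructive pass identical to the one you already checked.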
Troubleshooting and Resolving AD Replication Error 8453
The previous AD replication errors dealt with a DC not being able to find other DCs. AD replication error 8453 occurs when a DC can see other DCs, but it can’t replicate with them.
For example, suppose that the ChildDC2 (an RODC) in the child domain isn’t advertising itself as a Global Catalog (GC) server. To get the status of ChildDC2, you can run the following command on ChildDC2:
This command sends its results to Repl.txt. If you open this text file, you’ll see the following at the top:
Boulder\ChildDC2
DSA Options: IS_GC DISABLE_OUTBOUND_REPL IS_RODC
WARNING: Not advertising as a global catalog
If you look closely at the Inbound Neighbors section, you’ll see that the DC=treeroot,DC=fabrikam,DC=com partition is missing because it isn’t being replicated. If you look at the bottom of the file, you’ll see the error:
Source: Boulder\TRDC1
******* 1 CONSECUTIVE FAILURES since 2014-01-12 11:24:30
Last error: 8453 (0x2105): Replication access was denied
Naming Context: DC=treeroot,DC=fabrikam,DC=com
This error indicates that ChildDC2 is unable to add a replication link for the Treeroot partition. As Figure 15 shows, this error is also recorded in the Directory Services event log on ChildDC2 as event 1926.
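When you have several DCs to check, pulling the failure details out of each report by script saves some scrolling. Here's a minimal sketch that extracts the fields from a failure block like the one above. It assumes the fields appear in that order (source, failure count, last error, naming context) and tolerates the block being on one line or several.

```python
import re

# Matches a failure summary block; whitespace between fields may include line breaks.
FAILURE = re.compile(
    r"Source:\s*(?P<source>\S+).*?"
    r"(?P<count>\d+)\s+CONSEC\w*\s+FAILURES\s+since\s+"
    r"(?P<since>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}).*?"
    r"Last error:\s*(?P<code>\d+)\s*\((?P<hex>0x[0-9a-fA-F]+)\):\s*(?P<text>.*?)\s*"
    r"Naming Context:\s*(?P<nc>\S+)",
    re.DOTALL | re.IGNORECASE,
)

def parse_failures(report_text):
    """Return a list of dicts, one per failure block found in the report text."""
    failures = []
    for match in FAILURE.finditer(report_text):
        info = match.groupdict()
        info["count"] = int(info["count"])
        info["code"] = int(info["code"])
        failures.append(info)
    return failures

# The failure block from Repl.txt as quoted in this article.
SAMPLE = r"""
Source: Boulder\TRDC1
******* 1 CONSECUTIVE FAILURES since 2014-01-12 11:24:30
Last error: 8453 (0x2105): Replication access was denied
Naming Context: DC=treeroot,DC=fabrikam,DC=com
"""

for failure in parse_failures(SAMPLE):
    print(failure["source"], failure["code"], failure["text"], failure["nc"])
```

The hexadecimal value in parentheses is the same error number in base 16, so it works as a quick sanity check on the parsed code.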
At this point, you need to check for any security-related problems. To do this, you can use DCDiag.exe:
Figure 16 shows an excerpt from the DCDiag.exe output.
As you can see, you’re receiving error 8453 because the Enterprise Read-Only Domain Controllers security group doesn’t have the Replicating Directory Changes permission.
To resolve this problem, you need to add the missing access control entry (ACE) to the Treeroot partition. To do so, follow these steps:
- On TRDC1, open ADSI Edit.
- Right-click DC=treeroot,DC=fabrikam,DC=com and choose Properties.
- Select the Security tab.
- Review the permissions on this partition. Notice that there are no entries for the Enterprise Read-Only Domain Controllers security group.
- Click Add.
- In the Enter the object names to select box, type ROOT\Enterprise Read-Only Domain Controllers.
- Click the Check Names button, then choose OK if the object picker resolves the name.
- In the Permissions for Enterprise Read-Only Domain Controllers dialog box, clear the Allow check boxes for the following permissions:
- Select the Allow check box for the Replicating Directory Changes permission, as shown in Figure 17. Click OK.
- Manually initiate the Knowledge Consistency Checker (KCC) to immediately recalculate the inbound replication topology on ChildDC2 by running the command:
This command forces the KCC on each targeted DC to immediately recalculate the inbound replication topology, thereby adding the Treeroot partition again.
Healthy Replication Is Crucial
Replication throughout an AD forest is crucial. Without healthy replication, changes made aren’t seen by all DCs, which can lead to all sorts of problems, including authentication issues. Replication problems might not show up immediately. So, if you aren’t monitoring replication or at least periodically checking it, a problem just might pop up at the most inopportune time. I’ve shown you how to check the replication status and discover any errors as well as how to resolve four common AD replication problems.