Tuesday, September 16, 2014

Windows 7 Desktop Can't Join the Domain - Path Not Found? Blame AVG (and everything else!)

Spoiler Alert:  uninstalling AVG from the desktop fixes the problem.

A desktop is complaining that its trust relationship with the domain has failed.  Normal stuff, probably went through a system restore and ended up with a stale machine account password, no biggie.  Remove it from the domain, reboot, re-add it to the domain, boom done, right?  Not so fast... after entering the domain name and hitting OK I'm presented with the normal domain login prompt, where I enter my domain administrator credentials.  The computer complains with an error message:
The following error occurred attempting to join the domain "somedomain.local":
The network path was not found. 
This points to a DNS issue on our SBS 2008.  Rebooting the server was my first step and it yielded no positive results.
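Since "network path was not found" during a domain join so often comes down to name resolution, one quick sanity check from the workstation is whether the domain name resolves at all.  Here is a minimal Python sketch of that check, assuming Python happens to be installed on the PC.  The helper name resolve_addresses is made up for illustration, and it only tells you what the local resolver returns, not whether the answer is correct.

```python
import socket


def resolve_addresses(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname, or [] if it does not resolve."""
    try:
        # getaddrinfo uses the same resolver settings as most other programs on the machine.
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Resolution failed (no such name, DNS server unreachable, and so on).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_addresses("somedomain.local") on the afflicted PC and on a healthy one, then comparing the two results, would show whether both machines see the same answer.  An empty list on only one of them suggests the problem is local to that PC rather than on the server.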

The SBS 2008 in question seems slow and balky.  It's an HP ML110 with 8GB of RAM serving as an SBS for a group of 10 or so people using email and file storage on the server, on top of its normal duties authenticating users and doling out Group Policy.

Noted error 13568 with source NtFrs in the event log which basically says that the File Replication Service is in Journal Wrap Error.  It reads kind of like:

The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR.   Replica set name is    : "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)"  Replica root path is   : "c:\windows\sysvol\domain"  Replica root volume is : "\\.\C:"  

A Replica set hits JRNL_WRAP_ERROR when the record that it is trying to read from the NTFS USN journal is not found.  This can occur because of one of the following reasons.   

[1] Volume "\\.\C:" has been formatted.  

[2] The NTFS USN journal on volume "\\.\C:" has been deleted.  

[3] The NTFS USN journal on volume "\\.\C:" has been truncated. Chkdsk can truncate the journal if it finds corrupt entries at the end of the journal.  

[4] File Replication Service was not running on this computer for a long time.  

[5] File Replication Service could not keep up with the rate of Disk IO activity on "\\.\C:".  

Setting the "Enable Journal Wrap Automatic Restore" registry parameter to 1 will cause the following recovery steps to be taken to automatically recover from this error state.  

[1] At the first poll, which will occur in 5 minutes, this computer will be deleted from the replica set. If you do not want to wait 5 minutes, then run "net stop ntfrs" followed by "net start ntfrs" to restart the File Replication Service.  

[2] At the poll following the deletion this computer will be re-added to the replica set. The re-addition will trigger a full tree sync for the replica set.  

WARNING: During the recovery process data in the replica tree may be unavailable. You should reset the registry parameter described above to 0 to prevent automatic recovery from making the data unexpectedly unavailable if this error condition occurs again.  

To change this registry parameter, run regedit.  Click on Start, Run and type regedit.  Expand HKEY_LOCAL_MACHINE. Click down the key path:    "System\CurrentControlSet\Services\NtFrs\Parameters" Double click on the value name    "Enable Journal Wrap Automatic Restore" and update the value. 

 If the value name is not present you may add it with the New->DWORD Value function under the Edit Menu item. Type the value name exactly as shown above.
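For anyone who would rather not click through regedit, the same change can be written as a .reg file.  The Python sketch below only builds the text of such a file and changes nothing on its own.  The key path and value name are taken from the event text above, while the helper name journal_wrap_reg_text is hypothetical.

```python
# Key path and value name exactly as given in the NtFrs event text.
KEY_PATH = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters"
VALUE_NAME = "Enable Journal Wrap Automatic Restore"


def journal_wrap_reg_text(enabled):
    """Build .reg file text that sets the journal wrap auto-restore DWORD to 1 or 0."""
    dword = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # DWORD values in .reg files are written as eight hex digits.
        f'"{VALUE_NAME}"=dword:{dword:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)
```

Saving the output of journal_wrap_reg_text(True) to a file and importing it on the server should have the same effect as the manual steps.  Per the warning in the event text, journal_wrap_reg_text(False) gives the matching file to set the value back to 0 once the recovery has finished.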

Also noted Event ID 25:

The shadow copies of volume \\?\Volume{83195036-2013-11e0-9593-3c4a92d51777} were deleted because the shadow copy storage could not grow in time.  Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

It sounds like the hard disk could be too busy to serve up essential functions - looking at the Resource Monitor I could see that SQL was going crazy reading from the hard drive.  I decided to run the SBS 2008 BPA to see if it could tell me more.  I also updated the HP System Management Agents and the HP Array Configuration Utility so that I could rule out hard disk problems (which indeed were not an issue).

My BPA report showed some issues which were solved with some simple netsh commands that were detailed in the BPA.  But outsized SharePoint and SBSMonitoring databases were also flagged, as was the server being in Journal Wrap condition.

The outsized databases don't seem like they'd keep desktops from joining the domain, but the journal wrap might be a different story.  I followed the link to http://support.microsoft.com/kb/292438 and said to myself, "Oh Crap, they've linked to an outdated article, this is for Win2k!  Nice job Microsoft..."  Worthless - except, it's not.  Things haven't changed much in the last 14 years of Active Directory.

Sure enough, after reading http://blog.ronnypot.nl/?p=738 I checked and found that the SYSVOL share was not available.  I changed the registry value (which is also what the error message directed) and waited a few minutes.  The SYSVOL share became available again.  BUT... I still could not join the workstation to the domain.

I decided to pursue the other issues indicated by the BPA and fix the SBSMonitoring and SharePoint Services databases.

First SBSMonitoring - Google yielded http://kwsupport.com/2013/05/sbsmonitoring-database-is-nearing-maximum-size/  which suggests using http://blogs.technet.com/b/sbs/archive/2011/08/22/how-to-recreate-the-sbsmonitoring-database.aspx to replace the database with a new blank one.  What are the drawbacks?  Loss of historical data - no biggie.  Downloading and running the script was a breeze; I just needed to run Set-ExecutionPolicy Unrestricted in PowerShell to get it to execute.  That article then recommended I complete the steps at http://blogs.technet.com/b/sbs/archive/2009/07/14/sbs-2008-console-may-take-too-long-to-display-alerts-and-security-statuses-display-not-available-or-crash.aspx which shorten the length of time that logs are kept and reduce the amount of information that is logged.

Now to deal with the overweight SharePoint Services database - http://support.microsoft.com/kb/2000544 seems like a good place to start and it features a convenient "Fix It For Me."  This removed the issue from the BPA, but the desktop still wouldn't join the domain.

Others have been feeling this pain; I see posts with similar issues all over the Internet.  This one:  http://richardburley.com/windows-7-unable-to-join-domain-fix/ seems like it might finally be the one which most closely matches my situation.  On the afflicted PC I cannot browse to \\servername.  I checked this from another computer and found that \\servername worked fine - an exact fit!  This fellow fixed his issue by removing everything from the network adapter's configuration that wasn't TCP/IP v4 or v6.  I'm working remotely, so stripping down the network configuration seems like a real bummer of a solution, but while examining the network protocols I noted the AVG Network Filter Driver.  Perhaps this is it?  I removed AVG and rebooted the PC.
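The "works from one PC, fails from the other" comparison can also be made without Explorer, by testing whether the server's SMB port (TCP 445) accepts a connection from each machine.  This is a rough Python sketch, assuming Python is available on both PCs.  The helper name port_is_open is hypothetical and servername is the same placeholder used above.

```python
import socket


def port_is_open(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

If port_is_open("servername") is False on the afflicted PC but True on a healthy one, something on that PC (or between it and the server) is getting in the way.  A filter driver may or may not interfere at this level, so a True result doesn't clear the PC by itself.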

Uninstalling AVG fixed the issue - a fifteen-minute fix found through four hours of work.  The server is certainly having issues, but they weren't causing THIS issue!