Patrick's SharePoint Blog

SharePoint's Booming world

Archive for August, 2010

Gathering SharePoint Sites Data using SQL

Posted by Patrick Boom on August 5, 2010

Ever got into a large farm environment where you kind of lost track of who is using the sites, which sites there are and how you could contact your user base? Sure, tools are available to give you some kind of BI solution that could help you in this respect.

But I  just wanted a simple view of all the site collections that are within a particular web application (possibly split over multiple content databases) and for each site collection, the logins and email addresses of the people who are within the ‘Owner’ group or similar named groups. As an extra bonus, I wanted to extract the current site Quota, the quota they already used en when the site was last changed.

Like said, some of this information is already available in some form, but I found it hard to get a complete simple overview of all site collections, especially if the numbers start exceeding the thousands. Final requirements,  it should be fast enough to be delivered upon request and the output should be easy to manipulate using analysis tools.

So, couple of ways to get this information, but in fact, only two. Either obtain the information through the object model (with either C# or PowerShell) or use the more dirty (and unsupported) approach of querying the database directly. Although I am kind of in favor of doing it the correct way (i.e. through the object model), this has one very big disadvantage. It is SLOW and has quite some impact on the resources of a server. If your site collections start approaching the thousands or ten-thousands , then running the script could take a considerable amount of time. And we do not want that, do we?

So, for this one time, I decided to go the direct route. Because the query is just going to read the database, there is no danger of corrupting the database. But keep in mind that the database may change during service packs and that a read action may block a write action, because of the lock on the table. But because the query will run for a very short time and outside office hours, I find it acceptable.

Below is the query I used to get the data. Now, I am no SQL expert, so the query could probably be fine tuned or even simplified. Any suggestions in that area are well appreciated. But this one works ;-) It will give you the following information for each site collection in your content database:

– Id (GUID) of the site
— FullUrl (of the root web)
— Site Title
— Current Disk Consumption (bytes)
— Current Quota (bytes)
— Creation date
— Latest change
— Nr Sub webs
— Comma separated string with owner logins
— Comma separated string with owner email addresses
— Comma separated string with all user logins that have access to the site
— Comma separated string with all users email addresses that have access to the site

You can further expand or decrease the list if you want to obviously. Below is the script that I created to get the data:

Select
  Sites.Id
, Webs.FullUrl
, Webs.Title
, Sites.DiskUsed
, Sites.DiskQuota
, Sites.TimeCreated
, Sites.LastContentChange
, SiteWebs.NrWebs
, USERDETAILS.OwnerLogins
, USERDETAILS.OwnerEmail
, USERDETAILS.ALLLogins
, USERDETAILS.AllEmail
From Sites
 inner join Webs on Sites.RootWebId = Webs.Id
 inner join (Select SiteId, Count(*) as NrWebs from Webs group by SiteId) SiteWebs On Sites.Id = SiteWebs.SiteId
 inner Join
 (SELECT Sites.Id
   , (SELECT STUFF(
    (SELECT ‘;’ + CAST(UserInfo.tp_Login as VARCHAR(100)) 
     FROM UserInfo
   inner join Groups on Sites.Id = Groups.SiteId
   inner join GroupMembership on GroupMembership.MemberId = UserInfo.tp_Id
    and GroupMembership.GroupId = Groups.Id
    and GroupMembership.SiteId = Sites.Id
     WHERE tp_SiteId = Sites.Id
    and Groups.Title LIKE ‘%owner%’
     FOR XML PATH(”)),1,1,”)
   ) OWNERLOGINS
   , (SELECT STUFF(
    (SELECT ‘;’ + CAST(UserInfo.tp_Email as VARCHAR(100)) 
     FROM UserInfo
    inner join Groups on Sites.Id = Groups.SiteId
    inner join GroupMembership on GroupMembership.MemberId = UserInfo.tp_Id
    and GroupMembership.GroupId = Groups.Id
    and GroupMembership.SiteId = Sites.Id
     WHERE tp_SiteId = Sites.Id
    and Groups.Title LIKE ‘%owner%’
     FOR XML PATH(”)),1,1,”)
   ) OWNEREMAIL
   , (SELECT STUFF(
    (SELECT ‘;’ + CAST(UserInfo.tp_Login as VARCHAR(100)) 
     FROM UserInfo
   inner join Groups on Sites.Id = Groups.SiteId
   inner join GroupMembership on GroupMembership.MemberId = UserInfo.tp_Id
    and GroupMembership.GroupId = Groups.Id
    and GroupMembership.SiteId = Sites.Id
     WHERE tp_SiteId = Sites.Id
     FOR XML PATH(”)),1,1,”)
   ) ALLLOGINS
   , (SELECT STUFF(
    (SELECT ‘;’ + CAST(UserInfo.tp_Email as VARCHAR(100)) 
     FROM UserInfo
    inner join Groups on Sites.Id = Groups.SiteId
    inner join GroupMembership on GroupMembership.MemberId = UserInfo.tp_Id
    and GroupMembership.GroupId = Groups.Id
    and GroupMembership.SiteId = Sites.Id
     WHERE tp_SiteId = Sites.Id
     FOR XML PATH(”)),1,1,”)
   ) ALLEMAIL
 FROM Sites) USERDETAILS ON USERDETAILS.Id = Sites.Id
order by TimeCreated

The only thing is that this script can be run on a single database at the same time. What if you have multiple databases? Well, quite easy, because our web application in the object model knows which content databases are assigned to him, like below:

SPWebApplication app = SPWebApplication.Lookup(uri);

Console.WriteLine(“Gathering content databases…”);
SPContentDatabaseCollection dbCollection = app.ContentDatabases;

Console.WriteLine(“Gathering statistics…”);
foreach (SPContentDatabase db in dbCollection)
{
}

The SPContentDatabase object also holds a property called DatabaseConnectionString, which we can use to connect to the database. Sweet right? Final thing you we should solve are the multiple result sets. Well, we can use the DataSet.Merge() method for that.  So, get each result set in a local DataSet and merge it with the master.

Then you can export the DataSet in any way you like. This is really fast, only a couple of seconds to get the data over thousands of site collections.

Finally, some caveats in this approach:

1. It is not officially supported, although there is not much that can go wrong.
2. Retest the app after a Service Pack or hotfix, as the database schema might have changed.
3. The security context in which the app runs needs to have access to the databases. Basically, the application pool account of the web application has these permissions.

Let me know if you have other ways of getting these results in a fast way, preferably without 3rd party code. Also let me know if you can achieve the same result with a less complex SQL statement ;-)

Posted in SharePoint 2007 | Tagged: , , | 2 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 225 other followers

%d bloggers like this: