Project Description
This Web part is used to display a dynamic list of the most popular documents and pages in a SharePoint site collection. It reads data from a table in a database that is expected to be updated daily using the Microsoft LogParser tool going against the IIS logs.

You may contact the author at jim [dot] lehmer [at] gmail [dot] com.

Overview

This Web part is used to display a dynamic list of the most popular documents and pages in a SharePoint site collection. It reads data from a table in a database that is expected to be updated daily using the Microsoft LogParser tool (http://www.microsoft.com/technet/scriptcenter/tools/logparser/default.mspx) going against the IIS logs. It is also expected that all the columns in IIS logging are turned on. We use a scheduled task to scrape the latest IIS logs once a day right after midnight, and then a scheduled SQL Server job to trim the resulting table to the last 90 days of data so it doesn't get too big.

A stored proc is called by this Web part and brings back a set of results showing the most popular "real" links (not icons and static stuff like that) for the past "n" days (currently n=15, but it is customizable via a Web part parameter). Then each of those URLs are searched individually in SharePoint. Why do the search? Two reasons - the first is we don't want to display the URL to the user but instead the friendly (we hope) SharePoint title. Secondly, we do a search to let SharePoint do security trimming so we don't show the user links they don't have access to. That is why the stored proc should be set to bring back more results than the Web part parameter that controls how many links to display to the user, in case some of the stored proc results get security trimmed by the SharePoint search.

There is the SharePoint admin screen that shows a site collections most popular links, but (a) it's an admin screen so it’s not accessible to end users, and (b) it shows the URLs, not a user-friendly SharePoint title. Whereas this Web part is meant to live on the front page of an intranet site, for example, and allow people to get to the most popular docs on the site in a single click, with "most popular" changing daily and remaining accurate over time.

Note: We search on each link individually. An earlier version of this Web part was working where the SharePoint query just had all the URLs OR'ed together and did one search, but that then changed the sort order in the results and we want to keep it in descending order by most hits. By searching one at a time we preserve the order in terms of link popularity, and for 10-20 links the whole thing is quick enough anyway.
Note: This has to be installed into the GAC since it does database access and I was too lazy to do CAS. A nice side benefit of this approach is if a document is moved or deleted, the search fails and it doesn't show up in the list. No dead links!

Instructions

You can download the binary .wsp and the GetTopHits stored proc SQL from the Releases tab, but it is probably better to download everything from the Source Code tab if you aren't already parsing your IIS logs into a database using LogParser just to get some additional scripts. In the Scripts directory in the source code is a Powershell script that gets run as a scheduled task every night right after midnight (our IIS logs are created new on a daily basis at midnight - if yours are created on a different schedule change the scheduled task accordingly). The script runs LogParser on each log file it finds and then deletes the file (since IIS has an open file handle on the current log file it doesn't delete that file). If you want different behavior, such as copying/moving the files that have been parsed into another directory, change the script accordingly. You will have to change it to point to your SharePoint IIS log files location and what database and table name you want the data loaded into to.

In the SQL directory I have included three scripts. One is for the GetTopHits stored proc. Currently it has all the "business logic" of deciding which links to return, how many and for how far back. I plan on parameterizing all of that and pushing it all into the Web part to be configured via parameters. The other is a script to create the table (you can also just run LogParser once with createtable:on specified to get the same results). I did not include the SQL for creating indexes on this table, but it will behoove you to create indexes on the date and csUriStem columns. Finally there is a script to create a SQL Agent job that trims the data in the table to only hold the last 90 days. Change all scripts according to your own database and table names and other needs.

If you build the project it will create the PopularPages Top Page Hits WebPart.wsp solution file, or you can download it from Releases. You install the solution using the STSADM command:

stsadm -o addsolution -filename "PopularPages Top Page Hits WebPart.wsp"

You can then deploy the solution and install and activate the feature using either STSADM or the SharePoint UI. You should then be able to add the Web part to a Web part page. It will be in the Server Gallery. Once it is added you need to edit three properties that will show up under Options when you edit the Web part. The first is labeled Log Database Connection String and is just that - the connection string to the location of the database into which your IIS logs have been parsed, e.g.:

server=yourserver;initial catalog=iislogs;integrated security=sspi

The second property is labeled Log Stored Procedure. The stored proc that ships with this code is called GetTopHits so you can put that in, or if you modify the stored proc or have different ones to bring back different results (for example, you may modify it to only bring back a given department's URLs if you dropped this Web part on a department's site), then change the name accordingly here. The final property is Number of Links and that is the number of links the Web part should display to the user. The default is 15.

Enhancements

This section documents enhancements that still need to be done.
  • Add a Boolean property to the Web part to tell it to be used to determine individual user’s top documents as well. When the parameter was true the code would simply need to determine and pass in the user id as a stored proc parameter and GetTopHits would filter on the csUsername column in the table. This would probably require A/D authentication on the SharePoint site to populate the csUsername column so it would only work for intranets and the like.
  • Right now the GetTopHits stored proc contains a lot of the filtering logic for bringing back candidates for this Web part to then search on in SharePoint. A lot of that should probably be moved as browsable properties into this Web part and then GetTopHits should be parameterized to use those values. OTOH, since one of the properties this Web part does have is the ability to point to different stored procs, it is pretty easy to change filtering for different sites by pointing to different stored procs, too. See the stored proc for more details.
  • Figure out how to have search results bring back docs that are attached to items in a list. For example, right now this will find docs in doc libraries, but not docs that are attached to a list item (like an announcement item with a Word document attached).
  • There is probably a better way to connect to the database in a "SharePointy" way.

Last edited May 19, 2009 at 11:52 AM by dullroar, version 8