Pattern for creating scalable SharePoint 2010 BCS connectors

I am currently working on a project that requires me to pull in a large amount of data from an external database into a SharePoint farm and index the content for use in our search service application.

The dataset is currently around a million items and grows every day, so the search crawl needed to scale and index all of this content without taking too long.

I set about creating a standard .Net Connector Assembly that implements the Finder (ReadList) and SpecificFinder (ReadItem) methods. The architecture for the BCS and Search framework looks like this (from MSDN)

I won’t go into detail on how to create .Net Connector Assemblies and BDC models because there are other articles out there that show you how. In this blog post I am going to detail how I made our connector scalable and performant by caching the content in memory, so that the crawler indexes the content from memory instead of making numerous (1 million+) calls to the database.

I encapsulated this caching mechanism into a library so that I can re-use the logic throughout my application on many BCS connectors.

I have also published a library containing the code so that this pattern can be re-used on other projects. Feel free to download it and use it in your own projects.

You can download the code to follow along here http://www.athousandthreads.com/att.sharepoint.patterns.zip

Right down to the detail

The search crawler working on external connectors uses the following workflow to crawl all the content.

  1. The crawler first calls the Finder method (ReadList) on your .Net Connector Assembly. Your Finder method needs to return the identifiers of all the items you want indexed.
  2. The crawler then calls your SpecificFinder method (ReadItem), passing it the identifier of each item it wants to index.

Now when the crawler initially calls my Finder method I need to go off to the database and retrieve all the items I want to index and return them to the crawler for indexing.

When the crawler then calls my SpecificFinder method, I don’t want to go back to the database; I want to retrieve the item from the cache. I implemented this using a static collection of items that gets stored in the memory space of the mssdmn.exe process (the filter daemon process that does the indexing).
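To give a feel for the difference this makes, here is a rough simulation of the two crawl styles. It is only a sketch: FakeDatabase and CrawlSimulation are made-up names for illustration, not part of SharePoint or of the library, and the "database" simply counts how many round trips it receives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the external database; it only counts round trips.
public class FakeDatabase
{
    private readonly int itemCount;

    public int Calls;

    public FakeDatabase(int itemCount)
    {
        this.itemCount = itemCount;
    }

    // One round trip that returns every item, keyed by identifier.
    public Dictionary<long, string> ReadAll()
    {
        Calls++;
        Dictionary<long, string> items = new Dictionary<long, string>();
        for (long id = 1; id <= itemCount; id++)
        {
            items[id] = "item " + id;
        }
        return items;
    }

    // One round trip that returns a single item.
    public string ReadOne(long id)
    {
        Calls++;
        return "item " + id;
    }
}

public static class CrawlSimulation
{
    // SpecificFinder goes back to the database for every identifier.
    public static int CrawlWithoutCache(FakeDatabase db)
    {
        foreach (long id in db.ReadAll().Keys)
        {
            db.ReadOne(id);
        }
        return db.Calls;
    }

    // Finder fills an in-memory cache; SpecificFinder is served from it.
    public static int CrawlWithCache(FakeDatabase db)
    {
        Dictionary<long, string> cache = db.ReadAll();
        foreach (long id in cache.Keys)
        {
            string item;
            cache.TryGetValue(id, out item);
        }
        return db.Calls;
    }
}
```

With 1,000 items the uncached crawl makes 1,001 database calls and the cached crawl makes one. The same ratio at a million items is the reason for the cache.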

There is some logic needed to synchronise the access to this shared cache and I have encapsulated this into the following class:

Caching

/// <summary>

/// Provides a caching mechanism for BCS external connectors.

/// </summary>

/// <typeparam name="T">Type of BDC entity to store in this cache.</typeparam>

/// <typeparam name="I">Type of the identifier for the BDC entity.</typeparam>

public class CachedConnectorService<T, I> where T : BDCEntity<I>

{

  #region Private members

  private List<T> cache;

  private CachedConnectorParameters<T, I> parameters;

  private static object lockObject = new object();

  #endregion

  #region Constructor

  public CachedConnectorService(CachedConnectorParameters<T, I> parameters)

  {

    this.parameters = parameters;

  }

  #endregion

  #region Public methods


  /// <summary>

  /// Reads individual entity from the cache.

  /// </summary>

  /// <param name="identifier">Identifier of the entity to read.</param>

  /// <returns>BDC entity.</returns>

  public T ReadItem(I identifier)

  {

    this.LogToOperations("Reading BDC Item: " + typeof(T).ToString() +
        ", identifier=" + identifier, EventSeverity.Information);

    T entity = default(T);

    try 

    {

        if (cache == null)

        {

            this.LogToOperations(typeof(T).ToString() +
                " cache is null, reloading cache from database.",
                EventSeverity.Information);

            this.ReadList();

        }

        entity = this.GetFromCache(identifier);


        if (entity == null)

        {

            this.LogToOperations(
                "Identifier not found in local cache, getting from database.",
                EventSeverity.Information);

            entity = this.GetFromDatabase(identifier, parameters.DatabaseCall);

        }

    }

    catch (Exception)

    {

        this.LogToOperations("Exception occurred reading BDC Item: " +
             typeof(T).ToString() + ", Identifier=" + identifier,
             EventSeverity.Error);

    }

    return entity;

  } 


  /// <summary>

  /// Reads list of entities into the cache.

  /// </summary>

  /// <returns>Collection of entities.</returns>

  public IEnumerable<T> ReadList()

  {

      if (cache == null)

      {

          lock (lockObject)

          {

              try

              {

                  if (cache == null)

                  {

                      this.LogToOperations("Getting list of " +
                          typeof(T).ToString(), EventSeverity.Information);


                      List<T> cacheTemp = new List<T>();

                      this.parameters.PopulateCache.Invoke(cacheTemp);

                      this.LogToOperations("Loaded " +
                          cacheTemp.Count.ToString() + " " +
                          typeof(T).ToString(), EventSeverity.Information);

                      cache = cacheTemp;

                  }

               }

               catch (Exception)

               {

                   this.LogToOperations("Exception occurred getting list of " +
                       typeof(T).ToString(), EventSeverity.Error);

               }

          }

      }

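      // If the load failed the cache is still null; return an empty result
      // rather than throwing a NullReferenceException.
      List<T> snapshot = cache;

      return snapshot != null ? snapshot.ToArray() : new T[0];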

  }

  #endregion

  #region Private methods

 

  /// <summary>

  /// Gets entity from the cache

  /// </summary>

  /// <param name="identifier">Identifier of entity to return.</param>

  /// <returns>Entity instance.</returns>

  private T GetFromCache(I identifier)

  {

      lock (lockObject)

      {

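          // No cache yet (for example the load failed): report a miss so the
          // caller can fall back to the database.
          if (cache == null)
          {
              return default(T);
          }

          return cache.FirstOrDefault(a => a.Identifier.Equals(identifier));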

      }

  }


  /// <summary>

  /// Gets entity from the database using the specified delegate.

  /// </summary>

  /// <param name="identifier">Identifier of entity to return.</param>

  /// <param name="databaseCall">Delegate that does the work of

  /// retrieving entity from database</param>

  /// <returns>Entity instance.</returns>

  private T GetFromDatabase(I identifier, Func<I, T> databaseCall)

  {

      return databaseCall.Invoke(identifier);

  }


  private void LogToOperations(string message, EventSeverity severity)

  {

      if (this.parameters.Logger != null)

      {

           this.parameters.Logger.LogToOperations(message, severity);

      }

  }

  #endregion

}
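One thing to be aware of in the class above is that GetFromCache scans the list from the start on every lookup, which gets slow when the cache holds a million items and the crawler asks for each of them in turn. A possible variation, shown here only as a sketch and not as what the published library does, is to key the cache by identifier. IndexedEntityCache and SampleEntity are hypothetical names, and BDCEntity is repeated in minimal form so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Minimal copy of the base entity so this sketch is self-contained.
public abstract class BDCEntity<I>
{
    public I Identifier { get; set; }
}

// Hypothetical entity used only to exercise the sketch.
public class SampleEntity : BDCEntity<long>
{
    public string Name { get; set; }
}

// Hypothetical alternative cache: same idea, but keyed by identifier.
public class IndexedEntityCache<T, I> where T : BDCEntity<I>
{
    private readonly object lockObject = new object();

    private Dictionary<I, T> index;

    // Replaces the cache contents with the supplied entities.
    public void Load(IEnumerable<T> entities)
    {
        Dictionary<I, T> temp = new Dictionary<I, T>();
        foreach (T entity in entities)
        {
            // If two entities share an identifier, the last one wins.
            temp[entity.Identifier] = entity;
        }

        lock (lockObject)
        {
            index = temp;
        }
    }

    // Returns the cached entity, or null when it is missing
    // or nothing has been loaded yet.
    public T Find(I identifier)
    {
        lock (lockObject)
        {
            T entity;
            if (index != null && index.TryGetValue(identifier, out entity))
            {
                return entity;
            }
            return default(T);
        }
    }
}
```

A lookup is then a single hash probe instead of a walk through the list, at the cost of the extra memory the dictionary needs.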

 

Parameters

The class uses a set of parameters that stores two delegates: one populates the cache and the other calls the database to get an individual item. It also allows you to pass in a logger from the Microsoft patterns and practices logging library.

These delegates are used by your connector assemblies to pass in your specified logic for getting items into the cache. The parameters class looks like this.

public class CachedConnectorParameters<T, I> where T : BDCEntity<I>

{

  #region Public properties

  public Action<List<T>> PopulateCache { get; set; }

  public Func<I, T> DatabaseCall { get; set; }

  public ILogger Logger { get; set; }

  #endregion

}


Base entity

One last class is BDCEntity<T>, which is used as the base class for all your BDC model entities. The class is simple and just allows the caching class to filter on identifiers.

public abstract class BDCEntity<T>

{

  #region Public members

  public T Identifier { get; set; }

  #endregion

}

 

.Net Connector Service & Entities

This forms the reusable library that provides caching to all your BDC connector assemblies. An example of a class that uses this caching pattern is shown below:

public class MyService

{

  #region Private static members

  private static CachedConnectorService<MyEntity, Int64> service;

  private static CachedConnectorParameters<MyEntity, Int64> parameters;

  #endregion

 

  #region Public methods

  /// <summary>

  /// Reads specified entity from the database.

  /// </summary>

  /// <param name="id">ID of the entity to retrieve from the database.</param>

  /// <returns>Instance of MyEntity.</returns>

  public static MyEntity ReadItem(long id)

  {

      return ServiceInstance().ReadItem(id);

  }

 

  /// <summary>

  /// Reads a list of all entities from the database.

  /// </summary>

  /// <returns>Collection of MyEntity.</returns>

  public static IEnumerable<MyEntity> ReadList()

  {

      return ServiceInstance().ReadList();

  }

  #endregion

 

  #region Private static methods

  private static CachedConnectorService<MyEntity, Int64> ServiceInstance()

  {

      if (service == null)

      {

          if (parameters == null)

          {

              parameters = new CachedConnectorParameters<MyEntity, Int64>();

              parameters.DatabaseCall = GetDatabaseDelegate();

              parameters.PopulateCache = GetPopulateCacheDelegate();

              parameters.Logger = new SPLogger();

          }

          service = new CachedConnectorService<MyEntity, Int64>(parameters);

      }

      return service;

  }


  private static Action<List<MyEntity>> GetPopulateCacheDelegate()

  {

      return (entities) =>

        {

            using (MyWorkScope scope = new
                   MyWorkScope(DatabaseManager.EFConnectionString))

            {

                foreach (Entity entity in scope.CurrentContext.MySet)

                {

                    entities.Add(GetEntity(entity));

                }

            }

        };

  }


  private static Func<Int64, MyEntity> GetDatabaseDelegate()

  {

      return (identifier) =>

          {

               using (MyWorkScope scope = new
                      MyWorkScope(DatabaseManager.EFConnectionString))

               {

                   return GetEntity(
                       scope.CurrentContext.ReadEntity(identifier).FirstOrDefault());

               }

          };

  }


  /// <summary>

  /// Returns an entity from the specified object.

  /// </summary>

  /// <param name="entity">Entity to turn into a MyEntity.</param>

  /// <returns>Instance of MyEntity.</returns>

  private static MyEntity GetEntity(Entity entity)

  {

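      // FirstOrDefault() in the database delegate yields null when no row matches.
      if (entity == null)
      {
          return null;
      }

      MyEntity myEntity = new MyEntity();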

      myEntity.Identifier = entity.Id;

      myEntity.Name = entity.FormattedName;

      myEntity.SiteUrl = entity.SiteUrl;

      myEntity.LastModifiedTimeStampField = entity.CC_ModifiedDate;


     return myEntity;

  }

  #endregion

}

Our BDC model entity class looks like this:

public partial class MyEntity : BDCEntity<Int64>

{

  public string Name { get; set; }

  public string SiteUrl { get; set; }

  public DateTime? LastModifiedTimeStampField { get; set; }

}
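Note that ServiceInstance() in MyService is not synchronised, so if the crawler calls in on two threads at the same moment, each could build its own service and load its own copy of the cache. One possible guard is sketched below. LockedSingleton is a hypothetical helper, not part of the library, and it uses a plain lock rather than Lazy<T> so that it also works on .NET 3.5.

```csharp
using System;

// Hypothetical helper: builds a value once, even when called from many threads.
public class LockedSingleton<TService> where TService : class
{
    private readonly object sync = new object();

    private readonly Func<TService> factory;

    private volatile TService instance;

    public LockedSingleton(Func<TService> factory)
    {
        this.factory = factory;
    }

    public TService Get()
    {
        // First check avoids taking the lock once the instance exists.
        if (instance == null)
        {
            lock (sync)
            {
                // Second check: another thread may have built it while we waited.
                if (instance == null)
                {
                    instance = factory();
                }
            }
        }

        return instance;
    }
}
```

In MyService this could replace the two static fields with a single static LockedSingleton whose factory builds the parameters and the CachedConnectorService, with ServiceInstance() simply returning Get().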

 

Memory Limits

There is one last thing to note about this approach. Because we cache all items within the filter daemon process (mssdmn.exe), its memory footprint can get very large, and on our server we hit a memory limit that is set for the filter daemon. When that limit is hit, the BCS connector assembly and its memory space are thrown away, and hence the cache is reset. The above code handles this, but as a consequence, when ReadItem is called and the cache has gone away, we have to reload it from the database. You want to avoid doing this too many times for obvious reasons, so we found we had to increase the memory limit of the filter daemon to get better performance from the indexer.

You can find out how to do this from the links below:

Where can I get the library?

You can download the full source code for the library here. It can be used by anyone free of charge.

http://www.athousandthreads.com/att.sharepoint.patterns.zip