Caching – Why, When and How

Caching is a very beneficial practice for improving application performance and efficiency. Conceptually, it is simple; however, there are a number of issues to take into account when caching data. I’d encourage you to read up on caching before you start using it. As with all things, there are tradeoffs and challenges involved, so be careful about when and how you cache. Below is some info to get you started, but feel free to ask questions if you’re caching for the first time.

Caching is the practice of storing data where it can be reused in the future without going back to the original source. For example, you might fetch a list of items and then store it in memory so that you can use it again. Most full-featured caches can automatically remove an item from the cache once a certain amount of time passes. This is called expiration.

 

One Step Back

To understand the justification for caching, we should take a small step back for just a second. It is important to understand the true cost of each line of code we write. Here are some examples of things we could write in code. They are sequenced from very light to very costly:

  • Integer.ToString()
    Light operation. It’s a completely memory-based operation.
  • System.IO.File.ReadAllText()
    Moderate cost. Requires disk read access.
  • System.IO.File.WriteAllText()
    Moderate cost. Requires disk write access.
  • System.Data.SqlClient.SqlCommand.ExecuteReader()
    Costly. Requires a network round trip and a SQL query, which in turn involves disk IO, etc.
  • WebServiceProxy.GetItem()
    Costly. Requires a network round trip, object serialization, a SQL query, etc.

Each of the above is only a single line of code, but they have different costs. It’s important to think about how many lines of code are executed as a result of the code you write.

It’s also important to consider the cost of resource access. Hitting something in memory is very cheap. Hitting something on a local disk drive is moderately costly. Calling another machine is costly, especially if that machine is across a WAN segment. In general, a network call is around 100 to 1,000 times more costly than a call within the same process.

 

Scenarios:

Here are some common cases where caching can be very beneficial:

Case #1: Item Lookups

You have a collection of 500 records. Each has an ITEM_UID that you need to look up so that you can display the item’s name or description. You could loop through your records and call out to the item service for each record. But remember those metrics about network calls… up to 1,000 times more expensive per call! That’s pretty expensive.

So let’s “refactor” this idea… What if I pull back all the items from the service first? Then I loop through my collection of records and look up each ITEM_UID in the list I have in memory. I may have pulled back more items than I needed, but comparing the costs, it was probably cheaper to pull back a few too many than to make 500 costly calls across the network. Better.
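Here’s a minimal sketch of that first refactor, borrowing the hypothetical ItemProxy web service from the code sample at the end of this post (MyRecord, oRecords, and the Name properties are made up for illustration):

Dim oProxy As New ItemProxy.ItemService()
oProxy.UseDefaultCredentials = True

' One network round trip pulls back the whole list...
Dim oLookup As New Generic.Dictionary(Of Guid, ItemProxy.ItemLookup)
For Each oItem As ItemProxy.ItemLookup In oProxy.GetAllItemLookup()
    oLookup.Add(oItem.ItemUid, oItem)
Next

' ...and the 500 per-record lookups become cheap in-memory hits.
For Each oRecord As MyRecord In oRecords
    Dim oItem As ItemProxy.ItemLookup = Nothing
    If oLookup.TryGetValue(oRecord.ItemUid, oItem) Then
        oRecord.ItemName = oItem.Name
    End If
Next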

Let’s refactor this further… I have all the items in memory, and the list doesn’t change all that much. If I assume that my page / service / method will get called relatively frequently, then it would be nice to keep that list around for a little while so that it can be used again. There’s no point in fetching this large list over and over when it only changes once in a while. So, I use a cache to hold on to it. (That’s exactly what the full code sample at the end of this post does.)

Case #2: Precious Resources

What’s the number one example of a precious shared resource at InfoCision? The database!

One way to help reduce that contention is to cache static or semi-static data. Users who request the data get a copy from memory, rather than continually hitting the database. If you have a table with lots of reads, but the data doesn’t change much, you can easily cache the data in memory and fulfill users’ requests from the in-memory cache. Each time you hit the cache, you are eliminating a hit to the database.
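As a rough sketch of what that looks like with the ASP.NET cache (LoadStatesFromDatabase() is a made-up data-access call; substitute your own):

Dim oStates As DataTable = CType(HttpContext.Current.Cache("ALL_STATES"), DataTable)
If oStates Is Nothing Then
    ' Cache miss: pay for the database round trip once...
    oStates = LoadStatesFromDatabase()
    ' ...then serve everyone from memory for the next 15 minutes.
    HttpContext.Current.Cache.Insert("ALL_STATES", oStates, Nothing, _
        Date.Now.AddMinutes(15), System.Web.Caching.Cache.NoSlidingExpiration)
End If

Note that this simple version doesn’t lock, so a few threads could load the table at the same time after an expiration. The code sample at the end of this post shows how to guard against that.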

Case #3: Data that’s Costly to Generate

In some cases, it takes a lot of processing time or power to produce the end result that users need. In these cases, it can be useful to cache that end result so that users can get the info without waiting for it to be regenerated from scratch.
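For example (BuildDailySummaryReport() is hypothetical and stands in for several seconds of processing), a sliding expiration can work well here, since it keeps the result alive as long as people keep asking for it:

Dim sReport As String = CType(HttpContext.Current.Cache("DAILY_SUMMARY"), String)
If sReport Is Nothing Then
    sReport = BuildDailySummaryReport()
    ' No fixed expiration time; instead, expire after 20 minutes without a request.
    HttpContext.Current.Cache.Insert("DAILY_SUMMARY", sReport, Nothing, _
        System.Web.Caching.Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(20))
End If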

 

Challenges:

Caching isn’t free! There are a number of challenges that you must address carefully. Each scenario requires that you analyze the expected usage to find the best cache strategy. This strategy might even change over time as the size of the data grows, or the number of users increases.

Data can grow stale.
As soon as you pull data from a source, that source can change, and then your cached copy is stale or incorrect. You must be very careful about this aspect of your design. Only cache data that is static or semi-static. Do not cache data that changes frequently (like the import table, etc.).

Caches consume memory.
Whenever you cache data, you are probably going to cache it in memory. This means that there is less memory available for other uses. Today, memory is cheap, so it’s a good trade-off. However, you probably don’t want to pull a million rows into memory, even if the million rows never change. It just takes too much memory.

Multi-Threading and Concurrency.
When you put something in a cache, that cache is usually available to all of the threads in your application. If you are writing a web service, then all users of that web service will see the same data. You must be very careful about what you put in the cache. Don’t put user-specific items in the cache. Only put “read-only” items in the cache.
You must also be careful about how you insert items into the cache. Typically, you’ll do this with the SyncLock keyword or a ReaderWriterLock, so that only one thread can add an item to the cache at a time and readers don’t read an item while another thread is updating the cache. The code sample at the end of this post shows the typical check-lock-check pattern.

 

Types of Caches:

For the most part, there are probably three types of caches that you’ll use:

Local or Member Variable
While this isn’t a full-featured cache, it is still a cache. It’s basically the approach from the Item Lookup scenario above, before the cache object is introduced: you pull data into memory and re-use it for the duration of the current operation, but other threads and later method calls can’t re-use it. This is ideal in cases where the data is relatively dynamic, or the data is user-specific.

ASP.NET Web Cache
ASP.NET has a built-in cache that is pretty feature-rich. It looks a lot like a normal name/value Dictionary-type class, except for the add/insert methods: when you add something to the cache, you use the Add or Insert method and specify expiration details. This cache keeps the data in memory, within the scope of the ASP.NET application.
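The basics look something like this sketch (the key and value are made up):

' The ASP.NET cache is reachable from anywhere in the web application.
Dim oCache As System.Web.Caching.Cache = HttpContext.Current.Cache

' Insert with a 10-minute absolute expiration and no sliding expiration.
oCache.Insert("SOME_KEY", "some value", Nothing, _
    Date.Now.AddMinutes(10), System.Web.Caching.Cache.NoSlidingExpiration)

' Read it back like a dictionary; Nothing means it expired or was never added.
Dim oValue As Object = oCache("SOME_KEY")

' Remove it explicitly if you know the source data changed.
oCache.Remove("SOME_KEY")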

Enterprise Library Cache
EntLib has a cache too. This cache has a slightly different set of features. It can store cached data for longer periods of time because it can persist the data to files, databases, or other storage locations. It also supports memory storage for normal cache usage scenarios. The cache can also be encrypted if necessary. Configuration of the cache is stored in a config file, which makes it easy to change cache settings without changing code. Otherwise, it acts similarly to the ASP.NET Cache.
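Basic usage looks something like the sketch below. It assumes the Caching Application Block assemblies are referenced and a default cache manager is defined in the config file; note that in later versions of EntLib, GetCacheManager returns an ICacheManager rather than a CacheManager:

Imports Microsoft.Practices.EnterpriseLibrary.Caching
Imports Microsoft.Practices.EnterpriseLibrary.Caching.Expirations

' Gets the default cache manager defined in the config file.
Dim oCacheManager As CacheManager = CacheFactory.GetCacheManager()

' Add with a 10-minute absolute expiration.
oCacheManager.Add("SOME_KEY", "some value", CacheItemPriority.Normal, _
    Nothing, New AbsoluteTime(Date.Now.AddMinutes(10)))

' Read it back; Nothing means it expired or was never added.
Dim oValue As Object = oCacheManager.GetData("SOME_KEY")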

 

Code Sample:

Imports System.Web

Public Class LookupHelper
#Region " Constants "
    Private Const CACHE_KEY__ALL_ITEMS As String = "ALL_ITEMS__BY_UID"
#End Region
#Region " Member Variables "
    Private Shared m_ItemLookupSyncObject As New Object()
#End Region
#Region " Internal Methods "
    Public Function GetItemByUid(ByVal itemUid As Guid) As ItemProxy.ItemLookup
        ' Try the cache first. Nothing means the list expired or was never loaded.
        Dim oItemLookup As Generic.Dictionary(Of Guid, ItemProxy.ItemLookup)
        oItemLookup = CType(HttpContext.Current.Cache.Item(CACHE_KEY__ALL_ITEMS), Generic.Dictionary(Of Guid, ItemProxy.ItemLookup))

        If oItemLookup Is Nothing Then
            SyncLock m_ItemLookupSyncObject

                ' Check again, just in case it changed once we got the synclock.
                oItemLookup = CType(HttpContext.Current.Cache.Item(CACHE_KEY__ALL_ITEMS), Generic.Dictionary(Of Guid, ItemProxy.ItemLookup))
                If oItemLookup Is Nothing Then
                    Dim oNewLookup As New Generic.Dictionary(Of Guid, ItemProxy.ItemLookup)

                    Dim oProxy As New ItemProxy.ItemService()
                    oProxy.UseDefaultCredentials = True

                    Dim oItems As ItemProxy.ItemLookup()
                    oItems = oProxy.GetAllItemLookup()

                    If oItems IsNot Nothing AndAlso oItems.Length > 0 Then
                        For Each oItem As ItemProxy.ItemLookup In oItems
                            oNewLookup.Add(oItem.ItemUid, oItem)
                        Next
                    End If

                    ' Cache the list for 10 minutes (absolute expiration, no sliding expiration).
                    HttpContext.Current.Cache.Insert(CACHE_KEY__ALL_ITEMS, oNewLookup, Nothing, Date.Now.AddMinutes(10), System.Web.Caching.Cache.NoSlidingExpiration)
                    oItemLookup = oNewLookup
                End If

            End SyncLock
        End If

        If oItemLookup.ContainsKey(itemUid) Then
            Return oItemLookup.Item(itemUid)
        Else
            Return Nothing
        End If
    End Function

#End Region

End Class

 

More Info:

Here’s an in-depth article that covers many aspects of caching. I’d recommend that you at least skim the article if you plan to cache.
