Full text search : Stopwords and Stoplists

This is my fourth post on full text search.In this post I will discuss the concept and usage of Stopwords and Stoplists in full text search.If you are new to full text search please go through my earlier posts on this topic (Introduction,Full text predicates & full text functions).

What are Stopwords and Stoplists

Concept is simple but lets take a real world scenario to understand it.Let us assume that you are searching for some information on Stopword and Stoplist  with regards to full text search.You go and type a sentence like What is stopword and stoplist  in your favorite search engine.Now i have blog post  which has some similar heading like the one above i.e. What are stopwords and stoplists (although not exactly the same).

Do you think it will match the criteria and my post will show up in search results?

Our common experience tells us that yes it does match and show up posts like that.That is because search engines at a very basic level know that keywords  like What,are,is, and  etc do not usually  hold any relevance to search and user is looking for keywords  Stopword and Stoplist.Such keywords like What ,are,is, and ,a ,the etc are called stopwords in Sql Server full text search i.e. these keywords are not part of full text index and are ignored in case of a linguistic search.Databases usually have list of such stop words associated to a full text index which is called a Stoplist.

By default when you create a full text index it is associated with a system stoplist.Default stoplist has more than 150 words for english language.You can run below query and see all the stop words for english language for a particular database.


SELECT * FROM sys.fulltext_stopwords
WHERE LANGUAGE_ID = 1033

Creating Custom Stoplist

You can have multiple custom stoplist in your database.To create a new stoplist you have below options.

  • Create empty stoplist
  • Create new from existing stoplist
  • Create new from system stoplist.

For our example we will use option 3.

Run below SQL command to create a new full text stoplist from existing system stoplist

CREATE FULLTEXT STOPLIST EmployeeStopList
 FROM SYSTEM STOPLIST;

Full Text Search Stoplist and Stopword

Associating Custom Stoplist with Full Text Index

Now we can associate this stoplist with an existing full text index.Run below query to do this.


ALTER FULLTEXT INDEX ON EmployeeProfile
SET STOPLIST EmployeeStopList

Now all your queries on the EmployeeProfile table will use newly created custom stoplist.But till now our new custom stoplist is just a copy of existing system stoplist.So lets add a new word to EmployeeStopList.

Adding new word to Custom Stoplist

In my sample database ‘Teach’ word is used quite commonly .See the below results of running contain query on this word.

Full Text Search Stoplist and Stopwords

Now just to demonstrate Stopword functionality I’ll add this word to the stopwords for this full text index.Run below command to add a word to a full text stoplist.


ALTER FULLTEXT STOPLIST EmployeeStopList
ADD 'Teach' Language 1033;

Lets again run the contains query with ‘Teach’ word.Below are the results (or No results :-))Full Text Search Stoplist and Stopword

So once a word is added to stoplist it is not considered in subsequent searches as it is removed from the full text index.

You can run below query to see that ‘Teach’ exist in the full text stoplist

image

For dropping a word from full text stoplist run below command.


ALTER FULLTEXT STOPLIST EmployeeStopList
DROP 'Teach' Language 1033;

For a usual linguistic search the default stoplist is good enough but there can be scenarios for  a specific domain search functionality where you would like to remove some of the common words of that domain which do not help in the search functionality.Also in some kind of search system where emphasis is on interpretation of whole phrase rather that individual words it is necessary to keep minimal number of stopwords.
For further reading on this and related topics, you can refer to this exhaustive course on pluralsight.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.