photo © 2007 gnizr | more info (via: Wylio) I decided to update my JargonDatabase.com web property last month started to update define the user-submitted jargon terms. Much to my surprise users had submitted over 15,000 unique undefined jargon terms. Some of them were conceptual duplicates, like “whites of their eyes” and “the whites of their eyes”, but how to filter the terms to see if they were duplicates or not?
After much pondering I decided to try to match words in the term with words already in the database – there would be many false positives, but was that method was the best way of ensuring I did not delete anything by accident.
To do that I created a function and a stored procedure. The function:
CREATE FUNCTION Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
set @idx = charindex(@Delimiter,@String)
set @slice = left(@String,@idx – 1)
set @slice = @String
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) – @idx)
if len(@String) = 0 break
The function takes a string, splits it at every occurrence of a space (” “), inserts those new records into a table, and then returns that table. From there I created the following procedure.
CREATE PROCEDURE FindSimilarSearchTerms
declare @SearchTerm nvarchar(450)
SET @SearchTerm=(SELECT top 1 SearchTermTerm FROM UserDefinedTerms WHERE [email protected])
CREATE TABLE #tmp (TermID int IDENTITY, TermWord nvarchar(450))
INSERT INTO #tmp(TermWord) select items from dbo.split(@SearchTerm, ‘ ‘)
SELECT JargonID, JargonTerm FROM JargonApprovedTerm, #tmp WHERE JargonTerm LIKE ‘%’+TermWord+’%’
DELETE FROM #tmp
DROP TABLE #tmp
The procedure takes the ID of a search term, grabs the text of the user-submitted name (the “SearchTermTerm” field) and from there creates a temporary table, and then matches any word in the user-submitted name to any word in the approved jargon table in the database, that provides a quick way of knowing whether or not I can safely delete a term. Definitely a time consuming task, but a lot better than it could be.
This post originally appeared on the Stronico blog – with the absorption of Stronico into Digital Tool Factory this post has been moved to the Digital Tool Factory blog
||Written By Steve French|