Photo by Sonja Langford on Unsplash
Data Professionals spend an average of 23% of their time cleaning data — here’s a BulkTextReplaceValue function to save you some time 🕒
Proportion of time spent on data science activities
In Bob Hayes’s article from earlier this year, a recent study of over 23,000 data professionals found that data scientists spend about 40% of their time gathering and cleaning data, 20% of their time building and selecting models and 11% of their time finding insights and communicating them to stakeholders.
With the recent announcement that Automated Machine Learning (AutoML) in Power BI has reached general availability (GA), we’ll want to spend less time cleaning data and more time exploring AutoML in Power BI!
Note that it’s best practice to do your transformations as close to the data source as possible. If, for any reason, you’re not able to do that, this post will help you use the BulkTextReplaceValue function to save you some time!
If you’ve ever had to replace values in a column numerous times, you know how tedious it can be, especially when you want to document each step. Right-clicking on each step and opening its properties just takes too much time…
From there, you move on to writing the replacements out in the advanced query editor, where you already know to be careful that each line references the identifier of the line before it.
Miguel Escobar’s video shows that, by providing pairs of old values and new values as a conversion table, we can use the BulkTextReplaceValue function to replace all the required values in one step!
👩💻 You can download the sample files here → github.com/ievsantillan/PowerBI/tree/master..
In Miguel’s video he used the Text.Replace function, but I ran into an issue where part of the text to be replaced was also contained in another row, as shown in the screenshot here.
The Feature column holds the old text and the Software Name column holds the new text.
I needed to replace Text.Replace with Replacer.ReplaceValue.
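Both functions take the same three arguments (the value, the old text and the new text), but the difference here is that Text.Replace replaces every occurrence of the old text inside the value, whereas Replacer.ReplaceValue replaces the value only when the entire cell contents match the old text.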
Text.Replace
Text.Replace(text as nullable text, old as text, new as text) as nullable text
Returns the result of replacing all occurrences of text value old in text value text with text value new. This function is case sensitive.
Replacer.ReplaceValue
Replacer.ReplaceValue(value as any, old as any, new as any) as any
Replaces the old value in the original value with the new value. This replacer function can be used in List.ReplaceValue and Table.ReplaceValue.
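To make the difference concrete, here is a minimal sketch that can be pasted into a blank query. The values "Excel", "Excel Online" and "Microsoft Excel" are made-up examples, not the data from the screenshot.

```powerquery
let
    // Hypothetical values, chosen so that one old text ("Excel") is contained in another cell ("Excel Online")
    // Text.Replace rewrites every occurrence of "Excel" inside the cell
    Partial = Text.Replace("Excel Online", "Excel", "Microsoft Excel"),
    // → "Microsoft Excel Online"

    // Replacer.ReplaceValue only acts when the whole cell equals the old value
    NoMatch = Replacer.ReplaceValue("Excel Online", "Excel", "Microsoft Excel"),
    // → "Excel Online" (unchanged)

    ExactMatch = Replacer.ReplaceValue("Excel", "Excel", "Microsoft Excel")
    // → "Microsoft Excel"
in
    [Partial = Partial, NoMatch = NoMatch, ExactMatch = ExactMatch]
```

The query returns a record with the three results side by side, which makes it easy to see why a partial match in another row causes trouble with Text.Replace.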
Let’s take a look at the BulkTextReplaceValue function
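(x as text) as text =>
let
    // One iteration per row of the conversion table
    maxIterations = Table.RowCount(ConversionTable),
    Iterations = List.Generate(
        // Initial state: apply the first row's replacement to the input
        () => [Result = Replacer.ReplaceValue(x, ConversionTable[OldText]{0}, ConversionTable[NewText]{0}), Counter = 0],
        // Keep going while there are rows left
        each [Counter] < maxIterations,
        // Next state: apply the next row's replacement to the running result
        each [Result = Replacer.ReplaceValue([Result], ConversionTable[OldText]{Counter}, ConversionTable[NewText]{Counter}), Counter = [Counter] + 1],
        // Keep only the running result from each state
        each [Result]
    ),
    // The last item has had every replacement applied
    output = Iterations{maxIterations - 1}
in
    output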
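For each input value (x), which is one cell of the column, we loop through the rows of the provided ConversionTable, replacing the value with the NewText whenever it matches the OldText, using the Replacer.ReplaceValue function. List.Generate keeps the running result after each row, and the last item in the list is the value with every replacement applied.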
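If the List.Generate loop is hard to follow, the same idea can be sketched as a fold with List.Accumulate. This is an alternative formulation, not the function from Miguel's video; the name BulkReplaceWithAccumulate and the inline table contents are hypothetical, and only the column names OldText and NewText come from the function above.

```powerquery
let
    // Hypothetical conversion table with the same column names the function expects
    ConversionTable = #table(
        type table [OldText = text, NewText = text],
        {
            {"Excel", "Microsoft Excel"},
            {"Excel Online", "Microsoft Excel for the web"}
        }
    ),
    // Start from x and apply one whole-cell replacement per conversion row
    BulkReplaceWithAccumulate = (x as text) as text =>
        List.Accumulate(
            Table.ToRecords(ConversionTable),
            x,
            (state, row) => Replacer.ReplaceValue(state, row[OldText], row[NewText])
        ),
    Example = BulkReplaceWithAccumulate("Excel Online")
    // → "Microsoft Excel for the web"
in
    Example
```

The first row leaves "Excel Online" alone because the whole cell is not equal to "Excel"; the second row then matches it exactly.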
We can replace all of the Replace Value query steps in the screenshot above with just one line: use the Table.AddColumn function to invoke the BulkTextReplaceValue function on the column that we need to clean up, [OldName], and provide a new column name, "Result Column Name", for the results.
= Table.AddColumn(#"Previous line identifier", "Result Column Name", each #"BulkTextReplaceValue"([OldName]))
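To try the whole thing without the sample files, the sketch below puts a made-up conversion table, the function and a made-up source table into a single query. In the post, ConversionTable and BulkTextReplaceValue live in their own queries; defining them inline here, along with the sample rows, is an assumption made only to keep the example self-contained.

```powerquery
let
    // Hypothetical conversion table (a separate query in the post)
    ConversionTable = #table(
        type table [OldText = text, NewText = text],
        {
            {"Excel", "Microsoft Excel"},
            {"Excel Online", "Microsoft Excel for the web"}
        }
    ),
    // The function from this post, defined inline for the example
    BulkTextReplaceValue = (x as text) as text =>
        let
            maxIterations = Table.RowCount(ConversionTable),
            Iterations = List.Generate(
                () => [Result = Replacer.ReplaceValue(x, ConversionTable[OldText]{0}, ConversionTable[NewText]{0}), Counter = 0],
                each [Counter] < maxIterations,
                each [Result = Replacer.ReplaceValue([Result], ConversionTable[OldText]{Counter}, ConversionTable[NewText]{Counter}), Counter = [Counter] + 1],
                each [Result]
            ),
            output = Iterations{maxIterations - 1}
        in
            output,
    // Hypothetical data to clean up
    Source = #table(type table [OldName = text], {{"Excel"}, {"Excel Online"}, {"Word"}}),
    // One step instead of one Replace Value step per pair
    Result = Table.AddColumn(Source, "Result Column Name", each BulkTextReplaceValue([OldName]), type text)
in
    Result
```

With these sample rows the new column should read "Microsoft Excel", "Microsoft Excel for the web" and "Word"; values with no matching OldText pass through unchanged.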
Alternatively, you can follow the steps below to accomplish the same thing without using the advanced query editor.