Alteryx 101 for Excel Users
You have a data file. You need to manipulate some data in the file (changing or cleaning some data, adding or removing rows or columns). In Excel, this can take hours, if not days. How do you save yourself time by doing it in Alteryx? In this handy crib sheet, we’ve listed the most common manipulations and how you might approach them using Alteryx.
Alteryx is like a big flow chart, so if you can write down the steps or draw your process, then you’ll find building your workflow a lot easier.
You can also download this crib sheet for future reference!
Before we start
We should be aware of the optimal structure of data in Excel or CSV files. Across the first row should be field headings, so that every column contains data about one piece of information. Each row should have data in the correct columns, where each row is a collection of data such as a reading, invoice line, web page visit, or the suchlike.
There should also be no totals and no merged cells. If this is the case, then we’re good to go. If not, don’t worry, we can fix it using Alteryx, using some of these tools, but you may need additional tools such as Dynamic Rename (which allows you to take a row of data and turn it into column names). If you’d like assistance with that, please get in touch.
This is easy to do in Excel: you just rename the fields in column 1.
It’s also easy to do in Alteryx: you use the Select tool (in Preparation).
Change data types
In Excel, maybe you need to add in a ‘ before numbers to keep them as text? Or multiple select and use the right-click menu? In any case, Excel is not really defining the type of a field, it’s only forcing it to be displayed with a specific format.
Alteryx has a defined data type, like a database. If your input files are CSV you’ll definitely need to change types (as they will all come in as text), and if Excel it’s highly likely. You can easily change this, again, using the Select tool.
There are some conversions which are a little more complex. Dates can be awkward, because the Select tool may not be able to understand your preferred date format. To satisfy this, there is a specific DateTime tool (in Parse).
Do you go through columns changing values? Perhaps you apply a quick filter and copy a formula down a table? This is a big risk in Excel, as the copy down can stop if there’s an empty cell to the left. In Alteryx, as in Tableau, you only have to write the formula once, and it will be applied to every row, so you know it will be consistent.
If you are taking data into Tableau, you could use its Alias function, which allows you to take the value of a field and replace it with another. However, this doesn’t work if you need to change multiple original values to a single new value, for instance if you’re cleaning spelling mistakes or adding a higher level of a hierarchy. If that’s the case, then you could create a Group in Tableau.
In Alteryx, the best advice is to create a new field, following the next section.
Create new columns
The most common way of creating new fields in Excel is to insert a new column, and write in a formula. In Alteryx, we do not have to insert the column into our table, we can just add a field to our workflow using the Formula tool (in Preparation). If your new field is calculated on every row (and not, for example, the sum of multiple rows), then creating the field in your workflow is more efficient than building a calculated field in Tableau.
The first category is a simple text-based calculation. As an example, perhaps you need to concatenate first name and surname together. This is a formula which can be replaced by the + symbol in Alteryx. Most other text-based formulae have the same name and syntax, for instance left() or right(), but Alteryx’s manual (press F1) will help you identify those which don’t.
Text To Columns Tool
If you’re splitting a field with a regular format (for instance, maybe splitting a comma separated address field) you could use Text To Columns (on Parse) or the more complex RegEx to break your field down by a certain character.
Vlookups and Joins
Then there’s the vlookup() formula, which is really a database left join, but it will only find the first matching value and keep one row of data. Alteryx’s closest replacement is the Find Replace tool (in Join), which does exactly the same thing.
If you need multiple matches to have their own rows, then a join would be better. You can input two data sets and join them using either the Join or the Join Multiple tool in Alteryx, or if you’re connecting to a compatible data source, you can actually join them in the Input tool. There is more information on Joins on our blog, by Laszlo Zsom.
If you only want to replace certain values, for instance only change incorrect spellings, you could also look at Find Replace (in Join) which allows you to keep original values unless they’re in your list.
You have just created your nice calculated field and want to delete the original. In Excel you cannot just delete the original without having to copy and paste values, otherwise your formula will break. In Alteryx this is not a problem. Just use the Select tool and untick the box next to anything you don’t need, and it will be removed from your workflow. This is great if your workflow has large volumes of data passing through it.
This is either a manual selection in Excel, or using a quick filter to find what you don’t need. If you can list the criteria then you can use Alteryx’s Filter tool (in Preparation). If it’s complicated, you can use a formula in this tool, so you can build in IF expressions or use the switch() function.
Alteryx also has a really useful Sample tool (in Preparation) which allows you to limit the number of rows. This could be useful if you are looking for a random selection or only want to use the first few rows whilst you test whether your module works.
Grouping rows and aggregating
If you have too many rows, and want to summarise data, in Excel you would have to build a pivot table.
In Alteryx that’s also an option using the Cross Tab tool (in Transform), but we can also use the Group tool (in Transform) to list unique combinations of dimensions and sums or counts (to name two) for numbers.
Append rows or Union
Is there an easy way in Excel which isn’t copy and paste?
There is in Alteryx. It’s called the Union tool (in Join), and it allows you to combine multiple workflows based on the field names or maintaining the position of each column.
There may be some situations when you need your data to be in a specific order, for instance, if you need to do a Multi Row Formula (there’s a tool for that too in Preparation) for, say, a running total. The Sort tool (in Preparation) allows you to organise your data how you want it.
The order of your fields is not important if you’re bringing your data into Tableau, but if you wish to reorder your columns then you can do so using the Select tool.
The above should cover all the basic requirements of your data transformation, but of course Alteryx can do a lot more.
If you want a printable copy of this blog post, please download the crib sheet.
The post Alteryx 101 for Excel Users appeared first on The Information Lab.