# DATA EXTRACTION

## Extract Characters From One Cell

All formulas assume data is in cell A1
Substitute with today() if needed

 Extract Text (by Location) Left of the Character =LEFT(A1,(FIND(",",A1,1)-1)) Right of the Character =MID(A1,FIND(",",A1)+1,256) Before Second Instance of Symbol =LEFT(A1,FIND("^^",SUBSTITUTE(A1,",","^^",2))-1) First Word in a Cell =LEFT(A1,FIND(" ",A1)-1) Second Word in a Cell =TRIM(MID(SUBSTITUTE(A5," ",REPT(" ",99)),100,99)) First two words in a cell =TRIM(LEFT(SUBSTITUTE(A1," ",REPT(" ",100)),200)) First three words in a cell =LEFT(A1,FIND("^",SUBSTITUTE(A1&" "," ","^",3))-1) Last word in a cell =RIGHT(A1,LEN(A1)-FIND("*",SUBSTITUTE(A1," ","*",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))) Last two words in a cell =MID(A1,FIND("@",SUBSTITUTE(A1," ","@",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))-1))+1,255) Last three words in a cell =TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",60)),180)) Nth word from a cell - example to extract 4th word from A1 =MID(MID(MID(SUBSTITUTE(A1," ","^",3),1,256),FIND("^",SUBSTITUTE(A1," ","^",3)),256),2,FIND(" ",MID(MID(SUBSTITUTE(A1," ","^",3),1,256),FIND("^",SUBSTITUTE(A1," ","^",3)),256))-2) Last N characters from a cell - example last 4 characters =RIGHT(TRIM(A1),4) First N characters from a cell - example first 4 characters =LEFT(TRIM(A1),4) Data Between Commas' =SUBSTITUTE(MID(SUBSTITUTE("," & A1&REPT(" ",6),",",REPT(",",255)),2*255,255),",","") Data Between Characters - example parentheses =MID(A1,FIND("(",A1)+1,FIND(")",A1)-FIND("(",A1)-1) Data Between Quotes =MID(A1,(FIND("""",A1,1)+1),(FIND("""",A1,(FIND("""",A1,1)+1)))-(FIND("""",A1,1)+1)) Data After 2nd Instance of Character - example underscore =MID(A1, FIND("_", A1, FIND("_", A1)+1)+1,256) Remove Text The last word in a cell =IF(ISNUMBER(SEARCH(" ",TRIM(A1))),TRIM(SUBSTITUTE(A1,TRIM(RIGHT(SUBSTITUTE(TRIM(A1)," ",REPT(" ",99)),99)),"")),A1) The first two words in a cell =MID(A1,1+FIND("~",SUBSTITUTE(A1," ","~",2)),255) Extract Dates Extract last day of a Date's month Assuming A1 is 1/10/2012 =EOMONTH(A1,0) = 1/31/2012 Extract next month's end date Assuming A1 is 1/10/2012 =EOMONTH(A1,1) = 2/29/2012 Extract last day of a Date's quarter Assuming A1 is 1/10/2012 =DATE(YEAR(A1),CEILING(MONTH(A1),3)+1,0) = 3/31/2012 Extract first day of a Date's quarter Assuming A1 is 1/10/2012 =DATE(YEAR(A1),FLOOR(MONTH(A1)-1,3)+1,1) = 1/1/2012 Relative Dates Dates cannot be stored as text for relative formulas to work To convert text to date = DateValue(A1) Combine these two formulas with any relative date formula below ⍟ Finding The Past Closest Date To Today {=MAX((\$A\$2:\$A\$100TODAY(),A2:A100))} array formula Looks in column A2:A100 for today's date or later. Current Year =YEAR(A1) Current Qtr - with Year =YEAR([Billing Date])&" - Q"&TEXT(IF(MONTH(A1)>9,4,IF(MONTH(A1)>6,3,IF(MONTH(A1)>3,2,1))),"#") Result is "2017 - Q1" Current QE Date =DATE(YEAR(A1),CEILING(MONTH(A1),3)+1,0) Current Qtr Number =IF(MONTH(A1)>9,4,IF(MONTH(A1)>6,3,IF(MONTH(A1)>3,2,1))) Result will be a single digit 1,2,3,4 Current Month =MONTH(A1) Current ME Date =EOMONTH(A1,0) Current BOM =EOMONTH(A1,0)+1 Previous YE =DATE(YEAR(A1)-1,12,31) Assume A1 is 6/30/2015, result will be 12/31/2014 Previous QE =EOMONTH(A1,MOD(-MONTH(A1),3)-3) Assume A1 is 6/30/2015, result will be 3/31/2015 Previous ME =EOMONTH(A1,-1) Previous BOM =EOMONTH(A1,-1)+1 Previous Friday =A1-WEEKDAY(A1-1) Next Month's start date =EOMONTH(A1,0)+1 Calculate number of workdays between dates =NETWORKDAYS(start_date,end_date,holidays) =NETWORKDAYS("11/1/2015","11/7/2015",0) Add business days to date =WORKDAY(start_date,days,holidays) =WORKDAY("11/1/2015",10,0) = 11/13/2015 Add Months to date =EDATE(start_date,#months) =EDATE("11/1/2015",1) = 12/1/2015

## Reference Last Populated Cell in Range

Range is only text values
`=INDEX(range,MATCH(REPT("z",255),range)) `

`=INDEX(A:A,MATCH(REPT("z",255),A:A)) `

Range is only numeric values
`=INDEX(range,MATCH(9.99999999999999E+307,range))`

`=INDEX(1:1,MATCH(9.99999999999999E+307,1:1))` Last in Row
`=INDEX(A1:A10,MATCH(9.99999999999999E+307,A1:A10))` Last in Column
`=INDEX(A:Z,MATCH(9.99999999999999E+307,Z:Z)-1)` 2nd Last in Column
`=INDEX(A:A,MATCH(9.99999999999999E+307,A:A)-2)` 3rd Last in Column

Range is both text and numeric values
`=INDEX(range,MAX(MATCH(9.99999999999999E+307,range),MATCH(REPT("z",255),range)))`

`=INDEX(A:A,MAX(MATCH(9.99999999999999E+307,A:A),MATCH(REPT("z",255),A:A)))`

Last Value in Table column
https://exceljet.net/formula/get-value-of-last-non-empty-cell

## Extract Largest Number in Range

Largest Number
`=MAX(A1:A50)`

Xth Largest Number
`=LARGE(A1:A50,5)` 5th largest in range

## Replace Character

Find "-" and remove the dash
`=SUBSTITUTE(A1,"-","",1)`

## Replace Characters in Range

Replace last N digits
`=REPLACE(A1,3,2,"10")` A1 = 2009 then formula changes to 2010

## Extract first 3 words

`=LEFT(A1,FIND("^",SUBSTITUTE(A1&" "," ","^",3))-1)`
Substitute looks for 3rd space in A1 string and adds ^
Find -1 character before the ^
Extract text left of ^

`=SUBSTITUTE (within_text, old_text, new_text, [instance])`
`=LEFT(within_text,[number of characters to extract])`
`=FIND (find_text, within_text, [start_num])`

## XLookup

Lookup Multiple Criteria
`XLOOKUP(1, (criteria_range1=criteria1) * (criteria_range2=criteria2) * (…), return_array)`

Two-Dimension Lookup
`XLOOKUP(lookup_value1, lookup_array1, XLOOKUP(lookup_value2, lookup_array2, data_values))`

AbleBits-Xlookup Reference

## Index / Match Formulas

A better way to "vlookup". The index/match formula allows you to "vlookup" to the left and right of the matching column and the lookup column is defined instead of a count.

 Index Function INDEX takes a cell range and specified row or column # to provide the data in that cell =INDEX(range, row_or_column) =INDEX(A1:A3, 3) Result: data in cell A3 Match Function Returns the position of a cell (row # or column #) within a range by matching against a criteria. =MATCH(lookup_value, lookup_range, match_type) =MATCH("Jeff", A1:A3, 0) for row # or =MATCH("Jeff",A1:C1, 0) for column # Result: row or cell number where "Jeff" is located
 Data Type Match Matching columns need to be the same data type. Add the following to any formula: Number to Text = TEXT(A1,"0") Text to Number = Value(A1) Text to Date = DateValue(A1)

### Based on 1 Criteria

 ```=INDEX(ResultRange, MATCH(lookup_value, lookup_range, match_type)) =INDEX(\$A\$2:\$A\$4,MATCH(A6,\$B\$2:\$B\$4,0))```

The MATCH function, MATCH(A6,\$B\$2:\$B\$4,0), returns 2, which is the position of "Jacket" in the list.
Then, the INDEX function, INDEX(\$A\$2:\$A\$4,2), returns "JK002", which is the second item in the range \$A\$2:\$A\$4.

### Based on 2 Criteria

```=INDEX(ResultRange, MATCH(1,(Value1=Range1)*(Value2=Range2),0))
=INDEX(\$D\$2:\$D\$10,MATCH(1,(A13=\$B\$2:\$B\$10)*(B13=\$C\$2:\$C\$10),0))```
 Array Formula: ARRAY FORMULA: Ctrl + Shift + Enter

### Based on Matrix

Lookup based on horizontal and vertical row/column.

`= INDEX( entire matrix , MATCH( vertical lookup value, entire left hand lookup column , 0 ) , MATCH ( horizontal lookup value, entire top header row, 0 ) )`
 The MATCH has to list the vertical (column) lookup before the horizontal (row)

## Search Cell For Specific Character

Search cell A1 for "text"

`=IF(ISNUMBER(SEARCH("lookup_text",A1)), "Yes", "No")`

Remove if statement for true/false

Case-sensitive, you can replace the SEARCH function with the FIND function:

`=ISNUMBER(FIND(text, A1))`

## Remove Spaces From Cells

Removes all spaces (leading & trailing) from text except for single spaces between words.
`=TRIM(A1)`

Remove spaces and line breaks in text, use formula:
`=TRIM(CLEAN(A1))`

Remove all spaces and non-breaking characters
`=TRIM(CLEAN(SUBSTITUTE(A1,CHAR(160)," ")))`

Macro to Remove Trailing Spaces
Selects all cells A2:A10 and removes any spaces after text

```Sheets("WorksheetName").Select
Range("A2:A100").Select
Sub NoSpaces()
Dim a As Range
For Each a In Selection.Cells
a = Trim(a) Next
End Sub```

## Extract List Of Unique Values From Column

` {=IFERROR(INDEX(\$D\$2:\$D\$7, MATCH(0,COUNTIF(\$E\$1:E1, \$D\$2:\$D\$7), 0)),"")}`
` Array Formula =IFERROR(Index ( Column with Duplicates, MATCH (0, COUNTIF( Cell above start of unique column, Column w/ Duplicates), 0)),"")}`

Extend Office - Dynamic List of Unique Values

## Extract Specific Cells To Make New List

Formula extracts values that meet the criteria in the first table.

Extract with multiple criteria
[Table1] is the first table in cell A1

`=IFERROR(INDEX(Table1[Guest],SMALL(IF((Table1[Attending]="Y")*(Table1[Donation]=50)),ROW(Table1[Guest])),ROW(1:1))-1,1),"") `
You can add more or fewer criteria to the bold section of the formula with "* (LookupColumn = Criteria)".

Extract using contains in criteria

Source: Excel-Bytes

# DATA COMPARISON

## Check For Duplication In Column

Paste formula in adjacent column and fill down. Looks to see if A1 is repeated.

```WHOLE COLUMN
=IF(COUNTIF(A:A,A1)>1,"Duplicated!","")

PARTIAL COLUMN
=IF(COUNTIF(\$A1:\$A10,A1)>1,"Duplicated!","")```

## Compare Two Columns

Returns True/False if columns match

`=EXACT([Column1],[Column2])`

## Compare With Conditional Formating

Rule Type: Use a formula to determine which cells to format

Compare two columns and highlight matching cells
` =match(\$A1,\$B1:\$B50,0) `

Highlight non-matching cells
` =ISNA(match(\$A1,\$B1:\$B50,0)) `

# FUNCTIONS

Functions are predefined formulas that are available in Excel.

## Filter (Dynamic Array)

This function is being released slowly to different versions of Excel through 2020.

`FILTER(array, include, [if_empty])`
array – the cells returning the data. Excel will "spill" the data into ajacent cells if you are requesting a range be returned.
include – the filtering criteria. The height or width of the filtering cells should match the array cells.
[if_empty] – what should be displayed if the filer return is empty. Enter double-quotes for excel to leave blank.

Multiple Criteria using AND `=FILTER(array,(A3:A5<>"CA")*(B3:B5>16),"")`
Multiple Criteria using OR `=FILTER(array,(A3:A5<>"CA")+(B3:B5>16),"")`
Note that you can add more than two criteria by adding or multipling another set
Reference the full array in another formula by adding # ` =\$D\$4#`
Excel Filter

## Filter with SortBy

Filter 2 columns, return multiple columns
`=FILTER(SORTBY([RangeToReturn],[SortColumn1],1),(SORTBY([FilerColumn1],[SortColumn1],1)=XX)*(SORTBY([FilerColumn2],[SortColumn1],1)=XX),"")`

[RangeToReturn] can be a single column or multiple.

`=Choose({1,2,3},[ColumnName1],[ColumnName2],[ColumnName3])`
Which can be combined with the filter
`=Filter(SORTBY(Choose({1,2},[ReturnColumn1],[ReturnColumn2]),[SortColumn1],1),...`

## Filter in Data Validation List

You can reference the first cell of a spilled column by adding a "#" after the cell reference.
Previously you would have to manually add/remove any new options to the list.

The data validation list references the spilled column in D4. A better method would be to use =UNIQUE(Table1[State]) in D4 instead of the filter (shown below) to remove duplicates.

## Functions on Filtered Table

The SUBTOTAL formula must be used in conjunction with functions (SUM, AVG, COUNT) on a filtered table, otherwise the result will include the hidden filtered rows.

` =SUBTOTAL(function_num, range) `

Function_Num List

Function Function_num
(filtered table)
Function_num
(manually hidden row)
AVERAGE 1 =SUBTOTAL(1, range) 101 =SUBTOTAL(101, range)
COUNT 2 =SUBTOTAL(2, range) 102 =SUBTOTAL(102, range)
COUNTA 3 =SUBTOTAL(3, range) 103 =SUBTOTAL(103, range)
MAX 4 =SUBTOTAL(4, range) 104 =SUBTOTAL(104, range)
MIN 5 =SUBTOTAL(5, range) 105 =SUBTOTAL(105, range)
PRODUCT 6 =SUBTOTAL(6, range) 106 =SUBTOTAL(106, range)
STDEV 7 =SUBTOTAL(7, range) 107 =SUBTOTAL(107, range)
STDEVP 8 =SUBTOTAL(8, range) 108 =SUBTOTAL(108, range)
SUM 9 =SUBTOTAL(9, range) 109 =SUBTOTAL(109, range)

## Filtered Table - Extract value in column

Assumes table header is in row 1 and filtered column in column A
Row A2 would be your first unfiltered data row in the table worksheet

```Array Formula
{=INDEX(Table[Col1],MIN(IF(SUBTOTAL(3,OFFSET(Worksheet!A2,ROW(Table[Col1])-ROW(A2),0)),ROW(Table[Col1])-ROW(Worksheet!A2)+1)))}```

## Sum Smallest N Values

` =SUMPRODUCT(SMALL(rng,{1,2,n})) `

Sum bottom 3 values
` =SUMPRODUCT(SMALL(A1:A13,{1,2,3})) `

## Sum Largest N Values

`=SUMPRODUCT(LARGE(rng,{1,2,N}))`

Sum top 3 values
`=SUMPRODUCT(LARGE(A1:A13,{1,2,3}))`

## Count/Sum Cells with Errors

```=SUMPRODUCT(--ISERR(A1:A10))
ISERR counts all errors except #N/A

=SUMPRODUCT(--ISERROR(A1:A10))
If you want to also count #N/A, use the ISERROR function instead of ISERR

{=SUM(--ISERROR(B4:B8))}
Change to SUM instead of count and make an array formula.```

## Averaging

#### Weighted Average

`=SUMPRODUCT(NumbersRange,WeightsRange)/SUM(WeightsRange) `

Column A = values, Column B = weights
`=SUMPRODUCT(A1:A10,B1:B10)/SUM(B1:B10) `

#### Average Top N Values

`=AVERAGE(LARGE(range,{1,2,n}))`

Average top 3 values
`=AVERAGE(LARGE(A1:A16,{1,2,3}))`

#### Average (exclude zero value)

`=AVERAGEIF(range,"<>0")`

`=AVERAGEIF(A1:A10,"<>0")`

## Sum Last N Rows

Formula will sum the last 10 rows of a column. Allows you to keep adding data and not change formula.

```// Data starts in A1 - put formula in another column
=sum(offset(\$a\$1,count(a:a)-10,0,10))```

## IF Statements

Excel 2016 introduced IFS
`IFS([Something is True1, Value if True1, [Something is True2, Value if True2],…[Something is True127, Value if True127])`

IF(s) Wildcard / Contains
A1 = Houston Astros
=FIND("Astros", [Text2]4)

The result would be 8 ("John" starts at the 8th character. )

Using that, we can add an IF statement to it...

=IF(FIND("Astros", [A1]4) > 0, "YES", "NO")

Now, this will show YES if the formula finds "John" in the referenced cell.
SUMIF(s) and COUNTIF(s)

Searches
No quotes necessary if referencing a cell for criteria
Assume Houston is in cell A1

Text Search Cell Reference Explanation
"*" True if cell contains any text (not number)
"Houston" A1 Cell must equal "Houston" or A1
"<>Houston" "<>"&A1 Cell not equal to "Houston"
"Houston*" A1&"*" *after word allows for wildcard after, but not before. Counts "Houston Astros" but not "Go Houston".
"*Houston" "*"&A1 *before word allows for wildcard before, not after word
"*Houston*" "*"&A1&"*" searches for word/string of characters anywhere. Can be part of another word "HoustonTX"
"Bl?ck" Any single character. "Bl?ck" finds either "Black" or "Block"

Full example: =SUMIF(range,"*"&A1&"*",sum_range)

SUMIFS with an OR statement

```=SUM(SUMIFS(sum_range,criteria_range1,{"criteria","or-criteria","or-criteria"}))
or with wildcard
=SUM(SUMIFS(sum_range,criteria_range,{"*red*","*blue*"}))```

## Max IFs

Find max value based on multiple criteria

```{=MAX(IF(rng1=criteria1,IF(rng2=criteria2,values)))}  For excel 2013
=MAXIFS(price,color,G6,item,H6)                       For excel 2016+```

## IF OR statement

`=IF(OR(A1>1,B1>2), "Pass", "Fail")`

or

`={OR(A1:A10>1}`

# MISC

## Count Validation Errors

Returns total number of cells that don't match the validation data set
Formula: (Total Cells in Range) - (Total Validated Cells) = (Total Non-Validated Cells) `={COUNTA(range)-SUM(COUNTIFS(range,LookupTable!A:A))}`
The range would be the column with validation cells and the lookup table are the acceptable values.

Alternatively, you could list each value:
`SUM(COUNTIFS(range,{"Black","Blue","Green"}))`

Source: Excel Jet

## Convert Negative Numbers to Positive (ABS)

Using the Absolute Value is helpful when determining the change in value when working with negative numbers
`=ABS(A1)`
`=(new_value - prior_value) / ABS(prior_value)`

## Number Formatting

\$8,765,432
8M `[>999999] #,,"M";#,"K"`
8.7M `[>999999] #.#,,"M";#,"K"`

## Rounding

Round to nearest 1,000 `=ROUND(A1,-3)`
Round to whole number `=ROUND(A1, 0)`
Round to nearest multiple of number `=MROUND(A1,0)`

## Reference List in Table Column

Select List, source:

`=INDIRECT("Table1[First Name]")`

You are referencing Table1[First Name] in Table 2.
In the first cell of Table2 column, Data/Data Validation
http://www.get-digital-help.com/2012/10/15/how-to-use-a-table-name-in-data-validation-lists-and-conditional-formatting-formulas/

## Text in Formula

```// A4 is the date
="As of " &TEXT(A4,"mmmm d, yyyy")
// Result "As of December 31, 2015"```

# SHORTCUT KEYS

CTRL + ALT + F9 = Recalculate all worksheets & formulas
ALT + ENTER = Line break in cell
CTRL + R = Fills based on cell to left
CTRL + D = Fills based on cell above
CTRL + ENTER = Fill entire range

CTRL + SHFT + SPACEBAR = Selects entire table
CTRL + T or L = Creates table (after selecting data)

CTRL + End = Last cell in worksheet
CTRL + HOME = First cell in worksheet
CTRL + Arrow = Last cell with data in any arrow direction
SHIFT + SPACE = Select entire row
CTRL + SPACE = Select entire column
CTRL + PLUS = Creates row or column (after selecting)

CTRL + F = Search
CTRL + PgDn/PgUp = Toggle between worksheets
CTRL + Tab = Toggle between workbooks

CTRL + ~ = Reveal all formulas
CTRL + SHIFT + ! = Formats cell to number with 2 decimal places CTRL + SHIFT + % = Formats cell to %

Click to Expand this section...

# COMMON ERRORS

## Array Formula

Inside formula bar `(CNTL+SHIFT+ENTER) `

## Numbers Stored as Text

A common error when using Vlookup/Index Match formulas is that one column data type is set to text and the other is a number.
Solution 1: Highlight the column, Data (Ribbon) / Text To Columns or Number
Solution 2: Use power query to change column type and create a new table
Solution 3: Create a new column and change type with formula `=Value(A1)` (assumes your data starts in A1, then fill down)

## Clearing Formula Errors

`=iferror((yourformula),returnIfError)`
If the calculation A2+A3 results in an #NA, then return a blank
`=iferror((A2+A3),"")`

## % Change in Value From 0

2016 2017
\$0 \$500

Standard formula (\$500-\$0)/\$0 results in NaN.