Intro to Statistics

Statistics - the art and science of learning from data. It's a way to quantify uncertainty
Data Science - includes the computational aspects of acquiring, managing, and analyzing data
- Reasoning and computing are both important aspects of data science

Statistical Thinking

Always be aware of the assumptions made: are they justified?
Think about where the data came from
- Is the data a random sample (representative of the whole population)?
  - Survivorship Bias - only the data that survived some selection process are left, so the remaining data are not a random sample
- Are the data and the assumptions correlated?
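
A minimal sketch of survivorship bias, using simulated data. The population, the cutoff of 60, and the seed are all made up for illustration; the point is only that filtering on survival shifts the summary.

```r
set.seed(130)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 scores centred at 50
population <- rnorm(10000, mean = 50, sd = 10)

# Only the "survivors" (scores above 60) remain observable
survivors <- population[population > 60]

mean(population)  # roughly 50
mean(survivors)   # noticeably higher, because the low scores were filtered out
```

Treating `survivors` as if it were a random sample would overestimate the population mean.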

Introduction to `R` and RStudio

Console - executes each line of code

Source - open, view, and edit code and save for later

Environment - list objects (variables, data frames)

Assigning variables: use the <- operator
Comments: use the # symbol
Element - a single piece of data, can be one of several data types:

| Data Type | R Syntax | Description | Examples |
| --- | --- | --- | --- |
| Integer | `int` | Numbers (integers) | -4, -2, 1, 3 |
| Double | `dbl` | Numbers (floats) | 2, 2.02, 22222 |
| Logical | `lgl` | Booleans | TRUE, FALSE |
| Character | `chr` | Strings | “I”, “I love stats” |
| Factor | `fct` | Characters taken only from a pre-specified list | “China” from a list of Asian countries |
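
The assignment operator, comments, and data types above can be tried together in the console. This is a small sketch; the variable names and the list of countries are made up for illustration.

```r
# Assign values with <- ; anything after # is a comment
count <- 3L              # the L suffix makes an integer
height <- 2.02           # double
is_student <- TRUE       # logical
phrase <- "I love stats" # character

# A factor only takes values from a pre-specified set of levels
# (hypothetical list of countries)
country <- factor("China", levels = c("China", "India", "Japan"))

class(count)       # "integer"
typeof(height)     # "double"
class(is_student)  # "logical"
class(phrase)      # "character"
class(country)     # "factor"
```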

Vector - made by grouping values of the same data type (the simplest data structure)
- c() combines single elements into a vector
- Use the is.*() family of functions to check the data type, e.g. is.numeric(), is.character()
- Similar to a list, but every element of a vector must have the same type (a list can mix types)
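
A short sketch of building vectors with c() and checking their type. The values are arbitrary; the list at the end is only there for contrast.

```r
numbers <- c(2, 2.02, 22222)      # numeric vector
words <- c("I", "I love stats")   # character vector

is.numeric(numbers)    # TRUE
is.character(numbers)  # FALSE
is.character(words)    # TRUE
length(numbers)        # 3

# A list, by contrast, can hold elements of different types
mixed <- list(1, "a", TRUE)
```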
R switches between data types automatically (coercion) for some operations
- Logical -> Numeric -> Character | Logical -> Character
```
> sum(c(TRUE, FALSE)) == sum(c(1, 0))
[1] TRUE
> sum(c(TRUE, FALSE))
[1] 1
```
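
The same coercion happens when c() is given mixed types: the result takes the most general type present. A small sketch with arbitrary values:

```r
mixed_num <- c(TRUE, 2)        # logical is promoted to numeric: 1 2
mixed_chr <- c(TRUE, 2, "a")   # everything becomes character: "TRUE" "2" "a"

class(mixed_num)  # "numeric"
class(mixed_chr)  # "character"

# Because TRUE counts as 1 and FALSE as 0, the mean of a logical
# vector is the proportion of TRUE values
mean(c(TRUE, FALSE, TRUE, TRUE))  # 0.75
```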
Data Frame - stores data sets
- Row: individual records; Column: variables
- A data frame can contain multiple types of data, but within a column every cell must be the same type of data
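
A minimal sketch of a data frame built in base R. The names and values are invented; each column is a vector with a single type, and each row is one record.

```r
students <- data.frame(
  name = c("Ana", "Ben", "Chen"),   # character column
  age = c(19L, 21L, 20L),           # integer column
  enrolled = c(TRUE, FALSE, TRUE),  # logical column
  stringsAsFactors = FALSE          # keep text as character in older R versions
)

nrow(students)  # 3 rows (records)
ncol(students)  # 3 columns (variables)

# One data type per column
sapply(students, class)
```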
Packages in R provide collections of functions and data sets in addition to the things ‘base R’ can do
- Function - a shortcut to run a bunch of code
  - Parameter - a named placeholder for an expected input, possibly with a preset default
  - Argument - the input actually provided
- tidyverse - needs to be loaded in every problem set using the library() function
  - read_csv() is a function that loads data in; the resulting object is called a tibble
  - glimpse() prints the number of rows and columns and lists each column's name, data type, and first few values (a summary of the data)
  - head() shows the first few rows of the data (first 6 by default)
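
A sketch of the ideas above, assuming the tidyverse package is installed. The function `rounded_mean` and the CSV contents are hypothetical; the file is written to a temporary location so the example does not depend on any course data.

```r
library(tidyverse)

# `x` and `digits` are parameters; `digits` has a preset default of 1
rounded_mean <- function(x, digits = 1) {
  round(mean(x), digits)
}

rounded_mean(c(1, 2, 4))              # the argument c(1, 2, 4) fills parameter x
rounded_mean(c(1, 2, 4), digits = 2)  # overrides the default for digits

# Write a tiny made-up CSV file, then load it with read_csv()
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,81", "Ben,74", "Chen,90"), csv_path)

grades <- read_csv(csv_path)  # the result is a tibble
glimpse(grades)               # rows, columns, column types, first values
head(grades)                  # first rows (up to 6)
```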
Pipe (%>%) - a tool that makes it easy to apply functions step by step

Ex. The tibble would be piped in as the input to the glimpse() function
- Tip1: read the pipe as “AND THEN” when reading the code
- Tip2: use CMD + SHIFT + M to make a pipe
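
A sketch of the pipe in use, assuming the tidyverse is installed. The tibble is made up, and arrange() and desc() are dplyr functions that these notes do not cover; they are used here only to show a chain of more than one step.

```r
library(tidyverse)

# Hypothetical data for illustration
grades <- tibble(
  name = c("Ana", "Ben", "Chen"),
  score = c(81, 74, 90)
)

# Without the pipe
glimpse(grades)

# With the pipe: take grades AND THEN glimpse it
grades %>% glimpse()

# Longer chains read top to bottom:
# take grades AND THEN sort by score (highest first) AND THEN keep the top 2 rows
top_two <- grades %>%
  arrange(desc(score)) %>%
  head(2)
```

The piped and unpiped calls to glimpse() do the same thing; the pipe simply passes the object on its left as the first argument of the function on its right.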
