FE581 – Topics: R [Python and R for the Modern Data Scientist]

A Python:R Bilingual Dictionary

- Package Management
  - Installing a single package
  - Installing specific package versions
  - Installing multiple packages
  - Loading packages
- Assign Operators
- Types
- Arithmetic Operators
- Attributes
- Keywords
- Functions and Methods
- Style and Naming Conventions
- Analogous Data Storage Objects
- Data Frames
- Logical Expressions
  - Relational operators
  - Logical operators
- R for Pythonistas
Users often forget that Jupyter is short for “Julia, Python, and R” because it’s very Python-centric.
One last note: Python users refer to themselves as Pythonistas, which is a really cool name! There’s no real equivalent in R, and they also don’t get a really cool animal, but that’s life when you’re a single-letter language. R users are typically called…wait for it…useRs! (Exclamation optional.) Indeed, the official annual conference is called useR! (exclamation obligatory), and the publisher Springer has an ongoing and very excellent series of books of the same name.
Installing a single package:

R:

```r
install.packages('tidyverse')
```

Python:

```shell
pip install pandas
```

Installing specific package versions:

R:

```r
devtools::install_version(
  "ggmap",
  version = "3.5.2"
)
```

Python:

```shell
pip install pandas==1.1.0
```

Installing multiple packages:

R:

```r
install.packages(c("sf", "ggmap"))
```

Python:

```shell
pip install pandas scikit-learn seaborn
pip install -r requirements.txt
```

Loading packages:

R:

```r
# Multiple calls to library()
library(MASS)
library(nlme)
library(psych)
library(sf)

# Install if not already available:
if (!require(readr)) {
  install.packages("readr")
  library(readr)
}

# Check, install if necessary, and
# load single or multiple packages:
pacman::p_load(MASS, nlme, psych, sf)
```

Python:

```python
# Full package
import math
# Full package with alias
import pandas as pd
# Module
from sklearn import datasets
# Module with alias
import statsmodels.api as sm
# Function
from statsmodels.formula.api import ols
# For ordinary least squares regression
```

Assignment operators:

R has many assignment operators (`<-`, `<<-`, `->`, `->>`, and `=`), and they are not pretty.
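For comparison, Python gets by with `=` and a few variations on it. The sketch below is illustrative only: the variable names are made up for this example, and the assignment expression at the end assumes Python 3.8 or later.

```python
# Plain assignment
x = 10

# Chained assignment: both names are bound to the same value
a = b = 0

# Tuple unpacking assigns several names at once
first, second = 1, 2

# Augmented assignment updates the name using its current value
x += 5

# Assignment expression ("walrus", Python 3.8+): assigns and returns the value
values = [584, 1054, 653]
if (n := len(values)) > 2:
    message = f"{n} values"
```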
The four most common user-defined atomic-vector types in R:
| Type | Data frame shorthand | Tibble shorthand | Description | Example |
|---|---|---|---|---|
| Logical | logi | <lgl> | Binary data | TRUE/FALSE, T/F, 1/0 |
| Integer | int | <int> | Whole numbers | 7, 9, 2, -4 |
| Double | num | <dbl> | Real numbers | 3.14, 2.78, 6.45 |
| Character | chr | <chr> | All alpha-numeric characters, including white spaces | “Apple,” “Dog” |
The four most common user-defined types in Python:
| Type | Shorthand | Description | Example |
|---|---|---|---|
| Boolean | bool | Binary Data | True/False |
| Integer | int | Whole numbers | 7, 9, 2, -4 |
| Float | float | Real numbers | 3.14, 2.78, 6.45 |
| String | str | All alpha-numeric characters, including white spaces | “Apple,” “Dog” |
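A quick way to check these types is `type()`. The following is a minimal sketch using one example value per row of the table; the variable names are chosen only for illustration.

```python
# One example value per type from the table above
examples = [True, 7, 3.14, "Apple"]

# type() reports the class of each value
type_names = [type(value).__name__ for value in examples]
print(type_names)  # ['bool', 'int', 'float', 'str']

# bool is a subclass of int, so True and False also behave like 1 and 0
print(isinstance(True, int))  # True
print(True + True)            # 2

# Mixing int and float in arithmetic yields a float
print(type(7 + 3.14).__name__)  # float
```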
Common arithmetic operators:
| Description | R Operator | Python Operator |
|---|---|---|
| Addition | + | + |
| Subtraction | - | - |
| Multiplication | * | * |
| Division (float) | / | / |
| Exponentiation | ^ or ** | ** |
| Integer Division (floor) | %/% | // |
| Modulus | %% | % |
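The Python column of this table can be tried directly. This is a small sketch with arbitrary operands; it also shows two details that are easy to trip over when coming from R: floor division rounds toward negative infinity, and `^` is not exponentiation in Python.

```python
a, b = 7, 2

print(a / b)    # 3.5  (true division always returns a float)
print(a // b)   # 3    (floor division)
print(a % b)    # 1    (modulus)
print(a ** b)   # 49   (exponentiation)

# Floor division rounds toward negative infinity, not toward zero
print(-a // b)  # -4
print(-a % b)   # 1

# ^ is bitwise XOR in Python, not exponentiation
print(a ^ b)    # 5
```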
Class attributes:
R:

```r
# List attributes
attributes(df)

# Access functions
dim(df)
names(df)
class(df)
comment(df)

# Add comment
comment(df) <- "new info"

# Add custom attribute
attr(df, "custom") <- "alt info"
attributes(df)$custom
```

Python:

```python
# Definition of a class
class Food:
    name = 'toast'

# An instance of a class
breakfast = Food()

# An attribute of the class
# inherited by the instance
breakfast.name

# Setting an attribute
breakfast.name = 'simis'
# setattr(breakfast, 'name', 'simis')
```

Reserved words and keywords:
Reserved words (keywords) cannot be used as variable names.
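On the Python side, this restriction can be checked with the standard `keyword` module. The snippet below is a minimal sketch; the strings tested and the `assignment_allowed` flag are chosen only for illustration.

```python
import keyword

# iskeyword() reports whether a string is a reserved word
print(keyword.iskeyword("class"))  # True
print(keyword.iskeyword("food"))   # False

# Trying to assign to a keyword fails when the code is compiled
try:
    compile("class = 'toast'", "<example>", "exec")
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False

print(assignment_allowed)  # False
```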
R:

```r
?reserved
# if, else, repeat, while, function,
# for, in, next, break, TRUE, FALSE,
# NULL, Inf, NaN, NA, NA_integer_,
# NA_real_, NA_complex_, NA_character_,
# ... (..1, ..2, etc.)
```

Python:

```python
# Python keywords
import keyword
print(keyword.kwlist)
## ['False', 'None', 'True', 'and',
## 'as', 'assert', 'async', 'await',
## 'break', 'class', 'continue', 'def',
## 'del', 'elif', 'else', 'except',
## 'finally', 'for', 'from', 'global',
## 'if', 'import', 'in', 'is', 'lambda',
## 'nonlocal', 'not', 'or', 'pass',
## 'raise', 'return', 'try', 'while',
## 'with', 'yield']
```

Functions and methods:

R:

```r
# Basic definition
myFunc <- function(x, ...){
  x*10
}
myFunc(4)
## 40
```

```r
# Multiple unnamed arguments
myFunc <- function(...){
  sum(...)
}
myFunc(100,200,300)
## 600
```

Python:

```python
# Simple definition
def my_func(x):
    return x * 10

my_func(4)
```

```python
# Multiple positional (unnamed) arguments, collected in a tuple
def my_func(*x):
    return x[2]

my_func(100,200,300)
## 300
```

```python
# Multiple keyword (named) arguments, collected in a dict
def my_func(**num):
    print("x: ", num['x'])
    print("y: ", num['y'])

my_func(x=40, y=100)
## x:  40
## y:  100
```
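The two collecting forms can also be combined in a single signature. The sketch below uses hypothetical function names (`describe`, `my_sum`) to show how Python separates the arguments of a call, and how the R `sum(...)` example might be written.

```python
def describe(first, *args, **kwargs):
    """Return the pieces Python separated out of a call."""
    return first, args, kwargs

first, rest, named = describe(100, 200, 300, x=40, y=100)
print(first)  # 100
print(rest)   # (200, 300)
print(named)  # {'x': 40, 'y': 100}

# A close match to the R function that passes ... on to sum()
def my_sum(*args):
    return sum(args)

print(my_sum(100, 200, 300))  # 600
```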
Style in R is generally more loosely defined than in Python. Nonetheless, see the Advanced R style guide by Hadley Wickham (CRC Press) or Google’s R Style Guide for suggestions.
For Python, see the PEP 8 style guide.
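The main PEP 8 naming conventions fit in a few lines. The class, method, and constant below are hypothetical, and treating the distance as kilometres is an assumption made only for this example.

```python
# Constants: UPPER_CASE_WITH_UNDERSCORES
KM_PER_MILE = 1.609344

# Classes: CapWords
class CityDistance:
    def __init__(self, name, dist_km):
        # Variables and attributes: lowercase_with_underscores
        self.name = name
        self.dist_km = dist_km

    # Functions and methods: lowercase_with_underscores
    def dist_miles(self):
        return self.dist_km / KM_PER_MILE

paris = CityDistance("Paris", 1054)
print(round(paris.dist_miles(), 1))
```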
Analogous Python objects for common R objects:
| R Structure | Python Analogous Structure |
|---|---|
| Vector (one-dimensional, homogeneous) | ndarray, but also scalars, homogeneous list and tuple |
| Vector, matrix or array | NumPy n-dimensional array (ndarray) |
| Unnamed list (heterogeneous) | list |
| Named list (heterogeneous) | Dictionary (dict), but without positional indexing |
| Environment (named, but unordered elements) | Dictionary (dict) |
| Variable/column in a data.frame | Pandas Series (pd.Series) |
| Two-dimensional data.frame | Pandas data frame (pd.DataFrame) |
Analogous R objects for common Python objects:
| Python Structure | R Analogous Structure |
|---|---|
| scalar | One-element long vector |
| list (homogeneous) | Vector, but as if lacking vectorization |
| list (heterogeneous) | Unnamed list |
| tuple (immutable) | Vector, list as separated output from a function |
| Dictionary (dict), key-value pairs | Named list or, better, environment |
| NumPy n-dimensional array (ndarray) | Vector, matrix, or array |
| Pandas Series | Vector, variable/column in a data.frame |
| Pandas Data Frame | Two-dimensional data.frame |
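The remark that a homogeneous Python list behaves like a vector "lacking vectorization" can be seen in a short sketch. The values are taken from the distance examples in these notes; the other names are illustrative. A NumPy ndarray would multiply element-wise instead.

```python
dist = [584, 1054, 653]

# Multiplying a list repeats it; it does not multiply each element
repeated = dist * 2
print(repeated)  # [584, 1054, 653, 584, 1054, 653]

# Element-wise work on a plain list needs an explicit loop or comprehension
doubled = [d * 2 for d in dist]
print(doubled)   # [1168, 2108, 1306]

# A tuple holds the same values but cannot be modified after creation
dist_t = tuple(dist)
try:
    dist_t[0] = 0
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False
print(tuple_is_mutable)  # False
```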
R:

```r
# Vectors
cities <- c("Istanbul", "Urumqi", "Almaty")
dist <- c(584,1054,653)
## 'Istanbul' 'Urumqi' 'Almaty'
## 584 1054 653
```

Python:

```python
# Lists
cities = ['Istanbul', 'Berlin', "Korla"]
dist = [584, 1054, 653]
## ['Istanbul', 'Berlin', 'Korla']
## [584, 1054, 653]
```
One-dimensional, heterogeneous key-value pairs (lists in R, dictionaries in Python):
```r
# A list of data frames
cities <- list(
  Munich = data.frame(
    dist=584,
    pop=143275,
    country="DE"
  ),
  Istanbul = data.frame(
    dist=5584,
    pop=1423275,
    country="TR"
  ),
  Almaty = data.frame(
    dist=84,
    pop=275,
    country="KZ"
  )
)
```

```r
# As a list object
cities[1]
```

```r
cities['Istanbul']
```

```r
# As a data.frame object (table)
cities[[1]]
```

A data.frame: 1 × 3
| dist | pop | country |
|---|---|---|
| 584 | 143275 | DE |
```r
cities$Almaty
```

A data.frame: 1 × 3
| dist | pop | country |
|---|---|---|
| 84 | 275 | KZ |
```r
# A list of heterogeneous data
lm_list <- lm(weight ~ group, data = PlantGrowth)
length(lm_list)
## 13
names(lm_list)
## 'coefficients' 'residuals' 'effects' 'rank' 'fitted.values'
## 'assign' 'qr' 'df.residual' 'contrasts' 'xlevels' 'call'
## 'terms' 'model'
```
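On the Python side, a rough counterpart to the named R list of city records is a dictionary of dictionaries. This is a sketch rather than an exact equivalent: the inner dictionaries stand in for the one-row data frames, and the `has_positional_access` flag is introduced only for illustration.

```python
# A dictionary of dictionaries, mirroring the R list of data frames
cities = {
    "Munich": {"dist": 584, "pop": 143275, "country": "DE"},
    "Istanbul": {"dist": 5584, "pop": 1423275, "country": "TR"},
    "Almaty": {"dist": 84, "pop": 275, "country": "KZ"},
}

# Access by key, similar to cities$Almaty or cities[["Almaty"]] in R
print(cities["Almaty"])  # {'dist': 84, 'pop': 275, 'country': 'KZ'}

# Keys keep insertion order (Python 3.7+), similar to names() in R
print(list(cities))      # ['Munich', 'Istanbul', 'Almaty']

# There is no positional access such as cities[[1]]; integers are just keys
try:
    cities[0]
    has_positional_access = True
except KeyError:
    has_positional_access = False
print(has_positional_access)  # False
```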
Python:

```python
# lists
city = ['Munich', 'Paris', 'Amsterdam']
dist = [584, 1054, 653]
pop = [1484226, 2175601, 1558755]
area = [310.43, 105.4, 219.32]
country = ['DE', 'FR', 'NL']
```

```python
import numpy as np

# NumPy arrays
city_a = np.array(city)
city_a
## array(['Munich', 'Paris', 'Amsterdam'], dtype='<U9')
```

```python
pop_a = np.array(pop)
```

```python
# Dictionaries
yy = {'city': ['Munich', 'Paris', 'Amsterdam'],
      'dist': [584, 1054, 653],
      'pop': [1484226, 2175601, 1558755],
      'area': [310.43, 105.4, 219.32],
      'country': ['DE', 'FR', 'NL']}
## {'city': ['Munich', 'Paris', 'Amsterdam'],
##  'dist': [584, 1054, 653],
##  'pop': [1484226, 2175601, 1558755],
##  'area': [310.43, 105.4, 219.32],
##  'country': ['DE', 'FR', 'NL']}
```

Data Frames in Python:
```python
# class pd.DataFrame
import pandas as pd

# From a dictionary, yy
yy_df = pd.DataFrame(yy)
```

```python
# From lists
# names
list_names = ['city', 'dist', 'pop', 'area', 'country']

# columns are a list of lists
list_cols = [city, dist, pop, area, country]
## ['city', 'dist', 'pop', 'area', 'country']
## [['Munich', 'Paris', 'Amsterdam'], [584, 1054, 653],
##  [1484226, 2175601, 1558755], [310.43, 105.4, 219.32],
##  ['DE', 'FR', 'NL']]
```

```python
# A zipped list of tuples
zip_list = list(
    zip(
        list_cols, list_names
    )
)
## [(['Munich', 'Paris', 'Amsterdam'], 'city'),
##  ([584, 1054, 653], 'dist'),
##  ([1484226, 2175601, 1558755], 'pop'),
##  ([310.43, 105.4, 219.32], 'area'),
##  (['DE', 'FR', 'NL'], 'country')]
```

```python
zip_df = pd.DataFrame(zip_list)
```

```python
# Easier
# import pandas library
import pandas as pd

# init list of lists
list_rows = [['Munich', 584, 1484226, 310.43, 'DE'],
             ['Paris', 1054, 2175601, 105.40, 'FR'],
             ['Amsterdam', 653, 1558755, 219.32, 'NL']]

# Create the pandas data frame
df = pd.DataFrame(list_rows, columns=list_names)
```
Two-dimensional, heterogeneous, tabular data frames in R:

```r
# class data.frame from vectors
cities_df <- data.frame(city = c("Munich", "Paris", "Amsterdam"),
                        dist = c(584, 1054, 653),
                        pop = c(1484226, 2175601, 1558755),
                        area = c(310.43, 105.4, 219.32),
                        country = c("DE", "FR", "NL"))
```
Multidimensional arrays:
R:
```r
# array
arr_r <- array(c(1:4,
                 seq(10, 40, 10),
                 seq(100, 400, 100)),
               dim = c(2,2,3) )
```

```r
rowSums(arr_r,dims = 2)
```

```r
rowSums(arr_r,dims = 1)
## 444 666
```

```r
colSums(arr_r,dims = 1)
```

```r
colSums(arr_r,dims = 2)
```
Python:
```python
arr = np.array(
    [[[1, 2],
      [3, 4]],
     [[10, 20],
      [30, 40]],
     [[100, 200],
      [300, 400]]]
)
```
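To see how sums over this structure relate to the R results, the sketch below repeats the same values as plain nested lists so that it runs without NumPy. R fills arrays column by column, so R's first index corresponds to the last axis of this Python layout. That is why summing over everything except the last axis reproduces the `444 666` from `rowSums(arr_r, dims = 1)`. With the NumPy array, the same reduction would usually be written as `arr.sum(axis=(0, 1))`.

```python
# The same nested values as the NumPy array above, as plain lists
arr = [[[1, 2], [3, 4]],
       [[10, 20], [30, 40]],
       [[100, 200], [300, 400]]]

# Sum within each 2 x 2 block (collapse the last two axes)
block_sums = [sum(sum(row) for row in block) for block in arr]
print(block_sums)  # [10, 100, 1000]

# Sum over everything except the last axis
last_axis_sums = [
    sum(block[i][j] for block in arr for i in range(2))
    for j in range(2)
]
print(last_axis_sums)  # [444, 666]
```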

Relational operators:
| Description | R Operator | Python Operator |
|---|---|---|
| Equivalency | == | == |
| Non-equivalency | != | != |
| Greater-than (or equal to) | > (>=) | > (>=) |
| Lesser-than (or equal to) | < (<=) | < (<=) |
| Negation | !x | not x |

Python:
```python
import numpy as np
a = np.array([23,5,7,9,12])
a > 10
## array([ True, False, False, False,  True])
```
Logical operators:
| Description | R Operator | Python Operator |
|---|---|---|
| AND | &, && | &, and |
| OR | \|, \|\| | \|, or |
| WITHIN | y %in% x | in, not in |
| Identity | identical() | is, is not |
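The Python operators in this table can be tried on plain values. This is a minimal sketch with made-up lists. One caveat: `is` tests whether two names refer to the same object, so for comparing contents `==` is usually the operator that behaves more like R's `identical()`.

```python
x = [1, 2, 3]
y = [1, 2, 3]

# and / or work on single truth values and short-circuit
print(len(x) == 3 and x[0] == 1)  # True
print(len(x) == 0 or x[0] == 1)   # True

# in / not in test membership, like %in% in R
print(2 in x)      # True
print(9 not in x)  # True

# == compares values; is compares object identity
print(x == y)  # True
print(x is y)  # False
```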
R:

```r
xx <- 1:10
xx == 6
xx != 6
xx >= 6
xx <- 1:6
# tails of a distribution
xx < 3 | xx > 4
# range in a distribution
xx > 3 & xx < 4
```
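A plain-Python translation of the last two R expressions might look like the sketch below. It assumes no NumPy, so the element-wise comparison is spelled out with list comprehensions; with a NumPy array, `|` and `&` would work element-wise much as they do in R.

```python
xx = list(range(1, 7))  # 1, 2, ..., 6, like xx <- 1:6 in R

# Tails of a distribution: xx < 3 | xx > 4
tails = [x < 3 or x > 4 for x in xx]
print(tails)     # [True, True, False, False, True, True]

# Range in a distribution: xx > 3 & xx < 4
in_range = [3 < x < 4 for x in xx]
print(in_range)  # [False, False, False, False, False, False]

# any() and all() summarise a sequence of booleans
print(any(tails), all(tails))  # True False
```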