FE581 – HW2

JUST DOWNLOAD THESE TWO FILES FIRST:

Download bank_normal.csv here
Download bike_data_utf8.csv here

NOTICE

The first dataset, bank.csv, must be read with sep=";" because, despite the .csv extension, it is not a comma-separated file; its fields are separated by semicolons.

Better still, read the file once, save it as a standard comma-separated CSV, and use that copy, bank_normal.csv, from then on.


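# bank.csv is semicolon-delimited, so set the separator explicitly
bank <- read.csv("bank.csv", sep = ";")
str(bank)        # inspect the column types
length(bank$y)   # number of observations
# Save as a standard comma-separated file.
# Note: write.csv() also writes row names as an unnamed first column by default;
# pass row.names = FALSE to leave them out.
write.csv(bank, "bank_normal.csv")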
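
The effect of the separator can be checked on a small toy file. The sketch below is only an illustration: the three rows are made up and stand in for the real bank.csv, and the temporary file names are arbitrary. It also shows what the default row-name column from write.csv() looks like when the file is read back.

```r
# Toy semicolon-delimited file (made-up rows, not the real bank.csv)
tmp_in <- tempfile(fileext = ".csv")
writeLines(c("age;job;y", "30;unemployed;no", "33;services;yes"), tmp_in)

wrong <- read.csv(tmp_in)             # default sep = ",": everything lands in one column
right <- read.csv(tmp_in, sep = ";")  # correct separator: three columns
ncol(wrong)
ncol(right)

# Default write.csv() stores row names as an extra unnamed first column
tmp_idx <- tempfile(fileext = ".csv")
write.csv(right, tmp_idx)
with_index <- read.csv(tmp_idx)
names(with_index)   # the unnamed column is read back as "X"

# row.names = FALSE keeps only the real columns
tmp_out <- tempfile(fileext = ".csv")
write.csv(right, tmp_out, row.names = FALSE)
back <- read.csv(tmp_out)
names(back)
```

If your copy of a file shows an unexpected first column named X, this default is a likely cause.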

File Encoding

The second dataset has a different set of problems to handle:

File encoding is not UTF-8
Extra index column
Missing characters or missing column names
Col names becoming …
….

The second dataset takes a bit more work, so if you are interested in the process, check the code below:


library(readr)

# Specify column names
col_names <- c("rented_bike_count", "hour", "temperature", "humidity", 
               "wind_speed", "visibility", "dew_point_temperature", 
               "solar_radiation", "rainfall", "snowfall", "seasons", 
               "holiday", "functioning_day")

# Specify column types for each column
col_types <- cols(
  rented_bike_count = col_double(),
  hour = col_double(),
  temperature = col_double(),
  humidity = col_double(),
  wind_speed = col_double(),
  visibility = col_double(),
  dew_point_temperature = col_double(),
  solar_radiation = col_double(),
  rainfall = col_double(),
  snowfall = col_double(),
  seasons = col_character(),
  holiday = col_character(),
  functioning_day = col_character()
)

# Read the CSV file with the column names and types specified above;
# skip = 1 drops the file's original header row
bike_data <- read_csv("SeoulBikeData.csv", col_names = col_names,
                      col_types = col_types, skip = 1)
# Write out the dataset (write_csv() always writes UTF-8)
write_csv(bike_data, "bike_data_utf8.csv")
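
The same idea (skip the problematic header row and supply clean column names yourself) can be tried on a toy file without any extra packages. This is a minimal sketch under stated assumptions: the file is built by hand with a single Latin-1 degree-sign byte in its header, the two columns and their values are invented for illustration, and the real encoding of SeoulBikeData.csv is not assumed here. It uses base R's read.csv() rather than readr.

```r
# Build a toy CSV whose header contains a non-UTF-8 byte (0xB0, the Latin-1 degree sign).
# The data are made up and only stand in for the real file.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("Temperature("), as.raw(0xB0),
           charToRaw("C),Hour\n-5.2,0\n-5.5,1\n")), con)
close(con)

# Skip the original header line and assign clean names instead
toy <- read.csv(tmp, header = FALSE, skip = 1,
                col.names = c("temperature", "hour"))
str(toy)

# Save a clean copy that later code can read without special handling
tmp_clean <- tempfile(fileext = ".csv")
write.csv(toy, tmp_clean, row.names = FALSE)
names(read.csv(tmp_clean))
```

Because the header is never parsed, the badly encoded characters in it do not matter; only the data rows are read.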

So from now on, you have two clean data files to work with, bank_normal.csv and bike_data_utf8.csv, and can spend your time on the real work.