JUST DOWNLOAD THOSE TWO FILES FIRST:
NOTICE
CSV
The first dataset, bank.csv, must be read with sep=";".
Even though it has a .csv extension, it is not a comma-separated file: its fields are separated by semicolons.
Better still, read the file once, save it as a regular comma-separated CSV named bank_normal.csv, and use that file from then on.
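# Read the semicolon-separated file
bank <- read.csv("bank.csv", sep = ";")
# Inspect the structure and the number of rows
str(bank)
length(bank$y)
# Save as a regular comma-separated file; row.names = FALSE avoids writing an extra index column
write.csv(bank, "bank_normal.csv", row.names = FALSE)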
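To see why the separator matters, here is a minimal sketch on a tiny made-up file. The column names and values are hypothetical stand-ins and not the real contents of bank.csv. It also shows how the default row names of write.csv end up as an extra index column when the file is read back.

```r
# Hypothetical miniature of a semicolon-separated file (not the real bank.csv)
tmp_in <- tempfile(fileext = ".csv")
writeLines(c("age;job;y", "30;admin.;no", "45;technician;yes"), tmp_in)

# With the default comma separator, every line is read as one single field
wrong <- read.csv(tmp_in)

# With sep = ";" the three columns are recovered
right <- read.csv(tmp_in, sep = ";")

# Writing with the default row names adds an unnamed first column,
# which read.csv labels "X" when the file is read back
tmp_indexed <- tempfile(fileext = ".csv")
write.csv(right, tmp_indexed)
with_index <- read.csv(tmp_indexed)

# row.names = FALSE keeps only the real columns
tmp_clean <- tempfile(fileext = ".csv")
write.csv(right, tmp_clean, row.names = FALSE)
round_trip <- read.csv(tmp_clean)

print(c(wrong = ncol(wrong), right = ncol(right),
        with_index = ncol(with_index), round_trip = ncol(round_trip)))
```

If the saved file will be reused many times, dropping the row names keeps it free of an index column that would otherwise have to be removed on every read.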
File Encoding
For the second dataset, there are several different problems to handle:
File encoding is not UTF-8
Extra index column
Missing characters or missing column names
Col names becoming …
….
The second dataset takes a little more work, so if you are interested in the process, check the code below:
library(readr)
# Specify column names
col_names <- c("rented_bike_count", "hour", "temperature", "humidity",
               "wind_speed", "visibility", "dew_point_temperature",
               "solar_radiation", "rainfall", "snowfall", "seasons",
               "holiday", "functioning_day")
# Specify column types for each column
col_types <- cols(
  rented_bike_count = col_double(),
  hour = col_double(),
  temperature = col_double(),
  humidity = col_double(),
  wind_speed = col_double(),
  visibility = col_double(),
  dew_point_temperature = col_double(),
  solar_radiation = col_double(),
  rainfall = col_double(),
  snowfall = col_double(),
  seasons = col_character(),
  holiday = col_character(),
  functioning_day = col_character()
)
# Read in the CSV file with the specified column names and types;
# skip = 1 drops the original header row, which the names above replace
bike_data <- read_csv("SeoulBikeData.csv", col_names = col_names, col_types = col_types, skip = 1)
# Write out the dataset with UTF-8 encoding
write_csv(bike_data, "bike_data_utf8.csv")
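The same idea can be sketched in base R on a tiny made-up file: skip the header row that contains the non-UTF-8 characters, supply clean column names, and write the result back as UTF-8. The file contents, the two column names, and the assumption that the problematic bytes sit only in the header are all hypothetical and are not taken from SeoulBikeData.csv.

```r
# Hypothetical miniature file whose header holds a Latin-1 degree sign (byte 0xB0),
# which is not valid UTF-8 on its own
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(c(charToRaw("Temperature("), as.raw(0xB0),
           charToRaw("C),Hour\n-5.2,0\n-5.5,1\n")), con)
close(con)

# Skip the original header and supply clean names instead
clean_names <- c("temperature", "hour")
bike_small <- read.csv(tmp, header = FALSE, skip = 1, col.names = clean_names)

# Write the cleaned data back out as UTF-8, without an index column
out <- tempfile(fileext = ".csv")
write.csv(bike_small, out, row.names = FALSE, fileEncoding = "UTF-8")

# Check that every line of the new file is valid UTF-8
print(all(validUTF8(readLines(out))))
str(bike_small)
```

If the data rows themselves contained non-UTF-8 text, skipping the header would not be enough, and the encoding would have to be declared when reading (for example with the fileEncoding argument of read.csv).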
So from now on, you have two normal data files to work with, bank_normal.csv and bike_data_utf8.csv, and can spend your time on the real things.