CREDO Data Tools
Welcome to the CREDO Data Tools project documentation. Here you will find descriptions and code examples of work carried out by members of the CREDO project.
If any part of the documentation is unclear, contact its creator (Slawomir Stuglik) or write an email to contact[at]credo.science
Download scripts
All files presented in this documentation can be downloaded from the CREDO project's internal page (redmine).
Data_time_operation
The data_time_operation.py script contains several useful functions for operations on datetimes and timestamps.
Don't forget to install the libraries that are imported.
Libraries
import os
import math
from time import mktime
from datetime import datetime, date, timedelta, timezone
Libraries that are used in the following functions:
math -> used in calculations for the Sun class and the calcSunTime function
time and datetime -> basic libraries for time and date operations
1.1 set_start_end
def set_start_end(data_start, data_end):
timestamp_start = int(mktime(datetime.strptime(data_start, '%Y-%m-%d %H:%M').timetuple()))
timestamp_end = int(mktime(datetime.strptime(data_end, '%Y-%m-%d %H:%M').timetuple()))
return timestamp_start, timestamp_end
Example of how to use the function
set_start_end("2023-07-01 00:00", "2023-07-05 00:00")
Example of what we should get
1688162400 1688508000
You can verify any timestamp on the website https://www.unixtimestamp.com
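Note that mktime interprets the parsed datetime as the local time of your machine. If you would rather treat the input strings as UTC, here is a minimal sketch of a variant (set_start_end_utc is not part of the original script):
from datetime import datetime, timezone

def set_start_end_utc(data_start, data_end):
    # same parsing as set_start_end, but the strings are treated as UTC
    fmt = '%Y-%m-%d %H:%M'
    start = datetime.strptime(data_start, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(data_end, fmt).replace(tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())

set_start_end_utc("2023-07-01 00:00", "2023-07-05 00:00")  # (1688169600, 1688515200)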
1.2 create_window
def create_window(day=0, hour=0, minutes=15, seconds=0, starter="2005-01-01 00:00", ender="2023-02-20 00:00"):
start, meta = set_start_end(starter, ender)
shift = day * 86400 + hour * 3600 + minutes * 60 + seconds
window = {}
small_window = {}
for timestamp in range(start, meta, shift):
datatime_detect = datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
window[datatime_detect] = []
for timestamp in range(0, 86400, shift):
small_window[timestamp] = []
return shift, window, small_window
Example of how to use the function
begin_date, end_date = "2023-07-01 00:00", "2023-07-05 00:00"
create_window(day=0, hour=0, minutes=10, starter=begin_date, ender=end_date)
Example of what we should get
! Remember: you enter a local datetime and it is converted to a UTC (zone 0) datetime.
(600, {'2023-06-30 22:00:00': [], '2023-06-30 22:10:00': [], ... , '2023-07-04 21:40:00': [], '2023-07-04 21:50:00': []}, {0: [], 600: [], ..., 85200: [], 85800: []})
shift -> the size of the window in seconds, calculated as day * 86400 + hour * 3600 + minutes * 60 + seconds
window -> dictionary with all dates and times in the range begin, end
small_window -> dictionary of all windows of the day (24 hours)
datetime.utcfromtimestamp(timestamp) -> returns a datetime in zone 0 (UTC), while fromtimestamp returns the local time of the machine you are working on (for me it would be the current time in Poland, including the time zone). I advise against using local time, especially when you are working with people from other countries.
window[datatime_detect] -> the window variable is a dictionary whose single elements are lists. It looks roughly like this: { 'elem1': [ ], 'elem2': [ ] }
small_window[timestamp] -> the small_window variable has the same structure as window, but its keys are timestamps (in seconds) covering the minutes of a single day - the range 00:00 to 23:59
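To see how these structures are used, here is a small sketch of how a detection timestamp is matched to its window - the same trick used later in read_csv (section 2.1); the timestamp value here is made up:
from datetime import datetime

shift, window, small_window = create_window(minutes=15, starter="2023-07-01 00:00", ender="2023-07-05 00:00")
timestamp = 1688204100  # a hypothetical detection time inside the range
timestamp_in_window = timestamp - timestamp % shift  # snap down to the start of its window
key = datetime.utcfromtimestamp(timestamp_in_window).strftime('%Y-%m-%d %H:%M:%S')
if key in window:
    window[key].append(188.954)  # e.g. a measured rate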
1.3 days_from_2000
def days_from_2000(data_s):
day_zero = date(2000, 1, 1)
year, month, day = data_s.split("-")
day_s = date(int(year), int(month), int(day))
delta = day_s - day_zero
return delta.days
We return delta.days because the delta variable itself has the form:
9 days, 0:00:00
Example of how to use the function
days_from_2000("2000-01-10")
Example of what we should get
9
day_zero - the date 2000-01-01, from which we count how many days passed until the date given by the user. The date passed when calling the function is a string in YYYY-MM-DD format, so we first split it into three elements (year, month, day) and then create a date object with the date() constructor. This lets us calculate the difference in days between the two dates (delta) by simply subtracting them.
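If you prefer, the same result can be obtained more compactly with date.fromisoformat (available since Python 3.7); a small sketch:
from datetime import date

def days_from_2000_iso(data_s):
    # equivalent one-liner: fromisoformat parses the YYYY-MM-DD string directly
    return (date.fromisoformat(data_s) - date(2000, 1, 1)).days

days_from_2000_iso("2000-01-10")  # 9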
1.4 date_range
def date_range(start_date, meta_date):
for n in range(int((meta_date - start_date).days)):
yield start_date + timedelta(n)
Example of how to use the function
date_end = datetime.strptime("2023-07-05", "%Y-%m-%d").date()
date_start = (date_end - timedelta(days=5))
for single_date in date_range(date_start, date_end):
one_date = single_date.strftime("%Y-%m-%d")
print(one_date)
Example of what we should get
2023-06-30
2023-07-01
2023-07-02
2023-07-03
2023-07-04
1.5 sunrise_bin_window
def sunrise_bin_window(shift, window_list, coordinates):
sun = Sun()
sunrise_bin_date = {}
sunrise_dates = {}
date_window_old = "2005-02-01"
sunrise_old = ""
diff = shift
for datatime_window in window_list:
window_date = str(datatime_window).split(" ")[0]
timer = str(datatime_window).split(" ")[1]
sunrise = sun.getSunriseTime(coordinates, ndate=window_date)['time_str'] + ":00"
if window_date not in sunrise_bin_date:
sunrise_bin_date[window_date] = []
sunrise_dates[window_date] = sunrise
t0 = datetime.strptime(timer, "%H:%M:%S")
t1 = datetime.strptime(sunrise, "%H:%M:%S")
delta_t = t1 - t0
if sunrise_old > timer and date_window_old != window_date and abs(delta_t.total_seconds()) > diff:
sunrise_bin_date[date_window_old].append(datatime_window)
else:
sunrise_old = sunrise
date_window_old = window_date
if sunrise < timer or 0 <= delta_t.total_seconds() < diff:
sunrise_bin_date[window_date].append(datatime_window)
return sunrise_bin_date, sunrise_dates
Example of how to use the function
shift = 600
coordinates = {'longitude': 138.60414, 'latitude': -34.919159 }
# window comes from the create_window function - see 1.2
sunrise_bin_date, sunrise_dates = sunrise_bin_window(shift, window, coordinates)
Example of what we should get for sunrise_bin_date (one element)
{'2023-06-30': [
'2023-06-30 22:00:00',
'2023-06-30 22:10:00',
...,
'2023-07-01 21:50:00'
]
}
Example of what we should get for sunrise_dates from 2023-07-04 to 2023-07-11
{
'2023-07-04': '22:02:00',
'2023-07-05': '22:01:00',
...,
'2023-07-09': '22:01:00',
'2023-07-10': '22:00:00'
}
To start with, we need to create an instance of the Sun() class. More about the Sun() class in section 1.6.
We also need to declare the following variables:
sunrise_bin_date -> dictionary containing, for each date (e.g. 2023-07-05), the individual windows from the "window" dictionary that belong to that day's sunrise (see 1.2 create_window).
sunrise_dates -> dictionary contains information about the time of sunrise for a given date
date_window_old -> must initially be earlier than the start date; after that it is continuously updated. This is an auxiliary variable that makes it easier to assign a given window to the corresponding element of the sunrise_bin_date dictionary.
For example, if sunrise on 2023-07-05 is at 07:02, then the 05:30 window still belongs to the sunrise of 2023-07-04.
sunrise_old -> Sunrise for day N - 1 ("yesterday"). Format HH:MM:00 e.g. 10:48:00
diff -> the size of the window in seconds (15 min -> 900)
The calculation is made by looping over the elements of the window_list dictionary; a single element is e.g. 2013-03-30 08:30:00.
We break each element into a date part (window_date, e.g. 2013-03-30) and a time part (timer, e.g. 08:30:00).
We pass the date to the getSunriseTime function of the Sun() class and get the result in several formats - we are interested in the time_str option, to which we add ":00", so the result has the form HH:MM:SS, e.g. 10:48:00 (the sunrise function is therefore accurate to the minute).
For each window_date we check if it exists in the sunrise_bin_date dictionary, if not we create a new dictionary element which will be a list (sunrise_bin_date[window_date] = []) - Then we will add the datatime of individual windows to the list.
We also add to the sunrise_dates dictionary the window_date which has the value of the sunrise variable (information when the sun rises).
It is additionally necessary to handle the 'zero' (first) windows, i.e. those in which sunrise occurs - so we calculate delta_t, the difference between sunrise and the window time (delta_t = sunrise - timer).
Two ifs are then needed.
The first handles windows that belong to a new day but occur before that day's sunrise.
The second applies when the sunrise time is earlier than the time of the analysed window, or when the difference between the two times is less than diff (the window size).
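A toy illustration of the delta_t check for a single window (all values here are assumed for illustration):
from datetime import datetime

timer = "07:00:00"    # time part of the analysed window
sunrise = "07:06:00"  # sunrise for that date, as returned by Sun().getSunriseTime
diff = 900            # window size in seconds (15 min)

t0 = datetime.strptime(timer, "%H:%M:%S")
t1 = datetime.strptime(sunrise, "%H:%M:%S")
delta_t = t1 - t0  # 360 s: sunrise falls inside this window
print(0 <= delta_t.total_seconds() < diff)  # True -> this is the 'zero' window for that date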
1.6 Class Sun
class Sun:
def getSunriseTime(self, coords, ndate="now"):
return self.calcSunTime(coords, True, ndate)
def getSunsetTime(self, coords, ndate="now"):
return self.calcSunTime(coords, False, ndate)
def getCurrentUTC(self, ndate):
if ndate == "now":
now = datetime.now(timezone.utc)
else:
now = datetime.strptime(ndate, '%Y-%m-%d').replace(tzinfo=timezone.utc)
return [now.day, now.month, now.year]
def calcSunTime(self, coords, isRiseTime, ndate, zenith=90):
day, month, year = self.getCurrentUTC(ndate)
longitude = coords['longitude']
latitude = coords['latitude']
TO_RAD = math.pi / 180
# 1. first calculate the day of the year
N1 = math.floor(275 * month / 9)
N2 = math.floor((month + 9) / 12)
N3 = (1 + math.floor((year - 4 * math.floor(year / 4) + 2) / 3))
N = N1 - (N2 * N3) + day - 30
# 2. convert the longitude to hour value and calculate an approximate time
lngHour = longitude / 15
if isRiseTime:
t = N + ((6 - lngHour) / 24)
else: # sunset
t = N + ((18 - lngHour) / 24)
# 3. calculate the Sun's mean anomaly
M = (0.9856 * t) - 3.289
# 4. calculate the Sun's true longitude
L = M + (1.916 * math.sin(TO_RAD * M)) + (0.020 * math.sin(TO_RAD * 2 * M)) + 282.634
L = self.forceRange(L, 360)
# 5a. calculate the Sun's right ascension
RA = (1 / TO_RAD) * math.atan(0.91764 * math.tan(TO_RAD * L))
RA = self.forceRange(RA, 360)
# 5b. right ascension value needs to be in the same quadrant as L
Lquadrant = (math.floor(L / 90)) * 90
RAquadrant = (math.floor(RA / 90)) * 90
RA = RA + (Lquadrant - RAquadrant)
# 5c. right ascension value needs to be converted into hours
RA = RA / 15
# 6. calculate the Sun's declination
sinDec = 0.39782 * math.sin(TO_RAD * L)
cosDec = math.cos(math.asin(sinDec))
# 7a. calculate the Sun's local hour angle
cosH = (math.cos(TO_RAD * zenith) - (sinDec * math.sin(TO_RAD * latitude))) / (
cosDec * math.cos(TO_RAD * latitude))
if cosH > 1:
return {'status': False, 'msg': 'the sun never rises on this location (on the specified date)'}
if cosH < -1:
return {'status': False, 'msg': 'the sun never sets on this location (on the specified date)'}
# 7b. finish calculating H and convert into hours
if isRiseTime:
H = 360 - (1 / TO_RAD) * math.acos(cosH)
else: # setting
H = (1 / TO_RAD) * math.acos(cosH)
H = H / 15
# 8. calculate local mean time of rising/setting
T = H + RA - (0.06571 * t) - 6.622
# 9. adjust back to UTC
UT = T - lngHour
UT = self.forceRange(UT+0.05, 24) # UTC time in decimal format (e.g. 23.23)
# 10. Return
hr = int(self.forceRange(int(UT), 24))
min = int(round((UT - int(UT)) * 60, 0))
if min == 60:  # e.g. UT = 4.99673 would round to 4 h 60 min instead of 5 h 0 min; we fix this
hr += 1
min = 0
if hr < 10:
hr = "0" + str(hr)
if min < 10:
min = "0" + str(min)
time_str = "%s:%s" % (hr, min)
return {
'status': True,
'decimal': UT,
'hr': hr,
'min': min,
'time_str': time_str
}
def forceRange(self, v, max):
# force v to be >= 0 and < max
if v < 0:
return v + max
elif v >= max:
return v - max
return v
Example of how to use the class
sun = Sun()
coords = {'longitude': 138.60414, 'latitude': -34.919159 }  # 34.92 S, 138.60 E
sunrise = sun.getSunriseTime(coords, ndate="2023-07-01")['time_str'] + ":00" # i.e 10:48:00
sunset = sun.getSunsetTime(coords, ndate="2023-07-01")['time_str'] + ":00"
print("sunrise: ",sunrise, "sunset: ",sunset, "time in UTC")
Example of what we should get
sunrise: 22:02:00 sunset: 07:43:00 time in UTC
A class that calculates sunrise and sunset times from geographical coordinates (latitude and longitude) and a date.
The class has the following functions:
getSunriseTime -> will return the sunrise time for a given date and geographical coordinates.
getSunsetTime -> will return the sunset time for a given date and geographical coordinates.
Both call the main function calcSunTime - the only difference is the value of the isRiseTime flag; if False, the sunset time is returned.
getCurrentUTC -> determines the date used for the calculations. If "now", it takes today's date; otherwise it takes the date specified in the call.
forceRange -> ensures that a value is in the correct range [0, max).
calcSunTime -> the main function. It needs the following inputs: geographic coordinates (coords), sunrise or sunset (isRiseTime), the date under investigation (ndate) and the zenith angle (zenith). The calculation is performed in 10 steps:
1. Calculate the day of the year:
N1 = math.floor(275 * month / 9)
N2 = math.floor((month + 9) / 12)
N3 = 1 + math.floor((year - 4 * math.floor(year / 4) + 2) / 3)
N = N1 - (N2 * N3) + day - 30
You can find an explanation of this formula on astronomy.stackexchange.com: https://astronomy.stackexchange.com/questions/2407/calculate-day-of-the-year-for-a-given-date
2. Convert the longitude to an hour value (lngHour = longitude / 15) and calculate an approximate time t.
3. Calculate the Sun's mean anomaly: M = (0.9856 * t) - 3.289
4. Calculate the Sun's true longitude: L = M + (1.916 * math.sin(TO_RAD * M)) + (0.020 * math.sin(TO_RAD * 2 * M)) + 282.634, then adjust L to the range 0-360: L = self.forceRange(L, 360)
5. Calculate the Sun's right ascension:
a) RA = (1 / TO_RAD) * math.atan(0.91764 * math.tan(TO_RAD * L)), adjusted to the range 0-360: RA = self.forceRange(RA, 360)
b) the right ascension value needs to be in the same quadrant as L:
Lquadrant = (math.floor(L / 90)) * 90
RAquadrant = (math.floor(RA / 90)) * 90
RA = RA + (Lquadrant - RAquadrant)
c) finally, the right ascension value is converted into hours: RA = RA / 15
6. Calculate the Sun's declination:
sinDec = 0.39782 * math.sin(TO_RAD * L)
cosDec = math.cos(math.asin(sinDec))
7. Calculate the Sun's local hour angle:
cosH = (math.cos(TO_RAD * zenith) - (sinDec * math.sin(TO_RAD * latitude))) / (cosDec * math.cos(TO_RAD * latitude))
Then finish calculating H depending on whether you want sunrise or sunset:
if isRiseTime:
H = 360 - (1 / TO_RAD) * math.acos(cosH)
else:
H = (1 / TO_RAD) * math.acos(cosH)
and convert it into hours (H = H / 15).
8. Calculate the local mean time of rising/setting: T = H + RA - (0.06571 * t) - 6.622
9. Adjust back to UTC: UT = T - lngHour. The +0.05 added to UT compensates for the formula's inaccuracy of a few tens of seconds (~1 minute) - compare with sunrise/sunset results from internet sources:
UT = self.forceRange(UT + 0.05, 24)  # UTC time in decimal format (e.g. 23.23)
10. Return:
hr = int(self.forceRange(int(UT), 24))
min = int(round((UT - int(UT)) * 60, 0))
Compared to the original code, the 'hr', 'min' and 'time_str' fields have been added to the returned dictionary.
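A quick check of the forceRange helper used in steps 4, 5 and 9:
sun = Sun()
print(sun.forceRange(370, 360))  # 10  -> values >= max are wrapped down
print(sun.forceRange(-30, 360))  # 330 -> negative values are wrapped up
print(sun.forceRange(180, 360))  # 180 -> values already in range are unchanged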
_____ Examples _____
Below you will find some examples of how to analyze data using the above suggested functions.
Ex-I: Read_scaler
Analysis of public data from the Pierre Auger Observatory, csv file downloaded from opendata.auger.org
The "scaler.csv" file contains:
time: Unix time in seconds (seconds since 1st Jan 1970, w/o leap seconds)
rateCorr: corrected scaler rate [counts/m2/s]
arrayFraction: fraction of the array in operation [percent]
rateUncorr: average detector scaler rate, uncorrected [counts/s]
pressure: barometric pressure [hPa]
Purpose of the script:
Counting the average number of particles from N_days and calculating the difference between two adjacent bins (A; A-1) after sunrise
2.0 Libraries
import csv
import os
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime, timedelta
from Functions.data_time_operation import Sun, sunrise_bin_window
from Functions.data_time_operation import set_start_end, create_window, days_from_2000, date_range
from path_links import main_path, data_analysis_path
Don't forget to install the libraries that are imported.
Libraries that are used in the following functions:
matplotlib -> a library for creating a plot, a graph with the results
csv and pandas -> libraries for saving and loading data
data_time_operation -> useful functions described in Chapter 1 of this documentation
datetime -> basic library for time and date operations
path_links -> if you create a lot of scripts, it is a good idea to keep all the paths in one separate file, so that any change is made in one file and not, for example, in 10. A sketch of such a file follows below.
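For reference, a minimal sketch of what a path_links.py file might contain - the variable names match the import above, but the paths themselves are placeholders you must adapt to your own directory layout:
# path_links.py - central place for all paths used by the scripts
main_path = "/home/user/CredoDataTools/"           # placeholder - your project root
data_analysis_path = main_path + "Data_analysis/"  # placeholder - where the analysis data live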
Constant elements
main_path2 = data_analysis_path + "pierre_auger/"
pg_data = main_path2 + "scalers.csv"
begin_date, end_date = "2005-01-03 00:00", "2020-01-01 00:00"
begin, end = set_start_end(begin_date, end_date)
w_d, w_h, w_m = 0, 0, 15
coordinates = {'longitude': -69.182460, 'latitude': -35.275999}
analysed_bin = 1
main_path2, pg_data -> paths to the data files
begin_date, end_date -> dates that will be passed to the create_window function
w_d, w_h, w_m -> size of a single window (days, hours, minutes)
coordinates -> Pierre Auger coordinates (latitude: 35.28 S, longitude: 69.18 W)
analysed_bin -> id of the bin for which we calculate the difference
2.1 Read_csv
def read_csv():
shift, window, small_window = create_window(day=w_d, hour=w_h, minutes=w_m, starter=begin_date, ender=end_date)
csvfile = open(pg_data)
reader = csv.DictReader(csvfile)
for row in reader:
timestamp = int(row['time'])
detects = row['rateCorr']
if timestamp > begin:
timestamp_in_window = timestamp - timestamp % shift
# utcfromtimestamp gives us a datetime in zone 0 (UTC)
data_time = datetime.utcfromtimestamp(timestamp_in_window).strftime('%Y-%m-%d %H:%M:%S')
if data_time in window: # data_time = 2014-09-16 09:00:00; window[data_time] = [179.43]
window[data_time].append(float(detects))
sunrise_bin_date, sunrise_dates = sunrise_bin_window(shift, window, coordinates)
analyze_window(shift, window, sunrise_bin_date, sunrise_dates, n_days=100)
scater_analyze_window(shift, window, sunrise_bin_date, n_days=100)
After running sunrise_bin_window (the function described in section 1.5), we have a sunrise_bin_date dictionary containing all dates in the interval of interest, where a single element is a date with the (15-min) windows between one sunrise and the next.
(As a reminder, sunrise_dates is a dictionary with sunrise times for the dates in our range.)
When we have everything required, we can run the analyze_window function, which does all the calculations we are interested in and finishes the job by saving the results to a file and creating a plot.
Finally, we additionally run scater_analyze_window to create a scatter plot showing what the results look like for the given intervals after sunrise.
Both functions are explained in detail in sections 2.2 and 2.3.
2.2 analyze_window
def analyze_window(shift, window_list, sunrise_bin_date, sunrise_dates, n_days=100):
diff_sunrise_bin_date = {} # For a given analysed_bin we calculate the bin1-bin0 differences
for window_date in sunrise_bin_date:
time_bin1 = sunrise_bin_date[window_date][analysed_bin] # i.e 2005-01-03 09:30:00
time_bin0 = sunrise_bin_date[window_date][analysed_bin - 1] # i.e 2005-01-03 09:15:00
if len(window_list[time_bin1]) > 0 and len(window_list[time_bin0]) > 0:
bin_1 = window_list[time_bin1][0] # i.e 188.954
bin_0 = window_list[time_bin0][0] # i.e 188.954
diff_detects = round(bin_1 - bin_0, 3)
diff_sunrise_bin_date[window_date] = diff_detects
avg_sunrise_bin_date = {}
sum_sunrise_bin_date = {}
active_sunrise_bin_date = {}
for window_date in sunrise_bin_date:
avg_sunrise_bin_date[window_date] = []
date_of_end = datetime.strptime(window_date, "%Y-%m-%d").date()
start_date = (date_of_end - timedelta(days=n_days - 1))
for single_date in date_range(start_date, date_of_end):
current_date = single_date.strftime("%Y-%m-%d")
if current_date in diff_sunrise_bin_date:
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])
if date_of_end.strftime("%Y-%m-%d") in diff_sunrise_bin_date:
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[date_of_end.strftime("%Y-%m-%d")])
if len(avg_sunrise_bin_date[window_date]) > 2:
totals = sum(avg_sunrise_bin_date[window_date])
sum_sunrise_bin_date[window_date] = totals
elements = len(avg_sunrise_bin_date[window_date])
active_sunrise_bin_date[window_date] = avg_sunrise_bin_date[window_date]
averages = totals / elements
avg_sunrise_bin_date[window_date] = averages
else:
avg_sunrise_bin_date[window_date] = []
After the calculation, data can be saved to a file and a plot generated (continuation of the same function).
This should be done outside the loop "for window_date in sunrise_bin_date:".
i = 0
window_csv_header = ["N_day", "date", "sunrise", "bin_1", "bin_2", "delta2_1", "sum", "active", "avg_2_1"]
window_csv_body = []
days = 1943
for window_date in sunrise_bin_date:
days_old = days
days = int(days_from_2000(window_date))
elements_n = sunrise_bin_date[window_date]
time_bin2 = elements_n[analysed_bin]
time_bin1 = elements_n[analysed_bin - 1]
if len(window_list[time_bin1]) > 0 and len(window_list[time_bin2]) > 0:
if days - days_old == 1:
bin_2 = window_list[time_bin2][0]
bin_1 = window_list[time_bin1][0]
delta2_1 = round(bin_2 - bin_1, 3)
avg_2_1 = avg_sunrise_bin_date[window_date]
if str(avg_2_1) != "[]":
suma = sum_sunrise_bin_date[window_date]
active = active_sunrise_bin_date[window_date]
info = {"N_day": days, "date": window_date, "sunrise": sunrise_dates[window_date],
"bin_1": bin_1, "bin_2": bin_2, "delta2_1": delta2_1, "sum": suma,
"active": active, "avg_2_1": avg_2_1}
window_csv_body.append(info)
days_old = days
i += 1
avg_window_device_path2 = main_path2 + "csv/"
csv_path = avg_window_device_path2 + str(analysed_bin) + "_new_control_bin_window_" + str(n_days) + "_days.csv"
os.makedirs(avg_window_device_path2, exist_ok=True)
with open(csv_path, "w") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=window_csv_header)
writer.writeheader()
writer.writerows(window_csv_body)
The stage of creating plots for a single bin N after sunrise.
avg_window_device_path3 = main_path2 + "png/"
os.makedirs(avg_window_device_path3, exist_ok=True)
df = pd.read_csv(csv_path, decimal=".")
x_value, y_value = 'N_day', 'avg_2_1'
save_path = avg_window_device_path3 + str(analysed_bin) + "_bin_window_" + str(n_days) + "_days_" + str(
int(shift / 60)) + "_shift.png"
title = "n_days: %s; shift: %s H; bin_id: %s; numbering starts at sunrise." % (
n_days, round(shift / 3600, 2), analysed_bin)
create_plot(x_value, y_value, df, save_path, title)
To start, we loop over the elements (window_date) of the sunrise_bin_date dictionary (each element is a date with a list of window datetimes after sunrise) and for each date we take the windows with ids N and N - 1 after sunrise.
Id 0 is the window in which sunrise occurs, id 1 is the first window after sunrise; the computed difference is added to diff_sunrise_bin_date[window_date].
Example (N = 1, 2023-07-05):
Sunrise is at 07:06, so the zero window starts at 07:00 and the first window after sunrise at 07:15, hence:
time_bin1 = sunrise_bin_date["2023-07-05"][1] -> 2023-07-05 07:15:00
time_bin0 = sunrise_bin_date["2023-07-05"][0] -> 2023-07-05 07:00:00
Having the datetime of two windows, we retrieve the values from the window_list dictionary (bin_1 and bin_0 values) for the given times and calculate the differences bin_1 - bin_0 for them.
A single item in the window_list dictionary is a list, so even if it is a single-element list (if the window is every 15 minutes and the data in the file come every 15 minutes, we get exactly one element), we have to refer to the value by its position in the list - the first element is 0, i.e. window_list[time_bin1][0].
After calculating the differences for our bin for all dates, we can proceed to the stage where we will calculate the average of "n_days" for each date.
But for starters, we are going to create 3 dictionaries:
avg_sunrise_bin_date -> average of the differences over the last n_days
sum_sunrise_bin_date -> sum of the differences over the last n_days (e.g. 100)
active_sunrise_bin_date -> list of the value differences over the last n_days
Then, in the loop over the elements of the sunrise_bin_date dictionary, we add a new element to the avg_sunrise_bin_date dictionary for the currently examined date. Such an element is a list that will contain up to n_days (e.g. 100, for an average over 100 days) results. The date_range function (described in section 1.4) is used to determine which results need to be added; it takes two dates (start_date, end_date) that define the range.
We will use each date received from this function to add the result to the list for our date (window_date):
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])
You need to remember that date_range does not include the end date in the dates it yields, so this function gives us n_days - 1 rather than n_days dates (for n_days=100 it will be 99 instead of 100), and we add the end date manually.
After building the list of differences (avg_sunrise_bin_date) for a given window_date, the average (sum / number of elements) is calculated. Currently a result is analysed only if more than 2 of the n_days are present (this condition matters at the beginning of the data set, or after a long break in the data, if one existed).
The average is written back to avg_sunrise_bin_date (replacing the list of differences that was averaged).
Once all the averages have been calculated, the data can be saved to a file and a plot generated.
Comments on the write-to-file stage:
days = 1943 -> the initial day number since 2000; for now it is hard-coded and can be refined. It is needed only for the first value - after that it updates itself. This way the file lists the days one by one.
Data can be saved with multiple libraries - this function uses the csv library.
However, the pandas library is used to load the data for the chart, because with pandas it is easy to declare the decimal separator - the character separating the integer part from the fractional part. For example, CSV files from Excel may use a comma, while Python uses a dot, e.g. 1.31 -> decimal = ".". A toy illustration follows below.
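A toy illustration of the decimal parameter (the file contents below are made up):
from io import StringIO
import pandas as pd

csv_dot = "N_day;avg\n1;1.31\n"    # dot as decimal separator (what this script writes)
csv_comma = "N_day;avg\n1;1,31\n"  # comma as decimal separator (e.g. Excel with a Polish locale)
pd.read_csv(StringIO(csv_dot), sep=";", decimal=".")["avg"][0]    # 1.31 as float
pd.read_csv(StringIO(csv_comma), sep=";", decimal=",")["avg"][0]  # 1.31 as float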
The create_plot function will be described in section 2.4
2.3 scater_analyze_window
def scater_analyze_window(shift, window_list, sunrise_bin_date, n_days=100):
window_csv_header2 = ["bin", "value"]
scater_plot_raw_data = {}
window_csv_body2 = []
diff_sunrise_bin_date = {}
for i in range(1, 30):
for window_date in sunrise_bin_date:
if len(sunrise_bin_date[window_date]) > i:
time_bin1 = sunrise_bin_date[window_date][i] # i.e 2005-01-03 09:30:00
time_bin0 = sunrise_bin_date[window_date][i - 1] # i.e 2005-01-03 09:15:00
if len(window_list[time_bin1]) > 0 and len(window_list[time_bin0]) > 0:
bin_1 = window_list[time_bin1][0] # i.e 188.954
bin_0 = window_list[time_bin0][0] # i.e 188.954
diff_detects = round(bin_1 - bin_0, 3)
diff_sunrise_bin_date[window_date] = diff_detects
avg_sunrise_bin_date = {} # average of variances of n_day
for window_date in sunrise_bin_date:
avg_sunrise_bin_date[window_date] = []
date_of_end = datetime.strptime(window_date, "%Y-%m-%d").date()
start_date = (date_of_end - timedelta(days=n_days - 1))
for single_date in date_range(start_date, date_of_end):
current_date = single_date.strftime("%Y-%m-%d")
if current_date in diff_sunrise_bin_date:
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])
if date_of_end.strftime("%Y-%m-%d") in diff_sunrise_bin_date:
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[date_of_end.strftime("%Y-%m-%d")])
if len(avg_sunrise_bin_date[window_date]) > 2:
totals = sum(avg_sunrise_bin_date[window_date])
elements = len(avg_sunrise_bin_date[window_date])
averages = totals / elements
info = {"bin": i, "value": averages}
window_csv_body2.append(info)
After the calculation, data can be saved to a file and a plot generated (continuation of the same function).
This should be done outside the loop "for i in range(1, 30):".
avg_window_device_path2 = main_path2 + "csv/"
csv_path = avg_window_device_path2 + "_scater_plot_data.csv"
with open(csv_path, "w") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=window_csv_header2)
writer.writeheader()
writer.writerows(window_csv_body2)
avg_window_device_path3 = main_path2 + "png/"
os.makedirs(avg_window_device_path3, exist_ok=True)
The stage of creating a plot for multiple bins N after sunrise.
df = pd.read_csv(csv_path, decimal=".")
x_value, y_value = "bin", "value"
save_path = avg_window_device_path3 + "scater_plot.png"
title = "n_days: %s; shift: %s H; scater_plot; numbering starts at sunrise." % (n_days, round(shift / 3600, 2))
create_plot(x_value, y_value, df, save_path, title, lines="no", opt=2)
The first difference is the additional loop "for i in range(1, 30):" placed before the loop over the elements of the sunrise_bin_date dictionary ("for window_date in sunrise_bin_date:").
The current version loops from 1 to 30; this range can be changed or even parameterised with variables, e.g. bin_min and bin_max - note that bin_min cannot be less than 1 and bin_max cannot be greater than the average number of windows between one sunrise and the next (for a 15-min window size this is at most 96), as sketched below.
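A minimal sketch of that parameterisation (bin_min and bin_max are assumptions, not part of the original script):
bin_min = 1
bin_max = 86400 // shift  # number of windows per day, e.g. 96 for a 15-min window
for i in range(bin_min, bin_max):
    ...  # same body as the loop in the function above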
The create_plot function will be described in section 2.4
2.4 create_plot
def create_plot(x_value, y_value, df, png_save_path, title, lines="yes", opt=1):
if opt == 1:
window_plot = df.plot.line(x=x_value, y=y_value, grid=True, title=title)
else:
# window_plot = df.plot.line(x='N_day', y='meant', xticks=xticks, grid=True, title=title)
# window_plot = df.plot(x='N_day', y='meant', grid=True, title=title,style='.-')
window_plot = df.plot(x=x_value, y=y_value, grid=True, title=title, style='.')
if lines == "yes":
# Part to be refined in the future, showing vertical lines
window_plot.axhline(0, color="red", linestyle="-")  # horizontal line
window_plot.axvline(days_from_2000("2010-01-01"), color="yellow", linestyle="--")
window_plot.axvline(days_from_2000("2015-01-01"), color="green", linestyle="--")
window_plot.axvline(days_from_2000("2020-01-01"), color="yellow", linestyle="--")
fig = window_plot.get_figure()
fig.savefig(png_save_path)
plt.cla() # Clear the current axes.
plt.clf() # Clear the current figure.
plt.close(fig)  # Close the figure.
For a start, two ifs are introduced.
The first determines the type of plot: if the opt value given when calling the function is 1, the plot is a line plot (used e.g. in the analyze_window function); otherwise it is a point plot.
The second if determines whether vertical lines should appear on the plot - marking a given date, e.g. the beginning of a year.
Finally, we save the created image to a file and clear the figure state. Detailed documentation on creating and editing plots is available on the official matplotlib website.
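Two hypothetical calls mirroring how the function is used in sections 2.2 and 2.3 (df and the file names are assumed to exist):
create_plot("N_day", "avg_2_1", df, "avg_plot.png", "averages")                    # line plot with marker lines
create_plot("bin", "value", df, "scatter_plot.png", "scatter", lines="no", opt=2)  # point plot, no lines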
__ CR-EQ correlation __
Remember!
To reproduce the results from our publication (arXiv:2204.12310, DOI:10.1016/j.jastp.2023.106068), it is very important to download and use the data at the sampling frequency given in the publication.
For Pierre Auger it is 15 min; for Mosc and Oulu it is 6 hours (with the option "corr_for_efficiency").
You can download these data using the scripts available on our internal website:
https://redmine.credo.science/projects/external-data/files
You may find it helpful to consult the documentation available on the website:
https://user.credo.science/user-interface/credo_data_tools/doc/external_data/
Plot no.1
3.0 Note
To look for correlations between the detection rates of secondary cosmic rays and seismic activity, we explored the Pierre Auger Observatory scaler data [13] compared with selected stations of the Neutron Monitor Database (NMDB) [17] and with earthquake data from the U.S. Geological Survey (USGS) database [18].
In addition, as a reference for the space weather situation, solar activity data were taken into account, available from the Solar Influences Data analysis Center (SIDC) [19].
All the data sets used within this study are illustrated in Fig. 1, using a binning relevant for the analysis presented subsequently.
source: "Observation of large scale precursor correlations between cosmic rays and earthquakes with a periodicity similar to the solar cycle",
doi.org/10.1016/j.jastp.2023.106068
Before you start using this script, make sure you combine all the data of interest from each station/source into a single file, i.e.:
Pierre Auger Observatory: opendata.auger.org/data.php#scalers
Neutron Monitor database: nmdb.eu/nest
USGS earthquakes catalog: earthquake.usgs.gov/earthquakes/search
SILSO sunspot number data: sidc.be/silso/datafiles
If you haven't done so, run the script from the "join_data" directory described in the External_data / How join data package.
3.1 Libraries
from datetime import date, datetime, timedelta
from csv import DictReader
import matplotlib.pyplot as plt
import numpy as np
Refine the parameter settings for the chart (optional)
plt.rcParams.update({'font.size': 22})
We will use the following libraries to create the plot.
- datetime: allows us to convert dates and times from the files into the time format used on the axis
- matplotlib: the most well-known library for creating charts
- csv and numpy: responsible for loading data from files and properly preparing data from multiple files for plotting
Comment:
Although pandas is perfectly suitable (and IMHO better and faster) for operations on spreadsheet-like data (e.g. CSV), it is not well suited to the plotting step here. Why?
Because we load a lot of data, and passing it through the DataFrame structure considerably slows down the creation of the plot by matplotlib, which works better on numpy array structures.
It is possible that this problem can be solved, but that is not the purpose of this documentation.
3.2 Read_data_plot
Declare path and create empty lists
start_path = "/CredoDataTools/External_data/join_data/raw_data/"
# create list
silso_date, silso_sunspot = [], []
silso_ms_date, silso_ms_sunspot = [], []
eq_date, eq_sum = [], []
auger_date, auger_corr = [], []
oulu_date, oulu_value = [], []
mosc_date, mosc_value = [], []
def read_data_plot():
data_file = DictReader(open(start_path + "SN_d_tot.csv"))
# sunspot
for row in data_file:
date_s = row["date"].split("-")
silso_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
silso_sunspot.append(float(row["sunspot"])*1.3)
data_file = DictReader(open(start_path + "SN_ms_tot.csv"))
# sunspot smoothed
for row in data_file:
date_s = row["date"].split("-")
silso_ms_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
silso_ms_sunspot.append(float(row["sunspot"])*1.3)
# earthquakes
data_file = DictReader(open(start_path + "eq_data_msum.csv"))
for row in data_file:
if len(row["mean"]) > 0:
date_s = row["date"].split("-")
eq_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
eq_sum.append(float(row["mean"]))
# auger
data_file = DictReader(open(start_path + "auger_data.csv"))
for row in data_file:
auger_date.append(datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S"))
auger_corr.append(float(row["rateCorr"]))
# oulu
data_file = DictReader(open(start_path + "oulu_data_5Dm.csv"))
for row in data_file:
if len(row["mean"]) > 0:
oulu_date.append(datetime.strptime(row["date"], "%Y-%m-%d"))
oulu_value.append(float(row["mean"])*1.6)
# mosc
data_file = DictReader(open(start_path + "mosc_data_5Dm.csv"))
for row in data_file:
if len(row["mean"]) > 0:
mosc_date.append(datetime.strptime(row["date"], "%Y-%m-%d"))
mosc_value.append(float(row["mean"])*0.9)
#
fig, ax = plt.subplots(figsize=(20,14))
ax.plot(auger_date, auger_corr, label='Auger', color='blue')
ax.plot(oulu_date, oulu_value, label='OULU x1.6', color='green')
ax.plot(mosc_date, mosc_value, label='Mosc x0.9',linestyle="",marker="o", color="red")
ax.set(ylim=(100, 260))
ax.set_ylabel('Mean cosmic detects (5 days)')
ax.set_xlabel('Time (years)')
ax1 = ax.twinx()
# right side Y axis
ax1.plot(eq_date, eq_sum, label='Earthquake', color='magenta')
ax1.plot(silso_date, silso_sunspot, label='Sunspot')
ax1.plot(silso_ms_date, silso_ms_sunspot, label='Smoothed Sunspot')
ax1.set(ylim=(0, 2999))
ax.legend(loc='upper left')
ax1.set_ylabel('Global earthquake magnitude sum day, mean 5 days | Sunspot number')
plt.title('Correlation CR-EQ plot 1')
plt.legend()
fname = "cr_eq_1plot.png"
plt.savefig(fname, bbox_inches='tight')
Example of how to run the function
read_data_plot()
To begin with, specify the path where you keep all the necessary csv files - preferably in one place.
If you used our suggested script from the "External_data" package and saved the results in a directory with the same name as the mentioned package, the path to the data will be:
start_path = "$HOME/External_data/join_data/raw_data/"
The next step is to declare two lists for each dataset: one for the dates and one for the values on the Y axis. These lists are as follows:
- SILSO: silso_date, silso_sunspot,
- SILSO ms: silso_ms_date, silso_ms_sunspot,
- Earthquakes: eq_date, eq_sum,
- Pierre Auger: auger_date, auger_corr,
- OULU: oulu_date, oulu_value,
- Mosc: mosc_date, mosc_value,
For each set we perform the same actions.
First, using csv.DictReader, we read the csv file, convert the date string into a datetime object and finally append the date and value to the corresponding lists.
When all the data are loaded, we use the matplotlib library (as plt) to add the data to the plot and edit the axes and descriptions.
To create a graph with two different Y axes (on the right and left side of the graph) we use the "twinx" option, creating a new axes object: ax1 = ax.twinx().
Note that creating layers superimposes the new chart on top of the old one - so it's a good idea to set the legend positions for all layers, e.g. ax.legend(loc='upper left').
At the end we save the plot to the file "cr_eq_1plot.png" - it will be created where your script is located.
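If you prefer a single combined legend instead of positioning two separately, here is a sketch (not in the original script):
# collect handles and labels from both axes and draw one legend on top
lines1, labels1 = ax.get_legend_handles_labels()
lines2, labels2 = ax1.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper left')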
Plot no.2
4.1 Introduction
Before we create a chart, we need to prepare the data and calculate the PDF and CDF statistics (the probability of a given result occurring for a given condition).
For this purpose, we will create the "pdf_values.py" script, which as usual you can download from our internal website redmine.credo.science/projects/data_time_operation-read_scaler/files
NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM CONTACT US
(contact[at]credo.science)
4.2 pdf_values
Import libraries
import numpy as np
import pandas as pd
import math
from datetime import datetime, date, timedelta, timezone
from scipy.stats import binom
PDF function
def pdf(d, l, t, shift, shift2, station, save_file=False):
# *** EARTHQUAKES PART ***
d = pd.to_datetime(d) + timedelta(days=shift2)
# the end date is set by adding l (335) windows of t (5) days each
d_end = d + timedelta(days=l * int(t[:-1]))
eq_df = pd.DataFrame()  # reset the dataframe (free memory)
eq_df = pd.read_csv('eq_data.csv', sep=",")
eq_df['time'] = pd.to_datetime(eq_df['time'])
eq_df = eq_df[(eq_df['time'] >= d - timedelta(days=shift))]
eq_df.index = range(len(eq_df))
new_row = pd.DataFrame({'time': pd.to_datetime(d) - timedelta(days=shift), 'latitude': 0, 'longitude': 0, 'mag': 0},
index=[0])
# eq_df = pd.concat([new_row, eq_df]).reset_index(drop=True)
eq_df = eq_df[(eq_df['time'] <= pd.to_datetime(d_end))]
eq_df['date'] = eq_df['time']
eq_df = eq_df[["date", 'mag']]
eq_df = eq_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['mag'].agg(["count", "sum"])
eq_df.at[0, 'count'] = eq_df.at[0, 'count'] - 1
# *** COSMIC RAYS PART ***
d = d - timedelta(days=shift2)
d_end = d + timedelta(days=l * int(shift))
read_name_csv = 'auger_data.csv'
if station == "Mosc":
read_name_csv = 'mosc_data.csv'
if station == "Oulu":
read_name_csv = 'oulu_data.csv'
cr_df = pd.DataFrame()
cr_df = pd.read_csv(read_name_csv, sep=",")
if station == "PA":
timer = 'time'
cr_df['date'] = pd.to_datetime(cr_df[timer], unit='s')
new_row = pd.DataFrame({'time': 0, 'rateCorr': 0, 'arrayFraction': 0, 'rateUncorr': 0, 'pressure': 0,
'date': pd.to_datetime(d) - timedelta(days=shift)}, index=[0])
else:
timer = 'datetime'
cr_df['date'] = pd.to_datetime(cr_df[timer], format='%Y-%m-%d %H:%M:%S')
new_row = pd.DataFrame({'time': 0, 'value': 0, 'date': pd.to_datetime(d) - timedelta(days=shift)}, index=[0])
cr_df = cr_df[(cr_df['date'] >= (pd.to_datetime(d)) - timedelta(days=shift))]
cr_df.index = range(len(cr_df))
# cr_df = pd.concat([new_row, cr_df]).reset_index(drop=True)
cr_df = cr_df[(cr_df['date'] <= pd.to_datetime(d_end))]
if station == "PA":
cr_df = cr_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['rateCorr'].agg(
['mean', 'sum', "count"])
else:
cr_df = cr_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['value'].agg(
['mean', 'sum', "count"])
c = pd.DataFrame()
c['cr date'] = cr_df['date']
c['eq date'] = eq_df['date']
c['cr sum'] = cr_df['sum']
c['cr count'] = cr_df['count']
c['eq sum'] = eq_df['sum']
c["cr mean"] = cr_df['mean']
c['cr delta3'] = abs(c['cr mean'].diff())
c.index = range(len(c))
c = c[(c['cr mean'] > 0) & (c['cr delta3'] > 0)]
c['cr median'] = c['cr delta3'].median()
c['eq median'] = c['eq sum'].median()
c['eq diff'] = c.apply(lambda x: round(x["eq sum"] - x['eq median'], 4), axis=1)
c['A'] = (c['eq sum'] / c['eq median']) - 1
c['B'] = (c['cr delta3'] / c['cr median']) - 1
c['C'] = np.sign(c['A'] * c['B'])
c.index = range(len(c))
if save_file:
file_name = "pdf_%s_%s.csv" % (station, str(d).replace(":", "-"))
columns = ["cr date", "eq date", "cr mean", "eq sum", "cr delta3", "eq median", "cr median", "eq diff", "A",
"B", "C"]
c.to_csv(file_name, index=False, columns=columns)
# CDF
tmp_c = c[(c['C'] >= 0)]
CDF = 1 - binom.cdf(len(tmp_c), len(c), 0.5, 1)
print("%s CDF: "% station, CDF, "N:", len(c), "N+:", len(tmp_c), "N-:", len(c) - len(tmp_c), )
# PDF
n = c['C'].value_counts()[1]
N = len(c)
PDF = math.comb(N, n) * ((1 / 2) ** N)
# print("datetime dla %s: " % station, d, d_end, N)
return PDF, d_end
First example of how to run the function
print("PDF: ", pdf('2014-01-04 23:37:12', 335, '5D', 5, 15, "Oulu", True))
print("PDF: ", pdf('2014-04-02 22:07:12', 335, '5D', 5, 15, "PA", True))
print("PDF: ", pdf('2013-11-14 07:00:00', 335, '5D', 5, 15, "Mosc", True))
When running the function, we have to give it a few values to start with. These are:
:param d: datetime begin (for Cosmic Rays station)
:param l: number of elements/windows counted from the datetime specified in "d"
:param t: With what frequency should the data be grouped, e.g. every 5 days -> "5D"
:param shift: Small window (5 days)
:param shift2: Large window (15 days - corresponds to the shift between EQ and CR)
:param station: For which station / cosmic ray observatory do we calculate correlations
:param save_file: Whether the time series and results are to be saved to a file, default "False"
First, we prepare data for Earthquakes.
As the EQ data are shifted by 15 days compared to CR, we shift the onset for earthquakes by shift2 (equal to 15).
Then we create an end date that is "l" windows away from the start date, where a single window lasts "shift" days.
In our example (for 6 sigma) the numbers are l = 335 and shift = 5.
Having set the start and end dates, we load the file containing the EQ data, keep only the rows after the start date, then drop the rows after the end date. We also add a "zero" element, which we sacrifice to anchor the row indexing. Then we create a dataframe containing only two columns ("date", "mag") and finally group by the date key with frequency "t", creating two new columns: "count" - the number of elements grouped into one row - and "sum" - the sum of the elements in a given time window. A toy illustration of this grouping follows below.
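A toy illustration of the pd.Grouper call on made-up data:
import pandas as pd

# toy data: one magnitude per day for 10 days
df = pd.DataFrame({"date": pd.date_range("2014-01-01", periods=10, freq="D"),
                   "mag": range(10)})
g = df.groupby(pd.Grouper(key="date", freq="5D", origin="start"), as_index=False)["mag"].agg(["count", "sum"])
# g has two rows: 2014-01-01 (count=5, sum=10) and 2014-01-06 (count=5, sum=35)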
We do the same for the Cosmic Rays data.
We must remember that the CR data are shifted by 15 days relative to EQ, so we subtract shift2 from the "d" that was used for EQ (d = d - timedelta(days=shift2)). Then, depending on the station, we read the appropriate columns and finally group.
Then we calculate the differences between adjacent rows for CR and call them "delta3" - this corresponds to a difference of 3 small windows. So in the row where the EQ datetime is 2014-01-19 23:37:12, we have the result for the CR window 2014-01-04 23:37:12.
It is also worth mentioning that a particular window, e.g. 2014-01-04 23:37:12 with Shift=5, contains data from:
2013-12-31 23:37:13 to 2014-01-04 23:37:12.
And delta3 for the line with date= 2014-01-04 23:37:12 is the result of the difference of the window 2014-01-04 23:37:12 with the window 2013-12-31 23:37:13
Then we calculate the medians: for Earthquakes from the "sum" column and for Cosmic Rays from "delta3". After calculating delta3 we also check for empty values - i.e. rows whose values are not greater than 0 - and remove them (this means that even though we have 335 time windows, we often use fewer for the final calculations).
After preparing everything, we can calculate the result for each row according to the formulas:
c['A'] = c['eq sum'] / c['eq median'] - 1
c['B'] = c['cr delta3'] / c['cr median'] - 1
c['C'] = np.sign(c['A'] * c['B'])
and we calculate CDF
tmp_c = c[(c['C'] >= 0)]
CDF = 1 - binom.cdf(len(tmp_c), len(c), 0.5, 1)
and PDF
n = c['C'].value_counts()[1]
N = len(c)
PDF = math.comb(N, n) * ((1 / 2) ** N)
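To get a feel for the numbers, a toy calculation with made-up counts (N and n here are hypothetical):
import math
from scipy.stats import binom

N, n = 300, 170  # hypothetical: 300 usable windows, 170 of them with C = +1
PDF = math.comb(N, n) * ((1 / 2) ** N)  # probability of exactly n 'agreements' out of N for p = 0.5
CDF = 1 - binom.cdf(n, N, 0.5)          # probability of more than n agreements by chance
print(PDF, CDF)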
First lines of the file pdf_PA_2014-04-02 22-07-12.csv (pdf('2014-04-02 22:07:12', 335, '5D', 5, 15, "PA", True))
To prepare the data for plot no. 2, run the script as in the second example below, where
data_start = '2004-12-31 00:00:00'
start = {
'PA': "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 22, 7, 12),
'Mosc': "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 7, 0, 0),
"Oulu": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 23, 37, 12)
}
(a start date of 2015-07-01 gives d_end reaching 2020-01)
NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM CONTACT US
(contact[at]credo.science)
Second example of how to run the function
start = {"PA": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 22, 7, 12),
"Mosc": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 7, 0, 0),
"Oulu": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 23, 37, 12)
}
def main():
stations = ["PA", "Oulu", "Mosc"]
for station in stations:
print(station)
data = pd.DataFrame()
date = pd.to_datetime(start[station])
data_start = '2004-12-31 00:00:00'
# 2019-06-30 22:07:12, 2019-06-29 22:07:12 .. 1970-01-30 22:07:12
while date >= pd.to_datetime(data_start):
row, d_end = pdf(date.strftime("%Y-%m-%d %H:%M:%S"), 335, '5D', 5, 15, station)
row = pd.DataFrame({'date': date, 'pdf': row}, index=[0])
data = pd.concat([row, data]).reset_index(drop=True)
name = 'pdf_data_%s.csv' % station
data.to_csv(name, index=False)
date = date - timedelta(days=1)
if __name__ == '__main__':
main()
First lines of the file pdf_data_PA.csv
4.3 Read_data_plot
Import libraries and declaration of data lists
from datetime import date, datetime, timedelta
from csv import DictReader
import matplotlib.pyplot as plt
import numpy as np
silso_date, silso_sunspot = [], []
silso_ms_date, silso_ms_sunspot = [], []
auger_date, auger_value = [], []
oulu_date, oulu_value = [], []
mosc_date, mosc_value = [], []
plt.rcParams.update({'font.size': 22})
date_begin = datetime.strptime("2010-01-01", '%Y-%m-%d')
date_end = datetime.strptime("2020-01-01", '%Y-%m-%d')
def read_data_plot():
data_file = DictReader(open(start_path + "SN_d_tot.csv"))
for row in data_file:
date_s = row["date"].split("-")
date_s2 = row["date"] + "-01"
if date_end >= datetime.strptime(date_s2, '%Y-%m-%d') >= date_begin:
silso_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
silso_sunspot.append(float(row["sunspot"]))
# rewrite data_file each time; this leaves more memory for the plot
data_file = DictReader(open(start_path + "SN_ms_tot.csv"))
for row in data_file:
date_s = row["date"].split("-")
date_s2 = row["date"] + "-01"
if date_end >= datetime.strptime(date_s2, '%Y-%m-%d') >= date_begin:
silso_ms_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
silso_ms_sunspot.append(float(row["sunspot"]))  # no *1.3 scaling here, unlike plot no. 1
data_file = DictReader(open(start_path + "pdf_data_PA.csv"))
for row in data_file:
date_s = row["date"].split(" ")[0]
if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
auger_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
auger_value.append(float(row["pdf"]))
data_file = DictReader(open(start_path + "pdf_data_Oulu.csv"))
for row in data_file:
date_s = row["date"].split(" ")[0]
if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
oulu_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
oulu_value.append(float(row["pdf"]))
data_file = DictReader(open(start_path + "pdf_data_Mosc.csv"))
for row in data_file:
date_s = row["date"].split(" ")[0]
if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
mosc_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
mosc_value.append(float(row["pdf"]))
fig, ax = plt.subplots(figsize=(22,16))
ax.plot(auger_date, auger_value, label='Auger',linestyle="--",color='blue')
ax.plot(oulu_date, oulu_value, label='OULU',linestyle="--", color='green',alpha=0.8)
ax.plot(mosc_date, mosc_value, label='Mosc', linestyle="--",color="red",alpha=0.6)
ax.set(ylim=(10**-14,10**-1))
ax.set(xlim=(datetime.strptime("2010-01-01", "%Y-%m-%d"),datetime.strptime("2020-01-01", "%Y-%m-%d")))
ax.set_yscale('log')
ax.set_ylabel('log_10 (P_PDF)')
ax.set_xlabel('Time (years)')
plt.grid()
ax1 = ax.twinx()
# right side Y axis
ax1.plot(silso_date, silso_sunspot, label='Sunspot')
ax1.plot(silso_ms_date, silso_ms_sunspot, label='Smoothed Sunspot')
ax1.set(ylim=(0, 200))
ax.legend(loc='upper left')
ax1.set_ylabel('Sunspot number')
plt.title('Correlation CR-EQ plot 2')
# Show the legend
plt.legend()
fname = "cr_eq_2plot.png"
plt.savefig(fname, bbox_inches='tight')
For this graph we load the sunspot data (SN_ms_tot.csv and SN_d_tot.csv) from the same files used for plot no. 1. The files pdf_data_PA.csv, pdf_data_Oulu.csv and pdf_data_Mosc.csv come from the pdf_values.py script.
Another difference is that the Y axis is set to a logarithmic scale.
Example of how to run the function
def main():
read_data_plot()
if __name__ == '__main__':
main()
Plot no.3
5.1 Introduction
Before we create a chart, we need to prepare the data and calculate the PDF and CDF statistics (the probability of a given result occurring for a given condition).
For this purpose, we will create the "pdf_values.py" script, which as usual you can download from our internal website redmine.credo.science/projects/data_time_operation-read_scaler/files
NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM CONTACT US
(contact[at]credo.science)
Shift dt_eq PDF values
def main():
station = "PA"
data = pd.DataFrame()
for i in range(30, -31, -1):
if 20 > i > 10:
row, d_end = pdf('2014-04-02 22:07:12', 335, '5D', 5, i, "PA")
row = pd.DataFrame({'dt_eq': i, 'pdf': row}, index=[0])
data = pd.concat([row, data]).reset_index(drop=True)
print(i)
else:
if abs(i % 5) == 0:
row, d_end = pdf('2014-04-02 22:07:12', 335, '5D', 5, i, "PA")
row = pd.DataFrame({'dt_eq': i, 'pdf': row}, index=[0])
data = pd.concat([row, data]).reset_index(drop=True)
name = 'shift_dt_pdf_data_%s.csv' % station
data.to_csv(name, index=False)
if __name__ == '__main__':
main()
Here we use the same pdf function as for plot no. 2; we only change the offset of CR relative to EQ (the shift2 parameter, here the loop variable i).
auger_date, auger_value = [], []
plt.rcParams.update({'font.size': 22})
def read_data_plot():
data_file = DictReader(open(start_path + "shift_dt_pdf_data_PA.csv"))
for row in data_file:
auger_date.append(int(row["dt_eq"]))  # convert to int so the X axis is numeric
auger_value.append(float(row["pdf"]))
fig, ax = plt.subplots(figsize=(20,14))
ax.plot(auger_date, auger_value, label='Auger', color='blue')
ax.plot(auger_date, auger_value, linestyle="", marker="*", color='black')
# ax.set(ylim=(1*10^-11, 1))
# ax.set(xlim=(-30,30,1))
ax.set_yscale('log')
ax.set_ylabel('log_10 (P_PDF)')
ax.set_xlabel('Shift dt_eq (days)')
ax.legend(loc='upper left')
plt.title('Correlation CR-EQ plot 3')
plt.grid()
plt.legend()
fname = "cr_eq_3plot.png"
plt.savefig(fname, bbox_inches='tight')