
CREDO Data Tools

Welcome to the CREDO Data Tools project documentation. Here you will find descriptions and code examples of work carried out by members of the CREDO project.

If any part of the documentation is unclear, contact the author of the documentation (Slawomir Stuglik) or write an email to contact[at]credo.science

Download scripts

All files presented in this documentation can be downloaded from the CREDO project internal page (redmine).

Direct link to the files.

Data_time_operation

The data_time_operation.py script contains several useful functions for operations on datetime objects and timestamps.

Don't forget to always install the libraries that are imported.

Libraries

import os
import math
from time import mktime
from datetime import datetime, date, timedelta, timezone

Libraries used in the following functions:
math -> used in calculations for the Sun class and the calcSunTime function
time and datetime -> basic libraries for time and date operations

1.1 set_start_end

def set_start_end(data_start, data_end):
    timestamp_start = int(mktime(datetime.strptime(data_start, '%Y-%m-%d %H:%M').timetuple()))
    timestamp_end = int(mktime(datetime.strptime(data_end, '%Y-%m-%d %H:%M').timetuple()))
    return timestamp_start, timestamp_end

Example of how to use the function:

set_start_end("2023-07-01 00:00", "2023-07-05 00:00")

Example of what we should get:

1688162400 1688508000

You can verify timestamps on the website: https://www.unixtimestamp.com
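
You can also verify the result directly in Python; a minimal sketch (note that both mktime above and fromtimestamp below work in the local time zone of your machine, so the round trip reproduces the input):

from datetime import datetime

ts_start, ts_end = set_start_end("2023-07-01 00:00", "2023-07-05 00:00")
print(datetime.fromtimestamp(ts_start))  # 2023-07-01 00:00:00 (local time)
print(datetime.fromtimestamp(ts_end))    # 2023-07-05 00:00:00 (local time)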

1.2 create_window

def create_window(day=0, hour=0, minutes=15, seconds=0, starter="2005-01-01 00:00", ender="2023-02-20 00:00"):
    start, meta = set_start_end(starter, ender)
    shift = day * 86400 + hour * 3600 + minutes * 60 + seconds  # window size in seconds
    window = {}
    small_window = {}
    # one empty list per window between the start and end dates
    for timestamp in range(start, meta, shift):
        datatime_detect = datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
        window[datatime_detect] = []
    # one empty list per window within a single day (86400 seconds)
    for timestamp in range(0, 86400, shift):
        small_window[timestamp] = []

    return shift, window, small_window

Example of how to use the function:

begin_date, end_date = "2023-07-01 00:00", "2023-07-05 00:00"
create_window(day=0, hour=0, minutes=10, starter=begin_date, ender=end_date)

Example of what we should get:
! Remember: you enter a local datetime and it will be converted to a UTC (zone 0) datetime.

(600, {'2023-06-30 22:00:00': [], '2023-06-30 22:10:00': [], ... , '2023-07-04 21:40:00': [], '2023-07-04 21:50:00': []}, {0: [], 600: [], ..., 85200: [], 85800: []})

shift -> the window size in seconds, computed as day * 86400 + hour * 3600 + minutes * 60 + seconds
window -> dictionary with all dates and times in the range begin..end
small_window -> dictionary of all windows of a single day (24 hours)

datetime.utcfromtimestamp(timestamp) -> utcfromtimestamp gives us a datetime in zone 0 (UTC). By contrast, fromtimestamp gives the local time of the machine you are working on; for me that would be the current time in Poland, time zone included. I therefore advise against using local time, especially when you are working with people from other countries.
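
A minimal illustration of the difference, assuming the timestamp from section 1.1 was generated on a machine in the UTC+2 zone:

from datetime import datetime

ts = 1688162400
print(datetime.utcfromtimestamp(ts))  # 2023-06-30 22:00:00 - zone 0 (UTC)
print(datetime.fromtimestamp(ts))     # 2023-07-01 00:00:00 - local time (here UTC+2)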

window[datatime_detect] -> the window variable is a dictionary whose values are lists. It looks roughly like this: { 'elem1': [ ], 'elem2': [ ] }

small_window[timestamp] -> the small_window variable has the same structure as window, but its keys are second offsets within a single day - covering the range 00:00 to 23:59
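
To place a detection into its window, the incoming timestamp is rounded down to the nearest window boundary before the dictionary lookup; this is exactly the trick used later by read_csv in section 2.1. A minimal sketch (assuming shift and window come from the create_window example above):

shift = 600
timestamp = 1688163125  # a detection 125 s into the 22:10 window
timestamp_in_window = timestamp - timestamp % shift  # 1688163000
key = datetime.utcfromtimestamp(timestamp_in_window).strftime('%Y-%m-%d %H:%M:%S')
window[key].append(1.0)  # '2023-06-30 22:10:00' already exists as a key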

1.3 days_from_2000

def days_from_2000(data_s):    
    day_zero = date(2000, 1, 1)
    year, month, day = data_s.split("-")
    day_s = date(int(year), int(month), int(day))
    delta = day_s - day_zero
    return delta.days

We return delta.days because the delta variable itself is a timedelta of the form:

9 days, 0:00:00

Example of how to use the function:

days_from_2000("2000-01-10")

Example of what we should get:

9

day_zero - the date 2000-01-01, from which we count how many days have passed until the date given by the user. The date passed to the function is a string in YYYY-MM-DD format, so we first split it into three elements (year, month, day) and then build a date object with the date() constructor. This lets us calculate the difference in days between the two dates (delta) by simply subtracting them.
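
A worked example: subtracting date(2000, 1, 1) from date(2023, 7, 5) gives a timedelta of 8586 days, so:

days_from_2000("2023-07-05")
# returns 8586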

1.4 date_range

def date_range(start_date, meta_date):
    for n in range(int((meta_date - start_date).days)):
        yield start_date + timedelta(n)

Example of how to use the function:

date_end = datetime.strptime("2023-07-05", "%Y-%m-%d").date()
date_start = (date_end - timedelta(days=5))
for single_date in date_range(date_start, date_end):
    one_date = single_date.strftime("%Y-%m-%d")
    print(one_date)

Example of what we should get:

2023-06-30
2023-07-01
2023-07-02
2023-07-03
2023-07-04
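
Note that the generator stops one day before meta_date, so the end date itself is never yielded (section 2.2 works around this by appending the end date manually). If an inclusive range suits you better, a small variant (a sketch) does the job:

def date_range_inclusive(start_date, meta_date):
    for n in range(int((meta_date - start_date).days) + 1):
        yield start_date + timedelta(n)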

1.5 sunrise_bin_window

def sunrise_bin_window(shift, window_list, coordinates):
    sun = Sun()
    sunrise_bin_date = {}  # windows grouped by the sunrise day they belong to
    sunrise_dates = {}     # sunrise time (HH:MM:SS) for each date
    date_window_old = "2005-02-01"  # must be earlier than the start date
    sunrise_old = ""
    diff = shift
    for datatime_window in window_list:
        window_date = str(datatime_window).split(" ")[0]
        timer = str(datatime_window).split(" ")[1]
        sunrise = sun.getSunriseTime(coordinates, ndate=window_date)['time_str'] + ":00"
        if window_date not in sunrise_bin_date:
            sunrise_bin_date[window_date] = []
            sunrise_dates[window_date] = sunrise
        t0 = datetime.strptime(timer, "%H:%M:%S")
        t1 = datetime.strptime(sunrise, "%H:%M:%S")
        delta_t = t1 - t0
        if sunrise_old > timer and date_window_old != window_date and abs(delta_t.total_seconds()) > diff:
            sunrise_bin_date[date_window_old].append(datatime_window)
        else:
            sunrise_old = sunrise
            date_window_old = window_date
        if sunrise < timer or 0 <= delta_t.total_seconds() < diff:
            sunrise_bin_date[window_date].append(datatime_window)

    return sunrise_bin_date, sunrise_dates

Example of how to use the function:

shift = 600
coordinates = {'longitude': 138.60414, 'latitude': -34.919159 }
# window from functions create_window - see 1.2
sunrise_bin_date, sunrise_dates = sunrise_bin_window(shift, window, coordinates)

Example of what we should get for sunrise_bin_date (one element):

{'2023-06-30': [
    '2023-06-30 22:00:00', 
    '2023-06-30 22:10:00', 
    ..., 
    '2023-07-01 21:50:00'
    ]
}

Example of what we should get for sunrise_dates from 2023-07-04 to 2023-07-11:

{
    '2023-07-04': '22:02:00', 
    '2023-07-05': '22:01:00', 
    ..., 
    '2023-07-09': '22:01:00', 
    '2023-07-10': '22:00:00'
}

To start with, we need to create an instance of the Sun() class. More about the Sun() class in section 1.6.
We also need to declare the following variables:
sunrise_bin_date -> dictionary with the individual windows from the "window" dictionary that fall within the sunrise day of a given date, e.g. 2023-07-05 (see 1.2 create_window).
sunrise_dates -> dictionary with the time of sunrise for each date.
date_window_old -> must initially be earlier than the start date; afterwards it is continuously updated. It is an auxiliary variable that makes it easier to assign a window to the corresponding element of the sunrise_bin_date dictionary.
For example, if sunrise on 2023-07-05 is at 07:02, then the 05:30 window still belongs to the sunrise day of 2023-07-04.
sunrise_old -> sunrise for day N - 1 ("yesterday"), in HH:MM:SS format, e.g. 10:48:00.
diff -> the size of the window (in seconds; 15 min -> 900).

The calculation loops over the elements of the window_list dictionary; a single element looks like 2013-03-30 08:30:00.
Each element is split into a date part (window_date, e.g. 2013-03-30) and a time part (timer, e.g. 08:30:00).
We pass the date to the getSunriseTime function of the Sun() class. The result comes in several formats; we use the time_str option and append ":00", so the result has the form HH:MM:SS, e.g. 10:48:00 (the sunrise calculation is therefore accurate to the minute).
For each window_date we check whether it already exists in the sunrise_bin_date dictionary; if not, we create a new element whose value is a list (sunrise_bin_date[window_date] = []), to which the datetimes of individual windows will then be added.
We also add window_date to the sunrise_dates dictionary with the value of the sunrise variable (the time the sun rises).
The 'zero' (first) windows, i.e. those in which sunrise occurs, need extra handling, so we calculate delta_t, the difference between the window time (timer) and the sunrise time (sunrise). Two ifs follow.
The first covers windows that belong to a new calendar day but come before that day's sunrise (they are assigned to the previous sunrise day).
The second applies when the sunrise time is earlier than the time of the analysed window, or when the difference between the two times is smaller than diff (the window size).
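
A quick sanity check (sketch): with 15-minute windows, each date in sunrise_bin_date should hold roughly 96 windows covering one sunrise-to-sunrise day:

for window_date in list(sunrise_bin_date)[:3]:
    print(window_date, sunrise_dates[window_date], len(sunrise_bin_date[window_date]))
# e.g.: 2023-06-30 22:02:00 96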

1.6 Class Sun

class Sun:
    def getSunriseTime(self, coords, ndate="now"):
        return self.calcSunTime(coords, True, ndate)
    def getSunsetTime(self, coords, ndate="now"):
        return self.calcSunTime(coords, False, ndate)
    def getCurrentUTC(self, ndate):
        if ndate == "now":
            now = datetime.now(timezone.utc)
        else:
            now = datetime.strptime(ndate, '%Y-%m-%d').replace(tzinfo=timezone.utc)
        return [now.day, now.month, now.year]
    def calcSunTime(self, coords, isRiseTime, ndate, zenith=90):
        day, month, year = self.getCurrentUTC(ndate)
        longitude = coords['longitude']
        latitude = coords['latitude']
        TO_RAD = math.pi / 180
        # 1. first calculate the day of the year
        N1 = math.floor(275 * month / 9) 
        N2 = math.floor((month + 9) / 12)
        N3 = (1 + math.floor((year - 4 * math.floor(year / 4) + 2) / 3))
        N = N1 - (N2 * N3) + day - 30
        # 2. convert the longitude to hour value and calculate an approximate time
        lngHour = longitude / 15
        if isRiseTime:
            t = N + ((6 - lngHour) / 24)
        else:  # sunset
            t = N + ((18 - lngHour) / 24)
        # 3. calculate the Sun's mean anomaly
        M = (0.9856 * t) - 3.289
        # 4. calculate the Sun's true longitude
        L = M + (1.916 * math.sin(TO_RAD * M)) + (0.020 * math.sin(TO_RAD * 2 * M)) + 282.634
        L = self.forceRange(L, 360)
        # 5a. calculate the Sun's right ascension
        RA = (1 / TO_RAD) * math.atan(0.91764 * math.tan(TO_RAD * L))
        RA = self.forceRange(RA, 360)
        # 5b. right ascension value needs to be in the same quadrant as L
        Lquadrant = (math.floor(L / 90)) * 90
        RAquadrant = (math.floor(RA / 90)) * 90
        RA = RA + (Lquadrant - RAquadrant)
        # 5c. right ascension value needs to be converted into hours
        RA = RA / 15
        # 6. calculate the Sun's declination
        sinDec = 0.39782 * math.sin(TO_RAD * L)
        cosDec = math.cos(math.asin(sinDec))
        # 7a. calculate the Sun's local hour angle
        cosH = (math.cos(TO_RAD * zenith) - (sinDec * math.sin(TO_RAD * latitude))) / (
                cosDec * math.cos(TO_RAD * latitude))
        if cosH > 1:
            return {'status': False, 'msg': 'the sun never rises on this location (on the specified date)'}
        if cosH < -1:
            return {'status': False, 'msg': 'the sun never sets on this location (on the specified date)'}
        # 7b. finish calculating H and convert into hours
        if isRiseTime:
            H = 360 - (1 / TO_RAD) * math.acos(cosH)
        else:  # setting
            H = (1 / TO_RAD) * math.acos(cosH)
        H = H / 15
        # 8. calculate local mean time of rising/setting
        T = H + RA - (0.06571 * t) - 6.622
        # 9. adjust back to UTC
        UT = T - lngHour
        UT = self.forceRange(UT+0.05, 24)  # UTC time in decimal format (e.g. 23.23)
        # 10. Return
        hr = int(self.forceRange(int(UT), 24))
        min = int(round((UT - int(UT)) * 60, 0))
        if min == 60:  # in decimal, 4.99673 would round to 4 h 60 min instead of 5 h 00 min; we fix this
            hr += 1
            min = 0
        if hr < 10:
            hr = "0" + str(hr)
        if min < 10:
            min = "0" + str(min)
        time_str = "%s:%s" % (hr, min)
        return {
            'status': True,
            'decimal': UT,
            'hr': hr,
            'min': min,
            'time_str': time_str
        }
    def forceRange(self, v, max):
        # force v to be >= 0 and < max
        if v < 0:
            return v + max
        elif v >= max:
            return v - max
        return v

Example of how to use the function:

sun = Sun()
coords = {'longitude': 138.60414, 'latitude': -34.919159}  # 34.92 S, 138.60 E
sunrise = sun.getSunriseTime(coords, ndate="2023-07-01")['time_str'] + ":00"  # i.e 10:48:00
sunset = sun.getSunsetTime(coords, ndate="2023-07-01")['time_str'] + ":00"

print("sunrise: ",sunrise, "sunset: ",sunset, "time in UTC")

Example of what we should get:

sunrise:  22:02:00 sunset:  07:43:00 time in UTC

A class that calculates sunrise and sunset times from geographical coordinates (latitude and longitude) and a date.
The class has the following functions:
getSunriseTime -> returns the sunrise time for a given date and geographical coordinates.
getSunsetTime -> returns the sunset time for a given date and geographical coordinates.
Both call the main function; the only difference is the value of the isRiseTime flag (if False, the sunset time is returned).
getCurrentUTC -> determines the date to use for the calculations. If "now", it takes today's date; otherwise it takes the date specified when the function was called.
forceRange -> checks that a value lies in the correct range (0, max) and wraps it if not.
calcSunTime -> the main function. It needs the geographic coordinates (coords), sunrise or sunset (isRiseTime), the date under investigation (ndate) and the zenith angle (zenith). The calculation is performed in 10 steps:

  1. Calculate the day of the year:
    N1 = math.floor(275 * month / 9)
    N2 = math.floor((month + 9) / 12)
    N3 = (1 + math.floor((year - 4 * math.floor(year / 4) + 2) / 3))
    N = N1 - (N2 * N3) + day - 30
    You can find an explanation of this formula on astronomy.stackexchange.com (https://astronomy.stackexchange.com/questions/2407/calculate-day-of-the-year-for-a-given-date)
  2. Convert the longitude to an hour value (longitude / 15) and calculate an approximate time.
  3. Calculate the Sun's mean anomaly:
    M = (0.9856 * t) - 3.289
  4. Calculate the Sun's true longitude:
    L = M + (1.916 * sin(M)) + (0.020 * sin(2 * M)) + 282.634
    and adjust L to be in the range 0 to 360:
    L = self.forceRange(L, 360)
  5. Calculate the Sun's right ascension:
    RA = (1 / TO_RAD) * math.atan(0.91764 * math.tan(TO_RAD * L))
    and adjust RA to be in the range 0 to 360:
    RA = self.forceRange(RA, 360)
    b) the right ascension value needs to be in the same quadrant as L:
    Lquadrant = (math.floor(L / 90)) * 90
    RAquadrant = (math.floor(RA / 90)) * 90
    RA = RA + (Lquadrant - RAquadrant)
    c) at the end the right ascension value is converted into hours (RA / 15).
  6. Calculate the Sun's declination:
    sinDec = 0.39782 * math.sin(TO_RAD * L)
    cosDec = math.cos(math.asin(sinDec))
  7. Calculate the Sun's local hour angle:
    cosH = (math.cos(TO_RAD * zenith) - (sinDec * math.sin(TO_RAD * latitude))) / (cosDec * math.cos(TO_RAD * latitude))
    then finish calculating H depending on whether you want sunrise or sunset:
    if isRiseTime:
        H = 360 - (1 / TO_RAD) * math.acos(cosH)
    else:
        H = (1 / TO_RAD) * math.acos(cosH)
    Finally, convert H into hours (H / 15).
  8. Calculate the local mean time of rising/setting:
    T = H + RA - (0.06571 * t) - 6.622
  9. Adjust back to UTC:
    UT = T - lngHour
    +0.05 is added to the time - the formula has an inaccuracy of a few tens of seconds (~1 minute), so this reduces the error (compare with sunrise/sunset results from internet sources):
    UT = self.forceRange(UT + 0.05, 24)  # UTC time in decimal format (e.g. 23.23)
  10. Return:
    hr = int(self.forceRange(int(UT), 24))
    min = int(round((UT - int(UT)) * 60, 0))
    Compared to the original code, the hr and min values and the time_str output have been added.
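
Because calcSunTime can return {'status': False, ...} when the sun never rises or sets at a location on the given date, it is safer to check the status before reading time_str. A minimal sketch (the Svalbard coordinates are just an illustrative choice):

sun = Sun()
arctic = {'longitude': 15.63, 'latitude': 78.22}  # Longyearbyen, Svalbard
result = sun.getSunriseTime(arctic, ndate="2023-06-21")
if result['status']:
    print(result['time_str'])
else:
    print(result['msg'])  # expected: 'the sun never sets on this location (on the specified date)'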

_____ Examples _____


Below you will find some examples of how to analyze data using the functions suggested above.

Ex-I: Read_scaler

Analysis of public data from the Pierre Auger Observatory; the csv file is downloaded from opendata.auger.org

The "scaler.csv" file contains:
time: Unix time in seconds (seconds since 1st Jan 1970, w/o leap seconds)
rateCorr: corrected scaler rate [counts/m2/s]
arrayFraction: Fraction of array in operation [percent]
rateUncorr: average detector scaler rate, uncorrected [counts/s]
pressure: barometric pressure [hPa]

Purpose of the script:
Counting the average number of particles over N_days and calculating the difference between two adjacent bins (A; A-1) after sunrise.

2.0 Libraries

import csv
import os
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime, timedelta
from Functions.data_time_operation import Sun, sunrise_bin_window
from Functions.data_time_operation import set_start_end, create_window, days_from_2000, date_range
from path_links import main_path, data_analysis_path

Don't forget to always install the libraries that are imported. Libraries used in the following functions:
matplotlib -> library for creating the plot/graph with the results
csv and pandas -> libraries for saving and loading data
data_time_operation -> useful functions described in Chapter 1 of this documentation
datetime -> basic libraries for time and date operations
path_links -> if you create a lot of scripts, it is a good idea to keep all the paths in one separate file, so that any change is made in one file and not, for example, in 10.
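
The path_links.py file itself is not listed in this documentation; a minimal sketch of what it could contain (main_path and data_analysis_path are the names imported above, the actual paths are only an assumption and must match your machine):

# path_links.py - hypothetical content, adjust to your directory layout
main_path = "/home/user/CredoDataTools/"
data_analysis_path = main_path + "Data_analysis/"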

const elements

main_path2 = data_analysis_path + "pierre_auger/"
pg_data = main_path2 + "scalers.csv"
begin_date, end_date = "2005-01-03 00:00", "2020-01-01 00:00"
begin, end = set_start_end(begin_date, end_date)
w_d, w_h, w_m = 0, 0, 15
coordinates = {'longitude': -69.182460, 'latitude': -35.275999}
analysed_bin = 1

main_path2, pg_data -> paths to the files
begin_date, end_date -> dates passed to the create_window function
w_d, w_h, w_m -> size of a single window (days, hours, minutes)
coordinates -> Pierre Auger coordinates (latitude: 35.28 S, longitude: 69.18 W)
analysed_bin -> id of the bin for which we count the difference

2.1 Read_csv

def read_csv():
    shift, window, small_window = create_window(day=w_d, hour=w_h, minutes=w_m, starter=begin_date, ender=end_date)
    csvfile = open(pg_data)
    reader = csv.DictReader(csvfile)
    for row in reader:
        timestamp = int(row['time'])
        detects = row['rateCorr']
        if timestamp > begin:
            timestamp_in_window = timestamp - timestamp % shift
            # utcfromtimestamp gives us the datetime in zone 0 (UTC)
            data_time = datetime.utcfromtimestamp(timestamp_in_window).strftime('%Y-%m-%d %H:%M:%S')
            if data_time in window:  # data_time = 2014-09-16 09:00:00; window[data_time] = [179.43]
                window[data_time].append(float(detects))
    sunrise_bin_date, sunrise_dates = sunrise_bin_window(shift, window, coordinates)
    analyze_window(shift, window, sunrise_bin_date, sunrise_dates, n_days=100)
    scater_analyze_window(shift, window, sunrise_bin_date, n_days=100)

After running sunrise_bin_window (the function described in section 1.5), we have a sunrise_bin_date dictionary containing all dates in the interval of interest, where a single element is a date with its windows (15 min) between one sunrise and the next.
(As a reminder, sunrise_dates is a dictionary with the sunrise times for the dates in our range.)

Once we have everything required, we can run the analyze_window function, which does all the calculations we are interested in and finishes the job by saving the results to a file and creating a plot.
Finally, we additionally run scater_analyze_window to create a scatter plot showing what the results look like for the given intervals after sunrise. Both functions are explained in detail in sections 2.2 and 2.3.
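
Running the whole analysis then comes down to a single call (everything else is triggered from inside read_csv):

if __name__ == "__main__":
    read_csv()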

2.2 analyze_window

def analyze_window(shift, window_list, sunrise_bin_date, sunrise_dates, n_days=100):
    diff_sunrise_bin_date = {}  # For a given analysed_bin we calculate the bin1-bin0 differences
    for window_date in sunrise_bin_date:
        time_bin1 = sunrise_bin_date[window_date][analysed_bin]  # i.e 2005-01-03 09:30:00
        time_bin0 = sunrise_bin_date[window_date][analysed_bin - 1]  # i.e 2005-01-03 09:15:00
        if len(window_list[time_bin1]) > 0 and len(window_list[time_bin0]) > 0:
            bin_1 = window_list[time_bin1][0]  # i.e 188.954
            bin_0 = window_list[time_bin0][0]  # i.e 188.954
            diff_detects = round(bin_1 - bin_0, 3)
            diff_sunrise_bin_date[window_date] = diff_detects
    avg_sunrise_bin_date = {}
    sum_sunrise_bin_date = {}
    active_sunrise_bin_date = {}
    for window_date in sunrise_bin_date:
        avg_sunrise_bin_date[window_date] = []
        date_of_end = datetime.strptime(window_date, "%Y-%m-%d").date()
        start_date = (date_of_end - timedelta(days=n_days - 1))
        for single_date in date_range(start_date, date_of_end):
            current_date = single_date.strftime("%Y-%m-%d")
            if current_date in diff_sunrise_bin_date:
                avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])
        if date_of_end.strftime("%Y-%m-%d") in diff_sunrise_bin_date:
            avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[date_of_end.strftime("%Y-%m-%d")])
        if len(avg_sunrise_bin_date[window_date]) > 2:
            totals = sum(avg_sunrise_bin_date[window_date])
            sum_sunrise_bin_date[window_date] = totals
            elements = len(avg_sunrise_bin_date[window_date])
            active_sunrise_bin_date[window_date] = avg_sunrise_bin_date[window_date]
            averages = totals / elements
            avg_sunrise_bin_date[window_date] = averages
        else:
            avg_sunrise_bin_date[window_date] = []

After the calculation, the data can be saved to a file and a plot generated (continuation of the same function).
This should be done outside the loop for window_date in sunrise_bin_date:

    i = 0
    window_csv_header = ["N_day", "date", "sunrise", "bin_1", "bin_2", "delta2_1", "sum", "active", "avg_2_1"]
    window_csv_body = []
    days = 1943 
    for window_date in sunrise_bin_date:
        days_old = days
        days = int(days_from_2000(window_date))
        elements_n = sunrise_bin_date[window_date]
        time_bin2 = elements_n[analysed_bin]
        time_bin1 = elements_n[analysed_bin - 1] 
        if len(window_list[time_bin1]) > 0 and len(window_list[time_bin2]) > 0:
            if days - days_old == 1:
                bin_2 = window_list[time_bin2][0]
                bin_1 = window_list[time_bin1][0]
                delta2_1 = round(bin_2 - bin_1, 3)
                avg_2_1 = avg_sunrise_bin_date[window_date]
                if str(avg_2_1) != "[]":
                    suma = sum_sunrise_bin_date[window_date]
                    active = active_sunrise_bin_date[window_date]
                    info = {"N_day": days, "date": window_date, "sunrise": sunrise_dates[window_date],
                            "bin_1": bin_1, "bin_2": bin_2, "delta2_1": delta2_1, "sum": suma,
                            "active": active, "avg_2_1": avg_2_1}
                    window_csv_body.append(info)
                days_old = days
        i += 1
    avg_window_device_path2 = main_path2 + "csv/"
    csv_path = avg_window_device_path2 + str(analysed_bin) + "_new_control_bin_window_" + str(n_days) + "_days.csv"
    os.makedirs(avg_window_device_path2, exist_ok=True)
    with open(csv_path, "w") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=window_csv_header)
        writer.writeheader()
        writer.writerows(window_csv_body)

The stage of creating the plot for a single bin N after sunrise.

    avg_window_device_path3 = main_path2 + "png/"
    os.makedirs(avg_window_device_path3, exist_ok=True)
    df = pd.read_csv(csv_path, decimal=".")
    x_value, y_value = 'N_day', 'avg_2_1'
    save_path = avg_window_device_path3 + str(analysed_bin) + "_bin_window_" + str(n_days) + "_days_" + str(
        int(shift / 60)) + "_shift.png"
    title = "n_days: %s; shift: %s H; bin_id: %s; numbering starts at sunrise." % (
        n_days, round(shift / 3600, 2), analysed_bin)
    create_plot(x_value, y_value, df, save_path, title)

To start, we loop through the elements (window_date) of the sunrise_bin_date dictionary (each element is a date plus a list of window datetimes after sunrise), and for each date we take the windows with id N and N-1 after sunrise.
Id 0 is the window in which sunrise occurs, id 1 is the first window after sunrise; the computed difference is added to diff_sunrise_bin_date[window_date].

Example (N = 1, 2023-07-05):
Sunrise is at 07:06, so the zero window has a time of 07:00 and the first window after sunrise is 07:15; then:
time_bin1 = sunrise_bin_date["2023-07-05"][1] -> 2023-07-05 07:15:00
time_bin0 = sunrise_bin_date["2023-07-05"][0] -> 2023-07-05 07:00:00

Having the datetimes of the two windows, we retrieve their values from the window_list dictionary (bin_1 and bin_0) and calculate the difference bin_1 - bin_0.
A single item in the window_list dictionary is a list, so even if it is a single-element list (if the window is every 15 minutes and the data in the file come every 15 minutes, we have a single-element list), we have to refer to the value by its position in the list - the first element is 0, i.e. window_list[time_bin1][0].

After calculating the differences for our bin for all dates, we can proceed to the stage where we calculate the "n_days" average for each date. First we create 3 dictionaries:
avg_sunrise_bin_date -> average of the differences over n_days
sum_sunrise_bin_date -> sum over the most recent n_days (e.g. 100)
active_sunrise_bin_date -> list of the value differences over the last N days

Then, looping over the elements of the sunrise_bin_date dictionary, we add a new element to the avg_sunrise_bin_date dictionary for the currently examined date. This element is a list that will hold up to n_days (e.g. 100, for a 100-day average) results. The date_range function (described in section 1.4) is used to determine which results to add; it needs two dates (start_date, end_date) that define the range. Each date obtained from this function is used to add the corresponding result to the list for our date (window_date):
avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])

Remember that date_range does not return the end date, so this function yields n_days - 1 rather than n_days dates (for n_days=100 that is 99 instead of 100), and we add the end date manually.
After building the list of differences (avg_sunrise_bin_date) for a given window_date, the average (totals / len(elements)) is calculated. Currently a result is analysed only if at least 2 of the n_days are present (this condition matters at the beginning of the data set, or after a long break in the data, if one exists). The average is written back to avg_sunrise_bin_date (replacing the list of differences that was averaged).

Once all the averages have been calculated, the data can be saved to a file and a plot generated.

Comment on the write-to-file stage:
days = 1943 -> the day number (since 2000) of the first record; so far hard-coded - this could be refined. It ensures the file lists the days day by day: the number is needed for the first value, after which it updates itself. Data can be saved with several libraries - this function uses the csv library. However, the pandas library will be used to load the data for the chart, because with pandas it is easier to declare the decimal separator (the character separating the integer part from the rest): in CSV/Excel it can be a comma, while in Python it is a dot, e.g. 1.31 -> decimal = ".".

plot of mean differences

The create_plot function will be described in section 2.4

2.3 scater_analyze_window

def scater_analyze_window(shift, window_list, sunrise_bin_date, n_days=100):
    window_csv_header2 = ["bin", "value"]
    scater_plot_raw_data = {}
    window_csv_body2 = []
    diff_sunrise_bin_date = {}
    for i in range(1, 30):
        for window_date in sunrise_bin_date:
            if len(sunrise_bin_date[window_date]) > i:
                time_bin1 = sunrise_bin_date[window_date][i]  # i.e 2005-01-03 09:30:00
                time_bin0 = sunrise_bin_date[window_date][i - 1]    # i.e 2005-01-03 09:15:00
                if len(window_list[time_bin1]) > 0 and len(window_list[time_bin0]) > 0:
                    bin_1 = window_list[time_bin1][0]  # i.e 188.954
                    bin_0 = window_list[time_bin0][0]  # i.e 188.954
                    diff_detects = round(bin_1 - bin_0, 3)
                    diff_sunrise_bin_date[window_date] = diff_detects
        avg_sunrise_bin_date = {}  # average of variances of n_day
        for window_date in sunrise_bin_date:
            avg_sunrise_bin_date[window_date] = []
            date_of_end = datetime.strptime(window_date, "%Y-%m-%d").date()
            start_date = (date_of_end - timedelta(days=n_days - 1))
            for single_date in date_range(start_date, date_of_end):
                current_date = single_date.strftime("%Y-%m-%d")
                if current_date in diff_sunrise_bin_date:
                    avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[current_date])
            if date_of_end.strftime("%Y-%m-%d") in diff_sunrise_bin_date:
                avg_sunrise_bin_date[window_date].append(diff_sunrise_bin_date[date_of_end.strftime("%Y-%m-%d")])
            if len(avg_sunrise_bin_date[window_date]) > 2:
                totals = sum(avg_sunrise_bin_date[window_date])
                elements = len(avg_sunrise_bin_date[window_date])
                averages = totals / elements
                info = {"bin": i, "value": averages}
                window_csv_body2.append(info)

After the calculation, the data can be saved to a file and a plot generated (continuation of the same function).
This should be done outside the loop for i in range(1, 30):

    avg_window_device_path2 = main_path2 + "csv/"
    csv_path = avg_window_device_path2 + "_scater_plot_data.csv"
    with open(csv_path, "w") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=window_csv_header2)
        writer.writeheader()
        writer.writerows(window_csv_body2)
    avg_window_device_path3 = main_path2 + "png/"
    os.makedirs(avg_window_device_path3, exist_ok=True)

The stage of creating the plot for multiple bins N after sunrise.

    df = pd.read_csv(csv_path, decimal=".")
    x_value, y_value = "bin", "value"
    save_path = avg_window_device_path3 + "scater_plot.png"
    title = "n_days: %s; shift: %s H; scater_plot; numbering starts at sunrise." % (n_days, round(shift / 3600, 2))
    create_plot(x_value, y_value, df, save_path, title, lines="no", opt=2)

The first difference is the additional loop for i in range(1, 30): placed before the loop over the elements of the sunrise_bin_date dictionary (for window_date in sunrise_bin_date:).
The current code loops from 1 to 30; the range can be changed or even parametrised with variables such as bin_min and bin_max, as shown in the sketch below. Note that bin_min cannot be less than 1 and bin_max cannot be greater than the average number of windows between one sunrise and the next (for a 15-minute window size that is at most 96).
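
A sketch of that parametrisation (bin_min and bin_max are hypothetical names, not present in the script):

bin_min, bin_max = 1, 30  # for 15-minute windows bin_max must stay below ~96
for i in range(bin_min, bin_max):
    for window_date in sunrise_bin_date:
        ...  # same body as in scater_analyze_window above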

scatter plot

The create_plot function will be described in section 2.4

2.4 create_plot

def create_plot(x_value, y_value, df, png_save_path, title, lines="yes", opt=1):
    if opt == 1:
        window_plot = df.plot.line(x=x_value, y=y_value, grid=True, title=title)
    else:
        # window_plot = df.plot.line(x='N_day', y='meant', xticks=xticks, grid=True, title=title)
        # window_plot = df.plot(x='N_day', y='meant', grid=True, title=title,style='.-')
        window_plot = df.plot(x=x_value, y=y_value, grid=True, title=title, style='.')
    if lines == "yes":
        # Part to be refined in the future, showing vertical lines
        window_plot.axhline(0, color="red", linestyle="-")  # horizontal line
        window_plot.axvline(days_from_2000("2010-01-01"), color="yellow", linestyle="--")
        window_plot.axvline(days_from_2000("2015-01-01"), color="green", linestyle="--")
        window_plot.axvline(days_from_2000("2020-01-01"), color="yellow", linestyle="--")
    fig = window_plot.get_figure()
    fig.savefig(png_save_path)
    plt.cla()  # Clear the current axes.
    plt.clf()  # Clear the current figure.
    plt.close(fig)  # Closes all the figure windows.

To start, two ifs are introduced.
The first determines the type of plot: if the 'opt' value given when calling the function is different from 1, a point plot is drawn; if it is 1, a line plot - used, for example, in the analyze_window function.
The second if determines whether vertical lines should appear on the plot, marking a given date such as the beginning of a year.

Finally, we save the created image to a file and clear matplotlib's figure state. Detailed documentation on editing and creating plot images is available on the official matplotlib library website.


__ CR-EQ correlation __

Remember!

In order to reproduce the results from our publication (arXiv:2204.12310, DOI:10.1016/j.jastp.2023.106068), it is very important to download and use the data at the frequency given in the publication.

For Pierre Auger it is 15 min; for Mosc and Oulu it is 6 hours (with the "corr_for_efficiency" option).

You can download these data using the scripts available on our internal website:
https://redmine.credo.science/projects/external-data/files
You may find it helpful to consult the documentation available on the website:
https://user.credo.science/user-interface/credo_data_tools/doc/external_data/

Plot no.1

3.0 Note

To look for correlations between the detection rates of secondary cosmic rays and seismic activity, we explored the Pierre Auger Observatory scaler data (13) compared to selected stations of the Neutron Monitor Database (NMDB) (17), and to the earthquake data from the U.S. Geological Survey (USGS) database (18).

In addition, as a reference for the space weather situation, solar activity data were taken into account, available from the Solar Influences Data analysis Center (SIDC)(19).

All the data sets used within this study are illustrated in Fig. 1 using a binning relevant for the analysis to be presented subsequently.

source: "Observation of large scale precursor correlations between cosmic rays and earthquakes with a periodicity similar to the solar cycle",
doi.org/10.1016/j.jastp.2023.106068

Before you start using this script, make sure you combine all the data of interest from each station/source into one single file, i.e.:
Pierre Auger Observatory: opendata.auger.org/data.php#scalers
Neutron Monitor database: nmdb.eu/nest
USGS earthquakes catalog: earthquake.usgs.gov/earthquakes/search
SILSO sunspot number data: sidc.be/silso/datafiles

If you haven't done so, run the script from the "join_data" directory described in the External_data/How join data package.

Expected plot: Plot 1

3.1 Libraries

from datetime import date, datetime, timedelta
from csv import DictReader
import matplotlib.pyplot as plt
import numpy as np

Refine the parameter settings for the chart (optional)

plt.rcParams.update({'font.size': 22})

We will use the following libraries to create the plot.
- datetime: allows us to convert dates and times from the files into the time format used on the axis
- matplotlib: the most well-known library for creating charts
- csv and numpy - will be responsible for loading data from files and properly preparing data from multiple files for plotting.

Comment:
Although pandas is perfectly suitable (and IMHO better and faster) for operations on data from spreadsheets (e.g. CSV), it is not well suited to the charting step. Why?
Because we load a lot of data and, passed through the DataFrame structure, it greatly slows down matplotlib's creation of the graph, which works better on numpy array structures.
This problem can probably be solved, but that is not the purpose of this documentation.
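
If you nevertheless want to keep pandas for the loading step, one workaround (a sketch; the file and column names are only placeholders) is to hand matplotlib plain numpy arrays instead of DataFrame columns:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("some_data.csv")  # placeholder file name
x = df["date"].to_numpy()          # placeholder column names
y = df["value"].to_numpy()
plt.plot(x, y)
plt.savefig("quick_check.png")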

3.2 Read_data_plot

Declare path and create empty lists

start_path = "/CredoDataTools/External_data/join_data/raw_data/"

# create list
silso_date, silso_sunspot = [], []
silso_ms_date, silso_ms_sunspot = [], []
eq_date, eq_sum = [], []
auger_date, auger_corr = [], []
oulu_date, oulu_value = [], []
mosc_date, mosc_value = [], []

read_data_plot()

def read_data_plot():
    data_file = DictReader(open(start_path + "SN_d_tot.csv"))
    #  sunspot
    for row in data_file:
        date_s = row["date"].split("-")
        silso_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
        silso_sunspot.append(float(row["sunspot"])*1.3)
    data_file = DictReader(open(start_path + "SN_ms_tot.csv"))
    #  sunspot smoothed
    for row in data_file:
        date_s = row["date"].split("-")
        silso_ms_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
        silso_ms_sunspot.append(float(row["sunspot"])*1.3)
    #  earthquakes
    data_file = DictReader(open(start_path + "eq_data_msum.csv"))
    for row in data_file:
        if len(row["mean"]) > 0:
            date_s = row["date"].split("-")
            eq_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
            eq_sum.append(float(row["mean"]))
    #  auger
    data_file = DictReader(open(start_path + "auger_data.csv"))
    for row in data_file:
        auger_date.append(datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S"))
        auger_corr.append(float(row["rateCorr"]))
    #  oulu
    data_file = DictReader(open(start_path + "oulu_data_5Dm.csv"))
    for row in data_file:
        if len(row["mean"]) > 0:
            oulu_date.append(datetime.strptime(row["date"], "%Y-%m-%d"))
            oulu_value.append(float(row["mean"])*1.6)
    #  mosc
    data_file = DictReader(open(start_path + "mosc_data_5Dm.csv"))
    for row in data_file:
        if len(row["mean"]) > 0:
            mosc_date.append(datetime.strptime(row["date"], "%Y-%m-%d"))
            mosc_value.append(float(row["mean"])*0.9)
    #        
    fig, ax = plt.subplots(figsize=(20,14))
    ax.plot(auger_date, auger_corr, label='Auger', color='blue')
    ax.plot(oulu_date, oulu_value, label='OULU x1.6', color='green')
    ax.plot(mosc_date, mosc_value, label='Mosc x0.9',linestyle="",marker="o", color="red")
    ax.set(ylim=(100, 260))
    ax.set_ylabel('Mean cosmic detects (5 days)')
    ax.set_xlabel('Time (years)')
    ax1 = ax.twinx()
    # right side Y axis
    ax1.plot(eq_date, eq_sum, label='Earthquake', color='magenta')
    ax1.plot(silso_date, silso_sunspot, label='Sunspot')
    ax1.plot(silso_ms_date, silso_ms_sunspot, label='Smoothed Sunspot')
    ax1.set(ylim=(0, 2999))
    ax.legend(loc='upper left')
    ax1.set_ylabel('Global earthquake magnitude sum day, mean 5 days | Sunspot number')
    plt.title('Correlation CR-EQ plot 1')
    plt.legend()
    fname = "cr_eq_1plot.png"
    plt.savefig(fname, bbox_inches='tight')

Example of how to run the function:

read_data_plot()

To begin with, specify the path where you keep all the necessary csv files - preferably in one place.
If you used our suggested script from the "External_data" package and saved the results in a directory with the same name as that package, the path to the data will be:
start_path = "$HOME/External_data/join_data/raw_data/"

The next step is to declare two lists for each dataset: one always holds the date, the other the value on the Y axis. These lists are as follows:
- SILSO: silso_date, silso_sunspot,
- SILSO ms: silso_ms_date, silso_ms_sunspot,
- Earthquakes: eq_date, eq_sum,
- Pierre Auger: auger_date, auger_corr,
- OULU: oulu_date, oulu_value,
- Mosc: mosc_date, mosc_value,

For each set, we perform the same actions.
First, using csv.DictReader, we read the csv file, parse the date string into a datetime object, and finally add the date and value to the corresponding lists.

When we have all the data loaded, we use the matplotlib library (as plt) to add the data to the plot and edit the axes and descriptions. To create a graph with two different Y axes (on the right and left sides of the graph) we use the "twinx" option, creating a new axes object: ax1 = ax.twinx(). Note that creating layers superimposes the new chart on top of the old one, so it is a good idea to set the legend positions for all layers, e.g. ax.legend(loc='upper left'); a sketch for merging the legends follows below.
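
One common way to obtain a single legend covering both layers is to merge the handles from the two axes (a sketch using standard matplotlib calls):

handles1, labels1 = ax.get_legend_handles_labels()
handles2, labels2 = ax1.get_legend_handles_labels()
ax.legend(handles1 + handles2, labels1 + labels2, loc='upper left')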

At the end, we save the plot to the file "cr_eq_1plot.png" - it will be created in the directory where your script is located.

Expected plot: Plot 1


Plot no.2

4.1 Introduction

Before we create a chart, we need to prepare the data and calculate the PDF and CDF statistics (the probability of a given result occurring for a given condition).

For this purpose, we will create the "pdf_values.py" script, which, as usual, you can download from our internal website redmine.credo.science/projects/data_time_operation-read_scaler/files

NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM CONTACT US (contact[at]credo.science)

4.2 pdf_values

Import libraries

import numpy as np
import pandas as pd
import math
from datetime import datetime, date, timedelta, timezone
from scipy.stats import binom

PDF function


def pdf(d, l, t, shift, shift2, station, save_file=False):    
    # ***  EARTHQUAKES PART  ***
    d = pd.to_datetime(d) + timedelta(days=shift2)
    # the end date is set by adding l (335) windows of shift (5) days each
    d_end = d + timedelta(days=l * int(t[:-1]))
    eq_df = pd.DataFrame()  # reset the dataframe (to free memory, so to speak)
    eq_df = pd.read_csv('eq_data.csv', sep=",")
    eq_df['time'] = pd.to_datetime(eq_df['time'])
    eq_df = eq_df[(eq_df['time'] >= d - timedelta(days=shift))]
    eq_df.index = range(len(eq_df))
    new_row = pd.DataFrame({'time': pd.to_datetime(d) - timedelta(days=shift), 'latitude': 0, 'longitude': 0, 'mag': 0},
                           index=[0])
    # eq_df = pd.concat([new_row, eq_df]).reset_index(drop=True)

    eq_df = eq_df[(eq_df['time'] <= pd.to_datetime(d_end))]
    eq_df['date'] = eq_df['time']
    eq_df = eq_df[["date", 'mag']]

    eq_df = eq_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['mag'].agg(["count", "sum"])
    eq_df.at[0, 'count'] = eq_df.at[0, 'count'] - 1
    # ***  COSMIC RAYS PART  ***
    d = d - timedelta(days=shift2)
    d_end = d + timedelta(days=l * int(shift))

    read_name_csv = 'auger_data.csv'
    if station == "Mosc":
        read_name_csv = 'mosc_data.csv'
    if station == "Oulu":
        read_name_csv = 'oulu_data.csv'

    cr_df = pd.DataFrame()
    cr_df = pd.read_csv(read_name_csv, sep=",")
    if station == "PA":
        timer = 'time'
        cr_df['date'] = pd.to_datetime(cr_df[timer], unit='s')
        new_row = pd.DataFrame({'time': 0, 'rateCorr': 0, 'arrayFraction': 0, 'rateUncorr': 0, 'pressure': 0,
                                'date': pd.to_datetime(d) - timedelta(days=shift)}, index=[0])
    else:
        timer = 'datetime'
        cr_df['date'] = pd.to_datetime(cr_df[timer], format='%Y-%m-%d %H:%M:%S')
        new_row = pd.DataFrame({'time': 0, 'value': 0, 'date': pd.to_datetime(d) - timedelta(days=shift)}, index=[0])

    cr_df = cr_df[(cr_df['date'] >= (pd.to_datetime(d)) - timedelta(days=shift))]
    cr_df.index = range(len(cr_df))

    # cr_df = pd.concat([new_row, cr_df]).reset_index(drop=True)

    cr_df = cr_df[(cr_df['date'] <= pd.to_datetime(d_end))]
    if station == "PA":
        cr_df = cr_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['rateCorr'].agg(
            ['mean', 'sum', "count"])
    else:
        cr_df = cr_df.groupby(pd.Grouper(key='date', freq=t, origin='start'), as_index=False)['value'].agg(
            ['mean', 'sum', "count"])

    c = pd.DataFrame()
    c['cr date'] = cr_df['date']
    c['eq date'] = eq_df['date']
    c['cr sum'] = cr_df['sum']
    c['cr count'] = cr_df['count']
    c['eq sum'] = eq_df['sum']
    c["cr mean"] = cr_df['mean']
    c['cr delta3'] = abs(c['cr mean'].diff())

    c.index = range(len(c))
    c = c[(c['cr mean'] > 0) & (c['cr delta3'] > 0)] 
    c['cr median'] = c['cr delta3'].median()


    c['eq median'] = c['eq sum'].median()
    c['eq diff'] = c.apply(lambda x: round(x["eq sum"] - x['eq median'], 4), axis=1)

    c['A'] = (c['eq sum'] / c['eq median']) - 1
    c['B'] = (c['cr delta3'] / c['cr median']) - 1
    c['C'] = np.sign(c['A'] * c['B'])

    c.index = range(len(c))

    if save_file:
        file_name = "pdf_%s_%s.csv" % (station, str(d).replace(":", "-"))
        coulumn = ["cr date", "eq date", "cr mean", "eq sum", "cr delta3", "eq median", "cr median", "eq diff", "A",
                   "B", "C"]
        c.to_csv(file_name, index=False, columns=coulumn)

    # CDF
    tmp_c = c[(c['C'] >= 0)]
    CDF = 1 - binom.cdf(len(tmp_c), len(c), 0.5, 1)
    print("%s CDF: "% station, CDF, "N:", len(c), "N+:", len(tmp_c), "N-:", len(c) - len(tmp_c), )

    # PDF
    n = c['C'].value_counts()[1]
    N = len(c)
    PDF = math.comb(N, n) * ((1 / 2) ** N)

    # print("datetime dla %s: " % station, d, d_end, N)
    return PDF, d_end

First example of how to run the function:

print("PDF: ", pdf('2014-01-04 23:37:12', 335, '5D', 5, 15, "Oulu", True))
print("PDF: ", pdf('2014-04-02 22:07:12', 335, '5D', 5, 15, "PA", True))
print("PDF: ", pdf('2013-11-14 07:00:00', 335, '5D', 5, 15, "Mosc", True))

When running a function, we have to give it a few values to start with. These are:
:param d: datetime begin (for Cosmic Rays station)
:param l: number of elements/windows counted from the datetime specified in "d"
:param t: With what frequency should the data be grouped, e.g. every 5 days -> "5D"
:param shift: Small window (5 days)
:param shift2: Large window (15 days - corresponds to the shift between EQ and CR)
:param station: For which station / cosmic ray observatory do we calculate correlations
:param save_file: Whether the time series and results are to be saved to a file, default "False"

First, we prepare the data for earthquakes.
As the EQ data are shifted by 15 days relative to CR, we shift the start for earthquakes by shift2 (equal to 15). Then we create an end date that is "l" windows away from the start date, where a single window lasts "shift" days. In our example, for 6 sigma, the numbers are l = 335 and shift = 5.

Having set the start and end dates, we load the file containing the EQ data, keep only the records later than the start date, and then drop the data after the end date. We also add a "zero" element (new_row) which is sacrificed to anchor the row indexing - note that in the listing above, the pd.concat line that inserts it is commented out. Then we create a dataframe containing only two columns ("date", "mag") and finally group by the date key with frequency "t", creating two new columns: "count" - the number of elements grouped into one bin - and "sum" - the sum of the elements in the given time series.

We do the same for the cosmic-ray data.
We must remember that the CR data are shifted by 15 days relative to EQ, so we subtract shift2 from the "d" used for EQ (d = d - timedelta(days=shift2)). Then, depending on the station, we read the appropriate columns and finally group.

Then we calculate the differences between adjacent rows for CR and call them "delta3" - this corresponds to a difference of 3 small windows. So in the rows where the datetime for EQ is 2014-01-19 23:37:12, we have the result for the CR window 2014-01-04 23:37:12.

It is also worth mentioning that a particular window, e.g. 2014-01-04 23:37:12 with Shift=5, contains data from:
2013-12-31 23:37:13 to 2014-01-04 23:37:12.

And delta3 for the line with date= 2014-01-04 23:37:12 is the result of the difference of the window 2014-01-04 23:37:12 with the window 2013-12-31 23:37:13

Then we calculate the medians: for earthquakes from the "sum" column, and for cosmic rays from "delta3". After calculating delta3, we check whether there are any empty values, i.e. rows where the value is not greater than 0; if there are, we remove them (this means that even though we start with 335 time series, we often use fewer for the final calculations).

After preparing all the things, we can calculate the result for each row according to the formulas:
c['A'] = (c['eq sum'] / c['eq median']) - 1
c['B'] = (c['cr delta3'] / c['cr median']) - 1
c['C'] = np.sign(c['A'] * c['B'])

and we calculate the CDF:
tmp_c = c[(c['C'] >= 0)]
CDF = 1 - binom.cdf(len(tmp_c), len(c), 0.5, 1)

and the PDF:
n = c['C'].value_counts()[1]
N = len(c)
PDF = math.comb(N, n) * ((1 / 2) ** N)
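
A worked example of the PDF formula (the counts are made up for illustration): for N = 300 analysed windows of which n = 180 have a positive sign, the probability of that exact count under a fair coin is:

import math

N, n = 300, 180  # hypothetical counts
PDF = math.comb(N, n) * ((1 / 2) ** N)
print(PDF)  # approx. 1.1e-04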


First lines of the file pdf_PA_2014-04-02 22-07-12.csv (pdf('2014-04-02 22:07:12', 335, '5D', 5, 15, "PA", True))
(data for Plot 2)


To prepare the data for plot no. 2, run the script (second example below) with
data_start = '2004-12-31 00:00:00'
and the start dictionary shown in the example. Starting from 2015-07 means d_end works out to about 2020-01 (335 windows of 5 days).
NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM, CONTACT US (contact[at]credo.science)

Second example of how to run the function:

start = {"PA": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 22, 7, 12),
         "Mosc": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 7, 0, 0),
         "Oulu": "%s-%s-%s %s:%s:%s" % (2015, 7, 1, 23, 37, 12)
         }

def main():
    stations = ["PA", "Oulu", "Mosc"]
    for station in stations:
        print(station)
        data = pd.DataFrame()
        date = pd.to_datetime(start[station])
        data_start = '2004-12-31 00:00:00'
        # 2019-06-30 22:07:12, 2019-06-29 22:07:12 .. 1970-01-30 22:07:12
        while date >= pd.to_datetime(data_start):
            row, d_end = pdf(date.strftime("%Y-%m-%d %H:%M:%S"), 335, '5D', 5, 15, station)
            row = pd.DataFrame({'date': date, 'pdf': row}, index=[0])
            data = pd.concat([row, data]).reset_index(drop=True)
            name = 'pdf_data_%s.csv' % station
            data.to_csv(name, index=False)
            date = date - timedelta(days=1)


if __name__ == '__main__':
    main()


First lines of the file pdf_data_PA.csv
(data for Plot 2)

4.3 Read_data_plot

Import libraries and declaration of data lists

from datetime import date, datetime, timedelta
from csv import DictReader
import matplotlib.pyplot as plt
import numpy as np

silso_date, silso_sunspot = [], []
silso_ms_date, silso_ms_sunspot = [], []
auger_date, auger_value = [], []
oulu_date, oulu_value = [], []
mosc_date, mosc_value = [], []

plt.rcParams.update({'font.size': 22})
date_begin = datetime.strptime("2010-01-01", '%Y-%m-%d')
date_end = datetime.strptime("2020-01-01", '%Y-%m-%d')
def read_data_plot():
    data_file = DictReader(open(start_path + "SN_d_tot.csv"))
    for row in data_file:
        date_s = row["date"].split("-")
        date_s2 = row["date"] + "-01"
        if date_end >= datetime.strptime(date_s2, '%Y-%m-%d') >= date_begin:
            silso_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
            silso_sunspot.append(float(row["sunspot"]))

    # reuse data_file each time; this leaves more memory for the plot
    data_file = DictReader(open(start_path + "SN_ms_tot.csv"))
    for row in data_file:
        date_s = row["date"].split("-")
        date_s2 = row["date"] + "-01"
        if date_end >= datetime.strptime(date_s2, '%Y-%m-%d') >= date_begin:
            silso_ms_date.append(datetime(int(date_s[0]), int(date_s[1]), 1))
            silso_ms_sunspot.append(float(row["sunspot"]))  # no *1.3 scaling here, unlike plot 1 of our EQ/CR correlation publication

    data_file = DictReader(open(start_path + "pdf_data_PA.csv"))
    for row in data_file:
        date_s = row["date"].split(" ")[0]
        if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
            auger_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
            auger_value.append(float(row["pdf"]))

    data_file = DictReader(open(start_path + "pdf_data_Oulu.csv"))
    for row in data_file:
        date_s = row["date"].split(" ")[0]
        if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
            oulu_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
            oulu_value.append(float(row["pdf"]))

    data_file = DictReader(open(start_path + "pdf_data_Mosc.csv"))
    for row in data_file:
        date_s = row["date"].split(" ")[0]
        if datetime.strptime(date_s, "%Y-%m-%d") >= date_begin:
            mosc_date.append(datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"))
            mosc_value.append(float(row["pdf"]))

    fig, ax = plt.subplots(figsize=(22,16))
    ax.plot(auger_date, auger_value, label='Auger',linestyle="--",color='blue')
    ax.plot(oulu_date, oulu_value, label='OULU',linestyle="--", color='green',alpha=0.8)
    ax.plot(mosc_date, mosc_value, label='Mosc', linestyle="--",color="red",alpha=0.6)
    ax.set(ylim=(10**-14,10**-1))
    ax.set(xlim=(datetime.strptime("2010-01-01", "%Y-%m-%d"),datetime.strptime("2020-01-01", "%Y-%m-%d")))
    ax.set_yscale('log')
    ax.set_ylabel('log_10 (P_PDF)')
    ax.set_xlabel('Time (years)')
    plt.grid()
    ax1 = ax.twinx()
    # right side Y axis
    ax1.plot(silso_date, silso_sunspot, label='Sunspot')
    ax1.plot(silso_ms_date, silso_ms_sunspot, label='Smoothed Sunspot')
    ax1.set(ylim=(0, 200))
    ax.legend(loc='upper left')
    ax1.set_ylabel('Sunspot number')
    plt.title('Correlation CR-EQ plot 2')
    # Show the legend
    plt.legend()
    fname = "cr_eq_2plot.png"
    plt.savefig(fname, bbox_inches='tight')

For this graph we load the sunspot data (SN_ms_tot.csv and SN_d_tot.csv) from the same files used for Plot no. 1. The pdf_data_PA, pdf_data_Oulu and pdf_data_Mosc files come from the pdf_values.py script.

Another difference is that the Y axis is set to a logarithmic scale.

Example of how to run the function:

def main():
    read_data_plot()


if __name__ == '__main__':
    main()

Expected plot: Plot 2


Plot no.3

5.1 Introduction

Before we create a chart, we need to prepare the data and calculate the PDF and CDF statistics (the probability of a given result occurring for a given condition).

For this purpose, we will create the "pdf_values.py" script, which, as usual, you can download from our internal website redmine.credo.science/projects/data_time_operation-read_scaler/files

NOTE:
THE SCRIPT GIVES APPROXIMATE RESULTS, NOT EXACTLY AS PUBLISHED - IF YOU FIND A SOLUTION TO THIS PROBLEM CONTACT US (contact[at]credo.science)

Shift dt eq PDF Values

def main():
    station = "PA"
    data = pd.DataFrame()
    for i in range(30, -31, -1):
        if 20 > i > 10:
            row, d_end = pdf('2014-04-02 22:07:12', 335, '5D', 5, i, "PA")
            row = pd.DataFrame({'dt_eq': i, 'pdf': row}, index=[0])
            data = pd.concat([row, data]).reset_index(drop=True)
            print(i)
        else:
            if abs(i % 5) == 0:
                row, d_end = pdf('2014-04-02 22:07:12', 335, '5D', 5, i, "PA")
                row = pd.DataFrame({'dt_eq': i, 'pdf': row}, index=[0])
                data = pd.concat([row, data]).reset_index(drop=True)
    name = 'shift_dt_pdf_data_%s.csv' % station
    data.to_csv(name, index=False)

if __name__ == '__main__':
    main()

Here we use the same function as for Plot no. 2; the only change is the offset between CR and EQ.

auger_date, auger_value = [], []

plt.rcParams.update({'font.size': 22})

def read_data_plot():
    data_file = DictReader(open(start_path + "shift_dt_pdf_data_PA.csv"))
    for row in data_file:
        auger_date.append(row["dt_eq"])
        auger_value.append(float(row["pdf"]))
    fig, ax = plt.subplots(figsize=(20, 14))
    ax.plot(auger_date, auger_value, label='Auger', color='blue')
    ax.plot(auger_date, auger_value, linestyle="", marker="*", color='black')
    # ax.set(ylim=(1*10^-11, 1))
    # ax.set(xlim=(-30, 30, 1))
    ax.set_yscale('log')
    ax.set_ylabel('log_10 (P_PDF)')
    ax.set_xlabel('Time (years)')
    # right side Y axis
    ax.legend(loc='upper left')
    plt.title('Correlation CR-EQ plot 3')
    plt.grid()
    plt.legend()
    fname = "cr_eq_3plot.png"
    plt.savefig(fname, bbox_inches='tight')

Expected plot: Plot 3