stelardataprofiler package
-
stelardataprofiler.
run_profile
(config: dict) → None
This method executes the specified profiler and writes the resulting profile dictionary (and an HTML report, if specified) based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_timeseries
(my_file_path: str, time_column: str, header: int = 0, sep: str = ', ', html_path: str = '', display_html: bool = False, mode: str = 'verbose') → dict
This method performs profiling and generates a profiling dictionary for a given timeseries .csv file that exists in the given path.
Parameters: - my_file_path (str) – the path to a .csv file containing a datetime column and one or more timeseries columns.
- time_column (str) – the name of the datetime column.
- header (int, optional) – row to use to parse column labels. Defaults to the first row. Prior rows will be discarded.
- sep (str, optional) – separator character to use for the csv.
- html_path (str, optional) – the file path where the html file will be saved.
- display_html (bool, optional) – a boolean that determines whether the html will be displayed in the output.
- mode (str, optional) – 'default' calculates tsfresh features for the timeseries and uses them as variables (useful when there are many timeseries columns); 'verbose' uses the timeseries themselves as variables. Defaults to 'verbose'.
Returns: A dict which contains the results of the profiler for the timeseries data.
Return type: dict
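A minimal usage sketch, assuming the package is installed and the function behaves as documented above. The file name, the column names `timestamp` and `temperature`, and the helpers `write_sample_csv` and `profile_sample` are made up for illustration; only the keyword arguments come from the signature.

```python
import csv
from datetime import datetime, timedelta
from pathlib import Path


def write_sample_csv(path: Path, rows: int = 5) -> Path:
    """Write a tiny CSV with one datetime column and one timeseries column."""
    start = datetime(2023, 1, 1)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)  # comma-separated, header on the first row
        writer.writerow(["timestamp", "temperature"])
        for i in range(rows):
            stamp = start + timedelta(hours=i)
            writer.writerow([stamp.isoformat(sep=" "), 20.0 + i])
    return path


def profile_sample(path: Path) -> dict:
    """Call profile_timeseries with the keyword arguments documented above."""
    # Imported lazily so write_sample_csv works without the package installed.
    from stelardataprofiler import profile_timeseries

    return profile_timeseries(
        my_file_path=str(path),
        time_column="timestamp",  # hypothetical column name from the sample file
        header=0,
        sep=",",
        mode="verbose",  # use the timeseries columns themselves as variables
    )
```

Passing `sep` explicitly avoids relying on the default shown in the signature.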
-
stelardataprofiler.
profile_timeseries_with_config
(config: dict) → None
This method performs profiling on timeseries data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_tabular
(my_file_path: str, header: int = 0, sep: str = ', ', crs: str = 'EPSG:4326', longitude_column: str = None, latitude_column: str = None, wkt_column: str = None, html_path: str = '', display_html: bool = False) → dict
This method performs profiling and generates a profiling dictionary for a given tabular .csv or .shp file that exists in the given path.
Parameters: - my_file_path (str) – the path to a .csv or .shp file containing columns of different data types.
- header (int, optional) – row to use to parse column labels. Defaults to the first row. Prior rows will be discarded.
- sep (str, optional) – separator character to use for the csv.
- crs (str, optional) – the Coordinate Reference System (CRS) represented as an authority string (e.g. "EPSG:4326").
- longitude_column (str, optional) – the name of the longitude column.
- latitude_column (str, optional) – the name of the latitude column.
- wkt_column (str, optional) – the name of the column that has wkt geometries.
- html_path (str, optional) – the file path where the html file will be saved.
- display_html (bool, optional) – a boolean that determines whether the html will be displayed in the output.
Returns: A dict which contains the results of the profiler for the tabular data.
Return type: dict
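The geometry-related parameters above are all optional, so a caller has to decide which ones to pass. The following is a sketch under stated assumptions: the column names `wkt`, `lon`, and `lat` and the helpers `geometry_kwargs` and `profile_points` are hypothetical, and the preference for a WKT column over a longitude/latitude pair is a choice made here, not documented behavior.

```python
from typing import Dict, Sequence


def geometry_kwargs(columns: Sequence[str]) -> Dict[str, str]:
    """Pick geometry keyword arguments based on which columns are present."""
    if "wkt" in columns:
        # Hypothetical column holding WKT geometries.
        return {"wkt_column": "wkt"}
    if "lon" in columns and "lat" in columns:
        # Hypothetical coordinate columns.
        return {"longitude_column": "lon", "latitude_column": "lat"}
    return {}  # plain tabular data, no geometry arguments


def profile_points(my_file_path: str, columns: Sequence[str]) -> dict:
    """Call profile_tabular with the keyword arguments documented above."""
    # Imported lazily so geometry_kwargs works without the package installed.
    from stelardataprofiler import profile_tabular

    return profile_tabular(
        my_file_path=my_file_path,
        header=0,
        sep=",",
        crs="EPSG:4326",
        **geometry_kwargs(columns),
    )
```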
-
stelardataprofiler.
profile_tabular_with_config
(config: dict) → None
This method performs profiling on tabular and/or vector data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_raster
(my_path: str, image_format: str = '.tif') → dict
This method performs profiling and generates a profiling dictionary for either a single image or many images.
Parameters: - my_path (str) – the path to either an image file or a folder that has image files.
- image_format (str, optional) – the suffix of the images that exist in the folder if the given path is a folder path.
Returns: A dict which contains the results of the profiler for the image or images.
Return type: dict
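The file-or-folder behavior described above can be illustrated with a small dispatch helper. This is a sketch of one way a caller could make the same decision, not the package's implementation; `choose_raster_profiler` is a hypothetical name, and matching suffixes case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List, Tuple


def choose_raster_profiler(my_path: str, image_format: str = ".tif") -> Tuple[str, List[str]]:
    """Return the name of the matching profiler and the image files it would see."""
    path = Path(my_path)
    if path.is_dir():
        # Folder: keep only the files whose suffix matches image_format.
        images = sorted(
            p.name for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == image_format.lower()
        )
        return "profile_multiple_rasters", images
    if path.is_file():
        # Single image: image_format is not needed.
        return "profile_single_raster", [path.name]
    raise FileNotFoundError(my_path)
```

The returned names correspond to `profile_single_raster` and `profile_multiple_rasters`, which this package also exposes for callers who already know which case applies.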
-
stelardataprofiler.
profile_raster_with_config
(config: dict) → None
This method performs profiling on raster data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_text
(my_path: str, text_format: str = '.txt')
This method performs profiling and generates a profiling dictionary for either a single text or many texts.
Parameters: - my_path (str) – the path to either a text file or a folder that has text files.
- text_format (str, optional) – the suffix of the texts that exist in the folder if the given path is a folder path.
Returns: A dict which contains the results of the profiler for the text or texts.
Return type: dict
-
stelardataprofiler.
profile_text_with_config
(config: dict) → None
This method performs profiling on text data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_hierarchical
(my_file_path: str) → dict
This method performs profiling and generates a profiling dictionary for a given json file that exists in the given path.
Parameters: my_file_path (str) – the path to a json file.
Returns: A dict which contains the results of the profiler for the json.
Return type: dict
-
stelardataprofiler.
profile_hierarchical_with_config
(config: dict) → None
This method performs profiling on hierarchical data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_rdfGraph
(my_file_path: str, parse_format: str = 'application/rdf+xml')
This method performs profiling and generates a profiling dictionary for a given rdf file that exists in the given path.
Parameters: - my_file_path (str) – the path to an rdf file.
- parse_format (str, optional) – the format of the rdf file. See the rdflib package for the available formats, e.g. 'turtle', 'application/rdf+xml', 'n3', 'nt'.
Returns: A dict which contains the results of the profiler for the rdf.
Return type: dict
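Because `parse_format` must match the serialization of the file, a caller may want to derive it from the file extension. The sketch below assumes a conventional extension-to-format table; the table and the helpers `guess_parse_format` and `profile_graph` are illustrative and not part of the package. Unknown extensions fall back to the documented default.

```python
from pathlib import Path

# Assumed mapping from common file extensions to rdflib format names.
FORMAT_BY_SUFFIX = {
    ".rdf": "application/rdf+xml",
    ".ttl": "turtle",
    ".n3": "n3",
    ".nt": "nt",
}


def guess_parse_format(my_file_path: str) -> str:
    """Guess the parse_format value from the file extension."""
    suffix = Path(my_file_path).suffix.lower()
    return FORMAT_BY_SUFFIX.get(suffix, "application/rdf+xml")


def profile_graph(my_file_path: str):
    """Call profile_rdfGraph with a format guessed from the extension."""
    # Imported lazily so guess_parse_format works without the package installed.
    from stelardataprofiler import profile_rdfGraph

    return profile_rdfGraph(
        my_file_path=my_file_path,
        parse_format=guess_parse_format(my_file_path),
    )
```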
-
stelardataprofiler.
profile_rdfGraph_with_config
(config: dict) → None
This method performs profiling on rdfGraph data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_vista_rasters
(rhd_datapath: str, ras_datapath: str)
This method performs profiling and generates a profiling dictionary for the ras file at the given path, using the contents of the rhd file at the given path.
Parameters: - rhd_datapath (str) – the path to an rhd file.
- ras_datapath (str) – the path to a ras file.
Returns: A dict which contains the results of the profiler for the ras.
Return type: dict
-
stelardataprofiler.
profile_vista_rasters_with_config
(config: dict) → None
This method performs profiling on ras data and writes the resulting profile dictionary based on a configuration dictionary.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
prepare_mapping
(config: dict) → None
This method prepares the suitable mapping for subsequent generation of the RDF graph, if the "rdf" and "serialization" options are specified in config.
Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None
-
stelardataprofiler.
profile_single_raster
(my_file_path: str) → dict
This method performs profiling and generates a profiling dictionary for an image file that exists in the given path.
Parameters: my_file_path (str) – the path to an image file.
Returns: A dict which contains the results of the profiler for the image.
Return type: dict
-
stelardataprofiler.
profile_multiple_rasters
(my_folder_path: str, image_format: str = '.tif') → dict
This method performs profiling and generates a profiling dictionary for the image files that exist in the given folder path.
Parameters: - my_folder_path (str) – the path to a folder that has image files.
- image_format (str, optional) – the suffix of the images that exist in the given folder path.
Returns: A dict which contains the results of the profiler for the images.
Return type: dict
-
stelardataprofiler.
profile_single_text
(my_file_path: str) → dict
This method performs profiling and generates a profiling dictionary for a text file that exists in the given path.
Parameters: my_file_path (str) – the path to a text file.
Returns: A dict which contains the results of the profiler for the text.
Return type: dict
-
stelardataprofiler.
profile_multiple_texts
(my_folder_path: str, text_format: str = 'txt') → dict
This method performs profiling and generates a profiling dictionary for the text files that exist in the given folder path.
Parameters: - my_folder_path (str) – the path to a folder that has text files.
- text_format (str, optional) – the suffix of the texts that exist in the given folder path.
Returns: A dict which contains the results of the profiler for the texts.
Return type: dict
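The default for `text_format` is shown here as 'txt', while `profile_text` shows '.txt'. Whether the package treats the two spellings alike is not stated, so a caller who wants to preview which files would match can normalize the suffix first. The helper `list_texts` below is hypothetical and only illustrates that normalization.

```python
from pathlib import Path
from typing import List


def list_texts(my_folder_path: str, text_format: str = "txt") -> List[str]:
    """List the files in a folder whose suffix matches text_format."""
    # Accept both 'txt' and '.txt' by stripping any leading dot first.
    suffix = "." + text_format.lstrip(".").lower()
    folder = Path(my_folder_path)
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
```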
-
stelardataprofiler.
write_to_json
(output_dict: dict, output_file: Union[str, pathlib.Path]) → None
Write the profile dictionary to a file.
Parameters: - output_dict (dict) – the profile dictionary that will be written.
- output_file (Union[str, Path]) – the name or the path of the file to generate, including the extension (.json).
Returns: None.
Return type: None
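For readers who want to see the shape of such a writer, here is a stand-in built on the standard `json` module. It is not the package's implementation: `write_profile` is a hypothetical name, and the indentation and the `default=str` fallback for values that are not JSON-serializable are assumptions.

```python
import json
from pathlib import Path
from typing import Union


def write_profile(output_dict: dict, output_file: Union[str, Path]) -> None:
    """Write a profile dictionary to a .json file."""
    path = Path(output_file)  # accepts both str and Path
    with path.open("w", encoding="utf-8") as f:
        # default=str turns values json cannot encode (e.g. Path) into strings.
        json.dump(output_dict, f, indent=2, default=str)
```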
-
stelardataprofiler.
read_config
(json_file: str) → dict
This method reads configuration settings from a json file. Configuration includes all parameters for input/output.
Parameters: json_file (str) – path to .json file that contains the configuration parameters.
Returns: A dictionary with all configuration settings.
Return type: dict
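A configuration-driven run chains `read_config` and `run_profile`. The sketch below shows that chain in `run_from_file`, assuming the package is installed, together with a stand-in loader, `load_config`, that mimics the documented contract (JSON file in, dict out). Both helper names are hypothetical, and the keys expected inside the configuration are not listed in this section, so none are assumed here.

```python
import json


def load_config(json_file: str) -> dict:
    """Stand-in for read_config: load a JSON object from a file."""
    with open(json_file, encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        # A config must be a JSON object so it can be passed on as a dict.
        raise ValueError("configuration file must contain a JSON object")
    return config


def run_from_file(json_file: str) -> None:
    """Read the configuration with the package and run the chosen profiler."""
    # Imported lazily so load_config works without the package installed.
    from stelardataprofiler import read_config, run_profile

    run_profile(read_config(json_file))
```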