stelardataprofiler package

stelardataprofiler.run_profile(config: dict) → None[source]

This method executes the profiler specified in a configuration dictionary and writes the resulting profile dictionary, plus an HTML report if one is requested.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_timeseries(my_file_path: str, time_column: str, header: int = 0, sep: str = ', ', html_path: str = '', display_html: bool = False, mode: str = 'verbose') → dict[source]

This method performs profiling and generates a profiling dictionary for a given timeseries .csv file that exists in the given path.

Parameters:
  • my_file_path (str) – the path to a .csv file containing a datetime column and one or more timeseries columns.
  • time_column (str) – the name of the datetime column.
  • header (int, optional) – the row number to use for the column labels. Defaults to 0 (the first row). Rows before it are discarded.
  • sep (str, optional) – separator character to use for the csv.
  • html_path (str, optional) – the file path where the html file will be saved.
  • display_html (bool, optional) – a boolean that determines whether the html will be displayed in the output.
  • mode (str, optional) – ‘default’: calculate tsfresh features for each timeseries and use them as variables (useful when there are many timeseries columns); ‘verbose’: use the timeseries themselves as variables. Defaults to ‘verbose’.
Returns:

A dict which contains the results of the profiler for the timeseries data.

Return type:

dict
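
The call described above can be sketched as follows. This is a minimal illustration, not the library's reference usage: the file name, the column names (`timestamp`, `temperature`, `humidity`) and the helper functions are made up for the example, and the import is guarded so the sketch still runs when stelardataprofiler is not installed.

```python
import csv
import tempfile
from datetime import datetime, timedelta
from pathlib import Path

try:
    from stelardataprofiler import profile_timeseries
except ImportError:  # the package is optional for this sketch
    profile_timeseries = None


def write_sample_timeseries(csv_path: Path, n_rows: int = 24) -> None:
    """Write a comma-separated file with one datetime column and two series."""
    start = datetime(2023, 1, 1)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "temperature", "humidity"])
        for i in range(n_rows):
            stamp = (start + timedelta(hours=i)).isoformat(sep=" ")
            # Illustrative values only; they carry no real-world meaning.
            writer.writerow([stamp, 15 + i % 5, 60 - i % 7])


def profile_sample(csv_path: Path):
    """Profile the file if stelardataprofiler is installed, else return None."""
    if profile_timeseries is None:
        return None
    return profile_timeseries(
        my_file_path=str(csv_path),
        time_column="timestamp",
        header=0,
        sep=",",  # passed explicitly to match the file written above
        mode="verbose",
    )


sample_path = Path(tempfile.mkdtemp()) / "sample_timeseries.csv"
write_sample_timeseries(sample_path)
```

Calling `profile_sample(sample_path)` then returns the profile dictionary when the package is available. Passing `sep` explicitly avoids relying on the default separator.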

stelardataprofiler.profile_timeseries_with_config(config: dict) → None[source]

This method performs profiling on timeseries data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_tabular(my_file_path: str, header: int = 0, sep: str = ', ', crs: str = 'EPSG:4326', longitude_column: str = None, latitude_column: str = None, wkt_column: str = None, html_path: str = '', display_html: bool = False) → dict[source]

This method performs profiling and generates a profiling dictionary for a given tabular .csv or .shp file that exists in the given path.

Parameters:
  • my_file_path (str) – the path to a .csv or .shp file containing different data types of columns.
  • header (int, optional) – the row number to use for the column labels. Defaults to 0 (the first row). Rows before it are discarded.
  • sep (str, optional) – separator character to use for the csv.
  • crs (str, optional) – the Coordinate Reference System (CRS) represented as an authority string (e.g. “EPSG:4326”).
  • longitude_column (str, optional) – the name of the longitude column.
  • latitude_column (str, optional) – the name of the latitude column.
  • wkt_column (str, optional) – the name of the column that has wkt geometries.
  • html_path (str, optional) – the file path where the html file will be saved.
  • display_html (bool, optional) – a boolean that determines whether the html will be displayed in the output.
Returns:

A dict which contains the results of the profiler for the tabular data.

Return type:

dict
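
As a hedged sketch of the geospatial parameters, the example below writes a small point table and passes its coordinate columns to `profile_tabular`. The column names (`site`, `lon`, `lat`, `reading`), the values and the helper functions are invented for illustration, and the import is guarded so the snippet runs without the package.

```python
import csv
import tempfile
from pathlib import Path

try:
    from stelardataprofiler import profile_tabular
except ImportError:  # the package is optional for this sketch
    profile_tabular = None


def write_sample_points(csv_path: Path) -> None:
    """Write a comma-separated table with longitude and latitude columns."""
    rows = [
        ("site", "lon", "lat", "reading"),
        ("site_a", 23.70, 37.90, 10),  # made-up coordinates and readings
        ("site_b", 22.90, 40.60, 12),
        ("site_c", 21.70, 38.20, 9),
    ]
    with csv_path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)


def profile_points(csv_path: Path):
    """Profile the table if stelardataprofiler is installed, else return None."""
    if profile_tabular is None:
        return None
    return profile_tabular(
        my_file_path=str(csv_path),
        header=0,
        sep=",",  # passed explicitly to match the file written above
        crs="EPSG:4326",
        longitude_column="lon",
        latitude_column="lat",
    )


points_path = Path(tempfile.mkdtemp()) / "sample_points.csv"
write_sample_points(points_path)
```

For a table that stores geometries as WKT strings, `wkt_column` would presumably be given instead of the longitude and latitude columns.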

stelardataprofiler.profile_tabular_with_config(config: dict) → None[source]

This method performs profiling on tabular and/or vector data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_raster(my_path: str, image_format: str = '.tif') → dict[source]

This method performs profiling and generates a profiling dictionary for either a single image or many images.

Parameters:
  • my_path (str) – the path to either an image file or a folder that has image files.
  • image_format (str, optional) – the suffix of the images that exist in the folder if the given path is a folder path.
Returns:

A dict which contains the results of the profiler for the image or images.

Return type:

dict
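
The single-file versus folder behaviour can be pictured with the sketch below. It is not the library's implementation: `select_raster_inputs` is a hypothetical helper, and matching files by suffix with `str.endswith` is an assumption about how `image_format` might be applied.

```python
import tempfile
from pathlib import Path


def select_raster_inputs(my_path: str, image_format: str = ".tif") -> list:
    """Return the image files that a folder-or-file path would resolve to.

    Hypothetical helper: a folder yields every file whose name ends with
    image_format, while a file path yields just that file.
    """
    path = Path(my_path)
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.name.endswith(image_format)
        )
    return [path]


# Build a throwaway folder with two empty .tif files and one unrelated file.
folder = Path(tempfile.mkdtemp())
for name in ("a.tif", "b.tif", "notes.txt"):
    (folder / name).touch()

from_folder = select_raster_inputs(str(folder))
from_file = select_raster_inputs(str(folder / "a.tif"))
```

Under this assumption, a folder path corresponds to the case handled by `profile_multiple_rasters` and a file path to the case handled by `profile_single_raster`.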

stelardataprofiler.profile_raster_with_config(config: dict) → None[source]

This method performs profiling on raster data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_text(my_path: str, text_format: str = '.txt')[source]

This method performs profiling and generates a profiling dictionary for either a single text or many texts.

Parameters:
  • my_path (str) – the path to either a text file or a folder that has text files.
  • text_format (str, optional) – the suffix of the texts that exist in the folder if the given path is a folder path.
Returns:

A dict which contains the results of the profiler for the text or texts.

Return type:

dict

stelardataprofiler.profile_text_with_config(config: dict) → None[source]

This method performs profiling on text data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_hierarchical(my_file_path: str) → dict[source]

This method performs profiling and generates a profiling dictionary for a given json file that exists in the given path.

Parameters: my_file_path (str) – the path to a json file.
Returns: A dict which contains the results of the profiler for the json.
Return type: dict

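
A minimal sketch of profiling a JSON document follows. The nested document, the file name and the helper `profile_sample_json` are assumptions made for the example, and the import is guarded so the snippet runs without the package.

```python
import json
import tempfile
from pathlib import Path

try:
    from stelardataprofiler import profile_hierarchical
except ImportError:  # the package is optional for this sketch
    profile_hierarchical = None

# A small nested document with made-up content.
sample_doc = {
    "dataset": "example",
    "records": [
        {"id": 1, "tags": ["a", "b"]},
        {"id": 2, "tags": []},
    ],
}

json_path = Path(tempfile.mkdtemp()) / "sample.json"
with json_path.open("w", encoding="utf-8") as f:
    json.dump(sample_doc, f)


def profile_sample_json(path: Path):
    """Profile the file if stelardataprofiler is installed, else return None."""
    if profile_hierarchical is None:
        return None
    return profile_hierarchical(my_file_path=str(path))
```
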
stelardataprofiler.profile_hierarchical_with_config(config: dict) → None[source]

This method performs profiling on hierarchical data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_rdfGraph(my_file_path: str, parse_format: str = 'application/rdf+xml')[source]

This method performs profiling and generates a profiling dictionary for a given rdf file that exists in the given path.

Parameters:
  • my_file_path (str) – the path to an RDF file.
  • parse_format (str, optional) – the format of the RDF file (see the rdflib package for the available formats, e.g. ‘turtle’, ‘application/rdf+xml’, ‘n3’, ‘nt’).
Returns:

A dict which contains the results of the profiler for the rdf.

Return type:

dict
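
Because the default `parse_format` is ‘application/rdf+xml’, files in other serializations need the format passed explicitly. The sketch below assumes a Turtle file; the triples, the `ex:` namespace and the helper `profile_sample_rdf` are invented for illustration, and the import is guarded so the snippet runs without the package.

```python
import tempfile
from pathlib import Path

try:
    from stelardataprofiler import profile_rdfGraph
except ImportError:  # the package is optional for this sketch
    profile_rdfGraph = None

# Two made-up triples in Turtle syntax.
TURTLE_TEXT = """\
@prefix ex: <http://example.org/> .

ex:alice ex:knows ex:bob .
ex:bob ex:knows ex:carol .
"""

rdf_path = Path(tempfile.mkdtemp()) / "sample.ttl"
rdf_path.write_text(TURTLE_TEXT, encoding="utf-8")


def profile_sample_rdf(path: Path):
    """Profile the graph if stelardataprofiler is installed, else return None."""
    if profile_rdfGraph is None:
        return None
    # 'turtle' is one of the rdflib format names listed in the parameter text.
    return profile_rdfGraph(my_file_path=str(path), parse_format="turtle")
```
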

stelardataprofiler.profile_rdfGraph_with_config(config: dict) → None[source]

This method performs profiling on rdfGraph data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_vista_rasters(rhd_datapath: str, ras_datapath: str)[source]

This method performs profiling and generates a profiling dictionary for the ras file at the given path, using the contents of the rhd file at its own given path.

Parameters:
  • rhd_datapath (str) – the path to a rhd file.
  • ras_datapath (str) – the path to a ras file.
Returns:

A dict which contains the results of the profiler for the ras.

Return type:

dict

stelardataprofiler.profile_vista_rasters_with_config(config: dict) → None[source]

This method performs profiling on ras data and writes the resulting profile dictionary based on a configuration dictionary.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.prepare_mapping(config: dict) → None[source]

This method prepares a suitable mapping for the subsequent generation of the RDF graph, if the “rdf” and “serialization” options are specified in config.

Parameters: config (dict) – a dictionary with all configuration settings.
Returns: None.
Return type: None

stelardataprofiler.profile_single_raster(my_file_path: str) → dict[source]

This method performs profiling and generates a profiling dictionary for an image file that exists in the given path.

Parameters: my_file_path (str) – the path to an image file.
Returns: A dict which contains the results of the profiler for the image.
Return type: dict

stelardataprofiler.profile_multiple_rasters(my_folder_path: str, image_format: str = '.tif') → dict[source]

This method performs profiling and generates a profiling dictionary for the image files that exist in the given folder path.

Parameters:
  • my_folder_path (str) – the path to a folder that has image files.
  • image_format (str, optional) – the suffix of the images that exist in the given folder path.
Returns:

A dict which contains the results of the profiler for the images.

Return type:

dict

stelardataprofiler.profile_single_text(my_file_path: str) → dict[source]

This method performs profiling and generates a profiling dictionary for a text file that exists in the given path.

Parameters: my_file_path (str) – the path to a text file.
Returns: A dict which contains the results of the profiler for the text.
Return type: dict

stelardataprofiler.profile_multiple_texts(my_folder_path: str, text_format: str = 'txt') → dict[source]

This method performs profiling and generates a profiling dictionary for the text files that exist in the given folder path.

Parameters:
  • my_folder_path (str) – the path to a folder that has text files.
  • text_format (str, optional) – the suffix of the texts that exist in the given folder path.
Returns:

A dict which contains the results of the profiler for the texts.

Return type:

dict

stelardataprofiler.write_to_json(output_dict: dict, output_file: Union[str, pathlib.Path]) → None[source]

Write the profile dictionary to a file.

Parameters:
  • output_dict (dict) – the profile dictionary that will be written.
  • output_file (Union[str, Path]) – the name or path of the file to generate, including the extension (.json).
Returns:

None.

Return type:

None
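
The sketch below shows the documented call shape. When stelardataprofiler is not installed it falls back to a stand-in with the same signature built on `json.dump`. The stand-in is an assumption about the behaviour, not the library's implementation, and the profile content is a placeholder.

```python
import json
import tempfile
from pathlib import Path
from typing import Union

try:
    from stelardataprofiler import write_to_json
except ImportError:
    def write_to_json(output_dict: dict, output_file: Union[str, Path]) -> None:
        """Stand-in with the documented signature; the real function may differ."""
        with open(output_file, "w", encoding="utf-8") as f:
            json.dump(output_dict, f, indent=2)


# Placeholder content; a real profile dictionary comes from a profile_* call.
profile = {"placeholder_key": "placeholder_value"}

out_path = Path(tempfile.mkdtemp()) / "profile.json"
result = write_to_json(profile, out_path)
```
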

stelardataprofiler.read_config(json_file: str) → dict[source]

This method reads configuration settings from a json file. Configuration includes all parameters for input/output.

Parameters: json_file (str) – path to .json file that contains the configuration parameters.
Returns: A dictionary with all configuration settings.
Return type: dict
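
A hedged sketch of loading a configuration file follows. The key `example_setting` is a placeholder and not a real configuration option, since the actual settings are not listed in this reference. The fallback built on `json.load` is an assumption used only when the package is not installed.

```python
import json
import tempfile
from pathlib import Path

try:
    from stelardataprofiler import read_config
except ImportError:
    def read_config(json_file: str) -> dict:
        """Stand-in with the documented signature; the real function may differ."""
        with open(json_file, encoding="utf-8") as f:
            return json.load(f)


# Placeholder configuration; replace it with the settings the profiler expects.
placeholder_config = {"example_setting": "example_value"}

config_path = Path(tempfile.mkdtemp()) / "config.json"
config_path.write_text(json.dumps(placeholder_config), encoding="utf-8")

config = read_config(str(config_path))
```

With a real configuration file, the returned dictionary is what `run_profile` and the `*_with_config` functions take as their `config` argument.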