Data Science and Big Data Tools

This course equips participants with the knowledge and skills needed to manage data and big data effectively and to leverage machine learning in creating and developing new business opportunities. It covers the fundamental concepts, techniques, and tools for extracting valuable insights from large datasets and for implementing machine learning solutions that drive business growth.

Main Objectives:
Understand the importance of data and big data.
Learn how to effectively manage and analyze large datasets.
Explore machine learning techniques.
Discover business opportunities through data-driven insights.
Gain hands-on experience with tools and technologies used in the field.

Big Data Technologies:
Exploring key technologies such as Hadoop, MapReduce, and Apache Spark.
Understanding distributed computing for handling large datasets.
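
The MapReduce model mentioned above splits a job into a map step, a shuffle step that groups intermediate results by key, and a reduce step that combines each group. The following is a minimal single-machine sketch in plain Python that counts words across documents. It only imitates the data flow and does not use Hadoop or Spark, which would distribute each step across a cluster; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the grouped values for each key (here, by summing)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # On a real cluster, each map_phase call could run on a different node.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


docs = ["big data tools", "Big data and machine learning"]
print(word_count(docs))
# prints {'big': 2, 'data': 2, 'tools': 1, 'and': 1, 'machine': 1, 'learning': 1}
```

Because the map calls are independent and the reduce step only needs values grouped by key, the same structure scales out when a framework handles the distribution and shuffling.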

Tools and Technologies Overview:
Introduction to Python libraries (NumPy, Pandas, Scikit-learn).
Introduction to big data technologies such as Hadoop and Spark.
Introduction to Jupyter Notebooks for hands-on learning.
ChatGPT prompt design and large language models (LLMs) for autonomous learning.

Foundations of Machine Learning:
Acquiring a foundational understanding of machine learning.
Distinguishing between supervised and unsupervised learning.
Gaining insight into commonly used machine learning algorithms.
Setting up a model in the real world: training, evaluation, and deployment.
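
To make the training and evaluation steps concrete, here is a small sketch of a supervised workflow in plain Python: labelled data is split into a training set and a held-out test set, a k-nearest-neighbours classifier (chosen here only for brevity) predicts labels, and accuracy is measured on the held-out rows. For k-nearest neighbours, "training" simply means keeping the labelled rows. The toy dataset and helper names are hypothetical, and deployment is not shown; in practice scikit-learn provides tested versions of these steps, such as `train_test_split` and `KNeighborsClassifier`.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labelled rows and hold out a fraction of them for evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def predict(train, features, k=3):
    """k-nearest neighbours: majority label among the k closest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out rows whose label the model predicts correctly."""
    correct = sum(predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy dataset: (features, label) pairs forming two separated groups.
data = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((1.3, 1.1), "A"),
        ((5.0, 5.2), "B"), ((4.8, 5.1), "B"), ((5.2, 4.9), "B"), ((5.1, 5.3), "B")]

train, test = train_test_split(data)
print(f"held-out accuracy: {accuracy(train, test):.2f}")
```

Evaluating only on rows the model never saw gives an honest estimate of how it will behave on new data, which is the point of the evaluation step before deployment.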

Data Collection and Preparation:
Developing strategies for collecting relevant and reliable data.
Mastering data cleaning and preprocessing techniques.
Enhancing data quality through feature engineering.
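
Typical cleaning and feature-engineering steps can be sketched in plain Python, assuming records arrive as dictionaries with `None` marking missing values: remove exact duplicates, impute missing values with the column median, and derive a min-max scaled feature. The column names and data are hypothetical; with pandas the same steps would usually be written with `drop_duplicates`, `fillna`, and vectorised arithmetic.

```python
import statistics

# Hypothetical raw records, e.g. from a sensor log; None marks a missing value.
raw = [
    {"temp_c": 21.5, "humidity": 40.0},
    {"temp_c": None, "humidity": 45.0},
    {"temp_c": 23.0, "humidity": None},
    {"temp_c": 22.0, "humidity": 50.0},
    {"temp_c": 22.0, "humidity": 50.0},  # exact duplicate of the previous record
]


def drop_duplicates(records):
    """Cleaning: keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def impute_median(records, column):
    """Cleaning: replace missing values in one column with that column's median."""
    median = statistics.median(r[column] for r in records if r[column] is not None)
    return [{**r, column: median if r[column] is None else r[column]} for r in records]


def min_max_scale(records, column):
    """Feature engineering: add a copy of a column rescaled to the range [0, 1]."""
    values = [r[column] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, f"{column}_scaled": (r[column] - lo) / span} for r in records]


clean = drop_duplicates(raw)
for col in ("temp_c", "humidity"):
    clean = impute_median(clean, col)
clean = min_max_scale(clean, "temp_c")

for row in clean:
    print(row)
```

The median is used here because it is less sensitive to outliers than the mean; which imputation strategy fits best depends on the data.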

Learning Outcomes:

This course provides participants with a well-rounded foundation, enabling them to manage data, leverage big data technologies, and apply machine learning to drive business innovation.
Upon successful completion of the course, learners will understand the following concepts:

– Introduction to knowledge discovery
– Data science concepts and methodologies
– Data handling for machine learning models
– Prediction, classification, and clustering
– Big data tools: guidelines on what to use, when, and how
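
As an example of the clustering topic listed above, the sketch below implements k-means (Lloyd's algorithm), one common clustering method picked here for illustration. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping once the centroids no longer change. It uses only the standard library; the sample points and function names are hypothetical, and scikit-learn's `KMeans` class would be the usual choice for real data.

```python
import math
import random


def assign(points, centroids):
    """Assignment step: group each point with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Initialise with k input points drawn at random without replacement.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, assign(points, centroids)


# Hypothetical 2-D measurements forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Unlike the classification task, no labels are given here: the algorithm discovers the two groups from the geometry of the data alone, which is what makes clustering an unsupervised method.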

The third module focuses on applying big data tools and machine learning models in case studies:
– Case Study: Applied machine learning models in computational chemistry and occupational safety.
– Case Study: Applications of big data in chemistry and occupational safety.