I participated in the Data Engineering Zoomcamp and did not complete it. Here is what I learned.
Preparations that I should have made before participating.
I joined a Data Engineering bootcamp run by DataTalks.Club. It’s a 9-week, fully online bootcamp focused on data engineering that started in January 2025. (Their next cohort starts in January 2026. For those who are interested, you can check this link.) I went into this bootcamp with eager enthusiasm but didn’t complete it. In hindsight, here are a few things I should have prepared before participating.
Read through the curriculum and articles detailing the bootcamp
I’ll be honest here: I did not read through the articles or the curriculum completely. They describe in detail the core technologies and the week-by-week plan of what will be taught throughout the bootcamp.
There is a section on homework and another on learning in public (based on this article). The homework is not compulsory; it serves to reinforce the material, and submissions are reviewed and scored by the course instructors and earn points. Here is where I feel that I might have messed up.
Thinking that the homework was compulsory, I put every effort into getting it done without understanding the actual point of doing it or the underlying technology. At some point I felt some déjà vu because the homework felt repetitive. This, again, was due to my lack of understanding of the technologies. I also missed out on focusing on the project that I wanted to work on, which brings me to the next point.
Figure out the project/data source that I wanted to work on
This probably should have been my first step. Having a project (the bootcamp provides a list of datasets to choose from) and coding along with the lecture videos would have helped me understand the underlying technology and make progress on the project at the same time. Being a beginner and still stuck in school-style learning, I focused too much on the homework and not enough on the project.
Being active on Slack and posting my learning in public
This is crucial! DataTalks.Club maintains an active community that will answer any doubts you have about the course and about data engineering in general. I did scroll through (anonymously) to see if someone else had already asked the question I wanted to ask. Asking questions directly might have pointed me towards the solutions sooner. Posting online would also have helped set me on the right path instead of aimlessly figuring out what to do next.
Photo by Artem Maltsev on Unsplash
Setting realistic expectations on learning data engineering
I am new to coding and have no professional experience. I know a bit of Python and SQL and had never done anything remotely related to data engineering. Learning data engineering is hard. (Credit to the course instructors here: they made the concepts understandable to someone like me, and I keep going back to their videos as a reference for the projects I am working on now.) So some setbacks are to be expected, and learning data engineering will take time.
Realize stack skills are transferable by understanding the fundamentals
When I did the bootcamp, it used Kestra for orchestration. I did not have much idea of the process and learned just enough Kestra to upload data to GCP. It was only after the bootcamp that I realized most DE jobs where I currently live mainly use Airflow. At the time, I felt that I had wasted my time.
However, while working on another project using Airflow, I noticed that although the technical details are different, the orchestration dynamics are similar, and what I had learned with Kestra helped me understand Airflow better. Also, you can use any orchestration tool you want (or any other part of the stack, for that matter).
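To make the "similar dynamics" point concrete, here is a minimal sketch of the idea that orchestrators share: tasks, their dependencies, and running each task only after the ones it depends on. It is plain Python using the standard library's `graphlib`, not Kestra or Airflow code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "publish_dashboard": {"transform"},
}


def run_pipeline(deps, tasks):
    """Run each task once, only after everything it depends on has finished."""
    completed = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # a real orchestrator would also handle retries, schedules, logs
        completed.append(name)
    return completed


# Stand-in task bodies that only record that they ran.
log = []
tasks = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}

order = run_pipeline(dependencies, tasks)
print(order)
```

Whatever the tool, the mental model is the same dependency graph. What changes is how you declare it (YAML flows in one tool, Python DAG files in another) and what the tool adds around it.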
I did try again, this time taking the self-paced route and aiming for a minimally viable pipeline. I built a pipeline that loads crash data from NYC Open Data into PostgreSQL using dlt. Cleaning and transformation are done in PostgreSQL itself with simple SQL scripts (I’ll be using the cloud for my next project), and the cleaned data feeds a Streamlit dashboard.
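For readers who want to see the load-then-transform shape of such a pipeline, here is a rough, self-contained sketch. It is not the project's code: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, skips dlt and the API call entirely, and the rows and column names are invented for illustration.

```python
import sqlite3

# Invented sample rows standing in for records fetched from an open-data API.
# Column names are hypothetical, not the real dataset schema.
RAW_ROWS = [
    {"collision_id": "1", "borough": "brooklyn ", "persons_injured": "2"},
    {"collision_id": "2", "borough": None, "persons_injured": "0"},
    {"collision_id": "3", "borough": "QUEENS", "persons_injured": "1"},
    {"collision_id": "4", "borough": "Brooklyn", "persons_injured": "3"},
]


def load_raw(conn, rows):
    """Load step: land the records untouched in a raw table."""
    conn.execute(
        "CREATE TABLE raw_crashes (collision_id TEXT, borough TEXT, persons_injured TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_crashes VALUES (:collision_id, :borough, :persons_injured)",
        rows,
    )


def transform(conn):
    """Transform step: clean inside the database with plain SQL."""
    conn.execute(
        """
        CREATE TABLE crashes_clean AS
        SELECT CAST(collision_id AS INTEGER)    AS collision_id,
               UPPER(TRIM(borough))             AS borough,
               CAST(persons_injured AS INTEGER) AS persons_injured
        FROM raw_crashes
        WHERE borough IS NOT NULL
        """
    )


def injuries_by_borough(conn):
    """The kind of query a dashboard might run against the clean table."""
    return conn.execute(
        "SELECT borough, SUM(persons_injured) FROM crashes_clean "
        "GROUP BY borough ORDER BY borough"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ROWS)
transform(conn)
summary = injuries_by_borough(conn)
print(summary)
```

Keeping the raw table separate from the clean one means the SQL cleaning step can be rerun or changed without downloading the data again.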
(GitHub project: Project-1-Motor-Vehicle-Collisions)
If I were to start again, here is how I would proceed:
1) Go through the self-paced course to familiarize myself with the material.
2) Check which stack is used in the jobs I am targeting and match my learning to it.
3) Be clear about the project I want to do and make sure it is relevant to my job search, because that is the main reason I joined this bootcamp.
4) Document my learning and make it public.
5) Ask questions whenever a doubt comes up. The videos are always available and the Slack community is always active.
Thanks for reading! Please feel free to message me on the chat if you have any feedback. Please do subscribe if you feel that my content would be of value to you. Looking forward to connecting with everyone on this journey!!
