
Setting Sail

I will mainly focus on four areas this semester: machine learning, network science, Kaggle competitions and Python coding.

So, I will keep posting blogs in these areas to document and summarize the main ideas and tricks I learn.

Summary of the first month (03/09/2018 – 26/09/2018)

Project
  • Python: reviewed some foundations and picked up some advanced skills.
  • Network: learned some basics and algorithms.
  • Machine Learning: statistical and mathematical learning.
  • Kaggle: learned some tools and tried to understand the pipeline.

First, for Python, I learned a lot from DataCamp courses, covering Python fundamentals, data analysis tools, visualization tools and machine learning tools. In October, I will try to bring it all together and practice on Kaggle.

Second, for network science, I reviewed a lot of graph theory, such as traversal algorithms, Dijkstra's algorithm, the Floyd algorithm, minimum spanning tree algorithms and so forth (covered in my earlier post 'Graph Theory'). I also wrote random graph generation algorithms myself, referring to the NetworkX source code (a minimal sketch of the idea is below). In addition, through the network science class I learned a number of mathematical metrics, such as mean degree and path lengths. The implementations mentioned above are not very robust and do not cover every variant of the algorithms. Next month, I will continue studying network science foundations and use TensorFlow to train models for our project.
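As an illustration, here is a minimal sketch of a G(n, p) random graph generator in the spirit of the NetworkX implementation. The function name and the adjacency-list representation are my own choices for this post, not the exact code I wrote.

```python
import random

def gnp_random_graph(n, p, seed=None):
    """Generate an Erdos-Renyi G(n, p) random graph as an adjacency list.

    Each of the n*(n-1)/2 possible edges is included independently with
    probability p. This is the naive O(n^2) version; NetworkX also has a
    faster edge-skipping variant for sparse graphs.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

# Example: the mean degree of G(100, 0.05) should be close to (n - 1) * p = 4.95.
g = gnp_random_graph(100, 0.05, seed=42)
mean_degree = sum(len(neighbours) for neighbours in g.values()) / len(g)
print(mean_degree)
```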

The third area was the most valuable thing I got this month. My teacher gave me a new perspective on the beauty of machine learning. In the past, I concentrated on the methods and techniques of machine learning algorithms, but I never thought about their root and origin. Machine learning is the subject of giving computers the ability to learn to make decisions from data. Back in early September, I would have recited this definition without thinking when people asked me what machine learning is. However, if someone asks now, I will answer 'the art of statistics'. Only by continuing to study the theory of machine learning can I build toward a bright future.

The last thing is Kaggle. This month, I spent some time on DataCamp courses to build some foundations in data preprocessing, data manipulation, modeling and so on. I have now completed four courses and picked up the basics of data science. In addition, I took a look at two beginner projects, Titanic and House Prices, learning mainly from the most-voted kernels, and from these I got the general pipeline of a Kaggle competition (sketched roughly below). Now I am entering my first competition, the Google GStore prediction, and will use part of next month to finish it. It is time to practice the skills I picked up this month. Happy hacking.
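By "pipeline" I mean roughly the loop below: load, preprocess, fit, predict, submit. This is only a generic sketch; the file names, column names and the model are placeholders, not the actual GStore setup.

```python
# A generic Kaggle-style pipeline sketch (placeholder file/column names).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

target = train["target"]                               # placeholder target column
features = pd.get_dummies(train.drop(columns=["id", "target"])).fillna(0)

test_features = pd.get_dummies(test.drop(columns=["id"])).fillna(0)
test_features = test_features.reindex(columns=features.columns, fill_value=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features, target)

submission = pd.DataFrame({"id": test["id"], "target": model.predict(test_features)})
submission.to_csv("submission.csv", index=False)
```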

In addition, I picked up some other ideas from the big data course, such as MapReduce, PageRank, locality-sensitive hashing and so forth. I only got the basic ideas of them; I could not dig into everything because of time. Last but not least, I spent several minutes every day reviewing LeetCode problems. I believe keeping up steady progress on it is useful and helpful.

Looking forward to reporting back at the end of October.

Chenyu

Summary of the second month (27/09/2018 – 26/10/2018)

  • For Python coding, I only paid a little attention to it: some review of the foundations and some exercises on neural networks.
  • For machine learning, this was the main thing I focused on. During October, I went through online learning and VC dimension. I got the idea of bounds, which is one of the key points of machine learning. For an iterative model, there should be a bound that limits the number of iterations; the error bound is also of vital importance, and the lower and upper bounds are the theoretical foundation of a learning algorithm. Second, Rademacher complexity, the growth function and the VC dimension can all be used to bound the error of a hypothesis set: the true (out-of-sample) error can be bounded by the training error through the VC dimension, which is the second inequality of the classification model (see the bound sketched after this list).
  • As for Kaggle, I just reviewed the data preprocessing pipeline.
  • As for the network project, it started formally. We began building a deep learning model to predict the size of the maximum connected component, because traditional methods like BFS have to traverse the whole graph, which takes a lot of time (see the BFS sketch after this list).
  • The remaining topics came from the big data, data mining and network science courses. They were also very useful.
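To make that second inequality concrete, here is one common form of the VC generalization bound (the version stated in Learning from Data; other texts differ in the constants). With probability at least $1-\delta$ over a training set of size $N$, every hypothesis $h$ in the set $\mathcal{H}$ satisfies

$$
E_{\text{out}}(h) \;\le\; E_{\text{in}}(h) \;+\; \sqrt{\frac{8}{N}\,\ln\frac{4\,m_{\mathcal{H}}(2N)}{\delta}},
\qquad
m_{\mathcal{H}}(N) \;\le\; \sum_{i=0}^{d_{\mathrm{VC}}} \binom{N}{i} \;\le\; N^{d_{\mathrm{VC}}} + 1,
$$

so a finite VC dimension is exactly what lets the training error control the out-of-sample error.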
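For reference, the baseline we want the learned model to approximate looks roughly like the sketch below: an iterative BFS over an adjacency list (the data format is my own assumption for illustration). It has to touch every node and edge, i.e. O(V + E) per graph, which is what we hope to avoid.

```python
from collections import deque

def largest_cc_size(adj):
    """Return the size of the largest connected component via iterative BFS.

    `adj` is an adjacency list: {node: iterable of neighbours}.
    """
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best
```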

Plan

In November, I will spend most of my time on the following. Although much of it is engineering work, it is useful for my future career.

  • Work on my network project: design a good architecture and tune the parameters.

  • Set up a distributed machine learning system on Amazon AWS for the big data project, and run a deep learning project on it.

  • Write a summary of my LeetCode practice.

  • Meanwhile, keep learning machine learning every day.

So, there are a lot of things to be done in November. The best thing to do is to keep thinking them over, make a good schedule every day, and try my best to finish it.

See you next month.

Chenyu