Tencent Questionnaire has long supported Tencent's survey questionnaires, but it is too heavyweight and not suited to real-time voting in WeChat groups. With WeChat Mini Programs just being released, we (CDC) built a Mini Program for the WeChat scenario.
After a few busy weeks, the first version finally shipped. During operation we found a problem, one that Tencent Questionnaire also has:
Low load when idle, high concurrency at peak
Tencent Questionnaire supports survey questionnaires for almost all of Tencent's businesses, such as games and music, as well as for external partner companies such as Didi. Given the size of these businesses, a questionnaire may be delivered to hundreds of thousands of users in an instant. There are usually not many users, but whenever a large business runs a promotion, the system load becomes particularly high.
For the voting Mini Program, this problem is even more serious:
1. We cannot control when individual users start a vote.
2. We underestimated how fast things spread in social scenarios. Many votes are started by a company or school that requires all employees or students to take part. In addition, users forward the vote to outside groups, or generate QR codes and share them to Moments to collect votes.
So we often see traffic spikes in our monitoring:
Traffic spike: the peak is close to 5 times the usual level
Faced with this situation, besides optimizing the application itself, we decided to migrate to Tencent Cloud. That is the background against which our work to embrace Tencent Cloud began.
Migrating to Tencent Cloud: preparation
1. Preparation before moving to Tencent Cloud
Tencent Vote was initially developed on top of the Tencent Questionnaire interfaces. So the first step was to consider which components could be stripped out, to reduce the migration workload and improve later maintainability.
That left the most troublesome part: splitting the database. Tencent Vote and Tencent Questionnaire data lived in the same database. How should the data be split? We divided the process into four steps:
We now had a standalone database with complete, non-redundant data, and the later migration only required backing up and restoring that database.
Migrating to Tencent Cloud: migration
With everything that could be optimized done, it was time to prepare the migration plan. What zero-downtime options are there? Here is our analysis:
1. Tencent Cloud accesses the IDC's MySQL through a green channel
- Pro: safer for the business
- Con: less safe for the company
2. Tencent Cloud accesses the IDC's MySQL over the public network
- Pro: convenient
- Con: insecure
3. Dual-write to both the IDC and Tencent Cloud databases while the service still runs in the IDC, then migrate the data and switch over once the data is consistent
- Pro: no extra policy needed, secure, no downtime
- Con: heavy workload
There is also a downtime option: simply stop the service, back up and restore MySQL, and switch DNS resolution. Anyone can do this.
- Pro: safe and reliable, low time cost
- Con: the service has to be stopped
Taking security policy, time cost and other factors into account, we finally chose the downtime migration. To make sure nothing went wrong, there was some work to do before and after the cutover:
With the plan chosen and the environment ready, we rehearsed repeatedly, checking for problems that could arise during the migration and recording every shell command we executed. We kept going until the recorded commands no longer needed any changes and could be run as-is to complete the migration.
The preparation was done, the risks were analyzed, a rollback plan was in place, and we had rehearsed many times. It was time to go into battle; any more delay and the boss would come looking for me. We picked an early morning on a weekend and prepared the downtime announcement. Then we followed the rehearsed steps one by one. Once testing passed, we switched DNS.
After the DNS switch, traffic goes directly to the environment on Tencent Cloud. The next few hours were the most tense. Only when we saw traffic slowly climb on the monitors and users creating more and more votes did we relax.
For cross-data-center migration, different projects call for different solutions. Each case needs its own analysis: how to sensibly allocate the existing IDC resources and cloud computing resources may depend on the project's SLA, the size of the development team, security policy restrictions and other factors.
Migrating to Tencent Cloud: auto scaling tuning
3. Auto scaling
One of the pain points mentioned earlier is low load when idle and high concurrency at peak. Is there a solution? Yes: Tencent Cloud auto scaling. With auto scaling, machines can be added automatically during the day and destroyed at night. Whenever traffic suddenly increases, machines are likewise added automatically and put into the production environment.
Having introduced the basic concept of auto scaling, we also need to introduce two terms: stateless and service discovery.
Stateless means that the service does not need to store data itself (whether short-lived sessions or long-lived attachments uploaded by users). The advantage is that instances can be copied and destroyed quickly without worrying about data loss, which is the basic requirement for auto scaling.
Because Tencent Questionnaire has file upload and related features, making it stateless would be very difficult and would take a lot of time and effort. Tencent Vote happens not to use these features: there is no file upload, and sessions are not saved to the file system. It is stateless by nature and well suited to horizontal scaling; a new instance only needs to be added to the load balancer to go into service.
One benefit of statelessness is that instances can be copied or destroyed quickly, which makes fast horizontal scaling possible. If that scaling required manual involvement, it would be very inefficient. So the services must run as a cluster, and that requires a service discovery tool.
Service discovery can tell you which instances of a service are available, for example:
Monitoring script asks: which servers are the Tencent Vote back ends?
Service discovery answers: 10.0.0.1:8080, 10.0.0.2:8080, 10.0.0.3:8080
After receiving the answer, the monitoring script can add these addresses to the Nginx or HAProxy load balancer.
With auto scaling, the Tencent Vote back-end servers change frequently. With the help of the service discovery software Consul, new machines are put into service and destroyed machines are automatically removed from Nginx, so that no user requests are lost.
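The question and answer exchange above can be sketched in a few lines of Python. This is only an illustration, not the monitoring script the team actually used: the function names and the upstream name `vote_backend` are hypothetical, and the list of addresses stands in for whatever the service discovery tool returns.

```python
def render_upstream(name, backends):
    """Render an Nginx upstream block for a list of "host:port" backends."""
    if not backends:
        # An empty upstream would take the service down, so refuse to render it.
        raise ValueError("refusing to render an upstream with no backends")
    lines = [f"upstream {name} {{"]
    for address in sorted(set(backends)):  # de-duplicate, keep output stable
        lines.append(f"    server {address};")
    lines.append("}")
    return "\n".join(lines) + "\n"


def diff_backends(current, discovered):
    """Return (added, removed) between the current and the discovered backends."""
    current, discovered = set(current), set(discovered)
    return sorted(discovered - current), sorted(current - discovered)


if __name__ == "__main__":
    # Hypothetical answer from service discovery, using the addresses above.
    answer = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
    print(render_upstream("vote_backend", answer))
```

A real script would write the rendered block to the Nginx configuration and reload Nginx only when `diff_backends` reports a change, so that an unchanged answer does not trigger needless reloads.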
Summary (besides the two points above, policy configuration is also needed)
Once the application has the statelessness and service discovery mechanism described above, the next step is to configure Tencent Cloud auto scaling:
Combining Tencent Cloud auto scaling, Consul service discovery and various monitoring systems, we achieved the following: when the system is under high load, auto scaling starts a new machine, and a monitoring script syncs the latest code and starts the corresponding services. Finally, Consul puts the new machine into service.
Auto scaling configuration
A machine is added when CPU utilization exceeds 70%.
Monitoring alarm
The figure shows that the alarm was triggered at 12:00, the machine finished starting at 12:02, and it went into service at 12:04. Auto scaling really delivers.
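The scale-out rule can be pictured as a small decision function. The sketch below is not Tencent Cloud's implementation or API. Only the 70% scale-out threshold comes from the configuration above; the scale-in threshold and the instance limits are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    scale_out_cpu: float = 70.0  # from the article: add a machine above 70% CPU
    scale_in_cpu: float = 30.0   # assumption: not stated in the article
    min_instances: int = 2       # assumption
    max_instances: int = 10      # assumption


def desired_instances(current, avg_cpu, policy=ScalingPolicy()):
    """Return the target instance count for the observed average CPU (percent)."""
    if avg_cpu > policy.scale_out_cpu:
        target = current + 1  # high load: add one machine
    elif avg_cpu < policy.scale_in_cpu:
        target = current - 1  # low load: remove one machine
    else:
        target = current      # within the normal band: do nothing
    # Never go below the minimum or above the maximum group size.
    return max(policy.min_instances, min(policy.max_instances, target))
```

Keeping a gap between the scale-out and scale-in thresholds is a common way to stop the group from flapping when load hovers near a single cutoff.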
Migrating to Tencent Cloud: monitoring
After all this work and so many changes, was there any impact on users? Did performance change? Is it faster or slower? We cannot wait for users to tell us. Fortunately, we have monitoring.
With monitoring we know the impact of every change, so we feel more confident when making operational changes or releasing a new version. Developers should also build the habit of checking the monitors after every change and every release.
For example, we once discovered a WeChat interface anomaly through performance monitoring. The curve showed the number of user votes suddenly dropping by half, yet all system components were normal and there were no errors. Investigation revealed that the WeChat open platform had changed an interface and removed a field. Without monitoring, we would probably not have noticed until the next day.
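An alert for this kind of silent failure can be as simple as comparing the latest value of a business metric with its recent baseline. The sketch below is an illustration only and not the monitoring system described in this article. The function name, the 50% drop ratio (chosen to match the "dropped by half" case) and the minimum baseline are all assumptions.

```python
from statistics import mean


def is_sudden_drop(history, current, drop_ratio=0.5, min_baseline=1.0):
    """Return True when `current` is at or below `drop_ratio` of the recent average.

    `history` holds recent values of the metric, for example votes per minute.
    """
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    if baseline < min_baseline:
        return False  # traffic too low for a drop to be meaningful
    return current <= baseline * drop_ratio
```

Watching a business metric such as votes per minute catches failures where every component reports healthy and no errors are logged, which is what happened in the WeChat case.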
After all that talk, what does it actually look like? Here is the monitoring system we use:
CPU usage spikes at peak times, but fortunately we have auto scaling. We are also optimizing the voting data structure, after which the CPU fluctuations should improve.
Bandwidth rises along with the request volume.
Compare the effect before and after migration:
You can see that the slow requests in red have decreased and the green at the bottom has increased.
Server utilization: there is still room for optimization here, which we'll cover in the next article.
5. Summary and outlook
Tencent Vote is a small and beautiful application. In the spirit of what Mr. Zhang Xiaolong described, it is a tool that users pick up, finish with and leave, and one that is valuable to them.
Going forward, we also plan to combine the IDC resources of Tencent Questionnaire (https://wj.qq.com) with Tencent Cloud resources and use auto scaling for dynamic capacity expansion, lowering operating costs while improving operational capacity. We also plan to use Tencent Cloud services to lighten the operations burden on developers, so that they can focus more on business development and deliver more valuable innovation to users.
Scan the QR code to try Tencent Vote running on Tencent Cloud