Performance tuning of some bad code
After we deployed our product's OAuth service to Azure and tested it on mid-to-high-tier VMs and App Service plans, completing the authentication flow took more than 2000 ms (on an Azure VM) or more than 5000 ms (on Azure App Service). So I have spent the last two days tuning authentication performance. The code was written by a senior colleague, and after many rounds of changed and added requirements it had become a terrible monster.
First we read the two specifications for the log-in flow and the code of 'OAuthAuthorizationServerProvider.GrantResourceOwnerCredentials'. Then we reorganized the log-in flow and found some obvious performance issues:
- The flow hits the database more than 15 times.
- Log-in misuses the method for modifying customer information. Midway through development, that method was required to check several fields for uniqueness: email, phone, QQ, WeChat and nickname. There is not even an index on the 'wechat' column.
- Customer device check. The code here fetches all of the member's previously used devices several times. The simple fix is to reduce this to a single query; a better one is to pass the current log-in device info to the database, do the comparison there, and return only the result.
- Authentication logs are stored in the RDBMS. This design has to change: either keep fewer records (say 10 or 20 per customer) or move the logs to NoSQL.
- Related to the previous point, the last failed log-in time is looked up by querying those logs, which leads to terrible performance.
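To make the device-check idea concrete, here is a minimal sketch of letting the database do the comparison and return only a yes/no answer. It is not our production code (that lives in .NET): Python's built-in sqlite3 stands in for the real RDBMS, and the table, column and index names (member_devices, customers, ix_customers_wechat and so on) are hypothetical. The script also creates the kind of index the 'wechat' column was missing.

```python
import sqlite3


def is_known_device(conn, customer_id, device_id):
    """Return True if the device is registered for the customer.

    One round trip: the database does the comparison and returns a single
    0/1 value instead of the customer's whole device list.
    """
    row = conn.execute(
        "SELECT EXISTS("
        "  SELECT 1 FROM member_devices"
        "  WHERE customer_id = ? AND device_id = ?"
        ")",
        (customer_id, device_id),
    ).fetchone()
    return bool(row[0])


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, wechat TEXT);
    CREATE TABLE member_devices (customer_id INTEGER, device_id TEXT);
    -- Index for the uniqueness check on 'wechat'.
    CREATE INDEX ix_customers_wechat ON customers (wechat);
    -- Index that lets the device lookup avoid a table scan.
    CREATE INDEX ix_member_devices ON member_devices (customer_id, device_id);
    """
)
conn.execute("INSERT INTO member_devices VALUES (1, 'phone-a')")

known = is_known_device(conn, 1, "phone-a")
unknown = is_known_device(conn, 1, "tablet-b")
print(known, unknown)
```

In the real flow the same query would be issued once per log-in, in place of the repeated "load all devices" calls.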
After the changes, the number of database hits dropped a lot, but the logs are still a big problem. Inserts are very slow, and they will only get worse after we go live. We have to discuss the change with the DBA and think about how to bring it up with him; he has always resisted NoSQL, and I don't know why. Otherwise we need to find another good way to solve the problem.
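One possible way to stay on the RDBMS, sketched below under assumptions, is to keep the last failed log-in time on the customer row (so checking it never touches the log table) and to trim the log to a fixed number of rows per customer. Again this uses Python's sqlite3 as a stand-in; the schema, the helper names and the cap of 20 are all hypothetical, and whether it actually helps insert speed on the real database would have to be measured.

```python
import sqlite3

KEEP_PER_CUSTOMER = 20  # assumed cap; the post only says "10 or 20"


def record_login(conn, customer_id, succeeded, at):
    """Log one attempt, remember the last failure, and trim old rows."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO auth_logs (customer_id, succeeded, at)"
            " VALUES (?, ?, ?)",
            (customer_id, int(succeeded), at),
        )
        if not succeeded:
            # Denormalized copy, so the check below is a primary-key lookup.
            conn.execute(
                "UPDATE customers SET last_failed_login = ? WHERE id = ?",
                (at, customer_id),
            )
        # Keep only the newest KEEP_PER_CUSTOMER rows for this customer.
        conn.execute(
            "DELETE FROM auth_logs WHERE customer_id = ? AND id NOT IN ("
            "  SELECT id FROM auth_logs WHERE customer_id = ?"
            "  ORDER BY id DESC LIMIT ?)",
            (customer_id, customer_id, KEEP_PER_CUSTOMER),
        )


def last_failed_login(conn, customer_id):
    """Read the last failure time without querying the log table."""
    row = conn.execute(
        "SELECT last_failed_login FROM customers WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical schema; 'at' is an integer tick standing in for a timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, last_failed_login INTEGER);
    CREATE TABLE auth_logs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER, succeeded INTEGER, at INTEGER);
    CREATE INDEX ix_auth_logs_customer ON auth_logs (customer_id, id);
    INSERT INTO customers (id) VALUES (1);
    """
)
for tick in range(25):
    # Every fifth attempt fails (ticks 0, 5, 10, 15, 20).
    record_login(conn, 1, succeeded=(tick % 5 != 0), at=tick)

log_rows = conn.execute("SELECT COUNT(*) FROM auth_logs").fetchone()[0]
print(log_rows, last_failed_login(conn, 1))
```

The trim could just as well run in a scheduled job rather than on every log-in, which would keep the log-in path down to a single insert.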