
Test data

http://grouplens.org/datasets/movielens/

Complete Python code example of a collaborative filtering recommendation algorithm

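Collaborative filtering recommendation algorithms fall into two main categories:

1. User-based. Using the preferences of neighboring (similar) users, predict the current user's preference for items they have not yet interacted with, and recommend a ranked list of those items.

2. Item-based. If the users who like item A also tend to like item C, then items A and C are highly similar; so when some user likes item A, we can infer that this user may also like item C.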
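The item-based idea can be illustrated with a minimal sketch. It assumes preferences are reduced to "user likes item" sets and scores two items by how often the same users like both (a cosine-style co-occurrence measure). The function name `item_similarity` and the toy data are hypothetical and are not part of the article's code, which implements the user-based variant.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(likes):
    """likes: {user: set of item ids}. Returns {(a, b): similarity} with a < b."""
    item_count = defaultdict(int)  # number of users who like each item
    pair_count = defaultdict(int)  # number of users who like both items of a pair
    for items in likes.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
    # Co-occurrence count scaled by the popularity of both items
    return {
        pair: n / sqrt(item_count[pair[0]] * item_count[pair[1]])
        for pair, n in pair_count.items()
    }


# Toy data (assumed): two of the three users who like A also like C
likes = {
    "u1": {"A", "C"},
    "u2": {"A", "C"},
    "u3": {"A", "B"},
}
sim = item_similarity(likes)
print(sim)
```

With this toy data, A and C come out more similar than A and B, so a new user who likes A would be offered C first.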

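Implementations of collaborative filtering differ with the data and with the programmer who writes them, but the core steps are the same:

1. Collect user preferences

1) Group the different kinds of behavior

2) Weight the groups and combine them into each user's overall preference

3) Denoise and normalize the data

2. Find similar users (user-based) or similar items (item-based)

3. Compute the similarities and sort by them, then make recommendations to the user based on similarity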
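Step 1 (collecting preferences) can be sketched as follows. This is only an illustration under assumptions: the behavior names, the weights in `WEIGHTS`, and the per-user max normalization are all hypothetical choices. The example later in this article skips this step, because MovieLens already provides explicit ratings and only needs them divided by 5.

```python
# Hypothetical behavior weights; real values depend on the application.
WEIGHTS = {"view": 1.0, "favorite": 3.0, "purchase": 5.0}


def aggregate_preferences(events, weights=WEIGHTS):
    """events: iterable of (user, item, behavior).

    Returns {user: {item: score}} with every score scaled into (0, 1].
    """
    raw = {}
    for user, item, behavior in events:
        if behavior not in weights:
            # Denoise: ignore behaviors we do not know how to weight
            continue
        user_scores = raw.setdefault(user, {})
        user_scores[item] = user_scores.get(item, 0.0) + weights[behavior]
    # Normalize: divide each user's scores by that user's highest score
    prefs = {}
    for user, scores in raw.items():
        top = max(scores.values())
        prefs[user] = {item: s / top for item, s in scores.items()}
    return prefs


events = [
    ("u1", "A", "view"),
    ("u1", "A", "purchase"),
    ("u1", "B", "view"),
    ("u2", "B", "favorite"),
    ("u2", "C", "spam"),  # unknown behavior, dropped as noise
]
prefs = aggregate_preferences(events)
print(prefs)
```

Normalizing per user keeps heavy and light users on the same scale; dividing by a global maximum, as the rating example does with 5, is an equally reasonable choice when the scale is fixed.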

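Steps in this example:

1. Initialize the data

Load movies and ratings

Convert the ratings into userDict, which holds each user's ratings for all the movies they rated, normalizing each rating by dividing by 5

Convert the ratings into ItemUser, which holds the set of users who rated each movie

2. Compute the similarity between userId and the other users

Find all users who rated at least one movie that userId also rated

Loop over these users and compute each one's similarity to userId

Build the union of the movies rated by userId and by user A, in the format {'movie ID': [userId's rating, A's rating]}, with 0 for a missing rating

Compute the cosine similarity between user A and userId; the larger the value, the more similar the two users

3. Generate the recommended movie list from the similarities

4. Output the recommendation list and the precision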
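The similarity in step 2 can be checked by hand on a small case. The sketch below assumes ratings are already normalized to [0, 1] and stored as one dictionary per user; the function name `cosine_similarity` and the three toy users are hypothetical. It treats a missing rating as 0, as the steps above describe.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """ratings_*: {movie_id: normalized rating}; unrated movies count as 0."""
    movies = set(ratings_a) | set(ratings_b)  # union of rated movies
    dot = sum(ratings_a.get(m, 0.0) * ratings_b.get(m, 0.0) for m in movies)
    if dot == 0.0:
        # No movie in common (or only zero ratings): treat as not similar
        return 0.0
    norm_a = sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = sqrt(sum(v * v for v in ratings_b.values()))
    return dot / (norm_a * norm_b)


target = {"1": 1.0, "2": 0.6}
user_a = {"1": 0.8, "2": 0.6, "3": 1.0}  # shares movies 1 and 2 with target
user_b = {"3": 0.4, "4": 1.0}            # shares nothing with target

sim_a = cosine_similarity(target, user_a)
sim_b = cosine_similarity(target, user_b)
print(sim_a, sim_b)
```

Here the dot product with user_a is 0.8 + 0.36 = 1.16 and the norms are sqrt(1.36) and sqrt(2), which gives a similarity of roughly 0.70; user_b has no movie in common with the target and scores 0. User A would therefore be picked as a neighbor, and movie 3 would become a recommendation candidate.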


#!/usr/bin/python3
# -*- coding: utf-8 -*-
import time
from math import sqrt

from texttable import Texttable


class CF:

    def __init__(self, movies, ratings, k=5, n=10):
        self.movies = movies
        self.ratings = ratings
        # Number of neighbors
        self.k = k
        # Number of recommendations
        self.n = n
        # Each user's movie ratings
        # Format: {UserID: [(MovieID, Rating), ...]}
        self.userDict = {}
        # Users who rated each movie
        # Format: {MovieID: [UserID, ...]}, e.g. {'1': ['1', '2', '3', ...], ...}
        self.ItemUser = {}
        # Neighbor information: [[similarity, UserID], ...]
        self.neighbors = []
        # Recommendation list: [[score, MovieID], ...]
        self.recommandList = []
        self.cost = 0.0

    # User-based recommendation:
    # compute the similarity between users from their movie ratings
    def recommendByUser(self, userId):
        self.formatRate()
        # Recommend as many movies as the user has rated; used to compute the precision
        self.n = len(self.userDict[userId])
        self.getNearestNeighbor(userId)
        self.getrecommandList(userId)
        self.getPrecision(userId)

    # Build the recommendation list
    def getrecommandList(self, userId):
        self.recommandList = []
        # Recommendation dictionary:
        # {MovieID: sum of the similarities of the neighbors who rated it}
        recommandDict = {}
        for neighbor in self.neighbors:
            movies = self.userDict[neighbor[1]]
            for movie in movies:
                if movie[0] in recommandDict:
                    recommandDict[movie[0]] += neighbor[0]
                else:
                    recommandDict[movie[0]] = neighbor[0]
        # Turn the dictionary into a list sorted by score, highest first
        for key in recommandDict:
            self.recommandList.append([recommandDict[key], key])
        self.recommandList.sort(reverse=True)
        self.recommandList = self.recommandList[:self.n]

    # Convert ratings into userDict and ItemUser
    def formatRate(self):
        self.userDict = {}
        self.ItemUser = {}
        for i in self.ratings:
            # The highest rating is 5; divide by 5 to normalize
            temp = (i[1], float(i[2]) / 5)
            # Build userDict, e.g. {'1': [('1', 1.0), ('2', 1.0), ...], '2': [...], ...}
            if i[0] in self.userDict:
                self.userDict[i[0]].append(temp)
            else:
                self.userDict[i[0]] = [temp]
            # Build ItemUser, e.g. {'1': ['1', '2', '3', ...], ...}
            if i[1] in self.ItemUser:
                self.ItemUser[i[1]].append(i[0])
            else:
                self.ItemUser[i[1]] = [i[0]]

    # Find the nearest neighbors of a user
    def getNearestNeighbor(self, userId):
        # A set avoids the linear "not in" scan of a list
        neighbors = set()
        self.neighbors = []
        # Find the users who rated any movie that userId rated
        for i in self.userDict[userId]:
            for j in self.ItemUser[i[0]]:
                if j != userId:
                    neighbors.add(j)
        # Compute each candidate's similarity to userId
        for i in neighbors:
            dist = self.getCost(userId, i)
            self.neighbors.append([dist, i])
        # sort() is ascending by default; reverse=True sorts in descending order
        self.neighbors.sort(reverse=True)
        self.neighbors = self.neighbors[:self.k]

    # Build the union of the movies rated by userId and by user l
    def formatuserDict(self, userId, l):
        user = {}
        for i in self.userDict[userId]:
            user[i[0]] = [i[1], 0]
        for j in self.userDict[l]:
            if j[0] not in user:
                user[j[0]] = [0, j[1]]
            else:
                user[j[0]][1] = j[1]
        return user

    # Compute the cosine similarity (larger means more similar)
    def getCost(self, userId, l):
        # Union of the movies rated by userId and by l:
        # {MovieID: [userId's rating, l's rating]}, with 0 for a missing rating
        user = self.formatuserDict(userId, l)
        x = 0.0
        y = 0.0
        z = 0.0
        for v in user.values():
            x += float(v[0]) * float(v[0])
            y += float(v[1]) * float(v[1])
            z += float(v[0]) * float(v[1])
        if z == 0.0:
            return 0
        return z / sqrt(x * y)

    # Precision of the recommendations
    # (share of the recommended movies that userId has actually rated)
    def getPrecision(self, userId):
        user = [i[0] for i in self.userDict[userId]]
        recommand = [i[1] for i in self.recommandList]
        count = 0.0
        if not recommand:
            # No recommendations: avoid dividing by zero
            self.cost = 0.0
        elif len(user) >= len(recommand):
            for i in recommand:
                if i in user:
                    count += 1.0
            self.cost = count / len(recommand)
        else:
            for i in user:
                if i in recommand:
                    count += 1.0
            self.cost = count / len(user)

    # Print the recommendation list
    def showTable(self):
        neighbors_id = [i[1] for i in self.neighbors]
        table = Texttable()
        table.set_deco(Texttable.HEADER)
        table.set_cols_dtype(["t", "t", "t", "t"])
        table.set_cols_align(["l", "l", "l", "l"])
        rows = []
        # The third column shows the third field of each movies.dat record
        rows.append([u"movie ID", u"Name", u"release", u"from userID"])
        for item in self.recommandList:
            fromID = []
            movie = None
            for i in self.movies:
                if i[0] == item[1]:
                    movie = i
                    break
            if movie is None:
                # The rated movie is missing from the movies file; skip it
                continue
            for i in self.ItemUser[item[1]]:
                if i in neighbors_id:
                    fromID.append(i)
            # Copy the record so that self.movies is not modified
            rows.append(list(movie) + [fromID])
        table.add_rows(rows)
        print(table.draw())


# Load the data
def readFile(filename):
    data = []
    # If reading fails, try encoding="iso-8859-15" instead
    with open(filename, "r", encoding="utf-8") as files:
        for line in files:
            line = line.strip()
            if not line:
                # Skip blank lines
                continue
            data.append(line.split("::"))
    return data


# ------------------------- Start -------------------------------
# time.clock() was removed in Python 3.8; perf_counter() replaces it
start = time.perf_counter()
movies = readFile("/home/hadoop/Python/CF/movies.dat")
ratings = readFile("/home/hadoop/Python/CF/ratings.dat")
demo = CF(movies, ratings, k=20)
demo.recommendByUser("100")
print("Recommendation list:")
demo.showTable()
print("Processed %d ratings" % (len(demo.ratings)))
print("Precision: %.2f %%" % (demo.cost * 100))
end = time.perf_counter()
print("Elapsed time: %f s" % (end - start))