下一个更大元素1

发表于 2019-12-25 更新于 2019-12-28 分类于 leetcode

下一个更大元素1（难度：简单）

图解

另一种写法（从后往前）

转自：

作者：labuladong 链接：https://leetcode-cn.com/problems/next-greater-element-i/solution/dan-diao-zhan-jie-jue-next-greater-number-yi-lei-w/ 来源：力扣（LeetCode）

模板：

vector<int> nextGreaterElement(vector<int>& nums) {
    vector<int> ans(nums.size()); // 存放答案的数组
    stack<int> s;
    for (int i = nums.size() - 1; i >= 0; i--) { // 倒着往栈里放
        while (!s.empty() && s.top() <= nums[i]) { // 判定个子高矮
            s.pop(); // 矮个起开，反正也被挡着了。。。
        }
        ans[i] = s.empty() ? -1 : s.top(); // 这个元素身后的第一个高个
        s.push(nums[i]); // 进队，接受之后的身高判定吧！
    }
    return ans;
}

for 循环要从后往前扫描元素，因为我们借助的是栈的结构，倒着入栈，其实是正着出栈。while 循环是把两个“高个”元素之间的元素排除，因为他们的存在没有意义，前面挡着个“更高”的元素，所以他们不可能被作为后续进来的元素的 Next Great Number 了。

这个算法的时间复杂度不是那么直观，如果你看到 for 循环嵌套 while 循环，可能认为这个算法的复杂度也是 O(n^2)，但是实际上这个算法的复杂度只有 O(n)。

分析它的时间复杂度，要从整体来看：总共有 n 个元素，每个元素都被 push 入栈了一次，而最多会被 pop 一次，没有任何冗余操作。所以总的计算规模是和元素规模 n 成正比的，也就是 O(n) 的复杂度。

本题代码

class Solution {
    public int[] nextGreaterElement(int[] nums1, int[] nums2) {
        Stack<Integer> stack = new Stack<>();
        Map<Integer, Integer> map = new HashMap<>();
        int[] res = new int[nums1.length];

        for(int i = nums2.length-1; i >= 0; i--){ // 倒着往栈里放
            while(!stack.empty() && nums2[i]>=stack.peek()){
                stack.pop();
            }
            map.put(nums2[i],stack.empty()?-1:stack.peek());
            stack.push(nums2[i]);
        }

        for(int i=0; i<nums1.length; i++)
            res[i] = map.get(nums1[i]);

        return res;
    }
}

删除列表中的节点

发表于 2019-12-25 分类于 leetcode

删除列表中的节点（难度：简单）

删除排序链表中的重复元素2

发表于 2019-12-25 分类于 leetcode

删除排序链表中的重复元素2（难度：中等）

思路

画解

代码

(Java)

删除排序链表中的重复元素

发表于 2019-12-25 分类于 leetcode

删除排序链表中的重复元素（难度：简单）

思路

代码

画解

最长回文串

发表于 2019-12-25 分类于 leetcode

最长回文串（难度：简单）

最长回文子串

发表于 2019-12-25 分类于 leetcode

最长回文子串（难度：中等）

最长公共子序列

发表于 2019-12-25 更新于 2020-01-10 分类于 leetcode

最长公共子序列（难度：中等）

方法：动态规划

代码

复杂度

数据复杂度：O(mn)

空间复杂度：O(mn)

最长公共子串

最长公共子串（Longest Common Substring）与最长公共子序列（Longest Common Subsequence）的区别： 子串要求在原字符串中是连续的，而子序列则只需保持相对顺序一致，并不要求连续。例如X = {a, Q, 1, 1}; Y = {a, 1, 1, d, f}那么，{a, 1, 1}是X和Y的最长公共子序列，但不是它们的最长公共字串。

描述：

计算两个字符串的最大公共子串（Longest Common Substring）的长度，字符不区分大小写。

方法

求子串的方法和求子序列方法类似：

代码

class Solution {
    public int longestCommonSubsequence(String text1, String text2) {
        int n1 = text1.length();
        int n2 = text2.length();

        int dp[][] = new int[n1+1][n2+1]; //多增加一行一列
        int longest = 0;
        for(int i = 1; i <= n1; i++){
            for(int j = 1; j <= n2; j++){
                if(text1.charAt(i-1) == text2.charAt(j-1))
                    dp[i][j] = dp[i-1][j-1] + 1;
                    longest = Math.max(longest,dp[i][j]);
            }
        }
        return longest;
    }
}

HDFS-Overview

发表于 2019-12-16 更新于 2020-01-03 分类于 Hadoop

Hadoop Introduction

发表于 2019-12-15 更新于 2020-01-03 分类于 Hadoop

History

Hadoop Ecosystem

Data Preprocessing（数据预处理）

发表于 2019-09-21 更新于 2024-11-04 分类于机器学习

以下内容来自Udemy上的课程: Machine Learing A-Z: Hands-On Python & R in Data Science.

datasets download

使用数据：

1. Missing data

Common strategy: replace the missing data by the mean, median, or most frequent value of the feature column.

Python

1.以均值代替：

# Importing the libraries
import numpy as np
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.drop(columns='Purchase')
y = dataset['Purchase']

# Taking care of missing data
# 法一：SimpleImputer(新版sklearn，以前为 from sklearn.preprocessing import Imputer)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy = "mean")
imputer.fit(X.iloc[:, 1:3]) 
# ↑如果写作imputer.fit(X)，会报错：ValueError: Cannot use mean strategy with non-numeric data。得传入数值型的（去掉第一列）
X.iloc[:, 1:3] = imputer.transform(X.iloc[:, 1:3])

# 法二，replace
#以Age为例
X["Age"].replace(np.nan, X["Age"].mean(), inplace=True)

# 法三，fillna
# 以Age为例
X["Age"] = X["Age"].fillna(X["Age"].mean())
# 也可写作
X["Age"].fillna(X["Age"].mean(), inplace=True)

2.以指定值替代

例：现有 df_age 如下：

Fill the missing data in column "age_boy" with 22 and fill the missing data in column "age_girl" with 21. The expected output is:

1	df_age = df_age.fillna(value={'age_boy':22,'age_girl':21})

关于fillna：

DataFrame.fillna(value=None, *, method=None, axis=None, inplace=False, limit=None, downcast=<no_default>)

value: scalar, dict, Series, or DataFrame
    Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). 
    Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

method: {‘bfill’, ‘ffill’, None}, default None
    Method to use for filling holes in reindexed Series:
* ffill: propagate last valid observation forward to next valid.
* bfill: use next valid observation to fill gap.

例如：
price1 = price.fillna(method="bfill")

3.删掉有缺失值的行

X.dropna(axis = 0, inplace = True)
# 或写成 X = X.dropna(axis = 0)

# 还可指定当哪一变量出现缺失值时，执行删除的操作
X.dropna(subset=["Age"], axis = 0, inplace = True)

4.删掉有缺失值的列

1	X.dropna(axis = 1, inplace = True)

阅读全文 »